I recently came across a great natural language dataset from Mark Riedl: 112,000 plots of stories downloaded from English-language Wikipedia. This includes books, movies, TV episodes, and video games: anything that has a Plot section on a Wikipedia page.

Source: Examining the arc of 100,000 stories: a tidy analysis – Variance Explained