We could do some informal exploratory data analysis of online text sources such as Project Gutenberg (https://www.gutenberg.org/) or song lyrics (http://www.azlyrics.com/), and build some generative models from them. This would give us the opportunity to gain experience with text, a type of data most of us have not used. We could try quantitative comparisons of different great works of literature, or create random song lyric generators for artists we like, using something simple like a Markov chain model or perhaps something more complicated like a recurrent neural network. It's unlikely that anything we do would be novel, but it would be fun. Any ideas welcome.
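To give a feel for how small the Markov chain option could be, here is a minimal sketch of a word-level generator. The function names (`build_chain`, `generate`), the `order` and `seed` parameters, and the placeholder corpus are all made up for illustration; a real version would first need to download and clean text from one of the sources above, which this sketch does not attempt.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Repeated followers are kept, so frequent transitions are sampled more often.
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain and return up to `length` words as one string."""
    rng = random.Random(seed)
    state = rng.choice(list(chain.keys()))
    order = len(state)
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: this state only occurred at the very end of the corpus.
            break
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return " ".join(output[:length])


if __name__ == "__main__":
    # Placeholder text, not real lyrics; swap in a cleaned corpus.
    corpus = "the sun goes down and the moon comes up and the sun comes up again"
    chain = build_chain(corpus, order=1)
    print(generate(chain, length=12, seed=0))
```

Raising `order` to 2 or 3 tends to give more coherent output at the cost of copying longer stretches of the source verbatim, which might be an interesting trade-off to explore on a small lyrics corpus.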