Text to Music

A project undertaken by Joe Marsh Rossney, John Bamping and Alex Simpson as part of the 'Science of Music' module run by Dr Gavin Bell and IATL.

Introduction and motivation:

The ability to sonify a body of text has many applications in the modern world. Take, for example, a screen reader, which reads out the text on a computer screen and makes the machine accessible to people with poor sight. A famous success story of speech synthesis is Professor Stephen Hawking's communication system, which converts text into sound and enables this great mind to continue sharing his ideas with the world and even give lectures.

Aside from strictly practical uses, there has been continuing interest in converting text into sound for aesthetic purposes, often to introduce some algorithmic element into a musical composition. This can aid the creative process by producing music which may not have ever been conceived by a human mind, but which is then open to alteration and adjustment by the composer.

The motivation behind our project is in this same vein: to produce a tool to aid in the creative process. We envisage that the music created by the text will be moulded by a composer with a musical concept in mind, but ideally there should be (at least an option to have) minimal user input in the generation of the music, so that it produces unexpected results.

The use of text as a structured input for music generation:

All sound, with the exception of pure noise, has some structure and, often, periodicity. This is especially true of music, where structure and periodicity are manifest in both the time and frequency domains - i.e. in rhythm and in pitch.

This motivates the use of structured, periodic data in the algorithmic generation of music. Depending on its origin, text satisfies these conditions to varying degrees: on a fundamental level, there are only 26 letters in the English alphabet, so mapping individual letters to pitches will yield some repetition. Rhythmic structure can be found in language forms such as poetry, and can also be extracted from features such as word length and sentence length.
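As a minimal illustration of this idea (the function names here are our own, not taken from the project code), the 26 letters can be folded onto a seven-note scale, so that repeated letters in the text produce repeated pitches:

```python
# Sketch: map each letter a-z onto scale degrees, so repeated letters
# in the input text yield repeated pitches in the output.

C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # MIDI note numbers, one octave

def letter_to_pitch(ch, scale=C_MAJOR):
    """Map a lowercase letter to a pitch by cycling through the scale,
    shifting up an octave each time the scale wraps around."""
    index = ord(ch) - ord('a')                 # 0..25
    octave_shift = 12 * (index // len(scale))
    return scale[index % len(scale)] + octave_shift

def text_to_pitches(text):
    """Keep only alphabetic characters and map each to a pitch."""
    return [letter_to_pitch(c) for c in text.lower() if c.isalpha()]

print(text_to_pitches("abba"))  # -> [60, 62, 62, 60]
```

Even this crude mapping shows how letter-frequency statistics of the source text carry over into pitch repetition in the music.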

One novel approach has been taken by Hannah Davis in her project "TransProse", which calculates "emotion density" throughout a novel, and generates music with corresponding "emotional feeling." With so many possible ways to link text to music, and such a wealth of sources of various kinds, there is much to explore.

Project Aims:

  • Sonify a range of text samples, such that some characteristics of the text can be recognised in the musical product.
  • Aim to use features of the text itself to provide variation and interest in the music; use direct user input sparingly.
  • Achieve some degree of 'musicality' - the resulting music need not necessarily adhere to strict rules of tonality, but it should hold the interest of the listener.
  • Present the tool in a user-friendly interface.

Overview of method:

Python is used to deconstruct the input text into its constituent phonemes. These are then imported into SuperCollider, which generates sequences of values corresponding to musical parameters. A 'Pattern Bind' (Pbind) runs through these lists in parallel and generates the finished product. A more detailed analysis of the thought process and method is provided in the Method and Discussion section.
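The pipeline can be sketched as follows. This is an illustrative sketch only, not the project's actual code: `PHONEME_DICT` stands in for a full pronunciation dictionary such as the CMU Pronouncing Dictionary, and the pitch and duration mappings are toy examples.

```python
# Stand-in pronunciation dictionary (a real system would use a full
# resource such as the CMU Pronouncing Dictionary).
PHONEME_DICT = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def text_to_phonemes(text):
    """Split text into words and look up each word's phoneme sequence."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(PHONEME_DICT.get(word, []))
    return phonemes

phonemes = text_to_phonemes("Hello world")

# A Pbind in SuperCollider steps through parameter lists in parallel;
# Python's zip gives the same effect for previewing the event stream.
pitches   = [60 + sum(map(ord, p)) % 12 for p in phonemes]  # toy mapping
durations = [0.5 if p[-1] in "AEIOUW" else 0.25 for p in phonemes]
events = list(zip(phonemes, pitches, durations))
```

In the real system these parameter lists are exported from Python and consumed by SuperCollider, but the parallel-iteration structure is the same.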

Conclusions:

The finished product successfully generates sound from the input text. A level of user control is provided in the form of entropy, scale, and tempo, although complete automation is also an option. Two methods of performing formant synthesis are implemented, and the differing results are discussed. Problems were encountered with reading the Python output into SuperCollider, but a more experienced programmer with more time would almost certainly make significant improvements.

One of the biggest challenges was creating interesting music that composers might genuinely be drawn to. Two possible avenues to this objective are suggested:

  • Creating more 'rules' to reflect the processes followed by classical composers. For example, good melodies tend to be dominated by 'steps' between adjacent notes rather than long runs of repeated notes, and use 'jumps' of more than one degree sparingly. Cadences occur in appropriate, not random, locations throughout a piece, so as to convey a feeling of deliberate movement. Certain cadences (perfect, imperfect, plagal) are usually favoured, and particular notes - often leading notes - are used to reinforce these cadences.
  • Rejecting tonal confinement altogether, and instead creating ethereal sounds and soundscapes for use in ambient or electronic music. In our opinion, using vowel synthesis with this kind of vision of ambient-style soundscapes is the more promising direction for future work.