"Robot Voices" And Reading For Expression

Voice synthesisers and text readers are much more lifelike these days, but they still lack a certain something. I admit to finding them difficult to listen to and often "cringeworthy" (is that a word?). Today in my feeds, however, I spotted a space that set me up for a brief playtime and got me thinking about how I might reconsider their place in my classroom.

We have used word processors with text reader plug-ins for a long time, and my colleagues and I have been regularly entertained by the mispronunciations, by the effects that misplaced punctuation has on the way the text is read, and by extended periods of phoneme play as we have tried to get the reader to say what we meant. My all-time favourite has to be the metallic "youglee," read by an MS Word plug-in we once had: no matter how hard we tried, we could not get the synthesiser to pronounce it as "ugly."

The space that inspired this post today was "Read the Words," posted by colleague Angela Maiers. "Read the Words" hosts text from a range of sources and converts it to audio. The files it produces can then be linked to, embedded in blogs or websites, used for podcasting, or downloaded as MP3 files for local playback. The site also offers advice about how to have files read more realistically, by spelling phonetically, by including punctuation to affect cadence and expression, and even by slowing down the pace of the reader. Text can also be uploaded in Spanish and French, though, not being a linguist, I don't feel able to comment on this. The space also offers a choice of avatars to read the files for you, and it was playing with these this morning that has led to this post.

While playing, I discovered that choosing two different avatars to read the same text gave not only two different voices to listen to, but also two completely different renderings of the text. Here are two examples of a section of the fairy tale "Little Red Riding Hood," read by "Michael" and "Lauren." The text was copied and pasted from an online PDF file, at

Lauren Example

Michael Example

Example read by Yours Truly

Listening to these clips got me thinking about how I could use this tool in tandem with student podcasting to help us evaluate and think about our use of expression when reading, and about the importance of enunciation when we perform texts, whether reading or speaking aloud.

As well as being embedded in web pages, these files could be prepared and downloaded for use within PowerPoint shows or IWB notebooks as shared texts. Comparison files could be recorded by the students, other colleagues or the class teacher, using tools such as Podium or Audacity, to support discussion around the effects that different approaches to reading the text have on the audience. Perhaps sharing a prepared text in this way with groups of students could form the starting point for a challenge during guided sessions. Can they listen to and improve the presentation together? This could result in them recording their performance, for sharing with the class and for comparison with what they originally heard.

Voice-synthesised files prepared before the session might be an interesting starting point, and a way in to supporting critical review of performance. Used alongside, or embedded in, enlarged texts on the IWB, the files could be a means of promoting discussion around the role of punctuation, while mispronunciations could be a vehicle for thinking about how words sound in context and for modelling the strategies we might use to make sense of the text we are engaged with.

I would be really interested in any thoughts you might have around this post, and look forward to hearing from you.

Screen Shot Captured from Read the Words via Kwout


John said...

What immediately interested me was the difference that having a wee bit of music at the start made; even before your voice started, I was intrigued.
I tried to give my class the opportunity to record and listen to themselves reading last session, but my organisation was not good enough (children accidentally deleted classmates' recordings, etc.). Hopefully it will be better next session.

Two Whizzy said...

Hi John, thanks for this. Managing the process is one of the key reasons I like, and have chosen to use, Podium as a tool. Podium files allow you to split recordings into episodes and chapters, and I have found this really helps with organisation and with presenting, visually and offline, how podcasts are organised. As a management tool, one file can be set up to include all recordings in draft, with each child's contribution added as a new chapter, so they are saving the whole file each time and all the recordings are in one place. If necessary, each chapter can later be exported to MP3 individually, though I like to add a new chapter where we can combine tracks by copy and paste, and add music and effects together. This has been made easier with the new multitracking update. This may not be a whole lot of use for a Mac user, but I would recommend checking out Podium if you haven't already done so; it is very flexible. I was chatting to Doug Dickinson about some interesting ideas he has on using audio recordings to support AfL in reading, and I have a blog post in draft around this that will hopefully get published soon.

Ron Starc said...

The current best text to speech software is Text Speaker. It has customizable pronunciation, reads anything on your screen, and it even has talking reminders. It is great for learning languages as it highlights the words as they are being read. The bundled voices are well priced and sound very human. Voices are available in English, French, Italian, Spanish, German, and more. Easily converts blogs, email, e-books, and more to MP3 or for listening instantly.