Saturday, May 20, 2017

Linux Text To Speech with Saved Audio

In my last blog post I described a procedure for finding a forgotten PIN on 10-digit mechanical lock boxes, where you enter a specific sequence of button presses to efficiently test all the combinations. The generated sequence was supplied as a text file, and although this works, it's a little cumbersome moving your eyes between the buttons and the paper all the time. It occurred to me that this would be a lot easier if the numbers were read out to me. I then imagined how easy it would be with a pair of headphones and the instructions in an audio file on my phone. This seemed like the perfect job for text-to-speech software on Linux.

After a little bit of research I decided to use the eSpeak speech synthesizer. It has many voices for different languages and countries, and allows quite a bit of customisation of the way the text is read.

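The command below converts the text in "lockbox.txt" to audio in "lockbox.wav". It uses the English voice (-ven), marks capital letters by raising the pitch (-k20), adds a pause between words (-g4, in units of 10 ms at the default speed), reads at 90 words per minute (-s90), and sets the pitch to 20 on a scale of 0 to 99 (-p20). The -f option names the text file to read and -w writes the result to a WAV file instead of playing it through the speakers. It's that easy!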

espeak -ven -k20 -g4 -s90 -p20 -f lockbox.txt -w lockbox.wav
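If you want to experiment with the settings, here is a rough sketch that gives each option a name so it is easier to tweak. The variable names and the espeak_command helper are my own invention for illustration, not part of eSpeak. The helper only prints the command line, so you can check it before running anything.

```shell
# Settings from the command above, given names so they are easier to tweak.
# The variable names are hypothetical; only the espeak options are real.
VOICE=en          # -v: voice name
CAPITAL_PITCH=20  # -k: values above 2 raise the pitch on capital letters
WORD_GAP=4        # -g: pause between words, in units of 10 ms at the default speed
SPEED_WPM=90      # -s: speaking speed in words per minute
PITCH=20          # -p: pitch adjustment, 0 to 99

# espeak_command INPUT.txt OUTPUT.wav
# Hypothetical helper: prints the espeak command line without running it.
# Assumes the file names contain no spaces.
espeak_command() {
    printf 'espeak -v%s -k%s -g%s -s%s -p%s -f %s -w %s\n' \
        "$VOICE" "$CAPITAL_PITCH" "$WORD_GAP" "$SPEED_WPM" "$PITCH" "$1" "$2"
}
```

Running "espeak_command lockbox.txt lockbox.wav" prints the command, and piping that output to sh runs it. Leaving off the -w part when you run espeak by hand plays the speech through the speakers instead, which is handy for trying out a new speed or pitch.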

Before processing the file I made some slight alterations to it by replacing some of the commands in the lock box opening sequence. Originally the commands were zero through nine, open, and clear. I replaced open with test, as it is only one syllable and easier to hear. It's also important to leave spaces between the digits, otherwise it will read 11 as "eleven" instead of "one one".
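I made these changes by hand, but they could also be scripted. Below is a minimal sketch using sed. It assumes the sequence file contains digits plus the words open and clear separated by whitespace, which may not match your file exactly. The prepare_sequence name is just something I made up for this example.

```shell
# prepare_sequence: hypothetical helper, reads the sequence on stdin and
# writes a speech-friendly version to stdout.
#   1. replace "open" with the shorter word "test"
#   2. put a space after every digit so "11" is read as "one one"
#   3. collapse runs of spaces into a single space
#   4. strip spaces left at the end of each line
prepare_sequence() {
    sed -e 's/open/test/g' \
        -e 's/\([0-9]\)/\1 /g' \
        -e 's/  */ /g' \
        -e 's/ *$//'
}

# Example use (the input file name here is an assumption):
#   prepare_sequence < lockbox_original.txt > lockbox.txt
```

For example, a line containing "11 open 23 clear" comes out as "1 1 test 2 3 clear".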

Here is the instructional WAV file converted to an MP3. It goes for 30 minutes or so, and with a little bit of practice you should be able to follow along at that speed. If you can't, that's fine, just slow the speed down in your music player. If you screw up, just go back 15 or 20 seconds to catch up.

For the YouTube fans out there, here is another version. It might possibly be the most boring and monotonous video on YouTube. That's my speciality though :-)


To be serious though, I'd like to try eSpeak on a Raspberry Pi. I think it'd be great for reading out status updates and events. Compared to some of the other synthesized voices I've heard, it's actually pretty good.
