Wednesday, November 26, 2014

Attempting to Determine How Audio Data is Stored in Flash Memory

You may have seen that I've recently been analysing a toy that plays animal sounds.  It's nothing complicated: it just plays one of 108 stored sounds when a collector card is scanned through an optical interface.  For some reason I thought it would be fun to see if I could replace the audio data with my own sounds.  From what I can tell, the data is stored in a 2 MiB flash memory IC on the PCB.  I'm 99% sure it doesn't hold program memory, as 2 MiB is far more than such a simple task needs.  Besides, that seems like just the right amount of memory to store the sounds the player uses.
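As a rough sanity check on that last point, here is a back-of-the-envelope sketch.  The 8 kHz sample rate and 8-bit depth are assumptions picked only for illustration, not values measured from the toy.  Under them, each sound averages about 19 kB, or roughly 2.4 seconds of uncompressed audio.

```python
# Rough capacity estimate. The sample rate and bit depth passed in below
# are assumptions, not values read from the toy.
FLASH_BYTES = 2 * 1024 * 1024   # 2 MiB flash chip
NUM_SOUNDS = 108                # number of sounds the toy plays


def seconds_per_sound(sample_rate_hz, bits_per_sample):
    """Average clip length if the whole chip held uncompressed mono audio."""
    bytes_per_sound = FLASH_BYTES / NUM_SOUNDS
    bytes_per_second = sample_rate_hz * bits_per_sample / 8
    return bytes_per_sound / bytes_per_second


if __name__ == "__main__":
    print(round(FLASH_BYTES / NUM_SOUNDS))        # average bytes per sound
    print(round(seconds_per_sound(8000, 8), 2))   # assumed 8 kHz, 8-bit
```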

To examine the memory I'd have to get a programming adapter to download the contents of the chip.  I did a quick search of eBay and found the EZP2010 for $40.  I know what you're saying, "there are cheaper options available", and although that's true, this one had a few things going for it.  It had faster delivery, but most importantly it came with ZIF SOIC to DIL adapters, which turn a 30 minute job into a 30 second job.

EZP2010 Memory Programmer
After stuffing around for an hour trying to set up the drivers, I finally got the programmer installed.  There isn't much to it, but it does the job.  It has the ability to copy chips as a standalone device not connected to a computer, but as I didn't need that I didn't bother testing it.

Device Under Test in DIL ZIF socket
To read the IC's memory, I removed it from the PCB with a hot air gun and placed it in the SOIC ZIF adapter.  Having these adapters made the task so much easier.

Device Under Test in SOIC ZIF socket to DIL adapter
There's a bit of confusion over what chip I'm actually trying to read.  The image below indicates that the chip is a 25L1605D, but the programmer detects it as a 25L1635D.  Both have the same memory capacity, and both give the same results when selected as the device setting for reading the data from the memory.

Flash Memory IC
Once you have the software set up, it's idiot proof.  Put the IC into the socket as shown in the diagram, detect the device or configure it manually, then hit the read button.

Flash Programmer Software
I tried playing the recovered data as audio by importing it as different types of raw data in Audacity and Goldwave, but each time all I got was static.  It would've been unlikely to get the exact format, but I was hoping for some type of recognisable distortion that would help to reveal the nature of the data to me.  No such luck.
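The same guessing game can be scripted instead of clicked through.  The sketch below wraps the dump in a WAV header under an assumed format so any player can open it.  The function name, the 8 kHz default and the choice of mono PCM are all assumptions for illustration; nothing here is known about the toy's real format.

```python
import wave


def raw_to_wav(raw_bytes, wav_path, sample_rate_hz=8000, sample_width=1):
    """Wrap raw bytes in a WAV header as mono PCM and return the frame count.

    sample_width=1 is read by players as unsigned 8-bit,
    sample_width=2 as signed 16-bit little-endian.
    Both are guesses at the format, not known facts about the data.
    """
    # Drop any trailing bytes that don't make up a whole sample.
    usable = len(raw_bytes) - (len(raw_bytes) % sample_width)
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(raw_bytes[:usable])
    return usable // sample_width
```

Calling it in a loop over a few sample rates and widths, on the bytes read from a dump file (say a hypothetical `dump.bin`), gives a folder of candidates to listen through.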

I expect to see an area that tells the device how many sounds are in memory, and something like a lookup table that gives the location of each sound.
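One way to hunt for such a table is to look for a run of steadily increasing numbers that would all be valid offsets into the file.  This is only a heuristic sketch: the entry size, little-endian byte order, alignment and minimum run length are all assumptions, and the real table (if there is one) may look nothing like this.

```python
def find_offset_tables(data, entry_size=4, min_entries=16):
    """Return (byte_offset, entry_count) for each run of strictly increasing
    little-endian values that all fall inside the file.

    Assumes entries are aligned to entry_size from the start of the file.
    """
    values = [
        int.from_bytes(data[i:i + entry_size], "little")
        for i in range(0, len(data) - entry_size + 1, entry_size)
    ]
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        # The run continues while each value is larger than the previous
        # one and still points inside the file.
        in_run = i < len(values) and values[i - 1] < values[i] < len(data)
        if not in_run:
            if i - start >= min_entries:
                runs.append((start * entry_size, i - start))
            start = i
    return runs
```

Trying `entry_size=3` as well seems worthwhile, since a 2 MiB chip only needs 24-bit addresses.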

My goal is to see if I can determine the structure of the data, and as a first quick test I checked out the data using the histogram function of HxD.  As you can see from the image below, apart from the spike in the centre, all the byte values seem to be evenly distributed.  Not what I was wanting to see.  It's not a certainty, but an even distribution of byte values usually indicates that encryption or compression has been used.  I was a little excited to see the spike in the middle though.
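The same check can be done outside HxD.  The sketch below counts how often each byte value occurs and boils the result down to a single Shannon entropy figure in bits per byte: close to 8 means a nearly flat histogram, while lower values mean some byte values dominate.  The function names are my own.

```python
from collections import Counter
from math import log2


def byte_histogram(data):
    """Return a list of 256 counts, one per possible byte value."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def shannon_entropy(data):
    """Shannon entropy in bits per byte (0.0 for empty or constant data)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(data).values()
    )
```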

Byte Histogram of recovered Data
That excitement was short-lived.  It turns out that there is a large block of unused memory at the end of the file filled with the byte 0x80.
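Measuring that block takes only a few lines, and stripping it off before re-running the histogram removes the spike.  A minimal sketch, with a function name of my own choosing:

```python
def split_trailing_fill(data, fill=0x80):
    """Return (used_length, fill_length) where fill_length is the size of
    the run of `fill` bytes at the very end of the data."""
    used_length = len(data.rstrip(bytes([fill])))
    return used_length, len(data) - used_length
```

Slicing the dump to `data[:used_length]` leaves only the part that actually holds something.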

Repeated 0x80 at end of file
To get a better idea of what I'm looking for, I had a look through Digi-Key to find a sound playing IC that could be similar to what's used in this toy.  There's no way to know what device has been used as it's a chip on board device, but chip manufacturers like to compete on features, and if a feature is in one company's IC there's a good chance that it's in the others' too.

The cheapest device I found was an ISD3800 ChipCorder, and a quick look at the data sheet gives us some important insights.  It supports the type of memory that our toy uses, and shows some of the audio compression algorithms that could be used.  The algorithms used may not observe byte boundaries, i.e. samples may be 2, 3, 4, 5, 6, 7, 8, 10 or 12 bits wide.
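If the samples really are packed across byte boundaries, they have to be unpacked before any waveform will look sensible.  The sketch below pulls fixed-width samples out of a byte stream.  The bit order is unknown for this toy, so both MSB-first and LSB-first packing are offered as guesses, and any leftover bits at the end are dropped.

```python
def unpack_samples(data, bits, msb_first=True):
    """Split a byte stream into unsigned samples of `bits` bits each."""
    mask = (1 << bits) - 1
    samples = []
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    for byte in data:
        if msb_first:
            # New bytes enter at the bottom; samples leave from the top.
            acc = (acc << 8) | byte
            nbits += 8
            while nbits >= bits:
                nbits -= bits
                samples.append((acc >> nbits) & mask)
            acc &= (1 << nbits) - 1
        else:
            # New bytes enter at the top; samples leave from the bottom.
            acc |= byte << nbits
            nbits += 8
            while nbits >= bits:
                samples.append(acc & mask)
                acc >>= bits
                nbits -= bits
    return samples
```

For example, the three bytes 0xAB 0xCD 0xEF unpack MSB-first into the two 12-bit samples 0xABC and 0xDEF.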

ISD3800 sound player IC data
For more analysis I opened the data in Audacity with the spectrogram view turned on.  There are three distinct features visible here: two vertical lines and a gap at the end.

Spectrogram of Data when opened as a raw audio file in Audacity
Zooming in on the waveform at the first vertical line shows a couple of triangular shaped waveforms.

Interesting Section of Data in Audacity
The second vertical line corresponds to a descending step feature in the waveform.

Interesting Section of Data in Audacity
As seen before, the section at the end of the file is a run of 0x80 bytes, which this import format interprets as zero.

Nothing at end of File
I follow Oona Räisänen on Twitter and have seen how useful baudline can be, so I gave it a try.  I'm still learning the interface, but it will come in handy for a few other tests I want to run.  While in Linux I tried the data in binwalk but got absolutely nothing.  The entropy plot did show the regions noted above though.

Baudline Interface
The bit view window turned out to be not so helpful, and the Poincaré plot was all black.  I'm not sure I used it right; perhaps I oversaturated it so that everything just shows as black.  I might do my own in Octave.
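A home-made version is not much work.  The sketch below is in Python rather than Octave and is my own take on the idea, not a copy of how baudline draws its plot: it counts how often each byte value is followed by each other byte value, giving a 256 x 256 grid.  With about two million byte pairs spread over 65,536 cells, the average is around 32 pairs per cell, so a plot with no intensity scaling could plausibly saturate.

```python
def lag_pair_counts(data, lag=1):
    """Count (data[n], data[n + lag]) pairs in a 256 x 256 grid."""
    counts = [[0] * 256 for _ in range(256)]
    for current, following in zip(data, data[lag:]):
        counts[current][following] += 1
    return counts


def occupancy(counts):
    """Fraction of grid cells hit at least once (0.0 to 1.0)."""
    filled = sum(1 for row in counts for count in row if count)
    return filled / (256 * 256)
```

An occupancy near 1.0 would mean almost every pair of values occurs, which is what evenly distributed data would give; visible structure would show up as a much lower figure.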

Binary Data
Statistical Analysis of Data in Baudline
So where am I at?  I have a better idea of what I'm looking for, but no leads.  I have an SPI bus protocol analyser on the way that will come in handy.  I can play one sound and record which memory addresses it accesses and what data is returned, since for some reason they may not use a sequential addressing system.  The rate at which it does this could also reveal whether a variable bit rate compression algorithm was used.
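Once the analyser arrives, the captured traffic should reduce to a list of addresses and lengths.  The sketch below assumes the capture has already been exported as one byte string of MOSI data per chip-select assertion, which is a made-up format for illustration.  It also assumes the toy uses the READ (0x03) or FAST READ (0x0B) commands with a 24-bit address that are common on 25-series SPI flash; other read commands would be skipped.

```python
READ = 0x03        # opcode, 3 address bytes, then data
FAST_READ = 0x0B   # opcode, 3 address bytes, 1 dummy byte, then data


def decode_reads(transactions):
    """Return (address, data_length) for every read transaction.

    `transactions` is a list of bytes objects, each holding the MOSI bytes
    clocked during one chip-select assertion (a hypothetical export format).
    """
    reads = []
    for mosi in transactions:
        if len(mosi) < 4:
            continue
        opcode = mosi[0]
        if opcode == READ:
            header = 4
        elif opcode == FAST_READ:
            header = 5
        else:
            continue  # not a read command covered by this sketch
        address = int.from_bytes(mosi[1:4], "big")
        reads.append((address, max(len(mosi) - header, 0)))
    return reads
```

Sorting the result by address for a single sound should show whether the reads are sequential or jump around.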

To sum up, I don't like my chances, but it's a fun cat and mouse game.  I'm learning some new techniques, while solving a challenging puzzle.
