JS1K SPEECH SYNTHESIZER

Mathieu 'p01' Henri on March 14th, 2012


Sorry, but I had neither the time nor the inclination to do heavy 3D, an FPS game, particle systems, or flying hearts and flowers for JS1K#4. This time, Audio was ON and several entries featured a melody or other simple sound effects, so I had to make a Speech Synthesizer in 1K of JavaScript.

Try JS1k Speech Synthesizer

Features

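JS1k Speech Synthesizer is a simple formant-based speech synthesizer that synthesizes speech as you type, synthesizes whole sentences upon pressing ENTER, and supports 26 sounds/phonemes.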

The updated version even lets you set the volume, stop, pause and replay the synthesized speech, and save it.

Background

JS1K#4 coincided with our long-awaited family vacation. I had very little time to work on it: only a few hours. Audio was ON this time, and I had this tiny speech synthesizer lying around, waiting to be ported to JavaScript and optimized to oblivion.

To go under 1K, I had to sacrifice quality a bit: the synthesizer is limited to two formant filters, driven by either a sawtooth or noise, and plosive sounds are discarded.
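To make the two formant filters more concrete, here is a sketch of one character of the synthesis loop from the listings below, rewritten with readable names. It is an illustration under assumptions, not the entry's code: `synthesizeChar` and its parameters are hypothetical, `params` follows the layout of the formant table (`[F1, F2, B1, B2, amp]`), and unlike the listings it starts each character with fresh filter state. Each filter is a two-pole resonator fed by either a sawtooth or noise, and the output mix is the one used in the listings.

```javascript
// Hypothetical sketch of one character of the listings' synthesis loop.
// params = [F1, F2, B1, B2, amp] as in the formant table; `noisy` picks noise
// instead of the sawtooth. `random` is injectable only to make testing easy.
function synthesizeChar(params, noisy, random = Math.random) {
  const [f1, f2, bw1, bw2, amp] = params;
  const SAMPLES = 1024;                  // the listings advance s by 1/1024 per sample
  const out = new Uint8Array(SAMPLES);   // unsigned 8-bit PCM, 128 = silence
  const r1 = 1 - bw1 / 255;              // pole radii: a wider bandwidth decays faster
  const r2 = 1 - bw2 / 255;
  const c1 = 2 * Math.sin(f1 / 25);      // frequency terms, same formula as the listings
  const c2 = 2 * Math.sin(f2 / 25);
  // Excitation centered on 0: noise, or a sawtooth with 16 ramps per character.
  const excite = s => (noisy ? random() : (s * 16) % 1) - 0.5;
  let y1 = 0, y2 = 0;                    // filter 1: previous and second-previous outputs
  let z1 = 0, z2 = 0;                    // filter 2: same
  for (let n = 0; n < SAMPLES; n++) {
    const s = n / SAMPLES;
    // Two-pole resonators: y[n] = r*(c*y[n-1] - r*y[n-2]) + x[n]
    const y0 = r1 * (c1 * y1 - r1 * y2) + excite(s);
    const z0 = r2 * (c2 * z1 - r2 * z2) + excite(s);
    // Short fade-in/fade-out, then the same output mix as the listings.
    const env = Math.min(1, 4 * Math.sin(Math.PI * s));
    const t = 128 + env * ((y0 + z0) * amp + (y1 + z1) / 2 + (y2 + z2) / 8);
    out[n] = t > 255 ? 255 : t < 0 ? 0 : t | 0;   // clamp to 0..255
    y2 = y1; y1 = y0;
    z2 = z1; z1 = z0;
  }
  return out;
}
```

For example, `synthesizeChar([58, 70, 10, 10, 15], false)` returns the 1024 unsigned 8-bit samples of an 'a'. Assuming the 8000 Hz rate declared by the WAV header in the listings, 1024 samples last 128 ms, and 16 sawtooth ramps per character put the pitch at 125 Hz.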

Post JS1K update: Better and Smaller

Although I had to rush this speech synthesizer a bit, the feedback has been overwhelming. Despite the relatively low quality of the speech, people reported how well it worked for various languages, some started to make "dubstep" tracks with it, and of course there were a couple of feature requests, like setting the volume, pausing and replaying the speech, or saving it.

All these requests were easy to address by making the Audio element visible: its native controls already provide all of these features.

That was the perfect excuse to try the size optimizations I had in mind, which should free enough room to implement the extra features and add a touch of "LOVE" within the 1K limit.

Further size optimizations

1. Optimizing the data

The most obvious candidate for optimization was the ridiculously big object describing the formant filters. There are 26 sounds, each with a 1-character key, 5 integer values ( between 1 and 150: two values per formant filter plus one amplifier ) and a boolean ( describing whether the sound uses a sawtooth or noise pulse ), weighing 484 bytes!

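Sorting the object by the value of the boolean (so the noise sounds come last and the boolean no longer needs to be stored) and replacing these silly arrays with character codes resulted in two strings: 130 bytes of filter values (26 sounds × 5 values) and 26 bytes of sound characters.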
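A sketch of that transformation, assuming the table from the original listing (copied verbatim below); `packFormants` and `lookup` are hypothetical helper names, not code from the entry. A stable sort keeps the table's order within each group, which here reproduces the 26-character key string of the updated listing, and the noise flag reduces to comparing a sound's index with the index of the first noise sound.

```javascript
// Formant table copied from the original listing: [F1, F2, B1, B2, amp, noise?]
const TABLE = {o:[52,55,10,10,6],i:[45,96,10,10,3],j:[45,96,10,10,3],u:[45,54,10,10,3],a:[58,70,10,10,15],e:[54,90,10,10,15],E:[60,80,10,10,12],w:[43,54,10,10,1],v:[42,60,20,10,3],T:[42,60,40,1,5],z:[45,68,10,5,3],Z:[44,70,50,1,5],b:[44,44,10,10,2],d:[44,99,10,10,2],m:[44,60,10,10,2],n:[44,99,10,10,2],r:[43,50,30,8,3],l:[48,60,10,10,5],g:[42,50,15,5,1],f:[48,60,10,10,4,1],h:[62,66,30,10,10,1],s:[120,150,80,40,5,1],S:[20,70,99,99,10,1],p:[44,50,5,10,2,1],t:[44,60,10,20,3,1],k:[60,99,10,10,6,1]};

// Hypothetical helper: turn the table into a key string and a data string.
function packFormants(table) {
  // Stable sort: sawtooth sounds first, noise sounds (6th entry set) last.
  const keys = Object.keys(table).sort((x, y) => (table[x][5] || 0) - (table[y][5] || 0));
  // 5 values per sound, each stored as a single character code.
  const data = keys.map(k => String.fromCharCode(...table[k].slice(0, 5))).join('');
  const firstNoise = keys.findIndex(k => table[k][5]);
  return { keys: keys.join(''), data, firstNoise };
}

// Hypothetical helper: read back the values of one sound.
function lookup({ keys, data, firstNoise }, ch) {
  const p = keys.indexOf(ch);
  if (!~p) return null;                 // ~(-1) === 0, the test used by the updated listing
  const values = [0, 1, 2, 3, 4].map(n => data.charCodeAt(p * 5 + n));
  return { values, noise: p >= firstNoise };
}
```

Only the first 5 values end up in the data string; the 6th one, the noise flag, is implied by the position of the sound in the sorted key string.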

2. Selective packing

The second key optimization was to NOT pack the data. The 130-byte string has relatively high entropy and uses valuable control characters that are better used as tokens by the packer. On top of that, First Crush and I have a Crush on JS fail to compress the code+data properly: both yield an incorrect script.
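Why do control characters matter? Packers of this kind work, roughly, by replacing repeated substrings with single characters that do not occur anywhere in the source. Below is a deliberately simplified sketch of such a packer, assuming a JSCrush-style greedy substitution; `crush` and `uncrush` are hypothetical names, and this is not the algorithm of any of the packers mentioned here. It returns the packed text and its dictionary rather than a self-extracting script, but it shows the point: every character the input already uses is one fewer token available.

```javascript
// Hypothetical, simplified JSCrush-style packer (illustration only).
// Each round replaces the most profitable repeated substring with a token
// character taken from tokenPool that does not occur in the input.
function crush(src, tokenPool) {
  let s = src;
  const dict = [];                            // [token, substring] pairs, in order
  for (const tok of tokenPool) {
    if (src.includes(tok) || s.includes(tok)) continue;   // unusable as a token
    let best = '', bestGain = 0;
    for (let len = 2; len <= 32; len++) {
      for (let i = 0; i + len <= s.length; i++) {
        const sub = s.slice(i, i + len);
        const count = s.split(sub).length - 1;    // non-overlapping occurrences
        // Each occurrence shrinks to 1 char, but the dictionary stores
        // the substring plus its token once.
        const gain = count * (len - 1) - (len + 1);
        if (gain > bestGain) { bestGain = gain; best = sub; }
      }
    }
    if (!best) break;                         // nothing left worth replacing
    s = s.split(best).join(tok);
    dict.push([tok, best]);
  }
  return { packed: s, dict };
}

function uncrush({ packed, dict }) {
  let s = packed;
  // Undo substitutions in reverse order: later substrings may contain earlier tokens.
  for (let i = dict.length - 1; i >= 0; i--) s = s.split(dict[i][0]).join(dict[i][1]);
  return s;
}
```

When the input already contains every character of the pool, no token is available and nothing gets packed, which illustrates why keeping a control-character-heavy data string away from the packer can pay off.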

3. Fine tuning and packers

Eventually it all boils down to fine tuning the code to increase redundancy and the packing ratio. I must admit that I don't know all the inner workings of First Crush and I have a Crush on JS, but they soon bit the dust: my experimental packer beat them by over 18 bytes.

Final thoughts

It was a breath of fresh air to make such a different entry for JS1K. I'm also glad I had a second look at the code and shaved over 100 bytes off it while adding a few features and some styling.

I hope the public and jury of JS1K will appreciate this little Speech Synthesizer as much as I enjoyed making it.

Last but not least, a big THANK YOU to my #1 fan, my wife, for her support.

Uncompressed and commented source code of the original entry

/*
 * 1K JavaScript Speech Synthesizer
 *
 * This is a simple formant based speech synthesizer in less than 1K of JavaScript.
 * Synthesizes speech as you type, and whole sentences upon pressing ENTER.
 *
 * The following sounds/phonemes are supported:
 *
 * a,b,d,e,E,f,g,h,i,j,k,l,m,n,o,p,r,s,S,T,t,u,v,w,z,Z
 *
 * Hope you like this entry for JS1K#4
 *
 * Based on Tiny Speech Synth by Stepanov Andrey - http://www.pouet.net/prod.php?which=50530
 * Optimized and minified manually, by yours truly, @p01 - http://www.p01.org/releases/
 * Compressed using First Crush by @tpdown - http://js1k.com/2012-love/demo/1189
 *
 * To go under 1K, I had to limit the synthesizer to two formant filters using either a sawtooth or noise and discard plosive sounds. In other words I had to sacrifice quality a bit.
 *
 * Mathieu 'p01' Henri - @p01 - http://www.p01.org/releases/
 *
 */

// title and fullsize input
document.write('<h1>1K JavaScript Speech Synthesizer<input id=d value="diz is a spich syntheSizer in oan kay. type your text and press enter" style=position:fixed;background:transparent;top:0;left:0;width:99%;height:99%>');
// keypress handler
(onkeypress=function(e)
{
    // loop through either the whole text or the current keypress
    for(M=!e||e.which==13?document.getElementById('d').value:String.fromCharCode(e.which),S='',h=g=l=k=s=0;s<M.length;s+=1/1024,S+=String.fromCharCode(t>255?255:t<0?0:0|t))

        // sliding window of the formant filter + check if we have formant info to process the current character
        if(f=g,g=h,
        j=k,k=l,
t=128,p={o:[52,55,10,10,6],i:[45,96,10,10,3],j:[45,96,10,10,3],u:[45,54,10,10,3],a:[58,70,10,10,15],e:[54,90,10,10,15],E:[60,80,10,10,12],w:[43,54,10,10,1],v:[42,60,20,10,3],T:[42,60,40,1,5],z:[45,68,10,5,3],Z:[44,70,50,1,5],b:[44,44,10,10,2],d:[44,99,10,10,2],m:[44,60,10,10,2],n:[44,99,10,10,2],r:[43,50,30,8,3],l:[48,60,10,10,5],g:[42,50,15,5,1],f:[48,60,10,10,4,1],h:[62,66,30,10,10,1],s:[120,150,80,40,5,1],S:[20,70,99,99,10,1],p:[44,50,5,10,2,1],t:[44,60,10,20,3,1],k:[60,99,10,10,6,1]}[M[0|s]])
            // 2 formant filters
            i=1-p[2]/255,
            m=1-p[3]/255,
            h=i*(g*2*Math.sin(p[0]/25)-i*f)+(p[5]?Math.random():s*16%1)-.5,
            l=m*(k*2*Math.sin(p[1]/25)-m*j)+(p[5]?Math.random():s*16%1)-.5,
            t+=Math.min(1,4*Math.sin(Math.PI*s))*((h+l)*p[4]+(g+k)/2+(f+j)/8);

    // generate and play a WAVE PCM file
    t='data:audio/wav;base64,UklGRl9vT19XQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgA',new Audio(t+btoa(t+S)).play()
})
// synthesize the default sentence right away
()
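The last line of the listing builds the WAVE file as a data URI without an explicit data chunk header. Decoding its base64 header by hand, it appears to declare 8-bit mono PCM at 8000 Hz, and the "data" chunk id seems to be borrowed from the beginning of the data URI prefix itself. The sketch below checks this; it assumes global `atob`/`btoa` (browsers, Node.js 16+), and `inspectWav` and `readLE` are hypothetical helpers, not part of the entry.

```javascript
// The listings' t: data URI prefix followed by a 48-character base64 WAV header.
const t = 'data:audio/wav;base64,UklGRl9vT19XQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgA';

// Hypothetical helper: little-endian unsigned integer stored in a binary string.
function readLE(bin, offset, size) {
  let v = 0;
  for (let i = size - 1; i >= 0; i--) v = v * 256 + bin.charCodeAt(offset + i);
  return v;
}

// Hypothetical helper: decode the URI built from samples S and report its header.
function inspectWav(S) {
  const url = t + btoa(t + S);          // same expression as the listings
  // The header has no '=' padding, so the payload decodes to the 36 header
  // bytes followed by the bytes of t + S.
  const bin = atob(url.slice(url.indexOf(',') + 1));
  return {
    riff: bin.slice(0, 4),
    wave: bin.slice(8, 12),
    fmt: bin.slice(12, 16),
    format: readLE(bin, 20, 2),         // 1 = PCM
    channels: readLE(bin, 22, 2),
    sampleRate: readLE(bin, 24, 4),
    bitsPerSample: readLE(bin, 34, 2),
    // t itself starts with "data", which doubles as the data chunk id;
    // the next 4 characters (":aud") are read as the chunk size.
    dataId: bin.slice(36, 40),
    junkBytes: t.length - 8,            // rest of t, played as samples before S
    totalBytes: bin.length
  };
}
```

With 8000 samples per second and 1024 samples per character, each character lasts 128 ms. The 62 prefix bytes after ":aud" play as under 8 ms of junk before the speech, and both the RIFF size ("_oO_") and the data size (":aud") are bogus, so the trick relies on players tolerating them.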

Source code of the updated JS1K Speech Synthesizer

/*
 * 1K JavaScript Speech Synthesizer
 *
 * This is a simple formant based speech synthesizer in less than 1K of JavaScript.
 * Synthesizes speech as you type, and whole sentences upon pressing ENTER.
 * Lets the user set the volume, stop, pause, replay and save the synthesized speech.
 *
 * The following sounds/phonemes are supported:
 *
 * a,b,d,e,E,f,g,h,i,j,k,l,m,n,o,p,r,s,S,T,t,u,v,w,z,Z
 *
 * Hope you like this entry for JS1K#4
 *
 * Based on Tiny Speech Synth by Stepanov Andrey - http://www.pouet.net/prod.php?which=50530
 * Optimized and minified manually, by yours truly, @p01 - http://www.p01.org/releases/
 * Compressed down to 909 bytes using my experimental packer ( sorry it's not ready for release )
 *
 * To go under 1K, I had to sacrifice quality a bit and limit the synthesizer to two formant filters using either a sawtooth or noise and discard plosive sounds.
 *
 * Mathieu 'p01' Henri - @p01 - http://www.p01.org/releases/
 *
 */

// formant filters and amplifier info of each sound/phoneme
// NOTE: the string is base64 encoded here to make the code easier to read
H=atob('NDcLCwYtYAsLAy1gCwsDLTYLCwM6RgsLDzZaCwsPPFALCwwrNgsLASo8FAsDKjwoAQUtRAsFAyxGMgEFLCwLCwIsYwsLAiw8CwsCLGMLCwIrMh4IAzA8CwsFKjIPBQEwPAsLBD5CHgsLeJZQKAUURmNjCywyBQsCLDwLFAM8YwsLBg==');

// title, fullsize input and Audio element ( b is document.body, predefined by the JS1K shim )
b.innerHTML='<h1>JS1K Speech Synthesizer<input id=d value="diz is a spich syntheSizer in oan kay. type your text and press enter" style=display:block;background:pink;width:99%;height:9em><audio id=a autoplay controls>';

// keypress handler
(onkeypress=function(e)
{
    // the string to speak ( either the whole text or the current keypress )
    M=!e||e.which==13?document.getElementById('d').value:String.fromCharCode(e.which);
    // loop through the string and generate the sound of each character
    for(S='',h=g=l=k=s=0;s<M.length;s+=1/1024,t=0|t,S+=String.fromCharCode(t>255?255:t<0?0:0|t))
        // sliding window of the formant filter + check if we have formant info to process the current character
        if(t=128,f=g,g=h,j=k,k=l,~(p='oijuaeEwvTzZbdmnrlgfhsSptk'.indexOf(M[0|s])))
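            // 2 formant filters: b, c and d are reused as temporaries so both lines read the same ( redundancy helps the packer )
            // a alternates between 0 and 1 to pick the frequency and bandwidth of the 1st or 2nd filter
            b=g,c=f,d=1-H.charCodeAt(p*5+(2|a))/255,l=d*(b*2*Math.sin(H.charCodeAt(p*5+(0|a))/25)-d*c)+(p>20?Math.random():s*16%1)-.5,a^=1,h=l,
            b=k,c=j,d=1-H.charCodeAt(p*5+(2|a))/255,l=d*(b*2*Math.sin(H.charCodeAt(p*5+(0|a))/25)-d*c)+(p>20?Math.random():s*16%1)-.5,a^=1,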
            // combine the formant filters
            t+=Math.min(1,4*Math.sin(Math.PI*s))*((h+l)*H.charCodeAt(p*5+4)+(g+k)/2+(f+j)/8);

    // generate a WAVE PCM file and update the Audio element
    t='data:audio/wav;base64,UklGRl9vT19XQVZFZm10IBAAAAABAAEAQB8AAEAfAAABAAgA',document.getElementById('a').src=t+btoa(t+S)
})
// synthesize the default sentence right away
()