Prooskalia's Blog: Free Speech

I've always wanted to do some basic speech analysis using comforting, old C#.
The premise is simple, but any implementation will certainly be non-trivial:

Take some spoken sentences.
Break down each sentence into its words.
Analyse the words for their component phonemes.
Build a library of these by analysing multiple sound samples.
Tease out the commonality and factor out accent and expression.
Construct a mechanism for recombining the sounds and applying expression.
Voilà! Speech synthesis (as if no-one's done that already - ah, the arrogance!).
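As a first stab at the second step, splitting a recording at its quiet gaps is about the simplest thing that could work. The sketch below is my own assumption, not a method taken from any of the sources mentioned here. It measures short-time RMS energy over fixed frames and closes a segment after a run of quiet frames. The class name `EnergySegmenter`, the 20 ms frame, the 0.02 threshold and the five-frame gap are all hypothetical values that would need tuning. Be warned that connected speech often runs words together with no silence at all, so this really finds pauses rather than words.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a mono signal (samples assumed normalised to [-1, 1])
// into loud regions separated by quiet gaps, using short-time RMS energy.
public static class EnergySegmenter
{
    // Returns (Start, End) sample offsets; End is exclusive.
    public static List<(int Start, int End)> Segment(
        double[] samples, int sampleRate,
        double frameMs = 20.0, double threshold = 0.02, int minGapFrames = 5)
    {
        int frameLen = Math.Max(1, (int)(sampleRate * frameMs / 1000.0));
        int frameCount = samples.Length / frameLen;   // a trailing partial frame is ignored
        var segments = new List<(int Start, int End)>();

        int segStart = -1;   // first frame of the open segment, or -1 if none is open
        int quietRun = 0;    // consecutive quiet frames since the last loud one

        for (int f = 0; f < frameCount; f++)
        {
            // RMS energy of this frame.
            double sum = 0;
            for (int i = f * frameLen; i < (f + 1) * frameLen; i++)
                sum += samples[i] * samples[i];
            bool loud = Math.Sqrt(sum / frameLen) > threshold;

            if (loud)
            {
                if (segStart < 0) segStart = f;
                quietRun = 0;
            }
            else if (segStart >= 0)
            {
                quietRun++;
                if (quietRun >= minGapFrames)
                {
                    // Gap is long enough: end the segment just after its last loud frame.
                    int endFrame = f - quietRun + 1;
                    segments.Add((segStart * frameLen, endFrame * frameLen));
                    segStart = -1;
                    quietRun = 0;
                }
            }
        }

        // Close a segment still open at the end, trimming any trailing quiet frames.
        if (segStart >= 0)
            segments.Add((segStart * frameLen, (frameCount - quietRun) * frameLen));
        return segments;
    }
}
```

Given the mono samples from a decoded recording, it returns one (start, end) pair per loud stretch, which could then be handed on to the phoneme analysis.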

If I can get as far as extracting and analysing the phonemes, I'll have learned something!

Firstly, I'll need some voice samples:

EUSTACE is an academic site with some royalty-free downloads - they'll do.
The voices are mostly expressionless and the quality is pretty good.
They're available as either ESPS files (meh!) or RIFF (.wav) files.
ESPS is awkward, so let's go with the WAVE files.

On to the parsing of the files:

This article on the Microsoft site about the WAVE format has typos in the field sizes.
This one is much better: concise, clear and apparently free of fundamental errors.
The parsing is trivial: either write a simple class or wrap the Win32 utilities.
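To show how small the "simple class" route can be, here is a minimal sketch of a RIFF/WAVE reader. It assumes uncompressed 16-bit PCM (format tag 1) and keeps only the first channel, scaled to [-1, 1); anything else is rejected rather than guessed at. `WaveFile` and its members are hypothetical names of mine. Rather than assuming a fixed 44-byte header, it walks the chunk list, since files can carry extra chunks (such as `LIST`) or an extended `fmt ` chunk, and it skips the pad byte that follows an odd-sized chunk.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical minimal RIFF/WAVE reader: 16-bit PCM only, first channel only.
// RIFF is little-endian, which matches BinaryReader.
public sealed class WaveFile
{
    public int Channels { get; private set; }
    public int SampleRate { get; private set; }
    public int BitsPerSample { get; private set; }
    public double[] Samples { get; private set; } = Array.Empty<double>();

    // Requires a seekable stream (e.g. FileStream or MemoryStream).
    public static WaveFile Read(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        if (ReadId(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
        reader.ReadUInt32();                                   // RIFF chunk size (unused)
        if (ReadId(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        var wave = new WaveFile();
        bool haveFormat = false;
        while (stream.Position + 8 <= stream.Length)
        {
            string id = ReadId(reader);
            uint size = reader.ReadUInt32();
            long next = stream.Position + size + (size & 1);   // odd-sized chunks are padded

            if (id == "fmt ")
            {
                ushort format = reader.ReadUInt16();
                wave.Channels = reader.ReadUInt16();
                wave.SampleRate = reader.ReadInt32();
                reader.ReadInt32();                            // ByteRate
                reader.ReadUInt16();                           // BlockAlign
                wave.BitsPerSample = reader.ReadUInt16();
                if (format != 1 || wave.BitsPerSample != 16 || wave.Channels == 0)
                    throw new NotSupportedException("Only 16-bit PCM is handled here.");
                haveFormat = true;
            }
            else if (id == "data")
            {
                if (!haveFormat) throw new InvalidDataException("data chunk before fmt chunk.");
                int frames = (int)(size / (2u * (uint)wave.Channels));
                wave.Samples = new double[frames];
                for (int i = 0; i < frames; i++)
                {
                    wave.Samples[i] = reader.ReadInt16() / 32768.0;
                    for (int c = 1; c < wave.Channels; c++) reader.ReadInt16();  // skip other channels
                }
                return wave;
            }
            stream.Position = next;                            // skip fmt extras, LIST, etc.
        }
        throw new InvalidDataException("No data chunk found.");
    }

    private static string ReadId(BinaryReader r) => Encoding.ASCII.GetString(r.ReadBytes(4));
}
```

Usage would be along the lines of `WaveFile.Read(File.OpenRead(path))`; wrapping the Win32 multimedia I/O functions instead would avoid hand-parsing, at the cost of P/Invoke.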

I'll also need a free Fast Fourier Transform implementation for the frequency analysis:

I've used ExoCortex before, but that was for .NET 2 and could do with refactoring.
Lomont FFT seems better: it's more up to date (2010) and compact.
It's kind of Chris to have coded and shared it, and the quality seems good.
Briefly searching t'internet yields no bug-whines.
It just needs a few control comments to help ReSharper swallow it without hiccoughs.
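Whichever library wins, it helps to have a tiny reference to check it against. The sketch below is a textbook iterative radix-2 Cooley-Tukey FFT built on `System.Numerics.Complex`; it is not Lomont's code and does not mimic its API. `SimpleFft`, `Forward` and `MagnitudeSpectrum` are hypothetical names, the input length must be a power of two, and the periodic Hann window is my own choice for taming spectral leakage on short frames.

```csharp
using System;
using System.Numerics;

// Hypothetical reference FFT (textbook radix-2 Cooley-Tukey), not Lomont's implementation.
public static class SimpleFft
{
    // In-place forward transform, X[k] = sum x[t] * e^(-2*pi*i*k*t/n). Length must be a power of two.
    public static void Forward(Complex[] data)
    {
        int n = data.Length;
        if (n == 0 || (n & (n - 1)) != 0)
            throw new ArgumentException("Length must be a power of two.");

        // Bit-reversal permutation so the butterflies can work in place.
        for (int i = 1, j = 0; i < n; i++)
        {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) j ^= bit;
            j ^= bit;
            if (i < j) (data[i], data[j]) = (data[j], data[i]);
        }

        // Combine pairs of half-length transforms into transforms of length len.
        for (int len = 2; len <= n; len <<= 1)
        {
            double angle = -2 * Math.PI / len;
            var wLen = new Complex(Math.Cos(angle), Math.Sin(angle));
            for (int start = 0; start < n; start += len)
            {
                Complex w = Complex.One;
                for (int k = 0; k < len / 2; k++)
                {
                    Complex u = data[start + k];
                    Complex v = data[start + k + len / 2] * w;
                    data[start + k] = u + v;
                    data[start + k + len / 2] = u - v;
                    w *= wLen;
                }
            }
        }
    }

    // Magnitudes of bins 0..n/2 for one real-valued frame after a periodic Hann window.
    // For real input the remaining bins mirror these, so they are dropped.
    public static double[] MagnitudeSpectrum(double[] frame)
    {
        int n = frame.Length;
        var buf = new Complex[n];
        for (int i = 0; i < n; i++)
        {
            double hann = 0.5 * (1 - Math.Cos(2 * Math.PI * i / n));
            buf[i] = new Complex(frame[i] * hann, 0);
        }
        Forward(buf);
        var mags = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) mags[k] = buf[k].Magnitude;
        return mags;
    }
}
```

For a frame of n samples at sample rate fs, bin k sits at k * fs / n Hz; for example, a 1024-sample frame at 16 kHz gives a resolution of 15.625 Hz per bin.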

All set up! Now for some exploratory coding.

Sunday, 8 March 2015