Sunday, 8 March 2015

Free Speech

I've always wanted to do some basic speech analysis using comforting, old C#.
The premise is simple, but any implementation would certainly be non-trivial:
  • Take some spoken sentences.
  • Break down each sentence into its words.
  • Analyse the words for their component phonemes.
  • Build a library of these by analysing multiple sound samples.
  • Tease out the commonality and factor out accent and expression.
  • Construct a mechanism for recombining the sounds and applying expression.
  • Voilà! Speech synthesis (like no-one's done that already; ah, the arrogance!).
If I can get as far as extracting and analysing the phonemes, I'll have learned something!

Firstly, I'll need some voice samples:
  • EUSTACE is an academic site with some royalty-free downloads: they'll do.
  • The voices are mostly expressionless and the quality is pretty good.
  • They're available as either ESPS files (meh!) or RIFF (.wav) files.
  • ESPS is awkward so let's go with the WAVE files.
On to the parsing of the files:
  • This article on the Microsoft site on the WAVE format has typos on field sizes.
  • This one is much better: concise, clear and seems free of fundamental errors.
  • The parsing is trivial: either write a simple class or wrap the Win32 utilities.
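
As a rough idea of what the "simple class" route might look like, here is a minimal sketch. The name WaveFile and its members are purely illustrative and not taken from either article or from the Win32 utilities. It assumes a seekable stream holding a RIFF file with a standard "fmt " chunk, and it skips any chunk it doesn't recognise.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical holder for the "fmt " fields and the raw sample bytes.
public sealed class WaveFile
{
    public ushort AudioFormat;    // 1 means uncompressed PCM
    public ushort Channels;
    public uint SampleRate;       // samples per second, per channel
    public ushort BitsPerSample;
    public byte[] Data = new byte[0];

    // Reads a RIFF/WAVE file from a seekable stream.
    public static WaveFile Read(Stream stream)
    {
        // BinaryReader always reads integers as little-endian, which is what RIFF uses.
        var reader = new BinaryReader(stream, Encoding.ASCII);

        if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Not a RIFF file.");
        reader.ReadUInt32();      // overall size: not needed here
        if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Not a WAVE file.");

        var wave = new WaveFile();
        bool haveFormat = false, haveData = false;

        // Walk the chunks: each is a 4-byte tag, a 4-byte size, then the payload.
        while (stream.Position + 8 <= stream.Length)
        {
            string id = ReadTag(reader);
            uint size = reader.ReadUInt32();
            long next = stream.Position + size + (size & 1); // chunks are word-aligned

            if (id == "fmt ")
            {
                wave.AudioFormat = reader.ReadUInt16();
                wave.Channels = reader.ReadUInt16();
                wave.SampleRate = reader.ReadUInt32();
                reader.ReadUInt32();  // byte rate (derivable from the other fields)
                reader.ReadUInt16();  // block align (derivable from the other fields)
                wave.BitsPerSample = reader.ReadUInt16();
                haveFormat = true;
            }
            else if (id == "data")
            {
                wave.Data = reader.ReadBytes((int)size);
                haveData = true;
            }

            // Jump to the next chunk, skipping anything unread or unrecognised.
            stream.Position = Math.Min(next, stream.Length);
        }

        if (!haveFormat || !haveData)
            throw new InvalidDataException("Missing 'fmt ' or 'data' chunk.");
        return wave;
    }

    private static string ReadTag(BinaryReader reader)
    {
        return Encoding.ASCII.GetString(reader.ReadBytes(4));
    }
}
```

Something like WaveFile.Read(File.OpenRead(path)) would then hand back the sample rate and the raw bytes, ready to be unpacked into samples according to BitsPerSample and Channels.
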
I'll also need a free Fast Fourier Transform implementation for the frequency-analysis:
  • I've used ExoCortex before: that was for .NET 2 and could do with refactoring.
  • Better seems to be Lomont FFT which is more up-to-date (2010) and compact.
  • It's kind of Chris to have coded and shared it, and the quality seems good.
  • Briefly searching t'internet yields no bug-whines.
  • It needs just a few control-comments to help ReSharper swallow it without hiccoughs.
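
To show what the frequency-analysis step is after, here is a deliberately naive discrete Fourier transform. It is a stand-in for illustration only: it is not Lomont FFT or ExoCortex, it doesn't use either library's API, and it runs in O(n²) where a real FFT would be far quicker. The class and method names (Spectrum, Magnitudes, BinFrequency) are made up for this sketch.

```csharp
using System;

public static class Spectrum
{
    // Magnitude of each frequency bin from 0 up to the Nyquist bin (n / 2).
    public static double[] Magnitudes(double[] samples)
    {
        int n = samples.Length;
        var result = new double[n / 2 + 1];

        for (int k = 0; k < result.Length; k++)
        {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++)
            {
                // Correlate the signal with a cosine and a sine at bin k's frequency.
                double angle = 2.0 * Math.PI * k * t / n;
                re += samples[t] * Math.Cos(angle);
                im -= samples[t] * Math.Sin(angle);
            }
            result[k] = Math.Sqrt(re * re + im * im);
        }
        return result;
    }

    // Converts a bin index into a frequency in Hz.
    public static double BinFrequency(int bin, int sampleCount, double sampleRate)
    {
        return bin * sampleRate / sampleCount;
    }
}
```

Feeding in a short window of samples and looking for the bins with the largest magnitudes gives the dominant frequencies in that window, which is the raw material for picking phonemes apart.
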
All set up! Now for some exploratory coding.
