The premise is simple, but any implementation would certainly be non-trivial:
- Take some spoken sentences.
- Break down each sentence into its words.
- Analyse the words for their component phonemes.
- Build a library of these by analysing multiple sound samples.
- Tease out the commonality and factor out accent and expression.
- Construct a mechanism for recombining the sounds and applying expression.
- Voilà! Speech synthesis (as if no one's done that already - ah, the arrogance!).
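To make the plan concrete, here's a toy sketch of the pipeline above in Python, using characters as stand-in "phonemes" and numbers as stand-in audio samples - hypothetical scaffolding only, not the eventual implementation:

```python
# Toy model of the pipeline: break sentences into words, words into
# "phonemes", collect multiple samples per phoneme, then factor out
# variation by averaging. Real audio replaces the stand-ins later.
from collections import defaultdict
from statistics import mean

def extract_phonemes(word):
    # Stand-in: each character is a "phoneme" and its code is the "sample".
    return [(ch, ord(ch)) for ch in word]

def build_library(sentences):
    library = defaultdict(list)
    for sentence in sentences:
        for word in sentence.split():            # sentence -> words
            for phoneme, sample in extract_phonemes(word):
                library[phoneme].append(sample)  # collect per-phoneme samples
    # "Tease out the commonality": average the samples for each phoneme.
    return {p: mean(samples) for p, samples in library.items()}

library = build_library(["the cat sat", "the mat"])
print(sorted(library))
```

Recombination and expression are deliberately left out here - that's the genuinely hard part, and the point of the exercise.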
Firstly, I'll need some voice samples:
- EUSTACE is an academic site with some royalty-free downloads - they'll do.
- The voices are mostly expressionless and the quality is pretty good.
- They're available as either ESPS files (meh!) or RIFF (.wav) files.
- ESPS is awkward so let's go with the WAVE files.
- This article on the WAVE format, on the Microsoft site, has typos in its field sizes.
- This one is much better: concise, clear and seems free of fundamental errors.
- The parsing is trivial: either write a simple class or wrap the Win32 utilities.
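Trivial indeed - a minimal parser sketch in Python (the actual project is .NET, so treat this as an illustration of the format rather than the real code). It assumes canonical little-endian PCM files, which is what the EUSTACE downloads appear to be, and does no error recovery:

```python
# Walk the RIFF chunk list, pulling the 'fmt ' fields and locating 'data'.
import struct

def parse_wave(data: bytes) -> dict:
    riff, _size, wave_id = struct.unpack_from('<4sI4s', data, 0)
    if riff != b'RIFF' or wave_id != b'WAVE':
        raise ValueError('not a RIFF/WAVE file')

    info, offset = {}, 12
    while offset + 8 <= len(data):
        chunk_id, chunk_size = struct.unpack_from('<4sI', data, offset)
        if chunk_id == b'fmt ':
            (info['format'], info['channels'], info['sample_rate'],
             info['byte_rate'], info['block_align'],
             info['bits_per_sample']) = struct.unpack_from(
                '<HHIIHH', data, offset + 8)
        elif chunk_id == b'data':
            info['data_offset'] = offset + 8
            info['data_size'] = chunk_size
        # Chunks are word-aligned: odd-sized chunks carry a pad byte.
        offset += 8 + chunk_size + (chunk_size & 1)
    return info
```

The `(chunk_size & 1)` padding is exactly the sort of detail the typo-ridden field-size tables get wrong.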
- I've used ExoCortex before: that was for .NET 2 and could do with refactoring.
- A better fit seems to be Lomont FFT, which is more up-to-date (2010) and compact.
- It's kind of Chris to have coded and shared it, and the quality seems good.
- Briefly searching t'internet yields no bug-whines.
- Just a few control-comments to help ReSharper swallow it without hiccoughs.
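For the record, here's what the FFT buys us: a frame of samples becomes a spectrum whose peaks characterise the sound. This is a plain recursive radix-2 Cooley-Tukey sketch in Python (Lomont's C# implementation is iterative and far faster, but the mathematics is the same); the frame length must be a power of two:

```python
# Recursive radix-2 FFT: split into even/odd halves, transform each,
# then combine with the twiddle factors.
import cmath

def fft(samples):
    n = len(samples)
    if n == 1:
        return list(samples)
    even, odd = fft(samples[0::2]), fft(samples[1::2])
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k]
                for k in range(n // 2)]
    return ([even[k] + twiddled[k] for k in range(n // 2)] +
            [even[k] - twiddled[k] for k in range(n // 2)])
```

Feed it a pure tone and the energy lands in a single bin (plus its mirror image), which is the property the phoneme analysis will lean on.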
All set up! Now for some exploratory coding.