
Saturday, September 01, 2007

SAPI and Me

So I've been messing around with SAPI 5.1. Pretty damn cool API, if you ask me.

It's very easy to work with from C#. The documentation is, for some reason, almost all in C++ and VB 6, so it took a little Googling to get what I wanted done: suck in a WAV file and transcribe it. Here's the class I wrote for doing it:



using System;
using System.Collections.Generic;
using System.Text;

using SpeechLib;

namespace Transcriber
{
    public class TransSpeech
    {
        public class RecoEventArgs : EventArgs
        {
            public struct RecoBlock
            {
                public int index;
                public ISpeechRecoResult result;
                public RecoBlock(int idx, ISpeechRecoResult rez)
                {
                    this.index = idx;
                    this.result = rez;
                }
                // TODO: order by index
            }
            private RecoBlock _block;
            public RecoBlock Block
            {
                get
                {
                    return _block;
                }
            }
            public RecoEventArgs(int Index, ISpeechRecoResult Result)
            {
                this._block = new RecoBlock(Index, Result);
            }
        }
        // Raised once per recognized phrase; the index gives the order of arrival.
        public delegate void RecoEventDelegate(RecoEventArgs args);
        public event RecoEventDelegate RecoEvent;
        // Raised when the end of the input stream is reached.
        public delegate void RecoFinishedDelegate(EventArgs args);
        public event RecoFinishedDelegate RecoFinished;

        // Running count of recognitions; reset on each ReadInFile call.
        static int objNumber = 0;

        SpInprocRecognizerClass rec;
        SpFileStreamClass fs;
        SpInProcRecoContext cntxt;
        ISpeechRecoGrammar g;

        public TransSpeech()
        {
            // In-process recognizer, so the audio input can be a file stream.
            rec = new SpInprocRecognizerClass();
            fs = new SpFileStreamClass();
            cntxt = (SpInProcRecoContext)rec.CreateRecoContext();
            // Keep the audio behind each result so it can be retrieved later.
            cntxt.RetainedAudio = SpeechRetainedAudioOptions.SRAORetainAudio;
            cntxt.Recognition += new _ISpeechRecoContextEvents_RecognitionEventHandler(cntxt_Recognition);
            cntxt.EndStream += new _ISpeechRecoContextEvents_EndStreamEventHandler(cntxt_EndStream);
            g = cntxt.CreateGrammar(1);
            // An empty topic name loads the default dictation grammar.
            g.DictationLoad("", SpeechLoadOption.SLOStatic);
        }
        ~TransSpeech()
        {
            // TODO: final cleanup here
        }
        public void ReadInFile(string filename)
        {
            try
            {
                objNumber = 0;
                // TODO: lock shit?

                fs.Open(filename, SpeechStreamFileMode.SSFMOpenForRead, true);
                rec.AudioInputStream = fs;
                // Activating dictation starts recognition on the stream.
                g.DictationSetState(SpeechRuleState.SGDSActive);
            }
            catch (Exception ex)
            {
                throw new Exception("TransSpeech error in ReadInFile:", ex);
            }
        }

        void cntxt_EndStream(int StreamNumber, object StreamPosition, bool StreamReleased)
        {
            g.DictationSetState(SpeechRuleState.SGDSInactive);
            g.DictationUnload();
            fs.Close();

            // TODO: additional cleanup

            if (RecoFinished != null)
                RecoFinished(new EventArgs());
        }
        void cntxt_Recognition(int StreamNumber, object StreamPosition, SpeechRecognitionType RecognitionType, ISpeechRecoResult Result)
        {
            lock (this)
            {
                objNumber++;

                if (RecoEvent != null)
                    RecoEvent(new RecoEventArgs(objNumber, Result));
                #region old code
                //string msg = "";
                //foreach (ISpeechPhraseElement el in Result.PhraseInfo.Elements)
                //{
                //    msg += el.DisplayText + " ";
                //}
                //msg += "\r\n";
                #endregion
            }
        }
    }
}
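
The "order by index" TODO in RecoBlock could be handled outside the class. Here's a minimal sketch of one way to do it, assuming all you want at the end is the recognized text in order. TranscriptBuilder is a made-up helper, not anything from SAPI or from the class above, and it works on plain strings so it doesn't need SpeechLib at all:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of TransSpeech or SAPI): collects
// recognized phrases keyed by index and emits them in index order.
public class TranscriptBuilder
{
    private readonly object _sync = new object();
    private readonly SortedDictionary<int, string> _phrases =
        new SortedDictionary<int, string>();

    // Stores a phrase under its index; a repeated index overwrites.
    public void Add(int index, string text)
    {
        lock (_sync)
        {
            _phrases[index] = text;
        }
    }

    // Returns the phrases one per line, lowest index first.
    public string Build()
    {
        StringBuilder sb = new StringBuilder();
        bool first = true;
        lock (_sync)
        {
            foreach (KeyValuePair<int, string> pair in _phrases)
            {
                if (!first)
                    sb.Append(Environment.NewLine);
                sb.Append(pair.Value);
                first = false;
            }
        }
        return sb.ToString();
    }
}
```

In a RecoEvent handler you would call Add with args.Block.index and whatever text you pull out of args.Block.result (for example by walking PhraseInfo.Elements like the commented-out "old code" does), then call Build once RecoFinished fires.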

Note that you'll need to download the SAPI 5.1 SDK and add a COM reference to the Microsoft Speech Object Library (that's where the SpeechLib namespace comes from).
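
For anyone wondering how the class gets driven, here's a rough sketch of a caller. Treat it as a sketch under assumptions, not the one true way: it assumes the SAPI events need a Windows message loop on the thread that created the recognizer (hence Application.Run and a reference to System.Windows.Forms), that the WAV path arrives as the first command-line argument, and that PhraseInfo.GetText(0, -1, true) is a good enough way to get the text of each result. The Program class is made up for illustration:

```csharp
using System;
using System.Windows.Forms;
using Transcriber;

// Hypothetical driver for TransSpeech; needs the SpeechLib COM reference
// and a reference to System.Windows.Forms.
static class Program
{
    [STAThread]
    static void Main(string[] args)
    {
        TransSpeech ts = new TransSpeech();

        // Print each recognized phrase as it arrives.
        ts.RecoEvent += delegate(TransSpeech.RecoEventArgs e)
        {
            string text = e.Block.result.PhraseInfo.GetText(0, -1, true);
            Console.WriteLine("{0}: {1}", e.Block.index, text);
        };

        // Stop pumping messages once the end of the file is reached.
        ts.RecoFinished += delegate(EventArgs e)
        {
            Application.ExitThread();
        };

        ts.ReadInFile(args[0]);

        // Assumption: a message loop is required for the events to fire.
        Application.Run();
    }
}
```

If you're calling the class from a WinForms app you already have the message loop, so the two event hookups plus ReadInFile are all you need.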


Trying it out on random office conversation gave pretty poor results, but then I ran it on one of Cringely's weekly podcasts and it fared pretty damn well, considering it hadn't been trained or anything.


The next test is to feed "good" recognitions back in as training (if I can) and see if that improves recognition.


NOTE: if anyone gives this code a try, please let me know if you've figured out how to take the retained audio from ISpeechRecoResult and send it directly to DirectSound or something, rather than saving it to a WAV file. If I get to it before any responses (likely, given my massive following), I'll post the solution.


I've been masterdebating lately about whether to pursue advanced studies in Computer Science. Part of me really wants to, partly to be able to teach and partly as a personal ambition / goal. All comments on this are very, very welcome.


Some of my colleagues at work despise this idea, including some with a BS in Computer Science. The common arguments seem to be as follows:



  • None of my BS has helped or come into play here in the "real" development world

  • It's a waste of money, when you can learn all you want from Google, books, and just doing it

  • The worst programmers I've seen are fresh out of college with a BS in Computer Science

  • You've got a kid on the way...how the hell will you afford it? (this one is particular to me, but I thought I'd throw it in as an outlier argument)


Your thoughts on this?
