Although we have considered integrating a third-party voice vendor, we have not released such an integration to date. Most vendors' technology is network based, and we would like voice capability even when the device is not connected to a network. Also, because we are not a Google Approved Device, we do not have access to some of the Android voice services provided by Google. Finally, most voice licensing models are per Application, not per platform. We would rather find a platform-based solution that can be used by all Applications on the device.
There are these standard Android APIs related to speech:
but they appear to depend on having some Google components installed. So, although the references mentioned here are not included in the Google Mobile Services (GMS) packages, they appear to have a dependency on them. Also, most services require a network connection to work.
There are many other options for speech recognition. An Application Developer can directly license embedded technology from a vendor (example: Nuance) and pay a license fee per App instance for the speech recognition library. We integrated voice into our platform using Nuance libraries and built some evaluation Applications, but have not released this to our customers yet.
These were the components we evaluated:
- Nuance Mobile Toolkit SDK for Android – V1.6
- VoCon Hybrid Base (which includes VoCon Wake-up Word) – V4.6
- Vocalizer Expressive – V1.3
- Nuance Voice Security Library (“NVSL”) – V4.5
Overall, we thought voice command recognition worked pretty well. We developed 3 different Apps for evaluation. We also integrated command recognition into our baseline launcher so a user can use voice to start Applications and trigger some other functions. If you are interested in this, please inquire and we can talk about releasing this functionality.
Here’s a brief description of this integration: When the service is turned on, it is in wake up mode. You need to say “hi ODG” or “hi Reticle” to wake it up. “Thank you ODG” will put it back into wake up mode. Other than those phrases, here is the vocabulary list, which could easily be expanded.
“Call” | “Dial”
“Record video” | “Stop recording” | “Take photo”
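The wake-up behavior described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class name VoiceCommandGate, its methods, and the idea of passing in already-recognized text are hypothetical and are not the launcher's actual code. The command set contains only the phrases listed above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the wake-up / command flow; not the shipped launcher code. */
final class VoiceCommandGate {
    enum Mode { WAKE_UP, LISTENING }

    private static final Set<String> WAKE_PHRASES =
            new HashSet<>(Arrays.asList("hi odg", "hi reticle"));
    private static final String SLEEP_PHRASE = "thank you odg";
    // Only the phrases listed in this article; extend as needed.
    private static final Set<String> COMMANDS = new HashSet<>(Arrays.asList(
            "call", "dial", "record video", "stop recording", "take photo"));

    private Mode mode = Mode.WAKE_UP;

    Mode mode() {
        return mode;
    }

    /**
     * Feeds one recognized phrase through the gate.
     * Returns the normalized command to execute, or null if nothing should run.
     */
    String onPhrase(String phrase) {
        String p = phrase.trim().toLowerCase(Locale.ROOT);
        if (mode == Mode.WAKE_UP) {
            if (WAKE_PHRASES.contains(p)) {
                mode = Mode.LISTENING;
            }
            return null; // commands are ignored until a wake phrase is heard
        }
        if (p.equals(SLEEP_PHRASE)) {
            mode = Mode.WAKE_UP;
            return null;
        }
        return COMMANDS.contains(p) ? p : null;
    }
}
```

In this sketch, a recognizer callback would pass each recognized phrase to onPhrase() and dispatch whatever non-null command comes back. Expanding the vocabulary only means adding entries to COMMANDS.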
Open Source Voice:
There is a pretty good open source option from Carnegie Mellon: http://cmusphinx.sourceforge.net/wiki/tutorialandroid
Their site: PocketSphinx, CMU – http://www.speech.cs.cmu.edu/pocketsphinx
We used it for a while about 2 years ago. One of our customers has used it recently, and that App works very well, although its vocabulary is very small.
The SpeechListener.java class has the initialization code. It does not programmatically select a mic (the default one is probably selected). Below is example recognizer initialization code:
recognizer = SpeechRecognizerSetup.defaultSetup()
        .setAcousticModel(new File(assetsDir, "en-us-ptm"))
        .setDictionary(new File(assetsDir, "cmudict-en-us.dict"))
        // To disable logging of raw audio, comment out this call (it takes
        // a lot of space on the device)
        .setRawLogDir(assetsDir)
        // Threshold to tune for keyphrase to balance between false
        // alarms and misses (1e-45f is the PocketSphinx demo's value)
        .setKeywordThreshold(1e-45f)
        // Use context-independent phonetic search; context-dependent is
        // too slow for mobile
        .setBoolean("-allphone_ci", true)
        .getRecognizer();
Low Microphone Input Gain:
Multiple partners are using PocketSphinx successfully. Some of their settings are somewhat mysterious, but the voice recognition works well. Make sure you are picking the audio source correctly. See this reference:
These are the audio sources we know work. Choose based on whether you want to record the voice of the glasses wearer or the person they are talking to (for example, during video recording). We are told that CAMCORDER definitely works. There was a report that the audio source VOICE_COMMUNICATION did not work. VOICE_COMMUNICATION is supposed to fall back to DEFAULT when it is not supported, and DEFAULT is MIC; that is the correct API behavior. We believe this is fixed in ReticleOS v3.5.17 and later, where VOICE_COMMUNICATION now switches to DEFAULT.
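An App that cannot rely on one particular audio source can try its preferred sources in order and keep the first one that initializes. The sketch below shows only that selection logic. AudioSourcePicker and its probe parameter are hypothetical names, and the integer constants mirror the values of android.media.MediaRecorder.AudioSource so the example compiles without the Android SDK. On a device, the probe would typically construct an AudioRecord for the candidate source and check that getState() returns AudioRecord.STATE_INITIALIZED.

```java
import java.util.function.IntPredicate;

/** Hypothetical helper: picks the first audio source that a probe accepts. */
final class AudioSourcePicker {
    // These mirror android.media.MediaRecorder.AudioSource values so the
    // sketch builds off-device; use the real constants in an App.
    static final int DEFAULT = 0;
    static final int MIC = 1;
    static final int CAMCORDER = 5;
    static final int VOICE_COMMUNICATION = 7;

    private AudioSourcePicker() {
    }

    /**
     * Returns the first candidate for which the probe succeeds,
     * or DEFAULT when none of them does.
     */
    static int pick(int[] candidates, IntPredicate probe) {
        for (int source : candidates) {
            if (probe.test(source)) {
                return source;
            }
        }
        return DEFAULT;
    }
}
```

For example, pick(new int[] {VOICE_COMMUNICATION, CAMCORDER, MIC}, probe) returns CAMCORDER when the probe rejects only VOICE_COMMUNICATION. Order the candidates according to whose voice you want to capture.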
Also note that the default mic input gain can be set pretty low. Most Apps either enable automatic gain control or do a calibration and adjust. Here's a reference: http://developer.android.com/reference/android/media/audiofx/AutomaticGainControl.html
We have a report from one partner that automatic gain control does NOT work. Note that an App can adjust the input gain digitally itself. In a voice recorder that does not allow the mic input gain to be adjusted, the recording is very quiet. [Note: In release 3.5.19, the default input gain was raised to a good value for both DEFAULT and CAMCORDER.]
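As a rough illustration of adjusting the gain digitally, the sketch below measures the peak of a 16-bit PCM buffer, derives a gain factor from it (a very simple form of calibration), and scales the samples with clamping so loud samples do not wrap around. DigitalGain and its method names are hypothetical, and peak-based calibration is just one possible approach, not something taken from a partner's App.

```java
/** Hypothetical helper for boosting quiet 16-bit PCM captures in software. */
final class DigitalGain {
    private DigitalGain() {
    }

    /** Largest absolute sample value in the buffer (0 for an empty buffer). */
    static int peak(short[] pcm) {
        int max = 0;
        for (short s : pcm) {
            max = Math.max(max, Math.abs((int) s));
        }
        return max;
    }

    /** Gain that would raise the measured peak to targetPeak, capped at maxGain. */
    static float calibrate(short[] pcm, int targetPeak, float maxGain) {
        int p = peak(pcm);
        if (p == 0) {
            return 1.0f; // silence: nothing to scale
        }
        return Math.min(maxGain, (float) targetPeak / p);
    }

    /** Multiplies each sample by gain in place, clamping to the 16-bit range. */
    static void apply(short[] pcm, float gain) {
        for (int i = 0; i < pcm.length; i++) {
            int scaled = Math.round(pcm[i] * gain);
            pcm[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, scaled));
        }
    }
}
```

An App might call calibrate() once on a short sample of speech, then call apply() on every buffer it reads before handing the audio to the recognizer or recorder. Capping the gain matters because amplifying a near-silent buffer mostly amplifies noise.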
Here’s a list of other voice vendors we’ve looked at or have heard about, listed here to show the range of support an App can have available to it.
- Speak With Me: http://www.speakwithme.com/
- Speech Dialog Management, server based, with hybrid client option to do offline recognition
- capable of building customized vocabulary backend recognition domains
- Sensory Inc.
- Samsung Embedded Speech Recognition
- Brighten.ai - check out John Burkey's articles on LinkedIn Pulse:
- Working on an Android SDK - http://keenresearch.com/
- Embedded ASR
- Personalize Speech recognition
- Embedded Voice Recognition