Late last month, Google released its Cloud Text-to-Speech engine to developers worldwide, featuring 32 different voices across 12 languages and variants. Now the company has released a major update to another product in its Cloud AI speech lineup: the Cloud Speech-to-Text engine (formerly known as the Cloud Speech API).
According to Google’s blog post, the new and updated Cloud Speech-to-Text engine now supports:
At least a few of these could have real-world consumer applications, such as using the engine to transcribe voice recordings.
The API can support up to 4 speakers for phone calls and over 4 speakers on video calls, while seamlessly accounting for background noise, static from the phone line, and other agents.
To train the model, Google used real data from customers who volunteered to provide it in exchange for access to the improvements. Thanks to the use of real data, the new model now has 54% fewer errors than the previous model. In the blog post, Dan Aharon, Product Manager, Cloud AI at Google, wrote:
“Most major cloud providers use speech data from incoming requests to improve their products. Here at Google Cloud, we’ve avoided this practice, but customers routinely request that we use real data that’s representative of theirs, to improve our models. We want to meet this need, while being thoughtful about privacy and adhering to our data protection policies. That’s why today, we’re putting forth one of the industry’s first opt-in programs for data logging, and introducing the first model based on this data.”