How to detect the spoken language with Google Cloud Platform

Is it possible to automatically recognize a spoken language using the Google Cloud Platform Machine Learning Speech API?

https://cloud.google.com/speech/docs/languages gives a list of supported languages, and the user must set this parameter manually to perform speech-to-text.

Thanks, Mahesh

+7
machine-learning google-cloud-platform speech-to-text
2 answers

Google Cloud Speech API requests require the following configuration options: encoding, sampleRateHertz and languageCode. See https://cloud.google.com/speech/reference/rest/v1/RecognitionConfig

Therefore, the Google Cloud Speech API cannot automatically determine the language used. The service relies on this parameter (languageCode) to start speech recognition in that particular language.

If you had in mind a parallel with the Google Cloud Translation API, where the input language is detected automatically, keep in mind that automatically detecting the language spoken in an audio file requires much more bandwidth, storage space and processing power than detecting the language of a text. In addition, the Google Cloud Speech API offers Streaming Speech Recognition, a real-time speech-to-text service, where languageCode is especially needed.
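To make the three required options concrete, here is a minimal sketch of the JSON body for a v1 speech:recognize request, built with the Python standard library only. The helper name build_recognize_request, the LINEAR16 / 16000 Hz defaults and the dummy audio bytes are assumptions for illustration; the sketch only builds the body and does not send it or handle authentication.

```python
import base64
import json


def build_recognize_request(audio_bytes, language_code,
                            sample_rate_hertz=16000, encoding="LINEAR16"):
    """Build a JSON-serializable body for a speech:recognize call.

    Hypothetical helper: the defaults are illustrative, not recommendations.
    """
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            # The caller must choose the language; here it is not detected.
            "languageCode": language_code,
        },
        # Inline audio is sent as base64-encoded text in the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Dummy bytes stand in for real audio in this sketch.
body = build_recognize_request(b"\x00\x01" * 8, "en-US")
print(json.dumps(body["config"], indent=2))
```

The body would then be POSTed to the recognize endpoint with valid credentials; choosing languageCode remains the caller's job.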

+1

As of last month, Google has added support for detecting the spoken language to its speech-to-text API, in Google Cloud Speech v1p1beta1.

It is a bit limited: you need to provide a list of likely language codes, up to 3 of them, and according to the documentation it is supported only for the voice command and voice search use cases. It is useful if you know which languages may be in your audio.

From the docs:

alternative_language_codes[]: string

Optional. A list of up to 3 additional BCP-47 language tags, listing possible alternative languages of the supplied audio. See "Language Support" for a list of the currently supported language codes. If alternative languages are listed, the recognition result will contain recognition in the most likely language detected, including the main language_code. The recognition result will include the language tag of the language detected in the audio. NOTE: This feature is only supported for Voice Command and Voice Search use cases, and performance may vary for other use cases (e.g., phone call transcription).
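As a rough sketch of how the alternative language codes could be supplied, the snippet below builds a v1p1beta1-style config in its REST JSON spelling (alternativeLanguageCodes) and reads the language tag back from a response. The helper names, the audio defaults and sample_response are hypothetical; the response is a hand-written stand-in, not real API output.

```python
MAX_ALTERNATIVES = 3  # the quoted docs allow up to 3 additional tags


def build_detection_config(primary, alternatives,
                           sample_rate_hertz=16000, encoding="LINEAR16"):
    """Build a recognition config with alternative languages (hypothetical helper)."""
    alternatives = list(alternatives)
    if len(alternatives) > MAX_ALTERNATIVES:
        raise ValueError("at most 3 alternative language codes are allowed")
    return {
        "encoding": encoding,
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": primary,  # the main language is still required
        "alternativeLanguageCodes": alternatives,
    }


def detected_languages(response):
    """Collect the language tag attached to each result, if present."""
    return [r["languageCode"] for r in response.get("results", [])
            if "languageCode" in r]


config = build_detection_config("en-US", ["fr-FR", "de-DE"])

# Hand-written stand-in for a response, for illustration only.
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "bonjour"}], "languageCode": "fr-fr"}
    ]
}
print(detected_languages(sample_response))
```

Checking the limit of three on the client side is a design choice in this sketch; the service performs its own validation of the request.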

+1