Google Speech Recognition API Result Is Empty

I am executing an asynchronous request to the Google Cloud Speech API and I do not know how to get the result of the operation:

POST request: https://speech.googleapis.com/v1beta1/speech:asyncrecognize

Body

{ "config":{ "languageCode" : "pt-BR", "encoding" : "LINEAR16", "sampleRate" : 16000 }, "audio":{ "uri":"gs://bucket/audio.flac" } } 

Returns:

{ "name": "469432517" }

So I'm doing POST: https://speech.googleapis.com/v1beta1/operations/469432517

Which returns:

 { "name": "469432517", "metadata": { "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata", "progressPercent": 100, "startTime": "2016-08-11T21:18:29.985053Z", "lastUpdateTime": "2016-08-11T21:18:31.888412Z" }, "done": true, "response": { "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse" } } 

I need to get the result of the operation: the transcribed text.

How can I do this?

speech-recognition google-api google-cloud-speech
3 answers

You already have the result of the operation, and it is empty. The likely reason for the empty result is a format mismatch: your config declares "LINEAR16" (uncompressed 16-bit PCM, typically a WAV file), but the URI points to a FLAC file (a lossless compressed format).

Another reason for an empty result may be an incorrect sampling rate, an incorrect number of channels, etc.

Finally, a file containing nothing but silence will also produce an empty result.
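
If the file really is FLAC, the simplest fix is to declare FLAC in the request. A minimal sketch of the corrected body, assuming the recording's sample rate is 16000 Hz (replace it with the file's actual rate):

    {
      "config": {
        "languageCode": "pt-BR",
        "encoding": "FLAC",
        "sampleRate": 16000
      },
      "audio": {
        "uri": "gs://bucket/audio.flac"
      }
    }

Alternatively, convert the file to 16-bit PCM WAV so that it matches the LINEAR16 config you already send.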


I had this problem too. The cause may be the encoding and the sample rate. Here's how I found the right combination:

    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    CLIENT = speech.SpeechClient()

    # content holds the raw bytes of the audio file,
    # e.g. content = open('audio.raw', 'rb').read()
    audio = types.RecognitionAudio(content=content)

    ENCODING = [enums.RecognitionConfig.AudioEncoding.LINEAR16,
                enums.RecognitionConfig.AudioEncoding.FLAC,
                enums.RecognitionConfig.AudioEncoding.MULAW,
                enums.RecognitionConfig.AudioEncoding.AMR,
                enums.RecognitionConfig.AudioEncoding.AMR_WB,
                enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
                enums.RecognitionConfig.AudioEncoding.SPEEX_WITH_HEADER_BYTE]
    SAMPLE_RATE_HERTZ = [8000, 12000, 16000, 24000, 48000]

    for enco in ENCODING:
        for rate in SAMPLE_RATE_HERTZ:
            config = types.RecognitionConfig(
                encoding=enco,
                sample_rate_hertz=rate,
                language_code='fa-IR')
            # Detects speech in the audio file
            response = []
            try:
                response = CLIENT.recognize(config, audio)
            except Exception:
                pass
            print("-----------------------------------------------------")
            print(str(rate) + " " + str(enco))
            print("response: ", str(response))
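
If the source file is a WAV, you can also read its real parameters straight from the header instead of brute-forcing all combinations. A small sketch using Python's standard wave module (the file name is just a placeholder):

    import wave

    # Placeholder path; point this at your own recording.
    w = wave.open("audio.wav", "rb")
    print("channels:    ", w.getnchannels())
    print("sample width:", w.getsampwidth(), "bytes")  # 2 bytes = 16-bit, i.e. LINEAR16
    print("sample rate: ", w.getframerate(), "Hz")
    w.close()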

The Google Speech Recognition API result may be empty when the request parameters do not match the audio. My suggestion is to first analyze the properties of the audio file, for example with a command-line tool such as ffmpeg.

List of audio encoding formats

Language Code Information

My complete example:

    $ ffmpeg -i 1515244791.flac -hide_banner
    Input #0, flac, from '1515244791.flac':
      Metadata:
        ARTIST          : artist
        YEAR            : year
      Duration: 00:00:59.98, start: 0.000000, bitrate: 363 kb/s
        Stream #0:0: Audio: flac, 44100 Hz, mono, s16

Then use the matching configuration:

    import io

    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    LANG = "es-MX"
    RATE = 44100
    ENC = enums.RecognitionConfig.AudioEncoding.FLAC


    def transcribe_streaming(stream_file):
        """Streams transcription of the given audio file."""
        client = speech.SpeechClient()

        with io.open(stream_file, 'rb') as audio_file:
            content = audio_file.read()

        # In practice, stream should be a generator yielding chunks of audio data.
        stream = [content]
        requests = (types.StreamingRecognizeRequest(audio_content=chunk)
                    for chunk in stream)

        config = types.RecognitionConfig(
            encoding=ENC,
            sample_rate_hertz=RATE,
            language_code=LANG)
        streaming_config = types.StreamingRecognitionConfig(config=config)

        # streaming_recognize returns a generator.
        print(streaming_config)
        responses = client.streaming_recognize(streaming_config, requests)

        for response in responses:
            print(response)
            # Once the transcription has settled, the first result will contain
            # the is_final flag. The other results will be for subsequent
            # portions of the audio.
            for result in response.results:
                print('Finished: {}'.format(result.is_final))
                print('Stability: {}'.format(result.stability))
                alternatives = result.alternatives
                # The alternatives are ordered from most likely to least.
                for alternative in alternatives:
                    print('Confidence: {}'.format(alternative.confidence))
                    print('Transcript: {}'.format(alternative.transcript))

So, the transcription service works:

    config {
      encoding: FLAC
      sample_rate_hertz: 44100
      language_code: "es-MX"
    }
    results {
      alternatives {
        transcript: "lo tienes que saber tienes derecho a recibir informaci\303\263n de todas las instituciones que reciben recursos p\303\272blicos M\303\251xico 4324 plataformadetransparencia.org.mx derecho Porque adem\303\241s de defender tu voto te atiende si no se respetan tus derechos pol\303\255tico-electorales imparten justicia cuando existen inconformidades en elecciones internas de partidos pol\303\255ticos comit\303\251s ciudadanos y consejos de los pueblos resuelve controversias en elecciones de autoridades en la Ciudad de M\303\251xico y en consulta ciudadana en tu elecci\303\263n MVS 102.5 espacio a las nuevas voces de la radio continuamos"
        confidence: 0.9409132599830627
      }
      is_final: true
    }
    Finished: True
    Stability: 0.0
    Confidence: 0.9409132599830627
    Transcript: lo tienes que saber tienes derecho a recibir información de todas las instituciones que reciben recursos públicos México 4324 plataformadetransparencia.org.mx derecho Porque además de defender tu voto te atiende si no se respetan tus derechos político-electorales imparten justicia cuando existen inconformidades en elecciones internas de partidos políticos comités ciudadanos y consejos de los pueblos resuelve controversias en elecciones de autoridades en la Ciudad de México y en consulta ciudadana en tu elección MVS 102.5 espacio a las nuevas voces de la radio continuamos
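
The original question used the asynchronous REST endpoint; the same thing can be done with this client library via long_running_recognize, which returns an operation object you can wait on instead of polling by hand. A rough sketch under the same config, assuming the FLAC file has been uploaded to Cloud Storage (the gs:// URI below is hypothetical):

    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()

    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        sample_rate_hertz=44100,
        language_code='es-MX')
    # Hypothetical Cloud Storage URI; use your own bucket and object.
    audio = types.RecognitionAudio(uri='gs://bucket/1515244791.flac')

    # Start the asynchronous request and block until the operation finishes.
    operation = client.long_running_recognize(config, audio)
    response = operation.result(timeout=300)

    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))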
