Is it impossible to use curl with the Google Cloud Speech API to recognize files 10 to 15 minutes long?

I am using the REST API with cURL because I need to do something quick and simple, and I am on a box where I cannot install bloat; that is, some heavyweight developer SDK.

I started by base64-encoding the FLAC files and calling speech.syncrecognize.

That failed with:

 { "error": { "code": 400, "message": "Request payload size exceeds the limit: 10485760.", "status": "INVALID_ARGUMENT" } } 
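A quick way to tell in advance whether a file can be sent inline is to estimate its base64 size against the limit quoted in the error. The sketch below is only an estimate: the `overhead` allowance for the surrounding JSON is a made-up number, and the function names are hypothetical.

```python
# Limit quoted in the error message above, in bytes.
PAYLOAD_LIMIT = 10485760


def base64_length(raw_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of raw_bytes bytes."""
    # Base64 emits 4 output characters for every 3 input bytes, rounded up.
    return 4 * ((raw_bytes + 2) // 3)


def fits_inline(raw_bytes: int, overhead: int = 1024) -> bool:
    """True if the encoded audio plus an assumed JSON overhead stays under the limit."""
    return base64_length(raw_bytes) + overhead <= PAYLOAD_LIMIT
```

Because base64 inflates data by about a third, the largest audio file that fits inline is a little under 7.5 MiB, not 10 MiB.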

OK, so you cannot send 31,284,578 bytes in one request; you have to use Cloud Storage. So I uploaded the FLAC audio file and tried again, this time referencing the file in Cloud Storage. That failed with:

 { "error": { "code": 400, "message": "For audio inputs longer than 1 min, use the 'AsyncRecognize' method.", "status": "INVALID_ARGUMENT" } } 

Great, speech.syncrecognize doesn't like the length of the audio; try again with speech.asyncrecognize. That failed with:

 { "error": { "code": 400, "message": "For audio inputs longer than 1 min, please use LINEAR16 encoding.", "status": "INVALID_ARGUMENT" } } 

OK, so speech.asyncrecognize can only do LPCM; upload the file in pcm_s16le format and try again. Finally, I get an operation handle:

 { "name": "9174269756763138681" } 
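For reference, a request of this kind can be assembled roughly as follows. This is a sketch under assumptions: the v1beta1 endpoint URL, the `sampleRate` and `languageCode` field names, and the `gs://` path are my reading of the REST reference rather than something shown above, and `$ACCESS_TOKEN` is a placeholder for however you obtain credentials.

```python
import json

# Assumed v1beta1 REST endpoint for the asynchronous method.
ASYNC_URL = "https://speech.googleapis.com/v1beta1/speech:asyncrecognize"


def build_async_request(gcs_uri, sample_rate=16000, language_code="en-US"):
    """Build the JSON body for an asyncrecognize call on a Cloud Storage file."""
    return {
        "config": {
            "encoding": "LINEAR16",      # required for audio longer than 1 minute
            "sampleRate": sample_rate,   # assumed REST spelling of sample_rate
            "languageCode": language_code,
        },
        "audio": {"uri": gcs_uri},
    }


def curl_args(body_file, url=ASYNC_URL):
    """Argument list for the equivalent curl call; the token is a placeholder."""
    return [
        "curl", "-s", "-X", "POST",
        "-H", "Content-Type: application/json",
        "-H", "Authorization: Bearer $ACCESS_TOKEN",
        "--data-binary", "@" + body_file,
        url,
    ]


# Hypothetical bucket and object name, for illustration only.
body = build_async_request("gs://my-bucket/audio.raw")
print(json.dumps(body, indent=2))
print(" ".join(curl_args("request.json")))
```

Writing the printed JSON to `request.json` and running the printed command from a shell (with the token variable set) should return the operation name.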

I keep polling it, and eventually it completes:

 { "name": "9174269756763138681", "done": true, "response": { "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse" } } 

So wait: now that the operation is done, is there no REST method to request the result? Someone please tell me that I missed something obvious staring me right in the face, and that Google did not create a completely pointless, incomplete REST API.

1 answer

So, the answer to the question is: "No, it is not impossible. You can use curl with the Google Cloud Speech API to recognize files of 10 to 15 minutes ... provided that you navigate and follow a fairly strict set of restrictions ... at least in the beta version."

What is not obvious from the documentation is that the result is returned by the operations.get method ... which would have been obvious if any of my attempts had actually returned something other than empty results.

The source sample rate of my files is 44,100 or 48,000 Hz, and I had set sample_rate to the native rate of the source. However, contrary to the documentation, which states:
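To make the difference between an empty and a populated result concrete, here is a small sketch that pulls transcripts out of an operations.get response. The `results` / `alternatives` / `transcript` layout and the operations URL are my understanding of the v1beta1 response format, not something confirmed by the output above, and the helper names are hypothetical.

```python
# Assumed v1beta1 REST prefix for polling a long-running operation.
OPERATIONS_URL = "https://speech.googleapis.com/v1beta1/operations/"


def operation_url(name):
    """URL to GET when polling the operation returned by asyncrecognize."""
    return OPERATIONS_URL + name


def extract_transcripts(operation):
    """Return the top transcript of each result, or None if not done yet.

    An operation that is done but carries no 'results' yields an empty
    list, which is the situation described in the question.
    """
    if not operation.get("done"):
        return None
    results = operation.get("response", {}).get("results", [])
    return [
        r["alternatives"][0]["transcript"]
        for r in results
        if r.get("alternatives")
    ]


# The response from the question: done, but with nothing recognized.
empty = {
    "name": "9174269756763138681",
    "done": True,
    "response": {
        "@type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
    },
}
print(operation_url(empty["name"]))
print(extract_transcripts(empty))
```

So an operation whose `response` contains only `@type` is not a missing REST method; it is a completed recognition that produced no results.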

Sample rate in Hertz of the audio data sent in all RecognitionAudio messages. Valid values are: 8000-48000. 16000 is optimal. For best results, set the sampling rate of the audio source to 16000 Hz. If that is not possible, use the native sample rate of the audio source (instead of re-sampling).

after resampling down to 16,000 Hz, I started to get results from operations.get.
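The conversion can be done with ffmpeg. The sketch below only builds the argument list; it assumes ffmpeg is installed, that the service wants headerless mono 16-bit little-endian PCM, and the file names are hypothetical.

```python
def ffmpeg_resample_args(src, dst, rate=16000):
    """ffmpeg arguments to convert src to raw mono pcm_s16le at the given rate."""
    return [
        "ffmpeg",
        "-i", src,              # input file, e.g. the original FLAC
        "-ar", str(rate),       # output sample rate in Hz
        "-ac", "1",             # downmix to one channel (assumed requirement)
        "-acodec", "pcm_s16le", # 16-bit little-endian linear PCM
        "-f", "s16le",          # raw container, no WAV header
        dst,
    ]


# Hypothetical file names, for illustration only.
print(" ".join(ffmpeg_resample_args("input.flac", "output.raw")))
```

The list can be passed to `subprocess.run(args, check=True)`, or the printed command can be pasted into a shell.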

I think it is worth noting that correlation does not imply causation. After resampling down to 16,000 Hz, the files become much smaller. Thus, I cannot prove that this is a sample rate problem rather than the service simply choking on files above a certain size.

It is also worth noting that the documentation refers to the sample rate field inconsistently. According to their detailed definitions, the gRPC API appears to expect sample_rate and the REST API appears to expect sampleRate, in which case the Quickstart may give an incorrect example for the REST API.
