Speech to text with Google Cloud, step by step:

Convert WAV to FLAC by running sox in a terminal:

sox ZOOM0005_Tr1.wav ZOOM0005_Tr1.flac

Or use ffmpeg to convert MP3 to FLAC:

ffmpeg -i input.mp3 output.flac
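
If there are many recordings, the two conversions above can be wrapped in a small Python helper. This is only a sketch: the function names are made up for illustration, and it assumes sox and ffmpeg are installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_flac_command(source):
    """Return the argument list that converts `source` to FLAC.

    WAV files go through sox, everything else through ffmpeg,
    mirroring the two terminal commands above.
    """
    src = Path(source)
    target = src.with_suffix(".flac")
    if src.suffix.lower() == ".wav":
        return ["sox", str(src), str(target)]
    return ["ffmpeg", "-i", str(src), str(target)]


def convert_to_flac(source):
    """Run the conversion and return the name of the FLAC file."""
    cmd = build_flac_command(source)
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(cmd[0] + " is not installed or not on the PATH")
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Calling convert_to_flac("ZOOM0005_Tr1.wav") would run the same sox command as above and return "ZOOM0005_Tr1.flac".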

Upload the file to a Google Cloud Storage bucket, for example with gsutil in a terminal (if you run gcloud on a file from your own computer instead, the file must be smaller than 10 MB):

gsutil cp 1996-03-17-TNiemHoangXuanHan.flac gs://dang_gstorage/speech/
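
The 10 MB rule can be checked in Python before deciding which path to hand to gcloud. A rough sketch: the helper name is hypothetical, and treating 10 MB as 10 * 1024 * 1024 bytes is my assumption, not something taken from the Google documentation.

```python
import os

# Assumed byte value of the 10 MB limit for local files.
LOCAL_LIMIT_BYTES = 10 * 1024 * 1024


def audio_argument(local_path, bucket_uri, limit=LOCAL_LIMIT_BYTES):
    """Return the audio path to pass to gcloud.

    Small files can be read from the local disk; larger ones are
    expected to have been copied to the bucket first (e.g. with gsutil cp),
    so their gs:// location is returned instead.
    """
    if os.path.getsize(local_path) < limit:
        return local_path
    return bucket_uri.rstrip("/") + "/" + os.path.basename(local_path)
```

For a large recording, audio_argument("1996-03-17-TNiemHoangXuanHan.flac", "gs://dang_gstorage/speech/") would give the gs:// path of the uploaded copy. The helper does not upload anything itself.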

Run gcloud in a terminal:
dang@KubuntuElite:~/google_cloud/google-cloud-sdk/bin$ ./gcloud ml speech recognize-long-running 'gs://dang_gstorage/ZOOM0005_Tr1.flac' --language-code='vi-VN' --async
-> output:
Check operation [607280323405008030] for status.
"name": "607280323405008030"
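
The operation ID is needed for the next two commands, so when scripting this it helps to pull it out of the output automatically. A minimal sketch, assuming the output looks like the two lines shown above (the function name is my own):

```python
import re


def extract_operation_id(gcloud_output):
    """Return the operation ID found in the gcloud output, or None."""
    # Form 1: Check operation [607280323405008030] for status.
    match = re.search(r"Check operation \[(\d+)\]", gcloud_output)
    if match:
        return match.group(1)
    # Form 2: "name": "607280323405008030"
    match = re.search(r'"name":\s*"(\d+)"', gcloud_output)
    return match.group(1) if match else None
```

The ID is returned as a string, which is what the gcloud commands expect on the command line anyway.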

Wait for the operation to finish:
dang@KubuntuElite:~/google_cloud/google-cloud-sdk/bin$ ./gcloud ml speech operations wait 607280323405008030

Save the result to a JSON file when it is done:

dang@KubuntuElite:~/google_cloud$ gcloud ml speech operations describe 607280323405008030 > test_transcribe_vn.json

Run Python code to gather results:

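import json

def postprocess_json(google_json_file):
    # Load the JSON saved by "gcloud ml speech operations describe"
    with open(google_json_file, 'rt', encoding='utf-8') as json_file:
        data = json.load(json_file)
    # Print the confidence of the top alternative of each result
    for result in data['response']['results']:
        print('Confidence: ' + str(result['alternatives'][0]['confidence']))

postprocess_json('test_transcribe_vn.json')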
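
To get the recognized text itself rather than only the confidence values, the same loop can collect the transcripts. This is a sketch under one assumption: that each top alternative also carries a "transcript" field next to "confidence". The function name is hypothetical.

```python
import json


def collect_transcript(data):
    """Join the top alternative of every result into one string.

    `data` is the dictionary loaded from the JSON file saved earlier.
    Results without a "transcript" field (assumed field name) are skipped.
    """
    pieces = []
    for result in data.get("response", {}).get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives and "transcript" in alternatives[0]:
            pieces.append(alternatives[0]["transcript"].strip())
    return " ".join(pieces)


def transcript_from_file(google_json_file):
    """Load the saved JSON file and return the joined transcript."""
    with open(google_json_file, "rt", encoding="utf-8") as json_file:
        return collect_transcript(json.load(json_file))
```

For example, print(transcript_from_file("test_transcribe_vn.json")) would print the whole transcription as one block of text.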

