Speech to text with Google Cloud – Short notes of Dang

Steps to run speech to text with Google Cloud:

Convert WAV to FLAC by running sox in a terminal:

sox ZOOM0005_Tr1.wav ZOOM0005_Tr1.flac

Or use ffmpeg to convert MP3 to FLAC in a terminal:

ffmpeg -i input.mp3 output.flac
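
When there are several recordings, the conversion can be scripted. This is only a minimal sketch, assuming ffmpeg is installed and on the PATH; the function names build_ffmpeg_command and convert_to_flac are made up for illustration and are not part of any library.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source):
    """Return the ffmpeg argument list that converts `source` to FLAC."""
    source = Path(source)
    # Output file keeps the same name, with a .flac extension.
    target = source.with_suffix(".flac")
    # Same call as the terminal command: ffmpeg -i input.mp3 output.flac
    return ["ffmpeg", "-i", str(source), str(target)]


def convert_to_flac(source):
    """Run the conversion (assumes ffmpeg is available on the PATH)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(source), check=True)
```

Calling convert_to_flac("input.mp3") would then write input.flac next to the source file.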

Upload the file to a Google Cloud Storage bucket (when gcloud is run on a file from the local computer, the file must be smaller than 10MB): https://console.cloud.google.com/storage/browser/dang_gstorage?project=steel-flare-233717

Or use gsutil in a terminal, e.g.:

gsutil cp 1996-03-17-TNiemHoangXuanHan.flac gs://dang_gstorage/speech/
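
To decide whether a file has to go through Cloud Storage at all, its size can be checked first. A small sketch, assuming the 10MB limit mentioned in these notes; needs_cloud_storage and LOCAL_LIMIT_BYTES are hypothetical names chosen for this example.

```python
import os

# 10 MB limit for local files, taken from the note above (an assumption,
# check the current Speech-to-Text quotas before relying on it).
LOCAL_LIMIT_BYTES = 10 * 1024 * 1024


def needs_cloud_storage(path, limit=LOCAL_LIMIT_BYTES):
    """Return True when the file is too large to be sent from the local disk."""
    # The file must be smaller than the limit, so equal size also needs upload.
    return os.path.getsize(path) >= limit
```

For example, needs_cloud_storage("ZOOM0005_Tr1.flac") tells whether the gsutil upload is required for that recording.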

Run gcloud in terminal:
dang@KubuntuElite:~/google_cloud/google-cloud-sdk/bin$ ./gcloud ml speech recognize-long-running 'gs://dang_gstorage/ZOOM0005_Tr1.flac' --language-code='vi-VN' --async
-> output:
Check operation [607280323405008030] for status.
{
  "name": "607280323405008030"
}
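
If the command is run from a script, the operation name can be pulled out of this output instead of being copied by hand. A minimal sketch, assuming the captured text contains exactly one JSON object as in the output shown here; operation_name is a hypothetical helper.

```python
import json


def operation_name(gcloud_output):
    """Extract the operation name from the text printed by the --async call."""
    # Assumption: the only JSON object in the text is the {"name": ...} block.
    start = gcloud_output.index("{")
    end = gcloud_output.rindex("}") + 1
    return json.loads(gcloud_output[start:end])["name"]
```

The returned string is the value to pass to the operations wait and operations describe commands.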

Wait for the operation to finish:
dang@KubuntuElite:~/google_cloud/google-cloud-sdk/bin$ ./gcloud ml speech operations wait 607280323405008030

Save the result to a JSON file when the operation is done:

dang@KubuntuElite:~/google_cloud$ gcloud ml speech operations describe 607280323405008030 > test_transcribe_vn.json

Run Python code to gather results:

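import json

def postprocess_json(google_json_file):
    # Load the operation description saved by gcloud (UTF-8 for Vietnamese text)
    with open(google_json_file, 'rt', encoding='utf-8') as json_file:
        data = json.load(json_file)

    # Print the top alternative of each result together with its confidence
    for result in data['response']['results']:
        print(result['alternatives'][0]['transcript'])
        print('Confidence: ' + str(result['alternatives'][0]['confidence']))

postprocess_json('test_transcribe_vn.json')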
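
The same JSON can also be reduced to one transcript string and an average confidence, which is handy for long recordings. This is a sketch only, assuming the data has the response / results / alternatives layout used in these notes; summarize is a hypothetical name, and skipping results that have no confidence value is an assumption of this example.

```python
def summarize(data):
    """Return (full_transcript, mean_confidence) from the loaded operation JSON."""
    transcripts = []
    confidences = []
    for result in data["response"]["results"]:
        # Only the first (top) alternative of each result is used.
        best = result["alternatives"][0]
        transcripts.append(best["transcript"].strip())
        # Assumption: a result may lack a confidence value, so skip it then.
        if "confidence" in best:
            confidences.append(best["confidence"])
    mean = sum(confidences) / len(confidences) if confidences else None
    return " ".join(transcripts), mean
```

With data loaded by json.load as in the notes, text, confidence = summarize(data) gives the whole transcript in one string.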

Ref:
Google Cloud: https://cloud.google.com/storage/docs/quickstart-console
Install Google Cloud SDK for speech to text: https://cloud.google.com/speech-to-text/docs/quickstart-gcloud
Transcribing long audio files with Cloud Speech-to-Text API: https://cloud.google.com/speech-to-text/docs/async-recognize
Process json file in Python: https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/
