Transcribing speech to text with Google Cloud:
Convert WAV to FLAC by running sox in a terminal:
sox ZOOM0005_Tr1.wav ZOOM0005_Tr1.flac
Or use FFmpeg to convert MP3 to FLAC in a terminal:
ffmpeg -i input.mp3 output.flac
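A small Python helper can wrap the same conversion so it is repeatable across many recordings. This is a sketch, not part of the original workflow: `build_flac_command` and `convert_to_flac` are hypothetical names, and it assumes sox or ffmpeg is on your PATH. It also downmixes to mono by default (sox `-c 1`, ffmpeg `-ac 1`), on the assumption that single-channel audio is the simplest input for Speech-to-Text; pass `mono=False` to keep the original channels.

```python
import subprocess

def build_flac_command(src, dst, tool="ffmpeg", mono=True):
    """Build the argv list that converts src to FLAC with sox or ffmpeg."""
    if tool == "sox":
        # sox applies options to the file that follows them,
        # so "-c 1" here sets the channel count of the output file
        cmd = ["sox", src] + (["-c", "1"] if mono else []) + [dst]
    elif tool == "ffmpeg":
        # ffmpeg: -i names the input, -ac 1 downmixes the output to mono
        cmd = ["ffmpeg", "-i", src] + (["-ac", "1"] if mono else []) + [dst]
    else:
        raise ValueError("tool must be 'sox' or 'ffmpeg'")
    return cmd

def convert_to_flac(src, dst, tool="ffmpeg", mono=True):
    """Run the converter; check=True raises CalledProcessError on failure."""
    subprocess.run(build_flac_command(src, dst, tool, mono), check=True)
```

For example, `convert_to_flac("ZOOM0005_Tr1.wav", "ZOOM0005_Tr1.flac", tool="sox")`. Passing the command as a list rather than a shell string means file names with spaces need no extra quoting.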
Upload the file to a Google Cloud Storage bucket (gcloud can only take a file directly from your computer if it is smaller than 10 MB): https://console.cloud.google.com/storage/browser/dang_gstorage?project=steel-flare-233717
Or use gsutil in a terminal, e.g.:
gsutil cp 1996-03-17-TNiemHoangXuanHan.flac gs://dang_gstorage/speech/
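To decide in a script whether a file has to go through Cloud Storage, a size check like the one below can be used. It is a hypothetical sketch: `needs_cloud_storage`, `exceeds_local_limit` and `gsutil_upload_command` are illustrative names, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption; Google's quota documentation is the authority on the exact limit.

```python
import os

# Assumption: "10 MB" interpreted as 10 * 1024 * 1024 bytes
LOCAL_LIMIT_BYTES = 10 * 1024 * 1024

def exceeds_local_limit(size_bytes, limit=LOCAL_LIMIT_BYTES):
    """Return True if a file of this size is too large to send directly."""
    return size_bytes >= limit

def needs_cloud_storage(path):
    """Check the file on disk against the assumed local-file limit."""
    return exceeds_local_limit(os.path.getsize(path))

def gsutil_upload_command(path, bucket_prefix):
    """Build the gsutil argv that copies path into a gs:// prefix."""
    if not bucket_prefix.startswith("gs://"):
        raise ValueError("bucket_prefix must start with gs://")
    return ["gsutil", "cp", path, bucket_prefix]
```

The returned list can be passed to `subprocess.run(..., check=True)` to perform the same copy as the `gsutil cp` command above.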
Start the long-running recognition with gcloud in a terminal:
dang@KubuntuElite:~/google_cloud/google-cloud-sdk/bin$ ./gcloud ml speech recognize-long-running 'gs://dang_gstorage/ZOOM0005_Tr1.flac' --language-code='vi-VN' --async
Output:
Check operation [607280323405008030] for status.
{
  "name": "607280323405008030"
}
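If you script this step, the operation ID can be pulled out of either form of output shown above. A minimal sketch follows; the function names are hypothetical, and it assumes the JSON object is printed on stdout while the "Check operation" line may go to stderr, so it offers a parser for each.

```python
import json
import re
import subprocess

def parse_operation_name(stdout_text):
    """Return the operation ID from the JSON printed by --async."""
    return json.loads(stdout_text)["name"]

def parse_operation_from_status(line):
    """Return the ID from a 'Check operation [ID] for status.' line, or None."""
    match = re.search(r"Check operation \[([^\]]+)\]", line)
    return match.group(1) if match else None

def start_recognition(gcs_uri, language_code="vi-VN"):
    """Launch recognize-long-running asynchronously and return the operation ID."""
    cmd = ["gcloud", "ml", "speech", "recognize-long-running", gcs_uri,
           "--language-code=" + language_code, "--async"]
    # A list argv bypasses the shell, so the URI needs no quoting
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_operation_name(proc.stdout)
```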
Wait for the operation to finish:
dang@KubuntuElite:~/google_cloud/google-cloud-sdk/bin$ ./gcloud ml speech operations wait 607280323405008030
When it is done, save the result to a JSON file:
dang@KubuntuElite:~/google_cloud$ gcloud ml speech operations describe 607280323405008030 > test_transcribe_vn.json
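The wait and describe steps can be chained the same way. In this sketch (hypothetical function names), `--format=json` is added as an explicit safeguard so the saved file is JSON regardless of gcloud's default output format, and the file is written as UTF-8 so Vietnamese text stays readable.

```python
import json
import subprocess

def operation_commands(op_id):
    """Return the gcloud argv lists for waiting on and describing an operation."""
    wait_cmd = ["gcloud", "ml", "speech", "operations", "wait", op_id]
    describe_cmd = ["gcloud", "ml", "speech", "operations", "describe", op_id,
                    "--format=json"]
    return wait_cmd, describe_cmd

def fetch_operation_result(op_id, out_path):
    """Block until the operation finishes, then save its description as JSON."""
    wait_cmd, describe_cmd = operation_commands(op_id)
    subprocess.run(wait_cmd, check=True, capture_output=True)
    proc = subprocess.run(describe_cmd, capture_output=True, text=True, check=True)
    data = json.loads(proc.stdout)  # raises ValueError if the output is not JSON
    with open(out_path, "w", encoding="utf-8") as out:
        # ensure_ascii=False writes non-ASCII characters as-is, not as \u escapes
        json.dump(data, out, ensure_ascii=False, indent=2)
    return data
```

For example, `fetch_operation_result("607280323405008030", "test_transcribe_vn.json")` produces the file read by the Python code below.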
Run this Python code to print each transcript and its confidence score:
import json

def postprocess_json(google_json_file):
    # Read the saved operation description (UTF-8 for Vietnamese text)
    with open(google_json_file, 'rt', encoding='utf-8') as json_file:
        data = json.load(json_file)
    # Print the top alternative of every result with its confidence
    for result in data['response']['results']:
        best = result['alternatives'][0]
        print(best['transcript'])
        print('Confidence: ' + str(best['confidence']))

postprocess_json('test_transcribe_vn.json')
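For a long recording it can be handier to get one block of text plus an overall score. The sketch below (hypothetical `summarize_results`) assumes the same response/results/alternatives layout read by the code above, skips results that have no alternatives, and averages only the confidence values that are present, in case a field is missing from some results.

```python
def summarize_results(data):
    """Join the top transcripts and average their confidence scores."""
    transcripts, confidences = [], []
    for result in data.get("response", {}).get("results", []):
        alternatives = result.get("alternatives") or []
        if not alternatives:
            continue  # skip results with no recognized speech
        best = alternatives[0]
        transcripts.append(best.get("transcript", "").strip())
        if "confidence" in best:
            confidences.append(best["confidence"])
    # Mean of the available scores, or None if no result reported one
    mean = sum(confidences) / len(confidences) if confidences else None
    return " ".join(t for t in transcripts if t), mean
```

Pass it the dictionary loaded from test_transcribe_vn.json; it returns the full text and the mean confidence as a tuple.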
References:
Google Cloud Storage quickstart (console): https://cloud.google.com/storage/docs/quickstart-console
Install the Google Cloud SDK for Speech-to-Text: https://cloud.google.com/speech-to-text/docs/quickstart-gcloud
Transcribing long audio files with the Cloud Speech-to-Text API: https://cloud.google.com/speech-to-text/docs/async-recognize
Reading and writing JSON files in Python: https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/