Getting word timestamps in audio
Last updated
Speechace segments and aligns user audio at the word, syllable, and phoneme levels. The Speechace API provides detailed extent information for each level:
Syllable Level: Data is returned in the syllable_score_list[] array.
Phoneme Level: Data is returned in the phone_score_list[] array.
The extent[] field contains the begin and end timestamps for that syllable or phoneme, in units of 10 msec.
For example, an extent of [25, 35] for the phoneme /sh/ places it at 250 to 350 msec in the user audio file.
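As a minimal sketch of the unit conversion, the Python snippet below turns an extent into milliseconds. The `phone` key and the literal entry are illustrative assumptions built from the /sh/ example above, not output copied from the API.

```python
# Each extent unit is 10 msec, per the description above.
EXTENT_UNIT_MS = 10


def extent_to_ms(extent):
    """Convert a [begin, end] extent in 10 msec units to (begin_ms, end_ms)."""
    begin, end = extent
    return begin * EXTENT_UNIT_MS, end * EXTENT_UNIT_MS


# Hypothetical phoneme entry; only the `extent` field is described on this page.
sh_entry = {"phone": "sh", "extent": [25, 35]}

begin_ms, end_ms = extent_to_ms(sh_entry["extent"])
print(f"/{sh_entry['phone']}/ spans {begin_ms} to {end_ms} msec")
# prints: /sh/ spans 250 to 350 msec
```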
Timestamp extent information can be used to zoom in on and play back specific words, for example to demonstrate a test-taker's mistakes or the correct pronunciation of a word from a reference example.
To do so, iterate through the Speechace API JSON result and read the extent of each syllable or phoneme in the word of interest.
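One way this iteration could look is sketched below in Python. It assumes the result nests words under `text_score` and then `word_score_list`, that each word entry has a `word` key, and that each word carries its own `phone_score_list`. Those names and the sample data are assumptions for illustration, so check them against your actual response. A word's span is taken here as the earliest begin and the latest end among its phonemes.

```python
EXTENT_UNIT_MS = 10  # extent values are in units of 10 msec


def word_timestamps(result):
    """Return (word, begin_ms, end_ms) for every word that has phoneme extents."""
    spans = []
    # Assumed layout: result["text_score"]["word_score_list"] is a list of words.
    words = result.get("text_score", {}).get("word_score_list", [])
    for word in words:
        extents = [
            phone["extent"]
            for phone in word.get("phone_score_list", [])
            if phone.get("extent")
        ]
        if not extents:
            continue  # no phoneme extents available for this word
        begin = min(e[0] for e in extents)
        end = max(e[1] for e in extents)
        spans.append((word.get("word"), begin * EXTENT_UNIT_MS, end * EXTENT_UNIT_MS))
    return spans


# Made-up sample shaped like the assumed layout; not real API output.
sample = {
    "text_score": {
        "word_score_list": [
            {
                "word": "she",
                "phone_score_list": [
                    {"phone": "sh", "extent": [25, 35]},
                    {"phone": "iy", "extent": [35, 52]},
                ],
            },
            {
                "word": "sells",
                "phone_score_list": [
                    {"phone": "s", "extent": [52, 60]},
                    {"phone": "eh", "extent": [60, 70]},
                    {"phone": "l", "extent": [70, 80]},
                    {"phone": "z", "extent": [80, 90]},
                ],
            },
        ]
    }
}

for text, begin_ms, end_ms in word_timestamps(sample):
    print(f"{text}: {begin_ms} to {end_ms} msec")
```

The resulting millisecond offsets can then be handed to whatever audio player or slicing tool you use to play back just that word.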