Getting word timestamps in audio
Speechace segments and aligns user audio at the word, syllable, and phoneme levels. The Speechace API provides detailed extent information for each level:
Syllable Level: Data is returned in the syllable_score_list[] array.
Phoneme Level: Data is returned in the phone_score_list[] array.
The extent[] field contains the begin and end timestamps for that syllable or phoneme, expressed in units of 10 msec (multiply each value by 10 to get milliseconds).
In the example below the phoneme /sh/ is at msec 250 to 350 in the user audio file:
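As a rough illustration of the unit conversion, the sketch below turns an extent pair into milliseconds. The entry is hypothetical: the "phone" key and the values [25, 35] are assumptions chosen to match the 250 to 350 msec figure above, and extent_to_ms is a helper name introduced here, not part of the Speechace API.

```python
# Hypothetical phone_score_list entry. Only the "extent" field is described
# in the text; the "phone" key and the values are assumed for illustration.
phone_entry = {"phone": "sh", "extent": [25, 35]}

EXTENT_UNIT_MS = 10  # each extent unit represents 10 msec


def extent_to_ms(extent):
    """Convert a [begin, end] extent pair into (begin_ms, end_ms)."""
    begin, end = extent
    return begin * EXTENT_UNIT_MS, end * EXTENT_UNIT_MS


begin_ms, end_ms = extent_to_ms(phone_entry["extent"])
print(f"/{phone_entry['phone']}/ spans {begin_ms} to {end_ms} msec")
```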

Timestamp extent information can be used to zoom in on and play back specific words, for example to demonstrate a test-taker's mistakes or the correct pronunciation of a word from a reference example.
To do so you need to iterate through the Speechace API JSON result as follows:
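One way to do this, sketched in Python, is to take each word's start from its earliest phoneme extent and its end from its latest. This is a minimal sketch, not the official sample: it assumes the result nests words under text_score and word_score_list with a "word" key, and the sample payload is made up for illustration. Only phone_score_list and extent come from the description above.

```python
EXTENT_UNIT_MS = 10  # extent values are in units of 10 msec


def word_timestamps(result):
    """Return a list of (word, begin_ms, end_ms) tuples.

    Assumed layout (not confirmed by the text above):
    result["text_score"]["word_score_list"][i]["phone_score_list"][j]["extent"]
    """
    timestamps = []
    words = result.get("text_score", {}).get("word_score_list", [])
    for word in words:
        # Collect every phoneme extent that is present for this word.
        extents = [
            phone["extent"]
            for phone in word.get("phone_score_list", [])
            if phone.get("extent")
        ]
        if not extents:
            continue  # no alignment available for this word
        begin = min(extent[0] for extent in extents)
        end = max(extent[1] for extent in extents)
        timestamps.append(
            (word.get("word"), begin * EXTENT_UNIT_MS, end * EXTENT_UNIT_MS)
        )
    return timestamps


# Made-up sample payload, for illustration only.
sample_result = {
    "text_score": {
        "word_score_list": [
            {
                "word": "she",
                "phone_score_list": [
                    {"phone": "sh", "extent": [25, 35]},
                    {"phone": "iy", "extent": [35, 48]},
                ],
            }
        ]
    }
}

for word, begin_ms, end_ms in word_timestamps(sample_result):
    print(f"{word}: {begin_ms} to {end_ms} msec")
```

The resulting millisecond offsets can then be handed to whatever audio player you use to seek to the start of a word and stop at its end.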