
Getting word timestamps in audio


Last updated 8 months ago

Speechace segments and aligns user audio at the word, syllable, and phoneme levels. The Speechace API returns extent (timestamp) information at the following levels:

  • Syllable Level: Data is returned in the syllable_score_list[] array.

  • Phoneme Level: Data is returned in the phone_score_list[] array.

The extent[] field contains begin and end timestamps for that syllable or phoneme in units of 10 msec.

In the example below the phoneme /sh/ is at msec 250 to 350 in the user audio file:
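
As a minimal sketch of the unit conversion, assume a phoneme entry whose extent is [25, 35]. The `phone` key and the shape of the entry are illustrative assumptions rather than a copy of an actual API response; only the `extent` field and the 10 msec unit come from the description above.

```python
# Hypothetical phoneme entry, hand-written for illustration only.
phoneme = {"phone": "sh", "extent": [25, 35]}

# extent values are in units of 10 msec, so multiply by 10 to get msec.
begin_ms, end_ms = [t * 10 for t in phoneme["extent"]]

print(f"/{phoneme['phone']}/ spans {begin_ms} to {end_ms} msec")
```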

Extent timestamps can be used to zoom in on and play back specific words, for example to demonstrate a test-taker's mistakes or to play the correct pronunciation of a word from a reference recording.

To do so, iterate through the Speechace API JSON result as follows:

for each word in text_score.word_score_list[]

    get first and last elements of phone_score_list[] for that word
    
    start_timestamp is extent[0] for the first element
    end_timestamp is extent[1] for the last element
    
    # timestamps are in units of 10 msec; multiply by 10 to get msec
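
The steps above can be sketched in Python as follows. This is one possible reading of the pseudocode, not an official client: it assumes the API response has already been parsed into a dictionary, that each word entry carries its text under a `word` key, and that phonemes without an `extent` can safely be skipped. The `word_timestamps` helper and the sample data are hypothetical.

```python
from typing import Any

FRAME_MS = 10  # extent values are in units of 10 msec


def word_timestamps(result: dict[str, Any]) -> list[tuple[str, int, int]]:
    """Return (word, start_ms, end_ms) for each word in a parsed result."""
    spans = []
    for word in result["text_score"]["word_score_list"]:
        # Defensive assumption: ignore phonemes that carry no extent.
        phones = [p for p in word["phone_score_list"] if p.get("extent")]
        if not phones:
            continue
        start_ms = phones[0]["extent"][0] * FRAME_MS   # first phoneme begin
        end_ms = phones[-1]["extent"][1] * FRAME_MS    # last phoneme end
        # The "word" key is an assumption about the response shape.
        spans.append((word.get("word", ""), start_ms, end_ms))
    return spans


# Hand-written fragment for illustration; not a real API response.
sample = {
    "text_score": {
        "word_score_list": [
            {
                "word": "she",
                "phone_score_list": [
                    {"phone": "sh", "extent": [25, 35]},
                    {"phone": "iy", "extent": [35, 43]},
                ],
            }
        ]
    }
}

for text, start_ms, end_ms in word_timestamps(sample):
    print(f"{text}: {start_ms} to {end_ms} msec")
```

The returned millisecond offsets can then be passed to whatever audio player or trimming tool the application uses to play back just that word.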