Score Task/Task Achievement

Run in Postman: Score Task Achievement

The Speechace Task Achievement API supports the following task types:

  • Describe-Image: The speaker is presented with an image and asked to describe the details, relationships, and conclusion to be drawn from elements of the image.

  • Retell-Lecture: The speaker listens to a 1-2 minute lecture and is asked to summarize it, focusing on its key elements, concepts, and conclusions.

  • Answer-Question: The speaker is presented with a short question which typically requires a one or two word answer.

Each task type has particular inputs and outputs:

| Task Type | Inputs | Outputs |
| --- | --- | --- |
| describe-image | task_context: A model description of the image presented to the speaker. Max length: 1024 chars. | Task score on a scale of 0-5. |
| retell-lecture | task_context: A model summary of the lecture presented to the speaker. Max length: 1024 chars. | Task score on a scale of 0-5. |
| answer-question | task_question: The question presented to the speaker. | Task score on a scale of 0-1, where 0 is incorrect and 1 is correct. |
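
Since the required model-answer field differs by task type, it can be useful to validate requests client-side before sending them. The sketch below simply restates the table above in Python; the dictionary and helper name are illustrative, not part of the API.

```python
# Restates the table above: the model-answer field each task type expects,
# and the range of the task score it returns.
TASK_TYPES = {
    "describe-image":  {"context_field": "task_context",  "score_range": (0, 5)},
    "retell-lecture":  {"context_field": "task_context",  "score_range": (0, 5)},
    "answer-question": {"context_field": "task_question", "score_range": (0, 1)},
}

def required_context_field(task_type: str) -> str:
    """Return the name of the model-answer parameter a request must include."""
    return TASK_TYPES[task_type]["context_field"]
```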

The API supports different modes for combining task scores and language scores in an assessment:

  1. user_audio_file or user_audio_text: The speaker's response can be submitted as either audio or text, allowing task scoring to be used with written responses as well.

  2. include_speech_score: Speech scoring can be included or excluded alongside the task score. Note that if user_audio_text is used, include_speech_score is always treated as zero; for written responses, only the task score is provided.

All tasks are available in the following languages:

  • English (en-us, en-gb)

  • Spanish (es-es, es-mx)

  • French (fr-fr, fr-ca)

Request Format

POST https://api.speechace.co/api/scoring/task/v9/json

The endpoint to be used depends on the region of your subscription. For example, for US West, the endpoint is https://api.speechace.co.

curl --location -g 'https://api.speechace.co/api/scoring/task/v9/json?key={{speechace_premiumkey}}&task_type=describe-image&dialect=en-us' \
--form 'task_context="This bar chart illustrates the declining trend related to the percent of U.S. workforce engaged in farm labor in the 19th century. In 1840, for example, around 69% of the U.S. workforce was engaged in farm labor; in 1860, almost 60% of the U.S. workforce was engaged in farm labor; in 1880, only 50% of the U.S. workforce was engaged in farm labor; and in 1900, less than 40% of the U.S. workforce was engaged in farm labor."' \
--form 'user_audio_file=@"barchataudiofile.mp3"' \
--form 'include_speech_score="1"'

curl --location -g 'https://api.speechace.co/api/scoring/task/v9/json?key={{speechace_premiumkey}}&task_type=describe-image&dialect=en-us' \
--form 'task_context="This line graph illustrates the overall behaviour of France’s national debt in the period 1995-2011, calculated in comparison to the country’s GDP (Gross Domestic Product). Thus, in 1995, France’s national debt was equivalent to about 56% of its GDP; between 1996 and 1997, the debt rose to a little more than 60% of the country’s GDP, dropping to a little less than 60% between the years 2000 and 2001; however, it started to rise again in 2002, reaching almost 70% of the country’s GDP in 2005; unfortunately, between 2009 and 2010, the debt had reached around 85% of France’s GDP, reaching roughly 88% by 2011."' \
--form 'user_audio_text="This is a beautiful image infront of me with a chart depicting many colors and numbers. I can see 1995, 1996, 1997, 1998, 1999 and France'\''s national debt."' \
--form 'include_speech_score="0"'
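
The same describe-image request can also be made from application code. Below is a minimal sketch using Python's requests library; the API key placeholder, audio file name, and shortened task_context are illustrative, and the base URL should match the region of your subscription.

```python
import requests

API_KEY = "your_speechace_premium_key"   # placeholder: use the key issued to you
BASE_URL = "https://api.speechace.co"    # use the endpoint for your region

params = {
    "key": API_KEY,
    "task_type": "describe-image",
    "dialect": "en-us",
}

form_data = {
    # Model description of the image shown to the speaker (max 1024 chars)
    "task_context": "This bar chart illustrates the declining trend in the percent "
                    "of the U.S. workforce engaged in farm labor in the 19th century...",
    # Include pronunciation, fluency, grammar, vocabulary and coherence scoring
    "include_speech_score": "1",
}

# The speaker's recorded response (any supported format: wav, mp3, m4a, webm, ogg, aiff)
with open("barchart_audio.mp3", "rb") as audio:
    response = requests.post(
        f"{BASE_URL}/api/scoring/task/v9/json",
        params=params,
        data=form_data,
        files={"user_audio_file": audio},
    )

response.raise_for_status()
print(response.json()["task_score"]["score"])
```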

Query Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| key | String | API key issued by Speechace. |
| dialect | String | The dialect in which the speaker will be assessed. Supported values are: en-us, en-gb, fr-fr, fr-ca, es-es, es-mx. |
| user_id | String | Optional: A unique anonymized identifier (generated by your application) for the end-user who spoke the audio. |
| task_type | String | The task type to score. Supported types are: describe-image, retell-lecture, answer-question. |

Request Body

| Parameter | Type | Description |
| --- | --- | --- |
| task_context | String | The context or model answer for the task presented to the speaker. Used in the following task types: describe-image (a model description of the image) and retell-lecture (a model summary of the lecture). Must be provided in the same language as the one being assessed. |
| task_question | String | The task question presented to the speaker. Used in task_type = answer-question. Must be provided in the same language as the one being assessed. |
| user_audio_file | File | A file containing the user's audio (wav, mp3, m4a, webm, ogg, aiff). |
| include_speech_score | String | Set to 1 to include scoring of the other aspects of the speech: Pronunciation, Fluency, Grammar, Vocabulary, Coherence. Set to 0 to receive the task score only. |
| user_audio_text | String | A text transcript of the speaker's response. Use this field instead of user_audio_file if you already have a transcript and do not wish to re-transcribe the audio. Note: in this case, only the overall task_score is returned. |
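
For the answer-question task type, task_question replaces task_context, and a transcript can be submitted via user_audio_text when no audio is available. A minimal sketch, again using Python's requests (the question text is illustrative; with a text transcript, only the task score is returned):

```python
import requests

API_KEY = "your_speechace_premium_key"  # placeholder: use the key issued to you

params = {
    "key": API_KEY,
    "task_type": "answer-question",
    "dialect": "en-us",
}

form_data = {
    # The short question presented to the speaker (illustrative example)
    "task_question": "What do we call a system of government in which citizens elect their leaders?",
    # A transcript of the speaker's response; no audio is uploaded in this mode
    "user_audio_text": "A democracy?",
    "include_speech_score": "0",
}

response = requests.post(
    "https://api.speechace.co/api/scoring/task/v9/json",
    params=params,
    data=form_data,
)
response.raise_for_status()
print(response.json()["task_score"]["score"])  # 1 if correct, 0 if incorrect
```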

Response Example

Notice the task_score.score key for the overall task achievement score in the example responses below:

{
  "status": "success",
  "task_score": {
    "type": "describe-image",
    "version": "0.1",
    "score": 4,
    "transcript": "This bar graph shows the percent of US workforce engaged in farm labor, and that's data from 1840 to 1900. Ear now starting with 1840, the percentage was 70 percentage. After that there is a gradual decrease in the number of workforce engaged in farm labor to 60 percentage in 1860 and further down to 18 around 50% in 1880. And then in 1900 it decreased to 40 percentage. Overall, there is a continuous decrease in the engagement in the farm sector."
  },
  "quota_remaining": -1,
  "speech_score": {
    "transcript": "This bar graph shows the percent of US workforce engaged in farm labor, and that's data from 1840 to 1900. Ear now starting with 1840, the percentage was 70 percentage. After that there is a gradual decrease in the number of workforce engaged in farm labor to 60 percentage in 1860 and further down to 18 around 50% in 1880. And then in 1900 it decreased to 40 percentage. Overall, there is a continuous decrease in the engagement in the farm sector.",
    "word_score_list": [<.....pronunciation metrics>],
    "ielts_score": {<....ielts scores>},
    "pte_score": {<...pte scores>},
    "speechace_score": {<...speechace scores>},
    "toeic_score": {<...toeic scores>},
    "cefr_score": {
      "pronunciation": "B2",
      "fluency": "B2",
      "grammar": "B1+",
      "coherence": "B1",
      "vocab": "B1+",
      "overall": "B1+"
    },
    "fluency": {<...fluency metrics>},
    "asr_version": "0.4"
  },
  "version": "9.7"
}

{
  "status": "success",
  "task_score": {
    "type": "retell-lecture",
    "version": "0.1",
    "score": 2,
    "transcript": "The lecture was about the ecosystem. The lecture said that ecology is the study of living organisms in an environment. The lecturer also said that there are two factors in an ecosystem. The first one is biotic which is considered as the living things. The second one is abiotic which is considered as the non living things in the environment. The biotic factors is considered as the primary producers. Herbivores, carnivores, omnivores and detritivores. However, the abiotic factors."
  },
  "quota_remaining": -1,
  "speech_score": {
    "transcript": "The lecture was about the ecosystem. The lecture said that ecology is the study of living organisms in an environment. The lecturer also said that there are two factors in an ecosystem. The first one is biotic which is considered as the living things. The second one is abiotic which is considered as the non living things in the environment. The biotic factors is considered as the primary producers. Herbivores, carnivores, omnivores and detritivores. However, the abiotic factors.",
    "word_score_list": [<.....pronunciation metrics>],
    "ielts_score": {<....ielts scores>},
    "pte_score": {<...pte scores>},
    "speechace_score": {<...speechace scores>},
    "toeic_score": {<...toeic scores>},
    "cefr_score": {<...cefr score>},
    "fluency": {<...fluency metrics>},
    "asr_version": "0.4"
  },
  "version": "9.7"
}

{
  "status": "success",
  "task_score": {
    "type": "answer-question",
    "version": "0.1",
    "score": 1,
    "transcript": "A democracy?"
  },
  "quota_remaining": -1,
  "speech_score": {
    "transcript": "A democracy?",
    "word_score_list": [<.....pronunciation metrics>],
    "ielts_score": {<....ielts scores>},
    "pte_score": {<...pte scores>},
    "speechace_score": {<...speechace scores>},
    "toeic_score": {<...toeic scores>},
    "cefr_score": {<...cefr score>}
    ],
    "fluency": {<...fluency metrics>},
    "asr_version": "0.4"
  },
  "version": "9.7"
} 

The new addition is the task_score object, which indicates the extent to which the task has been achieved. The interpretation of the other key elements in the response, such as the pronunciation and fluency of the spoken words or sentences, remains the same.
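
A short sketch of how a client might read these responses, assuming the JSON has already been parsed into a Python dict (the helper name is hypothetical; the keys follow the examples above):

```python
def summarize_task_response(result: dict) -> None:
    """Print the task achievement score and, if present, a few speech metrics."""
    if result.get("status") != "success":
        raise RuntimeError(f"Scoring failed: {result}")

    task = result["task_score"]
    print(f"Task type: {task['type']}")
    print(f"Task score: {task['score']}")   # 0-5, or 0-1 for answer-question
    print(f"Transcript: {task['transcript']}")

    # speech_score is present only when audio was submitted with include_speech_score set to 1
    speech = result.get("speech_score")
    if speech:
        cefr = speech.get("cefr_score", {})
        print(f"CEFR overall: {cefr.get('overall')}")
```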

Difference between task_context and relevance_context

For a general question such as "Do you think the government should subsidize healthcare?" relevance is primarily assessed, as there is no definitive right or wrong answer; the focus is on whether the response is on topic.

In contrast, for a specific question like "What does the following business chart tell us?" a specific answer is expected. Therefore, a nuanced task context and detailed task score are required to evaluate how well the response addresses the specific elements of the task.

Relevance is binary and higher level: it evaluates whether the response is on-topic or not (True or False).

Task Achievement is more nuanced and scores how well the response addresses the task.