Detecting Speech Interference

When scoring pronunciation, the audio submitted to the Speechace API should contain only the speech the user intends to be assessed — the words in the text parameter. In real-world conditions, however, recordings often contain additional speech beyond what was expected: a speaker in the background, the user continuing to speak after the expected text, or another person's voice overlapping with the user's.

This additional speech is called interference. The Speechace API can detect and measure it, returning an interference_ratio that tells you how much of the audio contains speech that competes with the intended text for alignment and scoring.

Interference vs. General Noise

It is important to distinguish between interference and general background noise:

  • Background noise: non-speech sounds such as music, traffic, typing, or ambient room noise. These affect recording quality but do not compete directly with the alignment and scoring of the expected words.

  • Speech interference: speech that is present in the audio beyond what the text parameter describes. This directly competes with the scoring engine when it attempts to align and score what was spoken. Common sources include:

    • A second speaker in the room talking over the user

    • The user continuing to speak additional sentences after the expected text ends

    • A TV, podcast, or voice recording playing in the background

Speechace's interference detection is specifically focused on competing speech, not general noise.

How to Request Interference Metrics

Pass include_interference_metrics = 1 as a form body parameter in your Score Text/Pronunciation request:

curl --location -g 'https://api.speechace.co/api/scoring/text/v9/json?key={{speechacekey}}&dialect=en-us' \
--form 'text="Some parents admire famous athletes as strong role models."' \
--form 'user_audio_file=@"recording.wav"' \
--form 'include_interference_metrics=1'

Without this parameter, interference_ratio is not returned in the response.
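
For Python clients, the same request can be sketched as follows. This is a minimal sketch rather than official client code: it assumes the third-party requests library is installed, and the helper names build_scoring_request and score_with_interference are hypothetical. The endpoint, query parameters, and form fields mirror the curl example above.

```python
API_URL = "https://api.speechace.co/api/scoring/text/v9/json"


def build_scoring_request(api_key: str, text: str, dialect: str = "en-us") -> dict:
    """Assemble the query string and form fields used by the curl example."""
    return {
        "url": API_URL,
        "params": {"key": api_key, "dialect": dialect},
        "data": {
            "text": text,
            # Without this field, interference_ratio is not returned.
            "include_interference_metrics": "1",
        },
    }


def score_with_interference(api_key: str, text: str, audio_path: str) -> dict:
    """Send the recording for scoring and return the parsed JSON response."""
    # Third-party dependency (assumption); imported here so the request
    # builder above can be used and tested without it.
    import requests

    request = build_scoring_request(api_key, text)
    with open(audio_path, "rb") as audio:
        response = requests.post(
            request["url"],
            params=request["params"],
            data=request["data"],
            files={"user_audio_file": audio},  # sent as multipart form data
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

Keeping the request builder separate from the network call makes it easy to check that include_interference_metrics is always sent.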

Response

When include_interference_metrics = 1 is set, the text_score object in the response includes an interference_ratio field.
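
Reading the field from a parsed response might look like the sketch below. The helper name and the sample dictionary are hypothetical and heavily trimmed. Only the text_score object and its interference_ratio field are taken from the description above.

```python
from typing import Optional


def get_interference_ratio(response: dict) -> Optional[float]:
    """Return text_score.interference_ratio, or None if it is absent."""
    text_score = response.get("text_score") or {}
    ratio = text_score.get("interference_ratio")
    return None if ratio is None else float(ratio)


# Hypothetical, trimmed response used only to exercise the helper.
sample = {"text_score": {"interference_ratio": 2}}
print(get_interference_ratio(sample))               # 2.0
print(get_interference_ratio({"text_score": {}}))   # None
```

Returning None when the field is missing lets calling code tell "metrics not requested" apart from a ratio of 0.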

Interpreting interference_ratio

The interference_ratio is a numeric value that increases with the severity of detected speech interference. Use the following bands as a guide:

interference_ratio | Level | Interpretation
0 | None | No excess speech detected. The audio closely matches the expected text with no competing speech.
0–1 | Low | A small amount of extra speech is present, for example a brief word spoken before or after the intended text. Scoring results are reliable.
2–3 | Mid | A moderate amount of competing speech is detected. Scoring may be affected; consider surfacing a caution to the user.
3+ | High | Significant speech interference is present. Scoring reliability is reduced. The user should be prompted to retry in a quieter environment.
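
The bands above can be turned into a small lookup, sketched here in Python. The function name is hypothetical, and the boundary handling is an assumption: the bands are read as 0, 0 to 1, 2 to 3, and 3+, values between 1 and 2 are grouped with Mid, and a value of exactly 3 is treated as Mid. Adjust the cut-offs if your data suggests otherwise.

```python
def interference_level(ratio: float) -> str:
    """Map an interference_ratio to the None/Low/Mid/High bands."""
    if ratio <= 0:
        return "None"
    if ratio <= 1:
        return "Low"
    if ratio <= 3:
        # Assumption: values between 1 and 2, and exactly 3, count as Mid.
        return "Mid"
    return "High"


for value in (0, 1, 2, 4):
    print(value, interference_level(value))
```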

Using interference_ratio in Your Application

The interference_ratio gives your application an actionable signal about recording conditions. Here are the recommended patterns:

  • Warn about reduced scoring reliability

  • Guide users to retry in a quieter environment

  • Apply reduced weighting to high-interference items
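
One way these patterns could fit together is sketched below in Python. Everything here is illustrative: the thresholds, the 0.5 weighting factor, the message wording, and the function names are assumptions, not values prescribed by Speechace.

```python
from typing import List, Optional, Tuple

MID_THRESHOLD = 1.0    # assumption: ratios above 1 warrant a caution
HIGH_THRESHOLD = 3.0   # assumption: ratios above 3 warrant a retry
HIGH_WEIGHT = 0.5      # hypothetical down-weighting factor


def user_message(ratio: float) -> Optional[str]:
    """Return a caution or retry prompt, or None when no action is needed."""
    if ratio > HIGH_THRESHOLD:
        return "Other voices were detected. Please retry in a quieter place."
    if ratio > MID_THRESHOLD:
        return "Other speech was detected, so this score may be less reliable."
    return None


def weighted_average(items: List[Tuple[float, float]]) -> float:
    """Average (score, interference_ratio) pairs, down-weighting
    items whose ratio is above HIGH_THRESHOLD."""
    if not items:
        raise ValueError("at least one scored item is required")
    weights = [HIGH_WEIGHT if ratio > HIGH_THRESHOLD else 1.0 for _, ratio in items]
    weighted_sum = sum(score * w for (score, _), w in zip(items, weights))
    return weighted_sum / sum(weights)
```

For example, weighted_average([(80, 0), (40, 4)]) counts the second item at half weight, so the noisy recording pulls the overall result down less than a plain mean would.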

Notes

Additional notes about interference metrics
  • interference_ratio is only present in the response when include_interference_metrics = 1 is explicitly passed. It is not returned by default.

  • Interference detection applies to the Score Text/Pronunciation endpoint. See the API Reference for the full parameter list.
