Princess Lanting

EnglishCentral’s IntelliSpeech℠ Technology



EnglishCentral’s “special sauce” has always been our internally developed IntelliSpeech℠ assessment technology. Teachers appreciate how IntelliSpeech℠ motivates students to speak outside of the classroom. Students appreciate the instant feedback and the game dynamic of trying to improve their speaking grade.

EnglishCentral’s IntelliSpeech℠ system assesses learners’ speaking ability as a combination of three elements:

  • Pronunciation Score
  • Fluency Score
  • Completion

Pronunciation Score previously measured learners’ speech across 39 phonemes. In its latest version, IntelliSpeech℠ measures performance across all 64,000 possible triphones (sequences of three phonemes). This change dramatically improves the accuracy of the system.

Fluency Score is still based on duration and pause rate.  

Completion is still based on whether the learner speaks all the words in a line or drops some of them.
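As a rough illustration of the Completion idea, the sketch below compares the words a learner was expected to say with the words a recognizer heard, and lists the expected words that are missing. This is not IntelliSpeech℠ itself: the function names `dropped_words` and `completion`, the use of plain word alignment, and the choice to count a substituted word as dropped are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def dropped_words(expected: str, recognized: str) -> list[str]:
    """Return the expected words with no counterpart in the recognized text.

    Hypothetical helper: assumes both strings are already free of
    punctuation and that a substituted word counts as dropped.
    """
    exp = expected.lower().split()
    rec = recognized.lower().split()
    matcher = SequenceMatcher(a=exp, b=rec, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete": expected words that are absent from the recognized text
        # "replace": expected words that were said as something else
        if tag in ("delete", "replace"):
            dropped.extend(exp[i1:i2])
    return dropped


def completion(expected: str, recognized: str) -> float:
    """Fraction of expected words that were spoken (1.0 means none dropped)."""
    total = len(expected.split())
    if total == 0:
        return 1.0
    return 1.0 - len(dropped_words(expected, recognized)) / total


print(dropped_words("I would like a cup of tea", "I would like cup of tea"))
```

In this example the learner skipped the word “a”, so six of the seven expected words count as spoken.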

Real Time Feedback in the Player

As the learner speaks, IntelliSpeech℠ provides feedback according to the following types of errors:



Line Score

Learners get a line score between 0 and 100 points for each line spoken. Only the final version of each line counts towards the video: if a line is repeated several times, only the last attempt counts towards the video line score.

On each line, we show the line score earned.
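The “last attempt counts” rule can be sketched in a few lines. The function name `video_line_scores` and the `(line_id, score)` input format are assumptions made for this example, not part of the EnglishCentral player.

```python
def video_line_scores(attempts):
    """Keep only the most recent score for each line.

    `attempts` is an iterable of (line_id, score) pairs in the order
    the learner spoke them (a hypothetical input format).
    """
    latest = {}
    for line_id, score in attempts:
        if not 0 <= score <= 100:
            raise ValueError(f"line score out of range: {score}")
        # A later attempt at the same line overwrites the earlier one.
        latest[line_id] = score
    return latest


# Line 1 is spoken twice; only the second attempt (90) is kept.
print(video_line_scores([(1, 60), (2, 80), (1, 90)]))
```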

Video Grades

The video grade is the cumulative IntelliSpeech℠ assessment across all lines spoken in the video.

IntelliSpeech℠ computes a percentile relative to other learners in the target language group to produce a video grade. For instance, if we determine that the speech for the video was better than that of 75% of other users from the same geography (based on past speech from millions of users in that geography), the learner would get a “B+” as a video grade.
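A minimal sketch of percentile-based grading follows. Only the 75th percentile mapping to “B+” comes from the text above. The other cutoffs, the names `percentile_rank`, `grade_for` and `GRADE_CUTOFFS`, and the use of a single numeric video score are placeholders invented for illustration.

```python
from bisect import bisect_left


def percentile_rank(score, peer_scores):
    """Percentage of peer scores that are strictly below `score`."""
    ranked = sorted(peer_scores)
    if not ranked:
        raise ValueError("need at least one peer score")
    return 100.0 * bisect_left(ranked, score) / len(ranked)


# Hypothetical cutoffs, highest first. Only (75, "B+") reflects the
# example in the text; every other row is an assumption.
GRADE_CUTOFFS = [(90, "A"), (75, "B+"), (50, "B"), (25, "C"), (0, "D")]


def grade_for(percentile):
    """Map a percentile to the first grade whose cutoff it reaches."""
    for cutoff, grade in GRADE_CUTOFFS:
        if percentile >= cutoff:
            return grade
    return GRADE_CUTOFFS[-1][1]


peers = range(100)  # stand-in for past scores from the same group
pct = percentile_rank(75, peers)
print(pct, grade_for(pct))
```

A real system would keep one such table per native language, as the next paragraph describes.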


Percentiles are mapped to grades separately for each learner native language. Here is the grade table for learners:

Phoneme Tiles

IntelliSpeech℠ is always listening to learners’ speech and continually assesses each learner’s performance across 39 phonemes. We have a patent pending (Patent App# 13/338,383) on the system’s ability to determine the learner’s strongest and weakest pronunciations among these phonemes, thereby guiding the learner on which to practice. There are 4 states of the Phoneme Tiles, which correspond to how closely the learner’s pronunciation resembles native speech, with green being the closest to native and red being the furthest. Grey means the system does not have enough data to make a determination.

These phonemes are tracked in each learner’s customized Pronunciation Center.


Because measuring phoneme performance can be noisy for individual utterances, in general a learner must speak 10 lines before IntelliSpeech℠ is confident in setting the color of the Phoneme Tile.
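The tile behavior described above can be sketched as a small state function. The 10-line minimum and the green, red and grey states come from the text. Everything else is an assumption for illustration: the class name `PhonemeTile`, averaging per-line scores, counting only lines that contain the phoneme, the 80 and 50 thresholds, and the label "intermediate" for the fourth state.

```python
from dataclasses import dataclass, field

MIN_LINES = 10  # from the text: lines needed before a tile gets a color


@dataclass
class PhonemeTile:
    phoneme: str
    scores: list = field(default_factory=list)  # one 0-100 score per line

    def add_line(self, score: float) -> None:
        """Record the score for one spoken line containing this phoneme."""
        self.scores.append(score)

    def state(self) -> str:
        # Too few lines: individual utterances are too noisy to judge.
        if len(self.scores) < MIN_LINES:
            return "grey"
        mean = sum(self.scores) / len(self.scores)
        # Thresholds below are hypothetical placeholders.
        if mean >= 80:
            return "green"
        if mean >= 50:
            return "intermediate"
        return "red"


tile = PhonemeTile("r")
for _ in range(9):
    tile.add_line(95)
print(tile.state())  # still grey: only 9 lines so far
tile.add_line(95)
print(tile.state())  # 10 lines with a high average: green
```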

Our Speech Database

Our research is supported by what we believe is the largest corpus of transcribed non-native speech in the world. Our speech assessment system is now based on over 450M utterances collected from over 100 different countries or L1 (native) languages.


Our reference models are also trained on large amounts of data collected from native speakers speaking the authentic speech from our videos (as opposed to many other corpora, which may contain “read speech”).

Common Pronunciation Challenges

IntelliSpeech℠ has analyzed users’ English pronunciation based on over 250 million recorded utterances in the EnglishCentral database and identified the five most common problematic sounds for speakers from each native language region. For instance, for Japanese, they are:


Based on this analysis and speech data, we have designed a pronunciation course for each of our main markets, called Top 10 Challenges, where learners can focus on the most challenging pronunciations for speakers from their native language, using authentic videos.