"Partii" Solution

Partii Thai Speech-to-Text Engine

Version 1.0 (May 2015)

General Performance

Open domain – Partii uses a novel algorithm that keeps its lexicon to fewer than 40,000 entries, combining words and syllable-like units frequently used in Thai. The syllable-like units can be combined to form new words that the engine has not seen. As a result, the engine covers as many Thai words as an engine with a lexicon of more than 140,000 words, while drastically reducing the resources required to run it.
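
The idea of covering unseen words with a mixed lexicon can be illustrated with a minimal sketch. This is not Partii's actual algorithm, which is not described here. It assumes a toy lexicon of romanized stand-in units (all hypothetical) and uses plain dynamic programming to check whether a word can be built from lexicon entries, preferring the fewest units.

```python
from typing import List, Optional, Set


def segment(word: str, units: Set[str]) -> Optional[List[str]]:
    """Split `word` into lexicon units, using as few units as possible.

    Returns None when the word cannot be built from the lexicon.
    """
    # best[i] holds the shortest known segmentation of word[:i], or None.
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is None or piece not in units:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[len(word)]


# Hypothetical toy lexicon: one whole word plus three syllable-like units.
LEXICON = {"namtok", "nam", "tok", "fon"}

print(segment("namtok", LEXICON))  # whole-word entry is preferred
print(segment("fontok", LEXICON))  # not in the lexicon, built from units
print(segment("xyz", LEXICON))     # cannot be covered
```

In this sketch a word that is missing from the lexicon is still covered as long as its syllable-like units are present, which is the reason a small mixed lexicon can stand in for a much larger word-only one.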

80% accuracy – When speech is sent over a smartphone data channel, Partii achieves nearly 80% recognition accuracy regardless of domain, speaker, or speaking style. This performance is comparable to that of some overseas engines (tested since May 2014).
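
The exact metric behind the accuracy figure is not stated here. A common convention, assumed in the sketch below, is word accuracy computed as one minus the word error rate, where errors are counted with the Levenshtein edit distance between a reference transcript and the recognizer output. The function names and the toy token lists are illustrative only, and the tokens are assumed to be already segmented into words.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, insertions, and deletions
    needed to turn `ref` into `hyp` (Levenshtein distance)."""
    prev: List[int] = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            cost = 0 if ref_token == hyp_token else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word accuracy = 1 - (edit distance / number of reference words)."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Toy example: one substituted word out of five reference words.
reference = ["a", "b", "c", "d", "e"]
hypothesis = ["a", "x", "c", "d", "e"]
print(f"{word_accuracy(reference, hypothesis):.0%}")
```

Because insertions count as errors, this measure can drop below zero for very poor output, so it should be read as an error-based score rather than a simple fraction of correct words.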

1.5xRT response – Over WiFi and simulated 3G networks, Partii takes less than 1.5 times the duration of the input speech to recognize it. This performance is also comparable to that of overseas Thai speech recognition services.
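
The "xRT" figure is a real-time factor: the time taken to return a result divided by the duration of the input speech. The sketch below shows the arithmetic. The function names and the sample timings are illustrative assumptions, not measurements of Partii.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Ratio of response time to input speech duration (xRT)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def meets_target(processing_seconds: float, audio_seconds: float,
                 target_xrt: float = 1.5) -> bool:
    """True when the response time stays under the target real-time factor."""
    return real_time_factor(processing_seconds, audio_seconds) < target_xrt


# Hypothetical timings: a 10-second utterance answered in 12 seconds.
rtf = real_time_factor(12.0, 10.0)
print(f"{rtf:.2f}xRT, within 1.5xRT: {meets_target(12.0, 10.0)}")
```

Under this definition, a 10-second utterance must be recognized in under 15 seconds, network transfer included, to stay within 1.5xRT.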

Customizable – One of Partii's most important features is the research and development team available for system customization. At present, Partii is offered as a web service that can be installed on users' own servers. The service is scalable and can be adapted to new environments or speakers to improve performance.

Benefits

Partii is a fundamental component for building innovative applications, especially now that people communicate and access information quickly via smartphones.

Telecommunication – Speech recognition is widely used to convert customer speech in telephone contact centers to text. This facilitates customer analysis, reduces the cost of training telephone operators, increases service efficiency, and supports monitoring of operator performance.

Voice data input – As the government pushes toward a universal service obligation, speech recognition technology becomes an important element for people with disabilities. Voice data input also enables new businesses, such as services for rapid input during emergencies, smart input in stock management, and smart homes.

TV captioning – In today's era of digital TV, official regulation requires that at least the news reports in TV programs carry their transcriptions as closed captions to assist people with disabilities. Speech recognition is therefore a promising technology for meeting this regulation at a reasonable cost.

Audio transcription – In the big data era, ever more audio and video recordings are created, and the demand for transcribing them is growing. For example, there is a need to produce reports of parliamentary meetings and court proceedings, and to index audio/video records for better search in the future. A speech recognition engine can be incorporated into all of these tasks to accelerate the work.