Tested & Proven

To give users intuitive and transparent information, we present the speech transcription accuracy of our product across a range of scenarios and conditions.

To enable longer continuous recordings, lower power consumption, and more efficient data transmission in real-world use, we have developed a proprietary audio compression algorithm in-house.

By publishing transcription accuracy before and after processing with this algorithm, we aim to show how we balance product performance with user experience, and we continue to refine the technology to improve customer satisfaction.

1 Comparison of Speech Transcription Accuracy With and Without the Proprietary Compression Algorithm

The table below compares the automatic speech recognition (ASR) performance of the Whisper-tiny model before and after processing with our proprietary audio compression algorithm. The evaluation metric is Word Error Rate (WER), where a lower value indicates higher transcription accuracy; a sketch of how WER is computed appears after the table. The standard-accent dataset is sourced from the LibriSpeech corpus, and the accented dataset is derived from the Speech Accent Archive, so the results reflect a range of typical usage scenarios.

| Dataset | ASR Model | Compressed | WER |
| --- | --- | --- | --- |
| LibriSpeech (standard narrated stories) | whisper-tiny-en | No | ≈8% |
| LibriSpeech (standard narrated stories) | whisper-tiny-en | Yes | 10.08% |
| Speech Accent Archive (various accents) | whisper-tiny-en | No | ≈13% |
| Speech Accent Archive (various accents) | whisper-tiny-en | Yes | 18.38% |

Note: With compression enabled, the Word Error Rate for standard pronunciation rises by roughly two percentage points (from ≈8% to 10.08%) but remains at a satisfactory level. For accented speech, WER rises from ≈13% to 18.38%, which still meets the requirements of everyday use.
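For reference, WER is computed by aligning the model's transcript against a human reference transcript and counting word-level substitutions, deletions, and insertions relative to the reference length. The sketch below illustrates the calculation using the open-source jiwer Python library; the example sentences are placeholders, not data from our evaluation.

```python
# A minimal sketch of computing Word Error Rate (WER).
# Requires: pip install jiwer
# The transcripts below are illustrative placeholders.
import jiwer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

# WER = (substitutions + deletions + insertions) / reference word count
error_rate = jiwer.wer(reference, hypothesis)
print(f"WER: {error_rate:.2%}")  # two substitutions over nine words -> 22.22%
```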

2 Speech Transcription Accuracy in Simulated Real-User Scenarios

The table below presents the performance of the Whisper-tiny model in simulated real-user scenarios, with our proprietary audio algorithm applied. As above, standard-accent speech is sourced from the LibriSpeech corpus and accented speech from the Speech Accent Archive, so the scenarios represent a range of typical applications.

| Scenario | Accent Present | WER |
| --- | --- | --- |
| Everyday user conversation | No | 8.10% |
| Everyday user conversation | Yes | 20.72% |
| User speaking in meeting mode | No | 6.78% |
| Other participants speaking in meeting mode | No | 7.85% |

Note: The product performs well in common use cases involving standard-accent speech, with WER below 8.2% across everyday conversation and both meeting-mode conditions. For accented speech, performance varies more widely, with WER reaching 20.72% in everyday conversation. A sketch for reproducing this kind of measurement on your own recordings follows.
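The sketch below, assuming the open-source openai-whisper and jiwer packages, transcribes a recording with the whisper-tiny-en model and scores it against a reference transcript. The audio file name and reference text are hypothetical placeholders; a real evaluation would substitute your own recordings and transcripts.

```python
# A minimal sketch of scoring whisper-tiny-en on a single recording.
# Requires: pip install openai-whisper jiwer
# "sample.wav" and the reference string are hypothetical placeholders.
import whisper
import jiwer

model = whisper.load_model("tiny.en")      # the English-only tiny model
result = model.transcribe("sample.wav")    # illustrative file path
hypothesis = result["text"].strip().lower()

# Reference transcript of the same recording (placeholder text).
reference = "this is the expected transcript of the recording"

print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
```

In practice, both transcripts should be normalized consistently (casing, punctuation, number formatting) before scoring, since mismatched normalization can inflate WER without reflecting real recognition errors.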