Tested & Proven
To provide users with more intuitive and transparent information, we present the speech transcription accuracy of our product under various scenarios and conditions.
In real-world use, to enable longer continuous recordings, lower power consumption, and more efficient data transmission, we have independently developed an advanced audio processing algorithm.
By showcasing the transcription accuracy before and after processing with our algorithm, we aim to strike the optimal balance between product performance and user experience, continuously improving our technology to enhance customer satisfaction.
1 Comparison of Speech Transcription Accuracy Using Proprietary Compression Algorithm
The table below compares the speech recognition (ASR) performance of the Whisper-tiny model before and after processing with our proprietary audio algorithm. The evaluation metric is Word Error Rate (WER), where a lower value indicates higher transcription accuracy. The standard accent dataset is sourced from the LibriSpeech corpus, while the non-standard accent dataset is derived from the Speech Accent Archive to reflect a range of typical usage scenarios.
Dataset | ASR Model | Compressed | WER
---|---|---|---
LibriSpeech (standard narrated stories) | whisper-tiny-en | No | ≈8%
LibriSpeech (standard narrated stories) | whisper-tiny-en | Yes | 10.08%
Speech Accent Archive (various accents) | whisper-tiny-en | No | ≈13%
Speech Accent Archive (various accents) | whisper-tiny-en | Yes | 18.38%
Note: After compression, the Word Error Rate for standard pronunciation increases slightly but remains at a satisfactory level. For accented speech, accuracy under the compression algorithm still meets the requirements of everyday use.
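For reference, the WER figures above measure the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model's output, divided by the number of words in the reference. The sketch below is an illustrative implementation of that metric, not our actual evaluation harness:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four reference words gives a WER of 25%.
print(wer("the quick brown fox", "the quack brown fox"))  # → 0.25
```

A lower value means the transcript matches the reference more closely; a WER of 10.08% means roughly one word error per ten reference words.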
2 Speech Transcription Accuracy in Simulated Real-User Scenarios
The table below presents the performance of the Whisper-tiny model in simulated real-user scenarios, before and after processing with our proprietary audio algorithm. As above, the standard-accent data is sourced from the LibriSpeech corpus, while the non-standard-accent data is taken from the Speech Accent Archive to represent a range of typical application scenarios.
Scenario | Accent Present | WER
---|---|---
Everyday user conversation | No | 8.10%
Everyday user conversation | Yes | 20.72%
User speaking in meeting mode | No | 6.78%
Other participants speaking in meeting mode | No | 7.85%
Note: The product performs well in common use cases involving speakers without accents, delivering high transcription accuracy. For accented speech, performance varies more widely, with WER rising to roughly 21% in everyday conversation.
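To put the scenario results in perspective, each WER value can be compared against the best everyday-use case as a baseline. The snippet below is a hypothetical illustration using only the numbers reported in the table above:

```python
# WER figures from the scenario table (percent).
scenarios = {
    "Everyday conversation, no accent": 8.10,
    "Everyday conversation, accented": 20.72,
    "Meeting mode, user speaking": 6.78,
    "Meeting mode, other participants": 7.85,
}

# Use the accent-free everyday conversation as the reference point.
baseline = scenarios["Everyday conversation, no accent"]
for name, value in scenarios.items():
    ratio = value / baseline
    print(f"{name}: {value:.2f}% WER ({ratio:.2f}x baseline)")
```

Accented everyday conversation comes out at roughly 2.6 times the baseline error rate, which quantifies the wider spread noted above.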