Automatic Speech Recognition for Chinese

Red Hen has a preliminary Automatic Speech Recognition pipeline for Chinese. Would you like to help improve it?

If so, write to

and we will try to connect you with a mentor.

Related Scrolls

Related Links

More Information

Red Hen has a pipeline in production at the Case HPC that runs Chinese ASR using Baidu's DeepSpeech2 with PaddlePaddle inside a Singularity container built on Singularity Hub from a recipe. It starts with this command:

singularity exec -e --nv ../Chinese_Pipeline.simg bash infer.sh $DAY

The Slurm job submission requests two GPUs on a K40 node:

#SBATCH -p gpu -C gpuk40 --mem=100gb --gres=gpu:2
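Put together, a minimal job script might look like the sketch below. The partition, constraint, memory, and GPU request mirror the line above, and the container command is the one shown earlier; the script layout and the way the day is passed in are assumptions, not the production script.

```shell
#!/bin/bash
# Hypothetical job script assembling the pieces shown on this page;
# file layout and argument handling are illustrative assumptions.
#SBATCH -p gpu -C gpuk40 --mem=100gb --gres=gpu:2

DAY=$1   # the day to process, passed at submission time

# Run the inference script for one day inside the container,
# with a clean environment (-e) and NVIDIA GPU support (--nv):
singularity exec -e --nv ../Chinese_Pipeline.simg bash infer.sh "$DAY"
```

It could then be submitted one day at a time, e.g. `sbatch work.slurm 2019-01-15` (the script name here is hypothetical).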

Job status can be checked with squeue:

abc123@server:~/cp$ squeue -u abc123
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
12267389 gpu work.slu abc123 PD 0:00 1 (Priority)
12267373 gpu work.slu abc123 R 27:51 1 gput025
12267379 gpu work.slu abc123 R 15:41 1 gput026

It takes about four minutes to run ASR on a standard one-hour recording.
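At that rate, a full day of broadcasts is cheap to process. A back-of-the-envelope estimate, assuming 24 one-hour recordings per day (an illustrative figure, not a measured workload):

```shell
# Rough GPU-time estimate using the ~4 minutes per one-hour
# recording figure above; 24 recordings/day is an assumption.
recordings_per_day=24
minutes_per_recording=4
echo "$(( recordings_per_day * minutes_per_recording )) minutes of GPU time per day"
```

So a single GPU job can keep up with a day's recordings in well under two hours.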

To Do

Chinese Red Hens report that the output makes sense but contains copious errors and disfluencies. To improve it, the audio should be cut at pauses or at word breaks rather than mechanically at ten-second intervals. A training dataset of news content would also help.
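One way to find pause-based cut points is ffmpeg's silencedetect filter. The sketch below is not part of the pipeline; the filename, noise threshold, and minimum-silence duration are illustrative, and a sample log stands in for real ffmpeg output so the snippet is self-contained.

```shell
# Detect pauses so audio can be cut at silences rather than at fixed
# ten-second intervals. On a real recording one would run:
#
#   ffmpeg -i recording.wav -af silencedetect=noise=-30dB:d=0.5 \
#          -f null - 2> silences.log
#
# silencedetect writes lines like these to stderr (sample shown here):
cat > silences.log <<'EOF'
[silencedetect @ 0x5581] silence_start: 9.74
[silencedetect @ 0x5581] silence_end: 10.41 | silence_duration: 0.67
[silencedetect @ 0x5581] silence_start: 21.03
EOF

# Extract the silence start times as candidate cut points:
grep -o 'silence_start: [0-9.]*' silences.log | awk '{print $2}'
```

The resulting timestamps could then drive the segmenter in place of the fixed ten-second grid.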

Thoughts

Other approaches are also worth exploring, notably Baidu's DeepSpeech3.