Automatic Speech Recognition for Chinese

Red Hen has a preliminary Automatic Speech Recognition pipeline on Chinese. Would you like to help improve it?
If so, write to 
and we will try to connect you with a mentor.

Related Scrolls

Related Links

More Information

Red Hen has a pipeline in production on the Case HPC that runs Chinese ASR using Baidu's DeepSpeech2, implemented in PaddlePaddle, inside a Singularity container built on Singularity Hub from a recipe. The pipeline starts with this command:

singularity exec -e --nv ../Chinese_Pipeline.simg bash infer.sh $DAY

The Slurm job submission requests two GPUs:

#SBATCH -p gpu -C gpuk40 --mem=100gb --gres=gpu:2 
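A complete submission script might look like the following sketch. The partition, constraint, memory, and GPU request come from the directive above, and the singularity command from earlier on this page; the job name, time limit, and module name are illustrative assumptions, not the production script.

```shell
#!/bin/bash
# Sketch of a Slurm submission script for the Chinese ASR pipeline.
# Job name, time limit, and module name are assumptions; the resource
# request and the singularity command are taken from this page.
#SBATCH -J chinese_asr
#SBATCH -p gpu -C gpuk40 --mem=100gb --gres=gpu:2
#SBATCH --time=04:00:00

module load singularity   # assumed module name on the cluster

DAY=$1                    # day of recordings to process, passed at submission
singularity exec -e --nv ../Chinese_Pipeline.simg bash infer.sh $DAY
```

The script would then be submitted with something like `sbatch work.slurm 2018/05/01`, with the day argument in whatever format `infer.sh` expects.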

Job status can be monitored with squeue:
abc123@server:~/cp$ squeue -u abc123
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          12267389       gpu work.slu   abc123 PD       0:00      1 (Priority)
          12267373       gpu work.slu   abc123  R      27:51      1 gput025
          12267379       gpu work.slu   abc123  R      15:41      1 gput026

It takes about four minutes to run ASR on a standard one-hour recording. 

To Do

Chinese Red Hens report that the output makes sense but contains copious errors and disfluencies. To improve it, the audio should be cut at pauses or at word boundaries rather than mechanically at ten-second intervals. A training dataset of news content would also help.
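Pause-based cutting can be approximated with a simple energy detector: mark frames whose RMS level falls below a threshold, and treat sufficiently long runs of such frames as pauses. The sketch below assumes mono audio as floats in [-1, 1]; the function names and the threshold and duration values are illustrative, not part of the production pipeline, and would need tuning on real broadcast audio.

```python
import math

def find_pauses(samples, rate=16000, frame_ms=30,
                threshold=0.02, min_pause_ms=300):
    """Return (start, end) sample indices of low-energy (pause) regions.

    samples: sequence of floats in [-1.0, 1.0] (mono audio).
    threshold: RMS level below which a frame counts as silence
               (illustrative value, not tuned on broadcast news).
    min_pause_ms: minimum run of quiet frames to count as a pause.
    """
    frame_len = rate * frame_ms // 1000
    min_frames = max(1, min_pause_ms // frame_ms)
    pauses, run_start, run_len = [], None, 0
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms < threshold:
            if run_start is None:
                run_start = i * frame_len
            run_len += 1
        else:
            # A loud frame ends the quiet run; keep it only if long enough.
            if run_start is not None and run_len >= min_frames:
                pauses.append((run_start, i * frame_len))
            run_start, run_len = None, 0
    if run_start is not None and run_len >= min_frames:
        pauses.append((run_start, n_frames * frame_len))
    return pauses

def cut_points(samples, rate=16000, **kwargs):
    """Midpoints of detected pauses: candidate cut positions for ASR chunks."""
    return [(a + b) // 2 for a, b in find_pauses(samples, rate, **kwargs)]
```

Cutting at pause midpoints rather than at fixed ten-second marks keeps words intact; chunks that come out much longer than the recognizer's preferred window could still be split at their quietest interior frame as a fallback.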

Thoughts

Other approaches are also worth exploring, notably Baidu's DeepSpeech3.