Chinese OCR

Red Hen uses Tesseract for Optical Character Recognition (OCR), and Tesseract can be configured to recognize Chinese characters. See

"Chinese Character Recognition Using Tesseract OCR", which says:
You need to download Chinese trained data (a file such as chi_sim.traineddata) and add it to your tessdata folder. Download the file https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata and use it like this:
// Load the Simplified Chinese (chi_sim) trained data from the "tessdata" folder
Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"chi_sim"];
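The snippet above is for an Objective-C wrapper. A batch pipeline for Red Hen's holdings would more likely call the tesseract command-line tool directly. The following is a minimal sketch under several assumptions: Tesseract 4 or later is installed and on PATH, a local "tessdata" folder is used, and the helper names (ensure_traineddata, build_command, ocr_image) are hypothetical. The download URL is the one quoted above. The same tessdata repository also provides chi_tra.traineddata for Traditional Chinese and chi_sim_vert / chi_tra_vert for vertical text.

```python
"""Minimal sketch (hypothetical helpers): fetch Chinese trained data and OCR one
image with the tesseract CLI, assumed to be installed (4.x or later) and on PATH."""
import subprocess
import urllib.request
from pathlib import Path

# URL pattern from the page above; "chi_tra" would fetch Traditional Chinese instead.
TRAINEDDATA_URL = "https://github.com/tesseract-ocr/tessdata/raw/master/{lang}.traineddata"


def ensure_traineddata(tessdata_dir: Path, lang: str = "chi_sim") -> Path:
    """Download <lang>.traineddata into tessdata_dir unless it is already present."""
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    target = tessdata_dir / f"{lang}.traineddata"
    if not target.exists():
        urllib.request.urlretrieve(TRAINEDDATA_URL.format(lang=lang), target)
    return target


def build_command(image: Path, tessdata_dir: Path, lang: str = "chi_sim") -> list[str]:
    """Assemble a tesseract CLI call that prints the recognized text to stdout."""
    return [
        "tesseract",
        str(image),
        "stdout",                             # print text instead of writing <outputbase>.txt
        "--tessdata-dir", str(tessdata_dir),  # folder holding the .traineddata files
        "-l", lang,                           # e.g. "chi_sim", "chi_tra", or "chi_sim+eng"
    ]


def ocr_image(image: Path, tessdata_dir: Path, lang: str = "chi_sim") -> str:
    """Run tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_command(image, tessdata_dir, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

For example, calling ensure_traineddata(Path("tessdata")) once and then ocr_image(Path("frame_0001.png"), Path("tessdata")) would return the Chinese text found in a single (hypothetical) extracted video frame. Shelling out to the CLI keeps the sketch free of extra Python dependencies. pytesseract wraps the same binary if a library interface is preferred.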

Would you like to establish a Chinese OCR pipeline for Red Hen's large Chinese audiovisual holdings?

If so, write to

and we will try to connect you with a mentor.
