CCExtractor


CCExtractor is an open-source project, led by Carlos Fernandez, that collaborates closely with Red Hen. CCExtractor extracts closed captioning, teletext, and other metadata from television transport streams. See their project page at http://ccextractor.com.


To use the new OCR capabilities, see Abhinav Shukla's GSoC2016 Report.


We should use the latest version, which is on GitHub at https://github.com/CCExtractor/ccextractor and is often not yet on SourceForge or http://ccextractor.com. It typically has new features we want.

To download it, issue this command in Linux (or Mac):

    wget https://github.com/CCExtractor/ccextractor/archive/master.zip

This command will download the software in a zipped (compressed) format in a file called master.zip. To unzip (decompress) the file, issue

    unzip master.zip

The files will be unzipped into a directory (folder) called ccextractor-master. Rename it to include the current version number (which keeps incrementing):

    mv ccextractor-master ccextractor_0.84

Walk into the directory:

    cd ccextractor_0.84
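
The download, unzip, and rename steps above can be sketched as a single shell function. This is only an illustration: the names `versioned_dir` and `fetch_ccextractor` and the `VERSION` variable are made up for this example, and the guard against an existing directory is an added assumption (without it, `mv` would move ccextractor-master inside the older directory instead of renaming it).

```shell
# Hypothetical helper names; set VERSION to the release you are building.
VERSION="0.84"

# Print the versioned directory name used in the steps above.
versioned_dir() {
    printf 'ccextractor_%s\n' "$1"
}

# Download, unpack, and rename the source tree, stopping at the first failure.
fetch_ccextractor() {
    target="$(versioned_dir "$1")"
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing $target" >&2
        return 1
    fi
    wget https://github.com/CCExtractor/ccextractor/archive/master.zip &&
        unzip master.zip &&
        mv ccextractor-master "$target"
}
```

To use it, run `fetch_ccextractor "$VERSION"` and then `cd "$(versioned_dir "$VERSION")"`.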

You'll see the file raspberrypi.md -- read it for the simple instructions to build ccextractor on Raspberry Pi devices. Typically, you'll need to install these packages:

    sudo apt-get install libleptonica-dev libtesseract-dev libcurl4-gnutls-dev tesseract-ocr
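
To see which of these packages are still missing before you build, something like the following may help. It assumes a Debian-style system where `dpkg-query` is available; the function name `missing_packages` and the `PACKAGES` variable are made up for this example.

```shell
# Package list copied from the apt-get line above.
PACKAGES="libleptonica-dev libtesseract-dev libcurl4-gnutls-dev tesseract-ocr"

# Print each named package that dpkg-query does not report as fully installed.
# If dpkg-query itself is absent, every package is reported as missing.
missing_packages() {
    for pkg in "$@"; do
        dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null |
            grep -q 'install ok installed' || printf '%s\n' "$pkg"
    done
}
```

Running `missing_packages $PACKAGES` (unquoted, so the list splits into separate names) prints nothing when everything is in place.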

You'll also see several subdirectories, including one called "linux" and one called "mac". Walk into the appropriate subdirectory:

    cd linux

You'll see a file called "build". Run it like this:

    ./build

This compiles (builds) the CCExtractor program; it can take anywhere from a few seconds to a couple of minutes, depending on how fast your computer is.
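
Before moving on, it can be worth confirming that the build actually left a usable binary behind. A minimal sketch, where `check_built` is a made-up name and the test only looks at the file itself (it does not run the program):

```shell
# Succeed only if the path is a non-empty, executable regular file.
check_built() {
    [ -f "$1" ] && [ -s "$1" ] && [ -x "$1" ]
}
```

From inside the build directory, `check_built ./ccextractor && echo "build OK"` prints the message only when the binary is there.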

The build command creates a file that's always called 'ccextractor'. Rename it to track which version you just built (the examples below use 0.78; substitute your own version number):

    mv ccextractor ccextractor-0.78

Copy that file into your program directory:

    sudo cp ccextractor-0.78 /usr/local/bin

Walk into your program directory and create a symbolic link to the new version:

    cd /usr/local/bin

    sudo ln -sf ccextractor-0.78 ccextractor

In the list of files (ls -l), you should see something like this:

    lrwxrwxrwx 1 root staff          16 Oct  2 05:48 ccextractor -> ccextractor-0.78
    -rwxr-xr-x 1 root staff     1687840 Oct  2 05:47 ccextractor-0.78

The program is now fully installed.
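
The copy and symbolic-link steps above can be sketched as one function. The name `install_versioned` and its three arguments are made up for this example, and it does not use sudo, so as written it suits a directory you own. For /usr/local/bin, run the same `cp` and `ln -sf` commands under sudo as shown earlier.

```shell
# install_versioned BINARY VERSION DEST
# Copy BINARY to DEST/ccextractor-VERSION, then point DEST/ccextractor at it.
install_versioned() {
    binary="$1"; version="$2"; dest="$3"
    cp "$binary" "$dest/ccextractor-$version" &&
        ( cd "$dest" && ln -sf "ccextractor-$version" ccextractor )
}
```

Because each build keeps its own versioned file name, older versions stay in place, and going back to one only requires re-pointing the link with `ln -sf`.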