Changsha capture station

Data flow

FS (2018-01-25): Here is how the new flow of data from China works.

A. Recording station

News shows are recorded on sands, a Raspberry Pi with a Joker-TV tuner and an 8TB hard drive in a server room at Hunan Normal University. The large hard drive means that it has room to record and store several months of video -- at the current rate of 20GB a day, around a year.
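
As a rough check on that estimate, the arithmetic can be done in the shell. The figures are the ones quoted above (an 8TB drive, about 20GB recorded per day); treating 1TB as 1000GB and ignoring filesystem overhead are simplifying assumptions.

```shell
#!/bin/bash
# Rough retention estimate for the recording disk.
# Assumes 1 TB = 1000 GB and ignores filesystem overhead.
disk_gb=8000      # 8 TB drive
daily_gb=20       # current recording rate per day

retention_days=$(( disk_gb / daily_gb ))
echo "about ${retention_days} days of recordings"
```

At these figures the disk fills after roughly 400 days, which matches the "around a year" estimate.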

Jacek provides the broadcast schedule on vila in Poland; it's transferred to cartago, and sands picks it up there once a day with xmltv-download. The schedule script then reprograms the crontabs automatically between 7am and 8am every day.

We currently record local, regional, and national news -- typically 5-10 shows a day -- with the script channel. It also extracts the particular stream we want from the multi-stream file that Joker records.

Finally it calls the script check-cc-joker, which adds header information and copies the file to xingfu, using

rsync $DDIR/${FIL%.*}.{mpg,txt,len} xingfu:$DDIR/ -avP
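
To see which files that source pattern selects, the expansion can be tried on its own in bash. The values of DDIR and FIL below are hypothetical placeholders, not paths from the station, and the snippet only prints the resulting names without copying anything.

```shell
#!/bin/bash
# Show which paths the rsync source pattern expands to.
# DDIR and FIL are hypothetical example values.
DDIR=/tmp/tv-example
FIL=example-show.ts

# ${FIL%.*} removes the shortest trailing ".suffix" from the file name;
# {mpg,txt,len} is bash brace expansion and yields one word per extension.
expanded=( $DDIR/${FIL%.*}.{mpg,txt,len} )
printf '%s\n' "${expanded[@]}"
```

The pattern produces three paths, one each for the .mpg, .txt, and .len file of the same recording. Brace expansion is a bash feature, so the pattern depends on the script running under bash rather than a plain POSIX sh.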

B. File processing

The virtual machine xingfu, provided to us for exclusive use and with root access, attempts to extract captions, currently to no effect.

It then rescales the h264 transport stream to simplify a complex mix of display aspect ratio and sample aspect ratio; the resulting files play fine in any player. However, note that there is an error in the rescaling; the aspect ratio is not correct (fixme).
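
For reference, the two ratios are related by DAR = (width / height) * SAR, so a square-pixel output can keep the height and scale the width by the SAR. The sketch below shows only that arithmetic. The 720x576 frame and the 16:11 SAR are illustrative values, not measurements from the Changsha recordings, and this is not the logic used in mpg2mp4-bulk.

```shell
#!/bin/bash
# Compute a square-pixel width from the stored width and the SAR.
# All input values are illustrative assumptions.
width=720; height=576
sar_num=16; sar_den=11

# Scale the width by the SAR (integer division truncates),
# then round up to an even number, since 4:2:0 h264 needs even dimensions.
scaled=$(( width * sar_num / sar_den ))
square_width=$(( (scaled + 1) / 2 * 2 ))

# Print a filter string in the form accepted by ffmpeg's -vf option.
echo "scale=${square_width}:${height},setsar=1"
```

With these inputs the width becomes 1048, and setsar=1 marks the output as square-pixel so that players no longer need to combine the two ratios.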

The audio is transcoded to aac. This is all accomplished by the script mpg2mp4-bulk, which is started by cron a few times each day.

For these operations, we use a custom-built ffmpeg:

wget -O- | tar xj

./configure --prefix=/usr --extra-version=0ubuntu0.18.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --enable-libfontconfig --enable-libfreetype --enable-libx264 --enable-libfdk-aac --enable-nonfree


checkinstall -y --deldoc=yes --pkgversion=10:4.1

C. Fetching the files

On cartago, the script fetch-China looks for completed files on xingfu and copies them to ~/China on user sands on cartago.

It checks whether the files are already present in cartago's tv tree, and removes them from ~/China if they are.
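
The cleanup step can be sketched as follows. This is not the fetch-China script itself: the function name, the lookup by bare file name with find, and the two directory arguments are assumptions made for illustration.

```shell
#!/bin/bash
# remove_if_archived STAGING TREE
# Delete each regular file in STAGING whose name already exists
# somewhere under TREE. Hypothetical helper, not part of fetch-China.
remove_if_archived() {
    local staging=$1 tree=$2 f name
    for f in "$staging"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        # -quit stops at the first match (GNU find); note that -name
        # treats the file name as a glob pattern.
        if [ -n "$(find "$tree" -type f -name "$name" -print -quit)" ]; then
            rm -- "$f"
        fi
    done
}
```

It would be called with the staging directory first and the root of the tv tree second. Files that have not yet reached the tree are left in place for the integration step.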

D. Integrating the files

Finally, user tna on cartago runs cc-integrate-China, which picks up any new files present in ~/China and copies them to ca:/tv, ca:/sweep (netapp), and roma:/sweep.

The script rsync-cc then copies the text files to /tvnews, where edge picks them up and integrates them into the solr index for the search engines.

E. Pending tasks

The main pending tasks are to extract text where available, and to add on-screen OCR.

The show 新闻1+1 does contain captions; Red Hen has made a sample available to CCExtractor, and Carlos has added a task for this on CCExtractor's GSoC 2018 Ideas page.

A faster way forward is likely Jacek's satellite, which may be able to bring us CCTV4-Europe with Chinese teletext.

This means we now have a complete and fully automated production pipeline for Chinese in Red Hen.

Scanning for channels

MT (2018-01-03): I have placed the antenna that comes with the JokerTVgadget (jtvg) in the window of the server room and connected it to jtvg. I have connected jtvg to my MacBook Pro and told joker-player to scan for DTMB. Reception of CCTV1, which is the most important, on channels 31, 79, and 90, seems perfect. Reception of CCTV13 on channel 91 also seems good. Even better, it continues to work with the same quality when I place the antenna on what I am pleased to find is an unremembered ledge on the top of the cupboard. So, I have attached the Joker-TV tuner to the sands station network. Let’s test this configuration.

I have reinforced the USB connections with electrical tape. I have also sealed with electrical tape the unused USB port on HD1 and the satellite connector on the JokerTVgadget. In other fun news, I bought a clothesline at the corner convenience mart from which to hang the ethernet cable connecting sands to the ethernet port in the server rack. It’s yellow, with little clothespins.

That’s all, folks. Time to test remotely: scan, locate, record, test the recordings.