There are currently two video processing pipelines on Hoffman2: one to process incoming files and one to process files from the digitizing project (the Rosenthal pipeline). They both compress the files; the first also extracts the on-screen text using OCR. This scroll provides the development perspective; for the monitoring perspective, see How to monitor the Rosenthal pipeline on Hoffman2.
(Peter Uhrig, 14 Jan 2016)
I have taken a look at the AAC Encoding guide. It turns out that ffmpeg in its latest version has a built-in AAC encoder that reportedly approaches libfdk_aac in quality; this seems to be a very recent development. Since we encode at 96 kbps and not below, the built-in encoder may be adequate. However, libfdk_aac is still described as the best AAC encoder.
I read a bit more about one-pass with Constant Rate Factor (CRF) vs two-pass. The basic point is that if the file size is the same, the quality of the two methods will be pretty much the same.
So if we use two-pass, we can target a bitrate and thus keep the file size per minute constant. If the video is very easy to compress, this will lead to higher quality; if it is harder to compress, to lower quality. The method is particularly suitable if you want to fit a recording exactly onto a DVD or CD: with two-pass you can make sure that the resulting mp4 is exactly 4.7 GB or 700 MB and get the best possible quality for that size, so no space on the disc is wasted.
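A sketch of such a two-pass encode, following the ffmpeg H.264 encoding guide; the filenames and the 1500k video / 96k audio bitrates are placeholders for illustration, not our production settings:

```shell
# Pass 1: analyze the video only (-an drops audio), write stats, discard output.
ffmpeg -y -i input.mp4 -c:v libx264 -b:v 1500k -pass 1 -an -f null /dev/null
# Pass 2: encode using the stats from pass 1 to hit the target bitrate.
ffmpeg -i input.mp4 -c:v libx264 -b:v 1500k -pass 2 \
       -c:a libfdk_aac -b:a 96k output.mp4
```

The total size is then roughly (video bitrate + audio bitrate) times duration, which is what makes it possible to fill a disc exactly.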
If we choose a CRF, we target a certain quality level and keep that constant. If the file is harder to compress, this will lead to a larger file; if it is easier to compress, to a smaller file. This means we will not ruin the quality of a video that is particularly difficult to compress, or waste bits on unnecessarily good quality for a video that is easy to compress, both of which can happen with the fixed bitrate of a two-pass encoding. The only drawback is that file size will vary, but I do not see an actual issue here. As long as we choose a CRF that results in the same average file size as our current choice of bitrate, we should get more stable quality without extra demands on storage. Also, since single-pass is faster (though apparently not by a factor of two, if I read that correctly), one could choose a slower preset and get better image quality at the same size and the same encoding time.
Given all that, I recommend we follow the advice from the encoding guide (https://trac.ffmpeg.org/wiki/Encode/H.264) and use a CRF. A CRF around 25 or so (higher numbers give lower bitrates) would probably give us a good file size in conjunction with the “veryslow” preset. For reference, here is the ffmpeg build in question:
ffmpeg version 2.6.5 Copyright (c) 2000-2015 the FFmpeg developers
built with gcc 4.9.2 (Debian 4.9.2-10)
configuration: --prefix=/usr --extra-cflags='-g -O2 -fstack-protector-strong -Wformat -Werror=format-security ' --extra-ldflags='-Wl,-z,relro' --cc='ccache cc' --enable-shared --enable-libmp3lame --enable-gpl --enable-nonfree --enable-libvorbis --enable-pthreads --enable-libfaac --enable-libxvid --enable-postproc --enable-x11grab --enable-libgsm --enable-libtheora --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libx264 --enable-libspeex --enable-nonfree --disable-stripping --enable-libvpx --enable-libschroedinger --disable-encoder=libschroedinger --enable-version3 --enable-libopenjpeg --enable-librtmp --enable-avfilter --enable-libfreetype --enable-libvo-aacenc --disable-decoder=amrnb --enable-libvo-amrwbenc --enable-libaacplus --libdir=/usr/lib/x86_64-linux-gnu --disable-vda --enable-libbluray --enable-libcdio --enable-gnutls --enable-frei0r --enable-openssl --enable-libass --enable-libopus --enable-fontconfig --enable-libpulse --disable-mips32r2 --disable-mipsdspr1 --disable-mipsdspr2 --enable-libvidstab --enable-libzvbi --enable-avresample --disable-htmlpages --disable-podpages --enable-libutvideo --enable-libfdk-aac --enable-libx265 --enable-libiec61883 --enable-vaapi --enable-libdc1394 --disable-altivec --shlibdir=/usr/lib/x86_64-linux-gnu
libavutil 54. 20.100 / 54. 20.100
libavcodec 56. 26.100 / 56. 26.100
libavformat 56. 25.101 / 56. 25.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 11.102 / 5. 11.102
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 1.100 / 1. 1.100
libpostproc 53. 3.100 / 53. 3.100
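A single-pass CRF encode along the lines recommended above could look like this; the filenames are placeholders, and the choice of libfdk_aac (enabled in the build shown above) at 96 kbps is an assumption for illustration:

```shell
# Single-pass encode: constant quality (CRF 25) with the veryslow preset;
# file size will vary with how compressible the material is.
ffmpeg -i input.mp4 -c:v libx264 -crf 25 -preset veryslow \
       -c:a libfdk_aac -b:a 96k output.mp4
```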
This is still experimental; so far we have used constant-bitrate encoding. The preliminary verdict is that the CRF method gives us files with the target quality at a file size that is 25% smaller, but it is too slow -- it encodes at 30 frames per second or worse.
Our main video processing pipeline on Hoffman2 extracts the on-screen text from images at one-second intervals, in several languages: English, Spanish, French, German, Italian, Danish, Norwegian, and Swedish.
Here is the final OCR interpretation of a sample frame, from 2015-11-26_2000_PT_RTP-1_Telejornal.ocr:
20151126200001.000|20151126200026.999|OCR1|000002|115 51 113 21|RTP 1525
20151126200001.001|20151126200234.999|OCR1|000002|76 585 322 24|OS AVISOS DO PRESIDENTE
20151126200001.002|20151126200142.999|OCR1|000002|72 629 515 34|Cavaco Silva diz que nÃo vai abdicar dos poder poderes
20151126200001.003|20151126200142.999|OCR1|000002|73 670 162 28|constitucionais
20151126200003.000|20151126200005.999|OCR1|000004|718 267 125 81|XII
20151126200003.001|20151126200006.999|OCR1|000004|712 360 223 56|GOVERNO
This is correct, with the exception of the capital Ã, which should have been lowercase. Even the blue "XXI Governo" on a low-contrast background is captured.
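The sample lines above suggest a pipe-delimited layout of six fields: start time, end time, tag, frame number, bounding box, and recognized text. Assuming that layout holds (it is inferred from the sample, not from a format specification), the text can be pulled out with a one-line awk filter:

```shell
# ocr_text: print the start timestamp and the recognized text (fields 1
# and 6) from a .ocr file. The six-field pipe-delimited layout is an
# assumption inferred from the sample lines above.
ocr_text() {
    awk -F'|' '{ print $1 "\t" $6 }' "$1"
}

# Example: ocr_text 2015-11-26_2000_PT_RTP-1_Telejornal.ocr
```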
Here is the teletext:
The on-screen text provides a valuable complement to the captions. It can be used to search for specific content, as is done by the Edge2 search engine, and it can be used computationally, for instance to determine topic boundaries.