Solved by Sergiy Turchyn 2016-03-01.
The conversion code is on GitHub at https://github.com/sergiy-turchyn/eaf_to_seg/ and on cartago at /usr/local/bin/eaf2seg-01.py.
Usage: python eaf_to_seg.py input_filename output_filename
Example: python eaf_to_seg.py 2007-03-07_1900_US_KTTV-FOX_Montel_Williams_Show_797-1277.eaf 2007-03-07_1900_US_KTTV-FOX_Montel_Williams_Show.seg
The script finds the matching .seg file in the sweep location, and the result is copied to the directory of output_filename. To put the output in a folder, include the folder in output_filename:
python eaf_to_seg.py 2007-03-07_1900_US_KTTV-FOX_Montel_Williams_Show_797-1277.eaf output_seg/2007-03-07_1900_US_KTTV-FOX_Montel_Williams_Show.seg
Update: the script is now called eaf2seg-01.py; see its help screen. Run it first in the dday tree and then, if everything works, in the sweep tree.
Use relative paths, not absolute.
There is also a maxAnnDifference parameter at the top of the file that defines how close annotations must be (in ms) to be considered part of the same group. It is currently set to 0, which means that annotations are grouped only if they have exactly the same start and end times.
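The grouping rule can be illustrated with a small sketch. This is not the code in eaf2seg-01.py; it assumes each annotation is a (start_ms, end_ms, tier, value) tuple and that an annotation joins a group when both its start and end times lie within maxAnnDifference ms of the group's first annotation. The function name group_annotations is hypothetical.

```python
from typing import List, Tuple

# Hypothetical annotation record: (start_ms, end_ms, tier, value)
Annotation = Tuple[int, int, str, str]

# Mirrors the maxAnnDifference parameter described above (milliseconds).
maxAnnDifference = 0


def group_annotations(annotations: List[Annotation],
                      max_diff: int = maxAnnDifference) -> List[List[Annotation]]:
    """Group annotations whose start and end times are both within max_diff ms
    of the first annotation in a group (illustrative sketch only)."""
    groups: List[List[Annotation]] = []
    for ann in sorted(annotations):
        start, end = ann[0], ann[1]
        for group in groups:
            g_start, g_end = group[0][0], group[0][1]
            if abs(start - g_start) <= max_diff and abs(end - g_end) <= max_diff:
                group.append(ann)
                break
        else:
            # No existing group is close enough: start a new one.
            groups.append([ann])
    return groups
```

With max_diff set to 0, a Gaze and a Hand annotation that both run from 6150 to 18230 ms end up in one group, while an annotation starting at 6200 ms forms its own; raising max_diff to 50 would merge all three.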
- Manual tagging (with proposed Red Hen gesture tagging scheme)
- How to annotate with ELAN (simple instructions to get started)
- How to set up the iMotion annotator (draws rectangles on images to indicate event location)
- How to use the online tagging interface (integrated into Red Hen, but not frame accurate)
- How to use the Video Annotation Tool (online multi-dimensional video annotation interface for talks and demos)
ELAN is a professional tool for the creation of complex annotations on video and audio resources, developed as an open-source project by the Max Planck Institute in Nijmegen (see https://tla.mpi.nl/tools/tla-tools/elan). It is a Java-based desktop application that runs on Windows, OS X, and Linux. We are integrating ELAN into the Red Hen research workflow by creating standard annotation templates, providing basic instructions to get started, writing export scripts that convert ELAN annotations into Red Hen formats, and writing import scripts that allow ELAN to read Red Hen files.
Annotating audio and video with ELAN
Learn how to annotate with ELAN.
Exporting ELAN annotations to Red Hen
Gesture researchers can already tag and annotate a video clip from Red Hen in ELAN; what we need is a way to export those tags and annotations back into Red Hen. The data structure of Red Hen supports an open set of timestamped annotations, allowing researchers to use and label their own coding schemes. The goal is to "free ELAN," so that work done in ELAN would no longer be held only locally. Rather, the annotated videos would be searchable and viewable within Red Hen's multi-tag search engine. This allows cumulative progress to take place, where researchers learn from each other, and facilitates large collaborative research projects with multiple coders, including student teams. Such annotations will also become available to Red Hen's machine learning initiatives for the development of new classifiers, which allows us to search the entire collection for similar features.
- ELAN templates (third-party contributions)
The first challenge is to write a Python script that takes an .eaf file and converts it to the Red Hen data format; see the example .seg files. Red Hen uses a file name convention that includes a date and time (e.g., 2015-02-13_2200_US_CNN_Situation_Room.seg); this naming convention should be used to give the annotated video its baseline (absolute) time. Relative timestamps are then assigned in relation to that baseline. More generally, to integrate tags in Red Hen, we need either a file name or a date and a UID, plus the location in seconds.
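Reading the baseline time from a Red Hen file name could look like the sketch below. It assumes only what the naming convention above shows: the name starts with YYYY-MM-DD_HHMM_ and that time is UTC. The function name baseline_from_filename is hypothetical.

```python
import os
import re
from datetime import datetime, timezone

# Assumed convention: names start with YYYY-MM-DD_HHMM_, and the time is UTC.
FILENAME_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})_(\d{4})_")


def baseline_from_filename(filename: str) -> datetime:
    """Return the UTC baseline time encoded at the start of a Red Hen file name."""
    basename = os.path.basename(filename)  # tolerate relative paths such as output_seg/...
    match = FILENAME_RE.match(basename)
    if match is None:
        raise ValueError(f"not a Red Hen file name: {filename!r}")
    date_part, time_part = match.groups()
    naive = datetime.strptime(date_part + time_part, "%Y-%m-%d%H%M")
    return naive.replace(tzinfo=timezone.utc)
```

For example, baseline_from_filename("2014-10-13_1800_US_CNN_Newsroom_12-493.eaf").timestamp() gives 1413223200.0, the same epoch value that date -ud "2014-10-13 18:00" +%s prints.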
To convert an .eaf file to Red Hen's format, we need to do something like this:
- pick out the "tiers" and use them as field names
- convert each annotation in each tier to a .seg file line, with start time and end time
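The first two steps can be sketched with Python's standard XML parser rather than a dedicated ELAN library. The sketch relies on the usual .eaf layout: TIME_SLOT elements map an ID to a time in milliseconds, and each TIER holds ALIGNABLE_ANNOTATION elements that point at two time slots and carry an ANNOTATION_VALUE. It assumes that only time-aligned annotations matter; reference annotations and unaligned time slots are skipped. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

TierAnnotations = Dict[str, List[Tuple[int, int, str]]]


def tiers_from_root(root: ET.Element) -> TierAnnotations:
    """Map each tier name to its (start_ms, end_ms, value) annotations."""
    # Time slot ID -> milliseconds; unaligned slots have no TIME_VALUE and are skipped.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    tiers: TierAnnotations = defaultdict(list)
    for tier in root.iter("TIER"):
        name = tier.get("TIER_ID")  # the tier name becomes the field name
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue
            value = ann.findtext("ANNOTATION_VALUE", default="").strip()
            tiers[name].append((start, end, value))
    return dict(tiers)


def read_eaf_tiers(path: str) -> TierAnnotations:
    """Parse an .eaf file from disk."""
    return tiers_from_root(ET.parse(path).getroot())
```

Each (start_ms, end_ms, value) tuple then becomes one .seg line once the times are made absolute.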
We'll also need to assign a primary tag to each line; this will need to be done per coding template, most simply in a parameter file.
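One possible shape for such a parameter file is a plain list of "Tier = PRIMARY_TAG" lines, one per tier in the coding template. The format and the load_primary_tags function are hypothetical, offered only as a sketch; GES_11 is the primary tag used for the Gaze example on this page.

```python
from typing import Dict


def load_primary_tags(text: str) -> Dict[str, str]:
    """Parse hypothetical 'Tier = PRIMARY_TAG' lines; '#' starts a comment."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        tier, sep, tag = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'Tier = TAG', got {raw!r}")
        mapping[tier.strip()] = tag.strip()
    return mapping
```

A converter would call load_primary_tags(open(path).read()) once per coding template and look up each tier's primary tag when writing .seg lines.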
Let's say our first conversion script just reads the Gaze tier in this sample file. There are ten gaze annotations; each annotation becomes one line in the .seg file, with a start time and an end time. The relative times of the gaze annotations need to be converted into absolute times, with the baseline time given by the file name, in this case 2014-10-13_1800, or unix epoch 1413223200 seconds, as given by:
date -ud "2014-10-13 18:00" +%s
The date in the file name is always UTC. The relative start time of each gaze annotation is therefore converted to seconds and added to the unix epoch, and the sum is converted back to a UTC timestamp with no punctuation or spaces, in the form 201410131800.
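The arithmetic above can be sketched as two small functions. The sketch assumes the absolute form is extended with seconds and milliseconds as YYYYMMDDHHMMSS.mmm, as in Red Hen .seg lines; it works in integer milliseconds to avoid floating-point rounding. Both function names are hypothetical.

```python
from datetime import datetime, timezone


def elan_time_to_ms(hms: str) -> int:
    """Convert an ELAN time such as '00:00:06.150' to milliseconds."""
    hours, minutes, seconds = hms.split(":")
    whole, _, frac = seconds.partition(".")
    millis = int((frac + "000")[:3])  # '15' -> 150, '' -> 0
    return ((int(hours) * 60 + int(minutes)) * 60 + int(whole)) * 1000 + millis


def absolute_timestamp(baseline_epoch: int, offset_ms: int) -> str:
    """Add a relative offset to the baseline epoch; format as YYYYMMDDHHMMSS.mmm in UTC."""
    total_ms = baseline_epoch * 1000 + offset_ms
    secs, millis = divmod(total_ms, 1000)
    stamp = datetime.fromtimestamp(secs, tz=timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{stamp}.{millis:03d}"
```

For the first gaze annotation, absolute_timestamp(1413223200, elan_time_to_ms("00:00:06.150")) returns "20141013180006.150", and the end time 00:00:18.230 becomes "20141013180018.230".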
The first gaze annotation is "camera" from 00:00:06.150 to 00:00:18.230. We convert that to absolute times, including milliseconds, and assign the tier to the primary tag "GES_11" (in the credit block, we attribute this primary tag to ELAN and a coder, researcher, or project name):
We then integrate this line into 2014-10-13_1800_US_CNN_Newsroom.seg at the correct temporal position, and that's it: the annotation is now in the standard Red Hen format.
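Placing a new line at the correct temporal position can be sketched as follows. The sketch assumes that data lines in a .seg file begin with a fixed-width YYYYMMDDHHMMSS.mmm start time followed by "|" (so string order matches time order) and that header and footer lines do not; insert_seg_line is a hypothetical helper, not part of eaf2seg-01.py.

```python
import re
from typing import List

# Assumed: data lines begin with a fixed-width 'YYYYMMDDHHMMSS.mmm|' start time.
STAMP_RE = re.compile(r"^\d{14}\.\d{3}\|")


def insert_seg_line(lines: List[str], new_line: str) -> List[str]:
    """Return a copy of lines with new_line after the last data line whose
    start time is <= its own (or before the first data line if none is)."""
    new_start = new_line.split("|", 1)[0]
    data_idx = [i for i, line in enumerate(lines) if STAMP_RE.match(line)]
    if not data_idx:
        raise ValueError("no timestamped lines to position against")
    insert_at = data_idx[0]  # default: just after the header block
    for i in data_idx:
        if lines[i].split("|", 1)[0] <= new_start:
            insert_at = i + 1
    return lines[:insert_at] + [new_line] + lines[insert_at:]
```

Lines with the same start time keep their existing order, and the new line goes after them; header and footer lines are left where they are.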
ELAN's native .eaf file format can be parsed by the poio-api Python library. It depends on python-graf and regex; all three were installed on cartago on 2016-01-27.
>>> import poioapi.annotationgraph
>>> ag = poioapi.annotationgraph.AnnotationGraph.from_elan("2014-10-13_1800_US_CNN_Newsroom_12-493.eaf")
Importing Red Hen files to ELAN
The second challenge is to convert Red Hen files to ELAN's .eaf format. The place to start is with Red Hen's .txt files, which typically contain a timestamped transcript extracted from the television transport stream.
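A starting point for this direction is sketched below. It assumes that timestamped lines in a Red Hen .txt file have the form start|end|tag|text with YYYYMMDDHHMMSS.mmm UTC timestamps, and that lines not starting with a digit (header and footer lines) can be skipped. It writes a minimal .eaf with a single time-aligned tier; the tier name "Transcript", the linguistic type "default-lt", and all function names are hypothetical, and ELAN may expect more header information (for example a media descriptor linking the video), which has not been verified here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List, Tuple

Row = Tuple[int, int, str]  # (start_ms, end_ms, text), relative to the baseline


def stamp_to_ms(stamp: str, baseline: datetime) -> int:
    """Convert an assumed 'YYYYMMDDHHMMSS.mmm' UTC stamp to ms after baseline."""
    whole, _, frac = stamp.partition(".")
    moment = datetime.strptime(whole, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    whole_seconds = int((moment - baseline).total_seconds())
    return whole_seconds * 1000 + int((frac + "000")[:3])


def caption_rows(text: str, baseline: datetime) -> List[Row]:
    """Collect timestamped lines of the assumed form start|end|tag|text."""
    rows: List[Row] = []
    for line in text.splitlines():
        fields = line.split("|")
        # Skip lines that do not begin with a timestamp (e.g. header/footer lines).
        if len(fields) < 4 or not fields[0][:1].isdigit():
            continue
        start = stamp_to_ms(fields[0], baseline)
        end = stamp_to_ms(fields[1], baseline)
        rows.append((start, end, "|".join(fields[3:])))
    return rows


def build_eaf(rows: List[Row], tier_id: str = "Transcript") -> ET.ElementTree:
    """Build a minimal EAF tree with one time-aligned tier."""
    doc = ET.Element("ANNOTATION_DOCUMENT", AUTHOR="", FORMAT="3.0", VERSION="3.0",
                     DATE=datetime.now(timezone.utc).isoformat(timespec="seconds"))
    ET.SubElement(doc, "HEADER", MEDIA_FILE="", TIME_UNITS="milliseconds")
    time_order = ET.SubElement(doc, "TIME_ORDER")
    tier = ET.SubElement(doc, "TIER", TIER_ID=tier_id, LINGUISTIC_TYPE_REF="default-lt")
    for n, (begin_ms, end_ms, value) in enumerate(rows, start=1):
        ts1, ts2 = f"ts{2 * n - 1}", f"ts{2 * n}"
        ET.SubElement(time_order, "TIME_SLOT", TIME_SLOT_ID=ts1, TIME_VALUE=str(begin_ms))
        ET.SubElement(time_order, "TIME_SLOT", TIME_SLOT_ID=ts2, TIME_VALUE=str(end_ms))
        annotation = ET.SubElement(ET.SubElement(tier, "ANNOTATION"), "ALIGNABLE_ANNOTATION",
                                   ANNOTATION_ID=f"a{n}", TIME_SLOT_REF1=ts1, TIME_SLOT_REF2=ts2)
        ET.SubElement(annotation, "ANNOTATION_VALUE").text = value
    ET.SubElement(doc, "LINGUISTIC_TYPE", LINGUISTIC_TYPE_ID="default-lt", TIME_ALIGNABLE="true")
    return ET.ElementTree(doc)
```

The baseline would come from the .txt file name, exactly as for .seg files, and the tree can be saved with tree.write(path, encoding="UTF-8", xml_declaration=True).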
A related project is to create scripts that convert between Red Hen's annotated files and Red Hen's online Video Annotation Tool. Once this is accomplished, we will be able to display ELAN's annotations live online in color-coded labels.
Add gesture detection to ELAN
ELAN already incorporates several "Recognizers", including some video modules such as "Estimates YUV intervals representing skin" (installed by default on Windows and available on request for Linux; it is unclear whether it works on a Mac) and "Human motion analysis and annotation". Red Hen may add further modules, such as gesture detection classifiers. There is already a "video Hand Head Tracking / Human motion analysis recognizer"; we should test it. These plugins may be designed for desktop use.
Creating an export plugin for ELAN
Since ELAN is an open-source project, a plug-in could be created allowing the ELAN user to "Export to Red Hen," "Import from Red Hen," and even "Save To Repository," with Red Hen as one of the available repositories.
- poio-api -- a Python library to access and search data from ELAN and other linguistic tools
- R package for reading XML-format .eaf files from ELAN annotation software
- pympi -- a Python module for processing ELAN and Praat annotation files
- eaf-scripts -- convert .srt files to .eaf (and others)
- ELAN Analysis Companion -- analysis improvements for the ELAN behavioral annotation software
- fflipper -- takes ELAN files as input and generates clips based on the annotations in a selected tier