Integrating ELAN

Solved by Sergiy Turchyn 2016-03-01
The conversion code is on GitHub at https://github.com/sergiy-turchyn/eaf_to_seg/ and on cartago at /usr/local/bin/eaf2seg-01.py

Usage: python eaf_to_seg.py input_filename output_filename

Example: python eaf_to_seg.py 2007-03-07_1900_US_KTTV-FOX_Montel_Williams_Show_797-1277.eaf 2007-03-07_1900_US_KTTV-FOX_Montel_Williams_Show.seg

The .seg file will be found in the sweep location and copied to the outputFile directory. To have the output written into a folder, include the folder in the outputFile:

python eaf_to_seg.py 2007-03-07_1900_US_KTTV-FOX_Montel_Williams_Show_797-1277.eaf output_seg/2007-03-07_1900_US_KTTV-FOX_Montel_Williams_Show.seg

Use relative paths, not absolute.
There is also a maxAnnDifference parameter at the top of the file that defines how close annotations must be (in ms) to be considered the same group. It is currently set to 0, which means that annotations are grouped only if they all have exactly the same start and end times.
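One possible reading of the maxAnnDifference rule can be sketched as follows. This is an illustration only, not the code in eaf_to_seg.py: the function name, the (start_ms, end_ms, label) tuple layout, and the choice to compare each annotation against the first member of a group are all assumptions.

```python
def group_annotations(annotations, max_ann_difference=0):
    """Group (start_ms, end_ms, label) tuples whose times nearly coincide.

    Hypothetical helper: an annotation joins a group when both its start
    and its end lie within max_ann_difference ms of the group's first member.
    """
    groups = []
    for ann in sorted(annotations):
        start, end, _ = ann
        for group in groups:
            ref_start, ref_end, _ = group[0]
            if (abs(start - ref_start) <= max_ann_difference
                    and abs(end - ref_end) <= max_ann_difference):
                group.append(ann)
                break
        else:
            # No existing group was close enough, so start a new one
            groups.append([ann])
    return groups


annotations = [
    (6150, 18230, "Gaze=camera"),
    (6150, 18230, "Head=nod"),
    (6160, 18230, "Hands=rest"),
]
exact = group_annotations(annotations, max_ann_difference=0)
loose = group_annotations(annotations, max_ann_difference=20)
```

With a tolerance of 0 the third annotation, which starts 10 ms later, stays in its own group; with a tolerance of 20 ms all three fall into one group.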


Task

Can you help us integrate Red Hen and ELAN? We are making it part of Red Hen's research workflow.
If so, write to 
and we will try to connect you with a mentor.

More information

ELAN is a professional tool for the creation of complex annotations on video and audio resources, developed as an open-source project by the Max Planck Institute in Nijmegen; see https://tla.mpi.nl/tools/tla-tools/elan. It is a Java-based desktop application that runs on Windows, OS X, and Linux. We are integrating ELAN into the Red Hen research workflow by creating standard annotation templates, providing basic instructions to get started, writing export scripts that convert ELAN annotations into Red Hen formats, and writing import scripts that allow ELAN to read Red Hen files.

Annotating audio and video with ELAN

Learn how to annotate with ELAN.

Exporting ELAN annotations to Red Hen

Gesture researchers can already tag and annotate a video clip from Red Hen in ELAN; what we need is a way to export those tags and annotations back into Red Hen. The data structure of Red Hen supports an open set of timestamped annotations, allowing researchers to use and label their own coding schemes. The goal is to "free ELAN," so that work done in ELAN would no longer be held only locally. Rather, the annotated videos would be searchable and viewable within Red Hen's multi-tag search engine. This allows cumulative progress to take place, where researchers learn from each other, and facilitates large collaborative research projects with multiple coders, including student teams. Such annotations will also become available to Red Hen's machine learning initiatives for the development of new classifiers, which allows us to search the entire collection for similar features.

Script outline

The first challenge is to write a Python script that takes an .eaf file and converts it to the Red Hen data format; see example .seg files. Red Hen uses a file name convention that includes a date and time (e.g., 2015-02-13_2200_US_CNN_Situation_Room.seg); this naming convention should be used to give the annotated video its baseline (absolute) time. Relative timestamps are then assigned in relation to that baseline. More generally, to integrate tags in Red Hen, we need either a file name or a date and a UID, plus the location in seconds.

To convert an .eaf file to Red Hen's format, we need to do something like this:

  • pick out the "tiers" and use them as field names
  • convert each annotation in each tier to a .seg file line, with start time and end time

We'll also need to assign a primary tag to each line; this will need to be done per coding template, most simply in a parameter file.
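The two steps above, together with the per-template primary tag, might look roughly like this. The sketch reads the .eaf XML directly with Python's standard library instead of a dedicated ELAN parser; the embedded sample document is a hand-written fragment, and the PRIMARY_TAGS table and read_tiers function are hypothetical names standing in for the parameter file and the converter.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written .eaf fragment, for illustration only
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="6150"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="18230"/>
  </TIME_ORDER>
  <TIER TIER_ID="Gaze">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>camera</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

# Hypothetical stand-in for the parameter file: tier name -> primary tag
PRIMARY_TAGS = {"Gaze": "GES_11"}


def read_tiers(eaf_text, primary_tags):
    """Return (start_ms, end_ms, primary_tag, 'Tier=value') rows."""
    root = ET.fromstring(eaf_text)
    # Map each time slot ID to its offset in milliseconds
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        name = tier.get("TIER_ID")
        if name not in primary_tags:
            continue  # tiers without a primary tag are skipped
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # unaligned time slot, nothing to export
            value = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            rows.append((start, end, primary_tags[name], f"{name}={value}"))
    return sorted(rows)


rows = read_tiers(SAMPLE_EAF, PRIMARY_TAGS)
```

The tier name becomes the field name and the annotation value its content; the times are still relative offsets in milliseconds at this stage.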

Let's say our first conversion script just reads the Gaze tier in this sample file. There are ten gaze annotations; each annotation becomes one line in the .seg file, with a start time and end time. The relative times of the gaze annotations need to be converted into absolute times, with the baseline time given by the file name -- in this case 2014-10-13_1800 or unix epoch 1413223200 seconds -- that's

      date -ud "2014-10-13 18:00" +%s

The date in the file name is always UTC. So the relative duration in the gaze annotation start time gets converted to seconds and added to the unix epoch, which is then converted back to UTC with no punctuation or spaces, in the form YYYYMMDDhhmmss.mmm (the baseline itself is 20141013180000.000).

The first gaze annotation is "camera" from 00:00:06.150 to 00:00:18.230. We convert that to absolute times, including milliseconds, and assign the tier to the primary tag "GES_11" (in the credit block, we attribute this primary tag to ELAN and a coder, researcher, or project name):

    20141013180006.150|20141013180018.230|GES_11|Gaze=camera
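The time arithmetic described above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that absolute timestamps are written as fourteen digits (YYYYMMDDhhmmss) followed by a period and three millisecond digits.

```python
import re
from datetime import datetime, timedelta, timezone


def baseline_from_filename(filename):
    """Read the UTC baseline from a name like 2014-10-13_1800_US_CNN_..."""
    match = re.match(r"(\d{4}-\d{2}-\d{2})_(\d{4})_", filename)
    if match is None:
        raise ValueError("no date and time prefix in %r" % filename)
    stamp = datetime.strptime(match.group(1) + match.group(2), "%Y-%m-%d%H%M")
    return stamp.replace(tzinfo=timezone.utc)


def absolute_timestamp(baseline, offset_ms):
    """Add a relative offset in ms and format as YYYYMMDDhhmmss.mmm."""
    moment = baseline + timedelta(milliseconds=offset_ms)
    return moment.strftime("%Y%m%d%H%M%S") + ".%03d" % (moment.microsecond // 1000)


baseline = baseline_from_filename("2014-10-13_1800_US_CNN_Newsroom.seg")
epoch = int(baseline.timestamp())  # same value as the date command above

# First gaze annotation: 00:00:06.150 to 00:00:18.230
line = "|".join([
    absolute_timestamp(baseline, 6150),
    absolute_timestamp(baseline, 18230),
    "GES_11",
    "Gaze=camera",
])
```

Working with datetime and timedelta objects avoids floating-point rounding of the milliseconds and handles rollover into the next minute, hour, or day.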

We integrate this line into 2014-10-13_1800_US_CNN_Newsroom.seg in the correct temporal position -- and that's it. It's now in the standard Red Hen format.
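Placing the new line in the correct temporal position could be done along these lines. This is a sketch under assumptions: data lines are taken to be those that begin with a digit (a fixed-width timestamp), any other line is treated as a header or footer and left where it is, and the sample lines are made-up placeholders and not real Red Hen data.

```python
def insert_by_start_time(seg_lines, new_line):
    """Return a copy of seg_lines with new_line inserted by start time."""
    new_start = new_line.split("|", 1)[0]
    position = None
    for index, line in enumerate(seg_lines):
        if not line[:1].isdigit():
            continue  # header or footer line, not a timestamped data line
        # Fixed-width timestamps compare correctly as plain strings
        if line.split("|", 1)[0] > new_start:
            return seg_lines[:index] + [new_line] + seg_lines[index:]
        position = index + 1  # just after the latest data line seen so far
    if position is None:
        position = len(seg_lines)  # no data lines at all: append
    return seg_lines[:position] + [new_line] + seg_lines[position:]


# Made-up placeholder content
seg_lines = [
    "TOP|placeholder header",
    "20141013180002.000|20141013180005.000|CC1|first caption",
    "20141013180010.000|20141013180012.000|CC1|second caption",
    "END|placeholder footer",
]
gaze = "20141013180006.150|20141013180018.230|GES_11|Gaze=camera"
merged = insert_by_start_time(seg_lines, gaze)
```

A line that starts later than every existing data line ends up after the last data line and before any trailing footer.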

Python script

ELAN's native .eaf file format can be parsed by the poio Python library. It depends on python-graf and regex; these have all been installed on cartago (2016-01-27).

>>> import poioapi.annotationgraph
>>> ag = poioapi.annotationgraph.AnnotationGraph.from_elan("2014-10-13_1800_US_CNN_Newsroom_12-493.eaf")

Importing Red Hen files to ELAN

The second challenge is to convert Red Hen files to ELAN's .eaf format. The place to start is with Red Hen's .txt files, which typically contain a timestamped transcript extracted from the television transport stream.
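A first step for the import direction is the inverse time conversion: absolute timestamps back to millisecond offsets that ELAN time slots can hold. The sketch below assumes that transcript lines in the .txt file use a start|end|tag|text layout with YYYYMMDDhhmmss.mmm timestamps; the helper names and the sample line are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_datetime(stamp):
    """Parse YYYYMMDDhhmmss.mmm as a UTC datetime."""
    # "%f" accepts one to six fractional digits, so ".150" is 150 ms
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def relative_ms(stamp, baseline):
    """Offset of an absolute timestamp from the baseline, in whole ms."""
    # Integer division by a 1 ms timedelta avoids float rounding
    return (to_datetime(stamp) - baseline) // timedelta(milliseconds=1)


baseline = datetime(2014, 10, 13, 18, 0, tzinfo=timezone.utc)

# Made-up sample line in the assumed layout
line = "20141013180006.150|20141013180018.230|CC1|sample caption text"
start, end, tag, text = line.split("|", 3)
slot = (relative_ms(start, baseline), relative_ms(end, baseline), text)
```

Each resulting tuple holds the two time slot values and the annotation value that an .eaf writer would need for one aligned annotation.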

A related project is to create scripts that convert between Red Hen's annotated files and Red Hen's online Video Annotation Tool. Once this is accomplished, we will be able to display ELAN's annotations live online in color-coded labels.

Add gesture detection to ELAN

ELAN already incorporates several "Recognizers", including some video modules, such as "Estimates YUV intervals representing skin" (Windows installed by default, Linux available on request, unclear if it works on a Mac) and "Human motion analysis and annotation". Red Hen may add additional modules, such as gesture detection classifiers. There is already a "video Hand Head Tracking / Human motion analysis recognizer" -- we should test it. These plugins may be designed for desktop use.

Creating an export plugin for ELAN

Since ELAN is an open-source project, a plug-in could be created allowing the ELAN user to "Export to Red Hen," "Import from Red Hen," and even "Save To Repository," with Red Hen as one of the available repositories.
