How to use the online tagging interface

The online tagging interface is integrated into the Edge2 search engine for the NewsScape dataset at UCLA, the main Red Hen dataset. Because this is an online interface, it is not frame-accurate, so this is not the tool to use if you need a very high level of precision in your annotations. However, it can be very useful for projects where you identify categories of events or entities, such as gestures, people, objects, or locations. The annotations can be searched in conjunction with the wide array of metadata already available in NewsScape, such as parts of speech, linguistic frames, and sentiment. 

For frame-accurate gesture coding, see Manual tagging and Integrating ELAN. For a colorful web-based tagging interface suitable for public presentations, see Video Annotation Tool.

Open the Tagging Interface

After logging in to Red Hen's Edge search engine, search for a term or a time period. On the top line of each search result, you will see a list of links like this:

   CNN The Lead With Jake Tapper   ( video | text | montage | imageflow | metadata | tagging | permalink )

For online tagging, click on the "tagging" link. You will be taken to the tagging interface, which begins with a list of existing tags. Kinds of tags include segment boundaries, gestures, grammatical patterns, etc.
The tags you enter are stored in a MySQL database. You can export the tags to a comma-separated values (csv) file.
At regular intervals, we export the tags and selectively incorporate them back into the main search engine.
We are currently extending the Edge search engine so that manually added tags can be included in a search request.

Tagging schemes

The current tagging schemes include:

  • SEG -- segment boundaries (mostly added automatically)

  • CAU -- causal reasoning analysis (see below)

  • CRX -- corrected text (see below)

  • GES -- gestures (see below)

  • IMG -- Image characteristics (see below)

  • LAN -- Language tag (see below)

  • NER -- named entities (see below)

Red Hens can suggest new tagging schemes; see example below.

List of tags present

At the top of many pages in the tagging interface, you'll see a list of existing tags, which function as a table of contents. Typically this includes some segment tags. These are added automatically by post-processing routines following media capture, and block out story boundaries and commercials. You may also see other kinds of tags, depending on what has been added, either automatically by NLP annotation tools (see Current state of text tagging), or manually in this interface.

   Segment List

   # start time end time tag
   1 10:00:02 pm 10:00:02 pm SEG
   2 10:00:14 pm 10:00:16 pm SEG

You can click on a start or end time in this list to navigate within the document.
In the body of the document, you will see the current tags with navigation links and a plus (+) sign at the end of each line:

Creating a new tag

To create a new tag, click on the plus sign, and you will see a drop-down list of tag types:

In this case, let's select the NER (Named Entity Recognition) tag and press [add]:

Then click on the down arrow (˅) at the end of the new segment tag to access the segment options window:

Fill in the form, using the full name and official title. If known, add the relevant time period for the role.

Also adjust the boundaries as needed, so that the tag spans the relevant lines:

When you are done, click [update] to save the tag:

Causal Reasoning

The causal reasoning tagging scheme classifies events by five stages (cf. Steen's talk at MIT), and leaves room for comments:

[Screenshot: Causal reasoning stages]

Comment Tags

The comment tags CI1, CI2, CI3, and CI4 are open-field tags. Because the information entered in them has no structure, it is hard to search or analyze, so these tags should be avoided.

[Screenshot: Comment tag]

Corrected Text

The CRX tag lets you enter a corrected text:

[Screenshot: Corrected text tag]

Gesture Tags

In this section, we present some screenshots of gesture tags.

Update 2015-05-21: the web-based tagging interface has new features that let the tagger include the query that produced the hit (query) and the full text surrounding the hit (transcript). See screenshot below:
[Screenshot: Updated tagging features]

Image characteristics

The IMG tag is designed to provide ground truth for machine learning in Computer Vision.

It currently includes visual features of a person:

[Screenshot: Visual features of a person]

Language tag

The LAN tag is designed to allow you to tag the language used within a file. Files where the main language is not English are typically tagged for language in the header, but there are cases where different languages are used within the show. The tag currently supports Persian and Pushto, languages used in Afghan broadcasts, but we can easily add others:

[Screenshot: Language tag]

Named Entities

See Creating a new tag

How to create your own tagging scheme

To propose a new tagging scheme, use the format <tag name> <field name> <select-multi or text> <value>. Here is an example of a gesture tagging scheme:

GES    Type    select-multi    left hand
GES    Type    select-multi    right hand
GES    Type    select-multi    both hands
GES    Type    select-multi    point
GES    Type    select-multi    trajectory
GES    Type    select-multi    telic
GES    Type    select-multi    atelic
GES    Type    select-multi    viewpoint
GES    Type    select-multi    iconic
GES    Type    select-multi    metaphoric
GES    Orientation    select-multi    sagittal
GES    Orientation    select-multi    lateral
GES    Topic    select-multi    time
GES    Topic    select-multi    emotion
GES    Topic    select-multi    argument
GES    Comment    text    

Assemble your own tagging scheme and contact Francis Steen or Mark Turner to add it to the current definitions.
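
As a rough illustration of the format above, the following Python sketch groups scheme lines by tag and field so that you can check a proposal before sending it. It is not part of the Red Hen tooling: the function name parse_scheme is hypothetical, and the sketch assumes that columns are separated by a tab or by two or more spaces, so that multi-word values such as "left hand" stay intact.

```python
import re

# A few lines from the gesture scheme above; the Comment line has no value.
SCHEME = """\
GES    Type    select-multi    left hand
GES    Type    select-multi    right hand
GES    Orientation    select-multi    sagittal
GES    Comment    text
"""

ALLOWED_KINDS = ("select-multi", "text")


def parse_scheme(text):
    """Return {tag: {field: {"kind": kind, "values": [value, ...]}}}."""
    scheme = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Assumption: a tab or a run of 2+ whitespace characters separates columns.
        parts = re.split(r"\s{2,}|\t", line.strip())
        if len(parts) < 3:
            raise ValueError(f"expected at least 3 columns: {line!r}")
        tag, field, kind = parts[:3]
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown field kind {kind!r} in: {line!r}")
        value = parts[3] if len(parts) > 3 else ""
        entry = scheme.setdefault(tag, {}).setdefault(
            field, {"kind": kind, "values": []}
        )
        if value:
            entry["values"].append(value)
    return scheme


scheme = parse_scheme(SCHEME)
for field, entry in scheme["GES"].items():
    print(field, entry["kind"], entry["values"])
```

A malformed line raises a ValueError, which makes typos in a proposed scheme easy to spot.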

How To Export Tags

At the top of the tagging page, you'll see this export menu:

[Screenshot: Red Hen tag export menu]

Select the tag you want; it will currently create a file with all instances of that tag in the MySQL tag database. You'll get the usual option to save the file:

[Screenshot: Red Hen tag export filename]

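The file will have a default name like "storycoding_export_GES_20141005222224.csv", which gives the tag followed by the date and time of the export (here 2014-10-05 22:22:24). Note that this is a comma-separated values (csv) file; it can be imported into a spreadsheet application or a statistics package. Comment fields may contain line breaks and doubled quotation marks, so a single record can span several lines. The file looks like this: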

# Export file generated by you@ucla.edu on 2014-10-05 22:22:24 for tag GES
"2006-09-29_0300_US_CNN_Anderson_Cooper_360.tpt","a08e7ea2-be6d-11dc-8108-7b5efc535ceb","GES","2006-09-29 03:00:41","2006-09-29 03:00:43","","","",""
"2007-09-25_0100_US_CNN_Larry_King_Live.tpt","137a1d7a-be89-11dc-b23d-434b216e77ed","GES","2007-09-25 01:07:19","2007-09-25 01:07:22","left hand","","","sweeping motion - 'all over america'"
"2008-10-23_0630_US_KCET_Charlie_Rose.txt","0590f186-a0cc-11dd-80d6-00e0815fe83e","GES","2008-10-23 06:31:21","2008-10-23 06:31:23","both hands","lateral","time","Roundtable: interlocutor left, colleague opposite

Looking at gesture or at a blank spot in gesture 1 and 2. At the end of gesture 2, and in gesture 3, looks at interlocutor

coordination of beginning and end with inception and completion, and then coordination of final landmark with ""by doing THAT"", ""by doing that"" with ""path"" and ""result"" with final landmark. In gesture 3: ""toward"" coordinated with final landmark

Deserves a detailed qualitative analysis. It could be the object of ONE GESTURE-BLENDING PAPER OF ITS OWN, with some quantitative data in the background"
"2012-12-01_0300_US_CNN_Anderson_Cooper_360.tpt","8fc6e660-3b6b-11e2-a367-001517add6fb","GES","2012-12-01 03:30:11","2012-12-01 03:30:11","","","",""
"2012-12-13_1600_US_CNN_Newsroom.tpt","a4b78490-4547-11e2-a15d-001517add6fb","GES","2012-12-13 16:46:00","2012-12-13 16:46:03","both hands","lateral","time","Both hand move at the same time and the direction is from left-to-right"
"2013-05-13_1900_US_CNN_Newsroom.tpt","ad5b3b8c-bc07-11e2-9ad0-001517add4ae","GES","2013-05-13 19:00:18","2013-05-13 19:00:20","","","",""
"2013-05-13_1900_US_CNN_Newsroom.tpt","ad5b3b8c-bc07-11e2-9ad0-001517add4ae","GES","2013-05-13 19:00:20","2013-05-13 19:00:23","","","",""
"2013-05-13_1900_US_CNN_Newsroom.tpt","ad5b3b8c-bc07-11e2-9ad0-001517add4ae","GES","2013-05-13 19:07:13","2013-05-13 19:07:15","","","",""
"2013-05-22_1400_US_CNN_Newsroom.tpt","41aec9e0-c2f0-11e2-8d86-001517add4ae","GES","2013-05-22 14:00:20","2013-05-22 14:00:22","","","",""
"2013-06-17_1900_US_CNN_Newsroom.tpt","2954a736-d789-11e2-bc51-001517add6fb","GES","2013-06-17 19:00:12","2013-06-17 19:00:14","","","",""
"2013-06-19_1500_US_CNN_Newsroom.tpt","decd961a-d8f9-11e2-8837-001517add6fb","GES","2013-06-19 15:01:08","2013-06-19 15:01:11","","","",""
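
An export like the sample above can be read with Python's standard csv module, which copes with comment fields that contain line breaks and doubled quotation marks. The following is a minimal sketch, not an official Red Hen script. The export has no header row, so the column names are assumptions inferred from the GES scheme fields; exports for other tags will likely have different field columns. The last sample record is invented for illustration.

```python
import csv
import io
import itertools
from datetime import datetime

# Hypothetical column labels: inferred for GES exports, not given in the file.
COLUMNS = ["file", "uuid", "tag", "start", "end",
           "type", "orientation", "topic", "comment"]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Header and first two records as in the sample above; the third record is
# made up to show a multi-line comment with doubled quotation marks.
SAMPLE = '''# Export file generated by you@ucla.edu on 2014-10-05 22:22:24 for tag GES
"2007-09-25_0100_US_CNN_Larry_King_Live.tpt","137a1d7a-be89-11dc-b23d-434b216e77ed","GES","2007-09-25 01:07:19","2007-09-25 01:07:22","left hand","","","sweeping motion - 'all over america'"
"2012-12-13_1600_US_CNN_Newsroom.tpt","a4b78490-4547-11e2-a15d-001517add6fb","GES","2012-12-13 16:46:00","2012-12-13 16:46:03","both hands","lateral","time","Both hand move at the same time and the direction is from left-to-right"
"EXAMPLE.tpt","00000000-0000-0000-0000-000000000000","GES","2014-01-01 00:00:00","2014-01-01 00:00:02","both hands","lateral","time","first line

second line with ""quotes"""
'''


def read_export(stream):
    """Yield one dict per tag record, skipping a leading '#' comment line."""
    first = stream.readline()
    # Only the first line is treated as a comment; a '#' inside a multi-line
    # comment field must not be dropped.
    lines = stream if first.startswith("#") else itertools.chain([first], stream)
    for row in csv.reader(lines):
        if not row:
            continue
        record = dict(zip(COLUMNS, row))
        start = datetime.strptime(record["start"], TIME_FORMAT)
        end = datetime.strptime(record["end"], TIME_FORMAT)
        record["duration"] = (end - start).total_seconds()
        yield record


records = list(read_export(io.StringIO(SAMPLE)))
for record in records:
    print(record["file"], record["type"] or "(no type)", record["duration"])
```

To read a downloaded export instead of the inline sample, open it with open(path, newline="", encoding="utf-8") and pass the file object to read_export; the utf-8 encoding is an assumption about the export.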

A future iteration of the export function may limit the exported records to those created by you or your group. At the moment, you get everything; for some tags, that's nothing or not much; for others, it's a massive file.
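
If you collect several exports, the default filename can be used to recover the tag and the export time. The sketch below assumes that every export follows the pattern of the example name given earlier (storycoding_export_, then the tag, then a 14-digit timestamp) and that tag names consist of capital letters and digits; parse_export_name is a hypothetical helper, not part of the interface.

```python
import re
from datetime import datetime

# Assumed pattern, based on the single example filename in this document.
EXPORT_NAME = re.compile(
    r"^storycoding_export_(?P<tag>[A-Z0-9]+)_(?P<stamp>\d{14})\.csv$"
)


def parse_export_name(name):
    """Return (tag, export datetime) for a default export filename."""
    match = EXPORT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a default export filename: {name!r}")
    stamp = datetime.strptime(match["stamp"], "%Y%m%d%H%M%S")
    return match["tag"], stamp


tag, exported_at = parse_export_name("storycoding_export_GES_20141005222224.csv")
print(tag, exported_at.isoformat())  # GES 2014-10-05T22:22:24
```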