How to use the online tagging interface
The online tagging interface is integrated into the Edge2 search engine for the NewsScape dataset at UCLA, the main Red Hen dataset. Because this is an online interface, it is not frame-accurate, so this is not the tool to use if you need a very high level of precision in your annotations. However, it can be very useful for projects where you identify categories of events or entities, such as gestures, people, objects, or locations. The annotations can be searched in conjunction with the wide array of metadata already available in NewsScape, such as parts of speech, linguistic frames, and sentiment.
For frame-accurate gesture coding, see Manual tagging and Integrating ELAN. For a colorful web-based tagging interface suitable for public presentations, see Video Annotation Tool.
- Integrating ELAN (desktop tagging with export to Red Hen)
- Manual tagging (with proposed Red Hen gesture tagging scheme)
- How to set up the iMotion annotator (draws rectangles on images to indicate event location)
- How to use the Video Annotation Tool (online multi-dimensional video annotation interface for talks and demos)
Open the Tagging Interface
After logging in to Red Hen's Edge search engine, search for a term or a time period. At the top of each search result you will see a list of links like this:
CNN The Lead With Jake Tapper ( video | text | montage | imageflow | metadata | tagging | permalink )
For online tagging, click on the "tagging" link. You will be taken to the tagging interface, which begins with a list of existing tags. Kinds of tags include segment boundaries, gestures, grammatical patterns, etc.
The tags you enter are stored in a MySQL database. You can export the tags to a comma-separated values (CSV) file.
At regular intervals, we export the tags and incorporate them selectively back into the main search engine.
We are currently working on extending the Edge Search Engine capabilities so that manually added tags can be included in a search request.
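Each stored tag pairs a time span in a source file with the values of its scheme fields. As a rough sketch (the actual MySQL schema is not documented here; the names below are illustrative, mirroring the columns visible in the CSV export shown further down this page), a single tag record might be modeled like this:

```python
from dataclasses import dataclass, field

@dataclass
class TagRecord:
    """Illustrative model of one stored tag. Attribute names are
    assumptions based on the columns of the CSV export, not the
    actual database schema."""
    source_file: str   # e.g. "2013-05-13_1900_US_CNN_Newsroom.tpt"
    uuid: str          # unique identifier of the source file
    tag_type: str      # e.g. "GES", "NER", "SEG"
    start: str         # tag start time, "YYYY-MM-DD HH:MM:SS"
    end: str           # tag end time
    # scheme-specific field values, e.g. ["both hands", "lateral", "time"]
    fields: list = field(default_factory=list)
```

This is only a conceptual aid: the interface itself handles storage, and you interact with tags through the web forms described below.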
The current tagging schemes include:
- SEG -- segment boundaries (mostly added automatically)
- CAU -- causal reasoning analysis (see below)
- CRX -- corrected text (see below)
- GES -- gestures (see below)
- IMG -- image characteristics (see below)
- LAN -- language tag (see below)
- NER -- named entities (see below)
Red Hens can suggest new tagging schemes; see example below.
List of tags present
At the top of many pages in the tagging interface, you'll see a list of existing tags, which functions as a table of contents: you can click on the boundary start or end times to navigate within the document. Typically this list includes some segment tags, added automatically by post-processing routines after media capture to block out story boundaries and commercials. You may also see other kinds of tags, depending on what has been added, either automatically by NLP annotation tools (see Current state of text tagging) or manually in this interface.
In the body of the document, you will see the current tags with navigation links and a plus (+) sign at the end of each line:
Creating a new tag
To create a new tag, click on the plus sign, and you will see a drop-down list of tag types:
In this case, let's select the NER or Named Entity Recognition tag and press [add]:
Then click on the down arrow (˅) at the end of the new segment tag to access the segment options window:
Fill in the form, using the full name and official title. If known, add the relevant time period for the role.
Also adjust the boundaries as needed, so that the tag spans the relevant lines:
When you are done, click [update] to save the tag:
The causal reasoning tagging scheme classifies events by five stages (cf. Steen's talk at MIT), and leaves room for comments:
The comment tags CI1, CI2, CI3, and CI4 are open-field tags; because the tagged information has no structure, it is harder to search, so these tags should be avoided.
The CRX tag lets you enter a corrected text:
In this section, we present some screenshots of gesture tags.
Update 2015-05-21: the web-based tagging interface has new features, allowing the tagger to include the query that produced the hit (query) and the full text surrounding the hit (transcript). See screenshot below:
The IMG tag is designed to provide ground truth for machine learning in Computer Vision.
It currently includes visual features of a person:
The LAN tag lets you tag the language used within a file. Files whose main language is not English are typically tagged for language in the header, but there are cases where different languages are used within a show. The tag currently supports Persian and Pashto, languages used in Afghan broadcasts, but we can easily add others:
See Creating a new tag.
How to create your own tagging scheme
To propose a new tagging scheme, use the format <tag name> <field name> <select-multi or text> <value>. Here is an example of a gesture tagging scheme:
GES Type select-multi left hand
GES Type select-multi right hand
GES Type select-multi both hands
GES Type select-multi point
GES Type select-multi trajectory
GES Type select-multi telic
GES Type select-multi atelic
GES Type select-multi viewpoint
GES Type select-multi iconic
GES Type select-multi metaphoric
GES Orientation select-multi sagittal
GES Orientation select-multi lateral
GES Topic select-multi time
GES Topic select-multi emotion
GES Topic select-multi argument
GES Comment text
Assemble your own tagging scheme and contact Francis Steen or Mark Turner to have it added to the current definitions.
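A scheme definition in this format is simple to process mechanically. The sketch below (purely illustrative, not part of the Red Hen toolchain) collects lines like those above into a mapping from field name to its type and allowed values:

```python
def parse_scheme(lines):
    """Parse '<tag> <field> <select-multi|text> [<value>]' lines into
    {tag: {field: {"type": ..., "values": [...]}}}."""
    schemes = {}
    for line in lines:
        parts = line.split(None, 3)  # tag, field, type, optional value
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        tag, fname, ftype = parts[0], parts[1], parts[2]
        entry = schemes.setdefault(tag, {}).setdefault(
            fname, {"type": ftype, "values": []})
        if len(parts) == 4:
            entry["values"].append(parts[3])
    return schemes

scheme = parse_scheme([
    "GES Type select-multi left hand",
    "GES Type select-multi right hand",
    "GES Orientation select-multi lateral",
    "GES Comment text",
])
# scheme["GES"]["Type"]["values"] is ["left hand", "right hand"];
# the open Comment field has type "text" and no fixed values.
```

Note that a select-multi field is defined by repeating its line once per allowed value, while a text field takes a single line with no value.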
How To Export Tags
At the top of the tagging page, you'll see this export menu:
Select the tag you want; the export currently creates a file containing all instances of that tag in the MySQL tag database. You'll get the usual option to save the file:
The file will have a default name like "storycoding_export_GES_20141005222224.csv", which encodes the tag name and a timestamp (the date and time of the export). Note that this is a comma-separated values (CSV) file; it can be imported into a spreadsheet application or a statistics package. The file looks like this:
# Export file generated by firstname.lastname@example.org on 2014-10-05 22:22:24 for tag GES
"2006-09-29_0300_US_CNN_Anderson_Cooper_360.tpt","a08e7ea2-be6d-11dc-8108-7b5efc535ceb","GES","2006-09-29 03:00:41","2006-09-29 03:00:43","","","",""
"2007-09-25_0100_US_CNN_Larry_King_Live.tpt","137a1d7a-be89-11dc-b23d-434b216e77ed","GES","2007-09-25 01:07:19","2007-09-25 01:07:22","left hand","","","sweeping motion - 'all over america'"
"2008-10-23_0630_US_KCET_Charlie_Rose.txt","0590f186-a0cc-11dd-80d6-00e0815fe83e","GES","2008-10-23 06:31:21","2008-10-23 06:31:23","both hands","lateral","time","Roundtable: interlocutor left, colleague opposite
Looking at gesture or at a blank spot in gesture 1 and 2. At the end of gesture 2, and in gesture 3, looks at interlocutor
coordination of beginning and end with inception and completion, and then coordination of final landmark with ""by doing THAT"", ""by doing that"" with ""path"" and ""result"" with final landmark. In gesture 3: ""toward"" coordinated with final landmark
Deserves a detailed qualitative analysis. It could be the object of ONE GESTURE-BLENDING PAPER OF ITS OWN, with some quantitative data in the background"
"2012-12-01_0300_US_CNN_Anderson_Cooper_360.tpt","8fc6e660-3b6b-11e2-a367-001517add6fb","GES","2012-12-01 03:30:11","2012-12-01 03:30:11","","","",""
"2012-12-13_1600_US_CNN_Newsroom.tpt","a4b78490-4547-11e2-a15d-001517add6fb","GES","2012-12-13 16:46:00","2012-12-13 16:46:03","both hands","lateral","time","Both hand move at the same time and the direction is from left-to-right"
"2013-05-13_1900_US_CNN_Newsroom.tpt","ad5b3b8c-bc07-11e2-9ad0-001517add4ae","GES","2013-05-13 19:00:18","2013-05-13 19:00:20","","","",""
"2013-05-13_1900_US_CNN_Newsroom.tpt","ad5b3b8c-bc07-11e2-9ad0-001517add4ae","GES","2013-05-13 19:00:20","2013-05-13 19:00:23","","","",""
"2013-05-13_1900_US_CNN_Newsroom.tpt","ad5b3b8c-bc07-11e2-9ad0-001517add4ae","GES","2013-05-13 19:07:13","2013-05-13 19:07:15","","","",""
"2013-05-22_1400_US_CNN_Newsroom.tpt","41aec9e0-c2f0-11e2-8d86-001517add4ae","GES","2013-05-22 14:00:20","2013-05-22 14:00:22","","","",""
"2013-06-17_1900_US_CNN_Newsroom.tpt","2954a736-d789-11e2-bc51-001517add6fb","GES","2013-06-17 19:00:12","2013-06-17 19:00:14","","","",""
"2013-06-19_1500_US_CNN_Newsroom.tpt","decd961a-d8f9-11e2-8837-001517add6fb","GES","2013-06-19 15:01:08","2013-06-19 15:01:11","","","",""
A future iteration of the export function may limit the exported records to those created by you or your group. At the moment, you get everything; for some tags, that's nothing or not much; for others, it's a massive file.
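The exported file can be read with any standard CSV parser, but note that comment fields may contain embedded newlines (as in the Charlie Rose row above), so iterate over parsed records rather than raw lines. A minimal sketch in Python, assuming the column layout of the sample export above; the function name is our own, not part of any Red Hen tool:

```python
import csv

def read_tag_export(path):
    """Yield dicts for each row of a tagging-interface CSV export.
    Column layout assumed from the sample export on this page:
    filename, uuid, tag, start, end, then scheme fields and comment."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row or row[0].startswith("#"):
                continue  # skip the generated-by header line
            yield {
                "filename": row[0],
                "uuid": row[1],
                "tag": row[2],
                "start": row[3],
                "end": row[4],
                "fields": row[5:],  # scheme values plus free-text comment
            }
```

Opening the file with `newline=""` lets the csv module reassemble quoted fields that span several physical lines, so multi-line comments arrive as a single value.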