Overview of the Red Hen Vision and Program

Last updated: 2018-07-26

Red Hen Lab coordinates research in the study of multimodal communication. This page describes its vision, principles, illustrative ongoing projects, and prospective projects.

Red Hen’s Research Vision

Human beings are evolved for elaborate multimodal communication. Cultures support this power. Communicating seems easy to human beings, just as seeing seems easy. But it is immensely complex, involving not only vision but also movement, sound, interpersonal interaction, dynamic coordination across agents, conceiving of the intentions of other agents, and so on. Unlike vision, advanced multimodal communication is found only in human beings; there are no good animal models. Red Hen seeks to gather and develop mathematical, computational, statistical, and technical tools to help advance research into multimodal communication.

The study of multimodal communication made advances with the advent of online corpora—the British National Corpus, the Russian National Corpus, the Corpus of Contemporary American English, and so on—but the limitations of these corpora were sharp: the data were mostly text, with limited and dated holdings.

The Red Hen Research Program

The principles of the Red Hen program are as follows:

  1. Continual amassing of shared big data of all sorts having to do with multimodal communication, accessible by all members of the lab worldwide but physically distributed.
  2. Using this data to develop new computational, statistical, and technical methods and tools that can be applied to many other kinds of data involving multimodal communication.
  3. Online provision of as much of the data and tools as possible to Red Hen researchers worldwide.
  4. Overcoming, to the extent possible, impediments to scientific collaboration that arise from institutions of law, finance, university structure, academic disciplines, and government, usually by distributing parts of the workflow to whatever venue worldwide is most conducive to ease and efficiency.
  5. Open-minded inclusion of all sorts of data.
  6. Open-minded inclusion of work done from any theoretical perspective—Red Hen Lab is a bazaar, not a cathedral.
  7. Science depends upon detecting where theories differ; ontologies derive from theories and are implicit theories; accordingly, Red Hen seeks, for every aspect of communication, to include metadata derived from conflicting theories and tagged as such. Red Hen therefore deploys conflicting ontologies in parallel, so that we can see which theories are good for what, where different theories are weak, and how theories fare when tested against big data. For example, voice is sound and text is visible marks, and there are many theories of how to map between these sounds or marks and grammatical structure. Red Hen uses several different Natural Language Processing taggers and is open to adding others. All of these NLP parsers will tag "the" in "The more, the merrier" as a definite article, but recent research argues convincingly that it survives from an instrumental case of the demonstrative "that," used to mean "by so much," and that current speakers process it as something close to that. Red Hen seeks parsers that can do such tagging! This principle—a bazaar, not a cathedral, so we can compare and weigh what is offered in each tent—is general for Red Hen, because its purpose is not sales or attention but science.
  8. Presenting achievements in such a way that they can be taken up by other researchers world-wide and built upon.
  9. Preventing the obscuring of data by privileging one or another theoretical lens.
  10. Inclusion of conflicting theoretical and methodological approaches and outcomes, not least so the competing interpretations can be compared.
  11. Tagging all data and tools with their sources, so that users can filter by source as well as by other features.
  12. Using and developing only open-source tools (with some constrained exceptions, but nothing that could impede further work), and more, full engagement with the open-source community: e.g., merging upstream with open-source projects.
  13. A sociological understanding among the members that Red Hen is a cooperative of researchers, all of whom contribute substantially to the advancement of the power of Red Hen, by locating and providing resources, contributing infrastructure and tools, amassing and sharing data, on the principle that all Red Hens should receive access to every other Red Hen’s contributions—Red Hen is in no way a service, but rather a mostly informal international league of reliable researchers who work together.
  14. Continual induction of researchers and institutions who support this program and who agree to follow the Red Hen principles.
  15. Frequent lab meetings, including workshops, grant writing sessions, project and experiment design, and conferences, exploiting and developing distance technologies to do collaborative research in real time and to dynamically foster new networks of collaboration between Red Hens and potential Red Hens, including students and postdocs, so that meetings are for newcomers something like medical rounds for medical students.
  16. In addition to publications offering proof-of-concept theoretical advances, the creation of actual systems, tools, datasets, and metadata as ever-expanding public goods supporting research.
  17. And, perhaps most important, a spirit of innovation, adventure, trust, and responsibility shared by Red Hens worldwide.

Some Projects Guided By The Red Hen Lab Research Program

As a research program, Red Hen functions as a cooperative exchange of research agendas and priorities, domain expertise, datasets, funding, and funding opportunities. Red Hen projects mostly arise when teams of Red Hens form to take responsibility for a specific project, and much of Red Hen’s operation is designed to foster the development of such teams. Here are some examples of completed and ongoing projects within Red Hen’s program.

Holdings. Red Hen gathers and connects datasets of many different kinds: text, photographs of paintings and sculpture, and audio or video or audiovisual recordings. In principle, any record of human communication is of interest to Red Hen, but above all, Red Hen needs massive datasets in consistent formats with time-correlated image, text, and audio data on which to develop computational and statistical tools. Accordingly, her largest holding by far consists of recordings of TV broadcast news. Such recording and archiving for the purpose of research is protected by section 108 of the U.S. Copyright Act. “News” includes any sort of broadcast in which current events are a topic, and so includes talk shows, interview programs, and so on. The TV holdings at present include about 350,000 hours of recordings, in an expanding variety of languages (English, Spanish, French, both European and Brazilian Portuguese, Italian, Norwegian, Swedish, Danish, German, Arabic, Russian, Polish, Czech, Chinese). Each day, roughly 150 hours of news are ingested robotically. This dataset of course has special features, and it is crucial in research always to keep in mind the nature of the data being used: TV news is not pillow talk. But such recordings, now stretching back in Red Hen to the 1970s thanks to the digitization of analog holdings, include not only scripted speakers but vast footage of people being interviewed, having conversations, making presentations to crowds, operating in public spaces, or being recorded without their knowledge (such as surveillance footage). They also include advertisements. Beyond TV news, Red Hen connects datasets of photographs, art works, texts, illuminated manuscripts, lab recordings of human beings as subjects of experiments, video conference communication, YouTube videos, Twitter data, cartoons and graphic novels, and so on, with new datasets routinely being located and networked. Having a variety of datasets makes it possible not only to locate differences in communication across these genres (e.g., the way deictics like “here” and “now” are used in TV news versus the way they are used in personal letters or Skype) but also regularities: the TV news and the pillow talk might both be in English, for example, and both offer evidence for the use of various grammatical patterns.

Data Structure. Data are stored in flat files. A given record consists of a set of files sharing the same base file name, which indicates absolute start time, location, and event. The files contain image, text, and audio data with precise start and end times, so that all aspects of the data and metadata can be kept in millisecond registration.
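As a minimal sketch of how such a layout can be used (the file names, directory, and naming fields below are hypothetical stand-ins, not Red Hen's exact schema), records can be grouped by shared basename and their absolute start times recovered from the name:

    # Sketch: group flat files into records by shared basename and
    # recover the absolute start time encoded in the name. File names
    # such as "2018-07-26_1800_US_ExampleNet_Evening-News.mp4" are
    # hypothetical stand-ins for the real naming scheme.
    import os
    from collections import defaultdict
    from datetime import datetime

    def group_records(directory):
        """Group flat files into records keyed by shared basename."""
        records = defaultdict(dict)
        for name in os.listdir(directory):
            base, ext = os.path.splitext(name)
            records[base][ext.lstrip(".")] = os.path.join(directory, name)
        return records

    def record_start(basename):
        """Parse the absolute start time from a basename's leading fields."""
        date, time = basename.split("_")[:2]
        return datetime.strptime(date + time, "%Y-%m-%d%H%M")

    for base, files in group_records("/data/recordings").items():
        print(base, record_start(base), sorted(files))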

Tagging. Red Hen works with a global network of developers to add new features to existing tools for extracting metadata and annotating text and images. We work with CCExtractor to support text extraction from Brazilian, Russian, Czech, and Chinese television; we work with Stanford NLP to improve sentence splitting in single-case text. Our deployment of SEMAFOR to query FrameNet has generated by far the largest frame-annotated dataset in the world to date; we are working with researchers on improvements. Within the Red Hen program, there is significant potential for developing targeted test datasets for specific problems. Red Hens are working on creating test datasets for partially occluded timeline gestures, discourse management strategies, scare quote usage, and more.
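The single-case problem is easy to see: sentence splitters lean on capitalization cues that all-uppercase captions destroy. A toy illustration, using NLTK's punkt splitter rather than Stanford NLP purely for brevity:

    # Toy illustration of the single-case problem, using NLTK's punkt
    # sentence splitter for brevity (Red Hen works with Stanford NLP).
    import nltk
    nltk.download("punkt", quiet=True)
    from nltk.tokenize import sent_tokenize

    caption = "DR. SMITH ARRIVED AT 5 P.M. THE CROWD CHEERED. SHE SPOKE BRIEFLY."

    # All-caps text defeats capitalization cues: the splitter may merge
    # sentences or break wrongly at abbreviations like "DR." and "P.M."
    print(sent_tokenize(caption))

    # A naive mitigation is to restore ordinary casing first; a real
    # system would use a trained truecaser rather than str.lower().
    print(sent_tokenize(caption.lower()))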

Pipelines. Red Hen develops automated processing pipelines hosted on high-performance computing clusters, with the capacity to process hundreds of thousands of hours of video, audio, and text. Incoming recordings from around the world are picked up by UCLA’s Hoffman2 cluster, where on-screen text is retrieved in twelve languages via optical character recognition, using screenshots fed to custom versions of Tesseract, and the video is compressed. The text is split into words and sentences at the University of Erlangen, using custom code with Stanford NLP. The two streams come together again in the audio pipeline at Case Western Reserve University for forced alignment, speaker diarization, gender identification, and speaker recognition, and in the video pipeline for shot characterization and gesture detection. For the web site http://viz2016.com, joint text and image analyses are used for speaker and location detection and for topic detection and clustering in television data, joined with Twitter data. These pipeline projects and others are open-ended. For example, Red Hen is fielding her third consecutive Google Summer of Code team to develop these pipelines further, adding elements such as emotion detection and characterization, controversy detection and sentiment tagging, and word-based multi-dimensional audio analysis.
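A rough sketch of the OCR step (Red Hen deploys customized Tesseract builds; this sketch uses stock pytesseract and OpenCV, and the video file name is invented):

    # Sketch of the on-screen-text step: sample frames from a video and
    # run them through Tesseract. Stock pytesseract stands in here for
    # Red Hen's customized Tesseract builds; the file name is invented.
    import cv2
    import pytesseract

    cap = cv2.VideoCapture("example-broadcast.mp4")
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    frame_idx = 0

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % int(fps * 5) == 0:        # one screenshot every ~5 s
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            text = pytesseract.image_to_string(gray, lang="eng")
            if text.strip():
                print(f"{frame_idx / fps:9.3f}s  {text.strip()!r}")
        frame_idx += 1

    cap.release()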

Search. Red Hen has developed both command-line search utilities for text and tags, built from a variety of *nix tools, and web search interfaces for text, metadata, and visual features. CQPweb, which is based on the software used to search the British National Corpus, is available for searching for patterns in Red Hen’s English-language holdings. Development of such search tools is open-ended. Red Hen’s search reports are optimized for analysis in the statistical software package R.
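In the spirit of those command-line utilities, here is a minimal search sketch (the pipe-delimited caption format and the directory are hypothetical stand-ins for the actual file layout) that scans flat caption files and emits a tab-separated report, which loads directly into R via read.delim():

    # Minimal sketch of a flat-file search emitting a TSV report for R.
    # The pipe-delimited caption format and the directory are hypothetical
    # stand-ins for the actual Red Hen file layout.
    import csv
    import re
    import sys
    from pathlib import Path

    pattern = re.compile(r"\bas a matter of fact\b", re.IGNORECASE)

    writer = csv.writer(sys.stdout, delimiter="\t")
    writer.writerow(["file", "start", "end", "text"])

    for path in Path("/data/recordings").glob("*.txt"):
        for line in path.open(encoding="utf-8", errors="replace"):
            parts = line.rstrip("\n").split("|")
            if len(parts) < 4:               # skip header and metadata lines
                continue
            start, end, _, text = parts[:4]
            if pattern.search(text):
                writer.writerow([path.name, start, end, text])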

Machine Learning. Data tagged using such open-source tools as ELAN and Red Hen’s Rapid Annotator are ingested into Red Hen’s metadata, in part so that tagging done by individual researchers is no longer withheld from the global research community. That “ground truth” data, which is manually tagged by experts, is then made available to machine learning teams for the training of recognizers and classifiers. These machine-learning tools are then used to tag the Red Hen data automatically, thereby helping researchers in multimodal communication.
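ELAN, for example, saves its annotations as XML (.eaf files), which makes this ingestion straightforward. A minimal reading sketch, covering only time-aligned annotations and using a hypothetical file name:

    # Sketch: extract time-aligned annotations from an ELAN .eaf file.
    # EAF is XML; this handles only ALIGNABLE_ANNOTATIONs, and the
    # file name is hypothetical.
    import xml.etree.ElementTree as ET

    root = ET.parse("gesture-session.eaf").getroot()

    # Map TIME_SLOT ids to millisecond offsets.
    times = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE", 0))
        for slot in root.iter("TIME_SLOT")
    }

    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = times[ann.get("TIME_SLOT_REF1")]
            end = times[ann.get("TIME_SLOT_REF2")]
            value = ann.findtext("ANNOTATION_VALUE", default="")
            print(f"{tier_id}\t{start}\t{end}\t{value}")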

Multimodal constructions and co-speech gesture. To know a language is to know a vast relational network of form-meaning pairs (called by linguists “constructions”) and how they can blend. Nearly all the massive research in linguistics on constructions takes text as its data, but form-meaning pairs can include aspects of speech, gesture, the manipulation of material affordances in the environment, and so on. Researchers can locate large numbers of uses of (even infrequent) constructions in Red Hen data because it is so large and diverse and easily searched. Constructions that have been researched in Red Hen include comparative correlatives (“The closer you come, the more I hear”), XYZ (“Causation is the cement of the universe”), kinds of “absolute” constructions (“Absent diplomacy, this will fail”), conditional constructions, and many others. The researcher can see not only the text but the full human performance of the communication, including voice, gesture, and so on (Turner 2015).
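As a toy example of such a search, a crude regular expression can surface comparative correlatives in caption text; this is a sketch only, and real queries go through the search tools described above and handle far more variation:

    # Toy search for comparative correlatives ("the X-er ..., the Y-er ...").
    # A sketch only: real queries use Red Hen's search tools and handle
    # far more morphological variation than this regular expression.
    import re

    comparative = r"(?:more|less|\w+er)"
    pattern = re.compile(
        rf"\bthe\s+{comparative}\b[^.?!]*?,\s*the\s+{comparative}\b",
        re.IGNORECASE,
    )

    examples = [
        "The closer you come, the more I hear.",
        "The more, the merrier.",
        "He came closer, then left.",
    ]
    for text in examples:
        hit = pattern.search(text)
        print(("MATCH  " if hit else "       ") + text)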

Errors. “Errors” in expression are not random; they instead indicate cognitive processing. But such mistakes often go unrecognized by the human hearer, who “accommodates” mentally, and those mistakes that are detected in text are typically eliminated in editing. Red Hen data, however, frequently preserve such communicative performances. The researcher can predict such patterns and check the predictions against the dataset (Turner 2017).

Deictics in different contexts. How are deictic expressions (e.g. “here,” “now,” “there,” “then”) used in different communicative environments and in different languages with different structures of deictic expressions? Nesset et al. (2013) used Red Hen to explore this topic in English versus Russian.

Embryonic and Speculative Target Projects

Multi-language frame detection. Red Hen already tags its entire English subset for conceptual frames using FrameNet. But there are FrameNet projects for other languages, e.g., Spanish. It is a natural but new extension for Red Hen to seek to detect frames in a variety of languages, which would not only produce better metadata for the researcher in that language but also provide insight for cross-linguistic frame resources.

Prosody indicating viewpoint. Speakers express their viewpoint, attitude, or perspective on the meaning of what they are saying, often framing its source. The stance a speaker or a listener adopts towards some content can be expressed wordlessly, with a shrug, a stare, a gasp, a wave of the hand, a smack, a tearful eye, a hollow laugh, or prosodically, through delicate modulations of the speed, pitch, and quality of the voice. A speaker in the act of presenting claims may for instance indicate epistemic distance—that is, a viewpoint of doubt or distrust—from these claims. This epistemic distance is crucial to the communication, but is often irretrievably lost in a mere verbal transcript. Red Hen is launching a project on the automatic detection of viewpoint as expressed by prosody.
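A first step toward such automatic detection is extracting raw prosodic contours. A minimal sketch using librosa (the audio file name is hypothetical, and real viewpoint detection would model far more than pitch and intensity):

    # Sketch: extract a pitch (F0) contour and an intensity (RMS) contour,
    # two raw ingredients for prosodic viewpoint detection. The file name
    # is hypothetical; a real system would model much more than these.
    import librosa
    import numpy as np

    y, sr = librosa.load("interview-clip.wav", sr=16000)

    # Fundamental frequency via probabilistic YIN.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )

    # Frame-level intensity.
    rms = librosa.feature.rms(y=y)[0]

    voiced = f0[~np.isnan(f0)]
    print(f"median F0: {np.median(voiced):.1f} Hz, "
          f"F0 range: {np.ptp(voiced):.1f} Hz, "
          f"mean RMS: {rms.mean():.4f}")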

Automatic gesture recognition. Human faces exhibit a wide variety of expressions and emotions. Building on the ability to recognize them, Joo et al. (2015) developed a hierarchical model that automatically judges the perceived personalities of politicians from their facial photographs via detected intermediate traits. Many similar projects could be pursued within Red Hen. Red Hen is also beginning to produce automatic classifiers for arm and hand gestures used in, for example, co-speech gesture for timelines.
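Any such pipeline begins with reliable face localization. A bare-bones sketch using OpenCV's stock Haar cascade (this is not the Joo et al. model, and the image path is invented):

    # Bare-bones face localization with OpenCV's stock Haar cascade:
    # the step before any trait or gesture classifier can run. This is
    # not the Joo et al. (2015) model; the image path is invented.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )

    img = cv2.imread("politician-photo.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:
        print(f"face at x={x}, y={y}, size {w}x{h}")
        crop = img[y:y + h, x:x + w]   # hand off to a downstream classifier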

Conclusion

Red Hen deploys the contributions of researchers from complementary fields, from AI and statistics to linguistics and political communication, to create rich datasets of parsed and intelligible multimodal communication and to develop tools to process these data and any other data susceptible to such analysis. Red Hen’s social organization and computational tools are designed for reliable and cumulative progress in a dynamic and extremely challenging field: the systematic understanding of the full complexity of human multimodal communication. The study of how human beings make meaning and interpret forms depends upon such collaboration.

References

  • Joo, Jungseock, Francis Steen, and Song-Chun Zhu. 2015. Automated facial trait judgment and election outcome prediction: Social dimensions of face. In Proceedings of the IEEE International Conference on Computer Vision, pages 3712–3720.
  • Nesset, Tore, Anna Endresen, Laura A. Janda, Anastasia Makarova, Francis Steen, and Mark Turner. 2013. How “here” and “now” in Russian and English establish joint attention in TV news broadcasts. Russian Linguistics, 37(3): 229–251.
  • Turner, Mark. 2015. Blending in language and communication. In Handbook of Cognitive Linguistics, pages 211–232. De Gruyter Mouton.
  • Turner, Mark. 2017. Multimodal form-meaning pairs for blended classic joint attention. Linguistics Vanguard.