Google Summer of Code 2016 Ideas Page

Red Hen Summer of Code site on Google: https://summerofcode.withgoogle.com/organizations/6587220198686720/

Red Hen was allotted 6 slots by Google in Google Summer of Code 2016.

See our Red Hen Reports from Google Summer of Code 2016.

Video: What Kind of Red Hen Are You? (2016)

Red Hen is once again an approved organization for Google Summer of Code.

See the Red Hen Lab page on the Google Summer of Code 2016 Site.

Students, Apply! Propose a project to redhenlab@gmail.com. Code the summer away. Achieve ultimate glory. And get a nice paycheck!

About us

What is Red Hen?

Red Hen is an international consortium for research on multimodal communication. Red Hen develops open-source tools for joint parsing and tagging of text, audio and speech, and video, using a very large international dataset, mostly of television news broadcasts in a variety of languages. Red Hen automatically ingests, processes, and tags about 150 hours of new recordings daily. Red Hen develops machine learning classifiers, search instruments, visualization tools, and other computational and statistical instruments for research into multimodal communication. It operates as a cooperative of engaged researchers who collaborate closely and contribute power and content to Red Hen, and hence to each other and to future researchers. It lacks the resources and organization to serve scholars outside the cooperative, but all of the code and systems it develops are open source and available not only to other researchers but to the public worldwide.

Who is behind Red Hen?

Faculty, staff, and students at several research universities around the world, including Case Western Reserve University; UCLA; the Universities of Oxford, Basel, Osnabrück, Southern Denmark, and Navarra; FAU Erlangen; the Centro Federal de Educação Tecnológica in Rio; and many more.

Exactly what software is used in Red Hen?

Red Hen uses 100% open source software. In fact, not just the software but everything else—including recording nodes—is shared in the consortium.

Among other tools, Red Hen uses CCExtractor, ffmpeg, and OpenCV (all of which have been part of GSoC in the past). Since Red Hen depends thoroughly on established open source software, there are many opportunities for cross-project collaboration. Red Hen is happy to see her students submit ideas to improve any of the programs on which she relies.

Of course, Red Hen also has her own software; all of it is also available, even if some parts are very specific to Red Hen's work at this point.

Who uses Red Hen's infrastructure? Can I have access to it?

Section 108 of the U.S. Copyright Act permits Red Hen, as a library archive, to record news broadcasts and to loan recorded materials to Red Hen researchers engaged in projects monitored directly by Red Hen's directors and senior personnel. Section 108 restrictions apply only to the corpus of recordings, not to the software. Because everything we do is open source, anyone can replicate our infrastructure.

Participants in the Summer of Code will have full access to the main NewsScape dataset at UCLA and other datasets that have been added to Red Hen, and applicants have full access to sample datasets.

What's the Red Hen Corpus?

The Corpus is a huge archive of TV programming. The stats as of February 2016 are:

Total networks: 38

Total series: 2,443

Total duration in hours: 274,994 (224,988)

Total metadata files (CC, OCR, TPT): 731,080 (601,983)

Total words in metadata files (CC, OCR, TPT): 3.39 billion, 3,387,525,947 exactly (2.81)

Total caption files: 354,474 (289,463)

Total words in caption files: 2.24 billion, 2,244,431,301 exactly (1.86)

Total OCR files: 343,994 (284,482)

Total TPT files: 32,612 (28,038)

Total words in OCR files: 756.86 million, 756,856,280 exactly (619.10)

Total words in TPT files: 386.24 million, 386,238,366 exactly (331.77)

Total video files: 354,312 (289,315)

Total thumbnail images: 98,997,974 (80,995,756)

Storage used for core data: 87.07 terabytes (71.64)
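The metadata totals above are sums over the three metadata types (CC, OCR, and TPT). A short Python check, using only the figures listed here, confirms that they add up exactly, and that the file counts in parentheses add up the same way:

```python
# Figures copied from the February 2016 corpus statistics above.
caption_files, ocr_files, tpt_files = 354_474, 343_994, 32_612
metadata_files = 731_080

caption_words, ocr_words, tpt_words = 2_244_431_301, 756_856_280, 386_238_366
metadata_words = 3_387_525_947

# Metadata totals are exactly the sum of the three metadata types.
assert caption_files + ocr_files + tpt_files == metadata_files
assert caption_words + ocr_words + tpt_words == metadata_words

# The file counts in parentheses add up the same way.
assert 289_463 + 284_482 + 28_038 == 601_983

# Average number of words per caption file, rounded to the nearest word.
print(round(caption_words / caption_files))  # → 6332
```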

General considerations

Red Hen is interested in novel and innovative solutions to a wide range of text, speech, and image parsing tasks; suggestions below are merely starter concepts. Red Hen invites you to expand on them considerably to make them your own, or to suggest ideas Red Hen didn't think of.

Some of the ideas below spell out tasks that cannot possibly be perfectly implemented in the space of one summer, but you might make an honorable start on any of them. Red Hen will work with you to define just the right level of challenge for you; this is an important part of the whole process.
You are free to use any open source tool for any task, or write your own code. If you use something that is not part of Red Hen, you are encouraged to submit all your work to the official maintainers. Build on the work of others and prepare your work so that others can build upon it.
If you are interested and have questions, the sooner you contact Red Hen about them the better. Red Hen wants to help you prepare a great proposal. Don't be shy.
You can find links to source code next to the ideas where relevant. Because each organization and university contributes a bit, the code is not kept in a single central repository. An index page where all source code can easily be found is being prepared.
See the Personal Software Process (PSP).

Contacting us

Please send your proposal to redhenlab@gmail.com and not to the individual mentors. Write "GSoC-student", your name, and the proposed topic at the start of the subject so Red Hen can prioritize GSoC emails. We reply as soon as possible.

During the actual GSoC, mentors will be available on Skype and/or Google Hangouts. Red Hen has at its disposal a variety of videoconferencing systems, including Cisco Telepresence and Cisco Jabber, Scopia, WebEx, and Adobe Connect. It is called the "Distributed Little Red Hen Lab" because we operate across many nations and many time zones, often running lab meetings through multipoint videoconferencing. You can also connect with us on LinkedIn. When requesting the connection, please mention GSoC.

Sister projects: CCExtractor and Vitrivr

One of Red Hen's tools is CCExtractor, which takes media files and produces a transcript of their closed captions and subtitles. CCExtractor's output is the first link in Red Hen's processing chain: the language analysis starts from it.
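As a sketch of that first link, the Python below reads caption text with timestamps, assuming CCExtractor was run with SubRip (.srt) output. It handles only plain SRT cues (index line, timing line, text lines) and ignores formatting tags and other output formats. The `Cue` and `parse_srt` names are illustrative and are not part of CCExtractor.

```python
import re
from dataclasses import dataclass
from typing import List

# SubRip timing line, e.g. "00:00:01,000 --> 00:00:03,500".
_TS = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
_TIMING = re.compile(_TS + r"\s*-->\s*" + _TS)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SubRip text into timed cues; blocks without a timing line are skipped."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                # Join multi-line caption text into a single string.
                text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
                cues.append(Cue(_to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
                break
    return cues
```

Keeping start and end times with each cue lets later stages align words with the video timeline rather than working on bare text.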

CCExtractor is also applying to Summer of Code this year. Red Hen encourages students to also check their ideas page. We share some resources (a couple of mentors and even hardware) with them, so there are many opportunities for connections between the two projects. See their ideas page at http://www.ccextractor.org/gsoc2016.html.

Red Hen is also working closely with Vitrivr for the development of a sketch-based search engine. Please see their Ideas page at http://vitrivr.com/gsoc.html.

Project ideas

General

Your project should be in the general area of multimodal communication, whether it is parsing, analyzing, searching, or visualizing. When you plan your proposal, bear in mind that your project should result in a module that is ready to be deployed and can be operated on a massive dataset.

A. Bootstrapping Human Motion Data for Gesture Analysis (advanced)

End goal: We want to detect and identify different types of human gestures made by people in news videos using automated computer vision methods. Examples include finger-pointing, nodding, raising one's shoulders, and so on.

Problem 1 - indefinite categories: We do not know the complete list of human gestures we want to detect.

Problem 2 - gathering sufficient training data efficiently: Certain gesture types may occur very rarely in news video, which leads either to a shortage of training examples or to an exhaustive, very time-consuming search of the entire collection.

Suggested approach:

1) Start by using existing human detection or tracking models on the news videos and obtain a number of human body sequences (video segments).

2) Using a simple measure (e.g., the magnitude of motion flow), identify when and where motion occurs in each segment.

3) Cluster the segments based on motion features and pass the clusters to human annotators, who will identify the types of gestures and correct errors.

4) Train gesture models using the corrected training videos. Go back to 1) if necessary.
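Steps 2 and 3 can be prototyped before a real tracker is in place. The NumPy sketch below assumes step 1 already yields each human-body segment as a (frames, height, width) grayscale array. It substitutes the mean absolute difference between consecutive frames for optical-flow magnitude, and uses plain k-means to group segments. The three-number segment descriptor and the threshold are illustrative assumptions, not a tuned design.

```python
import numpy as np


def motion_energy(frames: np.ndarray) -> np.ndarray:
    """Per-transition motion score for a (T, H, W) grayscale clip.

    Mean absolute difference between consecutive frames: a cheap stand-in
    for optical-flow magnitude. Returns an array of length T - 1.
    """
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    return diffs.mean(axis=(1, 2))


def active_spans(energy: np.ndarray, threshold: float) -> list:
    """Half-open (start, end) index ranges where energy exceeds the threshold."""
    spans, start = [], None
    for i, moving in enumerate(energy > threshold):
        if moving and start is None:
            start = i
        elif not moving and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(energy)))
    return spans


def segment_features(frames: np.ndarray, threshold: float) -> np.ndarray:
    """Toy per-segment descriptor: mean motion, peak motion, fraction of moving frames."""
    e = motion_energy(frames)
    return np.array([e.mean(), e.max(), (e > threshold).mean()])


def kmeans(features: np.ndarray, k: int, iters: int = 50):
    """Plain k-means with farthest-point initialization; returns (labels, centroids).

    Assumes there are at least k distinct feature vectors.
    """
    features = np.asarray(features, dtype=float)
    # Deterministic init: start at the first segment, then repeatedly add the
    # segment farthest from all centroids chosen so far.
    centroids = [features[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(features - c, axis=1) for c in centroids], axis=0)
        centroids.append(features[dist.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each segment to its nearest centroid.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

The cluster labels are what would be handed to annotators in step 3; richer motion features (for example, optical-flow histograms computed with OpenCV) would replace the toy descriptor in practice.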

B. Recognition of Elements of Blended Classic Joint Attention Using Machine Learning

See the general notes on machine learning under C below. Red Hen will develop a training set. See Blended Classic Joint Attention for further details.

C. Gesture recognition using machine learning

Red Hen is developing an integrated workflow for locating, characterizing, and identifying elements of co-speech gestures in a massive dataset of television news. By co-speech gestures we mean behaviors that are typically associated with a particular utterance—for instance, the utterance "from start to finish" is often associated with a particular timeline gesture of both arms and hands. Other co-speech gestures may include gaze direction, head movements, shoulder movements, or specific body postures. The research workflow includes the following steps:

Identify a co-speech gesture you want to target (we have suggestions)
List the verbal constructions associated with the gesture
Locate instances of the gesture in our NewsScape Edge search engine (we will give you access on request)
Create a list of instances with timestamps (we will submit them and generate the clips)
Annotate the clips with Elan (free download at https://tla.mpi.nl/tools/tla-tools/elan/download), using the Red Hen generic template (see Manual tagging and How to annotate with Elan)
Export Elan's .eaf file to the Red Hen data structure (we provide the conversion tool)
Train your machine learning network on your annotations and develop a classifier
Run the classifier on a larger test dataset
Provide feedback and improve
Generate tags to be integrated into our database
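Red Hen provides its own conversion tool for the .eaf export step; the sketch below is not that tool. It only illustrates what an .eaf file contains by reading its time-aligned annotations with Python's standard library: time slots map IDs to milliseconds, and each tier holds annotations that point to a start slot and an end slot. The output tuple (tier, start_ms, end_ms, value) is an assumption for illustration, not the Red Hen data structure, and annotations on unaligned slots or symbolic (reference) tiers are skipped.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def read_eaf(eaf_xml: str) -> List[Tuple[str, int, int, str]]:
    """Return (tier_id, start_ms, end_ms, value) for every time-aligned annotation."""
    root = ET.fromstring(eaf_xml)
    # Map time-slot IDs to milliseconds; unaligned slots have no TIME_VALUE.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # anchored to an unaligned slot
            value = ann.findtext("ANNOTATION_VALUE", default="").strip()
            rows.append((tier_id, start, end, value))
    # Chronological order, then by tier name.
    return sorted(rows, key=lambda r: (r[1], r[0]))
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring`.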

We invite students to propose specific multimodal constructions of gesture and speech that they would like to develop a classifier for. We have some annotated datasets and will work with you to prepare additional sets.

We are also open to proposals that target sub-components of gesture recognition, such as the parsing of shoulders, arms, torsos, or whole bodies, improved algorithms for skin detection, gaze direction, and arm movement detection.
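Improved skin detection needs a baseline to beat. One simple baseline thresholds the chroma channels of the YCbCr color space. The NumPy sketch below uses full-range (JPEG-style) RGB-to-YCbCr coefficients and fixed Cr/Cb ranges that are commonly used defaults. Treat those ranges as assumptions to be tuned on Red Hen's footage, since studio lighting and broadcast color processing vary.

```python
import numpy as np


def skin_mask(rgb: np.ndarray, cr_range=(133, 173), cb_range=(77, 127)) -> np.ndarray:
    """Boolean skin mask for an (H, W, 3) uint8 RGB image via fixed Cr/Cb thresholds.

    The default ranges are assumed starting values, not tuned for news video.
    """
    r, g, b = (rgb[..., i].astype(np.float32) for i in range(3))
    # Full-range (JPEG/JFIF) RGB -> Cb, Cr; luma is ignored to reduce lighting sensitivity.
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (
        (cr >= cr_range[0]) & (cr <= cr_range[1])
        & (cb >= cb_range[0]) & (cb <= cb_range[1])
    )
```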

Other relevant components include scene detection and scene continuity detection through camera angle changes and changes in lighting. Three-dimensional modeling may also be appropriate and useful.

D. News Shot Classification (modest)

Goal: We want to classify the scene type of a news shot. Given a news video, the code will segment the whole video into distinct shots and assign each shot to one of the predefined shot categories, such as studio shot, reporter in the field, weather map, graphics, and so on. This could be multi-label classification, where we allow more than one label per shot.
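The segmentation half of this goal can be prototyped with a classic histogram-difference cut detector. The NumPy sketch below is a swapped-in baseline, not Red Hen's method: it compares grayscale histograms of consecutive frames and starts a new shot wherever they differ by more than a threshold. It assumes frames are already decoded to grayscale arrays, detects only hard cuts (not fades or dissolves), and uses an untuned bin count and threshold.

```python
import numpy as np


def gray_histogram(frame: np.ndarray, bins: int = 32) -> np.ndarray:
    """Normalized intensity histogram of one grayscale frame (values 0-255)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()


def segment_shots(frames, threshold: float = 0.5, bins: int = 32):
    """Split a sequence of grayscale frames into shots at histogram cuts.

    Returns half-open (start, end) frame ranges, one per shot.
    """
    hists = [gray_histogram(f, bins) for f in frames]
    # Half the L1 distance lies in [0, 1]: 0 = identical histograms, 1 = disjoint.
    cuts = [
        i for i in range(1, len(hists))
        if 0.5 * np.abs(hists[i] - hists[i - 1]).sum() > threshold
    ]
    bounds = [0] + cuts + [len(frames)]
    return list(zip(bounds[:-1], bounds[1:]))
```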

Problem: Developing the code for classification is straightforward. The challenge is to deploy the code on a distributed computing cluster and obtain the results efficiently. (The same challenge applies to task A, but that task is already complex, so Red Hen does not add this requirement on top of it.)

Suggested approach:

1) Obtain some samples and develop a list of common shot categories.

2) Use publicly available visual models and libraries to extract visual features from video frames (e.g., ImageNet models with Caffe).

3) Train classifiers using the features.

4) Deploy the classifiers on the cluster.

Scene detection: https://pyscenedetect.readthedocs.org/en/latest/
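Step 3, together with the multi-label option mentioned in the goal, can be sketched as one logistic-regression classifier per shot category (one-vs-rest), so a single shot may receive several labels. The NumPy sketch assumes each shot has already been reduced to a fixed-length feature vector in step 2 (for instance, an ImageNet-model embedding). The label names, toy features, and hyperparameters are hypothetical.

```python
import numpy as np


def _sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_one_vs_rest(X, Y, lr=0.1, epochs=500, l2=1e-3):
    """Fit one logistic-regression classifier per label by batch gradient descent.

    X: (n, d) shot features; Y: (n, L) 0/1 matrix, one column per shot category.
    Returns weights W of shape (d, L) and biases b of shape (L,).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        G = _sigmoid(X @ W + b) - Y          # gradient of the log loss w.r.t. the logits
        W -= lr * (X.T @ G / n + l2 * W)     # L2 penalty on weights only
        b -= lr * G.mean(axis=0)
    return W, b


def predict_labels(X, W, b, names, threshold=0.5):
    """Return, for each shot, every category whose probability reaches the threshold."""
    P = _sigmoid(np.asarray(X, dtype=float) @ W + b)
    return [[names[j] for j in np.flatnonzero(row >= threshold)] for row in P]
```

Because each category has its own independent classifier, a shot of a reporter in the field with an on-screen graphic can be tagged with both labels.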

E. Multimodal parsing

In addition to machine learning approaches, we are open to other proposals in multimodal parsing, using Natural Language Processing, audio parsing, and computer vision.

F. Search and visualization

Red Hen has a long-standing interest in developing multimodal search and data visualization tools. If you have interesting ideas in these areas, we would like to hear from you.

G. Machine learning for automatic tagging of vast art databases

Over the last two decades, various foundations have provided considerable resources to create digital images, including not only 2D still images but also dynamic 2D-for-3D images, of temples, cathedrals, museums, paintings, sculptures, ritual sites, and so on, and also to digitize existing non-digital images. The results are impressive: a human being can search via web-based graphical user interfaces for images in vast repositories. But such searches typically depend upon linguistic tags created by the human beings who placed the image in the archive. Some advances have been made in automatic recognition of elements of such image repositories. Google Images provides one set of advances. Red Hen is already at the forefront of allowing for search and statistical analysis of images, including dynamic images, based on linguistic correlates. For example, Red Hen has a pilot deep neural net classifier for timeline gestures that accompany expressions such as "from start to finish." Such a classifier can be unleashed on a vast repository to tag images without the need for human attention. Such an automatic classifier is developed by submitting a training set to a deep neural net machine learning system and refining it from there. Machine learning presents a great opportunity for Red Hen. Working with a number of machine learning teams, Red Hen aspires to develop tools for automatic tagging of vast repositories of images. There are many possibilities. Here are just two illustrative examples: (1) The visual iconography of Christian theology has been studied for centuries and elaborately cataloged. The works of Émile Mâle, for example, such as Religious Art in France: The Late Middle Ages: A Study of Medieval Iconography and Its Sources, sedulously indicate which visual elements accompany which saints, which events, which people, which theological characters, and so on. 
Such knowledge is indispensable to the interpretation of, for example, the architectural design and artistic decoration of cathedrals and great churches. But finding such visual features relies almost solely on the memory of a trained individual art historian or tour guide. Red Hen is interested in developing machine classifiers for automatic tagging of images in vast repositories dedicated to these artifacts. Imagine, for example, instructing a search engine to find all examples in art from the 13th century in which there is a man looking out of the painting and pointing in a direction he is not looking, a direction at the end of which is a lamb, and then separating what one finds into two categories: those in which the man is wearing a hair shirt, and those in which he is not. Or imagine any other set of search terms. Conceivably, the world's vast repositories (in the Vatican, the Getty, the Cleveland Museum of Art, Shutterstock, Google Images, ArtStor, etc.) could be transformed into databases that could be searched with great nuance, thereby empowering anyone interested in art (broadly conceived), representation, ritual, performance, group events, and so on. (2) Gesture researchers worldwide investigate, for the most part, co-speech gesture, that is, gestures that accompany speech, as part of a coordinated system. Mostly, these scholars painstakingly make a few recordings in a laboratory, recordings that they cannot release to other scholars because they were created under the restrictions of Human Subjects Research. Worse, these very researchers often cannot investigate that data again later, because authorization for investigating the data was restricted to the explicitly stated original purpose. Indeed, gesture researchers must often delete such data once authorization to keep it has expired. 
Red Hen is already at the forefront of allowing researchers worldwide to investigate enormous numbers of gesture instances, because those gestures are part of recordings (e.g., broadcast news) that are not constrained by the principles of Human Subjects Research. Now Red Hen seeks to extend research on co-speech gesture to silent art: paintings, sculpture, friezes, mime, and so on. These instances of "silent art" very often include representations of co-speech gesture: the representation of a person (which is performed by an actual person in the case of acting) may include a representation of a co-speech gesture, such as ticking off points on a finger, or hushing the other participants (the adlocutio floor-taking device), and so on. Red Hen is beginning to assemble databases of images of co-speech gesture in art for the purpose of providing training sets to deep neural net machine learning systems, to produce classifiers for the automatic tagging of repositories of images for co-speech gestures. See, e.g., http://www.redhenlab.org/home/the-cognitive-core-research-topics-in-red-hen/the-barnyard/speech-gestures-in-art

H. Development of a Query Interface for Parsed Data

The task is to create a new and improved version of a graphical user interface for graph-based search on dependency-annotated data.

The new version should have all functionality provided by the prototype plus a set of new features. The back-end is already in place.

Current functionality:

- add nodes to the query graph

- offer choice of dependency relation, PoS/word class based on the configuration in the database (the database is already there)

- allow for use of a hierarchy of dependencies (if supported by the grammatical model)

- allow for word/lemma search

- allow one node to be a "collo-item" (i.e. collocate or collexeme in a collostructional analysis)

- color nodes based on a finite list of colors

- paginate results

- export xls of collo-items

- create a JSON object that represents the query to pass it on to the back-end
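To make the query-to-back-end hand-off concrete, here is a Python sketch of a query graph serialized to JSON. The back-end's actual JSON format is not documented on this page, so every field name (`head`, `dependent`, `relation`, `collo_item`, `color`) and the color list are hypothetical. The graph is stored as a flat edge list, which also accommodates the non-tree query graphs requested for the new version.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

# Hypothetical finite color palette for query nodes.
COLORS = ("red", "green", "blue", "orange")


@dataclass
class QueryNode:
    id: str
    lemma: Optional[str] = None
    word: Optional[str] = None
    pos: Optional[str] = None
    color: str = "red"
    collo_item: bool = False  # collocate/collexeme slot for collostructional analysis


@dataclass
class QueryEdge:
    head: str
    dependent: str
    relation: str  # dependency relation label, as configured in the database


@dataclass
class QueryGraph:
    nodes: List[QueryNode] = field(default_factory=list)
    edges: List[QueryEdge] = field(default_factory=list)

    def to_json(self) -> str:
        """Validate the query and serialize it for the back-end."""
        ids = {n.id for n in self.nodes}
        if len(ids) != len(self.nodes):
            raise ValueError("duplicate node id")
        if sum(n.collo_item for n in self.nodes) > 1:
            raise ValueError("at most one node may be the collo-item")
        if any(n.color not in COLORS for n in self.nodes):
            raise ValueError("unknown color")
        if any(e.head not in ids or e.dependent not in ids for e in self.edges):
            raise ValueError("edge refers to a missing node")
        return json.dumps(asdict(self), sort_keys=True)
```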

New functionality:

- allow for removal of nodes

- allow for query graphs that are not trees

- allow for specification of the order of the elements

- pagination of search results should be possible even if several browser windows or tabs are open.

- configurable export to csv for use with R

- compatibility with all major web browsers (IE, Firefox, Chrome, Safari) [currently, IE is not supported]

- the parse of an example sentence can be used as the basis of a query ("query by example")
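The "configurable export to csv for use with R" item can be sketched as follows: the user picks the columns and the delimiter, and missing values are written as `NA`, which R's `read.csv()` treats as missing by default. The row dictionaries and column names are hypothetical, since the back-end's result format is not specified here; the sketch is in Python for illustration, and a front-end version would do the same in JavaScript.

```python
import csv
import io
from typing import Dict, Iterable, Sequence


def export_csv(rows: Iterable[Dict[str, object]], columns: Sequence[str],
               delimiter: str = ",", na: str = "NA") -> str:
    """Write the selected columns of result rows as CSV text.

    Columns not listed are dropped; missing or None values become `na`.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([na if row.get(c) is None else row[c] for c in columns])
    return buf.getvalue()
```

The `csv` module quotes any field that contains the delimiter, so values such as "x;y" survive a semicolon-separated export intact.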

Steps:

1. Go to http://www.treebank.info and play around with the interface (user: gsoc2015, password: redhen) [taz is a German corpus, the other two are English]

2. Decide on a suitable JavaScript framework (we would suggest React paired with jQuery or something along these lines; this will have to be discussed)

3. Think about the HTML representation. We would like it to be HTML5/CSS3, but for the moment we are not sure whether we can meet the requirements without major work on <canvas>, or whether we can have sensible widgets without having to dig into the <canvas> tag.

4. Contact Peter Uhrig to discuss details or ask for clarification on any point.

Don't let the range or scope of ideas overwhelm you. Red Hen has many projects because it is a large collaborative with an ambitious agenda.

Be honest about what you think you will be able to accomplish over the summer. Red Hen realizes that some of the ideas may take more than a month and not all of them will be accomplished in a summer.

Remember that as a student you are not only allowed but encouraged to bring your own ideas. The most important thing is that you are passionate about what you are going to do over the summer.