Red Hen Lab - GSoC 2023 Ideas

Red Hen Lab has been approved as Google Summer of Code 2023 mentor organization.

Ideas are listed at the bottom of this page, but read the page thoroughly before you apply.

Beginning in 2022, GSoC no longer requires that contributors be students; it allows both medium (~175 hour) and long (~350 hour) projects; and it permits extended timelines for project completion (from 12 to 22 weeks; 12 weeks is the standard). Red Hen Lab, unless noted otherwise for a specific project, is willing to consider both medium and long versions of contributions. Red Hen Lab considers 12 weeks to be the default but, as warranted, is open to discussing other arrangements.

Red Hen Lab works closely with FrameNet Brasil and with vitrivr. Members of each group routinely serve as mentors for GSoC projects run by the other two groups. Feel free to submit similar proposals to two or three of these groups. Red Hen, FrameNet Brasil, and vitrivr will coordinate to decide on best placement. 


Red Hen Google Summer of Code 2023

redhenlab@gmail.com

See Guidelines for Red Hen Developers and Guidelines for Red Hen Mentors

How to Apply

The great distinction between a course and a research opportunity is that in a course, the professor lays everything out and gives assignments, start to finish. In research, the senior researchers are expected to do all of that themselves. They cogitate on the possibilities, use their library and networking skills to locate and review the state of the art, make judicious decisions about investment of time and other resources, and chart a path. Usually, the path they choose turns out to be a dead end, but in research, success even some of the time is a great mark of distinction. Research is about doing something that has never been done before. The junior researcher, or student learning to do research, is not expected to do everything that a senior researcher does, but is expected, first, to work continuously to learn how to improve by studying senior researchers, and, second, to explore general research opportunities picked out by senior researchers, review the literature, get a strong sense of the state of the art, and think about how it could be built upon. The junior researcher, having been directed to an area or areas by the senior researchers, is expected to find and read the research articles, explore the possibilities, and propose a tractable piece of work to undertake. The senior researchers then provide occasional mentoring. The senior researchers are especially valuable for their experience, which usually gives them a much sharper sense of which path is more likely to be fruitful, but nonetheless, the senior researchers are sometimes surprised by what the junior researchers manage to hit upon. Junior researchers are largely self-organizing, self-starting, self-inspiring. A proposal from a student of the form, “I am highly motivated and know about X, Y, and Z and would love to do something related to topic W. Where do I start?” is not a research proposal and is inappropriate for Red Hen Lab.


Template

Some projects below are marked as requiring that proposals use this LaTeX Template, but everyone should follow its instructions. Here is a text version:

GSOC 2023 Project Proposal for RedHenLab

Write Your Name Here 

Date

Summary of the Proposal

Write a brief summary of the proposal. The summary should not exceed 120 words. A single paragraph is best. The summary should include a few lines about the background information, the main research question or problem that you want to write about, and your methods. The proposal summary should not contain any references or citations. Your entire proposal cannot exceed 2000 words, so choose the words in this section carefully. The word limit excludes any words contained in the tables, figures, and references.

Background

In the background section, write briefly using as many paragraphs, lists, tables, figures as you can about the main problem. This background section will typically have three sections:

• What is known about the topic

• What is not known about the topic, and the challenges 

• What unknowns and challenges you will address 

Cite all relevant references. If you use any part from previous research, you must cite it properly. Proposals assembled by copypasta from papers and websites will be ignored.

Goal and Objectives

Describe the goal(s) of your project and how you will meet those goals. Typically, the way to write this is something like, "The goal of this research is to ...", and then continue with something like, "Goal 1 will be met by achieving the following objectives ...", and so on. The goal is a broad-based statement, and the objectives are very specific, achievable tasks that will show how you will achieve the goal you set out.

Methods

In this section, discuss:

Tentative Timeline

This is the final section of your proposal. You need to provide a tentative timeline, indicating the time frame in which you plan to accomplish the goals you mentioned in the Goal and Objectives section. We recommend using a Gantt chart [1], with your objectives and milestones listed on the y-axis and the weeks of GSoC on the x-axis.

These are the compulsory sections that you will need to include in your proposal. Then submit it to the named mentor for the project or just to redhenlab@gmail.com. If you use this LaTeX Template on Overleaf, you can generate a PDF of your project proposal by selecting the PDF symbol at the top of its window; save the PDF to your hard drive and send that PDF as described above. Include complete citations and references. For example, we have cited here a secondary analysis of statistical inference in papers published over roughly 40 years. It was an interesting paper written by Stang et al. [2] and published in 2016. A full citation of the paper appears in the references section.


References

[1] HL Gantt. 1910. Work, wages and profit. Engineering Magazine. New York.

[2] Andreas Stang, Markus Deckert, Charles Poole, and Kenneth J Rothman. 2016. Statistical inference in abstracts of major medical and epidemiology journals 1975-2014: a systematic review. European Journal of Epidemiology, November.

The Great Range of Projects that You Might Design

Red Hen Lab will consider any mature pre-proposal related to the study of multimodal communication. A pre-proposal is not a collaboration between a mentor and a student; rather, the mentor begins to pay attention once a reasonably mature and detailed outline for a pre-proposal is submitted. A mature pre-proposal is one that completes all the Template sections in a thorough and detailed manner. Red Hen lists a few project ideas below; more are listed in the Barnyard of Possible Specific Projects. But you are not limited to these lists. Do not write to ask whether you may propose something not on this list. The answer is, of course! We look forward to your mature and detailed pre-proposals.

Once you have a mature and detailed idea, and a pretty good sketch for the template above, you may send them to Red Hen to learn whether a mentor is interested in your idea and sketch, and then to receive some initial feedback and direction for finishing your pre-proposal. Red Hen mentors are extremely busy and influential people, and typically do not have time to respond to messages that do not include a mature and detailed idea and a pretty good sketch of the template above. Use the Template above to sketch your pre-proposal, print it to PDF, and send it to redhenlab@gmail.com. If a mentor is already listed for a specific project, send it also to that mentor.

The ability to generate a meaningful pre-proposal is a requirement for joining the team; if you require more hand-holding to get going, Red Hen Lab is probably not the right organization for you this year. Red Hen wants to work with you at a high level, and this requires initiative on your part and the ability to orient in a complex environment. It is important that you read the guidelines of the project ideas, and you have a general idea of the project before writing your pre-proposal. 

When Red Hen receives your pre-proposal, Red Hen will assess it and attempt to locate a suitable mentor; if Red Hen succeeds, she will get back to you and provide feedback to allow you to develop a fully-fledged proposal to submit to GSoC 2023. Note that your final proposal must be submitted directly to Google, not to redhenlab@gmail.com.

Red Hen is excited to be working with skilled students on advanced projects and looks forward to your pre-proposals.

Know Red Hen Before You Apply

Red Hen Lab is an international cooperative of major researchers in multimodal communication, with mentors spread around the globe. Together, the Red Hen cooperative has crafted this Ideas page, which offers some information about the Red Hen dataset of multimodal communication (see some sample data here and here) and a long list of tasks.

To succeed in your collaboration with Red Hen, the first step is to orient yourself carefully in the relevant material. The Red Hen Lab website that you are currently visiting is voluminous.  Please explore it carefully. There are many extensive introductions and tutorials on aspects of Red Hen research. Make sure you have at least an overarching concept of our mission, the nature of our research, our data, and the range of the example tasks Red Hen has provided to guide your imagination. Having contemplated the Red Hen research program on multimodal communication, come up with a task that is suitable for Red Hen and that you might like to embrace or propose. Many illustrative tasks are sketched below. Orient in this landscape, and decide where you want to go.

The second step is to formulate a pre-proposal sketch of 1-3 pages that outlines your project idea. In your proposal, you should spell out in detail what kind of data you need for your input and the broad steps of your process through the summer, including the basic tools you propose to use. Give careful consideration to your input requirements; in some cases, Red Hen will be able to provide annotations for the feature you need, but in other cases successful applicants will craft their own metadata, or work with us to recruit help to generate it. Please use the Latex template to write your pre-proposal, and send us the pdf format.

Red Hen emphasizes: Red Hen has programs and processes—see, e.g., her Τέχνη Public Site, Red Hen Lab's Learning Environment—for tutoring high-school and college students. But Red Hen Google Summer of Code does not operate at that level.  Red Hen GSoC seeks mature students who can think about the entire arc of a project: how to get data, how to make datasets, how to create code that produces an advance in the analysis of multimodal communication, how to put that code into production in a Red Hen pipeline.  Red Hen is looking for the 1% of students who can think through the arc of a project that produces something that does not yet exist. Red Hen does not hand-hold through the process, but she can supply elite and superb mentoring that consists of occasional recommendations and guidance to the dedicated and innovative student.

Requirements for Commitment

In all but exceptional cases, recognized as such in advance, your project must be put into production by the end of Google Summer of Code or you will not be passed or paid. Most projects will create a pipeline or contribute to an existing pipeline in the Red Hen central operations.  This can mean, e.g., scripting (typically in bash) an automated process for reading input files from Red Hen's data repository, submitting jobs to the CWRU HPC using the Slurm workload manager, running your code, and finally formatting the output to match Red Hen's Data Format. Consider these requirements as opportunities for developing all-round skills and for being proud of having written code that is not only merged but in regular production! Explore the current Red Hen Lab pipelines and think about how your project would work with them.
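As a purely illustrative sketch, a pipeline driver of this kind might look roughly like the following (in Python, though bash is equally common for Red Hen pipelines); the data paths, container name, worker script, and Slurm resource values are all hypothetical placeholders, not the actual Red Hen layout:

import subprocess
from pathlib import Path

DATA_ROOT = Path("/path/to/redhen/input")        # hypothetical input location
OUTPUT_ROOT = Path("/path/to/your/output")       # hypothetical output location

def submit(day_dir: Path) -> None:
    """Submit one Slurm job per day of recordings via sbatch --wrap."""
    cmd = (f"singularity exec my_container.sif "
           f"python process_day.py --input {day_dir} --output {OUTPUT_ROOT}")
    subprocess.run(["sbatch", "--job-name", f"rh_{day_dir.name}",
                    "--time", "02:00:00", "--mem", "8G", "--wrap", cmd],
                   check=True)

if __name__ == "__main__":
    # Walk the (hypothetical) day directories and submit one job each.
    for day_dir in sorted(DATA_ROOT.glob("2023/2023-01/*")):
        submit(day_dir)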

Tips for working with your mentors

Note that your project will probably need to be implemented inside a Singularity container (see instructions). This makes it portable between Red Hen's high-performance computing clusters. Red Hen has no interest in toy, proof-of-concept systems that run on your laptop or in your user account on a server. Red Hen is dedicated exclusively to pipelines and applications that run on servers anywhere and are portable. Please study Guidelines for Red Hen Developers, including the section on building Singularity containers. You are required to maintain a github account and a blog. 

In almost all cases, you will do your work on CWRU HPC, although of course you might first develop code on your device and then transfer it to CWRU HPC. On CWRU HPC, do not try to sudo; do not try to install software.  Check for installed software on CWRU HPC using the command

module

e.g.,

module spider singularity

module load gcc

module load python

On CWRU HPC, do not install software into your user account; instead, if it is not already installed on CWRU HPC, install it inside a Singularity container so that it is portable.  Red Hen expects that Singularity will be used in most cases.  Why Singularity? Here are 4 answers; note especially #2 and #4:

What is so special about Singularity?

While Singularity is a container solution (like many others), Singularity differs in its primary design goals and architecture:

A few further tips for rare, outlier cases:

Remember to study the blogs of other students for tips, and document on your own blogs anything you think would help other students.

More Tips for Working with your Mentors

Background Information

Red Hen Lab participated in Google Summer of Code in 2015, 2016, 2017, 2018,  2019, 2020, 2021, and 2022, working with brilliant students and expert mentors from all over the world. Each year, Red Hen has mentored students in developing and deploying cutting-edge techniques of multimodal data mining, search, and visualization, with an emphasis on automatic speech recognition, tagging for natural language, co-speech gesture, paralinguistic elements, facial detection and recognition, and a great variety of behavioral forms used in human communication. With significant contributions from Google Summer of Code students from all over the world, Red Hen has constructed tagging pipelines for text, audio, and video elements. These pipelines are undergoing continuous development, improvement, and extension. Red Hens have excellent access to high-performance computing clusters at UCLA, Case Western Reserve University, and FAU Erlangen; for massive jobs Red Hen Lab has an open invitation to apply for time on NSF's XSEDE network.

Red Hen's largest dataset is the NewsScape Library of International Television News, a collection of more than 600,000 television news programs, initiated by UCLA's Department of Communication, developed in collaboration with Red Hens from around the world, and curated by the UCLA Library, with processing pipelines at UCLA, Case Western Reserve University, and FAU Erlangen in Germany.  Red Hen develops and tests tools on this dataset that can be used on a great variety of data—texts, photographs, audio and audiovisual recordings. Red Hen also acquires big data of many kinds in addition to television news, such as photographs of Medieval art, and is open to the acquisition of data needed for particular projects. Red Hen creates tools that are useful for generating a semantic understanding of big data collections of multimodal data, opening them up for scientific study, search, and visualization. See Overview of Research for a description of Red Hen datasets.

In 2015, Red Hen's principal focus was audio analysis; see the Google Summer of Code 2015 Ideas page. Red Hen students created a modular series of audio signal processing tools, including forced alignment, speaker diarization, gender detection, and speaker recognition (see the 2015 reports, extended 2015 collaborations, and github repository). This audio pipeline is currently running on Case Western Reserve University's high-performance computing cluster, which gives Red Hen the computational power to process the hundreds of thousands of recordings in the Red Hen dataset. With the help of GSoC students and a host of other participants, the organization continues to enhance and extend the functionality of this pipeline. Red Hen is always open to new proposals for high-level audio analysis.

In 2016, Red Hen's principal focus was deep learning techniques in computer vision; see the Google Summer of Code 2016 Ideas page and Red Hen Lab page on the Google Summer of Code 2016 site. Talented Red Hen students, assisted by Red Hen mentors, developed an integrated workflow for locating, characterizing, and identifying elements of co-speech gestures, including facial expressions, in Red Hen's massive datasets, this time examining not only television news but also ancient statues; see the Red Hen Reports from Google Summer of Code 2016 and code repository. This computer vision pipeline is also deployed on CWRU's HPC in Cleveland, Ohio, and was demonstrated at Red Hen's 2017 International Conference on Multimodal Communication. Red Hen is planning a number of future conferences and training institutes. Red Hen GSoC students from previous years typically continue to work with Red Hen to improve the speed, accuracy, and scope of these modules, including recent advances in pose estimation.

In 2017, Red Hen invited proposals from students for components for a unified multimodal processing pipeline, whose purpose is to extract information about human communicative behavior from text, audio, and video. Students developed audio signal analysis tools, extended the Deep Speech project with Audio-Visual Speech Recognition, engineered a large-scale speaker recognition system, made progress on laughter detection, and developed Multimodal Emotion Detection in videos. Focusing on text input, students developed techniques for show segmentation, neural network models for studying news framing, and controversy and sentiment detection and analysis tools (see Google Summer of Code 2017 Reports). Rapid development in convolutional and recurrent neural networks is opening up the field of multimodal analysis to a slew of new communicative phenomena, and Red Hen is in the vanguard.

In 2018, Red Hen GSoC students created Chinese and Arabic ASR (speech-to-text) pipelines, a fabulous rapid annotator, a multi-language translation system, and multiple computer vision projects. The Chinese pipeline was implemented as a Singularity container on the Case HPC, built with a recipe on Singularity Hub, and put into production ingesting daily news recordings from our new Center for Cognitive Science at Hunan Normal University in Hunan Province in China, directed by Red Hen Lab Co-Director Mark Turner. It represents the model Red Hen expects projects in 2023 to follow.

In 2019, Red Hen Lab GSoC students made significant contributions to add speech to text and OCR to Arabic, Bengali, Chinese, German, Hindi, Russian, and Urdu. We built a new global recording monitoring system, developed a show-splitting system for ingesting digitized news shows, and made significant improvements to the Rapid Annotator. For an overview with links to the code repositories, see Red Hen Lab's GSoC 2019 Projects.

Red Hen's themes for 2020 can be found here.

Red Hen's themes for 2021 can be found here.

Red Hen's themes for 2022 can be found here.

In large part thanks to Google Summer of Code, Red Hen Lab has been able to create a global open-source community devoted to computational approaches to parsing, understanding, and modeling human multimodal communication. With continued support from Google, Red Hen will continue to bring top contributors from around the world into the open-source community.

What kind of Red Hen are you?

More About Red Hen

Our mentors

Stephanie Wood
University of Oregon

Vaibhav Gupta 

IIIT Hyderabad

https://sites.google.com/site/inesolza/home

Inés Olza

University of Navarra 

https://sites.google.com/site/cristobalpagancanovas/

Cristóbal Pagán Cánovas

University of Murcia

Anna Wilson (Pleshakova)

University of Oxford

Heiko Schuldt

University of Basel

Gulshan Kumar

IIIT Hyderabad

https://www.anglistik.phil.fau.de/staff/uhrig/

Peter Uhrig

ScaDS.AI, TU Dresden

Grace Kim

UCLA  

Tiago Torrent

Federal University of Juiz de Fora

José Fonseca

Polytechnic Higher Education Institute of Guarda

Ahmed Ismail

Cairo University & DataPlus

Leonardo Impett

EPFL & Bibliotheca Hertziana

Frankie Robertson, GSoC student 2020

Wenyue Xu

Smith College

GSoC student 2020



Maria M. Hedblom

www.mariamhedblom.com


Sumit Vohra

NSIT, Delhi University

Swadesh Jana

Oliver Czulo

Uni-Leipzig

Marcelo Viridiano

Federal University of Juiz de Fora

Ely Matos

Federal University of Juiz de Fora

Arthur Lorenzi

Federal University of Juiz de Fora

Fred Belcavello

Federal University of Juiz de Fora

Mark Williams

Dartmouth College

John Bell

Dartmouth College

Nitesh Mahawar

Sabyaschi Ghosal

Bosch Global Software Technologies, Bengaluru

The profiles of mentors not included in the portrait gallery are linked to their name below.

More guidelines for project ideas

Your project should be in the general area of multimodal communication, whether it involves tagging, parsing, analyzing, searching, or visualizing. Red Hen is particularly interested in proposals that make a contribution to integrative cross-modal feature detection tasks. These are tasks that exploit two or even three different modalities, such as text and audio or audio and video, to achieve higher-level semantic interpretations or greater accuracy. You could work on one or more of these modalities. Red Hen invites you to develop your own proposals in this broad and exciting field.

Red Hen studies all aspects of human multimodal communication, such as the relation between verbal constructions and facial expressions, gestures, and auditory expressions. Examples of concrete proposals are listed below, but Red Hen wants to hear your ideas! What do you want to do? What is possible? You might focus on a very specific type of gesture, or facial expression, or sound pattern, or linguistic construction; you might train a classifier using machine learning, and use that classifier to identify the population of this feature in a large dataset. Red Hen aims to annotate her entire dataset, so your application should include methods of locating as well as characterizing the feature or behavior you are targeting. Contact Red Hen for access to existing lists of features and sample clips. Red Hen will work with you to generate the training set you need, but note that your project proposal might need to include time for developing the training set.

Red Hen develops a multi-level set of tools as part of an integrated research workflow, and invites proposals at all levels. Red Hen is excited to be working with the Media Ecology Project to extend the Semantic Annotation Tool, making it more precise in tracking moving objects. The "Red Hen Rapid Annotator" is also ready for improvements. Red Hen is open to proposals that focus on a particular communicative behavior, examining a range of communicative strategies utilized within that particular topic. See for instance the ideas "Tools for Transformation" and "Multimodal rhetoric of climate change". Several new deep learning projects are on the menu, from "Hindi ASR" to "Gesture Detection and Recognition". On the search engine front, Red Hen also has several candidates, from the "Development of a Query Interface for Parsed Data" to "Multimodal CQPweb". Red Hen welcomes visualization proposals; see for instance the "Semantic Art from Big Data" idea below.

Red Hen is now capturing television in China, Egypt, and India and is happy to provide shared datasets and joint mentoring with our partners CCExtractor, who provide the vital tools for text extraction in several television standards and for on-screen text detection and extraction.

When you plan your proposal, bear in mind that your project should result in a production pipeline. For Red Hen, that means it finds its place within the integrated research workflow. The application will typically be required to be located within a Singularity module that is installed on Red Hen's high-performance computing clusters, fully tested, with clear instructions, and fully deployed to process a massive dataset. The architecture of your project should be designed so that it is clear and understandable for coders who come after you, and fully documented, so that you and others can continue to make incremental improvements. Your module should be accompanied by a Python application programming interface (API) that specifies the input and output, to facilitate the development of a unified multimodal processing pipeline for extracting information from text, audio, and video. Red Hen prefers projects that use C/C++ and Python and run on Linux. For some of the ideas listed, but by no means all, it is useful to have prior experience with deep learning tools.

Your project should be scaled to the appropriate level of ambition, so that at the end of the summer you have a working product. Be realistic and honest with yourself about what you think you will be able to accomplish in the course of the summer. Provide a detailed list of the steps you believe are needed, the tools you propose to use, and a weekly schedule of milestones. Choose a task you care about, in an area where you want to grow. The most important thing is that you are passionate about what you are going to work on with us. Red Hen looks forward to welcoming you to the team!

Ideas for Projects

Red Hen strongly emphasizes that a student should not browse the following ideas without first having read the text above them on this page. Red Hen remains interested in proposals for any of the activities listed throughout this website (http://redhenlab.org). 

See especially the 

Barnyard of Possible Specific Projects

Red Hen is uninterested in a preproposal that merely picks out one of the following ideas and expresses an interest.  Red Hen looks instead for an intellectual engagement with the project of developing open-source code that will be put into production in our working pipelines to further the data science of multimodal communication.  What is your full idea? Why is it worthy? Why are you interested in it? What is the arc of its execution? What data will you acquire, and where? How will you succeed? 

Please read the instructions on how to apply carefully before applying for any project. Failing to follow the application guidelines will result in your (pre)proposal not being considered for GSoC 2023.

1. Automatic detection and analysis of relationships between prosody and co-speech gesture.

For an orientation, see https://imcc.web.ox.ac.uk/event/benefits-prosodic-embodiment-second-language-pronunciation-learning. The recording of this talk will be posted on the IMCC YouTube channel at https://www.youtube.com/channel/UC1UbYkb_DUYvzdf_bbB5vhg

Thoughts: [Post here links to the usual taxonomies for analyzing gesture, to papers and demos of openpose and mediapipe etc. for the detection of co-speech gesture, and to some introductory works on prosody.]  When do people point while talking? How is pointing related to prosody? When do their hand movements "demonstrate" prosody, as when a Chinese speaker, explaining a tone, uses a dynamic hand gesture to "map" the tone? When, while speaking, do people beat with an open palm hand lateral toward the body, and is it related to prosody? Individual researchers have investigated such questions, usually with painstaking methods of manual annotation.  Can AI help, by proposing recognitions of gesture, recognitions of prosody, and analyzing their relationships? This project is at the intersection of computer vision, audio recognition, pattern detection, and automatic statistical analyses (e.g. in R).
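As a hedged illustration of how such an analysis could start, the sketch below tracks one wrist with MediaPipe Pose and extracts fundamental frequency with librosa, then correlates the two signals; the clip file names and the choice of landmark are placeholders, and a real project would use proper gesture segmentation and richer prosodic features rather than a single correlation:

import cv2, librosa
import numpy as np
import mediapipe as mp

VIDEO = "clip.mp4"   # hypothetical video file
AUDIO = "clip.wav"   # audio extracted from the same clip (e.g. with ffmpeg)

# 1) Right-wrist height per video frame (MediaPipe Pose landmark 16)
pose = mp.solutions.pose.Pose(static_image_mode=False)
cap = cv2.VideoCapture(VIDEO)
fps = cap.get(cv2.CAP_PROP_FPS)
wrist_y = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    res = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    wrist_y.append(res.pose_landmarks.landmark[16].y if res.pose_landmarks else np.nan)
cap.release()
wrist_y = np.asarray(wrist_y)

# 2) Fundamental frequency (a crude prosody proxy) over the audio track
y, sr = librosa.load(AUDIO, sr=None)
f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                        fmax=librosa.note_to_hz("C7"), sr=sr)
f0_times = librosa.times_like(f0, sr=sr)

# 3) Resample f0 onto the video frame clock and correlate with wrist height
frame_times = np.arange(len(wrist_y)) / fps
f0_on_frames = np.interp(frame_times, f0_times, np.nan_to_num(f0, nan=0.0))
valid = ~np.isnan(wrist_y)
print("corr(wrist height, f0):",
      np.corrcoef(wrist_y[valid], f0_on_frames[valid])[0, 1])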

2. Red Hen Anonymizer

Build on and improve the existing Red Hen Anonymizer.

See 

There are many new ideas for elaboration of RHA. For example, does StarGAN's use of Generative Adversarial Networks add functionality? See https://arxiv.org/pdf/1711.09020v3.pdf. Contact turner@case.edu to discuss details or ask for clarification. This is a task that can take 12 weeks and 175 hours of work, although a more sophisticated proposal with stretch goals could be considered for 350 hours of work. Difficulty rating: easy to medium.

3. AI Frame Blend Nominator

Build on and develop the existing Frame Blend Nomination Tool.  

Mentored by Wenyue Xi and Mark Turner. The purpose of this project is to extend the highly-successful work done by Wenyue Xi during Google Summer of Code 2020. Mark Turner was her mentor. Wenyue and Mark will mentor this project. Study Wenyue Xi's blog and github page at Red Hen Lab GSoC 2020 Projects.

Contact turner@case.edu to discuss details or ask for clarification.

Red Hen already has a frame tagging system for English that exploits FrameNet; for details, see Tagging for Conceptual Frames. Red Hen Lab works closely with FrameNet Brasil, another Google Summer of Code organization, and is eager to involve other languages in her tagging of conceptual frames. Conceptual blending of frames is a major area of research in cognitive science and cognitive linguistics. Can we develop a system that locates such blends in language and images? Wenyue Xi's Frame Blend Nomination System does just that. Study http://redhenlab.org to familiarize yourself with the Red Hen data holdings and other existing tools before submitting a pre-proposal for this project.

Long project (350 hours); difficulty: hard.

4. Émile Mâle Pipeline

Build on and improve the previous work on the Émile Mâle pipeline.

See https://sites.google.com/site/distributedlittleredhen/home/the-cognitive-core-research-topics-in-red-hen/the-barnyard/christian-iconography-the-emile-male-pipeline

Contact turner@case.edu to discuss details or ask for clarification.  Depending on the level of ambition, this is a task for 12 weeks and either 175 or 350 hours of work. Difficulty level: medium.

5. Red Hen Audio Tagger

This is a continued project from GSoC 2022. For the work done so far, please refer to the links below.

Github Codebase: https://github.com/technosaby/RedHenAudioTagger 

GIST:  https://gist.github.com/technosaby/b1e68810f63ff47207a18ee5e46359c6

GSOC 2022 Weekly Progress & Experience: https://technosaby.github.io/gsoc/gsoc.

Last year we focused on creating the Red Hen pipeline, generating outputs in the form of SFX and CSV files for import into the ELAN tool; this year we can extend it for performance. For example, the Red Hen video database contained many mislabeled tags. Is it possible to create a new model that detects tags more accurately? Can we include tags that are more relevant for Red Hen use cases? Can we fine-tune a preexisting model for better performance?
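One hedged sketch of the fine-tuning idea, assuming a frozen pretrained audio embedding model; the backbone, tag list, dimensions, and data loader here are hypothetical placeholders:

import torch
import torch.nn as nn

NUM_TAGS = 20      # e.g. speech, music, applause, siren, ... (placeholder tag set)
EMBED_DIM = 1024   # depends on the chosen pretrained embedding model

class TagHead(nn.Module):
    def __init__(self, embed_dim: int = EMBED_DIM, num_tags: int = NUM_TAGS):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(),
                                nn.Linear(256, num_tags))

    def forward(self, emb):            # emb: (batch, embed_dim)
        return self.fc(emb)            # raw logits, one per tag

def train_head(head, backbone, loader, epochs=5):
    backbone.eval()                    # frozen pretrained model (placeholder)
    opt = torch.optim.Adam(head.parameters(), lr=1e-4)
    loss_fn = nn.BCEWithLogitsLoss()   # multi-label tagging
    for _ in range(epochs):
        for waveforms, labels in loader:      # labels: multi-hot (batch, NUM_TAGS)
            with torch.no_grad():
                emb = backbone(waveforms)     # assumed to return embeddings
            loss = loss_fn(head(emb), labels.float())
            opt.zero_grad(); loss.backward(); opt.step()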

Contact turner@case.edu to discuss details or ask for clarification.  Depending on the level of ambition, this is a task for 12 weeks and either 175 or 350 hours of work. Difficulty level: medium.

6. Red Hen Hatcher - Installation, Configuration and Management of a Pi Station

Currently, Red Hen has more than a dozen remote Raspberry Pi capture stations around the world that provide media in many different languages and from many different cultures, so that it can be used by researchers from around the globe.

To help further increase this number, new capture stations should be deployed by volunteers who are willing to contribute but want to spend the least amount of time dealing with the technical aspects of setting up and configuring a capture station, as well as managing its daily operation.

This project proposes to develop a desktop application, Red Hen Hatcher, that is able to prepare an SD card to be used in a Raspberry Pi capture station. It should provide a wizard-style set of dialogs that walks the user through the process, helping the user make informed decisions. After obtaining these data, the Red Hen Hatcher application automatically generates the necessary scripts and configuration files to prepare and set up the capture station.

Once the capture station is up and running, the Red Hen Hatcher application should be able to access a central Red Hen repository to obtain updates, as well as to connect to the capture station in order to apply the updates and allow its management. The management page should look like a dashboard from which the user can view and act on several tasks. Examples of these tasks are backup generation, visualisation of the SD card's and hard drive's free space, the internet connectivity status, low-voltage warnings, checks of the SD card's health, the timetable of the capture channels (with editing), and the captured files that have not yet been uploaded to the Red Hen Lab central server.

The Red Hen Hatcher application must be developed using only open-source software. Python or Java is suggested.
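As a toy sketch of the "generate configuration from wizard answers" step only (every option name, channel, and path below is invented for illustration):

import configparser
from pathlib import Path

def write_station_config(answers: dict, sd_mount: Path) -> None:
    """Write a simple INI file onto the prepared SD card (layout is invented)."""
    cfg = configparser.ConfigParser()
    cfg["station"] = {"name": answers["station_name"],
                      "timezone": answers["timezone"],
                      "upload_server": answers["upload_server"]}
    cfg["capture"] = {"channels": ",".join(answers["channels"])}
    with open(sd_mount / "station.conf", "w") as fh:
        cfg.write(fh)

# Example wizard output (hypothetical values):
write_station_config({"station_name": "lisbon-01", "timezone": "Europe/Lisbon",
                      "upload_server": "example.redhen.org",
                      "channels": ["RTP1", "SIC"]},
                     Path("/media/sdcard"))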

This is a task that should be completed in 12 weeks and around 175 hours of work. Difficulty level: easy

Contact jozefonseca@gmail.com to discuss details or ask for clarification. 

7. Multimodal Sentiment Detection in Televised News Content

 

Summary

Building on past work using automated techniques to identify the sentiment of text, audio, pictures, or video, this project seeks to develop a multimodal approach to the identification of sentiment in television news content.


Goal and Objectives

The objective of this project is to combine techniques in text-, audio-, and video-based automated annotation to identify the “sentiment” or “tone” of televised news stories. Doing so serves several purposes. We do not yet understand how much of the valence of news content is conveyed by the script, or by the audio or video content, nor do we have a sense of how often these different “streams” correlate or deviate. We consequently do not know whether text-only annotations capture most, or all, or only a little of the variation in valence within and across television news stories. Exploring valence in a more multimodal way will provide novel information on how valence is conveyed in news, and it will allow us to test the validity of past work that relies on text alone. This project aims to develop tools that scholars of political communication will use for research related to the production, consumption, and downstream effects of the valence of news content.

Methods

This project will develop three lines of functionality and integrate them into a single tool:

(1) an implementation of dictionary- and supervised-learning-based models of sentiment detection in news transcripts,

(2) the use of machine learning to identify tonal differences in audio content from television news programs,

(3) the use of machine learning to annotate characters, movement, objects, etc. in video content from television news, and

(4) the development of a system combining information from these three streams to produce a range of multimodal measures of news sentiment (a minimal sketch of such a combination follows this list).
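A minimal, purely illustrative sketch of such a combination step, with hypothetical per-segment scores standing in for the real classifier outputs:

import numpy as np

def fuse_sentiment(text_score, audio_score, video_score, weights=(0.5, 0.3, 0.2)):
    """Each argument is an array of per-segment scores in [-1, 1] (placeholders)."""
    scores = np.vstack([text_score, audio_score, video_score])
    fused = np.average(scores, axis=0, weights=weights)    # weighted valence
    divergence = scores.max(axis=0) - scores.min(axis=0)   # stream disagreement
    return fused, divergence

fused, divergence = fuse_sentiment(np.array([0.4, -0.6]),
                                   np.array([0.1, -0.2]),
                                   np.array([0.3, 0.5]))
print(fused, divergence)   # one fused valence and one disagreement value per story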

Potential approach:

Contact Stuart Soroka <snsoroka@ucla.edu> for ideas and developments. 

8. Multimodal stance detection


While most Americans still obtain their news through television, it remains one of the understudied aspects of the information ecosystem. Although network news consumption has declined, cable news stations account for an increasing percentage of American television viewing. It is important to understand not only what stations people are consuming, but also what these stations are creating. Recent studies have shown that small differences in wording influence study participants to view certain groups as socially distant and less human. Even biased information, not necessarily extreme, can have real-world impacts ranging from micro-aggression and harassment to more serious violence. It is unclear how prevalent biases are among the three major cable news networks, FNC, CNN, and MSNBC. For instance, the study will investigate whether FNC is more likely to promote, neutralize, or debunk pro-gun control arguments over anti-gun control arguments compared with CNN.

We propose to develop a stance detection method for television news programs using multimodal data. We focus on the topics of gun control, abortion, immigration, and covid/vaccination at the stance level, i.e., promoting, neutralizing, or debunking. A news program may cover more than one story, and the stance may vary from story to story. Thus, we divide a news program into segments consisting of coherent stories. We can perform stance detection on a topic-by-topic (or story-by-story) basis once we have identified consecutive sentences relating to the same topic using mixture topic models.

The use of text, while useful for dividing a news program into segments of coherent stories, can be insufficient to capture stance, as much of the signal is transmitted via audio and image (facial expressions). We will hand code a limited set of videos that are intended to promote, neutralize, or debunk the topics under study. In light of the fact that manual coding is an expensive and time-consuming process, we use a small sample of annotated videos together with weakly supervised and unsupervised methods. We develop an embedding that incorporates both image and audio to produce a joint low-dimensional representation of the data. Developing a deep variational autoencoder, we minimize the reconstruction error for images and audio jointly. Lastly, the data points in the embedding space will be used as inputs to weakly supervised and unsupervised models that infer the stance of a story.
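A hedged sketch of the joint image-audio variational autoencoder described above, with placeholder feature dimensions and layer sizes (the stance classifier itself is not shown):

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointVAE(nn.Module):
    def __init__(self, img_dim=512, aud_dim=128, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(img_dim + aud_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec_img = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, img_dim))
        self.dec_aud = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, aud_dim))

    def forward(self, img_feat, aud_feat):
        h = self.encoder(torch.cat([img_feat, aud_feat], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec_img(z), self.dec_aud(z), mu, logvar

def vae_loss(img_feat, aud_feat, img_rec, aud_rec, mu, logvar):
    rec = F.mse_loss(img_rec, img_feat) + F.mse_loss(aud_rec, aud_feat)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# The latent means mu can then feed weakly supervised or unsupervised stance models.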


Mentored by Homa Hosseinmardi <homahm@seas.upenn.edu>

9. Feature tracking based on moving geometric targets


The Semantic Annotation Tool (https://github.com/novomancy/waldorf-scalar/) is a web-based interface for adding time-based annotations with geometric targets to videos. Currently SAT defines an annotation's geometric target using a starting and ending set of vertices and linearly tweening the target between those two sets of points over the duration of the annotation as an SVG animation. However, motion in the underlying film is often not linear, so the geometric target does not follow the features that are intended to be shown if motion is erratic or changes direction. This project seeks to extend the SAT and related toolsets to automatically move a geometric target by tracking the underlying feature it annotates based on analysis of the image data inside the start/end vertices, generating a description for an SVG animation of the geometric target that tracks the feature, and updating SAT to be able to overlay the new annotation on a video. How tracking is implemented is up to you, so long as it can be integrated with the Red Hen and SAT pipelines; the SAT itself is built using jQuery and npm/grunt tooling. (350 hours, medium difficulty)
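One possible tracking back-end, sketched here only as an illustration: track corner points inside the annotation's starting polygon with Lucas-Kanade optical flow and record the tracked centroid per frame, which could then be converted into SVG keyframes. The video file and polygon coordinates are placeholders:

import cv2
import numpy as np

cap = cv2.VideoCapture("film.mp4")                 # hypothetical input
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Placeholder starting vertices of the annotation's geometric target
polygon = np.array([[100, 100], [220, 100], [220, 200], [100, 200]])
mask = np.zeros(prev_gray.shape, dtype=np.uint8)
cv2.fillPoly(mask, [polygon], 255)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50, qualityLevel=0.01,
                              minDistance=5, mask=mask)

track = []                                         # tracked centroid per frame
while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = new_pts[status.flatten() == 1].reshape(-1, 1, 2)
    if len(pts):
        track.append(pts.reshape(-1, 2).mean(axis=0).tolist())
    prev_gray = gray
cap.release()
print(track[:5])   # per-frame centroids, to be turned into SVG keyframes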

10. Multi-Stage Classification Aggregator

Red Hen and the Media Ecology Project seek proposals for a system to unify and transform metadata generated by multiple video classifiers. This project builds on work done by Shreyan Ganguly in Red Hen GSoC 2021 and 2022 to create an automated cut detector resilient enough to use on archival footage. A new project underway will use the existing cut detector as the first stage in a processing pipeline that also includes 3D pose estimation, scene depth information, and potentially other ML-powered types of classification. Additional data to be tracked includes basic content management information such as file location and descriptive metadata. (175 hours, easy difficulty)
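A toy sketch of the aggregation step, assuming each classifier writes its own JSON file per video (the file naming and field names are hypothetical):

import json
from pathlib import Path

def aggregate(video_id: str, meta_dir: Path) -> dict:
    """Collect the per-classifier JSON outputs for one video into a single record."""
    record = {"video_id": video_id, "stages": {}}
    for stage in ("cuts", "pose3d", "depth"):              # placeholder stage names
        path = meta_dir / f"{video_id}.{stage}.json"
        if path.exists():
            record["stages"][stage] = json.loads(path.read_text())
    return record

unified = aggregate("news_1980_05_01", Path("metadata"))   # hypothetical names
Path("unified.json").write_text(json.dumps(unified, indent=2))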

11. 3D Pose Smoothing

Red Hen and the Media Ecology Project seek proposals for code that will analyze automatic pose detection results from commercial narrative video to smooth skeletal motion data. 3D data provided by software like BlazePose typically shows a great deal of jitter on a frame-by-frame basis. This project intends to prepare extracted poses to animate 3D avatars, recreating movement portrayed on film in an abstracted VR environment. Using the raw, jittery results would be disconcerting in this context, however, so we need to fill gaps, reduce irrelevant noise, and generally clean pose data so it can be transformed into FBX animations. See also: https://wimpouw.github.io/EnvisionBootcamp2021/MergingAcousticsMT.html (350 hours, medium to hard difficulty)
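A hedged sketch of one possible cleaning pass: fill detection gaps by interpolation and de-jitter each coordinate with a Savitzky-Golay filter (the array shape, window size, and joint count are placeholders):

import numpy as np
from scipy.signal import savgol_filter

def smooth_pose(keypoints: np.ndarray, window: int = 11, poly: int = 3) -> np.ndarray:
    """keypoints: (frames, joints, 3) array with NaN where detection failed."""
    frames = np.arange(keypoints.shape[0])
    out = keypoints.copy()
    for j in range(keypoints.shape[1]):
        for c in range(3):
            series = out[:, j, c]
            good = ~np.isnan(series)
            if good.sum() < window:
                continue
            series = np.interp(frames, frames[good], series[good])  # gap filling
            out[:, j, c] = savgol_filter(series, window, poly)      # de-jitter
    return out

smoothed = smooth_pose(np.random.rand(300, 33, 3))   # e.g. 33 BlazePose joints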

12. IPTV capture

Red Hen looks to expand her capture to include IPTV. Resources such as BBC iPlayer and Channel4 in the UK, RTVE in Spain, or ARD and ZDF Mediathek in Germany need to be channeled into Red Hen's standard scheduling and capture pipelines, including closed-captions or subtitles that are included in the broadcast stream. Red Hen routinely uses youtube-dl for such captures. You are asked to (1) extend youtube-dl's capabilities; see, e.g. https://github.com/ytdl-org/youtube-dl/issues/16779#issuecomment-781608403; and (2) create robust automated ingestion of IPTV broadcast into the Red Hen format. See http://redhenlab.org for specifics on Red Hen data format. 
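For orientation, a minimal sketch of automated capture through youtube-dl's Python interface, requesting broadcaster subtitles alongside the video; the URL, output template, and language list are placeholders, and conversion to Red Hen's data format would follow as a separate step:

import youtube_dl

opts = {
    "outtmpl": "/path/to/capture/%(uploader)s/%(upload_date)s_%(title)s.%(ext)s",
    "writesubtitles": True,        # broadcaster-provided subtitles, if offered
    "writeautomaticsub": False,
    "subtitleslangs": ["en", "de", "es"],
}
with youtube_dl.YoutubeDL(opts) as ydl:
    ydl.download(["https://www.example.com/some-iptv-programme"])
# A post-processing step would then rename the files and convert the subtitles
# into Red Hen's data format.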

This project can be carried out both as a medium (175 hours) and long project (350 hours), depending on the number of supported sites. Difficulty is medium.

Mentors: Francis Steen, Jacek Wózny, Melanie Bell, and Javier Valenzuela.

13. CQPweb plugins (and plugin structure)

Red Hen uses open-source software called CQPweb to facilitate linguistic research. However, CQPweb is not yet fully equipped to handle audio and video data, so it needs modifications for our purposes. Your task is to create plugins for audio analysis using the EMU-webApp, better query options (e.g. the ability to search by sounds using IPA symbols), and additional downloaders for ELAN and Praat files. Where CQPweb's plugin structure cannot cater to our needs, you will submit merge requests to the CQPweb codebase. Proficiency in PHP, JavaScript, and HTML is required.

Mentors: Peter Uhrig, Javier Valenzuela, and others

This project can be a medium-sized (175 hours) or a long (350 hours) project, depending on the number of features to be implemented. This is a medium to hard project.

14. Development of a Query Interface for Parsed Data

Mentored by Peter Uhrig's team

This infrastructure task is to create a new and improved version of a graphical user interface for graph-based search on dependency-annotated data. The new version should have all functionality provided by the prototype plus a set of new features. The back-end is already in place. 

Develop current functionality:

Develop new functionality:

Contact Peter Uhrig <peter.uhrig@tu-dresden.de> to discuss details or to ask for clarification on any point.

This is a long project (350 hours), requiring some architectural decisions, i.e. we would classify it as a hard project on GSoC's scale (easy/medium/hard).

15. Detection of Intonational Units

Mentored by: Peter Uhrig, Anna Wilson

There are two potential projects here: a medium-sized project that works on the detection of intonational phrases based on the Santa Barbara Corpus (and possibly further annotations), and a large project that attempts to replicate the AuToBI system with modern machine learning approaches (see Andrew Rosenberg's PhD thesis for details).
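As a purely illustrative baseline for the medium-sized project, the sketch below proposes intonation-unit boundaries at sufficiently long low-energy stretches (pauses); the file name and thresholds are placeholders, and a real system would also use pitch reset and supervised training on the Santa Barbara Corpus annotations:

import librosa
import numpy as np

y, sr = librosa.load("recording.wav", sr=16000)       # hypothetical file
hop = 160                                             # 10 ms hops
rms = librosa.feature.rms(y=y, hop_length=hop)[0]
silent = rms < 0.1 * np.median(rms)                   # crude silence criterion

boundaries = []
run_start = None
for i, s in enumerate(silent):
    if s and run_start is None:
        run_start = i
    elif not s and run_start is not None:
        if (i - run_start) * hop / sr > 0.25:         # pause longer than 250 ms
            boundaries.append(librosa.frames_to_time(run_start, sr=sr, hop_length=hop))
        run_start = None
print(boundaries)   # candidate intonation-unit boundaries in seconds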


16. Extraction of Gesture Features

Mentored by: Swadesh Jana, Ilya Burenko, Peter Uhrig

1. Detect which hand(s) (left, right, or both) perform a gesture, in which direction (up, down, right, or left), and other such movements, if possible (a toy sketch of extracting movement direction from pose keypoints follows below).

2. Find commonly occurring phrase-and-gesture combinations, so as to identify meaningful gestures and the particular phrases with which certain gestures commonly occur.

This is a long-shot project, because neither gesture segmentation nor the nature of the phrasal units is well defined at the moment. Large-scale project (350 hours). We may accept multiple students on projects related to this idea.
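The toy sketch referenced in task 1 above, classifying the dominant movement direction of one wrist from per-frame pose keypoints (the keypoint source and threshold are hypothetical):

import numpy as np

def hand_direction(wrist_xy: np.ndarray, threshold: float = 0.02) -> str:
    """wrist_xy: (frames, 2) normalized x/y positions of one wrist."""
    dx, dy = wrist_xy[-1] - wrist_xy[0]                # net displacement
    if max(abs(dx), abs(dy)) < threshold:
        return "static"
    if abs(dx) > abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"                  # image y grows downward

print(hand_direction(np.array([[0.40, 0.70], [0.41, 0.60], [0.42, 0.50]])))  # -> "up"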

17. Whisper fine-tuning with time stamps

Mentored by: Raúl Sánchez, Cristóbal Pagán, Rosa Illán, Inés Olza

The fine-tuning methods available for the automatic speech recognition system Whisper (https://github.com/openai/whisper) lose the time stamps in the transcripts. Thus a transcription using time stamps for words or phrases is currently not possible with a fine-tuned Whisper model. Fine-tuning Whisper while retaining transcription time stamps would allow us to train smaller models for each language, increasing the effectiveness of processing. It would also be a major step toward time-aligning our own automatically generated transcripts rather than having to rely on TV subtitles, which are typically much less accurate than Whisper.
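For reference, a minimal sketch of the segment-level time stamps that stock Whisper already produces, and that a fine-tuned model should retain; the model size and file name are placeholders:

import whisper

model = whisper.load_model("small")                   # placeholder model size
result = model.transcribe("news_clip.wav")            # hypothetical file
for seg in result["segments"]:
    print(f'{seg["start"]:7.2f} {seg["end"]:7.2f}  {seg["text"]}')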

HuggingFace (https://huggingface.co/blog/fine-tune-whisper) organized a fine-tuning event, albeit without time stamps. HuggingFace libraries are currently undergoing modifications that could enable fine-tuning that incorporates time stamps.

There are various projects seeking to improve Whisper through different approaches, but we are not yet there:

   * WhisperX (https://github.com/m-bain/whisperX)

   * whisper-timestamped (https://github.com/linto-ai/whisper-timestamped)

   * whisper-finetuning (https://github.com/jumon/whisper-finetuning) <- an actual attempt to incorporate time stamps into the fine-tuning

   * Whisper Webui (https://gitlab.com/aadnk/whisper-webui)

Difficulty: medium. The anticipated duration for this project is 12 weeks, with a medium workload (175 hours). 

18. Automating the generation of CQPWeb + automatic API

Mentored by: Raúl Sánchez, Cristóbal Pagán, Rosa Illán, Inés Olza

We have a tool for corpus generation with CQPWeb using Docker Compose. Automation is complete except for the final stages: corpus configuration, adding mapping tables, index generation. Currently, those need to be carried out manually through the CQPweb interface. See the up-to-date documentation of the project so far: https://github.com/daedalusLAB/cqpweb-docker.

We need to automate those remaining actions, so that containers are generated in a fully automatic way when they are "built". We also seek to add a REST API to be able to perform remote searches in a corpus without accessing the web interface. The backend for this API should be developed using the Python CWB wrapper cwb-ccc (https://github.com/ausgerechnet/cwb-ccc). We will provide a corpus.txt with the vrts, a folder with the mapping table, and the script to carry out the cwb-encode/cwb-make. The build process should result in a functional CQPWeb with the sample corpus being usable, either via web (the usual case) or via the API.
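A hedged sketch of the REST layer only: a FastAPI endpoint that delegates to a helper function which would wrap cwb-ccc. The helper run_cqp_query is a hypothetical placeholder, not part of cwb-ccc's actual API:

from fastapi import FastAPI

app = FastAPI()

def run_cqp_query(corpus_name: str, cqp_query: str, limit: int) -> list:
    """Placeholder: wrap cwb-ccc here and return concordance lines."""
    raise NotImplementedError

@app.get("/corpora/{corpus_name}/query")
def query_corpus(corpus_name: str, q: str, limit: int = 50):
    # e.g. GET /corpora/NEWSCAPE/query?q=[lemma="climate"]&limit=10
    return {"corpus": corpus_name, "query": q,
            "results": run_cqp_query(corpus_name, q, limit)}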

Difficulty: medium. The anticipated duration for this project is 12 weeks, with a medium workload (175 hours). 

19. Semantic search in video datasets

Mentored by: Raúl Sánchez, Cristóbal Pagán, Javier Valenzuela

This project aims to develop a multimodal semantic search engine for a set of videos with their corresponding vrts (time-aligned transcripts). Using a model for image description, such as CLIP, we will generate the description of a limited number of frames per phrase/sentence or any other linguistic chunk that we choose to delimit. We will then generate an embedding combining the phrase and the frame descriptions. The embeddings will be stored in a vector database.

When performing semantic searches, we will produce the embedding of the phrase searched and find the embeddings in the database that are closest to the embedding searched. The metadata of stored embeddings will have the file name and moment of the video in NewsScape. The website will be able to list the N first moments in the videos that are most similar to the embedding searched.

The deep-learning models used in this project must be open-source; the backend should use Python FastAPI and the frontend Vue3.js. The following models could in principle be used, although they need to be expanded and evaluated (a minimal sketch of the search step follows the list):

- Embedding generation: all_datasets_v4_MiniLM-L6 (https://huggingface.co/flax-sentence-embeddings/all_datasets_v4_MiniLM-L6)

- Frame description with CLIP: CoCa (https://huggingface.co/spaces/laion/CoCa)

- Weaviate vector database to perform semantic/hybrid searches (https://weaviate.io)
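The minimal search sketch mentioned above, using the sentence-embedding model listed here and a plain in-memory index; in production the embeddings would live in Weaviate, and the stored entries and file names below are invented placeholders (any MiniLM-style sentence-embedding model would do if this particular checkpoint cannot be loaded directly by sentence-transformers):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("flax-sentence-embeddings/all_datasets_v4_MiniLM-L6")

# Each entry: phrase text + frame description, with file/time metadata (invented).
entries = [
    {"text": "the president waves to the crowd. a man in a suit on a stage",
     "file": "2023-01-05_0000_US_CNN_Example.mp4", "start": 123.4},
    {"text": "wildfire spreads across hills. orange flames and smoke at night",
     "file": "2023-01-06_0000_US_KABC_Example.mp4", "start": 88.0},
]
index = model.encode([e["text"] for e in entries], normalize_embeddings=True)

def search(query: str, k: int = 5):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q                        # cosine similarity on normalized vectors
    best = np.argsort(-scores)[:k]
    return [(float(scores[i]), entries[i]) for i in best]

print(search("politician greeting supporters"))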

BONUS: Generate a question/answer platform using the "augmented knowledge". In this case, embeddings should encompass sets of phrases/sentences plus frame descriptions. The metadata to be stored would be texts, videos, and time stamps. When a question is asked, embeddings + metadata of the N (e.g. 5) most relevant results should be downloaded. Answers are generated with a language model that uses the texts (metadata of the embeddings returned by the search) as context. Suggestions:

- Generate question/answering with this language model: https://huggingface.co/google/flan-t5-xxl . Evaluate the computational needs to execute it and, if they are excessive, downgrade to flan-t5-xl or flan-t5-l ....

- Use Haystack (https://github.com/deepset-ai/haystack) / Langchain (https://github.com/hwchase17/langchain) / IndexGPT (https://github.com/jerryjliu/gpt_index) to concatenate and fine-tune searches

Difficulty: medium. The anticipated duration for this project is 12 weeks, with a medium workload (175 hours). A longer duration and/or workload could be possible (up to 22 weeks and 350 hours), if adequately justified.

20. Automated Annotations using Semi-Supervised Clustering

Summary of Proposal:

This project aims to develop a pipeline to support semi-supervised learning (SSL) or clustering methods to quickly annotate a large unlabeled dataset.

Workflow:

The following set of tasks briefly introduces the scope of the pipeline:

(1) Begin by clustering or grouping similar features. Alternatively, if a smaller subset of manually annotated data (video/audio/text) is available, utilize it.

(2) Once clusters are formed, request users to annotate at the cluster level.

(3) These manual annotations would then be propagated to the instance level. 

(4) Finally, the automated annotations should contain clear labels to indicate the process of annotation (manual/semi-supervised).

The fundamental objective behind this project would be to alleviate the necessity for manual annotation through a rapid automated annotation process. 
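A hedged sketch of the cluster-then-label idea: cluster feature vectors, collect one label per cluster, and propagate it to every instance while recording that the label came from the semi-supervised step (the features, cluster count, and labels are placeholders):

import numpy as np
from sklearn.cluster import KMeans

features = np.random.rand(1000, 128)                 # placeholder embeddings
clusters = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(features)

cluster_labels = {0: "applause", 3: "speech"}        # labels supplied at cluster level

annotations = []
for i, c in enumerate(clusters):
    if c in cluster_labels:
        annotations.append({"instance": i, "label": cluster_labels[c],
                            "source": "semi-supervised"})
print(len(annotations), "instances labeled automatically")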

Bonus:

This pipeline could be adapted to integrate with the Red Hen Rapid Annotator, Multimodal Television Show Segmentation, or the SkelShop pipeline. This process requires a well-documented interchange format.

Difficulty: This project can be a medium-sized (175 hours) or a long (350 hours) project, depending on the level of integration of this pipeline with existing tools.

Contact Harshith Mohan Kumar <hiharshith18@gmail.com> for clarifications or details.

21. From Perception to Thought: Designing a hybrid AI pipeline from identifying object relationships in visual stimuli to structuring increasingly complex and abstract natural language expressions.

 Mentored by: Maria M. Hedblom


Summary of the project:  
Using visual datasets, the task is to develop a system that is able to translate visual stimulation into conceptual "thought."

The idea behind this project is to simulate the identification and cognitive application of image schemas. In cognitive science, image schemas are described as spatiotemporal relationships between objects, agents and environments that construct the conceptual skeleton of particular situations, objects and events. Learned through repeated exposure to particular relationships in early infancy, these concepts are formed as the generalisable patterns that are frequently used in natural language, analogical thinking, concept invention and event conceptualisation. Boldly speaking, these generalised patterns can be argued to construct the core structure of human experience and thought. For instance, the classic image schema CONTAINMENT encompasses the dynamic relationships of objects going in and out of other objects, and the static relationships of objects remaining inside or outside other objects. In early infancy, it is learned from repeated exposure to situations where containment takes a central role, and as children become increasingly cognitively mature, this generalized pattern can be used to reason analogically about similar situations, construct natural language constructions and even ground increasingly abstract notions in embodied experience. Inherently based on logical systems such as transitivity (e.g. how a contained object moves with the container object), these concepts are not possible to model with ML algorithms alone but need to be represented using a more rigid system of stored information in classic knowledge representation.

Building from the idea that children predominantly learn image-schematic patterns from perception, the task in this project is to design and implement the cognitive pipeline from visual stimulus to intelligent thought expressed using natural language based on these conceptual structures. It delimits the problem space by looking at a limited number of image schemas (and their encompassed relationships), with the idea of constructing a three-stage system that encompasses perception -> pattern generalization -> thought generation.

The system should be generalizable enough to take a visual input, identify a particular set of spatiotemporal object relationships, the image schemas,  and use these for some conceptually complex behaviour such as generating explanatory text, or even abstracting away from this into increasingly conceptual domains.

Main tasks:
The tasks in the project are to develop a system that is able, at least to some degree, to go through the following steps (a toy sketch of the bridge between steps 1 and 2 follows this list):
1) by using ML the system is to learn how to automatically detect and identify object relations in pictures (easier) and/or to identify changes in object relationships in videos (harder),
2) map these identified relationships to some form of "rigid," formal (aka not data-driven ML) representation to generalize away from the particular scenario and offer patterns that can be used to structure the conceptualisation of different situations and application areas,
3) finally, design an outlet for these generalized relationship patterns that can produce natural language expressions to describe the scene/event in the picture/video with the possibility to extend it into metaphoric and non-literal settings. 
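The toy sketch referenced above, bridging steps 1 and 2: two detected bounding boxes are turned into a symbolic CONTAINMENT assertion that a formal representation could consume (the box format and labels are hypothetical):

def contains(outer, inner):
    """Boxes are (x1, y1, x2, y2) in image coordinates."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def image_schema_facts(detections):
    """detections: list of (label, box) pairs from the vision stage."""
    facts = []
    for label_a, box_a in detections:
        for label_b, box_b in detections:
            if label_a != label_b and contains(box_a, box_b):
                facts.append(("CONTAINMENT", label_b, "inside", label_a))
    return facts

print(image_schema_facts([("cup", (50, 50, 200, 300)), ("spoon", (80, 60, 120, 200))]))
# -> [('CONTAINMENT', 'spoon', 'inside', 'cup')]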

A future direction for this system is to increase the complexity and include a greater number of image-schematic relationships in order, ultimately, to be able to embed it into embodied intelligent systems such as those demonstrated in cognitive robotics.

Difficulty: Medium
Recommended prerequisites: ML-based Computer Vision; Formal modelling (semantic web technologies/knowledge representation/logic); basic knowledge of cognitive architecture.  


22. Automatic E2E Speech Tagger for TV News

Summary of Proposal:

Tagging of text is generally done in a two-step process: an ASR system generates text for the speech input, and then a text tagger tags the text for entities. This project explores a one-step, non-autoregressive solution which can tag or transcribe each time-step in the speech input stream. Such a system can tag both NL events (e.g. the beginning and end of a person name, a credit card number, etc.) and speech events (e.g. speaker change) while also generating a transcript of what is being spoken.
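As a hedged sketch of the non-autoregressive idea, a small model that emits one tag per audio frame directly from acoustic features; the feature dimension, tag set, and architecture are placeholders:

import torch
import torch.nn as nn

NUM_TAGS = 5   # e.g. O, B-PERSON, I-PERSON, B-CARD, SPEAKER-CHANGE (placeholders)

class FrameTagger(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, num_tags=NUM_TAGS):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, hidden, kernel_size=5, padding=2)
        self.out = nn.Linear(hidden, num_tags)

    def forward(self, feats):                  # feats: (batch, frames, feat_dim)
        h = torch.relu(self.conv(feats.transpose(1, 2))).transpose(1, 2)
        return self.out(h)                     # (batch, frames, num_tags), emitted at once

logits = FrameTagger()(torch.randn(2, 1000, 80))
print(logits.shape)                            # torch.Size([2, 1000, 5])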

Requirements:
Experience with Python and PyTorch, and a deep interest in speech and language processing.


Contact Karan Singla <ksingla025@gmail.com> for clarifications or details. 

Please attach your resume when you reach out.