Red Hen Lab - GSoC 2024 Ideas

Watch this space. Red Hen Lab has applied to be a GSoC mentor organization for 2024. Red Hen Lab will respond to messages beginning 2024-02-22.



Red Hen Google Summer of Code 2024

redhenlab@gmail.com

See Guidelines for Red Hen Developers and Guidelines for Red Hen Mentors

How to Apply

Red Hen will consider proposals for only the following specific projects. Send your pre-proposals for any of these projects to the mentor listed. Your pre-proposal should be substantial, including a summary of the proposal, a review of the background of research on which you will rely, your goals and objectives, the methods you will use to accomplish your goals, and a timeline for performance and completion. Red Hen assumes that all projects will last the standard 12 weeks, but feel free to ask the mentors about other arrangements.

Possible projects

1. AI Chatbots

Mentor: Mark Turner (turner@case.edu) and team. Default but negotiable size: medium 175 hour project. Difficulty: MEDIUM-HARD. Coders would need to work inside the Case Western Reserve University High Performance Computing Center so as to have adequate hardware resources. Study https://sites.google.com/case.edu/techne-public-site/cwru-hpc-orientation. Skills include working inside the CWRU HPC (study the site for specifics), the ability to use standard Linux commands to interact with Red Hen Lab's vast data set, and standard techniques of machine learning for fine-tuning an open-source foundation model (such as Llama, OpenAssistant, etc.). For a guide to such machine learning skills, ask Turner for a copy of Copilots for Linguists: AI, Constructions, and Frames (Cambridge University Press, 2024).

1.1. Red Hen Lab AI chatbot. Red Hen Lab has a voluminous website at http://redhenlab.org and another at https://sites.google.com/case.edu/techne-public-site/home. It also has many publications, listed on those sites and at http://markturner.org. Interested people constantly write email to redhenlab@gmail.com asking questions about Red Hen and asking for guidance to details. For the most part, we do not have the time or resources to answer. The project is to train, refine, and deploy a chatbot on all things Red Hen that could hold conversations with interested parties, explaining subjects, giving directions to resources, etc. Of course, this chatbot must be open-source. The proposal would need to locate the entire training set of such items, devise a training method, do the training, and design a way of presenting the chatbot to the world. We are not interested in proposals asking us how to do this. Do not submit a proposal if you are unable to do the work to design in detail the creation of such a chatbot. Red Hen's role would be to mentor the project at a high level and to have some discussions about the compute resources needed.
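To make the expectation concrete, here is a minimal sketch, assuming a retrieval-augmented design (one possible approach among several, not a prescribed one), of grounding an open-source chatbot in Red Hen's web pages using sentence-transformers and FAISS. The model name and the two sample passages are illustrative placeholders; the real corpus would be scraped from the sites and publications listed above.

import faiss
from sentence_transformers import SentenceTransformer

# Illustrative passages; in practice, text scraped from redhenlab.org,
# the techne site, and markturner.org.
passages = [
    "Red Hen Lab is an international consortium for research on multimodal communication.",
    "NewsScape is Red Hen's archive of television news recordings.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
index = faiss.IndexFlatIP(384)                      # inner-product (cosine) index
index.add(embedder.encode(passages, normalize_embeddings=True))

def retrieve(question, k=2):
    # Return the k passages most relevant to the question; an open-source
    # LLM would receive these as context when composing its answer.
    query = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(query, k)
    return [passages[i] for i in ids[0]]

print(retrieve("What is NewsScape?"))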

1.2. Construction Grammar and FrameNet AI chatbot. Red Hen Lab has a strong interest in Construction Grammar (CxG) and FrameNet. You can learn about those areas of research by asking ChatGPT or Gemini (formerly Bard) or just by searching the internet, but you can also get started by asking the head mentor (turner@case.edu) to send you a copy of the new book from Cambridge University Press, described at http://copilotsforlinguists.org. We would like to create a sophisticated chatbot trained on research publications in Construction Grammar and FrameNet. Part of this training set would include the materials in FrameNet 1.7 (see https://framenet.icsi.berkeley.edu/). FrameNet includes, for many frames, an .xml file presenting the details of the frame. For example, the .xml file for the Cause_motion frame runs to 422 lines, beginning:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<?xml-stylesheet type="text/xsl" href="frame.xsl"?>

<frame cBy="ChW" cDate="02/07/2001 04:12:10 PST Wed" name="Cause_motion" ID="55" xsi:schemaLocation="../schema/frame.xsd" xmlns="http://framenet.icsi.berkeley.edu" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <definition>&lt;def-root&gt;An &lt;fen&gt;Agent&lt;/fen&gt; causes a &lt;fen&gt;Theme&lt;/fen&gt; to move from a &lt;fen&gt;Source&lt;/fen&gt;, along a &lt;fen&gt;Path&lt;/fen&gt;, to a &lt;fen&gt;Goal&lt;/fen&gt;.  Different members of the frame emphasize the trajectory to different degrees, and a given instance of the frame will usually leave some of the &lt;fen&gt;Source&lt;/fen&gt;, &lt;fen&gt;Path&lt;/fen&gt; and/or &lt;fen&gt;Goal&lt;/fen&gt; implicit. The completion of motion is not required (unlike the Placing frame, see below), although individual sentences annotated with this frame may emphasize the &lt;fen&gt;Goal&lt;/fen&gt;.  

&lt;ex&gt;&lt;/ex&gt;

This frame is very broad and contains several different kinds of words that refer to causing motion.  Some words in this frame do not emphasize the &lt;fen&gt;Manner&lt;/fen&gt;/&lt;fen&gt;Means&lt;/fen&gt; of causing the motion (transfer.v, move.v).  For many of the others (cast.v, throw.v, chuck.v, etc.), the &lt;fen&gt;Agent&lt;/fen&gt; has control of the &lt;fen&gt;Theme&lt;/fen&gt; only at the &lt;fen&gt;Source&lt;/fen&gt; of motion, and does not experience overall motion.  For others (e.g. drag.v, push.v, shove.v, etc.) the &lt;fen&gt;Agent&lt;/fen&gt; has control of the &lt;fen&gt;Theme&lt;/fen&gt; throughout the motion; for these words, the &lt;fen&gt;Theme&lt;/fen&gt; is resistant to motion due to some friction with the surface along which they move.  

&lt;ex&gt;&lt;fex name="Agent"&gt;She&lt;/fex&gt; &lt;t&gt;threw&lt;/t&gt; &lt;fex name="Theme"&gt;her shoes&lt;/fex&gt; &lt;fex name="Goal"&gt;into the dryer&lt;/fex&gt; .&lt;/ex&gt;

&lt;ex&gt;&lt;fex name="Agent"&gt;The mechanic&lt;/fex&gt; &lt;t&gt;dragged&lt;/t&gt; &lt;fex name="Theme"&gt;the jack&lt;/fex&gt; &lt;fex name="Source"&gt;out from under the car&lt;/fex&gt; .&lt;/ex&gt;

&lt;ex&gt;&lt;fex name="Agent"&gt;We&lt;/fex&gt; will &lt;t&gt;move&lt;/t&gt; &lt;fex name="Theme"&gt;the sofa&lt;/fex&gt; &lt;fex name="Source"&gt;out of the room&lt;/fex&gt; &lt;fex name="Path"&gt;through the french doors&lt;/fex&gt;, &lt;fex name="Path"&gt;down the stairs&lt;/fex&gt;, and &lt;fex name="Goal"&gt;onto the sidewalk&lt;/fex&gt; .&lt;/ex&gt;

&lt;ex&gt;&lt;/ex&gt;

&lt;ex&gt;&lt;/ex&gt;

Have a look at https://framenet.icsi.berkeley.edu/frameIndex to get oriented.

The project is to train, refine, and deploy a chatbot on all things having to do with Construction Grammar and FrameNet that could hold conversations with interested parties, explaining subjects, giving directions to resources, etc. Of course, this chatbot must be open-source. The proposal would need to locate the entire training set of such items, devise a training method, do the training, and design a way of presenting the chatbot to the world. We are not interested in proposals asking us how to do this. Do not submit a proposal if you are unable to do the work in advance to design in detail the creation of such a chatbot. Red Hen's role would be to mentor the project at a high level and to have some discussions about the compute resources needed.
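As a small illustration of working with these files, the sketch below (one possible preprocessing step we assume for illustration, not a prescribed method) parses a frame's .xml into plain text suitable for a training corpus. The file path is a placeholder; the namespace is the one visible in the Cause_motion excerpt above.

import re
import xml.etree.ElementTree as ET

# FrameNet's XML namespace, as declared in the frame files.
NS = {"fn": "http://framenet.icsi.berkeley.edu"}

def frame_text(path):
    # Extract the frame name and its prose definition from one frame file.
    root = ET.parse(path).getroot()
    definition = root.find("fn:definition", NS).text
    # The definition embeds markup (<def-root>, <fen>, <ex>, ...); strip it.
    clean = re.sub(r"<[^>]+>", " ", definition)
    return "Frame %s: %s" % (root.get("name"), " ".join(clean.split()))

print(frame_text("frame/Cause_motion.xml"))  # placeholder path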

2. Frame Blending by LLMs

Mentor: Wenyue Suzie Xi, wenyue.sxi@gmail.com, and team. Default but negotiable size: medium 175 hour project. Difficulty: MEDIUM-HARD. Coders would need to work remotely inside the Case Western Reserve University High Performance Computing Center so as to have adequate hardware resources. Study https://sites.google.com/case.edu/techne-public-site/cwru-hpc-orientation. Other skills: basic *nix abilities, Python.


This project intends to train and fine-tune open-source LLMs with FrameNet data to generate frame blending examples, using techniques such as prompt engineering, chain-of-thought prompting, and causal inference. After inputting the FrameNet XML data for the Cause_motion, Judgment, and Communication frames, ChatGPT generated the following examples of possible frame blending cases:


"The judge's ruling pushed the defendant towards a new trial."

"Her criticism drove the conversation into deeper introspection."

"The leader's decision propelled the company towards innovative strategies."

"His refusal nudged the team away from the conventional approach."

"The teacher's encouragement steered the student towards academic excellence."

"The critic's harsh words thrust the artist into the spotlight of controversy."

"The mentor's advice guided her thoughts towards a more positive outlook."

"The jury's verdict sent the community into a state of unrest."

"The coach's strategy shifted the team's focus towards defensive plays."

"The therapist's insights led the patient into a journey of self-discovery.”


The tasks include developing the project infrastructure, implementing and examining different methods of prompt engineering, defining the evaluation metrics, and evaluating the performance of various methods and models with statistical results. This project requires both a literature review to explore methods and hands-on coding to implement them, plus statistical experiments to evaluate the effectiveness of the proposed methods. It will be valuable for those who are interested in Large Language Models and Natural Language Processing and have solid coding skills. The ideal proposal should demonstrate your understanding of the FrameNet dataset and of multiple LLMs (their advantages and limitations), and it is also helpful to read (and potentially implement some simple tasks) about chain-of-thought prompting, prompt engineering, and frame blending. This project is an open-ended exploratory process, and it is exciting to push forward the study of frame blending in this era of LLMs with collective effort.
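For orientation, here is a minimal sketch, assuming the NLTK FrameNet corpus (which ships FrameNet 1.7) and a prompt-based approach, of building a chain-of-thought style prompt that asks an LLM to blend two frames. The prompt wording is an illustrative assumption, not a validated method.

import nltk
from nltk.corpus import framenet as fn

nltk.download("framenet_v17", quiet=True)

def frame_summary(name):
    # Summarize a frame: its name, definition, and core frame elements.
    frame = fn.frame(name)
    core = [fe for fe, data in frame.FE.items() if data.coreType == "Core"]
    return (
        f"Frame: {frame.name}\n"
        f"Definition: {frame.definition}\n"
        f"Core FEs: {', '.join(core)}"
    )

def blending_prompt(frame_a, frame_b, n=5):
    # A chain-of-thought style request for blended example sentences.
    return (
        frame_summary(frame_a) + "\n\n" + frame_summary(frame_b) + "\n\n"
        + f"Think step by step: identify shared structure between the two "
        f"frames above, then write {n} sentences in which a lexical unit "
        f"of {frame_a} is used to evoke {frame_b}, labeling the frame "
        f"elements you draw on."
    )

# The resulting prompt would be sent to an open-source LLM such as Llama 2.
print(blending_prompt("Cause_motion", "Judgment"))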


The following are some related references: 

FrameNet https://framenet.icsi.berkeley.edu/framenet_data 

ChatGPT https://openai.com/blog/chatgpt 

Llama2 https://huggingface.co/docs/transformers/main/model_doc/llama2 

Chain-of-thought  https://arxiv.org/abs/2201.11903 

Prompt-engineering https://github.com/thunlp/PromptPapers

3. Super Rapid Annotator - Multimodal vision tool to annotate videos

Mentor: Raúl Sánchez Sánchez, raul@um.es, and team. Default but negotiable size: medium 175 hour project. Difficulty: MEDIUM

Objective

Develop a system that uses a multimodal vision model to process videos and return JSON output containing the annotations.

The system will have several parts.

Example

The tool will work thus:

Input:

Annotate this video <video.mp4> with this schema:


[
  {
    "description": "Is the person in the image standing up?",
    "value": "standup"
  },
  {
    "description": "Can you see the hands of the person?",
    "value": "hands"
  },
  {
    "description": "Is it inside or outside?",
    "value": "inside"
  }
]

Output:

{

  "standup" : "true",

  "hands": "true",

  "inside": "false"

}

We are open to suggestions, but the initial idea is to use a multimodal vision model together with a JSON parser/generator, or to fine-tune a multimodal model to output JSON as its response.

Ideas and links for multimodal models:

LLaVA

https://github.com/SkunkworksAI/BakLLaVA

https://github.com/PKU-YuanGroup/MoE-LLaVA

https://github.com/PKU-YuanGroup/Video-LLaVA

Video-Con

https://github.com/THUDM/CogVLM

Ideas and links for JSON parsing/output:

LangChain

https://github.com/eyurtsev/kor

https://github.com/1rgs/jsonformer

https://github.com/tanchongmin/strictjson
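As one illustration of the JSON-constrained route, the sketch below uses jsonformer (linked above) to force a language model's answers into the annotation schema. The model name is a placeholder, and in the full tool the prompt would carry the multimodal model's description of the video rather than a hand-written one.

from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

model_name = "databricks/dolly-v2-3b"  # placeholder open-source model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The annotation schema from the example above, expressed as JSON Schema.
schema = {
    "type": "object",
    "properties": {
        "standup": {"type": "boolean"},
        "hands": {"type": "boolean"},
        "inside": {"type": "boolean"},
    },
}

# Hypothetical description produced by the multimodal vision model.
prompt = ("Video description: a person stands outdoors, hands visible. "
          "Answer the annotation questions.")
result = Jsonformer(model, tokenizer, schema, prompt)()
print(result)  # e.g. {'standup': True, 'hands': True, 'inside': False}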

4. Red Hen TV News Multilingual Chat - LLM 

Mentor: Sridhar Vanga and team (sridharvanga2001@gmail.com, saby.ghosal@gmail.com, karan@whissle.ai).

Default but negotiable size: medium 175 hour project. 

Difficulty: MEDIUM-HARD (we will only consider exceptional proposals on this)

Description: Red Hen has access to a large news archive, processed with speech and natural language processing pipelines over previous Google Summer of Code and collaborative efforts. We propose to use our rich TV news data to build an LLM that can answer questions about the world, and to make the model accessible to a large open-source audience. This conversational news LLM can then be paired with other services to make automated bots.

We will soon add some data samples, formats, etc., which will help you write a detailed, executable proposal before the coding period starts.
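For orientation, here is a minimal sketch, assuming a LoRA-style parameter-efficient approach (one plausible route, not a requirement), of fine-tuning an open-source LLM on news question-answer pairs. The base model, file name, and hyperparameters are illustrative placeholders.

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # placeholder; any open causal LM works
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(base),
    LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16,
               target_modules=["q_proj", "v_proj"]))

def tokenize(example):
    # "text" would hold one formatted "Question: ... Answer: ..." pair.
    return tokenizer(example["text"], truncation=True, max_length=512)

data = load_dataset("json", data_files="news_qa.jsonl")["train"].map(tokenize)

Trainer(model=model,
        args=TrainingArguments("news-llm", per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=data,
        # mlm=False makes the collator set labels for causal LM training.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
        ).train()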

Skills required:

Proven experience with fine-tuning open-source LLMs. A hands-on interview will be conducted to evaluate the depth and breadth of your knowledge of LLM fine-tuning.

Passion to drive the project, to make a proposal that is achievable over the summer, and to meet the set milestones.

5. Visual-Aware E2E Speech Recognition

Mentor: Karan Singla (karan@whissle.ai). Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM-HARD

Description: We want to push a baseline for E2E speech recognition that incorporates visual information into the generated output for improved, rich transcription.

Skills required:

Familiarity with visual extraction tools and methods.

Experience with and understanding of fine-tuning E2E ASR systems (e.g., Conformer, Citrinet, Wav2Vec2, or ESPnet models).
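As a starting point, here is a minimal baseline sketch of CTC transcription with a pretrained Wav2Vec2 model, the kind of E2E ASR system the project would extend with visual features. The model name and audio file are illustrative assumptions.

import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Load and resample a clip to the 16 kHz mono input the model expects.
waveform, sr = torchaudio.load("clip.wav")  # placeholder input file
waveform = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, frames, vocab)
print(processor.batch_decode(logits.argmax(dim=-1))[0])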

6. Modeling Wayfinding

Possible mentors: Mark Turner (turner@case.edu) and Francis Steen (profsteen@gmail.com).

Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM-HARD

Description: Develop a mathematical and computational model of human decision-making using Wayfinding theory, in which individuals navigate through a complex space of possible actions. Your project should model how individuals make decisions when faced with limited time and cognitive resources, leading to choices that are formally sub-optimal yet resource-rational. For example, to develop a formal and computational model that captures these dynamics, you may begin by formalizing a "choice" functional with sub-functionals representing priorities. Each priority sub-functional can then be weighted by an evolving activation function that activates a subset of priority sub-functionals at each timestep to simulate changing priorities. The application domain can be various scenarios such as market behavior, communicative interactions, or animal foraging.
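A toy sketch of that choice functional, under assumptions of our own choosing: each priority is a scoring function over a one-dimensional action space, and a random time-varying activation vector gates which priorities are live at each step, yielding resource-rational rather than optimal choices. All functions and weights are illustrative.

import numpy as np

rng = np.random.default_rng(0)
actions = np.linspace(0.0, 1.0, 50)          # a 1-D space of possible actions

priorities = [
    lambda a: -np.abs(a - 0.8),              # e.g. reach a goal near 0.8
    lambda a: -a,                            # e.g. minimize effort
    lambda a: -np.abs(a - 0.3),              # e.g. stay near safety at 0.3
]

def choose(step):
    # Activation: a random subset of priorities is "live" at each timestep,
    # simulating shifting attention under limited cognitive resources.
    active = rng.random(len(priorities)) < 0.7
    weights = active * rng.random(len(priorities))
    scores = sum(w * p(actions) for w, p in zip(weights, priorities))
    return actions[int(np.argmax(scores))]   # resource-rational, not optimal

trajectory = [choose(t) for t in range(10)]
print(trajectory)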

Expected outcome: A working model in Python or C++.

Skills required: Some background in computational modeling and mathematics. Preferred platform: Google Colaboratory.


7. Computational Wave Function for Decision-Making

Possible mentors: Paavo Pylkkänen (paavo.pylkkanen@helsinki.fi), Francis Steen (profsteen@gmail.com), and colleagues

Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM-HARD

Description:  Decision-making involves the simultaneous consideration of multiple alternatives. Develop a computational model of elementary decision-making using Schrödinger's wave equation, leveraging the idea that quantum wave functions have multiple valid latent solutions and that some to-be-defined process leads to the collapse of the wave function into a single outcome. Model the simplest conditions required to map decision-making onto wave equations, so that they can function as computational engines. The application domain can be physical navigation of a simple organism towards a food source, a perceptual process, or an artificial life process. 
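A toy sketch of the idea, under assumptions of our own: evolve a 1-D wave packet between two potential wells standing in for two alternatives using the split-step Fourier method, then "collapse" by sampling an outcome from |psi|^2. Units (hbar = m = 1) and all parameters are illustrative.

import numpy as np

N, L, dt = 512, 40.0, 0.01
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

# Two attractive wells stand in for two alternatives (e.g. food sources).
V = -2.0 * (np.exp(-(x - 5) ** 2) + np.exp(-(x + 5) ** 2))

psi = np.exp(-x ** 2).astype(complex)     # organism's initial state at x = 0
psi /= np.sqrt(np.sum(np.abs(psi) ** 2))

for _ in range(2000):                     # split-step: half V, full T, half V
    psi *= np.exp(-0.5j * V * dt)
    psi = np.fft.ifft(np.exp(-0.5j * k ** 2 * dt) * np.fft.fft(psi))
    psi *= np.exp(-0.5j * V * dt)

prob = np.abs(psi) ** 2
prob /= prob.sum()
decision = np.random.default_rng().choice(x, p=prob)  # the "collapse"
print("decision made at x =", decision)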

Expected outcome: A working computational toy model of quantum decision-making that can serve as a platform for iterative improvements and elaborations.

Skills required: A basic familiarity with Schrödinger's wave equation and preferably some experience with computational modeling of dynamic systems


8. Speech and Language Processing for a multimodal corpus of Farsi 

Mentor: Peter Uhrig (peter.uhrig@fau.de) and colleagues. Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM

Red Hen would like to build a multimodal corpus of Farsi. As a first step, this will be based on media data captured from public broadcasts.

The entire process will be based on Red Hen's YouTube pipeline, i.e. data acquisition will be based on yt-dlp. For the many videos that come without subtitles, we are going to run Whisper. We then need to determine the most suitable NLP pipeline by researching questions such as "Which system works best for spoken Persian data?" and "Do we need punctuation restoration for better results?" For videos from sites other than YouTube, we will need to adapt the metadata extraction.

This project is to create the full pipeline, which takes as its input a list of video URLs and creates a working multimodal corpus in CQPweb. (If you are interested, write to Peter Uhrig about how to access an English multimodal corpus in CQPweb to play around with.)
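A minimal sketch of the first two pipeline stages (assuming yt-dlp and openai-whisper are installed; the URL and model size are placeholders): download a video's audio with yt-dlp, then transcribe it with Whisper when no subtitles are available.

import subprocess
import whisper

url = "https://www.youtube.com/watch?v=EXAMPLE"  # placeholder URL
# Extract the audio track as WAV for transcription.
subprocess.run(["yt-dlp", "-x", "--audio-format", "wav",
                "-o", "clip.%(ext)s", url], check=True)

model = whisper.load_model("medium")             # multilingual model
result = model.transcribe("clip.wav", language="fa")
for seg in result["segments"]:                   # timed segments for the corpus
    print(f"{seg['start']:.2f}\t{seg['end']:.2f}\t{seg['text']}")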

Required skills: Fluency both in Bash scripting and in Python. Familiarity with NLP tools and pipelines, ideally for Farsi. Ability to read Farsi is a strong plus. 

9. Detection of Intonational Units

Mentor: Peter Uhrig (peter.uhrig@fau.de). Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM-HARD

There are two potential projects here: a medium-sized project that works on the detection of intonational phrases based on the Santa Barbara Corpus (and possibly further annotations), and a large project that attempts to replicate the AuToBI system with modern machine learning methods (see Andrew Rosenberg's PhD thesis for details).

Required skills: Strong machine learning skills and experience with audio processing. The methods used in AuToBI were state-of-the-art more than 15 years ago; with the advent of large pre-trained models, we expect to be able to improve on that baseline. You need a good understanding of annotation and the ability to work with obscure file formats and to extract relevant information from them, i.e. good data processing skills.
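One plausible starting point (an assumption for illustration, not the required method) is to treat boundary detection as frame-level sequence labeling over embeddings from a large pre-trained speech model. The sketch below wires a small classification head onto wav2vec 2.0; the dummy audio stands in for Santa Barbara Corpus clips.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class BoundaryTagger(nn.Module):
    def __init__(self, encoder_name="facebook/wav2vec2-base"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(encoder_name)
        # Two classes per frame: boundary vs. non-boundary.
        self.head = nn.Linear(self.encoder.config.hidden_size, 2)

    def forward(self, waveform):
        # waveform: (batch, samples) at 16 kHz
        hidden = self.encoder(waveform).last_hidden_state  # (B, frames, H)
        return self.head(hidden)                           # boundary logits

model = BoundaryTagger()
logits = model(torch.randn(1, 16000))   # one second of dummy audio
print(logits.shape)                     # (1, ~49 frames, 2)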

What kind of Red Hen are you?

More About Red Hen

Our mentors

Stephanie Wood, University of Oregon
Vaibhav Gupta, IIIT Hyderabad
Inés Olza, University of Navarra (https://sites.google.com/site/inesolza/home)
Cristóbal Pagán Cánovas, University of Murcia (https://sites.google.com/site/cristobalpagancanovas/)
Anna Wilson (Anna Pleshakova), University of Oxford
Heiko Schuldt, University of Basel
Gulshan Kumar, IIIT Hyderabad
Karan Singla, Whissle-AI
Peter Uhrig, FAU Erlangen-Nürnberg (https://www.anglistik.phil.fau.de/staff/uhrig/)
Grace Kim, UCLA
Tiago Torrent, Federal University of Juiz de Fora
José Fonseca, Polytechnic Higher Education Institute of Guarda
Ahmed Ismail, Cairo University & DataPlus
Leonardo Impett, EPFL & Bibliotheca Hertziana
Frankie Robertson, GSoC student 2020
Wenyue Xi, Smith College, GSoC student 2020
Maria M. Hedblom (www.mariamhedblom.com)
Sumit Vohra, NSIT, Delhi University
Swadesh Jana
Oliver Czulo, Leipzig University
Marcelo Viridiano, Federal University of Juiz de Fora
Ely Matos, Federal University of Juiz de Fora
Arthur Lorenzi, Federal University of Juiz de Fora
Fred Belcavello, Federal University of Juiz de Fora
Mark Williams, Dartmouth College
John Bell, Dartmouth College
Nitesh Mahawar
Raúl Sánchez, University of Murcia
Sabyasachi Ghosal, Bosch Global Software Technologies, Bengaluru