Red Hen Lab - GSoC 2024 Ideas

Watch this space. Red Hen Lab has applied to be a GSoC mentor organization for 2024. Red Hen Lab will respond to messages beginning 2024-02-22.



Red Hen Google Summer of Code 2024

redhenlab@gmail.com

See Guidelines for Red Hen Developers and Guidelines for Red Hen Mentors

How to Apply

Red Hen will consider proposals for only the following specific projects. Send your pre-proposals for any of these projects to the mentor listed. Your pre-proposal should be substantial, including a summary of the proposal, a review of the background of research on which you will rely, your goals and objectives, the methods you will use to accomplish your goals, and a timeline for performance and completion. Red Hen assumes that all projects will last the standard 12 weeks, but feel free to ask the mentors about other arrangements.

Possible projects

1. AI Chatbots

Mentor: Mark Turner (turner@case.edu) and team. Default but negotiable size: medium 175 hour project. Difficulty: MEDIUM-HARD. Coders would need to work inside the Case Western Reserve University High Performance Computing Center so as to have adequate hardware resources. Study https://sites.google.com/case.edu/techne-public-site/cwru-hpc-orientation. Skills include working inside the CWRU HPC (study the site for specifics), the ability to use standard Linux commands to interact with Red Hen Lab's vast data set, and standard techniques of machine learning for fine-tuning an open-source foundation model (such as Llama, OpenAssistant, etc.). For a guide to such machine learning skills, ask Turner for a copy of Copilots for Linguists: AI, Constructions, and Frames (Cambridge University Press, 2024).

1.1. Red Hen Lab AI chatbot. Red Hen Lab has a voluminous website at http://redhenlab.org and another at https://sites.google.com/case.edu/techne-public-site/home. It also has many publications, listed on those sites and at http://markturner.org. Interested people constantly write email to redhenlab@gmail.com asking questions about Red Hen and asking for guidance to details. For the most part, we do not have the time or resources to answer. The project is to train, refine, and deploy a chatbot on all things Red Hen that could hold conversations with interested parties, explaining subjects, giving directions to resources, etc. Of course, this chatbot must be open-source. The proposal would need to locate the entire training set of such items, devise a training method, do the training, and design a way of presenting the chatbot to the world. We are not interested in proposals asking us how to do this. Do not submit a proposal if you are unable to do the work to design in detail the creation of such a chatbot. Red Hen's role would be to mentor the project at a high level and to have some discussions about the compute resources needed.
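To make the expectation concrete, here is a minimal sketch, assuming a retrieval-augmented design (one possible approach among several, not a prescribed one), of grounding an open-source chatbot in Red Hen's web pages using sentence-transformers and FAISS. The model name and the two sample passages are illustrative placeholders; the real corpus would be scraped from the sites and publications listed above.

import faiss
from sentence_transformers import SentenceTransformer

# Illustrative passages; in practice, text scraped from redhenlab.org,
# the techne site, and markturner.org.
passages = [
    "Red Hen Lab is an international consortium for research on multimodal communication.",
    "NewsScape is Red Hen's archive of television news recordings.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
index = faiss.IndexFlatIP(384)                      # inner-product (cosine) index
index.add(embedder.encode(passages, normalize_embeddings=True))

def retrieve(question, k=2):
    # Return the k passages most relevant to the question; an open-source
    # LLM would receive these as context when composing its answer.
    query = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(query, k)
    return [passages[i] for i in ids[0]]

print(retrieve("What is NewsScape?"))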

1.2. Construction Grammar and FrameNet AI chatbot. Red Hen Lab has a strong interest in Construction Grammar (CxG) and FrameNet. You can learn about those areas of research by asking ChatGPT or Gemini (formerly Bard) or just by searching the internet, but you can also get started by asking the head mentor (turner@case.edu) to send you a copy of the new book from Cambridge University Press, described at http://copilotsforlinguists.org. We would like to create a sophisticated chatbot trained on research publications in Construction Grammar and FrameNet. Part of this training set would include the materials in FrameNet 1.7 (see https://framenet.icsi.berkeley.edu/). FrameNet includes, for many frames, an .xml file presenting the details of the frame. For example, the .xml file for the Cause_motion frame runs to 422 lines, beginning:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<?xml-stylesheet type="text/xsl" href="frame.xsl"?>

<frame cBy="ChW" cDate="02/07/2001 04:12:10 PST Wed" name="Cause_motion" ID="55" xsi:schemaLocation="../schema/frame.xsd" xmlns="http://framenet.icsi.berkeley.edu" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <definition>&lt;def-root&gt;An &lt;fen&gt;Agent&lt;/fen&gt; causes a &lt;fen&gt;Theme&lt;/fen&gt; to move from a &lt;fen&gt;Source&lt;/fen&gt;, along a &lt;fen&gt;Path&lt;/fen&gt;, to a &lt;fen&gt;Goal&lt;/fen&gt;.  Different members of the frame emphasize the trajectory to different degrees, and a given instance of the frame will usually leave some of the &lt;fen&gt;Source&lt;/fen&gt;, &lt;fen&gt;Path&lt;/fen&gt; and/or &lt;fen&gt;Goal&lt;/fen&gt; implicit. The completion of motion is not required (unlike the Placing frame, see below), although individual sentences annotated with this frame may emphasize the &lt;fen&gt;Goal&lt;/fen&gt;.  

&lt;ex&gt;&lt;/ex&gt;

This frame is very broad and contains several different kinds of words that refer to causing motion.  Some words in this frame do not emphasize the &lt;fen&gt;Manner&lt;/fen&gt;/&lt;fen&gt;Means&lt;/fen&gt; of causing the motion (transfer.v, move.v).  For many of the others (cast.v, throw.v, chuck.v, etc.), the &lt;fen&gt;Agent&lt;/fen&gt; has control of the &lt;fen&gt;Theme&lt;/fen&gt; only at the &lt;fen&gt;Source&lt;/fen&gt; of motion, and does not experience overall motion.  For others (e.g. drag.v, push.v, shove.v, etc.) the &lt;fen&gt;Agent&lt;/fen&gt; has control of the &lt;fen&gt;Theme&lt;/fen&gt; throughout the motion; for these words, the &lt;fen&gt;Theme&lt;/fen&gt; is resistant to motion due to some friction with the surface along which they move.  

&lt;ex&gt;&lt;fex name="Agent"&gt;She&lt;/fex&gt; &lt;t&gt;threw&lt;/t&gt; &lt;fex name="Theme"&gt;her shoes&lt;/fex&gt; &lt;fex name="Goal"&gt;into the dryer&lt;/fex&gt; .&lt;/ex&gt;

&lt;ex&gt;&lt;fex name="Agent"&gt;The mechanic&lt;/fex&gt; &lt;t&gt;dragged&lt;/t&gt; &lt;fex name="Theme"&gt;the jack&lt;/fex&gt; &lt;fex name="Source"&gt;out from under the car&lt;/fex&gt; .&lt;/ex&gt;

&lt;ex&gt;&lt;fex name="Agent"&gt;We&lt;/fex&gt; will &lt;t&gt;move&lt;/t&gt; &lt;fex name="Theme"&gt;the sofa&lt;/fex&gt; &lt;fex name="Source"&gt;out of the room&lt;/fex&gt; &lt;fex name="Path"&gt;through the french doors&lt;/fex&gt;, &lt;fex name="Path"&gt;down the stairs&lt;/fex&gt;, and &lt;fex name="Goal"&gt;onto the sidewalk&lt;/fex&gt; .&lt;/ex&gt;

&lt;ex&gt;&lt;/ex&gt;

&lt;ex&gt;&lt;/ex&gt;

Have a look at https://framenet.icsi.berkeley.edu/frameIndex to get oriented.

The project is to train, refine, and deploy a chatbot on all things having to do with Construction Grammar and FrameNet that could hold conversations with interested parties, explaining subjects, giving directions to resources, etc. Of course, this chatbot must be open-source. The proposal would need to locate the entire training set of such items, devise a training method, do the training, and design a way of presenting the chatbot to the world. We are not interested in proposals asking us how to do this. Do not submit a proposal if you are unable to do the work in advance to design in detail the creation of such a chatbot. Red Hen's role would be to mentor the project at a high level and to have some discussions about the compute resources needed.
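As a small illustration of working with these files, the sketch below (one possible preprocessing step we assume for illustration, not a prescribed method) parses a frame's .xml into plain text suitable for a training corpus. The file path is a placeholder; the namespace is the one visible in the Cause_motion excerpt above.

import re
import xml.etree.ElementTree as ET

# FrameNet's XML namespace, as declared in the frame files.
NS = {"fn": "http://framenet.icsi.berkeley.edu"}

def frame_text(path):
    # Extract the frame name and its prose definition from one frame file.
    root = ET.parse(path).getroot()
    definition = root.find("fn:definition", NS).text
    # The definition embeds markup (<def-root>, <fen>, <ex>, ...); strip it.
    clean = re.sub(r"<[^>]+>", " ", definition)
    return "Frame %s: %s" % (root.get("name"), " ".join(clean.split()))

print(frame_text("frame/Cause_motion.xml"))  # placeholder path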

2. Frame Blending by LLMs

Mentor: Wenyue Suzie Xi, wenyue.sxi@gmail.com, and team. Default but negotiable size: medium 175 hour project. Difficulty: MEDIUM-HARD. Coders would need to work remotely inside the Case Western Reserve University High Performance Computing Center so as to have adequate hardware resources. Study https://sites.google.com/case.edu/techne-public-site/cwru-hpc-orientation. Other skills: basic *nix abilities, Python.


This project intends to train and fine-tune open-source LLMs with FrameNet data to generate frame blending examples, using techniques such as prompt engineering, chain-of-thought prompting, and causal inference. After inputting the FrameNet XML data for the Cause_motion, Judgment, and Communication frames, ChatGPT generated the following examples of possible frame blending cases:


"The judge's ruling pushed the defendant towards a new trial."

"Her criticism drove the conversation into deeper introspection."

"The leader's decision propelled the company towards innovative strategies."

"His refusal nudged the team away from the conventional approach."

"The teacher's encouragement steered the student towards academic excellence."

"The critic's harsh words thrust the artist into the spotlight of controversy."

"The mentor's advice guided her thoughts towards a more positive outlook."

"The jury's verdict sent the community into a state of unrest."

"The coach's strategy shifted the team's focus towards defensive plays."

"The therapist's insights led the patient into a journey of self-discovery.”


The tasks include developing the project infrastructure, implementing and examining different methods of prompt engineering, defining the evaluation metrics, and evaluating the performance of various methods and models with statistical results. This project requires both a literature review to explore methods and hands-on coding to implement them, plus statistical experiments to evaluate the effectiveness of the proposed methods. It will be valuable for those who are interested in Large Language Models and Natural Language Processing and have solid coding skills. The ideal proposal should demonstrate your understanding of the FrameNet dataset and of multiple LLMs (their advantages and limitations), and it is also helpful to read (and potentially implement some simple tasks) about chain-of-thought prompting, prompt engineering, and frame blending. This project is an open-ended exploratory process, and it is exciting to push forward the study of frame blending in this era of LLMs with collective effort.
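For orientation, here is a minimal sketch, assuming the NLTK FrameNet corpus (which ships FrameNet 1.7) and a prompt-based approach, of building a chain-of-thought style prompt that asks an LLM to blend two frames. The prompt wording is an illustrative assumption, not a validated method.

import nltk
from nltk.corpus import framenet as fn

nltk.download("framenet_v17", quiet=True)

def frame_summary(name):
    # Summarize a frame: its name, definition, and core frame elements.
    frame = fn.frame(name)
    core = [fe for fe, data in frame.FE.items() if data.coreType == "Core"]
    return (
        f"Frame: {frame.name}\n"
        f"Definition: {frame.definition}\n"
        f"Core FEs: {', '.join(core)}"
    )

def blending_prompt(frame_a, frame_b, n=5):
    # A chain-of-thought style request for blended example sentences.
    return (
        frame_summary(frame_a) + "\n\n" + frame_summary(frame_b) + "\n\n"
        + f"Think step by step: identify shared structure between the two "
        f"frames above, then write {n} sentences in which a lexical unit "
        f"of {frame_a} is used to evoke {frame_b}, labeling the frame "
        f"elements you draw on."
    )

# The resulting prompt would be sent to an open-source LLM such as Llama 2.
print(blending_prompt("Cause_motion", "Judgment"))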


The following are some related references: 

FrameNet https://framenet.icsi.berkeley.edu/framenet_data 

ChatGPT https://openai.com/blog/chatgpt 

Llama2 https://huggingface.co/docs/transformers/main/model_doc/llama2 

Chain-of-thought  https://arxiv.org/abs/2201.11903 

Prompt-engineering https://github.com/thunlp/PromptPapers

3. Super Rapid Annotator - Multimodal vision tool to annotate videos

Mentor: Raúl Sánchez Sánchez, raul@um.es, and team. Default but negotiable size: medium 175 hour project. Difficulty: MEDIUM

Objective

Develop a system that uses a multimodal vision model to process videos and return JSON output containing the annotations.

The system will have several parts.

Example

The tool will work thus:

Input:

Annotate this video <video.mp4> with this schema:


[
  {
    "description": "Is the person in the image standing up?",
    "value": "standup"
  },
  {
    "description": "Can you see the hands of the person?",
    "value": "hands"
  },
  {
    "description": "Is it inside or outside?",
    "value": "inside"
  }
]

Output:

{

  "standup" : "true",

  "hands": "true",

  "inside": "false"

}

We are open to suggestions, but the initial idea is to use a multimodal vision model together with a JSON parser/generator, or to fine-tune a multimodal model to output JSON as its response.

Ideas and links for multimodal models:

LLaVA

https://github.com/SkunkworksAI/BakLLaVA

https://github.com/PKU-YuanGroup/MoE-LLaVA

https://github.com/PKU-YuanGroup/Video-LLaVA

Video-Con

https://github.com/THUDM/CogVLM

Ideas and links for JSON parsing/output:

LangChain

https://github.com/eyurtsev/kor

https://github.com/1rgs/jsonformer

https://github.com/tanchongmin/strictjson
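As one illustration of the JSON-constrained route, the sketch below uses jsonformer (linked above) to force a language model's answers into the annotation schema. The model name is a placeholder, and in the full tool the prompt would carry the multimodal model's description of the video rather than a hand-written one.

from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

model_name = "databricks/dolly-v2-3b"  # placeholder open-source model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The annotation schema from the example above, expressed as JSON Schema.
schema = {
    "type": "object",
    "properties": {
        "standup": {"type": "boolean"},
        "hands": {"type": "boolean"},
        "inside": {"type": "boolean"},
    },
}

# Hypothetical description produced by the multimodal vision model.
prompt = ("Video description: a person stands outdoors, hands visible. "
          "Answer the annotation questions.")
result = Jsonformer(model, tokenizer, schema, prompt)()
print(result)  # e.g. {'standup': True, 'hands': True, 'inside': False}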

4. Red Hen TV News Multilingual Chat - LLM 

Mentor: Sridhar Vanga and team (sridharvanga2001@gmail.com, saby.ghosal@gmail.com, karan@whissle.ai).

Default but negotiable size: medium 175 hour project. 

Difficulty: MEDIUM-HARD (we will only consider exceptional proposals on this)

Description: Red Hen has access to a large news archive, processed with speech and natural language processing pipelines over previous Google Summer of Code and collaborative efforts. We propose to use our rich TV news data to build an LLM that can answer questions about the world, and to make the model accessible to a large open-source audience. This conversational news LLM can then be paired with other services to make automated bots.

We will soon add some data samples, formats, etc., which will help you write a detailed, executable proposal before the coding period starts.
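For orientation, here is a minimal sketch, assuming a LoRA-style parameter-efficient approach (one plausible route, not a requirement), of fine-tuning an open-source LLM on news question-answer pairs. The base model, file name, and hyperparameters are illustrative placeholders.

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # placeholder; any open causal LM works
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained(base),
    LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16,
               target_modules=["q_proj", "v_proj"]))

def tokenize(example):
    # "text" would hold one formatted "Question: ... Answer: ..." pair.
    return tokenizer(example["text"], truncation=True, max_length=512)

data = load_dataset("json", data_files="news_qa.jsonl")["train"].map(tokenize)

Trainer(model=model,
        args=TrainingArguments("news-llm", per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=data,
        # mlm=False makes the collator set labels for causal LM training.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
        ).train()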

Skills required:

Proven experience with fine-tuning open-source LLMs. A hands-on interview will be conducted to evaluate the depth and breadth of your knowledge of LLM fine-tuning.

Passion to drive the project, to make a proposal that is achievable over the summer, and to meet the set milestones.

5. Visual-Aware E2E Speech Recognition

Mentor: Karan Singla (karan@whissle.ai). Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM-HARD

Description: We want to push a baseline for E2E speech recognition that incorporates visual information into the generated output for improved, rich transcription.

Skills required:

Familiarity with visual extraction tools and methods.

Experience with and understanding of fine-tuning E2E ASR systems (e.g., Conformer, Citrinet, Wav2Vec2, or ESPnet models).
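As a starting point, here is a minimal baseline sketch of CTC transcription with a pretrained Wav2Vec2 model, the kind of E2E ASR system the project would extend with visual features. The model name and audio file are illustrative assumptions.

import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Load and resample a clip to the 16 kHz mono input the model expects.
waveform, sr = torchaudio.load("clip.wav")  # placeholder input file
waveform = torchaudio.functional.resample(waveform, sr, 16_000).mean(dim=0)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, frames, vocab)
print(processor.batch_decode(logits.argmax(dim=-1))[0])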

6. Modeling Wayfinding

Possible mentors: Mark Turner (turner@case.edu) and Francis Steen (profsteen@gmail.com).

Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM-HARD

Description: Develop a mathematical and computational model of human decision-making using Wayfinding theory, in which individuals navigate through a complex space of possible actions. Your project should model how individuals make decisions when faced with limited time and cognitive resources, leading to choices that are formally sub-optimal yet resource-rational. For example, to develop a formal and computational model that captures these dynamics, you may begin by formalizing a "choice" functional with sub-functionals representing priorities. Each priority sub-functional can then be weighted by an evolving activation function that activates a subset of priority sub-functionals at each timestep to simulate changing priorities. The application domain can be various scenarios such as market behavior, communicative interactions, or animal foraging.
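A toy sketch of that choice functional, under assumptions of our own choosing: each priority is a scoring function over a one-dimensional action space, and a random time-varying activation vector gates which priorities are live at each step, yielding resource-rational rather than optimal choices. All functions and weights are illustrative.

import numpy as np

rng = np.random.default_rng(0)
actions = np.linspace(0.0, 1.0, 50)          # a 1-D space of possible actions

priorities = [
    lambda a: -np.abs(a - 0.8),              # e.g. reach a goal near 0.8
    lambda a: -a,                            # e.g. minimize effort
    lambda a: -np.abs(a - 0.3),              # e.g. stay near safety at 0.3
]

def choose(step):
    # Activation: a random subset of priorities is "live" at each timestep,
    # simulating shifting attention under limited cognitive resources.
    active = rng.random(len(priorities)) < 0.7
    weights = active * rng.random(len(priorities))
    scores = sum(w * p(actions) for w, p in zip(weights, priorities))
    return actions[int(np.argmax(scores))]   # resource-rational, not optimal

trajectory = [choose(t) for t in range(10)]
print(trajectory)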

Expected outcome: A working model in Python or C++.

Skills required: Some background in computational modeling and mathematics. Preferred platform: Google Colaboratory.


7. Computational Wave Function for Decision-Making

Possible mentors: Paavo Pylkkänen (paavo.pylkkanen@helsinki.fi), Francis Steen (profsteen@gmail.com), and colleagues

Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM-HARD

Description:  Decision-making involves the simultaneous consideration of multiple alternatives. Develop a computational model of elementary decision-making using Schrödinger's wave equation, leveraging the idea that quantum wave functions have multiple valid latent solutions and that some to-be-defined process leads to the collapse of the wave function into a single outcome. Model the simplest conditions required to map decision-making onto wave equations, so that they can function as computational engines. The application domain can be physical navigation of a simple organism towards a food source, a perceptual process, or an artificial life process. 
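A toy sketch of the idea, under assumptions of our own: evolve a 1-D wave packet between two potential wells standing in for two alternatives using the split-step Fourier method, then "collapse" by sampling an outcome from |psi|^2. Units (hbar = m = 1) and all parameters are illustrative.

import numpy as np

N, L, dt = 512, 40.0, 0.01
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)

# Two attractive wells stand in for two alternatives (e.g. food sources).
V = -2.0 * (np.exp(-(x - 5) ** 2) + np.exp(-(x + 5) ** 2))

psi = np.exp(-x ** 2).astype(complex)     # organism's initial state at x = 0
psi /= np.sqrt(np.sum(np.abs(psi) ** 2))

for _ in range(2000):                     # split-step: half V, full T, half V
    psi *= np.exp(-0.5j * V * dt)
    psi = np.fft.ifft(np.exp(-0.5j * k ** 2 * dt) * np.fft.fft(psi))
    psi *= np.exp(-0.5j * V * dt)

prob = np.abs(psi) ** 2
prob /= prob.sum()
decision = np.random.default_rng().choice(x, p=prob)  # the "collapse"
print("decision made at x =", decision)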

Expected outcome: A working computational toy model of quantum decision-making that can serve as a platform for iterative improvements and elaborations.

Skills required: A basic familiarity with Schrödinger's wave equation and preferably some experience with computational modeling of dynamic systems


8. Speech and Language Processing for a multimodal corpus of Farsi 

Mentor: Peter Uhrig (peter.uhrig@fau.de) and colleagues. Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM

Red Hen would like to build a multimodal corpus of Farsi. As a first step, this will be based on media data captured from public broadcasts.

The entire process will be based on Red Hen's YouTube pipeline, i.e. data acquisition will be based on yt-dlp. For the many videos that come without subtitles, we are going to run Whisper. We then need to determine the most suitable NLP pipeline by researching questions such as "Which system works best for spoken Persian data?" and "Do we need punctuation restoration for better results?" For videos from sites other than YouTube, we will need to adapt the metadata extraction.

This project is to create the full pipeline, which takes as its input a list of video URLs and creates a working multimodal corpus in CQPweb. (If you are interested, write to Peter Uhrig about how to access an English multimodal corpus in CQPweb to play around with.)
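A minimal sketch of the first two pipeline stages (assuming yt-dlp and openai-whisper are installed; the URL and model size are placeholders): download a video's audio with yt-dlp, then transcribe it with Whisper when no subtitles are available.

import subprocess
import whisper

url = "https://www.youtube.com/watch?v=EXAMPLE"  # placeholder URL
# Extract the audio track as WAV for transcription.
subprocess.run(["yt-dlp", "-x", "--audio-format", "wav",
                "-o", "clip.%(ext)s", url], check=True)

model = whisper.load_model("medium")             # multilingual model
result = model.transcribe("clip.wav", language="fa")
for seg in result["segments"]:                   # timed segments for the corpus
    print(f"{seg['start']:.2f}\t{seg['end']:.2f}\t{seg['text']}")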

Required skills: Fluency both in Bash scripting and in Python. Familiarity with NLP tools and pipelines, ideally for Farsi. Ability to read Farsi is a strong plus. 

9. Detection of Intonational Units

Mentor: Peter Uhrig (peter.uhrig@fau.de). Default but negotiable size: medium 175 hour project.

Difficulty: MEDIUM-HARD

There are two potential projects here: a medium-sized project that works on the detection of intonational phrases based on the Santa Barbara Corpus (and possibly further annotations), and a large project that attempts to replicate the AuToBI system with modern machine learning methods (see Andrew Rosenberg's PhD thesis for details).

Required skills: Strong machine learning skills and experience with audio processing. The methods used in AuToBI were state-of-the-art more than 15 years ago; with the advent of large pre-trained models, we expect to be able to improve on that baseline. You need a good understanding of annotation and the ability to work with obscure file formats and to extract relevant information from them, i.e. good data processing skills.
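One plausible starting point (an assumption for illustration, not the required method) is to treat boundary detection as frame-level sequence labeling over embeddings from a large pre-trained speech model. The sketch below wires a small classification head onto wav2vec 2.0; the dummy audio stands in for Santa Barbara Corpus clips.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class BoundaryTagger(nn.Module):
    def __init__(self, encoder_name="facebook/wav2vec2-base"):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(encoder_name)
        # Two classes per frame: boundary vs. non-boundary.
        self.head = nn.Linear(self.encoder.config.hidden_size, 2)

    def forward(self, waveform):
        # waveform: (batch, samples) at 16 kHz
        hidden = self.encoder(waveform).last_hidden_state  # (B, frames, H)
        return self.head(hidden)                           # boundary logits

model = BoundaryTagger()
logits = model(torch.randn(1, 16000))   # one second of dummy audio
print(logits.shape)                     # (1, ~49 frames, 2)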

What kind of Red Hen are you?

More About Red Hen

Our mentors

Stephanie Wood, University of Oregon
Vaibhav Gupta, IIIT Hyderabad
Inés Olza, University of Navarra (https://sites.google.com/site/inesolza/home)
Cristóbal Pagán Cánovas, University of Murcia (https://sites.google.com/site/cristobalpagancanovas/)
Anna Wilson (Anna Pleshakova), University of Oxford
Heiko Schuldt, University of Basel
Gulshan Kumar, IIIT Hyderabad
Karan Singla, Whissle-AI
Peter Uhrig, FAU Erlangen-Nürnberg (https://www.anglistik.phil.fau.de/staff/uhrig/)
Grace Kim, UCLA
Tiago Torrent, Federal University of Juiz de Fora
José Fonseca, Polytechnic Higher Education Institute of Guarda
Ahmed Ismail, Cairo University & DataPlus
Leonardo Impett, EPFL & Bibliotheca Hertziana
Frankie Robertson, GSoC student 2020
Wenyue Xi, Smith College, GSoC student 2020
Maria M. Hedblom (www.mariamhedblom.com)
Sumit Vohra, NSIT, Delhi University
Swadesh Jana
Oliver Czulo, Leipzig University
Marcelo Viridiano, Federal University of Juiz de Fora
Ely Matos, Federal University of Juiz de Fora
Arthur Lorenzi, Federal University of Juiz de Fora
Fred Belcavello, Federal University of Juiz de Fora
Mark Williams, Dartmouth College
John Bell, Dartmouth College
Nitesh Mahawar
Raúl Sánchez, University of Murcia
Sabyasachi Ghosal, Bosch Global Software Technologies, Bengaluru