Red Hen has vast audiovisual holdings. It would be useful to be able to tag them with OpenPose. Would you like to work on this task?
If so, write to
and we will try to connect you with a mentor.
- Gesture detection 2017 (Sergiy Turchyn's project, with slides)
- Manual tagging (with proposed Red Hen gesture tagging scheme)
- Red Hen Rapid Annotator
- How to annotate with ELAN (simple instructions to get started)
- How to set up the iMotion annotator (draws rectangles on images to indicate event location)
- How to use the online tagging interface (integrated into Red Hen, but not frame accurate)
- How to use the Video Annotation Tool (online multi-dimensional video annotation interface for talks and demos)
- Integrating ELAN
- Machine Learning
- Video processing pipelines
- For a quick taxonomy of gesture, see http://www.janabressem.de/wp-content/uploads/2016/10/Bressem_notational-system-overview_final.pdf
- OpenPose on github by CMU Perceptual Computing
- Examples of OpenPose on videos: https://arvrjourney.com/human-pose-estimation-using-openpose-with-tensorflow-part-1-7dd4ca5c8027
- Hand Keypoint Detection using Deep Learning and OpenCV
Information Regarding Proposals
We expect your proposal to have a clear problem statement, methodology, and project plan.
In addition, we ask you to submit solutions to two of the three tasks below, in your preferred programming language, together with your proposal:
Task 1: Given as input two matrices of size 512x512 representing grayscale images of an object in motion at moments T - 33.36 ms and T, write a general-purpose algorithm that can estimate what the image will look like at moments T + 1 ms, T + 33.36 ms, and T + 1 s. Test your algorithm and provide an input-output example.
Task 2: Given a buffer containing x and y positions of a gamepad analog stick as floats in the (-1.0,1.0) range, extracted 60 times per second during the last 5 seconds, write a function that would best detect the combo "up left right down up down" in a game.
Task 3: Generate a list of 100'000 tuples (x,id), where x is a value between 0 and 1000 and id is a unique identifier. Return the ids of the 500 elements with the smallest x. You may return more than 500 elements if the 500th element has the same value as the 501st, 502nd, etc.; however, you should not return fewer than 500 elements. Make sure you do this efficiently. Hint: this can be done in O(n).
- Create a Singularity container for the GPU version (the CPU version is here but still in the wrong branch...)
- Evaluate speed and parallel performance of GPU version
- Is it possible to
  - create stick videos from JSON?
  - create blended videos from JSON and the original video?
  - If not, would it be theoretically possible, i.e. could we implement such a tool?
- Find out when and how the tracking part of OpenPose is going to be available.
- Maybe they want to co-operate with us on a project of gesture recognition?
- Is it possible to improve the detection for PUOH (Palm Up Open Hand)?
- Is it possible to run the detectors independently of one another? Face and Hands seem to depend on body, but can we run the body model now and then run face and hands later?
- Will higher resolution help? Test on recent recordings of The Ellen DeGeneres Show in high and low resolution.
- Should we use --keypoint_scale?
- Should we enable --part_candidates?
- Writing search queries is difficult. Can one search by demonstration? Note that vitrivr, https://vitrivr.org/, takes visual sketches as prompts for search. Suppose a researcher at a computer with a webcam demonstrates a gesture. No doubt OpenPose can analyze it. Could the result of that analysis then serve as the search prompt? Could one provide a clip of someone else performing a gesture to serve as a search prompt? In such cases, researchers would not need to write search queries in order to search.
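
As a rough illustration of the search-by-demonstration idea, one possible approach is to treat a demonstrated gesture and each stored clip as sequences of keypoint frames and compare them with dynamic time warping (DTW), so that differences in speed matter less. The sketch below is only an assumption about how this could work, not an existing Red Hen or OpenPose tool. The function names are hypothetical, and it assumes every frame holds the same number of (x, y) keypoints with none missing.

```python
import math

def normalize(frame):
    """Center a frame's (x, y) keypoints on their mean and scale to unit size."""
    cx = sum(x for x, _ in frame) / len(frame)
    cy = sum(y for _, y in frame) / len(frame)
    centered = [(x - cx, y - cy) for x, y in frame]
    # Fall back to 1.0 if all points coincide, to avoid dividing by zero.
    scale = max(math.hypot(x, y) for x, y in centered) or 1.0
    return [(x / scale, y / scale) for x, y in centered]

def frame_distance(a, b):
    """Mean Euclidean distance between corresponding keypoints of two frames."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def dtw_distance(query, candidate):
    """Dynamic time warping cost between two sequences of keypoint frames."""
    q = [normalize(f) for f in query]
    c = [normalize(f) for f in candidate]
    inf = float("inf")
    cost = [[inf] * (len(c) + 1) for _ in range(len(q) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(q) + 1):
        for j in range(1, len(c) + 1):
            d = frame_distance(q[i - 1], c[j - 1])
            # Allow a step in either sequence, or in both at once.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(q)][len(c)]
```

In such a scheme, the webcam demonstration would be the query, and stored clips would be ranked by ascending DTW cost. Per-frame normalization makes the comparison insensitive to where the person stands and how large they appear, but not to rotation or camera angle.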
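
On the question above about creating stick videos from JSON: the per-frame JSON files that OpenPose writes store each person's body keypoints as a flat array of x, y, confidence values, so turning them into drawable limb segments seems feasible. The following is a minimal sketch under stated assumptions: the key name `pose_keypoints_2d` and the BODY_25 keypoint order are assumed (older OpenPose versions and other models differ), the limb list is a subset chosen for illustration, and `stick_segments` is a hypothetical helper name.

```python
import json

# Assumed BODY_25 indices: 0 nose, 1 neck, 2-4 right arm, 5-7 left arm,
# 8 mid hip, 9-11 right leg, 12-14 left leg. Face and feet are omitted here.
LIMBS = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
         (1, 8), (8, 9), (9, 10), (10, 11), (8, 12), (12, 13), (13, 14)]

def stick_segments(frame_json, min_conf=0.1):
    """Return, per person, a list of ((x1, y1), (x2, y2)) limb segments."""
    frame = json.loads(frame_json)
    people = []
    for person in frame.get("people", []):
        flat = person.get("pose_keypoints_2d", [])
        # Regroup the flat [x0, y0, c0, x1, y1, c1, ...] array into triples.
        pts = [tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)]
        segs = []
        for a, b in LIMBS:
            if a < len(pts) and b < len(pts):
                (xa, ya, ca), (xb, yb, cb) = pts[a], pts[b]
                # Skip limbs whose endpoints were not detected confidently.
                if ca >= min_conf and cb >= min_conf:
                    segs.append(((xa, ya), (xb, yb)))
        people.append(segs)
    return people
```

The segments could then be drawn on a blank canvas for a stick video, or on top of the original frame for a blended video, using any drawing library. That rendering step is left out of this sketch.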