— Open Data Sets

Introduction

Research in general, and machine learning in particular, depends on big data. Red Hen Lab seeks to create Open Data Sets and to list open data sets here that might be useful for research in multimodal communication. For Red Hen, at least, "Open" does not necessarily mean "public": some data sets may be available only to certain researchers, such as Red Hens, and only under certain research licenses.

Open Data Sets

  • ViMELF - The Corpus of Video-Mediated English as a Lingua Franca Conversations, Version 1.0.
    • Dataset: ViMELF contains 20 fully transcribed Skype conversations (including gestures and pragmatic elements) between 40 speakers from Germany (20 speakers), Spain (5), Italy (5), Finland (5), and Bulgaria (5), totaling 744.5 minutes (ca. 12.5 hours), with an average conversation length of 37.23 minutes. The corpus comprises 113,670 words in the plain-text version and 152,472 items in the annotated version. The transcripts are available as .docx and .txt files; the anonymized videos are provided in MPEG-4 format. Several versions are available: the fully annotated pragmatic version as text and XML (XTranscript, Gee 2018), a lexical version (XTranscript, Gee 2018), and a POS-tagged version (auto-tagged with the CLAWS C7 tagset).
    • Website and further info: http://umwelt-campus.de/case
    • Access: ViMELF transcripts are freely available for non-commercial research purposes. If you would like to use the dataset, please register via the project website; you will then receive download instructions. The video and audio data are available separately for viewing/listening via a dedicated university server.
    • Project coordination: Stefan Diemer & Marie-Louise Brunner, Language & Communication, Trier University of Applied Sciences, Germany
    • Citation: To cite ViMELF in your own research, please use the following citation:
      ViMELF. 2018. Corpus of Video-Mediated English as a Lingua Franca Conversations. Birkenfeld: Trier University of Applied Sciences. Version 1.0. The CASE project [http://umwelt-campus.de/case].
    • Contact: sk@umwelt-campus.de
  • Red Hen Interview Gesture Collection (RHIGC)
    • Dataset: The RHIGC is based on 20 interviews from the Ellen DeGeneres Show, which were hand-annotated for gesture by Suwei Wu and Yao Tong at VU Amsterdam for their PhD projects under the supervision of Prof. Alan Cienki. It will contain video snippets of hand gestures (and possibly of similar shots without hand gestures). An alternative version with pre-annotated data generated with OpenPose may also be made available (see the reading sketch after this list).
    • Project coordination: Yao Tong (VU Amsterdam) & Peter Uhrig (Universität Osnabrück/FAU Erlangen-Nürnberg).
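
Since the RHIGC may also ship with pre-annotated OpenPose data, the following minimal Python sketch shows how per-frame OpenPose JSON output is typically read. It is a hedged illustration only: it assumes OpenPose's standard per-frame "*_keypoints.json" files, whose "people" entries hold flat lists of (x, y, confidence) triples; the directory name in the usage line is purely hypothetical and not part of the released dataset.

  import json
  from pathlib import Path

  def load_openpose_frames(json_dir):
      """Yield (file name, people) per frame from a directory of OpenPose JSON output."""
      # OpenPose writes one JSON file per video frame by default.
      for json_file in sorted(Path(json_dir).glob("*_keypoints.json")):
          with open(json_file) as f:
              frame = json.load(f)
          people = []
          for person in frame.get("people", []):
              # Each keypoint list is a flat sequence of (x, y, confidence) triples.
              people.append({
                  "pose": person.get("pose_keypoints_2d", []),
                  "left_hand": person.get("hand_left_keypoints_2d", []),
                  "right_hand": person.get("hand_right_keypoints_2d", []),
              })
          yield json_file.name, people

  # Hypothetical usage: count the frames of one interview in which a person was detected.
  # frames_with_person = sum(1 for _, people in load_openpose_frames("ellen_interview_01_json") if people)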

Some Open Data Sets for Gesture Recognition

Hat tip to Søren Gran for this section!

List of Gesture Datasets

Note: curly braces {} indicate that the websites listed inside them were found on the website named immediately before the opening {. For example, I found the uow.edu website on the cvpapers.com website, the uowmailedu-my.sharepoint.com website on the uow.edu website, and so on.

2D or 3D

In the following lists, each number refers to the correspondingly numbered dataset in the Datasets section below.

2D

None

3D

1. “MSRGesture3D”

2. “MSRDaily Activity3D”

3. “Kinect Gesture Data Set”

4. “Two-Handed Datasets” (I think)

5. “CVRR-HANDS 3D”

6. “Praxis Gesture Dataset”

7. “ChairGest”

8. “CHALEARN Multi-modal Gesture Challenge”

9. “Sheffield Kinect Gesture dataset”

 

Datasets

 

Description for next two datasets: https://www.uow.edu.au/~wanqing/#Datasets {

 

1. “MSRGesture3D”

“The dataset was captured by using a Kinect device. There are 16 activities: drink, eat, read book, call cellphone, write on a paper, use laptop, use vacuum cleaner, cheer up, sit still, toss paper, play game, lie down on sofa, walk, play guitar, stand up, sit down. There are 10 subjects. Each subject performs each activity twice, once in standing position, and once in sitting position. There is a sofa in the scene. Three channels are recorded: depth maps (.bin), skeleton joint positions (.txt), and RGB video (.avi). There are 16*10*2=320 files for each channel. In total, there are 320*3=960 files. Note that the RGB channel and depth channel are recorded independently, so they are not strictly synchronized.

The format of the skeleton file is as follows. The first integer is the number of frames. The second integer is the number of joints which is always 20. For each frame, the first integer is the number of rows. This integer is 40 when there is exactly one skeleton being detected in this frame. It is zero when no skeleton is detected. It is 80 when two skeletons are detected (in that case which is rare, we simply use the first skeleton in our experiments). For most of the frames, the number of rows is 40. Each joint corresponds to two rows. The first row is its real world coordinates (x,y,z) and the second row is its screen coordinates plus depth (u, v, depth) where u and v are normalized to be within [0,1]. For each row, the integer at the end is supposed to be the confidence value, but it is not useful.”
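
The skeleton format described above can be parsed mechanically. The following is a minimal Python sketch written directly from that description and untested against the actual files; it assumes the skeleton .txt files contain plain whitespace-separated numbers in exactly the order given (frame count, joint count, then per frame a row count followed by two four-number rows per joint).

  def read_msr_skeleton(path):
      # Read every whitespace-separated number from the skeleton file, in order.
      with open(path) as f:
          values = [float(v) for v in f.read().split()]
      pos = 0
      def take(n):
          nonlocal pos
          chunk = values[pos:pos + n]
          pos += n
          return chunk
      n_frames = int(take(1)[0])   # first integer: number of frames
      n_joints = int(take(1)[0])   # second integer: number of joints (always 20)
      frames = []
      for _ in range(n_frames):
          n_rows = int(take(1)[0])  # 0 (no skeleton), 40 (one skeleton), or 80 (two skeletons)
          joints = []
          for _ in range(n_rows // 2):
              x, y, z, _conf = take(4)        # real-world coordinates plus trailing confidence value
              u, v, depth, _conf = take(4)    # normalized screen coordinates and depth, plus confidence
              joints.append(((x, y, z), (u, v, depth)))
          # When two skeletons are present, keep only the first, as in the original experiments.
          frames.append(joints[:n_joints])
      return frames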

2. “MSRDaily Activity3D”

“20 action types, 10 subjects, each subject performs each action 2 or 3 times. There are 567 depth map sequences in total. The resolution is 640x240. The data was recorded with a depth sensor similar to the Kinect device. The dataset is described in the following paper.

Action Recognition Based on A Bag of 3D Points, Wanqing Li, Zhengyou Zhang, Zicheng Liu, IEEE International Workshop on CVPR for Human Communicative Behavior Analysis (in conjunction with CVPR2010), San Francisco, CA, June, 2010.”

}

3. “Kinect Gesture Data Set”

“The Microsoft Research Cambridge-12 Kinect gesture data set consists of sequences of human movements, represented as body-part locations, and the associated gesture to be recognized by the system. The data set includes 594 sequences and 719,359 frames—approximately six hours and 40 minutes—collected from 30 people performing 12 gestures. In total, there are 6,244 gesture instances. The motion files contain tracks of 20 joints estimated using the Kinect Pose Estimation pipeline. The body poses are captured at a sample rate of 30Hz with an accuracy of about two centimeters in joint positions.”

4. “Two-Handed Datasets”

“This database consists of 7 different two-handed gestures (rotations in all the 6 directions and a "push" gesture). 7 persons have performed these gestures during 2 sessions with 5 records per gesture. 4 persons have been used for training and the 3 others for testing.”

5. “CVRR-HANDS 3D”

“The CVRR-HANDS 3D dataset was designed in order to study natural human activity under difficult settings of cluttered background, volatile illumination, and frequent occlusion. The dataset was captured using a Kinect under real-world driving settings. The approach is motivated by studying actions - as well as semantic elements in the scene and the driver's interaction with them - which may be used to infer driver inattentiveness. For more information see related publications below. The dataset contains three subsets: Hand localization, hand and objects localization, and 19 hand gestures for occupant-vehicle interaction.”

}

Next datasets taken from this site: http://riemenschneider.hayko.at/vision/dataset/index.php {

6. “Praxis Gesture Dataset”

“PRAXIS GESTURE DATASET is a new challenging RGB-D upper-body gesture dataset recorded by Kinect v2. The dataset is unique in the sense that it addresses the Praxis test, however, it can be utilized to evaluate any other gesture recognition method. The collected dataset consists of selected gestures for Praxis test. There are two types of gestures in the dataset: dynamic (14 gestures) and static (15 gestures) gestures.”

7. “ChairGest”

“ChairGest is an open challenge / benchmark. The task consists in spotting and recognizing gestures from multiple synchronized sensors: 1 Kinect and 4 Xsens Inertial Motion Units (IMU). The complete corpus contains 1200 gesture occurrences.”

8. “CHALEARN Multi-modal Gesture Challenge”

“The CHALEARN Multi-modal Gesture Challenge is a dataset of 700+ sequences for gesture recognition using images, Kinect depth, segmentation, and skeleton data.”

9. “Sheffield Kinect Gesture dataset”

“The Sheffield Kinect Gesture (SKIG) dataset contains 2160 hand gesture sequences (1080 RGB sequences and 1080 depth sequences) collected from 6 subjects. All these sequences are synchronously captured with a Kinect sensor (including a RGB camera and a depth camera).

This dataset collects 10 categories of hand gestures in total: circle (clockwise), triangle (anti-clockwise), up-down, right-left, wave, "Z", cross, comehere, turn-around, and pat. In the collection process, all these ten categories are performed with three hand postures: fist, index and flat. To increase the diversity, we recorded the sequences under 3 different backgrounds (i.e., wooden board, white plain paper and paper with characters) and 2 illumination conditions (i.e., strong light and poor light).”

}

 
