Update the FrameNet tagger to Open-SESAME

Red Hen annotates all of its English-language television caption data with conceptual frames from UC Berkeley's FrameNet project -- to our knowledge, by far the largest dataset thus annotated. To achieve this, we use the Semafor 3.0 frame-semantic parser from CMU, and the results are excellent. However, Semafor only works with FrameNet 1.5 and has now been superseded by Open-SESAME, which also handles FrameNet 1.7.

The task is to use Open-SESAME to annotate Red Hen's English textual data with FrameNet 1.7 frames. Would you like to work on this task?
If so, write to 
and we will connect you with a mentor.

How to start

Below is a set of suggestions for finding out more about how Open-SESAME works and how we can integrate it into our existing data format to make the annotations searchable. Once we have gathered the required information from the steps below, we will come up with a plan for how to interface with Open-SESAME.
  1. Install Open-SESAME in a Singularity container. If you have a Linux machine with root access, you can experiment locally, but in the end you should make a working image available through Singularity Hub. If you do not have a Linux machine with root access, you can use Singularity Hub directly (but this will be a bit more tedious).
  2. Test it by running it on plain text with one sentence per line (i.e., untokenized text, with no spaces between words and the following full stops, commas, and so on).
    1. What does it do?
    2. What does the output look like?
    3. Are words and punctuation separated in the output? If so, we know that Open-SESAME performs tokenization. Try sentences with hyphenated words (ice-cream, co-pilot, ...). Are they split up or not?
    4. Can you identify the part-of-speech tagset used? It will very likely be the Penn Treebank tagset, but we'd better verify.
    5. Are there syntactic annotations, e.g., dependency labels (nsubj, dobj, ...) or phrase labels (NP, PP, ...)?
  3. Report back with your results so we can decide which route to take from here.
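Step 2 can be sketched in Python, the language Open-SESAME itself is written in. This is a minimal sketch under several assumptions, not a tested recipe: the three-stage command line (`sesame.targetid`, `sesame.frameid`, `sesame.argid`, each with `--mode predict`, `--model_name` and `--raw_input`), the pretrained model names and the `logs/...` paths follow the Open-SESAME README and should be checked against the installed version; the commands are assumed to be run from the Open-SESAME checkout directory; the image name `open-sesame.sif` is hypothetical; and the output is assumed to be CoNLL-style, with one token per line, tab-separated columns, a blank line between sentences, and the word form in the second column. Which column holds the part-of-speech tags has to be read off the actual output.

```python
"""Sketch for step 2: prepare untokenized test input, run Open-SESAME, inspect output.

Assumptions to verify against the installed Open-SESAME:
- CLI stages, flags, model names and logs/ paths as given in its README;
- CoNLL-style output: tab-separated columns, one token per line,
  a blank line between sentences, word form in column index 1.
"""
import subprocess
from pathlib import Path

# One untokenized sentence per line; the hyphenated words probe tokenization.
TEST_SENTENCES = [
    "The co-pilot ordered an ice-cream before take-off.",
    "Red Hen records television news, and it annotates the captions.",
]

# (module, pretrained model, stage input); None means "use the raw sentence file".
STAGES = [
    ("sesame.targetid", "fn1.7-pretrained-targetid", None),
    ("sesame.frameid", "fn1.7-pretrained-frameid",
     "logs/fn1.7-pretrained-targetid/predicted-targets.conll"),
    ("sesame.argid", "fn1.7-pretrained-argid",
     "logs/fn1.7-pretrained-frameid/predicted-frames.conll"),
]


def input_text():
    """Join the test sentences, one per line, with a trailing newline."""
    return "\n".join(TEST_SENTENCES) + "\n"


def write_input(path):
    """Write the raw test input file that the first stage reads."""
    Path(path).write_text(input_text(), encoding="utf-8")


def sesame_commands(raw_input, image=None):
    """Build one argv list per stage, wrapped in `singularity exec` if an image is given."""
    prefix = ["singularity", "exec", image] if image else []
    return [
        prefix + ["python", "-m", module, "--mode", "predict",
                  "--model_name", model, "--raw_input", stage_input or raw_input]
        for module, model, stage_input in STAGES
    ]


def run_pipeline(raw_input, image=None):
    """Run the stages in order; raises CalledProcessError at the first failing stage."""
    for cmd in sesame_commands(raw_input, image):
        subprocess.run(cmd, check=True)


def read_conll(text):
    """Split CoNLL-style text into sentences, each a list of tab-split rows."""
    sentences, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.split("\t"))
        elif current:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def column_values(sentences, col):
    """Distinct values of one column, e.g. a candidate part-of-speech column."""
    return sorted({row[col] for sent in sentences for row in sent if len(row) > col})
```

After `write_input("sentences.txt")` and `run_pipeline("sentences.txt", image="open-sesame.sif")`, reading the final output file (the README names `logs/fn1.7-pretrained-argid/predicted-args.conll`) with `read_conll` and printing `[row[1] for row in sent]` for each sentence shows whether `co-pilot` and the sentence-final full stop come out as separate tokens. Calling `column_values` on a candidate column lists its tag inventory, which can then be compared with the Penn Treebank tagset or with dependency and phrase labels.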

Expanding FrameNet

FrameNet has initiated a Multi-lingual FrameNet project, funded by the NSF. Anything that comes out of this project should also be used by Red Hen.
In late October 2018, Red Hen directors Turner and Steen met on Zoom with FrameNet principal Eve Sweetser at Berkeley; Tiago, lead developer of Brazilian FrameNet, in Rio; former ICSI PhD Nancy Chang at Google; and Anna Pleshakova at Oxford. Tiago was asked to prepare a brief description of the web-based infrastructure he is setting up to allow submissions of frames in multiple languages. Olga Lyashevskaya, who was at one point part of Laura Janda's team and speaks Norwegian, is involved in the Russian FrameBank project.