How to use the Edge2 search engine


Introduction

The Edge2 search engine is an experimental project designed to give you access to the metadata in the NewsScape dataset. While the Edge search engine gives you access to the transcripts (closed captioning, teletext, and speech-to-text), Edge2 will allow you to search for the different types of tags that have been added to the dataset, such as parts of speech in several languages, linguistic frames, gestures, named entities, and sentiment (see Current state of text tagging). 

Related scrolls

Interesting use cases

Please contribute interesting examples of your searches, to help provide feedback to our search engine developers.

How to search for gestures

The Edge2 search engine is designed as a query builder. This software architecture allows you to construct your search incrementally, selecting which dimensions of the dataset you want to query.

1. In the opening screen of https://tvnews.sscnet.ucla.edu/edge2/, click on "Advanced Search":



2. In the Advanced Search interface, click on "add" next to "word/phrase":



3. Under "tag(s)", select "GES_02"


4. Under "field(s)", select a field that you would like to search in


5. Under "word(s)/phrase(s)", click on "add". A text box appears:


6. In the text box, enter the word or phrase to search for:


7. Click on "Search"


8. Display the results


How to search for frames

Frame information is generated with FrameNet and Semafor. There are three types of frame information that can be searched.
  1. Frame name (e.g. "calendric_unit", "leadership", "entity"). Add prefix "[[framename=" and suffix "]]" to your search term, e.g. "[[framename=leadership]]".
  2. Frame element name (e.g. "theme", "unit", "locale"). Add prefix "[[frameelement=" and suffix "]]" to your search term, e.g. "[[frameelement=theme]]".
  3. Semantic role (e.g. "I", "you", "people"). Add prefix "[[semanticrole=" and suffix "]]" to your search term, e.g. "[[semanticrole=people]]".
Note: The prefixes and suffixes are subject to change in the future.
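
The prefix and suffix convention above can also be applied programmatically when preparing many search terms. The following is a minimal Python sketch, assuming the current "[[field=value]]" syntax. The names tag_term and FRAME_FIELDS are illustrative only and are not part of Edge2.

```python
# Field names taken from the list above; the syntax may change in the future.
FRAME_FIELDS = ("framename", "frameelement", "semanticrole")


def tag_term(field: str, value: str) -> str:
    """Wrap a value in the [[field=value]] form typed into the search box."""
    if field not in FRAME_FIELDS:
        raise ValueError(f"unknown frame field: {field!r}")
    return f"[[{field}={value}]]"


print(tag_term("framename", "leadership"))   # [[framename=leadership]]
print(tag_term("semanticrole", "people"))    # [[semanticrole=people]]
```

The resulting strings are pasted into the "word(s)/phrase(s)" text box as they are.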

Example: Searching for the frame name "Experiencer".

Advanced search page.


Search result without highlighting.


Search result with highlighting.


How to search for parts of speech

Part of speech information is generated by MBSP and the Stanford Part of Speech Tagger. There are three types of part of speech information that can be searched.
  1. Part of speech (e.g. "nn", "dt", "in"). Add prefix "[[partofspeech=" and suffix "]]" to your search term, e.g. "[[partofspeech=nn]]".
  2. Chunk (e.g. "o"). Add prefix "[[chunk=" and suffix "]]" to your search term, e.g. "[[chunk=o]]".
  3. Lemma (e.g. "be"). Add prefix "[[lemma=" and suffix "]]" to your search term, e.g. "[[lemma=be]]".
Note: The prefixes and suffixes are subject to change in the future.
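
A common mistake is to leave out the "=" sign between the field name and the value. The following Python sketch checks a term before it is typed into the search box. It assumes that field names consist of lowercase letters only and that a term has the form "[[field=value]]". The function name parse_tagged_term is hypothetical and not part of Edge2.

```python
import re

# One tagged term such as [[lemma=be]]; lowercase field names are an assumption.
TAGGED_TERM = re.compile(r"^\[\[([a-z]+)=([^\[\]=]+)\]\]$")


def parse_tagged_term(term: str):
    """Return (field, value) for a well-formed term, or None otherwise."""
    match = TAGGED_TERM.match(term.strip())
    return (match.group(1), match.group(2)) if match else None


print(parse_tagged_term("[[lemma=be]]"))         # ('lemma', 'be')
print(parse_tagged_term("[[partofspeech=nn]]"))  # ('partofspeech', 'nn')
print(parse_tagged_term("[[lemma be]]"))         # None, the "=" is missing
```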

Example: Searching for lemma "be". Note: The screenshot is outdated. Type "[[lemma=be]]" instead of "[[lemma be]]" in the word(s)/phrase(s) input.

Advanced search page.


Search result without highlighting.


Search result with highlighting.


How to search for linguistic patterns

In the Advanced Search interface, click on "add" next to "regex". Change "regex mode" to "multi". Click on "add" next to "pattern(s)". Enter a pattern without double-quotes.

Pattern: word
Meaning: exact word or word with wildcards
Example: surprise
Example explanation: Matches the word "surprise".
Notes: Wildcard characters are supported: "?" stands for one character and "*" stands for zero or more characters. E.g. "an?" matches both "and" and "ant"; "ask*" matches all of "ask", "asking", "asked", "askew", etc.

Pattern: /regex/
Meaning: any words that match the given regex
Example: /.*ization/
Example explanation: Matches words ending in "ization", such as "kardashianization", "putinization", "goldwaterization".
Notes: Regex syntax. This can be quite slow and return a lot of results if the pattern is too broad.

Pattern: word1 word2 word3
Meaning: words as a phrase in the given order
Example: the kardashianization of
Example explanation: Matches the phrase "the kardashianization of".
Notes: Separate the words in the phrase with spaces. A regex (with slashes at both ends) can be used instead of a word, e.g. "the /.*ization/ of".

Pattern: word1 & word2
Meaning: overlap, or "is also a"
Example: [[lemma=surprise]] & [[partofspeech=vb]]
Example explanation: Matches a word that has the lemma "surprise" and is also used as a verb (base form).
Notes: See the list of field names for a list of lemmas, parts of speech, etc.

Pattern: word1 | word2
Meaning: any of the words
Example: this | that
Example explanation: Matches the word "this" or the word "that".
Notes: Parentheses are optional (but recommended) unless needed to limit the "any" effect. E.g. "one two | three four" matches the phrase "one two" or "three four", while "one (two | three) four" matches the phrase "one two four" or "one three four".

Pattern: word1 | word2 | ... | wordN
Meaning: any of the words
Example: this | that | these | those
Example explanation: Matches any of the four words.
Notes: (none)

Pattern: (word)?
Meaning: optional word
Example: this is (so)? cool
Example explanation: Matches the phrase "this is so cool" or "this is cool".
Notes: Cannot be used by itself, i.e. "(so)?" is not a valid pattern, but "this is (so)? cool" is. There must be no space before the question mark; otherwise the pattern matches a phrase ending with a question mark (which is treated as a separate token). Do not omit the parentheses, or the pattern's meaning changes (see the "word" pattern).

Pattern: ^(word)
Meaning: excluded word
Example: ^(you know) what
Example explanation: Matches "guess what" but not "you know what".
Notes: Parentheses are required after the caret.

Pattern: *
Meaning: placeholder for one word
Example: the * of *
Example explanation: Matches "the" followed by one word followed by "of" followed by one word, e.g. "the group of people".
Notes: Cannot be used by itself, i.e. "*" is not a valid pattern, but "the * of *" is.

Pattern: *?
Meaning: optional placeholder for one word
Example: it is *? raining
Example explanation: Matches "it is not raining", "it is so raining", or "it is raining".
Notes: Cannot be used by itself, i.e. "*?" is not a valid pattern.
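
The in-word wildcards and the /regex/ form can be tried out on a local word list before running a broad (and possibly slow) search. The sketch below is an approximation in Python, not the Edge2 implementation. It assumes that "?" and "*" inside a word behave like shell-style wildcards and that a /regex/ pattern has to match the whole word.

```python
import fnmatch
import re

words = ["and", "ant", "an", "ask", "asking", "askew", "putinization"]

# "?" stands for exactly one character, so "an" itself does not match.
print([w for w in words if fnmatch.fnmatchcase(w, "an?")])
# ['and', 'ant']

# "*" stands for zero or more characters, so "ask" itself matches.
print([w for w in words if fnmatch.fnmatchcase(w, "ask*")])
# ['ask', 'asking', 'askew']

# /.*ization/ written as a Python regex; fullmatch requires a whole-word match.
print([w for w in words if re.fullmatch(r".*ization", w)])
# ['putinization']
```

A "*" that stands alone as a separate token is the one-word placeholder described above, not an in-word wildcard, and is not covered by this sketch.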


How to search for clauses where a field has anything (i.e. field exists)


1. In the Advanced Search interface, click on "add" next to "field exists".


2. Select the "NER_03" tag and the "NER_03 TIME" field. 


3. Click on "Search".


4. Display the results.


5. Display the results (with highlighting).


How to search for clauses in the same sentence (i.e. with the same start/end times)


Example: Look for "trump" in the transcript/caption in the same sentence as a time reference.

1. In the Advanced Search interface, click on "add" next to "same start/end time".


2. Click on "add" next to "word/phrase".


3. Select "TEXT" tag and "TEXT Text" field.


4. Enter the search word/phrase (e.g. "trump"). Then, click on "field exists" in the "Same Start/End Time" clause.


5. Select the "NER_03" tag and the "NER_03 TIME" field. 


6. Make sure the "NER_03" tag and the "NER_03 TIME" field are selected. Scroll down to the "Search" button at the bottom of the page.


7. Click on "Search".


8. Display the results.


9. Display the results (with highlighting).
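
The logic of the "same start/end time" clause can be sketched as grouping annotations by their time span and keeping the spans where every sub-clause is satisfied. The Python example below is an illustration only. The record layout, the numeric timestamps, and the helper names has_word and field_exists are hypothetical and do not reflect how NewsScape or Edge2 stores or queries data.

```python
from collections import defaultdict

# Hypothetical annotation records; the real storage format may differ.
records = [
    {"start": 10.0, "end": 14.0, "tag": "TEXT", "field": "Text",
     "value": "trump spoke on monday"},
    {"start": 10.0, "end": 14.0, "tag": "NER_03", "field": "TIME",
     "value": "monday"},
    {"start": 14.0, "end": 18.0, "tag": "TEXT", "field": "Text",
     "value": "trump left the stage"},
    {"start": 18.0, "end": 21.0, "tag": "NER_03", "field": "TIME",
     "value": "tonight"},
]

# Group records that share the same start and end time.
by_span = defaultdict(list)
for rec in records:
    by_span[(rec["start"], rec["end"])].append(rec)


def has_word(group, word):
    """True if a TEXT/Text record in the group contains the word."""
    return any(r["tag"] == "TEXT" and r["field"] == "Text"
               and word in r["value"].lower().split() for r in group)


def field_exists(group, tag, field):
    """True if the group has any record for the given tag and field."""
    return any(r["tag"] == tag and r["field"] == field for r in group)


# Keep spans that satisfy both the word/phrase and the field-exists clause.
matches = [span for span, group in by_span.items()
           if has_word(group, "trump") and field_exists(group, "NER_03", "TIME")]
print(matches)  # [(10.0, 14.0)]
```

Only the first span is returned: the second contains "trump" but no time reference, and the third has a time reference but no "trump".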