Edge2 Interesting Use Cases

Introduction

In developing the Edge2 search engine, we are very interested in receiving your use cases. Please contribute!


Interesting use cases

1. Part of Speech combined with lemma

(Uhrig)
This query should find all instances where "surprise" (including the forms "surprising", "surprised", and "surprises") is used as a verb.

(Kai)
In Regex, set 
tag: "POS_01"
field: "POS_01 Text"
pattern: "[[lemma=surprise]] & ([[partofspeech=vb]] | [[partofspeech=vbd]] | [[partofspeech=vbg]] | [[partofspeech=vbn]] | [[partofspeech=vbp]] | [[partofspeech=vbz]])" (without quotes)
regex mode: "multi"
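
To check the logic of this query outside Edge2, here is a minimal Python sketch that applies the same condition (lemma matches AND the tag is one of the six verb tags) to a hand-made list of tagged tokens. The (word, lemma, tag) tuples and the function name find_verb_forms are assumptions made for illustration; they are not Edge2's data format or API.

```python
# Hypothetical token records: (word, lemma, part-of-speech tag).
VERB_TAGS = {"vb", "vbd", "vbg", "vbn", "vbp", "vbz"}

def find_verb_forms(tokens, lemma):
    """Return the words whose lemma matches and whose tag is a verb tag."""
    return [word for word, tok_lemma, pos in tokens
            if tok_lemma == lemma and pos.lower() in VERB_TAGS]

tokens = [
    ("The", "the", "DT"),
    ("surprise", "surprise", "NN"),      # noun use, should be skipped
    ("surprised", "surprise", "VBD"),
    ("everyone", "everyone", "NN"),
    ("surprising", "surprise", "VBG"),
]
hits = find_verb_forms(tokens, "surprise")
print(hits)
```

The noun use of "surprise" is skipped because its tag is not in the verb set, which is the point of combining the two conditions.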

2. Wildcards on words/phrases

(Uhrig)
The query "the .*ization of" should find "the kardashianization of", "the putinization of", and "the goldwaterization of".

(Kai)
In Regex, set 
tag: "POS_01"
field: "POS_01 Text"
pattern: "the *ization of" or "the /.*ization/ of" (without quotes)
regex mode: "multi"
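
For comparison, the same idea can be tried on plain text with Python's re module. This is only a sketch: it assumes untokenized text, so the wildcard is written as \w* to keep it inside a single word (a bare .* would run across word boundaries in plain text). The function name find_izations is made up for the example.

```python
import re

# \w* keeps the wildcard inside one word, mirroring a per-word wildcard.
PATTERN = re.compile(r"\bthe (\w*ization) of\b", re.IGNORECASE)

def find_izations(text):
    """Return every '-ization' word found in the frame 'the ___ of'."""
    return [m.group(1) for m in PATTERN.finditer(text)]

sample = ("Critics discussed the kardashianization of politics and "
          "the putinization of the media, not the organization chart.")
words = find_izations(sample)
print(words)
```

"the organization chart" is not returned because the word after "organization" is not "of".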

3. Filter the results 

(Javier)
I've been looking for sentences ending in "what?", but I'd like to exclude those where "you know" comes right before it ("you know what?").

(Kai)
In Regex, set 
tag: "POS_01"
field: "POS_01 Text"
pattern: "^(you know) what ?" (without quotes)
regex mode: "multi"
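
The filtering step can also be sketched in plain Python, as a way to sanity-check results. The sketch assumes the text has already been split into sentences and reads the request as two steps: keep sentences ending in "what?", then drop those ending in "you know what?". The function name filter_what_questions is hypothetical.

```python
import re

ENDS_IN_WHAT = re.compile(r"\bwhat\s*\?$", re.IGNORECASE)
YOU_KNOW_WHAT = re.compile(r"\byou\s+know\s+what\s*\?$", re.IGNORECASE)

def filter_what_questions(sentences):
    """Keep sentences ending in 'what?' unless they end in 'you know what?'."""
    kept = []
    for sentence in sentences:
        s = sentence.strip()
        if ENDS_IN_WHAT.search(s) and not YOU_KNOW_WHAT.search(s):
            kept.append(sentence)
    return kept

sentences = ["You did what?", "You know what?", "And then what ?", "What is it?"]
kept = filter_what_questions(sentences)
print(kept)
```

Allowing optional whitespace before the question mark covers captions where punctuation is split off as its own token.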

4. Repeated phrase

(Javier; this is not to say that Edge2 has to be able to do this, but it is a problem a user came up with.)

A really complicated question:

I've been interested in "echo questions": two people are speaking, and the second one repeats verbatim something that the first one said, normally to signal that the content of the utterance was unexpected. Speaker A: ... and then Mary left the room. Speaker B: Mary left the room?

These questions are really interesting from a multimodal point of view because the "surprise" or "unexpectedness" is usually reinforced with a facial expression (raising your eyebrows, for example) and also with prosodic cues, so they are ideal for study in NewsScape. Nobody has done any work on these structures "in the real world", and even less so from a truly multimodal point of view, something that is only possible with NewsScape, of course.

Any idea of how we could locate this type of thing? My first (probably too complex) idea was some sort of algorithm that would detect repetition. The structure of an echo question could help in many cases. We could, for example, limit our search to some specific forms, such as the structure SUBJ + Verb + OBJ ("then Peter entered the house" / "Peter entered the house?"). We could also try other formats to limit the search (with intransitives, or double transitives, or whatever).

One possibility (that could perhaps be easier) would be to detect clauses with a declarative form which nonetheless end in a question mark (she cried? he went out of his way to help her? he told you you couldn't buy that?).

Any idea of how to do this?
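
One rough way to try the repetition idea, sketched in Python. This is not an Edge2 feature, just an illustration under strong assumptions: the transcript is already split into one utterance per speaker turn (caption text often is not), and the echo repeats a run of words exactly, so "you left?" echoing "I left" would be missed. The function names and the min_words threshold are made up for the example.

```python
import re

def normalize(utterance):
    """Lowercase and strip punctuation, returning a list of words."""
    return re.findall(r"[a-z0-9']+", utterance.lower())

def is_echo(previous, current, min_words=3):
    """True if `current` is a question repeating a word run from `previous`."""
    if not current.strip().endswith("?"):
        return False
    prev_words, cur_words = normalize(previous), normalize(current)
    n = len(cur_words)
    if n < min_words or n > len(prev_words):
        return False
    # Slide a window over the previous utterance looking for the same run.
    return any(prev_words[i:i + n] == cur_words
               for i in range(len(prev_words) - n + 1))

def find_echo_questions(utterances):
    """Return (previous, echo) pairs of consecutive utterances."""
    return [(a, b) for a, b in zip(utterances, utterances[1:]) if is_echo(a, b)]

dialogue = [
    "... and then Mary left the room.",
    "Mary left the room?",
    "Yes, she did.",
    "Why did she do that?",
]
pairs = find_echo_questions(dialogue)
print(pairs)
```

The min_words threshold is there to avoid flagging very short questions such as "The room?" as echoes; whether that is desirable depends on what counts as an echo question for the study.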

5. Lemma +ING

(Uhrig)
Find any form of "be", followed by an optional adverb, followed by a verb in the -ing form.
In CQPweb, that reads: {be/V} (_RB)? _VBG
Expected results:
was really doing
is singing
...

(Kai)
In Regex, set 
tag: "POS_01"
field: "POS_01 Text"
pattern: "[[lemma=be]] ([[partofspeech=rb]])? [[partofspeech=vbg]]" (without quotes)
regex mode: "multi"

The query matches "was really doing", "is singing", "is saying", "are also doing", etc.
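
The sequence logic of this query (a form of "be", then at most one adverb, then an -ing verb) can be sketched in Python over tagged tokens. As before, the (word, lemma, tag) tuples and the function name find_be_ing are assumptions for illustration and say nothing about how Edge2 stores or matches tokens.

```python
def find_be_ing(tokens):
    """Find 'be' (+ optional RB adverb) + VBG sequences in tagged tokens."""
    matches = []
    for i, (_, lemma, _) in enumerate(tokens):
        if lemma != "be":
            continue
        j = i + 1
        # Skip at most one adverb, mirroring the optional adverb slot.
        if j < len(tokens) and tokens[j][2] == "RB":
            j += 1
        if j < len(tokens) and tokens[j][2] == "VBG":
            matches.append(" ".join(word for word, _, _ in tokens[i:j + 1]))
    return matches

tokens = [
    ("She", "she", "PRP"), ("was", "be", "VBD"), ("really", "really", "RB"),
    ("doing", "do", "VBG"), ("well", "well", "RB"), ("and", "and", "CC"),
    ("he", "he", "PRP"), ("is", "be", "VBZ"), ("singing", "sing", "VBG"),
    ("but", "but", "CC"), ("they", "they", "PRP"), ("are", "be", "VBP"),
    ("happy", "happy", "JJ"),
]
found = find_be_ing(tokens)
print(found)
```

Note that, like the Edge2 pattern above and unlike the CQPweb query, the sketch does not require the form of "be" itself to be tagged as a verb.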

6. Within commercials

(Steen)
Search for text inside commercials. We can use SEG_01|Text=Commercial to identify commercials; this provides a start time and end time. Can we use this information to search for text within commercials? 
This could be generalized to search for any topic, once we have topic labels.

7. Without commercials

(Groeling)
Search for text outside commercials. Again, use SEG_01|Text=Commercial to identify commercials, but this time use the start and end times to exclude the commercial text from the search. This is useful for frequency counts of mentions to characterize the coverage of politicians, where political commercials should not be included.
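
Use cases 6 and 7 come down to the same time-interval test, applied in opposite directions. A minimal Python sketch, assuming the commercial segments have already been extracted as (start, end) pairs in seconds and the caption lines as (time, text) pairs; these structures and the function names are hypothetical, not the actual file format or an Edge2 API.

```python
def in_any_interval(t, intervals):
    """True if time t falls inside one of the (start, end) intervals."""
    return any(start <= t < end for start, end in intervals)

def search_captions(captions, commercials, term, inside_commercials):
    """Return caption lines containing `term`, inside or outside commercials."""
    term = term.lower()
    return [text for t, text in captions
            if term in text.lower()
            and in_any_interval(t, commercials) == inside_commercials]

commercials = [(120.0, 180.0)]          # one commercial break, in seconds
captions = [
    (30.0, "The senator spoke about taxes."),
    (130.0, "Vote for the senator on Tuesday."),
    (200.0, "The senator declined to comment."),
]
inside = search_captions(captions, commercials, "senator", True)
outside = search_captions(captions, commercials, "senator", False)
print(inside)
print(outside)
```

The same function would work for any labeled segment type (for example topic labels, once available), since it only depends on start and end times.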
