Empirical Study: Finding Examples of a Theme, by Example

A common task in literature study is to find examples of a theme. Until now, literary scholars searching for examples have had to rely on searching for sets of words they think are associated with the theme.

Theme-finding by searching for words poses a problem. Synonymy and the endless variety of language mean that the same theme might surface in many different forms, using many different words. Even for scholars with intimate knowledge of the text, a single set of words is not enough. Depending on their mental context, the words that come to mind might not be complete or representative.

For example, take the Shakespearean theme of “seeing is believing” — that seeing an event with one’s own eyes is more credible than hearing about it second-hand. A scholar might search for the words “believe”, “speak”, “eyes”, and “see”. That search might be able to capture this example (from The Winter’s Tale 5.2):

Then have you lost a sight, which was to be seen, can not be spoken of.

but not this one (from King Lear 4.6):

I would not take this from report; it is, And my heart breaks at it.
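To make the gap concrete, here is a minimal sketch of the word search described above. It is not WordSeer's search code: the function name `keyword_search` is made up for illustration, and the prefix matching (so that “see” also catches “seen”) is an assumption standing in for whatever stemming a real search engine would do.

```python
import re

def keyword_search(sentences, terms):
    """Return the sentences containing a word that starts with any search term.

    Prefix matching is a crude, assumed stand-in for stemming.
    """
    hits = []
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        if any(word.startswith(term) for word in words for term in terms):
            hits.append(sentence)
    return hits

terms = ["believe", "speak", "eyes", "see"]
winters_tale = "Then have you lost a sight, which was to be seen, can not be spoken of."
king_lear = "I would not take this from report; it is, And my heart breaks at it."

# Only the Winter's Tale line is returned; the King Lear line shares no word
# with the search terms even though it expresses the same theme.
hits = keyword_search([winters_tale, king_lear], terms)
```

Even the one match is fragile: it comes from “seen”, while “spoken” slips past the term “speak” entirely.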

As a solution, we at WordSeer propose search-by-example. This technology dates back to the 1980s in the field of information retrieval, and so far, it’s been successful in helping find relevant documents. We think it could work for theme-finding too.

With search-by-example, instead of inferring which words represent a theme, and then searching for those words, a scholar can search for sentences that match a set of examples. A scholar marks a set of examples of a theme, and the system returns a list of sentences it thinks are relevant.

This process is a cycle. When the system returns results, the scholar gives it feedback by labeling sentences “relevant” if they match the theme, and “not-relevant” if they don’t. The system gradually builds a model of what the scholar is interested in, and eventually returns results that are mostly relevant.
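One classic way to implement such a cycle in information retrieval is Rocchio-style relevance feedback. The sketch below is an illustration under that assumption, not a description of the systems in our study: the bag-of-words vectors, the cosine ranking, the weights `beta` and `gamma`, and all function names are choices made for the example.

```python
import math
import re
from collections import Counter, defaultdict

def vectorize(sentence):
    """Bag-of-words term counts for one sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_query(relevant, not_relevant, beta=1.0, gamma=0.5):
    """Average the relevant vectors, subtract a share of the not-relevant ones."""
    query = defaultdict(float)
    for sentence in relevant:
        for term, count in vectorize(sentence).items():
            query[term] += beta * count / len(relevant)
    for sentence in not_relevant:
        for term, count in vectorize(sentence).items():
            query[term] -= gamma * count / len(not_relevant)
    # Keep only positively weighted terms, as is usual for Rocchio.
    return {term: w for term, w in query.items() if w > 0}

def rank(corpus, relevant, not_relevant=()):
    """Rank the sentences the scholar has not labeled yet, best match first."""
    query = build_query(relevant, not_relevant)
    labeled = set(relevant) | set(not_relevant)
    candidates = [s for s in corpus if s not in labeled]
    return sorted(candidates, key=lambda s: cosine(query, vectorize(s)), reverse=True)

examples = [
    "Then have you lost a sight, which was to be seen, can not be spoken of.",
    "I would not take this from report; it is, And my heart breaks at it.",
]
# An unrelated line (Richard III 1.1), added only to give the ranking a distractor.
unrelated = "Now is the winter of our discontent Made glorious summer by this sun of York"
corpus = examples + [
    "If in Naples I should report this now, would they believe me?",
    "Most noble sir, That which I shall report will bear no credit, Were not the proof so nigh.",
    unrelated,
]

first_round = rank(corpus, relevant=examples)
# The scholar labels the distractor "not-relevant"; the next round drops its terms.
second_round = rank(corpus, relevant=examples, not_relevant=[unrelated])
```

Each round of labels reshapes the query, so the ranking drifts toward sentences that resemble the “relevant” ones. Raw word counts are a blunt instrument, though; a realistic system would at least down-weight very common words.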

For example, in under five minutes, I was able to use the examples above to come up with seven more candidates:

Gracious my lord, I should report that which I say I saw, But know not how to do’t. (Macbeth 5.5)

Most noble sir, That which I shall report will bear no credit, Were not the proof so nigh. (The Winter’s Tale 5.1)

I would not hear your enemy say so, Nor shall you do mine ear that violence, To make it truster of your own report Against yourself: I know you are no truant. (Hamlet 1.2)

If in Naples I should report this now, would they believe me? (The Tempest 3.3)

They call him Doricles; and boasts himself To have a worthy feeding: but I have it Upon his own report and I believe it; He looks like sooth. (The Winter’s Tale 4.4)

It is not so; thou hast misspoke, misheard; Be well advised, tell o’er thy tale again: It can not be thou dost but say ’tis so: I trust I may not trust thee; for thy word Is but the vain breath of a common man: Believe me, I do not believe thee, man; I have a king’s oath to the contrary. (King John 3.1)

I do beseech you, either not believe The envious slanders of her false accusers; Or, if she be accused on true report, Bear with her weakness, which, I think, proceeds From wayward sickness, and no grounded malice. (Richard III 1.3)

Of course, this is all theory until it’s been proven to work. And while I’m not a Shakespeare scholar, I did build this particular system, so it might not be surprising that I can get a few results out of it.

So to find out whether search-by-example works, we’ve designed a five-minute study around three Shakespearean themes. There are three systems: one keyword search system and two different example-based ones. Participants are shown an example of a theme and asked to use a system to find as many relevant results as they can in five minutes. The system and the theme are randomly assigned.

We’ll find our answer by comparing the quality and quantity of the sentences the participants find on the three systems. Expert scholars will help us judge quality: they will rate the relevance of the sentences the different systems produce (without knowing which system produced which sentence). For quantity, the time limit does the work: which system produces more high-quality results in five minutes?
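As a rough sketch of how that comparison could be tallied: count, for each system, the sentences whose average expert rating clears a cutoff. Everything specific here is an assumption made for illustration, including the numeric rating scale, the `threshold` value, the data layout, and the function name. The ratings in the example are made up purely to show the call; they are not study results.

```python
from statistics import mean

def high_quality_counts(found, ratings, threshold=3.0):
    """Count the high-quality sentences found with each system.

    found:   list of (system, sentence_id) pairs, one per sentence a participant found
    ratings: {sentence_id: [expert scores]}, collected blind to the system
    """
    counts = {}
    for system, sentence_id in found:
        counts.setdefault(system, 0)  # systems with no good results still appear
        scores = ratings.get(sentence_id)
        if scores and mean(scores) >= threshold:
            counts[system] += 1
    return counts

# Made-up placeholder data, only to demonstrate the function.
found = [("search", "s1"), ("example_a", "s2"), ("example_a", "s3"), ("example_b", "s4")]
ratings = {"s1": [4, 5], "s2": [2, 1], "s3": [5, 4], "s4": [1, 2]}
counts = high_quality_counts(found, ratings)
```

Because every participant gets the same five minutes, these counts fold quality and quantity into a single number per system.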

So, does example-based exploration work better than search for theme finding?

If you have five minutes, you can help us find out by participating in the study: