De cerca, nadie es normal

Winograd Schema Challenge: A Step beyond the Turing Test

Posted: March 17th, 2016 | Filed under: Artificial Intelligence

The well-known Turing test was first proposed by Alan Turing (1950) as a practical way to defuse what seemed to him a pointless argument about whether or not machines could think. He proposed that, instead of asking such a vague question, we should ask whether a machine would be capable of producing behavior that we would say required thought in people. The sort of behavior he had in mind was participating in a natural conversation in English over a teletype, in what he called the Imitation Game. The idea, roughly, was the following: if an interrogator was unable to tell, after a long, free-flowing and unrestricted conversation, whether s/he was dealing with a person or a machine, then we should be prepared to say that the machine was thinking. The Turing test does have some troubling aspects, though.

Firstly, the central role of deception. Consider the case of an intelligent machine trying to pass the test. It must converse with an interrogator and not just show its stuff, but fool her/him into thinking s/he is dealing with a person. To imitate a person well without being evasive, the machine will need to assume a false identity to be able to answer questions such as: “How tall are you?” or “Tell me about your parents.”

Secondly, we might also question whether a conversation in English is the right sort of test. Conversations are so adaptable and can be so wide-ranging that they facilitate trickery. The deception works at least in part because we are extremely forgiving about what we will accept as legitimate conversation. We accept almost anything in place of clear and direct answers to questions: elaborate wordplay, puns, jokes, quotations, clever asides, emotional outbursts, points of order… Therefore, a free-form conversation as advocated by Turing may not be the best vehicle for a formal intelligence test. In fact, the nature of the Turing test came under serious scrutiny, especially after an AI chatbot named Eugene Goostman was claimed to have passed it in 2014. The chatbot was not intelligent at all; it was just really good at making you overlook the times when it was stupid, while emphasizing the occasional interactions in which its algorithm knew how to answer the questions you asked.

To counteract this deception and trickery, the Winograd Schema (WS) Challenge was developed. This test of machine intelligence was proposed in 2011 by Hector Levesque, a computer scientist at the University of Toronto. The WS Challenge is a small reading comprehension test made up of single binary questions (the so-called Winograd Schemas, named after Terry Winograd, a professor of computer science at Stanford University) in which the difficulty lies in resolving an anaphor, that is, in deciding what a pronoun refers to. Unlike in the Turing test, the machine is not required to engage in a conversation and fool an interrogator into believing s/he is dealing with a person. Two examples will illustrate:

The trophy would not fit in the brown suitcase because it was too big.

What was too big?

Answer 0: the trophy

Answer 1: the suitcase

——————-

Joan made sure to thank Susan for all the help she had given.

Who had given the help?

Answer 0: Joan

Answer 1: Susan

In each of the questions we have the following four features:

    1. Two parties are mentioned in a sentence by noun phrases. They can be two males, two females, two inanimate objects or two groups of people or objects.
    2. A pronoun or possessive adjective is used in the sentence in reference to one of the parties, but it is also of the right sort for the second party. In the case of males, it is “he/him/his”; for females, it is “she/her/her”; for inanimate objects, it is “it/it/its”; and for groups, it is “they/them/their.”
    3. The question involves determining the referent of the pronoun or possessive adjective. Answer 0 is always the first party mentioned in the sentence, and Answer 1 is the second party.
    4. There is a word (called the special word) that appears in the sentence and possibly in the question. When it is replaced by another word (called the alternate word), everything still makes perfect sense, but the answer changes.
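
The four features can be captured in a small data structure. What follows is only an illustrative sketch in Python, not part of the challenge itself: the class name `WinogradSchema`, its fields, and the `variants` method are names made up for this post, and the correct answers are filled in by hand from the usual reading of the trophy example.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class WinogradSchema:
    """Hypothetical container for one schema (all names are illustrative)."""
    sentence: str              # template with a {word} slot (features 1 and 2)
    question: str              # template with a {word} slot (feature 3)
    parties: Tuple[str, str]   # (Answer 0, Answer 1), in order of mention
    special: str               # the special word (feature 4)
    alternate: str             # the alternate word (feature 4)
    special_answer: int        # correct index when the special word is used

    def variants(self) -> Iterator[Tuple[str, str, int]]:
        """Yield (sentence, question, correct index) for both words."""
        yield (self.sentence.format(word=self.special),
               self.question.format(word=self.special),
               self.special_answer)
        # Swapping in the alternate word flips the correct answer.
        yield (self.sentence.format(word=self.alternate),
               self.question.format(word=self.alternate),
               1 - self.special_answer)


# Answer index set by hand: "too big" refers to the trophy (Answer 0).
trophy = WinogradSchema(
    sentence="The trophy would not fit in the brown suitcase "
             "because it was too {word}.",
    question="What was too {word}?",
    parties=("the trophy", "the suitcase"),
    special="big",
    alternate="small",
    special_answer=0,
)

for sentence, question, answer in trophy.variants():
    print(sentence)
    print(question, "->", trophy.parties[answer])
```

Storing the sentence once as a template makes the point of the fourth feature visible: the two versions differ by a single word, yet the correct answer index flips.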

To see how the fourth feature works, consider the two examples again. In the first, the special word is “big” and its alternate is “small”; in the second, the special word is “given” and its alternate is “received.” The alternate words only show up in alternate versions of the two questions:

The trophy would not fit in the brown suitcase because it was too small.

What was too small?

Answer 0: the trophy

Answer 1: the suitcase

——————-

Joan made sure to thank Susan for all the help she had received.

Who had received the help?

Answer 0: Joan

Answer 1: Susan

With this fourth feature, we can see that clever tricks involving word order or other surface features of words or groups of words will not work. The claim is that doing better than guessing requires machines to figure out what is going on.
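
A rough way to check this claim is to score two word-order tricks on the four sentences above. The snippet below is a sketch only: the function names are invented for illustration, and the correct answers are entered by hand following the usual reading of each sentence (0 is the first party mentioned, 1 the second).

```python
# (sentence, correct answer index); answers entered by hand.
ITEMS = [
    ("The trophy would not fit in the brown suitcase because it was too big.", 0),
    ("The trophy would not fit in the brown suitcase because it was too small.", 1),
    ("Joan made sure to thank Susan for all the help she had given.", 1),
    ("Joan made sure to thank Susan for all the help she had received.", 0),
]


def always_first(sentence: str) -> int:
    """Word-order trick: always pick the first party mentioned."""
    return 0


def always_second(sentence: str) -> int:
    """Word-order trick: always pick the most recently mentioned party."""
    return 1


def accuracy(guesser, items) -> float:
    """Fraction of items on which the guesser picks the correct party."""
    correct = sum(1 for sentence, answer in items if guesser(sentence) == answer)
    return correct / len(items)


print(accuracy(always_first, ITEMS))   # 0.5
print(accuracy(always_second, ITEMS))  # 0.5
```

Because each sentence comes paired with a twin whose answer is the opposite, any rule that ignores the special word is right on exactly one sentence of each pair, which is no better than tossing a coin.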

The need for thinking is perhaps even more evident in a much more difficult example, a variant of which was first presented by Terry Winograd (1972):

The town councillors refused to give the angry demonstrators a permit because they feared violence.

Who feared violence?

Answer 0: the town councillors

Answer 1: the angry demonstrators

Here the special word is “feared” and its alternate is “advocated” as in the following:

The town councillors refused to give the angry demonstrators a permit because they advocated violence.

Who advocated violence?

Answer 0: the town councillors

Answer 1: the angry demonstrators

You need background knowledge that is not expressed in the words of the sentence to be able to sort out what is going on and decide that it is one group that might be fearful and the other group that might be violent. And it is precisely bringing this background knowledge to bear that we informally call thinking. Therefore, the WS Challenge does not allow a machine to hide behind a smokescreen of verbal tricks, playfulness, or canned responses, as might happen with the Turing test.

One concluding thought: we should avoid being too readily convinced by whatever appears to be the most promising approach of the day. We see this in the fashions of AI research over the years: first, automated theorem proving was going to solve it all; then, the methods appeared too weak, and we favored expert systems; then the programs were not situated enough, and we moved to behavior-based robotics; then we came to believe that learning from big data was the answer; and on it goes.

It would be much better to admit that, for natural language for instance, other AI approaches, less in vogue and demanding more hard work, will be needed. This will help AI progress in a steadier and more solid fashion.

Will a computer ever hold a free, natural conversation with a human being without cheap tricks? As always in life, it will depend on us alone: on how much perseverance, inventiveness, and willingness to work hard we bring to the task. At the end of the day, mastering language is no easy matter: we human beings have been coping with it for around the last 50,000 years.

A long and exciting challenge lies ahead of us.

