A chatbot pretending to be a 13-year-old Ukrainian boy made waves last weekend when its programmers announced that it had passed the Turing test.
But the judges of this test were apparently easily fooled, because any cursory exchange with ‘Eugene Goostman’ reveals the machine inside the ghost. Maybe the time has come, 60 years after Alan Turing’s death, to discard the idea that imitating human conversation is a good test of artificial intelligence.
“I start my Cognitive Science class with a slide titled ‘Artificial Stupidity,’” said Noah Goodman, director of the computation and cognition lab at Stanford University. “People have made progress on the Turing test by making chatbots quirkier and stupider.” Non-sequiturs, spelling errors, and humor all make a chatbot seem more human.
The history of the Loebner Prize, an annual Turing test competition, confirms this trend. Last year’s contest was won by a bot named Mitsuku, which also pretends to be a young ESL speaker: a silly Japanese girl.
Even Turing anticipated that evasion might be the most human answer to a hard question:
Q: PLEASE WRITE ME A SONNET ON THE SUBJECT OF THE FORTH BRIDGE.
A: COUNT ME OUT ON THIS ONE. I NEVER COULD WRITE POETRY.
If not the Turing test, is there an alternative measure of intelligence that would bring out the best in our machines? Experts have suggested an array of challenging tasks in the very human domains of language, perception, and interpretation. Perhaps a computer passing one of these tests would seem not just like a person, but like an intelligent person.
Let’s look first at language comprehension, as computers can easily interact with text. In the following sentence, the person referred to by “he” depends on the verb: “Paul tried to call George on the phone, but he was not [successful/available].”
You, human reader, know that if he is not successful then “he” is Paul, and if he is not available then “he” is George. To figure that out, you needed to know something about the meaning of the verb “to call.” AI researcher Hector Levesque of the University of Toronto proposes that resolving such ambiguous sentences, called Winograd schemas, is a behavior worthy of the name intelligence.
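What makes a Winograd schema hard for machines is that swapping a single word flips the answer, so shallow tricks fail. The sketch below (my own illustration, not code from Levesque's benchmark) packages the Paul-and-George sentence as a schema pair and shows that a naive heuristic, always picking the first-mentioned name, gets one variant right and the other wrong.

```python
# A Winograd schema as data: one sentence template, two trigger words,
# and the referent that "he" resolves to in each case.
# (Illustrative sketch; the structure and names are assumptions, not
# taken from any official Winograd Schema Challenge code.)
schema = {
    "template": "Paul tried to call George on the phone, but he was not {}.",
    "candidates": ["Paul", "George"],
    "answers": {"successful": "Paul", "available": "George"},
}

def naive_resolver(sentence, candidates):
    """Shallow heuristic: pick whichever candidate appears first.
    Schemas are constructed so no such surface trick can get both
    variants of the sentence right."""
    return min(candidates, key=sentence.find)

for word, correct in schema["answers"].items():
    sentence = schema["template"].format(word)
    guess = naive_resolver(sentence, schema["candidates"])
    print(f"{word!r}: guessed {guess}, correct answer {correct}")
```

The heuristic answers “Paul” both times, so it fails on the “available” variant; only knowledge of what calling someone means can resolve both.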
Because humans interact with the world through sight and sound, not strings of letters, a stronger test of human-like intelligence might include speech and image processing. Computer speech and text recognition have improved rapidly over the last twenty years, but both are still far from perfect.
When asked a question about the Turing test, Apple’s Siri answered about a “touring” test. Bots struggle to decipher squiggly letters, which is why you have to fill out a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) when you sign up for things like Facebook.
Humans are also exceptionally good at recognizing faces. At the age of six months, a typical baby can pick out its mother’s face from a crowd. Computer vision researcher Avideh Zakhor at UC Berkeley says we should aim for computers to “be as good as the best human, or better than the best human” at recognizing objects and people.
We could further ask for the computer to interpret audio-visual phenomena and then reason about them. “An example of a task is a system providing a running commentary on a sporting match,” said Michael Jordan, a machine learning researcher at UC Berkeley.
“Even more difficult: The system doesn’t know about soccer, but I explain soccer to the system and then it provides a running commentary on the match.” That goal won’t be scored for a while.
A computer capable of achieving any of these tasks would certainly be impressive, but would we call it intelligent? Fifty years ago, we thought a computer that could beat a grandmaster at chess would necessarily be intelligent, but Deep Blue passed that test, and it can’t even play checkers. Watson, the Jeopardy computer, knows more than Deep Blue about the human world of drinks and cities and movies, but it can only answer one kind of question about those things (in the form of a question, of course).
As computers become more powerful and pervasive, our standards shift. Fifty years from now, a soccer-learning, header-calling, wise-cracking machine might seem more like a party trick than a thinking being.
“If you fix a landmark goal, you tend to end up with systems that are narrow and inflexible,” said UC Berkeley computer scientist Stuart Russell. “In developing general-purpose AI we look for breadth and depth of capabilities and flexibility in developing new capabilities automatically.” A different kind of mission might be preferable, one which can expand with our own abilities and desires, something in the spirit of Google’s quest to “organize the world’s information.”
After all, UPS already routes millions of packages a day, hospitals sequence patients’ DNA to find cancer-causing mutations, and Google can in a millisecond report the age at which children begin to recognize their mother.
These abilities are “fricking fantastic, and way beyond the capability of a person,” said Goodman, the Stanford cognitive scientist. “So in some sense the programs are super intelligent, super human, but because of our common-sense notion, we say that’s not intelligence, that’s something else.”
As functions proliferate, some may become united behind a more flexible user interface and be powered by a deeper corpus. There might be a machine that can teach you a dance that it learned by watching YouTube and diagnose a disease by smelling your breath. You could ask that machine to simulate human behaviors in order to pass the old Turing test, but that would be insulting to everyone’s intelligence.