Would IBM’s Project Debater Win the AI Box Experiment?
We've seen computers compete with humans in games like chess and Jeopardy!—but how does an AI system fare against a human in a debate?
Recently, IBM’s Project Debater, an AI system, debated two human debaters, Dan Zafrir and Noa Ovadia, on two topics: government subsidies for space exploration, and telemedicine.
Each participant gave an opening statement, a rebuttal, and closing remarks. What was remarkable about Project Debater was that it could build coherent arguments on these topics and even crack jokes.
Of course, if you have been following the progression of artificial intelligence, especially when pitted against humans, this seems like the natural next step. We have seen IBM’s Deep Blue defeat world chess champion Garry Kasparov in 1997, IBM Watson win Jeopardy! in 2011, and now, this week, an AI perform well in a debate on relatively complex topics.
Project Debater consists of several elements that together make it a talking, debating entity:
Using machine learning, Project Debater searches documents for evidence that supports or refutes a claim. It can negate a claim, either by constructing a direct rebuttal or by statistically selecting one that could be used; it can synthesize new claims from the information it finds; and it can evaluate the quality of its own arguments.
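To make the idea of finding relevant evidence a little more concrete, here is a deliberately simple Python sketch of a retrieval step: it ranks candidate sentences by word overlap (Jaccard similarity) with a claim. This is not how Project Debater works, since IBM's system relies on machine-learned models; the function names, stopword list, and example sentences are all hypothetical.

```python
import re

# Hypothetical stopword list; a real system would use a learned model, not keyword overlap.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "we", "should",
             "in", "on", "for", "it", "that", "this"}


def tokenize(text: str) -> set[str]:
    """Return the set of lowercase word tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def rank_evidence(claim: str, sentences: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Score each sentence by Jaccard overlap with the claim and return the top_k."""
    claim_terms = tokenize(claim)
    scored = []
    for sentence in sentences:
        terms = tokenize(sentence)
        union = claim_terms | terms
        score = len(claim_terms & terms) / len(union) if union else 0.0
        scored.append((score, sentence))
    # Stable sort: ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


claim = "We should subsidize space exploration"
corpus = [
    "Government funding for space exploration drives new technology.",
    "Telemedicine lets doctors treat patients remotely.",
    "Space exploration subsidies divert money from healthcare.",
]
for score, sentence in rank_evidence(claim, corpus):
    print(f"{score:.2f}  {sentence}")
```

Both space-exploration sentences score 0.25 and the telemedicine sentence is dropped. Overlap alone cannot tell supporting evidence from refuting evidence, though: one sentence argues for the subsidy and the other against it. Deciding which side a sentence is on is the job of stance classification.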
Stance Classification and Sentiment Analysis
Project Debater can identify whether an argument supports or opposes a claim. It checks against expert opinion drawn from data collected from websites such as Wikipedia, determines an argument’s stance by breaking the claim down into the cognitive steps needed to evaluate it, handles mixed and reversed sentiment within sentences, and understands what stance an idiom implies.
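As a rough illustration of handling reversed and mixed sentiment, the Python sketch below assigns a stance using a tiny hand-made sentiment lexicon. It flips a word's polarity when a negator such as "not" appears shortly before it, and it handles mixed sentiment by summing. The lexicon, negator list, and three-token window are hypothetical choices, and the sketch assumes every argument is about the claim being debated. Project Debater's actual classifiers are learned models that also cope with idioms, which this toy does not.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"good": 1, "beneficial": 1, "helps": 1, "worth": 1,
           "bad": -1, "harmful": -1, "waste": -1, "hurts": -1}
# Hypothetical negators that reverse the polarity of a following sentiment word.
NEGATORS = {"not", "never", "no", "isn't", "doesn't", "hardly"}


def stance(argument: str, window: int = 3) -> str:
    """Label an argument PRO, CON, or NEUTRAL toward the (assumed) claim."""
    tokens = re.findall(r"[a-z']+", argument.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        if polarity == 0:
            continue
        # Reversed sentiment: flip if a negator occurs within the previous `window` tokens.
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            polarity = -polarity
        # Mixed sentiment: positive and negative words cancel when summed.
        score += polarity
    if score > 0:
        return "PRO"
    if score < 0:
        return "CON"
    return "NEUTRAL"


print(stance("Space exploration is good for science"))   # PRO
print(stance("Space exploration is not worth the cost"))  # CON
print(stance("It is not a waste of money"))               # PRO
print(stance("It helps science but hurts budgets"))       # NEUTRAL
```

The third example shows why reversal matters: "waste" is negative on its own, but "not a waste" argues in favor.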
Deep Neural Net and Weak Supervision
The deep neural network analyzes and scores arguments. The system can also produce natural-sounding speech, inserting pauses and placing emphasis on words or phrases. Rhetoric is an important element of debate, and these features attempt to mimic it.
Algorithms for Natural Language Processing
Naturally, it can understand the speech of the humans debating it. It can also determine how arguments in documents relate to one another based on the language they use.
This element focuses on making Project Debater’s speech as natural as possible, and the system continually works to improve itself and sound more expressive.
After seeing all this, one might ask: could Project Debater win the AI-Box Experiment?
What Is the AI-Box Experiment?
The AI-Box Experiment is Eliezer Yudkowsky’s response to the suggestion that humans could contain a superintelligent AI inside a box until we were convinced it was safe to let out. The experiment is meant to demonstrate that an AI, especially one exceeding human intelligence, might be able to convince a human to release it from the box through text dialogue alone.
Image courtesy of xkcd.
The experiment is typically carried out between two humans over an instant messaging interface: one person plays the AI, and the other plays the gatekeeper. A set of rules defines what each player can and cannot do to win the game. Among them are a minimum amount of time the gatekeeper must spend talking to the AI, and limits on how the AI may try to convince the gatekeeper to let it out (for example, it cannot offer material items).
Part of Project Debater’s persuasiveness lies in the speech patterns it uses. Still, based on what we know is under the hood, it might also be able to produce text that is persuasive and rhetorically strong. So much communication today happens over text (SMS, email, instant messaging) that it could be even easier now than when the experiment was first proposed for an AI to talk its way out of the box, since many people have grown accustomed to reading the nuance and sentiment behind text-based messages.
It would be fascinating to see what argument it would make for being let out.
Putting Our Reasoning on Auto-Pilot
On IBM’s website dedicated to Project Debater, a couple of elements stand out. The company intends the system to help humans analyze information and arguments and to support stronger reasoning. The tone of the site suggests this is being done for the betterment of society.
A timeline on the site shows AI progressing from filtering spam, to simple virtual assistants, to answering open questions on Jeopardy!, to the free-form debate system we saw this week.
Will we someday have AI systems that navigate complex arguments for us, the same way our phones guide us through cities? Some millennials have become unable to use paper maps because satellite navigation has become so ubiquitous. Should we worry that leaning too heavily on AI to build our arguments will erode our incentive to think critically?
Project Debater certainly has a lot of positive potential. IBM also mentions that it could expand our understanding, help fight “fake news”, and maybe strengthen critical thinking in people.
Either way, the demonstration has caught the attention of the world.
Feature image courtesy of IBM.