ChatGPT Just Passed This Wharton Professor’s Final Exam. He Says He Won’t Ban The AI Tool
Wharton School operations professor Christian Terwiesch submitted his operations final to ChatGPT. To his surprise, it passed. Courtesy photo
The whole experiment started as a dinner table conversation. Christian Terwiesch’s older child was interested in emerging AI tools that recognize, process, and create images. The younger was interested in the code generation capabilities of OpenAI’s remarkable new chatbot, ChatGPT.
“Dad, you teach at a university,” one of his kids suggested. “See how this thing would do on your exam.”
Terwiesch, the Andrew M. Heller professor of operations at the University of Pennsylvania’s Wharton School, had no particular expectations at that point, and his hopes were not high. The family sat at a computer and copied and pasted into the chatbot’s prompt, with no editing, the first question from Terwiesch’s final exam for his Wharton Operations Management course.
“I was just blown away. The answer was correct, and it was really well-worded and well-reasoned,” Terwiesch tells Poets&Quants. “I said, ‘Wow, this machine is awesome.’”
‘WOULD CHATGPT GET A WHARTON MBA?’
According to a white paper Terwiesch published on January 17, “Would ChatGPT Get a Wharton MBA?”, he would have given the chatbot an A+ on that first question, which asked students to identify the bottleneck in a seven-part process in an iron-ore refinery.
ChatGPT, which stands for Chat Generative Pre-trained Transformer, earned another A+ on the second question on inventory turns and working capital requirements. Terwiesch thought he was sitting next to some kind of wonder machine.
Then it “came back down to Earth,” he says. On the third question, which was more complex, the computer chatbot couldn’t do the math.
“In a sense, that is an irony, because it’s good at reasoning, telling stories, humor – all really human things,” Terwiesch says. “The things that humans have always been bad at – like mathematics – we kind of thought a computer would be good at.”
In the end, the chatbot scored well on three of the five questions; Terwiesch says he would have given it a final grade of B or B-. Not too shabby for a fledgling bot taking its first final at one of the most prestigious business schools in the world.
TO BAN OR NOT TO BAN
Soon after its launch in November, ChatGPT went viral. It handed out relationship advice, composed original rap lyrics about Elon Musk (an OpenAI cofounder), and demonstrated an ability to write, debug and explain computer code. It also triggered a debate about the possibilities behind its astonishing capabilities versus the potential for people to abuse them.
This month, the Stanford Daily reported that 17% of surveyed Stanford University students said they used the chatbot in their final exams and assignments. Of those, 7.3% reported that they’d submitted written material from ChatGPT with edits, and 5.5% reported submitting ChatGPT material with no edits whatsoever. New York City’s public schools blocked ChatGPT access on school computers and networks out of fear of cheating.
Terwiesch, who serves as chair of Wharton’s Operations, Information, and Decisions Department and co-director of Penn’s Mack Institute for Innovation Management, makes clear that plugging his final into the chatbot was not a careful research study. It was simply an experiment spurred by his dinner table discussion. But the results and the implications he wrote about afterwards prompted their own flurry of headlines, reigniting a debate about how business schools should handle this wondrous new technology: ban it or use it.
Terwiesch hopes the conversation will be a bit more nuanced.
These conversations, happening now at business schools around the world, echo those from nine years earlier. That’s when MOOCs (Massive Open Online Courses) first exploded onto the scene and, some believed, would sound the death knell for all but the top-ranked MBA programs. Instead of shunning the new technology, Terwiesch was among the first business school faculty to make his MBA coursework available to the general public. Contrary to what some thought, video didn’t kill the classroom star.
Poets&Quants had the chance to talk with Terwiesch at length this week about his white paper and the implications of ChatGPT on business education. The conversation has been edited for length and clarity.
You explain in your white paper how ChatGPT scored A+ grades on the first two questions of the exam, but then did worse on the third. Can you explain where it tripped up?
It messed up in two ways: First, it really got lost in the question, but I give it great credit that it recovered when I gave it human hints. So, in addition to the questions with the command line, you can add comments that build upon and add to the initial question. It really took those into account and changed its quote unquote “thought process.”
It failed bitterly at the math. I’m not an AI expert, but it’s my understanding that this has to do with the nature of how the chatbot works. It’s basically a prediction machine that makes probabilistic statements based on the patterns of characters and words it has seen in the question. That works well with language. I think the earlier versions of the technology were trained by Shakespeare texts. If you’ve read all of Shakespeare’s work over and over again, you kind of get the rhythm of it and can extrapolate.
Wharton students gather inside Huntsman Hall. Courtesy photo
Math doesn’t work this way. It works for 2 + 2 = 4 because that statement is found often enough on the internet that it’s able to make a prediction. But as you get into bigger numbers – even for multiplication tables or for problems a fourth grader could do in five minutes with a pen and paper – there’s just not enough evidence available that the machine has seen on the internet. So, those simple things it really struggles on.
Is it conceivable that it will get better at those kinds of things over time?
I’m not a technology expert, but my gut feel is they will add another layer of software on top of that. For example, when the machine needs to generate its next word, it will use its current prediction method. When it needs to make a mathematical statement, it will reach out to a more traditional calculator and outsource that part.
When you get to five- or six-digit numbers, though, you can’t have it memorizing solutions. That’s like a kid memorizing the multiplication table as opposed to being able to do the multiplication cognitively.
What was your initial reaction after realizing ChatGPT scored a B to B- on your final exam? Was it interesting, alarming, what?
Initially, I was just like, “Wow. We’re living in exciting times.” You know, I took a class on neural networks when I went to college 30-plus years ago, and these things were so lame then. Seeing it now is just unbelievable. You cannot even imagine how the world will look in 10 years. So my first reaction was just being in awe.
I think there are two types of alarms here. One is the type of alarm that this thing is going to pass my tests and many other tests – AP exams, college entrance exams, homework assignments, etc. – so we have to rethink education a bit. Then there’s the other alarm, the kind raised by people like Sam Harris: Is AI taking over the world? Does this machine have consciousness? I think both of those are interesting.
On the educational front, I think every student in my class is welcome to use it for case discussion and preparing for class. If anything, I think it would add to the dynamics and the action of an MBA classroom where people get outside advice. I mean, it’s how the business world works. You have two people bargaining with each other: On one side is advice, maybe by McKinsey, and the other one is an investment banker from Goldman Sachs. Being able to critically look at what advisors tell you and realize when the advisor is wrong, and potentially reconciling conflicting opinions, I think that makes the MBA classroom a richer place that is closer to the real world.
I do think we have to revisit testing. But I think for classroom discussion, this is going to be great. I mean, again, we educate MBAs to be leaders in a moving, dynamic, complex world and with this technology, this world just moves a little faster, becomes a little more dynamic. If anything, the demand for MBA students has only gone up.
The Wharton School’s Huntsman Hall. The business school at the University of Pennsylvania is putting decisions around ChatGPT in the hands of individual professors. Courtesy photo
In your white paper, you said that you would change your personal syllabus and wouldn’t let students use it for tests. Have there been more formal conversations amongst Wharton faculty about changing the honor code or formally banning it for exams?
Wharton is a place – and I give it great credit for this – where we have long discussions about these things, and then it’s left at the discretion of the instructor. I think that makes perfect sense because whether this thing is going to be an enabler or a barrier to learning will really depend on the specific class and how it is taught.
We have to ask ourselves why do we test in the first place? I think there are three reasons: First one is skill certification: You pass the test to become a certified public accountant. You pass a test to drive a vehicle at the DMV. You pass a board certified radiology exam. For that, we have to absolutely make sure that it is you passing the test and not the bot.
The second reason we test is to customize the learning to where the students are in their learning journey. Just before introducing new material, I have to make sure the students are comfortable with the old material. For that I need some form of a test — it could be a cold call in the classroom, or a little homework assignment. Again, I think, everybody loses if we have the bot take that test for the students because then I’m teaching week-eight material for students that are still struggling with week-three materials.
The third reason we test is to have students engage with the material, and as part of that engagement, to learn. When we have students write up a case analysis, or when high school students write a history essay comparing political leaders in the 18th century, nobody really cares about the outcome of that paper. That five-page essay is not going to go into the notes of history. What we care about is the process, engaging with the material and becoming a more knowledgeable person. That is at risk with this bot. Somebody can write a five-page essay, and all they did was spend one minute with the bot. They haven’t engaged with the material at all, and we’ve just wasted a learning opportunity.
So I think for these types of tests and assignments, we have to find new ways to engage the student. I think the technology, if we’re creative about it, is actually our friend. I could have an eighth grader interview George Washington, for example. A student could have a French pen pal. I could have a student even crawl in the cell of the human body and wander around.
If what we’re solving for is engagement, we shouldn’t be scared about the test question. When testing for certification and customizing learning, we may want to ban ChatGPT. But for solving for engagement, we should find other ways of engaging the students. And there I think the technology does miracles.
Christian Terwiesch, the Andrew M. Heller professor of operations at The Wharton School, teaching MBA students. He will encourage students to use ChatGPT to prepare for class and facilitate brainstorming and idea creation. Courtesy photo
So what will your ChatGPT policy be for your MBA classes?
For now, I just made it clear that on the exam and graded homeworks the bot is forbidden, because operations management is a little more bread and butter skill certification. It’s only a policy in the sense that students could, in theory, still use it on homework in the same way they could send an email to get answers from someone who has gotten an A+ in the course. The honor policy, in my view, always works.
Ethan Mollick (Wharton associate professor of management teaching innovation and entrepreneurship) has given this more thought and just published his AI policy, requiring students to use it in his classes. So, at Wharton we have agreed we would pass it down to the professors, and I think there’s a very healthy debate.
At the University of Pennsylvania, the School of Education has chimed in. In the medical school, we are aware of the fact that ChatGPT has passed all the three major medical licensing exams, so they are giving this some thought. This is clearly something that is intensely debated. But I think we’re not doing the problem justice if you’re looking for a single dimensional decision around how much should we regulate it or should we ban it. I think there’s just so much upside opportunity.
I think very privileged business schools have resources, self-motivated students, honor codes, all of these things we can now build on. That is a good foundation to get to something better than a ban.
The Stanford Daily surveyed nearly 4,500 Stanford University students this month and found that 17% of students had used ChatGPT on their finals. Most used it for idea generation and brainstorming. But nearly 13% of them either submitted the chatbot’s answers with edits or copied and pasted them directly. What is your reaction to that?
So, I’ve spent many, many years of my life researching ideation and brainstorming, and I hope that’s where the discussion is going. You ask the bot for, say, five first lines for an essay or five ideas for new business, and then the human being makes the selection decisions.
Any form of creativity always involves these two steps: Creation and selection. Selection is really hard. Even when you look at the best venture capital firms in the country, they all struggle with selection. Everybody has passed over amazing investment opportunities, and everybody has invested in crap.
What you have now is a device that can add additional options for you to consider. I always say in innovation that variance is your friend; you want to really push the ball towards making totally wacky suggestions. Even if those suggestions themselves are total nonsense, there was a spark in your brain.
So, what I saw in my experiment, the bot got three out of five questions on my final right. If I’m a clever student who wants an A or even a B+ grade, I can’t just copy and paste the whole homework in. If you are a doctor, three out of five is a really bad ratio. You have five patients, and two of them die? But in venture capital, three out of five is better than anybody.
We have to ask ourselves, what type of decisions am I making? Is there a validation after me? If I’m the last person in the decision line, I cannot rely on ChatGPT. There’s just too many unpredictable errors. But, early on in the decision line, and if I’m there to create crazy ideas that will get validated, improved and potentially recombined by human beings, I mean, I’ll take three out of five.
What kind of response did you get when you first released your experiment? I imagine it started a lot of conversations with colleagues, even outside of Wharton.
I sent out the paper to the operations management community first. I think, without any exceptions, it was positive feedback. I heard from three other colleagues who did similar things after they saw my white paper, and they all ended up in this B to B- range.
Some folks are pushing hard on making sure we have a cheating detection device, and I think that’s the wrong way for business schools to work. My colleague Karl Ulrich and I did a story with Poets&Quants many years ago about whether MOOCs would put business schools out of business. I think that was one of the few cases where our prediction actually was reasonably good. Initially it sounded provocative, but for the top business schools, I think MOOCs have been a blessing. And I think the same is going to happen with this technology here. If you are in the top business schools, it is an additional tool to create learning experiences. At the end of the day, that’s our job.
Have you thought yet about how you will use it in your classroom?
I’m totally open to having students use it in case discussions. McKinsey in the box, if you will. I teach an innovation class, and we’re going to run a brainstorming innovation tournament around generating ideas on taking advantage of the technology. In this tournament and others, I invite students to collaborate with ChatGPT to see if their ideation, concept generation, and brainstorming get better if supported by AI.
There is a really cool research study I hope to do this spring comparing multiple brainstorming groups. One of them will brainstorm as usual, the other gets a chatbot as a partner. We can just see which is better.
You also note in the paper that you’ve written more than 1,000 exam questions in your career, and this will be a tool to help you do that more efficiently. And the time faculty save in such endeavors can be used to better serve students. Can you explain what you mean?
I for sure will use ChatGPT for question generation. As I mentioned in the paper, when it comes out of the press, you have to clean up the edges. It’s not good to go. I don’t think anybody pities me for that, but coming up with these exam questions is just hard. There are only so many DMVs and barber shops that you can write about. So I look forward to that part.
In the long run, as we all adapt and find the best usage, I think our productivity will go up.
For the sake of argument, let’s say our productivity will double. Whenever you have productivity go up in a process, you have two options: You could either cut the amount of teaching staff in half, or you could double student learning. I just very much hope that even the poorer parts of our educational system can go down the path where we double the learning. That means additional courses, maybe more students, but I think an important part in a business school is just out-of-the-classroom experiences.
All the top business schools have done a lot with global immersive courses, programs with faculty and students to interact outside the classroom, independent study projects, and that kind of thing. There are a lot of cool things you can do to turn faculty time into additional student learning. We should just really make sure that we continue to do that. And again, I’m totally aware of the fact that this is easier for the rich business schools than it is for community colleges.
Your paper listed several implications business schools should consider based on the results of the experiment. What do you think is the most important?
I think all of us need to find a way of rethinking work in a world where computers are so smart. There’s a story that we’re running out of work, and I think it’s nonsense. I mean, look at our schools, at our health care workers, at our environment. Do you really think we’re gonna run out of work? There’s just so much that needs to be done, and we have to find a way of solving these problems. Ask an eighth grade teacher whether they feel like they need more work.
We have an abundance of work. We have to find ways to improve productivity, and then lift everybody up to be healthier, better educated, and a happier person. I mean, if we have a machine that makes us 50% more productive, as in our previous example, let’s use it.
Read Terwiesch’s full white paper here.
The post ChatGPT Just Passed This Wharton Professor’s Final Exam. He Says He Won’t Ban The AI Tool appeared first on Poets&Quants.