38 Comments

I tried to answer your questions based on my experience and knowledge (as a retired civil engineer), and I'm pretty sure the AI did better than you would have graded me. On the other hand, I am sure that if I had sat through your classes, heard how you expressed these concepts, and gotten feedback on my questions, I would have done much better.

Given that ChatGPT uses available sources - which include a hodge-podge of divergent opinions - it is not surprising that it failed to respond to your questions as you outlined. But if it had access to transcripts of all your classes, and knew to give priority to your input over what is generally available, I suspect that it would have returned something much closer to what you expected.

One of the chief drawbacks to ChatGPT, at least as I understand it, is that it simply looks at all the information - both correct and incorrect - and tries to provide an answer that weights all opinions. It does NOT yet have the ability to logically evaluate ideas against data and to put together a thesis that is based on facts but runs contrary to widely established opinions.

I think what's most impressive about ChatGPT is not its current capabilities, but its momentum. Just a few years ago the idea of an AI taking an IQ test or the SAT was almost laughable. AI experts predicted that this level of capability wouldn't be achieved until the 2030s, and the general public considered even those predictions too optimistic.

Just a couple of years ago, GPT-3 was mostly being compared to 7-year-old kids. Today you are comparing ChatGPT to a college student.

Thank you for trying this -- that is a very useful contribution!

It does seem like there is a bigger-picture point though, in that this is software available to the general public interpreting a free-format natural-language economics exam and writing essay-style answers that are mostly coherent -- an earthshaking development compared to the state of the art just three years ago. It seems a bit like critiquing the ballet-dancing bear's pointe technique and docking two points for performance while grudgingly acknowledging that the choreography and presentation are passable, while other observers are going "Holy hot sauce, that bear is doing ballet!"

Since different ChatGPT prompts result in different answers, don't you need to tell us the exact inputs that produced these answers? Further, isn't it possible that some prompts could result in significantly better performance, such as telling it to respond like an economist or an economics student taking a test? Given what I have seen elsewhere with attempts to improve outputs, it's highly likely there is more optimization you could do to improve the test score.

I don't know if this is idiosyncratic to me or not, but I find the way Brian writes questions confusing. Consider the below snippet:

"T, F, and Explain: Krugman argues that such employment loss is a market failure that justifies government regulation."

I take it from context that Krugman *does* in fact argue this and the question is not "can you recapitulate the content of Krugman's argument?" but rather "is the content of this argument true?". I think a capable student will get there, but given that testing is generally pretty stressful anyway, if I were a student, I would be a LOT happier if the question was: "Krugman argues {x}. Is {x} actually true?".

Years ago I watched the Jeopardy! match with IBM's AI Watson. It "won" in that it regurgitated answers faster than the human contestants and dominated the board. But in Final Jeopardy, its answer in the category "U.S. Cities" was "Toronto." It was so far ahead that it didn't matter, but it exhibited a habit you occasionally see with AI: occasional gross and obvious errors no human would make.

The big thing they wanted Watson to do was work in the medical industry, but it didn't pan out. You can't make errors like that in medicine.

At the same time, AI does appear to be good at producing "mediocre work for very cheap." I think someone in the translation industry noted that a not-great translation at 90% less than the price is usually "good enough" for most customers. If what you want isn't sensitive to the occasional big, dumbfounding error, those errors might not matter.

Early guns weren't as good as bows, but they were a cheaper weapons system.

Basically, AI can replace mediocre and fairly unimportant work, of which we still have a lot.

A more important question may be: if ChatGPT had transcripts of your class lectures, and were told to refer to those in answering these questions, what grade would it get then? Based on my use of AI, I suspect it would do very well if given the same materials students are given.

I also suspect that most college students who had *not* taken your class, but were reliant on ChatGPT's database to answer these questions, would also score very poorly.

The reason that's important is because then it's just a matter of feeding the right info to ChatGPT -- its ability to use it well is already mostly there.

From a Turing Test perspective, these are good answers--much, much higher quality than you would get asking a random college graduate who had not taken your class. Additionally, simply feeding the questions on the test to ChatGPT is not a fair comparison to your students. Your students probably had a lot of additional context as to what level of detail to go into when answering questions and what sorts of things it is important to mention in answers in order to get a good grade. I expect that if you provided that sort of context in the prompt (and especially if you used a few different prompts and selected the best answer provided) that the bot would get a substantially better grade.

Another piece of evidence.

I gave ChatGPT the final exam of my strategic management course and asked a colleague (who teaches the same course and gave his students the same exam) to grade it without mentioning that they were ChatGPT answers.

The outcome: ChatGPT performed comfortably above average, both in the multiple choice questions and in the open ended ones.

Also, there was a lot of variability. Some answers were excellent while others were considerably subpar.

As people here comment, it is actually amazing. I have already used ChatGPT a lot and still had no idea it could even grasp these complicated economics questions.

I’m pretty much amazed that the answers were as good as they were. Getting a “D” on this test for a class it didn’t take with nothing to go on but the questions? Crazy.

I was surprised that you gave it any points at all for question 2 (the kind of BS paraphrase of the question that you can usually produce in a subject you know nothing about), and also that you didn't give more points for question 4 (nothing in the T/F statement itself suggested to me that you'd expect a restatement of the transparent meaning of the final part of the Landsburg quote).

What this perhaps shows is that ChatGPT has been trained with material from textbooks and other sources that do not reflect GMU's economics department curriculum. Had it been trained with, say, transcripts of Caplan's lectures, it most likely would have achieved a higher score.

Sure, but four years ago the best AI probably would have gotten a 0. And I'd be willing to bet even money that within 5 years, the best publicly available AI can get a B or better on tests of this sort (I'll let you adjudicate). Interested?

It might have gotten a D, but combined with the context your students had, it will definitely do better than a D.

AIs are only as good as their training material. Train one on A material and you will get A answers. Train it on Wikipedia and you will get garbage on anything even slightly political.
