Why Today’s Chatbots Are Weird, Argumentative, and Wrong

Long before most people began playing around with generative AI models like ChatGPT and DALL-E, Janelle Shane started documenting AI oddities. An optics researcher by training, she has long been fascinated with testing AIs’ ability to be, well, normal. With more people than ever probing AI’s limits, Shane took a minute to answer five relatively normal questions from IEEE Spectrum about why chatbots love to talk back and why image-recognition models are head over heels for giraffes.

Janelle Shane’s AI humor blog, AI Weirdness, and her book, You Look Like a Thing and I Love You: How AI Works, and Why It’s Making the World a Weirder Place, use cartoons and humorous pop-culture experiments to look inside the artificial intelligence algorithms that run our world.

How has AIs’ weirdness changed in the past year?

Janelle Shane: They’ve gotten less weird and more coherent. Instead of being absurd and half-incomprehensible, they’ve become way more fluent and more subtly wrong, in ways that are harder to detect. But they’re also a lot more accessible now. People have the chance to experiment with them themselves. So from that standpoint, the weirdness of these models is a lot more evident.

You’ve written that it’s outrageous that chatbots like Google’s Bard and Bing Chat are seen as an alternative to search engines. What’s the problem?

Shane: The problem is how incorrect, and in many cases how subtly incorrect, these answers are; you may not be able to tell at first, if it’s outside your area of expertise. The answers do look vaguely correct. But [the chatbots] are making up papers, making up citations, or getting facts and dates wrong, yet presenting it all the same way they present actual search results. I think people can get a false sense of confidence in what is really just probability-based text.

You’ve noted as well that chatbots are often confidently incorrect, and even double down when challenged. What do you think is causing that?

Shane: They’re trained on books and Internet dialogues and Web pages in which humans are generally very confident about their answers. Especially in the earliest releases of these chatbots, before the engineers did some tweaking, you would get chatbots that acted like they were in an Internet argument, doubling down and sounding like they were getting very hyped up and emotional about how correct they were. I think that came straight from imitating humans in Internet arguments during training.

What inspired you to ask ChatGPT to draw things or create ASCII art?

Shane: I wanted to find ways to make it obvious at a glance that these models are making mistakes, and to see what kinds of mistakes they’re making. To understand how wrong they are about quantum physics, you have to know quantum physics well enough to know they’re making things up. But if you see it generate a blob, claim it’s a unicorn, and describe how skillfully it has generated this unicorn, you get an idea of just what kind of overconfidence you’re dealing with.

Why is AI so obsessed with giraffes?

Shane: That’s a meme going back to the early days of image-captioning AIs. The term “giraffing” originated with somebody who set up a Tumblr bot that automatically captioned images and started to notice that quite a lot of the captions included phantom giraffes.

It’s kind of a fun example animal to use at this point. When I was talking with Visual Chatbot, one of these early question-and-answer image-describing bots, that’s what I picked to test: What happens if you ask it how many giraffes there are? It would always give you a nonzero answer, because in its training data people didn’t tend to ask that question when the answer was zero.

This article appears in the September 2023 print issue as “5 Questions for Janelle Shane.”
