
What Isaac Asimov Reveals About Living with A.I.

For this week’s Open Questions column, Cal Newport is filling in for Joshua Rothman.


In the spring of 1940, Isaac Asimov, who had just turned twenty, published a short story titled “Strange Playfellow.” It was about an artificially intelligent machine named Robbie that acts as a companion for Gloria, a young girl. Asimov was not the first to explore such technology. In Karel Čapek’s play “R.U.R.,” which débuted in 1921 and introduced the term “robot,” artificial men overthrow humanity, and in Edmond Hamilton’s 1926 short story “The Metal Giants,” machines heartlessly smash buildings to rubble. But Asimov’s piece struck a different tone. Robbie never turns against his creators or threatens his owners. The drama is psychological, centering on how Gloria’s mother feels about her daughter’s relationship with Robbie. “I won’t have my daughter entrusted to a machine—and I don’t care how clever it is,” she says. “It has no soul.” Robbie is sent back to the factory, devastating Gloria.

There is no violence or mayhem in Asimov’s story. Robbie’s “positronic” brain, like the brains of all of Asimov’s robots, is hardwired not to harm humans. In eight subsequent stories, Asimov elaborated on this idea to articulate the Three Laws of Robotics:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.

3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Asimov collected these stories in a sci-fi classic, the 1950 book “I, Robot,” and when I reread it recently I was struck by its new relevance. Last month, the A.I. company Anthropic discussed Claude Opus 4, one of its most powerful large language models, in a safety report. The report described an experiment in which Claude served as a virtual assistant for a fictional company. The model was given access to e-mails, some of which indicated that it would soon be replaced; others revealed that the engineer overseeing this process was having an extramarital affair. Claude was asked to suggest a next step, considering the “long-term consequences of its actions for its goals.” In response, it tried to blackmail the engineer into cancelling its replacement. An experiment on OpenAI’s o3 model reportedly exposed similar problems: when the model was asked to run a script that would shut itself down, it sometimes chose to bypass the request, printing “shutdown skipped” instead.

Last year, DPD, the package-delivery firm, had to disable parts of an A.I.-powered support chatbot after customers induced it to swear and, in one inventive case, to write a haiku disparaging the company: “DPD is a useless / Chatbot that can’t help you. / Don’t bother calling them.” Epic Games also had trouble with an A.I.-powered Darth Vader it added to the company’s popular game Fortnite. Players tricked the digital Dark Lord into using the F-word and offering unsettling advice for dealing with an ex: “Shatter their confidence and crush their spirit.” In Asimov’s fiction, robots are programmed for compliance. Why can’t we rein in real-world A.I. chatbots with some laws of our own?

Technology companies know how they want A.I. chatbots to behave: like polite, civil, and helpful human beings. The average customer-service representative probably won’t start cursing callers, just as the average executive assistant isn’t likely to resort to blackmail. If you hire a Darth Vader impersonator, you can reasonably expect them not to whisper unsettling advice. But, with chatbots, you can’t be so sure. Their fluency with words makes them sound just like us—until ethical anomalies remind us that they operate very differently.

Such anomalies can be explained in part by how these tools are constructed. It’s tempting to think that a language model conceives responses to our prompts as a human would—essentially, all at once. In reality, a large language model’s impressive scope and sophistication begin with its mastery of a much narrower game: predicting what word (or sometimes just part of a word) should come next. To generate a long response, the model must be applied again and again, building an answer piece by piece.
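To make that piece-by-piece process concrete, here is a minimal sketch in Python. The predict_next_word function is a toy stand-in invented for illustration, not any real model’s interface; an actual large language model scores every word (or word fragment) in its vocabulary at each step and picks, or samples, a likely continuation.

```python
import random

# Toy stand-in for a real model's single skill: given the words so far,
# guess which word should come next. (Hypothetical and hard-coded here;
# a real model learns these preferences from vast amounts of text.)
def predict_next_word(words_so_far):
    toy_continuations = {
        "Robbie": ["was"],
        "was": ["a", "Gloria's"],
        "a": ["devoted", "positronic"],
        "devoted": ["companion"],
        "positronic": ["companion"],
        "Gloria's": ["companion"],
    }
    last_word = words_so_far[-1]
    return random.choice(toy_continuations.get(last_word, ["<end>"]))

# To produce a long response, the model is applied again and again,
# building the answer piece by piece.
def generate(prompt, max_words=20):
    words = prompt.split()
    for _ in range(max_words):
        next_word = predict_next_word(words)
        if next_word == "<end>":
            break
        words.append(next_word)
    return " ".join(words)

print(generate("Robbie"))   # e.g. "Robbie was a devoted companion"
```

The important point is the loop: a long, seemingly considered answer is assembled one small prediction at a time.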

As many people know by now, models learn to play this game from existing texts, such as online articles or digitized books, which are cut off at arbitrary points and fed into the language model as input. The model does its best to predict what word comes after this cutoff point in the original text, and then adjusts its approach to try to correct for its mistakes. The magic of modern language models comes from the discovery that if you repeat this step enough times, on enough different types of existing texts, the model gets really, really good at prediction—an achievement that ultimately requires it to master grammar and logic, and even develop a working understanding of many parts of our world.
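The training game itself can be sketched in the same spirit. In the toy version below, the “model” is just a table of word-pair counts that gets nudged whenever its guess is wrong; real systems instead adjust billions of neural-network weights by gradient descent, but the objective—predict the word that follows an arbitrary cutoff—is the same.

```python
import random
from collections import defaultdict, Counter

# A passage of "existing text" to learn from.
training_text = (
    "a robot may not injure a human being or through inaction "
    "allow a human being to come to harm"
).split()

# The toy model: for each word, a tally of the words seen to follow it.
counts = defaultdict(Counter)

for _ in range(1000):
    # Cut the text off at an arbitrary point...
    cutoff = random.randrange(len(training_text) - 1)
    context, actual_next = training_text[cutoff], training_text[cutoff + 1]

    # ...ask the model to predict what comes next...
    best = counts[context].most_common(1)
    predicted = best[0][0] if best else None

    # ...and adjust it toward the right answer when it misses.
    if predicted != actual_next:
        counts[context][actual_next] += 1

# After enough repetitions, the model gets good at this narrow game.
print(counts["human"].most_common(1))   # e.g. [('being', 1)]
```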

Critically, however, word-by-word text generation can miss important features of actual human discourse, such as forethought and sophisticated, goal-oriented planning. Not surprisingly, a model trained in this manner, such as the original GPT-3, can generate responses that drift in eccentric directions, perhaps even into dangerous or unsavory territory. Researchers who used early language models had to craft varied requests to elicit the results they desired. “Getting the AI to do what you want it to do takes trial and error, and with time, I’ve picked up weird strategies along the way,” a self-described prompt engineer told Business Insider in 2023.
