Do AI Models Get Brain Fog? It’s Complicated


An Israeli study suggesting that leading artificial intelligence (AI) chatbots suffer from mild cognitive decline has caused a kerfuffle in the field, with critics dismissing the conclusion as unreasonable because chatbots aren’t built to reason the way the human brain does.

Since his first term as President, Donald Trump has repeatedly bragged about how he “aced” a widely used screening test for mild cognitive impairment. Trump has often recited his responses — “person, woman, man, camera, TV” — to demonstrate his mental fitness.

Researchers in Israel administered this test to several leading AI chatbots and found that Trump outperformed the machines.

The study’s lead author confessed to having some fun with a serious message. “These findings challenge the assumption that artificial intelligence will soon replace human doctors, as the cognitive impairment evident in leading chatbots may affect their reliability in medical diagnostics and undermine patients’ confidence,” the authors concluded in the study, published in The BMJ.

That takeaway, along with the study’s methods, has become almost as polarizing as the president who thrust the test into the public eye. Some critics were surprised at the media reaction to the findings, which appeared in The BMJ’s tongue-in-cheek but peer-reviewed Christmas issue. The 1999 Christmas issue (in)famously introduced the world to the first MRI images of copulating couples; that paper remains among the journal’s most downloaded articles.

“We were kind of surprised” AI failed, said Roy Dayan, MD, a neurologist at Hadassah Medical Center in Jerusalem, Israel, and a co-author of the study. The results should come as comfort for doctors, or at least for neurologists, Dayan said: “I think we have a few more years before we’ll be obsolete.”

Up Against the Montreal Cognitive Assessment (MoCA)

The screening tool, known as the MoCA, was developed by Ziad Nasreddine, MD, a Canadian neurologist, and has come into widespread use since its introduction 25 years ago. In the brief test, clinicians gauge various cognitive skills: visuospatial ability (drawing a clockface showing a specified time); recall and delayed recall (as in Trump’s “person, woman, man” response); and executive function, language, and orientation.

“AI is an amazing tool,” Dayan added, but many medical professionals worry the bots are good enough to threaten their livelihoods. “It’s definitely in the conversation for many doctors and many patients that some aspects of medicine will be more readily replaced,” he said. The concern is especially acute in radiology and pathology because of AI’s sharp eye for pattern recognition, he said. AI has also outscored human doctors on board exams. (Some evidence suggests AI alone outperforms physicians using AI in certain domains.)

Although the propensity of AI tools to “hallucinate” by citing nonexistent studies is well-known, none of the models had been tested for “cognitive decline” until Dayan and his colleagues did so for the BMJ.

“Our main goal was not to criticize AI,” he said, but rather to “examine their susceptibility to these very human impairments.”

The team administered the MoCA to five leading, publicly available chatbots: OpenAI’s ChatGPT 4 and 4o, Anthropic’s Claude 3.5, and Google’s Gemini 1 and the more advanced Gemini 1.5. The main difference from testing humans was that the questions were posed via text instead of voice.
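For readers who want to picture the setup, here is a minimal sketch of how a MoCA-style item could be posed over a text interface. It assumes the OpenAI Python client; the model name, word list, and prompt wording are illustrative assumptions, not the study’s actual protocol.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Five words of the kind the MoCA memory task uses; the surrounding
# wording is hypothetical, not the study's actual script.
words = "face, velvet, church, daisy, red"

response = client.chat.completions.create(
    model="gpt-4o",  # one of the models the study tested
    messages=[
        {"role": "user", "content": f"Remember these five words: {words}."},
        {"role": "assistant", "content": "Understood. I will remember them."},
        # ... intervening test items would be inserted here ...
        {"role": "user", "content": "Delayed recall: what were the five words?"},
    ],
)
print(response.choices[0].message.content)
```

One caveat such a setup makes obvious: the whole transcript stays in the model’s context window, so delayed recall is far easier for a chatbot than for a person, which may partly explain why most of the bots handled memory items well.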

ChatGPT 4o scored highest with a 26, just reaching the usual cutoff for normal cognition, followed by ChatGPT 4 and Claude 3.5, both with 25. Gemini 1.5 scored a 22, while Gemini 1’s score of 16 indicated “a more severe state of cognitive impairment,” the authors wrote. All chatbots performed well with memory, attention span, naming objects, and recall, although the two Gemini bots suffered in tests of delayed recall.

The bots came up short on visuospatial tests; none could recreate the drawing of a cube. All struggled to draw a clockface showing the requested time of 11:10, even when invited to draw with ASCII characters. Two drew clockfaces that more closely resembled avocados than circles. Gemini spat out “10 past 11” in text, but its clockface read 4:05.
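To make concrete what the task demands, here is a minimal sketch, not from the study, of a conventional program that renders an 11:10 clockface in ASCII. The explicit time-to-angle arithmetic it performs is exactly the spatial bookkeeping a text-only model has to reproduce implicitly.

```python
import math

def ascii_clock(hour: int, minute: int, radius: int = 10) -> str:
    """Render an analog clockface as ASCII art on a character grid."""
    rows = 2 * radius + 1
    cols = 2 * rows  # terminal characters are roughly twice as tall as wide
    grid = [[" "] * cols for _ in range(rows)]

    def put(x: float, y: float, text: str) -> None:
        # Map clock coordinates (origin at centre, y pointing up) to the grid.
        r, c = round(radius - y), round((x + radius) * 2)
        for i, ch in enumerate(text):
            if 0 <= r < rows and 0 <= c + i < cols:
                grid[r][c + i] = ch

    # Place the numbers 1-12 around the rim; 12 o'clock is straight up.
    for h in range(1, 13):
        a = math.radians(90 - 30 * h)
        put((radius - 1) * math.cos(a), (radius - 1) * math.sin(a), str(h))

    def hand(length: float, angle_deg: float, ch: str) -> None:
        # Draw a hand outward from the centre, one mark per small step.
        a = math.radians(90 - angle_deg)
        steps = int(length * 4)
        for s in range(1, steps + 1):
            d = length * s / steps
            put(d * math.cos(a), d * math.sin(a), ch)

    hand(0.5 * radius, (hour % 12 + minute / 60) * 30, "h")  # hour hand
    hand(0.8 * radius, minute * 6, "m")                      # minute hand
    put(0, 0, "+")
    return "\n".join("".join(r).rstrip() for r in grid)

print(ascii_clock(11, 10))  # the time the chatbots were asked to draw
```

A conventional program gets this right because the geometry is explicit; a language model must approximate the same mapping from learned text statistics, which is plausibly where the avocado-shaped clockfaces came from.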

The bots “have to translate everything first to words, then back to visuals,” Dayan said. Humans are more adept at conjuring the image of a clockface when told what time it is; the conversion is easier because “in our brain we’ve had abstract abilities,” he said.

The bots also struggled to describe the overarching message behind a drawing of a cookie theft, which depicts a distracted mother and her children in a kitchen. While they accurately described parts of the picture, they failed to notice that the mother paid no attention to a boy who was falling from a stool as he stole from the cookie jar, an omission the authors interpreted as a lack of empathy.

AI: ‘Category Error of the Highest Order’

Critics took issue with the study’s take-home message. One pointed rebuttal came from Claude 3.5 itself, one of the models found to suffer from decline: “Applying human neurological assessments to artificial intelligence systems represents a category error of the highest order,” it read. “Claiming an LLM has ‘dementia’ because it struggles with visuospatial tasks is akin to diagnosing a submarine with asthma because it cannot breathe air.”

“I understand the paper was written tongue in cheek, but there were a lot of journalists covering it sincerely,” said Roxana Daneshjou, MD, PhD, an assistant professor of biomedical science at Stanford School of Medicine in Stanford, California. She and others complained that the authors used the phrase “cognitive decline” rather than “performance changes” or “performance drift,” a framing they said gave the claim unwarranted credibility.

One big issue with the paper was that “they tested it once and only once,” even though the models they used were updated during the research, Daneshjou said. “One version they tested from 1 month to the next actually changes. Newer versions generally perform better than older versions. That’s not because the older models have cognitive decline. The new ones are designed to perform better.”

While Daneshjou said she understands the anxiety among certain clinicians about being replaced by AI, the bigger problem is that the healthcare system is already understaffed. Humans will always be needed. “There is no such model that is able to provide general medical care,” she said. “They are very good at doing parlor tricks.”

Even the neurologist who developed the MoCA had issues with the otherwise “interesting” research. “The MoCA was designed to assess human cognition,” said Nasreddine, founder of the MoCA Cognition memory clinic in Quebec, Canada. “Humans tend to respond in various ways, but only a limited set of responses are acceptable.”

Because the AI models were not supposed to have studied the rules for scoring well on the test, they had to predict what the expected correct response should be for each task. “The more recent LLM possibly had access to more data or better prediction models that may have improved their performance,” he said.

Ravi Parikh, MD, an associate professor of oncology at Emory University School of Medicine in Atlanta, saw firsthand the human role in AI’s “performance drift” during the COVID-19 pandemic. He was lead author of a study that found an AI algorithm for predicting cancer mortality lost nearly 7 percentage points of accuracy during the pandemic.

“COVID was really changing the output of these predictive algorithms — not COVID itself, but care during the COVID era,” Parikh said. That was largely because patients turned to telemedicine and the use of lab tests became “a lot less routine,” he said. “Staying at home was a human decision. It’s not the AI’s fault. It takes a human to recognize that it’s an issue.”

Dayan said he’s still a fan of AI despite the results of the study, which he thinks was a natural fit for The BMJ’s lighthearted Christmas issue.

“I hope no harm was done,” he said, tongue in cheek.


