Sunday, December 7, 2025
News Health

Google’s Gemini 3 model keeps the AI hype train going – for now

November 19, 2025
in Health News


Gemini 3 is Google’s latest AI model

VCG via Getty Images

Google’s latest chatbot, Gemini 3, has made significant leaps on a raft of benchmarks designed to measure AI progress, according to the company. These achievements may be enough to allay fears of an AI bubble bursting for the moment, but it is unclear how well these scores translate to real-world capabilities.

What’s more, persistent factual inaccuracies and hallucinations that have become a hallmark of all large language models show no signs of being ironed out, which could prove problematic for any uses where reliability is vital.

In a blog post announcing the new model, Google bosses Sundar Pichai, Demis Hassabis and Koray Kavukcuoglu write that Gemini 3 has “PhD-level reasoning”, a phrase that competitor OpenAI also used when it announced its GPT-5 model. As evidence for this, they list scores on several tests designed to assess “graduate-level” knowledge, such as Humanity’s Last Exam, a set of 2500 research-level questions from maths, science and the humanities. Gemini 3 scored 37.5 per cent on this test, outclassing the previous record holder, a version of OpenAI’s GPT-5, which scored 26.5 per cent.

Jumps like this can indicate that a model has become more capable in certain respects, says Luc Rocher at the University of Oxford, but we need to be careful about how we interpret these results. “If a model goes from 80 per cent to 90 per cent on a benchmark, what does it mean? Does it mean that a model was 80 per cent PhD level and now is 90 per cent PhD level? I think it’s quite difficult to understand,” they say. “There is no number that we can put on whether an AI model has reasoning, because this is a very subjective notion.”

Benchmark tests have many limitations, such as requiring a single answer or multiple-choice answers, for which models don’t need to show their working. “It’s very easy to use multiple choice questions to grade [the models],” says Rocher, “but if you go to a doctor, the doctor will not assess you with a multiple choice. If you ask a lawyer, a lawyer will not give you legal advice with multiple choice answers.” There is also a risk that the answers to such tests were hoovered up in the training data of the AI models being tested, effectively letting them cheat.

The real test for Gemini 3 and the most advanced AI models – and whether their performance will be enough to justify the trillions of dollars that companies like Google and OpenAI are spending on AI data centres – will be in how people use the model and how reliable they find it, says Rocher.

Google says the model’s improved capabilities will make it better at producing software, organising email and analysing documents. The firm also says it will improve Google search by supplementing AI-generated results with graphics and simulations.

It is likely that the real improvements will be for people who use AI tools to autonomously write code, a process called agentic coding, says Adam Mahdi at the University of Oxford. “I think we’re hitting the upper limit of what a typical chatbot can do, and the real benefits of Gemini 3 Pro [the standard version of Gemini 3] will probably be in more complex, potentially agentic workflows, rather than everyday chatting,” he says.

Initial reactions online have included people praising Gemini’s coding capabilities and ability to reason, but as with all new model releases, there have also been posts highlighting failures on apparently simple tasks, such as tracing hand-drawn arrows pointing to different people or solving simple visual reasoning tests.

Google admits, in Gemini 3’s technical specifications, that the model will continue to hallucinate and produce factual inaccuracies some of the time, at a rate that is roughly comparable with other leading AI models. The lack of improvement in this area is a big concern, says Artur d’Avila Garcez at City St George’s, University of London. “The problem is that all AI companies have been trying to reduce hallucinations for more than two years, but you only need one very bad hallucination to destroy trust in the system for good,” he says.


Source link : https://www.newscientist.com/article/2505039-googles-gemini-3-model-keeps-the-ai-hype-train-going-for-now/?utm_campaign=RSS%7CNSNS&utm_source=NSNS&utm_medium=RSS&utm_content=home

Publish date : 2025-11-19 15:38:00

Copyright for syndicated content belongs to the linked Source.

© 2022 NewsHealth.
