A golden age of maths is dawning and mathematicians are freaking out

New Scientist. Science news and long reads from expert journalists, covering developments in science, technology, health and the environment on the website and the magazine.

I am attempting to solve a mathematical conundrum that has stumped many of humanity’s greatest thinkers. I have zero mathematical training, apart from a distant undergraduate physics degree, which should put my odds of success at slim to none. But I also have a trick up my sleeve – a kind of mathematical genie that can conjure arcane secrets seemingly out of thin air. I make a short request concerning an esoteric conjecture in number theory, then cross my fingers.

Perhaps “genie” is a bit too strong – I’m simply using GPT 5.5 Pro, the latest iteration of OpenAI’s flagship model. But for mathematicians, modern AI models appear to have a spark of magic. Even in an era of rapid progress, the growth in AI’s mathematical ability is stunning. In just a few months, many prominent mathematicians have walked back previous scepticism and replaced it with sweeping predictions, whispering behind closed doors about job concerns and whether it is even worth embarking on a particular research project if AI might get there first.

In April, I visited San Francisco, where the future often seems to arrive fastest, to attend a hastily organised meeting between mathematicians and AI researchers. There was an air of excitement and curiosity in the room, but also an undeniable feeling of existential dread. If someone like me could produce mathematics at the press of a button, what would that mean for the professionals? Will we even need human mathematicians? And will the machines crack problems that no human could? The answers may have profound consequences for the millennia-old practice of mathematics, and it feels like mathematicians have only a brief window to prepare.

“I think AI is going to come in a big way, and it will significantly revolutionise the field,” says Jacob Tsimerman at the University of Toronto, Canada, who helped organise the conference.

Opinions on the future are divided. “We are running out of places to hide,” wrote Jeremy Avigad at Carnegie Mellon University in Pennsylvania in a recent essay. “We have to face up to the fact that AI will soon be able to prove theorems better than we can.”

Some mathematicians are welcoming the mechanisation of mathematics. Terence Tao at the University of California, Los Angeles, has said the field is moving from an era of “proof scarcity” to one of abundance that could see many once-thorny problems fall to AI. Rather than focusing on being the first person to find a proof, mathematicians might instead race to be the first to understand it, he argues.

Artificial intelligence isn’t terra incognita to mathematicians, but it is only in the past few years that it has started producing useful contributions. At first, these were artisan operations, using individually crafted neural networks to crack particular problems. These bespoke AI models proved difficult to apply across different mathematical disciplines, and remained of interest to only a tiny fraction of working mathematicians.

Even when ChatGPT launched in 2022, mathematicians remained unimpressed – large language models like GPT-3.5, which powered the first version of OpenAI’s chatbot, struggled to perform even basic arithmetic and spouted confident nonsense when asked to solve research-level mathematical problems. But as LLMs scaled up and were trained on increasing amounts of mathematical data, they began to yield results.

One of the first signals that AI was becoming more adept came when AI models were tasked with attempting the International Mathematical Olympiad (IMO), an elite test for high-school students consisting of just six questions of devilish difficulty. The mathematical intuition and range of disciplines required to succeed at the test meant many researchers saw it as a benchmark for mathematical AI, but thought it would take years, possibly a decade, for an AI to score highly.

They were wrong. In July 2024, Google DeepMind announced that its AlphaProof AI system could solve four out of six questions from that year’s IMO, enough for a silver-level performance. This was impressive, but AlphaProof wasn’t a strict large language model and had been fine-tuned for IMO-style questions, such as geometry, and it was unclear how much further it might go. But just a year later, Google and OpenAI announced they had achieved a gold-level performance, with OpenAI in particular using a less maths-focused model. The results made mathematicians sit up. “People’s eyes really opened,” says Ravi Vakil at Stanford University in California.

Problem solving

It wasn’t long before these capabilities were made available to the public, where they quickly found use beyond high-school competitions and began encroaching on research-level mathematics. Thomas Bloom at the University of Manchester, UK, first noticed the impact of these newer models in the last months of 2025. He runs a website that tracks progress on a set of more than a thousand problems posed by the famous mathematician Paul Erdős. They tend to be simple to state, but range in complexity from relatively straightforward to very difficult, and many of them are seen as signposts for mathematical progress.

Bloom started getting comments on the site from people he didn’t recognise. At first, they were just using GPT-5, then recently released, to dig out obscure references in the literature that might help with a particular problem. But in a matter of months, the release of more powerful models like GPT 5.2 Pro saw people posting full-blown solutions with AI assistance, some of which were verified by Bloom and his colleagues as correct. These solutions took “non-trivial effort”, Bloom told me at the time. “It’s incredible that AI is capable of that.”

What’s more, some of these solutions weren’t coming from professional mathematicians, but amateurs and novices. Kevin Barreto, who is in his second year of an undergraduate mathematics degree at the University of Cambridge, has solved numerous Erdős problems using AI, frequently with his collaborator Liam Price, who has no maths degree or formal training.

Inspired by their success, I wanted to try autonomous mathematics for myself. While these tools can, in theory, be used by anyone, Barreto and Price seem to have a magic touch in prodding the genie to produce useful answers, so I asked for help. The trick isn’t just asking the model to produce a proof, Barreto tells me, but bizarrely giving it a certain level of support, like “try your best” or “don’t give up”. “You try to encourage the model,” he says. “You try to hint it into believing the problem is of an easier difficulty than it actually is.”

Even so, success wasn’t guaranteed. Solving certain problems has often taken Barreto numerous attempts, if he succeeds at all. “Coaxing the correct proof strategy out of it is essentially like trying to play the lottery,” he says.

Still, I wanted to try my hand and spin the wheel in the mathematical proof casino. I chose an unsolved Erdős problem, number 710, which concerns a list of requirements that must be satisfied by a set of numbers, with the goal being to find a set with the smallest difference between the lowest and highest numbers. It is a bit like having a list of picky hotel guests, who insist on having a room with a bath or a sea view, for instance, and needing to find the shortest block of rooms that will satisfy them all.

Left to right: Paul Erdős, Arthur Harold Stone and Shizuo Kakutani. Mathematician Paul Erdős (left) posed hundreds of problems that AI is getting better and better at solving

New York Daily News Archive/NY Daily News via Getty Images

Mindful that I needed to use the most powerful AI model available, I asked OpenAI for access to ChatGPT 5.5 Pro, which normally costs $200 a month but was provided for free for this article. As Barreto suggested, my prompt for the AI hinted that the solution was within reach and that “it just takes a few clever tricks”.

As I left the AI crunching away, I turned to consider the most recent developments in this mathematical revolution. If solving Erdős problems is AI creeping up to the door of research-level mathematics, the past few months have seen it kicked down. A steady stream of mathematical papers is claiming to solve real, cutting-edge problems.

In January, Vakil and his colleagues uploaded one such paper, noting that “the proof of this result was obtained in conjunction with Google Gemini and related tools”. The proof focuses on a particularly thorny problem concerning how certain sphere-like shapes can be linked to other mathematical objects called flag spaces, which can be thought of as collections of nesting-doll-like objects. This would provide an important link between topology, which concerns the more general properties of shapes, and algebraic geometry, which deals with the precise shapes themselves. The task is made difficult by the multitude of ways in which the flag spaces and sphere-like shapes can correspond.

Vakil and his colleagues first gave a simpler version of what they wanted to prove to a custom AI model from Google DeepMind. The model found a mathematical structure they hadn’t previously seen, making it clear to them how to generalise and write the entire argument, which turned out to be simpler than it initially seemed.

Human and machine

“There’s no way the AI could do it by itself because it wouldn’t know the [correct] question. We absolutely told it what to do,” says Vakil. At the same time, the AI provided a shortcut. “The paper might never have happened because we might never have had the time to get together and figure out the argument,” he says. “It’s more how things will happen. The future will be some combination of human and machine.”

This line is already becoming increasingly blurry, however. The very same month as Vakil’s paper, Tony Feng at the University of California, Berkeley, who also works with Google DeepMind, published a paper detailing how he had used Google’s Aletheia AI to calculate a previously unknown collection of numbers that are vital for translating between two disparate mathematical disciplines, algebraic geometry and number theory. Building such bridges is an important goal in the Langlands programme, often seen as a grand unified theory of mathematics. According to Feng, the “core mathematical content” was generated entirely by Aletheia.

The biggest result yet in AI mathematics came just a few weeks ago in May, when OpenAI announced that it had used an unreleased model to solve an 80-year-old maths conjecture called the planar unit distance problem. The firm didn’t provide full details of the model, other than to say it was a general-purpose AI, rather than one trained specifically to do mathematics. The reaction among mathematicians has been one of stunned disbelief.

It is becoming difficult to keep track of the torrent of mathematical research assisted by AI, not least for professional mathematicians, who are themselves busily using AI to attempt problems they previously didn’t have time for.

“It opens up a world of possibility,” says Alex Kontorovich at Rutgers University in New Jersey. “I can imagine projects I could undertake this summer, things that I know would have taken me five years that I would never have even started.”

Could those new possibilities even include a solution for the Riemann hypothesis, a deep question about the distribution of prime numbers that is one of the Millennium Prize Problems, which are often seen as the greatest challenges in maths? Several mathematicians working for AI companies told me they thought we might see one of these problems fall in the next several years, while others cautioned that they are in a wildly different class of difficulty from the problems that have been solved so far.

Blackboard with mathematical equations written on it — If we are entering an era of AI-led mathematics, what role will human mathematicians play?

Thomas T/Unsplash

The San Francisco conference I attended in April was an attempt to map these possible futures. It took place in a nondescript building owned by a venture capital firm, the only clue to its existence an unmarked pink door and a video doorbell. As I waited for the door to open, I was joined by a former maths professor who now works for a hedge fund, stepping out from a driverless car. Once inside, I found eminent mathematicians like Vakil and Kontorovich mingling with employees from companies like OpenAI and Google.

The ostensible goal of the meeting was to come up with a way to track AI’s mathematical progress and where it might be headed. Attendees had their own personal priorities, however. “My hope was to understand a little bit more about where the models are and where they’re going in terms of mathematical capability,” says Daniel Litt, another conference organiser, also at the University of Toronto. “It’s clear that the models are, in some sense, missing some capabilities that mathematicians have.”

In the past, the most common way to test an AI model’s mathematical ability was to run it on a benchmark, a collection of problems that typically require simple and easy-to-verify solutions, like a single number. This was convenient for AI companies, because they could present their models’ progress as a clean, rising line on a graph. But many mathematical tasks aren’t so neat and tidy, requiring proofs that need interpretation by an expert.

What’s more, prowess in one area of maths doesn’t imply a human-like mathematical ability in general, says Melanie Wood at Harvard University. “One big mistake that people make when they think about AI and math is to take the correlation of these skills in humans and think that it’s going to match some correlation in AI.”

A button-pushing future

Mathematicians at the conference worked in small groups to come up with a better way to track AI’s mathematical ability and finished the week with a working draft. But boiling down all the things a working mathematician does into a short document wasn’t easy, and there was still disagreement over the best way forward.

A large part of the conference consisted of free-flowing group discussions, primarily between the mathematicians, hashing out the details of what AI-led mathematics might look like. Would it be humans and machines working in lockstep, as Vakil thought, or would it be more like a slot machine, with the press of a button sometimes producing an interesting result in full?

For Tsimerman, who grew up taking part in maths competitions like the IMO, the latter didn’t have much appeal. “My experience of math is the act of solving problems, and if I don’t do that anymore, I think I might prefer playing music or doing theatre or learning something else,” he says.

At one point in a group discussion, Tsimerman asked people in the room to indicate whether, in his button-pushing vision of the future, they would want to continue being mathematicians. Only around half raised their hand.

Not everyone agreed that this was a useful exercise, however, or that solving problems was the most important mathematical activity. “What I actually care about is understanding things and figuring out what’s true,” says Litt. “One can do that by posing and proving a conjecture, but you can also do that by going over to your friend and asking them a question.”

And even if these tools can solve difficult and thorny problems, many mathematicians remained adamant that it was only humans that could decide what was interesting to work on or what the important problems to tackle should be. Maths isn’t about solving puzzles just for the sake of it, points out Wood, and mathematicians generally look for solutions that push the field forward. “Does it suggest a way to solve a lot of other problems, or is it only a solution for that particular problem?” she says.

On the conference’s third day, excited murmurs rippled among the attendees. Overnight, it appeared that another Erdős problem had been cracked, one that was qualitatively different from the others. Jared Lichtman at Stanford University, who happened to be at the conference, had spent a considerable portion of his PhD wrestling with a closely related problem, after many mathematicians had spent decades trying to solve it. “It was a problem I was already independently very passionate about,” he says.

Price had elicited a solution to the problem, known as Erdős 1196, from a single request to ChatGPT 5.5 Pro. It concerns “primitive” sets of numbers, which resemble the prime numbers in that no number in the set can divide another. Erdős had come up with a number calculated from these sets that helped order them, and conjectured that the largest this number could be for any primitive set was about 1.6. Lichtman had proved that Erdős was correct in this case, but wanted to do the same for a more restricted family of primitive sets. Erdős suspected that the highest value this number could take for such sets was 1, but this proved a tougher nut to crack.
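For readers who want to see what a primitive set looks like in practice, here is a minimal Python sketch. It rests on an assumption: the article doesn't name the number Erdős attached to each set, but in the literature on primitive sets it is usually the sum of 1/(a log a) over the members a, and that is what the hypothetical `erdos_sum` below computes. The function names are illustrative only and don't come from any of the work described here.

```python
import math


def is_primitive(numbers):
    """Return True if no member of the set divides a different member."""
    values = sorted(set(numbers))
    return all(
        larger % smaller != 0
        for i, smaller in enumerate(values)
        for larger in values[i + 1:]
    )


def erdos_sum(numbers):
    """Sum of 1 / (a * log a) over the set (assumed form; needs a > 1)."""
    return sum(1.0 / (a * math.log(a)) for a in numbers)


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit, for limit >= 2."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [n for n, is_prime in enumerate(sieve) if is_prime]


if __name__ == "__main__":
    print(is_primitive([6, 10, 15]))   # no member divides another
    print(is_primitive([2, 3, 6]))     # 2 divides 6, so not primitive
    print(erdos_sum(primes_up_to(10_000)))
```

The primes form a primitive set, and the partial sums over them creep upwards slowly while staying below the bound of roughly 1.6 mentioned above. Checking a few finite sets like this illustrates the definitions; it says nothing about how the proofs work.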

The AI took an entirely different approach, using a mathematical tool that all previous attempts had missed, called the von Mangoldt function. “You can use the von Mangoldt function to circumvent a lot of technical difficulties that all these previous approaches had used,” says Lichtman. Working with others, including Price, Barreto and Tao, he later adapted this technique to solve a related 60-year-old conjecture by Erdős. “This is perhaps one of the first examples of an AI-generated proof having downstream impacts, which we are still exploring,” Lichtman said when posting about the work on social media.
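The von Mangoldt function itself is a standard object in number theory: it equals log p when n is a power of a single prime p, and zero otherwise. The short Python sketch below computes it and checks one of its best-known properties, that summing it over the divisors of n gives log n. This only illustrates the definition; it is not a reconstruction of the AI's argument, and the helper names are hypothetical.

```python
import math


def von_mangoldt(n):
    """Return log p if n = p**k for a prime p and k >= 1, otherwise 0.0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is the smallest prime factor; strip out every copy of it.
            while n % p == 0:
                n //= p
            # If nothing is left, the original n was a pure power of p.
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # no factor found, so n itself is prime


def divisor_sum(n):
    """Sum the von Mangoldt function over all divisors of n."""
    return sum(von_mangoldt(d) for d in range(1, n + 1) if n % d == 0)


if __name__ == "__main__":
    for n in (8, 12, 30, 97):
        print(n, divisor_sum(n), math.log(n))
```

The identity works because each prime power dividing n contributes one copy of log p, so the function in effect weighs a number by its prime factorisation.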

Meanwhile, I was finally ready to explore my own AI-generated proof. After “thinking” for 22 minutes and 18 seconds, ChatGPT pinged me with a response. “Here is the clean proof,” it wrote, followed by dozens of lines of impenetrable mathematics. I felt a jolt of excitement. Had I solved a decades-old problem, cementing my name in the mathematical history books?

I fed the answer back into ChatGPT, and soon received confirmation: “Yes — the main argument is correct.” I was growing even more confident. I dashed off an email to Barreto, asking whether I might be on to something. But as quickly as my excitement had arrived, it vanished. “It doesn’t look like it solves the problem,” he replied. I had missed that the AI had actually proven a different result from the formula Erdős had hoped for, and one that Erdős himself had already found years ago.

It was something that an expert mathematician might have quickly caught, but for me it was lost in the noise. Perhaps there is a future for mathematicians after all, even if only to help humans understand what an AI produces. “I still want to know what’s going on,” says Litt. “A model can’t understand something for you.”


Source link : https://www.newscientist.com/article/2526650-a-golden-age-of-maths-is-dawning-and-mathematicians-are-freaking-out/?utm_campaign=RSS%7CNSNS&utm_source=NSNS&utm_medium=RSS&utm_content=home

Publish date : 2026-06-01 16:00:00
