{"id":853476,"date":"2025-06-05T05:13:11","date_gmt":"2025-06-05T10:13:11","guid":{"rendered":"https:\/\/newsycanuse.com\/index.php\/2025\/06\/05\/whats-next-for-ai-and-math\/"},"modified":"2025-06-05T05:13:11","modified_gmt":"2025-06-05T10:13:11","slug":"whats-next-for-ai-and-math","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2025\/06\/05\/whats-next-for-ai-and-math\/","title":{"rendered":"What\u2019s next for AI and math"},"content":{"rendered":"<div id=\"content--body\">\n<div>\n<p>MIT Technology Review<em>\u2019s What\u2019s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them\u00a0<strong><a href=\"https:\/\/www.technologyreview.com\/tag\/whats-next-in-tech\/?\">here<\/a>.<\/strong><\/em><\/p>\n<p>The way DARPA tells it, math is stuck in the past. In April, the US Defense Advanced Research Projects Agency kicked off a <a href=\"https:\/\/www.darpa.mil\/research\/programs\/expmath-exponential-mathematics\">new initiative called expMath<\/a>\u2014short for Exponentiating Mathematics\u2014that it hopes will speed up the rate of progress in a field of research that underpins a wide range of crucial real-world applications, from computer science to medicine to national security.<\/p>\n<\/p><\/div>\n<div>\n<p>\u201cMath is the source of huge impact, but it\u2019s done more or less as it\u2019s been done for centuries\u2014by people standing at chalkboards,\u201d DARPA program manager Patrick Shafto said in a <a href=\"https:\/\/www.youtube.com\/watch?v=pbf4cZk4e2w\">video introducing the initiative<\/a>.\u00a0<\/p>\n<\/div>\n<div>\n<p>The modern world is built on mathematics. Math lets us model complex systems such as the way air flows around an aircraft, the way financial markets fluctuate, and the way blood flows through the heart. 
And breakthroughs in advanced mathematics can unlock new technologies such as cryptography, which is essential for private messaging and online banking, and data compression, which lets us shoot images and video across the internet.<\/p>\n<p>But advances in math can be years in the making. DARPA wants to speed things up. The goal for expMath is to encourage mathematicians and artificial-intelligence researchers to develop what DARPA calls an AI coauthor, a tool that might break large, complex math problems into smaller, simpler ones that are easier to grasp and\u2014so the thinking goes\u2014quicker to solve.<\/p>\n<p>Mathematicians have used computers for decades, to speed up calculations or check whether certain mathematical statements are true. The new vision is that AI might help them crack problems that were previously uncrackable.\u00a0\u00a0<\/p>\n<p>But there\u2019s a huge difference\u00a0between AI that can solve the kinds of problems set in high school\u2014math that the latest generation of models has already mastered\u2014and AI that could (in theory) solve the kinds of problems that professional mathematicians spend careers chipping away at.<\/p>\n<p>On one side are tools that might be able to automate certain tasks that math grads are employed to do; on the other are tools that might be able to push human knowledge beyond its existing limits.<\/p>\n<p>Here are three ways to think about that gulf.<\/p>\n<h3><strong>1\/ AI needs more than just clever tricks<\/strong><\/h3>\n<p>Large language models are not known to be good at math. They <a href=\"https:\/\/www.technologyreview.com\/2024\/06\/18\/1093440\/what-causes-ai-hallucinate-chatbots\/\">make things up<\/a> and <a href=\"https:\/\/www.reddit.com\/r\/ChatGPT\/comments\/11brmiv\/gaslighting_the_ai_into_225\/\">can be persuaded that 2 + 2 = 5<\/a>. 
But newer versions of this tech, especially so-called large reasoning models (LRMs) like OpenAI\u2019s o3 and Anthropic\u2019s Claude 4 Thinking, are far more capable\u2014and that&#8217;s got mathematicians excited.<\/p>\n<\/p><\/div>\n<div>\n<p>This year, a number of LRMs, which try to solve a problem step by step rather than spit out the first result that comes to them, have achieved <a href=\"https:\/\/www.vals.ai\/benchmarks\/aime-2025-04-18\">high scores on the American Invitational Mathematics Examination<\/a> (AIME), a test given to the top 5% of US high school math students.<\/p>\n<p>At the same time, a handful of new hybrid models that combine LLMs with some kind of fact-checking system have also made breakthroughs. Emily de Oliveira Santos, a mathematician at the University of S\u00e3o Paulo, Brazil, points to Google DeepMind\u2019s AlphaProof, a system that combines an LLM with DeepMind\u2019s game-playing model AlphaZero, as one key milestone. Last year AlphaProof became the first computer program to <a href=\"https:\/\/www.technologyreview.com\/2024\/07\/25\/1095315\/google-deepminds-ai-systems-can-now-solve-complex-math-problems\/\">match the performance of a silver medallist at the International Math Olympiad<\/a>, one of the most prestigious mathematics competitions in the world.<\/p>\n<p>And in May, a Google DeepMind model called AlphaEvolve <a href=\"https:\/\/www.technologyreview.com\/2025\/05\/14\/1116438\/google-deepminds-new-ai-uses-large-language-models-to-crack-real-world-problems\/\">discovered better results than anything humans had yet come up with<\/a> for more than 50 unsolved mathematics puzzles and several real-world computer science problems.<\/p>\n<p>The uptick in progress is clear. \u201cGPT-4 couldn\u2019t do math much beyond undergraduate level,\u201d says de Oliveira Santos. 
\u201cI remember testing it at the time of its release with a problem in topology, and it just couldn\u2019t write more than a few lines without getting completely lost.\u201d But when she gave the same problem to OpenAI\u2019s o1, an LRM released in January, it nailed it.<\/p>\n<p>Does this mean such models are all set to become the kind of coauthor DARPA hopes for? Not necessarily, she says: \u201cMath Olympiad problems often involve being able to carry out clever tricks, whereas research problems are much more explorative and often have many, many more moving pieces.\u201d Success at one type of problem-solving may not carry over to another.<\/p>\n<p>Others agree. Martin Bridson, a mathematician at the University of Oxford, thinks the Math Olympiad result is a great achievement. \u201cOn the other hand, I don\u2019t find it mind-blowing,\u201d he says. \u201cIt\u2019s not a change of paradigm in the sense that \u2018Wow, I thought machines would never be able to do that.\u2019 I expected machines to be able to do that.\u201d<\/p>\n<p>That\u2019s because even though the problems in the Math Olympiad\u2014and similar high school or undergraduate tests like AIME\u2014are hard, there\u2019s a pattern to a lot of them. \u201cWe have training camps to train high school kids to do them,\u201d says Bridson. \u201cAnd if you can train a large number of people to do those problems, why shouldn\u2019t you be able to train a machine to do them?\u201d<\/p>\n<p>Sergei Gukov, a mathematician at the California Institute of Technology who coaches Math Olympiad teams, points out that the style of question does not change too much between competitions. New problems are set each year, but they can be solved with the same old tricks.<\/p>\n<\/p><\/div>\n<div>\n<p>\u201cSure, the specific problems didn\u2019t appear before,\u201d says Gukov. \u201cBut they\u2019re very close\u2014just a step away from zillions of things you have already seen. 
You immediately realize, \u2018Oh my gosh, there are so many similarities\u2014I\u2019m going to apply the same tactic.\u2019\u201d As hard as competition-level math is, kids and machines alike can be taught how to beat it.<\/p>\n<\/div>\n<div>\n<p>That\u2019s not true for most unsolved math problems. Bridson is president of the Clay Mathematics Institute, a nonprofit US-based research organization best known for setting up the Millennium Prize Problems in 2000\u2014seven of the most important unsolved problems in mathematics, with a $1 million prize to be awarded to the first person to solve each of them. (One problem, the Poincar\u00e9 conjecture, was declared solved in 2010; the others, which include P versus NP and the Riemann hypothesis, remain open.) \u201cWe\u2019re very far away from AI being able to say anything serious about any of those problems,\u201d says Bridson.<\/p>\n<p>And yet it\u2019s hard to know exactly how far away, because many of the existing benchmarks used to evaluate progress are maxed out. The best new models already outperform most humans on tests like AIME.<\/p>\n<p>To get a better idea of what existing systems can and cannot do, a startup called Epoch AI has created a new test called FrontierMath, released in December. Instead of co-opting math tests developed for humans, Epoch AI worked with more than 60 mathematicians around the world to come up with a <a href=\"https:\/\/epoch.ai\/frontiermath\/benchmark-problems:\">set of math problems<\/a> from scratch.<\/p>\n<p>FrontierMath is designed to probe the limits of what today\u2019s AI can do. None of the problems have been seen before, and the majority are being kept secret to avoid contaminating training data. Each problem demands hours of work from expert mathematicians to solve\u2014if they can solve it at all: some of the problems require specialist knowledge to tackle.<\/p>\n<p>FrontierMath is set to become an industry standard. 
It\u2019s not yet as popular as AIME, says de Oliveira Santos, who helped develop some of the problems: \u201cBut I expect this to not hold for much longer, since existing benchmarks are very close to being saturated.\u201d<\/p>\n<p>On AIME, the best large language models (Anthropic\u2019s Claude 4, OpenAI\u2019s o3 and o4-mini, Google DeepMind\u2019s Gemini 2.5 Pro, xAI\u2019s Grok 3) now score around 90%. On FrontierMath, <a href=\"https:\/\/www.linkedin.com\/posts\/epochai_we-recently-completed-preliminary-evaluations-activity-7325188118750371840-Jof_\/\">o4-mini scores 19% and Gemini 2.5 Pro scores 13%<\/a>. That\u2019s still remarkable, but there\u2019s clear room for improvement.<\/p>\n<p>FrontierMath should give the best sense yet of just how fast AI is progressing at math. But there are some problems that are still too hard for computers to take on.<\/p>\n<\/p><\/div>\n<div>\n<h3><strong>2\/ AI needs to manage really vast sequences of steps<\/strong><\/h3>\n<p>Squint hard enough and in some ways math problems start to look the same: to solve them you need to take a sequence of steps from start to finish. The problem is finding those steps.\u00a0<\/p>\n<p>\u201cPretty much every math problem can be formulated as path-finding,\u201d says Gukov. What makes some problems far harder than others is the number of steps on that path. \u201cThe difference between the Riemann hypothesis and high school math is that with high school math the paths that we\u2019re looking for are short\u201410 steps, 20 steps, maybe 40 in the longest case.\u201d The same steps also recur from one problem to the next.<\/p>\n<p>\u201cBut to solve the Riemann hypothesis, we don\u2019t have the steps, and what we\u2019re looking for is a path that is extremely long\u201d\u2014maybe a million lines of computer proof, says Gukov.<\/p>\n<p>Finding very long sequences of steps can be thought of as a kind of complex game. 
It\u2019s what DeepMind\u2019s AlphaZero learned to do when it mastered Go and chess. A game of Go might only involve a few hundred moves. But to win, an AI must find a winning sequence of moves among a vast number of possible sequences. Imagine a number with 100 zeros at the end, says Gukov.<\/p>\n<p>But that\u2019s still tiny compared with the number of possible sequences that could be involved in proving or disproving a very hard math problem: \u201cA proof path with a thousand or a million moves involves a number with a thousand or a million zeros,\u201d says Gukov.\u00a0<\/p>\n<p>No AI system can sift through that many possibilities. To address this, Gukov and his colleagues developed a system that shortens the length of a path by combining multiple moves into single supermoves. It\u2019s like having boots that let you take giant strides: instead of taking 2,000 steps to walk a mile, you can now walk it in 20.<\/p>\n<p>The challenge was figuring out which moves to replace with supermoves. In a series of experiments, the researchers came up with a system in which one reinforcement-learning model suggests new moves and a second model checks to see if those moves help.<\/p>\n<p>They used this approach to make a breakthrough in a math problem called the Andrews-Curtis conjecture, a puzzle that has been unsolved for 60 years. It\u2019s a problem that every professional mathematician will know, says Gukov.<\/p>\n<\/div>\n<div>\n<p>(An aside for math stans only: The AC conjecture states that a particular way of describing a type of set called a trivial group can be translated into a different but equivalent description with a certain sequence of steps. Most mathematicians think the AC conjecture is false, but nobody knows how to prove that. 
Gukov himself admits that it is an intellectual curiosity rather than a practical problem, though an important one for mathematicians nonetheless.)<\/p>\n<p>Gukov and his colleagues didn\u2019t solve the AC conjecture, but they showed that a potential counterexample proposed 40 years ago (one that, if valid, would have proved the conjecture false) was not a counterexample after all. \u201cIt\u2019s been a major direction of attack for 40 years,\u201d says Gukov. With the help of AI, they showed that this direction was in fact a dead end.<\/p>\n<p>\u201cRuling out possible counterexamples is a worthwhile thing,\u201d says Bridson. \u201cIt can close off blind alleys, something you might spend a year of your life exploring.\u201d<\/p>\n<p>True, Gukov checked off just one piece of one esoteric puzzle. But he thinks the approach will work in any scenario where you need to find a long sequence of unknown moves, and he now plans to try it out on other problems.<\/p>\n<p>\u201cMaybe it will lead to something that will help AI in general,\u201d he says. \u201cBecause it\u2019s teaching reinforcement learning models to go beyond their training. To me it\u2019s basically about thinking outside of the box\u2014miles away, megaparsecs away.\u201d<\/p>\n<h3><strong>3\/ Can AI ever provide real insight?<\/strong><\/h3>\n<p>Thinking outside the box is exactly what mathematicians need to solve hard problems. Math is often thought to involve robotic, step-by-step procedures. But advanced math is an experimental pursuit, involving trial and error and flashes of insight.<\/p>\n<\/p><\/div>\n<div>\n<p>That\u2019s where tools like AlphaEvolve come in. Google DeepMind\u2019s latest model asks an LLM to generate code to solve a particular math problem. Automated evaluators then score the proposed solutions, and the best are picked and sent back to the LLM to be improved. 
After hundreds of rounds of trial and error, AlphaEvolve produced solutions to a wide range of math problems that were better than anything people had yet found. But it can also work as a collaborative tool: at any step, humans can share their own insight with the LLM, prompting it with specific instructions.<\/p>\n<p>This kind of exploration is key to advanced mathematics. \u201cI\u2019m often looking for interesting phenomena and pushing myself in a certain direction,\u201d says Geordie Williamson, a mathematician at the University of Sydney in Australia. \u201cLike: \u2018Let me look down this little alley. Oh, I found something!\u2019\u201d<\/p>\n<\/p><\/div>\n<div>\n<p>Williamson worked with Meta on an AI tool called PatternBoost, designed to support this kind of exploration. PatternBoost can take a mathematical idea or statement and generate similar ones. \u201cIt\u2019s like: \u2018Here\u2019s a bunch of interesting things. I don\u2019t know what\u2019s going on, but can you produce more interesting things like that?\u2019\u201d he says.<\/p>\n<p>Such brainstorming is essential work in math. It\u2019s how new ideas get conjured. Take the icosahedron, says Williamson: \u201cIt\u2019s a beautiful example of this, which I kind of keep coming back to in my own work.\u201d The icosahedron is a 3D object with 20 faces, each an identical equilateral triangle (think of a 20-sided die). It has the most faces of a family of exactly five regular convex solids, known as the Platonic solids: the others are the tetrahedron (four faces), cube (six faces), octahedron (eight faces), and dodecahedron (12 faces).<\/p>\n<p>Remarkably, the fact that there are exactly five of these objects was proved by mathematicians in ancient Greece. \u201cAt the time that this theorem was proved, the icosahedron didn\u2019t exist,\u201d says Williamson. \u201cYou can\u2019t go to a quarry and find it\u2014someone found it in their mind. And the icosahedron goes on to have a profound effect on mathematics. 
It\u2019s still influencing us today in very, very profound ways.\u201d<\/p>\n<p>For Williamson, the exciting potential of tools like PatternBoost is that they might help people discover future mathematical objects like the icosahedron that go on to shape the way math is done. But we\u2019re not there yet. \u201cAI can contribute in a meaningful way to research-level problems,\u201d he says. \u201cBut we&#8217;re certainly not getting inundated with new theorems at this stage.\u201d<\/p>\n<p>Ultimately, it comes down to the fact that machines still lack what you might call intuition or creative thinking. Williamson sums it up like this: We now have AI that can beat humans when it knows the rules of the game. \u201cBut it\u2019s one thing for a computer to play Go at a superhuman level and another thing for the computer to invent the game of Go.\u201d<\/p>\n<p>\u201cI think that applies to advanced mathematics,\u201d he says. \u201cBreakthroughs come from a new way of thinking about something, which is akin to finding completely new moves in a game. And I don\u2019t really think we understand where those really brilliant moves in deep mathematics come from.\u201d<\/p>\n<p>Perhaps AI tools like AlphaEvolve and PatternBoost are best thought of as advance scouts for human intuition. They can discover new directions and point out dead ends, saving mathematicians months or years of work. But the true breakthroughs will still come from the minds of people, as has been the case for thousands of years.<\/p>\n<p>For now, at least. \u201cThere\u2019s plenty of tech companies that tell us that won\u2019t last long,\u201d says Williamson. 
\u201cBut you know\u2014we\u2019ll see.\u201d\u00a0 <\/p>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/www.technologyreview.com\/2025\/06\/04\/1117753\/whats-next-for-ai-and-math\/\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><br \/>\n Will Douglas Heaven<\/p>\n","protected":false},"excerpt":{"rendered":"<p>MIT Technology Review\u2019s What\u2019s Next series looks across industries, trends, and technologies to give you a first look at the future. You can read the rest of them\u00a0 here. The way DARPA tells it, math is stuck in the past. In April, the US Defense Advanced Research Projects Agency kicked off a new initiative called<\/p>\n","protected":false},"author":1,"featured_media":853477,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[46,947],"tags":[],"class_list":{"0":"post-853476","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-technology","8":"category-whats"},"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/853476","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=853476"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/853476\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/853477"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=853476"}],"wp:term":[{"taxonomy":"category","embeddable
":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=853476"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=853476"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}