{"id":821460,"date":"2025-01-20T23:12:05","date_gmt":"2025-01-21T05:12:05","guid":{"rendered":"https:\/\/newsycanuse.com\/index.php\/2025\/01\/20\/the-second-wave-of-ai-coding-is-here\/"},"modified":"2025-01-20T23:12:05","modified_gmt":"2025-01-21T05:12:05","slug":"the-second-wave-of-ai-coding-is-here","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2025\/01\/20\/the-second-wave-of-ai-coding-is-here\/","title":{"rendered":"The second wave of AI coding is here"},"content":{"rendered":"<div id=\"content--body\">\n<div>\n<p>Ask people building generative AI what generative AI is good for right now\u2014what they\u2019re really fired up about\u2014and many will tell you: coding.\u00a0<\/p>\n<p>\u201cThat\u2019s something that\u2019s been very exciting for developers,\u201d Jared Kaplan, chief scientist at Anthropic, <a href=\"https:\/\/www.technologyreview.com\/2025\/01\/11\/1109909\/anthropics-chief-scientist-on-5-ways-agents-will-be-even-better-in-2025\/\">told <em>MIT Technology Review<\/em> this month<\/a>: \u201cIt\u2019s really understanding what\u2019s wrong with code, debugging it.\u201d<\/p>\n<\/p><\/div>\n<div>\n<p>Copilot, a tool built on top of OpenAI\u2019s large language models and launched by Microsoft-backed GitHub in 2022, is <a href=\"https:\/\/www.technologyreview.com\/2023\/12\/06\/1084457\/ai-assistants-copilot-changing-code-software-development-github-openai\/\">now used by millions of developers<\/a> around the world. 
Millions more turn to general-purpose chatbots like Anthropic\u2019s Claude, OpenAI\u2019s ChatGPT, and Google DeepMind\u2019s Gemini for everyday help.<\/p>\n<p>\u201cToday, more than a quarter of all new code at Google is generated by AI, then reviewed and accepted by engineers,\u201d Alphabet CEO Sundar Pichai claimed on an <a href=\"https:\/\/blog.google\/inside-google\/message-ceo\/alphabet-earnings-q3-2024\/\">earnings call in October<\/a>: \u201cThis helps our engineers do more and move faster.\u201d Expect other tech companies to catch up, if they haven\u2019t already.<\/p>\n<p>It\u2019s not just the big beasts rolling out AI coding tools. A bunch of new startups have entered this buzzy market too. Newcomers such as Zencoder, Merly, Cosine, Tessl (valued at $750 million within months of being set up), and Poolside (valued at $3 billion before it even released a product) are all jostling for their slice of the pie. \u201cIt actually looks like developers are willing to pay for copilots,\u201d says Nathan Benaich, an analyst at investment firm Air Street Capital: \u201cAnd so code is one of the easiest ways to monetize AI.\u201d<\/p>\n<p>Such companies promise to take generative coding assistants to the next level. Instead of providing developers with a kind of supercharged autocomplete, like most existing tools, this next generation can prototype, test, and debug code for you. The upshot is that developers could essentially turn into managers, who may spend more time reviewing and correcting code written by a model than writing it from scratch themselves.\u00a0<\/p>\n<p>But there\u2019s more. 
Many of the people building generative coding assistants think that they could be a fast track to <a href=\"https:\/\/www.technologyreview.com\/2023\/11\/16\/1083498\/google-deepmind-what-is-artificial-general-intelligence-agi\/\">artificial general intelligence<\/a> (AGI), the <a href=\"https:\/\/www.technologyreview.com\/2024\/07\/10\/1094475\/what-is-artificial-intelligence-ai-definitive-guide\/\">hypothetical superhuman technology<\/a> that a number of top firms claim to have in their sights.<\/p>\n<p>\u201cThe first time we will see a massively economically valuable activity to have reached human-level capabilities will be in software development,\u201d says Eiso Kant, CEO and cofounder of Poolside. (OpenAI has already boasted that its latest o3 model beat the company\u2019s own chief scientist in a competitive coding challenge.)<\/p>\n<p>Welcome to the second wave of AI coding.\u00a0<\/p>\n<h3><strong>Correct code\u00a0<\/strong><\/h3>\n<p>Software engineers talk about two types of correctness. There\u2019s the sense in which a program\u2019s syntax (its grammar) is correct\u2014meaning all the words, numbers, and mathematical operators are in the right place. This matters a lot more than grammatical correctness in natural language. Get one tiny thing wrong in thousands of lines of code and none of it will run.<\/p>\n<\/p><\/div>\n<div>\n<p>The first generation of coding assistants are now pretty good at producing code that\u2019s correct in this sense. Trained on billions of pieces of code, they have assimilated the surface-level structures of many types of programs.\u00a0\u00a0<\/p>\n<\/div>\n<div>\n<p>But there\u2019s also the sense in which a program\u2019s function is correct: Sure, it runs, but does it actually do what you wanted it to? 
It\u2019s that second level of correctness that the new wave of generative coding assistants are aiming for\u2014and this is what will really change the way software is made.<\/p>\n<p>\u201cLarge language models can write code that compiles, but they may not always write the program that you wanted,\u201d says Alistair Pullen, a cofounder of Cosine. \u201cTo do that, you need to re-create the thought processes that a human coder would have gone through to get that end result.\u201d<\/p>\n<p>The problem is that the data most coding assistants have been trained on\u2014the billions of pieces of code taken from online repositories\u2014doesn\u2019t capture those thought processes. It represents a finished product, not what went into making it. \u201cThere\u2019s a lot of code out there,\u201d says Kant. \u201cBut that data doesn\u2019t represent software development.\u201d<\/p>\n<p>What Pullen, Kant, and others are finding is that to build a model that does a lot more than autocomplete\u2014one that can come up with useful programs, test them, and fix bugs\u2014you need to show it a lot more than just code. You need to show it how that code was put together.\u00a0\u00a0<\/p>\n<p>In short, companies like Cosine and Poolside are building models that don\u2019t just mimic what good code looks like\u2014whether it works well or not\u2014but mimic the process that produces such code in the first place. Get it right and the models will come up with far better code and far better bug fixes.\u00a0<\/p>\n<h3>Breadcrumbs<\/h3>\n<p>But you first need a data set that captures that process\u2014the steps that a human developer might take when writing code. Think of these steps as a breadcrumb trail that a machine could follow to produce a similar piece of code itself.<\/p>\n<p>Part of that is working out what materials to draw from: Which sections of the existing codebase are needed for a given programming task? \u201cContext is critical,\u201d says Zencoder founder Andrew Filev. 
\u201cThe first generation of tools did a very poor job on the context, they would basically just look at your open tabs. But your repo [code repository] might have 5000 files and they\u2019d miss most of it.\u201d<\/p>\n<\/p><\/div>\n<div>\n<p>Zencoder has hired a bunch of search engine veterans to help it build a tool that can analyze large codebases and figure out what is and isn\u2019t relevant. This detailed context reduces hallucinations and improves the quality of code that large language models can produce, says Filev: \u201cWe call it repo grokking.\u201d<\/p>\n<p>Cosine also thinks context is key. But it draws on that context to create a new kind of data set. The company has asked dozens of coders to record what they were doing as they worked through hundreds of different programming tasks. \u201cWe asked them to write down everything,\u201d says Pullen: \u201cWhy did you open that file? Why did you scroll halfway through? Why did you close it?\u201d They also asked coders to annotate finished pieces of code, marking up sections that would have required knowledge of other pieces of code or specific documentation to write.<\/p>\n<p>Cosine then takes all that information and generates a large synthetic data set that maps the typical steps coders take, and the sources of information they draw on, to finished pieces of code. They use this data set to train a model to figure out what breadcrumb trail it might need to follow to produce a particular program, and then how to follow it.\u00a0\u00a0<\/p>\n<p>Poolside, based in San Francisco, is also creating a synthetic data set that captures the process of coding, but it leans more on a technique called RLCE\u2014reinforcement learning from code execution. 
(Cosine uses this too, but to a lesser degree.)<\/p>\n<p>RLCE is analogous to the technique used to make chatbots like ChatGPT slick conversationalists, known as RLHF\u2014<a href=\"https:\/\/www.technologyreview.com\/2023\/03\/03\/1069311\/inside-story-oral-history-how-chatgpt-built-openai\/\">reinforcement learning from human feedback<\/a>. With RLHF, a model is trained to produce text that\u2019s more like the kind human testers say they favor. With RLCE, a model is trained to produce code that\u2019s more like the kind that does what it is supposed to do when it is run (or executed).\u00a0\u00a0<\/p>\n<h3>Gaming the system<\/h3>\n<p>Cosine and Poolside both say they are inspired by the approach DeepMind took with its <a href=\"https:\/\/www.technologyreview.com\/2022\/02\/23\/1045016\/ai-deepmind-demis-hassabis-alphafold\/\">game-playing model AlphaZero<\/a>. AlphaZero was given the steps it could take\u2014the moves in a game\u2014and then left to play against itself over and over again, figuring out via trial and error which sequences of moves won and which did not.\u00a0\u00a0<\/p>\n<p>\u201cThey let it explore moves at every possible turn, simulate as many games as you can throw compute at\u2014that led all the way to beating Lee Sedol,\u201d says Pengming Wang, a founding scientist at Poolside, referring to the Korean Go grandmaster beaten in 2016 by AlphaGo, AlphaZero\u2019s predecessor. Before Poolside, Wang worked at Google DeepMind on applications of AlphaZero beyond board games, including <a href=\"https:\/\/www.technologyreview.com\/2023\/12\/14\/1085318\/google-deepmind-large-language-model-solve-unsolvable-math-problem-cap-set\/\">FunSearch<\/a>, a version trained to solve advanced math problems.<\/p>\n<p>When that AlphaZero approach is applied to coding, the steps involved in producing a piece of code\u2014the breadcrumbs\u2014become the available moves in a game, and producing a correct program becomes winning that game. 
Left to play by itself, a model can improve far faster than a human could. \u201cA human coder tries and fails one failure at a time,\u201d says Kant. \u201cModels can try things 100 times at once.\u201d<\/p>\n<\/p><\/div>\n<div>\n<p>A key difference between Cosine and Poolside is that Cosine is using a custom version of GPT-4o provided by OpenAI, which makes it possible to train on a larger data set than the base model can cope with, but Poolside is building its own large language model from scratch.<\/p>\n<p>Poolside\u2019s Kant thinks that training a model on code from the start will give better results than adapting an existing model that has sucked up not only billions of pieces of code but most of the internet. \u201cI\u2019m perfectly fine with our model forgetting about butterfly anatomy,\u201d he says.\u00a0\u00a0<\/p>\n<p>Cosine claims that its generative coding assistant, called Genie, tops the leaderboard on SWE-Bench, a standard set of tests for coding models. Poolside is still building its model but claims that what it has so far already matches the performance of GitHub\u2019s Copilot.<\/p>\n<p>\u201cI personally have a very strong belief that large language models will get us all the way to being as capable as a software developer,\u201d says Kant.<\/p>\n<p>Not everyone takes that view, however.<\/p>\n<h3><strong>Illogical LLMs<\/strong><\/h3>\n<p>To Justin Gottschlich, the CEO and founder of Merly, large language models are the wrong tool for the job\u2014period. He invokes his dog: \u201cNo amount of training for my dog will ever get him to be able to code, it just won&#8217;t happen,\u201d he says. \u201cHe can do all kinds of other things, but he\u2019s just incapable of that deep level of cognition.\u201d\u00a0\u00a0<\/p>\n<p>Having worked on code generation for more than a decade, Gottschlich has a similar sticking point with large language models. Programming requires the ability to work through logical puzzles with unwavering precision. 
No matter how well large language models may learn to mimic what human programmers do, at their core they are still essentially <a href=\"https:\/\/www.technologyreview.com\/2024\/06\/18\/1093440\/what-causes-ai-hallucinate-chatbots\/\">statistical slot machines<\/a>, he says: \u201cI can\u2019t train an illogical system to become logical.\u201d<\/p>\n<p>Instead of training a large language model to generate code by feeding it lots of examples, Merly does not show its system human-written code at all. That\u2019s because to really build a model that can generate code, Gottschlich argues, you need to work at the level of the underlying logic that code represents, not the code itself. Merly\u2019s system is therefore trained on an intermediate representation\u2014something like the machine-readable notation that most programming languages get translated into before they are run.<\/p>\n<\/div>\n<div>\n<p>Gottschlich won\u2019t say exactly what this looks like or how the process works. But he throws out an analogy: There\u2019s this idea in mathematics that the only numbers that have to exist are prime numbers, because you can calculate all other numbers using just the primes. \u201cTake that concept and apply it to code,\u201d he says.<\/p>\n<p>Not only does this approach get straight to the logic of programming; it\u2019s also fast, because millions of lines of code are reduced to a few thousand lines of intermediate language before the system analyzes them.<\/p>\n<h3><strong>Shifting mindsets<\/strong><\/h3>\n<p>What you think of these rival approaches may depend on what you want generative coding assistants to be.\u00a0\u00a0<\/p>\n<p>In November, Cosine banned its engineers from using tools other than its own products. It is now seeing the impact of Genie on its own engineers, who often find themselves watching the tool as it comes up with code for them. 
\u201cYou now give the model the outcome you would like, and it goes ahead and worries about the implementation for you,\u201d says Yang Li, another Cosine cofounder.<\/p>\n<\/p><\/div>\n<div>\n<p>Pullen admits that it can be baffling, requiring a switch of mindset. \u201cWe have engineers doing multiple tasks at once, flitting between windows,\u201d he says. \u201cWhile Genie is running code in one, they might be prompting it to do something else in another.\u201d<\/p>\n<p>These tools also make it possible to prototype multiple versions of a system at once. Say you\u2019re developing software that needs a payment system built in. You can get a coding assistant to simultaneously try out several different options\u2014Stripe, Mango, Checkout\u2014instead of having to code them by hand one at a time.<\/p>\n<p>Genie can be left to fix bugs around the clock. Most software teams use bug-reporting tools that let people upload descriptions of errors they have encountered. Genie can read these descriptions and come up with fixes. Then a human just needs to review them before updating the code base.<\/p>\n<p>No single human understands the trillions of lines of code in today\u2019s biggest software systems, says Li, \u201cand as more and more software gets written by other software, the amount of code will only get bigger.\u201d<\/p>\n<\/p><\/div>\n<div>\n<p>This will make coding assistants that maintain that code for us essential. \u201cThe bottleneck will become how fast humans can review the machine-generated code,\u201d says Li.<\/p>\n<p>How do Cosine\u2019s engineers feel about all this? According to Pullen, at least, just fine. \u201cIf I give you a hard problem, you\u2019re still going to think about how you want to describe that problem to the model,\u201d he says. \u201cInstead of writing the code, you have to write it in natural language. But there\u2019s still a lot of thinking that goes into that, so you\u2019re not really taking the joy of engineering away. 
The itch is still scratched.\u201d<\/p>\n<p>Some may adapt faster than others. Cosine likes to invite potential hires to spend a few days coding with its team. A couple of months ago it asked one such candidate to build a widget that would let employees share cool bits of software they were working on to social media.\u00a0<\/p>\n<p>The task wasn\u2019t straightforward, requiring working knowledge of multiple sections of Cosine\u2019s millions of lines of code. But the candidate got it done in a matter of hours. \u201cThis person who had never seen our code base turned up on Monday and by Tuesday afternoon he\u2019d shipped something,\u201d says Li. \u201cWe thought it would take him all week.\u201d (They hired him.)<\/p>\n<p>But there\u2019s another angle too. Many companies will use this technology to cut down on the number of programmers they hire. Li thinks we will soon see tiers of software engineers. At one end there will be elite developers with million-dollar salaries who can diagnose problems when the AI goes wrong. At the other end, smaller teams of 10 to 20 people will do a job that once required hundreds of coders. \u201cIt will be like how ATMs transformed banking,\u201d says Li.<\/p>\n<p>\u201cAnything you want to do will be determined by compute and not head count,\u201d he says. \u201cI think it\u2019s generally accepted that the era of adding another few thousand engineers to your organization is over.\u201d<\/p>\n<h3><strong>Warp drives <\/strong><\/h3>\n<p>Indeed, for Gottschlich, machines that can code better than humans are going to be essential. For him, that\u2019s the only way we will build the vast, complex software systems that he thinks we will eventually need. Like many in Silicon Valley, he anticipates a future in which humans move to other planets. 
That\u2019s only going to be possible if we get AI to build the software required, he says: \u201cMerly\u2019s real goal is to get us to Mars.\u201d<\/p>\n<p>Gottschlich prefers to talk about \u201cmachine programming\u201d rather than \u201ccoding assistants,\u201d because he thinks that term frames the problem the wrong way. \u201cI don\u2019t think that these systems should be assisting humans\u2014I think humans should be assisting them,\u201d he says. \u201cThey can move at the speed of AI. Why restrict their potential?\u201d<\/p>\n<\/p><\/div>\n<div>\n<p>\u201cThere\u2019s this cartoon called <em>The Flintstones<\/em> where they have these cars, but they only move when the drivers use their feet,\u201d says Gottschlich. \u201cThis is sort of how I feel most people are doing AI for software systems.\u201d<\/p>\n<p>\u201cBut what Merly\u2019s building is, essentially, spaceships,\u201d he adds. He\u2019s not joking. \u201cAnd I don\u2019t think spaceships should be powered by humans on a bicycle. Spaceships should be powered by a warp engine.\u201d<\/p>\n<p>If that sounds wild\u2014it is. But there\u2019s a serious point to be made about what the people building this technology think the end goal really is.<\/p>\n<p>Gottschlich is not an outlier with his galaxy-brained take. Despite their focus on products that developers will want to use today, most of these companies have their sights on a far bigger payoff. Visit Cosine\u2019s website and the company introduces itself as a \u201cHuman Reasoning Lab.\u201d It sees coding as just the first step toward a more general-purpose model that can mimic human problem-solving in a number of domains.<\/p>\n<p>Poolside has similar goals: The company states upfront that it is building AGI. \u201cCode is a way of formalizing reasoning,\u201d says Kant.<\/p>\n<p>Wang invokes agents. Imagine a system that can spin up its own software to do any task on the fly, he says. 
\u201cIf you get to a point where your agent can really solve any computational task that you want through the means of software\u2014that is a display of AGI, essentially.\u201d<\/p>\n<p>Down here on Earth, such systems may remain a pipe dream. And yet software engineering is changing faster than many at the cutting edge expected.\u00a0<\/p>\n<p>\u201cWe\u2019re not at a point where everything\u2019s just done by machines, but we\u2019re definitely stepping away from the usual role of a software engineer,\u201d says Cosine\u2019s Pullen. \u201cWe\u2019re seeing the sparks of that new workflow\u2014what it means to be a software engineer going into the future.\u201d <\/p>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/www.technologyreview.com\/2025\/01\/20\/1110180\/the-second-wave-of-ai-coding-is-here\/\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><br \/>\n Will Douglas Heaven<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ask people building generative AI what generative AI is good for right now\u2014what they\u2019re really fired up about\u2014and many will tell you: coding.\u00a0 \u201cThat\u2019s something that\u2019s been very exciting for developers,\u201d Jared Kaplan, chief scientist at Anthropic, told MIT Technology Review this month: \u201cIt\u2019s really understanding what\u2019s wrong with code, debugging it.\u201d Copilot, a 
tool<\/p>\n","protected":false},"author":1,"featured_media":821461,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2531,2472,46],"tags":[],"class_list":["post-821460","post","type-post","status-publish","format-standard","has-post-thumbnail","category-coding","category-second","category-technology"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/821460","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=821460"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/821460\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/821461"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=821460"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=821460"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=821460"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}