{"id":615565,"date":"2023-03-08T08:49:30","date_gmt":"2023-03-08T14:49:30","guid":{"rendered":"https:\/\/news.sellorbuyhomefast.com\/index.php\/2023\/03\/08\/chatgpt-explained-a-normies-guide-to-how-it-works\/"},"modified":"2023-03-08T08:49:30","modified_gmt":"2023-03-08T14:49:30","slug":"chatgpt-explained-a-normies-guide-to-how-it-works","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2023\/03\/08\/chatgpt-explained-a-normies-guide-to-how-it-works\/","title":{"rendered":"ChatGPT Explained: A normie&#x27;s guide to how it works"},"content":{"rendered":"<div>\n<div dir=\"auto\">\n<div>\n<figure><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6866ca04-5223-46df-82dc-7ac024bc74ab_1664x1152.jpeg\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/substackcdn.com\/image\/fetch\/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6866ca04-5223-46df-82dc-7ac024bc74ab_1664x1152.jpeg\" width=\"1456\" height=\"1008\" alt=\"\"><\/a><\/figure>\n<\/div>\n<p><em><strong>The story so far:<\/strong><span>\u00a0Most of the discussion of ChatGPT I\u2019m seeing from even very smart, tech-savvy people is just not good. In articles and podcasts, people are talking about this chatbot in unhelpful ways. 
And by \u201cunhelpful ways,\u201d I don\u2019t just mean that they\u2019re anthropomorphizing (though they are doing that). Rather, what I mean is that they\u2019re not working with a practical, productive understanding of what the bot\u2019s main parts are and how they fit together.<\/span><\/em><\/p>\n<p><em><span>To put it another way, there are some <\/span><a href=\"https:\/\/www.jonstokes.com\/p\/the-can-opener-problem\" rel>can-opener problems<\/a><span> manifesting in the ChatGPT conversation, and lowering the quality of The Discourse.<\/span><\/em><\/p>\n<p><em>To be clear, I do not know everything I\u2019d like to know about this topic. Like everyone else, including active researchers in machine learning, I\u2019m still on my own journey with getting my head around it at multiple levels. (Speaking of, if you run a shop that sells dedicated ML workstations and would like to publicly sponsor this newsletter by sending me one, do get in touch.)<\/em><span> <\/span><\/p>\n<p><em>That said, I\u2019m certainly far enough along that I can help others who are a few steps behind. So what follows is my effort to help others get closer to the target in their thinking and writing about this new category of technology.<\/em><\/p>\n<p><span>At the heart of ChatGPT is a large language model (LLM) that belongs to the family of generative machine learning models. 
This family also includes <\/span><a href=\"https:\/\/www.jonstokes.com\/p\/getting-started-with-stable-diffusion\" rel>Stable Diffusion<\/a><span> and all the other prompt-driven, text-to-whatever models that are daily doing fresh miracles on everyone\u2019s feeds right now.<\/span><\/p>\n<p>If you want to know more about how generative ML models work and you have some time, read the following pieces:<\/p>\n<ul>\n<li>\n<p><span>AI content generation series: <\/span><a href=\"https:\/\/www.jonstokes.com\/p\/ai-content-generation-part-1-machine\" rel>Part 1<\/a><span>, <\/span><a href=\"https:\/\/www.jonstokes.com\/p\/ai-content-generation-part-2-tasks\" rel>Part 2<\/a><span>, <\/span><a href=\"https:\/\/www.jonstokes.com\/p\/ai-content-generation-part-4-whats\" rel>Part 4<\/a><\/p>\n<\/li>\n<li>\n<p><a href=\"https:\/\/www.jonstokes.com\/p\/what-does-it-mean-to-create-something\" rel>What Does It Mean to \u201cCreate\u201d Something With Generative AI?<\/a><\/p>\n<\/li>\n<li>\n<p><a href=\"https:\/\/www.jonstokes.com\/p\/the-hall-monitors-are-winning-the\" rel>The Hall Monitors Are Winning the AI Wars, Part 1: ChatGPT<\/a><\/p>\n<\/li>\n<\/ul>\n<p><span>\u2b50\ufe0f But if you don\u2019t have time to read all that, here\u2019s a\u00a0<\/span><strong>one-sentence explanation of a generative AI model:<\/strong><span>\u00a0<\/span><em>a generative model is a function that can take a structured collection of symbols as input and produce a related structured collection of symbols as output<\/em><span>.<\/span><\/p>\n<div>\n<figure><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf9a88e-e597-4868-b49c-f4fe6a08bebe_1000x543.jpeg\" rel=\"noopener\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/substackcdn.com\/image\/fetch\/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcf9a88e-e597-4868-b49c-f4fe6a08bebe_1000x543.jpeg\" width=\"1000\" height=\"543\" alt=\"\"><\/a><\/figure>\n<\/div>\n<p>Here are some examples of structured collections of symbols:<\/p>\n<ul>\n<li>\n<p>Letters in a word<\/p>\n<\/li>\n<li>\n<p>Words in a sentence<\/p>\n<\/li>\n<li>\n<p>Pixels in an image<\/p>\n<\/li>\n<li>\n<p>Frames in a video<\/p>\n<\/li>\n<\/ul>\n<p><span>Now, there are plenty of ways to turn one collection of symbols into another, related collection of symbols \u2014 you don\u2019t even need computers to do this. 
Or, you can write a computer program that uses rules and lookup tables, like a certain\u00a0<\/span><a href=\"https:\/\/return.life\/2023\/02\/please-stop-talking-about-the-eliza-chatbot\/\" rel>famous chatbot from the 60s<\/a><span>\u00a0that I\u2019m sick of hearing about.<\/span><\/p>\n<p><span>The hard and expensive part of the above one-sentence explanation \u2014 the part that we\u2019ve only recently hit on how to do using massive amounts of electricity and leading-edge computer chips \u2014 is hidden deep inside the word <\/span><em>related<\/em><span>.<\/span><\/p>\n<p>Before we get into relationships, you need two concepts that will come up again and again throughout this piece:<\/p>\n<ul>\n<li>\n<p><strong>\u2699\ufe0f Deterministic: <\/strong><span>A deterministic process is a process that, given a particular input, will always produce the same output.<\/span><\/p>\n<\/li>\n<li>\n<p><strong>Stochastic: <\/strong><span>A stochastic process is a process that, given a particular input, is more likely to produce some outputs and less likely to produce others.<\/span><\/p>\n<\/li>\n<\/ul>\n<p>A classic gumball machine is deterministic \u2014 if I insert a quarter and turn the crank, I get a single gumball every time. So one quarter == one gumball, always.<\/p>\n<p><span>But a classic gumball machine is\u00a0<\/span><em>also<\/em><span>\u00a0stochastic \u2014 if I put in a quarter and turn the crank, the color of the gumball I\u2019ll get is fundamentally random. Furthermore, the odds of getting one color or another depend on the ratios of different colors inside the machine. 
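<\/span><\/p>
<p><em>A toy sketch of those two concepts in Python (the color mix below is made up for illustration): the gumball count is deterministic, while the color draw is stochastic, with odds set by the ratios inside the machine.<\/em><\/p>

```python
import random

def turn_crank_count(quarters: int) -> int:
    """Deterministic: one quarter in, exactly one gumball out, every time."""
    return quarters  # one quarter == one gumball, always

def turn_crank_color(ratios: dict, rng: random.Random) -> str:
    """Stochastic: the color depends on the mix of gumballs inside the machine."""
    colors = list(ratios)
    return rng.choices(colors, weights=[ratios[c] for c in colors])[0]

machine = {"red": 50, "blue": 30, "green": 20}  # made-up color mix
rng = random.Random()
print(turn_crank_count(1))  # always 1
print([turn_crank_color(machine, rng) for _ in range(5)])  # colors vary run to run
```

<p><em>Same function, same input, but one output is fixed and the other is a draw from a probability distribution.<\/em><\/p>
<p><span>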
Five different gumball machines with five different gumball color ratios will have five different probability distributions for gumball color output.<\/span><\/p>\n<p>Now, with these key concepts out of the way, back to the reason why relationships can be hard.<\/p>\n<p>Collections of symbols can be related in different ways, and the more abstract and subtle the relationship, the more technology we\u2019ll need to throw at the problem of capturing that relationship in a way that\u2019s useful for knowledge work.<\/p>\n<ol>\n<li>\n<p><span>If I\u2019m relating the collections\u00a0<\/span><code>{cat}<\/code><span>\u00a0and\u00a0<\/span><code>{at-cay}<\/code><span>, that\u2019s a standard \u201c<\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Pig_Latin\" rel>Pig Latin<\/a><span>\u201d transformation I can manage with a simple, handwritten rule set.<\/span><\/p>\n<\/li>\n<li>\n<p><span>If I\u2019m relating\u00a0<\/span><code>{cat}<\/code><span>\u00a0to\u00a0<\/span><code>{dog}<\/code><span>, then there are many levels of abstraction on which those two collections can relate. <\/span><\/p>\n<ol>\n<li>\n<p>As ordered symbol collections (sequences), they both have three symbols. <\/p>\n<\/li>\n<li>\n<p>As three-symbol sequences, they\u2019re both words. <\/p>\n<\/li>\n<li>\n<p>As words, they both refer to biological organisms. <\/p>\n<\/li>\n<li>\n<p>As organisms, they\u2019re both mammals. <\/p>\n<\/li>\n<li>\n<p>As mammals, they\u2019re both house pets. And so on.<\/p>\n<\/li>\n<\/ol>\n<\/li>\n<li>\n<p><span>If I\u2019m relating\u00a0<\/span><code>{the cat is alive}<\/code><span>\u00a0to\u00a0<\/span><code>{the cat is dead}<\/code><span>, then there\u2019s an even larger number of higher-order concepts that we can use to compare and contrast those two sequences of symbols. 
<\/span><\/p>\n<ol>\n<li>\n<p><span>All of the concepts related to\u00a0<\/span><code>cat<\/code><span>\u00a0are in play, as are concepts related to\u00a0\u201calive\u201d\u00a0vs.\u00a0\u201cdead.\u201d <\/span><\/p>\n<\/li>\n<li>\n<p><span>On yet another level, many readers will spot what we might call an <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Intertextuality\" rel>intertextual<\/a><span> reference to <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Schr%C3%B6dinger%27s_cat\" rel>Schr\u00f6dinger\u2019s Cat<\/a><span>.<\/span><\/p>\n<\/li>\n<\/ol>\n<\/li>\n<li>\n<p><span>Let\u2019s add one more relationship, just for fun:\u00a0<\/span><code>{the cat is immature}<\/code><span>\u00a0vs.\u00a0<\/span><code>{the cat is mature}<\/code><span>. But are we talking about a stage of physical development or a state of emotional development? <\/span><\/p>\n<ol>\n<li>\n<p>Since it\u2019s a cat, \u201cimmature\u201d is probably a straightforward synonym for \u201cyoung\u201d or \u201cjuvenile.\u201d <\/p>\n<\/li>\n<li>\n<p>If the subject of the sentence were a human, the sentence would more likely be invoking a cluster of concepts around age-appropriate behavior.\u00a0<\/p>\n<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p>As you read through the list of items above, you can imagine that as you go up in number from one to four there\u2019s an explosion of possible relationships between the symbols. And as the number of possible relationships increases, the qualities of the relationships themselves increase in abstraction, complexity, and subtlety.<\/p>\n<p>The different relationships above take different classes of symbol storage and retrieval \u2014 from paper notebooks up to datacenters \u2014 to capture and encode in a useful way.\u00a0<\/p>\n<p>That first, simplistic Pig Latin relationship can be described on a single sheet of paper, such that anyone with that paper can transform any word from English into Pig Latin. 
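<\/p>
<p><em>That one-sheet rule set is just as easy to write down as code. Here is one common Pig Latin variant (conventions differ, especially for vowel-initial words), matching the <\/em><code>{cat}<\/code><em> to <\/em><code>{at-cay}<\/code><em> example above:<\/em><\/p>

```python
def pig_latin(word: str) -> str:
    """One common Pig Latin variant: move the leading consonant cluster
    to the end and add "ay"; vowel-initial words just get "-yay"."""
    vowels = "aeiou"
    for i, ch in enumerate(word.lower()):
        if ch in vowels:
            if i == 0:
                return word + "-yay"
            return word[i:] + "-" + word[:i] + "ay"
    return word + "-ay"  # no vowels at all

print(pig_latin("cat"))  # -> "at-cay", as in the example above
```

<p><em>No probability anywhere in sight: the same input always maps to the same output.<\/em><\/p>
<p>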
But by the time we get to example four, a strange thing has happened, and that strange thing is why machine learning requires tens of millions of dollars\u2019 worth of resources:\u00a0<\/p>\n<ol>\n<li>\n<p><span>We\u2019ve uncovered a small\u00a0<\/span><strong>universe of possible relationships<\/strong><span>\u00a0between those collections. There\u2019s just a bewildering, densely connected network of concepts, all the way up the abstraction ladder from simple physical characteristics, to biological taxonomies, to subtle notions of physical and emotional development.<\/span><\/p>\n<\/li>\n<li>\n<p><span>Some of the more abstract possible relationships are more likely to be in play than others. So an element of\u00a0<\/span><strong>probability has entered the picture<\/strong><span>. <\/span><\/p>\n<ol>\n<li>\n<p><span>As I said in the example, given that we\u2019re talking about a cat, it\u2019s\u00a0<\/span><em>more likely<\/em><span>\u00a0that the mature\/immature dichotomy is related to a cluster of concepts around physical development and\u00a0<\/span><em>less likely<\/em><span>\u00a0that it\u2019s related to a cluster of concepts around emotional or intellectual development.<\/span><\/p>\n<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p>Regarding the probabilities in #2 above, \u201cless likely\u201d does not mean impossible, especially if we open the door to a bit of extra context. What if we added a few additional words, and the collection pair was:\u00a0<\/p>\n<ul>\n<li>\n<p><code>{Regarding the cat in the hat: the cat is mature.}<\/code><\/p>\n<\/li>\n<li>\n<p><code>{Regarding the cat in the hat: the cat is immature.}<\/code><\/p>\n<\/li>\n<\/ul>\n<p>
Suddenly, all the probabilities have shifted and we\u2019re in a whole other realm of likely meaning with respect to maturity and immaturity.<\/p>\n<p><strong>Summary:<\/strong><\/p>\n<ul>\n<li>\n<p><span>When the relationships among collections of symbols are\u00a0<\/span><strong>simple and deterministic<\/strong><span>, you don\u2019t need much storage or computing power to relate one collection to the other.<\/span><\/p>\n<\/li>\n<li>\n<p><span>When the relationships among collections of symbols\u00a0are <\/span><strong>complex and stochastic<\/strong><span>, then throwing more storage and computing power at the problem of relating one collection to another enables you to relate those collections in ever richer and more complex ways.<\/span><\/p>\n<\/li>\n<\/ul>\n<p>If you managed to get through high school chemistry, then you have a set of concepts that are useful for thinking about generative AI: atomic orbitals.<\/p>\n<p><span>And if you didn\u2019t get through basic chemistry, or you don\u2019t remember any of it, then don\u2019t worry. <\/span><strong>This section is mostly about the pictures<\/strong><span>, so look at the pictures and scan for the boldfaced words, then move on.<\/span><\/p>\n<p>We all learn in basic chemistry that orbitals are just regions of space where an electron is likely to be at any moment. 
Electrons at different energy levels have orbitals that are shaped differently, which means we\u2019re likely to find them in different regions.\u00a0<\/p>\n<p>Take a look at hydrogen:<\/p>\n<div>\n<figure><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d67d9f-39bc-4963-a26d-fe3178d6105c_1100x1000.jpeg\" rel=\"noopener\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/substackcdn.com\/image\/fetch\/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb2d67d9f-39bc-4963-a26d-fe3178d6105c_1100x1000.jpeg\" width=\"1100\" height=\"1000\" alt=\"\"><\/a><\/figure>\n<\/div>\n<p>Let\u2019s zoom in on one of the orbitals:<\/p>\n<div>\n<figure><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67053e39-216f-4585-a5a4-f8a10dfa9df4_600x600.jpeg\" rel=\"noopener\"><img decoding=\"async\" loading=\"lazy\" src=\"https:\/\/substackcdn.com\/image\/fetch\/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67053e39-216f-4585-a5a4-f8a10dfa9df4_600x600.jpeg\" width=\"600\" height=\"600\" alt=\"\"><\/a><\/figure>\n<\/div>\n<p>For the orbital above, the brighter the region, the higher the odds you\u2019d bump into an electron there if you were to poke at the atom with something even tinier than an electron. For the black regions in the picture, that doesn\u2019t mean the probability of finding an electron is zero \u2014 it just means it\u2019s so low as to practically be zero.<\/p>\n<p><span>Those orbitals are <\/span><strong>probability distributions<\/strong><span> and they have a certain shape to them \u2014 the one above has <\/span><strong>four<\/strong><span>\u00a0<\/span><strong>lobes<\/strong><span>, so that if you observe a point in one of those four areas you\u2019re more likely to spot an electron than if you observe one of the black regions. The other orbitals have different lobes that are arranged in different ways.<\/span><\/p>\n<p>Ok, you made it! That\u2019s all the quantum chemistry you need, and in fact, it\u2019s all the background you need at the moment. 
We can now talk about the thing everyone is trying to understand: ChatGPT.<\/p>\n<p>You can imagine that for a model like ChatGPT, each possible blob of text the model could generate \u2014 everything from a few words of gibberish to a full page of coherent English \u2014 is a single point in a probability distribution, like electron positions in the atomic orbitals of the previous section\u2019s hydrogen atom.\u00a0<\/p>\n<p><span>When you feed ChatGPT\u2019s input box a collection of words \u2014 e.g.\u00a0<\/span><code>{Tell me about the state of a cat in a box with a flask of poison and a bit of radioactive material}<\/code><span>\u00a0\u2014 you can think of that action of clicking the \u201cSubmit\u201d button as making an observation that collapses a wave function and yields a single collection of symbols (out of many possible collections).<\/span><\/p>\n<p><strong>Note for more informed readers:<\/strong><span> Some of you who read the above will be aware that a text-to-text LLM is actually locating single words in the probability space and successively stringing them together to build up sentences. The distinction between \u201clatent space is the multidimensional space of all likely words the model might output\u201d vs. \u201clatent space is the multidimensional space of all likely word sequences the model might output\u201d is pretty academic at this level of abstraction, though. So for the purpose of downloading better intuitions into readers and minimizing complexity, I\u2019m going with the second option.<\/span><\/p>\n<p><span>
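<\/span><\/p>
<p><em>The word-by-word process that note describes can be sketched with a toy stand-in for a trained model: a hand-written table of next-word probabilities (all numbers invented for illustration) plus weighted sampling. This is not the real ChatGPT implementation, just the shape of the idea:<\/em><\/p>

```python
import random

# A hand-written stand-in for a trained LLM: next-word probabilities
# for a few contexts (all numbers invented for illustration).
NEXT_WORD = {
    "the cat is": {"alive": 0.55, "dead": 0.40, "purple": 0.05},
    "alive": {".": 1.0},
    "dead": {".": 1.0},
    "purple": {".": 1.0},
}

def sample_next(context: str, rng: random.Random, temperature: float = 1.0) -> str:
    """Draw one word from the context's distribution. Lower temperature
    sharpens the distribution toward the likeliest word; higher flattens it."""
    dist = NEXT_WORD[context]
    words = list(dist)
    weights = [p ** (1.0 / temperature) for p in dist.values()]
    return rng.choices(words, weights=weights)[0]

rng = random.Random()
word = sample_next("the cat is", rng)  # usually "alive", sometimes "dead"
print("the cat is " + word + sample_next(word, rng))  # e.g. "the cat is alive."
```

<p><em>Swap the four-row table for billions of learned weights, and words for tokens, and you have the picture this article is painting.<\/em><\/p>
<p><span>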
Sometimes, your text prompt input will take you to a point in the probability distribution that corresponds to a collection like\u00a0<\/span><code>{The cat is alive}<\/code><span>, and at other times you\u2019ll end up at a point that corresponds to\u00a0<\/span><code>{The cat is dead}<\/code><span>.<\/span><\/p>\n<p><span>Note that it is\u00a0<\/span><em>possible<\/em><span>, though not at all likely, that the aforementioned input symbols could take you to the point in the model\u2019s\u00a0<\/span><strong>latent space<\/strong><span>\u00a0corresponding to the following collection:\u00a0<\/span><code>{ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn}<\/code><span>. It all depends on the shape of the probability distributions you\u2019re poking at with your text inputs and on the dice that the universe \u2014 or, rather, the computer\u2019s random number generator \u2014 is rolling.<\/span><\/p>\n<p><strong>Important implication:<\/strong><span> It\u2019s common but not necessarily always useful to say the language model \u201cknows\u201d the state of the cat in this example \u2014 i.e., whether it\u2019s alive or dead. It\u2019s suboptimal to lean too hard on the idea that the model has a kind of internal, true\/false orientation to the cat and different claims about its circumstances.\u00a0<\/span><\/p>\n<p>Instead, it\u2019s better to think of it like this: <\/p>\n<p><span>\u27a1 In the space of all the possible collections of symbols the model could produce \u2014 from collections that are snippets of gibberish to collections that are pages of Shakespeare \u2014 <\/span><em>there are regions in the model\u2019s probability distributions that contain collections of symbols we humans interpret to mean that the cat is alive<\/em><span>. And there are also adjacent regions in that probability space containing collections of symbols we interpret to mean the cat is dead.<\/span><\/p>\n<p><span>
Here are some cat-related symbol collections we might encounter in ChatGPT\u2019s <\/span><strong>latent space<\/strong><span> \u2014 i.e., the space of possible outputs, which has been deliberately sculpted into a particular shape by an expensive training process:<\/span><\/p>\n<ul>\n<li>\n<p><code>{The cat roused herself from slumber and blinked her eyes.}<\/code><\/p>\n<\/li>\n<li>\n<p><code>{The soft breathing of the sleeping cat greeted Schr\u00f6dinger as he opened the box.}<\/code><\/p>\n<\/li>\n<li>\n<p><code>{Ja mon, de cat him dead.}<\/code><\/p>\n<\/li>\n<li>\n<p><code>{\u201cI\u2019ve killed my favorite cat!\u201d screamed Schr\u00f6dinger as he pulled his pet\u2019s lifeless corpse from the box.}<\/code><\/p>\n<\/li>\n<li>\n<p><code>{Patches watched the scene from above, his astral cat form floating near the ceiling as his master lifted his lifeless body from the box and wept.}<\/code><\/p>\n<\/li>\n<\/ul>\n<p>When you poke at the model with different input collections, you\u2019re more likely to encounter some of these output collections than others, but they\u2019re all at least theoretically possible to encounter.<\/p>\n<p><span>So when you and I are both interacting with ChatGPT around a topic that\u2019s related to some set of facts in the world \u2014 e.g., the status of the Bengal tiger, and whether it\u2019s endangered or not \u2014 don\u2019t imagine that we\u2019re both asking some <\/span><strong>unitary entity with a personal history<\/strong><span> and set of experiences to communicate to us some of the facts about Bengal tigers that it has tucked away in its memory, where if it tells you one thing and me the opposite then it\u2019s somehow lying to one of us.<\/span><\/p>\n<p><span>
Rather, think of our activity like this: I am using my text prompt to <\/span><strong>observe a single point in a probability distribution<\/strong><span> that corresponds (at least on my reading of the model\u2019s output symbols) to a cluster of facts and concepts around Bengal tigers, and you are doing the same thing. When both of us are getting different sequences of words that seem (again, to us as humans who are interpreting this sequence) to represent a divergent fact pattern \u2014 e.g., it tells me the tiger is endangered, and it tells you the tiger is so prevalent that it\u2019s a nuisance animal \u2014 it\u2019s because we\u2019re poking at different lobes of a probability distribution and finding different points in those divergent lobes.\u00a0<\/span><\/p>\n<ul>\n<li>\n<p><span>The lobe in the probability distribution that I\u2019m poking at encompasses word sequences that generally correspond to what I, a human English speaker, interpret as claims about the <\/span><strong>endangered status<\/strong><span> of Bengal tigers.\u00a0<\/span><\/p>\n<\/li>\n<li>\n<p><span>The lobe you\u2019re poking at encompasses word sequences that generally correspond to what you interpret as claims about the <\/span><strong>superabundance<\/strong><span> of Bengal tigers.<\/span><\/p>\n<\/li>\n<\/ul>\n<p>So how do we fix this? <\/p>\n<p>Given that the Bengal tiger is, in fact, endangered (or, at least this is what I take to be the state of the world based on the words that Google\u2019s ML-powered machines are telling me right now when I type in a search query), how do we eliminate or at least shrink the probability distribution lobe that you\u2019re poking at \u2014 the one that covers text blobs about Bengal tigers overrunning the cities, digging in people\u2019s garbage, and other nonsense?<\/p>\n<p>\u270b Not only is it not clear we can, but it\u2019s also not really clear we\u2019d always want to. 
Let me explain.<\/p>\n<p>When an LLM spits out a word combination that an observer takes to misrepresent the state of reality, we say that the model is \u201challucinating.\u201d<\/p>\n<p><strong>Aside:<\/strong><span>\u00a0I gotta say, this \u201challucination\u201d term is pretty loaded. His hallucination is her\u00a0<\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Mythopoeia\" rel>mythopoesis<\/a><span>\u00a0is my prophecy of things which must shortly come to pass (he who has ears to hear, let him hear). Wars have been fought over this stuff, and may be yet again if a famous collection of alleged hallucinations is interpreted as \u201ctrue\u201d in some way\u2026 But let\u2019s not get ahead of ourselves by veering into hermeneutics just yet. We\u2019ll pick up this thread again in a moment.<\/span><\/p>\n<p><span>Right now in early 2023, we have a set of tools that allow us to <\/span><strong>shape an LLM\u2019s probability distributions<\/strong><span>, so that some regions get smaller or less dense and some regions get larger or denser:<\/span><\/p>\n<ol>\n<li>\n<p>Training<\/p>\n<\/li>\n<li>\n<p>Fine-tuning<\/p>\n<\/li>\n<li>\n<p>Reinforcement learning from human feedback (RLHF)<\/p>\n<\/li>\n<\/ol>\n<p><span>We can\u00a0<\/span><strong>train<\/strong><span>\u00a0a foundation model on high-quality data \u2014 on collections of symbols that we, as human observers, take to be meaningful and to correspond to true facts about the world. In this way, the trained model is like an atom where the orbitals are shaped in a way we find to be useful.<\/span><\/p>\n<p><span>Then after poking at that model for a bit and discovering regions in the space of likely outputs that we observers would prefer to stop encountering in the course of our observations, we\u00a0<\/span><strong>fine-tune<\/strong><span>\u00a0it with more focused, carefully curated training data. 
This fine-tuning shrinks some lobes and expands others, and hopefully, after poking some more at the results of this lobe-shaping process we\u2019re happier with the outputs we\u2019re seeing when we collapse the wave function again and again.<\/span><\/p>\n<p><span>Finally, we can bring in some humans in the\u00a0<\/span><strong>RLHF<\/strong><span>\u00a0phase to help us really dial in the shape of the model\u2019s probability space so that it covers as tightly as possible only the points in the space of all possible outputs that correspond to \u201ctrue facts\u201d (whatever those are!) about things in the world, and so that it does not cover any points that correspond to \u201cfake news\u201d (whatever that is!) about things in the world. If we\u2019ve done our job right, then all the observations we make of the model\u2019s probability space will seem true to us.<\/span><\/p>\n<p><span>I\u2019m personally left with <\/span><strong>two primary questions<\/strong><span> about everything in this section:<\/span><\/p>\n<ol>\n<li>\n<p><strong>Who is this \u201cobserver\u201d<\/strong><span> we keep referencing \u2014 this human who\u2019s interpreting the model\u2019s output and giving it a thumbs-up or a thumbs-down? Are they in-group or out-group? Are they smart or stupid? Are they really interested in the truth, or are they just trying to sculpt this model\u2019s probability distributions to serve their own ends?<\/span><\/p>\n<\/li>\n<li>\n<p><span>In cases where I\u2019m lucky enough to be the observer who decides whether the model\u2019s output is true or not, <\/span><strong>do I really always want the truth?<\/strong><span> If I ask the model for a comforting bedtime story about a unicorn, do I really want it to go all Neil deGrasse Tyson on me and open with \u201cwell\u00a0<\/span><em>actually<\/em><span>, Jon\u2026\u201d? 
Isn\u2019t the model\u2019s ability to make things up often a feature, not a bug?<\/span><\/p>\n<\/li>\n<\/ol>\n<p><span>These questions are central to the issue of \u201challucination,\u201d but they are not technical questions. They are questions of philosophy of language, of hermeneutics, of politics, and, most of all, of <\/span><strong>power<\/strong><span>.<\/span><\/p>\n<p><strong>This really is the payoff<\/strong><span>\u00a0to all this article\u2019s talk of probability distributions, atomic orbitals, and alive\/dead cats:\u00a0<\/span><em>When I, a human being with a history and an identity and a set of goals and interests, observe a single point in the model\u2019s probability space, it\u2019s always up to me to interpret that point as language and to imbue it with meaning.<\/em><\/p>\n<p><span>Indeed, LLMs may be humanity\u2019s first practical use for <\/span><a href=\"https:\/\/www.poetryfoundation.org\/learn\/glossary-terms\/reader-response-theory\" rel>reader response theory<\/a><span>, apart from academic self-promotion. Authorial intent doesn\u2019t matter because there literally is no author and no intent \u2014 only a text object that a machine assembles from a vast, multidimensional probability distribution.<\/span><\/p>\n<p><span>The ChatGPT experience of back-and-forth interaction is powerful, but it\u2019s really a kind of <\/span><a href=\"https:\/\/uxplanet.org\/a-look-at-skeuomorphic-ui-design-32f50016a50a\" rel>skeuomorphic UI affordance<\/a><span>, like when a calculator app looks like a little physical calculator. <\/span><\/p>\n<p><span>It certainly <\/span><em>feels<\/em><span> like the model itself is learning about you just like you\u2019re learning about it \u2014 as if, as the conversation develops, you\u2019re each updating your own internal understanding of the other. 
But this is mostly not what\u2019s happening, and to the extent that it <\/span><em>is<\/em><span> happening, it\u2019s happening in a very limited way.<\/span><\/p>\n<p><span>There\u2019s a final concept you need to grasp in order to thoroughly untangle yourself from the idea that ChatGPT is truly talking to <\/span><em>you<\/em><span>, vs. just shouting language-like symbol collections into the void: <\/span><strong>the token window<\/strong><span>.<\/span><\/p>\n<p><span>With a normal language model like what I\u2019ve described so far in this article, you give the model a set of inputs, it does its probability things, and it gives you an output. The inputs are entered into the model\u2019s <\/span><strong>token window<\/strong><span>.<\/span><\/p>\n<ul>\n<li>\n<p><strong>Tokens<\/strong><span>\u00a0are the ML term for the \u201csymbols\u201d I\u2019ve been talking about in this whole article. I wanted to reduce the specialist lingo so I said \u201csymbols,\u201d but really the models take in tokens and spit out tokens.<\/span><\/p>\n<\/li>\n<li>\n<p><span>The\u00a0<\/span><strong>window<\/strong><span>\u00a0is the number of tokens a model can ingest and turn into a sequence of related output tokens.<\/span><\/p>\n<\/li>\n<\/ul>\n<p>So when I use a pre-ChatGPT LLM, I put tokens into a token window, then the model returns a point in its latent space that\u2019s located near the tokens I put in.<\/p>\n<p>Let\u2019s return to the diagram at the start of the article and look at it again with our new knowledge of token windows and probability distributions. 
That weird-looking thing in the middle is supposed to be the model:<\/p>\n<div>\n<figure><a target=\"_blank\" href=\"https:\/\/substackcdn.com\/image\/fetch\/f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dcee14-7cdb-4f5d-a670-b353cb068d12_1000x543.jpeg\" rel=\"noopener\"><\/p>\n<div><picture><source type=\"image\/webp\"  ><img decoding=\"async\" src=\"https:\/\/substackcdn.com\/image\/fetch\/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep\/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68dcee14-7cdb-4f5d-a670-b353cb068d12_1000x543.jpeg\" width=\"1000\" height=\"543\" data-attrs=\"{&quot;src&quot;:&quot;https:\/\/substack-post-media.s3.amazonaws.com\/public\/images\/68dcee14-7cdb-4f5d-a670-b353cb068d12_1000x543.jpeg&quot;,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:543,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:197355,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image\/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;internalRedirect&quot;:null}\" alt   loading=\"lazy\"><\/picture><\/div>\n<p><\/a><\/figure>\n<\/div>\n<p><strong>This point is key:<\/strong><span>\u00a0the probability space\u00a0<\/span><em>does not change shape<\/em><span>\u00a0in response to the tokens I put into the token window. The model\u2019s weights remain static as I interact with it from one token window to the next. It doesn\u2019t remember the contents of previous token windows.<\/span><\/p>\n<p>Think of it this way: Every time I put tokens into the model\u2019s token window and hit \u201cSubmit,\u201d it\u2019s like I\u2019m walking up to it on the street and speaking to it for the very first time. It has no place to store any mutually evolving, shared history with me.<\/p>\n<p><strong>Another analogy<\/strong><span> that\u2019s not perfect but pretty close: Every time I feed a specific prompt and a seed into Stable Diffusion, I get back the same image. 
Each prompt and seed combination will take me to a single point in latent space where a particular image is located, and if I keep giving it the same input tokens I\u2019ll keep getting the same output tokens. I\u2019m not building up any shared history with the model, and Stable Diffusion isn\u2019t \u201clearning\u201d anything about me as I use it.<\/span><\/p>\n<p><span>(Language models like GPT-3 <\/span><strong>purposefully inject randomness<\/strong><span> in places where Stable Diffusion does not, in order to vary the output and make it seem more natural. That\u2019s why you get different responses to the same prompt from a language model \u2014 it\u2019s a little bit like using Stable Diffusion but the software is forcing you to vary the seed number on each submission. It\u2019s not\u00a0<\/span><em>exactly<\/em><span>\u00a0like that, but it\u2019s close enough.)<\/span><\/p>\n<p><span>The end result is that the only brand new, post-training information that the model can obtain about the world is <\/span><strong>limited to whatever I put into a token window<\/strong><span>. If I type a word sequence containing facts about my marital status into the token window, then the model takes that sequence and returns a point in latent space that\u2019s adjacent to the input sequence \u2014 a point that probably represents facts that seem related to the facts I gave it in the token window.<\/span><\/p>\n<p><span>ChatGPT is an LLM with a <\/span><strong>single, large token window<\/strong><span>, but it uses its token window in such a way as to make the experience of using the model feel more interactive, like a real conversation.<\/span><\/p>\n<p><strong>Warning<\/strong><span>: I haven\u2019t read an explicit explanation of how it works, so what\u2019s in this section is my informed speculation based on how it kinda\u00a0<\/span><em>has<\/em><span>\u00a0to work. 
There really aren\u2019t any other options given how LLMs function, but as always if I\u2019m wrong I\u2019m happy to be corrected.<\/span><\/p>\n<p>ChatGPT\u2019s clever UI trick is easier to show than to describe, so let\u2019s imagine that I open up my browser and start a fresh ChatGPT session that goes something like this:<\/p>\n<ul>\n<li>\n<p>Me: What\u2019s up, man.<\/p>\n<\/li>\n<li>\n<p>ChatGPT: Not much. What\u2019s up with you?<\/p>\n<\/li>\n<li>\n<p>Me: Eh, my cat just died in a lab accident.<\/p>\n<\/li>\n<li>\n<p>ChatGPT: I\u2019m very sorry to hear that.<\/p>\n<\/li>\n<li>\n<p>Me: Yeah, it sucks. I\u2019ve already picked out a dog at the shelter, though. I pick him up this afternoon.\u00a0<\/p>\n<\/li>\n<li>\n<p>ChatGPT: That\u2019s exciting! I\u2019m sure your new dog will be as good a companion for you as your now deceased cat.<\/p>\n<\/li>\n<\/ul>\n<p>Let\u2019s break this session down into multiple token windows, each of which gets fed into the model over the course of our session:<\/p>\n<p>Token window 1:\u00a0<\/p>\n<pre><code><code>{What's up, man.}<\/code><\/code><\/pre>\n<p>Output 1:<\/p>\n<pre><code><code>{Not much. What\u2019s up with you?}<\/code><\/code><\/pre>\n<p>Token window 2:\u00a0<\/p>\n<pre><code><code>{Me: What\u2019s up, man.\n ChatGPT: Not much. What\u2019s up with you?\n Me: Eh, my cat just died in a lab accident.}<\/code><\/code><\/pre>\n<p>Output 2:<\/p>\n<pre><code><code>{I'm very sorry to hear that.}<\/code><\/code><\/pre>\n<p>Token window 3:<\/p>\n<pre><code><code>{Me: What\u2019s up, man.\n ChatGPT: Not much. What\u2019s up with you?\n Me: Eh, my cat just died in a lab accident.\n ChatGPT: I\u2019m very sorry to hear that.\n Me: Yeah, it sucks. I\u2019ve already picked out a dog at \n     the shelter, though.}<\/code><\/code><\/pre>\n<p>Output 3:<\/p>\n<pre><code><code>{That\u2019s exciting! I\u2019m sure your new dog will be as good a companion for you as your now deceased cat.}<\/code><\/code><\/pre>\n<p>
Do you see the way it works? OpenAI keeps appending each new exchange to the existing transcript, so the content of the token window grows as the session progresses.<\/p>\n<p>Every time I hit \u201cSend,\u201d OpenAI is not only submitting my latest input to the model \u2014 it\u2019s also adding to the token window all the previous exchanges in the session so that the model can take the whole \u201cchat history\u201d and use it to steer me toward the right lobe in its probability space.<\/p>\n<p>In other words, as my ChatGPT session progresses, the \u201ctext prompt\u201d I\u2019m feeding this model is getting longer and more information-rich. So in that third exchange, the only reason ChatGPT \u201cknows\u201d my cat is dead and can respond appropriately is because OpenAI has secretly been slipping our entire \u201cchat\u201d history into every new token window.<\/p>\n<p><span>The model, then, does not and cannot \u201cknow\u201d anything about me other than what\u2019s in our chat history. The token window is thus a form of <\/span><strong>shared, mutable state<\/strong><span> that the model and I are modifying together, and it represents everything the model can possibly use to find a relevant word sequence to show me.<\/span><\/p>\n<p>
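The append-and-resubmit trick described above can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not OpenAI's actual code: `call_llm` is a hypothetical stand-in for a single stateless request to the model, and its replies are canned rather than sampled.

```python
# Toy sketch of the "growing token window" trick described above.
# call_llm is a hypothetical stand-in for one stateless request to an
# LLM: it sees ONLY the text handed to it, with no memory of past calls.

def call_llm(token_window: str) -> str:
    """Return a canned reply; a real model would sample a continuation."""
    if "cat just died" in token_window.splitlines()[-1]:
        return "I'm very sorry to hear that."
    return "Not much. What's up with you?"

def chat_turn(history: list[str], user_message: str) -> str:
    """Append the user's line, resubmit the WHOLE transcript, record the reply."""
    history.append(f"Me: {user_message}")
    token_window = "\n".join(history)  # the prompt grows every turn
    reply = call_llm(token_window)
    history.append(f"ChatGPT: {reply}")
    return reply

history: list[str] = []  # the shared, mutable state
chat_turn(history, "What's up, man.")
chat_turn(history, "Eh, my cat just died in a lab accident.")
# After two turns, all four transcript lines ride along in the next
# token window - the only reason the model can appear to "remember" the cat.
print(len(history))  # → 4
```

Note that the model function itself holds no state between calls; all apparent memory lives in the `history` list that the UI keeps re-feeding it.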
To summarize:<\/p>\n<ul>\n<li>\n<p><strong>Frozen and immutable<\/strong><span> during normal use: the <\/span><em>model\u2019s weights<\/em><span>, which give rise to the probability distributions described above and represent everything it \u201cknows\u201d about language and the world.<\/span><\/p>\n<\/li>\n<li>\n<p><strong>Can be updated with new facts<\/strong><span> about the world: the <\/span><em>token window<\/em><span> that I dump into the model in order to get fresh output from it.<\/span><\/p>\n<\/li>\n<\/ul>\n<p>Note that BingGPT seems to have added an extra step: that newer model probably ran a web search, then dumped the results into the token window alongside the entire chat history.<\/p>\n<p><span>If you\u2019ve grasped all of the above, then you know why the <\/span><strong>32K-token window<\/strong><span> on the upcoming versions of OpenAI\u2019s models is a really big deal. That\u2019s enough tokens to really load up the model with fresh facts, like customer service histories, book chapters or scripts, action sequences, and many other things.<\/span><\/p>\n<p>As token windows get bigger, pre-trained models can \u201clearn\u201d more things on the fly. So watch for bigger token windows to unlock brand-new AI capabilities.<\/p>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/www.jonstokes.com\/p\/chatgpt-explained-a-guide-for-normies\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><br \/>\n Sharie Lanz<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The story so far:\u00a0Most of the discussion of ChatGPT I\u2019m seeing from even very smart, tech-savvy people is just not good. In articles and podcasts, people are talking about this chatbot in unhelpful ways. And by \u201cunhelpful ways,\u201d I don\u2019t just mean that they\u2019re anthropomorphizing (though they are doing that). 
Rather, what I mean is<\/p>\n","protected":false},"author":1,"featured_media":615566,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[116428,23353,46],"tags":[],"class_list":{"0":"post-615565","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-chatgpt","8":"category-explained","9":"category-technology"},"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/615565","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=615565"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/615565\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/615566"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=615565"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=615565"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=615565"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}