{"id":617755,"date":"2023-03-14T09:49:06","date_gmt":"2023-03-14T14:49:06","guid":{"rendered":"https:\/\/news.sellorbuyhomefast.com\/index.php\/2023\/03\/14\/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi\/"},"modified":"2023-03-14T09:49:06","modified_gmt":"2023-03-14T14:49:06","slug":"you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2023\/03\/14\/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi\/","title":{"rendered":"You can now run a GPT-3 level AI model on your laptop, phone, and Raspberry Pi"},"content":{"rendered":"<div>\n<header>\n<h4>\n      Pocket-sized hallucination on demand    \u2014<br \/>\n<\/h4>\n<h2 itemprop=\"description\">Thanks to Meta LLaMA, AI text models have their &#8220;Stable Diffusion moment.&#8221;<\/h2>\n<section>\n<p itemprop=\"author creator\" itemscope itemtype=\"http:\/\/schema.org\/Person\">\n      <a itemprop=\"url\" href=\"https:\/\/arstechnica.com\/author\/benjedwards\/\" rel=\"author\"><span itemprop=\"name\">Benj Edwards<\/span><\/a><br \/>\n    &#8211;  <time data-time=\"1678749380\" datetime=\"2023-03-13T23:16:20+00:00\">Mar 13, 2023 11:16 pm UTC<\/time>\n<\/p>\n<\/section>\n<\/header>\n<div itemprop=\"articleBody\">\n<figure>\n  <img decoding=\"async\" src=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2023\/03\/llama_laptop_hero_1-800x450.jpg\" alt=\"An AI-generated abstract image suggesting the silhouette of a figure.\"><figcaption>\n<p>Ars Technica<\/p>\n<\/figcaption><\/figure>\n<p>Things are moving at lightning speed in AI Land. 
On Friday, a software developer named Georgi Gerganov created a tool called <a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\">&#8220;llama.cpp&#8221;<\/a> that can run Meta&#8217;s new GPT-3-class AI large language model, <a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/02\/chatgpt-on-your-pc-meta-unveils-new-ai-model-that-can-run-on-a-single-gpu\/\">LLaMA<\/a>, locally on a Mac laptop. Soon thereafter, people worked out <a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\/issues\/22\">how to run LLaMA on Windows<\/a> as well. Then someone <a href=\"https:\/\/twitter.com\/thiteanish\/status\/1635188333705043969\">showed it running<\/a> on a Pixel 6 phone, and next came <a href=\"https:\/\/twitter.com\/miolini\/status\/1634982361757790209\">a Raspberry Pi<\/a> (albeit running very slowly).<\/p>\n<p>If this keeps up, we may be looking at a pocket-sized <a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/03\/get-ready-to-meet-the-chat-gpt-clones\/\">ChatGPT competitor<\/a> before we know it.<\/p>\n<p>But let&#8217;s back up a minute, because we&#8217;re not quite there yet. (At least not today\u2014as in literally today, March 13, 2023.) But what will arrive next week, no one knows.<\/p>\n<p>Since <a href=\"https:\/\/arstechnica.com\/information-technology\/2022\/12\/openai-invites-everyone-to-test-new-ai-powered-chatbot-with-amusing-results\/\">ChatGPT launched<\/a>, some people have been frustrated by the AI model&#8217;s built-in limits that prevent it from discussing topics that OpenAI has deemed sensitive. 
Thus began the <a href=\"https:\/\/boards.4channel.org\/g\/thread\/91848262#p91850335\">dream<\/a>\u2014in some quarters\u2014of an open source large language model (LLM) that anyone could run locally without censorship and without paying <a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/03\/chatgpt-and-whisper-apis-debut-allowing-devs-to-integrate-them-into-apps\/\">API fees<\/a> to OpenAI.<\/p>\n<p>Open source solutions do exist (such as <a href=\"https:\/\/www.reddit.com\/r\/MachineLearning\/comments\/oe6paj\/d_gptj_for_text_generation_hardware_requirements\/\">GPT-J<\/a>), but they <a href=\"https:\/\/www.reddit.com\/r\/MachineLearning\/comments\/oe6paj\/d_gptj_for_text_generation_hardware_requirements\/\">require<\/a> a lot of GPU RAM and storage space. Other open source alternatives could not boast GPT-3-level performance on readily available consumer-level hardware.<\/p>\n<p>Enter LLaMA, an LLM available in parameter sizes ranging from 7B to 65B (that&#8217;s &#8220;B&#8221; as in &#8220;billion parameters,&#8221; which are floating point numbers stored in matrices that represent what the model &#8220;knows&#8221;). LLaMA made a heady claim: that its smaller-sized models could match <a href=\"https:\/\/arstechnica.com\/information-technology\/2022\/09\/twitter-pranksters-derail-gpt-3-bot-with-newly-discovered-prompt-injection-hack\/\">OpenAI&#8217;s GPT-3<\/a>, the foundational model that powers ChatGPT, in the quality and speed of its output. There was just one problem\u2014Meta released the LLaMA code open source, but it held back the &#8220;weights&#8221; (the trained &#8220;knowledge&#8221; stored in a neural network) for qualified researchers only.<\/p>\n<h2>Flying at the speed of LLaMA<\/h2>\n<p>Meta&#8217;s restrictions on LLaMA didn&#8217;t last long, because on March 2, someone <a href=\"https:\/\/news.ycombinator.com\/item?id=35007978\">leaked the LLaMA weights<\/a> on BitTorrent. 
Since then, there has been an explosion of development surrounding LLaMA. Independent AI researcher Simon Willison has <a href=\"https:\/\/simonwillison.net\/2023\/Mar\/11\/llama\/\">compared<\/a>\u00a0this situation to the release of <a href=\"https:\/\/arstechnica.com\/information-technology\/2022\/09\/with-stable-diffusion-you-may-never-believe-what-you-see-online-again\/\">Stable Diffusion<\/a>, an open source image synthesis model that launched last August. Here&#8217;s what he wrote in a post on his blog:<\/p>\n<blockquote>\n<p>It feels to me like that Stable Diffusion moment back in August kick-started the entire new wave of interest in generative AI\u2014which was then pushed into over-drive by the release of ChatGPT at the end of November.<\/p>\n<p>That Stable Diffusion moment is happening again right now, for large language models\u2014the technology behind ChatGPT itself. This morning I ran a GPT-3 class language model on my own personal laptop for the first time!<\/p>\n<p>AI stuff was weird already. It\u2019s about to get a whole lot weirder.<\/p>\n<\/blockquote>\n<p>Typically, running GPT-3 requires several datacenter-class <a href=\"https:\/\/arstechnica.com\/gadgets\/2020\/05\/nvidia-ditches-intel-cozies-up-to-amd-with-its-new-dgx-a100\/\">A100<\/a> GPUs (also, the weights for GPT-3 are not public), but LLaMA made waves because it could run on a single beefy consumer GPU. And now, with optimizations that reduce the model size using a technique called quantization, LLaMA can run on an M1 Mac or a lesser Nvidia consumer GPU (although &#8220;llama.cpp&#8221; only runs on CPU at the moment\u2014which is impressive and surprising in its own way).<\/p>\n<p>Things are moving so quickly that it&#8217;s sometimes <a href=\"https:\/\/arstechnica.com\/information-technology\/2022\/12\/please-slow-down-the-7-biggest-ai-stories-of-2022\/\">difficult to keep up<\/a> with the latest developments. 
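To make the memory claim concrete, here is a rough back-of-the-envelope sketch in Python (illustrative numbers only; real quantized formats such as llama.cpp's 4-bit encoding also store per-block scale factors, so actual files run somewhat larger):

```python
# Approximate size of a model's weights at different precisions.
# At 16 bits per parameter, LLaMA 7B needs ~13 GiB just for weights;
# at 4 bits that drops to ~3.3 GiB, small enough for a consumer laptop.

def weights_gib(n_params: float, bits_per_param: float) -> float:
    """Size in GiB of the raw weights alone (no overhead)."""
    return n_params * bits_per_param / 8 / 2**30

for name, n in [("7B", 7e9), ("13B", 13e9), ("65B", 65e9)]:
    print(f"LLaMA {name}: {weights_gib(n, 16):5.1f} GiB at fp16, "
          f"{weights_gib(n, 4):5.1f} GiB at 4-bit")
```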
(Regarding AI&#8217;s rate of progress, a fellow AI reporter told Ars, &#8220;It&#8217;s like those videos of dogs where you upend a crate of tennis balls on them. [They] don&#8217;t know where to chase first and get lost in the confusion.&#8221;)<\/p>\n<p>For example, here&#8217;s a list of notable LLaMA-related events based on a <a href=\"https:\/\/news.ycombinator.com\/item?id=35140369\">timeline<\/a> Willison laid out in a Hacker News comment:<\/p>\n<ul>\n<li>February 24, 2023: Meta AI <a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/02\/chatgpt-on-your-pc-meta-unveils-new-ai-model-that-can-run-on-a-single-gpu\/\">announces<\/a> LLaMA.<\/li>\n<li>March 2, 2023: Someone <a href=\"https:\/\/news.ycombinator.com\/item?id=35007978\">leaks the LLaMA models<\/a> via BitTorrent.<\/li>\n<li>March 10, 2023: Georgi Gerganov <a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\/commit\/26c084662903ddaca19bef982831bfb0856e8257\">creates llama.cpp<\/a>, which can run on an M1 Mac.<\/li>\n<li>March 11, 2023: Artem Andreenko runs LLaMA 7B (slowly) <a href=\"https:\/\/twitter.com\/miolini\/status\/1634982361757790209\">on a Raspberry Pi 4<\/a>, 4GB RAM, 10 sec\/token.<\/li>\n<li>March 12, 2023: LLaMA 7B <a href=\"https:\/\/cocktailpeanut.github.io\/dalai\/#\/\">running on NPX,<\/a> a node.js execution tool.<\/li>\n<li>March 13, 2023: Someone gets llama.cpp running <a href=\"https:\/\/twitter.com\/thiteanish\/status\/1635188333705043969\">on a Pixel 6 phone<\/a>, also very slowly.<\/li>\n<li>March 13, 2023: Stanford releases <a href=\"https:\/\/crfm.stanford.edu\/2023\/03\/13\/alpaca.html\">Alpaca 7B<\/a>, an instruction-tuned version of LLaMA 7B that &#8220;behaves similarly to OpenAI&#8217;s <a href=\"https:\/\/arstechnica.com\/information-technology\/2022\/11\/openai-conquers-rhyming-poetry-with-new-gpt-3-update\/\">text-davinci-003<\/a>&#8221; but runs on much less powerful hardware.<\/li>\n<\/ul>\n<p>After obtaining the LLaMA weights 
ourselves, we followed Willison&#8217;s instructions and got the 7B parameter version running on an M1 MacBook Air, and it runs at a reasonable speed. You call it as a script on the command line with a prompt, and LLaMA does its best to complete it in a reasonable way.<\/p>\n<figure><a href=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2023\/03\/llama_mac_screenshot.png\" data-height=\"1113\" data-width=\"1779\" alt=\"A screenshot of LLaMA 7B in action on a MacBook Air running llama.cpp.\"><img loading=\"lazy\" decoding=\"async\" alt=\"A screenshot of LLaMA 7B in action on a MacBook Air running llama.cpp.\" src=\"https:\/\/cdn.arstechnica.net\/wp-content\/uploads\/2023\/03\/llama_mac_screenshot-640x400.png\" width=\"640\" height=\"400\" ><\/a><figcaption>\n<p>A screenshot of LLaMA 7B in action on a MacBook Air running llama.cpp.<\/p>\n<p>Benj Edwards \/ Ars Technica<\/p>\n<\/figcaption><\/figure>\n<p>There&#8217;s still the question of how much the quantization affects the quality of the output. In our tests, LLaMA 7B trimmed down to 4-bit quantization was very impressive for running on a MacBook Air\u2014but still not on par with what you might expect from ChatGPT. It&#8217;s entirely possible that better prompting techniques might generate better results.<\/p>\n<p>Also, optimizations and fine-tunings come quickly when everyone has their hands on the code and the weights\u2014even though LLaMA is still saddled with some <a href=\"https:\/\/docs.google.com\/forms\/d\/e\/1FAIpQLSfqNECQnMkycAp2jP4Z9TFX0cGR4uf7b_fBxjY_OjhJILlKGA\/viewform\">fairly restrictive<\/a> terms of use. 
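The quality trade-off comes from rounding. A minimal, hypothetical sketch of block-wise 4-bit quantization (llama.cpp's actual format differs in its details) shows where the precision goes:

```python
# Minimal sketch of block-wise 4-bit quantization: each block of float
# weights is reduced to small signed integers plus one float scale.
# Storage falls to ~4 bits per weight; the rounding is the quality cost.

def quantize_block(weights, levels=16):
    """Map a block of floats to ints in [-(levels//2 - 1), levels//2 - 1]."""
    scale = max(abs(w) for w in weights) / (levels // 2 - 1) or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

block = [0.12, -0.53, 0.31, 0.02]
q, scale = quantize_block(block)
restored = dequantize_block(q, scale)
# Restored values are close to the originals, but not identical.
for original, r in zip(block, restored):
    print(f"{original:+.3f} -> {r:+.3f}")
```

Each value is reconstructed to within half a quantization step, so the largest weight in a block survives almost exactly while smaller ones pick up rounding error.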
The <a href=\"https:\/\/crfm.stanford.edu\/2023\/03\/13\/alpaca.html\">release of Alpaca<\/a> today by Stanford proves that fine-tuning (additional training with a specific goal in mind) can improve performance, and it&#8217;s still early days after LLaMA&#8217;s release.<\/p>\n<p>As of this writing, running LLaMA on a Mac remains a fairly technical exercise. You have to install Python and Xcode and be familiar with working on the command line. Willison has good <a href=\"https:\/\/simonwillison.net\/2023\/Mar\/11\/llama\/\">step-by-step instructions<\/a> for anyone who would like to attempt it. But that may soon change as developers continue to code away.<\/p>\n<p>As for the implications of having this tech out in the wild\u2014no one knows yet. While some worry about AI&#8217;s impact as a tool for spam and misinformation, Willison says, &#8220;It\u2019s not going to be un-invented, so I think our priority should be figuring out the most constructive possible ways to use it.&#8221;<\/p>\n<p>Right now, our only guarantee is that things will change rapidly.<\/p>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/arstechnica.com\/?p=1923645\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><br \/>\n Benj Edwards<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Pocket-sized hallucination on demand \u2014 Thanks to Meta LLaMA, AI text models have their &#8220;Stable Diffusion moment.&#8221; Benj Edwards &#8211; Mar 13, 2023 11:16 pm UTC Ars Technica Things are moving at lightning speed in AI Land. 
On Friday, a software developer named Georgi Gerganov created a tool called &#8220;llama.cpp&#8221; that can run Meta&#8217;s new<\/p>\n","protected":false},"author":1,"featured_media":617756,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22500,38,46],"tags":[],"class_list":{"0":"post-617755","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-level","8":"category-model","9":"category-technology"},"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/617755","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=617755"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/617755\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/617756"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=617755"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=617755"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=617755"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}