{"id":618091,"date":"2023-03-15T09:49:21","date_gmt":"2023-03-15T14:49:21","guid":{"rendered":"https:\/\/news.sellorbuyhomefast.com\/index.php\/2023\/03\/15\/interview-with-openais-greg-brockman-gpt-4-isnt-perfect-but-neither-are-you\/"},"modified":"2023-03-15T09:49:21","modified_gmt":"2023-03-15T14:49:21","slug":"interview-with-openais-greg-brockman-gpt-4-isnt-perfect-but-neither-are-you","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2023\/03\/15\/interview-with-openais-greg-brockman-gpt-4-isnt-perfect-but-neither-are-you\/","title":{"rendered":"Interview with OpenAI\u2019s Greg Brockman: GPT-4 isn\u2019t perfect, but neither are you"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\">OpenAI shipped <a href=\"https:\/\/techcrunch.com\/2023\/03\/14\/openai-releases-gpt-4-ai-that-it-claims-is-state-of-the-art\/\">GPT-4<\/a>\u00a0yesterday, the much-anticipated text-generating AI model, and it\u2019s a curious piece of work.<\/p>\n<p>GPT-4 improves upon its predecessor, <a href=\"https:\/\/techcrunch.com\/tag\/gpt-3\/\">GPT-3<\/a>, in key ways, for example producing more factually accurate statements and allowing developers to prescribe its style and behavior more easily. It\u2019s also multimodal in the sense that it can understand images, allowing it to caption and even explain in detail the contents of a photo.<\/p>\n<p>But GPT-4 has serious shortcomings. Like GPT-3, the model \u201challucinates\u201d facts and makes basic reasoning errors. 
In one example on OpenAI\u2019s <a href=\"https:\/\/openai.com\/research\/gpt-4\">own blog<\/a>, GPT-4 describes Elvis Presley as the \u201cson of an actor.\u201d (Neither of his parents was an actor.)<\/p>\n<p>To get a better handle on GPT-4\u2019s development cycle and its capabilities, as well as its limitations, TechCrunch spoke with Greg Brockman, one of the co-founders of OpenAI and its president, via a video call on Tuesday.<\/p>\n<p>Asked to compare GPT-4 to GPT-3, Brockman had one word: Different.<\/p>\n<p>\u201cIt\u2019s just different,\u201d he told TechCrunch. \u201cThere\u2019s still a lot of problems and mistakes that [the model] makes \u2026 but you can really see the jump in skill in things like calculus or law, where it went from being really bad at certain domains to actually quite good relative to humans.\u201d<\/p>\n<p>Test results support his case. On the AP Calculus BC exam, GPT-4 scores a 4 out of 5 while GPT-3 scores a 1. (<a href=\"https:\/\/techcrunch.com\/2022\/12\/01\/while-anticipation-builds-for-gpt-4-openai-quietly-releases-gpt-3-5\/\">GPT-3.5<\/a>, the intermediate model between GPT-3 and GPT-4, also scores a 4.) And in a simulated bar exam, GPT-4 passes with a score around the top 10% of test takers; GPT-3.5\u2019s score hovered around the bottom 10%.<\/p>\n<p>Shifting gears, one of GPT-4\u2019s more intriguing aspects is the above-mentioned multimodality. Unlike GPT-3 and GPT-3.5, which could only accept text prompts (e.g. \u201cWrite an essay about giraffes\u201d), GPT-4 can take a prompt of both images and text to perform some action (e.g. an image of giraffes in the Serengeti with the prompt \u201cHow many giraffes are shown here?\u201d).<\/p>\n<p>That\u2019s because GPT-4 was trained on image <em>and<\/em> text data while its predecessors were only trained on text. 
OpenAI says that the training data came from \u201ca variety of licensed, created, and publicly available data sources, which may include publicly available personal information,\u201d but Brockman demurred when I asked for specifics. (Training data has gotten OpenAI into <a href=\"https:\/\/techcrunch.com\/2023\/01\/27\/the-current-legal-cases-against-generative-ai-are-just-the-beginning\/\">legal trouble<\/a> before.)<\/p>\n<p>GPT-4\u2019s image understanding abilities are quite impressive. For example, fed the prompt \u201cWhat\u2019s funny about this image? Describe it panel by panel\u201d plus a three-paneled image showing a fake VGA cable being plugged into an iPhone, GPT-4 gives a breakdown of each image panel and correctly explains the joke (\u201cThe humor in this image comes from the absurdity of plugging a large, outdated VGA connector into a small, modern smartphone charging port\u201d).<\/p>\n<p>Only a single launch partner has access to GPT-4\u2019s image analysis capabilities at the moment \u2014 an assistive app for the visually impaired called <a href=\"https:\/\/techcrunch.com\/2023\/03\/14\/gpt-4s-first-app-is-a-virtual-volunteer-for-the-visually-impaired\/\">Be My Eyes<\/a>. Brockman says that the wider rollout, whenever it happens, will be \u201cslow and intentional\u201d as OpenAI evaluates the risks and benefits.<\/p>\n<p>\u201cThere\u2019s policy issues like facial recognition and how to treat images of people that we need to address and work through,\u201d Brockman said. \u201cWe need to figure out, like, where the sort of danger zones are \u2014 where the red lines are \u2014 and then clarify that over time.\u201d<\/p>\n<p>OpenAI dealt with similar ethical dilemmas around DALL-E 2, its text-to-image system. After initially disabling the capability, OpenAI allowed customers to upload people\u2019s faces to edit them using the AI-powered image-generating system. 
At the time, OpenAI <a href=\"https:\/\/techcrunch.com\/2022\/09\/19\/openai-begins-allowing-users-to-edit-faces-with-dall-e-2\/\">claimed<\/a> that upgrades to its safety system made the face-editing feature possible by \u201cminimizing the potential of harm\u201d from deepfakes as well as attempts to create sexual, political and violent content.<\/p>\n<p>Another perennial challenge is preventing GPT-4 from being used in unintended ways that might inflict harm \u2014 psychological, monetary or otherwise. Hours after the model\u2019s release, Israeli cybersecurity startup Adversa AI published a <a href=\"https:\/\/adversa.ai\/blog\/gpt-4-hacking-and-jailbreaking-via-rabbithole-attack-plus-prompt-injection-content-moderation-bypass-weaponizing-ai\/\">blog post<\/a> demonstrating methods to bypass OpenAI\u2019s content filters and get GPT-4 to generate phishing emails, offensive descriptions of gay people and other highly objectionable text.<\/p>\n<p>It\u2019s not a new phenomenon in the language model domain. Meta\u2019s BlenderBot and OpenAI\u2019s ChatGPT, too, have been prompted to say wildly offensive things, and even reveal sensitive details about their inner workings. But many had hoped, this reporter included, that GPT-4 might deliver significant improvements on the moderation front.<\/p>\n<p>When asked about GPT-4\u2019s robustness, Brockman stressed that the model has gone through six months of safety training and that, in internal tests, it was 82% less likely to respond to requests for content disallowed by OpenAI\u2019s usage policy and 40% more likely to produce \u201cfactual\u201d responses than GPT-3.5.<\/p>\n<p>\u201cWe spent a lot of time trying to understand what GPT-4 is capable of,\u201d Brockman said. \u201cGetting it out in the world is how we learn. 
We\u2019re constantly making updates, including a bunch of improvements, so that the model is much more scalable to whatever personality or sort of mode you want it to be in.\u201d<\/p>\n<p>The early real-world results aren\u2019t that promising, frankly. Beyond the Adversa AI tests, <a href=\"https:\/\/techcrunch.com\/2023\/03\/14\/microsofts-new-bing-ai-chatbot-arrives-in-the-stable-version-of-its-edge-web-browser\/\">Bing Chat<\/a>, Microsoft\u2019s chatbot powered by GPT-4, has been shown to be highly susceptible to jailbreaking. Using carefully tailored inputs, users have been able to get the bot to profess love, threaten harm, <a href=\"https:\/\/techcrunch.com\/2023\/02\/08\/hands-on-with-the-new-bing\/\">defend<\/a> the Holocaust and invent conspiracy theories.<\/p>\n<p>Brockman didn\u2019t deny that GPT-4 falls short here. But he emphasized the model\u2019s new steerability tools, meant to mitigate misuse, including an API-level capability called \u201csystem\u201d messages. System messages are essentially instructions that set the tone \u2014 and establish boundaries \u2014 for GPT-4\u2019s interactions. For example, a system message might read: \u201cYou are a tutor that always responds in the Socratic style. You <em>never<\/em> give the student the answer, but always try to ask just the right question to help them learn to think for themselves.\u201d<\/p>\n<p>The idea is that the system messages act as guardrails to prevent GPT-4 from veering off course.<\/p>\n<p>\u201cReally figuring out GPT-4\u2019s tone, the style and the substance has been a great focus for us,\u201d Brockman said. 
\u201cI think we\u2019re starting to understand a little bit more of how to do the engineering, about how to have a repeatable process that kind of gets you to predictable results that are going to be really useful to people.\u201d<\/p>\n<p>Brockman also pointed to <a href=\"https:\/\/techcrunch.com\/2023\/03\/14\/with-evals-openai-hopes-to-crowdsource-ai-model-testing\/\">Evals<\/a>, OpenAI\u2019s newly open-sourced software framework to evaluate the performance of its AI models, as a sign of OpenAI\u2019s commitment to \u201crobustifying\u201d its models. Evals lets users develop and run benchmarks for evaluating models like GPT-4 while inspecting their performance \u2014 a sort of crowdsourced approach to model testing.<\/p>\n<p>\u201cWith Evals, we can see the [use cases] that users care about in a systematic form that we\u2019re able to test against,\u201d Brockman said. \u201cPart of why we [open-sourced] it is because we\u2019re moving away from releasing a new model every three months \u2014 whatever it was previously \u2014 to make constant improvements. You don\u2019t make what you don\u2019t measure, right? As we make new versions [of the model], we can at least be aware what those changes are.\u201d<\/p>\n<p>I asked Brockman if OpenAI would ever compensate people to test its models with Evals. He wouldn\u2019t commit to that, but he did note that \u2014 for a limited time \u2014 OpenAI\u2019s granting select Evals users early access to the GPT-4 API.<\/p>\n<p>My conversation with Brockman also touched on GPT-4\u2019s context window, which refers to the text the model can consider before generating additional text. 
OpenAI is testing a version of <a href=\"https:\/\/techcrunch.com\/2023\/03\/14\/openai-releases-gpt-4-ai-that-it-claims-is-state-of-the-art\/\">GPT-4<\/a> that can \u201cremember\u201d roughly 50 pages of content, or five times as much as the vanilla GPT-4 can hold in its \u201cmemory\u201d and eight times as much as GPT-3.<\/p>\n<p>Brockman believes that the expanded context window will lead to new, previously unexplored applications, particularly in the enterprise. He envisions an AI chatbot built for a company that leverages context and knowledge from different sources, including employees across departments, to answer questions in a very informed but conversational way.<\/p>\n<p>That\u2019s <a href=\"https:\/\/venturebeat.com\/ai\/the-growth-of-cognitive-search-in-the-enterprise-and-why-it-matters\/\">not a new concept<\/a>. But Brockman makes the case that GPT-4\u2019s answers will be far more useful than those from chatbots and search engines today.<\/p>\n<p>\u201cPreviously, the model didn\u2019t have any knowledge of who you are, what you\u2019re interested in and so on,\u201d Brockman said. \u201cHaving that kind of history [with the larger context window] is definitely going to make it more able \u2026 It\u2019ll turbocharge what people can do.\u201d<\/p>\n<\/div>\n<p>Kyle Wiggers<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI shipped GPT-4\u00a0yesterday, the much-anticipated text-generating AI model, and it\u2019s a curious piece of work. GPT-4 improves upon its predecessor, GPT-3, in key ways, for example producing more factually accurate statements and allowing developers to prescribe its style and behavior more easily. 
It\u2019s also multimodal in the sense that it can understand images, allowing it<\/p>\n","protected":false},"author":1,"featured_media":618092,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22316,77235,46],"tags":[],"class_list":{"0":"post-618091","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-interview","8":"category-openais","9":"category-technology"},"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/618091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=618091"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/618091\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/618092"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=618091"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=618091"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=618091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}