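Every day, millions of standard English speakers enjoy the benefits provided by natural language processing (NLP) models. But for speakers of African American Vernacular English (AAVE), technologies like voice-operated GPS systems, digital assistants, and speech-to-text software are often problematic because large NLP models frequently cannot understand or generate words in AAVE.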

“My hope with this project is that social and computational linguists, anthropologists, computer scientists, social scientists, and other researchers will poke and prod at this corpora, do research with it, wrestle with it, and test its limits so we can grow this into a true representation of AAVE and provide feedback and insight on our potential next steps algorithmically,” said Henry.
In this interview, she describes the early obstacles she faced in developing this database, its potential to help computational linguists understand the origins of AAVE, and her plans after Stanford.
How do you describe African American Vernacular English?
To me, AAVE is a language of perseverance and uplift. It’s the result of African languages, thought to have been lost during the forced migration of the slave trade, being incorporated into English to create a new language used by the descendants of those African peoples.
How did you become interested in including AAVE in NLP models?
When I was a child, both my parents occasionally spoke their native languages. For my Caribbean father, that was Jamaican patois, and for my mother it was Gullah Geechee, found in the coastal areas of the Carolinas and Georgia. Each is a creole, a new language created by blending different languages.
Everyone seemed to understand that my parents were speaking a different language, and no one doubted their intelligence. But when I saw people in my community speaking AAVE, which I believe to be another creole language, I could tell that there was a shame and stigma associated with it: a sense that if we used this language outside, we were going to be judged as being less intelligent. When I began working in data science, I wondered what would happen if I tried to collect data on AAVE and incorporate it into NLP models so we could really begin to understand it and improve the performance of these models.
How did your project evolve, and what obstacles did you encounter?
There were a lot of obstacles, and in the end I had to change my objective. AAVE evolves much more quickly than many languages and often turns standardized English on its head, giving words entirely new meanings. For example, the word “mad” is often defined as meaning “angry.” In AAVE, however, it’s frequently used to mean “very,” as in “mad funny.”
AAVE can also be largely defined by the situation, the speaker, and the tone being used, factors that language processing models don’t take into consideration. I eventually decided to create a corpus of AAVE, which is broken down into four collections. The lyric collection includes the words to 15,000 songs by 105 artists ranging from Etta James and Muddy Waters all the way up to Lil Baby and DaBaby.
The leadership collection includes speeches from consequential individuals ranging from Frederick Douglass and Sojourner Truth to Martin Luther King Jr. and Ketanji Brown Jackson. The most difficult to put together has been the book collection, because African Americans are grossly underrepresented in the literary canon, but I’ve included works from historically Black book archive collections at universities.
Finally, the social media collection is the most robust and diverse and includes video transcripts, blog posts, and 15,000 tweets, all collected from Black thought leaders.
How do you hope your project will be used?
I know the corpus is beginning to be used, but I don’t yet know by whom or for what purpose. My hope is that this preliminary work inspires researchers to enter this space, question it, and push it forward to make sure AAVE is represented in the languages used in NLP. Social and computational linguists may be able to use it to help determine whether AAVE is in fact its own language or a dialect, and to look for links between it and other African languages, particularly ones that have not been recorded or preserved in Western history.
Growing up, we learned what was taken from our enslaved ancestors and from their descendants. AAVE may be the proof that everything wasn’t taken away and that we were able to retain some of who we were in the way we communicate with each other. That knowledge has the potential to remove shame and inject pride. When I’m saying “What up, my brother?” I’m not being unintelligent; I’m being strategic and calling on our ancestors with that conversation.
Not only does NLP that excludes AAVE fail to reflect the broader community, it also actively discriminates against that community. Large language models that struggle to understand or generate words in AAVE are more likely to exacerbate stereotypes about Black people generally, and these biased associations are being codified within the models. When they’re commercialized, these models and their biases can lead companies to make unfair decisions that affect the lives of AAVE speakers. The consequences range from individuals having their social media posts disproportionately edited or removed from platforms to discrimination in areas such as housing, banking, law enforcement, and the judicial system.
What should NLP developers be thinking about as they build tools?
There have been some popular NLP models that incorporate a lot of bias. Companies are working to scale back these problematic models, but that’s often followed by a focus on risk mitigation over bias mitigation. Rather than try to find solutions, companies will sometimes take the approach of saying, “Let’s not touch AAVE or anything that has to do with Blackness again, because we didn’t do it right the first time.”
Instead, they should be asking how they can do it correctly now. This is the time to build models that are better, that improve on processes, and that come up with new ways to work with languages such as AAVE, so larger companies don’t continue to perpetuate harm.
What are your plans moving forward as you leave Stanford?
I’m starting a new job at Microsoft, where I’ll be working as a senior applied engineer for the autonomous systems team with Project Bonsai. We’re increasing deep reinforcement learning capabilities with something we call “machine teaching,” which is essentially teaching machines how to perform tasks that can make humans more productive, improve safety, and allow for autonomous decision-making using AI. This work gives me the chance to improve people’s lives, and I’m so grateful for the opportunity.
Beth Jensen is a contributing writer for the Stanford Institute for Human-Centered AI.
This story originally appeared on Hai.stanford.edu. Copyright 2023