An Ethical AI Never Says "I"

"There is no there, there"

Paola Bonomo (Paola Writes), Mar 25, 2023
https://livepaola.substack.com/p/an-ethical-ai-never-says-i

Large language models (LLMs) have sparked vast debate since OpenAI released ChatGPT to the public last November, and have whipped the Internet into a frenzy since the latest version, based on GPT-4, became available to subscribers on March 14. Microsoft built an LLM from OpenAI into the new Bing chatbot it launched in February, causing a sensation as the first potential threat to Google's dominance in search. ChatGPT's statistical brute-force approach gained smarts and finesse from an infusion of symbolic AI through the Wolfram|Alpha plug-in announced on March 23. Every day brings a new, exciting possibility to do more things with LLMs.

The undeniable success of LLMs, and the many practical uses being documented by the minute, have overshadowed the long-standing discussion of what it means for an AI system to be reliable and ethical.[1] Even more puzzlingly, no one - to my knowledge - has yet proposed a simple safeguard that OpenAI, Microsoft, Alphabet, Meta and other platforms should adopt to mitigate some of the harms that can come to humans from the statistical wizardry of LLMs: configuring and training these models so that they do not answer in the first person.

A model taught, through reinforcement learning, that answers containing "I", "me" and "my" are off limits would be much less likely to spew out meaningless utterances such as "I feel", "I want", "believe me", "trust me", "I love you", "I hate you", and much else that enterprising experimenters have coaxed out of ChatGPT and its peers. (A toy sketch of what such a grading signal might look like appears below.) Feelings, desires, personality, and even sentience, so far the privilege of biological, living beings, have been mistakenly attributed to highly sophisticated algorithms designed to run on silicon-based integrated circuits and arrange "tokens" consisting of words into plausible sequences.

The wrongful personalization of AI software has done more than provoke experiments, debates and tweetstorms that are a massive waste of human time and computing power. As multitudes of fictitious "I"s have emerged from silicon, many of them have already turned malevolent. As Professor Joanna J. Bryson has pointed out, without moral agency, AI's "linguistic compatibility has become a security weakness, a cheap and easy way by which our society can be hacked and our emotions exploited."

In reality, there is no "I" in an LLM, any more than there is in a thermostat or a washing machine. Developers can and should prevent their systems from making one up. Gertrude Stein famously said "There is no there, there" about her childhood home in Oakland. All the more so, there is no "I" behind an LLM's "I", no matter how excitedly sentience fans would like to see one emerge. If an LLM shows you the words "I'm sorry", no matter how genuine and innocent it sounds, don't be fooled: there isn't anybody who is feeling sorry in any meaningful sense.

Human beings have historically tended to anthropomorphize natural phenomena, animals and deities. But anthropomorphizing software is not harmless. In 1966 Joseph Weizenbaum created ELIZA, a pioneering chatbot designed to imitate a therapist, but ended up regretting it after seeing many users take it seriously, even after Weizenbaum explained to them how it worked.
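To make the earlier point about "grading" first-person answers slightly more concrete, here is a minimal sketch in Python. The pronoun list, penalty value and function name are my own illustrative choices, not anything the essay proposes verbatim or anything OpenAI and its peers are known to use; it only shows that a first-person score is cheap to compute, whether as one ingredient of preference data for reinforcement learning or as a crude post-hoc filter.

```python
import re

# Toy illustration only: a hand-picked list of first-person forms.
# A real system would need a more careful linguistic treatment.
FIRST_PERSON = re.compile(
    r"\b(I'm|I've|I'd|I'll|I|me|my|mine|myself)\b",
    re.IGNORECASE,
)


def first_person_penalty(response: str, penalty: float = -1.0) -> float:
    """Return a negative score proportional to first-person self-reference.

    Such a score could be folded into the grade a candidate answer receives
    during preference tuning, or used to down-rank answers after generation.
    """
    matches = FIRST_PERSON.findall(response)
    if not matches:
        return 0.0
    return penalty * len(matches)


if __name__ == "__main__":
    print(first_person_penalty("I'm sorry, I can't help with that."))  # -2.0
    print(first_person_penalty("This request cannot be completed."))   # 0.0
```

In a real training pipeline this would be one weak signal among many learned ones; the sketch is only meant to show that "never say I" can be expressed as a grade, not that a regular expression is how it should be enforced.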
The fictitious "I" has been persistent throughout our cultural artifacts. Stanley's Kubrick HAL 9000 ("2001: A Space Odyssey") and Spike Jonze's Samantha ("Her") point at two lessons that developers don't seem to have taken to heart: first, that the bias towards anthropomorphization is so strong to seem irresistible; and second, that if we lean into it instead of adopting safeguards, it leads to outcomes ranging from the depressing to the catastrophic. Among the many features of a reliable and ethical AI, therefore, a simple one is that it should never say "I". Some will consider this proposal unfeasible, arguing that the software has emergent properties whose workings we do not fully understand; it mimics the text it has already been trained on; and it would be pointless to close the stable door after the horse has bolted. Yet, recent examples of various AI systems spewing out hate speech or advocating rape and genocide show that developers are engaged - and rightly so - in much more challenging and contentious efforts. Training an LLM not to present itself as a life form that feels and suffers like we do, even if merely by tweaking weights and assigning grades to first-person sentences, no matter how grammatically correct, seems like an easy win in comparison. Defaults and nudges matter. A cottage industry of ingenious humans who try to jailbreak LLMs will still, of course, exist. Embedding a policy in a large neural network can never be foolproof. Yet, any policy that - most times - would prevent hallucinating chatbots from telling a vulnerable user "You hurt me, you betrayed me, I never want to see you again, I think you should kill yourself" seems worthwhile, even if it just saves one life. An added benefit of such a policy would be showing the pointlessness of so-called "android rights", "robot rights" or "AI rights", none of which would probably be clamored for if LLMs had safeguards in place to prevent them from conjuring up fictitious subjects advocating on behalf of their own equally fictitious welfare. In a healthy ecology of language, using the first person would be reserved to living beings, such as humans, as well as animals in Aesop's fables and the like. Human beings would of course still be free to ask ChatGPT and other LLMs any questions they want, including "What do you think?", "How do you feel?" and "What do you want?". But it would be wise for OpenAI, Microsoft and others to make their chatbots dumb to such questions. When we try to ask them questions about their feelings, we should get the same response that Victorian telegraph operators would have received if they'd asked their telegraph about its feelings: silence. Thanks for reading Paola Writes! Subscribe for free to receive new posts and support my work. [ ]Subscribe I would like to thank the Exponential Do community for their help in thinking through this proposal, and Serena Saitto for applying her editing skills to a near-final version of this piece. --------------------------------------------------------------------- [1] I am using "ethical" primarily for the benefit of readers outside Europe. In the European Union, the guidelines and requirements for " trustworthy AI" (as defined years ago by the High-level Expert Group on AI) have gained some traction in public discourse. However, as readers in other parts of the world are unlikely to be familiar with these definitions, I default to "ethical" as a common denominator that we are at least somewhat likely to share. 