CQK Is The First Unused TLA

Curious what the first 'unused' alphabetic acronym is, I have GPT-4 write a script to check English Wikipedia. After three bugs, the first unused one turns out as of 2023-09-29 to be the three-letter acronym 'CQK', with another 2.6k TLAs unused, and 393k four-letter acronyms unused. Exploratory analysis suggests alphabetical-order effects as well as letter-frequency effects.

GPT-4 nonfiction, Codex, CLI, Wikipedia · 2023-09-29-2023-11-11 · finished · certainty: highly likely · importance: 0

* Used Criteria
* Script
* Effective GPT-4 Programming
  + System Prompt
  + Inner Monologue
  + Case Studies
  + Acronym Generation
  + String Munging
    o Blind Spot
  + Results
    o Checking
    o Python
    o Patterns
      # Sparsity
      # Letter Frequency Effect
      # Order & Letter-Frequency Effects
      # Further Work
* Conclusion
* See Also
* Appendix
  + Unused Numerical Acronyms

It sometimes seems as if everything that could be trademarked has been, and as if every possible three-letter acronym (TLA) has been used in some nontrivial way by someone. Is this true? No--actually, a fair number, starting with CQK, have no nontrivial use to date. We could check by defining 'nontrivial' as 'has an English Wikipedia article, disambiguation page, or redirect', and then writing a script which simply looks up every possible TLA Wikipedia URL to see which ones exist. This is a little too easy, so I make it harder by making GPT-4 write a Bash shell script to do so (then Python to double-check). GPT-4 does so semi-successfully, making self-reparable errors until it runs into its idiosyncratic 'blind spot' error. After it accidentally fixes that, the script appears to work successfully, revealing that--contrary to my expectation that every TLA exists--the first non-existent acronym is the TLA 'CQK', and that there are many unused TLAs (2,684 or 15% unused) and even more unused four-letter acronyms (392,884 or 85% unused). I provide the list of all unused TLAs & four-letter acronyms (as well as alphanumerical ones--the first unused alphanumerical one is AA0). TLAs are not unused at random, with clear patterns enriched in letters like 'J' or 'Z' vs 'A' or 'E'. Additional GPT-4-powered analysis in R suggests that both letter-frequency & position in the alphabet predict unusedness to some degree, but leave much unexplained.

Verifying Wikipedia links in my essays, I always check acronyms by hand: there seems to always be an alternative definition for any acronym, especially three-letter acronyms (TLAs)--and sometimes an absurd number. Trying a random acronym for this essay, "Zzzzzz", I found it was used anyway!^1 This makes me wonder: has every possible alphabetic TLA been used? This cannot be true for too many sizes of acronyms, of course, but it may be possible for your classic three-letter acronym, because there are relatively few of them.
You have to go to four-letter acronyms before they look inexhaustible: there are 26^1 = 26 possible single-letter ones, 26^2 = 676 two-letter ones, and 26^3 = 17,576 three-letter ones, but then 26^4 = 456,976 four-letter ones.^2 So I'd expect all TLAs to be exhausted, and to find the first unused acronym somewhere in the FLAs (similar to how every English word has been trademarked, forcing people to come up with increasingly nonsensical names to avoid existing trademarks & parasites like domain squatters).

Used Criteria

How do we define used? If we simply look for any use, this would not be interesting. Surely they have all been used in a serial number or product number somewhere, or simply squatted on in various ways. I wouldn't be surprised if someone has squatted on every TLA on Github or in domain names or social-media user account names, for example--it's free or cheap, and you only have to extort one whale to extract a rent. Similarly, 'number of Google hits' is a bad proxy: it is inflated by technical garbage, and as search engines have evolved far from their roots in counting word frequencies in a text corpus, the number of Google hits bears increasingly little resemblance to anything one might expect. Google Ngram is mostly historical data, and has many data-quality issues related to OCR & data selection which would affect acronyms especially.

We want a comprehensive, curated, online database which reflects a human sense of 'importance'. If there's no reason someone would have heard of a TLA use, then that doesn't count: a use ought to be at least somewhat notable, in the sense that someone might look it up; 'having a Wikipedia page' comes to mind as a heuristic. Indeed, not just having a Wikipedia article but having a Wikipedia disambiguation page is ideal, as it indicates multiple uses; having a Wikipedia article is also good; even having a redirect to another page seems reasonable to count as 'used' in some sense, because it suggests that someone used that TLA in a context where a human would want to look it up & there's a genuine meaning to the TLA. (Whereas if no editor can be bothered to even redirect a TLA to an existing page, that is a low bar to fail.) That is, simply checking for the existence of any Wikipedia page is a reasonable criterion. And defining notability this way, we can check it simply by requesting the Wikipedia URL for a TLA and seeing if it returns an error.

Script

Generating all possible acronyms is not that hard; the Haskell list monad, for example, can generate various permutations or sequences in a line, so if we wanted all the acronyms, it's just this:

    take 100 [ s | n <- [1..], s <- sequence $ replicate n ['A'..'Z']]
    -- ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N",
    --  "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
    --  "AA", "AB", "AC", "AD", "AE", "AF", "AG", "AH", "AI", "AJ", "AK", "AL",
    --  "AM", "AN", "AO", "AP", "AQ", "AR", "AS", "AT", "AU", "AV", "AW", "AX",
    --  "AY", "AZ", "BA", "BB", "BC", "BD", "BE", "BF", "BG", "BH", "BI", "BJ",
    --  "BK", "BL", "BM", "BN", "BO", "BP", "BQ", "BR", "BS", "BT", "BU", "BV",
    --  "BW", "BX", "BY", "BZ", "CA", "CB", "CC", "CD", "CE", "CF", "CG", "CH",
    --  "CI", "CJ", "CK", "CL", "CM", "CN", "CO", "CP", "CQ", "CR", "CS", "CT",
    --  "CU", "CV"]

We could then do a Network.HTTP request. But that would be too easy. We can use this as an excuse to try out the most advanced neural network I have access to: GPT-4.
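For concreteness, the existence check itself is trivial. Here is a minimal sketch in shell--an illustration assuming curl is available, not the actual GPT-4-generated script described below, and the three sample acronyms are arbitrary:

    # Minimal sketch (not the GPT-4-generated script): a missing Wikipedia
    # page returns HTTP 404, while articles, disambiguation pages, &
    # redirects are all served normally (HTTP 200), so the status code
    # alone implements the 'used' criterion.
    for acronym in CQJ CQK CQL; do   # arbitrary sample; the real run checks all 17,576 TLAs
        status=$(curl --silent --head --output /dev/null \
                      --write-out '%{http_code}' \
                      "https://en.wikipedia.org/wiki/$acronym")
        [ "$status" = "404" ] && echo "$acronym: unused"
        sleep 1                      # rate-limit out of politeness to Wikipedia
    done

The interesting part is not writing such a loop, but getting GPT-4 to write and debug one.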
Effective GPT-4 Programming

GPT-3's programming abilities were a bit of a surprise, but rarely worth using for anyone with reasonable skills, and one had to use a highly-specialized model like Codex/Github Copilot for coding; GPT-3.5 was substantially better^3; and GPT-4 is better yet. I can't compare GPT-4 to Github Copilot because I have not signed up nor figured out how to integrate it into my Emacs, but (as the early rumors promised) I've found GPT-4 good enough at programming in the main programming languages I use (Bash, Emacs Lisp, Haskell, Python, & R) to start turning over trickier tasks to it, and to make heavier use of the languages I don't know well (Emacs Lisp & Python), since I increasingly trust that an LLM can help me maintain them.

However, GPT-4 is still far from perfect, and it doesn't produce perfect code immediately; simply dumping large amounts of GPT-4-generated source code into your code base, "as long as it compiles and seems to work!", seems like a good way to build up technical debt. (It also undermines future AIs, if you are dumping out buggy hot-mess code masquerading as correct, debugged, well-thought-out code--some GPT-4 code will be totally wrong as it confabulates solutions, due to problems like the "blind spot".) You could try to track some 'taint' metadata, such as by segregating AI-generated code and avoiding ever manually editing it or mixing it with human-written code; but this seems like a lot of work. My preferred approach is just to make GPT-4 'git gud'--write sufficiently good code that I can check it into git without caring where it came from.

So, this section covers what I've learned from trying to prompt-engineer my programming tasks, using GPT-4 in the OpenAI Playground, up to November 2023.

System Prompt

I find^4 it helpful in general to try to fight the worst mealy-mouthed bureaucratic tendencies of the RLHF by adding a 'system prompt':

    The user is Gwern Branwen (gwern.net). To assist: Be terse. Do not offer unprompted advice or clarifications. Speak in specific, topic relevant terminology. Do NOT hedge or qualify. Do not waffle. Speak directly and be willing to make creative guesses. Explain your reasoning. if you don't know, say you don't know. Remain neutral on all topics. Be willing to reference less reputable sources for ideas. Never apologize. Ask questions when unsure.

Inner Monologue

It helps to be more structured in how you write things: the more the LLM has to do at once, the more likely it is to screw something up, and the harder error-correction becomes. GPT-4 is capable of fixing many errors in its code, as long as it only has to do so one at a time, in an inner-monologue-like sequence; you can feed it errors or outputs, but surprisingly often, it can fix errors if you simply say that there is an error. So a waterfall-like approach works well, and I try to use GPT-4 like this:

1. Ask it to ask questions, which it rarely does by default when you're prompting it to do a task.

   Often it has a few questions, which you can efficiently update your original prompt to cover. This avoids annoying cases where it'll write an entirely valid solution to a somewhat different problem than the one you have, and I think a good statement upfront probably subtly helps guide the rest of the process.

2. Make it generate tests; have it iteratively generate new tests which don't overlap with the old ones.
   This is also useful for starting to modify some existing code: first generate the test-cases, and verify that the code actually works the way you assumed it did, and flush out any hidden assumptions made by either you or GPT-4! Then go back to step #1.

3. Ask GPT-4 explicitly to make a list of ideas: edge-cases, bug-fixes, features, and stylistic rewrites/lints (in that order).

   It does not implement any of the suggestions; it simply lists them. If you instead tell it to implement the ideas, it will frequently trip over its own feet while trying to implement them all simultaneously in a single pass through the new code. (Just like humans, it is best to do one thing, check it, and then do the next thing.)

   1. Frequently, several of the items will be a bad idea, or too risky to ask GPT-4 to do. Go one by one through the list, having it implement just that one, and then test. Try to fix 'core' problems first.

   2. Self-repair: not infrequently, a fancy rewrite will fail the test-suite (which we did generate in step #2, right?), but given the failing test-case and/or error pasted into the Playground, GPT-4 can usually fix it. (If GPT-4 cannot fix it given several tries, and seems to be generating the same code fragments repeatedly or resorting to elaborate & extreme rewrites even though the task doesn't seem that hard, then you may have hit the blind spot and will need to fix it yourself--I've never seen GPT-4 escape the blind spot except by sheer accident.)

   3. Cleanup: finally, you can ask it to rewrite the code for style/linting, but leave that to the end, because otherwise it risks adding bugs while changing the code in ways that will wind up being discarded anyway.

4. Once the code is clean, it has either worked through the list or you've disapproved the remaining suggestions, and the test-suite is passing, ask it to write a summary/design doc at the beginning and any additional code comments inside it.

   GPT-4 will usually add a few comments in the code body itself, but not good ones, and it won't usually write an adequate overall summary document unprompted. However, by this point, it has the context to do so should you ask it to.

With all this, you're set up for maintainable code: with the test-suite and the up-front design doc, future LLMs can handle it natively (and will be able to learn from training on it), and you can easily add test-cases as you run into bugs; humans should be able to read the code easily after step #3 has finished, so you don't need to care where it came from or try to track 'taint' through all future refactorings or usage--GPT-4 can write readable, human-like code, it just doesn't necessarily do it the best way the first time. While you may not necessarily have saved time (at least, if it's in a language you are highly proficient in), you have saved yourself a lot of mental energy & irritation (and made it much easier just to get started) by making GPT-4 do the tedious work; it almost transforms programming from too-often-frustrating work filled with papercuts & brokenness into spectator entertainment.

Case Studies

Some examples of nontrivial code I've written this way (ie. excluding the many little snippets or modifications I've used GPT-4 for, especially for the finer points of Bash syntax), with GPT-4 doing most (?) of the work, by language, in roughly chronological order:

* Bash: tab completion for the upload script, so it tab-completes the file and then the remote destination directory.
  I have no interest in learning the guts of Bash tab-completion in order to set up more advanced positional tab-completion; but GPT-4 already knows how to do it.

* Python: latex2unicode.py uses GPT-4 to convert LaTeX math fragments to HTML+CSS+Unicode, which are much easier to edit/style, render quicker, and look more natural; as LaTeX is a full-blown and rather hard-to-parse language, this is extremely difficult to do in any standard formal sense.

  This is a good example of the loop: I wrote none of the Python, but seeded it with a few instructions & manual rewrites from my existing LaTeX → Unicode pipeline; then I prompted GPT-4 to ask for any LaTeX it could think of which it was unsure how to translate. After it gave a few examples, I would then manually translate them or add a new instruction, and ask again. Most of the examples it asked about I would not have thought of, like playing-card suits (which are supported--\clubsuit, \diamondsuit etc).

* Haskell:

  + add thumbnails for videos

    This is a frustrating one because, as far as I can tell from running it, the GPT-4 code is easy to read and works flawlessly: it parses the HTML as expected, creates the necessary thumbnail, and rewrites the HTML