[HN Gopher] Infinite AI Array
       ___________________________________________________________________
        
       Infinite AI Array
        
       Author : adrianh
       Score  : 257 points
       Date   : 2023-01-02 22:36 UTC (1 day ago)
        
 (HTM) web link (ianbicking.org)
 (TXT) w3m dump (ianbicking.org)
        
       | jamal-kumar wrote:
       | I tried GPT-3 and it generated code examples that just ~made up~
       | a library that didn't exist at all. The code would in theory
       | have run if that library existed, but it didn't, and it
       | consistently came up with code examples using this imaginary
       | library. I thought that wasn't very impressive.
       | 
       | What I'm finding it's extremely great for is writing drivel. I'm
       | talking ecommerce product descriptions, product names, copy...
       | it's really awesome for making one of my side gigs less of a
       | chore.
        
         | ianbicking wrote:
         | I asked it to write some GPT-related code and it used a gpt3
         | library (which does not exist) instead of the openai library. I
         | was amused by the self-blindness.
         | 
         | (Forgot to actually look it up... seems like it does exist but
         | shouldn't, as it's an empty placeholder:
         | https://pypi.org/project/gpt3/)
        
       | mmastrac wrote:
       | If the results were stable, reproducible and somehow memoizable,
       | this would actually be insanely useful. Perhaps it could write
       | the generated Python code to a cache file somewhere, and that
       | file could be committed.
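       | 
       | Something like this sketch could cover the committing part (the
       | file name and helper here are hypothetical, not part of the
       | library):
       | 
       |     import json
       |     import os
       | 
       |     CACHE_PATH = "iaia_cache.json"  # hypothetical, checked in
       | 
       |     def cached_generation(key, generate):
       |         # Load the committed cache, falling back to empty.
       |         cache = {}
       |         if os.path.exists(CACHE_PATH):
       |             with open(CACHE_PATH) as f:
       |                 cache = json.load(f)
       |         if key not in cache:
       |             cache[key] = generate(key)  # GPT only on a miss
       |             with open(CACHE_PATH, "w") as f:
       |                 json.dump(cache, f, indent=2, sort_keys=True)
       |         return cache[key]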
        
         | recuter wrote:
         | In what manner do you reckon it would be insanely useful?
        
         | etaioinshrdlu wrote:
         | The backend OpenAI APIs are not deterministic even with
         | temperature 0, and they might upgrade the model weights/tuning
         | behind your back too? (Not sure about the upgrade, they might
         | just put out a new model id param...)
        
           | nasir wrote:
           | Normally you can choose the model version, so that's not a
           | concern.
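           | 
           | E.g. (a sketch with the then-current openai Python client,
           | pinning an explicit model id rather than a default alias):
           | 
           |     import openai
           | 
           |     resp = openai.Completion.create(
           |         model="text-davinci-003",  # pinned model version
           |         prompt="Continue this list: 2, 3, 5, 7,",
           |         temperature=0,
           |     )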
        
           | ilaksh wrote:
           | They seem pretty consistent with temperature 0. Maybe not
           | 100% but very close.
        
             | justsid wrote:
             | "Pretty consistent" and "deterministic" are not at all the
             | same
        
             | AgentME wrote:
             | It's close but not fully deterministic. I remember seeing a
             | theory that if the system is considering multiple possible
             | completions that have the same rounded score percentage,
             | then its choice between them is nondeterministic.
        
               | teebs wrote:
               | It's likely because GPU calculations are non-
               | deterministic and small differences in floating point
               | numbers could lead to different outcomes (either in the
               | way you described or somewhere deeper in the model)
        
         | keuriGPT wrote:
         | I'm no Python expert, but it looks like this caching does
         | happen in memory: https://github.com/ianb/infinite-ai-
         | array/blob/main/iaia/mag...
         | 
         | I can't imagine it's too hard to serialize the `existing`
         | Python dict so that subsequent runs are deterministic.
        
           | wging wrote:
           | It is also caching requests on-disk. See for instance
           | https://github.com/ianb/infinite-ai-
           | array/blob/main/iaia/gpt...
           | 
           | The problem* is that if a prompt changes slightly then it
           | won't be a cache hit.
           | 
           | * okay, _one_ problem.
        
         | samwillis wrote:
         | Looks like Dill [0] would be a good option for serialising the
         | generated code. The built-in Python pickle doesn't support
         | pickling functions properly, but Dill extends it to enable
         | this sort of functionality.
         | 
         | 0: https://pypi.org/project/dill/
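         | 
         | A quick sketch of that round-trip:
         | 
         |     import dill
         | 
         |     def generated(x):
         |         return x * 2  # stand-in for GPT-written code
         | 
         |     # Unlike plain pickle, dill serialises the function by
         |     # value, so it can be loaded in a fresh process later.
         |     payload = dill.dumps(generated)
         |     restored = dill.loads(payload)
         |     assert restored(21) == 42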
        
         | wging wrote:
         | Per the README (and the source code), requests to GPT are
         | cached. I know that doesn't solve the stability problem, since
         | the cache is keyed on the exact prompt among other parameters,
         | but it's something at least.
         | 
         | The source code shows that it's using pickling to store OpenAI
         | responses on the local filesystem. See here:
         | 
         | https://github.com/ianb/infinite-ai-array/blob/main/iaia/gpt...
         | 
         | and here:
         | 
         | https://github.com/ianb/infinite-ai-array/blob/main/iaia/gpt...
        
           | ianbicking wrote:
           | It really should be saving them to JSON instead of Pickle,
           | but I gave up trying to figure out how to properly rehydrate
           | the openai Completion objects.
           | 
           | If it was JSON then it wouldn't be crazy to add it to your
           | repository. (I guess Pickle is stable enough in practice, but
           | it would offend my personal sensibilities to check them in.)
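           | 
           | One way to sidestep the rehydration issue entirely (a
           | sketch) would be to cache only the fields that actually get
           | read, as plain dicts that round-trip through JSON:
           | 
           |     import json
           | 
           |     def slim(response):
           |         # Keep just what gets used, not the whole object.
           |         return {"choices": [{"text": c["text"]}
           |                             for c in response["choices"]]}
           | 
           |     # json.dumps(slim(response)) is then sane to check in.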
        
       | karmasimida wrote:
       | Neat idea. But GPT itself has a context length limit, so it is
       | not truly infinite? At some point the list will go rogue and
       | become irrelevant.
        
         | ianbicking wrote:
         | There's a maximum context of 10 items so it won't ever
         | technically stop (though your GPT bill will keep going up). For
         | something clearly ordered this might be enough, e.g.,
         | primes[100:110] will give a context of primes[90:100] which
         | might give you the correct next primes. (Everything in this
         | library is a "might".)
         | 
         | For something like the book example I expect once you get to
         | item 12 it might repeat item 0 since that won't be included in
         | the examples.
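         | 
         | Roughly, the windowing works like this (a simplified sketch,
         | not the exact prompt the library builds):
         | 
         |     def context_for(items, index, window=10):
         |         # Show GPT at most the last `window` known items
         |         # before the one being requested.
         |         start = max(0, index - window)
         |         return "\n".join(f"{i}: {items[i]}"
         |                          for i in range(start, index))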
        
         | visarga wrote:
         | If you have a prompt with demonstrations (in-context learning)
         | you can randomly sample the demonstrations from a larger
         | dataset. This will make the model generate more variations as
         | you are introducing some randomness in the prompt.
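         | 
         | For example (sketch):
         | 
         |     import random
         | 
         |     demos = ["2 -> 4", "3 -> 9", "5 -> 25", "7 -> 49"]
         | 
         |     def build_prompt(query, k=2):
         |         # A fresh random subset of demonstrations per call
         |         # injects variety into an otherwise fixed prompt.
         |         shots = random.sample(demos, k)
         |         return "\n".join(shots) + f"\n{query} ->"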
        
       | MrZander wrote:
       | Clever. I could see this being legitimately useful for generating
       | test data.
        
       | ccozan wrote:
       | I find it simply ... human to use a tool once you understand it.
       | The ingenuity is so characteristic. Python is fusing with an NL
       | dialect. Amazing.
       | 
       | It feels like the current AI developments (OpenAI, SD, etc.) are
       | just like wheels. We are now putting them on a cart and
       | inventing transportation.
       | 
       | And look, we are planning to go to Mars.
        
         | shadowgovt wrote:
         | Python, in particular, is a great language to play this game
         | with, since attribute lookups can be hooked: a module can
         | define a module-level __getattr__ (PEP 562), or be swapped
         | for a class instance whose __getattr__ does the work.
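         | 
         | A minimal sketch of the instance trick:
         | 
         |     class Magic:
         |         def __getattr__(self, name):
         |             # Called only when normal lookup fails -- the
         |             # hook where a library could ask GPT to invent
         |             # the function on the spot.
         |             def stub(*args, **kwargs):
         |                 return f"<generated {name}{args}>"
         |             return stub
         | 
         |     magic = Magic()
         |     magic.fib_mod_7(10)  # undefined, and yet it "works"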
        
       | paxys wrote:
       | Does anyone remember Google Sets? It was a short-lived project
       | where you'd input some entries from a list and Google would
       | automatically expand it with related items. Seemed magical at
       | the time (mid-late 00s, I think?).
        
         | dan-robertson wrote:
         | I think it partly made it into Google Sheets. The 'auto fill'
         | function when you drag the bottom-right corner of a range will
         | work on lots of things you wouldn't expect. But maybe it
         | doesn't work anymore.
        
           | Gigachad wrote:
           | Yeah I 100% remember this around when Sheets first came out.
        
         | hnuser123456 wrote:
         | Google Squared, and yes, killed around 2011, and I've never
         | seen anything that compares since. It was Sheets except you
         | could make the row and column headers/labels anything you
         | wanted, and it would try to fill in a result for every cell.
         | 
         | Edit: just RTFA'd, yeah this is better. What happens if you
         | call magic.ai_take_over_world_debug_until_no_exceptions()?
        
       | iamflimflam1 wrote:
       | I was just messing around with ChatGPT for a similar use case.
       | Amazing what comes out if you ask for:
       | 
       |     Give me a list of imaginary products that might be found in
       |     a magic shop, formatted as a json array including an id,
       |     name, description, sku, price. Use syntax highlighting for
       |     the output and pretty print it.
        
       | xianshou wrote:
       | This is a great illustration of how ChatGPT fundamentally changes
       | "true/false" into a continuum of "truthiness" as measured by
       | plausibility. The infinite AI array is clearly marked as such,
       | but how long will it be before generative extensibility is the
       | (undeclared) norm?
       | 
       | We're all about to get some real-world GAN training.
        
       | kleene_op wrote:
       | I wasn't aware there was a mechanism to get the actual name
       | attached to an object when it is instantiated in Python! That
       | alone made my day.
        
       | w-m wrote:
       | The magic method resolution seems much more intriguing to me
       | than the infinite array. Are there any programming languages (or
       | perhaps a better word would be paradigms) that take this concept
       | further?
       | 
       | I would imagine that I just state comments in my code file. Then,
       | at runtime, all code is produced by the language model, and then
       | executed.
       | 
       | There's the issue of the model producing different generative
       | results with each execution. But maybe that could be taken care
       | of by adding doctests within my comments. Which could, of
       | course, be mostly generated by another language model...
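       | 
       | Concretely, I'm imagining something like this (an entirely
       | hypothetical decorator; complete() is a stand-in for the
       | language-model call so the sketch actually runs):
       | 
       |     import doctest
       | 
       |     def complete(docstring):
       |         # Stand-in: a real version would generate source
       |         # from the comment/docstring at runtime.
       |         return "def double(x):\n    return x * 2\n"
       | 
       |     def llm_implemented(func):
       |         namespace = {}
       |         exec(complete(func.__doc__), namespace)
       |         impl = namespace[func.__name__]
       |         impl.__doc__ = func.__doc__
       |         # Use the doctests as a (weak) check on the output.
       |         doctest.run_docstring_examples(
       |             impl, {func.__name__: impl}, name=func.__name__)
       |         return impl
       | 
       |     @llm_implemented
       |     def double(x):
       |         """
       |         >>> double(4)
       |         8
       |         """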
        
         | shadowgovt wrote:
         | Hardware manufacturers actually have to deal with this issue
         | these days.
         | 
         | An awful lot of chip fabrication is done via stochastic /
         | heuristic / monte carlo methods for any large chip; rather than
         | exhaustively laying out the whole chip by hand, hardware devs
         | describe the chip in a constraints language and feed it to a
         | fabrication program (often with some additional parameters like
         | "optimize speed" or "minimize power consumption / heat
         | generation"). Then the program outputs the chip schematic to be
         | fabricated.
         | 
         | Unless you save the random seed at every step, it's entirely
         | possible to end up with the problem that you have _a_
         | schematic, but if you lose it you'll never get the program to
         | fabricate exactly that schematic again (because hundreds or
         | thousands of precise schematics solve the problem well enough
         | to terminate the monte carlo search).
        
         | matsemann wrote:
         | Used to be quite common in Java with Spring Data JPA, but I
         | haven't seen it that much in the wild lately. Basically you
         | just declare a method name on an interface, and at runtime it
         | will implement it for you.
         | https://docs.spring.io/spring-data/jpa/docs/current/referenc...
         | 
         | To make fun of this behavior, I made a similar thing for
         | javascript when proxies came into the language
         | https://github.com/Matsemann/Declaraoids
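         | 
         | The Python flavour of the same trick is a __getattr__ that
         | parses the method name (sketch):
         | 
         |     class Repo:
         |         def __init__(self, rows):
         |             self.rows = rows  # list of dicts
         | 
         |         def __getattr__(self, name):
         |             # "Implement" find_by_<field> from the name.
         |             if name.startswith("find_by_"):
         |                 field = name[len("find_by_"):]
         |                 return lambda v: [r for r in self.rows
         |                                   if r.get(field) == v]
         |             raise AttributeError(name)
         | 
         |     repo = Repo([{"name": "Ada"}, {"name": "Grace"}])
         |     repo.find_by_name("Ada")  # [{'name': 'Ada'}]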
        
       | shadowgovt wrote:
       | This will be absolute gold for generating test data.
        
       | alar44 wrote:
       | Can anyone clarify what this does? I can't understand what the
       | purpose of this is.
        
         | Workaccount2 wrote:
         | I believe it uses ChatGPT to generate "infinite" lists of
         | things on a chosen topic.
         | 
         | So rather than hard-coding an array of every airline name into
         | a program that returns airline names, it queries ChatGPT for
         | airline names on demand.
         | 
         | If I'm wrong I am sure someone here will be quick to correct me
         | and provide the right answer.
        
       | molenzwiebel wrote:
       | This is similar to copilot-import [0] which in turn was based on
       | stack-overflow-import [1]. I'd be interested to see whether
       | ChatGPT/GPT-3 or Codex/Copilot is better at generating function
       | bodies.
       | 
       | [0]: https://github.com/MythicManiac/copilot-import
       | [1]: https://github.com/drathier/stack-overflow-import
        
       | ballenf wrote:
       | Super useful for mock data generation and testing.
        
       | pedrovhb wrote:
       | This is an example of something I've seen referred to as "code
       | hallucination". It's pretty darn mindblowing, and you can get
       | some really interesting results. Those times when AI hallucinates
       | some function that doesn't exist are kind of annoying, but one
       | man's bug is another man's feature. You can turn the tables on
       | it and make it useful by __going ahead and using those
       | functions that don't exist__.
       | 
       | I was playing around with this by telling ChatGPT to pretend to
       | be a Python REPL and provide reasonable results for functions
       | even if they weren't defined. A few of my favorite results:
       | 
       |     >>> sentence_transform("I went to the bank yesterday.", tense="future")
       |     "I will go to the bank tomorrow."
       | 
       |     >>> wittiest_comeback(to_quip="Hey George, the ocean called. They're running out of shrimp.", funny=True)
       |     "Well, I hope it's not too crabby about it."
       | 
       |     >>> sort_by_temperature(["sun", "ice cube", "flamin-hot cheetos", "tea", "coffee", "winter day", "summer day"], reverse=True)
       |     ["flamin-hot cheetos", "sun", "tea", "coffee", "summer day", "winter day", "ice cube"]
       | 
       | It took some experimenting to get it to consistently respond as
       | expected. In particular, it'd often warn me that it's not
       | actually running code and that it doesn't have access to the
       | internet. Explicitly telling it to respond despite those things
       | helped. Here's the latest version of the prompt I've had success
       | with:
       | 
       | ---
       | 
       | Your task is to simulate an interpreter for the Python
       | programming language. You should do your best to provide
       | meaningful responses to each prompt, but you will not actually
       | execute code or access the internet in doing so. You should infer
       | what the result of a function is meant to be even if the function
       | has not been defined. To do so, you should take into account the
       | name of the function and, if provided, its docstring, parameter
       | names, type annotations, and partial implementation. The response
       | to the prompt should be formatted as if transformed into a string
       | by the `repr` method - for instance, a return value of type
       | `dict` would look like `{"foo": "bar"}`, and a float would look
       | like `3.14`. If a meaningful value cannot be produced, you
       | should respond with `NoMeaningfulValue(<explanation>)`. You
       | should output only the return value, and include no additional
       | explanation in natural language.
       | 
       | ---
       | 
       | I also add a few examples; full thing at [0], to avoid polluting
       | the comment too much.
       | 
       | I'd been meaning to write a Python library to do that, but right
       | around then OpenAI implemented anti-bot measures. I'm sure it's
       | possible to circumvent them one way or another, but if there are
       | measures in place there's a reason for them, and it's not very
       | nice to degrade everyone's experience. I've had less impressive
       | results with codex-2 so far. Still, harnessing hallucination is
       | a pretty cool idea.
       | 
       | [0]
       | https://gist.github.com/pedrovhb/2ac9b93f446f91a2be234622309...
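       | 
       | To stay on the supported side, the same prompt should work
       | through the public completions API (a sketch; PROMPT stands for
       | the full prompt text above):
       | 
       |     import openai
       | 
       |     PROMPT = "Your task is to simulate an interpreter..."
       | 
       |     def fake_repl(call):
       |         resp = openai.Completion.create(
       |             model="text-davinci-003",
       |             prompt=f"{PROMPT}\n>>> {call}\n",
       |             temperature=0,
       |             max_tokens=128,
       |             stop=[">>>"],
       |         )
       |         return resp["choices"][0]["text"].strip()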
        
         | ianbicking wrote:
         | I like the idea of a prompt-based sort! E.g.,
         | books.sort(key="publish date"). I'm not sure if that's best
         | done with a dict-like approach (i.e., actually calculating the
         | key) or letting it get really fuzzy and asking it to sort
         | directly based on an attribute. Then you might be able to do
         | books.sort(key="overall coolness factor") which is an attribute
         | that doesn't necessarily map to any concrete value but might be
         | guessed on a pairwise basis. (This might be stretching GPT a
         | bit far.)
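         | 
         | The dict-like version might look something like this (a
         | sketch; gpt_value is a hypothetical helper that asks the
         | model for the attribute):
         | 
         |     def gpt_value(item, attribute):
         |         # Hypothetical: prompt the model for, e.g., the
         |         # publish date of `item`, as something sortable.
         |         ...
         | 
         |     def ai_sorted(items, key):
         |         # Compute the fuzzy key once per item, then sort
         |         # normally.
         |         return sorted(items,
         |                       key=lambda i: gpt_value(i, key))
         | 
         |     # books = ai_sorted(books, key="publish date")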
        
         | AgentME wrote:
         | Using just your examples and no explanatory prompt with text-
         | davinci-003 or code-davinci-002 worked pretty well for me in
         | some quick tests.
        
       | stared wrote:
       | It kind of reminds me of this one, https://xkcd.com/221/:
       | 
       |     getRandomNumber() {
       |         return 4; // chosen by fair dice roll
       |                   // guaranteed to be random
       |     };
        
         | visarga wrote:
         | Did you ever imagine a day would come when an AI could explain
         | this joke?
         | 
         | > This code snippet appears to be a joke because it is claiming
         | to be a function that returns a random number, but the function
         | always returns the number 4. The line "chosen by fair dice roll
         | / guaranteed to be random" is a reference to the common phrase
         | "random as a dice roll," which means something is truly random
         | and unpredictable. However, the fact that the function always
         | returns 4 suggests that it is not actually random at all.
        
           | jamesdwilson wrote:
           | ChatGPT said:
           | 
           | > The joke in this code is that the function is called
           | "getRandomNumber", but it always returns the number 4, which
           | is not random at all. The comment "chosen by fair dice roll"
           | and "guaranteed to be random" are added for humorous effect,
           | because they suggest that the number 4 was chosen through a
           | random process, but in reality it is hardcoded into the
           | function. The joke is meant to be a play on the idea of
           | "randomness", implying that the function is not actually
           | generating a random number as it claims to do.
        
             | wzdd wrote:
             | It's interesting to contrast this explanation and the
             | parent's one with the explanation from https://www.explainx
             | kcd.com/wiki/index.php/221:_Random_Numbe... , which is
             | essentially that the function may well return a random
             | number but, contrary to expectation for functions with
             | names like this, won't return different results if called
             | more than once.
        
       ___________________________________________________________________
       (page generated 2023-01-03 23:00 UTC)