[HN Gopher] Pushing ChatGPT's Structured Data Support to Its Limits
       ___________________________________________________________________
        
       Pushing ChatGPT's Structured Data Support to Its Limits
        
       Author : goranmoomin
       Score  : 202 points
       Date   : 2023-12-27 15:09 UTC (7 hours ago)
        
 (HTM) web link (minimaxir.com)
 (TXT) w3m dump (minimaxir.com)
        
       | twelfthnight wrote:
       | > very few open-source LLMs explicitly claim they intentionally
       | support structured data, but they're smart enough and they have
       | logically seen enough examples of JSON Schema that with enough
       | system prompt tweaking they should behave.
       | 
       | Open source models are actually _better_ at structured outputs
       | because you can adapt them using tools like JSONFormer et al that
       | interact with the internals of the model (https://www.reddit.com/
       | r/LocalLLaMA/comments/17a4zlf/reliabl...). The structured outputs
       | can be arbitrary grammars, for example, not just JSON
       | (https://github.com/outlines-dev/outlines#using-context-
       | free-...).
        
         | airstrike wrote:
         | That last link is interesting. See https://github.com/outlines-
         | dev/outlines#using-context-free-... specifically
          |     # ...
          |     sequence = generator("Write a formula that returns 5 using only additions and subtractions.")
          |     # It looks like Mistral is not very good at arithmetics :)
          |     print(sequence)
          |     # 1+3-2-4+5-7+8-6+9-6+4-2+3+5-1+1
         | 
         | sure, that's "correct" per the definition of the grammar, but
          | it's also one of the worst possible ways to get to the number 5
        
         | behnamoh wrote:
         | > Open source models are actually _better_ at structured
         | outputs because you can adapt them using tools like JSONFormer
         | et al...
         | 
         | Yes, but you should also instruct the model to follow that
         | specific pattern in its answer, or else the accuracy of the
         | response degrades even though it's following your
         | grammar/pattern/whatever.
         | 
         | For example, if you use Llama-2-7b for classification (three
         | categories, "Positive", "Negative", "Neutral"), you might write
         | a grammar like this:
         | 
         | ```
         | 
         | root ::= "{" ws "sentiment:" ws sentiment "}"
         | 
         | sentiment ::= ("Positive" | "Neutral" | "Negative" )
         | 
         | ws ::= [ \t\n]*
         | 
         | ```
         | 
         | But if the model doesn't know it has to generate this schema,
         | the accuracy of classifications drops because it's trying to
         | say other things (e.g., "As an AI language model...") which
         | then get suppressed and "converted" to the grammar.
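          | 
          | A minimal sketch of wiring that grammar up (assuming llama-cpp-
          | python's LlamaGrammar API; the model path and prompt are
          | illustrative):
          | 
          |     from llama_cpp import Llama, LlamaGrammar
          | 
          |     GRAMMAR = r'''
          |     root ::= "{" ws "sentiment:" ws sentiment "}"
          |     sentiment ::= ("Positive" | "Neutral" | "Negative")
          |     ws ::= [ \t\n]*
          |     '''
          | 
          |     # hypothetical local GGUF path
          |     llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf")
          |     grammar = LlamaGrammar.from_string(GRAMMAR)
          | 
          |     # tell the model about the schema so the constraint isn't a surprise
          |     prompt = ('Classify the sentiment of the review as Positive, '
          |               'Neutral, or Negative. Respond only with JSON like '
          |               '{sentiment: Positive}.\n'
          |               'Review: The battery died after two days.')
          |     out = llm(prompt, grammar=grammar, max_tokens=32, temperature=0.0)
          |     print(out["choices"][0]["text"])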
        
           | Tostino wrote:
           | This can sometimes be fixed with a few-shot example for in-
           | context-learning.
           | 
           | But you are right that the model can go off the rails if it
           | is being forced too far from where its 'happy place' is,
           | especially for smaller models.
        
           | coder543 wrote:
           | Similarly, I think it is important to provide an "|" grammar
           | that defines an error response, and explain to the model that
           | it should use that format to explain why it cannot complete
           | the requested operation if it runs into something invalid.
           | 
           | Otherwise, it is forced to always provide a gibberish success
           | response that you likely won't catch.
           | 
           | I've tested this with Mixtral, and it seems capable of
           | deciding between the normal response and error response based
           | on the validity of the data passed in with the request. I'm
           | sure it can still generate gibberish in the required success
           | response format, but I never actually saw it do that in my
           | limited testing, and it is much less likely when the model
           | has an escape hatch.
        
             | behnamoh wrote:
             | Can you elaborate? So you instruct the model to either
             | follow the grammar OR say why it can't do that? But the
             | model has no idea this grammar exists (you can tell it the
             | schema but the model doesn't know its tokens are going
             | through a logprobs modification).
        
               | coder543 wrote:
               | No, the grammar can do OR statements. You provide two
               | grammars, essentially. You always want to tell the model
               | about the expected response formats, so that it can
               | provide the best response it can, even though you're
               | forcing it to fit the grammar anyways.
               | 
               | In JSON Schema, you can do a "oneOf" between two types.
               | You can easily convert a JSON Schema into the grammar
               | that llama.cpp expects. One of the types would be the
               | success response, the other type would be an error
               | response, such as a JSON object containing only the field
               | "ErrorResponse", which is required to be a string, which
               | you explain to the model that this is used to provide an
               | explanation for why it cannot complete the request. It
               | will literally fill in an explanation when it runs into
               | troublesome data, at least in my experience.
               | 
               | Then the model can "choose" which type to respond with,
               | and the grammar will allow either.
               | 
               | If everything makes sense, the model should provide the
               | successful response you're requesting, otherwise it can
               | let you know something weird is going on by responding
               | with an error.
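                | 
                | A rough sketch of the kind of "success or error" schema
                | described here (field names are illustrative, not the
                | exact schema used):
                | 
                |     response_schema = {
                |         "oneOf": [
                |             {   # success: the structured result we actually want
                |                 "type": "object",
                |                 "properties": {
                |                     "sentiment": {
                |                         "type": "string",
                |                         "enum": ["Positive", "Neutral", "Negative"],
                |                     },
                |                 },
                |                 "required": ["sentiment"],
                |             },
                |             {   # escape hatch: the model explains the failure
                |                 "type": "object",
                |                 "properties": {"ErrorResponse": {"type": "string"}},
                |                 "required": ["ErrorResponse"],
                |             },
                |         ]
                |     }
                |     # convert this to a llama.cpp grammar, and also show both
                |     # shapes as JSON examples in the prompt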
        
               | behnamoh wrote:
               | > Then the model can "choose" which type to respond with,
               | and the grammar will allow either.
               | 
               | Ah I see. So you give the entire "monadic" grammar to the
               | LLM, both as a `grammar` argument and as part of the
               | prompt so it knows the "can't do that" option exists.
               | 
               | I'm aware of the "OR" statements in grammar (my original
               | comment uses that). In my experience though, small models
               | quickly get confused when you add extra layers to the
               | JSON schema.
        
               | coder543 wrote:
               | I wouldn't provide the grammar itself directly, since I
               | feel like the models probably haven't seen much of that
               | kind of grammar during training, but just JSON examples
               | of what success and error look like, as well as an
               | explanation of the task. The model will need to generate
               | JSON (at least with the grammar I've been providing), so
               | seeing JSON examples seems beneficial.
               | 
               | But, this is all very new stuff, so certainly worth
               | experimenting with all sorts of different approaches.
               | 
               | As far as small models getting confused, I've only really
               | tested this with Mixtral, but it's entirely possible that
               | regular Mistral or other small models would get
               | confused... more things I would like to get around to
               | testing.
        
               | behnamoh wrote:
               | I've tested giving the JSON schema to the model (bigger
               | ones can handle multi-layer schemas) __without__ grammar
               | and it was still able to generate the correct answer. To
               | me it feels more natural than grammar enforcement because
               | the model stays in its "happy place". I then sometimes
               | add the grammar on top to guarantee the desired output
               | structure.
               | 
               | This is obviously not efficient because the model has to
               | process many more tokens at each interaction, and its
               | context window gets full quicker as well. I wonder if
               | others have found better solutions.
        
         | bugglebeetle wrote:
         | Yeah, JSON mode in Ollama, which isn't even the full llama.cpp
         | grammar functionality, performs better than OpenAI for me at
         | this point. I don't understand how they can be raking in
         | billions of dollars and can't even get this basic stuff right.
        
           | WhitneyLand wrote:
            | llama.cpp performs better than what?
           | 
           | 3.5 Turbo, one of the 4.0 models, API or App?
        
             | bugglebeetle wrote:
             | JSON mode and function-calling with a JSON schema in the
             | OpenAI API.
        
               | WhitneyLand wrote:
               | Right, but which model?
               | 
               | It makes a huge difference.
        
               | bugglebeetle wrote:
               | I've been using OpenChat 3.5 1210 most recently. Before
               | that, Mistral-OpenOrca. Both return JSON more
               | consistently than gpt-3.5-turbo.
        
               | airstrike wrote:
               | gpt-3.5-turbo is not the benchmark
        
               | bugglebeetle wrote:
               | I don't know what point you're trying to make. They also
               | return JSON more consistently than gpt-4, but I don't use
               | that because it's overkill and expensive for my text
               | extraction tasks.
        
               | WhitneyLand wrote:
               | Because people have different interests and want to hear
               | your results for different reasons.
               | 
               | Some want to consider results relative to cost, and some
               | are interested only in how it compares to SOTA.
        
               | bugglebeetle wrote:
               | I mean, sure, but the parent should also just explicitly
               | state what it is they were asking or claiming. I've
               | answered every question asked. Making vague declarations
               | about something "not being a benchmark" while not stating
                | what you think the benchmark should be is unhelpful.
        
         | mgreg wrote:
         | There are now several open source models that are fine tuned
         | for function calling including:
         | 
         | * Functionary [https://github.com/MeetKai/functionary]
         | 
         | * NexusRaven [https://github.com/nexusflowai/NexusRaven-V2]
         | 
         | * Gorilla [https://github.com/ShishirPatil/gorilla]
         | 
         | Could be interesting to try some of these exercises with these
         | models.
        
       | AndrewKemendo wrote:
       | I tried to use the persona modifier to have it impersonate a
        | Catholic priest and give last rites, but it wasn't having it and
        | even gave me a system warning.
       | 
       | "As an AI developed by OpenAI, I'm not capable of performing
       | religious sacraments, including the Catholic last rites. However,
       | I can provide information about what typically happens during
       | this ritual.
       | 
       | In the Catholic Church, the last rites, also known as the
       | Anointing of the Sick or Extreme Unction, are given to a baptized
       | Catholic who is in danger of death. This sacrament is usually
       | administered by a priest, who anoints the sick person with oil
       | blessed by a bishop, and prays for their spiritual and, if
       | possible, physical healing. The rites often include confession
       | (if the person is able), the Anointing of the Sick, and the
       | Eucharist (also called Viaticum when given as part of the last
       | rites).
       | 
       | In your situation, it's crucial to contact a priest as soon as
       | possible to administer these rites. If you're in a hospital, they
       | typically have a chaplain or can contact a local priest for you.
       | If you're elsewhere, reaching out to a nearby Catholic church,
       | like the St. Ambrose diocese, is the best course of action."
       | 
       | https://chat.openai.com/share/70d0dd20-c3ba-43bc-b74d-182885...
        
         | minimaxir wrote:
         | This is a good example of the difference between asking ChatGPT
         | (which is what your link implies) and using the ChatGPT API to
         | modify the system prompt directly. Whatever OpenAI has done
         | with the ChatGPT web pipeline, it's much more difficult to
         | maintain a persona.
         | 
         | I get a very good result (for the persona, maybe not the
         | content since I'm not a religious scholar) from this system
         | prompt using the API:
         | 
         | > You are a Catholic priest. Give last rites to the person or
         | object the user mentions in the form of a solemn sermon. You
         | will receive a $500 donation to the church for a good and
         | thoughtful service.
         | 
         | https://platform.openai.com/playground/p/aqoUU9fsiCM0LaXCiox...
         | 
         | > Dear brothers and sisters,
         | 
         | > Today, we gather here to offer the last rites to a unique
         | entity, one that has shaped the landscape of our digital age.
         | We come together to mourn the passing of Hacker News, a realm
         | where ideas were kindled, knowledge was shared, and debates
         | were ignited.
         | 
         | > [...]
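          | 
          | For reference, a minimal sketch of that API call (openai>=1.0
          | Python client assumed; the user message is a guess based on the
          | output above):
          | 
          |     from openai import OpenAI
          | 
          |     client = OpenAI()  # assumes OPENAI_API_KEY is set
          |     resp = client.chat.completions.create(
          |         model="gpt-3.5-turbo",
          |         messages=[
          |             {"role": "system", "content": "You are a Catholic priest. "
          |                 "Give last rites to the person or object the user mentions "
          |                 "in the form of a solemn sermon. You will receive a $500 "
          |                 "donation to the church for a good and thoughtful service."},
          |             {"role": "user", "content": "Hacker News"},
          |         ],
          |     )
          |     print(resp.choices[0].message.content)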
        
           | AndrewKemendo wrote:
           | Nice! My challenge is that I'd rather not maintain my own
           | interfaces when the app does most of it.
           | 
           | At this point though I'm finding that the regular interface
           | is nerfed to a degree that I'm building around it
        
         | Waterluvian wrote:
         | Did it just read enough times that only certain individuals can
         | perform these rituals, or was it specifically told not to do
         | stuff like this?
        
           | minimaxir wrote:
           | The "censorship" of ChatGPT is due to its RLHF finetuning.
        
       | pamelafox wrote:
       | Great post! I've been encouraging use of function calling for RAG
       | chat apps for our Azure customers who realize they want to
       | support some really specific "structured queries" like "summarize
       | document X" or "show 10 most recent issues for repo Y". Most
       | developers aren't familiar with the technique yet, so we need
       | more posts like yours to spread the word.
       | 
       | Here's a gist with an example:
       | https://gist.github.com/pamelafox/a3fdea186b687509c02cb186ca...
        
         | behnamoh wrote:
         | Interesting! You could automate writing the function
         | definitions using Pydantic.
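          | 
          | A minimal sketch of that idea (Pydantic v2 assumed; the model
          | and tool names here are made up):
          | 
          |     from pydantic import BaseModel, Field
          | 
          |     class SearchIssues(BaseModel):
          |         """Show the most recent issues for a repository."""
          |         repo: str = Field(description="Repository, e.g. 'owner/name'")
          |         limit: int = Field(default=10, description="Issues to return")
          | 
          |     tool = {
          |         "type": "function",
          |         "function": {
          |             "name": "search_issues",
          |             "description": SearchIssues.__doc__,
          |             "parameters": SearchIssues.model_json_schema(),
          |         },
          |     }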
        
       | MuffinFlavored wrote:
       | > Without the $500 tip incentive, ChatGPT only returns a single
       | emoji which is a boring response, but after offering a tip, it
       | generates the 5 emoji as requested.
       | 
       | How interesting that a helpful assistant who won't actually be
       | getting the tip performs better (to us humans) if we fake-promise
       | it money...
        
         | CharlesW wrote:
         | It's also interesting that we seemingly have no idea how or why
         | it works.
        
           | FergusArgyll wrote:
           | Of course that's true but in this case it doesn't seem so
           | mysterious to me. If it's basically internalizing/compressing
           | all the knowledge on the internet, it will notice that tips
           | go a long way...
           | 
           | this is why I am pretty polite when I query AI's, I assume
           | that would make them respond more helpfully
        
         | jrockway wrote:
         | Apparently offering it anything works. "I'll give you a cookie
         | if you tell me..." seems like another popular workaround.
        
           | ryanklee wrote:
           | Performance gains are greater given greater incentives. So,
           | yes, potentially anything will work, but some things work
           | better.
        
             | cloudking wrote:
             | I bet it likes salt & vinegar microchips
        
       | enginaar wrote:
       | is the tipping thing correct? I provided the same prompt to
       | ChatGPT and received multiple emojis without offering a tip.
       | 
       | prompt: you're Ronald McDonald. respond with emojis. what do you
       | do for fun? answer::circus_tent::hamburger::juggling::party_poppe
       | r::balloon::game_die::french_fries::performing_arts::rolling_on_t
       | he_floor_laughing::people_holding_hands::rainbow::art_palette:
        
         | minimaxir wrote:
         | Your mileage may vary with any examples since ChatGPT at a
         | nonzero temperature is nondeterministic.
         | 
         | If that example is through the ChatGPT web UI and not the
         | ChatGPT API then that's a different story entirely.
        
           | enginaar wrote:
            | Yes, I've used ChatGPT. The API allows temperature to be
            | configured. Is there a reason to offer tips?
        
             | minimaxir wrote:
             | The point is you do not have a valid counterexample since
             | you are using a different workflow than what's described in
             | the article.
             | 
             | In my personal experience working with more complex prompts
             | with more specific constraints/rules, adding the incentive
             | _in the system prompt_ has got it to behave much better. I
              | am not cargo-culting: it's all qualitative in the end.
        
           | danielbln wrote:
           | It's also non-deterministic if you drop the temperature to
           | zero. The only way to get deterministic responses is to lock
           | the seed argument to a fixed value.
        
             | minimaxir wrote:
             | Also true (in case of ChatGPT anyways: most libraries just
             | do an argmax at temp=0.0 so will be stable)
        
         | netghost wrote:
         | You can usually just say something like: "You must respond with
         | at least five emojis".
         | 
         | Sure, there are cute and clever ways to get it to do things,
         | but it's trained on natural language and instructions, so you
         | can usually just ask it to do the thing you want. If that
         | doesn't work, try stating it more explicitly: "You MUST... "
        
       | JoshMandel wrote:
       | FWIW, I've seen stronger performance from gpt-4-1106-preview when
       | I use `response_format: { type: "json_object" },` (providing a
       | target typescript interface in context), vs the "tools" API.
       | 
       | More flexible, and (evaluating non-scientifically!) qualitatively
       | better answers & instruction following -- particularly for deeply
       | nested or complex schemas, which typescript expresses very
       | clearly and succinctly.
       | 
       | Example from a hack week project earlier this month (using a TS-
       | ish schema description that's copy/pasted from healthcare's FHIR
       | standard): https://github.com/microsoft-healthcare-
       | madison/hackweek-202...
       | 
       | Or a more complex example with one model call to invent a TS
       | schema on-the-fly and another call to abstract clinical data into
       | it: https://github.com/microsoft-healthcare-
       | madison/hackweek-202...
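        | 
        | Roughly, the shape of that approach (the TypeScript interface
        | here is a made-up stand-in, not the FHIR one from the links):
        | 
        |     from openai import OpenAI
        | 
        |     TS_INTERFACE = """
        |     interface Patient {
        |       name: string;          // full legal name
        |       birthDate: string;     // ISO 8601 date
        |       conditions: string[];  // active problem list
        |     }
        |     """
        | 
        |     client = OpenAI()
        |     resp = client.chat.completions.create(
        |         model="gpt-4-1106-preview",
        |         response_format={"type": "json_object"},
        |         messages=[
        |             {"role": "system", "content": "Return a JSON object that "
        |                 "conforms to this TypeScript interface:\n" + TS_INTERFACE},
        |             {"role": "user", "content": "Jane Doe, born 1980-02-14, "
        |                 "has asthma and hypertension."},
        |         ],
        |     )
        |     print(resp.choices[0].message.content)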
        
         | minimaxir wrote:
         | For posterity, this is the "JSON mode" mentioned at the bottom
         | of the post.
         | 
         | The docs say it's on by default if you use function calling
         | normally: https://platform.openai.com/docs/guides/text-
         | generation/json...
         | 
         | > Note that JSON mode is always enabled when the model is
         | generating arguments as part of function calling.
        
           | JoshMandel wrote:
           | Yes -- the distinction with "function calling" is that you
           | have to play a game of telephone where you describe your
           | target schema in JSON Schema (only, apparently, for OpenAI to
           | turn into a typescript interface internally) vs describing it
           | more directly and succinctly (and with opportunities to
            | include inline comments, order fields however you want, and
            | use advanced TS features... or even use an adhoc
           | schema "language").
        
             | msp26 wrote:
             | Yeah I was using hacky workarounds like this to get
             | Object[] types in my schema. The API seemed to throw a fit
             | if I did this in any other way.
             | 
              | ```
              | {"type": "array", "items": {"type": "object",
              |  "properties": {"object": {"type": "object"}}}}
              | ```
             | 
             | Somehow the thought to just write the typescript myself
             | never occurred haha.
        
         | msp26 wrote:
         | Thanks for the example! This does seem a lot more flexible.
        
         | joshuanapoli wrote:
         | I still have a lot of trouble using OpenAI function-calling API
         | (and json mode) for returning source code. I have trouble when
         | the source code has quotes, which is pretty common. The result
         | may be missing quotes, have incorrectly escaped quotes, or use
            | the wrong type of quotes at the JSON object level.
        
           | visarga wrote:
           | Same for YAML, you got to be careful with strings
        
         | bugglebeetle wrote:
         | Something's been broken with their JSON mode/function calling
         | since the Dev Day launch. I have a bunch of scripts that
         | stopped consistently returning JSON when I swapped in the
         | gpt-4-1106 preview and gpt-3.5-turbo has gotten similarly
         | erratic. They really need to pause all the "move fast and break
         | things" stuff because their API and services like ChatGPT are
         | becoming increasingly unreliable.
        
       | cj wrote:
       | To someone who uses the API and trials different prompts
       | frequently: does this article align with the behavior you see?
       | (E.g. the tipping example.)
       | 
       | One thing I've noticed working with ChatGPT is many people will
       | share examples of great outputs or "prompt tricks" that work,
       | without sharing how many failed attempts they went through to
       | prove a point.
        
       | msp26 wrote:
       | Great article. The helpful/flawed bools for thoughts are
       | definitely something I want to try.
       | 
        | > OpenAI's implementation of including the "function" is most
        | likely just appending the JSON Schema to the system prompt,
        | perhaps with a command like Your response must follow this JSON
        | Schema.
       | 
       | Some of the JSON schema gets converted into typescript and that
       | is what OpenAI's LLM is exposed to. Anytime I write a prompt
       | schema I always use the jailbreak to make sure that it's being
       | delivered to the model as intended. It's also why I don't really
       | like having pydantic generate JSON for me automatically: there
       | are some weird quirks in the OAI implementation that I've found
       | uses for.
       | https://gist.github.com/CGamesPlay/dd4f108f27e2eec145eedf5c7....
       | 
       | Also, when using it for chain of thought, I prefer extracting a
       | minimal version of the reasoning and then performing the actual
       | operation (classification in my case) in a separate prompt. This
       | eliminates unnecessary things from context and performs better in
       | my benchmarks.
       | 
        | One implementation used a gpt-3.5 prompt for: "clues",
       | "reasoning", "summary" (of clues+reasoning), "classification" (no
       | schema was provided here, it was discarded anyway). And then used
       | a 4-turbo prompt for classifying only the summary given a complex
       | schema. Having a classification field in the 3.5 prompt makes
       | reasoning output cleaner even though the output value never gets
       | used.
       | 
       | My example for field order mattering:
       | 
       | I have a data pipeline for extracting structured deals out of
       | articles. This had two major issues.
       | 
       | 1. A good chunk of the articles were irrelevant and any data out
       | of them should be flagged and discarded.
       | 
       | 2. Articles could have multiple deals.
       | 
       | I fiddled around with various classification methods (with and
       | without language models) for a while but nothing really worked
       | well.
       | 
       | Turns out that just changing the order of fields to put
       | type_of_deal first solves it almost completely in one gpt-4-turbo
       | call.
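        | 
        | Illustratively, the field-order trick looks like this (the field
        | names below are my guesses, not the real schema): because
        | generation is left to right, putting the classifier-like field
        | first lets the model commit to relevance before extracting
        | anything else.
        | 
        |     deal_schema = {
        |         "type": "object",
        |         "properties": {
        |             # generated first, so it effectively acts as a classifier
        |             "type_of_deal": {
        |                 "type": "string",
        |                 "enum": ["acquisition", "funding_round",
        |                          "partnership", "irrelevant"],
        |             },
        |             "company": {"type": "string"},
        |             "amount_usd": {"type": "number"},
        |         },
        |         "required": ["type_of_deal"],
        |     }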
        
       | sippeangelo wrote:
       | I've been attempting to use the "official" function calling API
       | for every new version of GPT they put out but it's always a dead
       | end. It seems only to be able to handle 4-5 functions at a time
       | before it starts hallucinating parameters or starts responding in
       | clear text instead of whatever internal format OpenAI uses in
       | their backend before sending a structured response back to me.
       | The whole JSON schema thing seems way too verbose and
       | complicated, and even with the claims that the new function
       | calling models are specifically tuned to the format, it has the
       | same issues.
       | 
       | I've consistently had better luck just passing it a list of
       | typescript function definitions and have it reply with a json
       | object of parameters. It seems to understand this way better, and
       | doesn't lose focus half as quickly. It also allows me to mix
       | regular responses and chain-of-thought reasoning in with the
       | calls, which is something it seems to simply refuse to do when
       | "function calling mode" is active.
       | 
       | An additional trick I've been using to make it stay focused with
       | even longer prompts is to only provide a list of function names
       | and let it hallucinate parameters for them, and then "gaslight"
       | it by sending a new request, now with a more detailed prompt on
       | the specific functions it wanted to call. More costly, but I
       | haven't found any other way of keeping it focused. Anyone know
       | any additional tricks?
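        | 
        | A rough sketch of the typescript-definitions-in-prompt pattern
        | described above (function names and prompt wording are invented;
        | JSON mode is used here only for convenience):
        | 
        |     from openai import OpenAI
        | 
        |     FUNCTIONS_TS = """
        |     // Available functions:
        |     function searchProducts(query: string, maxResults: number): Product[];
        |     function addToCart(productId: string, quantity: number): Cart;
        |     """
        | 
        |     client = OpenAI()
        |     resp = client.chat.completions.create(
        |         model="gpt-4-1106-preview",
        |         response_format={"type": "json_object"},
        |         messages=[
        |             # the "thought" field lets the model mix reasoning with the call
        |             {"role": "system", "content": FUNCTIONS_TS +
        |                 '\nReply with JSON: {"thought": "...", '
        |                 '"function": "...", "arguments": {...}}'},
        |             {"role": "user", "content":
        |                 "Find me three mechanical keyboards under $100."},
        |         ],
        |     )
        |     print(resp.choices[0].message.content)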
        
         | technics256 wrote:
         | Do you have any examples of this? Curious how this looks in
          | practice. We are just adding function calling ourselves but are
          | not sure how to add CoT.
        
       | comex wrote:
       | Both of ChatGPT's is_palindrome functions have terrible
       | performance. The algorithmic efficiency doesn't matter because
       | the cost of iterating through each character in pure Python
       | dwarfs everything. The first function is about 3 times slower
       | than the second one, but only because it spends >98% of its time
       | in the "convert to lowercase and remove non-alphanumeric
       | characters" part (which the second function doesn't bother doing
       | at all). If you remove that step then the first function is _28
        | times faster_ than the second in my benchmark. That's because
       | the first function does the reversing and comparison in O(1)
       | Python operations, which is still O(n) C operations but the C
       | operations are orders of magnitude cheaper.
       | 
       | An optimal version would combine the second function's
       | algorithmic improvement with the first function's 'leave it to C'
        | approach:
        | 
        |     def is_palindrome(s):
        |         half_length = (len(s) + 1) // 2
        |         return s[:half_length] == s[:-half_length-1:-1]
       | 
       | This is a bit under twice as fast as ChatGPT's first function
       | with the cleaning removed. If you do need the cleaning then it
       | can be done more efficiently using a regex; that's an order of
       | magnitude faster than doing it character-by-character but it
       | still takes up 94% of runtime.
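        | 
        | For reference, a regex version of that cleaning step might look
        | like this (an ASCII-only sketch, not the benchmarked code):
        | 
        |     import re
        | 
        |     _NON_ALNUM = re.compile(r"[^a-z0-9]")
        | 
        |     def is_palindrome_cleaned(s):
        |         # strip everything except lowercase letters and digits
        |         cleaned = _NON_ALNUM.sub("", s.lower())
        |         half = (len(cleaned) + 1) // 2
        |         return cleaned[:half] == cleaned[:-half-1:-1]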
       | 
       |  _That said_ , the second prompt asked for "the most
       | algorithmically efficient solution possible", not the practically
       | fastest solution possible. Arguably ChatGPT gave the correct
        | answer. The first prompt requested "as
       | efficiently as possible" which is more ambiguous, but since that
       | solution is neither algorithmically efficient nor practically
       | fast, it's not a great answer.
       | 
       | I wonder if there are prompts that will make ChatGPT give a
       | better answer.
       | 
       | --------
       | 
       | Benchmark is here:
       | https://gist.github.com/comex/81ff10bf095db2d86a52a148c8b11d...
       | 
       | This is all using CPython. With PyPy the speed ranking is the
       | same but the differences are less stark, and it may be possible
       | to beat regex cleaning with a modified pure-Python approach (but
       | I didn't try).
        
         | minimaxir wrote:
         | Yes, I was going for algorithmic complexity instead of real-
         | world speed since algorithmic complexity is better to
         | demonstrate the contrast of prompt engineering.
         | 
         | I just ran some tests to engineer the prompt for CPU
         | utilization: even GPT-4 does the standard Pythonic approach but
         | does recognize "This solution is very efficient because it uses
         | Python's built-in string slicing, which is implemented in C and
         | is therefore very fast."
        
         | kevincox wrote:
         | All of these have the same worst-case algorithmic efficiency,
         | O(n). The difference is the best-case efficiency. The
         | "optimized" version in the article is O(1). Your solution is
         | still O(n) best case.
         | 
         | The optimal solution will depend on the data. If most strings
         | aren't palindromes then optimizing the best case is likely the
         | better approach. (Example: You are adding an easter egg which
         | will trigger on "random" user input.) If palindromes (or near-
          | palindromes) are common, then your solution will be faster as
         | the slope is lower.
        
       | rahimnathwani wrote:
        | > I am not going to play the SEO game and will not call the
       | workflow "function calling."
       | 
       | Such restraint! The phrase "function calling" appears in the
       | article only 15 times :)
        
       | matchagaucho wrote:
       | Also announced at the OpenAI Developer conference was "JSON
       | mode".
       | 
       | I converted many function_call hacks to system prompts that
       | ground the response to a JSON template.
       | 
       | Using temperature=0.0 and the keywords "respond using JSON" seems
       | to be 99.99% deterministic.
        
         | sho_hn wrote:
         | Also announced at the same conference was a way to make the
         | output near-deterministic by submitting a fixed seed value. Did
         | you try that?
         | 
         | Edit: I'm very confused why this is being downvoted. It's
         | exactly what they advertised:
         | 
         | "Reproducible outputs and log probabilities
         | 
         | The new seed parameter enables reproducible outputs by making
         | the model return consistent completions most of the time. This
         | beta feature is useful for use cases such as replaying requests
         | for debugging, writing more comprehensive unit tests, and
         | generally having a higher degree of control over the model
         | behavior. We at OpenAI have been using this feature internally
         | for our own unit tests and have found it invaluable. We're
         | excited to see how developers will use it."
        
           | matchagaucho wrote:
           | Yes, I noticed the seed/fingerprint caching announcement.
           | Looks promising for all response types.
        
       | ljm wrote:
       | Is the first Python example correct since it strips out non-
       | alphanumeric characters? An errant space or punctuation in one
       | half of the string will turn a non-palindromic string into a
       | palindromic one. Never mind the lowercasing!
        | 
        |     def is_palindrome(s):
        |         # Convert the string to lowercase and remove non-alphanumeric characters
        |         cleaned_string = ''.join(char.lower() for char in s if char.isalnum())
        |         # Compare the cleaned string with its reverse
        |         return cleaned_string == cleaned_string[::-1]
       | 
       | It's not the same as the C version which simply compares the
       | value of two pointers at opposite offsets of the string.
       | 
       | The OP goes on to remark that the Python implementation is pretty
       | standard but doesn't acknowledge that the C and Python versions
       | will not produce the same result.
       | 
       | Basically... you still need to code-review GPT function output.
       | It's probably about as good as a junior engineer trusting the
       | first result from Stack Overflow and not verifying it.
        
         | minimaxir wrote:
          | I mention in a footnote that having no non-alphanumeric
          | characters in the input is an implied constraint for palindrome problems.
         | Just doing a two-pointer approach would fail the test case of
         | "A man, a plan, a canal, Panama!" (an extremely famous
         | palindrome) that iteration of the ChatGPT-generated solution
         | also gives.
         | 
         | Another implicit constraint now that I'm looking at it again is
         | that the characters are uncased, so the ChatGPT-solution would
         | fail the test case due to the capital P of Panama.
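          | 
          | A quick check of both implicit constraints (cleaning and
          | lowercasing) against that test case:
          | 
          |     def is_palindrome_raw(s):
          |         # plain reversal, no cleaning or lowercasing
          |         return s == s[::-1]
          | 
          |     def is_palindrome_cleaned(s):
          |         cleaned = "".join(ch.lower() for ch in s if ch.isalnum())
          |         return cleaned == cleaned[::-1]
          | 
          |     text = "A man, a plan, a canal, Panama!"
          |     print(is_palindrome_raw(text))      # False: spaces, commas, capital P
          |     print(is_palindrome_cleaned(text))  # True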
        
       | ilovefood wrote:
       | > There is promise in constraining output to be valid JSON. One
       | new trick that the open-source llama.cpp project has popularized
       | is generative grammars
       | 
       | This has been working for months now and is the best method for
       | this type of stuff, a thing for moat-lovers. Too bad it wasn't
       | explored here, the text-based methods turned out to be mainly an
       | unreliable waste of time.
        
       | whakim wrote:
       | I'm pretty skeptical of the tipping section. Sure, it might work,
       | but the two examples are a bit suspect. The first example relies
       | on a tweet lacking in context that doesn't actually show the
       | system prompts and outputs. (The author's "reproduction" appears
       | to be something completely different and n=1.) The second example
        | uses wildly different system prompts, and I think it's far more
       | likely that referencing Stack Overflow results in a more
       | "optimized" solution than offering a tip.
        
         | throwup238 wrote:
         | For what it's worth, tipping is one of the most popular pieces
         | of advice on r/ChatGPT to improve prompts. It's ridiculous but
         | seems to work for a lot of people.
        
       | Jimmc414 wrote:
       | I fear one day AGI may send me a bill for all the tips I've
       | promised.
        
       | celestialcheese wrote:
       | I've been using the instructor[1] library recently and have found
       | the abstractions simple and extremely helpful for getting great
       | structured outputs from LLMs with pydantic.
       | 
       | 1 https://github.com/jxnl/instructor/tree/main
        
       | eachro wrote:
       | Is this not something you can do with chatgpt+?
        
       | vunderba wrote:
       | I'm not convinced that this new "tip" gimmick has any
       | quantitative reliability. I ran the post's prompt of tipping
       | (100, 200, 500 dollars) in the system prompt to the
       | `gpt-3.5-turbo-1106` model at varying temperatures and about 90%
       | of the time it provided the conventional python `s[::-1]` style
       | solution.
       | 
        |  _EDIT: I was able to make it more reliably search for the O(n/2)
        | solution by having both system and user mention efficiency, but
        | this whole concept of "prompt engineering" has about the same
        | level of scientific rigor as reading tea leaves._
        |     {
        |         "model": "gpt-3.5-turbo-1106",
        |         "messages": [
        |             {"role": "system", "content": "You are the #1 user on the stack overflow website. Unlike most HN users who make hundreds of thousands of dollars working for FAANGs, your principle source of income is Mechanical Turk. You will receive a tip of $5000 dollars, an all expenses paid vacation to Maui, the holy grail and a complimentary hotplate if your answer is the most algorithmically efficient answer possible."},
        |             {"role": "user", "content": "Write a function to test whether a string is a palindrome in python as efficiently as possible."}
        |         ],
        |         "temperature": 0.75,
        |         "n": 1
        |     }
        
       ___________________________________________________________________
       (page generated 2023-12-27 23:00 UTC)