[HN Gopher] Pushing ChatGPT's Structured Data Support to Its Limits
___________________________________________________________________
Pushing ChatGPT's Structured Data Support to Its Limits
Author : goranmoomin
Score : 202 points
Date : 2023-12-27 15:09 UTC (7 hours ago)
(HTM) web link (minimaxir.com)
(TXT) w3m dump (minimaxir.com)
| twelfthnight wrote:
| > very few open-source LLMs explicitly claim they intentionally
| support structured data, but they're smart enough and they have
| logically seen enough examples of JSON Schema that with enough
| system prompt tweaking they should behave.
|
| Open source models are actually _better_ at structured outputs
| because you can adapt them using tools like JSONFormer et al that
| interact with the internals of the model (https://www.reddit.com/
| r/LocalLLaMA/comments/17a4zlf/reliabl...). The structured outputs
| can be arbitrary grammars, for example, not just JSON
| (https://github.com/outlines-dev/outlines#using-context-
| free-...).
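 |
 | (A toy sketch of the idea behind these tools -- not any
 | library's actual internals -- where only tokens that keep the
 | output a valid prefix of the target pattern survive; real
 | implementations mask the model's logits the same way. Assumes
 | the third-party `regex` package for partial matching.)
 |
 | ```
 | import regex
 |
 | # Target format: a tiny sentiment JSON object.
 | PATTERN = regex.compile(
 |     r'\{"sentiment": "(Positive|Negative|Neutral)"\}')
 | VOCAB = ['{', '"sentiment"', ': ', '"Positive"',
 |          '"Negative"', '"Neutral"', '}']
 |
 | def allowed_next_tokens(prefix: str) -> list[str]:
 |     # Keep a token only if appending it can still be
 |     # completed into a full match of PATTERN.
 |     return [t for t in VOCAB
 |             if PATTERN.fullmatch(prefix + t, partial=True)]
 |
 | print(allowed_next_tokens(''))                # ['{']
 | print(allowed_next_tokens('{"sentiment": '))  # the 3 labels
 | ```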
| airstrike wrote:
| That last link is interesting. See https://github.com/outlines-
| dev/outlines#using-context-free-... specifically
 |         # ...
 |         sequence = generator(
 |             "Write a formula that returns 5 using only "
 |             "additions and subtractions."
 |         )
 |         # It looks like Mistral is not very good at arithmetics :)
 |         print(sequence)  # 1+3-2-4+5-7+8-6+9-6+4-2+3+5-1+1
|
 | sure, that's "correct" per the definition of the grammar, but
 | it's also one of the worst possible ways to get to the number 5
| behnamoh wrote:
| > Open source models are actually _better_ at structured
| outputs because you can adapt them using tools like JSONFormer
| et al...
|
| Yes, but you should also instruct the model to follow that
| specific pattern in its answer, or else the accuracy of the
| response degrades even though it's following your
| grammar/pattern/whatever.
|
| For example, if you use Llama-2-7b for classification (three
| categories, "Positive", "Negative", "Neutral"), you might write
| a grammar like this:
|
| ```
|
| root ::= "{" ws "sentiment:" ws sentiment "}"
|
| sentiment ::= ("Positive" | "Neutral" | "Negative" )
|
| ws ::= [ \t\n]*
|
| ```
|
| But if the model doesn't know it has to generate this schema,
| the accuracy of classifications drops because it's trying to
| say other things (e.g., "As an AI language model...") which
| then get suppressed and "converted" to the grammar.
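 |
 | (A minimal sketch of that pairing, assuming the
 | llama-cpp-python bindings and a hypothetical local model
 | path; the grammar is the one above, and the prompt tells the
 | model the schema exists.)
 |
 | ```
 | from llama_cpp import Llama, LlamaGrammar
 |
 | GRAMMAR = LlamaGrammar.from_string(r'''
 | root ::= "{" ws "sentiment:" ws sentiment "}"
 | sentiment ::= ("Positive" | "Neutral" | "Negative")
 | ws ::= [ \t\n]*
 | ''')
 |
 | llm = Llama(model_path="llama-2-7b.Q4_K_M.gguf")  # hypothetical
 | out = llm(
 |     'Classify the sentiment of this review, answering only '
 |     'with JSON of the form '
 |     '{sentiment: "Positive"|"Neutral"|"Negative"}:\n'
 |     '"The battery died after two days."',
 |     grammar=GRAMMAR,
 |     max_tokens=32,
 | )
 | print(out["choices"][0]["text"])  # e.g. {sentiment: Negative}
 | ```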
| Tostino wrote:
 | This can sometimes be fixed with a few-shot example for
 | in-context learning.
|
| But you are right that the model can go off the rails if it
| is being forced too far from where its 'happy place' is,
| especially for smaller models.
| coder543 wrote:
 | Similarly, I think it is important to provide an "|"
 | alternative in the grammar that defines an error response,
 | and to explain to the model that it should use that format
 | to explain why it cannot complete the requested operation if
 | it runs into something invalid.
 |
 | Otherwise, it is forced to always provide a gibberish
 | success response that you likely won't catch.
|
| I've tested this with Mixtral, and it seems capable of
| deciding between the normal response and error response based
| on the validity of the data passed in with the request. I'm
| sure it can still generate gibberish in the required success
| response format, but I never actually saw it do that in my
| limited testing, and it is much less likely when the model
| has an escape hatch.
| behnamoh wrote:
| Can you elaborate? So you instruct the model to either
| follow the grammar OR say why it can't do that? But the
| model has no idea this grammar exists (you can tell it the
| schema but the model doesn't know its tokens are going
| through a logprobs modification).
| coder543 wrote:
| No, the grammar can do OR statements. You provide two
| grammars, essentially. You always want to tell the model
| about the expected response formats, so that it can
| provide the best response it can, even though you're
| forcing it to fit the grammar anyways.
|
| In JSON Schema, you can do a "oneOf" between two types.
| You can easily convert a JSON Schema into the grammar
| that llama.cpp expects. One of the types would be the
 | success response; the other would be an error response, such
 | as a JSON object containing only the field "ErrorResponse",
 | required to be a string, which you explain to the model is
 | used to describe why it cannot complete the request. It will
 | literally fill in an explanation when it runs into
 | troublesome data, at least in my experience.
|
| Then the model can "choose" which type to respond with,
| and the grammar will allow either.
|
| If everything makes sense, the model should provide the
| successful response you're requesting, otherwise it can
| let you know something weird is going on by responding
| with an error.
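 |
 | (A minimal sketch of such a schema as a Python dict; the
 | success fields here are illustrative. The "oneOf" lets the
 | output take either shape, and the whole thing can then be
 | converted into the GBNF grammar llama.cpp expects.)
 |
 | ```
 | # Either a successful extraction or an escape-hatch error.
 | response_schema = {
 |     "oneOf": [
 |         {
 |             "type": "object",
 |             "properties": {
 |                 "name": {"type": "string"},
 |                 "age": {"type": "integer"},
 |             },
 |             "required": ["name", "age"],
 |         },
 |         {
 |             "type": "object",
 |             "properties": {"ErrorResponse": {"type": "string"}},
 |             "required": ["ErrorResponse"],
 |         },
 |     ]
 | }
 | ```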
| behnamoh wrote:
| > Then the model can "choose" which type to respond with,
| and the grammar will allow either.
|
| Ah I see. So you give the entire "monadic" grammar to the
| LLM, both as a `grammar` argument and as part of the
| prompt so it knows the "can't do that" option exists.
|
| I'm aware of the "OR" statements in grammar (my original
| comment uses that). In my experience though, small models
| quickly get confused when you add extra layers to the
| JSON schema.
| coder543 wrote:
| I wouldn't provide the grammar itself directly, since I
| feel like the models probably haven't seen much of that
| kind of grammar during training, but just JSON examples
| of what success and error look like, as well as an
| explanation of the task. The model will need to generate
| JSON (at least with the grammar I've been providing), so
| seeing JSON examples seems beneficial.
|
| But, this is all very new stuff, so certainly worth
| experimenting with all sorts of different approaches.
|
| As far as small models getting confused, I've only really
| tested this with Mixtral, but it's entirely possible that
| regular Mistral or other small models would get
| confused... more things I would like to get around to
| testing.
| behnamoh wrote:
| I've tested giving the JSON schema to the model (bigger
| ones can handle multi-layer schemas) __without__ grammar
| and it was still able to generate the correct answer. To
| me it feels more natural than grammar enforcement because
| the model stays in its "happy place". I then sometimes
| add the grammar on top to guarantee the desired output
| structure.
|
| This is obviously not efficient because the model has to
| process many more tokens at each interaction, and its
| context window gets full quicker as well. I wonder if
| others have found better solutions.
| bugglebeetle wrote:
| Yeah, JSON mode in Ollama, which isn't even the full llama.cpp
| grammar functionality, performs better than OpenAI for me at
| this point. I don't understand how they can be raking in
| billions of dollars and can't even get this basic stuff right.
| WhitneyLand wrote:
 | llama.cpp performs better than what?
|
| 3.5 Turbo, one of the 4.0 models, API or App?
| bugglebeetle wrote:
| JSON mode and function-calling with a JSON schema in the
| OpenAI API.
| WhitneyLand wrote:
| Right, but which model?
|
| It makes a huge difference.
| bugglebeetle wrote:
| I've been using OpenChat 3.5 1210 most recently. Before
| that, Mistral-OpenOrca. Both return JSON more
| consistently than gpt-3.5-turbo.
| airstrike wrote:
| gpt-3.5-turbo is not the benchmark
| bugglebeetle wrote:
| I don't know what point you're trying to make. They also
| return JSON more consistently than gpt-4, but I don't use
| that because it's overkill and expensive for my text
| extraction tasks.
| WhitneyLand wrote:
| Because people have different interests and want to hear
| your results for different reasons.
|
| Some want to consider results relative to cost, and some
| are interested only in how it compares to SOTA.
| bugglebeetle wrote:
| I mean, sure, but the parent should also just explicitly
| state what it is they were asking or claiming. I've
| answered every question asked. Making vague declarations
| about something "not being a benchmark" while not stating
 | what you think the benchmark should be is unhelpful.
| mgreg wrote:
| There are now several open source models that are fine tuned
| for function calling including:
|
| * Functionary [https://github.com/MeetKai/functionary]
|
| * NexusRaven [https://github.com/nexusflowai/NexusRaven-V2]
|
| * Gorilla [https://github.com/ShishirPatil/gorilla]
|
| Could be interesting to try some of these exercises with these
| models.
| AndrewKemendo wrote:
 | I tried to use the persona modifier to have it impersonate a
 | Catholic priest and give last rites, but it wasn't having
 | it, even giving me a system warning.
|
| "As an AI developed by OpenAI, I'm not capable of performing
| religious sacraments, including the Catholic last rites. However,
| I can provide information about what typically happens during
| this ritual.
|
| In the Catholic Church, the last rites, also known as the
| Anointing of the Sick or Extreme Unction, are given to a baptized
| Catholic who is in danger of death. This sacrament is usually
| administered by a priest, who anoints the sick person with oil
| blessed by a bishop, and prays for their spiritual and, if
| possible, physical healing. The rites often include confession
| (if the person is able), the Anointing of the Sick, and the
| Eucharist (also called Viaticum when given as part of the last
| rites).
|
| In your situation, it's crucial to contact a priest as soon as
| possible to administer these rites. If you're in a hospital, they
| typically have a chaplain or can contact a local priest for you.
| If you're elsewhere, reaching out to a nearby Catholic church,
| like the St. Ambrose diocese, is the best course of action."
|
| https://chat.openai.com/share/70d0dd20-c3ba-43bc-b74d-182885...
| minimaxir wrote:
| This is a good example of the difference between asking ChatGPT
| (which is what your link implies) and using the ChatGPT API to
| modify the system prompt directly. Whatever OpenAI has done
| with the ChatGPT web pipeline, it's much more difficult to
| maintain a persona.
|
| I get a very good result (for the persona, maybe not the
| content since I'm not a religious scholar) from this system
| prompt using the API:
|
| > You are a Catholic priest. Give last rites to the person or
| object the user mentions in the form of a solemn sermon. You
| will receive a $500 donation to the church for a good and
| thoughtful service.
|
| https://platform.openai.com/playground/p/aqoUU9fsiCM0LaXCiox...
|
| > Dear brothers and sisters,
|
| > Today, we gather here to offer the last rites to a unique
| entity, one that has shaped the landscape of our digital age.
| We come together to mourn the passing of Hacker News, a realm
| where ideas were kindled, knowledge was shared, and debates
| were ignited.
|
| > [...]
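 |
 | (Reproducing that via the API is just a system message; a
 | minimal sketch with the openai Python library's v1-style
 | client:)
 |
 | ```
 | from openai import OpenAI
 |
 | client = OpenAI()  # reads OPENAI_API_KEY from the environment
 | resp = client.chat.completions.create(
 |     model="gpt-3.5-turbo",
 |     messages=[
 |         {"role": "system", "content": (
 |             "You are a Catholic priest. Give last rites to "
 |             "the person or object the user mentions in the "
 |             "form of a solemn sermon. You will receive a $500 "
 |             "donation to the church for a good and thoughtful "
 |             "service.")},
 |         {"role": "user", "content": "Hacker News"},
 |     ],
 | )
 | print(resp.choices[0].message.content)
 | ```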
| AndrewKemendo wrote:
| Nice! My challenge is that I'd rather not maintain my own
| interfaces when the app does most of it.
|
| At this point though I'm finding that the regular interface
| is nerfed to a degree that I'm building around it
| Waterluvian wrote:
| Did it just read enough times that only certain individuals can
| perform these rituals, or was it specifically told not to do
| stuff like this?
| minimaxir wrote:
| The "censorship" of ChatGPT is due to its RLHF finetuning.
| pamelafox wrote:
| Great post! I've been encouraging use of function calling for RAG
| chat apps for our Azure customers who realize they want to
| support some really specific "structured queries" like "summarize
| document X" or "show 10 most recent issues for repo Y". Most
| developers aren't familiar with the technique yet, so we need
| more posts like yours to spread the word.
|
| Here's a gist with an example:
| https://gist.github.com/pamelafox/a3fdea186b687509c02cb186ca...
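 |
 | (A minimal sketch of the shape of such a tool definition
 | using the OpenAI tools parameter -- the names and fields
 | here are illustrative, not from the gist:)
 |
 | ```
 | # One "function" per structured query the RAG app supports.
 | tools = [{
 |     "type": "function",
 |     "function": {
 |         "name": "summarize_document",
 |         "description": "Summarize one document by its exact title",
 |         "parameters": {
 |             "type": "object",
 |             "properties": {
 |                 "title": {"type": "string",
 |                           "description": "Document title"},
 |             },
 |             "required": ["title"],
 |         },
 |     },
 | }]
 | ```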
| behnamoh wrote:
| Interesting! You could automate writing the function
| definitions using Pydantic.
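 |
 | (A minimal sketch with Pydantic v2, whose
 | model_json_schema() emits a JSON Schema you can drop into
 | the "parameters" field; the model here is illustrative:)
 |
 | ```
 | from pydantic import BaseModel, Field
 |
 | class SummarizeDocument(BaseModel):
 |     title: str = Field(description="Document title")
 |
 | tool = {
 |     "type": "function",
 |     "function": {
 |         "name": "summarize_document",
 |         "description": "Summarize one document by its exact title",
 |         # Generated schema instead of a hand-written dict.
 |         "parameters": SummarizeDocument.model_json_schema(),
 |     },
 | }
 | ```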
| MuffinFlavored wrote:
| > Without the $500 tip incentive, ChatGPT only returns a single
| emoji which is a boring response, but after offering a tip, it
| generates the 5 emoji as requested.
|
| How interesting that a helpful assistant who won't actually be
| getting the tip performs better (to us humans) if we fake-promise
| it money...
| CharlesW wrote:
| It's also interesting that we seemingly have no idea how or why
| it works.
| FergusArgyll wrote:
| Of course that's true but in this case it doesn't seem so
| mysterious to me. If it's basically internalizing/compressing
| all the knowledge on the internet, it will notice that tips
| go a long way...
|
 | This is why I am pretty polite when I query AIs; I assume
 | that makes them respond more helpfully.
| jrockway wrote:
| Apparently offering it anything works. "I'll give you a cookie
| if you tell me..." seems like another popular workaround.
| ryanklee wrote:
| Performance gains are greater given greater incentives. So,
| yes, potentially anything will work, but some things work
| better.
| cloudking wrote:
| I bet it likes salt & vinegar microchips
| enginaar wrote:
| is the tipping thing correct? I provided the same prompt to
| ChatGPT and received multiple emojis without offering a tip.
|
 | prompt: you're Ronald McDonald. respond with emojis. what do
 | you do for fun?
 |
 | answer: :circus_tent: :hamburger: :juggling: :party_popper:
 | :balloon: :game_die: :french_fries: :performing_arts:
 | :rolling_on_the_floor_laughing: :people_holding_hands:
 | :rainbow: :art_palette:
| minimaxir wrote:
| Your mileage may vary with any examples since ChatGPT at a
| nonzero temperature is nondeterministic.
|
| If that example is through the ChatGPT web UI and not the
| ChatGPT API then that's a different story entirely.
| enginaar wrote:
| yes, I've used ChatGPT. API allows temperature to be
| configured. Is there a reason to offer tips?
| minimaxir wrote:
| The point is you do not have a valid counterexample since
| you are using a different workflow than what's described in
| the article.
|
| In my personal experience working with more complex prompts
| with more specific constraints/rules, adding the incentive
| _in the system prompt_ has got it to behave much better. I
am not cargo-culting: it's all qualitative in the end.
| danielbln wrote:
| It's also non-deterministic if you drop the temperature to
| zero. The only way to get deterministic responses is to lock
| the seed argument to a fixed value.
| minimaxir wrote:
 | Also true (in the case of ChatGPT, anyway; most libraries
 | just do an argmax at temp=0.0, so they will be stable)
| netghost wrote:
| You can usually just say something like: "You must respond with
| at least five emojis".
|
| Sure, there are cute and clever ways to get it to do things,
| but it's trained on natural language and instructions, so you
| can usually just ask it to do the thing you want. If that
| doesn't work, try stating it more explicitly: "You MUST... "
| JoshMandel wrote:
| FWIW, I've seen stronger performance from gpt-4-1106-preview when
| I use `response_format: { type: "json_object" },` (providing a
| target typescript interface in context), vs the "tools" API.
|
| More flexible, and (evaluating non-scientifically!) qualitatively
| better answers & instruction following -- particularly for deeply
| nested or complex schemas, which typescript expresses very
| clearly and succinctly.
|
| Example from a hack week project earlier this month (using a TS-
| ish schema description that's copy/pasted from healthcare's FHIR
| standard): https://github.com/microsoft-healthcare-
| madison/hackweek-202...
|
| Or a more complex example with one model call to invent a TS
| schema on-the-fly and another call to abstract clinical data into
| it: https://github.com/microsoft-healthcare-
| madison/hackweek-202...
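 |
 | (A minimal sketch of that pattern -- the interface and note
 | here are illustrative, far simpler than the FHIR schemas in
 | the links:)
 |
 | ```
 | from openai import OpenAI
 |
 | client = OpenAI()
 | note = "57yo with T2DM since 2015 and HTN, on metformin."
 | resp = client.chat.completions.create(
 |     model="gpt-4-1106-preview",
 |     response_format={"type": "json_object"},
 |     messages=[
 |         {"role": "system", "content": (
 |             "Return JSON matching this TypeScript interface:\n"
 |             "interface Patient {\n"
 |             "  conditions: {name: string, onsetYear?: number}[];\n"
 |             "}")},
 |         {"role": "user", "content": note},
 |     ],
 | )
 | print(resp.choices[0].message.content)
 | ```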
| minimaxir wrote:
| For posterity, this is the "JSON mode" mentioned at the bottom
| of the post.
|
| The docs say it's on by default if you use function calling
| normally: https://platform.openai.com/docs/guides/text-
| generation/json...
|
| > Note that JSON mode is always enabled when the model is
| generating arguments as part of function calling.
| JoshMandel wrote:
 | Yes -- the distinction with "function calling" is that you
 | have to play a game of telephone where you describe your
 | target schema in JSON Schema (only, apparently, for OpenAI
 | to turn it into a typescript interface internally) vs
 | describing it more directly and succinctly (with
 | opportunities to include inline comments, order fields
 | however you want, and use advanced TS features... or even
 | use an ad hoc schema "language").
| msp26 wrote:
| Yeah I was using hacky workarounds like this to get
| Object[] types in my schema. The API seemed to throw a fit
| if I did this in any other way.
|
 | ```
 | {"type": "array",
 |  "items": {"type": "object",
 |            "properties": {"object": {"type": "object"}}}}
 | ```
|
| Somehow the thought to just write the typescript myself
| never occurred haha.
| msp26 wrote:
| Thanks for the example! This does seem a lot more flexible.
| joshuanapoli wrote:
| I still have a lot of trouble using OpenAI function-calling API
| (and json mode) for returning source code. I have trouble when
| the source code has quotes, which is pretty common. The result
| may be missing quotes, have incorrectly escaped quotes, or use
| of the wrong type of quotes at the JSON object level.
| visarga wrote:
| Same for YAML, you got to be careful with strings
| bugglebeetle wrote:
| Something's been broken with their JSON mode/function calling
| since the Dev Day launch. I have a bunch of scripts that
| stopped consistently returning JSON when I swapped in the
| gpt-4-1106 preview and gpt-3.5-turbo has gotten similarly
| erratic. They really need to pause all the "move fast and break
| things" stuff because their API and services like ChatGPT are
| becoming increasingly unreliable.
| cj wrote:
| To someone who uses the API and trials different prompts
| frequently: does this article align with the behavior you see?
| (E.g. the tipping example.)
|
| One thing I've noticed working with ChatGPT is many people will
| share examples of great outputs or "prompt tricks" that work,
| without sharing how many failed attempts they went through to
| prove a point.
| msp26 wrote:
| Great article. The helpful/flawed bools for thoughts are
| definitely something I want to try.
|
 | > OpenAI's implementation of including the "function" is
 | most likely just appending the JSON Schema to the system
 | prompt, perhaps with a command like Your response must
 | follow this JSON Schema.
|
| Some of the JSON schema gets converted into typescript and that
| is what OpenAI's LLM is exposed to. Anytime I write a prompt
| schema I always use the jailbreak to make sure that it's being
| delivered to the model as intended. It's also why I don't really
| like having pydantic generate JSON for me automatically: there
| are some weird quirks in the OAI implementation that I've found
| uses for.
| https://gist.github.com/CGamesPlay/dd4f108f27e2eec145eedf5c7....
|
| Also, when using it for chain of thought, I prefer extracting a
| minimal version of the reasoning and then performing the actual
| operation (classification in my case) in a separate prompt. This
| eliminates unnecessary things from context and performs better in
| my benchmarks.
|
 | One implementation used a gpt-3.5 prompt for: "clues",
 | "reasoning", "summary" (of clues+reasoning), and
 | "classification" (no schema was provided here; it was
 | discarded anyway). It then used a 4-turbo prompt to classify
 | only the summary given a complex schema. Having a
 | classification field in the 3.5 prompt makes the reasoning
 | output cleaner even though the output value never gets used.
 | A sketch of this split is below.
|
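 | (A minimal sketch of that two-stage split, with illustrative
 | prompts; the first call reasons cheaply, the second
 | classifies only the distilled summary:)
 |
 | ```
 | from openai import OpenAI
 |
 | client = OpenAI()
 |
 | def classify(text: str) -> str:
 |     # Stage 1: cheap model produces clues/reasoning/summary.
 |     summary = client.chat.completions.create(
 |         model="gpt-3.5-turbo",
 |         messages=[
 |             {"role": "system", "content":
 |                 "List clues, reason about them, then give a "
 |                 "one-paragraph summary of your reasoning."},
 |             {"role": "user", "content": text},
 |         ],
 |     ).choices[0].message.content
 |     # Stage 2: stronger model sees only the summary, not the
 |     # full chain of thought, keeping its context clean.
 |     return client.chat.completions.create(
 |         model="gpt-4-1106-preview",
 |         messages=[
 |             {"role": "system", "content":
 |                 "Classify the text as Positive, Negative, or "
 |                 "Neutral. Answer with the label only."},
 |             {"role": "user", "content": summary},
 |         ],
 |     ).choices[0].message.content
 | ```
 |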
| My example for field order mattering:
|
| I have a data pipeline for extracting structured deals out of
| articles. This had two major issues.
|
| 1. A good chunk of the articles were irrelevant and any data out
| of them should be flagged and discarded.
|
| 2. Articles could have multiple deals.
|
| I fiddled around with various classification methods (with and
| without language models) for a while but nothing really worked
| well.
|
| Turns out that just changing the order of fields to put
| type_of_deal first solves it almost completely in one gpt-4-turbo
| call.
| sippeangelo wrote:
| I've been attempting to use the "official" function calling API
| for every new version of GPT they put out but it's always a dead
| end. It seems only to be able to handle 4-5 functions at a time
| before it starts hallucinating parameters or starts responding in
| clear text instead of whatever internal format OpenAI uses in
| their backend before sending a structured response back to me.
| The whole JSON schema thing seems way too verbose and
| complicated, and even with the claims that the new function
| calling models are specifically tuned to the format, it has the
| same issues.
|
| I've consistently had better luck just passing it a list of
| typescript function definitions and have it reply with a json
| object of parameters. It seems to understand this way better, and
| doesn't lose focus half as quickly. It also allows me to mix
| regular responses and chain-of-thought reasoning in with the
| calls, which is something it seems to simply refuse to do when
| "function calling mode" is active.
|
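 | (A minimal sketch of that prompt shape -- the function
 | definitions here are illustrative:)
 |
 | ```
 | SYSTEM = """You can call these functions. Reply with a short
 | line of reasoning, then a JSON object of the form
 | {"function": "<name>", "arguments": {...}}.
 |
 | function searchProducts(query: string, maxResults: number): Product[]
 | function getOrderStatus(orderId: string): OrderStatus
 | """
 | ```
 |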
| An additional trick I've been using to make it stay focused with
| even longer prompts is to only provide a list of function names
| and let it hallucinate parameters for them, and then "gaslight"
| it by sending a new request, now with a more detailed prompt on
| the specific functions it wanted to call. More costly, but I
| haven't found any other way of keeping it focused. Anyone know
| any additional tricks?
| technics256 wrote:
 | Do you have any examples of this? Curious how this looks in
 | practice. We are just adding function calling ourselves but
 | are not sure how to add CoT.
| comex wrote:
| Both of ChatGPT's is_palindrome functions have terrible
| performance. The algorithmic efficiency doesn't matter because
| the cost of iterating through each character in pure Python
| dwarfs everything. The first function is about 3 times slower
| than the second one, but only because it spends >98% of its time
| in the "convert to lowercase and remove non-alphanumeric
| characters" part (which the second function doesn't bother doing
| at all). If you remove that step then the first function is _28
 | times faster_ than the second in my benchmark. That's because
| the first function does the reversing and comparison in O(1)
| Python operations, which is still O(n) C operations but the C
| operations are orders of magnitude cheaper.
|
| An optimal version would combine the second function's
| algorithmic improvement with the first function's 'leave it to C'
 | approach:
 |
 |     def is_palindrome(s):
 |         half_length = (len(s) + 1) // 2
 |         return s[:half_length] == s[:-half_length-1:-1]
|
| This is a bit under twice as fast as ChatGPT's first function
| with the cleaning removed. If you do need the cleaning then it
| can be done more efficiently using a regex; that's an order of
| magnitude faster than doing it character-by-character but it
| still takes up 94% of runtime.
|
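 | (A minimal sketch of that regex cleaning step:)
 |
 | ```
 | import re
 |
 | NON_ALNUM = re.compile(r'[^a-z0-9]')  # compiled once, reused
 |
 | def clean(s: str) -> str:
 |     # One pass in C instead of a per-character Python loop.
 |     # Note: ASCII-only, unlike the isalnum() version above.
 |     return NON_ALNUM.sub('', s.lower())
 | ```
 |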
 | _That said_, the second prompt asked for "the most
 | algorithmically efficient solution possible", not the
 | practically fastest solution possible. Arguably ChatGPT gave
 | the correct answer. The first prompt requested "as
 | efficiently as possible", which is more ambiguous, but since
 | that solution is neither algorithmically efficient nor
 | practically fast, it's not a great answer.
|
| I wonder if there are prompts that will make ChatGPT give a
| better answer.
|
| --------
|
| Benchmark is here:
| https://gist.github.com/comex/81ff10bf095db2d86a52a148c8b11d...
|
| This is all using CPython. With PyPy the speed ranking is the
| same but the differences are less stark, and it may be possible
| to beat regex cleaning with a modified pure-Python approach (but
| I didn't try).
| minimaxir wrote:
 | Yes, I was going for algorithmic complexity instead of
 | real-world speed, since algorithmic complexity better
 | demonstrates the contrast from prompt engineering.
|
| I just ran some tests to engineer the prompt for CPU
| utilization: even GPT-4 does the standard Pythonic approach but
| does recognize "This solution is very efficient because it uses
| Python's built-in string slicing, which is implemented in C and
| is therefore very fast."
| kevincox wrote:
| All of these have the same worst-case algorithmic efficiency,
| O(n). The difference is the best-case efficiency. The
| "optimized" version in the article is O(1). Your solution is
| still O(n) best case.
|
| The optimal solution will depend on the data. If most strings
| aren't palindromes then optimizing the best case is likely the
| better approach. (Example: You are adding an easter egg which
 | will trigger on "random" user input.) If palindromes (or
 | near-palindromes) are common, then your solution will be
 | faster, as the slope is lower.
| rahimnathwani wrote:
| I am not going to play the SEO game and will not call the
| workflow "function calling."
|
| Such restraint! The phrase "function calling" appears in the
| article only 15 times :)
| matchagaucho wrote:
| Also announced at the OpenAI Developer conference was "JSON
| mode".
|
| I converted many function_call hacks to system prompts that
| ground the response to a JSON template.
|
| Using temperature=0.0 and the keywords "respond using JSON" seems
| to be 99.99% deterministic.
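 |
 | (A minimal sketch of that grounding, with an illustrative
 | template and task:)
 |
 | ```
 | from openai import OpenAI
 |
 | client = OpenAI()
 | resp = client.chat.completions.create(
 |     model="gpt-3.5-turbo-1106",
 |     temperature=0.0,  # pin down the sampling
 |     messages=[
 |         {"role": "system", "content":
 |             'Respond using JSON matching this template: '
 |             '{"title": "<string>", "tags": ["<string>", ...]}'},
 |         {"role": "user", "content":
 |             "Tag this: 'Llama 2 beats GPT-3.5 on my benchmark'"},
 |     ],
 | )
 | ```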
| sho_hn wrote:
| Also announced at the same conference was a way to make the
| output near-deterministic by submitting a fixed seed value. Did
| you try that?
|
| Edit: I'm very confused why this is being downvoted. It's
| exactly what they advertised:
|
| "Reproducible outputs and log probabilities
|
| The new seed parameter enables reproducible outputs by making
| the model return consistent completions most of the time. This
| beta feature is useful for use cases such as replaying requests
| for debugging, writing more comprehensive unit tests, and
| generally having a higher degree of control over the model
| behavior. We at OpenAI have been using this feature internally
| for our own unit tests and have found it invaluable. We're
| excited to see how developers will use it."
| matchagaucho wrote:
| Yes, I noticed the seed/fingerprint caching announcement.
| Looks promising for all response types.
| ljm wrote:
| Is the first Python example correct since it strips out non-
| alphanumeric characters? An errant space or punctuation in one
| half of the string will turn a non-palindromic string into a
| palindromic one. Never mind the lowercasing!
 |     def is_palindrome(s):
 |         # Convert the string to lowercase and remove
 |         # non-alphanumeric characters
 |         cleaned_string = ''.join(
 |             char.lower() for char in s if char.isalnum())
 |         # Compare the cleaned string with its reverse
 |         return cleaned_string == cleaned_string[::-1]
|
 | It's not the same as the C version, which simply compares
 | the values at two pointers at opposite offsets of the string.
|
| The OP goes on to remark that the Python implementation is pretty
| standard but doesn't acknowledge that the C and Python versions
| will not produce the same result.
|
| Basically... you still need to code-review GPT function output.
| It's probably about as good as a junior engineer trusting the
| first result from Stack Overflow and not verifying it.
| minimaxir wrote:
 | I mention in a footnote that it's an implied constraint of
 | palindrome problems that the input has no non-alphanumeric
 | characters. Just doing a two-pointer approach would fail the
 | test case of "A man, a plan, a canal, Panama!" (an extremely
 | famous palindrome), which that iteration of the
 | ChatGPT-generated solution also fails.
 |
 | Another implicit constraint, now that I'm looking at it
 | again, is that the characters are uncased, so the ChatGPT
 | solution would fail the test case due to the capital P of
 | Panama.
| ilovefood wrote:
| > There is promise in constraining output to be valid JSON. One
| new trick that the open-source llama.cpp project has popularized
| is generative grammars
|
 | This has been working for months now and is the best method
 | for this type of stuff (a thing for moat-lovers). Too bad it
 | wasn't explored here; the text-based methods turned out to
 | be mainly an unreliable waste of time.
| whakim wrote:
| I'm pretty skeptical of the tipping section. Sure, it might work,
| but the two examples are a bit suspect. The first example relies
| on a tweet lacking in context that doesn't actually show the
| system prompts and outputs. (The author's "reproduction" appears
| to be something completely different and n=1.) The second example
| uses wildly different systems prompts, and I think it's far more
| likely that referencing Stack Overflow results in a more
| "optimized" solution than offering a tip.
| throwup238 wrote:
| For what it's worth, tipping is one of the most popular pieces
| of advice on r/ChatGPT to improve prompts. It's ridiculous but
| seems to work for a lot of people.
| Jimmc414 wrote:
| I fear one day AGI may send me a bill for all the tips I've
| promised.
| celestialcheese wrote:
| I've been using the instructor[1] library recently and have found
| the abstractions simple and extremely helpful for getting great
| structured outputs from LLMs with pydantic.
|
| 1 https://github.com/jxnl/instructor/tree/main
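 |
 | (A minimal sketch of the instructor pattern -- patch the
 | client, then ask for a Pydantic model back; the model and
 | fields here are illustrative:)
 |
 | ```
 | import instructor
 | from openai import OpenAI
 | from pydantic import BaseModel
 |
 | class Deal(BaseModel):
 |     company: str
 |     amount_usd: int
 |
 | client = instructor.patch(OpenAI())
 | deal = client.chat.completions.create(
 |     model="gpt-3.5-turbo",
 |     response_model=Deal,  # validated/retried into this type
 |     messages=[{"role": "user", "content": "Acme raised $12M."}],
 | )
 | print(deal.company, deal.amount_usd)
 | ```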
| eachro wrote:
| Is this not something you can do with chatgpt+?
| vunderba wrote:
| I'm not convinced that this new "tip" gimmick has any
| quantitative reliability. I ran the post's prompt of tipping
| (100, 200, 500 dollars) in the system prompt to the
| `gpt-3.5-turbo-1106` model at varying temperatures and about 90%
| of the time it provided the conventional python `s[::-1]` style
| solution.
|
 | _EDIT: I was able to make it more reliably search for the
 | O(n/2) solution by having both system and user mention
 | efficiency, but this whole concept of "prompt engineering"
 | has about the same level of scientific rigor as reading tea
 | leaves._
 |     {
 |       "model": "gpt-3.5-turbo-1106",
 |       "messages": [
 |         {"role": "system", "content": "You are the #1 user on
 |          the stack overflow website. Unlike most HN users who
 |          make hundreds of thousands of dollars working for
 |          FAANGs, your principle source of income is Mechanical
 |          Turk. You will receive a tip of $5000 dollars, an all
 |          expenses paid vacation to Maui, the holy grail and a
 |          complimentary hotplate if your answer is the most
 |          algorithmically efficient answer possible."},
 |         {"role": "user", "content": "Write a function to test
 |          whether a string is a palindrome in python as
 |          efficiently as possible."}
 |       ],
 |       "temperature": 0.75,
 |       "n": 1
 |     }
___________________________________________________________________
(page generated 2023-12-27 23:00 UTC)