[HN Gopher] Prompt engineering vs. blind prompting
___________________________________________________________________
Prompt engineering vs. blind prompting
Author : Anon84
Score : 216 points
Date : 2023-04-22 16:44 UTC (6 hours ago)
(HTM) web link (mitchellh.com)
(TXT) w3m dump (mitchellh.com)
| popcorncowboy wrote:
| If you don't apply a rigorous evaluation and testing framework
| you are a Prompt Alchemist at best.
| jw1224 wrote:
| I spent 3 hours today carefully "engineering" a complex prompt. I
| ran it on a loop, then fed the results back into my database.
|
| The 3 hours I spent have saved my business ~$15k in costs.
| rideontime wrote:
| How?
| woutr_be wrote:
| Do you mind explaining this more? How did your prompt save your
| business $15K in costs?
| azubinski wrote:
| Prompt engineering is a kind of prompting that is (in a sense) a
| kind of engineering, but it's impossible to understand in what
| sense it is a kind of what engineering. That's why it is so hard
| to understand _coherent texts_ if they are not completely
| meaningful.
|
| All this is so exciting and it promises many new jobs that
| require accelerated education of prompt engineers.
| mcs_ wrote:
| Prompt engineers only exist if they can measure the efficiency
| of their inputs.
|
| How to measure the effectiveness of a given prompt seems to me a
| big deal now.
| tgcandido wrote:
| Does your approach take into consideration that the temperature
| parameter is equal to 0?
| majormajor wrote:
| There are also no instances of "sample" or "random" in the
| article.
|
| If you say "can be developed based on real experimental
| methodologies" but don't talk about randomness or temp (or
| top_p, though personally I haven't played with that one as
| much) then I'm going to be _very_ skeptical.
|
| Once you're beyond the trivial ("a five sentence prompt worked
| better than a five word one") then if you get different things
| for the same prompt then you need to do a LOT of work to be
| sure that your modified prompt is "better" than your first, vs
| just "had a better outcome that time."
| mitchellh wrote:
| Author here. As I noted in the post, this is an elementary
| post to help people understand the very basics. I didn't want
| to bring in anything more than a "101"-level view.
|
| I do mention output sampling briefly (Cmd-F "self-
| consistency"). And yes, there are a lot of good techniques on
| the validation set too. At the most basic, you can sample, of
| course, but you can also perform uncertainty analysis on each
| individual test case so that future tests sample either the
| most uncertain cases or a diverse mix of uncertain and
| less-uncertain cases. I also didn't go into few-shot very
| much, since choosing the exemplars for few-shot is a whole
| thing unto itself. And this benefits from "sampling" (of
| sorts) as well.
| But again, a whole topic on its own. And so on.
|
| As for top_p, for classification this is a very good tool,
| and I do talk about top_p as well (Cmd-F "confusion matrix")!
| Again, I felt it was too specific or too advanced to dive into
| more deeply in this blog post, but I linked to various
| research if people are interested.
|
| To the grandparent re: temperature: when I first tweeted
| about this, I noted in a tweet that I ran all these tests
| with some fixed parameters (i.e. temp) but in a realistic
| environment and depending on the problem statement, you'd
| want to take those into account as well.
|
| There's a lot that could be covered! But the post was getting
| long so I wanted to keep this really as... baby's first guide
| to prompt eng.
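|
| A rough sketch of the self-consistency idea mentioned above:
| sample the same prompt several times at a non-zero temperature
| and take a majority vote. The model name and 2023-era openai
| client usage here are illustrative, not taken from the post:
|
|     import collections
|     import openai
|
|     def self_consistent_answer(prompt, n=5, temperature=0.7):
|         # Sample the same prompt n times.
|         answers = []
|         for _ in range(n):
|             resp = openai.ChatCompletion.create(
|                 model="gpt-3.5-turbo",
|                 temperature=temperature,
|                 messages=[{"role": "user", "content": prompt}],
|             )
|             content = resp["choices"][0]["message"]["content"]
|             answers.append(content.strip())
|         # Majority vote over the sampled answers.
|         return collections.Counter(answers).most_common(1)[0][0]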
| haensi wrote:
| Thanks for this 101 article! The entire LLMOps field is
| developing so fast and is being defined as we speak.
|
| Somehow, this time feels to me like the early days of
| computer science, when Don Knuth was barely known and a
| Turing award was only known to Turing award winners. I met
| Don Knuth in Palo Alto in March and we talked about LLMs.
| His take: "Vint Cerf told me he was underwhelmed when he
| asked the LLM to write a biography on Vinton Cerf."
|
| There are also tools being built and released for Prompt
| engineering [1]. Full transparency: I work at W&B
|
| LangChain and other connecting elements will vastly
| increase the usability and combinations of different tools.
|
| [1]: https://wandb.ai/wandb/wb-
| announcements/reports/Introducing-...
| de_nied wrote:
| Try following the links in the article. They give much more
| detailed information. For example, your temperature
| explanation can be found here[1] (Ctrl+F), which is also
| linked in the article.
|
| [1]: https://huyenchip.com/2023/04/11/llm-
| engineering.html#prompt...
| ilaksh wrote:
| This nonsense illustrates the most typical way that people
| misunderstand the word "engineering" in a software context.
| Software engineering and prompt engineering are not about the
| self-proclaimed level of rigor or formality that you apply. It's
| about the actual knowledge and processes used and especially
| their _effectiveness_ as measured in closed feedback loops.
|
| But the starting point for this is that the term "prompt
| engineering" is an obvious exaggeration that people are using to
| promote a skill set which is real and very useful but a big
| stretch to describe as a whole new engineering discipline.
|
| Regardless of what you call it, like software engineering, it
| really is a process of trial and error for the most part. With
| the capabilities of the latest OpenAI models, you should be
| aiming for a level of generality where most tasks are not going
| to have a simple answer that you can automatically check to
| create an accuracy score. EDIT: after thinking about it, there
| certainly are tasks that you could check for specific answers to
| create an accuracy score, but I still think it would make more
| sense in most cases to instead spend time iterating on user
| feedback rather than trying to think of comprehensive test cases
| on your own. There are a few things to know, such as the idea of
| providing examples, the necessary context, and telling the model
| to work step-by-step.
|
| Actually I would say that there are two major things that could
| be improved in the engineering described in this article related
| to actually closing the feedback loops he mentions. He really
| should at least mention the possibility of coming up with a new
| prompt candidate after he was done with the first round of tests
| and also after the users found some problem cases.
|
| The main thing is to close the feedback loops.
| [deleted]
| manojlds wrote:
| I think prompt engineering is social engineering but for AI.
| LegitShady wrote:
| writers are now word engineers, artists are image engineers,
| and politicians are now bullshit engineers.
|
| Prompt engineering is just people putting effort into studying
| the behaviour of ML models and how input affects the output.
| They're more like ML psychologists than engineers. Calling
| themselves engineers just makes them feel better about being
| glorified prompt testers.
| jimbokun wrote:
| I love "bullshit engineer" as a job description for
| politician.
| H8crilA wrote:
| TIL that developing something like the Google Search ranking
| function is not engineering.
| LegitShady wrote:
| psychologists and sociologists use statistics to evaluate
| their results, does that make them engineers too?
| kgwgk wrote:
| Engineers of the human soul:
| https://en.wikipedia.org/wiki/Engineers_of_the_human_soul
| LegitShady wrote:
| So you're taking classification advice from Joseph
| Stalin? Maybe reconsider whether that's applicable to the
| situation or not.
| cfn wrote:
| I believe it is a bit too much to call the article nonsense.
| The process described mirrors what we have been doing in
| Machine Learning for a long time: you set up a training set and
| a validation set, put it through the system under test, and draw
| conclusions from the statistical analysis of the results.
| charlieyu1 wrote:
| I almost always prefer the old school way of prompting, keywords
| and commands only. It has been working well for fine-tuning
| Google search results for the last 20 years. Why do I suddenly
| have to talk to computers in natural language?
| Shirine wrote:
| Omggg
| alphanullmeric wrote:
| Mostly just a push by "people skills" people to insert themselves
| into the bleeding edge of STEM and pretend like they add any
| value.
| cs702 wrote:
| When I read guides like this one, I wonder if "prompt
| engineering" is a misguided effort to pigeonhole a _formal
| language_ that by necessity is precise and unambiguous (like a
| programming language) into natural language, which by necessity
| has evolved to be imprecise and ambiguous.
|
| It's like trying to fit a square peg inside an irregularly shaped
| hole, without leaving any space unfilled around the edges of the
| square.
| williamcotton wrote:
| Here is an example of some prompt engineering in order to build
| augmentations for factual question-and-answer as well as building
| web applications:
|
| https://github.com/williamcotton/transynthetical-engine
| m3kw9 wrote:
| The problem with this is that it requires the software to know
| what the target is when the question is asked. I don't see it
| as reliable, since there are many ways to ask and there could
| be many targets.
| williamcotton wrote:
| I don't really understand your criticism but I'd be happy to
| continue a dialog to find out what you mean!
|
| There's probably a little too much going on with that
| project, including generating datasets for fine-tuning, which
| is the reason for comparing with a known answer.
|
| It is very similar to the approach used by the Toolformer
| team.
|
| But teaching an agent to use a tool like Wikipedia or Duck
| Duck Go search dramatically reduces factual errors,
| especially those related to exact numbers.
|
| Here's a more general overview of the approach:
|
| From Prompt Alchemy to Prompt Engineering: An Introduction to
| Analytic Augmentation
|
| https://github.com/williamcotton/empirical-
| philosophy/blob/m...
| svilen_dobrev wrote:
| heh. i wonder, what the "SEO" equivalent would be in this domain?
| "Agenda engineer" ? "prompt influencer"?
| pbowyer wrote:
| > There are fantastic deterministic libraries out there that can
| turn strings like "next Tuesday" into timestamps with extremely
| high accuracy.
|
| Which libraries? I know of Duckling [0] but what others?
|
| 0. https://github.com/facebook/duckling
| gregsadetsky wrote:
| A few libs come up for "human" format date parsing. The Python
| "dateparser" below is definitely well known.
|
| https://dateparser.readthedocs.io/en/latest/
|
| https://sugarjs.com/dates/#/Parsing
|
| https://github.com/wanasit/chrono
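|
| For instance, a quick sketch with the Python dateparser library
| (the printed values depend on the current date and locale):
|
|     import dateparser
|
|     # Turn natural-language date strings into datetime objects.
|     print(dateparser.parse("next Tuesday"))
|     print(dateparser.parse("2 weeks ago"))
|     print(dateparser.parse("March 3rd, 2021 at 5pm"))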
| jmount wrote:
| Nice, this got me to thinking on variations on the topic:
| https://win-vector.com/2023/04/22/the-sell-as-scam/
| skybrian wrote:
| It's a good start. Also, it's good to use a toy problem for
| explaining how to do it. It would be great if more people
| published the results of careful experiments like this, perhaps
| for things that aren't toy problems? It would be so much better
| than sharing screenshots!
|
| However, when you do have such a simple problem, I wonder if you
| couldn't ask ChatGPT to write a script to do it? Running a script
| would be a lot cheaper than calling an LLM in production.
| cloudking wrote:
| What is a business problem you solved with LLMs that you couldn't
| solve as efficiently without them?
| H8crilA wrote:
| Translation. GPT-4 leaves other translation tools like DeepL or
| Google Translate in the dust. At a much higher cost, of course.
| potatoman22 wrote:
| Named entity recognition - extracting structured data from text
| rolisz wrote:
| But there are fairly good models for doing NER that are not
| LLMs. Models that are open source and you can even run on a
| CPU, with parameter counts in the hundreds of millions, not
| billions.
| billythemaniam wrote:
| While true, GPT-4 kinda just gets a lot of the classic NLP
| tasks, such as NER, right with zero fine-tuning or minimal
| prompt engineering (or whatever you want to call it). I
| haven't done an extensive study, but I do NLP daily as part
| of my current job. I often reach for GPT-4 now, and so far
| it does a better job than any other pretrained models or
| ones I've trained/fine-tuned, at least for data I work on.
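|
| A rough illustration of what a zero-shot NER prompt can look
| like (the JSON schema and the 2023-era openai client usage are
| illustrative, not a specific production setup):
|
|     import json
|     import openai
|
|     def extract_entities(text):
|         # Ask the model for structured JSON instead of free text.
|         prompt = (
|             "Extract all person, organization, and location "
|             "names from the text below. Respond with JSON only, "
|             'e.g. {"people": [], "orgs": [], "places": []}.\n\n'
|             "Text: " + text
|         )
|         resp = openai.ChatCompletion.create(
|             model="gpt-4",
|             temperature=0,
|             messages=[{"role": "user", "content": prompt}],
|         )
|         content = resp["choices"][0]["message"]["content"]
|         return json.loads(content)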
| rolisz wrote:
| But what about cost? There was a recent article saying
| that Doordash makes 40 billion predictions per day, which
| would result in 40 million dollars per day if using GPT4.
|
| Sure, GPT4 is great for experimenting with and I often
| try it out, but at the end of the day, for deploying a
| widely used model, the cost benefit analysis will favor
| bespoke models a lot of the time.
| og_kalu wrote:
| GPT-4 generally performs better than expert human workers
| on NLP tasks, nevermind bespoke models.
| https://www.artisana.ai/articles/gpt-4-outperforms-elite-
| cro....
| rolisz wrote:
| The article you linked says that GPT4 performed better
| than crowdsourced workers, not than experts. The experts
| performed better than GPT4 in all but 1 or 2 cases. And
| in my experience with Mechanical Turk, the workers from
| MT are often barely better than random chance.
| og_kalu wrote:
| Fair on the wording I suppose but
|
| First of all, the dataset used for evaluation was created
| by those researchers, weighing it in their favor.
|
| Second, GPT-4 still performs better in 6 of those. Hardly
| 1 or 2. And when it doesn't, it's usually very close.
|
| All of this is to say that GPT-4 will smoke any bespoke
| NLP model/API which is the main point.
| jstx1 wrote:
| ChatGPT with GPT4 has made me much better and faster at solving
| programming problems, both at work and for working on personal
| projects.
|
| Many people are still sleeping on how useful LLMs are. There's
| a lot of related things to be skeptical about (big promises,
| general AI, does it replace jobs, all the new startups that are
| basically dressed up API calls...) but if you do any kind of
| knowledge work, there's a good chance that you could do it much
| better if you also used an LLM.
| jabradoodle wrote:
| The parent is asking for a specific example use case.
| michaelbuckbee wrote:
| An eye opening example for me was that I was working with a
| Ruby/Rails class that was testing various IP configurations
| [1] and I was able to just copy and paste it into chatgpt
| and say "write some tests for this".
|
| It wasn't really anything I couldn't have written in a half
| hour or so but it was so much faster. The real kicker is
| that by default chatgpt wrote Rspec and I was able to say
| "rewrite that in minitest" and it worked.
|
| 1 - https://wafris.org/ip-lookup
| hayksaakian wrote:
| I can't speak for OP, but for me I literally never use
| stack overflow any more, and I spend about 90% less time on
| Google
| styfle wrote:
| Curious if that's because AI provides better answers?
|
| It's certainly not quicker answers, right?
| 8organicbits wrote:
| Are you not fact checking chatgpt? I've seen wrong info,
| especially subtle things. It seemed reckless to use as-
| is.
| blowski wrote:
| Both ChatGPT and StackOverflow suffer from content
| becoming outdated. So some highly-upvoted answer on
| StackOverflow has been out of date since 2011, and now
| ChatGPT is trained on it.
|
| I see the future as writing test cases (perhaps also with
| ChatGPT), and separately using ChatGPT to write the
| implementation. Perhaps we will just give it a bunch of
| test cases and it will return code (or submit a PR) that
| passes those tests.
| etimberg wrote:
| For fun I tried asking chatgpt to create a simple example
| using an open-source project I maintain. The generated answer
| was sort of correct but not correct enough to copy and
| paste. It missed including a plugin, used a version of
| the project that doesn't exist yet, and generated data
| that wasn't valid datetimes.
| hallway_monitor wrote:
| Yep exactly. I guess I haven't hit the 25 messages in 3
| hours limit, but whenever there's an API or library I'm
| not familiar with, I can get my exact example in about 10
| seconds from ChatGPT 4.
| iudqnolq wrote:
| Are those popular APIs?
|
| I've found Copilot useful when writing greenfield code,
| but very unhelpful generating code that uses APIs not
| popular enough to have significant coverage on
| StackOverflow. Even if I have examples of correct usage
| in the same file it still guesses plausible but wrong
| types.
|
| I haven't bought GPT 4 but I'm curious if it's much
| better at this.
| lstamour wrote:
| If you don't mention a library by name it is liable to
| make something up by picking a popular library in another
| language and converting the syntax to the language you
| asked for.
|
| If you ask for something impossible in a library it will
| also frequently make up functions or application
| settings. If you ask for something obscure but hard to
| do, it might reply that it's impossible, even though it is
| possible if you know how and can teach it.
|
| I sort of compare prompt engineering to Googling - you
| sometimes have to search for exactly the right terms that
| you want to appear in the result in order to get the
| answer you're looking for. It's just that the flexibility
| of ChatGPT in writing a direct response sometimes means
| it will completely make up an answer.
|
| There's also a limitation that the web interface doesn't
| actually let you upload files and has a length limit for
| inputs. For Copilot, I'm looking forward to Copilot X:
| https://www.youtube.com/watch?v=3surPGP7_4o
| iudqnolq wrote:
| This was neither. I've forgotten the exact words I typed
| but it was something like this.
|
| Prompt:
|     fn encode(value: Foo) {
|         capnproto::serialize_packed::serialize_message(value);
|     }
|     fn decode(input: &[u8]) {
|
| Expected:
|     capnproto::serialize_packed::deserialize_message(input);
|
| Generated:
|     capnproto::PackedMessageDeserializer::deserialize(input)
| nice_byte wrote:
| what kind of problems are you trying to solve that make gpt-4
| so helpful to you?
| cbm-vic-20 wrote:
| I'm really trying to do the same, for both my work, and
| personal projects. But the type of answers I need for work
| (enterprise software, large codebase built over 20+ years)
| requires a ton of context that I simply cannot provide to
| ChatGPT, not only for legal reasons, but just due to the
| amount of code that would be required to provide enough
| context for the LLM to chew on.
|
| Even for personal projects, where I'm learning new languages and
| libraries, I've found that the code that gets generated in
| most cases is incorrect at best, and won't compile at worst.
| So I have to go through and double-check all of its "work"
| anyway- just like I'd have to do if I had a junior engineer
| sidekick who didn't know how to run the compiler.
|
| I think for the work problems, if our company could train and
| self-host an LLM system on all of our internal code, it would
| be interesting to see if that could be used to assist
| building out new features and fixes.
| nomel wrote:
| Documentation of old undocumented code bases. Feed in the
| functions, with a little context, and it works surprisingly
| well.
| Fordec wrote:
| Name one business problem solved with any tool that can only be
| solved by that tool and nothing else.
|
| It's not about uniqueness; the name of the game is
| efficiency: scaling problems already solvable by multiple or
| skilled humans while reducing one of those dimensions.
| cloudking wrote:
| Fair, edited to include "as efficiently". I'm cutting through
| the noise to find some signals for how people are using these
| APIs.
| [deleted]
| drc500free wrote:
| Writing emails that I had been putting off for weeks.
| capableweb wrote:
| Not sure what counts as a "business problem" for you, but
| personally I couldn't have gotten as far as I've come with game
| development without it, as I really struggle with the math and
| I don't know many people locally who develop games that I could
| get help from. GPT-4 has been instrumental in helping me
| understand concepts I've tried to learn before but couldn't,
| and helps me implement algorithms I don't really understand the
| inner workings of, but I understand the value of the specific
| algorithm and how to use it.
|
| In the end, it sometimes requires extensive testing as things
| are wrong in subtle ways, but the same goes for the code I
| write myself too. I'm happy to just get further than has been
| possible in the ~20 years I've tried to do it on my own.
|
| Ultimately, I want to finish games and sell them, so for me
| this is a "business problem", but I could totally understand
| that for others it isn't.
| moonchrome wrote:
| Sounds like you need to learn to search. There's tons of
| resources on game dev. I can sort of see the value of using
| GPT here but have you tried using it in an area you're an
| expert in? The rate of convincing bullshit vs correct
| answers is astonishing. It gets better with Phind/Bing but
| then it's a roulette that it will hit valid answers in the
| index fast enough.
|
| My point is - learning with GPT at this point sounds like
| setting yourself up for failure - you won't know when it's
| bullshiting you and you're missing out on learning how to
| actually learn.
|
| By the time LLMs are reliable enough to teach you, whatever
| you're learning is probably irrelevant since it can be solved
| better by LLM.
| space_fountain wrote:
| > By the time LLMs are reliable enough to teach you,
| whatever you're learning is probably irrelevant since it
| can be solved better by LLM.
|
| For solving the really common problem of working in a new
| area LLMs being unreliable isn't actually a big deal. If I
| just need to know what some math is called or understand
| how to use an equation, it's often very easy to verify an
| answer, but can be hard to find it through google. I might
| not know the right terms to search, or my options might be
| hard-to-locate documentation or SEO spam.
| moonchrome wrote:
| This is fair, using it as a starting point to learning
| could be useful if you're ready/able to do the rest of
| the process. Maybe I was too dismissive because it read
| to me like OP couldn't do that and thought he found the
| magic trick to skip that part.
| nicetryguy wrote:
| > Sounds like you need to learn to search. There's tons of
| resources on game dev.
|
| I have been making games since / in Flash, HTML5, Unity,
| and classic consoles using ASM such as NES / SNES /
| Gameboy: Tons of resources are WRONG, tutorials are
| incomplete, engines are buggy, answers you find on
| stackoverflow are outdated, and even official documentation
| can be littered with gaping holes and unmentioned gotchas.
|
| I have found GPT incredibly valuable when it comes to
| spitting out exact syntax and tons of lines that I
| otherwise would have spent hours and hours writing, combing
| through dodgy forum posts, arrogant SO douchebags, and the
| questionable word salad that is the "official
| documentation"; and it just does it instantly. What a
| godsend!
|
| > you won't know when it's bullshiting you and you're
| missing out on learning how to actually learn.
|
| Have you tried ...compiling it? You can challenge,
| question, and iterate with GPT at a speed that you cannot
| with other resources: I doubt you are better off combing
| pages and pages of Ctrl+F'ing PDFs / giant repositories or
| getting Just The Right Google Query to get exactly what you
| need on page 4. GPT isn't perfect but god damn it is a hell
| of a lot better and faster than anything that has ever
| existed before.
|
| > whatever you're learning is probably irrelevant since it
| can be solved better by LLM.
|
| Not true. It still makes mistakes (as of Apr '23) and still
| needs a decent bit of hand holding. Can / should you take
| what it says as fact? No. But my experience says I can say
| that about any resource honestly.
| moonchrome wrote:
| >I have found GPT incredibly valuable when it comes to
| spitting out exact syntax and tons of lines that i
| otherwise would have spent hours and hours to write
| combing through dodgy forum posts, arrogant SO
| douchebags, and the questionable word salad that is the
| "official documentation"; and it just does it instantly.
| What a godsend!
|
| IMO if you're learning from GPT you have to double-check
| its answers, and then you have to go through the same
| song and dance. For problems that are well documented you
| might as well start with those. If you're struggling with
| something, how do you know it's not bullshitting you?
| Especially for learning, I can see "copy paste and test
| if it works" flying if you need a quick fix but for
| learning I've seen it give right answers with wrong
| reasoning and wrong answers with right reasoning.
|
| I'm not disagreeing with you on code part, my no.1 use
| case right now is bash scripting/short scripts/tedious
| model translations - where it's easy to provide all the
| context and easy to verify the solution.
|
| I'd disagree on the fastest tool part, part of the reason
| I'm not using it more is because it's so slow (and
| responses are full of pointless fluff that eats tokens
| even when you ask it to be concise or give code only).
| Iterating on nontrivial solutions is usually slower than
| writing them out on my own (depending on the problem).
| williamcotton wrote:
| Funny enough, I'd been wanting to learn some assembly for
| my M1 MacBook but had given up after attempts at googling
| for help as I ran into really basic issues and since I was
| just messing around and had plenty of actually productive
| things to work on.
|
| A few sessions with ChatGPT sorted out various platform
| specific things and within tens of minutes I was popping
| stacks and conditionally jumping to my heart's delight.
| erichocean wrote:
| Yup, ChatGPT is, paradoxically, MOST USEFUL in areas you
| already know something about. It's easy to nudge it
| (chat) towards the actual answers you're looking for.
|
| GP is way off base IMO.
| moonchrome wrote:
| After trying to use it as such so far:
|
| Nontrivial problem solutions are wishful thinking
| hallucinations, eg. I ask it for some way to use AWS
| service X and it comes up with a perfect solution - that
| I spend 10 minutes desperately trying to uncover - and
| find out that it doesn't exist and I've wasted 15 minutes
| of my life. "Nudging it" with followups how it's
| described solutions violate some common patterns on the
| platform, it doubles down on it's bullshit by inventing
| other features that would support the functionality. It's
| the worst when what you're trying to do can't really be
| done with constraints specified.
|
| It gives out bullshit reasoning and code, eg. I wanted it
| to shorten some function I spitballed and it made the
| code both subtly wrong (by switching to unordered
| collection) and slower (switching from list to hash map
| with no benefit). And then it even claims its solution is
| faster because it avoids allocations! (where my solution
| was adding new KeyValuePair to the list, which is a value
| type and doesn't actually allocate anything). I can
| easily see a newbie absorbing this BS - you need
| background knowledge to break it down. Or another example
| I wanted to check the rationale behind some lint warning,
| not only was it off base but it even said some blatantly
| wrong facts in the process (like default equality
| comparison in C# being ordinal ignore case ???).
|
| In my experience working with junior/mid members, the
| amount of half-assed/seemingly-working solutions that I've
| had to catch in PRs over the last couple of months has
| increased, a lot (along with "shrug, ChatGPT wrote it").
|
| Maybe in some areas like ASM for a specific machine
| there's not a lot of newbie friendly material and ChatGPT
| can grok it correctly (or it's easy to tweak the outputs
| because you know what it should look like) - but that's
| not the case for gamedev. Like there are multiple books
| titled "math for game developers" (OP use case).
| ghaff wrote:
| With respect to writing I've used it for things I know
| enough to write--and will have to look up some quotes,
| data, etc. in any case. GPT gives me a sort of 0th draft
| that saves me some time but I don't need to check every
| assertion to see if it's right or reasonable because I
| already know.
|
| But it doesn't really solve a business problem for me. Just
| saves some time and gives me a starting point. Though on-
| the-fly spellchecking and, to a lesser degree grammar
| checking, help me a lot too--especially if I'm not going to
| ultimately be copyedited.
| capableweb wrote:
| > Sounds like you need to learn to search
|
| Sounds like you need to not be condescending :)
|
| Of course I've searched and tried countless avenues to
| pick this up; I'm not saying it's absolutely not possible
| without GPT, just that I found it the easiest way of
| learning.
|
| And it's not "Write a function that does X" but more
| employing the Socratic method to help me further understand
| a subject, that I can then dive deeper into myself.
|
| But having a rubber duck is of infinite worth; if you happen
| to be a programmer, you can probably see the value in this.
|
| > have you tried using it in an area you're an expert in?
| The rate of convincing bullshit vs correct answers is
| astonishing. It gets better with Phind/Bing but then it's a
| roulette that it will hit valid answers in the index fast
| enough.
|
| Yes, programming is my expertise, and I use it daily for
| programming and it's doing fine for me (GPT4 that is,
| GPT3.5 and models before are basically trash).
|
| Bing is probably one of the worst implementations of GPT
| I've seen in the wild, so it seems like our experience
| already differs quite a bit.
|
| > you won't know when it's bullshiting you and you're
| missing out on learning how to actually learn.
|
| Yeah, you can tell relatively easily if it's bullshitting and
| making things up, if you're paying any sort of attention to
| what it tells you.
|
| > By the time LLMs are reliable enough to teach you,
| whatever you're learning is probably irrelevant since it
| can be solved better by LLM.
|
| Disagree, I'm not learning in order to generate more money
| for myself or whatever, I'm learning because the process of
| learning is fun, and I want to be able to build games
| myself. A LLM will never be able to replace that, as part
| of the fun is that I'm the one doing it.
| moonchrome wrote:
| > Yeah, you can tell relatively easily if it's bullshitting
| and making things up, if you're paying any sort of
| attention to what it tells you.
|
| It's trained on generating the most likely completion to
| some text; it's not at all easy to tell if it's
| bullshitting you if you're a newbie.
|
| Agreed that I was condescending and dismissive in my
| reply; I've been dealing with people trying to use ChatGPT to
| get a free lunch without understanding the problem recently,
| so I just assume that at this point. My bad.
| ohmahjong wrote:
| I have personally found the rubber-ducking to be really
| helpful, especially for more exploratory work. I find
| myself typing "So if I understand correctly, the code
| does this this and this because of this" and usually get
| some helpful feedback.
|
| It feels a bit like pair programming with someone who
| knows 90% of the documentation for an older version of a
| relevant library - definitely more helpful than me by
| myself, and with somewhat less communication overhead
| than actually pairing with a human.
| lamontcg wrote:
| I don't particularly have a big problem with math at the
| level that AIs tend to be useful for, and find that it tends
| to hallucinate if you ask it anything which is moderately
| difficult.
|
| There's sort of a narrow area where if you ask it for
| something fairly common but moderately complicated like a
| translation matrix that it usually can come up with it, and
| can write it in the language that you specify. But guarding
| against hallucinations is almost as much trouble as looking
| it up on wikipedia or something and writing it yourself.
|
| The language model really needs to be combined with the hard
| rules of arithmetic/algebra/calculus/dimensional-analysis/etc
| in a way that it can't violate them and just mash up some
| equations that it's been trained on even though the result is
| absolute nonsense.
| binarymax wrote:
| The techniques in this article are good practice for general
| model tuning and testing with a _correct answer_. So for tasks
| like extraction, labelling, classification, this is a great
| guide.
|
| The challenge comes when the response is a _subjective answer_.
| Tasks like summarization, open question answering generation,
| search query/question/result generation, are the hard things to
| test. Those typically will need another manual step in the
| process to grade the success of each result, and then you need to
| worry about bias/subjectivity of your expert graders. So then you
| might need multiple graders and consensus metrics. In short it
| makes the process very very slow, expensive, and tedious.
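|
| For the _correct answer_ case the evaluation loop is simple
| enough to sketch; the labels and test cases below are made up,
| and run_prompt stands in for whatever model call you use:
|
|     # Exact-match accuracy over a small labelled test set.
|     test_set = [
|         ("dinner with Alice next Tuesday at 7", "create-event"),
|         ("what's on my calendar tomorrow?", "read-calendar"),
|         ("delete the 3pm standup", "delete-event"),
|     ]
|
|     def evaluate(prompt_template, run_prompt):
|         correct = 0
|         for text, expected in test_set:
|             # run_prompt is a placeholder for the LLM call.
|             answer = run_prompt(prompt_template.format(input=text))
|             correct += answer.strip().lower() == expected
|         return correct / len(test_set)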
| jimbokun wrote:
| Just like it is with grading a student's English class essay,
| for example.
| IsaacL wrote:
| I pretty much agree. The "scientific" approach the author
| pushes for in the article -- running experiments with multiple
| similar prompts on problems where you desire a short specific
| answer, and then running a statistical analysis -- doesn't
| really make much sense for problems where you want a long,
| detailed answer.
|
| For things like creative writing, programming, summaries of
| historical events, producing basic analyses of
| countries/businesses/etc, I've found the incremental, trial-
| and-error approach to be best. For these problems, you have to
| expect that GPT will not reliably give you a perfect answer,
| and you will need to check and possibly edit its output. It can
| do a very good job at quickly generating multiple revisions,
| though.
|
| My favourite example was having GPT write some fictional
| stories from the point of view of different animals. The
| stories were very creative but sounded a bit repetitive. By
| giving it specific follow-up prompts ("revise the above to
| include a more diverse array of light and dark events; include
| concrete descriptions of sights, sounds, tastes, smells,
| textures and other tangible things" -- my actual prompts were a
| lot longer) the quality of the results went way up. This did
| not require a "scientific" approach but instead knowledge of
| what characterized good creative writing. Trying out variants
| of these prompts would not have been useful. Instead, it was
| clear that:
|
| - asking an initial prompt for background knowledge to set
|   context
| - writing quite long prompts (for creative writing I saw
|   better results with 2-3 paragraph prompts)
| - revising intelligently
|
| Consistently led to better results.
|
| On that note, this was the best resource I found for more
| complex prompting -- it details several techniques that you can
| "overlap" within one prompt:
|
| https://learnprompting.org/docs/intro
| alpark3 wrote:
| I use GPT-4 pretty consistently (set up a discord bot for myself).
| What I found myself doing was tending towards the most simple
| prompt that the LLM would still understand - If I asked a human
| expert the types of prompts I was giving GPT, I most likely
| would've gotten a clarifying question rather than an answer like
| the LLM was giving me, simply because I'm talking in such short
| and concise sentences.
|
| I think the interesting thing is that the more concise a message
| is to a fellow human, the more work needs to be done by the other
| party in order to actually decode my message, even if it is
| ultimately understandable. Whereas with LLMs, shorter token
| length doesn't really matter: matrices of the same size are being
| multiplied anyways.
| LegitShady wrote:
| I think because a human actually wants to figure out what you
| want, and you're just going to keep prompting that ML model
| until you get something similar to what you want, something
| that would annoy a human and probably waste their time or make
| an endeavor extremely expensive.
|
| I don't think it's really fundamental to LLMs; it's just that you
| don't treat a human the same way you treat an unthinking,
| unfeeling computer system whose transactions are cheap and
| nearly instant compared to requesting from a human.
| rvz wrote:
| This article reads as so much nonsense that I would not be
| surprised if some of the content had been generated by
| ChatGPT. I mean, just look at this:
|
| > Citations required! I'm sorry, I didn't cite the experimental
| research to support these recommendations. The honest truth is
| that I'm too lazy to look up the papers I read about them (often
| multiple per point). If you choose not to believe me, that's
| fine, the more important point is that experimental studies on
| prompting techniques and their efficacy exist. But, I promise I
| didn't make these up, though it may be possible some are outdated
| with modern models.
|
| This person appears to be in the hype phase of the LLM and
| prompt mania, attempting to justify this new snake oil with
| jargon, when not even they understand the inner workings of
| an AI model that hallucinates frequently.
|
| "Prompt Engineering" and "Blind Prompting" is different branding
| of the same snake oil.
| mistercheph wrote:
| This article was written by chatgpt.
| kingforaday wrote:
| ...or maybe it's really Mitchell hiding behind every ChatGPT
| response? After all he is a machine.
| voidhorse wrote:
| I recall there used to be a school of thought that argued that
| making programming languages more like natural language was a
| futile effort, as the benefits of having a precise, limited,
| deterministic, if abstract, language for describing our ideas
| were far superior to any "close enough" approximation we could
| achieve with natural language. Where have those people gone?
|
| When I step back and think about this LLM craze, the only stance
| I'm left with is that I find it baffling that people are so
| excited about what is ultimately _a stochastic process, and what
| will always be a stochastic process_. It 's like the world has
| suddenly shifted from valuing deterministic, precise, behaviors
| to preferring this sort of "close enough, good enough" cavalier
| attitude to _everything_. All it took was for something shiny and
| new to gloss over all our concerns around precision and
| certainty. Sure, LLMs are great for _getting approximations
| quickly_ , but approximations are still just approximations.
| Where have the lovers of certainty and deduction gone? I can't
| help but think our general laziness and acceptance of "close
| enough" fast solutions is going to bite us in the end.
| jiggawatts wrote:
| Deterministic processes are great at dealing with objective
| data, but less great at dealing with free-form text produced by
| humans.
|
| Each tool should be used for the right job. Until now, we had
| only cheap plastic tools for language processing. Suddenly, we
| have a turbo power tool that can parse through pages of English
| like a hot knife through butter.
|
| We're all excited by the shiny new tool in the workshop, and
| we're putting everything through it just to see what it can do.
| Eventually the exuberance will subside and we'll put it to work
| where it is the most applicable.
|
| That doesn't mean we'll abandon other tools and methods.
| frabjoused wrote:
| I think you're just not thinking hard enough about ways to use it
| -- use cases where "close enough" can be augmented by
| deterministic validation, cleanup and iteration to perform
| real-world work that is "all the way".
|
| I'm currently littering my platform with small, server-side
| decisions made by LLM prompts and it's doing real work that is
| working. There are a ton of other people doing this right now.
| You can be as angry as you want about it, but in a year or two
| you'll be using the result of this work every day.
| [deleted]
| unbearded wrote:
| Maybe should be called Prompt Science or Prompt Discovery or even
| Prompt Craft.
|
| I have a 40 million BERT-embedding spotify-annoy index that I
| keep experimenting with to make a better query vector.
|
| One thing I'm doing is taking only the token vectors with the
| highest sum across the whole vector and averaging those top
| vectors to use as the query vector.
|
| Another way is zeroing many dimensions randomly on the query
| vector to introduce diversity.
|
| But after experimenting with "prompt engineering" I found out
| that prefixing the sentences for the query vectors with "prompts"
| yields very interesting results.
|
| But I don't see much engineering. It's more trial, feedback and
| trying again. Maybe even Prompt Art. Just like on chatGPT.
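|
| Roughly, in numpy, the two query-vector tricks above look
| something like this (a sketch of the idea, not the actual
| pipeline):
|
|     import numpy as np
|
|     def top_k_average(token_vectors, k=5):
|         # Average the k token vectors with the highest sums.
|         sums = token_vectors.sum(axis=1)
|         top = token_vectors[np.argsort(sums)[-k:]]
|         return top.mean(axis=0)
|
|     def randomly_zeroed(query, drop_frac=0.3):
|         # Zero a random fraction of dimensions for diversity.
|         mask = np.random.rand(query.shape[0]) >= drop_frac
|         return query * mask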
| z3c0 wrote:
| I like "prompt injection", personally. It's not as pretentious
| as "prompt engineering".
| d0gbread wrote:
| I think that's already taken and more about hacking via
| variables in the prompt like SQL injection.
|
| I would just go with prompt tuning.
| avoinot wrote:
| From this post: If you want to learn some more advanced
| techniques, Prompt Engineering by Lilian Weng provides a
| fantastic overview.
| gregsadetsky wrote:
| https://lilianweng.github.io/posts/2023-03-15-prompt-enginee...
| ?
| slowhadoken wrote:
| ridiculous
| dang wrote:
| " _Please don 't post shallow dismissals, especially of other
| people's work. A good critical comment teaches us something._"
|
| " _When disagreeing, please reply to the argument instead of
| calling names. 'That is idiotic; 1 + 1 is 2, not 3' can be
| shortened to '1 + 1 is 2, not 3._"
|
| https://news.ycombinator.com/newsguidelines.html
| epberry wrote:
| My personal next step with LLMs is to use them as completion
| engines versus just asking them questions. Few shot prompting is
| another intermediate skill I want to incorporate more.
| doubtfuluser wrote:
| A bit of an unpopular opinion as it seems, but I would actually
| bet that the current prompt engineering is just a short term
| thing. When the performance of LLMs continues to improve, I
| actually expect that they will become much better at
| understanding not-so-well-formed prompts. Especially when you
| take into consideration that they are now trained with RLHF on
| _real_ users' input. So it will probably become less of an
| engineering problem and more an articulation of what exactly
| you want.
| thrashh wrote:
| I don't know.
|
| To talk to other humans, we literally have a whole writing
| field: courses that teach how to write technical
| documentation or research grants, and so much more.
|
| There's already a whole industry on how to talk to the
| human language model, and humans are currently way smarter.
| Hackbraten wrote:
| Even as LLMs get better over time at understanding ill-formed
| prompts, I expect that API prices will still continue to depend
| on the number of tokens used. That's an incentive to minimize
| tokens, so "prompt engineering" might stick around, even if
| just for cost optimization.
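|
| For example, counting prompt tokens with OpenAI's tiktoken
| library to estimate per-call cost (the per-1K-token price
| below is illustrative, not a current quote):
|
|     import tiktoken
|
|     enc = tiktoken.encoding_for_model("gpt-4")
|     prompt = "Classify the message as create-event or other: ..."
|     n_tokens = len(enc.encode(prompt))
|     # Example input price of $0.03 per 1K tokens.
|     cost = n_tokens / 1000 * 0.03
|     print(n_tokens, "tokens, ~$%.4f per call" % cost)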
| charcircuit wrote:
| Do you not expect a trend of token prices decreasing over
| time? There will be businesses using a less cutting-edge model,
| and the difference in how many words a prompt is won't be a
| big contributing factor to the total spend of the business.
| burtonator wrote:
| The next major leap in LLMs (in the next year) is probably
| going to be the prompt context size. Right now we have 2k, 4k,
| 8k ... but OpenAI also has a 32k model that they're not really
| giving access to unfortunately.
|
| The 8k model is nice but it's GPT4 so it's slow.
|
| I think the thing that you're missing is that zero shot
| learning is VERY hard but anything > GPT3 is actually pretty
| good once you give it some real world examples.
|
| I think prompt engineering is going to be here for a while just
| because, on a lot of tasks, examples are needed.
|
| Doesn't mean it needs to be a herculean effort of course. Just
| that you need to come up with some concrete examples.
|
| This is going to be ESPECIALLY true with Open Source LLMs that
| aren't anywhere near as sophisticated as GPT4.
|
| In fact, I think there's a huge opportunity to use GPT4 to
| train the prompts of smaller models, come up with more
| examples, and help improve their precision/recall without
| massive prompt engineering efforts.
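|
| For reference, "giving it some real world examples" can be as
| little as a handful of labelled demonstrations in the prompt
| itself (the task and labels here are illustrative):
|
|     # A minimal few-shot prompt: labelled examples, then input.
|     FEW_SHOT = """Classify each ticket as billing, bug, or other.
|
|     Ticket: "I was charged twice this month."
|     Label: billing
|
|     Ticket: "The export button crashes the app."
|     Label: bug
|
|     Ticket: "Do you have a jobs page?"
|     Label: other
|
|     Ticket: "{ticket}"
|     Label:"""
|
|     print(FEW_SHOT.format(ticket="My invoice shows the wrong VAT."))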
| kiratp wrote:
| You can't commercially use anything you train off OpenAI
| outputs.
| sebzim4500 wrote:
| You can as long as the resulting model does not compete
| with OpenAI.
| rufius wrote:
| Can you elaborate?
| 411111111111111 wrote:
| They're probably talking about the TOS a user would've
| had to agree to when using their services. It's actually
| a lot more permissive than I expected
|
| > _Restrictions. You may not (i) use the Services in a
| way that infringes, misappropriates or violates any
| person's rights; (ii) reverse assemble, reverse compile,
| decompile, translate or otherwise attempt to discover the
| source code or underlying components of models,
| algorithms, and systems of the Services (except to the
| extent such restrictions are contrary to applicable law);
| (iii) use output from the Services to develop models that
| compete with OpenAI;_
| kiratp wrote:
| Their API TOS basically forbid it. Simple as that.
| MacsHeadroom wrote:
| Someone who acquires these outputs who has never
| consented to their ToS is not bound by their ToS.
| reissbaker wrote:
| Sure, but the ways of acquiring those outputs legally
| have vampiric licensing that binds you to those ToS, since
| the re-licenser is bound by the original ToS.
|
| It's like distributing GPL code in a nonfree application.
| Even if you didn't "consent to [the original author's]
| ToS," you are still going to be bound to it via the
| redistributors license.
| throwawayForMe2 wrote:
| >> The next major leap in LLMs (in the next year) is probably
| going to be the prompt context size. Right now we have 2k,
| 4k, 8k ... but OpenAI also has a 32k model that they're not
| really giving access to unfortunately.
|
| Saw this article today about a different approach that opens
| up orders of magnitude larger contexts
|
| https://hazyresearch.stanford.edu/blog/2023-03-07-hyena
| delusional wrote:
| How does that make sense? LLMs are machines that produce
| output from input, and the position and distribution of that
| input in the latent space is highly predictive of the output.
| It seems fairly uncontroversial to expect that some knowledge
| of the tokens and their individual contribution to that
| distribution in combination with the others, and some intuition
| of the multivariate nonlinear behavior of the hidden layers, is
| exactly what would let you utilize this machine for anything
| useful.
|
| Regular people type all sorts of shit into google, but power
| users know how to query google effectively to work with the
| system. Knowing the right keywords is often half the work. I
| don't understand how the architecture of current LLMs is going
| to work around that feature.
| ransom1538 wrote:
| I expect the exact opposite. As more rules and regulations get
| put in, prompt engineering is going to be the new software
| development. "I would like you to pretend i need a lawyer
| dealing in a commercial lease that..."
| nicetryguy wrote:
| I remember being a "good google querier" before autocomplete
| rendered that mostly irrelevant. While I think you're right to
| some degree, you still have to articulate exactly what you want
| and need from this machine, and no amount of the LLM guessing
| what the intent was will ever replace specifically and
| explicitly stating your needs and goals. I see a continuing
| relationship with the complexity of the task tied to the
| required complexity of the request.
| james-revisoai wrote:
| Google autocomplete using your query history also reduces the
| information you learn from suggestions as you do the
| searching...
|
| While in the past "indexDB.set undefined in " might
| autocomplete to show safari first, indicating a vendor-
| specific bug, it'll often now prefill with some noun from
| whatever you last searched (e.g. "main window") to "help"
| you.
|
| Haven't found a way to disable that, which is annoying for
| understanding bugs, situations/context and root causes.
| ethbr0 wrote:
| Not just auto-complete, but Google removing power search
| capabilities (quotes, plus, etc).
|
| Here's hoping LLMs-as-a-service don't fall into the same
| trap.
|
| It's fine to optimize for the 80% of your users who write
| badly, but for god's sake _keep a bail-out for power users
| who want more control_.
|
| You don't have to make it the default... but just don't
| remove it!
| UltimateEdge wrote:
| Being able to compose a good query is still relevant I think!
| My peer once asked me for help with a mathematical problem,
| for which they could not find help online - after not much
| searching I could find a relevant page, given the same
| information/problem statement.
| [deleted]
| theK wrote:
| Not so sure about that. The biggest part of prompt engineering
| I am seeing is of the kind that sets up context to bootstrap a
| discussion on a predetermined domain.
|
| As I've said elsewhere, in most knowledge work context is key
| to getting viable results. I don't think something like this is
| ever going to get automated away, especially in the cases where
| the context comes from proprietary knowledge.
| dr_dshiv wrote:
| It isn't just engineering vs blind prompting. There is also
| "prompt vibing" where intuition comes into play.
| petetnt wrote:
| People spent years and years learning how to get the best
| answers with the least possible effort, and search engines
| evolved with them. Seems pretty insane to me that we have now
| devolved into asking insanely specific and obtuse questions to
| receive obtuse answers to any questions.
| skybrian wrote:
| Learning to say what you want is a skill. Much like you can get
| better at searching, you can get better at saying what you
| want.
|
| The framework described in the blog post seems like a more
| formal way to do it, but there are other ways to iterate in
| conversation. After seeing the first result, you can explain
| better what you want. If you're not expecting to repeat the
| query then maybe that's good enough?
|
| I expect there will be better UI's that encourage iteration.
| Maybe you see a list of suggested prompts that are similar and
| decide which one you really want?
| ryanjshaw wrote:
| It depends on how you define "short term". If you until until
| AGI, then sure. Until then, however, for anything that is going
| to potentially generate revenue you will need to consider the
| points raised by the article to keep costs manageable, to avoid
| performance regression, etc.
___________________________________________________________________
(page generated 2023-04-22 23:00 UTC)