[HN Gopher] O1 isn't a chat model (and that's the point)
___________________________________________________________________
O1 isn't a chat model (and that's the point)
Author : gmays
Score : 118 points
Date : 2025-01-18 18:04 UTC (4 hours ago)
(HTM) web link (www.latent.space)
(TXT) w3m dump (www.latent.space)
| ttul wrote:
| FWIW: OpenAI provides advice on how to prompt o1
| (https://platform.openai.com/docs/guides/reasoning/advice-
| on-...). Their first bit of advice is to, "Keep prompts simple
| and direct: The models excel at understanding and responding to
| brief, clear instructions without the need for extensive
| guidance."
| 3abiton wrote:
| But the way they did their PR for o1 made it sound like it was
| the next step, while in reality it was a side step: a branching
| off from the current direction towards AGI.
| jmcdonald-ut wrote:
| The article links out to OpenAI's advice on prompting, but it
| also claims: OpenAI does publish advice on
| prompting o1, but we find it incomplete, and in a
| sense you can view this article as a "Missing Manual"
| to lived experience using o1 and o1 pro in practice.
|
| To that end, the article does seem to contradict some of the
| advice OpenAI gives. E.g., the article recommends _stuffing the
| model with as much context as possible..._ while OpenAI's docs
| note to _include only the most relevant information to prevent
| the model from overcomplicating its response_.
|
| I haven't used o1 enough to have my own opinion.
| irthomasthomas wrote:
| Those are contradictory. OpenAI claims that you don't need a
| manual, since o1 performs best with simple prompts. The
| author claims it performs better with more complex prompts,
| but provides no evidence.
| orf wrote:
| In case you missed it: OpenAI does publish
| advice on prompting o1, but we find it incomplete,
| and in a sense you can view this article as a
| "Missing Manual" to lived experience using o1 and
| o1 pro in practice.
|
| The last line is important
| irthomasthomas wrote:
| But extraordinary claims require extraordinary proof.
| OpenAI tested the model for months and concluded that
| simple prompts are best. The author claims that complex
| prompts are best, but cites no evidence.
| orf wrote:
| I find it surprising that you think documentation issues
| are "extraordinary".
|
| You have read literally any documentation before, right?
| yzydserd wrote:
| I think there is a distinction between "instructions",
| "guidance" and "knowledge/context". I tend to provide o1 pro
| with a LOT of knowledge/context, a simple instruction, and no
| guidance. I think TFA is advocating the same.
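| A rough sketch of that prompt shape (purely illustrative; the
| file names, delimiters, and task are assumptions, not from TFA):
|
|   # Illustrative prompt shape: a large context block, one short
|   # instruction, and no step-by-step guidance.
|   from pathlib import Path
|
|   context_files = ["schema.sql", "api_handlers.py", "error.log"]  # hypothetical
|   context = "\n\n".join(
|       f"=== {name} ===\n{Path(name).read_text()}" for name in context_files
|   )
|
|   prompt = (
|       "CONTEXT:\n"
|       f"{context}\n\n"
|       "TASK: Find why POST /orders intermittently returns 500 "
|       "and propose a fix."
|   )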
| wahnfrieden wrote:
| The advice is wrong
| chikere232 wrote:
| So in a sense, being an early adopter of the previous models
| makes you worse at this one?
| goolulusaurs wrote:
| The reality is that o1 is a step away from general intelligence
| and back towards narrow ai. It is great for solving the kinds of
| math, coding and logic puzzles it has been designed for, but for
| many kinds of tasks, including chat and creative writing, it is
| actually worse than 4o. It is good at the specific kinds of
| reasoning tasks that it was built for, much like AlphaGo is
| great at playing Go, but that does not actually mean it is more
| generally intelligent.
| adrianN wrote:
| So-so general intelligence is a lot harder to sell than narrow
| competence.
| kilroy123 wrote:
| Yes, I don't understand their ridiculous AGI hype. I get it,
| you need to raise a lot of money.
|
| We need to crack the code for updating the base model on the
| fly or daily / weekly. Where is the regular learning by doing?
|
| Not over the course of a year, spending untold billions to do
| it.
| tomohelix wrote:
| Technically, the models can already learn on the fly. Just
| that the knowledge it can learn is limited to the context
| length. It cannot, to use the trendy word, "grok" it and
| internally adjust the weights in its neural network yet.
|
| To change this you would either need to let the model retrain
| itself every time it receives new information, or to have
| such a great context length that there is no effective
| difference. I suspect even meat models like our brains are
| still struggling to do this effectively and need a long rest
| cycle (i.e. sleep) to handle it. So the problem is inherently
| more difficult to solve than just "thinking". We may even
| need an entirely new architecture, different from the neural
| network, to achieve this.
| KuriousCat wrote:
| The only small problem is that models are neither thinking nor
| understanding; I am not sure how this kind of wording is
| allowed with these models.
| chikere232 wrote:
| > Technically, the models can already learn on the fly.
| Just that the knowledge it can learn is limited to the
| context length.
|
| Isn't that just improving the prompt to the non-learning
| model?
| ninetyninenine wrote:
| I understand the hype. I think most humans understand why a
| machine responding to a query like never before in the
| history of mankind is amazing.
|
| What you're going through is hype overdose. You're numb to
| it. Like I can get if someone disagrees but it's a next level
| lack of understanding human behavior if you don't get the
| hype at all.
|
| There exist living human beings, whether young children or
| people with brain damage, with intelligence comparable to an
| LLM's, and we classify those humans as conscious but we don't
| with LLMs.
|
| I'm not trying to say LLMs are conscious but just saying that
| the creation of LLMs marks a significant turning point. We
| crossed a barrier 2 years ago somewhat equivalent to landing
| on the moon, and I am just dumbfounded that someone doesn't
| understand why there is hype around this.
| bbarnett wrote:
| The first plane ever flies, and people think "we can fly to
| the moon soon!".
|
| Yet powered flight has nothing to do with space travel, no
| connection at all. Gliding in the air via low/high pressure
| doesn't mean you'll get near space, ever, with that tech.
| No matter how you try.
|
| AI and AGI are like this.
| ninetyninenine wrote:
| That's not true. There was not endless hype about flying
| to the moon when the first plane flew.
|
| People are well aware of the limits of LLMs.
|
| As slow as the progress is, we now have metrics and
| measurable progress towards AGI even when there are clear
| signs of limitations on LLMs. We never had this before
| and everyone is aware of this. No one is delusional about
| it.
|
| The delusion is more around people who think other people
| are making claims of going to the moon in a year or
| something. I can see it in 10 to 30 years.
| dTal wrote:
| And yet, the moon was reached a mere 66 years after the
| first powered flight. Perhaps it's a better heuristic
| than you are insinuating...
|
| In all honesty, there are _lots_ of connections between
| powered flight and space travel. Two obvious ones are
| "light and strong metallurgy" and "a solid mathematical
| theory of thermodynamics". Once you can build lightweight
| and efficient combustion chambers, a lot becomes
| possible...
|
| Similarly, with LLMs, it's clear we've hit some kind of
| phase shift in what's possible - we now have enough
| compute, enough data, and enough know-how to be able to
| copy human symbolic thought by sheer brute-force. At the
| same time, through algorithms as "unconnected" as
| airplanes and spacecraft, computers can now synthesize
| plausible images, plausible music, plausible human
| speech, plausible anything you like really. Our
| capabilities have massively expanded in a short timespan
| - we have _cracked_ something. Something big, like
| lightweight combustion chambers.
|
| The status quo ante is useless to predict what will
| happen next.
| bbarnett wrote:
| By that metric, there are lots of connections between
| space flight and any other aspect of modern society.
|
| No plane, relying upon air pressure to fly, can ever use
| that method to get to the moon. Ever. Never ever.
|
| If you think it is, you're adding things to make a plane
| capable of space flight.
| madeofpalk wrote:
| LLMs will not give us "artificial general intelligence",
| whatever that means.
| swalsh wrote:
| In my opinion it's probably closer to real AGI than not.
| I think the missing piece is learning after the pretraining
| phase.
| nurettin wrote:
| I think it means a self-sufficient mind, which LLMs
| inherently are not.
| righthand wrote:
| AGI currently is an intentionally vague and undefined goal.
| This allows businesses to operate towards a goal, define the
| parameters, and relish in the "rocket launches"-esque hype
| without leaving the vague umbrella of AI. It allows
| businesses to claim a double pursuit. Not only are they
| building AGI but all their work will surely benefit AI as
| well. How noble. Right?
|
| Its vagueness is intentional and allows you to ignore the
| blind truth and fill in the gaps yourself. You just have to
| believe it's right around the corner.
| pzs wrote:
| "If the human brain were so simple that we could understand
| it, we would be so simple that we couldn't." - without
| trying to defend such business practice, it appears very
| difficult to define what are necessary and sufficient
| properties that make AGI.
| UltraSane wrote:
| An AGI will be able to do any task any human can do. Or all
| tasks any human can do. An AGI will be able to get any
| college degree.
| nkrisc wrote:
| So it's not an AGI if it can't create an AGI?
| UltraSane wrote:
| Humans might create AGI without fully understanding how.
| ithkuil wrote:
| Thus a machine can solve tasks without "understanding"
| them
| swyx wrote:
| it must be wonderful to live life with such supreme unfounded
| confidence. really, no sarcasm, i wonder what that is like.
| to be so sure of something when many smarter people are not,
| and when we dont know how our own intelligence fully works or
| evolved, and dont know if ANY lessons from our own
| intelligence even apply to artificial ones.
|
| and yet, so confident. so secure. interesting.
| raincole wrote:
| Which sounds like... a very good thing?
| fpgaminer wrote:
| It does seem like individual prompting styles greatly affect the
| performance of these models. Which makes sense of course, but the
| disparity is a lot larger than I would have expected. As an
| example, I'd say I see far more people in the HN comments
| preferring Claude over everything else. This is in stark contrast
| to my experience, where ChatGPT has been and continues to be my
| go-to for everything. And that's on a range of problems: general
| questions, coding tasks, visual understanding, and creative
| writing. I use these AIs all day, every day as part of my
| research, so my experience is quite extensive. Yet in all cases
| Claude has performed significantly worse for me. Perhaps it just
| comes down to the way that I prompt versus the average HN user?
| Very odd.
|
| But yeah, o1 has been a _huge_ leap in my experience. One huge
| thing, which OpenAI's announcement mentions as well, is that o1
| is more _consistently_ strong. 4o is a great model, but sometimes
| you have to spin the wheel a few times. I much more rarely need
| to spin o1's wheel, which mostly makes up for its thinking time.
| (Which is much less these days compared to o1-preview). It also
| has much stronger knowledge. So far it has solved a number of
| troubleshooting tasks that there were _no_ fixes for online. One
| of them was an obscure bug in libjpeg.
|
| It's also better at just general questions, like wanting to know
| the best/most reputable store for something. 4o is too
| "everything is good! everything is happy!" to give helpful advice
| here. It'll say Temu is a "great store for affordable options."
| That kind of stuff. Whereas o1 will be more honest and thus
| helpful. o1 is also significantly better at following
| instructions overall, and inferring meaning behind instructions.
| 4o will be very literal about examples that you give it whereas
| o1 can more often extrapolate.
|
| One surprising thing that o1 does that 4o has never done, is that
| it _pushes back_. It tells me when I'm wrong (and is often
| right!). Again, part of that being less happy and compliant. I
| have had scenarios where it's wrong and it's harder to convince
| it otherwise, so it's a double edged sword, but overall it has
| been an improvement in the bot's usefulness.
|
| I also find it interesting that o1 is less censored. It refuses
| far less than 4o, even without coaxing, despite its supposed
| ability to "reason" about its guidelines :P What's funny is that
| the "inner thoughts" that it shows says that it's refusing, but
| its response doesn't.
|
| Is it worth $200? I don't think it is, in general. It's not
| really an "engineer" replacement yet, in that if you don't have
| the knowledge to ask o1 the right questions it won't really be
| helpful. So you have to be an engineer for it to work at the
| level of one. Maybe $50/mo?
|
| I haven't found o1-pro to be useful for anything; it's never
| really given better responses than o1 for me.
|
| (As an aside, Gemini 2.0 Flash Experimental is _very_ good. It's
| been trading blows with even o1 for some tasks. It's a bit
| chaotic, since its training isn't done, but I rank it at about #2
| between all SOTA models. A 2.0 Pro model would likely be tied
| with o1 if Google's trajectory here continues.)
| refulgentis wrote:
| This is a bug, and a regression, not a feature.
|
| It's odd to see it recast as "you need to give better
| instructions [because it's different]" -- you could drop the
| "because it's different" part, and it'd apply to failure modes in
| all models.
|
| It also begs the question of _how_ it's different: and that's
| where the rationale gets circular. You have to prompt it
| different because it's different because you have to prompt it
| different.
|
| And where that really gets into trouble is the "and that's the
| point" part -- as the other comment notes, it's expressly against
| OpenAI's documentation and thus intent.
|
| I'm a yuge AI fan. Models like this are a clear step forward. But
| it does a disservice to readers to leave the impression that the
| same techniques don't apply to other models, and recasts a
| significant issue as design intent.
| torginus wrote:
| I wouldn't be so harsh - you could have a 4o-style LLM turn
| vague user queries into precise constraints for an o1-style AI
| - this is how a lot of Stable Diffusion image generators work
| already.
| refulgentis wrote:
| Correct, you get it: it's turtles all the way down, not "it's
| different intentionally"
| inciampati wrote:
| Looking at o1's behavior, it seems there's a key architectural
| limitation: while it can see chat history, it doesn't seem able
| to access its own reasoning steps after outputting them. This
| is particularly significant because it breaks the computational
| expressivity that made chain-of-thought prompting work in the
| first place--the ability to build up complex reasoning through
| iterative steps.
|
| This will only improve when o1's context windows grow large
| enough to maintain all its intermediate thinking steps; we're
| talking orders of magnitude beyond current limits. Until then,
| this isn't just a UX quirk, it's a fundamental constraint on
| the model's ability to develop thoughts over time.
| refulgentis wrote:
| Is that relevant here? The post discussed writing a long
| prompt to get a good answer, not issues with, e.g., step #2
| forgetting what was done in step #1.
| skissane wrote:
| > This will only improve when o1's context windows grow large
| enough to maintain all its intermediate thinking steps, we're
| talking orders of magnitude beyond current limits.
|
| Rather than retaining _all_ those steps, what about just
| retaining a summary of them? Or put them in a vector DB so on
| follow-up it can retrieve the subset of them most relevant to
| the follow-up question?
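| A minimal sketch of that vector-DB idea, assuming OpenAI's
| embeddings endpoint and an in-memory cosine-similarity search
| standing in for a real vector database (the step summaries here
| are made up):
|
|   # Sketch: embed summaries of earlier reasoning steps, then retrieve
|   # the few most relevant to a follow-up question.
|   import numpy as np
|   from openai import OpenAI
|
|   client = OpenAI()
|
|   def embed(texts):
|       resp = client.embeddings.create(model="text-embedding-3-small",
|                                       input=texts)
|       return np.array([d.embedding for d in resp.data])
|
|   step_summaries = [  # hypothetical summaries of prior reasoning steps
|       "Narrowed the bug to the Huffman table parsing path.",
|       "Ruled out memory corruption using the valgrind output.",
|       "Found an off-by-one in the marker length check.",
|   ]
|   step_vectors = embed(step_summaries)
|
|   def most_relevant(follow_up, k=2):
|       q = embed([follow_up])[0]
|       sims = step_vectors @ q / (
|           np.linalg.norm(step_vectors, axis=1) * np.linalg.norm(q))
|       return [step_summaries[i] for i in np.argsort(sims)[::-1][:k]]
|
|   print(most_relevant("Why did you suspect the length check?"))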
| HarHarVeryFunny wrote:
| It's different because a chat model has been post-trained for
| chat, while o1/o3 have been post-trained for reasoning.
|
| Imagine trying to have a conversation with someone who's been
| told to assume that they should interpret anything said to them
| as a problem they need to reason about and solve. I doubt you'd
| give them high marks for conversational skill.
|
| Ideally one model could do it all, but for now the tech is
| apparently being trained using reinforcement learning to steer
| the response towards a singular training goal (human feedback
| gaming, or successful reasoning).
| adamgordonbell wrote:
| I'd love to see some examples, of good and bad prompting of o1
|
| I'll admit I'm probably not using o1 well, but I'd learn best
| from examples.
| swyx wrote:
| coauthor/editor here!
|
| we recorded a followup conversation after the surprise popularity
| of this article breaking down some more thoughts and behind the
| scenes: https://youtu.be/NkHcSpOOC60?si=3KvtpyMYpdIafK3U
| cebert wrote:
| Thanks for sharing this video, swyx. I learned a lot from
| listening to it. I hadn't considered checking prompts for a
| project into source control. This video has also changed my
| approach to prompting in the future.
| swyx wrote:
| thanks for watching!
|
| "prompts in source control" is kinda like "configs in source
| control" for me. recommended for small projects, but at scale
| eventually you wanna abstract it out into some kind of prompt
| manager software for others to use and even for yourself to
| track and manage over time. git isn't the right database for
| everything.
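| For the small-project case, "prompts in source control" can be
| as simple as this sketch (the directory layout and file names
| are assumptions):
|
|   # Sketch: keep prompts as plain text files in the repo and load them
|   # by name, so edits show up in normal code review and git history.
|   from pathlib import Path
|
|   PROMPT_DIR = Path("prompts")  # e.g. prompts/summarize_ticket.txt, tracked in git
|
|   def load_prompt(name: str, **variables) -> str:
|       template = (PROMPT_DIR / f"{name}.txt").read_text()
|       return template.format(**variables)
|
|   prompt = load_prompt("summarize_ticket", ticket_id="ENG-1234", tone="terse")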
| geor9e wrote:
| Instead of learning the latest workarounds for the kinks and
| quirks of a beta AI product, I'm going to wait 3 weeks for the
| advice to become completely obsolete
| thornewolf wrote:
| To be fair, the article basically says "ask the LLM for what
| you want in detail"
| fullstackwife wrote:
| Great advice, but difficult to apply given the very small
| context window of the o1 models.
| jameslk wrote:
| The churn is real. I wonder if so much churn from innovation
| in a space can deter adoption to the point that it actually
| reduces innovation.
| miltonlost wrote:
| A constantly changing "API" coupled with an inherently
| unreliable output is not conducive to a stable business.
| bbarnett wrote:
| Unless your business is customer service reps, with no
| ability to do anything but read scripts, who have no real
| knowledge of how things actually work.
|
| Then current AI is basically the same, for cheap.
| dartos wrote:
| Many service reps do have some expertise in the systems
| they support.
|
| Once you get past the tier 1 incoming calls, support is
| pretty specialized.
| bbarnett wrote:
| _Many service reps do have some expertise in the systems
| they support._
|
| I said "Unless your business is customer service reps,
| with...". It's a conditional. It doesn't mean all service
| reps are clueless, or scripted.
| ithkuil wrote:
| It's interesting that despite all these real issues you're
| pointing out a lot of people nevertheless are drawn to
| interact with this technology.
|
| It looks as if it touches some deep psychological lever: an
| assistant that can help carry out tasks without you having to
| bother learning the boring details of a craft.
|
| Unfortunately lead cannot yet be turned into gold
| dartos wrote:
| > a lot of people nevertheless are drawn to interact with
| this technology.
|
| To look at this statement cynically, a lot of people are
| drawn to anything with billions of dollars behind it...
| like literally anything.
|
| Not to mention the amount companies spend on marketing AI
| products.
|
| > It looks as if it touches some deep psychological
| lever: have an assistant that can help to carry out tasks
|
| That deep lever is "make value more cheaply and with less
| effort"
|
| From what I've seen, most of the professional interest in
| AI is based on cost cutting.
|
| There are a few (what I would call degenerate) groups who
| believe there is some consciousness behind these AIs, but
| they're a very small group.
| dartos wrote:
| It's churn because every new model may or may not break
| strategies that worked before.
|
| Nobody is designing how to prompt models. It's an emergent
| property of these models, so they could just change entirely
| from each generation of any model.
| kyle_grove wrote:
| IMO the lack of real version control and lack of reliable
| programmability have been significant impediments to impact
| and adoption. The control surfaces are more brittle than
| say, regex, which isn't a good place to be.
|
| I would quibble that there is a modicum of design in
| prompting; RLHF, DPO and ORPO are explicitly designing the
| models to be more promptable. But the methods don't yet
| adequately scale to the variety of user inputs, especially
| in a customer-facing context.
|
| My preference would be for the field to put more emphasis
| on control over LLMs, but it seems like the momentum is
| again on training LLM-based AGIs. Perhaps the Bitter Lesson
| has struck again.
| raincole wrote:
| There was a debate over whether to integrate Stable Diffusion
| into the curriculum in a local art school here.
|
| Personally while I consider AI a useful tool, I think it's
| quite pointless to teach it in school, because whatever you
| learn will be obsolete next month.
|
| Of course some people might argue that the whole art school
| (it's already quite a "job-seeking" type, mostly digital
| painting/Adobe After Effect) will be obsolete anyway...
| londons_explore wrote:
| All knowledge degrades with time. Medical books from the
| 1800s wouldn't be a lot of use today.
|
| There is just a different decay curve for different topics.
|
| Part of 'knowing' a field is to learn it _and then keep up
| with the field_.
| swyx wrote:
| > whatever you learn will be obsolete next month
|
| this is exactly the kind of attitude that turns university
| courses into dinosaurs with far less connection to the "real
| world" industry than ideal. frankly its an excuse for
| laziness and luddism at this point. much of what i learned
| about food groups and economics and politics and writing in
| school is obsolete at this point, should my teachers not have
| bothered at all? out of what? fear?
|
| the way stable diffusion works hasn't really changed, and in
| fact people have just built comfyui layers and workflows on
| top of it in the ensuing 3 years. the more you stick your
| head in the sand because you already predetermined the
| outcome, the more debt you pile up that your students will
| have to learn on their own, because you were too insecure
| to make a call and trust that your students can adjust
| as needed
| simonw wrote:
| The skill that's worth learning is how to investigate,
| experiment and think about these kinds of tools.
|
| A "Stable Diffusion" class might be a waste of time, but a
| "Generative art" class where students are challenged to
| explore what's available, share their own experiments and
| discuss under what circumstances these tools could be useful,
| harmful, productive, misleading etc feels like it would be
| very relevant to me, no matter where the technology goes
| next.
| moritzwarhier wrote:
| Very true regarding the subjects of a hypothetical AI art
| class.
|
| What's also important is the teaching of how commercial art
| or art in general is conceptualized, in other words:
|
| What is important and why? Design thinking. I know that
| phrase might sound dated, but that's the work humans should
| fear being replaced on, and where they should foster their
| skills.
|
| That's also the line that at first seems to be blurred when
| using generative text-to-image AI, or LLMs in general.
|
| The seemingly magical connection between prompt and result
| appears to human users like the work of a creative entity
| distilling and developing an idea.
|
| That's the most important aspect of all creative work.
|
| If you read my reply, thanks Simon, your blog's an amazing
| companion in the boom of generative AI. Was a regular
| reader in 2022/2023, should revisit! I think you guided me
| through my first local LLaMA setup.
| dyauspitr wrote:
| Integrating it into the curriculum is strange. They should do
| one time introductory lectures instead.
| icpmacdo wrote:
| Modern AI both shortens the useful lifespan of software and
| increases the importance of development speed. Waiting around
| doesn't seem optimal right now.
| QuantumGood wrote:
| Great summary of how AI compresses the development (and hype)
| product cycle
| samrolken wrote:
| I have a lot of luck using 4o to build and iterate on context and
| then carry that into o1. I'll ask 4o to break down concepts, make
| outlines, identify missing information and think of more angles
| and options. Then at the end, switch on o1 which can use all that
| context.
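| A minimal sketch of that two-stage flow with the OpenAI Python
| SDK (model names and the example task are assumptions;
| substitute whatever your account exposes):
|
|   # Sketch: iterate with a chat model to build up context, then hand
|   # the accumulated conversation to a reasoning model at the end.
|   from openai import OpenAI
|
|   client = OpenAI()
|   history = [{"role": "user", "content":
|               "Break down the moving parts of a rate limiter for a "
|               "multi-tenant API and list the open questions."}]
|
|   # Stage 1: cheap iteration with 4o to expand outlines and surface angles.
|   draft = client.chat.completions.create(model="gpt-4o", messages=history)
|   history.append({"role": "assistant",
|                   "content": draft.choices[0].message.content})
|   history.append({"role": "user", "content":
|                   "Using everything above, design the rate limiter and "
|                   "justify the data structures."})
|
|   # Stage 2: switch to o1 with all of that context in place.
|   final = client.chat.completions.create(model="o1", messages=history)
|   print(final.choices[0].message.content)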
| sklargh wrote:
| This echoes my experience. I often use ChatGPT to help with D&D
| module design and I found that o1 did best when I told it exactly
| what I required, dumped in a large amount of info, and did not
| expect to use it to iterate multiple times.
| keizo wrote:
| I made a tool for manually collecting context. I use it when
| copying and pasting multiple files is cumbersome:
| https://pypi.org/project/ggrab/
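| The core of what a tool like that automates is roughly the
| following sketch (a generic stand-in, not ggrab's actual code):
|
|   # Sketch: concatenate selected files into one labelled block, ready
|   # to paste into a prompt as context.
|   import sys
|   from pathlib import Path
|
|   def grab(paths):
|       parts = [f"--- {p} ---\n{Path(p).read_text()}" for p in paths]
|       return "\n\n".join(parts)
|
|   if __name__ == "__main__":
|       print(grab(sys.argv[1:]))  # e.g. python grab.py src/*.py README.md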
| franze wrote:
| I created thisismy.franzai.com for the same reason.
| miltonlost wrote:
| Oh god, using an LLM for medical advice? And maybe getting 3/5
| right? Barely above a coin flip.
|
| And that Warning section? "Do not be wrong. Give the correct
| names." That this is necessary to include is an idiotic product
| "choice" since its non-inclusion implies the bot is able to be
| wrong and give wrong names. This is not engineering.
| isoprophlex wrote:
| Not if you're selecting out of 10s or 100s of possible
| diagnoses
| miltonlost wrote:
| ?????? What?
|
| > Just for fun, I started asking o1 in parallel. It's usually
| shockingly close to the right answer -- maybe 3/5 times. More
| useful for medical professionals -- it almost always provides
| an extremely accurate differential diagnosis.
|
| THIS IS DANGEROUS TO TELL PEOPLE TO DO. OpenAI is not a
| medical professional. Stop using chatbots for medical
| diagnoses. 60% is not almost always extremely accurate. This
| whole post, because of this bullet point, shows the author
| doesn't actually know the limitations of the product they're
| using and instead passing along misinformation.
|
| Go to a doctor, not your chatbot.
| simonw wrote:
| I honestly think trusting exclusively your own doctor is a
| dangerous thing to do as well. Doctors are not infallible.
|
| It's worth putting in some extra effort yourself, which may
| include consulting with LLMs provided you don't trust those
| blindly and are sensible about how you incorporate hints
| they give you into your own research.
|
| Nobody is as invested in your own health as you are.
| PollardsRho wrote:
| It's hard to characterize the entropy of the distribution of
| potential diseases given a presentation: even if there are in
| theory many potential diagnoses, in practice a few will be a
| lot more common.
|
| It doesn't really matter how much better the model is than
| random chance on a sample size of 5, though. There's a reason
| medicine is so heavily licensed: people die when they get
| uninformed advice. Asking o1 if you have skin cancer is
| gambling with your life.
|
| That's not to say AI can't be useful in medicine: everyone
| doesn't have a dermatologist friend, after all, and I'm sure
| for many underserved people basic advice is better than
| nothing. Tools could make the current medical system more
| efficient. But you would need to do so much more work than
| whatever this post did to ascertain whether that would do
| more good than harm. Can o1 properly direct people to a
| medical expert if there's a potentially urgent problem that
| can't be ruled out? Can it effectively disclaim its own
| advice when asked about something it doesn't know about, the
| way human doctors refer to specialists?
| isoprophlex wrote:
| People are agreeing and disagreeing about the central thesis of
| the article, which is fine because I enjoy the discussion...
|
| No matter where you stand in the specific o1/o3 discussion, the
| concept of "question entropy" is very enlightening.
|
| What is the question of theoretically minimum complexity that
| still solves your problem adequately? Or, for a specific model,
| are its users capable of supplying the minimum required
| intellectual complexity the model needs?
|
| Would be interesting to quantify these two and see if our models
| are close to converging on certain task domains.
| iovrthoughtthis wrote:
| this is hilarious
| irthomasthomas wrote:
| Can you provide prompt/response pairs? I'd like to test how other
| models perform using the same technique.
| swalsh wrote:
| Work with chat bots like a junior dev, work with o1 like a senior
| dev.
| timewizard wrote:
| > To justify the $200/mo price tag, it just has to provide 1-2
| Engineer hours a month
|
| > Give a ton of context. Whatever you think I mean by a "ton" --
| 10x that.
|
| One step forward. Two steps back.
| martythemaniak wrote:
| One thing I'd like to experiment with is "prompt to service". I
| want to take an existing microservice of about 3-5kloc and see if
| I can write a prompt to get o1 to generate the entire service,
| proper structure, all files, all tests, compiles and passes etc.
| o1 certainly has the context window to do this at 200k input and
| 100k output - code is ~10 tokens per line of code, so you'd need
| like 100k input and 50k output tokens.
|
| My approach would be:
|
| - take an exemplar service, dump it in the context
|
| - provide examples explaining specific things in the exemplar
| service
|
| - write a detailed formal spec
|
| - ask for the output in JSON to simplify writing the code -
| [{"filename":"./src/index.php", "contents":"<?php...."}]
|
| The first try would inevitably fail, so I'd provide errors and
| feedback, and ask for new code (i.e., the complete service, not diffs or
| explanations), plus have o1 update and rewrite the spec based on
| my feedback and errors.
|
| Curious if anyone's tried something like this.
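| One way to wire up the "JSON list of files" step is sketched
| below (assumes the OpenAI Python SDK; the exemplar and spec
| paths are placeholders, and a real run would need to handle
| malformed or fenced JSON):
|
|   # Sketch: ask for the whole service as a JSON list of files, then
|   # write each file to disk. Exemplar and spec contents are placeholders.
|   import json
|   from pathlib import Path
|   from openai import OpenAI
|
|   client = OpenAI()
|   exemplar = Path("exemplar_service.txt").read_text()  # dump of a reference service
|   spec = Path("service_spec.md").read_text()           # the detailed formal spec
|
|   prompt = (
|       "Exemplar service for structure and conventions:\n" + exemplar +
|       "\n\nGenerate a complete new service implementing this spec:\n" + spec +
|       '\n\nReturn ONLY JSON: [{"filename": "./src/...", "contents": "..."}]'
|   )
|   resp = client.chat.completions.create(
|       model="o1", messages=[{"role": "user", "content": prompt}])
|
|   for entry in json.loads(resp.choices[0].message.content):
|       out = Path(entry["filename"])
|       out.parent.mkdir(parents=True, exist_ok=True)
|       out.write_text(entry["contents"])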
| patrickhogan1 wrote:
| The buggy nature of o1 in ChatGPT is what prevents me from using
| it the most.
|
| Waiting is one thing, but waiting to return to a prompt that
| never completes is frustrating. It's the same frustration you get
| from a long running 'make/npm/brew/pip' command that errors out
| right as it's about to finish.
|
| One pattern that's been effective:
|
| 1. Use Claude Developer Prompt Generator to create a prompt for
| what I want.
|
| 2. Run the prompt on o1 pro mode
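| Roughly, that pattern looks like the sketch below if you drive
| both steps from the APIs rather than the consoles (model names
| are assumptions, and o1 pro mode isn't an API model here, so
| plain o1 stands in):
|
|   # Sketch: have Claude expand a rough request into a detailed prompt,
|   # then run that prompt on an o1-class model.
|   import anthropic
|   from openai import OpenAI
|
|   rough_request = "Refactor my Flask app's auth module to support API keys."
|
|   claude = anthropic.Anthropic()
|   generated = claude.messages.create(
|       model="claude-3-5-sonnet-latest",  # model alias is an assumption
|       max_tokens=2000,
|       messages=[{"role": "user", "content":
|                  "Turn this into a detailed, structured prompt with "
|                  "context, constraints, and output format:\n" + rough_request}],
|   )
|   detailed_prompt = generated.content[0].text
|
|   answer = OpenAI().chat.completions.create(
|       model="o1",  # stand-in; o1 pro mode lives in the ChatGPT UI
|       messages=[{"role": "user", "content": detailed_prompt}],
|   )
|   print(answer.choices[0].message.content)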
| inciampati wrote:
| o1 appears not to be able to see its own reasoning traces. Or
| its own context is potentially being summarized to deal with the
| cost of giving access to all those chain-of-thought traces and
| the chat history. This breaks the computational expressivity of
| chain of thought, which supports universal (general) reasoning if
| you have reliable access to the things you've thought, and is
| only a threshold-circuit (TC0) bounded parallel pattern matcher
| when you don't.
___________________________________________________________________
(page generated 2025-01-18 23:01 UTC)