[HN Gopher] O1 isn't a chat model (and that's the point)
       ___________________________________________________________________
        
       O1 isn't a chat model (and that's the point)
        
       Author : gmays
       Score  : 118 points
       Date   : 2025-01-18 18:04 UTC (4 hours ago)
        
 (HTM) web link (www.latent.space)
 (TXT) w3m dump (www.latent.space)
        
       | ttul wrote:
       | FWIW: OpenAI provides advice on how to prompt o1
       | (https://platform.openai.com/docs/guides/reasoning/advice-
       | on-...). Their first bit of advice is to, "Keep prompts simple
       | and direct: The models excel at understanding and responding to
       | brief, clear instructions without the need for extensive
       | guidance."
        
         | 3abiton wrote:
          | But the way they did their PR for o1 made it sound like it was
          | the next step, when in reality it was a side step: a branch off
          | the current direction towards AGI.
        
         | jmcdonald-ut wrote:
          | The article links out to OpenAI's advice on prompting, but it
          | also claims:
          | 
          |     OpenAI does publish advice on prompting o1, but we find
          |     it incomplete, and in a sense you can view this article
          |     as a "Missing Manual" to lived experience using o1 and
          |     o1 pro in practice.
         | 
         | To that end, the article does seem to contradict some of the
         | advice OpenAI gives. E.g., the article recommends _stuffing the
          | model with as much context as possible..._ while OpenAI's docs
         | note to _include only the most relevant information to prevent
         | the model from overcomplicating its response_.
         | 
         | I haven't used o1 enough to have my own opinion.
        
           | irthomasthomas wrote:
            | Those are contradictory. OpenAI claims that you don't need a
            | manual, since o1 performs best with simple prompts. The
            | author claims it performs better with more complex prompts,
            | but provides no evidence.
        
             | orf wrote:
              | In case you missed it:
              | 
              |     OpenAI does publish advice on prompting o1, but we
              |     find it incomplete, and in a sense you can view this
              |     article as a "Missing Manual" to lived experience
              |     using o1 and o1 pro in practice.
             | 
             | The last line is important
        
               | irthomasthomas wrote:
               | But extraordinary claims require extraordinary proof.
                | OpenAI tested the model for months and concluded that
               | simple prompts are best. The author claims that complex
               | prompts are best, but cites no evidence.
        
               | orf wrote:
               | I find it surprising that you think documentation issues
               | are "extraordinary".
               | 
               | You have read literally any documentation before, right?
        
         | yzydserd wrote:
         | I think there is a distinction between "instructions",
         | "guidance" and "knowledge/context". I tend to provide o1 pro
         | with a LOT of knowledge/context, a simple instruction, and no
          | guidance. I think TFA is advocating the same.
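          | 
          | Concretely, my prompts end up shaped roughly like this (a
          | sketch; the file names and the delimiter are just my own
          | convention, nothing the model requires):
          | 
          |     # knowledge/context: as much relevant material as fits
          |     context = "\n\n".join([
          |         open("schema.sql").read(),
          |         open("api_handlers.py").read(),
          |         open("error_log.txt").read(),
          |     ])
          | 
          |     # instruction: one simple ask, with no guidance about
          |     # how to reason ("think step by step" etc.)
          |     instruction = "Why do requests to /orders time out?"
          | 
          |     prompt = f"{context}\n\n---\n\n{instruction}"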
        
         | wahnfrieden wrote:
         | The advice is wrong
        
         | chikere232 wrote:
         | So in a sense, being an early adopter for the previous models
         | makes you worse at this one?
        
       | goolulusaurs wrote:
       | The reality is that o1 is a step away from general intelligence
       | and back towards narrow ai. It is great for solving the kinds of
       | math, coding and logic puzzles it has been designed for, but for
       | many kinds of tasks, including chat and creative writing, it is
       | actually worse than 4o. It is good at the specific kinds of
        | reasoning tasks that it was built for, much like AlphaGo is
        | great at playing Go, but that does not actually mean it is more
       | generally intelligent.
        
         | adrianN wrote:
         | So-so general intelligence is a lot harder to sell than narrow
         | competence.
        
         | kilroy123 wrote:
          | Yes, I don't understand their ridiculous AGI hype. I get it,
          | you need to raise a lot of money.
         | 
         | We need to crack the code for updating the base model on the
         | fly or daily / weekly. Where is the regular learning by doing?
         | 
         | Not over the course of a year, spending untold billions to do
         | it.
        
           | tomohelix wrote:
           | Technically, the models can already learn on the fly. Just
           | that the knowledge it can learn is limited to the context
           | length. It cannot, to use the trendy word, "grok" it and
           | internally adjust the weights in its neural network yet.
           | 
           | To change this you would either need to let the model retrain
           | itself every time it receives new information, or to have
           | such a great context length that there is no effective
            | difference. I suspect even meat models like our brains are
            | still struggling to do this effectively and need a long rest
            | cycle (i.e. sleep) to handle it. So the problem is inherently
            | more difficult to solve than just "thinking". We may even
            | need an entirely new architecture, different from the neural
            | network, to achieve this.
        
             | KuriousCat wrote:
              | The only small problem is that models are neither thinking
              | nor understanding. I am not sure how this kind of wording
              | is allowed with these models.
        
             | chikere232 wrote:
             | > Technically, the models can already learn on the fly.
             | Just that the knowledge it can learn is limited to the
             | context length.
             | 
             | Isn't that just improving the prompt to the non-learning
             | model?
        
           | ninetyninenine wrote:
           | I understand the hype. I think most humans understand why a
           | machine responding to a query like never before in the
           | history of mankind is amazing.
           | 
            | What you're going through is hype overdose. You're numb to
            | it. I can get it if someone disagrees, but it's a next-level
            | lack of understanding of human behavior if you don't get the
            | hype at all.
           | 
            | There exist living human beings, whether children or people
            | with brain damage, with intelligence comparable to an LLM,
            | and we classify those humans as conscious but we don't with
            | LLMs.
            | 
            | I'm not trying to say LLMs are conscious, just that the
            | creation of LLMs marks a significant turning point. We
            | crossed a barrier 2 years ago somewhat equivalent to landing
            | on the moon, and I am just dumbfounded that someone doesn't
            | understand why there is hype around this.
        
             | bbarnett wrote:
             | The first plane ever flies, and people think "we can fly to
             | the moon soon!".
             | 
             | Yet powered flight has nothing to do with space travel, no
             | connection at all. Gliding in the air via low/high pressure
             | doesn't mean you'll get near space, ever, with that tech.
             | No matter how you try.
             | 
             | AI and AGI are like this.
        
               | ninetyninenine wrote:
               | That's not true. There was not endless hype about flying
               | to the moon when the first plane flew.
               | 
               | People are well aware of the limits of LLMs.
               | 
                | As slow as the progress is, we now have metrics and
                | measurable progress towards AGI, even when there are
                | clear signs of limitations on LLMs. We never had this
                | before, and everyone is aware of it. No one is delusional
                | about it.
               | 
               | The delusion is more around people who think other people
               | are making claims of going to the moon in a year or
               | something. I can see it in 10 to 30 years.
        
               | dTal wrote:
               | And yet, the moon was reached a mere 66 years after the
               | first powered flight. Perhaps it's a better heuristic
               | than you are insinuating...
               | 
               | In all honesty, there are _lots_ of connections between
               | powered flight and space travel. Two obvious ones are
               | "light and strong metallurgy" and "a solid mathematical
               | theory of thermodynamics". Once you can build lightweight
               | and efficient combustion chambers, a lot becomes
               | possible...
               | 
               | Similarly, with LLMs, it's clear we've hit some kind of
               | phase shift in what's possible - we now have enough
               | compute, enough data, and enough know-how to be able to
               | copy human symbolic thought by sheer brute-force. At the
               | same time, through algorithms as "unconnected" as
               | airplanes and spacecraft, computers can now synthesize
               | plausible images, plausible music, plausible human
               | speech, plausible anything you like really. Our
               | capabilities have massively expanded in a short timespan
               | - we have _cracked_ something. Something big, like
               | lightweight combustion chambers.
               | 
               | The status quo ante is useless to predict what will
               | happen next.
        
               | bbarnett wrote:
               | By that metric, there are lots of connections between
               | space flight and any other aspect of modern society.
               | 
               | No plane, relying upon air pressure to fly, can ever use
               | that method to get to the moon. Ever. Never ever.
               | 
                | If you think it can, you're adding things to make a plane
                | capable of space flight.
        
         | madeofpalk wrote:
         | LLMs will not give us "artificial general intelligence",
         | whatever that means.
        
           | swalsh wrote:
            | In my opinion it's probably closer to real AGI than not. I
            | think the missing piece is learning after the pretraining
            | phase.
        
           | nurettin wrote:
           | I think it means a self-sufficient mind, which LLMs
           | inherently are not.
        
           | righthand wrote:
           | AGI currently is an intentionally vague and undefined goal.
           | This allows businesses to operate towards a goal, define the
            | parameters, and revel in the "rocket launches"-esque hype
           | without leaving the vague umbrella of AI. It allows
           | businesses to claim a double pursuit. Not only are they
           | building AGI but all their work will surely benefit AI as
           | well. How noble. Right?
           | 
            | Its vagueness is intentional: it allows you to ignore the
            | plain truth and fill in the gaps yourself. You just have to
            | believe it's right around the corner.
        
             | pzs wrote:
             | "If the human brain were so simple that we could understand
             | it, we would be so simple that we couldn't." - without
             | trying to defend such business practice, it appears very
             | difficult to define what are necessary and sufficient
             | properties that make AGI.
        
           | UltraSane wrote:
            | An AGI will be able to do any task any human can do, or all
            | tasks any human can do. An AGI will be able to get any
           | college degree.
        
             | nkrisc wrote:
             | So it's not an AGI if it can't create an AGI?
        
               | UltraSane wrote:
               | Humans might create AGI without fully understanding how.
        
               | ithkuil wrote:
               | Thus a machine can solve tasks without "understanding"
               | them
        
           | swyx wrote:
           | it must be wonderful to live life with such supreme unfounded
           | confidence. really, no sarcasm, i wonder what that is like.
           | to be so sure of something when many smarter people are not,
            | and when we don't know how our own intelligence fully works
            | or evolved, and don't know if ANY lessons from our own
            | intelligence even apply to artificial ones.
           | 
           | and yet, so confident. so secure. interesting.
        
         | raincole wrote:
         | Which sounds like... a very good thing?
        
       | fpgaminer wrote:
        | It does seem like individual prompting styles greatly affect the
        | performance of these models. Which makes sense of course, but the
        | disparity is a lot larger than I would have expected. As an
        | example, I'd say I see far more people in the HN comments
        | preferring Claude over everything else. This is in stark contrast
        | to my experience, where ChatGPT has been and continues to be my
        | go-to for everything. And that's on a range of problems: general
       | questions, coding tasks, visual understanding, and creative
       | writing. I use these AIs all day, every day as part of my
       | research, so my experience is quite extensive. Yet in all cases
       | Claude has performed significantly worse for me. Perhaps it just
       | comes down to the way that I prompt versus the average HN user?
       | Very odd.
       | 
       | But yeah, o1 has been a _huge_ leap in my experience. One huge
       | thing, which OpenAI's announcement mentions as well, is that o1
       | is more _consistently_ strong. 4o is a great model, but sometimes
       | you have to spin the wheel a few times. I much more rarely need
       | to spin o1's wheel, which mostly makes up for its thinking time.
       | (Which is much less these days compared to o1-preview). It also
        | has much stronger knowledge. So far it has solved a number of
        | troubleshooting tasks for which there were _no_ fixes online. One
        | of them was an obscure bug in libjpeg.
       | 
       | It's also better at just general questions, like wanting to know
       | the best/most reputable store for something. 4o is too
       | "everything is good! everything is happy!" to give helpful advice
       | here. It'll say Temu is a "great store for affordable options."
       | That kind of stuff. Whereas o1 will be more honest and thus
       | helpful. o1 is also significantly better at following
       | instructions overall, and inferring meaning behind instructions.
       | 4o will be very literal about examples that you give it whereas
       | o1 can more often extrapolate.
       | 
        | One surprising thing that o1 does that 4o has never done is that
       | it _pushes back_. It tells me when I'm wrong (and is often
       | right!). Again, part of that being less happy and compliant. I
       | have had scenarios where it's wrong and it's harder to convince
        | it otherwise, so it's a double-edged sword, but overall it has
       | been an improvement in the bot's usefulness.
       | 
       | I also find it interesting that o1 is less censored. It refuses
       | far less than 4o, even without coaxing, despite its supposed
       | ability to "reason" about its guidelines :P What's funny is that
       | the "inner thoughts" that it shows says that it's refusing, but
       | its response doesn't.
       | 
       | Is it worth $200? I don't think it is, in general. It's not
       | really an "engineer" replacement yet, in that if you don't have
       | the knowledge to ask o1 the right questions it won't really be
       | helpful. So you have to be an engineer for it to work at the
       | level of one. Maybe $50/mo?
       | 
       | I haven't found o1-pro to be useful for anything; it's never
       | really given better responses than o1 for me.
       | 
       | (As an aside, Gemini 2.0 Flash Experimental is _very_ good. It's
       | been trading blows with even o1 for some tasks. It's a bit
       | chaotic, since its training isn't done, but I rank it at about #2
        | among all SOTA models. A 2.0 Pro model would likely be tied
       | with o1 if Google's trajectory here continues.)
        
       | refulgentis wrote:
       | This is a bug, and a regression, not a feature.
       | 
       | It's odd to see it recast as "you need to give better
       | instructions [because it's different]" -- you could drop the
       | "because it's different" part, and it'd apply to failure modes in
       | all models.
       | 
        | It also raises the question of _how_ it's different: and that's
        | where the rationale gets circular. You have to prompt it
        | differently because it's different, because you have to prompt it
        | differently.
       | 
       | And where that really gets into trouble is the "and that's the
       | point" part -- as the other comment notes, it's expressly against
       | OpenAI's documentation and thus intent.
       | 
       | I'm a yuge AI fan. Models like this are a clear step forward. But
       | it does a disservice to readers to leave the impression that the
       | same techniques don't apply to other models, and recasts a
       | significant issue as design intent.
        
         | torginus wrote:
          | I wouldn't be so harsh - you could have a 4o-style LLM turn
          | vague user queries into precise constraints for an o1-style AI
          | - this is how a lot of Stable Diffusion image generators work
          | already.
        
           | refulgentis wrote:
            | Correct, you get it: it's turtles all the way down, not "it's
           | different intentionally"
        
         | inciampati wrote:
         | Looking at o1's behavior, it seems there's a key architectural
         | limitation: while it can see chat history, it doesn't seem able
         | to access its own reasoning steps after outputting them. This
         | is particularly significant because it breaks the computational
         | expressivity that made chain-of-thought prompting work in the
         | first place--the ability to build up complex reasoning through
         | iterative steps.
         | 
          | This will only improve when o1's context windows grow large
          | enough to maintain all its intermediate thinking steps, and
          | we're talking orders of magnitude beyond current limits. Until
          | then, this isn't just a UX quirk; it's a fundamental constraint
          | on the model's ability to develop thoughts over time.
        
           | refulgentis wrote:
            | Is that relevant here? The post discussed writing a long
            | prompt to get a good answer, not issues with, e.g., step #2
            | forgetting what was done in step #1.
        
           | skissane wrote:
           | > This will only improve when o1's context windows grow large
           | enough to maintain all its intermediate thinking steps, we're
           | talking orders of magnitude beyond current limits.
           | 
           | Rather than retaining _all_ those steps, what about just
           | retaining a summary of them? Or put them in a vector DB so on
           | follow-up it can retrieve the subset of them most relevant to
           | the follow-up question?
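            | 
            | A minimal sketch of the vector-DB variant (the step texts
            | are invented, and the embeddings model is just one plausible
            | choice):
            | 
            |     import numpy as np
            |     from openai import OpenAI
            | 
            |     client = OpenAI()
            | 
            |     def embed(texts):
            |         r = client.embeddings.create(
            |             model="text-embedding-3-small", input=texts)
            |         return np.array([d.embedding for d in r.data])
            | 
            |     # intermediate reasoning steps saved from earlier turns
            |     steps = ["Step 1: ruled out a locking bug ...",
            |              "Step 2: narrowed it to the JPEG decoder ..."]
            |     step_vecs = embed(steps)
            | 
            |     def relevant_steps(follow_up, k=1):
            |         # cosine similarity between the follow-up question
            |         # and each stored step; return the top k
            |         q = embed([follow_up])[0]
            |         sims = step_vecs @ q / (
            |             np.linalg.norm(step_vecs, axis=1)
            |             * np.linalg.norm(q))
            |         return [steps[i] for i in np.argsort(-sims)[:k]]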
        
         | HarHarVeryFunny wrote:
         | It's different because a chat model has been post-trained for
         | chat, while o1/o3 have been post-trained for reasoning.
         | 
         | Imagine trying to have a conversation with someone who's been
         | told to assume that they should interpret anything said to them
         | as a problem they need to reason about and solve. I doubt you'd
         | give them high marks for conversational skill.
         | 
         | Ideally one model could do it all, but for now the tech is
         | apparently being trained using reinforcement learning to steer
         | the response towards a singular training goal (human feedback
         | gaming, or successful reasoning).
        
       | adamgordonbell wrote:
        | I'd love to see some examples of good and bad prompting of o1.
        | 
        | I'll admit I'm probably not using o1 well, but I'd learn best
        | from examples.
        
       | swyx wrote:
       | coauthor/editor here!
       | 
        | we recorded a follow-up conversation after the surprise popularity
       | of this article breaking down some more thoughts and behind the
       | scenes: https://youtu.be/NkHcSpOOC60?si=3KvtpyMYpdIafK3U
        
         | cebert wrote:
         | Thanks for sharing this video, swyx. I learned a lot from
         | listening to it. I hadn't considered checking prompts for a
         | project into source control. This video has also changed my
         | approach to prompting in the future.
        
           | swyx wrote:
           | thanks for watching!
           | 
           | "prompts in source control" is kinda like "configs in source
           | control" for me. recommended for small projects, but at scale
           | eventually you wanna abstract it out into some kind of prompt
           | manager software for others to use and even for yourself to
            | track and manage over time. git isn't the right database for
           | everything.
        
       | geor9e wrote:
       | Instead of learning the latest workarounds for the kinks and
       | quirks of a beta AI product, I'm going to wait 3 weeks for the
       | advice to become completely obsolete
        
         | thornewolf wrote:
         | To be fair, the article basically says "ask the LLM for what
         | you want in detail"
        
           | fullstackwife wrote:
            | great advice, but difficult to apply given the very small
            | context window of o1 models
        
         | jameslk wrote:
          | The churn is real. I wonder if so much churn from innovation
          | in a space can suppress adoption enough that it actually
          | reduces innovation.
        
           | miltonlost wrote:
            | A constantly changing "API" coupled with an inherently
            | unreliable output is not conducive to a stable business.
        
             | bbarnett wrote:
             | Unless your business is customer service reps, with no
             | ability to do anything but read scripts, who have no real
             | knowledge of how things actually work.
             | 
             | Then current AI is basically the same, for cheap.
        
               | dartos wrote:
               | Many service reps do have some expertise in the systems
               | they support.
               | 
               | Once you get past the tier 1 incoming calls, support is
               | pretty specialized.
        
               | bbarnett wrote:
               | _Many service reps do have some expertise in the systems
               | they support._
               | 
               | I said "Unless your business is customer service reps,
               | with...". It's a conditional. It doesn't mean all service
               | reps are clueless, or scripted.
        
             | ithkuil wrote:
              | It's interesting that despite all these real issues you're
              | pointing out, a lot of people are nevertheless drawn to
              | interact with this technology.
              | 
              | It looks as if it touches some deep psychological lever:
              | having an assistant that can carry out tasks without your
              | having to bother learning the boring details of a craft.
              | 
              | Unfortunately, lead cannot yet be turned into gold.
        
               | dartos wrote:
               | > a lot of people nevertheless are drawn to interact with
               | this technology.
               | 
               | To look at this statement cynically, a lot of people are
               | drawn to anything with billions of dollars behind it...
               | like literally anything.
               | 
               | Not to mention the amount companies spend on marketing AI
               | products.
               | 
               | > It looks as if it touches some deep psychological
               | lever: have an assistant that can help to carry out tasks
               | 
               | That deep lever is "make value more cheaply and with less
               | effort"
               | 
               | From what I've seen, most of the professional interest in
               | AI is based on cost cutting.
               | 
                | There are a few (what I would call degenerate) groups who
                | believe there is some consciousness behind these AIs, but
                | they're a very small group.
        
           | dartos wrote:
           | It's churn because every new model may or may not break
           | strategies that worked before.
           | 
           | Nobody is designing how to prompt models. It's an emergent
           | property of these models, so they could just change entirely
           | from each generation of any model.
        
             | kyle_grove wrote:
             | IMO the lack of real version control and lack of reliable
             | programmability have been significant impediments to impact
             | and adoption. The control surfaces are more brittle than
             | say, regex, which isn't a good place to be.
             | 
             | I would quibble that there is a modicum of design in
             | prompting; RLHF, DPO and ORPO are explicitly designing the
             | models to be more promptable. But the methods don't yet
             | adequately scale to the variety of user inputs, especially
             | in a customer-facing context.
             | 
             | My preference would be for the field to put more emphasis
             | on control over LLMs, but it seems like the momentum is
             | again on training LLM-based AGIs. Perhaps the Bitter Lesson
             | has struck again.
        
         | raincole wrote:
         | There was a debate over whether to integrate Stable Diffusion
         | into the curriculum in a local art school here.
         | 
         | Personally while I consider AI a useful tool, I think it's
         | quite pointless to teach it in school, because whatever you
         | learn will be obsolete next month.
         | 
         | Of course some people might argue that the whole art school
         | (it's already quite a "job-seeking" type, mostly digital
          | painting/Adobe After Effects) will be obsolete anyway...
        
           | londons_explore wrote:
           | All knowledge degrades with time. Medical books from the
            | 1800s wouldn't be a lot of use today.
           | 
           | There is just a different decay curve for different topics.
           | 
           | Part of 'knowing' a field is to learn it _and then keep up
           | with the field_.
        
           | swyx wrote:
           | > whatever you learn will be obsolete next month
           | 
           | this is exactly the kind of attitude that turns university
           | courses into dinosaurs with far less connection to the "real
           | world" industry than ideal. frankly its an excuse for
           | laziness and luddism at this point. much of what i learned
           | about food groups and economics and politics and writing in
           | school is obsolete at this point, should my teachers not have
           | bothered at all? out of what? fear?
           | 
           | the way stable diffusion works hasn't really changed, and in
           | fact people have just built comfyui layers and workflows on
           | top of it in the ensuing 3 years, and the more you stick your
           | head in the sand because you already predetermined the
           | outcome you are mostly piling up the debt that your students
           | will have to learn on their own because you were too insecure
           | to make a call without trusting that your students can adjust
           | as needed
        
           | simonw wrote:
           | The skill that's worth learning is how to investigate,
           | experiment and think about these kinds of tools.
           | 
           | A "Stable Diffusion" class might be a waste of time, but a
           | "Generative art" class where students are challenged to
           | explore what's available, share their own experiments and
           | discuss under what circumstances these tools could be useful,
           | harmful, productive, misleading etc feels like it would be
           | very relevant to me, no matter where the technology goes
           | next.
        
             | moritzwarhier wrote:
             | Very true regarding the subjects of a hypothetical AI art
             | class.
             | 
             | What's also important is the teaching of how commercial art
             | or art in general is conceptualized, in other words:
             | 
              | What is important and why? Design thinking. I know that
              | phrase might sound dated, but that's the work humans should
              | fear being replaced on, and where they should foster their
              | skills.
             | 
             | That's also the line that at first seems to be blurred when
             | using generative text-to-image AI, or LLMs in general.
             | 
             | The seemingly magical connection between prompt and result
             | appears to human users like the work of a creative entity
             | distilling and developing an idea.
             | 
             | That's the most important aspect of all creative work.
             | 
             | If you read my reply, thanks Simon, your blog's an amazing
             | companion in the boom of generative AI. Was a regular
             | reader in 2022/2023, should revisit! I think you guided me
              | through my first local Llama setup.
        
           | dyauspitr wrote:
           | Integrating it into the curriculum is strange. They should do
            | one-time introductory lectures instead.
        
         | icpmacdo wrote:
         | Modern AI both shortens the useful lifespan of software and
         | increases the importance of development speed. Waiting around
         | doesn't seem optimal right now.
        
         | QuantumGood wrote:
         | Great summary of how AI compresses the development (and hype)
         | product cycle
        
       | samrolken wrote:
       | I have a lot of luck using 4o to build and iterate on context and
       | then carry that into o1. I'll ask 4o to break down concepts, make
       | outlines, identify missing information and think of more angles
        | and options. Then at the end, switch to o1, which can use all
        | that context.
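        | 
        | In code, the handoff is just reusing the accumulated message
        | history (a sketch; the prompts are placeholders):
        | 
        |     from openai import OpenAI
        | 
        |     client = OpenAI()
        |     history = [{"role": "user",
        |                 "content": "Break down the concepts behind X, "
        |                            "outline the options, and list what "
        |                            "information is still missing."}]
        | 
        |     # iterate cheaply with 4o to build up context
        |     draft = client.chat.completions.create(
        |         model="gpt-4o", messages=history)
        |     history.append({"role": "assistant",
        |                     "content": draft.choices[0].message.content})
        | 
        |     # then hand all of that context to o1 for the real ask
        |     history.append({"role": "user",
        |                     "content": "Given all of the above, produce "
        |                                "the full design."})
        |     final = client.chat.completions.create(
        |         model="o1", messages=history)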
        
       | sklargh wrote:
       | This echoes my experience. I often use ChatGPT to help with D&D
        | module design, and I found that o1 did best when I told it
        | exactly what I required, dumped in a large amount of info, and
        | did not expect to iterate multiple times.
        
       | keizo wrote:
       | I made a tool for manually collecting context. I use it when
       | copying and pasting multiple files is cumbersome:
       | https://pypi.org/project/ggrab/
        
         | franze wrote:
          | I created thisismy.franzai.com for the same reason.
        
       | miltonlost wrote:
        | Oh god, using an LLM for medical advice? And maybe getting 3/5
        | right? Barely above a coin flip.
       | 
       | And that Warning section? "Do not be wrong. Give the correct
       | names." That this is necessary to include is an idiotic product
       | "choice" since its non-inclusion implies the bot is able to be
       | wrong and give wrong names. This is not engineering.
        
         | isoprophlex wrote:
         | Not if you're selecting out of 10s or 100s of possible
         | diagnoses
        
           | miltonlost wrote:
           | ?????? What?
           | 
           | > Just for fun, I started asking o1 in parallel. It's usually
           | shockingly close to the right answer -- maybe 3/5 times. More
           | useful for medical professionals -- it almost always provides
           | an extremely accurate differential diagnosis.
           | 
           | THIS IS DANGEROUS TO TELL PEOPLE TO DO. OpenAI is not a
           | medical professional. Stop using chatbots for medical
           | diagnoses. 60% is not almost always extremely accurate. This
           | whole post, because of this bullet point, shows the author
           | doesn't actually know the limitations of the product they're
           | using and instead passing along misinformation.
           | 
           | Go to a doctor, not your chatbot.
        
             | simonw wrote:
             | I honestly think trusting exclusively your own doctor is a
             | dangerous thing to do as well. Doctors are not infallible.
             | 
             | It's worth putting in some extra effort yourself, which may
             | include consulting with LLMs provided you don't trust those
             | blindly and are sensible about how you incorporate hints
             | they give you into your own research.
             | 
             | Nobody is as invested in your own health as you are.
        
           | PollardsRho wrote:
           | It's hard to characterize the entropy of the distribution of
           | potential diseases given a presentation: even if there are in
           | theory many potential diagnoses, in practice a few will be a
           | lot more common.
           | 
           | It doesn't really matter how much better the model is than
           | random chance on a sample size of 5, though. There's a reason
           | medicine is so heavily licensed: people die when they get
           | uninformed advice. Asking o1 if you have skin cancer is
           | gambling with your life.
           | 
            | That's not to say AI can't be useful in medicine: not
            | everyone has a dermatologist friend, after all, and I'm sure
           | for many underserved people basic advice is better than
           | nothing. Tools could make the current medical system more
           | efficient. But you would need to do so much more work than
           | whatever this post did to ascertain whether that would do
           | more good than harm. Can o1 properly direct people to a
           | medical expert if there's a potentially urgent problem that
           | can't be ruled out? Can it effectively disclaim its own
           | advice when asked about something it doesn't know about, the
           | way human doctors refer to specialists?
        
       | isoprophlex wrote:
        | People are agreeing and disagreeing with the central thesis of
        | the article, which is fine because I enjoy the discussion...
        | 
        | No matter where you stand in the specific o1/o3 discussion, the
        | concept of "question entropy" is very enlightening.
        | 
        | What is the question of minimum theoretical complexity that still
        | solves your problem adequately? Or, for a specific model, are its
        | users capable of supplying the minimum intellectual complexity
        | the model needs?
        | 
        | It would be interesting to quantify these two and see if our
        | models are close to converging on certain task domains.
        
       | iovrthoughtthis wrote:
       | this is hilarious
        
       | irthomasthomas wrote:
       | Can you provide prompt/response pairs? I'd like to test how other
       | models perform using the same technique.
        
       | swalsh wrote:
       | Work with chat bots like a junior dev, work with o1 like a senior
       | dev.
        
       | timewizard wrote:
       | > To justify the $200/mo price tag, it just has to provide 1-2
       | Engineer hours a month
       | 
       | > Give a ton of context. Whatever you think I mean by a "ton" --
       | 10x that.
       | 
       | One step forward. Two steps back.
        
       | martythemaniak wrote:
       | One thing I'd like to experiment with is "prompt to service". I
       | want to take an existing microservice of about 3-5kloc and see if
       | I can write a prompt to get o1 to generate the entire service,
       | proper structure, all files, all tests, compiles and passes etc.
       | o1 certainly has the context window to do this at 200k input and
       | 100k output - code is ~10 tokens per line of code, so you'd need
       | like 100k input and 50k output tokens.
       | 
       | My approach would be:
       | 
       | - take an exemplar service, dump it in the context
       | 
       | - provide examples explaining specific things in the exemplar
       | service
       | 
       | - write a detailed formal spec
       | 
       | - ask for the output in JSON to simplify writing the code -
       | [{"filename":"./src/index.php", "contents":"<?php...."}]
       | 
       | The first try would inevitably fail, so I'd provide errors and
        | feedback, and ask for new code (i.e. complete service, not diffs
        | or
       | explanations), plus have o1 update and rewrite the spec based on
       | my feedback and errors.
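        | 
        | The unpacking side is the easy part. A sketch, assuming the
        | model returns exactly the JSON array described above:
        | 
        |     import json
        |     import os
        | 
        |     def write_service(response_text):
        |         # response_text: [{"filename": ..., "contents": ...}]
        |         for f in json.loads(response_text):
        |             path = f["filename"]
        |             os.makedirs(os.path.dirname(path) or ".",
        |                         exist_ok=True)
        |             with open(path, "w") as fh:
        |                 fh.write(f["contents"])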
       | 
       | Curious if anyone's tried something like this.
        
       | patrickhogan1 wrote:
       | The buggy nature of o1 in ChatGPT is what prevents me from using
       | it the most.
       | 
       | Waiting is one thing, but waiting to return to a prompt that
       | never completes is frustrating. It's the same frustration you get
        | from a long-running 'make/npm/brew/pip' command that errors out
       | right as it's about to finish.
       | 
        | One pattern that's been effective is:
       | 
       | 1. Use Claude Developer Prompt Generator to create a prompt for
       | what I want.
       | 
       | 2. Run the prompt on o1 pro mode
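        | 
        | The same two-step pattern can be approximated over the APIs
        | instead of the console tool (a sketch; the model names and
        | meta-prompt wording are assumptions):
        | 
        |     import anthropic
        |     from openai import OpenAI
        | 
        |     rough_ask = "Refactor my parser module for testability."
        | 
        |     # step 1: have Claude expand a rough ask into a detailed,
        |     # structured prompt
        |     claude = anthropic.Anthropic()
        |     meta = claude.messages.create(
        |         model="claude-3-5-sonnet-20241022",  # assumed
        |         max_tokens=2000,
        |         messages=[{"role": "user",
        |                    "content": "Write a detailed, structured "
        |                               "prompt for this task: "
        |                               + rough_ask}])
        |     detailed_prompt = meta.content[0].text
        | 
        |     # step 2: run the generated prompt on o1
        |     openai_client = OpenAI()
        |     answer = openai_client.chat.completions.create(
        |         model="o1",
        |         messages=[{"role": "user", "content": detailed_prompt}])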
        
       | inciampati wrote:
        | o1 appears not to be able to see its own reasoning traces. Or
        | its own context is potentially being summarized to deal with the
        | cost of giving access to all those chain-of-thought traces and
        | the chat history. This breaks the computational expressivity of
        | chain of thought, which supports universal (general) reasoning
        | when you have reliable access to the things you've thought, and
        | degrades to a threshold circuit (TC0), a bounded parallel pattern
        | matcher, when you don't.
        
       ___________________________________________________________________
       (page generated 2025-01-18 23:01 UTC)