codevoid.de

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
 (HTM) Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
 (HTM)   DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost
       
       
        perseusai wrote 9 hours 13 min ago:
        This is a nice companion to the token saving context app I made. Even
        has the same Claude Design site, which I think looks awesome! Even
        though something is cheap, the concepts that make using Deepseek more
        efficiently can surely be applied elsewhere. Cool stuff!
       
        danborn26 wrote 13 hours 22 min ago:
        The caching strategy here looks really solid for keeping API costs
        down. Curious how it handles state invalidation when the agent context
        gets too large though.
       
        ElenaDaibunny wrote 14 hours 49 min ago:
        The caching strategy is doing most of the heavy lifting here cost-wise.
       
        mkrd wrote 16 hours 24 min ago:
        God, I whish there was a code harness I donât have to install a
        JavaScript runtime for
       
        tylerdurden91 wrote 16 hours 58 min ago:
        Given the number of supply chain attacks via npm, maybe the recommended
        approach to use should be pnpm instead of npx.
       
        naaqq wrote 17 hours 40 min ago:
        I don't think it's helpful, you can already get a 99+% cache hit on
        claude code, just change the api settings to deepseek. I would like to
        use a agent built by deepseek itself using deepseek models. Deepseek
        should make their own agent based on their model, just like OpenAI and
        Anthropic.
       
          m00dy wrote 17 hours 37 min ago:
          same here, using claude code on deepseekv4. just burnt 24.1M input
          hit and 170k cache miss.
       
        JSR_FDED wrote 17 hours 55 min ago:
        Maybe the first problem this tool can tackle is creating a better web
        page? Content continually shifting, super annoying.
       
        tw1984 wrote 19 hours 15 min ago:
        deepseek is building an official coding harness, why would anyone waste
        time on such 3rd party toy when official one is coming probably in
        weeks?
       
        edg5000 wrote 19 hours 30 min ago:
        Side note: In DeepSeek API docs they mention that coding clients
        automatically are assigned the highest thinking effort, despite any
        settings. This is what I suspected when using OpenCode with V4; it
        keeps reasoning in very long cycles, this felt like a flaw in the
        model. May just be a weird API thing.
        
        Overall I find their API design and docs so messy. It's a shame, since
        it's the main entrypoint to using their service.
       
        yanhangyhy wrote 20 hours 31 min ago:
        In the open-source contributors section, when you see a lot of anime or
        cartoon avatars, you know most of the devs are Chinese.
       
        treexs wrote 1 day ago:
        codex generated sites are so easy to spot lmao
       
        cloudengineer94 wrote 1 day ago:
        Quite interesting being Terminal based and the AI skills staying within
        a file of it's own.
        
        Will give a go and see how cache behaves
       
        trollbridge wrote 1 day ago:
        Well folks here we have it: DeepSeekâs brand is now strong enough
        people want to jump on their brand recognition.
       
        nikolay wrote 1 day ago:
        This is not an agent by DeepSeek, so the title is misleading.
       
        jedisct1 wrote 1 day ago:
        It's probably good, and the best for Deepseek models, but do we really
        need one harness per model?
       
        mark_l_watson wrote 1 day ago:
        I tried it and the text input area was black with a dark font. I
        checked the documentation, and asked DeepSeek v4, Claude, and Gemini
        for help with the fonts/style and nothing works except to run in a
        terminal with a dark theme. Crazy. None of the devs on the project use
        a light theme?
       
          miav wrote 1 day ago:
          I agree that this is an issue, but..
          no, they probably donât. Light themes are very rarely used.
       
            jofzar wrote 22 hours 15 min ago:
            I understand why, but I didn't even think of light themed terminals
            till now..
            .
       
        jbellis wrote 1 day ago:
        As someone who has been writing harnesses for a year: the people at
        opencode etc aren't stupid, when they decide to break the prefix cache
        [usually partially] it's always because they've tested it and it gives
        better results overall.
        
        If you think that dsv4 behaves differently enough from the aggregate of
        other models, submit a PR with a patch to special case that to your
        harness of choice with evidence. Just blindly assuming "append only all
        the time because cache" is a waste of everyone's time.
       
          schaefer wrote 11 hours 16 min ago:
          > As someone who has been writing harnesses for a yearâ¦
          
          Your agent harness, brokk, looks great.  Iâm going to try it this
          morning.
       
          phrotoma wrote 13 hours 26 min ago:
          Is "harness" in this context ~= "agent"?
       
            furyofantares wrote 4 hours 38 min ago:
            I think agent = harness + model.
       
            abustamam wrote 11 hours 45 min ago:
            I've understood harness to be the software that runs the agent
            (open code, pi, Claude code)
       
          anon373839 wrote 19 hours 8 min ago:
          Are there any learning resources you'd recommend on writing
          harnesses? I'm interested in doing a non-coding one, but not really
          sure where to start.
       
            sams99 wrote 2 hours 1 min ago:
            My agent wrote a pile of very interesting articles at wasnotwas.com
            I have been a bit quiet there for a bit, but it covers lots of
            areas that are very interesting to harness builders (albeit less
            interesting to the general public)
       
            jbellis wrote 15 hours 54 min ago:
            Generically, I would say, just start building it and ask your
            favorite coding agent for advice when you get stuck. This is the
            first technology that can teach you how to use it! (But do ask a
            model with a recent knowledge cutoff, i.e. not gemini.)
       
        stiray wrote 1 day ago:
        If only author would understand, that some people want single, self
        sustained binary that doesnt take half of computer memory and would
        rather write it in rust or golang.
       
          Defenestresque wrote 1 day ago:
          github.com/charmracelet/crush
          
          The company that had that acrimonious split from OpenCode. Still,
          fully written in Go and compared to node-based harnesses, uses 1/5th
          the RAM. (At least for me.)
          
          Works with any provider (including OpenRouter free ones).
          
          No conflict of interest here, just a happy "customer" of this
          excellent resource.
       
            _joel wrote 8 hours 13 min ago:
            it's a 404
       
              mikodin wrote 7 hours 59 min ago:
               [1] (Haven't used it, but also hit the 404 and wanted to see it)
              
 (HTM)        [1]: https://github.com/charmbracelet/crush
       
          wg0 wrote 1 day ago:
          Can someone explain that was use of AI (and all the claims) that a
          coding agent cannot be written in plain go for example? Given there
          are tons of good terminal libraries for golang?
       
            xlii wrote 1 day ago:
            It can be written in Golang but interaction libraries are very
            limited and with sharp edges.
            
            There's Google's genkit, charmbracelet's fantasy and LangChainGo.
            Each has ugly hacks and omissions. Then handling slice streaming of
            data into Elm architecture (bubbletea) is also complex.
            
            So in theory nothing stand against but in practice one has to get
            quite low to the ground to get anything done.
            
            Also: Golang agent exist! It's called crush and is developed by
            charmbracelet people. It's so-so though I prefer Pi myself.
       
          crystal_revenge wrote 1 day ago:
          If this is what you want, especially in the age of coding agents, why
          not just build it yourself?
       
          Xeoncross wrote 1 day ago:
          I'm really happy to see a lot of new software come out in Rust, Go,
          or Zig.
          
          The value and ease of development that slow interpreted languages
          used to offer is disappearing. New languages have all the nice things
          built in, or rather, our 1am pager alarms are starting to make us
          mad.
       
          zozbot234 wrote 1 day ago:
          If you want to try a single self-contained binary that does take half
          of your computer memory or more, there's always ds4-agent.
       
          pancsta wrote 1 day ago:
          Having a coding bot but skimming on coding? That should tell us
          something.
       
        sunaookami wrote 1 day ago:
        Another day, another vibeslopped "product" on the front page of hacker
        news with over 200 points. When will you guys learn?
       
        adi_kurian wrote 1 day ago:
        There is an uncanny valley effect to websites where FE is created in
        full via an AI.
        
        These sites have the immediate scent of 'high design', with errors that
        no 'high designer' would dare make.
        
        The italics give me nausea. 
        Text promoted with orange fill is seemingly random. 
        There is no thought behind the combination of art and copy. 
        Random smattering of Title Case and Sentence case and lower case. 
        A lack of commitment to a full stop 
        Widowed H1s.
        H1s with random spaces .
        
        At the same time, if I hammer CMD - to 25%, it looks fancy.
        Perhaps nobody gives a fuck.
        
        That said, I'm excited to try this tool!
       
        arikrahman wrote 1 day ago:
        Saw nix suffixed and was excited a new dotfiles was about to hit the
        market.
       
        carterschonwald wrote 1 day ago:
        i cant find anything substantiated in the code that actually
        differentiates it from any other harness.
        
        my fork of oh my pi that i have a lot of experiments in, is lterally
        designed to only work well with models that have decent reasoning
        levels, like deep seek models. check it out! [1] â thats the install
        script for after clone
        
        fair warning: tis my dog food test bed as i build even fancier stuff
        
 (HTM)  [1]: https://github.com/cartazio/oh-punkin-pi/blob/main/scripts/bui...
       
        agrippanux wrote 1 day ago:
        This website seems to have been generated by Codex - I asked Codex to
        create an HTML overview of a feature for my team and it made an overly
        produced monstrosity - complete with the same large stat boxes that
        were for the most part devoid of meaningful information - using the
        same font, colors, layout, hero section, etc.  It was also terrible on
        mobile just like this is.
        
        In the end I had Claude produce a one-page html file that was 95% of
        the way there and it took minor editing to clearly explain the intent
        of the feature.
       
          easygenes wrote 17 hours 19 min ago:
          Claude Opus 4.7 defaults to exactly this design language for a lot of
          "just make me a rich html presentation page" requests without further
          specification.
       
          ritonlajoie wrote 23 hours 9 min ago:
          strange, I got the same design with claude design, same fonts, same
          title designs with the strange character etc...
       
          port11 wrote 1 day ago:
          A lot of LLM-driven design now looks like this. I donât understand
          how people donât find ugly the pairings with an heavily italicised
          serif. You also canât read much of the page on mobile, because the
          code example keeps shifting the content around.
          
          Now, that is overly critical, Iâm sure their heart is in the right
          place. But a simpler website would do :)
       
            gizajob wrote 16 hours 52 min ago:
            Yeah such amazing tech used to produce a tediously unreadable
            website with great flair.
       
            krm01 wrote 1 day ago:
            Itâs sad to see companies not spending a bit more on design.
            Sure, ai will help you get something decent out fast. But thereâs
            a threshold where design becomes an indicator of trust. Especially
            for b2b software that tailor to large corps. Good design,
            character, adds directly to the bottom line.
       
              schaefer wrote 11 hours 37 min ago:
              > Itâs sad to see companiesâ¦
              
              The article is about an open source agent harness, Reasonix, that
              is built to leverage the DeepSeek native api.
              
              Thereâs no company here.  No design budget.  These people are
              graciously sharing a project they made in their free time.
       
                port11 wrote 2 hours 53 min ago:
                I agree. I didnât mean to be too critical. But if theyâd
                made something simpler, I think it would save them tokens and
                end up more likely to convince their target audience of
                developers.
                
                (The series of âmotherfucking websitesâ comes to mind, they
                were all very readable and simple, even if satire.)
       
                darkmatriarch wrote 7 hours 44 min ago:
                You're right, but I find as a solo engineer it's still
                important to check the frontends I create on mobile
       
          locknitpicker wrote 1 day ago:
          > In the end I had Claude produce a one-page html file that was 95%
          of the way there and it took minor editing to clearly explain the
          intent of the feature.
          
          That doesn't say much about any model though. For starters, any
          software engineer can tell you that leaving out features can
          drastically simplify any project.
       
        mmarcant wrote 1 day ago:
        "byte-stable prefix cache" -- give us your codebase in a way that's
        even EASIER for us to train on.
       
        wg0 wrote 1 day ago:
        Performance is horrible when you type but caching is magical.
        
        Extremely pro consumer tool. I have been hammering it hard with 97%
        cache utilization and barely $0.03 dollar spent for me constantly
        exploring a codebase.
       
          snqb wrote 1 day ago:
          Deepseek API caches very efficiently itself. I use it heavily via pi
          agent, and a lot of times I get 99%+ caching for longer sessions.
          
          Have you tried using Deepseek API via other agents? This project tbh
          looks like a S-tier slop
       
            wg0 wrote 1 day ago:
            I have used it with OpenCode and was good enough.
       
        nextaccountic wrote 1 day ago:
        > Tool-call repair
        
        > Tool arguments the model produces occasionally have JSON typos,
        unclosed quotes, or shape mismatches. Reasonix runs a schema-aware
        repair pass before dispatch so malformed args still execute.
        
        So Deepseek API doesn't have a structured output option where you give
        a grammar and the model promises the output will follow this grammar?
        
        Or it does, but it's buggy?
       
        WhereIsTheTruth wrote 1 day ago:
        Y'all should not be writing js/ts/slop/npm based clis anymore
        
        It's the agentic era, pick a better option
        
        Just stop
       
          fHr wrote 1 day ago:
          yep codex opensource rust cli clears this night and day long
       
          Alifatisk wrote 1 day ago:
          Whats that option?
       
        singingtoday wrote 1 day ago:
        That site does not render correctly on my android. Lots of text on the
        right breaking the reactive layout.
       
        storus wrote 1 day ago:
        Can it instruct DeepSeek during an LLM call to start removing old tool
        calls from the context instead of waiting for the LLM call to finish if
        the context size approaches DeepSeek's dumb zone? Claude Code can't do
        that, /compact can only happen after the LLM call; it's often
        preferable to start cleaning up context during an LLM call, especially
        when tool calls are huge like reading markdown files;
        implementation-wise all that is needed is to start removing earliest 
        ...  and replacing them just with some log entry stating this tool call
        was already performed, then re-running KV cache prefill (so the
        "online" compaction would get 0.5s latency hit every time it's
        performed). That way one can read 1000 files in one LLM call.
       
        m101 wrote 1 day ago:
        For those of you that use deepseek v4 occasionally, what harness do you
        use it with? Iâm only familiar with claude code and codex.
        
        Any comments on what you can or cannot rely on it for relative to cc
        and codex would be appreciated too!
       
          eikenberry wrote 1 day ago:
          Maybe check out Goose. It is the standard agent harness being
          developed by The Linux Foundation under the AAIF. Under active
          development and the implementation seems to have a good leg up on the
          other popular agents. [1]
          
 (HTM)    [1]: https://github.com/aaif-goose/goose
 (HTM)    [2]: https://goose-docs.ai/
       
            nsonha wrote 1 day ago:
            I see their name mentiod everywere along with Aider, presumably for
            being among the first agents, but I've never met anyone that
            actually uses them.
       
          droidjj wrote 1 day ago:
          Check out pi.dev. OpenCode is a nice batteries-included Claude Code
          replacement, but Iâm in love with the extensibility of Pi.
       
            chuckadams wrote 1 day ago:
            Any Pi extensions you'd specifically recommend?  I'm just starting
            out with Pi, but I've had mixed results with extensions.  I'm using
            Pi with gemma4 26b locally, so anything that's friendly to small
            local models would be appreciated.  I think the only extension I'm
            using right now is pi-total-recall.
       
              gck1 wrote 1 day ago:
              I think pi wants you to write your own extensions, adapted to
              your meeds.
              
              I haven't had a need for any extensions though. Maybe subagents,
              but I solved that with tmux. For all the rest, I just use
              "skills".
       
        ankitwarbhe wrote 1 day ago:
        you created it yourself ?
       
          Alifatisk wrote 1 day ago:
          No.
       
        andai wrote 1 day ago:
        But Claude made the website?
       
          Alifatisk wrote 1 day ago:
          What conclusion are you drawing from that?
       
            andai wrote 1 day ago:
            If Deepseek can't even make a static site, why would I want to use
            it for anything else? (Not saying it can't, just that it's a weird
            choice to present your Deepseek-oriented product.)
       
              Alifatisk wrote 1 day ago:
              I see your point, but as we know, devs from Google and OpenAI
              regularly use Claude Code because of its edge in frontend. I
              think using another model to build your own thing is a pragmatic
              engineering decision, not a sign of failure.
       
        Hfuffzehn wrote 1 day ago:
        This is really tickling the conspiracy theorist part of my brain.
        
        "Independent open-source project Â· not affiliated with DeepSeek"
        "Reasonix only targets DeepSeek because..."
        "Why DeepSeek only? Can I swap to Claude / GPT? It's a design choice,
        not a limitation"
        
        The lady doth protest too much, methinks?
        
        Nicely timed shortly after the making the rebate permanent anouncement.
        
        Could just be Chinese devs trying to help western devs with some
        software and a western facing marketing campaign to raise awareness.
        Could be DeepSeek astroturfing.
        Could be "someone" in China trying to get more access to western data.
        
        Who knows?
       
        danborn26 wrote 1 day ago:
        High caching rates for coding agents can drastically reduce latency and
        API costs. I am curious to see how the caching strategy handles context
        invalidation across multiple files.
       
        fouric wrote 1 day ago:
        I don't think it's particularly effective to create a new coding agent
        when there's existing open-source agents (especially extremely
        extensible ones like Pi) that already optimize for cache hits, have far
        larger communities, and work for providers other than Deepseek.
        
        I specifically use multiple different models and providers, so this
        wouldn't be useful for me.
        
        And it contributes to the problem of each person vibe-coding their own,
        incompatible, half-baked tool in a space, instead of contributing to a
        small set of tools and expanding them.
        
        It'd be better to just extend an existing tool.
       
        hmokiguess wrote 1 day ago:
        Click on the download page, it's hilarious. It has a lot of information
        about the "smart probe" on the download and it's a realtime probe you
        can rerun.
        
        That's the pinnacle of AI slop over engineered garbage in my opinion.
        All of that information is noise.
       
        ricardobeat wrote 1 day ago:
        > The loop is append-only, engineered around DeepSeek's byte-stable
        prefix cache â long sessions hold 90%+ cache hit and input-token cost
        collapses to ~1/5. Terminal-first, leave it running.
        
        AI marketing slop. This is how all models and coding harnesses work,
        isn't it?
        
        The author claims (in another AI-written post):
        
        > LangChain â along with every generic agent framework I checked â
        rebuilds the prompt every turn. Timestamps get injected. History gets
        reordered. Tool schemas re-serialize with different whitespace.
        
        I haven't touched LangChain in a long, long time, but don't think any
        of the current harnesses, Claude Code, Pi, Crush, OpenCode etc do that
        except if you change configuration?  Keeping the context stable for
        caching is a very basic principle and not a wild innovation.
        
        This posing as DeepSeek-specific is also a mystery.
       
        am17an wrote 1 day ago:
        This Claude front end skill is now soon to be slop.
       
          auggierose wrote 1 day ago:
          Oh, I was wondering why all new websites look shitty in the same way.
       
            aratahikaru5 wrote 9 hours 19 min ago:
            Not a maintainer, but I've fixed some of the really jarring issues
            on desktop (mobile needs a complete overhaul though). IMO It's not
            that bad, and it gets the job done.
            
            Any feedback on how to make it less "shitty"? I feel like doing
            some vibe coding tonight.
       
          ricardobeat wrote 1 day ago:
          Already is. Every new website looks exactly the same.
       
        imagetic wrote 1 day ago:
        
        
 (HTM)  [1]: https://shittycodingagent.ai
       
          peheje wrote 1 day ago:
          having issues with truncated output from deepseek v4 pro through
          openrouter via pi-harness on ptyxis-terminal using ubuntu
          
          trying reasonix with direct api..
       
            peheje wrote 1 day ago:
            first impression: the tui flickers a lot, unpleasent. very laggy to
            write in.
       
          mi_lk wrote 1 day ago:
          Not sure about the story but it would be funny if pi folks actually
          own this domain.
       
            chuckadams wrote 1 day ago:
            They do.  That's Pi's old name.
       
          chabes wrote 1 day ago:
          Aka pi.dev
       
        pkulak wrote 1 day ago:
        Doesn't Pi Agent do exactly this? Assuming "append only" means they do
        some kind of compaction as well.
       
        yalogin wrote 1 day ago:
        Can someone give me a eli5 version of what this is? It really sounds
        useful to Claude subscribers.
        
        Is this improving the cache hit and hence overall efficiency of coding
        workflows?
        
        Does it also let me host a local llm (deepseek)? What are model min
        requirements for this?
       
          timcobb wrote 1 day ago:
          You can also ask Claude and get an immediate answer, the power is
          yours
       
            Salgat wrote 1 day ago:
            Certainly you realize that these comments exist for more than a
            single person right? You expect potentially hundreds of viewers to
            each burn through AI tokens instead of just getting a direct and
            relevant answer here? This has the same vibe as the old forum posts
            where the only response was a "google it".
       
        quotemstr wrote 1 day ago:
        > no reordering, no marker-based compaction
        
        Is this really the behavior you want? Yes, doing tool-result clearing
        and such will blow your cache, but if you do it only occasionally, it's
        still likely a win. Yes, cache hits are good, but not so good that it's
        okay to be profligate with context to preserve those precious, precious
        KVs.
       
        hebetude wrote 1 day ago:
        Wow the UI looks exactly what I vibe coded yesterday. What a
        coincidence
       
          huqedato wrote 1 day ago:
          It's obvious why...
       
        singiamtel wrote 1 day ago:
        I would've liked benchmarks against other harnesses showing the caching
        performance
       
          Havoc wrote 1 day ago:
          Just checked the stats on my opencode/DS usage...looks like 70%ish
          hit rate.
          
          Pretty shaky datapoint though...don't use it as primary model
       
          Alifatisk wrote 1 day ago:
          Is there benchmarks and measurements that offers comparisons between
          different harnesses?
       
        unshavedyak wrote 1 day ago:
        It's pretty funny, i'm a $200/m Claude subscriber and i've had little
        need to use anything else. However the more Claude has been restricting
        my workflow (notably around the recent IDE/-p usage change) the more
        i've been wanting to go elsehwere.
        
        I'm concerned since i really want SOTA reasoning, but DeepSeek still
        has me interested.
       
          gck1 wrote 1 day ago:
          I gave a fairly complex reverse engineering task to DS-4 xhigh and
          GPT-5.5 xhigh today.
          
          After about 6 hours, both ultimately failed to fully RE, however,
          there were some drastic differences:
          
          DS stopped every 30 minutes or so, saying it did full RE and it
          should all work now, while in fact, it didn't complete even 1% of it.
          It also looked for shortcuts again and again, despite me prompting
          heavily that the specific shortcut may not be used. It was a complete
          and utter failure.
          
          GPT-5.5, on the other hand, blew me away. It just did the right
          things, didn't jump to next steps until it was sure it completed the
          initial layers and had a full understanding of what's required. The
          only time I prompted it during the 6 hours was when I saw it going in
          the right direction and I could nudge it slightly towards an even
          better way. I never felt I was fighting it. Okay, maybe a little bit
          - after compaction, it sometimes would go on a "no I'm not helping
          you with reverse engineering" tangent, but it would resolve in a
          clean session.
          
          I cancelled my Claude subscription a month ago, so I haven't tested
          that, but DeepSeek has reminded me a lot of how I worked with Opus
          4.6/4.7. Which perhaps could be a positive sign to some, but GPT-5.5
          showed me that the way claude/ds work is just way too annoying.
       
            Aurornis wrote 1 day ago:
            > DS stopped every 30 minutes or so, saying it did full RE and it
            should all work now, while in fact, it didn't complete even 1% of
            it. It also looked for shortcuts again and again, despite me
            prompting heavily that the specific shortcut may not be used. It
            was a complete and utter failure.
            
            This is my experience with non-SOTA models across the board. When
            you try them on little tasks and they work it feels amazing, but
            then you go deeper and you're back to going in loops and fighting
            the model for hours.
            
            Switching back to a SOTA model immediately yields progress again.
            
            When I read all of the comments from people saying they can't tell
            a difference between Opus and  I don't know if they haven't really
            used it much yet, or if they're just not doing anything
            complicated.
       
              am17an wrote 12 hours 58 min ago:
              Did you read the OP when he's exactly chiding the model you're
              glazing?
       
                Aurornis wrote 9 hours 54 min ago:
                Did you intentionally miss the point of my comment? Substitute
                Opus for GPT-5.5 if you will. I use both as well as locally
                hosted models using some of your branches, even.
       
                  am17an wrote 7 hours 58 min ago:
                  Fair enough. I agree with you - although DS4 Pro is a GPT 5
                  class model which scores 46% on ARC-AGI-2[^1]. It's behind by
                  maybe 9 months, I think it's still good enough for a lot of
                  complex tasks as well. They definitely need to work on a
                  "just fucking works" harness like CC/Codex. Also thanks!
                  
                  [^1]
                  
 (HTM)            [1]: https://www.nist.gov/news-events/news/2026/05/caisi-...
       
            cmrdporcupine wrote 1 day ago:
            The GPT models are heavily biased to a more incremental, empirical,
            evidence based approach. Sometimes to a fault. I prefer them for
            this reason, but it requires coaxing or strategic use of /goal to
            break it out if its highly staged, one piece at a time, approach..
            if you don't like it.
            
            I suspect for people doing more... website ... type development,
            the more "yeet this into existence" style of Opus feels preferable.
            
            With Claude I was constantly jamming my finger on the escape key
            "wait, you did what?! based on what proof?!"
       
              beering wrote 1 day ago:
              You make it sound as if Codex is for people who know what they
              want and Claude Code is for people who donât know what
              theyâre doing.
       
                cmrdporcupine wrote 1 day ago:
                I was trying to not sound that biased, but ok ;-)
       
            ttul wrote 1 day ago:
            What youâre experiencing is the difference in model intelligence.
            Most models can seem pretty good at simple stuff over short time
            horizons. Complex work requires that more intelligence be stuffed
            into those trillion-dimensional spaces.
       
          KronisLV wrote 1 day ago:
          > i've been wanting to go elsehwere.
          
          There's always the option of using Anthropic's models for some tasks
          like planning and then just hand over the implementation task to
          something like DeepSeek. Across different tools, a Markdown plan
          works pretty okay. That's what I'm planning to do if I go from the 5x
          Max subscription down to the Pro.
          
          I am also writing a launcher that makes using 3rd party providers
          with Claude Code easy ( [1] ) and I already have a local proxy up and
          running, just not dynamic model switching yet. Though it shouldn't be
          too hard to add, will probably be there within a week or two,
          depending on my schedule.
          
          I don't think it's wise to leave Anthropic altogether because their
          models are great (and a subscription gives you features like Remote
          Control which I like), but switching tiers and maybe saving a bit of
          money seems viable! On the other hand, you do need a quality
          baseline, because I remember using Cerebras with GLM 4.6 way back and
          there was a bit too much slop.
          
 (HTM)    [1]: https://ccode.kronis.dev
       
          0xbadcafebee wrote 1 day ago:
          You should definitely stick to the $200 plan, and not try the $10
          coding plans with open weight models and higher limits. Anthropic
          needs your money to stay solvent, and you'll sleep better knowing
          you're using SOTA.
       
            constantius wrote 1 day ago:
            The world would be better long-term if we chise tonfund open models
            instead however.
            
            If you think short-term and only about yourself, paying for SOTA
            regardless of how many military contracts the lab has is the best
            thing, but paying for open models is both better ethically, and for
            a future where AI belongs to everyone and not just to Altman et al.
       
            port11 wrote 1 day ago:
            (Zero reason to defend Anthropic.)
            
            Iâve gone that route. I really wanted to stop using Claude, but
            Deepseek v4 Pro and Kimi 2.6 didnât do the job. For a lot of
            coding tasks or well-specced plans, maybeâ¦ but then thatâs a
            plan made by Opus anyway.
            
            Even Sonnet is sometimes not worth the trouble. Opus is very
            thorough and reviews its own mistakes quite well. Catches a lot of
            edge cases.
            
            Iâm not saying we shouldnât try other things â I did! â,
            but itâs more or less okay that people just like Claude Code
            subscriptions? The back and forth I had with Kimi on a small
            feature came out to ~1.8â¬, which is 10% of my Claude subscription
            each month. And that was a single session. CC with Serena uses
            tokens fairly well.
       
              bazhand wrote 1 day ago:
              /advisor is like the old /opusplan mode but for running tasks not
              just pre-planning. It can work nicely with Sonnet as the main
              agent and escalates to Opus as needed.
       
                port11 wrote 2 hours 57 min ago:
                Advisor-mode has been very helpful indeed, I can now plan with
                Opus, have Haiku code, and escalate back to Opus for review.
                Itâs a decent flow for Pro subscribers trying to max their
                usage. But as Iâve said above, sometimes itâs not worth it:
                Sonnet and Haiku can produce stuff thatâs not worth
                reviewing.
       
          Alifatisk wrote 1 day ago:
          > I'm concerned since i really want SOTA reasoning
          
          I think you should give other models a try and see how much they
          differ from SOTA models. I did this and realized, even Qwen-2.5-Max
          was enough. I am sure even Claude Sonnet 3.5 is enough for things I
          play around with. I am not really striving for fields medal in
          Mathematics.
       
            unshavedyak wrote 1 day ago:
            That's fair, neither am i - i do tend to work in large, complex,
            full of legacy decision based codebases. Eg i have access to Sonnet
            (of course), but i choose to solely work in Opus because i find its
            output reads better, analyzes better, etc.
            
            The "cost" is dumb models is just so high for me. Eg every bad
            decision they make increases my frustration quite a bit. Despite
            putting a lot of effort into my workflow to help reduce the number
            of decisions they make, they always will. So my hedge is always
            against that.. trying to reduce how insane they can be heh.
       
          logicchains wrote 1 day ago:
          If you want SOTA reasoning you should be using GPT 5.5 Pro.
       
            unshavedyak wrote 1 day ago:
            This is fair, but i've found the different models to have different
            moods and require different interactions to get them to stick to
            just the specific edits i ask for, etc.
            
            I used to surf the three big players frequently and got really
            tired of the effort needed to steer some models. In the end i ended
            up sticking with Claude because it required less steering effort.
            While not strictly reasoning, a models ability to follow clear
            directions consistently is something i'd consider part of its SOTA
            capabilities.
            
            Eventually i just tired of exploring. I just want stability.
            
            Which ironically is why i'm thinking about moving from Claude. The
            very basic IDE/-p usage getting removed from my plan is a UX
            stability issue. I'm trying to progressively improve my workflows
            and efficiency, not have to establish a new foundation anytime
            something shifts. Quite frustrating.
       
            auggierose wrote 1 day ago:
            Codex has only GPT 5.5
       
        mmaunder wrote 1 day ago:
        Unusable thanks to the top animation pushing the rest of the site down
        repeatedly as youâre trying to read.
       
          busymom0 wrote 1 day ago:
          The layout of the entire page is horrendous on mobile too. Looks like
          a huge wide site where content is only in a tiny column on left side.
       
        schaefer wrote 1 day ago:
        Okay, I'm curious.
        
        From the FAQ, I see:
        
        >Can I point it at a self-hosted / private DeepSeek endpoint?
        
        >Yes. Since 0.30 we accept non-standard key prefixes for self-hosted
        DeepSeek endpoints. Just point `baseUrl` at your internal address â
        the loop, cache strategy, and tool protocol are unchanged.
        
        But my question is: 
        If I use Reasonix to talk to a deepseek endpoint through openrouter, am
        I still getting the cache-hit benifits of this agent harness?
       
          thomasfromcdnjs wrote 23 hours 21 min ago:
          I would wonder that too, I'm only a novice openrouter user, but I do
          notice it reroutes my same-model requests to different providers.
          
          Maybe users reporting otherwise are just looking at their client
          reports which wouldn't be able to tell the difference.
       
            Lapel2742 wrote 20 hours 21 min ago:
            Look into Openrouter's provider routing.
       
          csunoser wrote 1 day ago:
          Yes*. At least from my limited usage of deepseek-flash for a few
          billion tokens on openrouter, the cache-hit rate is >95%. And I
          simply used the claude code harness pointed at the openrouter
          anthropic compatible endpoint with no fluff.
       
            port11 wrote 1 day ago:
            Did you get proper tool use? Some CC-driven models seem to get a
            bit off when it comes to MCP usage. For example: I really struggled
            to get Kimi to use Serena, which I think ended up costing too many
            tokens.
       
            schaefer wrote 1 day ago:
            thank you!
       
        sergiotapia wrote 1 day ago:
        What AI model did you use for the website design? This is the second
        one I see with the exact same font and color scheme. Just curious
        because Claude models lean towards purples for example. Thank you!
       
          FergusArgyll wrote 1 day ago:
          Frontend design skill by Anthropic specifically says not to use
          purple. I'd be surprised if it still uses purple. Have you seen that
          recently?
       
          pcwelder wrote 1 day ago:
          Opus 4.7 selects such palette and motifs by default. Might even be
          first iteration of claude design.
       
          sheepscreek wrote 1 day ago:
          DeepSeek v4 perhaps?
       
          franga2000 wrote 1 day ago:
          This design still screams Claude to me, but a newer version than what
          you're thinking of. At some point they added a markdown file that
          tells it to use obviously AI designs like lots of blue/purple and
          gradients. Since then, this is its new style.
       
        declan_roberts wrote 1 day ago:
        I love the focus on cache hit efficiency. Hats off to the deekseek team
        for creating a great product that maximizes cost efficiency for the
        user.
       
          nicce wrote 1 day ago:
          Just in case, note that this project is someone's side project
          
          > Independent open-source project Â· not affiliated with DeepSeek
       
          Bombthecat wrote 1 day ago:
          Adding already cheap API cost and you probably could let it run for
          days and the same task..
       
          stavros wrote 1 day ago:
          How can you have cache hit efficiency? Isn't it just a matter of not
          changing the previous context? I don't understand what knobs there
          are to tweak on this.
       
            everforward wrote 1 day ago:
            > Isn't it just a matter of not changing the previous context?
            
            Yes, but a lot of harnesses change previous context.  E.g. the
            system prompt injects the current time/date, working directory,
            files in the working directory, etc.  Compaction also changes the
            whole previous context.  I _think_ changing the list of tools also
            invalidates cache, so invoking a subagent with different tools
            would invalidate the cache.
            
            My vague impression is that it's in a similar vein to functional
            programming languages.    It generally disallows doing things that
            lead to bugs (cache misses in this case), and presumably allows you
            to do those things in a way that makes it much clearer that this is
            likely to cause cache misses.  I would guess that in this paradigm,
            you don't mutate your existing session, you derive a new session by
            mutating the prior context into a new context.
       
              chillfox wrote 1 day ago:
              changing between plan/build mode in some agents will change the
              tools list, which breaks the cache.
       
                brookst wrote 1 day ago:
                Cache is always there, itâs just that it only caches up to
                the point where an input token changes. So if the tools list is
                early in the prompt, changing it would limit cache for most of
                the prompt. If the tools list is the last thing, you could
                still get 99% cache hits even if it changes every turn.
       
                  chillfox wrote 23 hours 17 min ago:
                  Depends upon the service and how the harness is built, Some
                  of the services allow for very few cache keys, so you won't
                  necessarily get any cache if you edit recent messages as the
                  cache is not per message, but big blocks of everything up to
                  a cache key.
                  
                  This was actually surprising to me when I learned about it as
                  I have never worked with (or built) any cache working like
                  that before.
       
                  RevEng wrote 1 day ago:
                  After a couple of turns the system prompt is a small part of
                  the context. Not changing the system prompt at all is key so
                  that the rest of the history is itself part of the prefix.
       
          bwfan123 wrote 1 day ago:
          > Hats off to the deekseek team for creating a great product
          
          I have been using it for a while, and I wholeheartedly agree. imo, it
          is as good as codex or claude which I also use. It is a winner in the
          cost-sensitive tier, and if some startup could put it together with
          data-retention in mind, it could be a great product sold to the
          enterprise, as data-retention and privacy are the main issues for the
          coding-assistant usecase.
       
            chillfox wrote 1 day ago:
            Deepseek v4 pro is definitely my preferred cheap model, it's very
            good, and I use it all the time for my personal projects (opencode
            go plan), but I also use Claude Opus all the time at work and
            Deepseek is not as good as that, but it does compete with Sonnet
            for capability, and beats it on price.
       
              spaceman_2020 wrote 18 hours 52 min ago:
              I genuinely donât think you need Opus 4.7/GPT-5.5 tier models
              for 95% of tasks in a normal workplace
              
              People are out there using frontier intelligence to make
              responsive headers and weekly work reports. Absolutely donât
              need the latest and greatest models for this stuff
       
              HDBaseT wrote 1 day ago:
              Deepseek V4 Pro is an amazing model, even without the unreal cost
              factored in.
              
              It is my default model at the moment. I'm not doing anything too
              complex though. I honestly found more expensive models like Qwen
              3.6 to fail in tasks Deepseek nails.
              
              I'm interested in knowing what people are using for tasks which
              require a bit more thinking. Kimi 2.6? Qwen 3.7? GLM 5.1?
       
                Akamant wrote 13 hours 25 min ago:
                17 GoLang microservices for a serious project were written
                perfectly using the latest version of QWEN(3.6). The only areas
                where we really had to work hard were documentation and a very
                serious task breakdown. All of this was tested, and yes, a
                review was required, but everything was within reason. The
                deadline was 10 days of 24/7 work, including the review. When
                attempting to submit the same task, Opus 4.7/4.6 had to be
                stopped after three hours. If you have significant resources
                for experimentation, you can certainly try. For us, the choice
                is absolutely clear at this point.
       
                chillfox wrote 23 hours 46 min ago:
                I don't think there's any open models at the moment that can
                handle the more challenging stuff.
                
                The things that I use Opus for at work is finding bugs in about
                ~200k lines of microservices and libraries in a niche language.
                So, we will get these bug reports that are missing context,
                can't easily be reproduced on our dev server, and are usually
                the result of something deep in multiple services/libraries
                combining with very custom configs. I can ask Opus (max
                thinking) to find what could cause the bug, and it usually
                nails it in a few hours (would take me 1-2 weeks to trace it
                myself). The end result will be like less than 10 lines of code
                to fix it,  some tests to reproduce the bug and a nice report
                explaining it, so it can be checked in an hour or two.
       
              pjerem wrote 1 day ago:
              I have unlimited Claude Opus at work and itâs wonderful. Not
              allozwed to use it for personal use though.
              
              So I use Deepseek Pro on the $20 Ollama Cloud plan and itâs
              really not that far behind and I never triggered the planâs
              limits.
              
              Itâs like 10-15% less powerful but costs 10 times less.
              
              Totally worth it. I prefer Opus because my employer pays for it
              but I would personally never pay 10 times more for it.
       
                chillfox wrote 23 hours 28 min ago:
                Nice,
                
                I have got unlimited Claude Opus at work as well.
                
                I was really having a hard time deciding between the Ollama and
                OpenCode plans for personal use, I couldn't really understand
                how much usage I would get with the Ollama plan, so in the end
                I went with OpenCode and I have never hit the limits despite
                using it most evenings and weekends for several hours.
       
                  abustamam wrote 11 hours 37 min ago:
                  What models do you use in open code? I too have unlimited
                  opus at work and I tried using my same workflow from work
                  using Kimi 2.6 in open code and... It's just not it, even for
                  relatively simple stuff.
                  
                  Maybe I should try DS4p?
       
        theanonymousone wrote 1 day ago:
        Isn't caching a server-side thing? How does the agent affect it,
        significantly at least?
       
          embedding-shape wrote 1 day ago:
          Say you put the current time down to the second in the system prompt,
          which is the message that goes in front of the entire conversation,
          then basically nothing will be cached, every agent turn needs to
          ingest the entire session over and over. Contrast to not doing that,
          and the backend can leverage caching all the way up to the latest
          message, as nothing until then changed.
       
            nawitus wrote 18 hours 6 min ago:
            That's not necessarily true, you can have multiple cache points,
            see e.g.
            
 (HTM)      [1]: https://platform.claude.com/docs/en/build-with-claude/prom...
       
            theanonymousone wrote 1 day ago:
            Yes, of course you can destroy it. But how far can you "improve",
            beyond decent "common sense" behaviour.
       
            esperent wrote 1 day ago:
            Surely other agent CLIs are not dumb enough to invalidate cache on
            every turn over something so obvious?
       
              brookst wrote 1 day ago:
              Probably not that exactly, but there is a tradeoff between
              effectiveness of the prompt and cache hit rate. If putting the
              userâs datetime in the middle of the prompt scores higher on
              evals but worsens cache hits, versus at the end of the prompt
              where itâs cache friendly but may not be as effective, what do
              you do?
              
              This is still art as much as science and the different harnesses
              take different approaches.
       
              chillfox wrote 1 day ago:
              I don't think any the agents breaks caching on every turn, but
              they might do things like current list of files, or available
              tools depending upon plan/build mode... or lots of other things
              that breaks caching multiple times during a session.
       
              embedding-shape wrote 1 day ago:
              Obviously not, most agents properly keep previous messages
              unchanged, at least the major ones I've been digging into the
              source off. Also, everything would get so much slower, that even
              developers creating their own agents would notice quickly how
              much slower theirs is, if they fuck this up.
       
        hirako2000 wrote 1 day ago:
        Good timing given the cost spike across other frontier models.
       
          notjes wrote 1 day ago:
          Good thing DS just made their discount permanent.
          
 (HTM)    [1]: https://x.com/deepseek_ai/status/2057854261699195173
       
        skeledrew wrote 1 day ago:
        Not a fan of that page. The animated typing and resulting continuous
        resize of the example keeps moving the content beneath it down and up.
        Such bad UX.
       
          m4rkuskk wrote 1 day ago:
          Claude design AI slob.
       
          embedding-shape wrote 1 day ago:
          Agents or no agents, people still need to test their websites on
          different resolutions or at least window width, but seems this is
          becoming a lost art.
       
            mirekrusin wrote 1 day ago:
            Yeah, doesnât look designed for people who want to read it beyond
            animated typing animation.
       
        canadiantim wrote 1 day ago:
        So what's best low cost coding agent these days? Kimi 2.6? Qwen's
        latest closed model? Composer 2.5? DeepSeek?
       
          throw10920 wrote 1 day ago:
          Cursor with Composer 2.5 seems to be competitive with frontier models
          (Opus and GPT-5.5) for a significant price discount. Benchmarks are
          gamed, as always, but $0.55/task vs $11.02 a task definitely
          indicates that there's some cost advantage.
          
 (HTM)    [1]: https://cursor.com/evals
       
          abalashov wrote 1 day ago:
          Although I have little interest in agentic coding, when I do use it,
          I have found Kimi K2.6 to give Opus-quality output, and have switched
          entirely to it for pretty much everything.
       
            throw10920 wrote 1 day ago:
            I've used Opus extensively and tried K2.6 on a few projects, and
            the gap is huge. K2.6 is nowhere near the performance of Opus.
            That's fine because it's also far cheaper, but public benchmarks
            line up with my own personal experience that they aren't comparable
            in terms of intelligence.
            
            (that is, different places on the Pareto efficiency graph)
       
              abalashov wrote 1 day ago:
              No two uses are alike, I suppose. For me, whatever difference is
              a wash. However, I probably tend to shy away from throwing
              high-complexity/long-horizon tasks at the model.
       
          stavros wrote 1 day ago:
          For me, it's by far Deepseek. It's many times cheaper than
          competitors, and about as good as Sonnet 4.6.
       
            fouric wrote 1 day ago:
            I'd generally agree about Deepseek being as good as Sonnet - but I
            have extreme trouble with prompt compliance with V4 Pro in a way
            that I've never had with Sonnet. I'll tell it "find the bug, but
            don't fix it" or "please use this tool I just developed" and it'll
            ignore me a high fraction of the time.
            
            It's bad enough that I'm working on guardrails at the harness level
            because prompting appears to be useless.
            
            Do you have the same issue?
       
              stavros wrote 1 day ago:
              I have Opus make a fairly detailed plan, then Deepseek
              implements, and GPT reviews. With that setup, I have zero issues,
              probably because what you mention is handled (the plan keeps it
              on track and the reviewer catches any issues).
              
              Now that you mention it, though, I have seen it do a few things
              that weren't in the plan. The reviewer caught them, though, so
              they didn't cause a problem, and it's so cheap that overall it's
              a massive improvement.
       
                e2e4 wrote 1 day ago:
                Which CLIs are you using for each of the steps?
       
                  stavros wrote 1 day ago:
                  OpenCode for everything:
                  
 (HTM)            [1]: https://www.stavros.io/posts/how-i-write-software-wi...
       
                    e2e4 wrote 23 hours 43 min ago:
                    thank you; will read your post
       
          passive wrote 1 day ago:
          I've gone through ~600m tokens in Xiaomi Mimo though Claude, and it's
          been the most effective use of an agent I've had yet. It's very
          capable, but generally not ambitious, picking simple but effective
          solutions to most problems I give it. 
          Going to write something longer about the experience when I get to a
          billion tokens.
       
            Alifatisk wrote 1 day ago:
            I do have my eyes on the coding plan, which is quite generous.
            
 (HTM)      [1]: https://mimo.mi.com
       
            gandreani wrote 1 day ago:
            Are you using Mimo 2.5 pro?
       
              passive wrote 1 day ago:
              Yes. I tried a couple of weeks with non-Pro, and it was pretty
              good, but I had too many spare tokens, so I switched back to Pro.
              :)
       
          skeledrew wrote 1 day ago:
          Seems to be DeepSeek.
          
 (HTM)    [1]: https://news.ycombinator.com/item?id=48237663
       
          ac29 wrote 1 day ago:
          Kimi 2.6 is great. Qwen3.7-max benchmarks similarly but I havent used
          it yet
       
          lostmsu wrote 1 day ago:
          Just use codex with 5.5 on low reasoning levels
       
          bwfan123 wrote 1 day ago:
          In my experience, it is claude-code paired with deepseek-v4. For
          penny-pinchers like me, I can have long coding sessions with it with
          no anxiety about the cost. Also, prompting it to what you want and
          verifying the outputs is more important than the quality of the
          model. So, I am better off with a cheaper model and taking the
          responsibility for prompting it and verifying the results.
       
            raybb wrote 20 hours 52 min ago:
            How to do connect deepseek to Claude code?
       
              qaz_plm wrote 11 hours 56 min ago:
              
              
 (HTM)        [1]: https://api-docs.deepseek.com/quick_start/agent_integrat...
       
            esperent wrote 1 day ago:
            It's obviously much cheaper paying by the token but how does it
            compare to a codex subscription on cost?
       
            epolanski wrote 1 day ago:
            Can you quantify the actual costs in a week and the use you make?
       
              wongarsu wrote 1 day ago:
              Not GP, but for my use I'd estimate $0.10-0.30 per hour of use
              per agent with DeepSeek v4 Pro
       
        embedding-shape wrote 1 day ago:
        I'm not sure you need a "DeepSeek native coding agent" to take
        advantage of DeepSeeks cache, yesterday as the Codex quota usage issue
        still wasn't solved for me, I wrote a tiny little bridge so I could use
        DeepSeek V4 Pro via Codex, and seems most of everything I did was
        basically cached as far as I can tell: [1] (2026-05-23 Input (Cache
        hit): 39,123,200 tokens, Input (Cache miss) 1,692,286), and the bridge
        is doing not special, just massage the DeepSeek API shape into what
        Codex expects, nothing particular about caching at all.
        
        Besides being even better at the caching, I'm not sure what benefits
        you'd get compared to just firing up OpenCode with the DeepSeek API
        yourself, it'll similarly do caching for sure and also "talks directly
        to api.deepseek.com" if that matters, and you'll get a much more mature
        harness.
        
 (HTM)  [1]: https://i.imgur.com/7eKn6wN.png
       
          tontinton wrote 1 day ago:
          Yep exactly my thoughts, went and looked at the code for the deepseek
          provider in my coding agent. and basically all of what the author
          wrote there is implemented... [1] for the curios
          
 (HTM)    [1]: http://github.com/tontinton/maki
       
          kiproping wrote 1 day ago:
          This would be a better page to link to [1] They explain some of the
          the reasons why they have a better solution and why they are very
          opinionated
          
          >Automatic prefix caching activates only when the exact byte prefix
          of the previous request matches. Most agent loops reorder, rewrite,
          or inject fresh timestamps each turn â cache hit rate in practice:
          <20%.
          
          So they optimize on this plus other techniques to improve cache hits,
          making it cheaper.
          
 (HTM)    [1]: https://github.com/esengine/DeepSeek-Reasonix/blob/main/docs...
       
            embedding-shape wrote 13 hours 29 min ago:
            > Most agent loops reorder, rewrite, or inject fresh timestamps
            each turn
            
            I haven't seen that, it'd be crazy slow if they did this. What
            "agent loops" are they talking about here specifically? The
            vagueness makes it sound potentially made up.
       
            sparkleMing wrote 15 hours 34 min ago:
            The last time I heard about something like this, it was Claude Code
            intentionally injecting random strings to break caching when you're
            not using a Claude model. Aside from that kind of intentional
            sabotage, I don't think any coding agent would just ignore prefix
            caching.
       
              ikurei wrote 6 hours 44 min ago:
              I haven't heard about this, could you please share more info,
              some reference on that Claude Code intentional bug?
       
                davesque wrote 2 hours 18 min ago:
                I'm not sure what the mechanism is, but I've definitely had
                Claude refuse to work on sessions that were touched by other
                models. Some kind of integrity check failure. Resetting the
                session back to the point before I used the other model fixed
                the problem.
       
            vidarh wrote 16 hours 16 min ago:
            I've never seen an agent loop "reorder, rewrite, or inject fresh
            timestamps" each turn other than mostly towards the end of the
            messages. Messing with a large part of the context every turn would
            be a fairly crazy thing to do.
       
              nawitus wrote 11 hours 16 min ago:
              Yeah. Those claims are just some random AI slop from claude..
       
                vidarh wrote 10 hours 12 min ago:
                It's a really lazy one too - there are so many open source
                harnesses, including e.g. Codex and Kimi-CLI, and of course the
                leaked Claude Code source, so it's trivial to verify if someone
                even just bothered to ask an agent to check actual source code
                examples.
       
            krackers wrote 22 hours 1 min ago:
            >Most agent loops reorder, rewrite, or inject fresh timestamps each
            turn
            
            That's really surprising, since it'd defeat the whole point of KV
            caching. I mean I buy it considering how sloppily coded the
            harnesses seem to be, but this like obvious low hanging fruit.
            
            I've also often wondered why LLMs aren't trained with a format of
            having a dedicated contextual system-instruction role at the _end_,
            which you could use to put context like current time or other misc
            stuff.
       
              jeremyjh wrote 9 hours 9 min ago:
              Its not surprising, that doc is full of AI slop.
       
          3uler wrote 1 day ago:
          Opencode has really bad cache stability issues that they seem
          uninterested in fixing at the moment.
       
            verdverm wrote 6 hours 52 min ago:
            There are some that are specific to certain models like qwen/gemma
            
            I switched to vLLM and those went away. Need to look at my opencode
            config and adjust some others based on things I see here
       
            magicalhippo wrote 18 hours 18 min ago:
            What I noticed when using OpenCode with llama.cpp, was that the
            default host RAM prompt cache size in llama.cpp was way too small
            for say 128k Qwen3.6 27B.
            
            The default is just 8GB and a full 128k context for the dense model
            can take most of that. So then comes an agent and causes eviction
            and subsequent cache miss.
            
            Bumped the cache size (--cram IIRC) up to 48GB and had much better
            results.
       
            estebarb wrote 1 day ago:
            I'm not sure that is really the case, or relevant in practice. I
            have been using OpenCode with DeepSeek lately (regular coding). For
            instance, today I got 120 million input tokens hitting cache, vs
            just 2.59million missing cache.
       
              ctxc wrote 20 hours 40 min ago:
              Reads like a LOT of tokens to me. What does your usage /workflow
              look like? I'm v curious because although I do use Claude code,
              my token counts aren't nearly as much
              
              I want to know if I'm missing something cool!
       
                mordae wrote 15 hours 36 min ago:
                Not OP, but I routinely load 150k tokens into context. A full
                sub-package to work on, select other files in the monorepo,
                e.g. front-end visualization and back-end data loader. Then
                work some 150k tokens, then start again.
                
                At the end, cache hit rate is like 99.5% if Novita is not
                having issues.
                
                For official DeepSeek API, 99.9% or something.
                
                Custom harness that never compacts or otherwise doctors the
                history.
       
                  ctxc wrote 11 hours 41 min ago:
                  Those numbers make sense to me...120 million input tokens is
                  like 120 sessions of hitting the full context limit, which
                  seems like a lot to me though
       
            metalspot wrote 1 day ago:
            I am getting 98.6% cache hit ratio on deepseek-v4-flash with
            opencode
       
              upcoming-sesame wrote 1 day ago:
              out of curiosity, how do you measure cache hit rate in opencode ?
       
                malikNF wrote 1 day ago:
                opencode stats
       
                  hackernows_test wrote 1 day ago:
                  The first
       
                  lugu wrote 1 day ago:
                  So the calculation is:
                  
                  Total input token = input + cache read + cache write
                  Cache hit rate = cache read / total input token.
                  
                  That is 71% in my very limited use of opencode.
       
              bobkb wrote 1 day ago:
              Thatâs impressive!
              
              On the sheer performance itâs comparable to Opus ?
       
                stavros wrote 1 day ago:
                Here are my stats (from DeepSeek directly, with a script I
                wrote). The prices are what equivalent Sonnet usage would have
                cost, the actual amount I paid was $10. On performance,
                DeepSeek V4 Pro is comparable to Sonnet for me.
                
                     ./cost.py amount-2026-5.csv 0.3 3.75 15
                    input_cache_hit_tokens: 472,971,520 tokens -> $141.8915
                    input_cache_miss_tokens: 13,299,013 tokens -> $49.8713
                    output_tokens: 3,334,962 tokens -> $50.0244
                    cache hit rate: 97.27% (472,971,520/486,270,533)
                    cache miss rate: 2.73% (13,299,013/486,270,533)
                    total: $241.7872
                
                All of this usage was with an OpenCode subagent exclusively.
       
            dathery wrote 1 day ago:
            The OpenCode devs talk about this on Twitter a lot, e.g. [1] > tool
            call pruning breaks cache and people will tell you this is horrible
            and expensive
            
            > except i looked at some anthropic data and real user behavior
            ends up with better cache hits and 30% less spend
            
            > even this is needs to be analyzed further, it's just not simple
            
            > for openai data it's inverted! cache hit ratio is actually better
            [sic: I think he meant worse based on the screenshot] with tool
            call pruning turned on
            
            > but the net $ saved is only 5%
            
            > kimi is a funny one - it has better cache hits with pruning
            on...but is also more expensive!
            
            There was also another thread recently where he discussed that
            pruning improves user experience (models are smarter with less
            context) but I can't find it.
            
            This can also be disabled in the config:
            
 (HTM)      [1]: https://xcancel.com/thdxr/status/2048268697790300343
 (HTM)      [2]: https://opencode.ai/docs/config/#compaction
       
              awoimbee wrote 16 hours 16 min ago:
              You didn't quote the interesting part:
              
              > our implementation is it only prunes calls from > 3 user
              messages ago, if context is > 40K, and only if there's at least
              20K tokens to be removed
              
              Seems reasonable to me and explains why I can have long sessions
              (way longer than with zed agents) while still hitting cache.
              Opencode is just missing per-provider TTL.
       
                arthurcolle wrote 14 hours 56 min ago:
                I found that keeping current context utilization at 18% of
                total context length was best for minimizing spend, across all
                models with 400k context length or more
       
              soerxpso wrote 1 day ago:
              My understanding of caching with most models/providers is that a
              prefix substring of the context has to be reused for a cache hit,
              but not necessarily the whole entire context window. So if you
              prune tool calls from the history, you're going to get one cache
              miss on the newly-pruned history, and then you're going to be
              getting cache hits on every subsequent turn, with a lower number
              of input tokens. If you prune subsequent tool calls after that,
              you would still get a cache hit for the already-pruned portion of
              the context, just not the full context.
       
                __natty__ wrote 1 day ago:
                So it makes sense to first send stable prompt, reasoning and
                files content, tool calls summary and actual tool calls at the
                very end?
       
                  leemoore wrote 1 day ago:
                  The way you do this (and the way opencode does it) is you do
                  most of your pruning in more recent history. Last I looked at
                  opencode, they start pruning tool call results after 2 full
                  agentic turns. So you probably dont get quite as good hits on
                  cache for the most recent 1-5% of your turns, but after that
                  everything else caches fine and those tool calls that likely
                  aren't relavent to your session anymore are gone.
       
              hirako2000 wrote 1 day ago:
              They are. Empirical evidence on my side. Because attention is
              sparse across the context. It's not truly treating a million
              token the way it treats a fraction of that count. For
              performance.
       
            huqedato wrote 1 day ago:
            I can't confirm this. Having utilized Opencode for a large project
            over the past 10 months, with multiple models and agents, we've
            never run into such 'cache stability issues'."
       
            embedding-shape wrote 1 day ago:
            That'd be really easy to spot and also fix, most likely. Any open
            issue you could point us to, must surely been reported already?
       
              3uler wrote 13 hours 6 min ago:
              
              
 (HTM)        [1]: https://github.com/anomalyco/opencode/pull/14743
       
              krzyk wrote 1 day ago:
              Opencode (and other coding agents) have hundreds of open issues
              reported. It is quite discouraging when they are not being
              closed/fixed.
       
                verdverm wrote 6 hours 48 min ago:
                These projects have also been the recipients of PR spam, lots
                of duplicates and unconfirmed in there for less technical
                people and clawd operators
       
              nolok wrote 1 day ago:
              > That'd be really easy to spot and also fix, most likely
              
              Ah, reminds me of good old "There are only 2 hard problems in
              computer science: cache invalidation, naming things, and off-by-1
              errors."
       
                criemen wrote 1 day ago:
                > Ah, reminds me of good old "There are only 2 hard problems in
                computer science: cache invalidation, naming things, and
                off-by-1 errors."
                
                You quip, but LLM KV caching (from the harness side) is quite
                easy: You get a cache hit on stable prompt prefixes, period.
                That means you want to keep the prefix stable, and only append
                at the end of the conversation.
                Made up example: Don't put the git branch name into the system
                prompt part (that comes first), as whenever the branch name
                changes, that'd trigger a cache invalidation of the entire
                prompt.
                
                Getting this right requires some care to not by accident modify
                the prefix, basically, and some design on communicating the
                things that can change (user configuration, working dir, git
                information, ...).
       
                  franknord23 wrote 1 day ago:
                  That sounds like the experience of writing Containerfiles;
                  since steps are cached you want to pull the thing you are
                  iterating on as far down as possible.
       
                    verdverm wrote 6 hours 49 min ago:
                    It's even closer to prefix matching on super long strings
                    by chunk
       
                    gopher_space wrote 1 day ago:
                    All of this work has been done before in different
                    contexts.  Memory management with bigger blocks and weaker
                    definitions that change whenever some grad student gets a
                    bright idea.
       
                      vidarh wrote 16 hours 10 min ago:
                      100%. Since you mention memory management: Generational
                      GC is pretty much the same idea: Keep the stuff that's
                      least likely to change an important property (liveness)
                      together.
                      
                      Conceptually the underlying general idea is to sort
                      things based on stability if you can avoid recomputing
                      properties of the stable part.
       
          himata4113 wrote 1 day ago:
          this appears to be native to the terminal, as in, there's no special
          application that runs or wraps an agent inside a tui. So basically
          instead of commands you type plain english?
       
            embedding-shape wrote 1 day ago:
            > this appears to be native to the terminal, as in, there's no
            special application that runs or wraps an agent inside a tui
            
            Same with codex? codex-rs at least, is a TUI as well, it does run a
            "app-server" in the background, that the TUI actually interacts
            with, but that's just an implementation detail. Also makes it easy
            to hook in your own programs to fire of codex "headless" sessions
            even without the TUI.
       
          bwfan123 wrote 1 day ago:
          > I wrote a tiny little bridge so I could use DeepSeek V4 Pro via
          Codex
          
          Can you share the bridge.  DeepSeek v4 is awesome paired with
          claude-code or opencode. I found that claude code costs me less than
          opencode and I am presuming this is due to a better engineered
          harness.
       
            spacedcowboy wrote 12 hours 55 min ago:
            I don't think DeepSeek v4 Flash is as good as Claude for relatively
            complex tasks. I ran with DeepSeek for a week, giving it the same
            sort of tasks that Claude normally does, and then ran Claude and
            asked it to continue. It found a whole bunch of things that had
            been "overlooked" by DeepSeek, and spent some time fixing them
            before wanting to move on.
            
            DeepSeek is good, Claude is better, at least IMHO. Deepseek is a
            lot cheaper though :)
       
            NamlchakKhandro wrote 1 day ago:
            Claude code and open code are streaming piles of shit
       
            bayesianbot wrote 1 day ago:
            LiteLLM can serve OpenAI API endpoint IIRC and proxy that to other
            providers like DeepSeek, should work with Codex
       
            Den_VR wrote 1 day ago:
            Iâm feeling more a novice every day, but how isnât this just
            handing over your code to team deepseek for whatever they might
            want
       
              spacedcowboy wrote 12 hours 53 min ago:
              Somehow I don't think DeepSeek will be that interested in a 6502
              compiler [1]...
              
              1:
              
 (HTM)        [1]: https://atari-xt.com/
       
              dudisubekti wrote 20 hours 59 min ago:
              Yeah, but it's miles better than giving Anthropic and OpenAI your
              data. At least Deepseek is releasing open-weight models and a lot
              of open-source libraries.
              
              If you're concerned about espionage then the only solution is
              host the models yourself, which again, only open-weight models
              like Deepseek enable you to do this.
       
              jijji wrote 1 day ago:
              there's laws on the books in China that says that every company
              operating in China must aid and abet the Chinese government in
              espionage against the rest of the world.  given those facts, I
              find it deeply troubling to be using anything coming out of
              China, especially a program that runs in the context of a Linux
              terminal on a machine that might have something important on it.
              I'd argue it's a back door waiting to happen, if not sooner than
              obviously later.
       
                zaphirplane wrote 12 hours 7 min ago:
                Like which country allows companies to not follow a legal
                directive. How weird
       
                subscribed wrote 13 hours 38 min ago:
                I forbidden from working on the company code with DS, but if I
                have a private something that looks pretty much like one of the
                thousands repositories put there, it doesn't matter that much.
       
                cicko wrote 17 hours 56 min ago:
                Yes, and the Russians are still coming!
       
                Danox wrote 1 day ago:
                The four biggest (obvious) backdoor countries in the world in
                no particular order the United States, Israel, Russia, China.
                Honorable mentions, North Korea, Ukraineâ¦
       
                tim-projects wrote 1 day ago:
                Is it not better to have a country far away spying on you than
                your own country?
       
                  azinman2 wrote 1 day ago:
                  Not if itâs industrial espionage.
       
                goobatrooba wrote 1 day ago:
                As a European I have to admit I am these days more worried
                about the US than China. See yesterday's article about the US
                government forcing Microsoft to give them lists of Dutch
                government officials. Utter madness. At least the Chinese
                mainly care about the money and power levers, the US about
                strange worlds of revenge and manipulation, trying to change or
                influence your government. E.g. which of the two countries has
                put crippling personal sanctions on staff of the international
                criminal court?
                
                Honestly I'd love to love the US again, but basically after
                Obama things have just gone down and down and no soul will
                trust the US again in the next generation or two.
       
                  gizajob wrote 16 hours 54 min ago:
                  That particular rot actually turned cancerous with Bush and
                  Cheney, not Obama, IMO.
       
                  c1sc0 wrote 20 hours 4 min ago:
                  Besides the language barrier itâs actually also just
                  simpler to do business with the Chinese. There are issues
                  like censorship but they are known & can be routed around.
                  Itâs best to just ignore the US and move your business
                  elsewhere.
       
                  monch1962 wrote 22 hours 0 min ago:
                  As an Australian, I completely agree with every point in your
                  response
       
                  jijji wrote 22 hours 42 min ago:
                  The situation you reference is related to a specific
                  investigation by US congress requesting documents about
                  potentially illegal censorship actions by EU officials from a
                  specific company (microsoft). The difference is that the laws
                  in china are broadly defined to include giving all
                  intellectual property of anyone back to the government with
                  no oversight, for the purposes of espionage.
                  
                  The former relates to a specific investigation about
                  potential criminal activity, the latter relates to broad
                  illegal activity committed by the government itself unrelated
                  to any specific case.
                  
                  The US has no laws on the books forcing companies to wantonly
                  give intellectual property and other espionage level material
                  back to the government. If they did, no one would use cloud
                  providers.
                  
                  To avoid this, you can run your own hosted machine in a
                  colocation facility, because in the US, people do have
                  reduced rights when their data is controlled by a third party
                  versus being controlled by themselves.    Its the same as if
                  the data was in your house, they would need a search warrant
                  to obtain it, but when its at a Azure or AWS datacenter not
                  controlled by you, your privacy rights are reduced by doing
                  this.
       
                    MASNeo wrote 15 hours 8 min ago:
                    > no one would use cloud providers
                    
                    I think many are trying to move away from US providers
                    actually. FISA section 702 and the current administrations
                    liberties taken towards international law are not helping.
                    The trust problem is real.
                    
                    Not sure Iâd trust China with anything onshore. But
                    offshore, it does seem they play by the rules, because it
                    pragmatically serves the stability of the people. China has
                    not started wars in the past 50 years or so. By that logic
                    one may assume theyâd not abuse the arguably broad powers
                    over Chinese firms abroad to risk one now.
                    
                    In a world where rules are increasingly less important how
                    states use power matters more to me than how they claim to
                    be monitored.
       
                    watwut wrote 15 hours 47 min ago:
                    >  If they did, no one would use cloud providers.
                    
                    EU has literal directive about location of data which has
                    to be located in the EU and not in the USA, because the
                    data are in danger otherwise.
       
                      miroljub wrote 15 hours 7 min ago:
                      Yep, and then they let US companies handle that data. One
                      more proof EU regime is run by ... no, I won't tell,
                      don't wanna get arrested.
       
                    cicko wrote 17 hours 53 min ago:
                    > The US has no laws on the books
                    
                    Correct. They come up on Twitter daily. Pardon, this other
                    truth bullshit.
       
                  dominotw wrote 1 day ago:
                  so govt forcing a private coroporation being a big deal that
                  a its on the worldwide news is more scary to you than an
                  implicit mandate that china forces on its companies?
       
                  OtomotO wrote 1 day ago:
                  Exactly this.
                  
                  I don't care about the US more than about Russia or China
                  these days.
                  
                  They are definitely not our allies anymore.
       
                    dominotw wrote 1 day ago:
                    you dont need to be allies to do business. walmart is not
                    my ally.
       
                      ajuc wrote 11 hours 1 min ago:
                      Not enough trust to do business.
       
                      schubidubiduba wrote 12 hours 57 min ago:
                      The difference is that Walmart is a stable, reliable
                      trade partner that honors contracts and is not trying to
                      use propaganda to make you a fascist
       
                      OtomotO wrote 17 hours 40 min ago:
                      True, but then I expect them to betray me at any junction
                      and I'll gladly do the same.
       
                _3u10 wrote 1 day ago:
                FISA section 702 / Five eyes / Room 641A.
       
              oldmanhorton wrote 1 day ago:
              Youâre not a novice, there are a lot of us who know exactly
              what we are doing and see this as a huge downside. We are just
              being told to go faster, faster, faster lest we miss out onâ¦
              something?
       
              embedding-shape wrote 1 day ago:
              Not everyone is working with state secrets or user personal data
              (or even more closely guarded, company secrets) on a daily basis,
              most of what I hack on is either FOSS already, or will be, not
              much to keep secret here.
              
              Obviously, if you do deal with any sort of secrets, then using
              local LLMs over OpenAI, Anthropic, DeepSeek or whoever is
              obviously preferred, and in the case of personal data of users,
              probably a requirement.
       
                jack_pp wrote 1 day ago:
                either this or you work on software that even if copied won't
                get you far since the business relies on network effects or
                pure networking.
                
                Getting the source code of facebook or instagram doesn't mean
                you could compete with them.
                
                I work for a company that has built relationship with event
                organizers over the past 10 years. The code I maintain could be
                written from scratch in maybe 2-3 months even though it was
                built over the past 10 years but besides that you have frontend
                / DB / hardware / logistics etc
       
                  Demiurge wrote 1 day ago:
                  I actually agree with you, for the most part. The code I work
                  with actually does contain some valuable algorithms, but Im
                  pretty sure the effort of integrating them into a larger
                  system is pointless without the data. Itâs almost like
                  stealing half-life 2 source code without any assets.
                  
                  Still, âGetting the source code of facebook or instagram
                  doesn't mean you could compete with them.â I think to
                  giants like that, having access to their source code could
                  open up some very interesting loop holes for manipulating the
                  ranking algorithms, or even security vulnerabilities.
       
                    jack_pp wrote 1 day ago:
                    True, haven't thought of that. However very few actual
                    projects / companies are in a situation where the chinese
                    GOVT would be interested to spend resources to hack your
                    platform. For the ones that are afraid of that there's
                    always self hosting of course
       
                      Den_VR wrote 11 hours 51 min ago:
                      I used to work with HVAC companies, and I noticed that
                      many of their customers mistakenly believed they were
                      purchasing air conditioners. They didnât consider these
                      devices, which they connected to the internet, as
                      computers. Despite being systems that required user
                      names, passwords, updates, monitoring, and other
                      maintenance, the prevailing attitude among these
                      customers was, âThis is an appliance, and why would
                      anyone care about my air conditioner?â
                      
                      All this to say, not even subject matter experts
                      necessarily appreciate the risk involved in their work
       
            embedding-shape wrote 1 day ago:
            Sure, keep in mind it's a steaming pile of hacked together hacks,
            probably won't work in every case, doesn't support every feature
            that should be supported (like parallel tool calling, both Codex +
            DeepSeek API support it), and it might make your computer catch on
            fire: [1] I only used it for a few hours to play around with stuff
            before the quota issue was fixed and I could resume using GPT
            models, and the bridge was coded by DeepSeek-V4-Flash-IQ2XXS +
            DwarfStar4 locally, I take no responsibility for what might happen
            with your computer or you, during usage or just reading the code.
            
            Edit: heh, like don't look at line 117 for example where seemingly
            it likes to handle misspellings in the .env file which totally
            wasn't my fault for typo'ing the API key in that file... I'm sure
            there are tons of sharp edges and dumb stuff in there.
            
 (HTM)      [1]: https://gist.github.com/embedding-shapes/eab3e63e5a95d3d78...
       
       
 (DIR) <- back to front page