[HN Gopher] A Research Preview of Codex
___________________________________________________________________
A Research Preview of Codex
Author : meetpateltech
Score : 336 points
Date : 2025-05-16 15:02 UTC (7 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| haffi112 wrote:
| (watching live) I'm wondering how it performs on the METR
| benchmark (https://metr.org/blog/2025-03-19-measuring-ai-ability-
| to-com...).
| colesantiago wrote:
| I think the benchmark test I would like to see for these
| programming agents is an agent making a flawless PR or patch
| to the BSD / Linux kernel.
|
| This should be possible today and surely Linus would also see
| this in the future.
| _kb wrote:
| There's a fairly pragmatic discussion on that exact topic with
| Linus here: https://youtu.be/VHHT6W-N0ak.
| tptacek wrote:
| Maddening: "codex" is also the name of their open-source Claude-
| Code-alike, and was previously the name of an at-the-time
| frontier coding model. It's like they name things just to fuck
| with us.
| tekacs wrote:
| So -- that client-side thing is _technically_ called `codex-
| cli` (in the parent 'codex' repo, which looks like a
| monorepo?).
|
| Still super confusing, though!
|
| I feel like companies working with and shipping LLMs would do
| well to remember that it's not just humans who get confused by
| this, but LLMs themselves... it makes for a painful time,
| sending off a request and noticing, a third of the way into
| its reasoning, that the model has gotten two things with
| almost-identical names confused.
| tough wrote:
| they also have dual implementations in Rust and TypeScript;
| there's codex-rs in that monorepo
| fabmilo wrote:
| more excited about the rust impl than the typescript one.
| tptacek wrote:
| Besides packaging of their releases, what possible
| difference could that make in this problem domain?
| tough wrote:
| I just think it's nice to have open source code to reference,
| so maybe he meant it just in that -educational- way; certainly
| more to learn from the Rust one than the TS one for most
| folks, even if the problem space doesn't require system-level
| safety code.
| quantadev wrote:
| If its name is 'codex-cli' then that means "Codex Command
| Line Interface", so the name is absolutely Codex.
| manojlds wrote:
| And with themselves and their models. The open-source Codex
| had a prompt to disambiguate it from the model.
| scottfalconer wrote:
| Next week: OpenAI rebrands Windsurf as Codex.
| odie5533 wrote:
| Codex IDE. Calling it.
| dbbk wrote:
| VS Codex
| prhn wrote:
| Is anyone using any of these tools to write non boilerplate code?
|
| I'm very interested.
|
| In my experience ChatGPT and Gemini are absolutely terrible at
| these types of things. They are constantly wrong. I know I'm not
| saying anything new, but I'm waiting to personally experience an
| LLM that does something useful with any of the code I give it.
|
| These tools aren't useless. They're great as search engines and
| pointing me in the right direction. They write dumb bash scripts
| that save me time here and there. That's it.
|
| And it's hilarious to me how these people present these tools. It
| generates a bunch of code, and then you spend all your time
| auditing and fixing what is expected to be wrong.
|
| That's not the type of code I'm putting in my company's code
| base, and I could probably write the damn code more correctly in
| less time than it takes to review for expected errors.
|
| What am I missing?
| icapybara wrote:
| It's probably what you're asking. You can't just say "write me
| an app", you have to break a big problem into small problems
| for it.
| spariev wrote:
| I think it all depends on your platform and use cases. In my
| experience AI tools work best with Python and JS/Typescript and
| some simple use cases (web apps, basic data science etc). Also,
| I've found they can be of great help with refactorings and
| cases when you need to do something similar to already existing
| code, but with a twist or change.
| volkk wrote:
| you might be missing small things to create more guardrails
| like effective prompting and maintaining what's been done using
| files, carefully controlling context, committing often in-
| between changes, but largely, you're not missing anything. i
| use AI constantly, but always for subtasks of a larger
| complicated thing that my brain has thought through. and often
| use higher cost models to help me abstractly think through
| complex things/point me in the right directions.
|
| personally, i've always operated in a codebase in a way that i
| _need_ to understand how things work for me to be productive
| and make the right decisions. I operate the same way with AI.
| every change is carefully reviewed, if it's dumb, i make it
| redo it and explain why it's dumb. and if it gets caught in a
| loop, i reset the context and try to reframe the problem.
| overall, i'm definitely more productive, but if you truly want
| to be hands off--you're in for a very bad time. i've been
| there.
|
| lastly, some codebases don't work well with AI. I was working
| on a problem that was a bit more novel/out there and no model
| could solve it. Just yapped endlessly about these complex, very
| potentially smart sounding solutions that did absolutely
| nothing. went all the way to o1-pro. the craziest part to me
| was the fact that across claude, deepseek and openai, they used
| the same specific vernacular for this particular problem which
| really highlights how a lot of these models are just a mish-
| mash of the same underlying architecture/internet data. some of
| these models use responses from other models for their training
| data, which to me is like incest. you won't get good genetic
| results.
| Workaccount2 wrote:
| >What am I missing?
|
| That you are trying to use LLMs to create giant sprawling
| codebase feature packed software packages that define the
| modern software landscape. What's being missed is that any one
| user might only utilize 5% of the code base on any given day.
| Software is written to accommodate every need every user could
| have in one package. Then the users just use the small slice
| that accommodates their specific needs.
|
| I have now created 5 hyper narrow programs that are used daily
| by my company to do work. I am not a programmer and my company
| is not a tech company located in a tech bubble. We are a tiny
| company that does old school manufacturing.
|
| To give a quick general example, Betty uses Excel to manage
| payroll. A list of employees, a list of wages, a list of hours
| worked (which she copies from the time clock software .csv that
| she imports to Excel).
|
| Excel is a few million LOC program and costs ~$10/mo. Betty
| needs maybe 2k LOC to do what she uses excel for. Something an
| LLM can do easily, a python GUI wrapper on an SQLite DB. And
| she would be blown away at how fast it is, and how it is
| written for her use specifically.
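|
| (A minimal sketch of the kind of tool meant here -- hypothetical
| table, column, and file names -- a small Tkinter front end over
| SQLite that imports the time-clock CSV and computes gross pay:)
|
|     import csv, sqlite3, tkinter as tk
|
|     db = sqlite3.connect("payroll.db")
|     db.execute("CREATE TABLE IF NOT EXISTS employees"
|                " (name TEXT PRIMARY KEY, wage REAL)")
|     db.execute("CREATE TABLE IF NOT EXISTS hours"
|                " (name TEXT, week TEXT, hours REAL)")
|
|     def import_hours(path):
|         # time-clock export assumed to have name,week,hours columns
|         with open(path, newline="") as f:
|             rows = [(r["name"], r["week"], float(r["hours"]))
|                     for r in csv.DictReader(f)]
|         db.executemany("INSERT INTO hours VALUES (?, ?, ?)", rows)
|         db.commit()
|
|     def gross_pay(week):
|         return db.execute(
|             "SELECT e.name, e.wage * h.hours FROM employees e "
|             "JOIN hours h ON h.name = e.name WHERE h.week = ?",
|             (week,)).fetchall()
|
|     root = tk.Tk()
|     week_entry = tk.Entry(root); week_entry.pack()
|     out = tk.Text(root, height=20, width=40); out.pack()
|
|     def show():
|         out.delete("1.0", tk.END)
|         for name, pay in gross_pay(week_entry.get()):
|             out.insert(tk.END, f"{name}: ${pay:.2f}\n")
|
|     tk.Button(root, text="Import timeclock.csv",
|               command=lambda: import_hours("timeclock.csv")).pack()
|     tk.Button(root, text="Compute pay", command=show).pack()
|     root.mainloop()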
|
| How software is written and how it is used will change to
| accommodate LLMs. We didn't design cars to drive on horse
| paths, we put down pavement.
| kridsdale3 wrote:
| The Romans put down paved roads to make their horse paths
| more reliable.
|
| But yes, I hope we get away from the giant conglomeration of
| everything, ESPECIALLY the reality of people doing 90% of
| their business inside a Google Chrome window. Move towards the
| UNIX philosophy of tiny single-purpose programs.
| alfalfasprout wrote:
| > I have now created 5 hyper narrow programs that are used
| daily by my company to do work. I am not a programmer and my
| company is not a tech company located in a tech bubble. We
| are a tiny company that does old school manufacturing.
|
| OK, great.
|
| > That you are trying to use LLMs to create giant sprawling
| codebase feature packed software packages that define the
| modern software landscape. What's being missed is that any
| one user might only utilize 5% of the code base on any given
| day. Software is written to accommodate every need every user
| could have in one package. Then the users just use the small
| slice that accommodates their specific needs.
|
| With all due respect, the fact that you made a few small
| programs to help with your tasks is wonderful, but this last
| statement alone rather disqualifies you from making an
| assessment of software engineering in general.
|
| There's a great number of reasons why codebases get large.
| Complex problems inherently come with complexity and scale in
| both code and integrations. You can choose to move the
| complexity around but never fully get rid of it.
| mupuff1234 wrote:
| But how much of the software industry is truly solving
| inherently complex problems?
|
| At a very conservative guess I'd say no more than 10%.
| Cu3PO42 wrote:
| Occasionally. I find that there is a certain category of task
| that I can hand over to an LLM and get a result that takes me
| significantly less time to clean up than it would have taken me
| to write from scratch.
|
| A recent example from a C# project I was working in. The
| project used builder classes that were constructed according to
| specified rules, but all of these builders were written by
| hand. I wanted to automatically generate these builders, and
| not using AI, just good old meta-programming.
|
| Now I knew enough to know that I needed a C# source generator,
| but I had absolutely no experience with writing them. Could I
| have figured this out in an hour or two? Probably. Did I write
| a prompt in less than five minutes and get a source generator
| that worked correctly in the first shot? Also yes. I then spent
| some time cleaning up that code and understanding the API it
| uses to hook into everything and was done in half an hour and
| still learnt something from it.
|
| You can make the argument that this source generator is in
| itself "boilerplate", because it doesn't contain any special
| sauce, but I still saved significant time in this instance.
| uludag wrote:
| I feel things get even worse when you use a more niche
| language. I get extremely disappointed any time I try to get it
| do anything useful in Clojure. Even as a search engine,
| especially when asking it about libraries, these tools
| completely fail to meet expectations.
|
| I can't even fathom how frustrating such tools would be with
| poorly written confusing Clojure code using some niche
| dependency.
|
| That being said, I can imagine a whole class of problems where
| this could succeed very well and provide value. Then again,
| the type of problems that I feel these systems could get right
| 99% of the time are problems that a skilled developer could fix
| in minutes.
| sottol wrote:
| I tried using Gemini 2.5 Pro for a side-side-project, seemed
| like a good project to explore LLMs and how they'd fit into my
| workflow. 2-3 weeks later it's around 7k loc of Python auto-
| generating about 35k loc of C from a JSON spec.
|
| This project is not your typical Webdev project, so maybe
| that's an interesting case-study. It takes a C-API spec in
| JSON, loads and processes it in Python and generates a
| C-library that turns a UI marked up YAML/JSON into C-Api calls
| to render that UI. [1]
|
| The result is pretty hacky code (by my design, can't/won't use
| FFI) that's 90% written by Gemini 2.5 Pro Pre/Exp but it mostly
| worked. It's around 7k lines of Python that generate a 30-40k
| loc C-library from a JSON LVGL-API-spec to render an LVGL UI
| from YAML/JSON markup.
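|
| (For a sense of the general shape -- a toy sketch with a made-up
| spec format, not the actual pipeline: read a JSON API spec and
| emit a C dispatch table mapping markup property names to setter
| calls.)
|
|     import json
|
|     spec = json.loads("""
|     [
|       {"name": "lv_obj_set_width",  "args": ["lv_obj_t *obj", "int32_t w"]},
|       {"name": "lv_obj_set_height", "args": ["lv_obj_t *obj", "int32_t h"]}
|     ]
|     """)
|
|     def emit_setter_table(functions):
|         lines = [
|             "typedef void (*setter_fn)(lv_obj_t *, int32_t);",
|             "typedef struct { const char *key; setter_fn fn; } setter_entry;",
|             "static const setter_entry SETTERS[] = {",
|         ]
|         for fn in functions:
|             # property key "width" derived from "lv_obj_set_width"
|             key = fn["name"].rsplit("_", 1)[-1]
|             lines.append(f'    {{ "{key}", {fn["name"]} }},')
|         lines.append("};")
|         return "\n".join(lines)
|
|     print(emit_setter_table(spec))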
|
| I probably spent 2-3 weeks on this, I might have been able to
| do something similar in maybe 2x the time but this is about 20%
| of the mental overhead/exhaustion it would have taken me
| otherwise. Otoh, I would have had a much better understanding
| of the tradeoffs and maybe a slightly cleaner architecture if I
| would have to write it. But there's also a chance I would have
| gotten lost in some of the complexity and never finished (esp
| since it's a side-project that probably no-one else will ever
| see).
|
| What worked well:
|
| * It mostly works(!). Unlike previous attempts with Gemini 1.5
| where I had to spend about as much or more time fixing than
| it'd have taken me to write the code. Even adding complicated
| features after the fact usually works pretty well with minor
| fixing on my end.
|
| * Lowers mental "load" - you don't have to think so much about
| how to tackle features, refactors, ...
|
| Other stuff:
|
| * I really did not like Cursor or Windsurf - I half-use VSCode
| for embedded hobby projects but I don't want to then have
| another "thing" on top of that. Aider works, but it would
| probably require some more work to get used to the automatic
| features. I really need to get used to the tooling, not an
| insignificant time investment. It doesn't vibe with how I work,
| yet.
|
| * You can generate a *significant* amount of code in a short
| time. It doesn't feel like it's "your" code though, it's like
| joining a startup - a mountain of code, someone else's
| architecture, their coding style, comment style, ... and,
|
| * there's this "fog of code", where you can sorta bumble around
| the codebase but don't really 100% understand it. I still have
| mid/low confidence in the changes I make by hand, even 1 week
| after the codebase has largely stabilized. Again, it's like
| getting familiar with someone else's code.
|
| * Code quality is ok but not great (and partially my fault).
| Probably depends on how you got to the current code - ie how
| clean was your "path". But since it is easier to "evolve" the
| whole project (I changed directions once or twice when I sort
| of hit a wall) it's also easier to end up with a messy-ish
| codebase. Maybe the way to go is to first explore, then codify
| all the requirements and start afresh from a clean slate
| instead of trying to evolve the code-base. But that's also not
| an insignificant amount of work and also mental load (because
| now you really need to understand the whole codebase or trust
| that an LLM can sufficiently distill it).
|
| * I got much better results with very precise prompts. Maybe
| I'm using it wrong, ie I usually (think I) know what I want and
| just instruct the LLM instead of having an exploratory chat but
| the more explicit I am, the more closely the output is to what
| I'd like to see. I've tried to discuss proposed changes a few
| times to generate a spec to implement in another session but it
| takes time and was not super successful. Another thing to
| practice.
|
| * A bit of a later realization, but modular code and short,
| self-contained modules are really important though this might
| depend on your workflow.
|
| To summarize:
|
| * It works.
|
| * It lowers initial mental burden.
|
| * But to get really good results, you still have to put a lot
| of effort into it.
|
| * At least right now, it seems you will still eventually have
| to put in the mental effort at some point, normally it's
| "front-loaded" where you have to do the design and think about
| it hard, whereas the AI does all the initial work but it
| becomes harder to cope with the codebase once you reach a
| certain complexity. Eventually you will have to understand it
| though even if just to instruct the LLM to make the exact
| changes you want.
|
| [1] https://github.com/thingsapart/lvgl_ui_preview
| asadm wrote:
| yes, think of it as a search engine that auto-applies that
| stackoverflow fix to your code.
|
| But I have done larger tasks (writing device drivers) using
| Gemini.
| browningstreet wrote:
| I've built a number of personal data-oriented and single
| purpose tools in Replit. I've constrained my ambitions to what
| I think it can do but I've added use cases beyond my initial
| concept.
|
| In short, the tools work. I've built things 10x faster than
| doing it from scratch. I also have a sense of what else I'll be
| able to build in a year. I also enjoy not having to add cycles
| to communicate with external contributors -- I think, then I
| do, even if there's a bit of wrestling. Wrangling with a coding
| agent feels a bit like "compile, test, fix, re-compile". Re-
| compiling generally got faster in subsequent generations of
| compiler releases.
|
| My company is building internal business functions using AI
| right now. It works too. We're not putting that stuff in front
| of our customers yet, but I can see that it'll come. We may put
| agents into the product that let them build things for
| themselves.
|
| I get the grumpiness & resistance, but I don't see how it's
| buying you anything. The puck isn't underfoot.
| IXCoach wrote:
| Hey there!
|
| Lots missing here, but I had the same issues; it takes
| iteration and practice. I use Claude Code in terminal windows,
| and a text expander to save explicit reminders that I have to
| inject super regularly because Anthropic obscures access to
| system prompts.
|
| For example, I have 3-to-8-paragraph-long instructions I will
| place regularly about not assuming, checking deterministically,
| etc., and for most things I have the agents write a report with
| a specific instruction set.
|
| I pop the instructions into text expander so I just type - docs
| when saying go figure this out, and give me the path to the
| report when done.
|
| They come back with a path, and I copy it and search VS Code.
|
| It opens as an .md and I use preview mode; it's similar to a
| Google doc.
|
| And I'll review it. Always, things will be wrong: tons of
| assumptions, failures to check deterministically, etc... but I
| see that in the doc and have it fix it, correct
| misunderstandings, update the doc until it's perfect.
|
| From there I'll say add a plan in a table with status for each
| task based on this (another text expander snippet with
| instructions).
|
| And WHEN that's 100% right, I'll say implement and update as
| you go. The update-as-you-go forces it to recognize and remember
| the scope of the task.
|
| The greatest point of failure in the system is misalignment.
| Ethics teams got that right. It compounds FAST if allowed: you
| let them assume things, they state assumptions as facts, that
| becomes what other agents read, and you get true chaos
| unchecked.
|
| I started rebuilding claude code from scratch literally because
| they block us from accessing system prompts and I NEED these
| agents to stop lying to me about things that are not done or
| assumed, which highlights the true chaos possible when applied
| to system critical operations in governance or at scale.
|
| I also built my own tool like codex for managing agent tasks
| and making this simpler, but getting them to use it without
| getting confused is still a gap.
|
| Let me know if you have any other questions. I am performing
| the work of 20 engineers as of today: I rewrote, by myself in 4
| weeks with this system, 2 years of back-end code that had
| required the full-time work of a team of 2 engineers... so I
| am, I guess, quite good at it.
|
| I need to push my edges further into this latest tech, have not
| tried codex cli or the new tool yet.
| IXCoach wrote:
| It's a total of about 30 snippets, avg 6 paragraphs long, that
| I have to inject. For each role switch it goes through, I have
| to re-inject them.
|
| It's a pain but it works.
|
| Even with TDD it will hallucinate the mocks without management,
| and hallucinate the requirements. Each layer has to be
| checked atomically, but the text expander snippets, done right,
| can get it close to 75% right.
|
| My main project faces 5000 users so I can't let the agents run
| freely, whereas with isolated projects in separate repos I can
| let them run more freely, then review in GitKraken before
| committing.
| Rudybega wrote:
| You could just use something like roo code with custom
| modes rather than manually injecting them. The orchestrator
| mode can decide on the other appropriate modes to use for
| subtasks.
|
| You can customize the system prompts, baseline prompts, and
| models used for every single mode and have as many or as
| few as you want.
| arkmm wrote:
| I think most code these days is boilerplate, though the
| composition of boilerplate snippets can become something unique
| and differentiated.
| evilduck wrote:
| It may depend on what you consider boilerplate. I use them
| quite a bit for scripting outside of direct product code
| development. Essentially, AI coding tools have moved this
| chart's decision making math for me: https://xkcd.com/1205/ The
| cost to automate manual tasking is now significantly lower so I
| end up doing more of it.
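|
| (For a rough sense of that chart's math -- made-up numbers, just
| a back-of-the-envelope sketch:)
|
|     # xkcd 1205-style arithmetic: how much time you can justify
|     # spending on automation over a five-year horizon.
|     def time_worth_automating(seconds_saved, times_per_day, years=5):
|         return seconds_saved * times_per_day * 365 * years  # seconds
|
|     # A 30-second task done 5x/day justifies ~76 hours of effort
|     # over five years; AI tooling shrinking the "cost to automate"
|     # side of that trade-off is what changes the decision.
|     print(time_worth_automating(30, 5) / 3600)  # ~76.0 hours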
| lispisok wrote:
| A lot of people are deeply invested in these things being
| better than they really are. From OpenAI and Google spending
| $100s of billions EACH developing LLMs to VC-backed startups
| promising their "AI agent" can replace entire teams of
| white collar employees. That's why your experience matches mine
| and every other developer I personally know but you see
| comments everywhere making much grander claims.
| triMichael wrote:
| I agree, but I'd add that it's not just the tech giants who
| want them to be better than they are, but also non-
| programmers.
|
| IMO LLMs are actually pretty good at writing small scripts.
| First, it's much more common for a small script to be in the
| LLM's training data, and second, it's much easier to find and
| fix a bug. So the LLM actually does allow a non-programmer to
| write correct code with minimal effort (for some simple
| task), and then they are blown away thinking writing software
| is a solved problem. However, these kinds of people have no
| idea of the difference between a hundred line script where an
| error is easily found and isn't a big deal and a million line
| codebase where an error can be invisible and shut everything
| down.
|
| Worst of all is when the two sides of tech-giants and non-
| programmers meet. These two sides may sound like opposites
| but they really aren't. In particular, there are plenty of
| non-programmers involved at the C-level and the HR levels of
| tech companies. These people are particularly vulnerable to
| being wowed by LLMs seemingly able to do complex tasks that
| in their minds are the same tasks their employees are doing.
| As a result, they stop hiring new people and tell their
| current people to "just use LLMs", leading to the current
| hiring crisis.
| alfalfasprout wrote:
| TBH, this website in the last few years has attracted an
| increasingly non-technical audience. And the field, in
| general, has attracted a lot of less experienced folks that
| don't understand the implications of what they're doing. I
| don't mean that as a diss-- but just a reflection of reality.
|
| Indeed, even Codex (and I've been using it prior to this
| release) is not remotely at the level of even a junior
| engineer outside of a certain set of tasks.
| energy123 wrote:
| Where can I read OpenAI's promise that it won't use the repos I
| upload for training?
| alvis wrote:
| Is it surprising? Hmm, perhaps not. But is it better than Cursor
| etc.? Hmm, perhaps it's the wrong question.
|
| Feels like Codex is for product managers to fix bugs without
| touching any developer resources. Then it's insanely surprising!
| gbalduzzi wrote:
| It sounds nice, but are product managers able to spot
| regressions or other potential issues (performance, data
| protection, legal, etc) in the codex result?
| alvis wrote:
| If codex can analyze the whole code base, I can't see why
| not? I can even imagine one could set up a CI task so that any
| committed code must pass all sorts of legal/data protection
| requirements too.
| kenjackson wrote:
| Exactly this. In fact, the product manager should be the one
| who knows the set of checks that need to be done over
| the code base. You still need a dev, though, to make sure the
| last mile is doing what you expect it to do.
| bhl wrote:
| I've been contracting with a startup. The bottleneck is not the
| lack of tools; it's agency. There's so much work, it becomes
| work to assign and organize work.
|
| But now who's going to do that work? Still engineers.
| ilaksh wrote:
| As someone who works on his own open source agent framework/UI
| (https://github.com/runvnc/mindroot), it's kind of interesting
| how announcements from vendors tend to mirror features that I am
| working on.
|
| For example, in the last month or so, I added a job queue plugin.
| The ability to run multiple tasks that they demoed today is quite
| similar. The issue I ran into with users is that without
| Enterprise plans, complex tasks run into rate limits when trying
| to run concurrently.
|
| So I am adding an ability to have multiple queues, with each
| possibly using different models and/or providers, to get around
| rate limits.
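|
| (Roughly the shape of it -- a toy sketch with hypothetical
| provider/model names, not the actual mindroot plugin API:)
|
|     import asyncio
|
|     async def call_model(provider, model, task):
|         # stand-in for the real provider API call
|         await asyncio.sleep(0.1)
|         return f"{provider}/{model} finished {task}"
|
|     async def worker(provider, model, queue, results):
|         # one worker per queue, so one provider's rate limit
|         # doesn't stall tasks routed to another provider
|         while not queue.empty():
|             task = await queue.get()
|             results.append(await call_model(provider, model, task))
|             queue.task_done()
|
|     async def main():
|         queues = {
|             ("openai", "o4-mini"): asyncio.Queue(),
|             ("anthropic", "claude-3.7"): asyncio.Queue(),
|         }
|         keys = list(queues)
|         # naive round-robin routing of tasks across the queues
|         for i in range(6):
|             queues[keys[i % len(keys)]].put_nowait(f"task-{i}")
|         results = []
|         await asyncio.gather(*(worker(p, m, q, results)
|                                for (p, m), q in queues.items()))
|         print("\n".join(results))
|
|     asyncio.run(main())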
|
| By the way, my system has features that are somewhat similar not
| only to this tool they are showing but also things like Manus. It
| is quite rough around the edges though because I am doing 100% of
| it myself.
|
| But it is MIT Licensed and it would be great if any developer on
| the planet wanted to contribute anything.
| asadm wrote:
| Is there an open source version of this? One that essentially
| uses microVMs to git clone my repo, run codex-cli or
| equivalent, and send me a PR.
|
| I made one for github action but it's not as realtime and is 2
| years old now: https://github.com/asadm/chota
| illnewsthat wrote:
| I haven't checked in on it recently, but maybe a similar open-
| source option would be https://github.com/All-Hands-
| AI/OpenHands
|
| A not open-source option this looks close to is also
| https://githubnext.com/projects/copilot-workspace (released
| April 2024, but I'm not sure it's gotten any significant
| updates since)
| asadm wrote:
| oh openDevin became openHANDS. Interestingly, I committed the
| LICENSE file to that repo haha
| tough wrote:
| did they relicense too w the rename?
| simianwords wrote:
| I wonder if tools like these are best for semi structured
| refactors like upgrade to python3, migrate to postgres etc
| btbuildem wrote:
| > To balance safety and utility, Codex was trained to identify
| and precisely refuse requests aimed at development of malicious
| software, while clearly distinguishing and supporting legitimate
| tasks.
|
| I can't say I am a big fan of neutering these paradigm-shifting
| tools according to one culture's code of ethics / way of doing
| business / etc.
|
| One man's revolutionary is another's enemy combatant and all
| that. What if we need top-notch malware to take down the robot
| dogs lobbing mortars at our madmaxian compound?!
| amarcheschi wrote:
| If I had to guess, they'll be neutered only for the general
| public, not for the three-letter agencies.
| pixl97 wrote:
| TLAs have very few of their own coders; they contract
| everything out. Now I'm sure OAI will lend an unrestricted
| model to groups that pay large private contracts they won't
| disclose.
| lumenwrites wrote:
| You gotta think about it in terms of cost vs benefit. How much
| damage will a malicious AI do, vs how much value will you get
| out of a non-neutered model?
| GolfPopper wrote:
| > _What if we need top-notch malware to take down the robot
| dogs lobbing mortars at our madmaxian compound?!_
|
| I wouldn't sweat it. According to its developers, Codex
| understands 'malicious software'; it has just been trained to
| say, "But I won't do that" when such requests are made of it.
| Judging from the recent past [1][2] getting LLMs to bypass such
| safeguards is pretty easy.
|
| 1. https://hiddenlayer.com/innovation-hub/novel-universal-bypas...
| 2. https://cyberpress.org/researchers-bypass-safeguards-in-17-p...
| rowanG077 wrote:
| Agreed, I'm a big proponent of people being in control of
| the tools they use. I don't think the approach where there is a
| wise dictator enforcing that I can't use my flathead screwdriver
| to screw down a Phillips head screw is good. I think it's
| actively undermining people.
| scudsworth wrote:
| pleased to see a paragraph-long comment in the examples. now
| that's good coding.
| 2OEH8eoCRo0 wrote:
| More generated slop for a real human to sift through. Can I get
| an AI summary of that comment?
| kleiba wrote:
| Just curious: is your company happy sharing their code-base with
| an AI provider? Or are you using a local installation?
| asadm wrote:
| why not? OpenAI won't be stupid enough to look at my code and
| be that vulnerable legally. It ain't worth it.
| KaiserPro wrote:
| They literally scraped half of youtube, made a library to
| extract the audio and released it as whisper.
|
| Of _course_ they are training on your shit.
| bhl wrote:
| Cursor has enterprise mode which forces a data privacy feature.
| pixl97 wrote:
| Companies commonly share their code with SAAS providers.
| Typically they'll have a contract to prevent usage otherwise.
| nmca wrote:
| It is a cost benefit trade off, as with all things. Benefits
| look pretty good.
| layer8 wrote:
| The cost of sharing your code is unknown, though.
| philomath_mn wrote:
| Under what circumstances would that cost be high? Is OpenAI
| going to rip off your app? Why would they waste a second on
| that when there are better models to be built?
| odie5533 wrote:
| For 99% of companies, their code is worthless to anyone but
| them.
| manquer wrote:
| For copying the product / service, yes, it is not worth much.
|
| However, for people trying to compromise your systems, access
| to your code can be a valuable asset. The worth of that could
| be well beyond just the enterprise value of the organization;
| it could cost people's lives or bring down critical
| infrastructure.
|
| It's not just code you created and have complete control over.
| Organizations have vendors providing code (drivers,
| libraries...) under narrow licenses that prohibit sharing or
| leaking in any way, so this type of leak can open you up to a
| lot of liability.
| tough wrote:
| so i just upgraded to the pro plan, but https://chatgpt.com/codex
| doesn't work for me and asks me to -try chatgpt pro- and shows me
| the upsell modal, even though I'm already on the higher tier
|
| sigh
| modeless wrote:
| You mean Pro? It's only in the $200 Pro tier.
| tough wrote:
| Yes sorry meant pro,
|
| I just enabled on Settings > Connectors > Github
|
| hoping that makes it work
|
| ... still doesn't work, is it geo-restricted maybe? idk
| rapfaria wrote:
| They said Plus soon, not today.
| jiocrag wrote:
| same here. Paying for Pro ($200) but the "try it" link just
| leads to the Pro sign up page, where it says I'm already on
| Pro. Hyper intelligent coding agents, but can't make their
| website work.
| tough wrote:
| > Hyper intelligent coding agents, but can't make their
| website work.
|
| I know right
|
| also no human to contact on support... tempted to cancel the
| sub lol i'll give them 24h
| fear91 wrote:
| Same here, paying for Pro but I just get redirected to vanilla
| version...
| piskov wrote:
| > will be rolling
|
| [?] available now to all pro users
| tough wrote:
| ok but I took the bait and now am waiting.
|
| Every -big- release they gatekeep something to pro; I pay for
| it like every 3 months, then cancel after the high
|
| when will i learn
| gizmodo59 wrote:
| It says "Rolling out to users on the ChatGPT Pro Plan today" So
| it ll happen throughout the day
| alvis wrote:
| I used to work for a bank, and the legal team used to ping us to
| make tiny changes to the app for compliance-related issues. Now
| they can fix these themselves. I think they'd be very proud and
| happy.
| ajkjk wrote:
| Hopefully nobody lets legal touch anything without the ability
| to run the code to test it, plus code reviews. So probably not.
| singularity2001 wrote:
| That will be an interesting new bug tracker: anyone in the
| company will be able to report any bug or add any feature
| request. If the model is able to solve it automatically,
| perfect; otherwise some human might take over. The interesting
| question then will be which code changes are legal and within
| the standards of what the company wants. So non-technical
| code/issue reviewer will become a super important and
| ubiquitous job.
| SketchySeaBeast wrote:
| Not just legal/within the standards, but which actually meet
| the unspoken requirements of the request. "We just need a new
| checkbox that asks if you're left handed" might seem easy,
| but then it has ramifications for the Application PDF that
| gets generated, as well as any systems downstream, and maybe
| it requires a data conversion of some sort somewhere. I know
| that the PO's I work with miss stuff or assume that the
| request will just have features by default.
| asdev wrote:
| I promise you the legal team is not pushing any code changes
| skovati wrote:
| I'm curious how many ICs are truly excited about these
| advancements in coding agents. It seems to me the general trend
| is we become more like PMs managing agents and reviewing PRs, all
| for the sake of productivity gains.
|
| I imagine many engineers are like myself in that they got into
| programming because they liked tinkering and hacking and
| implementation details, all of which are likely to be abstracted
| over in this new era of prompting.
| awestroke wrote:
| At the end of the day, it's your job to deliver value. If a
| tool allows you to deliver more, faster, without sacrificing
| quality, it's your responsibility to use that tool. You'll just
| have to make sure you can fully take responsibility for the end
| deliverables. And these tools are not only useful for writing
| the final code
| enjoylife wrote:
| > these tools are not only useful for writing the final code
|
| This sparked a thought about how a large part of the job is
| often the work needed to demonstrate impact. I think this
| aspect is often overlooked by some of the good engineers not
| yet taking advantage of the AI tooling. LLM loops may not yet
| be good enough to produce shippable code by themselves, but
| they sure are capable of helping reduce the overhead of these
| up-and-out communicative tasks.
| tough wrote:
| you mean like hacking a first POC with AI to sell a
| product/feature internally to get buy-in from the rest of
| the team before actually shipping production version of it?
| whyowhy3484939 wrote:
| It's actually not. My job description does not say "deliver
| value" and nobody talks about my work like that so I'm not
| quite sure what to make of that.
|
| > without sacrificing quality
|
| Right..
|
| > it's your responsibility to use that tool
|
| Again, it's actually not. It's my responsibility to do my
| job, not to make my boss' - or his boss' - car nicer. I know
| that's what we all know will create "job security" but let's
| not conflate these things. My job is to do my end of the
| bargain. My boss' job is paying me for doing that. If he
| deems it necessary to force me to use AI bullshit, I will of
| course, but it is definitely not my responsibility to do so
| autonomously.
| blibble wrote:
| > At the end of the day, it's your job to deliver value. If a
| tool allows you to deliver more faster, without sacrificing
| quality
|
| I guess that's LLMs ruled out then
| kridsdale3 wrote:
| I do feel that way, so I'll still do bespoke creation when I
| want to. But this is like a sewing machine. My job is to design
| fashion, and a whole line of it. I can do that when a machine
| is making the stitches instead of my using a needle in hand.
| manojlds wrote:
| We (dare I say we instead of I) like talking to computers and
| AI is another computer you talk with. So I am still all
| excited. It's people that I want to avoid :)
| qntmfred wrote:
| people can still write code by hand for fun
|
| people who want to make software that enables people to
| accomplish [task] will get the software they need quicker.
| davedx wrote:
| I think the death of our craft is around the corner. It doesn't
| fill me with joy.
| evantbyrne wrote:
| Software engineering requires a fair amount of intelligence,
| so if these tools ever get to replacement levels of quality
| then it's not just developers that will be out of jobs. ARC-
| AGI-2, the countless anecdotes from professionals I've seen
| across the industry, and personal experience all very clearly
| point to a significant gap between the tools that exist today
| and general intelligence. I would recommend keeping an eye on
| improvements just because of the sheer capital investments
| going into it, but I won't be losing any sleep waiting for
| the rapture.
| ramoz wrote:
| I see it differently. Like a kid with legos.
|
| We had to tinker piece by piece to build a miniature castle.
| Over many hours.
|
| Now I can tinker concept by concept, and build much larger
| castles, much faster. Like waving a wand, seeing my thoughts
| come to fruition in near real time.
|
| No vanity lost in my opinion. Possibly more to be gained.
| CapcomGo wrote:
| I think the bigger issue with this is that the number of
| developer jobs will shrink.
| nluken wrote:
| I think there's a disconnect between what you and the person
| you're replying to are defining as "tinkering". Your
| conception of it seems more focused on the end product when,
| to use your analogy, the original comment seems unconcerned
| with the size of castles.
|
| If you derive enjoyment from actually assembling the castle,
| you lose out on that by using the wand that makes it happen
| instantly. Sure wand's castles may be larger, but you don't
| put a Lego castle together for the finished product.
| lherron wrote:
| Factorio blueprints in action.
| whyowhy3484939 wrote:
| > build much larger castles, much faster
|
| See, that never was the purpose... going bigger and faster,
| towards what exactly? Chaos? By the way, we never managed to
| fully tackle manual software development by trained
| professionals and we now expect Shangri-La by throwing
| everything and the kitchen sink into giant inscrutable
| matrices. This time by amateurs as well. I'm sure this will
| all turn out very well and very, very productive.
| chilmers wrote:
| While I share your reservations, how many millions of people
| have experienced the exact same disruption to their jobs and
| industries because of software that we, software engineers,
| have created? It's a bit too late, and a touch hypocritical,
| for us to start complaining about technology now it is
| disrupting our way of working in a way we don't like.
| orange_puff wrote:
| I used to think this way too. Here are a few ways I've tried to
| reframe things that have helped.
|
| 1. When I work on side projects and use AI, sometimes I wonder
| "what's the point if I am just copy / pasting code? I am not
| learning anything" but what I have come to realize is building
| apps with AI assistance is the skill that I am learning, rather
| than writing code per se as it was a few years ago.
|
| 2. I work in high scale distributed computing, so I am still
| presented with ample opportunities to get very low level, which
| I love. I am not sure how much I care about writing code per se
| anymore. Working with AI still is tinkering, it has not changed
| that much for me. It is quite different, but the underlying fun
| parts are still present.
| simianwords wrote:
| Does anyone know how the quality drops with the size of the
| codebase?
| yanis_t wrote:
| So it looks like it only runs in the cloud; that is, it will
| push commits to my remote repo before I have a chance to see
| if it works?
|
| When I'm using aider, after it makes a commit, what I do is
| immediately run git reset HEAD^ and then git diff (actually I use
| the GitHub Desktop client to see the diff) to evaluate what
| exactly it did and whether I like it or not. Then I usually make
| some adjustments and only after that commit and push.
| flakiness wrote:
| You can think of this as a managed (cloud) version of their
| codex command line tool, which runs locally on your laptop.
|
| The secret sauce here seems to be their new model, but I expect
| it to come to the API at some point.
| codemac wrote:
| watch the live stream, it shows you the diff as the completed
| task, you decide whether or not to generate a github pr when
| you see the diff.
| danielbln wrote:
| You may want to pass --no-auto-commits to Aider if you peel
| them off HEAD afterwards anyway.
| adamTensor wrote:
| not buying windsurf then???
| motoxpro wrote:
| This would be the why of that acquisition as this needs a more
| integrated UI. Judging by the speed at which this came out,
| this was in the works long before that acquisition.
| adamTensor wrote:
| it is not even clear *if* they are going to buy Windsurf at
| all. And that's a big if. This might've just been the 'why'
| that deal is not happening.
| shmoogy wrote:
| This probably came out to beat Google I/O or something
| similar - odd Friday release otherwise.
| ianbutler wrote:
| I'm super curious to see how this actually does at finding
| significant bugs. We've been working in the space on
| https://www.bismuth.sh for a while, and one of the things we're
| focused on is deep validation of the code being outputted.
|
| There are so many of these "vibe coding" tools, and there has to
| be real engineering rigor at some point. I saw them demo "find
| the bug", but the bugs they found were pretty superficial, and
| that's something we've seen in our internal benchmark from both
| Devin and Cursor: a lot of noise and false positives or
| superficial fixes.
| orliesaurus wrote:
| Why hasn't GitHub released this? Why is it OpenAI releasing
| this?!
| adpirz wrote:
| It's on their roadmap: https://github.blog/news-
| insights/product-news/github-copilo...
|
| But they aren't moving nearly as fast as OpenAI. And it remains
| to be seen if first mover will mean anything.
| taytus wrote:
| Github moves too slow, and OpenAI moves too fast.
| danielbln wrote:
| GitHub has released this, it's called Copilot Agent.
| johnjwang wrote:
| Some engineers on my team at Assembled and I have been a part of
| the alpha test of Codex, and I'll say it's been quite impressive.
|
| We've long used local agents like Cursor and Claude Code, so we
| didn't expect too much. But Codex shines in a few areas:
|
| Parallel task execution: You can batch dozens of small edits
| (refactors, tests, boilerplate) and run them concurrently without
| context juggling. It's super nice to run a bunch of tasks at the
| same time (something that's really hard to do in Cursor, Cline,
| etc.)
|
| It kind of feels like a junior engineer on steroids, you just
| need to point it at a file or function, specify the change, and
| it scaffolds out most of a PR. You still need to do a lot of work
| to get it production ready, but it's as if you have an infinite
| number of junior engineers at your disposal now all working on
| different things.
|
| Model quality is good, but hard to say it's that much better than
| other models. In side-by-side tests with Cursor + Gemini 2.5-pro,
| naming, style and logic are relatively indistinguishable, so
| quality meets our bar but doesn't yet exceed it.
| fourside wrote:
| > You still need to do a lot of work to get it production
| ready, but it's as if you have an infinite number of junior
| engineers at your disposal now all working on different things.
|
| One issue with junior devs is that because they're not fully
| autonomous, you have to spend a non trivial amount of time
| guiding them and reviewing their code. Even if I had easy
| access to a lot of them, pretty quickly that overhead would
| become the bottleneck.
|
| Did you think that managing a lot of these virtual devs could
| get overwhelming or are they pretty autonomous?
| fabrice_d wrote:
| They wrote "You still need to do a lot of work to get it
| production ready". So I would say it's not much better than
| real colleagues. Especially since junior devs will improve to
| a point they don't need your hand holding (remember you also
| were a junior at some point), which is not proven will happen
| with AI tools.
| bmcahren wrote:
| Counter-point A: AI coding assistance tools are rapidly
| advancing at a clip that is inarguably faster than humans.
|
| Counter-point B: AI does not get tired, does not need
| space, does not need catering to their experience. AI is
| fine being interrupted and redirected. AI is fine spending
| two days on something that gets overwritten and thrown away
| (no morale loss).
| HappMacDonald wrote:
| Counter-counter-point A: If I work with a human Junior
| and they make an error or I familiarize them with any
| quirk of our workflow, and I correct them, they will
| recall that correction moving forward. An AI assistant
| either will not remember 5 minutes later (in a different
| prompt on a related project) and repeat the mistake, or
| I'll have to take the extra time to code some reminder
| into the system prompt for every project moving forward.
|
| Advancements in general AI knowledge over time will not
| correlate to improvements in remembering any matters as
| colloquial as this.
|
| Counter-counter-point B: AI _absolutely_ needs catering
| to their experience. Prompter must always learn how to
| phrase things so that the AI will understand them, adjust
| things when they get stuck in loops by removing confusing
| elements from the prompt, etc.
| SketchySeaBeast wrote:
| I find myself thinking about juniors vs AI as babies vs
| cats. A cat is more capable sooner, you can trust it when
| you leave the house for two hours, but it'll never grow
| past shitting in a box and needing to be fed.
| rfoo wrote:
| You don't need to be nice to your virtual junior devs. Saves
| quite a lot of time too.
|
| As long as I spend less time reviewing and guiding than doing
| it myself, it's a win for me. I don't have any fun doing these
| things and I'd rather yell at a bunch of "agents". For those
| who enjoy doing a bunch of small edits, I guess it's the
| opposite.
| HappMacDonald wrote:
| I'm definitely wary of the concept of dismissing courtesy
| when working with AI agents, because I certainly don't want
| to lose that habit when I turn around and have to interact
| with humans again.
| strangescript wrote:
| It feels like OpenAI is at a ceiling with their models; codex-1
| seems to be another RLHF derivative of the same base model.
| You can see this in their own self-reported o3-high comparison,
| where at 8 tries they converge at the same accuracy.
|
| It also seems very telling they have not mentioned o4-high
| benchmarks at all. o4-mini exists, so logically there is an o4
| full model, right?
| aorobin wrote:
| Seems likely that they are waiting to release o4 full results
| until the gpt-5 release later this year, presumably because
| gpt-5 is bundled with a roughly o4 level reasoning
| capability, and they want gpt-5 to feel like a significant
| release.
| losvedir wrote:
| Do you still think there will be a gpt-5? I thought the
| consensus was GPT-5 never really panned out and was
| released with little fanfare as 4.1.
| NewEntryHN wrote:
| The advantage of Cursor is the reduced feedback loop where you
| watch it live and can intervene at any moment to steer it in
| the right direction. Is Codex such a superior model that it
| makes sense to take the direction of a mostly background agent,
| on which you seemingly have a longer feedback loop?
| woah wrote:
| > Parallel task execution: You can batch dozens of small edits
| (refactors, tests, boilerplate) and run them concurrently
| without context juggling. It's super nice to run a bunch of
| tasks at the same time (something that's really hard to do in
| Cursor, Cline, etc.)
|
| > It kind of feels like a junior engineer on steroids, you just
| need to point it at a file or function, specify the change, and
| it scaffolds out most of a PR. You still need to do a lot of
| work to get it production ready, but it's as if you have an
| infinite number of junior engineers at your disposal now all
| working on different things.
|
| What's the benefit of this? It sounds like it's just a gimmick
| for the "AI will replace programmers" headlines. In reality,
| LLMs complete their tasks within seconds, and the time
| consuming part is specifying the tasks and then reviewing and
| correcting them. What is the point of parallelizing the fastest
| part of the process?
| ctoth wrote:
| > Each task is processed independently in a separate,
| isolated environment preloaded with your codebase. Codex can
| read and edit files, as well as run commands including test
| harnesses, linters, and type checkers. Task completion
| typically takes between 1 and 30 minutes, depending on
| complexity, and you can monitor Codex's progress in real
| time.
| johnjwang wrote:
| In my experience, it still does take quite a bit of time
| (minutes) to run a task on these agentic LLMs (especially
| with the latest reasoning models), and in Cursor / Cline /
| other code editor versions of AI, it's enough time for you to
| get distracted, lose context, and start working on another
| task.
|
| So the benefit is really that during this "down" time, you
| can do multiple useful things in parallel. Previously, our
| engineers were waiting on the Cursor agent to finish, but the
| parallelization means you're explicitly turning your brain
| off of one task and moving on to a different task.
| woah wrote:
| In my experience in Cursor with Claude 3.5 and Gemini 2.5,
| if an agent has run for more than a minute it has usually
| lost the plot. Maybe the model used in Codex is a new breed?
| odie5533 wrote:
| It depends what level you ask them to work on, but I
| agree, all of my agent coding is active and completed in
| usually <15 seconds.
| kfajdsl wrote:
| A single response can take a few seconds, but tasks with
| agentic flows can be dozens of back and forths. I've had a
| fairly complicated Roo Code task take 10 minutes (multiple
| subtasks).
| Jimmc414 wrote:
| > We've long used local agents like Cursor and Claude Code, so
| we didn't expect too much.
|
| If you don't mind, what were the strengths and limitations of
| Claude Code compared to Codex? You mentioned parallel task
| execution being a standout feature for Codex - was this a
| particular pain point with Claude Code? Any other insights on
| how Claude Code performed for your team would be valuable. We
| are pleased with Claude Code at the moment and were a bit
| underwhelmed by comparable Codex CLI tool OAI released earlier
| this month.
| t_a_mm_acq wrote:
| Since realizing CC can operate on the same code base and file
| tree in different terminal instances, it's been a significant
| unlock for us. Most devs have 3 running concurrently: 1.
| master task list + checks for completion on tasks. 2.
| operating on the current task + documentation. 3. side quests,
| bugs, additional context.
|
| Rinse and repeat: once a task is done, update #1 and cycle
| again. Add another CC window if you need more tasks running
| concurrently.
|
| The downside is cost, but if that's not an issue, it's great
| for getting stuff done across distributed teams.
| naiv wrote:
| do you then have instances 2 and 3 listening to instance 1
| with just a prompt? or how does this work?
| naiv wrote:
| to answer my own question, it is actually laid out in
| chapter 6 of
| https://www.anthropic.com/engineering/claude-code-best-
| pract...
| criddell wrote:
| If you aren't hiring junior engineers to do these kinds of
| things, where do you think the senior engineers you need in the
| future will come from?
|
| My kid recently graduated from a very good school with a degree
| in computer science and what she's told me about the job market
| is scary. It seems that, relatively speaking, there's a lot of
| postings for senior engineers and very little for new grads.
|
| My employer has hired recently and the flood of resumes after
| posting for a relatively low level position was nuts. There was
| just no hope of giving each candidate a fair chance and that
| really sucks.
|
| My kid's classmates who did find work did it mostly through
| personal connections.
| echelon wrote:
| The never ending march of progress.
|
| It's probably over for these folks.
|
| There will likely(?, hopefully?) be new adjacent gradients
| for people to climb.
|
| In any case, I would worry more about your own job prospects.
| It's coming for everyone.
| voidspark wrote:
| It's his daughter. He is worried about his daughter first
| and foremost. Weird reply.
| echelon wrote:
| I'm sorry. I was skimming. I had no idea he mentioned his
| kid.
|
| I was running a quick errand between engineering meetings
| and saw the first few lines about hiring juniors, and I
| wrote a couple of comments about how I feel about all of
| this.
|
| I'm not always guilty of skimming, but today I was.
| hintymad wrote:
| > If you aren't hiring junior engineers to do these kinds of
| things, where do you think the senior engineers you need in
| the future will come from?
|
| Unfortunately this is not how companies think. I read
| somewhere more than 20 years ago about outsourcing and
| manufacturing offshoring. The author basically asked the
| same: if we move out the so-called low-end jobs, where do we
| think we will get the senior engineers? Yet companies
| continued offshoring, and the West lost talent and know-how,
| while watching our competitor you-know-who become the world
| leader in more and more industries.
| echelon wrote:
| It's happening to Hollywood right now. In the past three
| years, since roughly 2022, the majority of IATSE folks
| (film crew, grips, etc.) have seen their jobs disappear to
| Eastern Europe where the labor costs one tenth of what it
| does here. And there are no rules for maximum number of
| consecutive hours worked.
| lurking_swe wrote:
| ahh, the classic "i shall please my investors next quarter
| while ignoring reality, so i can disappoint my shareholders
| in 10 years". lol.
|
| As you say, happens all the time. Also doesn't make sense
| because so few people are buying individual stocks anyway.
| Goal should be to consistently outperform over the long
| term. Wall street tends to be very myopic.
|
| Thinking long term is a hard concept for the bean counters
| at these tech companies i guess...
| miohtama wrote:
| What then ends up happening is that companies that fall
| behind in R&D eventually lose market share and get
| replaced by more agile competitors.
|
| But this does not happen in industry verticals that are
| protected by regulation (banks) or national interest
| (Boeing).
| kypro wrote:
| > If you aren't hiring junior engineers to do these kinds of
| things, where do you think the senior engineers you need in
| the future will come from?
|
| They'll probably just need to learn for longer and if
| companies ever get so desperate for senior engineers then
| just take the most able/experienced junior/mid level dev.
|
| But I'd argue before they do that if companies can't find
| skilled labour domestically they should consider bringing
| skilled workers from abroad. There are literally hundreds of
| millions of Indians who got connected to the internet over
| the last decade. There's no reason a company should struggle
| to find senior engineers.
| oytis wrote:
| So basically all education facilities should go abroad too
| if no one needs Western fresh grads. Will provide a lot of
| shareholder value, but there are some externalities too.
| rboyd wrote:
| India coming online just in time for AI is awkward
| slater wrote:
| > If you aren't hiring junior engineers to do these kinds of
| things, where do you think the senior engineers you need in
| the future will come from?
|
| Money number must _always_ go up. Hiring people costs money.
| "Oh hey I just read this article, sez you can have A.I. code
| your stuff, for pennies?"
| ilaksh wrote:
| I don't think jobs are necessarily a good plan at all
| anymore. Figure out how to leverage AIs and robots as cheap
| labor, and sell services or products. But if someone is
| trying to get a job, I get the impression that networking
| helps more than anything.
| sandspar wrote:
| Yeah, the value of the typical job application meta is
| trending to zero very quickly. Entrepreneurship has a steep
| learning curve; you should start learning it as soon as
| possible. Don't waste your time learning to run a straight
| line - we're entering off-road territory.
| DGAP wrote:
| There aren't going to be senior engineers in the future.
| _bin_ wrote:
| This is a bit of a game theory problem. "Training senior
| engineers" is an expensive and thankless task: you bear
| essentially all the cost, and most of the total benefit
| accrues to others as a positive externality. Griping at
| companies that they should undertake to provide this positive
| externality isn't really a constructive solution.
|
| I think some people are betting on the fact that AI can
| replace junior devs in 2-5 years and seniors in 10-20, when
| the old ones are largely gone. But that's sort of beside the
| point as far as most corporate decision-making is concerned.
| nopinsight wrote:
| With Agentic RL training and sufficient data, AI operating
| at the level of average _senior engineers_ should become
| plausible in a couple to a few years.
|
| Top-tier engineers who integrate a deep understanding of
| business and user needs into technical design will likely
| be safe until we get full-fledged AGI.
| yahoozoo wrote:
| Why in a few years? What training data is missing that we
| can't have senior level agents today?
| al_borland wrote:
| That sounds like a dangerous bet.
| SketchySeaBeast wrote:
| Sounds like a bet a later CEO will need to check.
| _bin_ wrote:
| As I see it, it's actually the only safe bet.
|
| Case 1: you keep training engineers.
|
| Case 1.1: AGI soon, you don't need juniors or seniors
| besides a very few. You cost yourself a ton of money that
| competitors can reinvest into R&D, use to undercut your
| prices, or return to keep their investors happy.
|
| Case 1.2: No AGI. Wages rise, a lot. You must remain in
| line with that to avoid losing those engineers you
| trained.
|
| Case 2: You quit training juniors and let AI do the work.
|
| Case 2.1: AGI soon, you have saved yourself a bundle of
| cash and remain mostly in line with the market.
|
| Case 2.2: no AGI, you are in the same bidding war for
| talent as everyone else, the same place you'd have been
| were you to have spent all that cash to train engineers.
| You now have a juicier balance sheet with which to enter
| this bidding war.
|
| The only way out of this, you can probably see, is some
| sort of external co-ordination, as is the case with most
| of these situations. The high-EV move is to quit training
| juniors, by a mile, independently of whether AI can
| replace senior devs in a decade.
| spongebobstoes wrote:
| An interesting thing to consider is that Codex might get
| people to be better at delegating, which might improve
| the effectiveness of hiring junior engineers, because the
| senior engineers will have better delegation skills,
| leading to more effective collaboration.
| al_borland wrote:
| You're looking at it from the point of view of an
| individual company. I'm seeing it as a risk for the
| entire industry.
|
| Senior engineers are already very well paid. Wages rising
| a lot from where they already are, while companies
| compete for a few people, and those who can't afford it
| need to lean on AI or wait 10+ years for someone to
| develop with equivalent expertise... all of this sounds
| bad for the industry. It's only good for the few senior
| engineers that are about to retire, and the few who went
| out of their way to not use AI and acquire actual skills.
| dorian-graph wrote:
| This hyper-fixation on replacing engineers in writing code
| is hilarious, and dangerous, to me. Many people, even in
| tech companies, have no idea how software is built,
| maintained, and run.
|
| I think instead we should focus on getting rid of managers
| and product owners.
| jchanimal wrote:
| The real judge will be survivorship bias and as a betting
| man, I might think product owners are the ones with the
| entrepreneurial spirit to make it to the other side.
| MoonGhost wrote:
| I've worked for a company which turned from startup to
| this. Product owners had no clue what they owned, and no
| capacity to suggest anything useful. They were hired off
| the street at best; more likely they got in through
| relatives. In a couple of years the company probably
| tripled its manager headcount. It didn't help.
| QuadmasterXLII wrote:
| it's obviously intensely correlated: in the vast majority of
| scenarios, either both are replaced or neither is
| odie5533 wrote:
| As a dev, if you try taking away my product owners I will
| fight you. Who am I going to ask for requirements and
| sign-offs, the CEO?
| oytis wrote:
| Your architect, principal engineer etc. (one spot-on job
| title I've seen is "product architect"), who in turn
| talks to the senior management. Basically an engineer
| with a talent and experience for building products rather
| than a manager with superficial understanding of
| engineering. I think the most ambitious teams have
| someone like this on top - or at least around
| deadmutex wrote:
| Perhaps the role will merge into one, and will replace a
| good chunk of those jobs.
|
| E.g.:
|
| If we have 10 PMs and 90 devs today, that could
| hypothetically be replaced by 8 PM+Devs, 20 specialized
| devs, and 2 specialized PMs in the future.
| hooverd wrote:
| I think it'll be great if you're working in software, not
| for a software company.
| sam0x17 wrote:
| Hiring of juniors is basically dead these days and it has
| been like this for about 10 years and I hate it. I remember
| when I was a junior in 2014 there were actually startups who
| would hire cohorts of juniors (like 10 at a time, fresh out
| of CS degree sort of folks with almost no applied coding
| experience) and then train them up to senior for a few years,
| and then a small number will stay and the rest will go
| elsewhere and the company will hire their next batch of
| juniors. Now no one does this, everyone wants a senior no
| matter how simple the task. This has caused everyone in the
| industry to stuff their resume, so you end up in a situation
| where companies are looking for 10 years of experience in
| ecosystems that are only 5 years old.
|
| That said, back in the early 00s there was much more of a
| culture of everyone is expected to be self-taught and doing
| real web dev probably before they even get to college, so by
| the time they graduate they are in reality quite senior. This
| was true for me and a lot of my friends, but I feel like
| these days there are many CS grads who haven't done a lot of
| applied stuff. But at the same time, to be fair, this was a
| way easier task in the early 00s: if you knew
| JS/HTML/CSS/SQL, C++, and maybe a .NET language, that was
| pretty much it, you could do everything (there were virtually
| no frameworks). Now there are thousands of frameworks,
| languages, and ecosystems, and you could spend 5+ years
| learning any one of them. It is no longer possible for one
| person to learn all of tech, people are much more specialized
| these days.
|
| But I agree that eventually someone is going to have to start
| hiring juniors again or there will be no seniors.
| dgb23 wrote:
| I recently read an article about the US having relatively
| weak occupational training.
|
| To contrast, CH and GER are known to have very robust and
| regulated apprenticeship programs. Meaning you start
| working at a much earlier age (16) and go to vocational
| school at the same time for about 4 years. This path is
| then supported with all kinds of educational stepping
| stones later down the line.
|
| There are many software developers who went that route in
| CH for example, starting with an application development
| apprenticeship, then getting to technical college in their
| mid 20's and so on.
|
| I think this model has a lot of advantages. University is
| for kids who like school and the academic approach to
| learning. Apprenticeships plus further education or an
| autodidactic path then casts a much broader net, where you
| learn practical skills much earlier.
|
| There are several advantages and disadvantages of both
| paths. In summary I think the academic path provides deeper
| CS knowledge which can be a force multiplier. The
| apprenticeship path leads to earlier high productivity and
| pragmatism.
|
| My opinion is that having both as strongly supported paths
| creates more opportunities for people and strengthens the
| economy as a whole.
| oytis wrote:
| I know about this system, but I am not convinced it can
| work in such a dynamic field as software. When tools
| change all the time, you need strong fundamentals to stay
| afloat - which is what universities provide.
|
| Vocational training focusing on immediate fit for the
| market is great for companies that want to extract
| maximal immediate value from labour for minimal cost, but
| longer term is not good for engineers themselves.
| thomasahle wrote:
| > But at the same time, to be fair, this was a way easier
| task in the early 00s
|
| The best junior I've hired was a big contributor to an open
| source library we were starting to use.
|
| I think there's still lots of opportunity for honing your
| skill, and showing it off, outside of schools.
| oytis wrote:
| I guess the industry leaders think we'll not need senior
| engineers either as capabilities evolve.
|
| But also, I think this underestimates significantly what
| junior engineers do. Junior engineers are people who have
| spent 4 to 6 years receiving a specialised education in a
| university - and they normally need to be already good at
| school math. All they lack is experience applying this
| education on the job - but they are professionals: educated,
| proactive and mostly smart.
|
| The market is tough indeed, and as tough as it is for a
| senior engineer like myself, I don't envy the current cohort
| of fresh grads. It being tough is only tangentially related
| to AI though. The main factor is the general economic
| slowdown, with AI contributing by diverting already scarce
| investment from non-AI companies and producing a lot of
| uncertainty about how many and what kinds of employees
| companies will need in the future. AI's current capabilities
| are nowhere near having a real economic impact.
|
| Wish your kid and you a lot of patience, grit and luck.
| voidspark wrote:
| This is exactly the problem. The top level executives are
| setting up to retire with billions in the bank, while the
| workers develop their own replacements before they retire
| with millions in the bank. Senior developers will be mostly
| obsolete too.
|
| I have mentored junior developers and found it to be a
| rewarding part of the job. My colleagues mostly ignore
| juniors, provide no real guidance, couldn't care less. I see
| this attitude from others in the comments here, relieved they
| don't have to face that human interaction anymore. There are
| too many antisocial weirdos in this industry.
|
| Without a strong moral and cultural foundation the AGI
| paradigm will be a dystopia. Humans obsolete across all
| industries.
| criddell wrote:
| > I have mentored junior developers and found it to be a
| rewarding part of the job.
|
| That's really awesome. I hope my daughter finds a job
| somewhere that values professional development. I'd hate
| for her to quit the industry before she sees just how
| interesting and rewarding it can be.
|
| I didn't have many mentors when starting out, but the ones
| I had were so unbelievably helpful both professionally and
| personally. If I didn't have their advice and
| encouragement, I don't think I'd still be doing what I'm
| doing.
| aprdm wrote:
| She can try to reach out to possible mentors / people on
| Linkedin. A bit like cold calling. It works, people
| (usually) want to help and don't mind sharing their
| experiences / tips. I know I have answered many random
| LinkedIn cold messages from recent grads / people in uni.
| oytis wrote:
| > I have mentored junior developers and found it to be a
| rewarding part of the job.
|
| Can totally relate. Unfortunately the trend for all-senior
| teams and companies has started long before ChatGPT, so the
| opportunities have been quite scarce, at least in a
| professional environment.
| layer8 wrote:
| I share your worries, but the time horizon for the supply of
| senior engineers drying up is just too long for companies to
| care at this time, in particular if productivity keeps
| increasing. And it's completely unclear what the state of the
| art will be in 20 years; the problem might mostly solve
| itself.
| johnjwang wrote:
| To be clear, we still hire engineers who are early in their
| careers (and we've found them to be some of the best folks on
| our team).
|
| All the same principles apply as before: smart, driven, high
| ownership engineers make a huge difference to a company's
| success, and I find that the trend is even stronger now than
| before because of all the tools that these early career
| engineers have access to. Many of the folks we've hired have
| been able to spin up on our codebase much faster than in the
| past.
|
| We're mainly helping them develop taste for what good code /
| good practices look like.
| criddell wrote:
| > we still hire engineers who are early in their careers
|
| That's really great to hear.
|
| Your experience that a new engineer equipped with modern
| tools is more effective and productive than in the past is
| important to highlight. It makes total sense.
| startupsfail wrote:
| More recent models are not without drive and are not stupid
| either.
|
| There's still quite a bit of a gap in terms of trust.
| dgb23 wrote:
| AI might play a role here. But there's also a lot of economic
| uncertainty.
|
| It wasn't long ago that the correction in the tech job market
| started, after it got blown up during and after covid. The
| geopolitical situation is very unstable.
|
| I also think there is way more FUD around AI, including
| coding assistants, than necessary - typically coming either
| from people who want to sell it or want to get in on the
| hype.
|
| Things are shifting and moving, which creates uncertainty.
| But it also opens new doors. Maybe it's a time for risk
| takers, the curious, the daring. Small businesses and new
| kinds of services might rise from this, like web development
| came out of the internet revolution. To me, it seems like
| things are opening up and not closing down.
|
| Besides that, I bet there are more people today who write,
| read or otherwise deal directly with assembly code than ever
| before, even though we have had higher-level languages for
| many decades.
|
| As for the job market specifically: SWE and CS (adjacent)
| jobs are still among the fastest growing, coming up in all
| kinds of lists.
| ikiris wrote:
| Much like everything in the economy currently, externalities
| are to be shouldered by "others" and if there is no "other"
| in aggregate, well, it's not our problem. Yet.
| polskibus wrote:
| I think the bigger problem, which started around 2022, is the
| much lower volume of jobs in software development. Projects
| were shut down, funding was retracted, and even the big wave
| of migrations to the cloud died down.
|
| Today startups mostly wrap LLMs as this is what VCs expect.
| Larger companies have smaller IT budgets than before
| (adjusted for inflation). This is the real problem that
| causes the jobs shortage.
| geekraver wrote:
| Same, mine is about to graduate with a CS masters from a
| great school. Couldn't get any internships, and is now
| incredibly negative about ever being able to find work, which
| doesn't help. We're pretty much looking at minimum wage jobs
| doing tech support for NGOs at this point (and the current
| wave of funding cuts from the Federal government for those
| kinds of orgs is certainly not going to help with that).
| MoonGhost wrote:
| With so many graduates looking for a job, why don't they
| band together and build something? If not for money, then
| just to show off their skills, something to put on a resume.
|
| It's not going to get any easier in the next few years, I
| think - not until the point when a fresh grad using AI can
| make something valuable. After that there will be a period
| when anybody can just ask AI to do something and it will
| find the software in its library or write it from scratch.
| In the long term, maybe 10 years, humanity probably will not
| need this many developers. There will be a split like in the
| games industry: tools/libs developers and product
| devs/artists/designers, with the majority in the second
| category.
| atonse wrote:
| I feel for your daughter. I can totally see how tools like
| this will destroy the junior job market.
|
| But I also wonder (I'm thinking out loud here, so pardon the
| raw unfiltered thoughts), if being a junior today is
| unrecognizable.
|
| Like, for example, that whatever a "junior" is now will
| have to get better at thinking at a higher level, rather than
| the minutiae we dealt with as juniors (like design patterns
| and all that stuff).
|
| So maybe the levels of abstraction change?
| FilosofumRex wrote:
| > If you aren't hiring junior engineers..., where do you
| think the senior engineers you need in the future will come
| from?
|
| This problem might be new to CS, but has happened to other
| engineers, notably to MechE in the 90's, ChemE in 80's,
| Aerospace in 70's, etc... due to rapid pace of automation and
| product commoditization.
|
| The senior jobs will disappear too, or be offshored to a
| developing country: Exxon (India 152 - 78 US)
| https://jobs.exxonmobil.com/ Chevron (India 159 - 4 US)
| https://careers.chevron.com/search-jobs
| MoonGhost wrote:
| > The senior jobs will disappear too
|
| The golden age of software development will be over soon?
| Probably, for humans. How fitting that the most enthusiastic
| part of the field will be replaced first.
| harrison_clarke wrote:
| i think there's an opportunity here
|
| a lot of junior eng tasks don't really help you become a
| senior engineer. someone needs to make a form and a backend
| API for it to talk to, because it's a business need. but
| doing 50 of those doesn't really impart a lot of wisdom
|
| same with writing tests. you'll probably get faster at
| writing tests, but that's about it. knowing that you need the
| tests, and what kinds of things might go wrong, is the senior
| engineer skill
|
| with the LLMs current ability to help people research a
| topic, and their growing ability to write functioning code,
| my hunch is that people with the time to spare can learn
| senior engineer skills while bypassing being a junior
| engineer
|
| convincing management of that is another story, though. if
| you can't afford to do unpaid self-directed study, it's
| probably going to be a bumpy road until industry figures out
| how to not eat the seed corn
| ozgrakkurt wrote:
| Graduating as a junior is just not enough in a more
| competitive market like the one we have now. I don't think it
| is related to anything else. If you can choose between a
| developer who has spent 10x the time coding and one who has
| merely studied and graduated, it's not much of a choice. If
| you don't have that option, then you might go with a junior.
| mhitza wrote:
| > It seems that, relatively speaking, there's a lot of
| postings for senior engineers and very little for new grads.
|
| That's been the case for most of the last 15 years in my
| experience. You have to follow local job markets, get in
| through an internship, or walk in at local companies and ask.
| Applying en masse can also help, and so does having some code
| on GitHub to show off.
| dalemhurley wrote:
| We have seen this in other industries and professions.
|
| As everything is so new and different at this stage we are in
| a state of discovery which requires more senior skills to
| work out the lay of the land.
|
| As we progress and create new procedures, processes, and
| practices - particularly guardrails - hiring new juniors
| will become the focus.
| runako wrote:
| > Parallel task execution: You can batch dozens of small edits
| (refactors, tests, boilerplate) and run them concurrently
| without context juggling.
|
| This is also part of a recent update to Zed. I typically use
| Zed with my own Claude API key.
| ai-christianson wrote:
| Is Zed managing the containerized dev environments, or
| creating multiple worktrees or anything like that? Or are
| they all sharing the same work tree?
| runako wrote:
| As far as I know, they are sharing a single work tree. So I
| suppose that could get messy by default.
|
| That said, it might be possible to tell each agent to
| create a branch and do work there? I haven't tried that.
|
| I haven't seen anything about Zed using containers, but
| again you might be able to tell each agent to use some
| container tooling you have in place since it can run
| commands if you give it permission.
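|
| If the agent can run commands, one low-tech way to keep
| parallel agents from stepping on each other is to give each
| task its own git worktree on its own branch. A rough sketch of
| what I mean (my own illustration, not anything Zed ships; the
| branch naming and paths are made up):
|
|     import subprocess
|
|     def spawn_agent_checkout(repo_root: str, task_id: str) -> str:
|         """Create an isolated branch + worktree for one task."""
|         branch = f"agent/{task_id}"         # hypothetical scheme
|         worktree = f"/tmp/agent-{task_id}"  # throwaway checkout
|         subprocess.run(
|             ["git", "-C", repo_root, "worktree", "add",
|              "-b", branch, worktree],
|             check=True,
|         )
|         return worktree  # point the agent's shell here
|
|     # e.g. spawn_agent_checkout(".", "fix-flaky-test")
|
| Each agent then edits and commits in its own directory, and
| merge conflicts only show up when you deliberately merge the
| branches back.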
| _bin_ wrote:
| I believe cursor now supports parallel tasks, no? I haven't
| done much with it personally but I have buddies who have.
|
| If you want one idiot's perspective, please hyper-focus on
| model quality. The barrier right now is not tooling, it's the
| fact that _models are not good enough for a large amount of
| work_. More importantly, they're still closer to interns than
| junior devs: you must give them a ton of guidance, constant
| feedback, and a very stern eye for them to do even pretty
| simple tasks.
|
| I'd like to see something with an o1-preview/pro level of
| quality that isn't insanely expensive, particularly since a lot
| of programming isn't about syntax (which most SotA models have
| down pat) but about _understanding_ the underlying concepts, an
| area in which they remain weak.
|
| At this point I really don't care if the tooling sucks. Just
| give me really, really good models that don't cost a kidney.
| quantumHazer wrote:
| CTO of an AI agents company (which has worked with AI labs)
| says agents work fine. Nothing new under the sun.
| hintymad wrote:
| It looks like we are in this interesting cycle: millions of
| engineers contribute to open-source on github. The best of our
| minds use the code to develop powerful models to replace
| exactly these engineers. In fact, the more code a group
| contributes to github, the easier it is for the companies to
| replace this group. Case in point, frontend engineers are
| impacted most so far.
|
| Does this mean people will be less incentivized to contribute
| to open source as time goes by?
|
| P.S., I think the current trend is a wakeup call to us software
| engineers. We thought we were doing highly creative work, but
| in reality we spend a lot of time doing the basic job of
| knowledge workers: retrieving knowledge and interpolating some
| basic and highly predictable variations. Unfortunately, the
| current AI is really good at replacing this type of work.
|
| My optimistic view is that in the long term we will invent or
| expand into more interesting work, but I'm not sure how long we
| will have to wait. The current generation of software engineers
| may suffer from high supply and low demand for our profession
| for years to come.
| Daishiman wrote:
| > P.S., I think the current trend is a wakeup call to us
| software engineers. We thought we were doing highly creative
| work, but in reality we spend a lot of time doing the basic
| job of knowledge workers: retrieving knowledge and
| interpolating some basic and highly predictable variations.
| Unfortunately, the current AI is really good at replacing
| this type of work.
|
| Most of the waking hours of most creative work have this type
| of drudgery. Professional painters and designers spend most
| of their time replicating ideas that are well fleshed-out.
| Musicians spend most of their time rehearsing existing
| compositions.
|
| There is a point to be made that these repetitive tasks are a
| prerequisite to come up with creative ideas.
| rowanG077 wrote:
| I disagree. AI has shown itself to be most capable in what we
| consider creative jobs: music creation, voice acting,
| text/story writing, art creation, video creation and more.
| roflyear wrote:
| If you mean create as in literally, sure. But not in
| being creative. AI can't solve novel problems yet. The
| person you're replying to obviously means being creative
| not literally creating something.
| crat3r wrote:
| What is the qualifier for this? Didn't one of the models
| recently create a "novel" algorithm for a math problem?
| I'm not sure this holds water anymore.
| rowanG077 wrote:
| You can't say AI is creating something new but that it
| isn't being creative without clearly explaining why you
| think that's the case. AI is creating novel solutions to
| problems humans haven't cracked in centuries. I don't see
| anything more creative than this.
| KaiserPro wrote:
| > AI have shown to most capable in what we consider
| creative jobs
|
| no it creates shit thats close enough for people who are
| in a rush and dont care.
|
| ie, you need artwork for shit on temu, boom job done.
|
| You want to make a poster for a bake sale, boom job done.
|
| Need some free music that sounds close enough to be
| swifty, but not enough to get sued, great.
|
| But as an expression of creativity, most people cant get
| it to do that.
|
| Its currently slightly more configurable clipart.
| rowanG077 wrote:
| > AI creates novel algorithms beating thousands of
| googlers.
|
| Random HNer on an AI post one day later
|
| > Its currently slightly more configurable clipart.
|
| It's so ridiculous at this point that I can just laugh
| about this.
| electrondood wrote:
| > doing the basic job of knowledge workers
|
| If you extrapolate and generalize further... what is at risk
| is any task that involves taking information input (text,
| audio, images, video, etc.), and applying it to create some
| information output or perform some action which is useful.
|
| That's basically the definition of work. It's not just
| knowledge work, it's literally any work.
| lispisok wrote:
| As much as I support community developed software and "free
| as in freedom", "Open Source" got completely perverted into
| tricking people to work for free for huge financial benefits
| for others. Your comment is just one example of that.
|
| For that reason all my silly little side projects are now in
| private repos. I don't care that the chance somebody builds a
| business around them is slim to none. Don't think putting a
| license on them will protect you either. You'd have to know
| somebody is violating your license before you can even think
| about doing anything, and that's basically impossible if it
| gets ripped into a private codebase and isn't obvious
| externally.
| hintymad wrote:
| > "Open Source" got completely perverted into tricking
| people to work for free for huge financial benefits for
| others
|
| I'm quite conflicted on this assessment. On one hand, I
| wonder if we would have a better job market if there were
| not so many open-sourced systems. We may have had much
| slower growth, but we would see our growth last for a lot
| more years, which means we might enjoy our profession until
| retirement and beyond. On the other hand, open source did
| create large markets, right? Like the "big data" market, the
| ML market, the distributed systems market, etc. Like the
| millions of data scientists who could barely use Pandas and
| scipy, or the hundreds of thousands of ML engineers who
| couldn't even be bothered to know what a positive
| semi-definite matrix is.
|
| Interesting times.
| blibble wrote:
| > Does this mean people will be less incentivized to
| contribute to open source as time goes by?
|
| personally, I completely stopped 2 years ago
|
| it's the same as the stack overflow problem: the incentive to
| contribute tends towards zero, at which point the plagiarism
| machine stops improving
| SubiculumCode wrote:
| Now do open science.
|
| More generally, specialty knowledge is valuable. From now on,
| all employees will be monitored in order to replace them.
| dakiol wrote:
| This whole "LLMs == junior engineers" is so pedantic. Don't we
| realize that the same way senior engineers think that LLMs can
| just replace junior engineers, high-level executives think that
| LLMs will soon replace senior ones?
|
| Junior engineers are not cattle. They are the future senior
| ones, they bring new insights into teams, new perspectives;
| diversity. I can't count the times I have learnt valuable
| things from so-called junior engineers (and not only
| tech-wise things).
|
| LLMs have their place, but ffs, stop with the "junior engineer
| replacement" shit.
| obsolete_wagie wrote:
| You need someone that's technical to look at the agent
| output, so senior engineers will be around. Junior engineers
| are certainly being replaced.
| dakiol wrote:
| Thanks, Sherlock. Now, tell me, when senior engineers start
| to retire, who will replace them? Ah, yeah, I can hear you
| say "LLMs!". And LLMs will rewrite themselves so we won't
| need seniors anymore writing code. And LLMs will write all
| the code companies need. So obvious, of course. We won't
| need a single senior because we won't have them, because
| they are not hired these days anymore. Perfect plan.
| alfalfasprout wrote:
| TBH the people I see parroting the LLM=junior engineer BS are
| almost always technically incompetent or so disconnected at
| this point from what's happening on the ground that they
| wouldn't know either way.
|
| I've been using the codex agent since before this
| announcement btw along with most of the latest LLMs. I
| literally work in the AI/ML tooling space. We're entering a
| dangerous world now where there's super useful technology but
| people are trying to use it to replace others instead of
| enhance them. And that's causing the wrong tools to be built.
| fullstackchris wrote:
| Are you paid to say this? Sorry for my frankness, but I don't
| understand how you can have multiple agents concurrently
| editing the same areas of code without any sort of merge
| conflicts later.
| tough wrote:
| can someone give me a test prompt to one-shot something in go for
| testing?
|
| (Im trying something)
|
| what would be an impressive program that an agent should be able
| to one-shot in one go?
| blixt wrote:
| They mentioned "microVM" in the live stream. Notably there's no
| browser or internet access. It makes sense, running specialized
| Firecracker/Unikraft/etc microkernels is way faster and cheaper
| so you can scale it up. But there will be a big jump in
| technical difficulty and scalability from this to "agents with
| their own computers". ChatGPT Operator already has a browser,
| so they definitely can do this, but I imagine the demand is
| orders of magnitude different.
|
| There must be room for a Modal/Cloudflare/etc infrastructure
| company that focuses only on providing full-fledged computer
| environments specifically for AI with forking/snapshotting
| (pause/resume), screen access, human-in-the-loop support, and so
| forth, and it would be very lucrative. We have browser-use, etc,
| but they don't (yet) capture the whole flow.
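|
| For anyone curious what driving one of these microVMs looks
| like, here is a rough sketch against Firecracker's HTTP API
| over a unix socket (field names are from its getting-started
| docs as I remember them, and the socket/kernel/rootfs paths
| are placeholders, so treat it as approximate):
|
|     # pip install requests-unixsocket
|     import requests_unixsocket
|
|     SOCK = "http+unix://%2Ftmp%2Ffirecracker.socket"
|     s = requests_unixsocket.Session()
|
|     # Kernel and root filesystem for the microVM.
|     s.put(SOCK + "/boot-source", json={
|         "kernel_image_path": "/images/vmlinux",
|         "boot_args": "console=ttyS0 reboot=k panic=1",
|     })
|     s.put(SOCK + "/drives/rootfs", json={
|         "drive_id": "rootfs",
|         "path_on_host": "/images/rootfs.ext4",
|         "is_root_device": True,
|         "is_read_only": False,
|     })
|     s.put(SOCK + "/machine-config",
|           json={"vcpu_count": 2, "mem_size_mib": 2048})
|
|     # Boot it.
|     s.put(SOCK + "/actions",
|           json={"action_type": "InstanceStart"})
|
| The hard part for an "agents with their own computers" product
| is everything around this: snapshotting, networking, and
| getting a browser or screen in there cheaply.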
| sudohalt wrote:
| When it runs the code I assume it does so via a docker container,
| does anyone know how it is configured? Assuming the user hasn't
| specified an AGENTS.md file or a Dockerfile in the repo. Does it
| generate it via LLM based on the repo, and what it thinks is
| needed? Does it use static analysis (package.json,
| requirements.txt, etc)? Do they just have a super generic
| Dockerfile that can
| handle most envs? Combination of different things?
| ilaksh wrote:
| I think they mentioned it was a similar environment to what it
| trains on, so maybe they have a default Dockerfile. Of course
| containers can also install additional packages or at least
| python packages.
| nkko wrote:
| Yes, and one test failed as it missed the pydantic dependency
| hansonw wrote:
| More about that here!
| https://platform.openai.com/docs/codex#advanced-configuratio...
| sudohalt wrote:
| Thanks!
| sudohalt wrote:
| It seems LLMs are doing a lot of the heavy lifting figuring
| out the exact test, build, lint commands to run (even if the
| AGENTS.md file gives it direction and hints). I wonder if
| there are any plans to support user defined build, test, and
| pre commit commands to avoid unnecessary cost and keep it
| deterministic. Also wonder how monolith repos (or distinct
| but related repos) are supported, does it run everything in
| one container or loop through the envs that are edited?
|
| I assume one easy next step is to just run GitHub Actions in
| the container since everything is defined there (assuming the
| user set it up)
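|
| Even if it is mostly LLM-driven today, the first pass doesn't
| need to be: a deterministic manifest scan gets you sensible
| defaults, with anything declared in AGENTS.md overriding it.
| A toy sketch of that idea (my own guess at an approach, not
| how OpenAI actually builds the environment; the command table
| is illustrative):
|
|     from pathlib import Path
|
|     # manifest file -> (setup command, test command)
|     DEFAULTS = {
|         "package.json":     ("npm ci", "npm test"),
|         "requirements.txt": ("pip install -r requirements.txt",
|                              "pytest"),
|         "go.mod":           ("go mod download", "go test ./..."),
|         "Cargo.toml":       ("cargo fetch", "cargo test"),
|     }
|
|     def guess_commands(repo: Path):
|         """Yield deterministic defaults per manifest found."""
|         for manifest, (setup, test) in DEFAULTS.items():
|             if (repo / manifest).exists():
|                 yield manifest, setup, test
|
|     for manifest, setup, test in guess_commands(Path(".")):
|         print(f"{manifest}: setup='{setup}' test='{test}'")
|
| Anything that can't be derived this way is where the LLM (or a
| user-provided script) would have to fill in the gaps.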
| bionhoward wrote:
| What about privacy, training opt out?
|
| What about using it for AI / developing models that compete with
| our new overlords?
|
| Seems like using this is just asking to get rug pulled for
| competing with em when they release something that competes with
| your thing. Am I just an old who's crowing about nothing? It's ok
| for them to tell us we own outputs we can't use to compete with
| em?
| piskov wrote:
| Watch the video: there is an explicit switch at one of the
| steps about (not) allowing training on your repo.
| lurking_swe wrote:
| That's nice. And we trust that it does what it says
| because...? The AI company (openai, anthropic, etc) pinky
| promised? Have we seen their source code? How do you know
| they don't train?
|
| Facebook has been caught in recent DOJ hearings breaking the
| law with how they run their business, just as one example.
| They claimed under oath, previously, to not be doing X, and
| then years later there was proof they did exactly that.
|
| https://youtu.be/7ZzxxLqWKOE?si=_FD2gikJkSH1V96r
|
| A company's "word" means nothing imo. None of this makes
| sense if i'm being honest. Unless you personally have a
| negotiated contract with the provider, and can somehow be
| certain they are doing what they claim, and can later sue for
| damages, all of this is just crossing your fingers and hoping
| for the best.
| tough wrote:
| On the other hand you can enable explicit sharing of your
| data and get a few million free tokens daily
| wilg wrote:
| If you don't trust the company your opt-out strategy is
| much easier, you simply do not authorize them to access
| your code.
| ofirpress wrote:
| [I'm one of the co-creators of SWE-bench] The team managed to
| improve on the already very strong o3 results on SWE-bench, but
| it's interesting that we're just seeing an improvement of a few
| percentage points. I wonder if getting to 85% from 75% on
| Verified is going to take as long as it took to get from 20% to
| 75%.
| Snuggly73 wrote:
| I can be completely off base, but it feels to me like
| benchmaxxing is going on with swe-bench.
|
| Look at the results from multi swe bench - https://multi-swe-
| bench.github.io/#/
|
| swe polybench - https://amazon-science.github.io/SWE-PolyBench/
|
| Kotlin bench - https://firebender.com/leaderboard
| mr_north_london wrote:
| How long did it take to go from 20% to 75%?
| nadis wrote:
| In the preview video, I appreciated Katy Shi's comment on "I
| think this is a reflection of where engineering work has moved
| over the past where a lot of my time now is spent reviewing code
| rather than writing it."
|
| Preview video from Open AI:
| https://www.youtube.com/watch?v=hhdpnbfH6NU&t=878s
|
| As I think about what "AI-native" or just the future of building
| software looks like, it's interesting to me that - right now -
| developers are still just reading code and tests rather than
| looking at simulations.
|
| While a new(ish) concept for software development, simulations
| could provide a wider range of outcomes and, especially for the
| front end, are far easier to evaluate than just code/tests alone.
| I'm biased because this is something I've been exploring but it
| really hit me over the head looking at the Codex launch
| materials.
| ai-christianson wrote:
| > rather than looking at simulations
|
| You mean like automated test suites?
| tough wrote:
| automated visual fuzzy-testing with some self-reinforcement
| loops
|
| There are already libraries for QA testing, and VLMs can give
| critique on a series of screenshots automated by a playwright
| script per branch
| ai-christianson wrote:
| Cool. Putting vision in the loop is a great idea.
|
| Ambitious idea, but I like it.
| tough wrote:
| SmolVLM, Gemma, LlaVa, in case you wanna play with some
| of the ones i've tried.
|
| https://huggingface.co/blog/smolvlm
|
| recently both llama.cpp and ollama got better support for
| them too, which makes this kind of integration with
| local/self-hosted models now more attainable/less
| expensive
| tough wrote:
| also this for the visual regression testing parts, but
| you can add some AI onto the mix ;)
| https://github.com/lost-pixel/lost-pixel
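|
| To make it concrete, a bare-bones version of that loop could
| look like this (assuming you have Playwright installed and a
| llava-style model pulled in ollama; the URL and prompt are
| placeholders):
|
|     # pip install playwright ollama
|     # playwright install chromium
|     import ollama
|     from playwright.sync_api import sync_playwright
|
|     URL = "http://localhost:3000"  # branch preview to inspect
|
|     with sync_playwright() as p:
|         browser = p.chromium.launch()
|         page = browser.new_page()
|         page.goto(URL)
|         page.screenshot(path="branch.png", full_page=True)
|         browser.close()
|
|     # Ask a local VLM to critique the screenshot.
|     reply = ollama.chat(
|         model="llava",
|         messages=[{
|             "role": "user",
|             "content": "Critique this page: layout bugs, "
|                        "overlapping elements, unreadable text?",
|             "images": ["branch.png"],
|         }],
|     )
|     print(reply["message"]["content"])
|
| From there you can diff critiques between branches, or gate a
| CI step on them, much like lost-pixel does for pixel diffs.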
| ericghildyal wrote:
| I used Cline to build a tiny testing helper app and this
| is exactly what it did!
|
| It made changes in TS/Next.js given just the boilerplate
| from create-next-app, ran `yarn dev` then opened its mini
| LLM browser and navigated to localhost to verify
| everything looked correct.
|
| It found 1 mistake and fixed the issue then ran `yarn
| dev` again, opened a new browser, navigated to localhost
| (pointing at the original server it brought up, not the
| new one at another port) and confirmed the change was
| correct.
|
| I was very impressed but still laughed at how it somehow
| backed its way into a flow that worked, but only because
| Next has hot-reloading.
| fosterfriends wrote:
| ++ Kind of my whole thesis with Graphite. As more code gets AI-
| generated, the weight shifts to review, testing, and
| integration. Even as someone helping build AI code reviewers,
| we'll _need_ humans stamping forever - for many reasons, but
| fundamentally for accountability. A computer can never be held
| accountable
|
| https://constelisvoss.com/pages/a-computer-can-never-be-held...
| hintymad wrote:
| > A computer can never be held accountable
|
| I think the issue is not about humans being entirely
| replaced. Instead, the issue is that if AI replaces enough
| knowledge workers while there's no new or expanded
| market to absorb the workforce, the new balance of supply and
| demand will mean that many of us will see suppressed pay or,
| worse, lose our jobs forever.
| DGAP wrote:
| If you still don't think software engineering as a high paying
| job is over, I don't know what to tell you.
| whyowhy3484939 wrote:
| It's high paying?
| asdev wrote:
| Is the point of this to actually assign tasks to an AI to
| complete end to end? Every task I do with AI requires at least
| some bit of hand-holding, sometimes reprompting, etc. So I
| don't see why I would want to run tasks in parallel; I don't
| think it would increase throughput. Curious if others have
| better experiences with this.
| RhysabOweyn wrote:
| I believe that code from one of these things will eventually
| cause a disaster affecting the capital owners. Then all of a
| sudden you will need a PE license, ABET degree, 5 years working
| experience, etc. to call yourself a software engineer. It would
| not even be historically unique. Charlatans are the reason that
| lawyers, medical doctors, and civil engineers have to go through
| lots of education, exams, and vocational training to get into
| their profession. AI will probably force software engineering as
| a profession into that category as well.
|
| On the other hand, if your job was writing code at certain
| companies whose profits were based on shoving ads in front of
| people then I would agree that no one will care if it is written
| by a machine or not. The days of those jobs making >$200k a year
| are numbered.
| alfalfasprout wrote:
| Even ads have risk. Customer service has risk. The widespread
| proliferation of this stuff is a legal minefield waiting to be
| stepped on.
| SketchySeaBeast wrote:
| Is this the same idea as when we switched to multicore machines?
| The rate of change in the capabilities of a single agent has
| slowed enough that now the only way for OpenAI to appear to be
| making decent progress is to have many?
| ionwake wrote:
| I'm sorry if I'm being silly, but I have paid for the Pro
| version ($200 a month), and every time I click on Try Codex,
| it takes me to a
| pricing page with the "Team Plan"
| https://chatgpt.com/codex#pricing.
|
| Is this still rolling out? I don't need the Team plan too, do I?
|
| I have been using openAI products for years now and I am keen to
| try but I have no idea what I am doing wrong.
| mr_north_london wrote:
| It's still rolling out
| ionwake wrote:
| Thx for the reply, Im in london too ( atm )
| jdee wrote:
| im the same, and it appeared for me 2 mins ago. looks like its
| still rolling out
| ionwake wrote:
| cool, it appeared - I was just worried it was a payment issue.
| thanks guys.
| throwaway314155 wrote:
| They do this with every major release. Never going to
| understand why.
| hintymad wrote:
| I remember HN had a recurring popular post on the most
| important data structures. They are all the basic ones that a
| first-year college student can learn. The youngest one was the
| skip list, which was invented in 1990. When I was a student, my
| class literally read the original paper and implemented the data
| structure and analyzed the complexity in our first data structure
| course.
|
| This seems to imply that software engineering as a profession
| has been quite mature and saturated for a while, to the point
| that a model can predict most of the output. Yes, yes, I know
| there are thousands of advanced algorithms and amazing systems in
| production. It's just that the market does not need millions of
| engineers for such advanced skills.
|
| Unless we get yet another new domain like the cloud or the
| internet, I'm afraid the core value of software engineers -
| trailblazing new business scenarios - will continue to diminish
| and be marginalized by AI. As a result, we get far less demand
| for our profession, and many of us will either take lower pay
| or lose our jobs for extended periods.
| theappsecguy wrote:
| I am so damn tired of all the AI garbage shoved down our throats
| every day. Can't wait for all of it to crash and burn.
| fullstackchris wrote:
| Reading these threads, it's clear to me people are so cooked
| they no longer understand (or perhaps never did) the simple
| process by which source code is shared, built, and merged
| together by multiple editors.
| swisniewski wrote:
| Has anyone else been able to get "secrets" to work?
|
| They seem to be injected fine in the "environment setup" but
| don't seem to be injected when running tasks against the
| environment. This consistently repros even if I delete and re-
| create the environment and archive and resubmit the task.
___________________________________________________________________
(page generated 2025-05-16 23:00 UTC)