[HN Gopher] A Research Preview of Codex
___________________________________________________________________
A Research Preview of Codex
Author : meetpateltech
Score : 336 points
Date : 2025-05-16 15:02 UTC (7 hours ago)
(HTM) web link (openai.com)
(TXT) w3m dump (openai.com)
| haffi112 wrote:
| (watching live) I'm wondering how it performs on the METR
| benchmark (https://metr.org/blog/2025-03-19-measuring-ai-ability-
| to-com...).
| colesantiago wrote:
| I think the benchmark test I would like to see for these
| programming agents is an agent making a flawless PR or patch
| to the BSD / Linux kernel.
|
| This should be possible today and surely Linus would also see
| this in the future.
| _kb wrote:
| There's a fairly pragmatic discussion on that exact topic with
| Linus here: https://youtu.be/VHHT6W-N0ak.
| tptacek wrote:
| Maddening: "codex" is also the name of their open-source Claude-
| Code-alike, and was previously the name of an at-the-time
| frontier coding model. It's like they name things just to fuck
| with us.
| tekacs wrote:
| So -- that client-side thing is _technically_ called `codex-
| cli` (in the parent 'codex' repo, which looks like a
| monorepo?).
|
| Still super confusing, though!
|
| I feel like companies working with and shipping LLMs would do
| well to remember that it's not just humans who get confused by
| this, but LLMs themselves... it makes for a painful time,
| sending off a request and noticing, a third of the way into
| its reasoning, that the model has gotten two things with
| almost-identical names confused.
| tough wrote:
| they also have dual implementations in Rust and TypeScript;
| there's codex-rs in that monorepo
| fabmilo wrote:
| more excited about the rust impl than the typescript one.
| tptacek wrote:
| Besides packaging of their releases, what possible
| difference could that make in this problem domain?
| tough wrote:
| I just think it's nice to have open source code to reference,
| so maybe he meant it just in that -educational- way; certainly
| more to learn from the Rust one than the TS one for most
| folks, even if the problem space doesn't require system-level
| safety code.
| quantadev wrote:
| If its name is 'codex-cli' then that means "Codex Command
| Line Interface", so the name is absolutely Codex.
| manojlds wrote:
| And with themselves and their models. The open-source Codex
| had a prompt to disambiguate it from the model.
| scottfalconer wrote:
| Next week: OpenAI rebrands Windsurf as Codex.
| odie5533 wrote:
| Codex IDE. Calling it.
| dbbk wrote:
| VS Codex
| prhn wrote:
| Is anyone using any of these tools to write non boilerplate code?
|
| I'm very interested.
|
| In my experience ChatGPT and Gemini are absolutely terrible at
| these types of things. They are constantly wrong. I know I'm not
| saying anything new, but I'm waiting to personally experience an
| LLM that does something useful with any of the code I give it.
|
| These tools aren't useless. They're great as search engines and
| pointing me in the right direction. They write dumb bash scripts
| that save me time here and there. That's it.
|
| And it's hilarious to me how these people present these tools. It
| generates a bunch of code, and then you spend all your time
| auditing and fixing what is expected to be wrong.
|
| That's not the type of code I'm putting in my company's code
| base, and I could probably write the damn code more correctly in
| less time than it takes to review for expected errors.
|
| What am I missing?
| icapybara wrote:
| It's probably what you're asking. You can't just say "write me
| an app", you have to break a big problem into small problems
| for it.
| spariev wrote:
| I think it all depends on your platform and use cases. In my
| experience AI tools work best with Python and JS/Typescript and
| some simple use cases (web apps, basic data science etc). Also,
| I've found they can be of great help with refactorings and
| cases when you need to do something similar to already existing
| code, but with a twist or change.
| volkk wrote:
| you might be missing small things to create more guardrails
| like effective prompting and maintaining what's been done using
| files, carefully controlling context, committing often in-
| between changes, but largely, you're not missing anything. i
| use AI constantly, but always for subtasks of a larger
| complicated thing that my brain has thought through. and often
| use higher cost models to help me abstractly think through
| complex things/point me in the right directions.
|
| personally, i've always operated in a codebase in a way that i
| _need_ to understand how things work for me to be productive
| and make the right decisions. I operate the same way with AI.
| every change is carefully reviewed, if it's dumb, i make it
| redo it and explain why it's dumb. and if it gets caught in a
| loop, i reset the context and try to reframe the problem.
| overall, i'm definitely more productive, but if you truly want
| to be hands off--you're in for a very bad time. i've been
| there.
|
| lastly, some codebases don't work well with AI. I was working
| on a problem that was a bit more novel/out there and no model
| could solve it. Just yapped endlessly about these complex, very
| potentially smart sounding solutions that did absolutely
| nothing. went all the way to o1-pro. the craziest part to me
| was the fact that across claude, deepseek and openai, they used
| the same specific vernacular for this particular problem which
| really highlights how a lot of these models are just a mish-
| mash of the same underlying architecture/internet data. some of
| these models use responses from other models for their training
| data, which to me is like incest. you won't get good genetic
| results.
| Workaccount2 wrote:
| >What am I missing?
|
| That you are trying to use LLMs to create giant sprawling
| codebase feature packed software packages that define the
| modern software landscape. What's being missed is that any one
| user might only utilize 5% of the code base on any given day.
| Software is written to accommodate every need every user could
| have in one package. Then the users just use the small slice
| that accommodates their specific needs.
|
| I have now created 5 hyper narrow programs that are used daily
| by my company to do work. I am not a programmer and my company
| is not a tech company located in a tech bubble. We are a tiny
| company that does old school manufacturing.
|
| To give a quick general example, Betty uses Excel to manage
| payroll. A list of employees, a list of wages, a list of hours
| worked (which she copies from the time clock software .csv that
| she imports to Excel).
|
| Excel is a few million LOC program and costs ~$10/mo. Betty
| needs maybe 2k LOC to do what she uses excel for. Something an
| LLM can do easily, a python GUI wrapper on an SQLite DB. And
| she would be blown away at how fast it is, and how it is
| written for her use specifically.
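|
| (A minimal sketch of the kind of tool meant here -- hypothetical
| table, column, and file names -- a small Tkinter front end over
| SQLite that imports the time-clock CSV and computes gross pay:)
|
|     import csv, sqlite3, tkinter as tk
|
|     db = sqlite3.connect("payroll.db")
|     db.execute("CREATE TABLE IF NOT EXISTS employees"
|                " (name TEXT PRIMARY KEY, wage REAL)")
|     db.execute("CREATE TABLE IF NOT EXISTS hours"
|                " (name TEXT, week TEXT, hours REAL)")
|
|     def import_hours(path):
|         # time-clock export assumed to have name,week,hours columns
|         with open(path, newline="") as f:
|             rows = [(r["name"], r["week"], float(r["hours"]))
|                     for r in csv.DictReader(f)]
|         db.executemany("INSERT INTO hours VALUES (?, ?, ?)", rows)
|         db.commit()
|
|     def gross_pay(week):
|         return db.execute(
|             "SELECT e.name, e.wage * h.hours FROM employees e "
|             "JOIN hours h ON h.name = e.name WHERE h.week = ?",
|             (week,)).fetchall()
|
|     root = tk.Tk()
|     week_entry = tk.Entry(root); week_entry.pack()
|     out = tk.Text(root, height=20, width=40); out.pack()
|
|     def show():
|         out.delete("1.0", tk.END)
|         for name, pay in gross_pay(week_entry.get()):
|             out.insert(tk.END, f"{name}: ${pay:.2f}\n")
|
|     tk.Button(root, text="Import timeclock.csv",
|               command=lambda: import_hours("timeclock.csv")).pack()
|     tk.Button(root, text="Compute pay", command=show).pack()
|     root.mainloop()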
|
| How software is written and how it is used will change to
| accommodate LLMs. We didn't design cars to drive on horse
| paths, we put down pavement.
| kridsdale3 wrote:
| The Romans put down paved roads to make their horse paths
| more reliable.
|
| But yes, I hope we get away from the giant conglomeration of
| everything, ESPECIALLY the reality of people doing 90% of
| their business inside a Google Chrome window. Move towards the
| UNIX philosophy of tiny single-purpose programs.
| alfalfasprout wrote:
| > I have now created 5 hyper narrow programs that are used
| daily by my company to do work. I am not a programmer and my
| company is not a tech company located in a tech bubble. We
| are a tiny company that does old school manufacturing.
|
| OK, great.
|
| > That you are trying to use LLMs to create giant sprawling
| codebase feature packed software packages that define the
| modern software landscape. What's being missed is that any
| one user might only utilize 5% of the code base on any given
| day. Software is written to accommodate every need every user
| could have in one package. Then the users just use the small
| slice that accommodates their specific needs.
|
| With all due respect, the fact that you made a few small
| programs to help with your tasks is wonderful, but this last
| statement alone rather disqualifies you from making an
| assessment of software engineering in general.
|
| There's a great number of reasons why codebases get large.
| Complex problems inherently come with complexity and scale in
| both code and integrations. You can choose to move the
| complexity around but never fully get rid of it.
| mupuff1234 wrote:
| But how much of the software industry is truly solving
| inherently complex problems?
|
| At a very conservative guess I'd say no more than 10%.
| Cu3PO42 wrote:
| Occasionally. I find that there is a certain category of task
| that I can hand over to an LLM and get a result that takes me
| significantly less time to clean up than it would have taken me
| to write from scratch.
|
| A recent example from a C# project I was working in. The
| project used builder classes that were constructed according to
| specified rules, but all of these builders were written by
| hand. I wanted to automatically generate these builders, and
| not using AI, just good old meta-programming.
|
| Now I knew enough to know that I needed a C# source generator,
| but I had absolutely no experience with writing them. Could I
| have figured this out in an hour or two? Probably. Did I write
| a prompt in less than five minutes and get a source generator
| that worked correctly in the first shot? Also yes. I then spent
| some time cleaning up that code and understanding the API it
| uses to hook into everything and was done in half an hour and
| still learnt something from it.
|
| You can make the argument that this source generator is in
| itself "boilerplate", because it doesn't contain any special
| sauce, but I still saved significant time in this instance.
| uludag wrote:
| I feel things get even worse when you use a more niche
| language. I get extremely disappointed any time I try to get it
| do anything useful in Clojure. Even as a search engine,
| especially when asking it about libraries, these tools
| completely fail to meet expectations.
|
| I can't even fathom how frustrating such tools would be with
| poorly written confusing Clojure code using some niche
| dependency.
|
| That being said, I can imagine a whole class of problems where
| this could succeed very well and provide value. Then again,
| the type of problems that I feel these systems could get right
| 99% of the time are problems that a skilled developer could fix
| in minutes.
| sottol wrote:
| I tried using Gemini 2.5 Pro for a side-side-project, seemed
| like a good project to explore LLMs and how they'd fit into my
| workflow. 2-3 weeks later it's around 7k loc of Python auto-
| generating about 35k loc of C from a JSON spec.
|
| This project is not your typical Webdev project, so maybe
| that's an interesting case-study. It takes a C-API spec in
| JSON, loads and processes it in Python and generates a
| C-library that turns a UI marked up YAML/JSON into C-Api calls
| to render that UI. [1]
|
| The result is pretty hacky code (by my design, can't/won't use
| FFI) that's 90% written by Gemini 2.5 Pro Pre/Exp but it mostly
| worked. It's around 7k lines of Python that generate a 30-40k
| loc C-library from a JSON LVGL-API-spec to render an LVGL UI
| from YAML/JSON markup.
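|
| (For a sense of the general shape -- a toy sketch with a made-up
| spec format, not the actual pipeline: read a JSON API spec and
| emit a C dispatch table mapping markup property names to setter
| calls.)
|
|     import json
|
|     spec = json.loads("""
|     [
|       {"name": "lv_obj_set_width",  "args": ["lv_obj_t *obj", "int32_t w"]},
|       {"name": "lv_obj_set_height", "args": ["lv_obj_t *obj", "int32_t h"]}
|     ]
|     """)
|
|     def emit_setter_table(functions):
|         lines = [
|             "typedef void (*setter_fn)(lv_obj_t *, int32_t);",
|             "typedef struct { const char *key; setter_fn fn; } setter_entry;",
|             "static const setter_entry SETTERS[] = {",
|         ]
|         for fn in functions:
|             # property key "width" derived from "lv_obj_set_width"
|             key = fn["name"].rsplit("_", 1)[-1]
|             lines.append(f'    {{ "{key}", {fn["name"]} }},')
|         lines.append("};")
|         return "\n".join(lines)
|
|     print(emit_setter_table(spec))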
|
| I probably spent 2-3 weeks on this, I might have been able to
| do something similar in maybe 2x the time but this is about 20%
| of the mental overhead/exhaustion it would have taken me
| otherwise. Otoh, I would have had a much better understanding
| of the tradeoffs and maybe a slightly cleaner architecture if I
| would have to write it. But there's also a chance I would have
| gotten lost in some of the complexity and never finished (esp
| since it's a side-project that probably no-one else will ever
| see).
|
| What worked well:
|
| * It mostly works(!). Unlike previous attempts with Gemini 1.5
| where I had to spend about as much or more time fixing than
| it'd have taken me to write the code. Even adding complicated
| features after the fact usually works pretty well with minor
| fixing on my end.
|
| * Lowers mental "load" - you don't have to think so much about
| how to tackle features, refactors, ...
|
| Other stuff:
|
| * I really did not like Cursor or Windsurf - I half-use VSCode
| for embedded hobby projects but I don't want to then have
| another "thing" on top of that. Aider works, but it would
| probably require some more work to get used to the automatic
| features. I really need to get used to the tooling, not an
| insignificant time investment. It doesn't vibe with how I work,
| yet.
|
| * You can generate a *significant* amount of code in a short
| time. It doesn't feel like it's "your" code though, it's like
| joining a startup - a mountain of code, someone else's
| architecture, their coding style, comment style, ... and,
|
| * there's this "fog of code", where you can sorta bumble around
| the codebase but don't really 100% understand it. I still have
| mid/low confidence in the changes I make by hand, even 1 week
| after the codebase has largely stabilized. Again, it's like
| getting familiar with someone else's code.
|
| * Code quality is ok but not great (and partially my fault).
| Probably depends on how you got to the current code - ie how
| clean was your "path". But since it is easier to "evolve" the
| whole project (I changed directions once or twice when I sort
| of hit a wall) it's also easier to end up with a messy-ish
| codebase. Maybe the way to go is to first explore, then codify
| all the requirements and start afresh from a clean slate
| instead of trying to evolve the code-base. But that's also not
| an insignificant amount of work and also mental load (because
| now you really need to understand the whole codebase or trust
| that an LLM can sufficiently distill it).
|
| * I got much better results with very precise prompts. Maybe
| I'm using it wrong, ie I usually (think I) know what I want and
| just instruct the LLM instead of having an exploratory chat but
| the more explicit I am, the more closely the output is to what
| I'd like to see. I've tried to discuss proposed changes a few
| times to generate a spec to implement in another session but it
| takes time and was not super successful. Another thing to
| practice.
|
| * A bit of a later realization, but modular code and short,
| self-contained modules are really important though this might
| depend on your workflow.
|
| To summarize:
|
| * It works.
|
| * It lowers initial mental burden.
|
| * But to get really good results, you still have to put a lot
| of effort into it.
|
| * At least right now, it seems you will still eventually have
| to put in the mental effort at some point, normally it's
| "front-loaded" where you have to do the design and think about
| it hard, whereas the AI does all the initial work but it
| becomes harder to cope with the codebase once you reach a
| certain complexity. Eventually you will have to understand it
| though even if just to instruct the LLM to make the exact
| changes you want.
|
| [1] https://github.com/thingsapart/lvgl_ui_preview
| asadm wrote:
| yes, think of it as a search engine that auto-applies that
| stackoverflow fix to your code.
|
| But I have done larger tasks (writing device drivers) using
| Gemini.
| browningstreet wrote:
| I've built a number of personal data-oriented and single
| purpose tools in Replit. I've constrained my ambitions to what
| I think it can do but I've added use cases beyond my initial
| concept.
|
| In short, the tools work. I've built things 10x faster than
| doing it from scratch. I also have a sense of what else I'll be
| able to build in a year. I also enjoy not having to add cycles
| to communicate with external contributors -- I think, then I
| do, even if there's a bit of wrestling. Wrangling with a coding
| agent feels a bit like "compile, test, fix, re-compile". Re-
| compiling generally got faster in subsequent generations of
| compiler releases.
|
| My company is building internal business functions using AI
| right now. It works too. We're not putting that stuff in front
| of our customers yet, but I can see that it'll come. We may put
| agents into the product that let them build things for
| themselves.
|
| I get the grumpiness & resistance, but I don't see how it's
| buying you anything. The puck isn't underfoot.
| IXCoach wrote:
| Hey there!
|
| Lots missing here, but I had the same issues; it takes
| iteration and practice. I use Claude Code in terminal windows,
| and a text expander to save explicit reminders that I have to
| inject super regularly because Anthropic obscures access to
| system prompts.
|
| For example, I have 3-to-8-paragraph-long instructions I will
| place regularly about not assuming, checking deterministically,
| etc., and for most things I have the agents write a report with
| a specific instruction set.
|
| I pop the instructions into text expander so I just type - docs
| when saying go figure this out, and give me the path to the
| report when done.
|
| They come back with a path, and I copy it and search VS Code.
|
| It opens as an .md and I use preview mode; it's similar to a
| Google doc.
|
| And I'll review it. Always, things will be wrong: tons of
| assumptions, failures to check deterministically, etc... but I
| see that in the doc and have it fix it, correct
| misunderstandings, update the doc until it's perfect.
|
| From there I'll say add a plan in a table with status for each
| task based on this (another text expander snippet with
| instructions).
|
| And WHEN that's 100% right, I'll say implement and update as
| you go. The update-as-you-go forces it to recognize and remember
| the scope of the task.
|
| The greatest point of failure in the system is misalignment.
| Ethics teams got that right. It compounds FAST if allowed: you
| let them assume things, they state assumptions as facts, that
| becomes what other agents read, and you get true chaos
| unchecked.
|
| I started rebuilding claude code from scratch literally because
| they block us from accessing system prompts and I NEED these
| agents to stop lying to me about things that are not done or
| assumed, which highlights the true chaos possible when applied
| to system critical operations in governance or at scale.
|
| I also built my own tool like codex for managing agent tasks
| and making this simpler, but getting them to use it without
| getting confused is still a gap.
|
| Let me know if you have any other questions. I am performing
| the work of 20 engineers as of today: I rewrote, by myself in 4
| weeks with this system, 2 years of back-end code that had
| required the full-time work of a team of 2 engineers... so I
| am, I guess, quite good at it.
|
| I need to push my edges further into this latest tech, have not
| tried codex cli or the new tool yet.
| IXCoach wrote:
| It's a total of about 30 snippets, avg 6 paragraphs long, that
| I have to inject. For each role switch it goes through, I have
| to re-inject them.
|
| It's a pain but it works.
|
| Even with TDD it will hallucinate the mocks without management,
| and hallucinate the requirements. Each layer has to be
| checked atomically, but the text expander snippets, done right,
| can get it close to 75% right.
|
| My main project faces 5000 users so I can't let the agents run
| freely, whereas with isolated projects in separate repos I can
| let them run more freely, then review in GitKraken before
| committing.
| Rudybega wrote:
| You could just use something like roo code with custom
| modes rather than manually injecting them. The orchestrator
| mode can decide on the other appropriate modes to use for
| subtasks.
|
| You can customize the system prompts, baseline prompts, and
| models used for every single mode and have as many or as
| few as you want.
| arkmm wrote:
| I think most code these days is boilerplate, though the
| composition of boilerplate snippets can become something unique
| and differentiated.
| evilduck wrote:
| It may depend on what you consider boilerplate. I use them
| quite a bit for scripting outside of direct product code
| development. Essentially, AI coding tools have moved this
| chart's decision making math for me: https://xkcd.com/1205/ The
| cost to automate manual tasking is now significantly lower so I
| end up doing more of it.
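|
| (For a rough sense of that chart's math -- made-up numbers, just
| a back-of-the-envelope sketch:)
|
|     # xkcd 1205-style arithmetic: how much time you can justify
|     # spending on automation over a five-year horizon.
|     def time_worth_automating(seconds_saved, times_per_day, years=5):
|         return seconds_saved * times_per_day * 365 * years  # seconds
|
|     # A 30-second task done 5x/day justifies ~76 hours of effort
|     # over five years; AI tooling shrinking the "cost to automate"
|     # side of that trade-off is what changes the decision.
|     print(time_worth_automating(30, 5) / 3600)  # ~76.0 hours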
| lispisok wrote:
| A lot of people are deeply invested in these things being
| better than they really are. From OpenAI and Google spending
| $100s of billions EACH developing LLMs to VC-backed startups
| promising their "AI agent" can replace entire teams of
| white collar employees. That's why your experience matches mine
| and every other developer I personally know but you see
| comments everywhere making much grander claims.
| triMichael wrote:
| I agree, but I'd add that it's not just the tech giants who
| want them to be better than they are, but also non-
| programmers.
|
| IMO LLMs are actually pretty good at writing small scripts.
| First, it's much more common for a small script to be in the
| LLM's training data, and second, it's much easier to find and
| fix a bug. So the LLM actually does allow a non-programmer to
| write correct code with minimal effort (for some simple
| task), and then they are blown away thinking writing software
| is a solved problem. However, these kinds of people have no
| idea of the difference between a hundred line script where an
| error is easily found and isn't a big deal and a million line
| codebase where an error can be invisible and shut everything
| down.
|
| Worst of all is when the two sides of tech-giants and non-
| programmers meet. These two sides may sound like opposites
| but they really aren't. In particular, there are plenty of
| non-programmers involved at the C-level and the HR levels of
| tech companies. These people are particularly vulnerable to
| being wowed by LLMs seemingly able to do complex tasks that
| in their minds are the same tasks their employees are doing.
| As a result, they stop hiring new people and tell their
| current people to "just use LLMs", leading to the current
| hiring crisis.
| alfalfasprout wrote:
| TBH, this website in the last few years has attracted an
| increasingly non-technical audience. And the field, in
| general, has attracted a lot of less experienced folks that
| don't understand the implications of what they're doing. I
| don't mean that as a diss-- but just a reflection of reality.
|
| Indeed, even Codex (and I've been using it prior to this
| release) is not remotely at the level of even a junior
| engineer outside of a certain set of tasks.
| energy123 wrote:
| Where can I read OpenAI's promise that it won't use the repos I
| upload for training?
| alvis wrote:
| Is it surprising? Hmm, perhaps not. But is it better than Cursor
| etc.? Hmm, perhaps it's the wrong question.
|
| Feels like Codex is for product managers to fix bugs without
| touching any developer resources. Then it's insanely surprising!
| gbalduzzi wrote:
| It sounds nice, but are product managers able to spot
| regressions or other potential issues (performance, data
| protection, legal, etc) in the codex result?
| alvis wrote:
| If codex can analyze the whole code base, I can't see why
| not? I can even imagine one could set up a CI task so that any
| committed code must pass all sorts of legal/data protection
| requirements too.
| kenjackson wrote:
| Exactly this. In fact, the product manager should be the one
| who knows the set of checks that need to be done over
| the code base. You still need a dev, though, to make sure the
| last mile is doing what you expect it to do.
| bhl wrote:
| I've been contracting with a startup. The bottleneck is not the
| lack of tools; it's agency. There's so much work, it becomes
| work to assign and organize work.
|
| But now who's going to do that work? Still engineers.
| ilaksh wrote:
| As someone who works on his own open source agent framework/UI
| (https://github.com/runvnc/mindroot), it's kind of interesting
| how announcements from vendors tend to mirror features that I am
| working on.
|
| For example, in the last month or so, I added a job queue plugin.
| The ability to run multiple tasks that they demoed today is quite
| similar. The issue I ran into with users is that without
| Enterprise plans, complex tasks run into rate limits when trying
| to run concurrently.
|
| So I am adding an ability to have multiple queues, with each
| possibly using different models and/or providers, to get around
| rate limits.
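|
| (Roughly the shape of it -- a toy sketch with hypothetical
| provider/model names, not the actual mindroot plugin API:)
|
|     import asyncio
|
|     async def call_model(provider, model, task):
|         # stand-in for the real provider API call
|         await asyncio.sleep(0.1)
|         return f"{provider}/{model} finished {task}"
|
|     async def worker(provider, model, queue, results):
|         # one worker per queue, so one provider's rate limit
|         # doesn't stall tasks routed to another provider
|         while not queue.empty():
|             task = await queue.get()
|             results.append(await call_model(provider, model, task))
|             queue.task_done()
|
|     async def main():
|         queues = {
|             ("openai", "o4-mini"): asyncio.Queue(),
|             ("anthropic", "claude-3.7"): asyncio.Queue(),
|         }
|         keys = list(queues)
|         # naive round-robin routing of tasks across the queues
|         for i in range(6):
|             queues[keys[i % len(keys)]].put_nowait(f"task-{i}")
|         results = []
|         await asyncio.gather(*(worker(p, m, q, results)
|                                for (p, m), q in queues.items()))
|         print("\n".join(results))
|
|     asyncio.run(main())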
|
| By the way, my system has features that are somewhat similar not
| only to this tool they are showing but also things like Manus. It
| is quite rough around the edges though because I am doing 100% of
| it myself.
|
| But it is MIT Licensed and it would be great if any developer on
| the planet wanted to contribute anything.
| asadm wrote:
| Is there an open source version of this? One that essentially
| uses microVMs to git clone my repo, run codex-cli or
| equivalent, and send me a PR.
|
| I made one for github action but it's not as realtime and is 2
| years old now: https://github.com/asadm/chota
| illnewsthat wrote:
| I haven't checked in on it recently, but maybe a similar open-
| source option would be https://github.com/All-Hands-
| AI/OpenHands
|
| A not open-source option this looks close to is also
| https://githubnext.com/projects/copilot-workspace (released
| April 2024, but I'm not sure it's gotten any significant
| updates since)
| asadm wrote:
| oh openDevin became openHANDS. Interestingly, I committed the
| LICENSE file to that repo haha
| tough wrote:
| did they relicense too w the rename?
| simianwords wrote:
| I wonder if tools like these are best for semi structured
| refactors like upgrade to python3, migrate to postgres etc
| btbuildem wrote:
| > To balance safety and utility, Codex was trained to identify
| and precisely refuse requests aimed at development of malicious
| software, while clearly distinguishing and supporting legitimate
| tasks.
|
| I can't say I am a big fan of neutering these paradigm-shifting
| tools according to one culture's code of ethics / way of doing
| business / etc.
|
| One man's revolutionary is another's enemy combatant and all
| that. What if we need top-notch malware to take down the robot
| dogs lobbing mortars at our madmaxian compound?!
| amarcheschi wrote:
| If I had to guess, they'll be neutered only for the general
| public, not for the three-letter agencies.
| pixl97 wrote:
| TLAs have very few of their own coders; they contract
| everything out. Now I'm sure OAI will lend an unrestricted
| model to groups that pay large private contracts they won't
| disclose.
| lumenwrites wrote:
| You gotta think about it in terms of cost vs benefit. How much
| damage will a malicious AI do, vs how much value will you get
| out of a non-neutered model?
| GolfPopper wrote:
| > _What if we need top-notch malware to take down the robot
| dogs lobbing mortars at our madmaxian compound?!_
|
| I wouldn't sweat it. According to its developers, Codex
| understands 'malicious software'; it has just been trained to
| say, "But I won't do that" when such requests are made of it.
| Judging from the recent past [1][2] getting LLMs to bypass such
| safeguards is pretty easy.
|
| 1. https://hiddenlayer.com/innovation-hub/novel-universal-bypas...
| 2. https://cyberpress.org/researchers-bypass-safeguards-in-17-p...
| rowanG077 wrote:
| Agreed, I'm a big proponent of people being in control of
| the tools they use. I don't think the approach where there is a
| wise dictator enforcing that I can't use my flathead screwdriver
| to screw down a Phillips head screw is good. I think it's
| actively undermining people.
| scudsworth wrote:
| pleased to see a paragraph-long comment in the examples. now
| that's good coding.
| 2OEH8eoCRo0 wrote:
| More generated slop for a real human to sift through. Can I get
| an AI summary of that comment?
| kleiba wrote:
| Just curious: is your company happy sharing their code-base with
| an AI provider? Or are you using a local installation?
| asadm wrote:
| why not? OpenAI won't be stupid enough to look at my code and
| be that vulnerable legally. It ain't worth it.
| KaiserPro wrote:
| They literally scraped half of youtube, made a library to
| extract the audio and released it as whisper.
|
| Of _course_ they are training on your shit.
| bhl wrote:
| Cursor has enterprise mode which forces a data privacy feature.
| pixl97 wrote:
| Companies commonly share their code with SAAS providers.
| Typically they'll have a contract to prevent usage otherwise.
| nmca wrote:
| It is a cost benefit trade off, as with all things. Benefits
| look pretty good.
| layer8 wrote:
| The cost of sharing your code is unknown, though.
| philomath_mn wrote:
| Under what circumstances would that cost be high? Is OpenAI
| going to rip off your app? Why would they waste a second on
| that when there are better models to be built?
| odie5533 wrote:
| For 99% of companies, their code is worthless to anyone but
| them.
| manquer wrote:
| For copying the product / service, yes, it is not worth much.
|
| However, for people trying to compromise your systems, access
| to your code can be a valuable asset. The worth of that could
| be well beyond just the enterprise value of the organization;
| it could cost people's lives or bring down critical
| infrastructure.
|
| It's not just code you created and have complete control over.
| Organizations have vendors providing code (drivers,
| libraries...) under narrow licenses that prohibit sharing or
| leaking in any way, so this type of leak can open you up to a
| lot of liability.
| tough wrote:
| so i just upgraded to the pro plan, but https://chatgpt.com/codex
| doesn't work for me and asks me to -try chatgpt pro- and shows me
| the upsell modal, even though I'm already on the higher tier
|
| sigh
| modeless wrote:
| You mean Pro? It's only in the $200 Pro tier.
| tough wrote:
| Yes sorry meant pro,
|
| I just enabled on Settings > Connectors > Github
|
| hoping that makes it work
|
| ... still doesn't work, is it geo-restricted maybe? idk
| rapfaria wrote:
| They said Plus soon, not today.
| jiocrag wrote:
| same here. Paying for Pro ($200) but the "try it" link just
| leads to the Pro sign up page, where it says I'm already on
| Pro. Hyper intelligent coding agents, but can't make their
| website work.
| tough wrote:
| > Hyper intelligent coding agents, but can't make their
| website work.
|
| I know right
|
| also no human to contact on support... tempted to cancel the
| sub lol i'll give them 24h
| fear91 wrote:
| Same here, paying for Pro but I just get redirected to vanilla
| version...
| piskov wrote:
| > will be rolling
|
| [?] available now to all pro users
| tough wrote:
| ok but I took the bait and now am waiting.
|
| Every -big- release they gatekeep something to pro; I pay for
| it like every 3 months, then cancel after the high
|
| when will i learn
| gizmodo59 wrote:
| It says "Rolling out to users on the ChatGPT Pro Plan today" So
| it ll happen throughout the day
| alvis wrote:
| I used to work for a bank, and the legal team used to ping us to
| make tiny changes to the app for compliance-related issues. Now
| they can fix these themselves. I think they'd be very proud and
| happy.
| ajkjk wrote:
| Hopefully nobody lets legal touch anything without the ability
| to run the code to test it, plus code reviews. So probably not.
| singularity2001 wrote:
| That will be an interesting new bug tracker: anyone in the
| company will be able to report any bug or add any feature
| request. If the model is able to solve it automatically,
| perfect; otherwise some human might take over. The interesting
| question then will be which code changes are legal and within
| the standards of what the company wants. So non-technical
| code/issue reviewer will become a super important and
| ubiquitous job.
| SketchySeaBeast wrote:
| Not just legal/within the standards, but which actually meet
| the unspoken requirements of the request. "We just need a new
| checkbox that asks if you're left handed" might seem easy,
| but then it has ramifications for the Application PDF that
| gets generated, as well as any systems downstream, and maybe
| it requires a data conversion of some sort somewhere. I know
| that the PO's I work with miss stuff or assume that the
| request will just have features by default.
| asdev wrote:
| I promise you the legal team is not pushing any code changes
| skovati wrote:
| I'm curious how many ICs are truly excited about these
| advancements in coding agents. It seems to me the general trend
| is we become more like PMs managing agents and reviewing PRs, all
| for the sake of productivity gains.
|
| I imagine many engineers are like myself in that they got into
| programming because they liked tinkering and hacking and
| implementation details, all of which are likely to be abstracted
| over in this new era of prompting.
| awestroke wrote:
| At the end of the day, it's your job to deliver value. If a
| tool allows you to deliver more, faster, without sacrificing
| quality, it's your responsibility to use that tool. You'll just
| have to make sure you can fully take responsibility for the end
| deliverables. And these tools are not only useful for writing
| the final code
| enjoylife wrote:
| > these tools are not only useful for writing the final code
|
| This sparked a thought about how a large part of the job is
| often the work needed to demonstrate impact. I think this
| aspect is often overlooked by some of the good engineers not
| yet taking advantage of the AI tooling. LLM loops may not yet
| be good enough to produce shippable code by themselves, but
| they sure are capable of helping reduce the overhead of these
| up-and-out communicative tasks.
| tough wrote:
| you mean like hacking a first POC with AI to sell a
| product/feature internally to get buy-in from the rest of
| the team before actually shipping production version of it?
| whyowhy3484939 wrote:
| It's actually not. My job description does not say "deliver
| value" and nobody talks about my work like that so I'm not
| quite sure what to make of that.
|
| > without sacrificing quality
|
| Right..
|
| > it's your responsibility to use that tool
|
| Again, it's actually not. It's my responsibility to do my
| job, not to make my boss' - or his boss' - car nicer. I know
| that's what we all know will create "job security" but let's
| not conflate these things. My job is to do my end of the
| bargain. My boss' job is paying me for doing that. If he
| deems it necessary to force me to use AI bullshit, I will of
| course, but it is definitely not my responsibility to do so
| autonomously.
| blibble wrote:
| > At the end of the day, it's your job to deliver value. If a
| tool allows you to deliver more faster, without sacrificing
| quality
|
| I guess that's LLMs ruled out then
| kridsdale3 wrote:
| I do feel that way, so I'll still do bespoke creation when I
| want to. But this is like a sewing machine. My job is to design
| fashion, and a whole line of it. I can do that when a machine
| is making the stitches instead of my using a needle in hand.
| manojlds wrote:
| We (dare I say we instead of I) like talking to computers and
| AI is another computer you talk with. So I am still all
| excited. It's people that I want to avoid :)
| qntmfred wrote:
| people can still write code by hand for fun
|
| people who want to make software that enables people to
| accomplish [task] will get the software they need quicker.
| davedx wrote:
| I think the death of our craft is around the corner. It doesn't
| fill me with joy.
| evantbyrne wrote:
| Software engineering requires a fair amount of intelligence,
| so if these tools ever get to replacement levels of quality
| then it's not just developers that will be out of jobs. ARC-
| AGI-2, the countless anecdotes from professionals I've seen
| across the industry, and personal experience all very clearly
| point to a significant gap between the tools that exist today
| and general intelligence. I would recommend keeping an eye on
| improvements just because of the sheer capital investments
| going into it, but I won't be losing any sleep waiting for
| the rapture.
| ramoz wrote:
| I see it differently. Like a kid with legos.
|
| We had to tinker piece by piece to build a miniature castle.
| Over many hours.
|
| Now I can tinker concept by concept, and build much larger
| castles, much faster. Like waving a wand, seeing my thoughts
| come to fruition in near real time.
|
| No vanity lost in my opinion. Possibly more to be gained.
| CapcomGo wrote:
| I think the bigger issue with this is that the number of
| developer jobs will shrink.
| nluken wrote:
| I think there's a disconnect between what you and the person
| you're replying to are defining as "tinkering". Your
| conception of it seems more focused on the end product when,
| to use your analogy, the original comment seems unconcerned
| with the size of castles.
|
| If you derive enjoyment from actually assembling the castle,
| you lose out on that by using the wand that makes it happen
| instantly. Sure wand's castles may be larger, but you don't
| put a Lego castle together for the finished product.
| lherron wrote:
| Factorio blueprints in action.
| whyowhy3484939 wrote:
| > build much larger castles, much faster
|
| See, that never was the purpose... going bigger and faster,
| towards what exactly? Chaos? By the way, we never managed to
| fully tackle manual software development by trained
| professionals and we now expect Shangri-La by throwing
| everything and the kitchen sink into giant inscrutable
| matrices. This time by amateurs as well. I'm sure this will
| all turn out very well and very, very productive.
| chilmers wrote:
| While I share your reservations, how many millions of people
| have experienced the exact same disruption to their jobs and
| industries because of software that we, software engineers,
| have created? It's a bit too late, and a touch hypocritical,
| for us to start complaining about technology now it is
| disrupting our way of working in a way we don't like.
| orange_puff wrote:
| I used to think this way too. Here are a few ways I've tried to
| reframe things that have helped.
|
| 1. When I work on side projects and use AI, sometimes I wonder
| "what's the point if I am just copy / pasting code? I am not
| learning anything" but what I have come to realize is building
| apps with AI assistance is the skill that I am learning, rather
| than writing code per se as it was a few years ago.
|
| 2. I work in high scale distributed computing, so I am still
| presented with ample opportunities to get very low level, which
| I love. I am not sure how much I care about writing code per se
| anymore. Working with AI still is tinkering, it has not changed
| that much for me. It is quite different, but the underlying fun
| parts are still present.
| simianwords wrote:
| Does anyone know how the quality drops with the size of the
| codebase?
| yanis_t wrote:
| So it looks like it only runs in the cloud; that is, it will
| push commits to my remote repo before I have a chance to see
| if it works?
|
| When I'm using aider, after it makes a commit, what I do is
| immediately run git reset HEAD^ and then git diff (actually I use
| the GitHub Desktop client to see the diff) to evaluate what
| exactly it did and whether I like it or not. Then I usually make
| some adjustments and only after that commit and push.
| flakiness wrote:
| You can think of this as a managed (cloud) version of their
| codex command line tool, which runs locally on your laptop.
|
| The secret sauce here seems to be their new model, but I expect
| it to come to the API at some point.
| codemac wrote:
| watch the live stream, it shows you the diff as the completed
| task, you decide whether or not to generate a github pr when
| you see the diff.
| danielbln wrote:
| You may want to pass --no-auto-commits to Aider if you peel
| them off HEAD afterwards anyway.
| adamTensor wrote:
| not buying windsurf then???
| motoxpro wrote:
| This would be the why of that acquisition as this needs a more
| integrated UI. Judging by the speed at which this came out,
| this was in the works long before that acquisition.
| adamTensor wrote:
| it is not even clear *if* they are going to buy Windsurf at
| all. And that's a big if. This might've just been the 'why'
| that deal is not happening.
| shmoogy wrote:
| This probably came out to beat Google I/O or something
| similar - odd Friday release otherwise.
| ianbutler wrote:
| I'm super curious to see how this actually does at finding
| significant bugs. We've been working in the space on
| https://www.bismuth.sh for a while, and one of the things we're
| focused on is deep validation of the code being outputted.
|
| There are so many of these "vibe coding" tools, and there has to
| be real engineering rigor at some point. I saw them demo "find
| the bug", but the bugs they found were pretty superficial, and
| that's something we've seen in our internal benchmark from both
| Devin and Cursor: a lot of noise and false positives or
| superficial fixes.
| orliesaurus wrote:
| Why hasn't GitHub released this? Why is it OpenAI releasing
| this?!
| adpirz wrote:
| It's on their roadmap: https://github.blog/news-
| insights/product-news/github-copilo...
|
| But they aren't moving nearly as fast as OpenAI. And it remains
| to be seen if first mover will mean anything.
| taytus wrote:
| Github moves too slow, and OpenAI moves too fast.
| danielbln wrote:
| GitHub has released this, it's called Copilot Agent.
| johnjwang wrote:
| Some engineers on my team at Assembled and I have been a part of
| the alpha test of Codex, and I'll say it's been quite impressive.
|
| We've long used local agents like Cursor and Claude Code, so we
| didn't expect too much. But Codex shines in a few areas:
|
| Parallel task execution: You can batch dozens of small edits
| (refactors, tests, boilerplate) and run them concurrently without
| context juggling. It's super nice to run a bunch of tasks at the
| same time (something that's really hard to do in Cursor, Cline,
| etc.)
|
| It kind of feels like a junior engineer on steroids, you just
| need to point it at a file or function, specify the change, and
| it scaffolds out most of a PR. You still need to do a lot of work
| to get it production ready, but it's as if you have an infinite
| number of junior engineers at your disposal now all working on
| different things.
|
| Model quality is good, but hard to say it's that much better than
| other models. In side-by-side tests with Cursor + Gemini 2.5-pro,
| naming, style and logic are relatively indistinguishable, so
| quality meets our bar but doesn't yet exceed it.
| fourside wrote:
| > You still need to do a lot of work to get it production
| ready, but it's as if you have an infinite number of junior
| engineers at your disposal now all working on different things.
|
| One issue with junior devs is that because they're not fully
| autonomous, you have to spend a non trivial amount of time
| guiding them and reviewing their code. Even if I had easy
| access to a lot of them, pretty quickly that overhead would
| become the bottleneck.
|
| Did you think that managing a lot of these virtual devs could
| get overwhelming or are they pretty autonomous?
| fabrice_d wrote:
| They wrote "You still need to do a lot of work to get it
| production ready". So I would say it's not much better than
| real colleagues. Especially since junior devs will improve to
| a point they don't need your hand holding (remember you also
| were a junior at some point), which is not proven will happen
| with AI tools.
| bmcahren wrote:
| Counter-point A: AI coding assistance tools are rapidly
| advancing at a clip that is inarguably faster than humans.
|
| Counter-point B: AI does not get tired, does not need
| space, does not need catering to their experience. AI is
| fine being interrupted and redirected. AI is fine spending
| two days on something that gets overwritten and thrown away
| (no morale loss).
| HappMacDonald wrote:
| Counter-counter-point A: If I work with a human Junior
| and they make an error or I familiarize them with any
| quirk of our workflow, and I correct them, they will
| recall that correction moving forward. An AI assistant
| either will not remember 5 minutes later (in a different
| prompt on a related project) and repeat the mistake, or
| I'll have to take the extra time to code some reminder
| into the system prompt for every project moving forward.
|
| Advancements in general AI knowledge over time will not
| correlate to improvements in remembering any matters as
| colloquial as this.
|
| Counter-counter-point B: AI _absolutely_ needs catering
| to their experience. Prompter must always learn how to
| phrase things so that the AI will understand them, adjust
| things when they get stuck in loops by removing confusing
| elements from the prompt, etc.
| SketchySeaBeast wrote:
| I find myself thinking about juniors vs AI as babies vs
| cats. A cat is more capable sooner, you can trust it when
| you leave the house for two hours, but it'll never grow
| past shitting in a box and needing to be fed.
| rfoo wrote:
| You don't need to be nice to your virtual junior devs. Saves
| quite a lot of time too.
|
| As long as I spend less time reviewing and guiding than doing
| it myself, it's a win for me. I don't have any fun doing these
| things and I'd rather yell at a bunch of "agents". For those
| who enjoy doing a bunch of small edits, I guess it's the
| opposite.
| HappMacDonald wrote:
| I'm definitely wary of the concept of dismissing courtesy
| when working with AI agents, because I certainly don't want
| to lose that habit when I turn around and have to interact
| with humans again.
| strangescript wrote:
| It feels like OpenAI is at a ceiling with their models; codex-1
| seems to be another RLHF derivative of the same base model.
| You can see this in their own self-reported o3-high comparison,
| where at 8 tries they converge at the same accuracy.
|
| It also seems very telling they have not mentioned o4-high
| benchmarks at all. o4-mini exists, so logically there is an o4
| full model, right?
| aorobin wrote:
| Seems likely that they are waiting to release o4 full results
| until the gpt-5 release later this year, presumably because
| gpt-5 is bundled with a roughly o4 level reasoning
| capability, and they want gpt-5 to feel like a significant
| release.
| losvedir wrote:
| Do you still think there will be a gpt-5? I thought the
| consensus was GPT-5 never really panned out and was
| released with little fanfare as 4.1.
| NewEntryHN wrote:
| The advantage of Cursor is the reduced feedback loop where you
| watch it live and can intervene at any moment to steer it in
| the right direction. Is Codex such a superior model that it
| makes sense to take the direction of a mostly background agent,
| on which you seemingly have a longer feedback loop?
| woah wrote:
| > Parallel task execution: You can batch dozens of small edits
| (refactors, tests, boilerplate) and run them concurrently
| without context juggling. It's super nice to run a bunch of
| tasks at the same time (something that's really hard to do in
| Cursor, Cline, etc.)
|
| > It kind of feels like a junior engineer on steroids, you just
| need to point it at a file or function, specify the change, and
| it scaffolds out most of a PR. You still need to do a lot of
| work to get it production ready, but it's as if you have an
| infinite number of junior engineers at your disposal now all
| working on different things.
|
| What's the benefit of this? It sounds like it's just a gimmick
| for the "AI will replace programmers" headlines. In reality,
| LLMs complete their tasks within seconds, and the time
| consuming part is specifying the tasks and then reviewing and
| correcting them. What is the point of parallelizing the fastest
| part of the process?
| ctoth wrote:
| > Each task is processed independently in a separate,
| isolated environment preloaded with your codebase. Codex can
| read and edit files, as well as run commands including test
| harnesses, linters, and type checkers. Task completion
| typically takes between 1 and 30 minutes, depending on
| complexity, and you can monitor Codex's progress in real
| time.
| johnjwang wrote:
| In my experience, it still does take quite a bit of time
| (minutes) to run a task on these agentic LLMs (especially
| with the latest reasoning models), and in Cursor / Cline /
| other code editor versions of AI, it's enough time for you to
| get distracted, lose context, and start working on another
| task.
|
| So the benefit is really that during this "down" time, you
| can do multiple useful things in parallel. Previously, our
| engineers were waiting on the Cursor agent to finish, but the
| parallelization means you're explicitly turning your brain
| off of one task and moving on to a different task.
| woah wrote:
| In my experience in Cursor with Claude 3.5 and Gemini 2.5,
| if an agent has run for more than a minute it has usually
| lost the plot. Maybe the model used in Codex is a new breed?
| odie5533 wrote:
| It depends what level you ask them to work on, but I
| agree, all of my agent coding is active and completed in
| usually <15 seconds.
| kfajdsl wrote:
| A single response can take a few seconds, but tasks with
| agentic flows can be dozens of back and forths. I've had a
| fairly complicated Roo Code task take 10 minutes (multiple
| subtasks).
| Jimmc414 wrote:
| > We've long used local agents like Cursor and Claude Code, so
| we didn't expect too much.
|
| If you don't mind, what were the strengths and limitations of
| Claude Code compared to Codex? You mentioned parallel task
| execution being a standout feature for Codex - was this a
| particular pain point with Claude Code? Any other insights on
| how Claude Code performed for your team would be valuable. We
| are pleased with Claude Code at the moment and were a bit
| underwhelmed by comparable Codex CLI tool OAI released earlier
| this month.
| t_a_mm_acq wrote:
| Since realizing CC can operate on the same code base and file
| tree in different terminal instances, it's been a significant
| unlock for us. Most devs have 3 running concurrently: 1.
| master task list + checks for completion on tasks. 2.
| operating on the current task + documentation. 3. side quests,
| bugs, additional context.
|
| Rinse and repeat: once a task is done, update #1 and cycle
| again. Add another CC window if you need more tasks running
| concurrently.
|
| The downside is cost, but if that's not an issue, it's great
| for getting stuff done across distributed teams.
| naiv wrote:
| do you then have instances 2 and 3 listening to instance 1
| with just a prompt? or how does this work?
| naiv wrote:
| to answer my own question, it is actually laid out in
| chapter 6 of
| https://www.anthropic.com/engineering/claude-code-best-
| pract...
| criddell wrote:
| If you aren't hiring junior engineers to do these kinds of
| things, where do you think the senior engineers you need in the
| future will come from?
|
| My kid recently graduated from a very good school with a degree
| in computer science and what she's told me about the job market
| is scary. It seems that, relatively speaking, there's a lot of
| postings for senior engineers and very little for new grads.
|
| My employer has hired recently and the flood of resumes after
| posting for a relatively low level position was nuts. There was
| just no hope of giving each candidate a fair chance and that
| really sucks.
|
| My kid's classmates who did find work did it mostly through
| personal connections.
| echelon wrote:
| The never ending march of progress.
|
| It's probably over for these folks.
|
| There will likely(?, hopefully?) be new adjacent gradients
| for people to climb.
|
| In any case, I would worry more about your own job prospects.
| It's coming for everyone.
| voidspark wrote:
| It's his daughter. He is worried about his daughter first
| and foremost. Weird reply.
| echelon wrote:
| I'm sorry. I was skimming. I had no idea he mentioned his
| kid.
|
| I was running a quick errand between engineering meetings
| and saw the first few lines about hiring juniors, and I
| wrote a couple of comments about how I feel about all of
| this.
|
| I'm not always guilty of skimming, but today I was.
| hintymad wrote:
| > If you aren't hiring junior engineers to do these kinds of
| things, where do you think the senior engineers you need in
| the future will come from?
|
| Unfortunately this is not how companies think. I read
| somewhere more than 20 years ago about outsourcing and
| manufacturing offshoring. The author basically asked the
| same: if we move out the so-called low-end jobs, where do we
| think we will get the senior engineers? Yet companies
| continued offshoring, and the West lost talent and know-how,
| while watching our competitor you-know-who become the world
| leader in more and more industries.
| echelon wrote:
| It's happening to Hollywood right now. In the past three
| years, since roughly 2022, the majority of IATSE folks
| (film crew, grips, etc.) have seen their jobs disappear to
| Eastern Europe where the labor costs one tenth of what it
| does here. And there are no rules for maximum number of
| consecutive hours worked.
| lurking_swe wrote:
| ahh, the classic "i shall please my investors next quarter
| while ignoring reality, so i can disappoint my shareholders
| in 10 years". lol.
|
| As you say, happens all the time. Also doesn't make sense
| because so few people are buying individual stocks anyway.
| Goal should be to consistently outperform over the long
| term. Wall street tends to be very myopic.
|
| Thinking long term is a hard concept for the bean counters
| at these tech companies i guess...
| miohtama wrote:
| What then ends up happening is that companies that fall
| behind in R&D eventually lose market share and get
| replaced by more agile competitors.
|
| But this does not happen in industry verticals that are
| protected by regulation (banks) or national interest
| (Boeing).
| kypro wrote:
| > If you aren't hiring junior engineers to do these kinds of
| things, where do you think the senior engineers you need in
| the future will come from?
|
| They'll probably just need to learn for longer and if
| companies ever get so desperate for senior engineers then
| just take the most able/experienced junior/mid level dev.
|
| But I'd argue before they do that if companies can't find
| skilled labour domestically they should consider bringing
| skilled workers from abroad. There are literally hundreds of
| millions of Indians who got connected to the internet over
| the last decade. There's no reason a company should struggle
| to find senior engineers.
| oytis wrote:
| So basically all education facilities should go abroad too
| if no one needs Western fresh grads. Will provide a lot of
| shareholder value, but there are some externalities too.
| rboyd wrote:
| India coming online just in time for AI is awkward
| slater wrote:
| > If you aren't hiring junior engineers to do these kinds of
| things, where do you think the senior engineers you need in
| the future will come from?
|
| Money number must _always_ go up. Hiring people costs money.
| "Oh hey I just read this article, sez you can have A.I. code
| your stuff, for pennies?"
| ilaksh wrote:
| I don't think jobs are necessarily a good plan at all
| anymore. Figure out how to leverage AIs and robots as cheap
| labor, and sell services or products. But if someone is
| trying to get a job, I get the impression that networking
| helps more than anything.
| sandspar wrote:
| Yeah, the value of the typical job application meta is
| trending to zero very quickly. Entrepreneurship has a steep
| learning curve; you should start learning it as soon as
| possible. Don't waste your time learning to run a straight
| line - we're entering off-road territory.
| DGAP wrote:
| There aren't going to be senior engineers in the future.
| _bin_ wrote:
| This is a bit of a game theory problem. "Training senior
| engineers" is an expensive and thankless task: you bear
| essentially all the cost, and most of the total benefit
| accrues to others as a positive externality. Griping at
| companies that they should undertake to provide this positive
| externality isn't really a constructive solution.
|
| I think some people are betting on the fact that AI can
| replace junior devs in 2-5 years and seniors in 10-20, when
| the old ones are largely gone. But that's sort of beside the
| point as far as most corporate decision-making is concerned.
| nopinsight wrote:
| With Agentic RL training and sufficient data, AI operating
| at the level of average _senior engineers_ should become
| plausible in a couple to a few years.
|
| Top-tier engineers who integrate a deep understanding of
| business and user needs into technical design will likely
| be safe until we get full-fledged AGI.
| yahoozoo wrote:
| Why in a few years? What training data is missing that we
| can't have senior level agents today?
| al_borland wrote:
| That sounds like a dangerous bet.
| SketchySeaBeast wrote:
| Sounds like a bet a later CEO will need to check.
| _bin_ wrote:
| As I see it, it's actually the only safe bet.
|
| Case 1: you keep training engineers.
|
| Case 1.1: AGI soon, you don't need juniors or seniors
| besides a very few. You cost yourself a ton of money that
| competitors can reinvest into R&D, use to undercut your
| prices, or return to keep their investors happy.
|
| Case 1.2: No AGI. Wages rise, a lot. You must remain in
| line with that to avoid losing those engineers you
| trained.
|
| Case 2: You quit training juniors and let AI do the work.
|
| Case 2.1: AGI soon, you have saved yourself a bundle of
| cash and remain mostly in line with the market.
|
| Case 2.2: no AGI, you are in the same bidding war for
| talent as everyone else, the same place you'd have been
| were you to have spent all that cash to train engineers.
| You now have a juicier balance sheet with which to enter
| this bidding war.
|
| The only way out of this, you can probably see, is some
| sort of external co-ordination, as is the case with most
| of these situations. The high-EV move is to quit training
| juniors, by a mile, independently of whether AI can
| replace senior devs in a decade.
| spongebobstoes wrote:
| An interesting thing to consider is that Codex might get
| people to be better at delegating, which might improve
| the effectiveness of hiring junior engineers, because the
| senior engineers will have better delegation skills,
| leading to more effective collaboration.
| al_borland wrote:
| You're looking at it from the point of view of an
| individual company. I'm seeing it as a risk for the
| entire industry.
|
| Senior engineers are already very well paid. Wages rising
| a lot from where they already are, while companies
| compete for a few people, and those who can't afford it
| need to lean on AI or wait 10+ years for someone to
| develop with equivalent expertise... all of this sounds
| bad for the industry. It's only good for the few senior
| engineers that are about to retire, and the few who went
| out of their way to not use AI and acquire actual skills.
| dorian-graph wrote:
| This hyper-fixation on replacing engineers in writing code
| is hilarious, and dangerous, to me. Many people, even in
| tech companies, have no idea how software is built,
| maintained, and run.
|
| I think instead we should focus on getting rid of managers
| and product owners.
| jchanimal wrote:
| The real judge will be survivorship bias and as a betting
| man, I might think product owners are the ones with the
| entrepreneurial spirit to make it to the other side.
| MoonGhost wrote:
| I've worked for a company which turned from startup to
| this. Product owners had no clue what they owned, and no
| capacity to suggest anything useful. They were hired off
| the street at best; more likely they got in through
| relatives. In a couple of years the company probably
| tripled its manager headcount. It didn't help.
| QuadmasterXLII wrote:
| it's obviously intensely correlated: in the vast majority of
| scenarios, either both are replaced or neither is
| odie5533 wrote:
| As a dev, if you try taking away my product owners I will
| fight you. Who am I going to ask for requirements and
| sign-offs, the CEO?
| oytis wrote:
| Your architect, principal engineer etc. (one spot-on job
| title I've seen is "product architect"), who in turn
| talks to the senior management. Basically an engineer
| with a talent and experience for building products rather
| than a manager with superficial understanding of
| engineering. I think the most ambitious teams have
| someone like this on top - or at least around
| deadmutex wrote:
| Perhaps the role will merge into one, and will replace a
| good chunk of those jobs.
|
| E.g.:
|
| If we have 10 PMs and 90 devs today, that could
| hypothetically be replaced by 8 PM+Devs, 20 specialized
| devs, and 2 specialized PMs in the future.
| hooverd wrote:
| I think it'll be great if you're working in software, not
| for a software company.
| sam0x17 wrote:
| Hiring of juniors is basically dead these days and it has
| been like this for about 10 years and I hate it. I remember
| when I was a junior in 2014 there were actually startups who
| would hire cohorts of juniors (like 10 at a time, fresh out
| of CS degree sort of folks with almost no applied coding
| experience) and then train them up to senior for a few years,
| and then a small number will stay and the rest will go
| elsewhere and the company will hire their next batch of
| juniors. Now no one does this, everyone wants a senior no
| matter how simple the task. This has caused everyone in the
| industry to stuff their resume, so you end up in a situation
| where companies are looking for 10 years of experience in
| ecosystems that are only 5 years old.
|
| That said, back in the early 00s there was much more of a
| culture of everyone is expected to be self-taught and doing
| real web dev probably before they even get to college, so by
| the time they graduate they are in reality quite senior. This
| was true for me and a lot of my friends, but I feel like
| these days there are many CS grads who haven't done a lot of
| applied stuff. But at the same time, to be fair, this was a
| way easier task in the early 00s: if you knew
| JS/HTML/CSS/SQL, C++, and maybe a .NET language, that was
| pretty much it, you could do everything (there were virtually
| no frameworks). Now there are thousands of frameworks,
| languages, and ecosystems, and you could spend 5+ years
| learning any one of them. It is no longer possible for one
| person to learn all of tech, people are much more specialized
| these days.
|
| But I agree that eventually someone is going to have to start
| hiring juniors again or there will be no seniors.
| dgb23 wrote:
| I recently read an article about the US having relatively
| weak occupational training.
|
| To contrast, CH and GER are known to have very robust and
| regulated apprenticeship programs. Meaning you start
| working at a much earlier age (16) and go to vocational
| school at the same time for about 4 years. This path is
| then supported with all kinds of educational stepping
| stones later down the line.
|
| There are many software developers who went that route in
| CH for example, starting with an application development
| apprenticeship, then getting to technical college in their
| mid 20's and so on.
|
| I think this model has a lot of advantages. University is
| for kids who like school and the academic approach to
| learning. Apprenticeships plus further education or an
| autodidactic path then casts a much broader net, where you
| learn practical skills much earlier.
|
| There are several advantages and disadvantages of both
| paths. In summary I think the academic path provides deeper
| CS knowledge which can be a force multiplier. The
| apprenticeship path leads to earlier high productivity and
| pragmatism.
|
| My opinion is that having both as strongly supported paths
| creates more opportunities for people and strengthens the
| economy as a whole.
| oytis wrote:
| I know about this system, but I am not convinced it can
| work in such a dynamic field as software. When tools
| change all the time, you need strong fundamentals to stay
| afloat - which is what universities provide.
|
| Vocational training focusing on immediate fit for the
| market is great for companies that want to extract
| maximal immediate value from labour for minimal cost, but
| longer term is not good for engineers themselves.
| thomasahle wrote:
| > But at the same time, to be fair, this was a way easier
| task in the early 00s
|
| The best junior I've hired was a big contributor to an open
| source library we were starting to use.
|
| I think there's still lots of opportunity for honing your
| skill, and showing it off, outside of schools.
| oytis wrote:
| I guess the industry leaders think we'll not need senior
| engineers either as capabilities evolve.
|
| But also, I think this underestimates significantly what
| junior engineers do. Junior engineers are people who have
| spent 4 to 6 years receiving a specialised education in a
| university - and they normally need to be already good at
| school math. All they lack is experience applying this
| education on the job - but they are professionals: educated,
| proactive and mostly smart.
|
| The market is tough indeed, and as tough as it is for a
| senior engineer like myself, I don't envy the current cohort
| of fresh grads. It being tough is only tangentially related
| to AI though. The main factor is the general economic
| slowdown, with AI contributing by diverting already scarce
| investment from non-AI companies and producing a lot of
| uncertainty about how many and what kinds of employees
| companies will need in the future. AI's current capabilities
| are nowhere near having a real economic impact.
|
| Wish your kid and you a lot of patience, grit and luck.
| voidspark wrote:
| This is exactly the problem. The top level executives are
| setting up to retire with billions in the bank, while the
| workers develop their own replacements before they retire
| with millions in the bank. Senior developers will be mostly
| obsolete too.
|
| I have mentored junior developers and found it to be a
| rewarding part of the job. My colleagues mostly ignore
| juniors, provide no real guidance, couldn't care less. I see
| this attitude from others in the comments here, relieved they
| don't have to face that human interaction anymore. There are
| too many antisocial weirdos in this industry.
|
| Without a strong moral and cultural foundation the AGI
| paradigm will be a dystopia. Humans obsolete across all
| industries.
| criddell wrote:
| > I have mentored junior developers and found it to be a
| rewarding part of the job.
|
| That's really awesome. I hope my daughter finds a job
| somewhere that values professional development. I'd hate
| for her to quit the industry before she sees just how
| interesting and rewarding it can be.
|
| I didn't have many mentors when starting out, but the ones
| I had were so unbelievably helpful both professionally and
| personally. If I didn't have their advice and
| encouragement, I don't think I'd still be doing what I'm
| doing.
| aprdm wrote:
| She can try to reach out to possible mentors / people on
| Linkedin. A bit like cold calling. It works, people
| (usually) want to help and don't mind sharing their
| experiences / tips. I know I have answered many random
| LinkedIn cold messages from recent grads / people in uni.
| oytis wrote:
| > I have mentored junior developers and found it to be a
| rewarding part of the job.
|
| Can totally relate. Unfortunately the trend for all-senior
| teams and companies has started long before ChatGPT, so the
| opportunities have been quite scarce, at least in a
| professional environment.
| layer8 wrote:
| I share your worries, but the time horizon for the supply of
| senior engineers drying up is just too long for companies to
| care at this time, in particular if productivity keeps
| increasing. And it's completely unclear what the state of the
| art will be in 20 years; the problem might mostly solve
| itself.
| johnjwang wrote:
| To be clear, we still hire engineers who are early in their
| careers (and we've found them to be some of the best folks on
| our team).
|
| All the same principles apply as before: smart, driven, high
| ownership engineers make a huge difference to a company's
| success, and I find that the trend is even stronger now than
| before because of all the tools that these early career
| engineers have access to. Many of the folks we've hired have
| been able to spin up on our codebase much faster than in the
| past.
|
| We're mainly helping them develop taste for what good code /
| good practices look like.
| criddell wrote:
| > we still hire engineers who are early in their careers
|
| That's really great to hear.
|
| Your experience that a new engineer equipped with modern
| tools is more effective and productive than in the past is
| important to highlight. It makes total sense.
| startupsfail wrote:
| More recent models are not without drive and are not stupid
| either.
|
| There's still quite a bit of a gap in terms of trust.
| dgb23 wrote:
| AI might play a role here. But there's also a lot of economic
| uncertainty.
|
| It wasn't long ago that the correction in the tech job market
| started, after it got blown up during and after covid. The
| geopolitical situation is very unstable.
|
| I also think there is way more FUD around AI, including
| coding assistants, than necessary - typically coming either
| from people who want to sell it or want to get in on the
| hype.
|
| Things are shifting and moving, which creates uncertainty.
| But it also opens new doors. Maybe it's a time for risk
| takers, the curious, the daring. Small businesses and new
| kinds of services might rise from this, like web development
| came out of the internet revolution. To me, it seems like
| things are opening up and not closing down.
|
| Besides that, I bet there are more people today who write,
| read or otherwise deal directly with assembly code than ever
| before, even though we have had higher-level languages for
| many decades.
|
| As for the job market specifically: SWE and CS (adjacent)
| jobs are still among the fastest growing, coming up in all
| kinds of lists.
| ikiris wrote:
| Much like everything in the economy currently, externalities
| are to be shouldered by "others" and if there is no "other"
| in aggregate, well, it's not our problem. Yet.
| polskibus wrote:
| I think the bigger problem, which started around 2022, is the
| much lower volume of jobs in software development. Projects
| were shut down, funding was retracted, and even the big wave
| of migrations to the cloud died down.
|
| Today startups mostly wrap LLMs as this is what VCs expect.
| Larger companies have smaller IT budgets than before
| (adjusted for inflation). This is the real problem that
| causes the jobs shortage.
| geekraver wrote:
| Same, mine is about to graduate with a CS masters from a
| great school. Couldn't get any internships, and is now
| incredibly negative about ever being able to find work, which
| doesn't help. We're pretty much looking at minimum wage jobs
| doing tech support for NGOs at this point (and the current
| wave of funding cuts from the Federal government for those
| kinds of orgs is certainly not going to help with that).
| MoonGhost wrote:
| With so many graduates looking for a job, why don't they
| band together and build something? If not for money, then
| just to show off their skills, something to put on a resume.
|
| It's not going to get any easier in the next few years, I
| think - not until the point when a fresh grad using AI can
| make something valuable. After that there will be a period
| when anybody can just ask AI to do something and it will
| find the software in its library or write it from scratch.
| In the long term, maybe 10 years, humanity probably will not
| need this many developers. There will be a split like in the
| games industry: tools/libs developers and product
| devs/artists/designers, with the majority in the second
| category.
| atonse wrote:
| I feel for your daughter. I can totally see how tools like
| this will destroy the junior job market.
|
| But I also wonder (I'm thinking out loud here, so pardon the
| raw unfiltered thoughts), if being a junior today is
| unrecognizable.
|
| Like, for example, that whatever a "junior" is now will
| have to get better at thinking at a higher level, rather than
| the minutiae we dealt with as juniors (like design patterns
| and all that stuff).
|
| So maybe the levels of abstraction change?
| FilosofumRex wrote:
| > If you aren't hiring junior engineers..., where do you
| think the senior engineers you need in the future will come
| from?
|
| This problem might be new to CS, but has happened to other
| engineers, notably to MechE in the 90's, ChemE in 80's,
| Aerospace in 70's, etc... due to rapid pace of automation and
| product commoditization.
|
| The senior jobs will disappear too, or be offshored to a
| developing country: Exxon (India 152 - 78 US)
| https://jobs.exxonmobil.com/ Chevron (India 159 - 4 US)
| https://careers.chevron.com/search-jobs
| MoonGhost wrote:
| > The senior jobs will disappear too
|
| The golden age of software development will be over soon?
| Probably, for humans. How fitting that the most enthusiastic
| part of the field will be replaced first.
| harrison_clarke wrote:
| i think there's an opportunity here
|
| a lot of junior eng tasks don't really help you become a
| senior engineer. someone needs to make a form and a backend
| API for it to talk to, because it's a business need. but
| doing 50 of those doesn't really impart a lot of wisdom
|
| same with writing tests. you'll probably get faster at
| writing tests, but that's about it. knowing that you need the
| tests, and what kinds of things might go wrong, is the senior
| engineer skill
|
| with the LLMs current ability to help people research a
| topic, and their growing ability to write functioning code,
| my hunch is that people with the time to spare can learn
| senior engineer skills while bypassing being a junior
| engineer
|
| convincing management of that is another story, though. if
| you can't afford to do unpaid self-directed study, it's
| probably going to be a bumpy road until industry figures out
| how to not eat the seed corn
| ozgrakkurt wrote:
| Graduating as a junior is just not enough in a more
| competitive market like the one we have now. I don't think it
| is related to anything else. If you can choose between a
| developer who has spent 10x the time coding and one who has
| merely studied and graduated, it's not much of a choice. If
| you don't have that option, then you might go with a junior.
| mhitza wrote:
| > It seems that, relatively speaking, there's a lot of
| postings for senior engineers and very little for new grads.
|
| That's been the case for most of the last 15 years in my
| experience. You have to follow local job markets, get in
| through an internship, or walk in at local companies and ask.
| Applying en masse can also help, and so does having some code
| on GitHub to show off.
| dalemhurley wrote:
| We have seen this in other industries and professions.
|
| As everything is so new and different at this stage we are in
| a state of discovery which requires more senior skills to
| work out the lay of the land.
|
| As we progress and create new procedures, processes, and
| practices - particularly guardrails - hiring new juniors
| will become the focus.
| runako wrote:
| > Parallel task execution: You can batch dozens of small edits
| (refactors, tests, boilerplate) and run them concurrently
| without context juggling.
|
| This is also part of a recent update to Zed. I typically use
| Zed with my own Claude API key.
| ai-christianson wrote:
| Is Zed managing the containerized dev environments, or
| creating multiple worktrees or anything like that? Or are
| they all sharing the same work tree?
| runako wrote:
| As far as I know, they are sharing a single work tree. So I
| suppose that could get messy by default.
|
| That said, it might be possible to tell each agent to
| create a branch and do work there? I haven't tried that.
|
| I haven't seen anything about Zed using containers, but
| again you might be able to tell each agent to use some
| container tooling you have in place since it can run
| commands if you give it permission.
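|
| If the agent can run commands, one low-tech way to keep
| parallel agents from stepping on each other is to give each
| task its own git worktree on its own branch. A rough sketch of
| what I mean (my own illustration, not anything Zed ships; the
| branch naming and paths are made up):
|
|     import subprocess
|
|     def spawn_agent_checkout(repo_root: str, task_id: str) -> str:
|         """Create an isolated branch + worktree for one task."""
|         branch = f"agent/{task_id}"         # hypothetical scheme
|         worktree = f"/tmp/agent-{task_id}"  # throwaway checkout
|         subprocess.run(
|             ["git", "-C", repo_root, "worktree", "add",
|              "-b", branch, worktree],
|             check=True,
|         )
|         return worktree  # point the agent's shell here
|
|     # e.g. spawn_agent_checkout(".", "fix-flaky-test")
|
| Each agent then edits and commits in its own directory, and
| merge conflicts only show up when you deliberately merge the
| branches back.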
| _bin_ wrote:
| I believe cursor now supports parallel tasks, no? I haven't
| done much with it personally but I have buddies who have.
|
| If you want one idiot's perspective, please hyper-focus on
| model quality. The barrier right now is not tooling, it's the
| fact that _models are not good enough for a large amount of
| work_. More importantly, they're still closer to interns than
| junior devs: you must give them a ton of guidance, constant
| feedback, and a very stern eye for them to do even pretty
| simple tasks.
|
| I'd like to see something with an o1-preview/pro level of
| quality that isn't insanely expensive, particularly since a lot
| of programming isn't about syntax (which most SotA models have
| down pat) but about _understanding_ the underlying concepts, an
| area in which they remain weak.
|
| At this point I really don't care if the tooling sucks. Just
| give me really, really good models that don't cost a kidney.
| quantumHazer wrote:
| CTO of an AI agents company (which has worked with AI labs)
| says agents work fine. Nothing new under the sun.
| hintymad wrote:
| It looks like we are in this interesting cycle: millions of
| engineers contribute to open-source on github. The best of our
| minds use the code to develop powerful models to replace
| exactly these engineers. In fact, the more code a group
| contributes to github, the easier it is for the companies to
| replace this group. Case in point, frontend engineers are
| impacted most so far.
|
| Does this mean people will be less incentivized to contribute
| to open source as time goes by?
|
| P.S., I think the current trend is a wakeup call to us software
| engineers. We thought we were doing highly creative work, but
| in reality we spend a lot of time doing the basic job of
| knowledge workers: retrieving knowledge and interpolating some
| basic and highly predictable variations. Unfortunately, the
| current AI is really good at replacing this type of work.
|
| My optimistic view is that in the long term we will invent or
| expand into more interesting work, but I'm not sure how long we
| will have to wait. The current generation of software engineers
| may suffer from high supply and low demand for our profession
| for years to come.
| Daishiman wrote:
| > P.S., I think the current trend is a wakeup call to us
| software engineers. We thought we were doing highly creative
| work, but in reality we spend a lot of time doing the basic
| job of knowledge workers: retrieving knowledge and
| interpolating some basic and highly predictable variations.
| Unfortunately, the current AI is really good at replacing
| this type of work.
|
| Most of the waking hours of most creative work have this type
| of drudgery. Professional painters and designers spend most
| of their time replicating ideas that are well fleshed-out.
| Musicians spend most of their time rehearsing existing
| compositions.
|
| There is a point to be made that these repetitive tasks are a
| prerequisite to come up with creative ideas.
| rowanG077 wrote:
| I disagree. AI has shown itself to be most capable in what we
| consider creative jobs: music creation, voice acting,
| text/story writing, art creation, video creation and more.
| roflyear wrote:
| If you mean create as in literally, sure. But not in
| being creative. AI can't solve novel problems yet. The
| person you're replying to obviously means being creative
| not literally creating something.
| crat3r wrote:
| What is the qualifier for this? Didn't one of the models
| recently create a "novel" algorithm for a math problem?
| I'm not sure this holds water anymore.
| rowanG077 wrote:
| You can't say AI is creating something new but that it
| isn't being creative without clearly explaining why you
| think that's the case. AI is creating novel solutions to
| problems humans haven't cracked in centuries. I don't see
| anything more creative than this.
| KaiserPro wrote:
| > AI have shown to most capable in what we consider
| creative jobs
|
| no it creates shit thats close enough for people who are
| in a rush and dont care.
|
| ie, you need artwork for shit on temu, boom job done.
|
| You want to make a poster for a bake sale, boom job done.
|
| Need some free music that sounds close enough to be
| swifty, but not enough to get sued, great.
|
| But as an expression of creativity, most people cant get
| it to do that.
|
| Its currently slightly more configurable clipart.
| rowanG077 wrote:
| > AI creates novel algorithms beating thousands of
| googlers.
|
| Random HNer on an AI post one day later
|
| > Its currently slightly more configurable clipart.
|
| It's so ridiculous at this point that I can just laugh
| about this.
| electrondood wrote:
| > doing the basic job of knowledge workers
|
| If you extrapolate and generalize further... what is at risk
| is any task that involves taking information input (text,
| audio, images, video, etc.), and applying it to create some
| information output or perform some action which is useful.
|
| That's basically the definition of work. It's not just
| knowledge work, it's literally any work.
| lispisok wrote:
| As much as I support community developed software and "free
| as in freedom", "Open Source" got completely perverted into
| tricking people to work for free for huge financial benefits
| for others. Your comment is just one example of that.
|
| For that reason all my silly little side projects are now in
| private repos. I don't care that the chance somebody builds a
| business around them is slim to none. Don't think putting a
| license on them will protect you either. You'd have to know
| somebody is violating your license before you can even think
| about doing anything, and that's basically impossible if it
| gets ripped into a private codebase and isn't obvious
| externally.
| hintymad wrote:
| > "Open Source" got completely perverted into tricking
| people to work for free for huge financial benefits for
| others
|
| I'm quite conflicted on this assessment. On one hand, I
| wonder if we would have a better job market if there were
| not so many open-sourced systems. We may have had much
| slower growth, but we would see our growth last for a lot
| more years, which means we might enjoy our profession until
| retirement and beyond. On the other hand, open source did
| create large markets, right? Like the "big data" market, the
| ML market, the distributed systems market, etc. Like the
| millions of data scientists who could barely use Pandas and
| scipy, or the hundreds of thousands of ML engineers who
| couldn't even be bothered to know what a positive
| semi-definite matrix is.
|
| Interesting times.
| blibble wrote:
| > Does this mean people will be less incentivized to
| contribute to open source as time goes by?
|
| personally, I completely stopped 2 years ago
|
| it's the same as the stack overflow problem: the incentive to
| contribute tends towards zero, at which point the plagiarism
| machine stops improving
| SubiculumCode wrote:
| Now do open science.
|
| More generally, specialty knowledge is valuable. From now on,
| all employees will be monitored in order to replace them.
| dakiol wrote:
| This whole "LLMs == junior engineers" is so pedantic. Don't we
| realize that the same way senior engineers think that LLMs can
| just replace junior engineers, high-level executives think that
| LLMs will soon replace senior ones?
|
| Junior engineers are not cattle. They are the future senior
| ones, they bring new insights into teams, new perspectives;
| diversity. I can't count the times I have learnt valuable
| things from so-called junior engineers (and not only
| tech-wise things).
|
| LLMs have their place, but ffs, stop with the "junior engineer
| replacement" shit.
| obsolete_wagie wrote:
| You need someone that's technical to look at the agent
| output, so senior engineers will be around. Junior engineers
| are certainly being replaced.
| dakiol wrote:
| Thanks, Sherlock. Now, tell me, when senior engineers start
| to retire, who will replace them? Ah, yeah, I can hear you
| say "LLMs!". And LLMs will rewrite themselves so we won't
| need seniors anymore writing code. And LLMs will write all
| the code companies need. So obvious, of course. We won't
| need a single senior because we won't have them, because
| they are not hired these days anymore. Perfect plan.
| alfalfasprout wrote:
| TBH the people I see parroting the LLM=junior engineer BS are
| almost always technically incompetent or so disconnected at
| this point from what's happening on the ground that they
| wouldn't know either way.
|
| I've been using the codex agent since before this
| announcement btw along with most of the latest LLMs. I
| literally work in the AI/ML tooling space. We're entering a
| dangerous world now where there's super useful technology but
| people are trying to use it to replace others instead of
| enhance them. And that's causing the wrong tools to be built.
| fullstackchris wrote:
| Are you paid to say this? Sorry for my frankness, but I don't
| understand how you can have multiple agents concurrently
| editing the same areas of code without any sort of merge
| conflicts later.
| tough wrote:
| can someone give me a test prompt to one-shot something in go for
| testing?
|
| (Im trying something)
|
| what would be an impressive program that an agent should be able
| to one-shot in one go?
| blixt wrote:
| They mentioned "microVM" in the live stream. Notably there's no
| browser or internet access. It makes sense, running specialized
| Firecracker/Unikraft/etc microkernels is way faster and cheaper
| so you can scale it up. But there will be a big jump in
| technical difficulty and scalability from this to "agents with
| their own computers". ChatGPT Operator already has a browser,
| so they definitely can do this, but I imagine the demand is
| orders of magnitude different.
|
| There must be room for a Modal/Cloudflare/etc infrastructure
| company that focuses only on providing full-fledged computer
| environments specifically for AI with forking/snapshotting
| (pause/resume), screen access, human-in-the-loop support, and so
| forth, and it would be very lucrative. We have browser-use, etc,
| but they don't (yet) capture the whole flow.
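|
| For anyone curious what driving one of these microVMs looks
| like, here is a rough sketch against Firecracker's HTTP API
| over a unix socket (field names are from its getting-started
| docs as I remember them, and the socket/kernel/rootfs paths
| are placeholders, so treat it as approximate):
|
|     # pip install requests-unixsocket
|     import requests_unixsocket
|
|     SOCK = "http+unix://%2Ftmp%2Ffirecracker.socket"
|     s = requests_unixsocket.Session()
|
|     # Kernel and root filesystem for the microVM.
|     s.put(SOCK + "/boot-source", json={
|         "kernel_image_path": "/images/vmlinux",
|         "boot_args": "console=ttyS0 reboot=k panic=1",
|     })
|     s.put(SOCK + "/drives/rootfs", json={
|         "drive_id": "rootfs",
|         "path_on_host": "/images/rootfs.ext4",
|         "is_root_device": True,
|         "is_read_only": False,
|     })
|     s.put(SOCK + "/machine-config",
|           json={"vcpu_count": 2, "mem_size_mib": 2048})
|
|     # Boot it.
|     s.put(SOCK + "/actions",
|           json={"action_type": "InstanceStart"})
|
| The hard part for an "agents with their own computers" product
| is everything around this: snapshotting, networking, and
| getting a browser or screen in there cheaply.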
| sudohalt wrote:
| When it runs the code I assume it does so via a docker container,
| does anyone know how it is configured? Assuming the user hasn't
| specified an AGENTS.md file or a Dockerfile in the repo. Does it
| generate it via LLM based on the repo, and what it thinks is
| needed? Does it use static analysis (package.json,
| requirements.txt, etc)? Do they just have a super generic
| Dockerfile that can
| handle most envs? Combination of different things?
| ilaksh wrote:
| I think they mentioned it was a similar environment to what it
| trains on, so maybe they have a default Dockerfile. Of course
| containers can also install additional packages or at least
| python packages.
| nkko wrote:
| Yes, and one test failed as it missed the pydantic dependency
| hansonw wrote:
| More about that here!
| https://platform.openai.com/docs/codex#advanced-configuratio...
| sudohalt wrote:
| Thanks!
| sudohalt wrote:
| It seems LLMs are doing a lot of the heavy lifting figuring
| out the exact test, build, lint commands to run (even if the
| AGENTS.md file gives it direction and hints). I wonder if
| there are any plans to support user defined build, test, and
| pre commit commands to avoid unnecessary cost and keep it
| deterministic. Also wonder how monolith repos (or distinct
| but related repos) are supported, does it run everything in
| one container or loop through the envs that are edited?
|
| I assume one easy next step is to just run GitHub Actions in
| the container since everything is defined there (assuming the
| user set it up)
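|
| Even if it is mostly LLM-driven today, the first pass doesn't
| need to be: a deterministic manifest scan gets you sensible
| defaults, with anything declared in AGENTS.md overriding it.
| A toy sketch of that idea (my own guess at an approach, not
| how OpenAI actually builds the environment; the command table
| is illustrative):
|
|     from pathlib import Path
|
|     # manifest file -> (setup command, test command)
|     DEFAULTS = {
|         "package.json":     ("npm ci", "npm test"),
|         "requirements.txt": ("pip install -r requirements.txt",
|                              "pytest"),
|         "go.mod":           ("go mod download", "go test ./..."),
|         "Cargo.toml":       ("cargo fetch", "cargo test"),
|     }
|
|     def guess_commands(repo: Path):
|         """Yield deterministic defaults per manifest found."""
|         for manifest, (setup, test) in DEFAULTS.items():
|             if (repo / manifest).exists():
|                 yield manifest, setup, test
|
|     for manifest, setup, test in guess_commands(Path(".")):
|         print(f"{manifest}: setup='{setup}' test='{test}'")
|
| Anything that can't be derived this way is where the LLM (or a
| user-provided script) would have to fill in the gaps.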
| bionhoward wrote:
| What about privacy, training opt out?
|
| What about using it for AI / developing models that compete with
| our new overlords?
|
| Seems like using this is just asking to get rug pulled for
| competing with em when they release something that competes with
| your thing. Am I just an old who's crowing about nothing? It's ok
| for them to tell us we own outputs we can't use to compete with
| em?
| piskov wrote:
| Watch the video: there is an explicit switch at one of the
| steps about (not) allowing training on your repo.
| lurking_swe wrote:
| That's nice. And we trust that it does what it says
| because...? The AI company (openai, anthropic, etc) pinky
| promised? Have we seen their source code? How do you know
| they don't train?
|
| Facebook has been caught in recent DOJ hearings breaking the
| law with how they run their business, just as one example.
| They claimed under oath, previously, to not be doing X, and
| then years later there was proof they did exactly that.
|
| https://youtu.be/7ZzxxLqWKOE?si=_FD2gikJkSH1V96r
|
| A company's "word" means nothing imo. None of this makes
| sense if i'm being honest. Unless you personally have a
| negotiated contract with the provider, and can somehow be
| certain they are doing what they claim, and can later sue for
| damages, all of this is just crossing your fingers and hoping
| for the best.
| tough wrote:
| On the other hand you can enable explicit sharing of your
| data and get a few million free tokens daily
| wilg wrote:
| If you don't trust the company your opt-out strategy is
| much easier, you simply do not authorize them to access
| your code.
| ofirpress wrote:
| [I'm one of the co-creators of SWE-bench] The team managed to
| improve on the already very strong o3 results on SWE-bench, but
| it's interesting that we're just seeing an improvement of a few
| percentage points. I wonder if getting to 85% from 75% on
| Verified is going to take as long as it took to get from 20% to
| 75%.
| Snuggly73 wrote:
| I can be completely off base, but it feels to me like
| benchmaxxing is going on with swe-bench.
|
| Look at the results from multi swe bench - https://multi-swe-
| bench.github.io/#/
|
| swe polybench - https://amazon-science.github.io/SWE-PolyBench/
|
| Kotlin bench - https://firebender.com/leaderboard
| mr_north_london wrote:
| How long did it take to go from 20% to 75%?
| nadis wrote:
| In the preview video, I appreciated Katy Shi's comment on "I
| think this is a reflection of where engineering work has moved
| over the past where a lot of my time now is spent reviewing code
| rather than writing it."
|
| Preview video from Open AI:
| https://www.youtube.com/watch?v=hhdpnbfH6NU&t=878s
|
| As I think about what "AI-native" or just the future of building
| software looks like, it's interesting to me that - right now -
| developers are still just reading code and tests rather than
| looking at simulations.
|
| While a new(ish) concept for software development, simulations
| could provide a wider range of outcomes and, especially for the
| front end, are far easier to evaluate than just code/tests alone.
| I'm biased because this is something I've been exploring but it
| really hit me over the head looking at the Codex launch
| materials.
| ai-christianson wrote:
| > rather than looking at simulations
|
| You mean like automated test suites?
| tough wrote:
| automated visual fuzzy-testing with some self-reinforcement
| loops
|
| There are already libraries for QA testing, and VLMs can give
| critique on a series of screenshots automated by a playwright
| script per branch
| ai-christianson wrote:
| Cool. Putting vision in the loop is a great idea.
|
| Ambitious idea, but I like it.
| tough wrote:
| SmolVLM, Gemma, LlaVa, in case you wanna play with some
| of the ones i've tried.
|
| https://huggingface.co/blog/smolvlm
|
| recently both llama.cpp and ollama got better support for
| them too, which makes this kind of integration with
| local/self-hosted models now more attainable/less
| expensive
| tough wrote:
| also this for the visual regression testing parts, but
| you can add some AI onto the mix ;)
| https://github.com/lost-pixel/lost-pixel
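|
| To make it concrete, a bare-bones version of that loop could
| look like this (assuming you have Playwright installed and a
| llava-style model pulled in ollama; the URL and prompt are
| placeholders):
|
|     # pip install playwright ollama
|     # playwright install chromium
|     import ollama
|     from playwright.sync_api import sync_playwright
|
|     URL = "http://localhost:3000"  # branch preview to inspect
|
|     with sync_playwright() as p:
|         browser = p.chromium.launch()
|         page = browser.new_page()
|         page.goto(URL)
|         page.screenshot(path="branch.png", full_page=True)
|         browser.close()
|
|     # Ask a local VLM to critique the screenshot.
|     reply = ollama.chat(
|         model="llava",
|         messages=[{
|             "role": "user",
|             "content": "Critique this page: layout bugs, "
|                        "overlapping elements, unreadable text?",
|             "images": ["branch.png"],
|         }],
|     )
|     print(reply["message"]["content"])
|
| From there you can diff critiques between branches, or gate a
| CI step on them, much like lost-pixel does for pixel diffs.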
| ericghildyal wrote:
| I used Cline to build a tiny testing helper app and this
| is exactly what it did!
|
| It made changes in TS/Next.js given just the boilerplate
| from create-next-app, ran `yarn dev` then opened its mini
| LLM browser and navigated to localhost to verify
| everything looked correct.
|
| It found 1 mistake and fixed the issue then ran `yarn
| dev` again, opened a new browser, navigated to localhost
| (pointing at the original server it brought up, not the
| new one at another port) and confirmed the change was
| correct.
|
| I was very impressed but still laughed at how it somehow
| backed its way into a flow that worked, but only because
| Next has hot-reloading.
| fosterfriends wrote:
| ++ Kind of my whole thesis with Graphite. As more code gets AI-
| generated, the weight shifts to review, testing, and
| integration. Even as someone helping build AI code reviewers,
| we'll _need_ humans stamping forever - for many reasons, but
| fundamentally for accountability. A computer can never be held
| accountable
|
| https://constelisvoss.com/pages/a-computer-can-never-be-held...
| hintymad wrote:
| > A computer can never be held accountable
|
| I think the issue is not about humans being entirely
| replaced. Instead, the issue is that if AI replaces enough
| knowledge workers while there's no new or expanded
| market to absorb the workforce, the new balance of supply and
| demand will mean that many of us will see suppressed pay or,
| worse, lose our jobs forever.
| DGAP wrote:
| If you still don't think software engineering as a high paying
| job is over, I don't know what to tell you.
| whyowhy3484939 wrote:
| It's high paying?
| asdev wrote:
| Is the point of this to actually assign tasks to an AI to
| complete end to end? Every task I do with AI requires at least
| some bit of hand-holding, sometimes reprompting, etc. So I
| don't see why I would want to run tasks in parallel; I don't
| think it would increase throughput. Curious if others have
| better experiences with this.
| RhysabOweyn wrote:
| I believe that code from one of these things will eventually
| cause a disaster affecting the capital owners. Then all of a
| sudden you will need a PE license, ABET degree, 5 years working
| experience, etc. to call yourself a software engineer. It would
| not even be historically unique. Charlatans are the reason that
| lawyers, medical doctors, and civil engineers have to go through
| lots of education, exams, and vocational training to get into
| their profession. AI will probably force software engineering as
| a profession into that category as well.
|
| On the other hand, if your job was writing code at certain
| companies whose profits were based on shoving ads in front of
| people then I would agree that no one will care if it is written
| by a machine or not. The days of those jobs making >$200k a year
| are numbered.
| alfalfasprout wrote:
| Even ads have risk. Customer service has risk. The widespread
| proliferation of this stuff is a legal minefield waiting to be
| stepped on.
| SketchySeaBeast wrote:
| Is this the same idea as when we switched to multicore machines?
| The rate of change in the capabilities of a single agent has
| slowed enough that now the only way for OpenAI to appear to be
| making decent progress is to have many?
| ionwake wrote:
| I'm sorry if I'm being silly, but I have paid for the Pro
| version ($200 a month), and every time I click on Try Codex,
| it takes me to a
| pricing page with the "Team Plan"
| https://chatgpt.com/codex#pricing.
|
| Is this still rolling out? I don't need the Team plan too, do I?
|
| I have been using openAI products for years now and I am keen to
| try but I have no idea what I am doing wrong.
| mr_north_london wrote:
| It's still rolling out
| ionwake wrote:
| Thx for the reply, Im in london too ( atm )
| jdee wrote:
| im the same, and it appeared for me 2 mins ago. looks like its
| still rolling out
| ionwake wrote:
| cool, it appeared - I was just worried it was a payment issue.
| thanks guys.
| throwaway314155 wrote:
| They do this with every major release. Never going to
| understand why.
| hintymad wrote:
| I remember HN had a recurring popular post on the most
| important data structures. They are all the basic ones that a
| first-year college student can learn. The youngest one was the
| skip list, which was invented in 1990. When I was a student, my
| class literally read the original paper and implemented the data
| structure and analyzed the complexity in our first data structure
| course.
|
| This seems to imply that software engineering as a profession
| has been quite mature and saturated for a while, to the point
| that a model can predict most of the output. Yes, yes, I know
| there are thousands of advanced algorithms and amazing systems in
| production. It's just that the market does not need millions of
| engineers for such advanced skills.
|
| Unless we get yet another new domain like the cloud or the
| internet, I'm afraid the core value of software engineers -
| trailblazing new business scenarios - will continue to diminish
| and be marginalized by AI. As a result, we get far less demand
| for our profession, and many of us will either take lower pay
| or lose our jobs for extended periods.
| theappsecguy wrote:
| I am so damn tired of all the AI garbage shoved down our throats
| every day. Can't wait for all of it to crash and burn.
| fullstackchris wrote:
| Reading these threads, it's clear to me people are so cooked
| they no longer understand (or perhaps never did) the simple
| process by which source code is shared, built, and merged
| together by multiple editors.
| swisniewski wrote:
| Has anyone else been able to get "secrets" to work?
|
| They seem to be injected fine in the "environment setup" but
| don't seem to be injected when running tasks against the
| environment. This consistently repros even if I delete and re-
| create the environment and archive and resubmit the task.
___________________________________________________________________
(page generated 2025-05-16 23:00 UTC)