[HN Gopher] Launch HN: Cyberdesk (YC S25) - Automate Windows leg...
___________________________________________________________________
Launch HN: Cyberdesk (YC S25) - Automate Windows legacy desktop
apps
Hi HN, we're Mahmoud and Alan, building Cyberdesk
(https://www.cyberdesk.io/), a deterministic computer use agent for
automating Windows desktop applications. Developers use us to
automate repetitive tasks in legacy software in healthcare,
accounting, construction, and more, by executing clicks and
keystrokes directly on the desktop.

Here are a few demos of Cyberdesk's computer use agent:

- A fast file import automation into a legacy desktop app:
  https://youtu.be/H_lRzrCCN0E
- Working on a monster of a Windows monolith called OpenDental
  (also showcases the agent's learning process):
  https://youtu.be/nXiJDebOJD0
- Filing a W-2 tax form: https://youtu.be/6VNEzHdc8mc

Many industries are stuck with legacy Windows desktop applications,
with staff plagued by repetitive tasks that are incredibly time
consuming. Vendors offering automations for these end up writing
brittle Robotic Process Automation (RPA) scripts or hiring off-shore
teams for manual task execution. RPA often breaks due to inevitable
UI changes or unexpected popups like a Windows update or a random
in-app notification. Off-shore teams are often unreliable and
costlier than software, and they're not always an option for
regulated industries.

I previously built RPA scripts impacting 20K+ employees at a Fortune
100 company, where I experienced firsthand RPA's brittleness and
inflexibility. It was obvious to me that this was a band-aid
solution to an unsolved problem. Alan was building a computer use
agent for his previous startup and realized its huge potential to
automate a ton of manual computer tasks across many industries, so
we started working on Cyberdesk.

Computer use models can struggle with abstract, long-horizon tasks,
but they excel at making context-aware decisions on a
screen-by-screen basis, so they're a good fit for automating these
desktop apps. The key to reliability is crafting prompts that are
highly specific and well thought out. Much like with ChatGPT, vague
or ambiguous prompts won't get you the results you want. This is
especially true in computer use because the model is processing
nearly an entire desktop screen's worth of extra visual information;
without precise instructions, it doesn't know which details to focus
on or how to act.

Unlike RPA, Cyberdesk's agents don't blindly replay clicks. They
read the screen state before every action and self-correct when
flows drift (pop-ups, latency, UI changes). Unlike off-the-shelf
computer use AIs, Cyberdesk runs deterministically in production:
the agent primarily follows the steps it has learned and only falls
back to reasoning when anomalies occur. Cyberdesk learns workflows
from natural-language instructions, capturing nuance and handling
dynamic tasks - far beyond what a simple screen recording of a few
runs can encode. This approach is good for both reliability and
cost: reliability, because we fall back to a computer use model in
unexpected situations; and cost, because computer use models are
expensive and we only use them when we need to. Otherwise we
leverage faster, more affordable visual LLMs to check the screen
state step by step during deterministic runs. Our agents are also
equipped with tools like failsafes, data extraction, and screen
evaluation to handle dynamic and sensitive situations.

How it works: you install our open source driver on any Windows
machine (https://github.com/cyberdesk-hq/cyberdriver). It
communicates with our backend to receive commands (click, type,
scroll, screenshot) and sends back data (screenshots, API responses,
etc.). You give our computer use agent a detailed natural language
description of the process for a given task, just like an SOP for an
employee learning a new task for the first time. The agent then
leverages computer use AI models to learn the steps and memorizes
them by saving each screenshot alongside its action (click on these
coordinates, type XYZ, wait for the page to load, etc.). The agent
then runs through these memorized steps deterministically, which
keeps runs fast and predictable. To account for popups and UI
changes, the agent checks the live screen state against the
memorized state to determine whether it's safe to proceed with the
memorized step. If no major changes prevent safe execution, it
proceeds; otherwise, it falls back to a computer use model with
context on past actions and the remaining task.

Customers are currently using us for manual tasks like importing and
exporting files from legacy desktop applications, booking
appointments for patients in a desktop PMS, and data entry such as
filling out patient profiles in an EMR.

We don't have a self-serve option yet, but we'd love to onboard you
manually. Book a demo here to learn more!
(https://www.cyberdesk.io/) If you'd rather wait for the self-serve
option a little later down the line, please submit your email here
(https://forms.gle/HfQLxMXKcv9Eh8Gs8) so you can be notified as soon
as it's ready. You can also check out our docs:
https://docs.cyberdesk.io/.

We'd absolutely love to hear your thoughts on our approach and on
desktop automation for legacy industries!
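In rough pseudocode, the deterministic-replay-with-fallback loop
described above might look like this. This is a minimal sketch only:
`driver`, `screens_match`, and `fallback_agent` are hypothetical
stand-ins, not Cyberdesk's actual API.

```python
# Sketch of deterministic replay with a computer-use fallback.
# All interfaces here are hypothetical: `driver` executes input and
# takes screenshots, `screens_match` is a cheap visual-LLM check,
# and `fallback_agent` is a full computer-use model.

def run_workflow(steps, driver, screens_match, fallback_agent):
    """Replay memorized (expected_screen, action) steps; fall back
    to a computer-use model when the live screen drifts."""
    for i, (expected_screen, action) in enumerate(steps):
        live = driver.screenshot()
        if screens_match(live, expected_screen):
            # Happy path: the screen looks as memorized, so replay
            # the recorded action (click, type, scroll, ...).
            driver.execute(action)
        else:
            # Anomaly (popup, UI change, latency): hand control to
            # the computer-use model with context on past actions
            # and what remains of the task.
            fallback_agent.resume(done=steps[:i],
                                  remaining=steps[i:],
                                  screen=live)
            return "fallback"
    return "completed"
```

The design point is that the cheap per-step screen check keeps the
happy path fast and inexpensive; the expensive computer use model
only runs on the anomaly branch.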
Author : mahmoud-almadi
Score : 49 points
Date : 2025-08-14 15:24 UTC (7 hours ago)
| throw03172019 wrote:
| Looks great. For the EMR use cases, do you sign BAAs? Which CUA
| models are being used? No data retention?
| mahmoud-almadi wrote:
| We sign BAAs with all our healthcare customers + all our
| vendors. Currently using Claude computer-use. Zero-data
| retention signed with both Anthropic and OpenAI, so none of the
| information getting sent to their LLMs ever get retained
| hermitcrab wrote:
| >none of the information getting sent to their LLMs ever get
| retained
|
| Is it possible to verify that?
| sgtwompwomp wrote:
| Yup! We have signed certificates that explicitly state
| this, with all LLM providers we use.
| herval wrote:
| I'm guessing OP is asking if it's possible to verify
| they're honoring the contract and deleting the data?
| feisty0630 wrote:
| That's not "verification" by any definition of the word.
| downrightmike wrote:
| Is it a 3rd party that is verifying?
| mahmoud-almadi wrote:
| We haven't looked into this kind of approach yet, but
| definitely worthwhile to do at some point!
| bozhark wrote:
| So you're taking the largest copyright infringers at their
| word for it?
| mahmoud-almadi wrote:
| Right now we are taking the policies we signed with our LLM
| vendors as verification of a zero data retention policy. We
| also got their SOC 2 Type II reports, and they showed no
| significant security vulnerabilities that would impact our
| usage of their services. We're doing our best to deliver
| value while taking as many security precautions as
| possible: our own data retention policy, encrypting data at
| rest and in transit, row-level security, SOC 2 Type I and
| HIPAA compliance (in observation for Type II), and secret
| managers. We have other measures we plan to take, like
| de-identifying screenshots before sending them up. Would
| love to get your thoughts on any other security measures
| you would recommend!
| mahmoud-almadi wrote:
| Good point. In a way, we can verify to a customer that we
| have that policy set up by showing them the certificate.
| But you are correct that we haven't gone as far as asking
| Anthropic or OpenAI for proof that they aren't retaining
| any of our data. What we did do is get their SOC 2 Type II
| reports, which showed no significant security
| vulnerabilities that would impact our usage of their
| services. So we have been operating under the assumption
| that they are honoring our signed agreement, within the
| context of the SOC 2 Type II reports we retrieved, and our
| customers have been okay with that. But we are definitely
| open to pursuing that kind of proof at some point.
| DaiPlusPlus wrote:
| Honestly, I'm surprised your lawyers let you post that
| here.
|
| +1 for honesty and transparency
| sethhochberg wrote:
| Typically with this sort of thing the way it really works
| is that you, the startup, use a service provider (like
| OpenAI) who publish their own external audit reports
| (like a SOC 2 Type 2) and then the SOC 2 auditors will
| see that the service provider company has a policy
| related to how it handles customer data for customers
| covered by Agreement XYZ, and require evidence to prove
| that the service provider company is following its
| policies related to not using that data for undeclared
| purposes or whatever else.
|
| Audit rights are all about who has the most power in a
| given situation. Just like very few customers are big
| enough to go to AWS and say "let us audit you", you're
| not going to get that right with a vendor like Anthropic
| or OpenAI unless you're certifiably huge, and even then
| it will come with lots of caveats. Instead, you trust the
| audit results they publish and implicitly are trusting
| the auditors they hire.
|
| Whether that is a sufficient level of trust is really up to
| the customer buying the service. There's a reason many
| companies sell on-prem hosted solutions or even support
| airgapped deployments, because no level of external trust
| is quite enough. But for many other companies and
| industries, some level of trust in a reputable auditor is
| acceptable.
| mahmoud-almadi wrote:
| Thanks for the breakdown, Seth! We did indeed get their
| SOC 2 Type II reports and made sure they showed no
| significant security vulnerabilities that would impact our
| usage of their services.
| bozhark wrote:
| Nope.
| rkagerer wrote:
| Personally I think this approach is flawed because it runs in the
| cloud. If it were an agent I could run locally I'd be much more
| interested.
| mahmoud-almadi wrote:
| Are you referring to the LLM being used, or to where the
| actions (click, type, etc.) are executed? The actions can
| be executed on any Windows machine, so execution can take
| place locally on your device. The LLMs we're using right
| now are cloud LLMs; we haven't built an LLM self-hosting
| option yet. Can I ask what reservations you have about
| running in the cloud? We have zero-data retention signed
| with our LLM vendors, so none of the data sent to them is
| ever retained.
| iptq wrote:
| If this can't run full-local, isn't that basically a botnet?
| You're talking about installing a kernel-level driver that
| receives instructions on what to do from a cloud service.
| mahmoud-almadi wrote:
| Great point! Yes, you're correct: the actual "agent" lives
| in the cloud, and its actions are executed by a proxy
| running on the desktop. Hopefully at some point we can set
| up a straightforward installation procedure to have the AI
| models running entirely on the desktop, but that's
| constrained by desktop specs for now. VMs and desktops with
| the specs to handle that would be prohibitively expensive
| for a lot of teams trying to build these automations.
| rm_-rf_slash wrote:
| Out of curiosity, what would the minimum specs need to be
| in order to run this locally?
|
| My PC is just good enough to run a DeepSeek distill. Is
| that on par with the requirements for your model?
| sgtwompwomp wrote:
| There isn't a viable computer use model that can be run
| locally yet, unfortunately. I'm extremely excited for the
| day that happens, though. Essentially, the key capability
| that makes a model a computer use model is precise
| coordinate generation.
|
| So if you come across a local model that can do that well,
| let us know! We're also keeping a close watch.
| rkagerer wrote:
| What would it take to train your own?
| ciaranmca wrote:
| Haven't looked into them much but I thought the Chinese
| labs had released some for this kind of thing
| rkagerer wrote:
| I'm talking about the LLM (and any other infrastructure
| involved). Reasons are:
|
| - Pricing. If I grow to do this at scale, I don't want to be
| paying per-action, per-month, per-token, etc.
|
| - Privacy. I don't want my data, screenshots, whatever being
| sent to you or the cloud AI providers.
|
| - Control. I don't want to be vulnerable to you or other
| third parties going bankrupt, arbitrarily deciding to kill
| the product or its dependencies, or restructuring
| plans/pricing/etc. I also want to be able to keep my
| day-to-day operations running even if there's a major cloud
| outage (that's one reason we're still using this "old
| fashioned", non-cloud software in the first place).
|
| I think I'm simply not your target market.
|
| I advise several companies who could be (they run "legacy"
| software with vast teams of human operators whose daily tasks
| include some portion of work that would be a good candidate
| for increased automation), but most of them are in a space
| where one or more of the above factors would be potential
| deal breakers.
|
| The retention agreements between you and your vendors are
| great (I mean that sincerely), but I'm not party to them so
| they don't do anything for me. If you offered a contractual
| agreement with some teeth in it (eg. underwritten or bond-
| backed to the tune of several digits, committing to specific
| security-related measures that are audited, with a tacit
| acknowledgement any proven breach of contract in and of
| itself constitutes damages) it could go a long way to address
| the privacy issues.
|
| In terms of pricing it feels like the core of your product is
| an outside vendor's computer-operating AI model, and you've
| written a prompt wrapper and plumbing around it that ferries
| screenshots and directives back and forth. This could be
| totally awesome for a small scale customer that wants to dip
| their toes into AI automation and try it out as a turnkey
| solution. But the moat doesn't seem very big, and I'd need to
| be convinced it's a really slick solution in order to favour
| that route instead of rolling my own wrapper.
|
| Please don't take this the wrong way, it's just one datapoint
| of feedback and I do wish you luck with your venture.
| MortyWaves wrote:
| Frankly quite insulting to call any Windows app legacy
| mahmoud-almadi wrote:
| Sorry it came off that way! Could you elaborate on that
| thought?
| boombapoom wrote:
| windows itself is legacy.
| mattfrommars wrote:
| Looks great for automating workloads in Windows desktop
| applications. I'd love to understand more deeply how your
| application works. The set of commands your backend sends
| is click, scroll, screenshot; does it also send a command
| to, say, type characters into an input field? How is it
| able to pinpoint a text field from a screenshot? Is an LLM
| reliable enough to pinpoint the x and y to click on a
| field?
|
| Also, to run this at large scale: does it become
| prohibitively expensive to run on a daily basis across
| thousands of custom workflows? I assume this runs in the
| cloud.
| sgtwompwomp wrote:
| Thanks! And yes, our pathfinder agents utilize Sonnet 4's
| precise coordinate generation capabilities. You give it a
| screenshot and a task, and it can output the exact
| coordinates of where to click on an input field, for
| example.
|
| And yes, we've found the computer use models to be quite
| reliable.
|
| Great questions on scale: the whole way we designed our
| engine is that on the happy path we actually make very few
| LLM calls. The agent runs deterministically, only checking
| at critical spots whether anomalies occurred (if one does,
| we fall back to computer use to take it home). If not, our
| system can complete an entire task end to end for on the
| order of less than $0.0001.
|
| So it's a hybrid system at the end of the day. This results
| in really low costs at scale, as well as speed and
| reliability improvements (since on the happy path, we run
| exactly what has worked before).
| deepdarkforest wrote:
| Congrats! I think the space is very interesting; I was a
| founder of a similar Windows CUA infra / RPA agents startup
| but pivoted. My thoughts:
|
| 1) The funny thing about determinism is deciding how
| deterministic you should be about when to break out; it's
| kind of a recursive problem. Agents are inherently very
| tough to guardrail in an action space as big as CUA's. The
| guys from Browser Use realized this as well and built
| workflow-use. Or you could try RL or finetuning per task,
| but that is not viable (economically or tech-wise)
| currently.
|
| 2) As you know, it's a very client-facing/customized
| solution space. You might find this interesting; it
| reflects my thoughts on the space as well. Tough to scale
| as a fresh startup unless you really niche down on some
| specific workflows.
| https://x.com/erikdunteman/status/1923140514549043413 (he
| is also building in the deterministic agent space now,
| funnily enough)
|
| 3) It actually is annoyingly expensive with Claude if you
| break caching, which you have to at some point if you feed
| in every screenshot, etc. You mentioned you use multiple
| models (I guess UI-TARS/OmniParser?), but in the comments
| you said Claude?
|
| 4) Ultimately the big bet in the RPA space, as again you
| know, is that the TAM won't shrink a lot due to more and
| more SAPs, ERPs, etc. implementing APIs. Of course the big
| money will always be in ancient apps that won't, but then
| again in that space UiPath and the others have a chokehold
| (and their agentic tech was actually surprisingly good when
| I had a look 3 months ago).
|
| Good luck in any case! I feel like it's one of those spaces
| where we are definitely still a touch too early, but it's
| such a big market that there is plenty of space for a lot
| of people.
| mwcampbell wrote:
| Have you looked at using accessibility APIs, such as UI
| Automation on Windows, to augment screenshots and simulated mouse
| clicks?
| throw03172019 wrote:
| Isn't this an optional feature for developers? They can
| disable it / remove the names of the buttons, etc., to make
| RPA harder?
| gerdesj wrote:
| AutoIt must be a good 20 years old:
| https://www.autoitscript.com/site/
| MetaWhirledPeas wrote:
| Can it do assertions? This could be useful for testing old
| software.
___________________________________________________________________
(page generated 2025-08-14 23:00 UTC)