[HN Gopher] Tabby: Self-hosted AI coding assistant
       ___________________________________________________________________
        
       Tabby: Self-hosted AI coding assistant
        
       Author : saikatsg
       Score  : 340 points
       Date   : 2025-01-12 18:43 UTC (1 day ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | thecal wrote:
       | Unfortunate name. Can you connect Tabby to the OpenAI-compatible
       | TabbyAPI? https://github.com/theroyallab/tabbyAPI
        
         | mbernstein wrote:
         | At least per Github, the TabbyML project is older than the
         | TabbyAPI project.
        
           | mynameisvlad wrote:
            | Also, _wildly_ more popular, to the tune of several orders
            | of magnitude more forks and stars. If anything, this question
            | should be asked of the TabbyAPI project.
        
             | karolist wrote:
              | I'm not sure what's going on with TabbyAPI's github
              | metrics, but exl2 quants are very popular among the nvidia
              | local LLM crowd and TabbyAPI comes up in tons of reddit
              | posts of people using it. Might be just my bubble, not
              | saying they're not accurate, just generally surprised such
              | a useful project has under 1k stars. On the flip side, LLMs
              | will hallucinate about TabbyML if you ask them TabbyAPI-
              | related questions, so I'd agree the naming is unfortunate.
        
         | Medox wrote:
          | I thought that Tabby, the ssh client [1], got AI capabilities...
         | 
         | [1] https://github.com/Eugeny/tabby
        
       | wsxiaoys wrote:
       | Never imagined our project would make it to the HN front page on
       | Sunday!
       | 
       | Tabby has undergone significant development since its launch two
       | years ago [0]. It is now a comprehensive AI developer platform
       | featuring code completion and a codebase chat, with a team [1] /
       | enterprise focus (SSO, Access Control, User Authentication).
       | 
       | Tabby's adopters [2][3] have discovered that Tabby is the only
       | platform providing a fully self-service onboarding experience as
       | an on-prem offering. It also delivers performance that rivals
       | other options in the market. If you're curious, I encourage you
       | to give it a try!
       | 
       | [0]: https://www.tabbyml.com
       | 
       | [1]: https://demo.tabbyml.com/search/how-to-add-an-embedding-
       | api-...
       | 
       | [2]: https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ
       | 
       | [3]: https://www.linkedin.com/posts/kelvinmu_last-week-i-
       | introduc...
        
         | maille wrote:
         | Do you have a plugin for MSVC?
        
           | wsxiaoys wrote:
            | Not yet - consider subscribing to
           | https://github.com/TabbyML/tabby/issues/322 for future
           | updates!
        
           | somberi wrote:
           | https://github.com/codespin-ai/codespin-vscode-extension
        
         | tootie wrote:
         | Is it only compatible with Nvidia and Apple? Will this work
         | with an AMD GPU?
        
           | wsxiaoys wrote:
           | Yes - AMD GPU is supported through vulkan backend:
           | 
           | https://github.com/TabbyML/tabby/releases/tag/v0.23.0
           | 
           | https://tabby.tabbyml.com/blog/2024/05/01/vulkan-support/
        
       | thih9 wrote:
       | As someone unfamiliar with local AIs and eager to try, how does
       | the "run tabby in 1 minute"[1] compare to e.g. chatgpt's free
       | 4o-mini? Can I run that docker command on a medium specced
       | macbook pro and have an AI that is comparably fast and capable?
       | Or are we not there (yet)?
       | 
       | Edit: looks like there is a separate page with instructions for
       | macbooks[2] that has more context.
       | 
       | > The compute power of M1/M2 is limited and is likely to be
       | sufficient only for individual usage. If you require a shared
       | instance for a team, we recommend considering Docker hosting with
       | CUDA or ROCm.
       | 
       | [1]: https://github.com/TabbyML/tabby#run-tabby-in-1-minute
        | 
        |     docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data \
        |       tabbyml/tabby serve --model StarCoder-1B --device cuda \
        |       --chat-model Qwen2-1.5B-Instruct
       | 
       | [2]: https://tabby.tabbyml.com/docs/quick-
       | start/installation/appl...
        
         | eric-burel wrote:
          | Side question: open source models tend to be less "smart" than
          | proprietary ones. Do you intend to compensate by providing
          | better context (e.g. querying relevant technology docs to feed
          | the context)?
        
         | coder543 wrote:
         | gpt-4o-mini might not be the best point of reference for what
         | good LLMs can do with code:
         | https://aider.chat/docs/leaderboards/#aider-polyglot-benchma...
         | 
         | A teeny tiny model such as a 1.5B model is really dumb, and
         | _not good_ at interactively generating code in a conversational
         | way, but models in the 3B or less size can do a good job of
         | suggesting tab completions.
         | 
         | There are larger "open" models (in the 32B - 70B range) that
         | you can run locally that should be much, much better than
         | gpt-4o-mini at just about everything, including writing code.
         | For a few examples, llama3.3-70b-instruct and
         | qwen2.5-coder-32b-instruct are pretty good. If you're really
         | pressed for RAM, qwen2.5-coder-7b-instruct or codegemma-7b-it
         | might be okay for some simple things.
         | 
         | > medium specced macbook pro
         | 
         | medium specced doesn't mean much. How much RAM do you have?
         | Each "B" (billion) of parameters is going to require _about_
         | 1GB of RAM, as a rule of thumb. (500MB for really heavily
         | quantized models, 2GB for un-quantized models... but, 8-bit
          | quants use 1GB, and that's usually fine.)
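          | 
          | As a rough sketch of that rule of thumb in Python (the numbers
          | are approximations and ignore context/KV-cache overhead):
          | 
          |     # bits/weight: ~16 fp16, 8 for 8-bit quants, ~4-5 heavy quants
          |     def model_gb(params_b, bits_per_weight=8):
          |         return params_b * bits_per_weight / 8  # rough GB of RAM
          | 
          |     for name, b in [("qwen2.5-coder-7b", 7),
          |                     ("qwen2.5-coder-32b", 32),
          |                     ("llama3.3-70b", 70)]:
          |         print(f"{name}: ~{model_gb(b):.0f} GB at 8-bit, "
          |               f"~{model_gb(b, 16):.0f} GB at fp16")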
        
           | eurekin wrote:
            | Also, context size significantly impacts RAM/VRAM usage, and
            | in programming those chats get big quickly.
        
           | Ringz wrote:
           | Thanks for your explanation! Very helpful!
        
       | mjrpes wrote:
       | What is the recommended hardware? GPU required? Could this run OK
       | on an older Ryzen APU (Zen 3 with Vega 7 graphics)?
        
         | wsxiaoys wrote:
         | Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ to see a
         | local setup with 3090.
        
           | mkl wrote:
           | That thread doesn't seem to mention hardware. It would be
           | really helpful to just put hardware requirements in the
           | GitHub README.
        
         | coder543 wrote:
         | The usual bottleneck for self-hosted LLMs is memory bandwidth.
         | It doesn't really matter if there are integrated graphics or
         | not... the models will run at the same (very slow) speed on
         | CPU-only. Macs are only decent for LLMs because Apple has given
         | Apple Silicon unusually high memory bandwidth, but they're
         | still nowhere near as fast as a high-end GPU with _extremely_
         | fast VRAM.
         | 
         | For extremely tiny models like you would use for tab
         | completion, even an old AMD CPU is probably going to do okay.
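          | 
          | A back-of-envelope sketch of why bandwidth is the limit for
          | single-stream generation (every weight is read roughly once per
          | token; the bandwidth figures below are illustrative guesses):
          | 
          |     def tokens_per_sec(bandwidth_gb_s, model_size_gb):
          |         # each token streams roughly the whole model from memory
          |         return bandwidth_gb_s / model_size_gb
          | 
          |     for hw, bw in [("dual-channel DDR4", 50),
          |                    ("Apple Silicon (high end)", 400),
          |                    ("high-end GPU VRAM", 1000)]:
          |         print(f"{hw}: ~{tokens_per_sec(bw, 8):.0f} tok/s "
          |               f"on an 8 GB model")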
        
           | mjrpes wrote:
           | Good to know. It also looks like you can host TabbyML as an
           | on-premise server with docker and serve requests over a
           | private network. Interesting to think that a self-hosted GPU
           | server might become a thing.
        
       | jslakro wrote:
       | Duplicated https://news.ycombinator.com/item?id=35470915
        
         | mkl wrote:
         | Not a dupe, as that was nearly two years ago.
         | https://news.ycombinator.com/newsfaq.html#reposts
        
           | jslakro wrote:
           | In that case I'm going to start reposting all good old links.
        
       | leke wrote:
       | So does this run on your personal machine, or can you install it
       | on a local company server and have everyone in the company
       | connect to it?
        
         | wsxiaoys wrote:
         | Tabby is engineered for team usage, intended to be deployed on
         | a shared server. However, with robust local computing
         | resources, you can also run Tabby on your individual machine.
         | Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ to see a
         | local setup with 3090.
        
       | d--b wrote:
       | Didn't you mean to name it Spacey?
        
         | szszrk wrote:
          | I've heard of tabs vs. spaces flamewars, but never heard of a
          | "space completion" camp.
          | 
          | It's clearly a reference to the LLM doing tab completion.
        
           | d--b wrote:
            | WHAT? TAB COMPLETION? YOU CAN'T BE SERIOUS.
           | 
           | Just joking. But yeah, Space completion is definitely a
           | thing. Also triggering suggestions is often Ctrl+Space
        
             | szszrk wrote:
             | "Ctrl+space" should be air/space related software house
             | name :)
        
       | thedangler wrote:
        | How would I tell this to use an API framework it doesn't know?
        
         | wsxiaoys wrote:
          | Tabby comes with built-in RAG support, so you can add this API
          | framework's documentation to it.
         | 
         | Example: https://demo.tabbyml.com/search/how-to-configure-sso-
         | in-tabb...
         | 
         | Settings page: https://demo.tabbyml.com/settings/providers/doc
        
       | SOLAR_FIELDS wrote:
       | > How to utilize multiple NVIDIA GPUs?
       | 
       | | Tabby only supports the use of a single GPU. To utilize
       | multiple GPUs, you can initiate multiple Tabby instances and set
       | CUDA_VISIBLE_DEVICES (for cuda) or HIP_VISIBLE_DEVICES (for rocm)
       | accordingly.
       | 
        | So using 2 NVLinked GPUs with inference is not supported? Or is
        | that situation different because NVLink treats the two GPUs as a
        | single one?
        
         | wsxiaoys wrote:
          | > So using 2 NVLinked GPUs with inference is not supported?
         | 
         | To make better use of multiple GPUs, we suggest employing a
         | dedicated backend for serving the model. Please refer to
         | https://tabby.tabbyml.com/docs/references/models-http-api/vl...
         | for an example
        
           | SOLAR_FIELDS wrote:
            | I see. So this is like, I can have Tabby be my LLM server
            | with this limitation, or I can just turn that feature off and
            | point Tabby at my self-hosted LLM as any other OpenAI-
            | compatible endpoint?
        
             | wsxiaoys wrote:
             | Yes - however, the FIM model requires careful configuration
             | to properly set the prompt template.
        
       | KronisLV wrote:
       | For something similar I use Continue.dev with ollama, it's always
       | nice to see more tools in the space! But as usual, you need
       | pretty formidable hardware to run the actually good models, like
       | the 32B version of Qwen2.5-coder.
        
       | st3fan wrote:
       | The demo on the homepage for the completion of the findMaxElement
       | function is a good example of what is to come. Or maybe where we
       | are at now?
       | 
       | The six lines of Python suggested for that function can also be
       | replaced with a simple "return max(arr)". The suggested code
       | works but is absolute junior level.
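        | 
        | Roughly the contrast being described (the demo's exact code is
        | paraphrased here):
        | 
        |     # roughly what the demo completes (paraphrased):
        |     def find_max_element(arr):
        |         max_element = arr[0]
        |         for i in range(1, len(arr)):
        |             if arr[i] > max_element:
        |                 max_element = arr[i]
        |         return max_element
        | 
        |     # the one-liner:
        |     def find_max_element(arr):
        |         return max(arr)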
       | 
       | I am terrified of what is to come. Not just horrible code but
       | also how people who blindly "autocomplete" this code are going to
       | stall in their skill level progress.
       | 
       | You may score some story points but did you actually get any
       | better at your craft?
        
         | MaKey wrote:
         | The silver lining is that the value of your skills is going up.
        
         | shcheklein wrote:
          | On the other hand, it might become the next level of
          | abstraction.
          | 
          | Machine -> Asm -> C -> Python -> LLM (Human language)
          | 
          | It compiles a human prompt into some intermediate code (in this
          | case Python). Probably the initial version of CPython was not
          | perfect at all either, and engineers were also terrified. If we
          | are lucky, this new "compiler" will keep getting better and
          | more efficient. Never perfect, but people will be paying the
          | same price they already pay for not dealing directly with ASM.
        
           | MVissers wrote:
           | Yup!
           | 
            | No goal to become a programmer - but I like to build
            | programs.
            | 
            | I built a rather complex AI-ecosystem simulator with me as
            | the director and GPT-4, now Claude 3.5, as the programmer.
            | 
            | Would never have been able to do this beforehand.
        
           | saurik wrote:
           | I think there is a big difference between an abstraction
           | layer that can improve -- one where you maybe write "code" in
           | prompts and then have a compiler build through real code,
           | allowing that compiler to get better over time -- and an
           | interactive tool that locks bad decisions autocompleted today
           | into both your codebase and your brain, involving you still
           | working at the lower layer but getting low quality "help" in
            | your editor. I am totally pro-compilers and high-level
           | languages, but I think the idea of writing assembly with the
           | help of a partial compiler where you kind of write stuff and
           | then copy/paste the result into your assembly file with some
           | munging to fix issues is dumb.
           | 
           | By all means, though: if someone gets us to the point where
           | the "code" I am checking in is a bunch of English -- for
           | which I will likely need a law degree in addition to an
           | engineering background to not get evil genie with a cursed
           | paw results from it trying to figure out what I must have
           | meant from what I said :/ -- I will think that's pretty cool
           | and will actually be a new layer of abstraction in the same
           | class as compiler... and like, if at that point I don't use
           | it, it will only be because I think it is somehow dangerous
           | to humanity itself (and even then I will admit that it is
           | probably more effective)... but we aren't there yet and
           | "we're on the way there" doesn't count anywhere near as much
           | as people often want it to ;P.
        
           | sdesol wrote:
           | > Machine -> Asm -> C -> Python -> LLM (Human language)
           | 
           | Something that you neglected to mention is, with every
           | abstraction layer up to Python, everything is predictable and
           | repeatable. With LLMs, we can give the exact same
           | instructions, and not be guaranteed the same code.
        
             | 12345hn6789 wrote:
              | assuming you have full control over which compiler you're
              | using for each step ;)
              | 
              | What's to say LLMs will not have a "compiler" interface in
              | the future that will rein in their variance?
        
               | sdesol wrote:
               | > assuming you have full control over which compiler
                | you're using for each step ;)
               | 
               | With existing tools, we know if we need to do something,
               | we can. The issue with LLMs, is they are very much black
               | boxes.
               | 
               | > What's to say LLMs will not have a "compiler" interface
                | in the future that will rein in their variance?
               | 
               | Honestly, having a compiler interface for LLMs isn't a
               | bad idea...for some use cases. What I don't see us being
               | able to do is use natural language to build complex apps
               | in a deterministic manner. Solving this problem would
               | require turning LLMs into deterministic machines, which I
               | don't believe will be an easy task, given how LLMs work
               | today.
               | 
               | I'm a strong believer in that LLMs will change how we
               | develop and create software development tools. In the
               | past, you would need Google and Microsoft level of
               | funding to integrate natural language into a tool, but
               | with LLMs, we can easily have LLMs parse input and have
               | it map to deterministic functions in days.
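                | 
                | A small sketch of that idea (the classify_intent argument
                | is a hypothetical stand-in for whatever LLM call you use;
                | it's the only nondeterministic piece):
                | 
                |     def rename_symbol(args): print("rename", args)
                |     def extract_function(args): print("extract", args)
                | 
                |     TOOLS = {"rename_symbol": rename_symbol,
                |              "extract_function": extract_function}
                | 
                |     def handle(request, classify_intent):
                |         # the LLM call: the only nondeterministic step
                |         intent, args = classify_intent(request, list(TOOLS))
                |         # deterministic from here on
                |         TOOLS[intent](args)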
        
             | theptip wrote:
             | I'm not sure why that matters here. Users want code that
             | solves their business need. In general most don't care
             | about repeatability if someone else tries to solve their
             | problem.
             | 
             | The question that matters is: can businesses solve their
             | problems cheaper for the same quality, or at lower quality
             | while beating the previous Pareto-optimal cost/quality
             | frontier.
        
               | thesz wrote:
               | Recognizable repetition can be abstracted, reducing code
               | base and its (running) support cost.
               | 
               | The question that matters is: will businesses crumble due
               | to overproduction of same (or lower) quality code sooner
               | or later.
        
               | chii wrote:
               | > The question that matters is: will businesses crumble
               | due to overproduction of same (or lower) quality code
               | sooner or later.
               | 
               | but why doesn't that happen today? Cheap code can be had
               | by hiring in cheap locations (outsourced for example).
               | 
               | The reality is that customers are the ultimate arbiters,
               | and if it satisfies them, the business will not collapse.
                | And I have not seen a single customer demonstrate that
               | they care about the quality of the code base behind the
               | product they enjoy paying for.
        
               | sdesol wrote:
               | > Cheap code can be had by hiring in cheap locations
               | (outsourced for example).
               | 
               | If you outsource and like what you get, you would assume
               | the place you outsourced to can help provide continued
               | support. What assurance do you have with LLMs? A working
               | solution doesn't mean it can be easily maintained and/or
               | evolved.
               | 
                | > And I have not seen a single customer demonstrate that
               | they care about the quality of the code base behind the
               | product they enjoy paying for.
               | 
                | That is true, but they will complain if bugs cannot be
                | fixed and features cannot be added. It is true that
                | customers don't care, and they shouldn't, until it does
                | matter, of course.
               | 
               | The challenge with software development isn't necessarily
               | with the first iteration, but rather it is with continued
               | support. Where I think LLMs can really shine is in
               | providing domain experts (those who understand the
               | problem) with a better way to demonstrate their needs.
        
               | OvbiousError wrote:
               | It is happening. There is a lot of bad software out
               | there. Terrible to use, but still functional enough that
               | it keeps selling. The question is how much crap you can
               | pile on top of that already bad code before it falls
               | apart.
        
               | carschno wrote:
               | It happens today. However, companies fail for multiple
               | problems that come together. Bad software quality (from
               | whatever source) is typically not a very visible one
               | among them because when business people take over, they
                | only see (at most) that software development/maintenance
                | costs more money than it yields.
        
               | thesz wrote:
                | > And I have not seen a single customer demonstrate that
               | they care about the quality of the code base behind the
               | product they enjoy paying for.
               | 
               | The code quality translates to speed of introduction of
               | changes, fixes of defects and amount of user-facing
               | defects.
               | 
               | While customers may not express any care about code
               | quality directly they can and will express
               | (dis)satisfaction with performance and defects of the
               | product.
        
               | CamperBob2 wrote:
               | _Recognizable repetition can be abstracted_
               | 
                | ... which is the whole idea behind training, isn't it?
               | 
               |  _The question that matters is: will businesses crumble
               | due to overproduction of same (or lower) quality code
               | sooner or later._
               | 
               | The problem is really the opposite -- most programmers
               | are employed to create _very_ minor variations on work
               | done either by other programmers elsewhere, by other
               | programmers in the same organization, or by their own
               | younger selves. The resulting inefficiency is massive in
               | human terms, not just in managerial metrics. Smart people
               | are wasting their lives on pointlessly repetitive work.
               | 
               | When it comes to the art of computer programming, there
               | are more painters than there are paintings to create.
               | That's why a genuinely-new paradigm is so important, and
               | so overdue... and it's why I get so frustrated when
               | supposed "hackers" stand in the way.
        
               | thesz wrote:
                | >> Recognizable repetition can be abstracted
                | 
                | > ... which is the whole idea behind training, isn't it?
               | 
               | The comment I was answering specifically dismissed LLM's
               | inability to answer same question with same... answer as
               | unimportant. My point is that this ability is crucial to
               | software engineering - answers to similar problems should
               | be as similar as possible.
               | 
               | Also, I bet that LLM's are not trained to abstract. In my
               | experience they lately are trained to engage users in
               | pointless dialogue as long as possible.
        
               | CamperBob2 wrote:
               | No, only the spec is important. How the software
               | implements the spec is not important in the least. (To
               | the extent that's not true, fix the spec!)
               | 
               | Nor is whether the implementation is the same from one
               | build to the next.
        
             | CamperBob2 wrote:
             | _With LLMs, we can give the exact same instructions, and
             | not be guaranteed the same code._
             | 
             | That's something we'll have to give up and get over.
             | 
             | See also: understanding how the underlying code actually
             | works. You don't need to know assembly to use a high-level
             | programming language (although it certainly doesn't hurt),
             | and you won't need to know a high-level programming
             | language to write the functional specs in English that the
             | code generator model uses.
             | 
             | I say bring it on. 50+ years was long enough to keep doing
             | things the same way.
        
             | omgwtfbyobbq wrote:
             | Aren't some models deterministic with temperature set to 0?
        
             | zurn wrote:
             | > > Machine -> Asm -> C -> Python -> LLM (Human language)
             | 
             | > Something that you neglected to mention is, with every
             | abstraction layer up to Python, everything is predictable
             | and repeatable.
             | 
             | As long as you consider C and dragons flying out of your
             | nose predictable.
             | 
             | (Insert similar quip about hardware)
        
             | jsjohnst wrote:
             | > With LLMs, we can give the exact same instructions, and
             | not be guaranteed the same code.
             | 
             | Set temperature appropriately, that problem is then solved,
             | no?
        
               | sdesol wrote:
                | No, it is much more involved, and not all providers allow
                | the necessary tweaking. This means you will need to use
                | local models (with hardware caveats), which requires us
                | to ask:
               | 
               | - Are local models good enough?
               | 
               | - What are we giving up for deterministic behaviour?
               | 
                | For example, will it be much more difficult to write
                | prompts? Will the output be nonsensical? And so on.
        
             | compumetrika wrote:
             | LLMs use pseudo-random numbers. You can set the seed and
             | get exactly the same output with the same model and input.
        
               | blibble wrote:
               | you won't because floating point arithmetic isn't
               | associative
               | 
               | and the GPU scheduler isn't deterministic
        
               | threeducks wrote:
               | You can set PyTorch to deterministic mode with a small
                | performance penalty:
                | https://pytorch.org/docs/stable/notes/randomness.html#avoidi...
               | 
               | Unfortunately, this is only deterministic on the same
               | hardware, but there is no reason why one couldn't write
               | reasonably efficient LLM kernels. It just has not been a
               | priority.
               | 
               | Nevertheless, I still agree with the main point that it
               | is difficult to get LLMs to produce the same output
               | reliably. A small change in the context might trigger all
               | kinds of changes in the generated code.
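                | 
                | A minimal sketch of that setup (assumes the same hardware
                | and library versions between runs):
                | 
                |     import os, torch
                |     # needed by some CUDA ops in deterministic mode
                |     os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
                |     torch.manual_seed(0)
                |     torch.use_deterministic_algorithms(True)
                |     # with greedy / temperature-0 decoding, the same
                |     # prompt should now repeat the same tokens here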
        
             | zajio1am wrote:
              | There is no reason to assume that, say, a C compiler
              | generates the same machine code for the same source code.
              | AFAIK, a C compiler that chooses randomly between multiple
              | C-semantically equivalent sequences of instructions is a
              | valid C compiler.
        
             | SkyBelow wrote:
              | Even compiling code isn't deterministic, given that
              | different compilers and different items installed on a
              | machine can influence the final resulting code, right?
             | shouldn't have any noticeable impact, but in edge cases it
             | might, which is why you compile your code once during a
             | build step and then deploy the same compiled code to
             | different environments instead of compiling it per
             | environment.
        
           | vages wrote:
           | It may be a "level of abstraction", but not a good one,
           | because it is imprecise.
           | 
           | When you want to make changes to the code (which is what we
           | spend most of our time on), you'll have to either (1) modify
           | the prompt and accept the risk of using the new code or (2)
           | modify the original code, which you can't do unless you know
           | the lower level of abstraction.
           | 
           | Recommended reading: https://ian-cooper.writeas.com/is-ai-a-
           | silver-bullet
        
         | svachalek wrote:
         | Keep in mind that this is the stupidest the LLM will ever be
         | and we can expect major improvements every few months. On the
         | other hand junior devs will always be junior devs. At some
         | point python and C++ will be like assembly now, something
         | that's always out there but not something the vast majority of
         | developers will ever need to read or write.
        
           | sdesol wrote:
           | > we can expect major improvements every few months.
           | 
           | I'm not sure this is grounded in reality. We've already seen
           | articles related to how OpenAI is behind schedule with GPT-5.
           | I do believe things will improve over time, mainly due to
           | advancements in hardware. With better hardware, we can better
           | brute force correct answers.
           | 
           | > junior devs will always be junior devs
           | 
           | Junior developers turn into senior developers over time.
        
             | smcnally wrote:
             | > I'm not sure this is grounded in reality. We've already
             | seen articles related to how OpenAI is behind schedule with
             | GPT-5.
             | 
             | Progress by Google, meta, Microsoft, Qwen and Deepseek is
             | unhampered by OpenAI's schedule. Their latest -- including
             | Gemini 2.0, Llama 3.3, Phi 4 -- and the coding fine tunes
             | that follow are all pretty good.
        
               | sdesol wrote:
               | > unhampered by OpenAI's schedule
               | 
               | Sure, but if the advancements are to catch up to OpenAI,
               | then major improvements by other vendors are nice and
               | all, but I don't believe that was what the commenter was
               | implying. Right now the leaders in my opinion are OpenAI
               | and Anthropic and unless they are making major
               | improvements every few months, the industry as a whole is
               | not making major improvements.
        
               | smcnally wrote:
               | OpenAI and Anthropic are definitely among the leaders.
               | Playing catch-up to these leaders' mind-share and
               | technology is some of the motivation for others. Calling
               | the progress being made in the space by Google (Gemini),
               | MSFT (Phi), Meta (llama), Alibaba (Qwen) "nice and all"
               | is a position you might be pleasantly surprised to
               | reconsider if this technology interests you. And don't
               | sleep on Apple and AMZ -
               | 
               | In the space covered by Tabby, Copilot, aider, Continue
               | and others, capabilities continue to improve considerably
               | month-over-month.
               | 
               | In the segments of the industry I care most about, I
               | agree 100% with what the commenter said w/r/t expecting
               | major improvements every few months. Pay even passing
               | attention to huggingface and github and see work being
               | done by indies as well as corporate behemoths happening
               | at breakneck pace. Some work is pushing the SOTA. Some is
               | making the SOTA more widely available. Lots of it is
               | different approaches to solving similar challenges. Most
               | of it benefits consumers and creators looking use and
               | learn from all of this.
        
           | llamaLord wrote:
           | My experience observing commercial LLM's since the release of
           | GPT-4 is actually the opposite of this.
           | 
           | Sure, they've gotten much cheaper on a per-token basis, but
           | that cost reduction has come with a non-trivial
           | accuracy/reliability cost.
           | 
           | The problem is, tokens that are 10x cheaper are still useless
           | if what they say is straight up wrong.
        
             | maeil wrote:
             | > Sure, they've gotten much cheaper on a per-token basis,
             | but that cost reduction has come with a non-trivial
             | accuracy/reliability cost.
             | 
             | This only holds for OpenAI.
        
           | maeil wrote:
           | > Keep in mind that this is the stupidest the LLM will ever
           | be and we can expect major improvements every few months.
           | 
            | We have seen no noticeable improvements (at usable prices)
            | for 7 months, since the original Sonnet 3.5 came out.
           | 
           | Maybe specialized hardware for LLM inference will improve so
           | rapidly that o1 (full) will be quick and cheap enough a year
           | from now, but it seems extremely unlikely. For the end user,
            | the top models hadn't gotten cheaper for more than a year
           | until the release of Deepseek v3 a few weeks ago. Even that
           | is currently very slow at non-Deepseek providers, and who
           | knows just how subsidized the pricing and speed at Deepseek
           | itself is, given political interests.
        
             | Eliezer wrote:
             | No major AI advancements for 7 months? Guess everyone's
             | jobs are safe for another year, and after that we're all
             | dead?
        
           | n144q wrote:
           | GitHub Copilot came out in 2021.
        
           | harvodex wrote:
            | I wish this were true, as being a shitty programmer who is
            | old, I would benefit from this as much as anyone here, but I
            | think it is delusional.
           | 
           | From my experience I wouldn't even say LLMs are stupid. The
           | LLM is a carrier and the intelligence is in the training
           | data. Unfortunately, the training data is not going to get
           | smarter.
           | 
           | If any of this had anything to do with reality then we should
           | already have a programming specific model only trained on CS
           | and math textbooks that is awesome. Of course, that doesn't
           | work because the LLM is not abstracting the concepts how we
           | normally think of in order to be stupid or intelligent.
           | 
            | It's hardly shocking that next-token prediction on math and CS
           | textbooks is of limited use. You hardly have to think about
           | it to see how flawed the whole idea is.
        
         | runeblaze wrote:
         | I mean you can treat it as just a general pseudocode-ish
         | implementation of an O(n) find_max algorithm. Tons of people
         | use Python to illustrate algorithms.
         | 
         | (Not to hide your point though -- people please review your
         | LLM-generated code!)
        
         | tippytippytango wrote:
         | This is self correcting. Code of this quality won't let you
         | ship things. You are forced to understand the last 20%-30% of
         | details the LLM can't help you with to pass all your tests.
         | But, it also turns out, to understand the 20% of details the
         | LLM couldn't handle, you need to understand the 80% the LLM
         | _could_ handle.
         | 
         | I'm just not worried about this, LLMs don't ship.
        
           | tyingq wrote:
            | In the case where it writes functionally "good enough" code
           | that performs terribly, it rewards the LLM vendor...since the
           | LLM vendor is also often your IaC vendor. And now you need to
           | buy more infra.
        
             | grahamj wrote:
             | I sense a new position coming up: slop cleanup engineer
        
               | cootsnuck wrote:
               | So an engineer.
        
             | HPsquared wrote:
             | That's one hell of a synergy. Win-win-lose
        
           | grahamj wrote:
           | This needs to be shouted from the rooftops. If you _could_ do
           | it yourself then LLMs can be a great help, speeding things
           | up, offering suggestions and alternatives etc.
           | 
           | But if you're asking for something you don't know how to do
           | you might end up with junk and not even know it.
        
             | cootsnuck wrote:
             | But if that junk doesn't work (which it likely won't for
             | any worthwhile problem) then you have to get it working.
             | And to get it working you almost always have to figure out
              | how the junk code works. And that process, I've found, is
              | where the real magic happens. You learn by fixing, pruning,
             | optimizing.
             | 
             | I think there's a whole meta level of the actual dynamic
             | between human<>LLM interactions that is not being
             | sufficiently talked about. I think there's, _potentially_ ,
             | many secondary benefits that can come from using them
             | simply due to the ways you have to react to their outputs
             | (if a person decides to rise to that occasion).
        
               | powersnail wrote:
               | If the junk doesn't work right from the beginning, yes.
               | The problem is that sometimes the junk might look like it
               | works at first, and then later you find out that it
               | doesn't, and you ended up having to make urgent fixes on
               | a Friday night.
               | 
               | > And in that process I've found is where the real magic
               | happens
               | 
               | It might be good way to learn if there's someone who's
                | supervising the process, so they _know_ that the code is
                | incorrect and can tell you to figure out what's wrong and
                | how to fix it.
               | 
               | If you are shipping this stuff yourself, this sounds like
               | a way of deploying giant foot-guns into production.
               | 
                | I still think it's better to learn if you try to
               | understand the code from the beginning (in the same way
               | that a person should try to understand code they read
               | from tutorials and stackoverflow), rather than delaying
               | the learning until something doesn't work. This is like
               | trying to make yourself do reinforcement learning on the
               | outputs of an LLM, which sounds really inefficient to me.
        
           | shriek wrote:
            | Wait till they come with auto review/merge agents, or maybe
            | those already exist. _gulp_
        
         | 999900000999 wrote:
         | LLMs also love to double down on solutions that don't work.
         | 
         | Case in point, I'm working on a game that's essentially a
         | website right now. Since I'm very very bad with web design I'm
         | using an LLM.
         | 
         | It's perfect 75% of the time. The other 25% it just doesn't
          | work. Multiple LLMs will misunderstand basic tasks, adding
          | properties and inventing functions that don't exist.
          | 
          | It's like you had hired a college junior who insists they're
          | never wrong and keeps pushing non-functional code.
         | 
         | The entire mindset is whatever it's close enough, good luck.
         | 
         | God forbid you need to do anything using an uncommon node
         | module or anything like that.
        
           | smcnally wrote:
           | > LLMs also love to double down on solutions that don't work.
           | 
           | "Often wrong but never in doubt" is not proprietary to LLMs.
           | It's off-putting and we want them to be correct and to have
           | humility when they're wrong. But we should remember LLMs are
           | trained on work created by people, and many of those people
           | have built successful careers being exceedingly confident in
           | solutions that don't work.
        
             | deltaburnt wrote:
             | So now you have an overconfident human using an
             | overconfident tool, both of which will end up coding
             | themselves into a corner? Compilers at least, for the most
             | part, offer very definitive feedback that act as guard
             | rails to those overconfident humans.
             | 
             | Also, let's not forget LLMs are a product of the internet
             | and anonymity. Human interaction on the internet is
             | significantly different from in person interaction, where
             | typically people are more humble and less overconfident. If
             | someone at my office acted like some overconfident
             | SO/reddit/HN users I would probably avoid them like the
             | plague.
        
               | smcnally wrote:
               | A compiler in the mix is very helpful. That and other
               | sanity checks wielded by a skilled engineer doing code
               | reviews can provide valuable feedback to other developers
               | and to LLMs. The knowledgeable human in the loop makes
               | the coding process and final products so much better. Two
               | LLMs with tool usage capabilities reviewing the code
               | isn't as good today but is available today.
               | 
               | The LLMs overconfidence is based on it spitting out the
               | most-probable tokens based on its training data and your
               | prompt. When LLMs learn real hubris from actual anonymous
               | internet jackholes, we will have made significant
               | progress toward AGI.
        
             | 999900000999 wrote:
             | The issue is LLMs never say:
             | 
             | "I don't know how to do this".
             | 
              | When it comes to programming, tell me you don't know so I
             | can do something else. I ended up just refactoring my UX to
             | work around it. In this case it's a personal prototype so
             | it's not a big deal.
        
               | smcnally wrote:
               | That is definitely an issue with many LLMs. I've had
               | limited success including instructions like "Don't invent
               | facts" in the system prompt and more success saying "that
               | was not correct. Please answer again and check to ensure
               | your code works before giving it to me" within the
               | context of chats. More success still comes from
               | requesting second opinions from a different model -- e.g.
               | asking Claude's opinion of Qwen's solution.
               | 
               | To the other point, not admitting to gaps in knowledge or
               | experience is also something that people do all the time.
               | "I copied & pasted that from the top answer in Stack
               | Overflow so it must be correct!" is a direct analog.
        
         | ripped_britches wrote:
         | The most underrated thing I do on nearly every cursor
         | suggestion is to follow up with "are there any better ways to
         | do this?".
        
           | smcnally wrote:
           | A deeper version of the same idea is to ask a second model to
           | check the first model's answers. aider's "architect" is an
           | automated version of this approach.
           | 
           | https://aider.chat/docs/usage/modes.html#architect-mode-
           | and-...
        
           | avandekleut wrote:
            | I always ask it to "analyze approaches to achieve X and then
            | make a suggestion, no code" in the chat. Then a refinement
            | step where I give feedback on the generated code. I also
            | always try to give it an "out" between making changes and
            | keeping it the same, to stave off the bias toward action.
        
             | cootsnuck wrote:
             | Yea, the "analyze and explain but no code yet" approach
              | works well. Lets me audit its approach beforehand.
        
         | csomar wrote:
         | > I am terrified of what is to come.
         | 
         | Don't worry. Like everything else in life, you get what you pay
         | for.
        
         | 55555 wrote:
         | I used to know things. Then they made Google, and I just looked
         | things up. But at least I could still do things. Now we have
         | AI, and I just ask it to do things for me. Now I don't know
         | anything and I can't do anything.
        
           | deltaburnt wrote:
            | I feel like I've seen this comment so many times, but this
            | one is actually genuine. The cult-like dedication is kind of
            | baffling.
        
           | nyarlathotep_ wrote:
           | Programmers (and adjacent positions) of late strike me as
           | remarkably shortsighted and myopic.
           | 
           | Cheering for remote work leading to loads of new positions
            | being offered overseas as opposed to domestically, and now
           | loudly celebrating LLMs writing "boilerplate" for them.
           | 
           | How folks don't see the consequences of their actions is
           | remarkable to me.
        
         | worble wrote:
         | > The suggested code works but is absolute junior level
         | 
          | This isn't far from the current status quo. Good software
         | pay for people who write top quality code, and the rest pay
         | juniors to work far above their pay grade or offshore it to the
         | cheapest bidder. Now it will be offloaded to LLM's instead.
         | Same code, different writer, same work for a contractor who
         | knows what they're doing to come and fix later.
         | 
         | And so the cycle continues.
        
         | dizhn wrote:
         | Anybody care to comment whether the quality of the existing
         | code influences how good the AI's assistance is? In other
         | words, would they suggest sloppy code where the existing code
         | is sloppy and better (?) code when the existing code is good?
        
           | cootsnuck wrote:
           | What do you think? (I don't mean that in a snarky way.) Based
           | on how LLMs work, I can't see how that would not be the case.
           | 
           | But in my experience there are nuances to this. It's less
           | about "good" vs "bad"/"sloppy" code and more about
            | discernibility. If it's discernibly sloppy (i.e. the type of
           | sloppy a beginning programmer might do which is familiar to
           | all of us) I would say that's better than opaque "good" code
           | (good really only meaning functional).
           | 
           | These things predict tokens. So when you use them, help them
           | increase their chances of predicting the thing you want. Good
           | comments on code, good function names, explain what you don't
           | know, etc. etc. The same things you would ideally do if
           | working with another person on a codebase.
        
         | sirsinsalot wrote:
         | Reminds me of the 2000s outsourcing hype. I made a lot of money
         | cleaning up that mess. Entire projects late, buggy, unreadable
         | and unmaintainable.
         | 
         | Business pay big when they need to recover from that kind of
         | thing and save face to investors.
        
         | generalizations wrote:
         | > people who blindly "autocomplete" this code are going to
         | stall in their skill level progress
         | 
         | AI is just going to widen the skill level bell curve. Enables
         | some people to get away with far more mediocre work than
         | before, but also enables some people to become far more
         | capable. You can't make someone put in more effort, but the
         | ones who do will really shine.
        
         | shihab wrote:
         | I think that example says more about the company that chose to
          | put that code as a demo on their _homepage_.
        
       | mindcrime wrote:
       | Very cool. I'm especially happy to see that there is an Eclipse
       | client[1]. One note though: I had to dig around a bit to find the
       | info about the Eclipse client. It's not mentioned in the main
       | readme, or in the list of IDE extensions in the docs. Not sure if
       | that's an oversight or because it's not "ready for prime time"
       | yet or what.
       | 
       | [1]:
       | https://github.com/TabbyML/tabby/tree/3bd73a8c59a1c21312e812...
        
       | mlepath wrote:
        | Awesome project! I love the idea of not sending my data to a big
        | company and having to trust their TOS.
       | 
        | The effectiveness of a coding assistant is directly proportional
        | to context length, and the open models you can run on your own
        | computer are usually much smaller. Would love to see something
        | more quantified around the usefulness on more complex codebases.
        
         | fullstackwife wrote:
         | I hope for proliferation of 100% local coding assistants, but
         | for now the recommendation of "Works best on $10K+ GPU" is a
         | show stopper, and we are forced to use the "big company". :(
        
           | danw1979 wrote:
           | It's not really that bad. You can run some fairly big models
            | on an Apple Silicon machine costing £2k (M4 Pro Mac Mini
           | with 64GB RAM).
        
       | trevor-e wrote:
       | fyi the pricing page has a typo for "Singel Sign-On"
        
         | wsxiaoys wrote:
         | Appreciated! Fixed
        
       | nbzso wrote:
        | I will go out on a limb and predict that in the next 10 years AI
        | code assistants will be forbidden :)
        
       | qwertox wrote:
       | > Toggle IDE / Extensions telemetry
       | 
       | Cannot be turned off in the Community Edition. What does this
       | telemetry data contain?
        
         | andypants wrote:
          |     struct HealthState {
          |         model: String,
          |         chat_model: Option<String>,
          |         device: String,
          |         arch: String,
          |         cpu_info: String,
          |         cpu_count: usize,
          |         cuda_devices: Vec<String>,
          |         version: Version,
          |         webserver: Option<bool>,
          |     }
         | 
         | https://tabby.tabbyml.com/docs/administration/usage-collecti...
        
       | nikkwong wrote:
       | Maybe a good product but terrible company to interview with. I
       | went through several rounds and was basically ghosted after the
       | 4th with no explanation or follow up. The last interview was to
        | write a blog post for their blog, which I submitted and then
        | didn't hear back about until continuously nagging months later.
        | It was pretty disheartening since all of the interviews were some
        | form of a take-home and I spent a combined total of ~10 hours or
        | more.
        
         | NetOpWibby wrote:
         | Were you at least paid?
        
           | swyx wrote:
           | you know that paid interview processes are not the norm, "at
           | least" is unlikely
        
             | nikkwong wrote:
             | If I was paid, I probably wouldn't be complaining publicly.
              | :-) It's probably better for both parties if these types
              | of engagements are paid.
        
               | fhd2 wrote:
               | I've worked with paid take home tests for a while, but
               | stopped again. Hiring managers started to make the
               | assignments more convoluted, i.e. stopped respecting the
               | candidate's time. Candidates, on the flip side, always
               | said they don't want to bother with the bureaucracy of
               | writing an invoice and reporting it for their taxes etc.,
               | so didn't want to be paid.
               | 
               | Now my logic is: If a take home test is designed to take
               | more than two hours, we need to redesign it. Two hours of
               | interviews, two hours of take home test, that ought to
               | suffice.
               | 
               | If we're still unsure after that, I sometimes offered the
               | candidate a time limited freelance position, paid
               | obviously. We've ended up hiring everyone who went into
               | that process though.
        
               | avandekleut wrote:
               | I just finished interviewing with a company called
               | Infisical. The take-homes were crazy (the kind of thing
               | that normally takes a few days or a week). I was paid but
               | it took me 12 hours.
        
         | csomar wrote:
          | > The last interview was to write a blog post for their blog
         | 
          | Were you applying as a software dev? Because that's not a
          | software (or an interview) assignment.
        
           | nikkwong wrote:
           | Yes I was applying for software engineer. I think they wanted
           | engineers who were good at explaining the product to users.
        
             | csomar wrote:
             | Sure. Writing and a good command of the language is
             | important. There are multiple ways to showcase that.
             | Writing a blog post for _their_ blog is not one of them.
        
               | nikkwong wrote:
               | I was willing to jump through hoops--I really wanted the
               | job.
        
               | 55555 wrote:
               | Did the blog post get published on their blog?
        
         | j45 wrote:
         | Hope they paid for the work.
        
         | aitchnyu wrote:
         | Did their engineers spend time with you or did they get their
          | blog post otherwise? I once made 1-minute videos for the interview
         | process of an AI training data company. I have a hunch they
         | were just harvesting the data.
        
           | nikkwong wrote:
           | They did get the blog post but I don't believe they used it;
           | it's possible that they didn't think it was well written and
           | that's why I was ghosted but I will never know. I know they
           | were interviewing many very talented people for the position.
           | It's okay to be disorganized as a startup, but I think that
           | keeping people happy, employee or otherwise, should always be
           | the top priority. It would have taken just a few seconds to
           | write an email to me to reject me, and by not doing so, this
           | comment has probably evolved into a big nightmare for them. I
           | didn't expect it to get this much attention, but yeah; I
           | guess my general sentiment is shared by many.
        
         | lgrapenthin wrote:
         | Such interview processes are big red flags. The company can't
         | afford taking a risk with you and at the same time tests how
         | desperate you are by making you work for free. They are likely
         | short on cash and short on experience. Expect crunch and bad
         | management. Run.
        
         | redwood wrote:
         | Did they post the blog publicly?
        
         | jejeyyy77 wrote:
         | your first mistake was doing any kind of take-home exercise at
         | all.
        
         | jph wrote:
         | IMHO companies should aim for courteous interviews, with faster
         | decisions, and if there's any take home work then it's fully
         | paid. I've seen your work at Beaver.digital and on
         | GetFractals.com. If you're still looking, feel free to contact
         | me; I'm hiring for a startup doing AI/ML data analysis.
         | Specifically Figma + DaisyUI + TypeScript + Python + Pandas +
         | AWS + Postgres.
        
       | chvid wrote:
       | All the examples are for code that would otherwise be found in a
       | library. Some of the code is of dubious quality.
       | 
       | LLMs - a spam bot for your codebase?
        
       | leke wrote:
       | I'm currently investigating a self hosted AI solution for my
       | workplace.
       | 
       | I was wondering, how does this company make money?
       | 
       | From the pricing there is a free/community/opensource option, but
       | how is the "up to 5 users" monitored?
       | 
       | https://www.tabbyml.com/pricing
       | 
       | * Up to 5 users
       | 
       | * Local deployment
       | 
       | * Code Completion, Answer Engine, In-line chat & Context Provider
       | 
       | What if we have more than 5 users?
        
         | rirze wrote:
          | Are you asking, on a public forum, how to get around paying
          | for a product in a commercial setting by using the
          | non-commercial version of the product?
        
           | leke wrote:
           | I'm saying I don't understand their open source model. I
           | thought open source meant you could use and modify code and
            | run it yourself without having to pay a license, i.e.
            | completely independent of the maintainer. So I was confused
           | by this limit of how many were allowed to use something you
           | are running yourself.
        
       | SirMaster wrote:
       | All these things that claim to be an alternative to GitHub
       | Copilot, none of them seem to work in VS2022... So how is it
       | really an alternative?
       | 
       | All I want is a self-hosted AI assistant for VS2022. VS2022
        | supports plugins, yes, so what gives?
        
       | jimmydoe wrote:
        | Not using VSCode; it would be great to have Sublime Text or Zed
        | support.
        
       | larwent wrote:
       | I've been using something similar called Twinny. It's a vscode
       | extension that connects to an ollama locally hosted LLM of your
       | choice and works like CoPilot.
       | 
        | It's an extra step to install Ollama, so it's not as plug-and-play
        | as TFA, but the license is MIT, which makes it worthwhile for me.
       | 
       | https://github.com/twinnydotdev/twinny
        
       ___________________________________________________________________
       (page generated 2025-01-13 23:01 UTC)