[HN Gopher] Phind-70B: Closing the code quality gap with GPT-4 T...
       ___________________________________________________________________
        
       Phind-70B: Closing the code quality gap with GPT-4 Turbo while
       running 4x faster
        
       Author : rushingcreek
       Score  : 266 points
       Date   : 2024-02-22 18:54 UTC (4 hours ago)
        
 (HTM) web link (www.phind.com)
 (TXT) w3m dump (www.phind.com)
        
       | SethTro wrote:
       | > Phind-70B is significantly faster than GPT-4 Turbo ... We're
       | able to achieve this by running NVIDIA's TensorRT-LLM library on
       | H100 GPUs
        
         | kkielhofner wrote:
         | As someone who has utilized Nvidia Triton Inference Server for
         | years it's really interesting to see people publicly disclosing
         | use of TensorRT-LLM (almost certainly in conjunction with
         | Triton).
         | 
          | Up until TensorRT-LLM, Triton had been something of an in-group
          | secret amongst high-scale inference providers. Now you can
         | readily find announcements, press releases, etc of Triton
         | (TensorRT-LLM) usage from the likes of Mistral, Phind,
         | Cloudflare, Amazon, etc.
        
           | brucethemoose2 wrote:
            | Being accessible is huge.
            | 
            | I still see posts of people running ollama on H100s or
            | whatever, and that's just because it's so easy to set up.
        
       | pama wrote:
        | Every day now there are new AI models, especially LLMs, that
        | might warrant attention from a wide part of the human
        | population. In a couple of years we will have multiple new
        | announcements per hour, and we might need earlier models just
        | to evaluate and test these new developments. For Phind-70B in
        | particular, I hope that lmsys will add a version to the human
        | evaluation leaderboard so we get a rounded evaluation. But for
        | code assistants there should be a totally separate, impartial
        | benchmark: ideally still human-judged for another year or so,
        | but eventually maybe some way of having the models fight out
        | competitive coding battles that they can help create.
        
         | swatcoder wrote:
         | > In a couple years we will have multiple new announcements per
         | hour
         | 
          | Models are research output. If 10 new models are being
          | announced _every day_ in a couple of years, it would mean that
          | generative AI research has failed to settle into a stable,
          | reliable component ready for product engineering. And if
          | that's where we are in a couple of years, that's almost
          | certainly a sign that the hype was misplaced and that money is
          | chasing after itself trying to recoup sunk costs. That's a
          | failure scenario for this technology, not what an AI-optimist
          | (as you otherwise seem to be) should be anticipating.
        
           | int_19h wrote:
            | That doesn't follow at all. It just means that there is
            | still low-hanging fruit to pursue for better (smarter,
            | faster, larger-context, etc.) new models, but it doesn't say
            | anything about the stability and usefulness of existing
            | models.
        
           | nickpsecurity wrote:
           | That's not true. Both good science and market-driven
           | engineering favor continued iterations on existing ideas
           | looking for improvements or alternatives. We're often
           | exploring a giant space of solutions.
           | 
           | Unlike many fields, the A.I. people are publicly posting many
           | of their steps in this journey, their iterations, for review.
           | While it brings lots of fluff, such openness dramatically
           | increases innovation rate compared to fields where you only
           | see results once or twice a year. Both people using cloud
           | API's and FOSS developers are steadily increasing
           | effectiveness in both experimentation and product
           | development. So, it's working.
        
         | ilove_banh_mi wrote:
         | this is how the WWW started, one new website every other day,
         | then a couple every few hours, then ...
        
       | renewiltord wrote:
       | Anyone tried Phind Pro? The benchmarks are never useful to
       | compare things. I think they're kind of overfit now.
        
         | rushingcreek wrote:
         | Phind founder here. You can try the model for free, without a
         | login, by selecting Phind-70B from the homepage:
         | https://phind.com.
        
           | cl42 wrote:
           | Just tried it out with a Python query. So nice and fast.
           | Great work!
        
           | unshavedyak wrote:
            | Interesting, I can't try Phind-70B. It says I have 0 uses
            | of Phind-70B left.
           | 
           | Context: I used to be a Phind Pro subscriber, but I've not
           | used Phind in probably two months.
        
             | vasili111 wrote:
             | Try in browser with Incognito mode?
        
               | unshavedyak wrote:
                | Yup, that works (10 uses available). Though I wasn't
                | too concerned with actually using it; I just thought it
                | was interesting and wanted to flag that possible bug.
        
       | karmasimida wrote:
       | HumanEval can be skipped at this point ...
        
       | bugglebeetle wrote:
       | I understand why they're doing this from a cost and dependency
       | perspective, but I've pretty much stopped using Phind since they
       | switched over to their own models. I used to use it in the past
        | for things like API docs summarization, but it seems to give
       | mostly wrong answers for that now. I think this is mostly a "RAG
       | doesn't work very well without a very strong general model
       | parsing the context" problem, which their prior use of GPT-4 was
       | eliding.
        
         | dingnuts wrote:
          | I used it for a while and it was pretty good at Bash or Emacs
         | Lisp one-liners but it was wrong often enough that it was
         | faster to just search on Kagi for the information that I want
         | first, instead of performing N searches to check the answer
         | from Phind after querying Phind.
        
         | rushingcreek wrote:
         | Phind founder here. Thanks for the feedback -- I'd love to hear
         | your thoughts on this new model. You can try it for free,
         | without a login, by selecting it from the homepage:
         | https://phind.com.
        
           | bugglebeetle wrote:
            | I just tried the 70B model, but the answer was listed as
            | returned by the 34B model instead, and it was wrong. Is
            | there some logic that ignores user choice, depending on
            | what the service thinks can be answered?
        
           | int_19h wrote:
           | I don't know about coding specifically, but its ability to
           | solve logical puzzles is certainly vastly inferior to GPT-4.
           | Have a look:
           | 
           | https://www.phind.com/agent?cache=clsxnhahk0006jn08zjvcgc9g
           | 
           | https://chat.openai.com/share/ec5bad29-2cda-48b5-9aee-
           | da9149...
        
       | kristianp wrote:
       | Any Sublime Text plugin? I can't stand how distracting VS code
       | is.
        
         | DoesntMatter22 wrote:
          | Out of curiosity, how do you find it to be distracting?
        
         | jsmith12673 wrote:
         | Rare to find a fellow ST4 user these days
        
           | bigstrat2003 wrote:
           | Fellow ST4 user checking in. It does everything VSCode does
           | (minus remote development, which I don't need) with 1/4 of
           | the resource usage. Just a quality piece of software that
           | I'll keep using for as long as I can.
        
             | mmmuhd wrote:
              | Does SFTP + Git on ST4 not count as remote development?
              | 'Cause I am using them as my remote development stack.
        
           | arbuge wrote:
           | We're here.
        
           | anonymous344 wrote:
            | You guys have ST4?? I'm still on 3 because that's what I
            | paid for... as a "lifetime licence", if I remember correctly.
        
         | Alifatisk wrote:
         | My config of vscode made it as minimalistic as sublime.
        
           | vasili111 wrote:
            | Did VSCode also become more responsive?
        
             | mewpmewp2 wrote:
              | VSCode used to be great, but now it feels like garbage.
              | Or was it garbage all along?
              | 
              | I used it because it was faster than WebStorm, but
              | WebStorm was always just better. Now VSCode seems as slow
              | as WebStorm, but is still garbage at everything.
        
               | vasili111 wrote:
                | I use VSCode for Python programming, mostly for data
                | science related tasks (I've never used it for web
                | development). I especially like Python interactive mode:
                | https://code.visualstudio.com/docs/python/jupyter-
                | support-py
                | 
                | It would be interesting to hear from other people why
                | they do not like VSCode for data science related tasks.
        
               | beeburrt wrote:
               | I wonder if [VSCodium](https://vscodium.com/) suffers
               | from same issues
        
             | Alifatisk wrote:
              | I wouldn't say so; it's still bloated, but the bloat is
              | hidden. The only change is that the UI is very minimal,
              | like Sublime.
              | 
              | My extensions are still there and I can access everything
              | through shortcuts or the command palette.
        
       | behnamoh wrote:
       | In other words: "our 70B finetune is as good as a 8x200B model"
       | 
       | Yeah, right.
        
         | google234123 wrote:
         | I'm not sure GPT 4 is still 8x200B
        
         | minimaxir wrote:
         | The one thing we've learnt from the past few months of LLM
         | optimization is that model size is no longer the most important
         | thing in determining LLM quality.
         | 
         | A better training regimen and better architecture optimizations
          | have allowed smaller models to punch above their weight. The
         | leaderboard has many open 7B and 13B models that are comparable
         | with 72B models:
         | https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
        
           | ein0p wrote:
            | It kinda is, if you want not just performance on synthetic
            | benchmarks but good coverage of the long tail. This is
           | where GPT4 excels, and also why I pay for it. Transformers
           | are basically fancy associative memories. A smaller model,
           | much like a smaller search index, will not be able to contain
           | as much nuanced information for some hard, immutable,
           | information theoretic reasons.
        
           | behnamoh wrote:
           | > The leaderboard has many open 7B and 13B models that are
           | comparable with 72B models: https://huggingface.co/spaces/Hug
           | gingFaceH4/open_llm_leaderb...
           | 
           | I follow your posts and comments here so I'm surprised you
           | say that. The leaderboard at this point is pretty pointless.
           | Lots of ways to "cheat" and get higher ranking there.
           | 
            | I do agree that smaller models have made significant
            | progress, but some things you just can't solve without
            | adding #parameters and FLOPs. Not to mention, ctx_window is
            | an important factor in code quality, but most OSS models
            | (including Llama 2) have pretty limited ctx, despite
            | methods like grp and yarn.
        
             | minimaxir wrote:
              | It's more a comment on the capabilities of smaller
              | models. The quality of output outside of benchmarks is
              | always subjective, and you'd need something like Chatbot
              | Arena (https://chat.lmsys.org/) to evaluate it more
              | quantitatively. Even after filtering out common cheat
              | techniques like merges, there are still 7B and 13B models
              | near the top, though yes, it's still possible to train
              | models on the evaluation datasets without decontamination.
              | 
              | If you look at the Chatbot Arena leaderboards there are
              | still decently high Elos for 7B models.
        
               | visarga wrote:
               | I evaluated many Mistrals for an information extraction
               | task and the merged models were much better than direct
               | fine-tunes. About 5% better.
        
           | brucethemoose2 wrote:
           | I agree...
           | 
            | Except for the leaderboard. It's all but useless, not just
            | because of the data contamination/cheating but because the
            | benchmarks themselves are flawed. They are full of
            | ambiguity/errors, and they don't even use instruct
            | formatting.
        
           | ignoramous wrote:
           | I've found that GPT4 (via GitHub Copilot) and Gemini models
           | are better at code tasks like reviewing for logical and
           | functional errors, reasoning about structure and test/edge
           | cases, and refactoring. Gemini is capable of devouring some
           | very large files I've thrown at it.
           | 
           | Phind at times is hampered by whatever it is they're doing in
           | addition (RAG?). It is still phenomenal, though. I regularly
           | find myself using Phind to grok assembly code or learn
           | Typescript.
        
             | sroussey wrote:
                | How do you know that Copilot is using GPT-4?
                | 
                | I pay for it and for ChatGPT, and I find Copilot much
                | worse.
        
               | ignoramous wrote:
               | Looks like _Copilot_ may use GPT4 or GPT3.5 depending on
               | as of yet unpublished criteria:
               | https://github.com/microsoft/vscode-copilot-
               | release/issues/6...
               | 
               | For code review, I tend to engage _Copilot Chat_ which
               | probably uses GPT4 more often? https://github.com/orgs/co
               | mmunity/discussions/58059#discussi...
        
           | SirMaster wrote:
           | But what if you apply the same level of optimization, same
           | training regimen to the larger models?
        
         | rushingcreek wrote:
         | Phind-70B is a specialist model, unlike GPT-4. It optimizes for
         | a different function than GPT-4 and therefore needs fewer
         | parameters to learn it.
         | 
         | It's also true that specialist models still need to be
         | sufficiently large to be able to reason well, but we've
         | observed diminishing returns as models get larger.
        
         | CuriouslyC wrote:
         | I mean, it could be as good or better at a lot of reasoning
         | related tasks and just have less baked in general knowledge, in
         | which case it'd make an amazing RAG model if the context length
         | is reasonable.
        
       | ipsum2 wrote:
        | What's the story behind the melted H100? I've been having
        | downclocking issues when using FP8 because of thermals as well.
        
         | rushingcreek wrote:
         | We noticed that the training run crashed because one of the
         | GPUs fell off the bus. Power cycling the host server didn't
         | help and diagnostics showed thermal damage. We were able to
         | swap in a different node, but apparently the entire host server
         | needed to be replaced.
         | 
         | We've generally noticed a relatively high failure rate for H100
         | hardware and I'm not quite sure what is behind that.
        
           | ipsum2 wrote:
           | The entire server? That's crazy. Are you doing FP8 training
           | or did you encounter this with BF16?
        
           | davidzweig wrote:
            | Check that the PLX chips are getting enough airflow,
            | assuming you have them?
        
         | alecco wrote:
          | FWIW, the 4090 has FP8 throttling issues:
         | 
         | https://forums.developer.nvidia.com/t/ada-geforce-rtx-4090-f...
        
       | rushingcreek wrote:
       | Phind founder here. You can try the model for free, without a
       | login, by selecting Phind-70B from the homepage:
       | https://phind.com.
        
         | goldemerald wrote:
          | Very nice. I've been working with GPT-4 since it was
          | released, and I tried some of my coding tasks from today with
          | Phind-70B. The speed, conciseness, and accuracy are very
          | impressive. Subjectively, the answers it gives just _feel_
          | better than GPT-4; I'm definitely gonna give Pro a try this
          | month.
        
           | visarga wrote:
           | I prefer Phind's web search with LLM to both Google search
           | and GPT-4. I have switched my default search engine, only
           | using Google for finding sites, not for finding information
           | anymore.
           | 
            | GPT-4 might be a better LLM, but its search capability is
            | worse; it sometimes sends really stupid search keywords
            | that are clearly not good enough.
        
         | browningstreet wrote:
         | Hmm, when I try I see this in the dropdown:
         | 
         | 0 Phind-70B uses left
         | 
         | And I've never made any selection there.
        
           | rushingcreek wrote:
           | I'd suggest logging in in that case -- you will still get
           | your free uses. The Phind-70B counter for non-logged in users
           | has carried over from when we offered GPT-4 uses without a
           | login. If you've already consumed those uses, you'll need to
           | log in to use Phind-70B.
        
             | browningstreet wrote:
             | Thanks.
        
         | shrubble wrote:
          | I tried a question about Snobol4 and was impressed with what
          | it said (it couldn't provide an exact example due to the
          | paucity of examples). When testing more mainstream languages
          | I have found it very helpful.
        
         | bee_rider wrote:
         | Important and hard-hitting question from me: have you ever
         | considered calling yourself the Phinder or the Phiounder?
        
           | Zacharias030 wrote:
           | or the PhiTO / PhiEO
        
           | fragmede wrote:
           | Find Phounder
        
         | declaredapple wrote:
          | Any chance of an API?
         | 
         | And are there plans to release any more weights? Perhaps one or
         | two revisions behind your latest ones?
        
           | parineum wrote:
            | Ask Phind to make you one that screen-scrapes.
        
         | Fervicus wrote:
         | I don't use LLMs a lot, maybe once a week or so. But I always
         | pick Phind as my first choice because it's not behind a login
         | and I can use it without giving my phone number. Hopefully
         | you'll keep it that way!
        
       | brucethemoose2 wrote:
        | I have not had luck with CodeLlama 70B models for coding, nor
        | with the Mistral leak.
        | 
        | If I were Phind, I'd be looking at Deepseek 33B instead. While
        | obviously dumber at anything else, it feels much better at
        | coding. It's just begging for a continued pretrain like that,
        | and it would be significantly faster on 80GB cards.
        
         | mewpmewp2 wrote:
          | Does this run on a 4090 with 16GB VRAM?
          | 
          | What's the best model that runs fast on a 4090 laptop?
        
           | brucethemoose2 wrote:
           | Your options are:
           | 
           | - Hybrid offloading with llama.cpp, but with slow inference.
           | 
           | - _Squeezing_ it in with extreme quantization (exllamav2
           | ~2.6bpw, or llama.cpp IQ3XS), but reduced quality and a
           | relatively short context.
           | 
           | 30B-34B is more of a sweetspot for 24GB of VRAM.
           | 
            | If you do opt for the heavy quantization, make sure your
            | laptop dGPU is totally empty, and that it's completely
            | filled by the weights. And I'd recommend doing your own
            | code-focused exl2/imatrix quantization, so it doesn't waste
            | a megabyte of your VRAM.
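The arithmetic behind that sweet spot is easy to sketch: the weights alone take roughly params x bits-per-weight / 8 bytes, and the KV cache and activations then need whatever headroom is left. A rough illustration (the helper name and the weights-only simplification are mine, not from the thread):

```python
def weight_vram_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough VRAM needed for the model weights alone, in GB.

    bytes = params * bits_per_weight / 8; ignores the KV cache and
    activation overhead, which need additional headroom on top.
    """
    return n_params_billion * bits_per_weight / 8

# A 70B model at ~2.6 bits per weight barely squeezes into 24 GB:
print(weight_vram_gb(70, 2.6))   # 22.75 GB, leaving almost no room for context
# A 34B model at a comfortable 4 bits fits with headroom to spare:
print(weight_vram_gb(34, 4.0))   # 17.0 GB
```

This is why a ~2.6bpw 70B quant leaves so little room for context on a 24GB card, while 30B-34B models fit at saner bit widths.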
        
         | rushingcreek wrote:
         | We've found that CodeLlama-70B is a much more capable base
         | model than DeepSeek-33B. I'd love to hear your feedback on
         | Phind-70B specifically.
        
           | brucethemoose2 wrote:
           | Yeah I will have to test it out, though TBH I am more
           | inclined to run models locally.
           | 
           | As I mentioned, being such an extensive continuation train
           | can (sometimes) totally change the capabilities of a model.
        
         | rickette wrote:
         | Deepseek 33B is great. Also runs well on a modern (beefy) MBP.
        
         | shapenamer wrote:
          | After running a bunch of models on my own PC (a pretty good
          | one), I have to say by FAR the best results for coding have
          | been with Deepseek models. However, I just spent 20 minutes
          | playing with this Phind 70B model and it's totally nailing
          | the questions I'm asking it. Pretty impressed.
        
         | johnfn wrote:
         | Is this related to the post? Phind has introduced their own
         | model. Codellama 70B isn't related to Phind's model, other than
         | presumably the "70B" size.
        
           | rushingcreek wrote:
            | Phind-70B is an extensive fine-tune on top of CodeLlama-70B.
        
             | brucethemoose2 wrote:
             | Yeah, and I'd go so far as to call it a continued pretrain
             | with that many tokens. More like a whole new model than a
             | traditional finetune.
        
       | afiodorov wrote:
        | I don't trust the code quality evaluation. The other day at
        | work I wanted to split a string by ; but only when it's not
        | within single quotes (think of splitting many SQL statements).
        | I explicitly asked for a stdlib Python solution, preferably
        | avoiding counting quotes since that's a bit verbose.
        | 
        | GPT-4 gave me a regex found on https://stackoverflow.com/a/2787979
        | (adapted from " to '), explained it to me, and then
        | successfully added all the necessary unit tests, which passed
        | -- I committed all of that to the repo and moved on.
        | 
        | I couldn't get 70B to answer this question even with multiple
        | nudges.
        | 
        | Every time I try something non-GPT-4 I always go back -- it
        | feels like a waste of time otherwise. A bit sad that LLMs
        | follow the typical winner-takes-all tech curve. However, if you
        | could ask the smartest guy in the room your question every
        | time, why wouldn't you?
        | 
        | ---
        | 
        | Edit: _USE CODE MODE_ and it'll actually solve it.
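For the curious, the lookahead trick from that Stack Overflow answer, adapted to single quotes, can be sketched in stdlib Python like this (the helper name and sample input are illustrative; it assumes quotes are balanced and unescaped):

```python
import re

# Split on ';' only when it is followed by an even number of single
# quotes before end-of-string, i.e. when the ';' sits outside any
# quoted span.
SPLITTER = re.compile(r";(?=(?:[^']*'[^']*')*[^']*$)")

def split_statements(sql: str) -> list:
    """Split SQL-ish text on top-level semicolons, dropping blanks."""
    return [part.strip() for part in SPLITTER.split(sql) if part.strip()]

stmts = split_statements("SELECT 'a;b' FROM t; DELETE FROM t WHERE x = ';'")
# -> ["SELECT 'a;b' FROM t", "DELETE FROM t WHERE x = ';'"]
```

The lookahead re-scans the tail of the string at every candidate semicolon, so it is O(n^2) in the worst case; fine for splitting scripts, verbose-quote-counting still wins for huge inputs.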
        
         | rushingcreek wrote:
         | Thanks for the feedback, could you please post the cached Phind
         | link so we can take a look?
         | 
         | It might also be helpful to try Phind Chat mode in cases like
         | this.
         | 
          | EDIT: It seems like Phind-70B is capable of getting the right
          | regex nearly every time when Chat mode is used or search
          | results are disabled. It seems that the search results are
          | polluting the answer for this example; we'll look into how to
          | fix it.
        
           | afiodorov wrote:
           | https://www.phind.com/search?cache=r2a52gs77wtmi277o0xi4z2a
        
             | rushingcreek wrote:
             | Phind-70B worked well for me just now: https://www.phind.co
             | m/agent?cache=clsxokt2u0002ig09n1e11bj9.
             | 
             | For writing/manipulating code, Chat mode might work better
             | than Search.
        
               | afiodorov wrote:
               | You're right! It solved it. I didn't know about the
               | Code/Search distinction. I still struggled for it to
               | write me the unit tests. It does write them, they just
               | don't pass. But this is definitely much closer to GPT4
               | than I originally thought.
        
           | samstave wrote:
            | May you please, PLEASE, post details on how the chat
            | option was polluting things, and the pipeline of whatever
            | made that happen?
            | 
            | Make this less opaque. (Actually, just post how pollution
            | happens, as well as a definition of "pollution" as it
            | pertains to this.)
            | 
            | Diminishing trust is at stake.
        
         | kunalgupta wrote:
          | Same experience.
        
       | tastyminerals2 wrote:
        | I used Phind for a couple of months. I liked the UI
        | improvements, but the slow, limited free GPT-4 and the fast but
        | lackluster Phind model turned me off. I tried Bing and it
        | wasn't any worse, and it had more free searches per day.
        
       | fsniper wrote:
        | I tried the model and asked it to write a Kubernetes operator
        | with the required Dockerfiles, resources, and application code,
        | then asked it to migrate the application to different
        | languages. It looks like it's pretty capable and fast. It is
        | impressive.
        
       | jameswlepage wrote:
       | Is there any API? Would love to plug it into our pipeline and see
       | what happens
        
       | atemerev wrote:
       | Impressive on my tests, excellent work! Indeed, it is better than
       | GPT-4 for coding-related activities.
       | 
       | I suppose you are not releasing the weights, right? Anyway, good
       | luck! I hope investors are already forming a nice queue before
       | your door :)
        
         | rushingcreek wrote:
         | Thanks for the feedback :)
         | 
         | We will eventually release the weights.
        
           | atemerev wrote:
           | Wow, thanks!
        
       | sergiotapia wrote:
       | Terrific stuff. I always enjoy using Phind for dev related
       | questions.
       | 
        | Is it possible the chat history gets some product love? I
        | would like to organize my conversations with tags and folders,
        | making it easier to go back to what was said in the past
        | instead of asking the question again.
       | 
       | Thanks!
        
       | devinprater wrote:
        | Can we get a few accessibility fixes? The expandable button
        | after the sign-in button, and the button after that, are
        | unlabeled. The image on the heading at level 1 has no alt text.
        | The three buttons after the "Phind-34B" button -- the ones
        | between that and the suggestions -- are not labeled. On search
        | results, there's an unlabeled button after each one, followed
        | by a button labeled something like " search
        | cache=tbo0oyn4s955gf03o...".
        | 
        | There's probably more, but hopefully that should get things
        | started if you can fix these.
        
         | bakkoting wrote:
         | Physician, heal thyself!
         | 
         | https://www.phind.com/agent?cache=clsxs6doj000wl008yk8wb4k8
         | 
          | It pointed out the lack of alt text as well as a couple of
          | other issues. Some of the suggestions aren't applicable, but
          | it's not bad as a starting point.
        
       | hamilyon2 wrote:
        | Impressive: with some prompting, it solved puzzles GPT-4
        | struggled with.
        
         | rushingcreek wrote:
         | Thanks! Can you send the cached link?
        
       | Eisenstein wrote:
        | So far only GPT-4 and mistral-next have answered this question
        | correctly.
        | 
        | * https://www.phind.com/search?cache=rj4tpu6ut0jyzkf876e2fahh
        | 
        | The answer is 'lower', because while the ball floats it
        | displaces its weight in water, which is a larger volume than
        | the ball itself.
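Assuming the link is the classic puzzle (a dense ball dropped from a floating boat into the water), Archimedes makes the comparison concrete: while carried by the boat the ball displaces its weight in water, once sunk it displaces only its own volume. A sketch with made-up numbers:

```python
WATER_DENSITY = 1000.0  # kg/m^3

def displaced_volume_m3(mass_kg: float, density_kg_m3: float,
                        floating: bool) -> float:
    """Volume of water displaced by an object of the given mass/density."""
    if floating:
        # Floating (e.g. carried in the boat): displaces its weight in water.
        return mass_kg / WATER_DENSITY
    # Sunk: displaces only its own volume.
    return mass_kg / density_kg_m3

mass, density = 10.0, 7800.0  # a ~10 kg steel ball (density is illustrative)
in_boat = displaced_volume_m3(mass, density, floating=True)     # 0.010 m^3
on_bottom = displaced_volume_m3(mass, density, floating=False)  # ~0.0013 m^3
# in_boat > on_bottom, so the water level drops when the ball goes in.
```

Because the ball is denser than water, the weight-equivalent volume always exceeds the ball's own volume, so the level must fall.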
        
         | rushingcreek wrote:
         | Phind-70B can get this too:
         | https://www.phind.com/search?cache=b7w0rt4zybaajbsogatrb7q6.
        
       | computerex wrote:
        | Phind makes impressive claims. They also claimed that their
        | fine-tune of CodeLlama beat GPT-4, but their fine-tune is
        | _miles behind_ GPT-4 in open-domain code generation.
        | 
        | Not impressed. Also, this is a closed, walled-garden model.
        
       | lagniappe wrote:
       | I chose 70B and gave it a code task, and it answered as
       | Phind-34B. This was my first query. Did I trip a limit or do
       | something wrong?
        
         | rushingcreek wrote:
          | Please try logging in if that's the case.
        
           | lagniappe wrote:
           | Thank you for the reply, I'd like to congratulate you on the
           | release, first. I'm a bit of a minimalist with regard to
           | signups, unfortunately, so unless this is a known limit then
           | I'd likely just spectate the thread and be happy for you from
           | a distance.
        
       | visitor4712 wrote:
        | "summary of plato's politeia"
        | 
        | The answer was good. Two follow-up answers were also fine.
        | 
        | Just curious: what about the copyright status of the given
        | sources?
        | 
        | The best result I received so far was with the MS Bing app
        | (Android).
        | 
        | I had reasonable results with my local Llama 2 13B.
        | 
        | Cheers
        
         | littlestymaar wrote:
          | Plato died around 2,300 years ago, some two millennia before
          | copyright was invented, so I think it's going to be fine ;).
        
           | mkl wrote:
           | Translations can be copyrighted.
        
         | imglorp wrote:
         | Phind is for developers. Wouldn't you rather it grok
         | documentation than philosophy?
        
       | nerdo wrote:
        | > Phind-70B is also less "lazy" than GPT-4 Turbo and doesn't
        | hesitate to generate detailed code examples.
       | 
       | OpenAI's leaked prompt literally encourages it to try harder[1]:
       | 
       | > Use high effort; only tell the user that you were not able to
       | find anything as a last resort. Keep trying instead of giving up.
       | 
       | 1: https://pastebin.com/vnxJ7kQk
        
         | rushingcreek wrote:
         | Yep, LLMs are wacky. Telling Phind-70B to "take a deep breath"
         | helps it answer better!
        
       | jamesponddotco wrote:
       | I'm impressed with the speed, really impressed, but not so much
       | with the quality of the responses. This is a prompt I usually try
       | with new LLMs:
       | 
       | > Acting as an expert Go developer, write a RoundTripper that
       | retries failed HTTP requests, both GET and POST ones.
       | 
       | GPT-4 takes a few tries but usually takes the POST part into
       | account, saving the body for new retries and whatnot. Phind, on
       | the other hand, in the two or three times I tried, ignored the
       | POST part and focused on GET only.
       | 
       | Maybe that problem is just too hard for LLMs? Or maybe the
       | prompt sucks? I'll see how it handles other things since I
       | still have a few tries left.
        
         | rushingcreek wrote:
          | Thanks, can you send the cached link please? I'd also suggest
          | trying Chat mode for questions like this, which are unlikely
          | to benefit from an internet search.
         | 
         | Just tried your query now and it seemed to work well -- what
         | are your thoughts?
         | 
         | https://www.phind.com/search?cache=tvyrul1spovzcpwtd8phgegj
        
           | jamesponddotco wrote:
           | Here you go:
           | 
           | https://www.phind.com/search?cache=k56i132ekpg43zdc7j5z1h1x
           | 
           | I'll give chat mode a try. Didn't see that it existed until
           | now.
           | 
           | EDIT
           | 
           | Chat mode didn't do much better:
           | 
           | https://www.phind.com/agent?cache=clsxpl4t80002l008v3vjqw5j
           | 
           | For the record, this is the interface I asked it to
           | implement:
           | 
           | https://pkg.go.dev/net/http#RoundTripper
        
             | rushingcreek wrote:
             | Thanks for the links. It seems like it switched to
             | Phind-34B, which is worse.
             | 
             | Phind-70B seems to be able to get the right interface every
             | time. Please make sure that it says Phind-70B at the top of
             | the page while it's generating.
        
               | dimask wrote:
                | In the link it says "Phind-70B", so how do we know it
                | switched to 34B?
        
               | coder543 wrote:
               | The first link definitely says Phind-34B on my browser.
        
         | coder543 wrote:
         | "RoadTripper"? Or "RoundTripper"?
        
           | jamesponddotco wrote:
            | Oops, haha. Interesting that GPT-4 still got it right, though.
           | 
           | Phind still forgot about POST, but at least now it got the
           | interface right.
           | 
           | https://www.phind.com/search?cache=ipu8z1tb3bnn7nfgfibcix38
        
             | coder543 wrote:
             | I'm not sure what you mean that it "forgot" about POST?
             | Even as an experienced Go developer, I looked at the code
             | and thought it would probably work for both GET and POST. I
             | couldn't easily see a problem, yet I had not forgotten
             | about POST being part of the request. It's just not an
             | obvious problem. This is absolutely what I would classify
             | as a "brain teaser". It's a type of problem that makes an
             | interviewer feel clever, but it's not great for actually
             | evaluating candidates.
             | 
             | Only on running the code did I realize that it wasn't doing
             | anything to handle the problem of the request body, where
             | it works on the first attempt, but the ReadCloser is empty
             | on subsequent attempts. It looks like Phind-70B corrected
             | this issue once it was pointed out.
             | 
             | I've seen GPT-4 make plenty of small mistakes when
             | generating code, so being iterative seems normal, even if
             | GPT-4 might have this one specific brain teaser completely
             | memorized.
             | 
             | I am not at the point where I expect any LLM to blindly
             | generate perfect code every time, but if it can usually
             | correct issues with feedback from an error message, then
             | that's still quite good.
        
         | shapenamer wrote:
         | I'm a human and I don't have the slightest idea what you're
         | asking for.
        
           | Powdering7082 wrote:
           | Do you use Go? It makes sense to me
        
       | zettabomb wrote:
       | A fun little challenge I like to give LLMs is to ask some basic
       | logic puzzles, e.g. how can I measure 2 liters using a 3 liter
       | and a 5 liter container? Usually if it's possible, they seem to
       | do ok. When it's not possible, they produce a variety of wacky
       | results. Phind-34B is rather amusing, and seems to get stuck in a
       | loop: https://www.phind.com/agent?cache=clsxpravk0001la081cc9dl45
        
         | thelittleone wrote:
         | These are interesting tests. I wonder how far we are away from
         | AIs solving these (the ones that have no solution) without any
         | special programming to teach them how.
        
       | satellite2 wrote:
       | > We love the open-source community and will be releasing the
       | weights for the latest Phind-34B model in the coming weeks. We
       | intend to release the weights for Phind-70B in time as well.
       | 
       | I don't understand the utility of this comment?
        
       | EmilStenstrom wrote:
       | Contrary to many other models I've tried, this one works really
       | well for Swedish as well. Nice!
        
       | dilo_it wrote:
       | Weirdly enough, when I asked "give me a formula for the fourier
       | transform in the continuous domain" to the 70B model, it gave me
       | a latex-like formatted string, while when asked for "give me
       | pseudocode for the fft" I got a nice code snippet with proper
       | formatting. The formulas though were both correct. We're not at
       | Groq level of speed here, but I have to say, it looks pretty good
       | to me. cache=uyem9mo96tjeibaeljm1ztts for the devs if they wanna
       | look it up.
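For reference, under one common convention the continuous-domain Fourier transform the model was asked for, and its inverse, are:

```latex
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt,
\qquad
x(t) = \int_{-\infty}^{\infty} X(f)\, e^{2\pi i f t}\, df
```

Other conventions move the $2\pi$ into the exponent as angular frequency $\omega$ and split a $1/2\pi$ factor between the two integrals, so either form from the model could be "correct".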
        
       | mike_hearn wrote:
       | Do you have an API that could be plugged into https://aider.chat/
       | ? It's by far the best way to use GPT4 for coding, in my
       | experience, and more speed is exactly what it could use. But it
       | needs an OpenAI compatible API.
        
       | simplyinfinity wrote:
       | I just tried this. It's a bit lazier than ChatGPT 3.5/4, which
       | sometimes go ahead and translate a Go file to C# in full. Most
       | times they omit most of the logic because "it's too complex" or
       | "it would require extensive resources". Phind is no different,
       | except that it refuses outright to translate the entire file.
       | 
       | https://www.phind.com/agent?cache=clsxrt4200001jp08wwi55rm1
        
       ___________________________________________________________________
       (page generated 2024-02-22 23:00 UTC)