[HN Gopher] Phind-70B: Closing the code quality gap with GPT-4 T...
___________________________________________________________________
Phind-70B: Closing the code quality gap with GPT-4 Turbo while
running 4x faster
Author : rushingcreek
Score : 266 points
Date : 2024-02-22 18:54 UTC (4 hours ago)
(HTM) web link (www.phind.com)
(TXT) w3m dump (www.phind.com)
| SethTro wrote:
| > Phind-70B is significantly faster than GPT-4 Turbo ... We're
| able to achieve this by running NVIDIA's TensorRT-LLM library on
| H100 GPUs
| kkielhofner wrote:
| As someone who has utilized Nvidia Triton Inference Server for
| years it's really interesting to see people publicly disclosing
| use of TensorRT-LLM (almost certainly in conjunction with
| Triton).
|
| Up until TensorRT-LLM, Triton had been kind of an in-group
| secret amongst high-scale inference providers. Now you can
| readily find announcements, press releases, etc of Triton
| (TensorRT-LLM) usage from the likes of Mistral, Phind,
| Cloudflare, Amazon, etc.
| brucethemoose2 wrote:
| Being accessible is huge.
|
| I still see posts of people running ollama on H100s or
| whatever, and that's just because it's so easy to set up.
| pama wrote:
| Every day now there are new AI models, especially LLMs, which
| might warrant some consideration from a wide part of the human
| population. In a couple years we will have multiple new
| announcements per hour and we might need some earlier models to
| evaluate these new developments and test them. For Phind-70B in
| particular, I hope that lmsys will share a version that will be
| part of the human evaluation leaderboard so we get a rounded
| evaluation. But for code assistants there should be a totally
| separate impartial evaluation benchmark, ideally still human
| judged for another year or so but eventually maybe some way of
| having the models fight out competitive coding battles that
| they can help create.
| swatcoder wrote:
| > In a couple years we will have multiple new announcements per
| hour
|
| Models are research output. If 10 new models are being
| announced _every day_ in a couple years, it would mean that
| generative AI research has failed to converge on a
| stable, reliable component ready for product engineering. And
| if that's where we are in a couple years, that's almost
| certainly a sign that the hype was misplaced and that money is
| chasing after itself trying to recoup sunk costs. That's a
| failure scenario for this technology, not what an AI-optimist
| (you otherwise seem to be one) should be anticipating.
| int_19h wrote:
| That doesn't follow at all. It just means that there is
| still low-hanging fruit to pursue for better (smarter,
| faster, larger-context, etc.) new models, but it doesn't say
| anything about the stability and usefulness of existing
| models.
| nickpsecurity wrote:
| That's not true. Both good science and market-driven
| engineering favor continued iterations on existing ideas
| looking for improvements or alternatives. We're often
| exploring a giant space of solutions.
|
| Unlike many fields, the A.I. people are publicly posting many
| of their steps in this journey, their iterations, for review.
| While it brings lots of fluff, such openness dramatically
| increases innovation rate compared to fields where you only
| see results once or twice a year. Both people using cloud
| API's and FOSS developers are steadily increasing
| effectiveness in both experimentation and product
| development. So, it's working.
| ilove_banh_mi wrote:
| this is how the WWW started, one new website every other day,
| then a couple every few hours, then ...
| renewiltord wrote:
| Anyone tried Phind Pro? The benchmarks are never useful to
| compare things. I think they're kind of overfit now.
| rushingcreek wrote:
| Phind founder here. You can try the model for free, without a
| login, by selecting Phind-70B from the homepage:
| https://phind.com.
| cl42 wrote:
| Just tried it out with a Python query. So nice and fast.
| Great work!
| unshavedyak wrote:
| Interesting, I can't try Phind-70B. It says I have 0 uses of
| Phind-70B left.
|
| Context: I used to be a Phind Pro subscriber, but I've not
| used Phind in probably two months.
| vasili111 wrote:
| Try in browser with Incognito mode?
| unshavedyak wrote:
| Yup, that works (10 uses avail). Though I wasn't too
| concerned with actually using it, just thought it was
| interesting and wanted to expose that maybe-bug.
| karmasimida wrote:
| HumanEval can be skipped at this point ...
| bugglebeetle wrote:
| I understand why they're doing this from a cost and dependency
| perspective, but I've pretty much stopped using Phind since they
| switched over to their own models. I used to use it in the past
| for things like API docs summarization, but it seems to give
| mostly wrong answers for that now. I think this is mostly a "RAG
| doesn't work very well without a very strong general model
| parsing the context" problem, which their prior use of GPT-4 was
| eliding.
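The failure mode described here, RAG degrading when the generator can't parse retrieved context, can be sketched schematically. This is a toy illustration, not Phind's actual pipeline: the retriever is a naive word-overlap ranker and the generator is stubbed out.

```python
def retrieve(query, corpus, k=2):
    # Toy lexical retriever: rank documents by word overlap with the query.
    words = set(query.lower().split())
    return sorted(corpus, key=lambda d: len(words & set(d.lower().split())),
                  reverse=True)[:k]

def answer(query, corpus, generate):
    # RAG shape: retrieved snippets are stuffed into the prompt; a weak
    # generator misreads or parrots them, which a strong model papers over.
    context = "\n".join(retrieve(query, corpus))
    return generate(f"Context:\n{context}\n\nQuestion: {query}")

docs = ["pandas merge joins dataframes", "numpy handles arrays"]
# With the generator stubbed to echo its prompt, the top document is visible.
out = answer("how do I merge dataframes", docs, generate=lambda p: p)
assert "pandas merge joins dataframes" in out
```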
| dingnuts wrote:
| I used it for a while and it was pretty good at Bash or Emacs
| Lisp one-liners but it was wrong often enough that it was
| faster to just search on Kagi for the information that I want
| first, instead of performing N searches to check the answer
| from Phind after querying Phind.
| rushingcreek wrote:
| Phind founder here. Thanks for the feedback -- I'd love to hear
| your thoughts on this new model. You can try it for free,
| without a login, by selecting it from the homepage:
| https://phind.com.
| bugglebeetle wrote:
| I just tried using the 70B model and the answer was listed as
| being returned using the 34B model instead of the 70B model
| and was wrong. Is there some logic that ignores user choice,
| depending on what the service thinks can be answered?
| int_19h wrote:
| I don't know about coding specifically, but its ability to
| solve logical puzzles is certainly vastly inferior to GPT-4's.
| Have a look:
|
| https://www.phind.com/agent?cache=clsxnhahk0006jn08zjvcgc9g
|
| https://chat.openai.com/share/ec5bad29-2cda-48b5-9aee-da9149...
| kristianp wrote:
| Any Sublime Text plugin? I can't stand how distracting VS code
| is.
| DoesntMatter22 wrote:
| Out of curiosity, how do you find it to be distracting?
| jsmith12673 wrote:
| Rare to find a fellow ST4 user these days
| bigstrat2003 wrote:
| Fellow ST4 user checking in. It does everything VSCode does
| (minus remote development, which I don't need) with 1/4 of
| the resource usage. Just a quality piece of software that
| I'll keep using for as long as I can.
| mmmuhd wrote:
| Does SFTP + Git on ST4 not count as remote development?
| Cause I am using them as my remote development stack.
| arbuge wrote:
| We're here.
| anonymous344 wrote:
| You guys have ST4?? I'm still on 3 because that's what I
| paid for... as a "lifetime licence", if I remember correctly.
| Alifatisk wrote:
| My config of VSCode made it as minimalistic as Sublime.
| vasili111 wrote:
| Did VSCode also become more responsive?
| mewpmewp2 wrote:
| VSCode used to be great, but now it feels like garbage. Or
| was it garbage all along?
|
| I used it because it was faster than WebStorm, but WebStorm
| was always just better. Now it seems VSCode is as slow as
| WebStorm, but is still garbage in everything.
| vasili111 wrote:
| I use VSCode for Python programming, for data
| science related tasks (never used it for web design). I
| especially like Python interactive mode:
| https://code.visualstudio.com/docs/python/jupyter-support-py
|
| It will be interesting to hear from other people why they
| do not like VSCode for data science related tasks.
| beeburrt wrote:
| I wonder if [VSCodium](https://vscodium.com/) suffers
| from the same issues
| Alifatisk wrote:
| I wouldn't say so, it's still bloated but it's hidden. The
| only change is that the ui is very minimal, like sublime.
|
| My extensions are still there and I can access everything
| through shortcuts or the command palette.
| behnamoh wrote:
| In other words: "our 70B finetune is as good as an 8x200B model"
|
| Yeah, right.
| google234123 wrote:
| I'm not sure GPT-4 is still 8x200B
| minimaxir wrote:
| The one thing we've learnt from the past few months of LLM
| optimization is that model size is no longer the most important
| thing in determining LLM quality.
|
| A better training regimen and better architecture optimizations
| have allowed smaller models to punch above their weight. The
| leaderboard has many open 7B and 13B models that are comparable
| with 72B models:
| https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
| ein0p wrote:
| It kinda is, if you want not just performance on synthetic
| benchmarks but good coverage of the long tail. This is
| where GPT4 excels, and also why I pay for it. Transformers
| are basically fancy associative memories. A smaller model,
| much like a smaller search index, will not be able to contain
| as much nuanced information, for hard, immutable,
| information-theoretic reasons.
| behnamoh wrote:
| > The leaderboard has many open 7B and 13B models that are
| comparable with 72B models:
| https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
|
| I follow your posts and comments here so I'm surprised you
| say that. The leaderboard at this point is pretty pointless.
| Lots of ways to "cheat" and get higher ranking there.
|
| I do agree that smaller models have made significant
| progress, but some things you just can't solve without adding
| #parameters and FLOPs. Not to mention, ctx_window is an
| important factor in code quality, but most OSS models
| (including Llama 2) have pretty limited ctx, despite methods
| like RoPE scaling and YaRN.
| minimaxir wrote:
| It's more a comment on the capabilities of smaller models;
| the quality of output outside of benchmarks is always
| subjective, and you'd need something like Chatbot Arena
| (https://chat.lmsys.org/) to evaluate it more
| quantitatively. Even after filtering out the common cheat
| techniques like merges, there are still 7B and 13B models near
| the top, but yes, it's still possible to train models on the
| evaluation datasets without decontamination.
|
| If you look at the Chatbot Arena leaderboards there are
| still decently-high ELOs for 7B models.
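For reference, Arena-style ratings like the ones mentioned above come from pairwise battles scored with the standard Elo update. A minimal sketch, with the K-factor and starting ratings purely illustrative:

```python
def elo_update(r_a, r_b, score_a, k=32):
    # score_a is 1.0 if model A wins the battle, 0.0 if it loses, 0.5 for a tie.
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

# An upset win by a lower-rated (e.g. 7B) model moves its rating up sharply.
a, b = elo_update(1000, 1100, 1.0)
assert a > 1000 and b < 1100
```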
| visarga wrote:
| I evaluated many Mistrals for an information extraction
| task and the merged models were much better than direct
| fine-tunes. About 5% better.
| brucethemoose2 wrote:
| I agree...
|
| Except for the leaderboard. It's all but useless, not just
| because of the data contamination/cheating but because the
| benchmarks themselves are flawed. They are full of
| ambiguity/errors, and they don't even use instruct formatting.
| ignoramous wrote:
| I've found that GPT4 (via GitHub Copilot) and Gemini models
| are better at code tasks like reviewing for logical and
| functional errors, reasoning about structure and test/edge
| cases, and refactoring. Gemini is capable of devouring some
| very large files I've thrown at it.
|
| Phind at times is hampered by whatever it is they're doing in
| addition (RAG?). It is still phenomenal, though. I regularly
| find myself using Phind to grok assembly code or learn
| Typescript.
| sroussey wrote:
| How do you know that copilot is using gpt4?
|
| I pay for it and for chatGPT and I find copilot much worse.
| ignoramous wrote:
| Looks like _Copilot_ may use GPT4 or GPT3.5 depending on
| as of yet unpublished criteria:
| https://github.com/microsoft/vscode-copilot-release/issues/6...
|
| For code review, I tend to engage _Copilot Chat_ which
| probably uses GPT4 more often?
| https://github.com/orgs/community/discussions/58059#discussi...
| SirMaster wrote:
| But what if you apply the same level of optimization, same
| training regimen to the larger models?
| rushingcreek wrote:
| Phind-70B is a specialist model, unlike GPT-4. It optimizes for
| a different function than GPT-4 and therefore needs fewer
| parameters to learn it.
|
| It's also true that specialist models still need to be
| sufficiently large to be able to reason well, but we've
| observed diminishing returns as models get larger.
| CuriouslyC wrote:
| I mean, it could be as good or better at a lot of reasoning
| related tasks and just have less baked in general knowledge, in
| which case it'd make an amazing RAG model if the context length
| is reasonable.
| ipsum2 wrote:
| What's the story behind the melted H100? I've been having
| downclocking issues when using fp8 because of thermals as well.
| rushingcreek wrote:
| We noticed that the training run crashed because one of the
| GPUs fell off the bus. Power cycling the host server didn't
| help and diagnostics showed thermal damage. We were able to
| swap in a different node, but apparently the entire host server
| needed to be replaced.
|
| We've generally noticed a relatively high failure rate for H100
| hardware and I'm not quite sure what is behind that.
| ipsum2 wrote:
| The entire server? That's crazy. Are you doing FP8 training
| or did you encounter this with BF16?
| davidzweig wrote:
| Check PLX chips are getting enough airflow, assuming you have
| them?
| alecco wrote:
| FWIW, 4090 has fp8 throttling issues:
|
| https://forums.developer.nvidia.com/t/ada-geforce-rtx-4090-f...
| rushingcreek wrote:
| Phind founder here. You can try the model for free, without a
| login, by selecting Phind-70B from the homepage:
| https://phind.com.
| goldemerald wrote:
| Very nice. I've been working with GPT4 since it released, and I
| tried some of my coding tasks from today with Phind-70B. The
| speed, conciseness, and accuracy are very impressive.
| Subjectively, the answers it gives just _feel_ better than
| GPT4, I'm definitely gonna give Pro a try this month.
| visarga wrote:
| I prefer Phind's web search with LLM to both Google search
| and GPT-4. I have switched my default search engine, only
| using Google for finding sites, not for finding information
| anymore.
|
| GPT-4 might be a better LLM but its search capability is
| worse, sometimes sends really stupid search keywords that are
| clearly not good enough.
| browningstreet wrote:
| Hmm, when I try I see this in the dropdown:
|
| 0 Phind-70B uses left
|
| And I've never made any selection there.
| rushingcreek wrote:
| I'd suggest logging in in that case -- you will still get
| your free uses. The Phind-70B counter for non-logged in users
| has carried over from when we offered GPT-4 uses without a
| login. If you've already consumed those uses, you'll need to
| log in to use Phind-70B.
| browningstreet wrote:
| Thanks.
| shrubble wrote:
| I tried a question about Snobol4 and was impressed with what it
| said (it couldn't provide an exact example due to paucity of
| examples). When testing more mainstream languages I have found
| it very helpful.
| bee_rider wrote:
| Important and hard-hitting question from me: have you ever
| considered calling yourself the Phinder or the Phiounder?
| Zacharias030 wrote:
| or the PhiTO / PhiEO
| fragmede wrote:
| Find Phounder
| declaredapple wrote:
| Any chances of an API?
|
| And are there plans to release any more weights? Perhaps one or
| two revisions behind your latest ones?
| parineum wrote:
| Ask phind to make you one that screen scrapes
| Fervicus wrote:
| I don't use LLMs a lot, maybe once a week or so. But I always
| pick Phind as my first choice because it's not behind a login
| and I can use it without giving my phone number. Hopefully
| you'll keep it that way!
| brucethemoose2 wrote:
| I have not had luck with CodeLlama 70B models for coding, nor
| with the Mistral leak.
|
| If I were Phind, I'd be looking at Deepseek 33B instead. While
| obviously dumber for anything else, it feels much better at
| coding. It's just begging for a continued pretrain like that, and
| it will be significantly faster on 80GB cards.
| mewpmewp2 wrote:
| Does this run on a 4090 with 16GB VRAM?
|
| What's the best model that can run fast on a 4090 laptop?
| brucethemoose2 wrote:
| Your options are:
|
| - Hybrid offloading with llama.cpp, but with slow inference.
|
| - _Squeezing_ it in with extreme quantization (exllamav2
| ~2.6bpw, or llama.cpp IQ3XS), but reduced quality and a
| relatively short context.
|
| 30B-34B is more of a sweetspot for 24GB of VRAM.
|
| If you do opt for the high quantization, make sure your
| laptop dGPU is totally empty, and that it's completely filled
| by the weights. And I'd recommend doing your own code-focused
| exl2/imatrix quantization, so it doesn't waste a megabyte of
| your VRAM.
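The trade-off being described is mostly arithmetic. A rough sizing sketch, counting weights only and ignoring KV cache and runtime overhead (the bits-per-weight figures are illustrative):

```python
def weight_gib(params_billions, bits_per_weight):
    # GiB needed just to hold the weights at a given quantization level.
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

# A 70B model at ~2.6 bpw barely squeezes under 24 GiB of VRAM...
assert 21 < weight_gib(70, 2.6) < 24
# ...while a 34B model at a healthier ~4.5 bpw leaves real headroom.
assert weight_gib(34, 4.5) < 19
```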
| rushingcreek wrote:
| We've found that CodeLlama-70B is a much more capable base
| model than DeepSeek-33B. I'd love to hear your feedback on
| Phind-70B specifically.
| brucethemoose2 wrote:
| Yeah I will have to test it out, though TBH I am more
| inclined to run models locally.
|
| As I mentioned, being such an extensive continuation train
| can (sometimes) totally change the capabilities of a model.
| rickette wrote:
| Deepseek 33B is great. Also runs well on a modern (beefy) MBP.
| shapenamer wrote:
| After running a bunch of models on my own PC (a pretty good
| one), I have to say by FAR the best results for coding have been
| with Deepseek models. However, I just spent 20 minutes playing
| with this Phind 70B model and it's totally nailing the
| questions I'm asking it. Pretty impressed.
| johnfn wrote:
| Is this related to the post? Phind has introduced their own
| model. Codellama 70B isn't related to Phind's model, other than
| presumably the "70B" size.
| rushingcreek wrote:
| Phind-70B is an extensive fine-tune on top of CodeLlama-70B
| brucethemoose2 wrote:
| Yeah, and I'd go so far as to call it a continued pretrain
| with that many tokens. More like a whole new model than a
| traditional finetune.
| afiodorov wrote:
| I don't trust the code quality evaluation. The other day at work I
| wanted to split my string by ; but only if it's not within single
| quotes (think of splitting many SQL statements). I explicitly
| asked for a stdlib Python solution, preferably avoiding counting
| quotes since that's a bit verbose.
|
| GPT4 gave me a regex found on https://stackoverflow.com/a/2787979
| (without "), explained it to me and then it successfully added
| all the necessary unit tests and they passed - I committed all of
| that to the repo and moved on.
|
| I couldn't get 70B to answer this question even with multiple
| nudges.
|
| Every time I try something non-GPT-4 I always go back - it
| feels like a waste of time otherwise. A bit sad that LLMs follow
| the typical winner-takes-all tech curve. However, if you could
| ask the smartest guy in the room your question every time, why
| wouldn't you?
|
| ---
|
| Edit: _USE CODE MODE_ and it'll actually solve it.
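For reference, the kind of stdlib-only answer being asked for can be done with a lookahead regex in the spirit of the linked Stack Overflow answer, adapted to semicolons and single quotes. A sketch, assuming quotes are balanced and not escaped:

```python
import re

# Split on ';' only when an even number of single quotes follows it,
# i.e. when the ';' is not inside a quoted string.
SPLIT_RE = re.compile(r";(?=(?:[^']*'[^']*')*[^']*$)")

def split_statements(sql):
    return [s for s in SPLIT_RE.split(sql) if s.strip()]

# The ';' inside the quoted literal is left alone.
assert split_statements("SELECT 'a;b'; SELECT 2") == ["SELECT 'a;b'", " SELECT 2"]
```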
| rushingcreek wrote:
| Thanks for the feedback, could you please post the cached Phind
| link so we can take a look?
|
| It might also be helpful to try Phind Chat mode in cases like
| this.
|
| EDIT: It seems like Phind-70B is capable of getting the right
| regex nearly every time when Chat mode is used or search
| results are disabled. It seems that the search results are
| polluting the answer for this example, we'll look into how to
| fix it.
| afiodorov wrote:
| https://www.phind.com/search?cache=r2a52gs77wtmi277o0xi4z2a
| rushingcreek wrote:
| Phind-70B worked well for me just now:
| https://www.phind.com/agent?cache=clsxokt2u0002ig09n1e11bj9.
|
| For writing/manipulating code, Chat mode might work better
| than Search.
| afiodorov wrote:
| You're right! It solved it. I didn't know about the
| Code/Search distinction. I still struggled for it to
| write me the unit tests. It does write them, they just
| don't pass. But this is definitely much closer to GPT4
| than I originally thought.
| samstave wrote:
| May you please, PLEASE,
|
| post as to how the search results were polluting answers, and
| the pipeline of whatever made that happen?
|
| Make this less opaque. (Actually, just post how pollution
| happens, as well as a definition of pollution as it pertains
| to such.)
|
| Diminishing trust is at stake.
| kunalgupta wrote:
| same exp
| tastyminerals2 wrote:
| I used Phind for a couple of months. I liked the UI
| improvements, but the slow, limited free GPT4 and the fast but
| lackluster Phind model turned me off. I tried Bing and it wasn't
| worse, and had more free searches per day.
| fsniper wrote:
| I tried the model and asked it to write a Kubernetes operator
| with the required Dockerfiles, resources, and application code,
| and asked it to migrate the application to different languages.
| It looks like it's pretty capable and fast. It is impressive.
| jameswlepage wrote:
| Is there any API? Would love to plug it into our pipeline and see
| what happens
| atemerev wrote:
| Impressive on my tests, excellent work! Indeed, it is better than
| GPT-4 for coding-related activities.
|
| I suppose you are not releasing the weights, right? Anyway, good
| luck! I hope investors are already forming a nice queue before
| your door :)
| rushingcreek wrote:
| Thanks for the feedback :)
|
| We will eventually release the weights.
| atemerev wrote:
| Wow, thanks!
| sergiotapia wrote:
| Terrific stuff. I always enjoy using Phind for dev related
| questions.
|
| Is it possible the chat history gets some product love? I would
| like to organize my conversations with tags and folders, making
| it easier to go back to what was said in the past instead of
| asking the question again.
|
| Thanks!
| devinprater wrote:
| Can we get a few accessibility fixes? The expandable button after
| the sign in button and the button after that are unlabeled. The
| image on the heading at level 1 has no Alt-text. The three
| buttons after the "Phind-34B" button are not labeled. The ones
| between that and the suggestions. On search results, there's an
| unlabeled button after each one, followed by a button labeled
| something like " search cache=tbo0oyn4s955gf03o...".
|
| There's probably more, but hopefully that should get things
| started if you can fix these.
| bakkoting wrote:
| Physician, heal thyself!
|
| https://www.phind.com/agent?cache=clsxs6doj000wl008yk8wb4k8
|
| It pointed out the lack of alt-text as well as a couple other
| issues. Some of the suggestions aren't applicable, but it's not
| bad as a starting point.
| hamilyon2 wrote:
| Impressive, it solved puzzles GPT-4 struggled with, with some
| prompting.
| rushingcreek wrote:
| Thanks! Can you send the cached link?
| Eisenstein wrote:
| So far only GPT4 and mistral-next have answered this question
| correctly.
|
| * https://www.phind.com/search?cache=rj4tpu6ut0jyzkf876e2fahh
|
| The answer is 'lower' because the volume of water equal to the
| ball's weight is larger than the volume of the ball.
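Assuming this is the classic ball-thrown-from-a-boat puzzle the stated answer describes, the reasoning checks out directly with Archimedes' principle (the densities below are assumed; the puzzle only needs the ball to be denser than water):

```python
rho_water = 1000.0  # kg/m^3
rho_ball = 7800.0   # kg/m^3, e.g. steel; puzzle requires rho_ball > rho_water
mass = 1.0          # kg

# Floating (in the boat): the ball displaces its own weight in water.
volume_displaced_floating = mass / rho_water
# Submerged: it displaces only its own volume.
volume_displaced_sunk = mass / rho_ball

assert volume_displaced_sunk < volume_displaced_floating  # level goes down
```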
| rushingcreek wrote:
| Phind-70B can get this too:
| https://www.phind.com/search?cache=b7w0rt4zybaajbsogatrb7q6.
| computerex wrote:
| Phind makes impressive claims. They also claimed that their
| fine-tune of CodeLlama beat GPT-4, but their fine-tune is _miles
| behind_ GPT-4 in open-domain code generation.
|
| Not impressed. Also this is a closed walled garden model.
| lagniappe wrote:
| I chose 70B and gave it a code task, and it answered as
| Phind-34B. This was my first query. Did I trip a limit or do
| something wrong?
| rushingcreek wrote:
| Try logging in please if that's the case.
| lagniappe wrote:
| Thank you for the reply, I'd like to congratulate you on the
| release, first. I'm a bit of a minimalist with regard to
| signups, unfortunately, so unless this is a known limit then
| I'd likely just spectate the thread and be happy for you from
| a distance.
| visitor4712 wrote:
| "summary of plato's politeia"
|
| The answer was good. Two follow-up answers were also fine.
|
| Just curious: what about the copyright status of the given
| sources?
|
| The best result I received so far was with the MS Bing app
| (Android).
|
| I had reasonable results with my local Llama 2 13B.
|
| Cheers
| littlestymaar wrote:
| Plato died around 2300 years ago, two millennia before
| copyright was invented, so I think it's going to be fine ;).
| mkl wrote:
| Translations can be copyrighted.
| imglorp wrote:
| Phind is for developers. Wouldn't you rather it grok
| documentation than philosophy?
| nerdo wrote:
| > Phind-70B is also less "lazy" than GPT-4 Turbo and doesn't
| hesitate to generate detailed code examples.
|
| OpenAI's leaked prompt literally encourages it to try harder[1]:
|
| > Use high effort; only tell the user that you were not able to
| find anything as a last resort. Keep trying instead of giving up.
|
| 1: https://pastebin.com/vnxJ7kQk
| rushingcreek wrote:
| Yep, LLMs are wacky. Telling Phind-70B to "take a deep breath"
| helps it answer better!
| jamesponddotco wrote:
| I'm impressed with the speed, really impressed, but not so much
| with the quality of the responses. This is a prompt I usually try
| with new LLMs:
|
| > Acting as an expert Go developer, write a RoundTripper that
| retries failed HTTP requests, both GET and POST ones.
|
| GPT-4 takes a few tries but usually takes the POST part into
| account, saving the body for new retries and whatnot. Phind, on
| the other hand, in the two or three times I tried, ignores the
| POST part and focuses on GET only.
|
| Maybe that problem is just too hard for LLMs? Or the prompt
| sucks? I'll see how it handles other things since I still have a
| few tries left.
| rushingcreek wrote:
| Thanks, can you send the cached link please? I'd also suggest
| trying Chat mode for questions like this, which are unlikely
| to benefit from an internet search.
|
| Just tried your query now and it seemed to work well -- what
| are your thoughts?
|
| https://www.phind.com/search?cache=tvyrul1spovzcpwtd8phgegj
| jamesponddotco wrote:
| Here you go:
|
| https://www.phind.com/search?cache=k56i132ekpg43zdc7j5z1h1x
|
| I'll give chat mode a try. Didn't see that it existed until
| now.
|
| EDIT
|
| Chat mode didn't do much better:
|
| https://www.phind.com/agent?cache=clsxpl4t80002l008v3vjqw5j
|
| For the record, this is the interface I asked it to
| implement:
|
| https://pkg.go.dev/net/http#RoundTripper
| rushingcreek wrote:
| Thanks for the links. It seems like it switched to
| Phind-34B, which is worse.
|
| Phind-70B seems to be able to get the right interface every
| time. Please make sure that it says Phind-70B at the top of
| the page while it's generating.
| dimask wrote:
| In the link it says "Phind-70B", how do we know if it
| switched to 34B?
| coder543 wrote:
| The first link definitely says Phind-34B on my browser.
| coder543 wrote:
| "RoadTripper"? Or "RoundTripper"?
| jamesponddotco wrote:
| Oops, haha. Interesting that GPT-4 still got it right though.
|
| Phind still forgot about POST, but at least now it got the
| interface right.
|
| https://www.phind.com/search?cache=ipu8z1tb3bnn7nfgfibcix38
| coder543 wrote:
| I'm not sure what you mean that it "forgot" about POST?
| Even as an experienced Go developer, I looked at the code
| and thought it would probably work for both GET and POST. I
| couldn't easily see a problem, yet I had not forgotten
| about POST being part of the request. It's just not an
| obvious problem. This is absolutely what I would classify
| as a "brain teaser". It's a type of problem that makes an
| interviewer feel clever, but it's not great for actually
| evaluating candidates.
|
| Only on running the code did I realize that it wasn't doing
| anything to handle the problem of the request body, where
| it works on the first attempt, but the ReadCloser is empty
| on subsequent attempts. It looks like Phind-70B corrected
| this issue once it was pointed out.
|
| I've seen GPT-4 make plenty of small mistakes when
| generating code, so being iterative seems normal, even if
| GPT-4 might have this one specific brain teaser completely
| memorized.
|
| I am not at the point where I expect any LLM to blindly
| generate perfect code every time, but if it can usually
| correct issues with feedback from an error message, then
| that's still quite good.
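The body-consumption bug described above is language-agnostic: a streaming request body can only be read once, so a naive retry sends an empty body. A Python sketch of the failure and of the body-factory fix (analogous to Go's `http.Request.GetBody`), with the transport stubbed out:

```python
import io

def send(body):
    # Stand-in for a transport that consumes the request body stream.
    return body.read()

# Naive retry reuses the same stream: the second attempt sees nothing.
stream = io.BytesIO(b"payload")
assert send(stream) == b"payload"
assert send(stream) == b""  # body already consumed

# Fix: rebuild the body per attempt via a factory.
def send_with_retry(get_body, attempts=3):
    return [send(get_body()) for _ in range(attempts)]

assert send_with_retry(lambda: io.BytesIO(b"payload")) == [b"payload"] * 3
```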
| shapenamer wrote:
| I'm a human and I don't have the slightest idea what you're
| asking for.
| Powdering7082 wrote:
| Do you use Go? It makes sense to me
| zettabomb wrote:
| A fun little challenge I like to give LLMs is to ask some basic
| logic puzzles, e.g. how can I measure 2 liters using a 3 liter
| and a 5 liter container? Usually if it's possible, they seem to
| do ok. When it's not possible, they produce a variety of wacky
| results. Phind-34B is rather amusing, and seems to get stuck in a
| loop: https://www.phind.com/agent?cache=clsxpravk0001la081cc9dl45
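The jug puzzle itself is a small breadth-first search. A sketch of a solver that also detects the impossible variants that send models into loops:

```python
from collections import deque

def jug_steps(cap_a, cap_b, target):
    # BFS over (a, b) fill levels; returns the state path or None if impossible.
    start = (0, 0)
    prev = {start: None}
    q = deque([start])
    while q:
        a, b = q.popleft()
        if a == target or b == target:
            path, state = [], (a, b)
            while state is not None:
                path.append(state)
                state = prev[state]
            return path[::-1]
        moves = [
            (cap_a, b), (a, cap_b),  # fill either jug
            (0, b), (a, 0),          # empty either jug
            # pour a -> b, then b -> a
            (a - min(a, cap_b - b), b + min(a, cap_b - b)),
            (a + min(b, cap_a - a), b - min(b, cap_a - a)),
        ]
        for nxt in moves:
            if nxt not in prev:
                prev[nxt] = (a, b)
                q.append(nxt)
    return None  # unreachable, e.g. target not a multiple of gcd(cap_a, cap_b)

assert jug_steps(3, 5, 2) is not None  # fill the 5, pour into the 3: 2 remains
assert jug_steps(4, 6, 3) is None      # impossible: gcd(4, 6) = 2
```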
| thelittleone wrote:
| These are interesting tests. I wonder how far away we are
| from AIs solving these (the ones that have no solution) without
| any special programming to teach them how.
| satellite2 wrote:
| > We love the open-source community and will be releasing the
| weights for the latest Phind-34B model in the coming weeks. We
| intend to release the weights for Phind-70B in time as well.
|
| I don't understand the utility of this comment?
| EmilStenstrom wrote:
| Contrary to many other models I've tried, this one works really
| well for Swedish as well. Nice!
| dilo_it wrote:
| Weirdly enough, when I asked "give me a formula for the fourier
| transform in the continuous domain" to the 70B model, it gave me
| a LaTeX-like formatted string, while when asked for "give me
| pseudocode for the fft" I got a nice code snippet with proper
| formatting. The formulas though were both correct. We're not at
| Groq level of speed here, but I have to say, it looks pretty good
| to me. cache=uyem9mo96tjeibaeljm1ztts for the devs if they wanna
| look it up.
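For reference, the pseudocode request above maps onto the classic radix-2 Cooley-Tukey recursion; a minimal sketch (input length must be a power of two):

```python
import cmath

def fft(x):
    # Radix-2 Cooley-Tukey: recurse on even/odd indices, combine with twiddles.
    n = len(x)
    if n == 1:
        return x[:]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

# A unit impulse transforms to an all-ones spectrum.
assert all(abs(v - 1) < 1e-9 for v in fft([1, 0, 0, 0]))
```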
| mike_hearn wrote:
| Do you have an API that could be plugged into https://aider.chat/
| ? It's by far the best way to use GPT4 for coding, in my
| experience, and more speed is exactly what it could use. But it
| needs an OpenAI compatible API.
| simplyinfinity wrote:
| I just tried this. It's a bit lazier than ChatGPT 3.5/4, which
| sometimes go ahead and translate a Go file to C# in full. Most
| times they omit most of the logic because "it's too complex" or
| "it would require extensive resources". Phind is no different,
| except it entirely refuses to do the full code translation.
|
| https://www.phind.com/agent?cache=clsxrt4200001jp08wwi55rm1
___________________________________________________________________
(page generated 2024-02-22 23:00 UTC)