[HN Gopher] Launch HN: GradientJ (YC W23) - Build NLP Applicatio...
       ___________________________________________________________________
        
       Launch HN: GradientJ (YC W23) - Build NLP Applications Faster with
       LLMs
        
        Hey HN, we're Daniel and Oscar, founders of GradientJ
        (https://gradientj.com), a web application that helps teams
        develop, test, and monitor natural language processing (NLP)
        applications using large language models (LLMs).

        Before GradientJ, we'd been building NLP applications for 4 years,
        using transformer models like BERT. With the advent of LLMs and
        their zero-shot/few-shot capabilities, we saw the NLP dev cycle
        get flipped on its head. Rather than having to hire an army of
        data labelers and data scientists to fine-tune a BERT model for
        your use case, engineers can now use LLMs, like GPT-4, to build
        NLP endpoints in minutes.
        As powerful as this is, the problem becomes that without
        appropriate tools for version control, regression testing, and
        ongoing maintenance like monitoring and A/B testing, managing
        these models is a pain. Because the data being evaluated is often
        fuzzy, developers either have to build complex regex
        text-processing pipelines or manually evaluate each output before
        a new release. Moreover, if your prompts are maintained only in a
        Notion doc or Google Sheet, completely separate from these tests,
        it's difficult to identify which changes led to underperformance.
        The workflow often devolves into manual and subjective human data
        labeling just to decide whether new versions of your model are
        "good enough" to deploy.

        GradientJ is a web application and API to address that. We let
        you iterate on prompts, automatically regression test them along
        multiple dimensions, and finally manage them once deployed.

        You'd think these are pretty straightforward things to build, but
        we've noticed most versions of "LLM management apps" focus on
        organizing the workflow for these components without dramatically
        improving on automating them. At the end of the day, you still
        have to pass your side-by-side prompt comparison through the
        "eye-ball test", which creates processes that are bottlenecked by
        human time. We think that by using the very same technology, NLP,
        you can dramatically reduce the developer labor required for each
        of these steps.

        Here's how we do it:
        For prompt iteration, rather than just a text-editor "playground"
        with some special syntax to delineate variables, we're trying to
        use large language models to create a Copilot-like experience for
        prompt engineering. This means aggregating all the tricks of
        prompt engineering behind a smart LLM assistant who can suggest
        ways to restructure your prompt for better output. For example,
        when someone just wants their output in JSON form, we know where
        to inject the appropriate text to nudge the model towards
        generating JSON. When combined with our regression testing API,
        those prompt suggestions will actually be based on the specific
        dimensions of prompt underperformance. The idea is that the
        changes required to make a prompt's output follow a certain
        structure are different from the ones you'd make to have the
        output follow a certain tone.
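
        To make that concrete, here is a rough sketch in Python (our own
        illustration, not GradientJ's actual assistant or API; the
        function name and schema format are made up) of the kind of
        rewrite such a suggestion might produce when the desired output
        is JSON: keep the original prompt intact and append an explicit
        format instruction plus a schema stub.

          # Hypothetical helper -- the name and schema format are our own.
          def suggest_json_prompt(prompt: str, fields: list[str]) -> str:
              """Append text that nudges an LLM toward emitting valid JSON."""
              schema = ", ".join(f'"{field}": "<{field}>"' for field in fields)
              return (
                  f"{prompt}\n\n"
                  "Respond ONLY with valid JSON, with no prose before or after.\n"
                  f"Use exactly this shape: {{{schema}}}"
              )

          # Example: request a summary and a sentiment label as JSON fields.
          print(suggest_json_prompt(
              "Summarize the customer email below.\n\nEmail: {email_body}",
              ["summary", "sentiment"],
          ))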

        When it comes to testing, even before LLMs, configuring
        high-quality tests for expressive NLP models has historically
        been hard. To compare anything more complicated than
        classification labels, most people resort to raw fuzzy string
        comparisons, or to token distribution differences between the
        outputs. We're trying to make automated NLP testing more
        objective by using LLMs to actually power our regression testing
        API. We use NLP models to provide comparisons between text
        outputs along custom dimensions like "structure", "semantics",
        and "tone". This means that before you deploy the latest version
        of your email generation model, you know where it stands along
        each of the discrete dimensions you care about. Additionally,
        this helps prevent your prompt engineering from becoming a game
        of "whack-a-mole": overfitting your prompt on the handful of
        examples you can copy and paste while developing.
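
        As a rough illustration of the idea (our own sketch, not
        GradientJ's implementation; call_llm below is a placeholder for
        whichever model provider you use), an LLM can be asked to grade a
        candidate output against a reference along one named dimension,
        and a release can then be gated on those scores:

          # Sketch only: call_llm stands in for your LLM provider of choice.
          def call_llm(prompt: str) -> str:
              raise NotImplementedError("wire this up to GPT-4 or another model")

          def score_dimension(reference: str, candidate: str, dimension: str) -> float:
              """Score 0-10 for how closely candidate matches reference
              along one dimension such as 'structure', 'semantics', or 'tone'."""
              judge_prompt = (
                  f"Compare the two texts below on {dimension} ONLY.\n\n"
                  f"Reference:\n{reference}\n\nCandidate:\n{candidate}\n\n"
                  "Reply with a single number from 0 (completely different) "
                  "to 10 (equivalent)."
              )
              return float(call_llm(judge_prompt).strip())

          # Gate a deploy on every dimension you care about, e.g.:
          # for dim in ("structure", "semantics", "tone"):
          #     assert score_dimension(old_output, new_output, dim) >= 8.0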

        For deployment, we provide a stable API that always routes to the
        latest iteration of a prompt you've chosen to deploy. This means
        you can push updates over the air without having to change the
        API code. At the same time, we're tracking the versions used for
        inference under the hood. This lets you use that data to further
        improve your regression tests, experiment with fine-tuning across
        other providers or open-source models, or set up alerts around
        prompt performance.
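
        For illustration, a client call might look something like the
        sketch below (the endpoint path, payload shape, and auth header
        are our guesses, not GradientJ's documented API): the application
        always posts to the same stable URL, and whichever prompt version
        is currently deployed serves the request.

          # Hypothetical client call; URL, payload, and header are assumptions.
          import requests

          resp = requests.post(
              "https://api.gradientj.com/v1/deployments/email-generator/infer",
              headers={"Authorization": "Bearer YOUR_API_KEY"},
              json={"variables": {"customer_name": "Ada", "topic": "renewal"}},
              timeout=30,
          )
          print(resp.json())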

        Each of these pieces of our product can be used in isolation or
        all together, depending on what the rest of your NLP
        infrastructure looks like. If you use LLMs and are looking for
        ways to improve your workflow, or if you need to build NLP
        applications fast and want to bypass the traditional slow data
        labeling process, we'd love your feedback!
        
       Author : IVCrush
       Score  : 21 points
       Date   : 2023-04-04 19:56 UTC (3 hours ago)
        
       | antonioevans wrote:
       | This application is highly intriguing. It holds potential as an
       | excellent instrument for experimenting with models and fine-
       | tuning them. However, the $500 price tag for simply trying it out
       | is excessively expensive and inhibits accessibility. I cannot
        | even test the things you showed in your video.
        
         | ttul wrote:
         | If their target market is enterprise, $500 for trying it out is
         | not going to be a huge barrier. Perhaps their strategy is to
         | ensure that the people trying out their app are real buyers?
        
         | IVCrush wrote:
          | This is the first time we're really opening up access, so we're
          | still iterating on what's open to everyone.
         | 
          | Happy to give you and anyone else full access if you shoot
         | an email to: oscar at gradientj.com
        
       | ccooffee wrote:
       | I have no experience with LLMs, so here's some website feedback
       | to be taken with a chunk of salt:
       | 
        | 1. The YouTube video at the bottom of your page is very tiny and
        | cannot be fullscreened without first loading the video in
        | YouTube. The video itself merely shows some basic-seeming
        | workflows with some (to me) terrible background music. The video
       | does not seem to emphasize anything. It's just...wandering around
       | a web application...
       | 
       | 2. Between reading the webpage content and watching the video, I
       | don't have a good idea of what you are actually offering as a
       | product and why it is so valuable. The pitch summary in this HN
       | post is much more helpful than your website.
       | 
       | 3. Your website is not very accessible at the moment due to low
       | contrast and overuse of opacity for style. I can barely
       | understand what your images are attempting to convey. Your app
        | doesn't appear very accessible according to the YouTube video,
       | again due to low contrast among colors.
        
         | mcconaughey wrote:
         | Daniel, co-founder of GradientJ here!
         | 
         | Appreciate the feedback. We have a more detailed demo video
         | that is linked in the YT description. Will be sure the
         | resolution is adequate.
         | 
         | We will definitely work on improving the website and demos.
         | Want it to be easily accessible for everybody.
        
         | danvayn wrote:
          | Don't discount yourself; your feedback is valid regardless of
          | LLM experience.
        
       | KRAKRISMOTT wrote:
       | Where do we upload/download the actual LLM models? What's your
        | privacy policy on the fine-tuned deltas?
        
         | IVCrush wrote:
          | Since most of our early users are just using foundation LLMs
          | over an API (like OpenAI's models), we're still working on the
          | best way to manage uploading custom weights and NLP models.
          | However, for users who need it ASAP, we can upload and download
          | fine-tuned weights/architectures manually.
         | 
          | In terms of privacy policy, we haven't had many users doing much
          | with fine-tuned deltas, but we think of it the same way we think
          | of all model data: all inference and benchmarking data belongs
          | to the user, and we don't aggregate it across other users or
          | share it between orgs.
        
       ___________________________________________________________________
       (page generated 2023-04-04 23:00 UTC)