[HN Gopher] Batch computing and the coming age of AI systems
       ___________________________________________________________________
        
       Batch computing and the coming age of AI systems
        
       Author : headalgorithm
       Score  : 112 points
       Date   : 2023-04-16 11:19 UTC (11 hours ago)
        
 (HTM) web link (hazyresearch.stanford.edu)
 (TXT) w3m dump (hazyresearch.stanford.edu)
        
       | jsemrau wrote:
        | Fancy. Apparently I have been using foundation models for batch
        | processing since 2015. TBH, this article is a bit light on details. I
       | would have expected more from a post published on a Stanford
       | site. Sounds more like a Business Insider article.
        
         | ShamelessC wrote:
         | Wasn't the phrase foundation models just coined like 2 years
         | ago?
        
         | helsontaveras18 wrote:
         | Agreed. I was a little surprised when they said batch
          | processing for financial systems was a use case for foundation
          | models. Financial data requires a high degree of
         | accuracy and graceful handling of many edge cases.
         | 
         | But then they explained that one could use these models to
         | generate the code that would process the financial data.
         | 
         | Sounds interesting, but yes the question is how do you validate
         | this? Do humans write the test cases to ensure that, for
         | example, ACH files are being processed accordingly? What about
         | edge case detection? Self-correction in real-time? Many
         | questions left unanswered.
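The human-in-the-loop validation asked about above might look like the sketch below: human-written tests pinning down the behavior of model-generated parsing code. Everything here is illustrative — the 20-character record layout, field positions, and function names are made up, not the real ACH spec.

```python
# Hypothetical harness: the tests are human-written; parse_record stands in
# for code a foundation model might generate. The record layout is a
# simplified stand-in, not the real ACH format.

def parse_record(line: str) -> dict:
    # Imagine this function was emitted by a model from a prompt.
    if len(line) != 20:
        raise ValueError("record must be exactly 20 characters")
    return {
        "type": line[0],
        "account": line[1:11].strip(),
        "amount_cents": int(line[11:20]),
    }

def test_parse_record():
    rec = parse_record("6ACCT000001000012345")
    assert rec["type"] == "6"
    assert rec["account"] == "ACCT000001"
    assert rec["amount_cents"] == 12345

def test_rejects_short_record():
    # Edge-case detection: malformed records must fail loudly, not silently.
    try:
        parse_record("6ACCT")
    except ValueError:
        pass
    else:
        raise AssertionError("short record should be rejected")
```

The generated code is only trusted as far as the human-written assertions reach, which is exactly the open question raised in the comment.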
        
           | mike_hearn wrote:
           | The financial system is run off batch jobs, so that's not a
           | problem. Banking has long experience with automating messy
           | previously manual systems. Think message queues everywhere
           | with diversion to teams of humans to patch up data entry
           | errors. This is one reason why banking still isn't as 24/7 as
           | you might expect.
        
       | stoniejohnson wrote:
       | Am I the only one that feels like this article really doesn't say
       | anything at all?
       | 
       | Sure most of the work in the computing world is done by scripts
       | (written by humans).
       | 
       | Sure if an AI can magically write/manage all those scripts that
       | would be great.
       | 
        | I don't see anything beyond someone creating an artificial
        | divide in computing that doesn't seem relevant ("batch" vs.
        | using a literal app with an AI feature).
       | 
       | Of course most computing doesn't require a person to be
       | interacting with it.
       | 
       | I'm being genuine here; please clarify if I'm just lacking
       | context.
        
         | baxtr wrote:
         | I'm just waiting for a fancy marketing term to appear for this.
         | 
          | And then big cos will jump on the bandwagon and we'll have a
          | new trend!
         | trend!
        
           | visarga wrote:
            | I think it is a good idea to offer users a batch mode: a
            | 10x lower price in exchange for higher latency. If you want
            | to parse millions of documents it is really useful.
        
           | SanderNL wrote:
           | AI-Enhanced Batch Processing (AIEBP)
           | 
           | IntelliBatch: Intelligent Batch Processing
           | 
           | AutomaBatch: Automated Batch Processing with AI
           | 
           | I'm getting some real good vibes here. We're getting close.
           | I'm sure actual enterprise actors can come up with something
           | even worse.
        
             | tudorw wrote:
             | maigic
        
               | peterpost2 wrote:
               | Think you won the contest here.
        
           | precompute wrote:
           | Call them "AI Managers". They manage AI, but they're not
           | managers that are AI, hence securing high-paid managerial
           | jobs. Checkmate.
        
             | moonchrome wrote:
             | I prefer AI whisperer. Manager sounds so pedestrian.
        
               | Nevermark wrote:
               | AI Conductor?
        
         | visarga wrote:
         | > Of course most computing doesn't require a person to be
         | interacting with it.
         | 
         | Most LLM output is worthless before validation. Be it code,
         | problem solving or question answering, they can all be wrong at
         | any time.
        
         | jjtheblunt wrote:
         | I was going to say the same thing, and then I also noticed that
         | since the popularity of the transformer model, there sure seems
         | to be a huge influx of Stanford articles. I wonder if that's
         | coincidence.
        
         | mrighele wrote:
          | I think what they mean is that for now most of the usage of
          | things like ChatGPT, Midjourney, etc. has a human in the loop
          | who helps get the right result. For example, plenty of people
          | post AI-generated images on Twitter. This works because they
          | filter and choose what to post from among a lot of generated
          | images. If Twitter decided that every single post gets an
          | automatically generated image, first of all they would have
          | to find a way to lower the immense amount of compute
          | required, and secondly to guarantee that every single time
          | the result is valid and consistent in quality.
        
         | civilized wrote:
         | What I got out of it:
         | 
         | 1. What if we used foundation models to write code?
         | 
         | 2. But the code might not work...
        
           | lumost wrote:
           | Humans write broken code all the time. Why would we expect a
           | transformer model to get it right on the first shot?
           | shouldn't the model be able to write code, build, test, and
           | iterate?
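The write/build/test/iterate loop described above can be sketched as a retry loop around a human-written test harness. `ask_model_for_code` here is a stub standing in for a real model API call, and the toy `add` task is purely illustrative.

```python
from typing import Optional
import traceback

def ask_model_for_code(prompt: str, feedback: Optional[str] = None) -> str:
    # Stand-in for a real foundation-model call; a production version would
    # send the prompt, plus any test-failure feedback, to an API.
    return "def add(a, b):\n    return a + b\n"

def run_tests(namespace: dict) -> Optional[str]:
    # Human-written checks; returns None on success, else the failure text
    # to feed back to the model on the next iteration.
    try:
        assert namespace["add"](2, 3) == 5
        assert namespace["add"](-1, 1) == 0
        return None
    except Exception:
        return traceback.format_exc()

def generate_until_green(prompt: str, max_iters: int = 3) -> str:
    feedback = None
    for _ in range(max_iters):
        code = ask_model_for_code(prompt, feedback)
        namespace = {}
        exec(code, namespace)      # the "build" step for interpreted code
        feedback = run_tests(namespace)
        if feedback is None:
            return code            # tests pass: accept this candidate
    raise RuntimeError("no passing candidate within the iteration budget")
```

The design choice worth noting is that the model never decides success on its own; the deterministic tests do, which sidesteps the self-validation problem raised elsewhere in the thread.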
        
             | civilized wrote:
             | You could test that yourself. Ask the model to code, build,
             | test, and iterate. Does it do a better job?
        
               | mr_toad wrote:
               | The lack of proper memory hampers current AI. They can't
               | really test and iterate. Until we have an AI that can
               | truly learn on the job I doubt we'll be replacing many
               | jobs.
        
         | mcbuilder wrote:
         | Looks like a quick blog post by a PI outlining 3 of their lab's
         | projects and giving a quick overview of their research
         | directions.
         | 
         | I wouldn't take it as an informational article in the sense
         | you're looking for. It's more like a press release, and pretty
         | common in academia to release these fluff blog posts.
        
           | j33zusjuice wrote:
           | https://en.m.wikipedia.org/wiki/Principal_investigator
           | 
           | For anyone who doesn't know what a PI is (I didn't when I
           | first got a job at a university).
        
       | paulsutter wrote:
       | ETL is a great use case for LLMs, and probably half of what
       | they're actually referring to with the vague phrase "batch
       | processing"
       | 
        | It makes a lot of sense to use an LLM to understand data and
        | then generate code to do mass extraction, but the article is
        | really oblique and it's hard to tell what they're doing.
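The "understand data, then generate code for mass extraction" pattern can be sketched like this: an expensive model call inspects a small sample and proposes a cheap deterministic pattern, which then runs over the full corpus. The model call is stubbed out and the field names are made up for illustration.

```python
import re

def propose_pattern(sample_lines):
    # Stand-in for the expensive step: in practice an LLM would inspect the
    # sample rows and emit an extraction spec. The regex returned here is a
    # hypothetical answer for the sample format below.
    return re.compile(r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<amount>\d+\.\d{2})")

def run_extract(lines, pattern):
    # The cheap, deterministic step: apply the generated pattern at scale,
    # with no further model calls per document.
    out = []
    for line in lines:
        m = pattern.search(line)
        if m:
            out.append(m.groupdict())
    return out

sample = ["2023-04-16 12.50 coffee", "2023-04-15 8.00 lunch"]
pattern = propose_pattern(sample)
rows = run_extract(sample, pattern)
```

The economics follow from the split: the LLM sees a handful of documents, while the generated code sees millions.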
        
         | whoevercares wrote:
          | ChatGPT can already do the whole ETL thing given an intent or
          | profile, and can detect data quality issues. It just hasn't
          | been fit into batch processing yet. But on the other hand, do
          | you really need batch processing to understand data? Wouldn't
          | a sample work for a whole lot of cases already?
        
         | crabbone wrote:
         | ETL and batch processing have little to do with each other.
         | 
          | Batch processing is a term from HPC systems, which are
          | typically multi-user and have to share computing resources by
          | means of a workload manager (e.g. Slurm, PBS, UGE, etc.).
          | Workload managers function by defining queues; queues have
          | associated resources (e.g. number of CPUs, particular network
          | hardware, accelerators such as GPUs, and so on). Users
          | interact with workload managers by scheduling execution of
          | their programs on particular queues (because they want the
          | resources associated with those queues).
         | 
          | The interaction between the user and the workload manager is
          | described as "submission of a batch job", as contrasted with
          | running a program interactively (such that the user can
          | simply start the program and observe its behavior right
          | away). I believe these are the batch jobs the article is
          | talking about. These kinds of jobs are very common in HPC,
          | and I'd imagine in ML performed on HPC resources.
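The batch-vs-interactive distinction described above can be made concrete with a minimal Slurm submission sketch: a script of `#SBATCH` directives handed to the scheduler, after which the job runs unattended. The job name, partition, and command are placeholders.

```python
import shutil
import subprocess

def make_batch_script(job_name: str, partition: str, gpus: int,
                      command: str) -> str:
    # #SBATCH directives tell the workload manager which queue (partition)
    # and resources the job needs; the job then runs unattended, with no
    # user watching a terminal.
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --gres=gpu:{gpus}",
        "#SBATCH --time=01:00:00",
        command,
        "",
    ])

script = make_batch_script("extract", "gpu-queue", 1, "python extract.py")

# Hand the script to the scheduler only if Slurm is actually installed;
# sbatch accepts a script on stdin.
if shutil.which("sbatch") is not None:
    subprocess.run(["sbatch"], input=script, text=True, check=True)
```

Interactive use would instead be something like `srun --pty bash`, where the user observes the program directly — exactly the contrast the comment draws.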
        
           | prpl wrote:
           | Batch processing predates HPC. It goes back to mainframes -
           | it's just that HPC had a common lineage there. Like you said,
           | it's an unattended job that runs.
           | 
            | That's what it's getting at: setting up routines for
            | unattended execution in batch.
           | 
           | BTW Chris Re did stuff with UW Madison /HTCondor and
           | definitely knows what batch computing is
        
       | up2isomorphism wrote:
       | OK, so ppt scientists in Stanford agree with Elon, big deal.
        
       | AndrewKemendo wrote:
       | Computing infrastructure and UX is batch by default.
       | 
       | We don't generally build or interact with continuous control
       | systems, with one exception: Organic things
       | 
       | So humans, pets, "nature" are all "streaming" systems because
       | they don't have a set of discrete states to move between and
       | adjust to feedback loops. You can reduce or segment organic
       | action into states, but this is artificial compression.
       | 
        | The future IMO is streaming systems (with MDP/REPL/OODA/RL
        | etc. controls), as batch isn't the way intelligent systems
        | actually behave.
        
       | whoevercares wrote:
        | Absolutely the right direction; we need foundation models to
        | work at Spark scale.
        
       | jerrygenser wrote:
        | Are all pretrained language models like BERT considered
        | foundation models?
        
         | liliumregale wrote:
         | Yep! Percy Liang in an interview with Chris Potts said he sees
         | BERT and ELMo as foundation models.
        
       | visarga wrote:
        | I've been implementing batch processing for information
        | extraction and schema matching with GPT-3.5 and it works
        | decently, but still has about a 10% error rate, especially when
        | extracting many fields from a document. It is also expensive to
        | use on millions of documents. The solution I see is distilling
        | into smaller models.
       | 
        | The problem I am facing now is how to filter out the mislabeled
        | training data generated with GPT-3, also around 10%. Manual
        | validation is out of the question at this scale, and it seems
        | to be the most interesting training data if I could have it
        | validated. I tried having GPT-3 fix itself; it only works
        | partially. What I did was train a model, use that model to rank
        | my data, and set a threshold. That's how I got the cutoff
        | around 10%. I can use the easy part, but what to do with the
        | interesting boundary-region data that might be mislabeled?
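The rank-and-threshold step described above can be sketched as routing each example by the trained model's confidence in its GPT-assigned label. The threshold values and the `score` function are illustrative assumptions, not the commenter's actual setup.

```python
def split_by_confidence(examples, score, keep_above=0.9, review_below=0.6):
    """Route examples by a model's confidence in their GPT-assigned label.

    score(example) is assumed to return the trained model's probability
    for the label GPT attached to that example; the thresholds here are
    made-up values for illustration.
    """
    keep, review, boundary = [], [], []
    for ex in examples:
        p = score(ex)
        if p >= keep_above:
            keep.append(ex)        # easy region: trust the label as-is
        elif p < review_below:
            review.append(ex)      # likely mislabeled: drop or relabel
        else:
            boundary.append(ex)    # uncertain but interesting region
    return keep, review, boundary
```

One option for the boundary bucket is targeted manual validation of just that slice, which is far smaller than the full corpus, or cross-checking it with a second, independently trained model.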
        
       ___________________________________________________________________
       (page generated 2023-04-16 23:01 UTC)