[HN Gopher] Batch computing and the coming age of AI systems
___________________________________________________________________
Batch computing and the coming age of AI systems
Author : headalgorithm
Score : 112 points
Date : 2023-04-16 11:19 UTC (11 hours ago)
(HTM) web link (hazyresearch.stanford.edu)
(TXT) w3m dump (hazyresearch.stanford.edu)
| jsemrau wrote:
| Fancy. Apparently I have been using foundation models for
| batch processing since 2015. TBH, this article is a bit light on
| details. I would have expected more from a post published on a
| Stanford site. It reads more like a Business Insider article.
| ShamelessC wrote:
| Wasn't the phrase foundation models just coined like 2 years
| ago?
| helsontaveras18 wrote:
| Agreed. I was a little surprised when they said batch
| processing for financial systems was a use case for
| foundational models. Financial data requires a high degree of
| accuracy and graceful handling of many edge cases.
|
| But then they explained that one could use these models to
| generate the code that would process the financial data.
|
| Sounds interesting, but yes the question is how do you validate
| this? Do humans write the test cases to ensure that, for
| example, ACH files are being processed accordingly? What about
| edge case detection? Self-correction in real-time? Many
| questions left unanswered.
| mike_hearn wrote:
| The financial system is run off batch jobs, so that's not a
| problem. Banking has long experience with automating messy
| previously manual systems. Think message queues everywhere
| with diversion to teams of humans to patch up data entry
| errors. This is one reason why banking still isn't as 24/7 as
| you might expect.
| stoniejohnson wrote:
| Am I the only one that feels like this article really doesn't say
| anything at all?
|
| Sure, most of the work in the computing world is done by scripts
| (written by humans).
|
| Sure, if an AI can magically write/manage all those scripts, that
| would be great.
|
| Beyond that, I just see someone creating an artificial divide in
| computing that doesn't seem relevant ("batch" vs. using a
| literal app with an AI feature).
|
| Of course most computing doesn't require a person to be
| interacting with it.
|
| I'm being genuine here; please clarify if I'm just lacking
| context.
| baxtr wrote:
| I'm just waiting for a fancy marketing term to appear for this.
|
| And then big cos will jump on the bandwagon and we'll have a new
| trend!
| visarga wrote:
| I think it is a good idea to give users a batch mode: a 10x
| lower price in exchange for higher latency. If you want to
| parse millions of documents, it is really useful.
| SanderNL wrote:
| AI-Enhanced Batch Processing (AIEBP)
|
| IntelliBatch: Intelligent Batch Processing
|
| AutomaBatch: Automated Batch Processing with AI
|
| I'm getting some real good vibes here. We're getting close.
| I'm sure actual enterprise actors can come up with something
| even worse.
| tudorw wrote:
| maigic
| peterpost2 wrote:
| Think you won the contest here.
| precompute wrote:
| Call them "AI Managers". They manage AI, but they're not
| managers that are AI, hence securing high-paid managerial
| jobs. Checkmate.
| moonchrome wrote:
| I prefer AI whisperer. Manager sounds so pedestrian.
| Nevermark wrote:
| AI Conductor?
| visarga wrote:
| > Of course most computing doesn't require a person to be
| interacting with it.
|
| Most LLM output is worthless before validation. Be it code,
| problem solving or question answering, they can all be wrong at
| any time.
| jjtheblunt wrote:
| I was going to say the same thing, and then I also noticed that
| since the popularity of the transformer model, there sure seems
| to be a huge influx of Stanford articles. I wonder if that's
| coincidence.
| mrighele wrote:
| I think what they mean is that, for now, most usage of things
| like ChatGPT, Midjourney, etc. has a human in the loop who
| helps get the right result. For example, plenty of people
| post AI-generated images on Twitter. This works because they
| filter and choose what to post from among a lot of generated
| images. If Twitter decided that every single post gets an
| automatically generated image, first of all they would have to
| find a way to lower the immense amount of compute power
| required, and secondly to guarantee that every single time the
| result is valid and consistent in quality.
| civilized wrote:
| What I got out of it:
|
| 1. What if we used foundation models to write code?
|
| 2. But the code might not work...
| lumost wrote:
| Humans write broken code all the time. Why would we expect a
| transformer model to get it right on the first shot?
| Shouldn't the model be able to write code, build, test, and
| iterate?
| civilized wrote:
| You could test that yourself. Ask the model to code, build,
| test, and iterate. Does it do a better job?
| mr_toad wrote:
| The lack of proper memory hampers current AI. They can't
| really test and iterate. Until we have an AI that can
| truly learn on the job I doubt we'll be replacing many
| jobs.
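The generate/build/test/iterate loop discussed in this subthread can be sketched roughly as follows. This is a toy illustration, not anyone's actual system: the model call is stubbed out with a hard-coded "buggy first draft, fixed on retry" generator, where a real system would prompt an LLM and feed test failures back into the next prompt.

```python
# Toy sketch of a write/build/test/iterate loop for model-generated
# code. The "model" here is a stub; all names are illustrative.

def run_tests(code):
    """Execute candidate code and check it against a known test case."""
    env = {}
    try:
        exec(code, env)
        return env["add"](2, 3) == 5
    except Exception:
        return False

def generate(feedback):
    """Stub model: the first draft is buggy; a retry with feedback fixes it."""
    if feedback is None:
        return "def add(a, b):\n    return a - b"  # buggy first attempt
    return "def add(a, b):\n    return a + b"      # corrected after feedback

def iterate(max_attempts=3):
    """Generate code, run the tests, feed failures back, and retry."""
    feedback = None
    for _ in range(max_attempts):
        code = generate(feedback)
        if run_tests(code):
            return code
        feedback = "tests failed on add(2, 3)"
    return None

result = iterate()
print(result is not None)  # → True
```

The loop converges here only because the stub is rigged to succeed on the second try; with a real model, mr_toad's point stands — nothing guarantees the iterations make progress.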
| mcbuilder wrote:
| Looks like a quick blog post by a PI outlining 3 of their lab's
| projects and giving a quick overview of their research
| directions.
|
| I wouldn't take it as an informational article in the sense
| you're looking for. It's more like a press release, and pretty
| common in academia to release these fluff blog posts.
| j33zusjuice wrote:
| https://en.m.wikipedia.org/wiki/Principal_investigator
|
| For anyone who doesn't know what a PI is (I didn't when I
| first got a job at a university).
| varelse wrote:
| [dead]
| paulsutter wrote:
| ETL is a great use case for LLMs, and probably half of what
| they're actually referring to with the vague phrase "batch
| processing"
|
| It makes a lot of sense to use an LLM to understand data and
| then generate code to do mass extraction, but the article is
| really oblique and it's hard to tell what they're doing.
| whoevercares wrote:
| ChatGPT can already do the whole ETL thing given an intent or
| profile, and detect data quality issues. It just hasn't been fit
| into batch processing yet. But on the other hand, do you really
| need batch processing to understand data? Wouldn't a sample
| work for a whole lot of cases already?
| crabbone wrote:
| ETL and batch processing have little to do with each other.
|
| Batch processing is a term from HPC systems, which are
| typically multi-user and have to share computing resources by
| means of a workload manager (e.g. Slurm, PBS, UGE). Workload
| managers function by defining queues, and queues have associated
| resources (e.g. number of CPUs, particular network hardware,
| accelerators such as GPUs, and so on). Users interact with
| workload managers by scheduling execution of their programs on
| particular queues (because they want the resources associated
| with those queues).
|
| The interaction between the user and the workload manager is
| described as "submission of a batch job", where this is
| contrasted with running a program interactively (such that the
| user can simply start the program and observe its behavior right
| away). I believe these are the batch jobs the article is
| talking about. These kinds of jobs are very common in HPC, and
| I'd imagine in ML performed on HPC resources.
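A batch-job submission of the kind described above might look like this hypothetical Slurm script (job name, partition, and resource requests are all placeholders):

```shell
#!/bin/bash
# Hypothetical Slurm batch script -- submitted with: sbatch extract.sbatch
#SBATCH --job-name=doc-extract      # name shown in the queue
#SBATCH --partition=gpu             # queue with the resources we want
#SBATCH --gpus=1                    # request one GPU
#SBATCH --cpus-per-task=8           # CPU cores for this task
#SBATCH --time=02:00:00             # wall-clock limit
#SBATCH --output=extract_%j.log     # %j expands to the job id

# The workload manager runs this unattended, with no user watching,
# once the requested resources become free.
echo "processing document batch"
```

The `#SBATCH` lines are directives read by the scheduler, not commands; the contrast with interactive use is exactly that nothing below them runs until the queue grants the resources.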
| prpl wrote:
| Batch processing predates HPC. It goes back to mainframes -
| it's just that HPC shares a common lineage there. Like you said,
| it's an unattended job that runs.
|
| That's what the article is getting at: setting up routines for
| unattended execution in batch.
|
| BTW, Chris Re did work with UW-Madison/HTCondor and
| definitely knows what batch computing is.
| up2isomorphism wrote:
| OK, so ppt scientists in Stanford agree with Elon, big deal.
| AndrewKemendo wrote:
| Computing infrastructure and UX is batch by default.
|
| We don't generally build or interact with continuous control
| systems, with one exception: Organic things
|
| So humans, pets, "nature" are all "streaming" systems because
| they don't have a set of discrete states to move between and
| adjust to feedback loops. You can reduce or segment organic
| action into states, but this is artificial compression.
|
| The future, IMO, is streaming systems (with MDP/REPL/OODA/RL
| etc. controls), as batch isn't the way intelligent systems
| actually behave.
| whoevercares wrote:
| Absolutely the right direction, we need FM to work at Spark scale
| jerrygenser wrote:
| Are all pretrained LMs like BERT considered foundational
| models?
| liliumregale wrote:
| Yep! Percy Liang in an interview with Chris Potts said he sees
| BERT and ELMo as foundation models.
| haha897 wrote:
| [flagged]
| visarga wrote:
| I've been implementing batch processing for information
| extraction and schema matching with GPT-3.5 and it works
| decently, but it still has about a 10% error rate, especially
| when extracting many fields from a document. It is also
| expensive to use on millions of documents. The solution I see is
| distilling into smaller models.
|
| The problem I am facing now is how to filter out the mislabeled
| training data generated with GPT-3, also around 10%. Manual
| validation is out of the question at this scale, and it seems to
| be the most interesting training data if I could have it
| validated. I tried having GPT-3 fix itself; it only works
| partially. What I did was train a model and use that model to
| rank my data, then set a threshold. That's how I got the cutoff
| around 10%. I can use the easy part, but what to do with the
| interesting boundary-region data that might be mislabeled?
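The train-then-rank filtering described above can be sketched as follows. Everything here is illustrative: in practice `score_fn` would be the trained model's probability that the noisy GPT label is correct, whereas this toy version fakes it with a parity check.

```python
# Sketch of threshold-based filtering of noisy model-generated labels:
# train a model on the noisy data, score every example with it, and
# split the data at a confidence cutoff. Names and data are toys.

def filter_by_confidence(examples, score_fn, threshold=0.9):
    """Split noisy data into a trusted set and a boundary set.

    examples  : list of (input, noisy_label) pairs
    score_fn  : model's confidence that noisy_label is correct
    threshold : cutoff; in the comment above, ~10% fell below it
    """
    trusted, boundary = [], []
    for x, y in examples:
        (trusted if score_fn(x, y) >= threshold else boundary).append((x, y))
    return trusted, boundary

def toy_score(x, y):
    """Stand-in for a trained model: agrees when the label matches parity."""
    return 0.95 if x % 2 == y else 0.4

# Eight correctly labeled examples plus two mislabeled ones.
data = [(i, i % 2) for i in range(8)] + [(8, 1), (9, 0)]
trusted, boundary = filter_by_confidence(data, toy_score)
print(len(trusted), len(boundary))  # → 8 2
```

The open question in the comment is exactly what this sketch leaves unanswered: the `boundary` set contains both genuinely hard examples and mislabeled ones, and the scoring model alone can't tell them apart.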
| haha897 wrote:
| [flagged]
___________________________________________________________________
(page generated 2023-04-16 23:01 UTC)