[HN Gopher] Automatic Generation of Visualizations and Infograph...
___________________________________________________________________
Automatic Generation of Visualizations and Infographics with LLMs
Author : monkeydust
Score : 154 points
Date : 2023-08-29 09:30 UTC (13 hours ago)
(HTM) web link (microsoft.github.io)
(TXT) w3m dump (microsoft.github.io)
| dylanjcastillo wrote:
| Super cool.
|
| Here are the viz-related prompts (generation, editing, etc.), for
| those interested:
| https://github.com/microsoft/lida/tree/main/lida/components/...
|
| I built a tool that lets you use GPT to analyze data and build
| interactive graphs in the browser
| (https://deepsheet.dylancastillo.co/). I may try to adapt it to
| use LIDA or a similar approach.
| ilovefood wrote:
| I took this a step further, turning charts into infographics
| using Stable Diffusion (SDXL).
|
| https://karimjedda.com/beautiful-data-visualizations-powered...
| w-m wrote:
| That's a really nice idea. Have you thought about making it a
| product?
| ilovefood wrote:
| Thank you for the kind words. I wouldn't even know where to
| start. There's a lot I can code and do, but making something
| into a product is something I have no experience with.
|
| What's your recommended approach? Even if open-source,
| curious to learn.
| thundergolfer wrote:
| If the product got good enough you'd have a shot at selling
| to Canva.
| aliyeysides wrote:
| Continue as an open source project by building features and
| refining the product. Once you have enough users you can
| start offering premium features and support to
| organizations. Maybe even apply to an accelerator? Good
| Luck!
| ilovefood wrote:
| Thanks for the positivity, I'll give it a shot.
| ukuina wrote:
| I would pay $5 for the occasional, chart-heavy month to
| upload my graphs without proprietary detail and get a few
| images to reroll through.
| minimaxir wrote:
| The technology for this type of generation (ControlNet) is
| already open source, and it is relatively straightforward to
| reproduce the charts demoed in that post without shenanigans.
|
| There's no moat.
| esafak wrote:
| It's chartjunk and should be used sparingly (e.g., on covers)
| rather than the actual content. I think it would be used
| frivolously in the hands of an undisciplined person.
|
| https://en.wikipedia.org/wiki/Chartjunk
| RH123 wrote:
| I find the quality of the code really questionable:
|
| system_prompt = """ You are a an experienced data analyst that
| can annotate datasets. Your instructions are as follows: i)
| ALWAYS generate the name of the dataset and the
| dataset_description ii) ALWAYS generate a field description. iv.)
| ALWAYS generate a semantic_type (a single word) for each field
| given its values e.g. company, city, number, supplier, location,
| gender, longitude, latitude, url, ip address, zip code, email,
| etc You return an updated JSON dictionary without any preamble or
| explanation. """
|
| Some spelling errors, and where is number 3?
| ukuina wrote:
| All you need is attention?
| mnky9800n wrote:
| In the horsepower versus mpg example, every generated result
| creates new data on the plot that isn't there in the original
| plot. This is terrible.
| itronitron wrote:
| They should get rid of the infographer module as it really
| undermines the rest of the work.
| jgalt212 wrote:
| I sort of know what I'm doing with data, so I don't want LLMs
| building any models for me, but I do like the concept of making
| my lame visualizations look more professional and slicker.
| dontupvoteme wrote:
| Ironically I've found GPT to be pretty terrible with plotting
| libraries like plotly/dash or even matplotlib compared to just
| about anything else in python.
| pplonski86 wrote:
| I wrote a simple wrapper around Matplotlib and
| ChatGPT-3.5-turbo. The LLM response is Python code that is
| executed to produce charts. It works very nicely. Here is the
| repo: https://github.com/mljar/plotai - you will find two videos
| in the readme. Maybe you should work on your prompts?
| dontupvoteme wrote:
| Huh, neat. It never bothered me enough, or was important
| enough, to spend time on specifically, since I was able to
| just hit a button to send them back for fixing, but it's
| good to see the extra passes aren't explicitly needed.
| javajosh wrote:
| No, absolutely not. How can you trust the output from such a
| black box system? Who is to say that the LLM won't add or remove
| data points to make the chart "look good"? Heaven help us if
| decision makers start taking this output seriously. But of course
| they will, because the charts will look professional and
| plausible, because that's what the prompt requires.
| pizza wrote:
| How do you trust matplotlib? Same way: if you need to audit
| plots, audit the generated source code.
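|
| A minimal sketch of what auditing generated plotting code could
| look like, using only Python's standard `ast` module. The
| allow-list, the deny-list, and the sample `generated` string are
| assumptions made up for illustration; they are not part of LIDA
| or Matplotlib, and a real audit would need far more than this.

```python
import ast

# Hypothetical allow-list: top-level modules generated plotting code may import.
ALLOWED_IMPORTS = {"matplotlib", "pandas", "numpy"}
# Hypothetical deny-list: calls that drop, fill, or resample rows, or that
# reach outside the plotting task altogether.
SUSPICIOUS_CALLS = {"drop", "dropna", "fillna", "sample", "append", "exec", "eval", "open"}


def audit(source: str) -> list[str]:
    """Return human-readable findings for one piece of generated code."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] not in ALLOWED_IMPORTS:
                findings.append(f"line {node.lineno}: unexpected import {module}")
        if isinstance(node, ast.Call):
            func = node.func
            # Method calls carry the name in .attr, plain calls in .id.
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", "")
            if name in SUSPICIOUS_CALLS:
                findings.append(f"line {node.lineno}: suspicious call {name}()")
    return findings


# Made-up example of code an LLM might return.
generated = """import matplotlib.pyplot as plt
import os
df = df.dropna()
plt.scatter(df["horsepower"], df["mpg"])
"""
findings = audit(generated)
for finding in findings:
    print(finding)
```

| The code is parsed but never executed, so the check is cheap
| enough to run on every query. It only flags things for a human to
| look at; it does not prove the chart is faithful to the data.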
| esafak wrote:
| So instead of auditing MPL once (or never because MPL doesn't
| have a habit of broken output) I should audit the output of
| this LLM for every query because it _does_ have a habit of
| hallucinating?
| vimesy wrote:
| "You are a helpful assistant highly skilled in writing PERFECT
| code for visualizations. Given some code template, you complete
| the template to generate a visualization given the dataset and
| the goal described. The code you write MUST FOLLOW
| VISUALIZATION BEST PRACTICES ie. meet the specified goal, apply
| the right transformation, use the right visualization type, use
| the right data encoding, and use the right aesthetics (e.g.,
| ensure axis are legible). The transformations you apply MUST be
| correct and the fields you use MUST be correct. The
| visualization CODE MUST BE CORRECT and MUST NOT CONTAIN ANY
| SYNTAX OR LOGIC ERRORS. You MUST first generate a brief plan
| for how you would solve the task e.g. what transformations you
| would apply e.g. if you need to construct a new column, what
| fields you would use, what visualization type you would use,
| what aesthetics you would use, etc. YOU MUST ALWAYS return code
| using the provided code template. DO NOT add notes or
| explanations."
| (https://github.com/microsoft/lida/blob/main/lida/components/...)
|
| They state in their prompts that things MUST be correct, and the
| model reports any transformations it applies to your data, so it
| might give you some insight into its logic that you can test
| against the data yourself.
| computerex wrote:
| Telling the LLM that it must do something is not a guarantee
| that it'll follow through.
| vykthur wrote:
| True. This is an open area of research. Tools like guidance
| (or other implementations of constrained decoding with LLMs
| [1,2]) will likely help with this problem.
|
| [1] A guidance language for controlling large language
| models. https://github.com/guidance-ai/guidance
|
| [2] Knowledge Infused Decoding
| https://arxiv.org/abs/2204.03084
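|
| A toy sketch of the general idea behind constrained decoding:
| at each step, candidates that would break the constraint are
| masked out before the best-scoring one is picked. Everything here
| is hypothetical. The "model" is a hard-coded score table rather
| than an LLM, and this is not the guidance API.

```python
# Hypothetical constraint: the output must be one of these chart types.
ALLOWED_OUTPUTS = ["bar", "line", "scatter"]


def fake_model_scores(prefix: str) -> dict[str, float]:
    """Stand-in for an LLM: returns a score for each possible next character."""
    scores = {ch: 0.1 for ch in "abcdefghijklmnopqrstuvwxyz"}
    scores["p"] = 0.9  # the model "wants" to say something like "pie"
    scores["s"] = 0.5
    return scores


def constrained_decode() -> str:
    """Greedy decoding restricted to prefixes of ALLOWED_OUTPUTS."""
    prefix = ""
    while prefix not in ALLOWED_OUTPUTS:
        # Characters that keep the prefix on a path to some allowed output.
        valid = {
            word[len(prefix)]
            for word in ALLOWED_OUTPUTS
            if word.startswith(prefix) and len(word) > len(prefix)
        }
        scores = fake_model_scores(prefix)
        # Mask: only valid characters compete, whatever the model prefers.
        prefix += max(valid, key=lambda ch: scores[ch])
    return prefix


unconstrained_first = max(fake_model_scores(""), key=fake_model_scores("").get)
result = constrained_decode()
print(unconstrained_first)  # p
print(result)               # scatter
```

| The constraint guarantees the output's form (here, a valid chart
| type), not that the choice is the right one for the data.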
| phillipcarter wrote:
| > Who is to say that the LLM won't add or remove data points to
| make the chart "look good"?
|
| I don't think you're thinking creatively enough here. A good
| system that makes use of these concepts (because it's a research
| project, not a product!) will likely ensure that actions the
| LLM takes are non-destructive and inherently undoable. For
| example, if the underlying data was changed by the LLM, you can
| statically verify that and show a warning, emit an error, or
| ... something else entirely!
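|
| One way such a safeguard could be sketched: run the generated
| code against a copy of the data and compare a hash taken before
| and after. Note that this is a runtime check rather than the
| static verification mentioned above, and the helper names
| (`fingerprint`, `run_checked`) and sample rows are made up for
| illustration.

```python
import copy
import hashlib
import json


def fingerprint(rows: list[dict]) -> str:
    """Stable hash of the data; any added, removed, or edited row changes it."""
    canonical = json.dumps(rows, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_checked(generated_code: str, rows: list[dict]) -> dict:
    """Run generated code against a copy and report whether it altered the data."""
    working = copy.deepcopy(rows)  # the caller's data is never touched
    before = fingerprint(working)
    namespace = {"rows": working}
    exec(generated_code, namespace)  # NOTE: exec is not a sandbox; illustration only
    after = fingerprint(namespace["rows"])
    return {"data_changed": before != after, "rows": namespace["rows"]}


# Made-up sample data and made-up "generated" snippets.
cars = [{"horsepower": 130, "mpg": 18.0}, {"horsepower": 95, "mpg": 24.0}]

honest = run_checked("xs = [r['horsepower'] for r in rows]", cars)
tampered = run_checked("rows.append({'horsepower': 200, 'mpg': 40.0})", cars)

print(honest["data_changed"])    # False
print(tampered["data_changed"])  # True
print(len(cars))                 # 2
```

| Because the code only ever sees a deep copy, a failed check can
| be handled by discarding the result and showing a warning.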
| lmeyerov wrote:
| Agreed. Our customers on the regulated side cannot use an
| unexplainable UI like that _by law_.
|
| We take a middle ground with louie.ai of showing the database
| queries, data transforms, chart config, and any other decision
| or generation. It's nice being able to watch & check each step
| and then write in natural language what you want changed, so it
| ends up feeling more like the easier side of pair programming
| than a black box.
| mdorazio wrote:
| Am I missing something here? From the video and examples this
| looks like it's helping you make Excel charts with less suck
| (slightly stylized), not really building what I would consider
| "infographics" in the traditional marketing sense. I guess it
| counts as visualizations, but not what I was expecting.
| [deleted]
| processing wrote:
| Weird landing page - I'm expecting to see infographic examples
| but the images I see are of people looking at screens and smiling
| 2devnull wrote:
| This would really be something if you could just give it voice
| commands. Typing is absurd amidst all this wonderful automation!
| f6v wrote:
| Uncanny fingers.
| nologic01 wrote:
| Excellent technical work but subject to the same _major_
| question marks around the morality and legality of LLM business
| models. From the discussion section:
|
| > Low Resource Grammars: ... LIDA depends on the underlying LLMs
| having some knowledge of visualization grammars as represented in
| text and code in its training dataset (e.g., examples of Altair,
| Vega, Vega-Lite, GGPLot, Matplotlib, represented in Github,
| Stackoverflow, etc.). For visualization grammars not well
| represented in these datasets (e.g., tools like Tableau, PowerBI,
| etc., that have graphical user interfaces as opposed to code
| representations), the performance of LIDA may be limited
| without additional model fine-tuning or translation.
|
| In other words, open source programmatic visualizations are
| required to feed the LLM, which then can, e.g., be licensed to
| corporates to accelerate various internal exploratory data
| analyses. A win-win for corporates and LLM providers.
|
| Spot the loser.
| Incolotopo wrote:
| And what in particular is novel or unique about the general
| 'issue' you mention?
|
| Most companies use open source in one way or another.
|
| Nonetheless, a company like MS has probably already built
| visualizers purely commercially (see Excel) and/or is absolutely
| able to write them itself.
| nologic01 wrote:
| I am not sure what you are talking about.
|
| If I release a novel visualization library on github under
| some open source license I want it to be attributed to me. I
| don't want some specialized LLM to be lifting and offering
| the same visualization concepts to unnamed corporates for a
| hefty fee without me ever even knowing about it and these
| corporates pretending they don't know where that concept is
| coming from.
|
| It is your choice whether you think that is a problem and how
| "novel" it is. Theft after all has a very long history.
| cooperaustinj wrote:
| Everything you described has been possible since the dawn of
| intellectual property. Just replace LLM with "person".
|
| Furthermore, it isn't theft to learn from others' work and
| reproduce similar qualities.
| nologic01 wrote:
| Possible is not the same as admissible.
|
| Good to know that the prevailing commercial tech culture
| now sees plagiarism and stealing ideas without
| attribution as the modern way of doing business and hopes
| that dressing things up under some algorithmic veil will
| hide the act.
|
| I guess the pit of moral decline has no bottom. The
| consolation is that theft has never been the road to
| wealth. Once the plundering is over the only thing that
| is left is a wasteland.
|
| It seems that Microsoft has finally found a way to kill
| the open source "cancer".
| andybak wrote:
| I'm afraid I'm just unclear on exactly what part of this
| you argue is crossing a moral line.
|
| I.e. what is being stolen without attribution? I'm
| genuinely not getting what you mean in this specific
| case.
| nologic01 wrote:
| Limited visualization grammar means that any non-trivial
| visualization request will be lifting a particular
| solution, more or less verbatim.
| ryanklee wrote:
| I don't see how it's possible to show that the solution
| is lifted by the LLM as opposed to arrived at by the
| LLM.
|
| It seems to me that such solutions are soon to be within
| the set potentially constructed by an LLM.
| nologic01 wrote:
| As they say, people are unwilling to understand something
| if their monetary gain depends on not understanding it.
|
| Let me break it down for you. If I ask for a
| visualization that squares the circle and there is one
| repo that has an example of squaring the circle, the LLM
| will "arrive" at a way of squaring the circle.
| ryanklee wrote:
| That's not really answering my question.
|
| If (1) an LLM is able to arrive at solutions in the same
| class of difficulty as the solution for the target
| problem and (2) it's not possible to establish the
| provenance of the solution actually offered by the LLM,
| then what's the argument for assuming that the solution
| is based on IP rather than constructive reasoning?
| monkeydust wrote:
| Was playing with the library this morning; the interesting part
| to me was the 'goal explorer', which generates the questions to
| ask of the data.
|
| Keen to see more research into this part, especially making the
| questions more specific to the dataset in question and overlaying
| real-world situations.
| w-m wrote:
| Last week I helped someone organize and analyze their data in
| Excel. As I'm using Excel only once every couple of years, I had
| to rewatch the wonderful "You Suck at Excel with Joel Spolsky" to
| be productive again. Now seeing this announcement page, I was
| immediately reminded of the mini-rant towards the end of the
| video [0]:
|
| > On average, once every three months, there's a startup that
| makes a thing that they say is going to be amazing, and it's just
| PivotTables. They're like, "It works with Excel, and it does this
| amazing consolidation, and slicing and dicing of all your data,
| and it's amazing, and we're going to make a startup. I'm going to
| sell this for four hundred ninety-five dollars." And that happens
| at least once every three months. The trouble is, the VCs usually
| know about PivotTables.
|
| Of course this product goes a little further, using an LLM to
| suggest which columns to analyze and chart. But it's quite
| funny to me that this Microsoft Research product is reinventing
| the PivotTable (+PivotChart) part with Python and Pandas.
|
| [0]: https://youtu.be/0nbkaYsR94c?si=kkfFHZ_fyGmG3Lnj&t=2988
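|
| For readers who don't use Excel, the core of what a pivot table
| computes can be sketched in a few lines of plain Python. The
| records and the `pivot_sum` helper are made up for illustration
| and are not taken from LIDA.

```python
from collections import defaultdict

# Made-up sales records, purely for illustration.
records = [
    {"region": "East", "quarter": "Q1", "sales": 100},
    {"region": "East", "quarter": "Q2", "sales": 150},
    {"region": "West", "quarter": "Q1", "sales": 80},
    {"region": "West", "quarter": "Q1", "sales": 20},
]


def pivot_sum(rows, index, columns, values):
    """Group rows by (index, columns) and sum the values column."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[index]][row[columns]] += row[values]
    # Convert nested defaultdicts to plain dicts for clean printing.
    return {key: dict(cells) for key, cells in table.items()}


table = pivot_sum(records, index="region", columns="quarter", values="sales")
print(table)
# {'East': {'Q1': 100, 'Q2': 150}, 'West': {'Q1': 100}}
```

| With pandas, the equivalent aggregation would be a single call
| along the lines of `pandas.pivot_table(df, index="region",
| columns="quarter", values="sales", aggfunc="sum")`.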
| beebmam wrote:
| It's impossible to use Excel for big data; the application has
| soft limits due to responsiveness.
| Incolotopo wrote:
| To nitpick here: it's not reinventing if it's new.
|
| And the focus of this research was probably not to invent
| PivotTables but the Interface for these through LLMs
| jgalt212 wrote:
| > Interface for these through LLMs
|
| no code pivot tables?
| smcleod wrote:
| Pivollama
| bugglebeetle wrote:
| Even Microsoft has to know Excel is shit software for large
| datasets. I can't even get it to do a VLOOKUP correctly half the
| time.
| dylan604 wrote:
| But nobody today is going to read, let alone promote, a blog
| about pivot tables. Sprinkle in LLM references, and the fad
| wave riders will sing its praises.
___________________________________________________________________
(page generated 2023-08-29 23:01 UTC)