[HN Gopher] Automatic Generation of Visualizations and Infograph...
       ___________________________________________________________________
        
       Automatic Generation of Visualizations and Infographics with LLMs
        
       Author : monkeydust
       Score  : 154 points
       Date   : 2023-08-29 09:30 UTC (13 hours ago)
        
 (HTM) web link (microsoft.github.io)
 (TXT) w3m dump (microsoft.github.io)
        
       | dylanjcastillo wrote:
       | Super cool.
       | 
       | Here are the viz-related prompts (generation, editing, etc.), for
       | those interested:
       | https://github.com/microsoft/lida/tree/main/lida/components/...
       | 
       | I built a tool that lets you use GPT to analyze data and build
       | interactive graphs on the browser
       | (https://deepsheet.dylancastillo.co/). I may try to adapt it to
       | use LIDA or a similar approach.
        
       | ilovefood wrote:
       | I took this a step further, turning charts into infographics
       | using stable diffusion (SDXL)
       | 
       | https://karimjedda.com/beautiful-data-visualizations-powered...
        
         | w-m wrote:
         | That's a really nice idea. Have you thought about making it a
         | product?
        
           | ilovefood wrote:
           | Thank you for the kind words. I wouldn't even know where to
           | start. There's a lot I can code and do, but making something
           | into a product is something I have no experience with.
           | 
           | What's your recommended approach? Even if open-source,
           | curious to learn.
        
             | thundergolfer wrote:
             | If the product got good enough you'd have a shot at selling
             | to Canva.
        
             | aliyeysides wrote:
             | Continue as an open source project by building features and
             | refining the product. Once you have enough users you can
             | start offering premium features and support to
             | organizations. Maybe even apply to an accelerator? Good
             | Luck!
        
               | ilovefood wrote:
               | Thanks for the positivity, I'll give it a shot.
        
             | ukuina wrote:
             | I would pay $5 for the occasional, chart-heavy month to
             | upload my graphs without proprietary detail and get a few
             | images to reroll through.
        
           | minimaxir wrote:
           | The technology for this type of generation (ControlNet) is
           | already open source, and it is relatively straightforward to
           | reproduce the charts demoed in that post without shenanigans.
           | 
           | There's no moat.
        
         | esafak wrote:
         | It's chartjunk and should be used sparingly (e.g., on covers)
         | rather than in the actual content. I think it would be used
         | frivolously in the hands of an undisciplined person.
         | 
         | https://en.wikipedia.org/wiki/Chartjunk
        
       | RH123 wrote:
       | I find the quality of the code really questionable:
       | 
       | system_prompt = """ You are a an experienced data analyst that
       | can annotate datasets. Your instructions are as follows: i)
       | ALWAYS generate the name of the dataset and the
       | dataset_description ii) ALWAYS generate a field description. iv.)
       | ALWAYS generate a semantic_type (a single word) for each field
       | given its values e.g. company, city, number, supplier, location,
       | gender, longitude, latitude, url, ip address, zip code, email,
       | etc You return an updated JSON dictionary without any preamble or
       | explanation. """
       | 
       | Some spelling errors, and where is number 3?
        
         | ukuina wrote:
         | All you need is attention?
        
       | mnky9800n wrote:
       | In the horsepower versus mpg example, every generated result
       | creates new data on the plot that isn't there in the original
       | plot. This is terrible.
        
       | itronitron wrote:
       | They should get rid of the infographer module as it really
       | undermines the rest of the work.
        
       | jgalt212 wrote:
       | I sort of know what I'm doing with data, so I don't want LLMs
       | building any models for me, but I do like the concept of making
       | my lame visualizations look more professional and slicker.
        
       | dontupvoteme wrote:
       | Ironically I've found GPT to be pretty terrible with plotting
       | libraries like plotly/dash or even matplotlib compared to just
       | about anything else in python.
        
         | pplonski86 wrote:
         | I wrote a simple wrapper around Matplotlib and
         | ChatGPT-3.5-turbo. The LLM response is Python code that is
         | executed to produce charts. It works very nicely. Here is a repo
         | https://github.com/mljar/plotai - you will find two videos in
         | the readme. Maybe you should work on your prompts?
        
           | dontupvoteme wrote:
           | Huh, neat. It never really bothered me enough, or was important
           | enough, to spend time on it specifically, since I was able to
           | just hit a button to send them back for fixing, but it's good
           | to see the extra passes aren't explicitly needed.
        
       | javajosh wrote:
       | No, absolutely not. How can you trust the output from such a
       | black box system? Who is to say that the LLM won't add or remove
       | data points to make the chart "look good"? Heaven help us if
       | decision makers start taking this output seriously. But of course
       | they will, because the charts will look professional and
       | plausible, because that's what the prompt requires.
        
         | pizza wrote:
         | How do you trust matplotlib? Same way: if you need to audit
         | plots, audit the generated source code.
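
One way to make "audit the generated source code" partly mechanical is a
structural screen that runs before the code does. The sketch below is an
assumption-laden illustration, not part of LIDA: the allow-list, deny-list,
and the function name audit_generated_code are all hypothetical. It only
flags suspicious imports and built-in calls; it cannot tell whether a
transformation is faithful to the data.

```python
import ast

# Hypothetical allow-list: modules a generated plotting script may import.
ALLOWED_IMPORTS = {"matplotlib", "pandas", "numpy", "seaborn"}
# Hypothetical deny-list: built-in calls that chart code should not need.
FORBIDDEN_CALLS = {"eval", "exec", "open", "__import__", "compile"}


def audit_generated_code(source: str) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    tree = ast.parse(source)  # raises SyntaxError if the code does not parse
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root not in ALLOWED_IMPORTS:
                    findings.append(f"line {node.lineno}: import of '{alias.name}'")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            if root not in ALLOWED_IMPORTS:
                findings.append(f"line {node.lineno}: import from '{node.module}'")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only bare-name calls such as open(...) are checked here.
            if node.func.id in FORBIDDEN_CALLS:
                findings.append(f"line {node.lineno}: call to '{node.func.id}'")
    return findings
```

A reviewer would still need to read whatever passes this screen; it narrows
the audit rather than replacing it.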
        
           | esafak wrote:
           | So instead of auditing MPL once (or never because MPL doesn't
           | have a habit of broken output) I should audit the output of
           | this LLM for every query because it _does_ have a habit of
           | hallucinating?
        
         | vimesy wrote:
         | "You are a helpful assistant highly skilled in writing PERFECT
         | code for visualizations. Given some code template, you complete
         | the template to generate a visualization given the dataset and
         | the goal described. The code you write MUST FOLLOW
         | VISUALIZATION BEST PRACTICES ie. meet the specified goal, apply
         | the right transformation, use the right visualization type, use
         | the right data encoding, and use the right aesthetics (e.g.,
         | ensure axis are legible). The transformations you apply MUST be
         | correct and the fields you use MUST be correct. The
         | visualization CODE MUST BE CORRECT and MUST NOT CONTAIN ANY
         | SYNTAX OR LOGIC ERRORS. You MUST first generate a brief plan
         | for how you would solve the task e.g. what transformations you
         | would apply e.g. if you need to construct a new column, what
         | fields you would use, what visualization type you would use,
         | what aesthetics you would use, etc. YOU MUST ALWAYS return code
         | using the provided code template. DO NOT add notes or
         | explanations."
         | (https://github.com/microsoft/lida/blob/main/lida/components/...)
         | 
         | Their prompts insist that things MUST be correct, and the model
         | reports any transformations it applies to your data. That might
         | give you some insight into its logic, which you can test
         | yourself against the data.
        
           | computerex wrote:
           | Telling the LLM that it must do something is not a guarantee
           | that it'll follow through.
        
             | vykthur wrote:
             | True. This is an open area of research. Tools like guidance
             | (or other implementations of constrained decoding with llms
             | [1,2]) will likely help improve this problem.
             | 
             | [1] A guidance language for controlling large language
             | models. https://github.com/guidance-ai/guidance
             | 
             | [2] Knowledge Infused Decoding
             | https://arxiv.org/abs/2204.03084
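
The general idea behind constrained decoding can be shown with a toy: at
each step, only tokens that keep the output inside a set of valid sequences
are eligible, and the model's scores choose among those. This is a minimal
sketch under stated assumptions, not the guidance API and not the method of
either reference above. The names constrained_greedy_decode and score_fn are
hypothetical, and score_fn stands in for a real model's next-token score.

```python
def constrained_greedy_decode(score_fn, allowed_sequences, max_steps=10):
    """Greedy decoding restricted to a fixed set of valid token sequences.

    score_fn(prefix, token) -> float is a stand-in for a model's score of
    `token` given the tokens chosen so far.
    """
    prefix = []
    for _ in range(max_steps):
        # Tokens that keep the prefix consistent with some allowed sequence.
        candidates = {
            seq[len(prefix)]
            for seq in allowed_sequences
            if len(seq) > len(prefix) and list(seq[: len(prefix)]) == prefix
        }
        if not candidates:
            break  # nothing can extend the prefix, so the output is complete
        # The model only gets to rank the permitted tokens.
        best = max(sorted(candidates), key=lambda tok: score_fn(prefix, tok))
        prefix.append(best)
    return prefix
```

Even if the stand-in model scores a token outside the allowed set highest,
that token can never be emitted, which is a structural guarantee that a
"MUST" in the prompt does not provide.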
        
         | phillipcarter wrote:
         | > Who is to say that the LLM won't add or remove data points to
         | make the chart "look good"?
         | 
         | I don't think you're thinking creatively enough here. A good
         | system that makes use of these concepts (because it's a research
         | project, not a product!) will likely ensure that actions the
         | LLM takes are non-destructive and inherently undoable. For
         | example, if the underlying data was changed by the LLM, you can
         | statically verify that and show a warning, emit an error, or
         | ... something else entirely!
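
A rough sketch of the non-destructive idea follows. It is a runtime check
rather than the static verification mentioned above, and it is not taken
from LIDA: the function names and the row format (a list of
JSON-serializable dicts) are assumptions made for illustration. The
generated step only ever sees a copy, and the data it returns for plotting
is fingerprinted against the original.

```python
import copy
import hashlib
import json


def fingerprint(rows):
    """Stable SHA-256 digest of a list of JSON-serializable row dicts."""
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_non_destructively(rows, step):
    """Run `step` on a deep copy of `rows`.

    Returns (result, data_changed), where data_changed is True when the
    rows handed back for plotting differ from the original input.
    """
    before = fingerprint(rows)
    working_copy = copy.deepcopy(rows)  # the step cannot mutate the original
    result = step(working_copy)
    data_changed = fingerprint(result) != before
    return result, data_changed
```

A legitimate aggregation would also set the flag, so it is best read as
"the plotted data differs from the input, show the user why" rather than as
proof of fabrication.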
        
         | lmeyerov wrote:
         | Agreed. Our customers on the regulated side cannot use an
         | unexplainable UI like that _by law_.
         | 
         | We take a middle ground with louie.ai of showing the database
         | queries, data transforms, chart config, and any other decision
         | or generation. It's nice being able to watch & check each step
         | and then write in natural language what you want changed, so
         | ends up feeling more like the easier side of pair programming
         | than a blackbox.
        
       | mdorazio wrote:
       | Am I missing something here? From the video and examples this
       | looks like it's helping you make Excel charts with less suck
       | (slightly stylized), not really building what I would consider
       | "infographics" in the traditional marketing sense. I guess it
       | counts as visualizations, but not what I was expecting.
        
         | [deleted]
        
       | processing wrote:
       | Weird landing page - I'm expecting to see infographic examples
       | but the images I see are of people looking at screens and smiling
        
       | 2devnull wrote:
       | This would really be something if you could just give it voice
       | commands. Typing is absurd amidst all this wonderful automation!
        
       | f6v wrote:
       | Uncanny fingers.
        
       | nologic01 wrote:
       | Excellent technical work but subject to the same _major_
       | question marks around the morality and legality of LLM business
       | models. From the discussion section:
       | 
       | > Low Resource Grammars: ... LIDA depends on the underlying LLMs
       | having some knowledge of visualization grammars as represented in
       | text and code in its training dataset (e.g., examples of Altair,
       | Vega, Vega-Lite, GGPLot, Matplotlib, represented in Github,
       | Stackoverflow, etc.). For visualization grammars not well
       | represented in these datasets (e.g., tools like Tableau, PowerBI,
       | etc., that have graphical user interfaces as opposed to code
       | representations), the performance of LIDA may be limited
       | without additional model fine-tuning or translation.
       | 
       | In other words, open source programmatic visualizations are
       | required to feed the LLM, which then can, e.g., be licensed to
       | corporates to accelerate various internal exploratory data
       | analyses. A win-win for corporates and LLM providers.
       | 
       | Spot the loser.
        
         | Incolotopo wrote:
         | And what in particular is now novel or unique to the general
         | 'issue' you mention?
         | 
         | Most companies use open source in one way or another.
         | 
         | Nonetheless, a company like MS has probably already built
         | visualizers purely commercially (see Excel) and/or is absolutely
         | able to write them itself.
        
           | nologic01 wrote:
           | I am not sure what you are talking about.
           | 
           | If I release a novel visualization library on github under
           | some open source license I want it to be attributed to me. I
           | don't want some specialized LLM to be lifting and offering
           | the same visualization concepts to unnamed corporates for a
           | hefty fee without me ever even knowing about it and these
           | corporates pretending they don't know where that concept is
           | coming from.
           | 
           | It is your choice whether you think that is a problem and how
           | "novel" it is. Theft after all has a very long history.
        
             | cooperaustinj wrote:
             | Everything you described has been possible since the dawn of
             | intellectual property. Just replace LLM with "person".
             | 
             | Furthermore, it isn't theft to learn from others' work and
             | reproduce similar qualities.
        
               | nologic01 wrote:
               | Possible is not the same as admissible.
               | 
               | Good to know that the prevailing commercial tech culture
               | now sees plagiarism and stealing ideas without
               | attribution as the modern way of doing business and hopes
               | that dressing things up under some algorithmic veil will
               | hide the act.
               | 
               | I guess the pit of moral decline has no bottom. The
               | consolation is that theft has never been the road to
               | wealth. Once the plundering is over the only thing that
               | is left is a wasteland.
               | 
               | It seems that Microsoft has finally found a way to kill
               | the open source "cancer".
        
               | andybak wrote:
               | I'm afraid I'm just unclear on exactly what part of this
               | you argue is crossing a moral line.
               | 
               | I.e. what is being stolen without attribution? I'm
               | genuinely not getting what you mean in this specific
               | case.
        
               | nologic01 wrote:
               | Limited visualization grammar means that any non-trivial
               | visualization request will be lifting a particular
               | solution, more or less verbatim.
        
               | ryanklee wrote:
               | I don't see how it's possible to show that the solution
               | is lifted by the LLM as opposed to arrived at by the
               | LLM.
               | 
               | It seems to me that such solutions are soon to be within
               | the set potentially constructed by an LLM.
        
               | nologic01 wrote:
               | As they say, people are unwilling to understand something
               | if their monetary gain depends on not understanding it.
               | 
               | Let me break it down for you. If I ask for a
               | visualization that squares the circle and there is one
               | repo that has an example of squaring the circle, the LLM
               | will "arrive" at a way of squaring the circle.
        
               | ryanklee wrote:
               | That's not really answering my question.
               | 
               | If (1) an LLM is able to arrive at solutions in the same
               | class of difficulty as the solution for the target
               | problem and (2) it's not possible to establish the
               | provenance of the solution actually offered by the LLM,
               | then what's the argument for assuming that the solution
               | is based on IP rather than constructive reasoning?
        
       | monkeydust wrote:
       | Was playing with the library this morning, the interesting part
       | to me was the 'goal explorer' which generates the questions to
       | ask of the data.
       | 
       | Keen to see more research into this part, especially making the
       | questions more specific to the dataset in question and overlaying
       | real-world situations.
        
       | w-m wrote:
       | Last week I helped someone organize and analyze their data in
       | Excel. As I'm using Excel only once every couple of years, I had
       | to rewatch the wonderful "You Suck at Excel with Joel Spolsky" to
       | be productive again. Now seeing this announcement page, I was
       | immediately reminded of the mini-rant towards the end of the
       | video [0]:
       | 
       | > On average, once every three months, there's a startup that
       | makes a thing that they say is going to be amazing, and it's just
       | PivotTables. They're like, "It works with Excel, and it does this
       | amazing consolidation, and slicing and dicing of all your data,
       | and it's amazing, and we're going to make a startup. I'm going to
       | sell this for four hundred ninety-five dollars." And that happens
       | at least once every three months. The trouble is, the VCs usually
       | know about PivotTables.
       | 
       | Of course this product goes a little further, suggesting which
       | columns to analyze and chart with an LLM. But it's quite
       | funny to me that this Microsoft Research product is reinventing
       | the PivotTable (+PivotChart) part with Python and Pandas.
       | 
       | [0]: https://youtu.be/0nbkaYsR94c?si=kkfFHZ_fyGmG3Lnj&t=2988
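
For readers who have not used one, the core of a pivot table is a grouped
aggregation over two keys. The sketch below uses only the standard library
and made-up sales rows; the function name pivot_sum is hypothetical and this
is not LIDA's code. In pandas the equivalent would be
DataFrame.pivot_table with index, columns, values, and aggfunc="sum".

```python
from collections import defaultdict


def pivot_sum(rows, index, columns, values):
    """Sum `values` for every (index, columns) pair found in `rows`."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[columns]] += row[values]
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {key: dict(cells) for key, cells in table.items()}


# Hypothetical example data, invented for illustration only.
sales = [
    {"region": "East", "quarter": "Q1", "revenue": 100.0},
    {"region": "East", "quarter": "Q1", "revenue": 50.0},
    {"region": "East", "quarter": "Q2", "revenue": 70.0},
    {"region": "West", "quarter": "Q1", "revenue": 30.0},
]
by_region = pivot_sum(sales, "region", "quarter", "revenue")
```

What an LLM layer adds on top of this is the choice of index, columns, and
values, which is the part the goal explorer and chart suggestions automate.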
        
         | beebmam wrote:
         | It's impossible to use Excel for big data; the application has
         | soft limits due to responsiveness.
        
         | Incolotopo wrote:
         | To nitpick here: it's not reinventing if it's new.
         | 
         | And the focus of this research was probably not to invent
         | PivotTables but an interface to them through LLMs.
        
           | jgalt212 wrote:
           | > Interface for these through LLMs
           | 
           | no code pivot tables?
        
             | smcleod wrote:
             | Pivollama
        
         | bugglebeetle wrote:
         | Even Microsoft has to know Excel is shit software for large
         | datasets. I can't even get it to do a VLOOKUP correctly half the
         | time.
        
         | dylan604 wrote:
         | But nobody today is going to read, let alone promote, a blog
         | about pivot tables. Sprinkle in LLM references, and the fad
         | wave riders will sing its praises.
        
       ___________________________________________________________________
       (page generated 2023-08-29 23:01 UTC)