[HN Gopher] TinyTroupe, a new LLM-powered multiagent persona sim...
       ___________________________________________________________________
        
       TinyTroupe, a new LLM-powered multiagent persona simulation Python
       library
        
       Author : paulosalem
       Score  : 115 points
       Date   : 2024-11-11 16:04 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | xrd wrote:
       | I love jupyter notebooks. And, I'm amazed that a company like
       | Microsoft would put a notebook front and center that starts off
        | with a bunch of errors. Not a good look. I really think you can
        | improve your AI marketing a lot by creating compelling jupyter
        | notebooks. Unsloth is a great example of the right way.
       | 
       | https://github.com/microsoft/TinyTroupe/blob/main/examples/a...
        
         | highcountess wrote:
          | I am glad I was not the only one that was taken aback a bit by
          | that. I am not one to be too critical about loose ends or
          | roughness in things that are provided for free, where I have
          | the ability to contribute a change myself, but it is a bit
          | surprising that Microsoft would not have QA on this,
          | considering the image they are currently trying to build.
        
       | turing_complete wrote:
       | Written by George Hotz?
        
         | minimaxir wrote:
         | George Hotz does not have a monopoly on the word "Tiny" with
         | respect to AI.
        
         | ttul wrote:
         | Something tells me he will never work for Microsoft. Even
         | though they would probably love to employ him.
        
       | uniqueuid wrote:
        | Needs OpenAI or Azure APIs. I wonder if it's possible to just
        | use any OpenAI-compatible local provider.
        
         | uniqueuid wrote:
          | Yup, looks like their Azure API configuration is just a
          | generic wrapper around the OpenAI API into which you can plug
          | any endpoint URL. Nice.
         | 
         | https://github.com/microsoft/TinyTroupe/blob/7ae16568ad1c4de...
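          | 
          | For anyone who wants to try that, here's a minimal sketch of
          | the general idea using the standard openai Python client
          | pointed at a local OpenAI-compatible server; the URL, port,
          | and model name below are placeholders, and the config keys
          | TinyTroupe itself reads may differ:
          | 
          |     # Sketch: point an OpenAI-style client at a local
          |     # OpenAI-compatible server (LM Studio, vLLM, llama.cpp).
          |     from openai import OpenAI
          | 
          |     client = OpenAI(
          |         base_url="http://localhost:1234/v1",  # assumed endpoint
          |         api_key="not-needed-locally",  # many local servers ignore it
          |     )
          |     resp = client.chat.completions.create(
          |         model="local-model",  # whatever your server exposes
          |         messages=[{"role": "user", "content": "Say hello."}],
          |     )
          |     print(resp.choices[0].message.content)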
        
       | simonw wrote:
       | It looks like this defaults to GPT-4o:
       | https://github.com/microsoft/TinyTroupe/blob/7ae16568ad1c4de...
       | 
       | If you're going to try this out I would _strongly_ recommend
        | running it against GPT-4o mini instead. Mini is 16x cheaper and
        | I'm confident the results you'll get out of it won't be 1/16th
        | as good for this kind of experiment.
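        | 
        | As a rough sketch of what that price gap means (these are
        | late-2024 list prices in USD per million tokens, and they may
        | have changed since):
        | 
        |     # Rough cost comparison; the prices are assumptions, so
        |     # check current pricing before relying on them.
        |     PRICES = {
        |         "gpt-4o":      {"in": 2.50, "out": 10.00},
        |         "gpt-4o-mini": {"in": 0.15, "out": 0.60},
        |     }
        | 
        |     def run_cost(model, in_toks, out_toks):
        |         p = PRICES[model]
        |         return (in_toks * p["in"] + out_toks * p["out"]) / 1e6
        | 
        |     # e.g. a simulation using 2M input and 0.5M output tokens:
        |     print(run_cost("gpt-4o", 2e6, 0.5e6))       # ~$10.00
        |     print(run_cost("gpt-4o-mini", 2e6, 0.5e6))  # ~$0.60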
        
         | ttul wrote:
         | I suppose the Microsoft researchers default to 4o because the
         | models are free in their environment...
        
       | cen4 wrote:
       | Might help GRRM finish his books.
        
       | simonw wrote:
       | Here's a quick way to start this running if you're using uv:
        |     cd /tmp
        |     git clone https://github.com/microsoft/tinytroupe
        |     cd tinytroupe
        |     OPENAI_API_KEY='your-key-here' uv run jupyter notebook
       | 
       | I used this pattern because my OpenAI key is stashed in a LLM-
        | managed JSON file:
        | 
        |     OPENAI_API_KEY="$(jq -r '.openai' "$(dirname "$(llm logs path)")/keys.json")" \
        |       uv run jupyter notebook
       | 
       | (Which inspired me to add a new LLM feature: llm keys get openai
       | - https://github.com/simonw/llm/issues/623)
        
       | dragonwriter wrote:
       | This seems fundamentally unsuitable for its stated purpose, which
       | is "understanding human behavior".
       | 
        | While it may, as it says, produce "convincing interactions",
        | there is no basis at all presented for believing it produces an
        | _accurate_ model of human behavior, so using it to "understand
        | human behavior" is at best willful self-deception. More
        | probably, with a little effort at tweaking inputs to produce
        | the desired results, in the hands of someone who presents it as
        | "enlightening productivity and business scenarios" it will most
        | often be an engine for simply manufacturing support for a pre-
        | selected option.
       | 
        | It is certainly easier and cheaper than exploring actual human
        | interactions to understand human behavior, but then so is just
        | using a magic 8-ball, which may be less _convincing_, but, for
        | all the evidence presented, is just as _accurate_.
        
         | potatoman22 wrote:
          | I wonder how one could measure how human-like the agents'
         | opinions and interactions are? There's a ton of value in
         | simulating preferences, but you're right that it's hard to know
         | if the simulation is accurate.
         | 
         | I have a hunch that, through sampling many AI "opinions," you
         | can arrive at something like the wisdom of the crowd, but
         | again, it's hard to validate.
        
           | kaibee wrote:
            | Caveat: I don't actually work in ML, I just read a lot. If
            | someone who is a real expert can tell me whether my
            | assessment here is correct, please let me know.
           | 
           | > I have a hunch that, through sampling many AI "opinions,"
           | you can arrive at something like the wisdom of the crowd, but
           | again, it's hard to validate.
           | 
           | That's what an AI model already is.
           | 
           | Let's say you had 10 temperature sensors on a mountain and
           | you logged their data at time T.
           | 
           | If you take the average of those 10 readings, you get a
           | 'wisdom of the crowds' from the temperature sensors, which
           | you can model as an avg + std of your 10 real measurements.
           | 
           | You can then sample 10 new points from the normal
           | distribution defined by that avg + std. Cool for generating
           | new similar data, but it doesn't really tell you anything you
           | didn't already know.
           | 
           | Trying to get 'wisdom of crowds' through repeated querying of
           | the AI model is equivalent to sampling 10 new points at
           | random from your distribution. You'll get values that are
           | like your original distribution of true values (w/ some
           | outliers) but there's probably a better way to get at what
           | you're looking to extract from the model.
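            | 
            | A tiny numerical sketch of that point (toy numbers, purely
            | illustrative):
            | 
            |     # Resampling from a fitted distribution adds no
            |     # information beyond the original measurements.
            |     import numpy as np
            | 
            |     rng = np.random.default_rng(0)
            |     readings = rng.normal(5.0, 1.5, size=10)  # 10 sensor logs
            | 
            |     mu, sigma = readings.mean(), readings.std()
            |     resampled = rng.normal(mu, sigma, size=10)  # 10 new points
            | 
            |     print(mu, sigma)                          # what we knew
            |     print(resampled.mean(), resampled.std())  # a noisy copy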
        
             | ori_b wrote:
             | It's worse than that. LLMs have been tuned carefully to
             | mostly produce output that will be inoffensive in a
             | corporate environment. This isn't an unbiased sampling.
        
               | yarp wrote:
               | Could be interesting if used with many different llms at
               | once
        
               | jfactorial wrote:
                | True for consumer products like ChatGPT but there are
                | plenty of models that are not censored.
                | https://huggingface.co/models?sort=trending&search=uncensore...
        
               | isaacremuant wrote:
               | No. The censoring has already been done systematically by
               | tech corporations at the behest of political agents that
               | have power over them.
               | 
                | You only have to look at opinions about covid policies
                | to realize you won't get a good representation, because
                | opinions will be deemed "misinformation" by the powers
                | that are vested in that being the case. Increasingly,
                | criticism of government policy can be conflated with
                | some sort of crime whose interpretation is absolutely
                | up to some government institution, so people self-
                | censor, companies censor just in case, and the Overton
                | window gets narrower.
               | 
               | LLMs are awesome but they will only represent what
               | they're trained on and what they're trained on only
               | represents what's allowed to be in the mainstream
               | discourse.
        
           | bpshaver wrote:
           | Section 6, "Controlled Evaluation," answers that question:
           | https://arxiv.org/pdf/2304.03442
        
         | michaelmior wrote:
          | I would tend to agree, although for something like testing
          | ads it seems like it would be relatively straightforward to
          | produce an A/B test that compares the performance of two ads
          | relative to TinyTroupe's predictions.
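          | 
          | A hedged sketch of what that comparison could look like (the
          | numbers and the choice of test are hypothetical):
          | 
          |     # Check the simulation's predicted winner against a real
          |     # A/B test using a two-proportion z-test.
          |     from math import sqrt
          | 
          |     def two_prop_z(clicks_a, n_a, clicks_b, n_b):
          |         p_a, p_b = clicks_a / n_a, clicks_b / n_b
          |         p = (clicks_a + clicks_b) / (n_a + n_b)  # pooled rate
          |         se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
          |         return (p_a - p_b) / se
          | 
          |     # Made-up field results for ads A and B:
          |     z = two_prop_z(clicks_a=120, n_a=5000, clicks_b=90, n_b=5000)
          |     print(z)  # |z| > 1.96 => significant at ~5%; then check
          |               # that the sign matches the personas' preference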
        
         | A4ET8a8uTh0 wrote:
          | I did not test this library, so I can't argue from that
          | perspective (I think I will though; it does seem
          | interesting).
         | 
         | << "enlightening productivity and business scenarios" it will
         | be an engine for simply manufacturing support for a pre-
         | selected option.
         | 
         | In a sense, this is what training employees is all about. You
         | want to get them ready for various possible scenarios. For
         | recurring tasks that do require some human input, it does not
         | seem that far fetched.
         | 
         | << produce "convincing interactions"
         | 
          | This is the interesting part. Is convincing a bad thing if it
          | shows what a user would expect to see?
        
           | mindcrime wrote:
           | > it will be an engine for simply manufacturing support for a
           | pre-selected option.
           | 
           | There's nothing unique about this tool in that regard though.
           | Pretty much anything can be mis-used in that way -
           | spreadsheets, graphics/visualizations, statistical models,
           | etc. etc. Whether tools are actually used to support better
           | decision making, or simply to support pre-selected decisions,
           | is more about the culture of the organization and the mind-
           | set of its leaders.
        
             | A4ET8a8uTh0 wrote:
             | Agreed. At the end of the day, it is just another tool.
             | 
              | I think the issue is the human tendency to just rubber-
              | stamp whatever result is given. Not that long ago, few
              | questioned the results of a study; now there won't even
              | be underlying data to go back to and check whether
              | someone made an error. Naturally, this suggests we will
              | start seeing a lot of bad decisions, because human
              | operators do not stop and think about whether the
              | response makes sense.
             | 
             | That said, I am not sure what can be done about it.
        
             | dragonwriter wrote:
             | > There's nothing unique about this tool in that regard
             | though.
             | 
             | Sure, it's just part of an arms race where having a new
             | thing with a hot selling pitch to cover that up and put a
             | layer of buzzwords on top of it helps sell the results to
             | audiences who have started to see through the existing ways
             | of doing that.
        
               | mindcrime wrote:
               | I agree in general. I'm just not sure how much the "new
               | thing with a hot selling pitch" part even matters. At
               | least IME, at companies where the culture is such that
               | management just look for ways to add a sheen of
               | scientific respectability to their ad-hoc decisions,
               | nobody really questions the details. Management just put
               | the "thing" out there, hand-wave some "blah, blah"
               | around, everybody nods their heads, and things proceed as
               | they were always going to.
        
         | bsenftner wrote:
         | My first thought while reading was this would be a great
         | academic framework in the hands of PhD students with extremely
          | high attention to all the details and the ways those details
          | interact. But in the hands of any group or individual with a
          | less scientifically rigorous mindset, it's a construction set
          | for justifying practically anything. In the hands of biased
          | laypersons, it becomes a toolset for lying with statistics,
          | exponentially amplified into a nuclear weapon.
        
         | keeda wrote:
         | The source does not mention the underlying motivation (and it
         | really should), but I think this is it:
         | 
         | https://www.linkedin.com/posts/emollick_kind-of-a-big-deal-a...
         | 
         | "... a new paper shows GPT-4 simulates people well enough to
         | replicate social science experiments with high accuracy.
         | 
         | Note this is done by having the AI prompted to respond to
         | survey questions as a person given random demographic
         | characteristics & surveying thousands of "AI people," and works
         | for studies published after the knowledge cut-off of the AI
         | models."
         | 
         | A couple other posts along similar lines:
         | 
         | https://www.linkedin.com/posts/emollick_this-paper-suggests-...
         | 
         | "... LLMs automatically generate scientific hypotheses, and
          | then test those hypotheses with simulated AI human agents."
         | 
         | https://www.linkedin.com/posts/emollick_formula-for-neat-ai-...
         | 
         | "Applying Asch's conformity experiment to LLMs: they tend to
         | conform with the majority opinion, especially when they are
         | "uncertain." Having a devil's advocate mitigates this effect,
         | just as it does with people."
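          | 
          | Mechanically, that first result amounts to: sample random
          | demographic attributes, prompt the model to answer a survey
          | question in persona, and aggregate. A minimal sketch (the
          | persona fields, prompt wording, and model are my assumptions,
          | not the paper's exact protocol):
          | 
          |     # "Surveying AI people": random demographics -> persona
          |     # prompt -> aggregated answers. Illustrative only.
          |     import random
          |     from openai import OpenAI
          | 
          |     client = OpenAI()
          | 
          |     def survey_one(question):
          |         persona = (
          |             f"You are a {random.randint(18, 79)}-year-old "
          |             f"{random.choice(['man', 'woman'])} from a "
          |             f"{random.choice(['urban', 'suburban', 'rural'])} area."
          |         )
          |         resp = client.chat.completions.create(
          |             model="gpt-4o-mini",
          |             messages=[
          |                 {"role": "system", "content": persona},
          |                 {"role": "user",
          |                  "content": question + " Answer yes or no."},
          |             ],
          |         )
          |         return resp.choices[0].message.content.strip().lower()
          | 
          |     answers = [survey_one("Do you trust online reviews?")
          |                for _ in range(20)]
          |     print(sum(a.startswith("yes") for a in answers) / len(answers))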
        
       | oulipo wrote:
       | So fucking sad that the use of AI is for... manipulating more
       | humans into clicking on ads
       | 
        | Go get a fucking life, do something for the climate and repair
        | our society's social fabric instead
        
       | thoreaux wrote:
       | How do I get the schizophrenic version of this
        
         | GlomarGadaffi wrote:
         | Following
        
       | itishappy wrote:
       | Here's the punchline from the Product Brainstorming example,
       | imagining new AI-driven features to add to Microsoft Word:
       | 
       | > AI-driven context-aware assistant. Suggests writing styles or
       | tones based on the document's purpose and user's past
       | preferences, adapting to industry-specific jargon.
       | 
       | > Smart template system. Learns from user's editing patterns to
       | offer real-time suggestions for document structure and content.
       | 
       | > Automatic formatting and structuring for documents. Learns from
       | previous documents to suggest efficient layouts and ensure
       | compliance with standards like architectural specifications.
       | 
       | > Medical checker AI. Ensures compliance with healthcare
       | regulations and checks for medical accuracy, such as verifying
       | drug dosages and interactions.
       | 
       | > AI for building codes and compliance checks. Flags potential
       | issues and ensures document accuracy and confidentiality,
       | particularly useful for architects.
       | 
       | > Design checker AI for sustainable architecture. Includes a
       | database of materials for sustainable and cost-effective
       | architecture choices.
       | 
        | Right, so what's missing in Word is an AI-generated medical
       | compliance check that tracks drug interactions for you and an AI
       | architectural compliance and confidentiality... thing. Of course
       | these are all followed by a note that says "drawbacks: None."
       | Also, the penultimate line generated 7 examples but cut the
       | output off at 6.
       | 
       | The intermediate output isn't much better, generally restating
       | the same thing over and over and appending "in medicine" or "in
        | architecture." They quickly drop the context that this
        | discussion relates to word processors, in favor of discussing
        | how a generic industrial AI could help them. (Drug interactions
        | in Word, my word.)
       | 
       | Worth noting this is a Microsoft product generating ideas for a
       | different Microsoft product. I hope they vetted this within their
       | org.
       | 
       | As a proof of concept, this looks interesting! As a potentially
       | useful business insight tool this seems far out. I suppose this
       | might explain some of Microsoft's recent product decisions...
       | 
       | https://github.com/microsoft/TinyTroupe/blob/main/examples/p...
        
         | potatoman22 wrote:
         | That example is funny because 99% of doctors would not use Word
         | to write their notes (and not because it doesn't have this hot
         | new AI feature).
        
       | A4ET8a8uTh0 wrote:
        | Hmm, now let's see if there is an effort anywhere to link it to
        | a local llm.
        
         | ajcp wrote:
         | Just provide it the localhost:port for your instance of LM
         | Studio/text-generation-webui as the Azure OpenAI endpoint in
         | the config. Should work fine, but going to confirm now.
         | 
         | EDIT: Okay, this repo is a mess. They have "OpenAI" hardcoded
         | in so many places that it literally makes this useless for
          | working with Azure OpenAI Service OR any other OpenAI-style
         | API. That wouldn't be terrible once you fiddled with the config
         | IF they weren't importing the config BEFORE they set default
         | values...
        
       | thegabriele wrote:
        | I envision a future where ads are LLM-targeted. Is this worse
        | or better than what we have now?
        
       | libertine wrote:
       | Could this be applied to mass propaganda and disinformation
       | campaigns on social networks?
       | 
        | Like not only generating and testing narratives, but then even
        | using it for agents to generate engagement.
        | 
        | We've seen massive bot networks go unchecked on X to help tilt
        | election results, so this could probably be deployed there too.
        
         | Jimmc414 wrote:
         | > We've seen massive bot networks unchecked on X to help tilt
         | election results, so probably this could be deployed there too.
         | 
         | Do you have more details on this?
        
         | isaacremuant wrote:
         | Let me guess. These bot networks that influence elections are
         | from your political adversaries. Never from the party you
         | support or your government when they're in power.
         | 
         | The election results tilting talk is tired and hypocritical.
        
       | czbond wrote:
       | This is really cool. I can see quite a number of potential
       | applications.
        
       | GlomarGadaffi wrote:
       | Actually Sburb.
        
       ___________________________________________________________________
       (page generated 2024-11-11 23:01 UTC)