[HN Gopher] TinyTroupe, a new LLM-powered multiagent persona sim...
___________________________________________________________________
TinyTroupe, a new LLM-powered multiagent persona simulation Python
library
Author : paulosalem
Score : 115 points
Date : 2024-11-11 16:04 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| xrd wrote:
| I love Jupyter notebooks. And I'm amazed that a company like
| Microsoft would put a notebook front and center that starts off
| with a bunch of errors. Not a good look. I really think you could
| improve your AI marketing a lot by creating compelling Jupyter
| notebooks. Unsloth is a great example of the right way.
|
| https://github.com/microsoft/TinyTroupe/blob/main/examples/a...
| highcountess wrote:
| I am glad I was not the only one who was taken aback a bit by
| that. I am not one to be too critical about loose ends or
| roughness in things that are provided for free and that I could
| contribute a change to, but it is a bit surprising that
| Microsoft would not have QA on this, considering it ties into
| the current image they are trying to build.
| turing_complete wrote:
| Written by George Hotz?
| minimaxir wrote:
| George Hotz does not have a monopoly on the word "Tiny" with
| respect to AI.
| ttul wrote:
| Something tells me he will never work for Microsoft. Even
| though they would probably love to employ him.
| uniqueuid wrote:
| Needs OpenAI or Azure APIs. I wonder if it's possible to just use
| any OpenAI-compatible local provider.
| uniqueuid wrote:
| Yup, looks like their Azure API configuration is just a generic
| wrapper around the OpenAI API in which you can plug any endpoint
| URL. Nice.
|
| https://github.com/microsoft/TinyTroupe/blob/7ae16568ad1c4de...
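|
| For what "plug any endpoint URL" means in practice, here's a
| minimal sketch using the plain openai Python client (>= 1.0)
| rather than TinyTroupe's own wrapper; the base URL, key and
| model name are placeholders for whatever local server you run:
|
|     # Sketch only: plain openai client, not TinyTroupe's wrapper.
|     # base_url, api_key and model are placeholders for a local
|     # OpenAI-compatible server.
|     from openai import OpenAI
|
|     client = OpenAI(
|         base_url="http://localhost:1234/v1",  # hypothetical local endpoint
|         api_key="not-needed-locally",         # most local servers ignore it
|     )
|     resp = client.chat.completions.create(
|         model="local-model",  # placeholder model name
|         messages=[{"role": "user", "content": "Hello from a test persona."}],
|     )
|     print(resp.choices[0].message.content)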
| simonw wrote:
| It looks like this defaults to GPT-4o:
| https://github.com/microsoft/TinyTroupe/blob/7ae16568ad1c4de...
|
| If you're going to try this out I would _strongly_ recommend
| running it against GPT-4o mini instead. Mini is 16x cheaper and I'm
| confident the results you'll get out of it won't be 1/16th as
| good for this kind of experiment.
| ttul wrote:
| I suppose the Microsoft researchers default to 4o because the
| models are free in their environment...
| cen4 wrote:
| Might help GRRM finish his books.
| simonw wrote:
| Here's a quick way to start this running if you're using uv:
|
|     cd /tmp
|     git clone https://github.com/microsoft/tinytroupe
|     cd tinytroupe
|     OPENAI_API_KEY='your-key-here' uv run jupyter notebook
|
| I used this pattern because my OpenAI key is stashed in an
| LLM-managed JSON file:
|
|     OPENAI_API_KEY="$(jq -r '.openai' "$(dirname "$(llm logs path)")/keys.json")" \
|         uv run jupyter notebook
|
| (Which inspired me to add a new LLM feature: llm keys get openai
| - https://github.com/simonw/llm/issues/623)
| dragonwriter wrote:
| This seems fundamentally unsuitable for its stated purpose, which
| is "understanding human behavior".
|
| While it may, as it says, produce "convincing interactions",
| there is no basis at all presented for believing it produces an
| _accurate_ model of human behavior, so using it to "understand
| human behavior" is at best willful self-deception. More likely,
| with a little effort at tweaking inputs to produce the desired
| results, when used by someone who presents it as "enlightening
| productivity and business scenarios" it will most often be an
| engine for simply manufacturing support for a pre-selected
| option.
|
| It is certainly easier and cheaper than exploring actual human
| interactions to understand human behavior, but then so is just
| using a magic 8-ball, which may be less _convincing_ but, for
| all the evidence supporting this, is just as _accurate_.
| potatoman22 wrote:
| I wonder how one could measure how human-like the agents'
| opinions and interactions are? There's a ton of value in
| simulating preferences, but you're right that it's hard to know
| if the simulation is accurate.
|
| I have a hunch that, through sampling many AI "opinions," you
| can arrive at something like the wisdom of the crowd, but
| again, it's hard to validate.
| kaibee wrote:
| cw: I don't actually work in ML, I just read a lot. If someone
| who is a real expert can tell me whether my assessment here is
| correct, please let me know.
|
| > I have a hunch that, through sampling many AI "opinions,"
| you can arrive at something like the wisdom of the crowd, but
| again, it's hard to validate.
|
| That's what an AI model already is.
|
| Let's say you had 10 temperature sensors on a mountain and
| you logged their data at time T.
|
| If you take the average of those 10 readings, you get a
| 'wisdom of the crowds' from the temperature sensors, which
| you can model as an avg + std of your 10 real measurements.
|
| You can then sample 10 new points from the normal
| distribution defined by that avg + std. Cool for generating
| new similar data, but it doesn't really tell you anything you
| didn't already know.
|
| Trying to get 'wisdom of crowds' through repeated querying of
| the AI model is equivalent to sampling 10 new points at
| random from your distribution. You'll get values that are
| like your original distribution of true values (w/ some
| outliers) but there's probably a better way to get at what
| you're looking to extract from the model.
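|
| Rough illustration of the point in numpy (toy numbers, nothing
| taken from TinyTroupe itself):
|
|     import numpy as np
|
|     # 10 "real" temperature readings at time T
|     readings = np.array([4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 4.1, 3.7, 4.0])
|
|     # "wisdom of the crowds": just a summary of what was measured
|     mu, sigma = readings.mean(), readings.std()
|     print(mu, sigma)
|
|     # resampling the fitted distribution gives plausible-looking new
|     # readings, but contains no information beyond mu and sigma
|     rng = np.random.default_rng(0)
|     resampled = rng.normal(mu, sigma, size=10)
|     print(resampled.mean(), resampled.std())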
| ori_b wrote:
| It's worse than that. LLMs have been tuned carefully to
| mostly produce output that will be inoffensive in a
| corporate environment. This isn't an unbiased sampling.
| yarp wrote:
| Could be interesting if used with many different llms at
| once
| jfactorial wrote:
| True for consumer products like ChatGPT but there are plenty of
| models that are not censored:
| https://huggingface.co/models?sort=trending&search=uncensore...
| isaacremuant wrote:
| No. The censoring has already been done systematically by
| tech corporations at the behest of political agents that
| have power over them.
|
| You only have to look at opinions about covid policies to
| realize you won't get a good representation because
| opinions will be deemed "misinformation" by the powers
| that are vested in that being the case. Increasingly,
| criticism of government policy can be conflated with some sort
| of crime that is absolutely up for interpretation by some
| government institution, so people self-censor, companies censor
| just in case, and the Overton window gets narrower.
|
| LLMs are awesome but they will only represent what
| they're trained on and what they're trained on only
| represents what's allowed to be in the mainstream
| discourse.
| bpshaver wrote:
| Section 6, "Controlled Evaluation," answers that question:
| https://arxiv.org/pdf/2304.03442
| michaelmior wrote:
| I would tend to agree. Although for something like testing ads
| it seems like it would be relatively straightforward to produce
| an A/B test that compares the performance of two ads relative to
| TinyTroupe's predictions.
| A4ET8a8uTh0 wrote:
| I did not test this library so I can't argue from that
| perspective ( I think I will though ; it does seem interesting
| ).
|
| << "enlightening productivity and business scenarios" it will
| be an engine for simply manufacturing support for a pre-
| selected option.
|
| In a sense, this is what training employees is all about. You
| want to get them ready for various possible scenarios. For
| recurring tasks that do require some human input, it does not
| seem that far-fetched.
|
| << produce "convincing interactions"
|
| This is the interesting part. Is convincing a bad thing if it
| shows what the user would expect to see?
| mindcrime wrote:
| > it will be an engine for simply manufacturing support for a
| pre-selected option.
|
| There's nothing unique about this tool in that regard though.
| Pretty much anything can be mis-used in that way -
| spreadsheets, graphics/visualizations, statistical models,
| etc. etc. Whether tools are actually used to support better
| decision making, or simply to support pre-selected decisions,
| is more about the culture of the organization and the mind-
| set of its leaders.
| A4ET8a8uTh0 wrote:
| Agreed. At the end of the day, it is just another tool.
|
| I think the issue is the human tendency to just rubber
| stamp whatever result is given. Not that long ago, few
| questioned the result of a study and now there won't even
| be underlying data to go back to see if someone made an
| error. Naturally, this would suggest that we will start
| seeing a lot of bad decisions, because human operators did
| not stop and think whether the response made sense.
|
| That said, I am not sure what can be done about it.
| dragonwriter wrote:
| > There's nothing unique about this tool in that regard
| though.
|
| Sure, it's just part of an arms race where having a new
| thing with a hot selling pitch to cover that up and put a
| layer of buzzwords on top of it helps sell the results to
| audiences who have started to see through the existing ways
| of doing that.
| mindcrime wrote:
| I agree in general. I'm just not sure how much the "new
| thing with a hot selling pitch" part even matters. At
| least IME, at companies where the culture is such that
| management just look for ways to add a sheen of
| scientific respectability to their ad-hoc decisions,
| nobody really questions the details. Management just put
| the "thing" out there, hand-wave some "blah, blah"
| around, everybody nods their heads, and things proceed as
| they were always going to.
| bsenftner wrote:
| My first thought while reading was that this would be a great
| academic framework in the hands of PhD students with extremely
| high attention to all the details and their interactions. But in
| the hands of any group or individual with a less scientifically
| rigorous mindset, it's a construction set for justifications to
| do practically anything. In the hands of biased laypersons it
| becomes a toolset for lying with statistics, amplified
| exponentially into a nuclear weapon.
| keeda wrote:
| The source does not mention the underlying motivation (and it
| really should), but I think this is it:
|
| https://www.linkedin.com/posts/emollick_kind-of-a-big-deal-a...
|
| "... a new paper shows GPT-4 simulates people well enough to
| replicate social science experiments with high accuracy.
|
| Note this is done by having the AI prompted to respond to
| survey questions as a person given random demographic
| characteristics & surveying thousands of "AI people," and works
| for studies published after the knowledge cut-off of the AI
| models."
|
| A couple other posts along similar lines:
|
| https://www.linkedin.com/posts/emollick_this-paper-suggests-...
|
| "... LLMs automatically generate scientific hypotheses, and
| then test those hypotheses with simulated AI human agents."
|
| https://www.linkedin.com/posts/emollick_formula-for-neat-ai-...
|
| "Applying Asch's conformity experiment to LLMs: they tend to
| conform with the majority opinion, especially when they are
| "uncertain." Having a devil's advocate mitigates this effect,
| just as it does with people."
| oulipo wrote:
| So fucking sad that the use of AI is for... manipulating more
| humans into clicking on ads
|
| Go get a fucking life, do something for the climate and for
| repairing our society's social fabric instead.
| thoreaux wrote:
| How do I get the schizophrenic version of this?
| GlomarGadaffi wrote:
| Following
| itishappy wrote:
| Here's the punchline from the Product Brainstorming example,
| imagining new AI-driven features to add to Microsoft Word:
|
| > AI-driven context-aware assistant. Suggests writing styles or
| tones based on the document's purpose and user's past
| preferences, adapting to industry-specific jargon.
|
| > Smart template system. Learns from user's editing patterns to
| offer real-time suggestions for document structure and content.
|
| > Automatic formatting and structuring for documents. Learns from
| previous documents to suggest efficient layouts and ensure
| compliance with standards like architectural specifications.
|
| > Medical checker AI. Ensures compliance with healthcare
| regulations and checks for medical accuracy, such as verifying
| drug dosages and interactions.
|
| > AI for building codes and compliance checks. Flags potential
| issues and ensures document accuracy and confidentiality,
| particularly useful for architects.
|
| > Design checker AI for sustainable architecture. Includes a
| database of materials for sustainable and cost-effective
| architecture choices.
|
| Right, so what's missing in Word is an AI-generated medical
| compliance check that tracks drug interactions for you and an AI
| architectural compliance and confidentiality... thing. Of course
| these are all followed by a note that says "drawbacks: None."
| Also, the penultimate line generated 7 examples but cut the
| output off at 6.
|
| The intermediate output isn't much better, generally restating
| the same thing over and over and appending "in medicine" or "in
| architecture." They quickly drop any context that this discussion
| relates to word processors in favor of discussing how a generic
| industrial AI could help them. (Drug interactions in Word, my
| word.)
|
| Worth noting this is a Microsoft product generating ideas for a
| different Microsoft product. I hope they vetted this within their
| org.
|
| As a proof of concept, this looks interesting! As a potentially
| useful business insight tool this seems far out. I suppose this
| might explain some of Microsoft's recent product decisions...
|
| https://github.com/microsoft/TinyTroupe/blob/main/examples/p...
| potatoman22 wrote:
| That example is funny because 99% of doctors would not use Word
| to write their notes (and not because it doesn't have this hot
| new AI feature).
| A4ET8a8uTh0 wrote:
| Hmm, now let's see if there is an effort anywhere to link it to a
| local LLM.
| ajcp wrote:
| Just provide it the localhost:port for your instance of LM
| Studio/text-generation-webui as the Azure OpenAI endpoint in
| the config. Should work fine, but going to confirm now.
|
| EDIT: Okay, this repo is a mess. They have "OpenAI" hardcoded
| in so many places that it literally makes this useless for
| working with Azure OpenAI Service OR any other OpenAI-style
| API. That wouldn't be terrible once you fiddled with the config
| IF they weren't importing the config BEFORE they set default
| values...
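|
| If you want to sanity-check the local endpoint itself before
| fighting the config, here's a quick smoke test with the plain
| openai client (LM Studio's server usually defaults to port 1234;
| the URL and key below are placeholders for your setup):
|
|     # Sketch: verify a local OpenAI-compatible server is answering.
|     # Endpoint and key are placeholders; local servers typically
|     # ignore the key.
|     from openai import OpenAI
|
|     client = OpenAI(base_url="http://localhost:1234/v1", api_key="dummy")
|     for m in client.models.list():
|         print(m.id)  # whatever models the local server exposes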
| thegabriele wrote:
| I envision a future where ads are targeted by LLMs. Is this
| worse or better than what we have now?
| libertine wrote:
| Could this be applied to mass propaganda and disinformation
| campaigns on social networks?
|
| Like not only generating and testing narratives, but then even
| using it for agents to generate engagement.
|
| We've seen massive bot networks go unchecked on X to help tilt
| election results, so this could probably be deployed there too.
| Jimmc414 wrote:
| > We've seen massive bot networks go unchecked on X to help tilt
| election results, so this could probably be deployed there too.
|
| Do you have more details on this?
| isaacremuant wrote:
| Let me guess. These bot networks that influence elections are
| from your political adversaries. Never from the party you
| support or your government when they're in power.
|
| The election results tilting talk is tired and hypocritical.
| czbond wrote:
| This is really cool. I can see quite a number of potential
| applications.
| GlomarGadaffi wrote:
| Actually Sburb.
___________________________________________________________________
(page generated 2024-11-11 23:01 UTC)