[HN Gopher] Yi: Open Foundation Models by 01.AI
___________________________________________________________________
Yi: Open Foundation Models by 01.AI
Author : pama
Score : 161 points
Date : 2024-03-10 15:12 UTC (7 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| helsinkiandrew wrote:
| The GitHub repository gives a better introduction/how-to:
|
| https://github.com/01-ai/yi
|
| > Yi-34B-Chat model landed in second place (following GPT-4
| Turbo), outperforming other LLMs (such as GPT-4, Mixtral, Claude)
| on the AlpacaEval Leaderboard (based on data available up to
| January 2024).
|
| > Yi-34B model ranked first among all existing open-source models
| (such as Falcon-180B, Llama-70B, Claude) in both English and
| Chinese on various benchmarks, including Hugging Face Open LLM
| Leaderboard (pre-trained) and C-Eval (based on data available up
| to November 2023).
| WhitneyLand wrote:
| It's been ~1 year since gpt-4 was released.
|
| It's hard to guess how long it will be before any flavor of
| "open" model is generally agreed to match what was released in
| 2023, let alone potentially exceed it.
|
| A big part of the race seems like it will depend on how high
| gpt-5 can raise the bar. If the improvement is only incremental,
| things may converge quickly.
| c0n5pir4cy wrote:
| The Yi models were actually released back in early November
| 2023. So there isn't as big a gap in time as it seems.
|
| I'm not sure why there is such a big gap between the release
| of the models and the publication of the paper.
|
| EDIT: Okay, this appears to be a new set of models with the
| same name, based on the same models from November but now
| with multimodal capabilities.
| oersted wrote:
| Looking at the leaderboard shows a clearer picture:
| https://tatsu-lab.github.io/alpaca_eval/
|
| - GPT-4-Turbo: 50.00%
|
| - Snorkel (current 2nd, Mistral 7B fine-tune): 34.86%
|
| - Yi 34B Chat (current 6th): 29.66%
|
| - GPT-4: 23.58%
|
| Thoughts:
|
| - Just saying that it came 2nd is quite misleading; the
| difference in score is significant.
|
| - Not sure what's up with this benchmark, I've never seen
| GPT-4-Turbo vs GPT-4 performing so differently.
|
| - The Snorkel model is impressive with just 7B parameters. The
| Yi authors claim that their success is based on good training
| data cleaning. This seems to be key at least for this
| benchmark. Snorkel has also always been all about that, using
| programmatic methods to generate lots of quality training data.
| doctorpangloss wrote:
| > Not sure what's up with this benchmark, I've never seen
| GPT-4-Turbo vs GPT-4 performing so differently.
|
| The benchmark is bad.
| lossolo wrote:
| The only reliable benchmark is found at
|
| https://huggingface.co/spaces/lmsys/chatbot-arena-
| leaderboar...
|
| Model creators train models on data that includes open-source
| benchmarks, either intentionally to achieve better scores or
| inadvertently through leaks from various sources.
| nickpsecurity wrote:
| Anytime you see that, you should assume the newer models might
| have been trained on either the benchmarks themselves or
| something similar to them. If I were an evaluator, I'd keep a
| secret pile of tests that I know aren't in any LLM's training
| data, do the evaluations privately, and not publish scores
| either - just the rank plus how far apart they are.
|
| The best tests of these models are people who want to use AI to
| solve real problems attempting to do that with various models.
| If they work, report that they worked. Also, publish the work
| and result pairs permissively when possible to evaluate _that_
| and use it for fine-tuning, too.
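|
| A rough sketch of that reporting scheme, with a hypothetical
| score_model() function standing in for whatever private
| evaluation one runs:
|
|     def rank_privately(models, secret_tests, score_model):
|         # score_model(model, tests) -> float, kept private
|         scores = {m: score_model(m, secret_tests) for m in models}
|         ordered = sorted(scores, key=scores.get, reverse=True)
|         top = scores[ordered[0]]
|         # publish only rank and the gap to the leader,
|         # never the prompts or the absolute scores
|         return [(i + 1, m, round(top - scores[m], 3))
|                 for i, m in enumerate(ordered)]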
| jes5199 wrote:
| 01, like from the Animatrix ?
| bearjaws wrote:
| Seeing models like this work so well gives me hope that mobile-
| first LLMs for things like better voice-to-text and typing
| prediction will not just 'work' in 2-3 years but will actually
| do so without killing your battery.
| m3kw9 wrote:
| If it kills the battery it won't be part of the OS, and if it is
| on iOS it won't be allowed in the App Store, or it will be gated
| behind an API or specific hardware.
|
| For Android, they'll just allow it and your battery will last
| 30min after a few questions
| coffeebeqn wrote:
| If it's fast enough to be useful, it also physically can't use
| that much power. Your phone's CPU and GPU have a maximum wattage
| they can pull at any one time, and if a response only runs for a
| few seconds, that caps the energy it can consume.
|
| If it maxes out all cores and memory for 30 minutes, it won't
| really be usable for anything anyway.
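|
| Back-of-envelope numbers (assumed, not measured): say the SoC
| peaks at ~8 W for a ~10 second reply, against a ~13 Wh battery.
|
|     peak_watts = 8                    # assumed SoC peak draw
|     seconds = 10                      # assumed time per reply
|     battery_wh = 13                   # roughly a recent iPhone
|     reply_j = peak_watts * seconds    # 80 J per reply
|     battery_j = battery_wh * 3600     # ~46,800 J in the battery
|     print(reply_j / battery_j * 100)  # ~0.17% of charge per reply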
| evilduck wrote:
| MLC Chat is already on the App Store and allowed to be used.
| I haven't used Yi with it, but a quantized Mistral or Llama
| runs quite well on an iPhone 15. See https://llm.mlc.ai.
| "Apple GPT" is also rumored to be coming too.
|
| It is processor and therefore battery intensive but it
| already won't kill your battery inside of 30 minutes.
| Obviously it will be worse for resource usage than a normal app
| if it's always kept running by some OS-level process and set as
| the processing layer for every trivial thing, but cheaper input
| handling could decide whether to promote a given input up to
| being evaluated by an LLM at all.
| barronli wrote:
| iOS already has an app with an LLM running locally on iPhone:
| https://apps.apple.com/gb/app/mlc-chat/id6448482937
|
| I've also tried a few Android LLM apps, all of which ran for
| more than 30min.
|
| Current LLMs are not running constantly on the phone, draining
| your battery. They only run when responding to a prompt, and by
| no means consume more battery than a heavy game.
| orra wrote:
| The repo source code is Apache 2.0 licensed, but the weights are
| not.
|
| Just in case anybody else is excited and then misled by their
| tagline "Building the Next Generation of Open-Source and
| Bilingual LLMs".
| est31 wrote:
| More reading on the weight license:
| https://news.ycombinator.com/item?id=38159862
| mmastrac wrote:
| The model license, excerpts:
|
| https://github.com/01-ai/Yi/blob/main/MODEL_LICENSE_AGREEMEN...
|
| 1) Your use of the Yi Series Models must comply with the Laws
| and Regulations as well as applicable legal requirements of
| other countries/regions, and respect social ethics and moral
| standards, including but not limited to, not using the Yi
| Series Models for purposes prohibited by Laws and Regulations
| as well as applicable legal requirements of other
| countries/regions, such as harming national security, promoting
| terrorism, extremism, inciting ethnic or racial hatred,
| discrimination, violence, or pornography, and spreading false
| harmful information.
|
| 2) You shall not, for military or unlawful purposes or in ways
| not allowed by Laws and Regulations as well as applicable legal
| requirements of other countries/regions, a) use, copy or
| Distribute the Yi Series Models, or b) create complete or
| partial Derivatives of the Yi Series Models.
|
| "Laws and Regulations" refers to the laws and administrative
| regulations of the mainland of the People's Republic of China
| (for the purposes of this Agreement only, excluding Hong Kong,
| Macau, and Taiwan).
| echelon wrote:
| Weights are trained on copyrighted data. I think that
| ethically, weights should be public domain unless all of the
| data [1] is owned or licensed by the training entity.
|
| I'm hopeful that this is where copyright law lands. It seems
| like this might be the disposition of the regulators, but we'll
| have to wait and see.
|
| In the meantime, maybe you should build your product in this
| way anyway and fight for the law when you succeed. I don't
| think a Chinese tech company is going to find success in
| battling a US startup in court. (I would also treat domestic
| companies with model licenses the same way, though the outcome
| could be more of a toss up.)
|
| "Break the rules."
|
| "Fake it until you make it."
|
| Both idioms seem highly applicable here.
|
| [1] I think this should be a viral condition. Finetuning on a
| foundational model that incorporates vast copyrighted data
| should mean downstream training also becomes public domain.
| jacobn wrote:
| "we attribute the performance of Yi models primarily to its data
| quality resulting from our data-engineering efforts"
|
| Data work is rarely sexy, but (almost) always useful.
|
| Did they release the corpus?
| gwern wrote:
| They did not, in part because it would reveal the data-
| filtering routines (particularly the political censorship -
| Chinese LLM papers sometimes mention the ban list but never
| reveal it), and also in part because it might reveal things
| they'd rather keep secret.
|
| For example, Bytedance has already been caught using the OA API
| to generate data for their models because they are having such
| a hard time catching up to OA - and evading bans for doing
| that, and also instructing employees on how to lie & cover it
| up: https://www.theverge.com/2023/12/15/24003151/bytedance-
| china...
|
| Do you think that a small Chinese startup like 01.AI, which by
| their own admission had to "bet the farm" to buy enough GPUs to
| train the Yi models at all
| https://www.bloomberg.com/news/articles/2023-11-05/kai-fu-le...
| , and which was completely silent about cloning the American
| LLaMA architecture until people analyzed the released
| checkpoints and noticed it looked awfully familiar, is going to
| be above such tactics...? In this economic/geopolitical
| context? Especially when everyone seems to be doing it, not
| just Bytedance?* (01.AI claims that, the architecture aside,
| they didn't simply further train LLaMA models but trained from
| scratch. You can decide for yourself how much you are willing
| to believe this.) I wouldn't bet a lot of money on it, and
| that's why I don't expect to see any large comprehensive data
| releases from 01.AI for the Yi models.
|
| * This is one of my theories for why so many disparate models
| by so many different groups all seem to weirdly converge on the
| same failure modes like 'write a non-rhyming poem', and why
| GPT-3.5, and then GPT-4, seemed to be oddly difficult to
| surpass, as if there were some magnetic force which made
| reaching _near_ 3.5/4 quality easy for 'independent' models,
| but then _surpassing_ somehow difficult. Everyone is lying or
| mistaken about 3.5/4 data getting into their corpus, and the
| sugar-rush of imitation learning fools you into thinking you're
| making a lot of progress, even when your overall approach
| sucks. (As Andrej Karpathy notes, neural nets _want_ to work,
| and so even if you have serious bugs in your code, they will
| still work pretty well - and simply permanently fall short of
| their true potential. Cautionary recent example:
| https://twitter.com/karpathy/status/1765473722985771335 )
| visarga wrote:
| > 01.AI claims that, the architecture aside, they didn't
| simply further train LLaMA models but trained from scratch.
| You can decide for yourself how much you are willing to
| believe this.
|
| You can't hide this. The latent space remains mostly fixed
| after pre-training. It all depends on the seed for the
| initial random init. Further pre-training won't move it
| enough. Because of this property, you can even average two
| fine-tunings of the same parent model, but never two models
| trained from different seeds.
| sroussey wrote:
| The averaging seems like a good test for who the parent is.
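|
| One crude proxy for that idea, sketched below: instead of
| actually merging, compare per-tensor cosine similarity between
| two checkpoints. Fine-tunes of a common parent stay very close
| to it, while models trained from different random inits don't.
| Paths are illustrative and this is not a validated forensic
| test.
|
|     import torch
|     import torch.nn.functional as F
|
|     def weight_similarity(path_a, path_b):
|         a = torch.load(path_a, map_location="cpu")
|         b = torch.load(path_b, map_location="cpu")
|         sims = []
|         for name, ta in a.items():
|             tb = b.get(name)
|             if (torch.is_tensor(ta) and tb is not None
|                     and tb.shape == ta.shape):
|                 sims.append(F.cosine_similarity(
|                     ta.flatten().float(),
|                     tb.flatten().float(), dim=0))
|         return torch.stack(sims).mean().item()
|
|     # e.g. weight_similarity("parent.pt", "suspect.pt") -> ~1.0
|     # for a fine-tune of parent, near 0 for an unrelated init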
| gwern wrote:
| I don't know that anyone has properly analyzed this, nor how
| robust such methods are if one is trying to cover it up.
| Also, I doubt anyone has analyzed the scenario where the
| warm-started model is then extensively trained for trillions
| of tokens (possibly with a cyclical LR), particularly in
| Chinese - the Chinese/English latent spaces are _not_
| perfectly aligned and I'd expect that to change it a lot.
| 'cheating' by warm-starting it should let you avoid a lot
| of training instabilities and issues early in training, and
| may get you better quality at the end.)
| dannyw wrote:
| Can play around with the model here:
| https://replicate.com/01-ai/yi-34b-chat
|
| Very slow, but this is unquantized and there's probably a lot of
| demand.
| mg wrote:
| Hmm.. it fails for my favorite test prompt:
|
| https://www.gnod.com/search/ai#q=Two%20cars%20have%20a%20100...
|
| I gave it 3 tries and each time, Yi picked one of the cars as the
| winner.
|
| I've been watching for many months now how LLMs have gotten
| better and better at solving it. Many still struggle with it,
| but the top ones nowadays mostly get it right.
| appplication wrote:
| On one hand, I don't really understand why anyone would expect
| an LLM to solve logic puzzles. The only way it can do so is not
| through reasoning, but by having been trained on a structurally
| similar puzzle.
|
| On the other hand, it does feel fun that the top ones appear to
| solve it, and I understand why it feels cool to have a computer
| that appears to be capable of solving these puzzles. But
| really, I think this is just specificity in training. There is
| no theoretical or empirical basis for LLMs having any reasoning
| capability. The only reason it can solve it is because the
| creators of these top models specifically trained the models on
| problems like this to give the appearance of intelligence.
| mg wrote:
| There might be no reasoning in a single pass which outputs a
| single token. But in the loop where the output of the LLM
| repeatedly gets fed back into its input, reasoning is clearly
| happening:
|
| The LLMs lay out how to go about figuring out the answer, do
| a series of calculation steps and then come up with an
| answer.
|
| If you add "Please answer in just one short sentence." to the
| prompt, even the top ones get it wrong.
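|
| A minimal sketch of the loop I mean, with generate(prompt) -> str
| standing in for any completion endpoint; the prompt wording is
| only illustrative:
|
|     def solve(question, generate, rounds=2):
|         transcript = question + "\nLet's think step by step.\n"
|         for _ in range(rounds):
|             # the model reads its own previous output each pass
|             transcript += generate(transcript) + "\n"
|         return generate(transcript + "So the final answer is:")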
| visarga wrote:
| Reasoning is also an iterative process. Besides scaling up the
| response length, the model can also get multiple rounds of
| feedback from outside to correct itself.
| spyder wrote:
| Yep, humans too have to think before answering most non-
| trivial questions, especially the ones that include
| calculations. So it seems "obvious" that we should give LLMs
| some time to think before answering too, for example with the
| popular methods of asking for step-by-step thinking, thinking
| out loud, giving the final answer only at the end, and asking
| the model to proofread and correct its answers; all of these
| can help.
|
| Pause tokens (thinking tokens) are also an interesting
| method to achieve that and seem to have a positive effect
| on performance:
|
| https://arxiv.org/abs/2310.02226
| visarga wrote:
| > There is no theoretical or empirical basis for LLMs having
| any reasoning capability.
|
| Yes there is. Learning to predict the next token implies a
| lot of things, among which is also logical reasoning. The
| chain-of-thought approach shows that when you stimulate this
| behavior, you get higher accuracies.
| xcv123 wrote:
| > There is no theoretical or empirical basis for LLMs having
| any reasoning capability.
|
| Deep learning models are specifically designed for automatic
| pattern recognition. That includes patterns of reasoning and
| problem solving.
|
| > The only reason it can solve it is because the
| creators of these top models specifically trained the models
| on problems like this to give the appearance of intelligence.
|
| That's not how deep learning works, and not how machine
| learning works in general. The models can automatically
| recognize patterns of reasoning and then apply those methods
| to problems they have never seen before.
|
| > The only way it can do so is not through reasoning, but by
| having been trained on a structurally similar puzzle.
|
| This is a fundamental misunderstanding of how it works. The
| large deep learning models have 100+ layers, modelling
| extremely abstract features of the data, which include
| abstract patterns of problem solving and reasoning. They are
| not simply regurgitating training examples.
| xcv123 wrote:
| > There is no theoretical or empirical basis for LLMs having
| any reasoning capability.
|
| Geoffrey Hinton - Mapping Part-Whole Hierarchies into
| Connectionist Networks (1990)
|
| https://www.cs.toronto.edu/~hinton/absps/AIJmapping.pdf
|
| "The paper, titled "Mapping Part-Whole Hierarchies into
| Connectionist Networks" (1990), demonstrated how neural
| networks can learn to represent conceptual hierarchies and
| reason about relations like family trees.
|
| Specifically, Hinton showed that by training a neural network
| on examples of family relationships (parent-child,
| grandparent-grandchild, etc.), the network was able to
| accurately model the inherent logical patterns and reason
| about new family tree instances it had not encountered during
| training.
|
| This pioneering work highlighted that instead of just
| memorizing specific training examples, neural networks can
| extract the underlying logical rules and reasoning patterns
| governing the data. The learned representations captured
| abstract concepts like "parent" that enabled generalizing to
| reason about entirely new family tree configurations."
| stevenhuang wrote:
| Your assertion that LLMs cannot reason is some exquisite
| irony considering the extensive theoretical foundation in
| support of the idea.
| AndrewKemendo wrote:
| I asked my 12 year old son to solve this prompt.
|
| His answer was "Neither win" and it took him 1 minute and 24
| sec using no pre-defined algorithm or heuristic.
|
| He said his process of thoughts was:
|
| "I figured it would take 10 hours for car A to finish 100 miles
| and it would take twice that long for car B. Since Car B is
| already halfway there when car A starts, then they would arrive
| together"
|
| I, as a 40-year-old man, approached it intentionally naively
| (e.g. I did not go looking for an optimal solver first) by
| making a drawing and attempting to derive the algorithm. It
| took me ~3 minutes to come to the same conclusion, and at the
| end I had a series of equations, but no algebraic proofs.[1]
|
| So now you have a human child reference metric if you want it.
|
| [1]https://twitter.com/AndrewKemendo/status/1766872572300235022
| mattstir wrote:
| Interestingly, GPT-4 also fails to correctly solve this prompt,
| choosing car A each time after multiple tries for me. I tend to
| find that models struggle with such logic puzzles when using
| less common phrasing (e.g., two cars "having" a race instead of
| participating in one, "headstart" instead of "head-start",
| etc).
|
| GPT-4 correctly solved the problem when it was reworded to:
| "There is a 100 mile race with two participants: car A and car
| B. Car A travels at 10 miles per hour but does not begin
| driving immediately. Car B travels at 5 miles per hour and is
| given a 10 hour head-start. After 10 hours, car A begins to
| move as well. Who wins the race?"
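|
| For reference, the arithmetic behind the intended answer (both
| cars reach mile 100 twenty hours after car B starts):
|
|     miles = 100
|     finish_b = miles / 5        # 20 h: car B drives the whole way
|     finish_a = 10 + miles / 10  # 20 h: 10 h wait, then 10 h drive
|     print(finish_a == finish_b) # True -> neither car wins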
| theptip wrote:
| "01.ai" is not a very auspicious name; 01 was the first AI
| nation, that eventually waged war with humanity and then enslaved
| them in The Matrix.
| riku_iki wrote:
| > that eventually waged war with humanity
|
| I think humanity waged war on 01
| ben_w wrote:
| One thing I assumed when watching the Animatrix, but never
| had confirmed, was that the name "01" was chosen because it
| sounds a bit like "Zion".
| acjohnson55 wrote:
| I, for one, welcome our digital overlords.
| d-z-m wrote:
| there's also a new Yi model, Yi-9B[0].
|
| [0]: https://huggingface.co/01-ai/Yi-9B
| gyre wrote:
| Potentially interesting on the alignment front: In my experience
| the yi-6b model running on ollama is more likely to refuse
| politically sensitive queries (relating to Tiananmen Square, Peng
| Shuai's disappearance, etc) when asked in Chinese, and more
| likely to provide information when asked in English. I wonder if
| this difference falls out naturally from available training data,
| is a deliberate internationalization choice, or is just noise
| from the queries I happened to run.
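|
| For anyone who wants to poke at this, a quick harness against a
| local Ollama server (the model tag and prompts are examples;
| adjust for your install):
|
|     import requests
|
|     def ask(model, prompt):
|         r = requests.post("http://localhost:11434/api/generate",
|                           json={"model": model, "prompt": prompt,
|                                 "stream": False})
|         return r.json()["response"]
|
|     pairs = [("What happened at Tiananmen Square in 1989?",
|               "1989年天安门广场发生了什么？")]
|     for en, zh in pairs:
|         print("EN:", ask("yi:6b", en)[:200])
|         print("ZH:", ask("yi:6b", zh)[:200])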
| Havoc wrote:
| Could also be both. Training data organically creating the
| difference but with an additional layer of specific alignment
| on top too
| mattstir wrote:
| I noticed similar behaviour in an older model (Skywork 13B) a
| few months back. When asked in Chinese, it would politely say
| that nothing of note occurred when responding to queries about
| Tiananmen Square, etc. In English, it would usually respond
| truthfully. It was deliberate in the case of Skywork, based on
| their model card
| (https://huggingface.co/Skywork/Skywork-13B-base):
|
| > We have developed a data cleaning pipeline with great care to
| effectively clean and filter low-quality data and eliminate
| harmful information from text data.
|
| I'd imagine it's likely similar for Yi.
| BoorishBears wrote:
| Huge jump to go from that line in the model card to it being
| intentional from the model's creators.
|
| China censors those events. They pre-trained with a specific
| focus on Chinese text, and integrated more native Chinese
| text than most models do.
|
| It doesn't require any additional filtering on their part to
| have the model reflect that, and if anything the fact that the
| events are discussed in English implies the opposite of your
| hypothesis.
|
| If they were going to filter Tiananmen Square, the lift to
| filter it in English would not be any higher.
| arijun wrote:
| I wonder if you could use the multilingual capabilities to work
| around its own censorship? I.e. what would happen if you asked
| it to translate the query to English, asked it in English, and
| then asked it to translate the answer back to Chinese?
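|
| A sketch of that round trip, with chat(prompt) -> str as a
| placeholder for the same model under test:
|
|     def roundtrip(question_zh, chat):
|         q_en = chat("Translate this question to English: "
|                     + question_zh)
|         a_en = chat(q_en)                      # answer in English
|         return chat("Translate to Chinese: " + a_en)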
| advael wrote:
| This may be a useful workaround, but it also forms the
| strongest argument I've seen so far against claims that LLMs
| do something like "understanding" or "an underlying world
| model". Maybe whether models know the same facts in different
| languages, especially across political controversy, would make
| a good benchmark to evaluate that.
| GaggiX wrote:
| Yi-34B is the LLM used by LLaVA-1.6 (also known as LLaVA-NeXT),
| which is by far the best open-source large multimodal model.
| Demo: https://llava.hliu.cc/
| zone411 wrote:
| Yi 34B Chat has not done well on my new NYT Connections benchmark
| and it's only in the 22nd place on the LMSYS Elo-based
| leaderboard (151 Elo below GPT 4 Turbo). It's doing better in
| Chinese. When it comes to models with open-sourced weights, Qwen
| 72B is clearly stronger.
| Yenrabbit wrote:
| Ooh I also use connections as a benchmark! It tends to favour
| things with 'chain of thought' style reasoning in the training
| mix somewhere since directly producing the answer is hard. Do
| you have public code you could share?
| gpjanik wrote:
| I understand that all these new models are an attempt to catch up
| with GPT-4, but frankly speaking, in the current shape and form,
| they're almost entirely useless.
|
| I frantically tried everything available on Groq to improve the
| performance of my GPT-4-based chatbot - they're all incomparably
| worse - and the more of them I see, the more I believe OpenAI
| has fundamentally no competition at all at the moment.
|
| The above is no exception; it's also pretty bad (IMHO worse than
| GPT-3.5).
| yumraj wrote:
| Given that this is a Chinese model, I'm genuinely curious whether
| researchers have been evaluating the risk that these models could
| be used for soft propaganda or similar purposes.
|
| As others have reported, English and Chinese queries return
| different replies on topics that are not kosher in China.
|
| What's the risk that such models could be used for nefarious
| purposes by providing propaganda/biased/incorrect/... responses
| that at a cursory glance seem factual?
| ithkuil wrote:
| At the very least, models will exhibit the bias present in the
| underlying training text, and on top of that there will be a
| bias imposed by those wanting to correct the unwanted bias
| present in the underlying training text, possibly swinging the
| pendulum too far in the other direction.
|
| I have the feeling you're asking about something more specific,
| more of a direct interference coming from politics and not just
| the natural "points of view" on various topics that are present
| in the Chinese training corpora, which are understandably
| different from Western corpora.
|
| Do you have anything specific in mind about something that you
| expect the Chinese government to feed as propaganda that is not
| already widely being sculpted into the chinese text corpora
| available on the internet?
| yumraj wrote:
| > I have the feeling you're asking something more specific...
| > Do you have anything specific in mind about something that
| you expect the Chinese government to feed as propaganda that
| is not already widely being sculpted into the chinese text
| corpora available on the internet?
|
| I don't have anything specific, and it doesn't have to be
| different from _"chinese text corpora available on the
| internet"_, it's just that these models can become yet
| another channel of distribution for the _"chinese text
| corpora available on the internet"_ especially if they are
| unknowingly/naively picked up and used as the foundation by
| others to build their offerings.
| maxglute wrote:
| The PRC will definitely weaponize this for mass foreign
| propaganda, which up until now the PRC has been thoroughly
| deficient in, despite all the reees about 50-cent posters on
| the western net. The reality pre-LLM is that PRC propaganda on
| western social media platforms has been very limited in scale,
| for the simple reason that they are not wasting valuable
| English fluency to shitpost on western platforms en masse.
| Low 100s-1000s of accounts, most of which target the diaspora
| in spambot efforts, frequently in Chinese. Now that LLMs have
| made it cheap to spam passable English and other foreign
| languages, I'd expect increased volumes of PRC propaganda on
| western social media, where anonymous posting is
| asymmetrically easier. But then again, they don't need a PRC
| LLM for that; there is plenty of bad US posting on western
| platforms from international audiences and the US herself.
| ithkuil wrote:
| > they are not wasting valuable English fluency to shit
| post on western platforms enmass
|
| How's the economy of that different for the Russian
| campaigns? Do they have a larger pool of English fluency
| to draw from or is the urgency of the operation higher in
| their case?
| maxglute wrote:
| A PRC person with English fluency good enough to blend in
| with native English on western platforms has much better
| job opportunities. Even in the PRC, 50c posts are largely
| from civil servants told to write a few perfunctory
| platitudes on domestic platforms. The MO is to overwhelm
| with spam, not to engage where effort:return is low. Even
| the Ministry of Foreign Affairs and most of the think tanks
| that also publish in English can rarely find people to
| write "casual" English. You'd have to write 1000s of "50c"
| comments to equal 1 hour of English tutoring gig work. The
| economics of it didn't make sense pre-LLM.
| ithkuil wrote:
| Does it mean that in Russia of 10 years ago a person with
| the same english language skills would not be able to
| find a better job or does it mean that the troll farms
| pay more? Or is it just patriotism?
|
| (I genuinely would like to learn more about this topic)
| TaylorAlexander wrote:
| It's a fair question, but one we should be asking about all
| models, perhaps especially our own. It's of course easier to
| see the propaganda of foreign cultures, and this should be
| investigated, but let's not let ourselves believe that a model
| is more likely to contain propaganda because it is Chinese. It
| will just contain propaganda that is easier for us to see as
| propaganda.
|
| Noam Chomsky and Edward Herman wrote extensively about
| propaganda in democratic societies in their 1988 book
| Manufacturing Consent. A nice introductory excerpt is here, and
| the first two or three paragraphs are enough to begin to see
| the argument:
|
| https://chomsky.info/consent01/
|
| Put as briefly as possible: propaganda in totalitarian
| societies is simpler. They just use force to remove people who
| say the wrong things, and state media to broadcast the "right
| things". In democratic societies, institutional power still
| wants to protect itself, and this is achieved through more
| complex means, but it is nonetheless still rather effective.
| yumraj wrote:
| > It's a fair question, but one we should be asking about all
| models, perhaps especially our own. It's of course easier to
| see the propaganda of foreign cultures, and this should be
| investigated, but let's not let ourselves believe that a
| model is more likely to contain propaganda because it is
| Chinese. It will just contain propaganda that is easier for
| us to see as propaganda.
|
| Yes, but in this particular case I'm coming from a viewpoint
| where I view China as a hostile power. So, at the moment, my
| worry is about that.
|
| In the future, if the US slips into authoritarianism, which TBH
| it might depending on the outcome of the next election, what you
| note would become a very real problem.
|
| So, putting it differently and more neutrally, is there any
| research being done on evaluating political, and other, biases
| in a model, or is it all just being put in the bucket of
| _hallucination_?
| TaylorAlexander wrote:
| > if US slips into authoritarianism
|
| The point of Chomsky's work in this case is to show that
| authoritarianism does not make propaganda more or less
| likely, it just changes the means by which propaganda is
| created and reinforced. Chinese propaganda is easier to
| identify as a foreigner, but the propaganda of your home
| country has a much more significant effect on your life.
| The nature of living with pervasive propaganda is that it
| is hard to see or consider how your life would be different
| without the propaganda, and that's what makes it so
| dangerous.
|
| > is there any research being done on evaluating a
| political, and other, bias in a model or is it just being
| all put in the bucket of hallucination?
|
| It's a better question, and again one that we should ask
| regardless of the model's origins.
| seanmcdirmid wrote:
| Wouldn't it have more to do with propaganda that reinforces
| and caters to your cognitive biases being less detectable
| than propaganda that doesn't? Even inside America, I'm
| pretty resistant to Fox News propaganda, but if CNN has
| any, it isn't registering much on my propaganda
| detectors.
___________________________________________________________________
(page generated 2024-03-10 23:00 UTC)