[HN Gopher] You Wouldn't Download a Hacker News
       ___________________________________________________________________
        
       You Wouldn't Download a Hacker News
        
       Author : jasonthorsness
       Score  : 353 points
       Date   : 2025-04-30 01:26 UTC (21 hours ago)
        
 (HTM) web link (www.jasonthorsness.com)
 (TXT) w3m dump (www.jasonthorsness.com)
        
       | ashish01 wrote:
       | I wrote one a while back, https://github.com/ashish01/hn-data-dumps,
       | and it was a lot of fun. One thing that would be cool to
       | implement: more recent items keep updating over time, so
       | recently downloaded items go stale sooner than older ones.
        
         | jasonthorsness wrote:
         | Yeah I'm really happy HN offers an API like this instead of
         | locking things down like a bunch of other sites...
         | 
         | I used a function based on the age for staleness, it considers
         | things stale after a minute or two initially and immutable
         | after about two weeks old.
         | 
         |     // DefaultStaleIf marks stale at 60 seconds after creation,
         |     // then frequently for the first few days after an item is
         |     // created, then quickly tapers after the first week to never
         |     // again mark stale items more than a few weeks old.
         |     const DefaultStaleIf = "(:now-refreshed)>" +
         |         "(60.0*(log2(max(0.0,((:now-Time)/60.0))+1.0)+pow(((:now-Time)/(24.0*60.0*60.0)),3)))"
         | 
         | https://github.com/jasonthorsness/unlurker/blob/main/hn/core...
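
To get a feel for how the staleness expression quoted above behaves, here is a minimal Python sketch of the same arithmetic. It is only a reading of the quoted formula, not the tool's actual code: the function names are made up for illustration, and it assumes `:now`, `refreshed` and `Time` are all Unix timestamps in seconds.

```python
import math

DAY = 24.0 * 60.0 * 60.0  # seconds per day, as in the quoted expression


def stale_threshold(age_seconds: float) -> float:
    """Seconds allowed since the last refresh before an item of this age is stale."""
    minutes = max(0.0, age_seconds / 60.0)
    days = age_seconds / DAY
    # Logarithmic growth dominates early on; the cubic term takes over after days.
    return 60.0 * (math.log2(minutes + 1.0) + days ** 3)


def is_stale(now: float, refreshed: float, created: float) -> bool:
    """Mirror of '(:now-refreshed) > threshold(:now-Time)' (names are assumptions)."""
    return (now - refreshed) > stale_threshold(now - created)


if __name__ == "__main__":
    for label, age in [("1 minute", 60.0), ("1 hour", 3600.0), ("1 day", DAY),
                       ("14 days", 14 * DAY), ("40 days", 40 * DAY)]:
        print(f"{label:>8}: stale after {stale_threshold(age):,.0f} s without a refresh")
```

Under this reading, the allowed gap overtakes the item's own age somewhere between 30 and 40 days, so an item refreshed at any point after creation can no longer be marked stale past that age.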
        
       | jakegmaths wrote:
       | Your query for Java will include all instances of JavaScript as
       | well, so you're over representing Java.
        
         | jasonthorsness wrote:
         | Ah right... maybe even more unexpected then to see a decline
        
           | cs02rm0 wrote:
           | I'm not so sure, while Java's never looked better to me, it
           | does "feel" to me to be in significant decline in terms of
           | what people are asking for on LinkedIn.
           | 
           | I'd imagine these days typescript or node might be taking
           | over some of what would have hit on javascript.
        
             | cess11 wrote:
             | Recruiting Java developers is easy mode, there are rather
             | large consultancies and similar suppliers that will sell or
             | rent them to you in bulk so you don't need to nag with
             | adverts to the same extent as with pythonistas and rubyists
             | and TypeScript.
             | 
             | But there is likely some decline for Java. I'd bet Elixir
             | and Erlang have been nibbling away on the JVM space for
             | quite some time, they make it pretty comfortable to build
             | the kind of systems you'd otherwise use a JVM-JMS-
             | Wildfly/JBoss rig for. Oracle doesn't help, they take zero
             | issue with being widely perceived as nasty and it takes a
             | bit of courage and knowledge to manage to avoid getting a
             | call from them at your inconvenience.
        
               | patates wrote:
               | Speaking as someone who ended up in the corporate Java
               | world somewhat accidentally (wasn't deep in the ecosystem
               | before): even the most invested Java shops seem wary of
               | Oracle's influence now. Questioning Oracle tech, if not
               | outright planning an exit strategy, feels like the
               | default stance.
        
               | cess11 wrote:
               | Most such places probably have some trauma related to
               | Oracle now. Someone spun up the wrong JVM by accident and
               | within hours salespeople were on the phone with some
               | middle manager about how they would like to pay for it,
               | that kind of thing. Or just the issue of injecting their
               | surveillance trojans everywhere and knowing they're
               | there, that's pretty off-putting in itself.
               | 
               | Which is a pity, once you learn to submit to and tolerate
               | Maven it's generally a very productive and for the most
               | part convenient language and 'ecosystem'. It's like
               | Debian, even if you fuck up badly there is likely a
               | documented way to fix it. And there are good libraries
               | for pretty much anything one could want to do.
        
             | karel-3d wrote:
             | New Java actually looks good, but most of the actual Java
             | ecosystem is stuck in the past... and you will mostly work
             | within the existing ecosystem.
        
           | smcin wrote:
           | a) Does your query for 'JS' return instances of 'JSON'?
           | 
           | b) The ultimate hard search topic is 'R' / 'R language'.
           | Check if you think you index it correctly. Or related terms
           | like RStudio, Posit, [R]Shiny, tidyverse, data.table,
           | Hadleyverse...
        
         | smarnach wrote:
         | Similarly, the Rust query will include "trust", "antitrust",
         | "frustration" and a bunch of other words
        
           | matsemann wrote:
           | Reminded me of the Scunthorpe problem
           | https://en.wikipedia.org/wiki/Scunthorpe_problem
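
The Java/JavaScript and Rust/trust overlaps raised above come from plain substring matching. A hedged sketch of the usual workaround, whole-word matching, is below. The sample comments and function names are invented for illustration, and the article's real queries are not shown in this thread, so this is not a claim about how they were written.

```python
import re


def count_substring(texts, term):
    """Naive count: any text containing `term` anywhere, case-insensitively."""
    needle = term.lower()
    return sum(1 for text in texts if needle in text.lower())


def count_whole_word(texts, term):
    """Count texts where `term` appears with no word character on either side."""
    # Lookarounds instead of \b so terms ending in punctuation (e.g. "C++") still work.
    pattern = re.compile(rf"(?<!\w){re.escape(term)}(?!\w)", re.IGNORECASE)
    return sum(1 for text in texts if pattern.search(text))


# Hypothetical comments, made up for this example.
comments = [
    "We rewrote the service in Rust last year.",
    "I don't trust benchmarks like this.",
    "Antitrust law is a frustration for everyone.",
    "Java 21 added virtual threads.",
    "JavaScript bundlers keep getting faster.",
]

if __name__ == "__main__":
    for term in ("rust", "java"):
        print(term, count_substring(comments, term), count_whole_word(comments, term))
```

Whole-word matching does nothing for names that are ordinary words or single letters, such as "Go" or "R"; those still need extra context to disambiguate.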
        
           | sph wrote:
           | A guerilla marketing plan for a new language is to call it a
           | common one-syllable word, so that it appears much more
           | prominent than it really is on badly-done popularity
           | contests.
           | 
           | Call it "Go", for example.
           | 
           | (Necessary disclaimer for the irony-impaired: this is a joke
           | and an attempt at being witty.)
        
             | InDubioProRubio wrote:
             | You also wouldn't acronym hijack overload to boost mental
             | presence in gamers LOL
        
             | setopt wrote:
             | Let's make a language called "A" in that case. (I mean C
             | was fine, so why not one letter?)
        
             | TZubiri wrote:
             | Or call it the name of a popular song to appeal to the
             | youngins.
             | 
             | I present to you "Gangam C"
        
       | flakiness wrote:
       | I have done something similar. I cheated by using the BigQuery
       | dataset (which somehow keeps getting updated): export the data
       | to parquet, download it and query it using duckdb.
        
         | minimaxir wrote:
         | That's not cheating, that's just pragmatic.
        
           | AbstractH24 wrote:
           | What a pragmatic way to rationalize most cheating
        
       | matsemann wrote:
       | One thing I'm curious about, but I guess not visible in any way,
       | is random stats about my own user/usage of the site. What's my
       | upvote/downvote ratio? Are there users I constantly
       | upvote/downvote? Who is liking/hating my comments the most? And
       | some I guessed could be scrapable: Which days/times am I the
       | most active (like the github green grid thingy)? How's my
       | activity changed over the years?
        
         | minimaxir wrote:
         | The only vote data that is visible via any HN API is the scores
         | on submissions.
         | 
         | Day/Hour activity maps for a given user are relatively trivial
         | to do in a single query, but only public submission/comment
         | data could be used to infer it.
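
A day/hour activity map of the kind described above can be sketched from nothing more than creation timestamps. The snippet assumes each public item carries a Unix `time` value in seconds, as HN API items do; the sample data and the `activity_map` name are made up.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def activity_map(timestamps):
    """Count items per (weekday, hour) in UTC; weekday 0 is Monday."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(moment.weekday(), moment.hour)] += 1
    return counts


# Hypothetical posting times for one user, expressed as Unix seconds.
base = datetime(2025, 4, 30, 1, 26, tzinfo=timezone.utc)  # a Wednesday
sample = [
    base.timestamp(),
    (base + timedelta(minutes=5)).timestamp(),
    (base + timedelta(hours=1)).timestamp(),
    (base + timedelta(days=1)).timestamp(),
]

if __name__ == "__main__":
    for (weekday, hour), n in sorted(activity_map(sample).items()):
        print(f"weekday={weekday} hour={hour:02d} count={n}")
```

This only sees public submissions and comments, so it says when someone posts, not when they read or vote.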
        
           | ryandrake wrote:
           | Too bad! I've always sort of wanted to be able to query
           | things like what were my most upvoted and downvoted comments,
           | how often are my comments flagged, and so on.
        
             | saagarjha wrote:
             | I did this once by scraping the site (very slowly, to be
             | nice). It's not that hard since the HTML is pretty
             | consistent.
        
         | nottorp wrote:
         | > Are there users I constantly upvote/downvote?
         | 
         | Hmm. Personally I never look at user names when I comment on
         | something. It's too easy to go from "i agree/disagree with this
         | piece of info" to "i like/dislike this guy"...
        
           | matsemann wrote:
           | Same, which is why it would be cool to see. Perhaps there are
           | people I both upvote and downvote?
        
           | thaumasiotes wrote:
           | > It's too easy to go from "i agree/disagree with this piece
           | of info" to "i like/dislike this guy"...
           | 
           | ...is that supposed to pose some kind of problem? The problem
           | would be in the other direction, surely?
        
             | nottorp wrote:
             | Either you got the direction wrong or you'd support someone
             | who is wrong just because you like them.
             | 
             | You're wrong in both cases :)
        
               | thaumasiotes wrote:
               | Maybe try rereading my comment?
        
               | nottorp wrote:
               | You're right. But I still disagree with you. Both ways
               | are wrong if you want to maintain a constructive
               | discussion.
               | 
               | Maybe you don't like my opinions on cogwheel shaving but
               | you will agree with me on quantum frobnicators. But if
               | you first come across my comments on cogwheel
               | shaving and note the user name, you may not even read the
               | comments on quantum frobnicators later.
        
           | vidarh wrote:
           | The exception, to me, is if I'm questioning whether the
           | comment was in good faith or not, where the track record of
           | the user on a given topic could go some way to untangle that.
           | It happens rarely here, compared to e.g. Reddit, but
           | sometimes it's mildly useful.
        
           | pjc50 wrote:
           | I recognize twenty or so of the most frequent and/or annoying
           | posters.
           | 
           | The leaderboard https://news.ycombinator.com/leaders
           | absolutely doesn't correlate with posting frequency. Which is
           | probably a good thing. You can't bang out good posts non-stop
           | on every subject.
        
         | 9rx wrote:
         | _> What's my upvote/downvote ratio?_
         | 
         | Undefined, presumably. What reason would there be to take
         | time out of your day to press a pointless button?
         | 
         | It doesn't communicate anything other than that you pressed a
         | button. For someone participating in good faith, that doesn't
         | add any value. But for those not participating in good faith, i.e.
         | trolls, it adds incredible value knowing that their trolling is
         | being seen. So it is actually a net negative to the community
         | if you did somehow accidentally press one of those buttons.
         | 
         | For those who seek fidget toys, there are better devices for
         | that.
        
           | saagarjha wrote:
           | If Hacker News had reactions I'd put an eye roll here.
        
             | 9rx wrote:
             | You could have assigned 'eye roll' to one of the arrow
             | buttons! Nobody else would have been able to infer your
             | intent, but if you are pressing the arrow buttons it is not
             | like you want anyone else to understand your intent anyway.
        
           | immibis wrote:
           | Actually, its most useful purpose is to hide opinions you
           | disagree with - if 3 other people agree with you.
           | 
           | Like when someone says GUIs are better than CLIs, or C++ is
           | better than Rust, or you don't need microservices, you can
           | just hide that inconvenient truth from the masses.
        
             | 9rx wrote:
             | So, what you are saying is that if the masses agree that
             | some opinion is disagreeable, they will hide it from
             | themselves? But they already read it to know it was
             | disagreeable, so... What are they hiding it for, exactly?
             | So that they don't have to read it again when they revisit
             | the same comments 10 years later? Does anyone actually go
             | back and reread the comments from 10 years ago?
        
               | jpc0 wrote:
               | It's not so much rereading the comments but more a matter
               | of it being indication to other users.
               | 
               | The C++ example for instance above, you are likely to be
               | downvoted for supporting C++ over rust and therefore most
               | people reading through the comments (and LLMs correlating
               | comment "karma" to how liked a comment is) will generally
               | associate Rust > C++, which isn't a nuanced opinion at
               | all and IMHO is just plain wrong a decent amount of
               | times. They are tools and have their uses.
               | 
               | So generally it shows the sentiment of the group and
               | humans are conditioned to follow the group.
        
               | 9rx wrote:
               | An indication of what? It is impossible to know why a
               | user pressed an arrow button. Any meaning the user may
               | have wanted to convey remains their own private
               | information.
               | 
               | All it can fundamentally serve is to act as an
               | impoverished man's read receipt. And why would you want
               | to give trolls that information? Fishing to find out if
               | anyone is reading what they're posting is their whole
               | game. Do not feed the trolls, as they say.
        
             | matsemann wrote:
             | Since there are no rules on down voting, people probably
             | use it for different things. Some to show dissent, some only to
             | down vote things they think don't belong, etc. Which
             | is why it would be interesting to see. Am I overusing it
             | compared to the community? Underusing it?
        
         | pjc50 wrote:
         | I don't think you can get the individual vote interactions, and
         | that's probably a good thing. It is irritating that the "API"
         | won't let me get vote counts; I should go back to my Python
         | scraper of the comments page, since that's the only way to get
         | data on post scores.
         | 
         | I've probably written over 50k words on here and was wondering
         | if I could restructure my best comments into a long meta-
         | commentary on what does well here and what I've learned about
         | what the audience likes and dislikes.
         | 
         | (HN does not like jokes, but you can get away with it if you
         | also include an explanation)
        
         | xnx wrote:
         | Some of this data is available through the API (and Clickhouse
         | and BigQuery).
         | 
         | I wrote a Puppeteer script to export my own data that isn't
         | public (upvotes, downvotes, etc.)
        
       | pier25 wrote:
       | would love to see the graph of React, Vue, Angular, and Svelte
        
       | andrewshadura wrote:
       | Funny nobody's mentioned "correct horse battery staple" in the
       | comments yet...
        
       | hsbauauvhabzb wrote:
       | Is the raw dataset available anywhere? I really don't like the HN
       | search function, and grepping through the data would be handy.
        
         | Havoc wrote:
         | It's on firebase/bigquery to avoid people doing what OP did
         | 
         | If you click the API link at the bottom of the page it'll explain.
        
           | jasonthorsness wrote:
           | I used the API! It only takes a few hours to download your
           | own copy with the tool I used
           | https://github.com/jasonthorsness/unlurker
           | 
           | I had to CTRL-C and resume a few times when it stalled; it
           | might be a bug in my tool
        
             | xnx wrote:
             | Is there any advantage to making all these requests instead
             | of using Clickhouse or BigQuery?
        
               | jasonthorsness wrote:
               | Probably not :P. I made the client for another project,
               | https://hn.unlurker.com, and then just jumped straight to
               | using it to download the whole thing instead of searching
               | for an already available full data set.
        
             | Havoc wrote:
             | My mistake - apologies. Had misunderstood what you did
        
       | 9rx wrote:
       | _> The Rise Of Rust_
       | 
       | Shouldn't that be The Fall Of Rust? According to this, it saw the
       | most attention during the years before it was created!
        
         | emilbratt wrote:
         | The chart is a stacked one, so we are looking at the height
         | each category takes up and not the height each category reaches.
        
       | stefs wrote:
       | please do not use stacked charts! i think it's close to
       | impossible not to distort the reader's impression because a)
       | it's very hard to gauge the height of a certain data point in the
       | noise and b) they're implying a dependency where there _probably_
       | is none.
        
         | dguest wrote:
         | How do you feel about stacked plots on a logarithmic y axis?
         | Some physics experiments do this all the time [1] but I find
         | them pretty unintuitive.
         | 
         | [1]:
         | https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PUBNOTES/ATL-...
        
           | lblume wrote:
           | What is this even supposed to represent? The entire
           | justification I could give for stacked bars is that you could
           | permute the sub-bars and obtain comparable results. Do the
           | bars still represent additive terms? Multiplicative
           | constants? As a non-physicist I would have no idea on how to
           | interpret this.
        
             | dguest wrote:
             | It's a histogram. Each color is a different simulated
             | physical process: they can all happen in particle
             | collisions, so the sum of all of them should add up to the
             | data the experiment takes. The data isn't shown here
             | because it hasn't been taken yet: this is an extrapolation
             | to a future dataset. And the dotted lines are some
             | hypothetical signal.
             | 
             | The area occupied by each color is basically meaningless,
             | though, because of the logarithmic y-scale. It always looks
             | like there's way more of whatever you put on the bottom.
             | And obviously you can grow it without bound: if you move
             | the lower y-limit to 1e-20 you'll have the whole plot
             | dominated by whatever is on the bottom.
             | 
             | For the record I think it's a terrible convention, it just
             | somehow became standard in some fields.
        
         | seabass wrote:
         | My first thought as well! The author of uPlot has a good demo
         | illustrating their pitfalls
         | https://leeoniya.github.io/uPlot/demos/stacked-series.html
        
         | jasonthorsness wrote:
         | It's true :( but line charts of the data had too much overlap
         | and it was hard to see anything. I was thinking next time maybe
         | multiple line charts aligned and stacked, with one series per
         | region?
        
       | tacker2000 wrote:
       | Yea, i also get the feeling that these rust evangelists get more
       | annoying every day ;p
        
       | userbinator wrote:
       | _I had a 20 GiB JSON file of everything that has ever happened on
       | Hacker News_
       | 
       | I'm actually surprised at that volume, given this is a text-only
       | site. Humans have managed to post _over 20 billion bytes_ of text
       | to it over the 18 years that HN existed? That averages to over
       | 2MB per day, or around 7.5KB/s.
        
         | sph wrote:
         | 2 MB per day doesn't sound like a lot. The amount of posts
         | probably has increased exponentially over the years, especially
         | after the Reddit fiasco when we had our latest, and biggest
         | neverending September.
         | 
         | Also, I bet a decent amount of that is not from humans. /newest
         | is full of bot spam.
        
           | samplatt wrote:
           | Plus the JSON structure metadata, which for the average
           | comment is going to add, what, 10%?
        
             | kevincox wrote:
             | I suspect it is closer to 100% increase for the average
             | comment. If the average comment is a few sentences and the
             | metadata has id, parent id, author, timestamp and a vote
             | count that can add up pretty fast.
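
The overhead guesses above are easy to sanity-check on a single record. The sketch below serializes one made-up comment and compares the encoded size with the size of its text alone. The field names follow the HN API item shape (`id`, `parent`, `by`, `time`, `type`, `text`), but the values are invented and real dumps may store extra fields.

```python
import json


def metadata_overhead(item):
    """Return (total_bytes, text_bytes, overhead_ratio) for one JSON-encoded item."""
    encoded = json.dumps(item, separators=(",", ":")).encode("utf-8")
    text_bytes = len(item.get("text", "").encode("utf-8"))
    overhead_bytes = len(encoded) - text_bytes
    # Ratio of non-text bytes to text bytes; 1.0 would mean a 100% increase.
    ratio = overhead_bytes / text_bytes if text_bytes else float("inf")
    return len(encoded), text_bytes, ratio


# Hypothetical comment; every value here is made up for illustration.
short_comment = {
    "id": 43840000,
    "parent": 43839000,
    "by": "example_user",
    "time": 1745976360,
    "type": "comment",
    "text": "That's not cheating, that's just pragmatic.",
}
long_comment = dict(short_comment, text=short_comment["text"] * 20)

if __name__ == "__main__":
    for name, item in (("short", short_comment), ("long", long_comment)):
        total, text, ratio = metadata_overhead(item)
        print(f"{name}: {total} bytes total, {text} bytes of text, overhead {ratio:.0%}")
```

The ratio depends heavily on comment length, so both estimates in the thread can hold for different parts of the data.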
        
           | FabHK wrote:
           | Around one book every 12 hours.
        
         | xnx wrote:
         | 20 GB JSON is surprising to me. I have an sqlite file of all HN
         | data that is 20 GB, it would be much larger as JSON.
        
           | wolfgang42 wrote:
           | 20 GB of JSON is correct; here's the entire dump straight
           | from the API up to last Monday:                 $ du -c
           | ~/feepsearch-prod/datasource/hacker-news/data/dump/*.jsonl |
           | tail -n1       19428360        total
           | 
           | Not sure how your sqlite file is structured but my intuition
           | is that the sizes being roughly the same sounds plausible:
           | JSON has a lot of overhead from redundant structure and
           | ASCII-formatted values; but sqlite has indexes, btrees,
           | ptrmaps, overflow pages, freelists, and so on.
        
         | olalonde wrote:
         | 7.5KB/s (aka 7500 characters per second) didn't sound
         | realistic... So I did the math[0] and it turns out it's closer
         | to 34 bytes/s (0.03 KB/s). And it's really lower than that
         | because of all the metadata and syntax in the JSON. You were
         | right about the "over 2MB per day" though.
         | 
         | [0] Well, ChatGPT did the math but it seems to check out:
         | https://chatgpt.com/share/68124afc-c914-800b-8647-74e7dc4f21...
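
The averages discussed above can be rechecked in a few lines. This sketch takes the thread's own inputs (about 20 GB over 18 years) and assumes decimal gigabytes and 365.25-day years, so the exact figures shift a little under other conventions.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap days


def average_rates(total_bytes, years):
    """Return (bytes_per_day, bytes_per_second) averaged over the whole period."""
    days = years * DAYS_PER_YEAR
    per_day = total_bytes / days
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_second


if __name__ == "__main__":
    per_day, per_second = average_rates(20 * 10**9, 18)
    print(f"{per_day / 1e6:.2f} MB per day, {per_second:.1f} bytes per second")
```

With these inputs the result is roughly 3 MB per day and about 35 bytes per second, which agrees with "over 2MB per day" and with a rate in the tens of bytes per second, not kilobytes.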
        
       | montebicyclelo wrote:
       | There are also two DBs I know of that have an updated Hacker News
       | table for running analytics on without needing to download it
       | first.
       | 
       | - BigQuery, (requires Google Cloud account, querying will be free
       | tier I'd guess) -- `bigquery-public-data.hacker_news.full`
       | 
       | - ClickHouse, no signup needed, can run queries in browser
       | directly, [1]
       | 
       | [1]
       | https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
        
         | xnx wrote:
         | The ClickHouse resource is amazing. It even has history! I had
         | already done my own exercise of downloading all the JSON before
         | discovering the Clickhouse HN DBs.
        
         | kordlessagain wrote:
         | It even finds your comment 'clickhouse':
         | https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
        
           | ZeWaka wrote:
           | and now yours :)
        
       | bambax wrote:
       | > _Now that I have a local download of all Hacker News content, I
       | can train hundreds of LLM-based bots on it and run them as
       | contributors, slowly and inevitably replacing all human text with
       | the output of a chinese room oscillator perpetually echoing and
       | recycling the past._
       | 
       | The author said this in jest, but I fear someone, someday, will
       | try this; I hope it never happens but if it does, could we stop
       | it?
        
         | ahoka wrote:
         | Probably already happening.
        
         | nashashmi wrote:
         | We LLMs only output the average response of humanity because we
         | can only give results that are confirmed by multiple sources.
         | On the contrary, many of HN's comments are quite unique
         | insights that run contrary to the average popular thought. If
         | this is ever to be emulated by an LLM, we would give only
         | gibberish answers. If we had a filter to that gibberish to only
         | permit answers that are reasonable and sensible, our answers
         | would be boring and still be gibberish. In order for our
         | answers to be precise, accurate and unique, we must use
         | something other than LLMs.
        
         | miki123211 wrote:
         | How do you know it isn't already happening?
         | 
         | With long and substantive comments, sure, you can usually tell,
         | though much less so now than a year or two ago. With short, 1
         | to 2 sentence comments though? I think LLMs are good enough to
         | pass as humans by now.
        
           | Joker_vD wrote:
           | But what if LLMs will start leaving constructive and helpful
           | comments? I personally would feel like xkcd [0], but others
           | may disagree.
           | 
           | [0] https://xkcd.com/810/
        
             | gosub100 wrote:
             | That's the moment we will realize that it's not the spam
             | that bothers us, but rather that there is no human
             | interaction. How vapid would it be to have a bunch of fake
             | comments saying eat more vegetables, good job for not
             | running over that animal in the road, call mom tonight it's
             | been a while, etc. They mean nothing if they were generated
             | by a piece of silicon.
        
               | withinboredom wrote:
               | I believe they mean whatever you mean it to mean.
               | Humanity has existed on religion based on what some dead
               | people wrote down, just fine. Er, well, maybe not "just
               | fine" but hopefully you get the gist: you can attribute
               | whatever meaning you want to the AI, holy text, or other
               | people.
        
               | gosub100 wrote:
               | Religion is the opposite of AI text generation. It brings
               | people together to be _less_ lonely.
               | 
               | AI actively tears us apart. We no longer know if we're
               | talking to a human, or if an artist's work came from their
               | ability, or if we will continue to have a job to pay for
               | our living necessities.
        
               | miki123211 wrote:
               | I think a much more important question is what happens
               | when we have no idea who's an LLM and who's a real
               | person.
               | 
               | Do we accuse everybody of being an LLM? Will most threads
               | devolve into "you're an LLM, no you're the LLM" wars?
               | Will this give an edge to non-native English speakers,
               | because grammatical errors are an obvious tell that
               | somebody is human? Will LLM makers get over their
               | squeamishness and make "write like a Mexican who barely
               | speaks English" a prompt that works and produces good
               | results?
               | 
               | Maybe the whole system of anonymity on the internet gets
               | dismantled (perhaps after uncovering a few successful
               | llm-powered psy-ops or under the guise of child safety
               | laws), and everybody just needs to verify their identity
               | everywhere (or login with Google)? Maybe browser makers
               | introduce an API to do this as anonymously and
               | frictionlessly as possible, and it becomes the new normal
               | without much fuss? Is turnstile ever going to get good
               | enough to make this whole issue moot?
               | 
               | I think we have a very interesting few years in front of
               | us.
        
               | datameta wrote:
               | Also neuronormative individuals sometimes mistake
               | neurodivergent usage of language for LLM-speak which
               | might have similar pattern matching schema reinforced
        
             | melagonster wrote:
             | This is just another reddit or HN.
        
             | Pikamander2 wrote:
             | I was browsing a Reddit thread recently and noticed that
             | all of the human comments were off-topic one-liners and
             | political quips, as is tradition.
             | 
             | Buried at the bottom of the thread was a helpful reply by
             | an obvious LLM account that answered the original question
             | far better than any of the other comments.
             | 
             | I'm still not sure if that's amazing or terrifying.
        
         | no_time wrote:
         | I can't think of a solution that preserves the open and
         | anonymous nature that we enjoy now. I think most open internet
         | forums will go one of the following routes:
         | 
         | - ID/proof of human verification. Scan your ID, give me your
         | phone number, rotate your head around while holding up a piece
         | of paper etc. note that some sites already do this by proxy
         | when they whitelist like 5 big email providers they accept for
         | a new account.
         | 
         | - Going invite only. Self explanatory and works quite well to
         | prevent spam, but limits growth. lobste.rs and private trackers
         | come to mind as an example.
         | 
         | - Playing a whack-a-mole with spammers (and losing eventually).
         | 4chan does this by requiring you to solve a captcha and
         | requires you to pass the cloudflare turnstile that may or may
         | not do some browser fingerprinting/bot detection. CF is
         | probably pretty good at deanonymizing you through this process
         | too.
         | 
         | All options sound pretty grim to me. I'm not looking forward to
         | the AI spam era of the internet.
        
           | dns_snek wrote:
           | There must be a technical solution to this based on some
           | cryptographic black magic that both verifies you to be a
           | unique person to a given website without divulging your
           | identity, and without creating a globally unique identifier
           | that would make it easy to track us across the web.
           | 
           | Of course this goes against the interests of tracking/spying
           | industry and increasingly authoritarian governments, so it's
           | unlikely to ever happen.
        
             | 05 wrote:
             | Oh you mean something like Apple's Private Access Tokens?
             | 
             | https://support.apple.com/en-us/102591
             | 
             | https://blog.cloudflare.com/eliminating-captchas-on-iphones-...
        
               | dns_snek wrote:
               | I don't think that's what I was going for? As far as I
               | can see it relies on a locked down software stack to
               | "prove" that the user is running blessed software on top
               | of blessed hardware. That's one way of dealing with bots
               | but I'm looking for a solution that doesn't lock us out
               | of our own devices.
        
             | vvillena wrote:
             | These kinds of solutions are already deployed in some
             | places. A trusted ID server creates a bunch of anonymous
             | keys for a person, the person uses these keys to identify
             | in pages that accept the ID server keys. The page has no
             | way to identify a person from a key.
             | 
             | The weak link is in the ID servers themselves. What happens
             | if the servers go down, or if they refuse to issue keys?
             | Think a government ID server refusing to issue keys for a
             | specific person. Pages that only accept keys from these
             | government ID servers, or that are forced to only accept
             | those keys, would be inaccessible to these people. The
             | right to ID would have to be enshrined into law.
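
One known building block for "anonymous keys from a trusted issuer" is a blind signature. The sketch below is a toy RSA blind signature, not the scheme any deployed ID server is confirmed to use. The key is a textbook-sized example and is not secure, and the function names (`blind`, `issuer_sign`, `unblind`, `verify`) are made up for illustration.

```python
# Toy RSA blind signature: the issuer signs a token it never sees.
# Textbook-sized key for illustration only; NOT secure.
# All function names here are hypothetical.

N, E, D = 3233, 17, 2753  # N = 61 * 53; E * D == 1 (mod 3120)


def blind(token: int, r: int) -> int:
    """User hides the token with a blinding factor r (coprime to N)."""
    return (token * pow(r, E, N)) % N


def issuer_sign(blinded: int) -> int:
    """ID server signs the blinded value with its private exponent."""
    return pow(blinded, D, N)


def unblind(blinded_sig: int, r: int) -> int:
    """User removes r, leaving a signature on the original token."""
    return (blinded_sig * pow(r, -1, N)) % N


def verify(token: int, sig: int) -> bool:
    """A website checks the signature with only the public key (N, E)."""
    return pow(sig, E, N) == token % N


token, r = 1234, 7  # r would be chosen at random in practice
blinded = blind(token, r)
sig = unblind(issuer_sign(blinded), r)
print(verify(token, sig))
```

The issuer only ever sees `blinded`, so it cannot later link `token` to the person who requested it. The weak link described above remains: the issuer can still refuse to sign at all.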
        
             | no_time wrote:
             | As I see it, a technical solution to AI spam inherently
             | must include a way to uniquely identify particular machines
             | at best, and particular humans responsible for said
             | machines at worst.
             | 
             | This verification mechanism must include some sort of UUID
             | to rein in a single bad actor who happens to validate
             | his/her bot farm of 10000 accounts from the same
             | certificate.
        
           | icoder wrote:
           | I'm sometimes thinking about account verification that
           | requires work/effort over time, could be something fun even,
           | so that it becomes a lot harder to verify a whole army of
           | them. We don't need identification per se, just being human
           | and (somewhat) unique.
           | 
           | See also my other comment on the same parent wrt network of
           | trust. That could perhaps vet out spammers and trolls. On one
           | hand it seems far fetched and a quite underdeveloped idea, on
           | the other hand, social interaction (including discussions
           | like these) as we know it is in serious danger.
        
           | theasisa wrote:
           | Wouldn't those only mean that the account was initially
           | created by a human but afterwards there are no guarantees
           | that the posts are by humans.
           | 
           | You'd need to have a permanent captcha that tracks that the
           | actions you perform are human-like, such as mouse movement or
           | scrolling on phone etc. And even then it would only deter
           | current AI bots but not for long, as impersonating human
           | behavior would be a 'fun' challenge to break.
           | 
           | Trusted relationships are only as trustworthy as the humans
           | trusting each other, eventually someone would break that
           | trust and afterwards it would be bots trusting bots.
           | 
           | Due to bots already filling up social media with their spew
           | and that being used for training other bots the only way I
           | see this resolving itself is by eventually everything
           | becoming nonsensical and I predict we aren't that far from it
           | happening. AI will eat itself.
        
             | no_time wrote:
             | >Wouldn't those only mean that the account was initially
             | created by a human but afterwards there are no guarantees
             | that the posts are by humans.
             | 
             | Correct. But for curbing AI slop comments this is enough
             | imo. As of writing this, you can quite easily spot LLM
             | generated comments and ban them. If you have a verification
             | system in place then you've banned the human too, meaning you
             | put a stop to their spamming.
        
         | _Algernon_ wrote:
         | This is probably already happening to some extent. I think the
         | best we can hope for is xkcd 810: https://xkcd.com/810/
        
         | holuponemoment wrote:
         | Does it even matter?
         | 
         | Perhaps I am jaded but most if not all people regurgitate about
         | topics without thought or reason along very predictable paths,
         | myself very much included. You can mention a single word
         | covered with a muleta (Spanish bullfighting flag) and the
         | average person will happily run at it and give you a
         | predictable response.
        
           | bob1029 wrote:
           | It's like a Pavlovian response in me to respond to anything
           | SQL or C# adjacent.
           | 
           | I see the _exact_ same in others. There are some HN usernames
           | that I have memorized because they show up deterministically
           | in these threads. Some are so determined it seems like a
           | dedicated PR team, but I know better...
        
             | OccamsMirror wrote:
             | I always love checking the comments on articles about Bevy
             | to see how the metaverse client guy is going.
        
           | gosub100 wrote:
           | The paths are going to be predictable by necessity. It's not
           | possible for everyone to have a uniquely derived
           | interpretation about most common issues, whether that's
           | standard lightning rod politics or, extending somewhat,
           | tech socio/political issues.
        
         | icoder wrote:
         | I'm more and more convinced of an old idea that seems to become
         | more relevant over time: to somehow form a network of trust
         | between humans so that I know that your account is trusted by a
         | person (you) that is trusted by a person (I don't know) [...]
         | that is trusted by a person (that I do know) that is trusted by
         | me.
         | 
         | Lots of issues there to solve, privacy being one (the links
         | don't have to be known to the users, but in a naive approach
         | they _are_ there on the server).
         | 
         | Paths of distrust could be added as negative weight, so I can
         | distrust people directly or indirectly (based on the accounts
         | that they trust) and that lowers the trust value of the
         | chain(s) that link me to them.
         | 
         | Because it's a network, it can adjust itself to people trying
         | to game the system, but it remains a question as to how robust
         | it will be.
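
The chain-of-trust idea with negative weights can be sketched as a small graph walk. This is one possible reading, not a worked-out design: edge weights in [-1, 1], a path's value as the product of its weights, and the rule that a distrust edge may only be the last hop are all assumptions made for illustration, as is the name `trust_score`.

```python
from typing import Dict

# truster -> {account: weight in [-1, 1]}; negative means distrust
Graph = Dict[str, Dict[str, float]]


def trust_score(graph: Graph, source: str, target: str, max_hops: int = 4) -> float:
    """Sum the products of edge weights over simple paths from source to target.

    A negative (distrust) edge is only allowed as the final hop, so that
    "distrusted by someone I distrust" never turns into trust.
    """
    total = 0.0

    def walk(node: str, weight: float, visited: frozenset, hops: int) -> None:
        nonlocal total
        for nxt, w in graph.get(node, {}).items():
            if nxt in visited:
                continue
            if nxt == target:
                total += weight * w
            elif w > 0 and hops + 1 < max_hops:
                walk(nxt, weight * w, visited | {nxt}, hops + 1)

    walk(source, 1.0, frozenset({source}), 0)
    return max(-1.0, min(1.0, total))  # clamp to [-1, 1]


graph = {
    "me": {"alice": 0.9, "bob": 0.8},
    "alice": {"carol": 0.5},   # alice trusts carol
    "bob": {"carol": -0.5},    # bob distrusts carol
}
print(trust_score(graph, "me", "carol"))
```

Here the chain through alice contributes 0.9 * 0.5 and the chain through bob contributes 0.8 * -0.5, so bob's distrust cancels most of alice's endorsement. How robust such a scheme is against people gaming it is exactly the open question.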
        
           | XorNot wrote:
           | I think technically this is the idea that GPG's web of trust
           | was circling without quite staring at, which is the oddest
           | thing about the protocol: it's used mostly today for machine
           | authentication, which it's quite good at (i.e. deb
           | repos)...but the tooling actually generally is oriented
           | around verifying and trusting _people_.
        
             | wobfan wrote:
             | Yeah exactly, this was exactly the idea behind that.
             | Unfortunately, while it sounds like a sound idea on paper
             | (at least IMO), it has proven ineffective: time and time
             | again the WOT idea in PGP has stood no chance against the
             | laziness of humans.
        
           | littlestymaar wrote:
           | Ultimately, guaranteeing common trust between citizens is a
           | fundamental role of the State.
           | 
           | For a mix of ideological reasons and lack of genuine interest
           | for the internet from the legislators, mainly due to the
           | generational factor I'd guess, it hasn't happened yet, but I
           | expect government issued equivalent of IDs and passports for
           | the internet to become mainstream sooner than later.
        
             | eadmund wrote:
             | > Ultimately, guaranteeing common trust between citizens is
             | a fundamental role of the State.
             | 
             | I don't think that really follows. Businesses credit
             | bureaus and Dun & Bradstreet have been privately enabling
             | trust between non-familiar parties for quite a long time.
             | Various networks of merchants did the same in the Middle
             | Ages.
        
               | littlestymaar wrote:
               | > Businesses credit bureaus and Dun & Bradstreet have
               | been privately enabling trust between non-familiar
               | parties for quite a long time.
               | 
               | Under the supervision of the State (they are regulated
               | and rely on the justice and police system to make things
               | work).
               | 
               | > Various networks of merchants did the same in the
               | Middle Ages.
               | 
               | They did, and because there was no State the amount of
               | trust they could build was fairly limited compared to what
               | has later been made possible by the development of modern
               | states (the industrial revolution appearing in the UK has
               | partly been attributed to the institutional framework
               | that existed there early).
               | 
               | Private actors can, and do, and have always done, build
               | their own makeshift trust network, but building a
               | society-wide trust network is a key pillar of what makes
               | modern states "States" (and it directly derives from the
               | "monopoly of violence").
        
               | lormayna wrote:
               | Hawala (https://it.m.wikipedia.org/wiki/Hawala) or other
               | similar ways to transfer money abroad work over a net of
               | trust, but without any state trust system.
        
               | littlestymaar wrote:
               | Compare its use to SWIFT and you'll see the difference.
        
             | icoder wrote:
             | Interestingly, beginning to realise the ease with which a
             | State's trust can sway has actually increased my belief
             | that this should come from 'below'. I think a trust network
             | between people (of different countries) can be much more
             | resilient.
        
             | nostrademons wrote:
             | That's not really what research on state formation has
             | found. The basic definition of a state is "a centralized
             | government with a monopoly on the legitimate use of force",
             | and as you might expect from the definition, groups
             | generally attain statehood by monopolizing the use of
             | force. In other words, they are the bandits that become big
             | enough that nobody dares oppose them. They attain statehood
             | through what's effectively a peace treaty, when all
             | possible opposition basically says "okay, we submit to
             | your jurisdiction, please stop killing us". Very often, it
             | actually is a literal peace treaty.
             | 
             | States will often co-opt _existing_ trust networks as a way
             | to enhance and maintain their legitimacy, as with
             | Constantine's adoption of Christianity to preserve social
             | cohesion in the Roman Empire, or all the compromises that
             | led the 13 original colonies to ratify the U.S.
             | constitution in the wake of the American Revolution. But
             | violence comes first, _then_ statehood, then trust.
             | 
             | Attempts to legislate trust don't really work. Trust is an
             | emotion, it operates person-to-person, and saying "oh, you
             | need to trust such-and-such" doesn't really work unless you
             | are trusted yourself.
        
               | littlestymaar wrote:
               | > The basic definition of a state is "a centralized
               | government with a monopoly on the legitimate use of force"
               | 
               | I'm not saying otherwise (I've even referred to this in a
               | later comment).
               | 
               | > But violence comes first, then statehood, then trust.
               | 
               | Nobody said anything about the historical process so
               | you're not contradicting anyone.
               | 
               | > Attempts to legislate trust don't really work
               | 
               | Quite the opposite, it works very, very well. Civil laws
               | and jurisdiction on _contracts_ have existed since the
               | Roman Republic, and every society has some equivalent
               | (you should read about how the Taliban could get back to
               | power so quickly in big part because they kept doing
               | civil justice in the rural afghan society even while the
               | country was occupied by the US coalition).
               | 
               | You must have institutions to be sure that the other
               | party is going to respect the contract, so that you don't
               | have to trust them, you just need to trust that the state
               | is going to _enforce_ that contract (which they can do
               | because they have the monopoly of violence and can just
               | force the party violating the contract into submission).
               | 
               | With the monopoly of violence comes the responsibility to
               | use your violence to enforce contracts, otherwise social
               | structures are going to collapse (and someone else is
               | going to take that job from you, and gone is your
               | monopoly of violence)
        
           | drcongo wrote:
           | I actually built this once, a long time ago for a very
           | bizarre social network project. I visualised it as a mesh
           | where individuals were the points where the threads met, and
           | as someone's trust level rose, it would pull up the trust
           | levels of those directly connected, and to a lesser degree
           | those connected to them - picture a trawler fishing net and
           | lifting one of the points where the threads meet. Similarly,
           | a user whose trust lowered over time would pull their
           | connections down with them. Sadly I never got to see it at
           | the scale it needed to become useful as the project's funding
           | went sideways.
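
The fishing-net picture can be sketched as a breadth-first ripple over the connection graph. This is a guess at the mechanics, not the original project's code: the `decay` factor of 0.5 per hop, the two-hop reach, and the name `ripple` are all assumptions.

```python
from collections import deque
from typing import Dict, List


def ripple(
    trust: Dict[str, float],
    links: Dict[str, List[str]],
    user: str,
    delta: float,
    decay: float = 0.5,
    max_hops: int = 2,
) -> Dict[str, float]:
    """Return a new trust map after `user` changes by `delta`.

    Accounts k hops away move by delta * decay**k, so direct connections
    are pulled along more strongly than friends of friends.
    """
    updated = dict(trust)
    seen = {user}
    queue = deque([(user, 0)])
    while queue:
        node, hops = queue.popleft()
        updated[node] = updated.get(node, 0.0) + delta * decay ** hops
        if hops < max_hops:
            for neighbour in links.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, hops + 1))
    return updated


links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
trust = {name: 1.0 for name in links}
print(ripple(trust, links, "a", 0.4))   # lifting "a" lifts "b" and, less so, "c"
print(ripple(trust, links, "a", -0.4))  # a falling "a" drags them down instead
```

A negative `delta` gives the second half of the description: a user whose trust drops pulls their connections down with them, with the same fall-off by distance.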
        
             | icoder wrote:
             | Yeah building something like this is not a weekend project,
             | and getting enough traction for it to make sense is orders
             | of magnitude beyond that.
             | 
             | I like the idea of one's trust to leverage that of those
             | around them. This may make it more feasible to ask some
             | 'effort' for the trust gain (as a means to discourage
             | duplicate 'personas' for a single human), as that can
             | ripple outward.
        
             | all2 wrote:
             | How would 'trust' manifest? A karma system?
             | 
             | How are individuals in the network linked? Just comments on
             | comments? Or something different?
        
           | Philpax wrote:
           | https://en.wikipedia.org/wiki/Key_signing_party
        
             | genewitch wrote:
             | Matrix protocol or at least the clients agree that several
             | emoji is a key - which is fine - and you verify by looking
             | at the keys (on each client) at the same time in person,
             | ideally. I've only ever signed for people in person, and
             | one remote attestation; but we had a separate _verified_
             | private channel and attested the emoji that way.
        
             | nickdothutton wrote:
             | Do these still happen? They were common (-ish, at least in
             | my circles) in the 90s during the crypto wars, often at the
             | end of conferences and events, but I haven't come across
             | them in recent years.
        
           | im3w1l wrote:
           | GPG lost, TLS won. Both are actually webs of trust with the
           | same underlying technology. But they have different cultures
           | and so different shapes. GPG culture is to trust your friends
           | and have them trust their friends. With TLS culture you trust
           | one entity (e.g. browser) that trusts a couple dozen entities
           | (root certificate authorities) that either sign keys
           | directly or can fan out to intermediate authorities that then
           | sign keys. The hierarchical structure has proven much more
           | successful than the decentralized one.
           | 
           | Frankly I don't trust my friends of friends of friends not to
           | add thirst trap bots.
        
             | lxgr wrote:
             | The difference is in both culture and topology.
             | 
             | TLS (or more accurately, the set of browser-trusted X.509
             | root CAs) is extremely hierarchical and all-or-nothing.
             | 
             | The PGP web of trust is non-hierarchical and decentralized
             | (from an organizational point of view). That unfortunately
             | makes it both more complex and less predictable, which I
             | suppose is why it "lost" (not that it's actually gone, but
             | I personally have about one or maybe two trusted, non-
             | expired keys left in my keyring).
        
             | kevin_thibedeau wrote:
             | The issue is key management. TLS doesn't usually require
             | client keys. GPG requires all receivers to have a key.
        
             | amenghra wrote:
             | Couple dozen => it's actually 50-ish, with a mix of private
             | and government entities located all over the world.
             | 
             | The fact that the Spanish mint can mint (pun!) certificates
             | for any domain is unfortunate.
             | 
             | Hopefully, any abuse would be noticed quickly and rights
             | revoked.
             | 
             | It would maybe have made more sense for each country's TLD
             | to have one or more associated CA (with the ability to
             | delegate trust among friendly countries if desired).
             | 
             | https://wiki.mozilla.org/CA/Included_Certificates
        
           | SuperShibe wrote:
           | I think this idea's problem might be the people part,
           | specifically the majority type of people that will click
           | absolutely anything for a free iPad
        
             | icoder wrote:
             | Theoretically that should swiftly be reflected in their
             | trust level. But maybe I'm too optimistic.
             | 
             | I have nothing intrinsically against people that 'will
             | click absolutely anything for a free iPad' but I wouldn't
             | mind removing them from my online interactions if that also
             | removes bots, trolls, spammers and propaganda.
        
           | haswell wrote:
           | I've also been thinking about this quite a bit lately.
           | 
           | I also want something like this for a lightweight social
           | media experience. I've been off of the big platforms for
           | years now, but really want a way to share life updates and
           | photos with a group of trusted friends and family.
           | 
           | The more hostile the platforms become, the more viable I
           | think something like this will become, because more and more
           | people are frustrated and willing to put in some work to
           | regain some control of their online experience.
        
             | jeremyjh wrote:
             | The key is to completely disconnect all ad revenue. I'm
             | skeptical people are willing to put in some money to regain
             | control; not in the kind of percentages that means I can
             | move most of my social graph. Network effects are a real
             | issue.
        
           | marcusb wrote:
           | Isn't this vaguely how the invite system at Lobsters
           | functions? There's a public invite tree, and users risk their
           | reputation (and posting access) when they invite new users.
        
             | withinboredom wrote:
             | I know exactly zero people over there. I am also not about
             | to go brown nose my way into it via IRC (or whatever chat
             | they are using these days). I'd love to join, someday.
        
             | somethingsome wrote:
             | Hey I never actually tried lobsters, do you mind if I ask
             | for an invite?
        
           | brongondwana wrote:
           | Also there's the problem that every human has to have perfect
           | opsec or you get the problem we have now, where there are
           | massive botnets out there of compromised home computers.
        
         | drcongo wrote:
         | The internet is going to become like William Basinski's
         | Disintegration Loops, regurgitating itself with worse fidelity
         | until it's all just unintelligible noise.
        
         | genewitch wrote:
         | I have all of n-gate as json with the cross references cross
         | referenced.
         | 
         | Just in case I need to check for plagiarism.
         | 
         | I don't have enough Vram nor enough time to do anything useful
         | on my personal computer. And yes I wrote vram like that to
         | pothole any EE.
        
         | Etheryte wrote:
         | See the Metal Gear franchise [0], the Dead Internet Theory [1],
         | and many others who have predicted this.
         | 
         | > Hideo Kojima's ambitious script in Metal Gear Solid 2 has
         | been praised, some calling it the first example of a postmodern
         | video game, while others have argued that it anticipated
         | concepts such as post-truth politics, fake news, echo chambers
         | and alternative facts.
         | 
         | [0] https://en.wikipedia.org/wiki/Metal_Gear
         | 
         | [1] https://en.wikipedia.org/wiki/Dead_Internet_theory
        
         | djoldman wrote:
         | A variant of this was done for 4chan by the fantastic Yannic
         | Kilcher:
         | 
         | https://en.wikipedia.org/wiki/GPT4-Chan
        
         | r3trohack3r wrote:
         | HN already has a pretty good immune system for this sort of
         | thing. Low-effort or repetitive comments get down-voted,
         | flagged, and rate-limited fast. The site's karma and velocity
         | heuristics are crude compared with fancy ML, but they work
         | because the community is tiny relative to Reddit or Twitter and
         | the mods are hands-on. A fleet of sock-puppet LLM accounts
         | would need to consistently clear that bar--i.e. post things
         | people actually find interesting--otherwise they'd be throttled
         | or shadow-killed long before they "replace all human text."
         | 
         | Even if someone managed to keep a few AI-driven accounts alive,
         | the marginal cost is high. Running inference on dozens of fresh
         | threads 24/7 isn't free, and keeping the output from slipping
         | into generic SEO sludge is surprisingly hard. (Ask anyone who's
         | tried to use ChatGPT to farm karma--it reeks after a couple of
         | posts.) Meanwhile the payoff is basically zero: you can't
         | monetize HN traffic, and karma is a lousy currency for bot-
         | herders.
         | 
         | Could we stop a determined bad actor with resources? Probably,
         | but the countermeasures would look the same as they do now:
         | aggressive rate-limits, harsher newbie caps, human mod review,
         | maybe some stylometry. That's annoying for legit newcomers but
         | not fatal. At the end of the day HN survives because humans
         | here actually want to read other humans. As soon as commenters
         | start sounding like a stochastic parrot, readers will tune out
         | or flag, and the bots will be talking to themselves.
         | 
         |  _Written by GPT-3o_
        
           | stephenhumphrey wrote:
           | Regardless of whether that final line reflects reality or is
           | merely tongue-in-cheek snark, it elevates the whole post into
           | the sublime.
        
         | dangoodmanUT wrote:
         | I imagine LLMs already have this too
        
         | kriro wrote:
         | I think LLMs could be a great driver of private-public key
         | encryption. I could see a future where everyone finally wants
         | to sign their content. Then at least we know it's from that
         | person or an LLM-agent by that person.
         | 
         | Maybe that'll be a use case for blockchain tech. See the whole
         | posting history of the account on-chain.
        
         | photochemsyn wrote:
         | It's hopeless.
         | 
         | We can still take the mathematical approach: any argument can
         | be analyzed for logical self-consistency, and if it fails this
         | basic test, reject it.
         | 
         | Then we can take the evidentiary approach: if any argument that
         | relies on physical real-world evidence is not supported by
         | well-curated, transparent, verifiable data then it should also
         | be rejected.
         | 
         | Conclusion: finding reliable information online is a needle-in-
         | a-haystack problem. This puts a premium on devising ways (eg a
         | magnet for the needle) to filter the sewer for nuggets of gold.
        
       | SilverBirch wrote:
       | What is the netiquette of downloading HN? Do you ping Dang and
       | ask him before you blow up his servers? Or do you just assume at
       | this point that every billion dollar tech company is doing this
       | many times over so you probably won't even be noticed?
        
         | euroderf wrote:
         | Not to mention three-letter agencies, incidentally attaching
         | real names to HN monikers?
        
         | krapp wrote:
         | HN has an API, as mentioned in the article, which isn't even
         | rate limited. And all of the data is hosted on Firebase, which
         | is a YC company. It's fine.
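
For reference, walking the public API amounts to reading `maxitem.json` and then fetching items by id. The sketch below uses only the standard library. The helper names (`item_url`, `fetch_json`, `newest_item_urls`) are made up for illustration, and pacing the requests politely is left to the caller.

```python
import json
from typing import List
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """URL of a single story, comment, job or poll."""
    return f"{BASE}/item/{item_id}.json"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body (performs a network call)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def newest_item_urls(max_id: int, count: int) -> List[str]:
    """URLs for the `count` most recent item ids, newest first."""
    return [item_url(i) for i in range(max_id, max(max_id - count, 0), -1)]


# Example usage (commented out so nothing hits the network on import):
# max_id = fetch_json(f"{BASE}/maxitem.json")
# for url in newest_item_urls(max_id, 5):
#     print(fetch_json(url))
```

Ids are sequential integers, so a full download is just this loop run from the current maximum down to 1.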
        
           | mikeevans wrote:
           | Firebase is owned and operated by Google (has been for a
           | while).
        
         | alt227 wrote:
         | If something is on the public web, it is already being scraped
         | by thousands of bots.
        
         | dangoodmanUT wrote:
         | there's literally an API they promote. Did you read that part
         | before trying to cancel them?
        
         | TZubiri wrote:
         | Well, it's called Hacker News, so hacking is fair game, at
         | least in the good sense of the word.
        
         | internetter wrote:
         | There's literally a public database
         | 
         | https://console.cloud.google.com/marketplace/product/y-combi...
        
           | umvi wrote:
           | What if someone from EU invokes "right to be forgotten" and
           | demands HN delete past comments from years ago. Will those
           | deletions be reflected in the public database? Or could you
           | mine the db to discover deleted data?
        
             | jeremyjh wrote:
             | They need to issue their demand to whoever is hosting their
             | data. If HN has deleted it, they are not hosting it.
        
           | dang wrote:
           | That's an entirely third party project so I doubt they should
           | be listing YC as a partner there.
        
             | internetter wrote:
             | Huh, yeah that is really misleading. Makes it look like it
             | is by YC.
        
       | mattkevan wrote:
       | I did something similar a while back to the @fesshole
       | Twitter/Bluesky account. Downloaded the entire archive and fine-
       | tuned a model on it to create more unhinged confessions.
       | 
       | Was feeling pretty pleased with myself until I realised that all
       | I'd done was teach an innocent machine about wanking and divorce.
       | Felt like that bit in a sci-fi movie where the alien/super-
       | intelligent AI speed-watches humanity's history and decides we're
       | not worth saving after all.
        
         | falcor84 wrote:
         | What's wrong with wanking and divorce? These are respectively a
         | way for people to be happier and more self-reliant, and a way
         | for people to get out of a situation that isn't working out for
         | them. I think both are net positives, and I'm very grateful to
         | live in a society that normalizes them.
        
           | dcuthbertson wrote:
           | The innocent machine can't do either. It's akin to having no
           | mouth, but it must scream (apologies to Harlan Ellison)
        
             | falcor84 wrote:
             | That is a fair point, but it would then apply to everything
             | else we teach it about, like how we perceive the color of
             | the sky or the taste of champagne. Should we remove these
             | from the training set too?
             | 
             | Is it not still good to be exposed to the experiences of
             | others, even if one cannot experience these things
             | themself?
        
               | dcuthbertson wrote:
               | Thanks for saying it's a fair point, but it's more of an
               | offhand joke about "an innocent machine". In reality, a
               | machine, even an LLM, has no innocence. It's just a
               | machine.
        
               | pixl97 wrote:
               | Gets a bit more complicated when we start giving these
               | machines agency.
        
               | falcor84 wrote:
               | Having studied biology, I never accepted the "just a
               | machine" argument. Everything is essentially a machine,
               | but when a machine is sufficiently complex, it is
               | rational to apply the Intentional Stance to it.
        
           | pc86 wrote:
           | I'm not implying that divorce should be stigmatized or
           | prohibited or anything, but it is bad (necessary evil?) and
           | most people would be much happier if they had never married
           | that person in the first place rather than married them then
           | gotten divorced.
           | 
           | So "normalize divorce" is pretty backward when what we should
           | be doing is normalizing making sure you're marrying the right
           | person.
        
             | cgriswald wrote:
             | Making sure you are marrying the right person _is_
             | normalized. I'd have never even known my ex wasn't the
             | right person if I hadn't married her. I didn't come out of
             | my marriage worse off.
             | 
             | Normalize divorce and stop stigmatizing it by calling it
             | bad or evil.
        
               | pixl97 wrote:
               | Eh, I would say it's quite a bit more complicated than
               | you're giving it credit for.
               | 
               | >Making sure you are marrying the right person is
               | normalized.
               | 
               | Absolutely not.
               | 
               | I live in the southern US and we have the combination of
               | "Young people should get married" coupled with "divorce
               | is bad/evil" and the disincentivization of actually
               | learning about human behaviors/complications before going
               | through something that could be traumatic.
               | 
               | There are a lot of relationships that from an outside and
               | balanced perspective give all the signs they will not
               | work out and will be potentially dangerous for one or
               | both partners in the relationship.
        
               | bluefirebrand wrote:
               | > I didn't come out of my marriage worse off
               | 
               | This is good for you, but many people do come out of
               | their marriages much worse off in various ways
               | 
               | > Normalize divorce and stop stigmatizing it by calling
               | it bad or evil
               | 
               | It's not bad or evil, but let's also not pretend that it
               | isn't damaging
        
               | cgriswald wrote:
               | We don't have to pretend. The original poster thinks he
               | knows what the world looks like if every marriage that
               | ends in divorce just never happened. Those marriages _do_
               | happen, though, and to place all the damage generated by
               | that marriage strictly on the divorce is incorrect.
               | Usually one or both parties know the consequences of the
               | divorce and prefer them to the state of the marriage,
               | because the damages are _less_ than if divorce wasn't an
               | option. Claiming divorce is some kind of undesirable
               | 'damaged' state is just as stigmatizing as claiming it is
               | 'bad' or 'evil'.
               | 
               | The alternative to divorce isn't perfect marriages, it is
               | failed marriages that are inescapable.
        
               | gwerbret wrote:
               | > The alternative to divorce isn't perfect marriages, it
               | is failed marriages that are inescapable.
               | 
               | I'm sure this has nothing to do with you, but by your
               | comments in this thread, I'm reminded of a conversation I
               | had with a friend on a bus one day. We were talking about
               | the unfortunate tendency, in day-to-day life, of people to
               | shuffle their elderly parents off to nursing homes,
               | rather than to support said parents in some sort of
               | independent living. A nearby passenger jumped into our
               | conversation to argue that there are situations in which
               | the nursing home situation is for the best. Although we
               | agreed with him, he seemed to dislike the fundamental
               | idea of caring for one's elderly parents _at all_, and
               | subsequently became quite heated.
        
               | smcin wrote:
               | There are lots of proven viable alternatives to quick no-
               | fault divorce, the most obvious being waiting periods or
               | separation periods ranging from months to years [0].
               | Parental alienation can be gamed, and frequently is.
               | Psychologist evals can be gamed or biased. Expert witness
               | reports can be gamed. Move-away scenarios (by the
               | custodial parent) can be gamed. Making false or perjurious
               | allegations can be gamed, sometimes without consequence.
               | Jurisdiction-shopping can be gamed. It seems pretty
               | obvious that if there are huge incentives (or penalties)
               | for certain modes of behavior, some types of people will
               | exploit those. Community property/separate property can
               | be gamed. The timing of all these things can be gamed wrt
               | disclosures, health events, insurance
               | coverage/eligibility, job change/start/end, stock
               | vesting, SS eligibility, tax filings etc. Divorce
               | settlements can be gamed too by one party BK'ing out of a
               | settlement/division of debts. At-fault divorce also
               | exists (in many US states), and obviously can be gamed.
               | 
               | It's a false dichotomy to say a jurisdiction must either
               | allow instant no-fault divorce for everyone who petitions
               | for it, or allow no divorce at all.
               | 
               | > _Usually one or both parties know the consequences of
               | the divorce and prefer them to the state of the marriage,
               | because the damages are less than if divorce wasn't an
               | option._
               | 
               | Sometimes both parties are reasonably rational and honest
               | and non-adversarial, then again sometimes one or both
               | aren't, and it only takes one party (or their relatives)
               | to make things adversarial. If you as a member of the
               | public want to see it in action, in general you can sit
               | in and observe proceedings in your local courthouse in
               | person, or view the docket of that day's cases, or view
               | the local court calendar online. Often the judge and
               | counsel strongly affect the outcome too, much more than
               | the facts at issue.
               | 
               | > _Claiming divorce is some kind of undesirable 'damaged'
               | state is just as stigmatizing as claiming it is 'bad' or
               | 'evil'._
               | 
               | It is not necessarily the end-state of being divorced
               | that is objectively quantifiably the most damaging to
               | both parties' finances, wellness, children, and society
               | at large, it's the expensive non-transparent ordeal of
               | family court itself that can cause damage, as much as (or
               | sometimes more than) the end-state of ending up divorced.
               | Or both. Or neither.
               | 
               | > _The alternative to divorce is..._
               | 
               | ...a less broken set of divorce laws, for which there are
               | multiple viable candidates. Or indeed, marriage
               | (/cohabitation/relationships) continuing to fall out of
               | favor. Other than measuring crude divorce rates and
               | comparing their ratio to crude marriage rates (assuming
               | same jurisdiction, correcting for offset by the
               | (estimated) average length of marriage, and assuming zero
               | internal migration), as marriage becomes less and less
               | common, we're losing the ability to form a quantified
               | picture of human behavior viz. when
               | partnerships/relationships start or end; many countries'
               | censuses no longer track this or are being pressured to stop
               | tracking it [1]; it could be inferred from e.g. bank,
               | insurance, household bill arrangements, credit
               | information, public records, but obviously privacy needs
               | to be respected.
               | 
               | [0] https://en.wikipedia.org/wiki/Divorce_law_by_country
               | 
               | [1] https://www.pewresearch.org/short-
               | reads/2015/05/11/census-bu...
        
               | pc86 wrote:
               | Something can be both bad and not stigmatized. Divorce is
               | a pretty good example here. It's not stigmatized, and to
               | prove it's not, say with a straight face that it should be
               | illegal and you won't be able to blink before the
               | backlash hits you. It's not stigmatized at all. _Most_
               | individuals who get married will get divorced. The way
               | the numbers work out, something like 60-70% of all
               | marriages contain at least one divorced partner. Saying
               | it's stigmatized is silly and doesn't line up with
               | reality. But of course it's an objectively bad thing.
               | It's messy, it's expensive, feelings get hurt, oftentimes
               | years or decades of people's lives are wasted.
        
               | cgriswald wrote:
               | I don't have to say it with a straight face because your
               | sibling poster did it for me. Something can be both
               | common and stigmatized. Yes, divorce _can be_ messy,
               | expensive, emotionally fraught, and take time. Mine was,
               | and it still wasn't 'bad' or even undesirable. Starting
               | a business, learning an instrument, training for a sport
               | can _also_ be all those things. We don't call them
               | 'bad', or 'evil', because we don't assume the end result
               | is undesirable.
               | 
               | The comparison can't be to an imaginary world where
               | everyone always picks the best partner. It has to be to
               | the real world where people don't always pick the best
               | partner and the absence of divorce means they're stuck
               | with them.
        
             | nhod wrote:
             | This reminds me of one of my very favorite essays of all
             | time, "Why You Will Marry the Wrong Person" by Alain de
             | Botton from the School of Life. The title is somewhat
             | misleading, and I resisted reading it for a couple years as
             | a result. It is exquisite writing -- it couldn't be said
             | with fewer words, and adding more wouldn't help either --
             | and an extraordinary and ultimately hopeful meditation on
             | love and marriage.
             | 
             | NYT Gift Article:
             | https://www.nytimes.com/2016/05/29/opinion/sunday/why-you-
             | wi...
        
               | tailspin2019 wrote:
               | You're 100% right. That essay is superb and I'm glad I
               | read it!
               | 
               | Thanks for sharing the link.
        
               | Nzen wrote:
               | Alain de Botton also published this in video form, seven
               | years ago [0]. If you want the cliff's notes, his School
               | of Life channel has a shorter version [1].
               | 
               | [0] https://www.youtube.com/watch?v=-EvvPZFdjyk 22
               | minutes
               | 
               | [1] https://www.youtube.com/watch?v=zuKV2DI9-Jg 4
               | minutes
        
               | didgetmaster wrote:
               | I agree. The title is wrong. It should be "Why you are
               | sure to think, whomever you marry, that they are the
               | wrong person".
        
           | adamc wrote:
           | Having gone through a divorce... no. It would be better if
           | people tried harder to make relationships work. Failing that,
           | it would be better to not marry such a person.
        
             | falcor84 wrote:
             | People sometimes grow in different directions. Sometimes
             | the person who was perfect for you at 25 just isn't a good
             | fit for you at age 40, regardless of how hard you try to
             | make it work.
        
         | nthingtohide wrote:
         | > an innocent machine about wanking and divorce
         | 
         | Let's say you discovered a pendrive of a long lost civilization
         | and train a model on that text data. How would you or the model
         | know that the pendrive contained data on wanking and divorce
         | without any kind of external grounding to that data?
        
       | deadbabe wrote:
       | Is the 20GB JSON file available?
        
       | a3w wrote:
       | Cool project. Cool graphs.
       | 
       | But any GDPR requests for info and deletion in your inbox, yet?
        
         | arduanika wrote:
         | Come on, you wouldn't GDPR a whimsical toy project!
        
       | shayway wrote:
       | Hah, I've been scraping HN over the past couple weeks to do
       | something similar! Only submissions though, not comments. It was
       | after I went to /newest and was faced with roughly 9/10 posts
       | being AI-related. I was curious what percentage of posts on HN
       | were actually about AI, and also how it compared to other
       | things heavily hyped in the past like Web3 and crypto.
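       | 
       | A minimal sketch of that kind of measurement, assuming the
       | submission titles are already in a list. The titles, the
       | keyword pattern, and the `share_matching` helper are all
       | invented for illustration. The pattern uses word boundaries so
       | that "AI" does not match inside words like "Maintaining".

```python
import re


def share_matching(titles, pattern):
    """Return the fraction of titles that match the regex pattern."""
    if not titles:
        return 0.0
    rx = re.compile(pattern, re.IGNORECASE)
    hits = sum(1 for title in titles if rx.search(title))
    return hits / len(titles)


# Hypothetical sample of submission titles (not real data).
titles = [
    "Show HN: An AI agent for spreadsheets",
    "Why Rust's borrow checker is strict",
    "Maintaining a 20-year-old codebase",
    "LLM inference on a Raspberry Pi",
]

# \b keeps "AI" from matching the "ai" inside "Maintaining".
ai_share = share_matching(titles, r"\b(AI|LLM|GPT)\b")
print(f"{ai_share:.0%} of sampled titles look AI-related")
```

       | Swapping in a different pattern (for example a Web3 or crypto
       | keyword list) gives the comparison against earlier hype cycles.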
        
         | alt227 wrote:
         | Here, the entire history of HN with the ability to run queries
         | on it directly in the browser :)
         | 
         | https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
        
       | sebastianmestre wrote:
       | Can you remake the stacked graphs with the variable of interest
       | at the bottom? It's hard to see the percentage of Rust when it's
       | all the way at the top with a lot of noise on the lower layers
       | 
       | Edit: or make a non-stacked version?
        
         | jasonthorsness wrote:
         | Lots of valid criticism here of these graphs and the queries;
         | I'll write a follow-up article.
        
       | xnx wrote:
       | I have this data and a bunch of interesting analysis to share.
       | Any suggestions on the best method to share results?
       | 
       | I like Tableau Public, because it allows for interactivity and
       | exploration, but it can't handle this many rows of data.
       | 
       | Is there a good tool for making charts directly from Clickhouse
       | data?
        
         | texodus wrote:
         | No Clickhouse connector for free accounts yet, but if you can
         | drop a Parquet file on S3 you can try https://prospective.co
        
           | xnx wrote:
           | Thanks! I'll check that out. Thought it was a typo of
           | "Perspective" for a moment: https://perspective.finos.org/
        
             | texodus wrote:
             | Yes! This is the _pro_ version, we also develop open source
             | https://github.com/finos/perspective (which Prospective is
             | substantially built on, with some customizations such as a
             | wasm64 runtime).
        
       | Am4TIfIsER0ppos wrote:
       | I hope they snatched my flagged comments. I would be pleased to
       | have helped make the AI into an asshole. Here's hoping for
       | another Tay AI.
        
       | wslh wrote:
       | It would be great if it were available as a torrent. There are
       | also mutable torrents [1]. They are not implemented everywhere,
       | but implementations are available [2].
       | 
       | [1] https://www.bittorrent.org/beps/bep_0046.html
       | 
       | [2] https://www.npmjs.com/package/bittorrent-dht
        
       | th1nhng0 wrote:
       | Can I ask how you drew the chart in the post?
        
         | jasonthorsness wrote:
         | lol it was Excel (save as picture / SVG format / edit colors to
         | support dark/light mode)
        
           | th1nhng0 wrote:
           | wow, I never expected that xD thanks for letting me know
        
       | byearthithatius wrote:
       | Can you scrape all of HN by just incrementing item?id (since it's
       | sequential) and using Python web requests with IP rotation (in
       | case there is rate limiting)?
       | 
       | NVM this approach of going item by item would take 460 days if
       | the average request response time is 1 second (unless heavily
       | parallelized, for instance 500 instances _could_ do it in a day
       | but that's 40 million requests either way so would raise alarms).
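       | 
       | A rough sketch of that arithmetic, assuming the round figure of
       | 40 million items and 1 second per request from above. The
       | `crawl_days` and `item_url` helpers are made up for
       | illustration; the URL pattern is the one used by the official
       | HN Firebase API, and nothing here actually sends a request.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(total_items, seconds_per_request=1.0, workers=1):
    """Estimate wall-clock days needed to fetch every item exactly once."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return total_items * seconds_per_request / workers / SECONDS_PER_DAY


def item_url(item_id):
    """Build the API URL for one item; ids are sequential integers."""
    return f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"


TOTAL_ITEMS = 40_000_000  # round figure taken from the comment, not measured

serial_days = crawl_days(TOTAL_ITEMS)
parallel_days = crawl_days(TOTAL_ITEMS, workers=500)

print(f"1 worker:    {serial_days:.0f} days")
print(f"500 workers: {parallel_days * 24:.1f} hours")
```

       | At 1 request per second per worker this comes to about 463 days
       | serially and a bit under a day with 500 workers. The total
       | request count stays the same either way.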
        
       | g8oz wrote:
       | I predict that in the coming years a lot of APIs will begin to
       | offer the option of just returning a duckdb file. If you're just
       | going to load the json into a database anyway, why not just get a
       | database in the response?
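       | 
       | To make the "load the json into a database anyway" step
       | concrete, here is a minimal sketch. It uses Python's built-in
       | sqlite3 as a stand-in for DuckDB so it runs without extra
       | packages; the payload, the table layout, and the `load_items`
       | helper are invented for illustration.

```python
import json
import sqlite3

# Hypothetical API response: a JSON array of items (made-up values).
payload = json.dumps([
    {"id": 1, "by": "alice", "type": "story", "score": 10},
    {"id": 2, "by": "bob", "type": "comment", "score": 0},
    {"id": 3, "by": "alice", "type": "comment", "score": 0},
])


def load_items(conn, raw_json):
    """Parse the JSON payload and insert every item into an items table."""
    rows = [
        (item["id"], item["by"], item["type"], item["score"])
        for item in json.loads(raw_json)
    ]
    conn.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, author TEXT, kind TEXT, score INTEGER)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_items(conn, payload)

# Once loaded, questions become plain SQL instead of JSON traversal.
comments_by_author = dict(
    conn.execute(
        "SELECT author, COUNT(*) FROM items WHERE kind = 'comment' GROUP BY author"
    )
)
print(loaded, comments_by_author)
```

       | If the API returned a database file directly, the client could
       | skip the parse-and-insert step and go straight to the query.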
        
       ___________________________________________________________________
       (page generated 2025-04-30 23:00 UTC)