[HN Gopher] Track HN: Survival Rate of Show HN Stories
       ___________________________________________________________________
        
       Track HN: Survival Rate of Show HN Stories
        
       Author : namiwang
       Score  : 126 points
       Date   : 2023-06-13 17:25 UTC (5 hours ago)
        
 (HTM) web link (nami.land)
 (TXT) w3m dump (nami.land)
        
       | gkoberger wrote:
       | I'm happy to say that the reports of my death here are greatly
       | exaggerated :)
       | 
       | I'm the owner of both #4 and #140 on the Top-scoring Show HN
       | Stories that Didn't Survive... but both are very much alive!
       | 
       | #4 StackSort was a Github.com page, but in 2021 they made it so
       | only Github.io works. If dang sees this, I'd really appreciate it
       | if you could change the URL for
       | https://news.ycombinator.com/item?id=5395463 to use github.io!
       | 
       | #140 ReadMe has the same io/com issue, in the opposite direction!
       | We redirect readme.io to readme.com now, which seems to be why
       | it's flagged.
        
         | codetrotter wrote:
         | The main page https://gkoberger.github.io/ that the
         | https://gkoberger.github.com/ link suggests going to gives a
         | 404 as well. Could be a good idea to add a main page for
         | https://gkoberger.github.io/ that links the StackSort page and
         | anything else.
        
         | 12907835202 wrote:
         | How on earth did you get readme.com?
         | 
         | I'm assuming someone else owned it. Whenever I see that, along
         | with all the "make an offer" links, I move on and ignore it. Was
         | the process easy?
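Both of gkoberger's false positives come from URLs that moved rather than died: readme.io now redirects to readme.com, and the old github.com URL no longer serves the StackSort page after the switch to github.io. The following is a minimal sketch, assuming the crawler records the final URL after following redirects (or None when the fetch failed). It labels a cross-domain redirect as "moved" instead of "dead". `classify_redirect`, `host_of`, and the labels are hypothetical, not the article's method. The sketch cannot rescue a case like StackSort, where the old URL no longer redirects at all.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Lower-cased hostname with any leading 'www.' removed ('' if none)."""
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")  # Python 3.9+


def classify_redirect(submitted_url: str, final_url: Optional[str]) -> str:
    """Label a fetch by comparing the submitted host with the final host.

    'dead'  - the fetch failed, so there is no final URL
    'same'  - redirects (if any) stayed on the submitted host
    'moved' - the site answered, but from a different host
    Only hostnames are compared; nothing here knows that two
    different domains belong to the same owner.
    """
    if final_url is None:
        return "dead"
    if host_of(submitted_url) == host_of(final_url):
        return "same"
    return "moved"


print(classify_redirect("https://readme.io/", "https://readme.com/"))         # moved
print(classify_redirect("http://www.readme.com/", "https://readme.com/docs"))  # same
print(classify_redirect("https://gkoberger.github.com/", None))                # dead
```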
        
       | AndrewKemendo wrote:
       | Along with gkoberger, I'm also happy to say that we didn't die
       | after our Show HN (Show HN: A Covid-19 testing location site that
       | a group of us are building).
       | 
       | https://news.ycombinator.com/item?id=22650725
       | 
       | In fact we were so successful that we were able to shut it down
       | less than a year after we started (It's on the list as a very
       | reasonable Type II error ;))
       | 
       | Thanks to the HN community for helping us get an amazing
       | temporary product out and shut it down successfully.
        
       | karaterobot wrote:
       | I know you mention there are lots of reasons for false positives
       | and negatives, but does your methodology account for length of
       | time at all? Meaning, if a project was posted to HN in 2009, it
       | could have been successful for 14 years and then closed down, or
       | just changed URLs somewhere along the way, and in that case it
       | would be counted as a failure even though it wasn't. Likewise, if
       | it was posted in May 2023 and is still around, that doesn't mean
       | much because it's still flying the Grand Opening banner,
       | practically.
        
         | h0l0cube wrote:
         | Exactly. Some of these graphs are really flawed. Take the
         | heatmap for the top 1%, which pretty much mirrors the submission
         | heatmap. I want to see what _portion_ of submissions _for that
         | time slot_ reached 1%, not of all submissions. There could be
         | time slots that perform exceedingly well outside of popular
         | times.
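karaterobot's age concern and h0l0cube's time-slot concern both come down to dividing each group's hits by that group's own count. The same fix gives the 100%-normalized bars oezi asks for further down. The following is a minimal sketch, assuming each story is reduced to a hypothetical (group key, outcome) pair. The key can be a submission year or a (weekday, hour) slot. The sample data is made up for illustration.

```python
from collections import defaultdict


def share_by_group(records):
    """Fraction of True outcomes per group key.

    `records` is an iterable of (key, outcome) pairs. Dividing each
    group's hits by that group's own count turns raw counts into rates,
    so busy groups don't dominate just by having more posts.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for key, outcome in records:
        totals[key] += 1
        hits[key] += bool(outcome)
    return {key: hits[key] / totals[key] for key in totals}


# Survival by submission year (made-up sample): old cohorts had longer to die.
by_year = share_by_group(
    [(2009, False), (2009, True), (2023, True), (2023, True), (2023, True), (2023, False)]
)
print(by_year)  # {2009: 0.5, 2023: 0.75}

# Top-1% rate per (weekday, hour) slot (made-up sample): a busy slot
# with 1 hit in 10 vs a quiet slot with 1 hit in 2.
slots = [(("Tue", 17), True)] + [(("Tue", 17), False)] * 9 + [(("Sun", 3), True), (("Sun", 3), False)]
rates = share_by_group(slots)
print(rates[("Tue", 17)], rates[("Sun", 3)])  # 0.1 0.5
```

Per-year rates still don't record *when* a project died. A proper survival analysis needs death dates, which a single crawl doesn't capture. Very small groups, such as a quiet hour with two posts, also produce noisy rates, so a minimum count per group is worth enforcing.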
        
       | billllll wrote:
       | I'd love to get some correlation with rank, or even filtering of
       | lower scoring posts.
       | 
       | From what I know, HN posts are often used as a signal for the
       | viability of a project. In that case, you can't draw conclusions
       | about the effectiveness of Show HN posts, because some of those
       | projects will die off by design.
        
       | reaperman wrote:
       | > Extra: ChatGPT Gave a Wrong Regex. I consulted ChatGPT for a
       | regex to extract domains from urls, and it gave a flawed one:
       | 
       | ^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\\.)?([^:\/\n?]+).
       | 
       | It even gave reasonably detailed explanations, which convinced me.
       | Later tests revealed that this regex doesn't work for URLs with @
       | in the path, such as https://foo.com/@./bar. The correct one
       | should be
       | 
       | ^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\\.)?([^:\/?\n]+).
       | 
       | ---------------------
       | 
       | The trick is to ask ChatGPT what the right tool for the job is in
       | your language of choice. For Python, ChatGPT will happily give
       | you:
       | 
       |     from urllib.parse import urlparse
       | 
       |     extract_domain = lambda url: urlparse(url).netloc.replace('www.', '', 1)
       | 
       |     # Example usage
       |     url = 'https://foo.com/@./bar'
       |     domain = extract_domain(url)
       |     print(domain)  # Output: foo.com
       | 
       | -------------
       | 
       | I don't think RegEx is typically the "most" correct tool for the
       | job for things which likely have built-in parser libraries (XML,
       | HTML, URLs, JSON, etc)
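The check below runs both regexes from the quoted article against the @-in-path URL, next to the standard-library parser. It assumes that `(?:www\\.)` in the quoted regexes is meant as `(?:www\.)`, since the doubled backslash looks like an escaping artifact. It also assumes the trailing `.` after each regex is sentence punctuation. The helper names are hypothetical. The block also shows two details of the `urlparse` one-liner. First, `netloc` keeps any `user@` prefix and port, while `.hostname` drops them and lower-cases. Second, `.replace('www.', '', 1)` removes the first `www.` anywhere in the string, while `str.removeprefix` (3.9+) only strips a leading one.

```python
import re
from urllib.parse import urlsplit

# Regexes from the quoted article, with "www\\." read as "www\." (assumption).
FLAWED = re.compile(r"^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)")
FIXED = re.compile(r"^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/?\n]+)")


def regex_domain(pattern, url):
    """Return the first capture group, or None if the pattern doesn't match."""
    m = pattern.match(url)
    return m.group(1) if m else None


def parser_domain(url):
    """Hostname via the standard-library URL parser, minus a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host.removeprefix("www.")


url = "https://foo.com/@./bar"
print(regex_domain(FLAWED, url))  # .  (the optional user@ group swallowed "foo.com/@")
print(regex_domain(FIXED, url))   # foo.com
print(parser_domain(url))         # foo.com

# netloc vs hostname: netloc keeps credentials and port.
tricky = "https://user@WWW.Example.com:8080/path"
print(urlsplit(tricky).netloc)    # user@WWW.Example.com:8080
print(parser_domain(tricky))      # example.com

# replace() vs removeprefix(): only the latter is anchored at the start.
print("awww.example.com".replace("www.", "", 1))  # aexample.com
print("awww.example.com".removeprefix("www."))    # awww.example.com
```

Unlike the regexes, `urlsplit` needs a scheme or a leading `//` to find the host. For a bare `foo.com/bar`, it treats everything as a path, and `parser_domain` returns an empty string.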
        
       | [deleted]
        
       | tagawa wrote:
       | What timezone is used for the submission heatmap?
        
       | littlestymaar wrote:
       | Oh, Airmash is dead. I remember seeing it on HN then spending
       | half of my workday that day playing it.
        
         | gadgetoid wrote:
         | The community revived it at https://airmash.online/ pretty
         | sharpish. Does this count as dead?
        
       | coding123 wrote:
       | Is this why HN was so slow yesterday?
        
       | Semaphor wrote:
       | The top 250 has 8 dead projects from 2023. Of those 8, 5 are not
       | dead at all, 1 is alive but has an expired certificate and only 2
       | (the lowest ranked) are dead. This does not seem like useful
       | data.
        
       | elaus wrote:
       | Recently I was browsing through old threads where users showed
       | off their personal websites and blogs. I wanted to find some
       | inspiration for my own website.
       | 
       | What I found instead were about 3/4 dead links - even though the
       | threads were all from the last 4-5 years. I found that quite sad,
       | because people often talked with great passion about their
       | websites and they sounded really cool. Also, I LOVE those small,
       | personal islands in the big, commercialized and in many ways
       | centralized web.
        
         | manuelmoreale wrote:
         | Sadly that is nothing new. I used to run a website gallery, and
         | the rate of link rot is incredibly high.
         | 
         | Same is true for another couple of projects I'm running now.
         | I'm collecting personal websites and quirky small web
         | experiments and the same is happening there.
         | 
         | Somewhat related is the phenomenon of dead blogs. Plenty of
         | those with a couple of interesting posts and then abandoned.
        
       | nvy wrote:
       | Neat idea, thanks for sharing.
       | 
       | Curious choice to highlight Show HNs that didn't survive, but not
       | the ones that did.
       | 
       | Is there a reason for this?
        
         | malfist wrote:
         | Same, I read the article twice in case I missed it, but no,
         | nothing about the ones that did survive, even on the "more
         | data" section.
        
       | zX41ZdbW wrote:
       | > Looking for a Sponsor to Host the Database Publicly. In the
       | meantime, it'd be great if anyone could query the database. I
       | tried to host a public database and real-time query interface
       | online, but couldn't afford the bill for a smooth Postgres
       | instance to hold around 20G (40M rows plus indices) of data.
       | While a $20 instance could suffice, it's too slow to be really
       | usable compared to the local one on my M2 MacBook Air.
       | 
       | Here is the database with publicly available SQL endpoint:
       | https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
        
         | SushiHippie wrote:
         | Nice, but it seems to have been last updated on 2022-12-12, and
         | funnily, the IDs that don't exist have a time of 1970-01-01
         | 00:00:00.
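1970-01-01 00:00:00 is Unix time 0, which most likely means a zero or default value was stored for those IDs rather than a real posting time. The following is a minimal sketch that treats a zero timestamp as missing when converting item times. `item_time` is a hypothetical helper, not part of the dataset's tooling.

```python
from datetime import datetime, timezone
from typing import Optional


def item_time(unix_seconds: int) -> Optional[datetime]:
    """Convert an item timestamp to UTC, treating 0 as 'no timestamp'.

    A zero timestamp renders as 1970-01-01 00:00:00 UTC, which is a
    placeholder rather than a real posting time.
    """
    if unix_seconds == 0:
        return None
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


print(datetime.fromtimestamp(0, tz=timezone.utc))  # 1970-01-01 00:00:00+00:00
print(item_time(0))  # None
```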
        
       | smallerfish wrote:
       | Phind (#2 on your list) is still up and running also
       | (https://www.phind.com/search?q=false%20negative&source=searc...).
        
       | CryptoBanker wrote:
       | How do you have 40mm rows of data on Show HN for only ~126,000
       | stories?
        
         | SushiHippie wrote:
         | Comments and the stories that are not "SHOW HN".
         | 
         | From TFA:
         | 
         | > For this analysis, I considered submissions made before May
         | 31, 2023, 23:59 UTC. The dataset consists of 4,714,023 stories
         | and 30,363,533 comments from 867,097 users.
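The quoted figures add up to roughly 35M items. They account for most, but not all, of the "around 40M rows" mentioned in the article, and the quoted text doesn't say what the remainder is. The Show HN stories (~126,000 per the question) are a small slice of the story count alone.

```python
stories = 4_714_023
comments = 30_363_533

items = stories + comments
print(f"{items:,}")           # 35,077,556
print(f"{items / 1e6:.1f}M")  # 35.1M
```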
        
       | david_shaw wrote:
       | Thanks for making and sharing this - although I'm surprised it's
       | not a "Show HN" itself!
       | 
       | I was curious about the top post that didn't survive - an HTML5
       | game called "airma.sh" - and I wanted to check it out. I _think_
       | I found a working mirror: https://www.crazygames.com/game/airmash
       | 
       | It's possible that this is a different game, but it seems to fit
       | the description.
       | 
       | Interestingly, the person who submitted that post stopped being
       | active on HN after that discussion.
        
         | flyinglizard wrote:
         | Airmash lives very well on this community hosted site:
         | https://airmash.online/
         | 
         | The original author was never to be heard from again.
        
       | gadgetoid wrote:
       | Airmash still lives at https://airmash.online/ and there's also a
       | space mod - Starmash - at https://airmash.cc/
       | 
       | I apologise in advance for the hours you'll lose to these
       | (again?)
        
       | ravenstine wrote:
       | You're telling me substack.com doesn't even make the top 100?
        
       | oliverobscure wrote:
       | Great visualisation. I was quite surprised that the submission
       | dates and times appeared unimodal around an American morning
       | peak.
        
         | _dain_ wrote:
         | n=1, but I know at least one non-American who has stayed up late
         | so that the submission coincides with this peak time
        
         | oezi wrote:
         | Using a stacked barchart for dead vs alive isn't a great choice
         | in my mind. Normalize to 100% please.
        
       | TomNomNom wrote:
       | Just a silly aside with regard to the regex to extract domains
       | from URLs, my little tool called unfurl [0] exists to solve that
       | exact sort of problem :)
       | 
       | [0]https://github.com/tomnomnom/unfurl
        
         | opello wrote:
         | bagder (of curl) also made trurl to address URL manipulation:
         | 
         | https://github.com/curl/trurl
        
       | folli wrote:
       | Nice work!
       | 
       | I'd actually be interested in factors that make a Show HN a
       | success vs failure.
       | 
       | Objectively, there's an obvious one in your dataset: time of
       | submission. Tuesday afternoon (which timezone? I assume US west
       | coast?) seems to be key. No way this correlates with the quality
       | of submissions.
       | 
       | Subjectively: it seems to have become much harder recently. A
       | couple of years ago I managed to reach the front page for a short
       | time with an Android app; now I'm barely able to get above 20
       | points, even though the product is (again, subjectively) cooler
       | and has a possibly wider audience
       | (https://news.ycombinator.com/item?id=35671245).
       | 
       | Not complaining, but perhaps Show HN is no longer an easy way to
       | "get the word out" and get some early user feedback
       | for and from indie hackers? Any other sites that might be of
       | interest?
        
         | OJFord wrote:
         | ProductHunt's badge on a product's home page is a negative
         | signal to me, but I mention it partly because it still happens
         | (quite a lot): people do seem to use ProductHunt.
         | 
         | (I suppose I'd use it - and pretty much anything - but just not
         | put 'omg #1' badge on my site, if I had something to launch
         | myself.)
         | 
         | Completely tangential now, but I think its problem is right in
         | the title - who is hunting a product? It's a complete echo
         | chamber, surely nobody who doesn't have something to launch is
         | actively using it - 'it's Wednesday so I need a new Gmail-
         | integrating Jira spline reticulator'.
        
       | trewqasdf wrote:
       | The pandemic really got activity going during 2020 (first bar
       | chart), but maybe that's not so surprising with everyone pivoting
       | to remote work, and obviously all the discussions about vaccines
       | and how different governments were handling things.
        
       | hawski wrote:
       | Regarding database hosting, if you would consider giving the data
       | away, I would suggest converting it to an SQLite database and
       | sharing it over BitTorrent.
        
         | xnx wrote:
         | I second this. You've done a great service to collect this
         | data. I'm guessing the file must be much smaller than 20GB when
         | compressed.
        
           | zX41ZdbW wrote:
           | It is only around 5 GB in ClickHouse. Details:
           | https://github.com/ClickHouse/ClickHouse/issues/29693
        
           | zX41ZdbW wrote:
           | I also did an experiment generating and searching embeddings
           | for all the comments on HN. Here is the
           | walkthrough: https://www.youtube.com/watch?v=hGRNcftpqAk
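hawski suggests converting the dataset to SQLite and sharing the file. The following is a minimal sketch of that conversion using Python's built-in `sqlite3` module. The column set is a hypothetical subset of HN item fields, not the author's actual schema. HN's `by` field is renamed `author` because `BY` is an SQL keyword. Each item must be a dict with all eight keys.

```python
import sqlite3


def build_sqlite(path, items):
    """Write HN items (an iterable of dicts) to a single-file SQLite database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               id     INTEGER PRIMARY KEY,
               type   TEXT,
               author TEXT,
               time   INTEGER,
               title  TEXT,
               url    TEXT,
               score  INTEGER,
               parent INTEGER
           )"""
    )
    with conn:  # one transaction for the whole batch; commits on exit
        conn.executemany(
            "INSERT OR REPLACE INTO items VALUES "
            "(:id, :type, :author, :time, :title, :url, :score, :parent)",
            items,
        )
    conn.execute("CREATE INDEX IF NOT EXISTS items_parent ON items(parent)")
    conn.execute("VACUUM")  # compact the file before distributing it
    return conn


rows = [
    {"id": 1, "type": "story", "author": "a", "time": 1, "title": "Show HN: x",
     "url": "https://example.com", "score": 5, "parent": None},
    {"id": 2, "type": "comment", "author": "b", "time": 2, "title": None,
     "url": None, "score": None, "parent": 1},
]
conn = build_sqlite(":memory:", rows)  # use a file path such as "hn.sqlite" for real output
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 2
```

Batching all inserts into one transaction avoids a commit per row, which matters at tens of millions of rows. The resulting file can then be compressed before it is shared.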
        
       | _andrei_ wrote:
       | Phind, the 2nd entry, is live and well.
        
       | jumploops wrote:
       | No affiliation, but the second to top deceased site is still
       | alive and kicking [0]
       | 
       | Spot checking the top results might give a better estimate for
       | how many are actually alive vs. just using bot protection.
       | 
       | [0]https://news.ycombinator.com/item?id=35543668
        
         | sentrysapper wrote:
         | https://harvestsignal.com/ is also still alive, but the site
         | certificate expired.
        
         | bagels wrote:
         | It just errors out right now. How can we differentiate: always
         | errors out vs dead?
        
           | gkoberger wrote:
           | Vercel (and AWS) are down right now, hence the error.
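Several replies point at false "dead" verdicts: a live site that may only look dead behind bot protection, harvestsignal.com with an expired certificate, and a site that was erroring only because Vercel and AWS were down. The following is a minimal sketch, not the article's method. It probes a URL with the standard library and records *why* a fetch failed rather than a single alive/dead bit, so outages can be re-checked later and 401/403/429 answers can be treated as "something is there". The category names are hypothetical, and `probe` performs real network I/O.

```python
import socket
import ssl
import urllib.error
import urllib.request


def classify_outcome(status=None, error=None):
    """Map an HTTP status or exception to a coarse verdict.

    'alive'     - 2xx/3xx response
    'blocked'   - 401/403/429: something answered, possibly bot protection
    'transient' - 5xx or timeout: re-check later before calling it dead
    'cert'      - TLS certificate problem (the site may still be up)
    'gone'      - DNS failure or 404/410
    'unknown'   - anything else
    """
    if error is not None:
        if isinstance(error, ssl.SSLCertVerificationError):
            return "cert"
        if isinstance(error, socket.gaierror):
            return "gone"
        if isinstance(error, (TimeoutError, socket.timeout)):
            return "transient"
        return "unknown"
    if status is None:
        return "unknown"
    if 200 <= status < 400:
        return "alive"
    if status in (401, 403, 429):
        return "blocked"
    if status in (404, 410):
        return "gone"
    if status >= 500:
        return "transient"
    return "unknown"


def probe(url, timeout=10.0):
    """Fetch `url` once and classify the outcome (performs network I/O)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_outcome(status=resp.status)
    except urllib.error.HTTPError as exc:  # subclass of URLError, so catch it first
        return classify_outcome(status=exc.code)
    except urllib.error.URLError as exc:
        # urlopen wraps DNS/TLS/connect errors; the original exception is in .reason
        reason = exc.reason if isinstance(exc.reason, BaseException) else None
        return classify_outcome(error=reason)
    except (TimeoutError, socket.timeout) as exc:  # read timeouts are not wrapped
        return classify_outcome(error=exc)
```

A 404 is not always final either. The old github.com StackSort URL above is one example, so "gone" results are still worth spot-checking against the redirect or rename cases.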
        
       ___________________________________________________________________
       (page generated 2023-06-13 23:01 UTC)