[HN Gopher] Network visualization of 50k blogs and links
       ___________________________________________________________________
        
       Network visualization of 50k blogs and links
        
       Author : ng-henry
       Score  : 99 points
       Date   : 2024-04-23 19:33 UTC (3 hours ago)
        
 (HTM) web link (graph.henryn.ca)
 (TXT) w3m dump (graph.henryn.ca)
        
       | ng-henry wrote:
       | I scraped my favorite blogs and made a graph from the domains
       | that each blog links to.
       | 
       | You can see clusters forming of websites that talk about similar
       | topics, like crypto, rationality, Canada, India, and even
       | postgres!
       | 
       | The visualization was made entirely in WebGL with some neat
       | optimizations to render that many lines and circles.
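
A rough sketch of the "graph from the domains that each blog links to"
step, using only the Python standard library. The names LinkCollector,
domain_of and domain_edges are hypothetical, and stripping a leading
"www." is a simplifying assumption; the author's actual pipeline is not
described in the thread.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def domain_of(url):
    """Lowercased hostname with a leading 'www.' removed (assumption)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_edges(page_url, html):
    """Count links from the page's own domain to every other domain."""
    parser = LinkCollector()
    parser.feed(html)
    source = domain_of(page_url)
    edges = Counter()
    for href in parser.hrefs:
        # Resolve relative links against the page URL before extracting
        # the domain, so "/about" counts as an internal link.
        target = domain_of(urljoin(page_url, href))
        if target and target != source:
            edges[(source, target)] += 1
    return edges
```

Summing these counters over every crawled page of a blog would give one
weighted edge per (source domain, target domain) pair.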
        
         | dameyawn wrote:
         | Very neat! So you wrote the graph visualization UI? I see in a
         | prior project you used cytoscape - any motivation for doing it
         | yourself this time (vs one of the available libraries)?
        
           | ng-henry wrote:
           | Yeah I used cytoscape before but it didn't have the full
           | customization that I wanted. Besides the performance issues,
           | there were some problems I couldn't have solved without a
           | custom renderer:
           | 
           | - if many lines overlap, how should their colors blend?
           | - how to render circles so that they look nice both zoomed
           |   in / out
           | - how to avoid it looking like a hairball graph [1]
           | 
           | The nice thing about a personal project is that I can do
           | whatever I like with no constraints, so I built one that's
           | suited for this project and fits my tastes.
           | 
           | [1] https://cambridge-intelligence.com/how-to-fix-hairballs/
        
         | varenc wrote:
         | This is a really cool project! I'd love to hear more about how
         | you built the front end.
        
         | gala8y wrote:
         | serendipity heaven, but... how is this map of _your favorite_
         | blogs?
        
           | ng-henry wrote:
           | I started off with my favorite blogs and recursively explored
           | from there based on what they linked to.
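
The "start from favorites and explore what they link to" idea can be
sketched as a breadth-first walk over domains. This is only an
illustration: explore, get_links and max_domains are hypothetical names,
and get_links stands in for whatever actually fetches and parses a site.

```python
from collections import deque


def explore(seeds, get_links, max_domains):
    """Breadth-first exploration of domains, starting from seed blogs.

    get_links(domain) must return the domains that `domain` links to.
    At most max_domains domains are crawled; the rest stay discovered
    but unvisited.
    """
    queue = deque(dict.fromkeys(seeds))  # de-duplicate, keep order
    seen = set(queue)
    edges = []
    crawled = 0
    while queue and crawled < max_domains:
        domain = queue.popleft()
        crawled += 1
        for target in get_links(domain):
            edges.append((domain, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, edges
```

With a small budget, many discovered domains never get crawled
themselves, so their own outbound links are missing from the result.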
        
             | gala8y wrote:
             | ok, got it.
        
         | jseliger wrote:
         | This is very cool but also not accurate, at least for
         | jakeseliger.com. Henryn.ca lists 0 links from jakeseliger.com
         | to nytimes.com, reason.com, and numerous others that simple
         | search demonstrates are linked to, for example:
         | https://jakeseliger.com/?s=nytimes.com&submit=Search
         | 
         | I put up many links posts, so I probably link to an abnormally
         | large number of sites.
        
           | ng-henry wrote:
           | Yep this is only for stuff that we've crawled, so we can't
           | detect all of your links. Because we have limited crawling
           | resources, we rate-limit the crawling by domain so we don't
           | get stuck in spider traps.
           | 
           | The current visualization only shows the current state of the
           | crawl, so it won't know about all of the posts.
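
One way to limit crawling per domain, sketched under assumptions: a
minimum delay between fetches plus a cap on pages per domain. The class
DomainLimiter and its parameters min_interval and max_pages are
hypothetical; the thread does not say how the real crawler does it.

```python
from urllib.parse import urlparse


class DomainLimiter:
    """Per-domain politeness delay plus a hard cap on pages fetched."""

    def __init__(self, min_interval, max_pages):
        self.min_interval = min_interval  # seconds between fetches
        self.max_pages = max_pages        # page budget per domain
        self.last_fetch = {}              # domain -> time of last fetch
        self.pages = {}                   # domain -> pages fetched

    def allow(self, url, now):
        """Return True and record the fetch if `url` may be fetched now."""
        domain = urlparse(url).hostname or ""
        # The page cap is what keeps a spider trap (a site generating
        # endless URLs) from using up the whole crawl budget.
        if self.pages.get(domain, 0) >= self.max_pages:
            return False
        last = self.last_fetch.get(domain)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_fetch[domain] = now
        self.pages[domain] = self.pages.get(domain, 0) + 1
        return True
```

In a real crawler `now` would typically come from time.monotonic(); it
is passed in here so the behaviour is easy to check. A side effect of
any such cap is that sites with many posts are only partly covered.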
        
         | nickjj wrote:
         | Thanks a lot for including my site in your list. It was fun to
         | see where it appeared on the map. It was pretty close to
         | RealPython and GitHub.
        
         | TuringNYC wrote:
         | > I scraped my favorite blogs and made a graph from the domains
         | that each blog links to.
         | 
         | Nice analysis! However, I'm guessing these aren't your fav blogs
         | as there are tens of thousands of entries! How did you decide
         | which blogs to index, did you use some central registry of
         | blogs?
        
       | Avamander wrote:
       | Awesome, I've always wanted to build something like that on top
       | of YaCy just so that I could properly select new potentially
       | interesting sites to index. (I can't rely on the auto-index
       | unfortunately because it has no option to pre-confirm before
       | indexing.)
        
       | abalaji wrote:
       | This is neat, found my blog in there. Don't think I linked to
       | NatGeo at any point, though.
       | 
       | adithyabalaji.com
        
         | ng-henry wrote:
         | Nice! I looked through the logs and saw that you linked to it
         | in this article:
         | https://www.adithyabalaji.com/datascience/2021/05/17/Analyzi...
        
       | nexuist wrote:
       | This is really awesome work! How did you classify so many links?
        
         | ng-henry wrote:
         | To get their topics? I used a basic Louvain community detection
         | algorithm, then put all the URLs into GPT with some few-shot
         | prompting tricks to get it to output a particular topic.
         | There are some heuristics to break up giant communities / combine
         | small communities in there too.
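
For readers unfamiliar with Louvain, here is a minimal sketch of its
first phase only (greedy local moving), for an undirected, unweighted
edge list. It omits the aggregation phase of the full algorithm, and it
does not cover the GPT labelling or the split/merge heuristics mentioned
above. The function name local_moving is hypothetical.

```python
from collections import defaultdict


def local_moving(edges):
    """Louvain phase 1: move nodes between communities while the
    modularity gain is positive. Returns {node: community label}."""
    adj = defaultdict(dict)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        adj[u][v] = adj[u].get(v, 0) + 1
        adj[v][u] = adj[v].get(u, 0) + 1
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    two_m = sum(degree.values())       # twice the total edge weight
    community = {n: n for n in adj}    # every node starts alone
    total = dict(degree)               # summed degree per community
    improved = True
    while improved:
        improved = False
        for node in sorted(adj):
            current = community[node]
            # Weight of links from this node into each nearby community.
            links = defaultdict(float)
            for nbr, w in adj[node].items():
                links[community[nbr]] += w
            total[current] -= degree[node]  # take the node out
            best = current
            best_gain = (links.get(current, 0.0)
                         - total[current] * degree[node] / two_m)
            for comm, w in sorted(links.items()):
                gain = w - total[comm] * degree[node] / two_m
                if gain > best_gain:
                    best, best_gain = comm, gain
            total[best] += degree[node]     # put it into the best one
            if best != current:
                community[node] = best
                improved = True
    return community
```

On two triangles joined by a single edge, this groups each triangle
into its own community. For real use, an existing library
implementation is likely the better choice.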
        
           | internetter wrote:
           | Interesting, I was curious what I would be categorized as and
           | it's "Whistleblowing and Leaks", which I do suppose is what
           | my content has lately been to some extent but it was funny to
           | see that written out.
           | 
           | My question for you is how can I see what sites link to me,
           | as opposed to what sites I link to?
        
       | ibaikov wrote:
       | Scraped 10k blogs some time ago. Only like 20 of them had an
       | /ideas page, sad :(
        
         | mixedmath wrote:
         | What does an /ideas page mean to you?
        
       | amadeuspagel wrote:
       | I think it would be cool if the search results were also
       | visualized as a network.
        
       | PaulHoule wrote:
       | People still upvote hairball graphs every time. Fortunately there
       | is a cure:
       | 
       | https://cambridge-intelligence.com/how-to-fix-hairballs/
        
         | ng-henry wrote:
         | The hairball was much worse before. I used a lot of techniques
         | from this paper [1] to make it look decent and a bunch of other
         | heuristics based on other papers to make it look informative.
         | 
         | [1]
         | https://jgaa.info/accepted/2015/NocajOrtmannBrandes2015.19.2...
        
         | tauchunfall wrote:
         | Also edge bundling can help. See the papers by Benjamin Bach et
         | al.
         | 
         | https://aviz.fr/~bbach/confluentgraphs/
        
         | 3abiton wrote:
         | Is graph data processing considered visualization style? It is
         | changing the data, how can this be considered "visualization"?
        
       | system2 wrote:
       | Awesome. I will spend many hours looking at it today. Thanks a
       | lot.
        
       | bhartzer wrote:
       | This is very similar to Majestic's Link Graph where you can put
       | in any domain name and see all the links, up to tier 5, that link
       | to that domain name.
        
       | jszymborski wrote:
       | My blog is on here, but as a lonely, lonely node. I link to stuff
       | I promise!
        
         | ng-henry wrote:
         | We just haven't crawled your site yet! There's a lot of links
         | so we can't crawl them all :(
        
       | erikig wrote:
       | Reminds me a little of my fav sub-reddit browsing tool -
       | https://anvaka.github.io/map-of-reddit/
       | 
       | One nice feature that would be helpful is the ability to preview
       | the blog.
        
       | anfractuosity wrote:
       | Cool, I'm just wondering how come some nodes don't have any lines
       | to/from them, does that mean they came from an initial seed list?
        
       | JohnKemeny wrote:
       | Any chance of getting a copy of the underlying dataset?
        
       | nerdl0ve_kr wrote:
       | what's the point of this?
        
       | montyanderson wrote:
       | Reminds me of my friend's visualisation of tracks on the popular
       | London station NTS:
       | https://www.barneyhill.com/pages/nts-tracklists/
       | Turns out a lot of cool artists like the same tracks... ;)
        
       | rcarmo wrote:
       | Hmmm. My site is listed, but I have _way_ more inbound and
       | outbound links than shown.
       | 
       | And I have my own internal links visualization, which might be a
       | bit over the top (GPU recommended):
       | https://taoofmac.com/static/graph
        
         | ng-henry wrote:
         | See this comment I posted in another thread:
         | 
         | Yep this is only for stuff that we've crawled, so we can't
         | detect all of your links. Because we have limited crawling
         | resources, we rate-limit the crawling by domain so we don't get
         | stuck in spider traps. The current visualization only shows the
         | current state of the crawl, so it won't know about all of the
         | posts.
        
       | hanniabu wrote:
       | Surprised to see litprotocol at the same level as etherscan
        
       ___________________________________________________________________
       (page generated 2024-04-23 23:00 UTC)