[HN Gopher] Network visualization of 50k blogs and links
___________________________________________________________________
Network visualization of 50k blogs and links
Author : ng-henry
Score : 99 points
Date : 2024-04-23 19:33 UTC (3 hours ago)
(HTM) web link (graph.henryn.ca)
(TXT) w3m dump (graph.henryn.ca)
| ng-henry wrote:
| I scraped my favorite blogs and made a graph from the domains
| that each blog links to.
|
| You can see clusters forming of websites that talk about similar
| topics, like crypto, rationality, Canada, India, and even
| postgres!
|
| The visualization was made entirely in webgl with some neat
| optimizations to render that many lines and circles.
| dameyawn wrote:
| Very neat! So you wrote the graph visualization UI? I see in a
| prior project you used cytoscape - any motivation for doing it
| yourself this time (vs one of the available libraries)?
| ng-henry wrote:
| Yeah I used cytoscape before but it didn't have the full
| customization that I wanted. Besides the performance issues,
| there were some problems I couldn't have solved without a
| custom renderer:
| - if many lines overlap, how should their colors blend?
| - how to render circles so that they look nice both zoomed
|   in / out
| - how to avoid it looking like a hairball graph [1]
|
| The nice thing about a personal project is that I can do
| whatever I like with no constraints, so I built one that's
| suited for this project and fits my tastes.
|
| [1] https://cambridge-intelligence.com/how-to-fix-hairballs/
| varenc wrote:
| This is a really cool project! I'd love to hear more about how
| you built the front end.
| gala8y wrote:
| serendipity heaven, but... how is this map of _your favorite_
| blogs?
| ng-henry wrote:
| I started off with my favorite blogs and recursively explored
| from there based on what they linked to.
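|
| A minimal sketch of that kind of recursive exploration, written as a
| breadth-first walk over domains. This is an illustration, not the
| crawler behind the site: fetch_links, FAKE_WEB, max_domains and the
| example domains are all hypothetical.

```python
from collections import deque
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def explore(seeds, fetch_links, max_domains=50_000):
    """Breadth-first exploration starting from the seed domains.

    fetch_links(domain) is a hypothetical callback returning the URLs
    found on that domain's pages. Returns (source, target) domain edges.
    """
    seen = set(seeds)
    queue = deque(seeds)
    edges = set()
    while queue:
        source = queue.popleft()
        for url in fetch_links(source):
            target = domain_of(url)
            if not target or target == source:
                continue  # skip malformed URLs and self-links
            edges.add((source, target))
            # Only enqueue domains we have not seen, up to the cap.
            if target not in seen and len(seen) < max_domains:
                seen.add(target)
                queue.append(target)
    return edges


# Toy stand-in for the web: domain -> outbound URLs (made-up data).
FAKE_WEB = {
    "a.example": ["https://www.b.example/post", "https://c.example/"],
    "b.example": ["https://c.example/x", "https://a.example/"],
    "c.example": [],
}

edges = explore(["a.example"], lambda d: FAKE_WEB.get(d, []))
print(sorted(edges))
```

| Starting from a single seed, the walk reaches every domain that is
| linked transitively, which is how a handful of favorites can grow
| into tens of thousands of nodes.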
| gala8y wrote:
| ok, got it.
| jseliger wrote:
| This is very cool but also not accurate, at least for
| jakeseliger.com. Henryn.ca lists 0 links from jakeseliger.com
| to nytimes.com, reason.com, and numerous others that simple
| search demonstrates are linked to, for example:
| https://jakeseliger.com/?s=nytimes.com&submit=Search
|
| I put up many links posts, so I probably link to an abnormally
| large number of sites.
| ng-henry wrote:
| Yep this is only for stuff that we've crawled, so we can't
| detect all of your links. Because we have limited crawling
| resources, we rate-limit the crawling by domain so we don't
| get stuck in spider traps.
|
| The current visualization only shows the current state of the
| crawl, so it won't know about all of the posts.
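|
| One way such a per-domain limit could look, as a sketch: cap the
| number of pages fetched from each domain. The actual policy is not
| stated here (it could equally be a time-based delay), so DomainBudget,
| max_pages and the example URLs are assumptions for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


class DomainBudget:
    """Allow at most max_pages fetches per domain (an assumed policy)."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.fetched = Counter()  # domain -> pages fetched so far

    def allow(self, url):
        """Return True and count the fetch if the domain has budget left."""
        host = urlparse(url).netloc.lower()
        if self.fetched[host] >= self.max_pages:
            return False
        self.fetched[host] += 1
        return True


budget = DomainBudget(max_pages=2)
urls = [
    "https://trap.example/calendar?day=1",
    "https://trap.example/calendar?day=2",
    "https://trap.example/calendar?day=3",  # over budget, skipped
    "https://blog.example/post",
]
crawled = [u for u in urls if budget.allow(u)]
print(crawled)
```

| A cap like this stops an endless calendar or pagination trap, but it
| also means pages past the cap are never read, so links that appear
| only on those pages do not show up in the graph.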
| nickjj wrote:
| Thanks a lot for including my site in your list. It was fun to
| see where it appeared on the map. It was pretty close to
| RealPython and GitHub.
| TuringNYC wrote:
| > I scraped my favorite blogs and made a graph from the domains
| that each blog links to.
|
| Nice analysis! However, I'm guessing these aren't your fav blogs
| as there are tens of thousands of entries! How did you decide
| which blogs to index? Did you use some central registry of
| blogs?
| Avamander wrote:
| Awesome, I've always wanted to build something like that on top
| of YaCy just so that I could properly select new potentially
| interesting sites to index. (I can't rely on the auto-index
| unfortunately because it has no option to pre-confirm before
| indexing.)
| abalaji wrote:
| This is neat, found my blog in there. Don't think I linked to
| NatGeo at any point, though.
|
| adithyabalaji.com
| ng-henry wrote:
| Nice! I looked through the logs and saw that you linked to it
| in this article:
| https://www.adithyabalaji.com/datascience/2021/05/17/Analyzi...
| nexuist wrote:
| This is really awesome work! How did you classify so many links?
| ng-henry wrote:
| To get their topics? I used a basic Louvain community detection
| algorithm, then put all the URLs into GPT with some few-shot
| prompting tricks to get it to output a particular topic.
| There are some heuristics to break up giant communities / combine
| small communities in there too.
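|
| For readers unfamiliar with Louvain, here is a sketch of its first
| phase only (local moving): each node repeatedly joins the neighboring
| community that gives the largest modularity gain. The full algorithm
| also merges communities into super-nodes and repeats, which is
| omitted here. The graph is treated as undirected and unweighted, and
| the toy edge list is made up; none of this is the site's actual code.

```python
from collections import Counter, defaultdict


def local_moving(adj):
    """First phase of Louvain on an undirected, unweighted graph.

    adj maps each node to a list of its neighbours. Returns a dict
    mapping each node to a community label.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    community = {v: v for v in adj}  # start with one community per node
    if m == 0:
        return community
    degree = {v: len(adj[v]) for v in adj}
    total = dict(degree)  # community label -> sum of member degrees

    def gain(k_in, label, v):
        # Modularity gain (up to a constant factor) of placing v in label.
        return k_in - total[label] * degree[v] / (2 * m)

    improved = True
    while improved:
        improved = False
        for v in adj:
            old = community[v]
            total[old] -= degree[v]  # take v out of its community
            # Count edges from v into each neighbouring community.
            links = Counter(community[u] for u in adj[v] if u != v)
            best, best_gain = old, gain(links.get(old, 0), old, v)
            for label, k_in in links.items():
                g = gain(k_in, label, v)
                if g > best_gain:
                    best, best_gain = label, g
            community[v] = best
            total[best] += degree[v]
            if best != old:
                improved = True
    return community


# Toy graph: two triangles joined by a single edge (2-3).
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = defaultdict(list)
for a, b in edge_list:
    adj[a].append(b)
    adj[b].append(a)

community = local_moving(adj)
print(community)
```

| On the toy graph the two triangles end up in separate communities.
| Each community's URLs could then be handed to a language model for
| labeling, as described above.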
| internetter wrote:
| Interesting, I was curious what I would be categorized as and
| it's "Whistleblowing and Leaks", which I do suppose is what
| my content has lately been to some extent but it was funny to
| see that written out.
|
| My question for you is how can I see what sites link to me,
| as opposed to what sites I link to?
| ibaikov wrote:
| Scraped 10k blogs some time ago. Only like 20 of them had an
| /ideas page, sad :(
| mixedmath wrote:
| What does an /ideas page mean to you?
| amadeuspagel wrote:
| I think it would be cool if the search results were also
| visualized as a network.
| PaulHoule wrote:
| People still upvote hairball graphs every time. Fortunately there
| is a cure:
|
| https://cambridge-intelligence.com/how-to-fix-hairballs/
| ng-henry wrote:
| The hairball was much worse before. I used a lot of techniques
| from this paper [1] to make it look decent and a bunch of other
| heuristics based on other papers to make it look informative.
|
| [1]
| https://jgaa.info/accepted/2015/NocajOrtmannBrandes2015.19.2...
| tauchunfall wrote:
| Also edge bundling can help. See the papers by Benjamin Bach et
| al.
|
| https://aviz.fr/~bbach/confluentgraphs/
| 3abiton wrote:
| Is graph data processing considered visualization style? It is
| changing the data, how can this be considered "visualization"?
| system2 wrote:
| Awesome. I will spend many hours looking at it today. Thanks a
| lot.
| bhartzer wrote:
| This is very similar to Majestic's Link Graph where you can put
| in any domain name and see all the links, up to tier 5, that link
| to that domain name.
| jszymborski wrote:
| My blog is on here, but as a lonely, lonely node. I link to
| stuff, I promise!
| ng-henry wrote:
| We just haven't crawled your site yet! There are a lot of links
| so we can't crawl them all :(
| erikig wrote:
| Reminds me a little of my fav sub-reddit browsing tool -
| https://anvaka.github.io/map-of-reddit/
|
| One nice feature that would be helpful is the ability to preview
| the blog.
| anfractuosity wrote:
| Cool, I'm just wondering how come some nodes don't have any lines
| to/from them, does that mean they came from an initial seed list?
| JohnKemeny wrote:
| Any chance of getting a copy of the underlying dataset?
| nerdl0ve_kr wrote:
| what's the point of this?
| montyanderson wrote:
| Reminds me of my friend's visualisation of tracks on the popular
| London station NTS:
| https://www.barneyhill.com/pages/nts-tracklists/
| Turns out a lot of cool artists like the same tracks... ;)
| rcarmo wrote:
| Hmmm. My site is listed, but I have _way_ more inbound and
| outbound links than shown.
|
| And I have my own internal links visualization, which might be a
| bit over the top (GPU recommended):
| https://taoofmac.com/static/graph
| ng-henry wrote:
| See this comment I posted in another thread:
|
| Yep this is only for stuff that we've crawled, so we can't
| detect all of your links. Because we have limited crawling
| resources, we rate-limit the crawling by domain so we don't get
| stuck in spider traps. The current visualization only shows the
| current state of the crawl, so it won't know about all of the
| posts.
| hanniabu wrote:
| Surprised to see litprotocol at the same level as etherscan
___________________________________________________________________
(page generated 2024-04-23 23:00 UTC)