[HN Gopher] League of Legends data scraping the hard and tedious...
       ___________________________________________________________________
        
       League of Legends data scraping the hard and tedious way for fun
        
       Author : maknee
       Score  : 115 points
       Date   : 2025-02-12 11:11 UTC (11 hours ago)
        
 (HTM) web link (maknee.github.io)
 (TXT) w3m dump (maknee.github.io)
        
       | Kuinox wrote:
       | The diagrams are not visible in dark mode.
        
         | bilekas wrote:
         | I see comments like this a lot actually and I'm curious, if the
         | client is manipulating the intended style and layout of the
         | site, do you really think it's the responsibility of the
         | website owner?
         | 
         | Otherwise I'm confused why you mention it.
        
           | bool3max wrote:
           | In this case yes because the website itself has a dark-mode
           | toggle in the top right corner, and in its dark mode, the
           | images are not visible.
        
             | bilekas wrote:
             | Ahhh I missed that! That's completely fair then
        
           | akerl_ wrote:
           | The site automatically displays in dark mode if the browser
           | says it's using dark mode.
           | 
           | So this isn't something the user is doing to manipulate the
           | style and layout: their browser is saying "hey, fyi, this
           | user's local system biases to dark mode" and the site is
           | choosing to respond by styling in a way that breaks diagram
           | visibility.
        
           | stordoff wrote:
           | This site has a theme picker to toggle between light and dark
           | modes.
        
           | jwagenet wrote:
           | The blog has a toggle for darkmode and some of their images
           | are black text with a transparent background. When darkmode
           | is toggled, the text is effectively invisible, so in this
           | case it seems to be an oversight of the blog.
        
           | doix wrote:
           | This isn't the case of a browser plugin modifying the styles.
           | The blog framework or whatever detects what your
           | browser/system preference is and respects it. So if you've
           | got your browser/os set to "dark mode" the page renders in
           | "dark mode". Except the author used transparent images with
           | dark lines, so they are invisible.
           | 
           | I think it's fair enough to complain about.
        
         | doix wrote:
         | document.querySelectorAll('img').forEach(img =>
         | img.style.background = 'white');
         | 
         | As a quick hack for anyone else that has the problem (paste
         | into your browser console).
        
         | maknee wrote:
         | Oops, I didn't realize that the images are not visible in dark
         | mode. I'll fix it. Thanks for pointing that out!
        
       | armanckeser wrote:
       | Really cool project! I am not sure if this is only me, but your
       | dark theme is hiding the illustrations fyi.
        
       | moonshadow565 wrote:
       | > League of Legends runs on a custom game engine developed in
       | 2009.
       | 
       | Developed by Sergey Titov (same engine that powers Big Rigs).
        
         | killerteddybear wrote:
         | Big Rigs: Over the Road Racing?
        
           | moonshadow565 wrote:
           | Yes, Angry Video Game Nerd made a very funny video about it.
           | Another game that I know runs on the same engine is WarZ.
        
       | picafrost wrote:
       | A tip:
       | 
       |     @media (prefers-color-scheme: dark) {
       |       img[src*="svg"], img[src*="png"] {
       |         filter: invert(1) hue-rotate(180deg);
       |       }
       |     }
        
       | finalfire wrote:
       | This is really cool, and it is exactly what I was looking for.
       | To give some context, I worked on some data science-inspired
       | studies [1] about LoL, and the future research direction is to
       | provide a formal model of the games and analyze them through it.
       | While I had a little success getting aggregated data from
       | websites such as uol.gg, the granularity is not fine enough to
       | do very interesting analysis.
       | 
       | [1] https://doi.org/10.1016/j.ipm.2023.103516
        
       | SpaceManNabs wrote:
       | One of the cool things about dota is that opendota and stratz
       | provide a lot of data because steam is relatively open.
       | 
       | it is how i wrote a blog post on generating builds for heroes
       | before dota plus even had the feature!
        
       | m0w0kuma wrote:
       | I've been working on something similar [1], but I took a
       | different approach: I statically extract all decryption stubs
       | using an IDA script I wrote, then emulate them using Unicorn. I'm
       | also interested in your implementation details--do you have your
       | code on GitHub or somewhere else?
       | 
       | [1] https://github.com/m0w0kuma/ROFL
        
         | maknee wrote:
         | That's pretty cool! It's quite similar to my tool in many ways.
         | Parsing the file, setting up the packet context and using
         | unicorn :)
         | 
         | The repo isn't on github. I might release it later, but I would
         | want it to be in a better shape if I were to.
        
       | infogulch wrote:
       | Getting data by directly processing the packets instead of using
       | the (buggy, slow) replay system is a great idea. There's a lot of
       | interesting data in the middle of LoL gamestate that is missing
       | in summary overviews that only consider the final state of the
       | game.
        
       | jeremiahar wrote:
       | I worked on something like this back in 2016; I'm not sure how
       | much things have changed since then. I used dynamic binary
       | instrumentation to deal with the field encryption. Basically,
       | manually map the executable into executable memory on Linux (as
       | if it were a shared library). Begin execution at the packet
       | switch, but before executing a block of code, disassemble it
       | until a conditional branch, and modify it according to some
       | heuristics to remove the at-rest encryption. The original block
       | of code wasn't executed in place, since the modified code might
       | not fit into the original block size, so new blocks were mmap'd
       | for this. malloc/free were hooked and replaced with wrappers
       | over glibc's malloc/free, but with bookkeeping so that the
       | memory can be freed after execution of the packet switch. atexit
       | was just replaced with a no-op.
       | 
       | That all just dealt with the encryption, but there were also
       | randomized packet IDs and field orders. Those problems were
       | dealt with by using manually written heuristics based on the
       | packet IDs which were actually interesting. Packet handlers with
       | references to text strings (even hashed ones), etc. were a gold
       | mine here because they made static detection of packet IDs
       | simple. If there was no text string, many of the offsets could be
       | auto-detected just by parsing a replay and running small snippets
       | to determine which offsets actually "made sense" for the field
       | that was being searched for. For example, if there was a gold
       | gain packet, the amount of gold gained shouldn't be out of an
       | expected range, or else the offset likely does not correspond to
       | that field.
       | 
       | Once all of the high-volume code blocks had been instrumented,
       | replays could be parsed in 2-3 seconds (along with generating
       | the desired data aggregations). This is all from memory, so it's
       | possible there could be a minor mistake or two.
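       | 
       | A minimal sketch of the "does this offset make sense" check
       | described above, assuming the gold amount is stored as a
       | little-endian 32-bit float somewhere in an already-decrypted
       | payload. The function name, the payload layout, and the range
       | bounds are hypothetical and only illustrate the idea; this is
       | not the 2016 tooling.

```python
import struct


def plausible_offsets(payloads, lo, hi, fmt="<f"):
    """Return the byte offsets at which EVERY payload decodes to a value in [lo, hi].

    payloads: decrypted packet bodies (bytes) that all share one packet id.
    lo, hi:   expected range for the field being searched for (assumed known).
    fmt:      struct format of the field; "<f" (little-endian float) is an assumption.
    """
    size = struct.calcsize(fmt)
    shortest = min(len(p) for p in payloads)
    candidates = []
    for offset in range(shortest - size + 1):
        values = (struct.unpack_from(fmt, p, offset)[0] for p in payloads)
        # NaN fails both comparisons, so garbage floats are rejected too.
        if all(lo <= v <= hi for v in values):
            candidates.append(offset)
    return candidates


def make_packet(gold):
    """Build a fake 10-byte payload: 4 filler bytes, the gold float, 2 filler bytes."""
    return b"\xff" * 4 + struct.pack("<f", gold) + b"\xff" * 2


# Hypothetical "gold gain" packets observed while parsing one replay.
samples = [make_packet(12.5), make_packet(300.0), make_packet(21.0)]
print(plausible_offsets(samples, lo=1.0, hi=1000.0))
```

       | The more packets of the same id are fed in, the fewer offsets
       | survive by coincidence, which is why parsing a whole replay is
       | usually enough to narrow the candidates down.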
        
       | exar0815 wrote:
       | I did something similar with a friend for some time for another
       | game.
       | 
       | As it went, our data was used to prove things to the developer
       | they would have loved to hush-hush, which led to a cat and mouse
       | game with the data and their open and... not so open APIs. In the
       | end, we stopped playing the game and stopped our efforts at it.
       | Fun times.
        
       | landr0id wrote:
       | The World of Warships community has gone through similar steps,
       | but the encryption is much more straightforward. Some of the
       | packets are pickled Python, some are just binary blobs, so there
       | are some undocumented packets but for the most part people have
       | done a decent job of figuring it out and building tooling around
       | it such as the minimap renderer:
       | https://github.com/WoWs-Builder-Team/minimap_renderer
       | 
       | There's an odd unspoken and somewhat understood agreement between
       | the developer (Wargaming) and community though: the community
       | actively reverse engineers the game to document the packets and
       | WG kind of looks the other way (except when they recently
       | threatened me with a perma ban :) -- they even use the tooling
       | the community creates for official tournaments.
       | 
       | In this article the author mentions Riot partnering with external
       | companies to provide richer data sets and analytics. Do they use
       | these tools/data sets for tournaments as well? Is it known at
       | all how these partnerships are structured?
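       | 
       | A rough sketch of handling a stream that mixes pickled and raw
       | binary packets, as described in the first paragraph. The header
       | layout (payload size, packet type, timestamp), the packet type
       | ids, and the function names are all hypothetical and not taken
       | from the real replay format. Because unpickling untrusted data
       | can execute code, the sketch refuses every global lookup.

```python
import io
import pickle
import struct

# Hypothetical header: payload size (u32), packet type (u32), timestamp (f32).
HEADER = struct.Struct("<IIf")

# Hypothetical set of packet types whose payload is a pickle.
PICKLED_TYPES = {0x08}


class SafeUnpickler(pickle.Unpickler):
    """Unpickler that only allows plain containers and primitives."""

    def find_class(self, module, name):
        # Any reference to a class or function is rejected outright.
        raise pickle.UnpicklingError(f"blocked global {module}.{name}")


def iter_packets(stream):
    """Yield (packet_type, timestamp, body) for each complete packet in stream."""
    offset = 0
    while offset + HEADER.size <= len(stream):
        size, ptype, clock = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        payload = stream[offset:offset + size]
        if len(payload) < size:
            break  # truncated final packet; stop instead of guessing
        offset += size
        if ptype in PICKLED_TYPES:
            body = SafeUnpickler(io.BytesIO(payload)).load()
        else:
            body = payload  # undocumented blob: keep the raw bytes
        yield ptype, clock, body


def make_packet(ptype, clock, payload):
    """Frame a payload with the hypothetical header (for the demo only)."""
    return HEADER.pack(len(payload), ptype, clock) + payload


demo = (
    make_packet(0x08, 1.5, pickle.dumps({"hp": 100}))
    + make_packet(0x03, 2.0, b"\x01\x02")
)
for packet in iter_packets(demo):
    print(packet)
```

       | Keeping unknown packets as raw bytes means the parser keeps
       | working while the undocumented types are still being figured out.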
        
         | maknee wrote:
         | Glad to see another community working on similar things!
         | 
         | I do not know how RIOT partners with external companies, so I
         | do not know any analysis tools or datasets besides what is
         | publicly available :(
         | 
         | At least, RIOT offers special endpoints/overlays for companies
         | ~ [1][2].
         | 
         | [1] https://blitz.gg/overlays/lol
         | [2] https://www.overwolf.com/browse-by-game/league-of-legends
        
       | pton_xd wrote:
       | I've always heard that "security through obscurity" is
       | discouraged because, well, there's no stopping someone from
       | digging in and figuring it out. However in this case it seems
       | somewhat successful in that the author was not able to decrypt
       | the packets directly.
       | 
       | The article says that "while it might seem feasible to
       | reimplement these functions in Python without running the client,
       | several factors make this approach impractical" and then lists
       | some reasons like the lookup tables changing, chunk layouts
       | getting shuffled, etc.
       | 
       | Is that all it takes to thwart decrypting the packets? Even
       | though, presumably, you have access to all those lookup tables
       | and chunk layouts somewhere in the client? Is it just too much
       | effort to piece together how it works? I'd be curious to hear
       | more specifics on how exactly Riot was able to make reverse
       | engineering this so impractical.
       | 
       | Great article!
        
       | babuloseo wrote:
       | GTFO hackernews, we only play Dota2 here.
        
       ___________________________________________________________________
       (page generated 2025-02-12 23:00 UTC)