[HN Gopher] League of Legends data scraping the hard and tedious...
___________________________________________________________________
League of Legends data scraping the hard and tedious way for fun
Author : maknee
Score : 115 points
Date : 2025-02-12 11:11 UTC (11 hours ago)
(HTM) web link (maknee.github.io)
(TXT) w3m dump (maknee.github.io)
| Kuinox wrote:
| The diagrams are not visible in dark mode.
| bilekas wrote:
| I see comments like this a lot actually and I'm curious, if the
| client is manipulating the intended style and layout of the
| site, do you really think it's the responsibility of the
| website owner?
|
| Otherwise I'm confused why you mention it.
| bool3max wrote:
| In this case yes because the website itself has a dark-mode
| toggle in the top right corner, and in its dark mode, the
| images are not visible.
| bilekas wrote:
| Ahhh I missed that! That's completely fair then
| akerl_ wrote:
| The site automatically displays in dark mode if the browser
| says it's using dark mode.
|
| So this isn't something the user is doing to manipulate the
| style and layout: their browser is saying "hey, fyi, this
| user's local system biases to dark mode" and the site is
| choosing to respond by styling in a way that breaks diagram
| visibility.
| stordoff wrote:
| This site has a theme picker to toggle between light and dark
| modes.
| jwagenet wrote:
| The blog has a toggle for darkmode and some of their images
| are black text with a transparent background. When darkmode
| is toggled, the text is effectively invisible, so in this
| case it seems to be an oversight of the blog.
| doix wrote:
| This isn't the case of a browser plugin modifying the styles.
| The blog framework or whatever detects what your
| browser/system preference is and respects it. So if you've
| got your browser/os set to "dark mode" the page renders in
| "dark mode". Except the author used transparent images with
| dark lines, so they are invisible.
|
| I think it's fair enough to complain about.
| doix wrote:
| document.querySelectorAll('img').forEach(img =>
| img.style.background = 'white');
|
| As a quick hack for anyone else that has the problem (paste
| into your browser console).
| maknee wrote:
| Oops, I didn't realize that the images are not visible in dark
| mode. I'll fix it. Thanks for pointing that out!
| armanckeser wrote:
| Really cool project! I am not sure if this is only me, but your
| dark theme is hiding the illustrations fyi.
| moonshadow565 wrote:
| > League of Legends runs on a custom game engine developed in
| 2009.
|
| Developed by Sergey Titov (same engine that powers Big Rigs).
| killerteddybear wrote:
| Big Rigs: Over the Road Racing?
| moonshadow565 wrote:
| Yes, Angry Video Game Nerd made a very funny video about it.
| Another game I know of that runs on the same engine is WarZ.
| picafrost wrote:
| A tip:
|
|     @media (prefers-color-scheme: dark) {
|       img[src*="svg"], img[src*="png"] {
|         filter: invert(1) hue-rotate(180deg);
|       }
|     }
| finalfire wrote:
| This is really cool, and it is exactly what I was looking for.
| For context, I worked on some data-science-inspired studies [1]
| about LoL, and the future research direction is to build a formal
| model of the games and analyze them through it. While I had a
| little success getting aggregated data from websites such as
| uol.gg, the granularity is not fine enough for very interesting
| analysis.
|
| [1] https://doi.org/10.1016/j.ipm.2023.103516
| SpaceManNabs wrote:
| One of the cool things about Dota is that OpenDota and STRATZ
| provide a lot of data because Steam is relatively open.
|
| It is how I wrote a blog post on generating builds for heroes
| before Dota Plus even had the feature!
| m0w0kuma wrote:
| I've been working on something similar [1], but I took a
| different approach: I statically extract all decryption stubs
| using an IDA script I wrote, then emulate them using Unicorn. I'm
| also interested in your implementation details--do you have your
| code on GitHub or somewhere else?
|
| [1] https://github.com/m0w0kuma/ROFL
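|
| A minimal sketch of the "emulate an extracted stub" idea, assuming
| the Unicorn Python bindings are installed. TOY_STUB, rol32,
| reference_decrypt and emulate_stub are hypothetical names, and the
| three-instruction stub is a made-up stand-in, not code taken from
| the League client or from the ROFL repo.

```python
def rol32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by `count` bits."""
    value &= 0xFFFFFFFF
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


# Hypothetical stand-in for an extracted stub (NOT real game code):
#   mov eax, edi ; xor eax, 0x5A ; rol eax, 3
TOY_STUB = bytes.fromhex("89f8" "83f05a" "c1c003")


def reference_decrypt(encrypted: int) -> int:
    """Pure-Python equivalent of TOY_STUB, used to cross-check the emulator."""
    return rol32(encrypted ^ 0x5A, 3)


def emulate_stub(code: bytes, encrypted: int) -> int:
    """Run a position-independent x86-64 stub under Unicorn and return EAX."""
    # Imported lazily so this module still loads without Unicorn installed.
    from unicorn import Uc, UC_ARCH_X86, UC_MODE_64
    from unicorn.x86_const import UC_X86_REG_EAX, UC_X86_REG_EDI

    base = 0x100000                       # arbitrary page-aligned load address
    mu = Uc(UC_ARCH_X86, UC_MODE_64)
    mu.mem_map(base, 0x1000)              # one 4 KiB page holds the stub
    mu.mem_write(base, code)
    mu.reg_write(UC_X86_REG_EDI, encrypted)
    # Emulation stops when the instruction pointer reaches `base + len(code)`.
    mu.emu_start(base, base + len(code))
    return mu.reg_read(UC_X86_REG_EAX)
```

| Calling emulate_stub(TOY_STUB, x) should agree with
| reference_decrypt(x). The appeal of emulation is that the
| reference function never has to be written by hand when the real
| stubs change between patches.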
| maknee wrote:
| That's pretty cool! It's quite similar to my tool in many ways.
| Parsing the file, setting up the packet context and using
| unicorn :)
|
| The repo isn't on github. I might release it later, but I would
| want it to be in a better shape if I were to.
| infogulch wrote:
| Getting data by directly processing the packets instead of using
| the (buggy, slow) replay system is a great idea. There's a lot of
| interesting data in the middle of LoL gamestate that is missing
| in summary overviews that only consider the final state of the
| game.
| jeremiahar wrote:
| I worked on something like this back in 2016; I'm not sure how
| much things have changed since then. I used dynamic binary
| instrumentation to deal with the field encryption. Basically,
| manually map the executable into executable memory on Linux (as
| if it were a shared library). Begin execution at the packet
| switch, but before executing a block of code, disassemble it
| until a conditional branch, and modify it according to some
| heuristics to remove the at-rest encryption. The original block
| of code wasn't executed in place, since the modified version might
| not fit into the original block size, so new blocks were mmap'd
| for this.
|
| malloc/free were hooked and replaced with wrappers over glibc's
| malloc/free, but with bookkeeping so that the memory can be freed
| after execution of the packet switch. atexit was just replaced
| with a no-op.
|
| That all just dealt with the encryption, but there were also
| randomized packet IDs and field orders. Those problems were dealt
| with using manually written heuristics based on the packet IDs
| that were actually interesting. Packet handlers with references
| to text strings (even hashed ones), etc. were a gold mine here
| because they made static detection of packet IDs simple. If there
| was no text string, many of the offsets could be auto-detected
| just by parsing a replay and running small snippets to determine
| which offsets actually "made sense" for the field that was being
| searched for. For example, if there was a gold gain packet, the
| amount of gold gained shouldn't be out of an expected range, or
| else the offset likely doesn't correspond to that field.
|
| Once all of the high-volume code blocks had been instrumented,
| replays could be parsed in 2-3 seconds (along with generating the
| desired data aggregations). This is all from memory, so it's
| possible there could be a minor mistake or two.
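|
| The "does this offset make sense" check for something like a gold
| gain packet could be sketched roughly as below. This is only an
| illustration: plausible_offsets, the little-endian float32 encoding
| and the 1-1000 range are assumptions, not the actual packet layout
| or the heuristics used in 2016.

```python
import struct


def plausible_offsets(payloads, lo, hi, fmt="<f"):
    """Return byte offsets where every sample decodes to a value in [lo, hi].

    `payloads` is a list of already-decrypted packet bodies of the same
    packet type; `fmt` is the struct format assumed for the field.
    """
    size = struct.calcsize(fmt)
    shortest = min(len(p) for p in payloads)
    hits = []
    for offset in range(shortest - size + 1):
        values = [struct.unpack_from(fmt, p, offset)[0] for p in payloads]
        # NaN compares False against everything, so garbage floats are rejected.
        if all(lo <= v <= hi for v in values):
            hits.append(offset)
    return hits


# Synthetic samples: 4 filler bytes, a float32 "gold" value, 2 filler bytes.
samples = [
    b"\xff" * 4 + struct.pack("<f", gold) + b"\xff" * 2
    for gold in (14.0, 300.0, 21.5)
]
candidates = plausible_offsets(samples, lo=1.0, hi=1000.0)
print(candidates)
```

| With more samples per packet type, fewer offsets survive the
| filter, which is why parsing a whole replay helps narrow the
| candidates down.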
| exar0815 wrote:
| I did something similar with a friend for some time for another
| game.
|
| As it went, our data was used to prove things to the developer
| that they would have loved to keep hush-hush, which led to a
| cat-and-mouse game with the data and their open and... not so open
| APIs. In the end, we stopped playing the game and stopped our
| efforts at it. Fun times.
| landr0id wrote:
| The World of Warships community has gone through similar steps,
| but the encryption is much more straightforward. Some of the
| packets are pickled Python, some are just binary blobs, so there
| are some undocumented packets but for the most part people have
| done a decent job of figuring it out and building tooling around
| it, such as the minimap renderer:
| https://github.com/WoWs-Builder-Team/minimap_renderer
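|
| When a format mixes pickled payloads with plain binary blobs, one
| way to tell them apart without executing anything is to walk the
| pickle opcodes with the standard library's pickletools. A rough
| sketch follows; looks_pickled is a hypothetical helper, not part of
| any World of Warships tooling, and untrusted data should never be
| passed to pickle.loads directly.

```python
import pickle
import pickletools


def looks_pickled(payload: bytes) -> bool:
    """Heuristically check whether `payload` is one complete pickle stream.

    pickletools.genops only parses opcodes; it never builds objects,
    so it is safe to run on untrusted replay data.
    """
    try:
        ops = list(pickletools.genops(payload))
    except Exception:
        # Unknown opcode, truncated argument, empty input, etc.
        return False
    if not ops:
        return False
    last_opcode, _arg, last_pos = ops[-1]
    # A well-formed pickle ends with STOP as the final byte of the payload.
    return last_opcode.name == "STOP" and last_pos == len(payload) - 1


pickled_packet = pickle.dumps({"kind": "chat", "text": "gl hf"})
binary_packet = b"\x00\x01\x02\x03"
print(looks_pickled(pickled_packet), looks_pickled(binary_packet))
```

| This is only a classifier. Actually decoding a pickled packet
| safely would still need a restricted unpickler or a manual walk
| over the opcodes.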
|
| There's an odd unspoken and somewhat understood agreement between
| the developer (Wargaming) and community though: the community
| actively reverse engineers the game to document the packets and
| WG kind of looks the other way (except when they recently
| threatened me with a perma ban :) -- they even use the tooling
| the community creates for official tournaments.
|
| In this article the author mentions Riot partnering with external
| companies to provide richer data sets and analytics. Do they
| use these tools/data sets for tournaments as well? Is it known at
| all how these partnerships are structured?
| maknee wrote:
| Glad to see another community working on similar things!
|
| I do not know how Riot partners with external companies, so I
| do not know of any analysis tools or datasets besides what is
| publicly available :(
|
| At least Riot offers special endpoints/overlays for companies
| [1][2].
|
| [1] https://blitz.gg/overlays/lol
| [2] https://www.overwolf.com/browse-by-game/league-of-legends
| pton_xd wrote:
| I've always heard that "security through obscurity" is
| discouraged because, well, there's no stopping someone from
| digging in and figuring it out. However in this case it seems
| somewhat successful in that the author was not able to decrypt
| the packets directly.
|
| The article says that "while it might seem feasible to
| reimplement these functions in Python without running the client,
| several factors make this approach impractical" and then lists
| some reasons like the lookup tables changing, chunk layouts
| getting shuffled, etc.
|
| Is that all it takes to thwart decrypting the packets? Even
| though, presumably, you have access to all those lookup tables
| and chunk layouts somewhere in the client? Is it just too much
| effort to piece together how it works? I'd be curious to hear
| more specifics on how exactly Riot was able to make reverse
| engineering this so impractical.
|
| Great article!
| babuloseo wrote:
| GTFO hackernews, we only play Dota2 here.
___________________________________________________________________
(page generated 2025-02-12 23:00 UTC)