[HN Gopher] Show HN: I scrape Steam data every month and it's yo...
___________________________________________________________________
Show HN: I scrape Steam data every month and it's yours to download
for free
Yeah, there's AI, but I added it because I found it easier to find
answers I'm looking for. For the data scientists, you can download
the CSV and go crazy. Would love to know what discoveries or
learnings can be found from it. To download the raw scraped data
you need to become a paid member but you don't really need it
unless you're wanting to finesse a table of data for a particular
need. The cost is mostly just an incentive to help me pay the bills
for running the website. The available CSV files contain a large
amount of data, covering everything from tags, genres, and pricing to
wishlists and estimated revenue. It's what the AI is reading from.
Hope you find it useful :-)
Author : csmets
Score : 145 points
Date : 2025-02-24 11:43 UTC (11 hours ago)
(HTM) web link (www.gginsights.io)
(TXT) w3m dump (www.gginsights.io)
| endre wrote:
| nice
| ddxv wrote:
| Hi, I'm interested in scraping steam too. Do you have the scraper
| code available open source or one you recommend?
| lolinder wrote:
| Have you looked over the data that OP is providing here and
| determined that it doesn't meet your needs?
|
| Generally it's polite to avoid scraping if you can help it, so
| I'd start by considering whether OP is already providing what
| you are looking for.
| DrammBA wrote:
| On the other hand you need to be a paid member to download
| the raw scraped data, so it isn't unreasonable to want to
| learn how to scrape it instead.
| schnebbau wrote:
| Good idea, let's save $5 and sink dozens of hours into
| building our own instead
| DrammBA wrote:
| Isn't that the hacker spirit, wanting to put things
| together yourself? Let's revisit the comment that started
| this thread:
|
| > Hi, I'm interested in scraping steam too.
| schnebbau wrote:
| I was responding directly to your objection with having
| to pay for it as the reason to do this, not hacking for
| the sake of hacking.
| diggan wrote:
| Hacking for the sake of hacking VS hacking for the sake
| of saving money, why does it matter?
| schnebbau wrote:
| Jesus Christ, it's 5 bucks. 5 bucks is not a reason to
| roll your own version.
|
| Roll your own version because you want to roll your own
| version, not to save 5 bucks.
| DrammBA wrote:
| you seem to be really hung up on the 5 bucks while at the
| same time being angry that people are hung up on the 5
| bucks, it's just 5 bucks man, if people want to pay it or
| not doesn't matter, let people hack it away if they want
| nickthegreek wrote:
| It's not $5. It is free or $5/month. The reason OP wants
| to scrape doesn't matter. His question was reasonable to
| ask here and could lead to finding out about some open
| source projects. You have not added anything to the
| conversation besides being wrong about the price.
| matly wrote:
| Let's expand our own skillset by investing time instead
| of money (paying someone else). Sounds like a reasonable
| proposition to me.
| netruk44 wrote:
| I wrote a simple scraper for a 'steam game semantic search' app
| I built a while ago.
|
| It definitely won't fetch all the data that this person does
| though. It only fetches the current list of games on Steam,
| their store page information and some reviews for the game.
|
| The code quality probably isn't amazing, but it might give you
| an idea of how to get started with your own scraper.
|
| https://github.com/Netruk44/steam-embedding-search/blob/main...
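|
| A minimal sketch of the three fetches described above (game list,
| store page information, reviews). It is not taken from the linked
| repository. The endpoints are the publicly known Steam ones as I
| understand them, so verify them before relying on this; the helper
| names are hypothetical, and the sketch only builds URLs so that
| nothing is requested until you decide how politely to rate-limit.

```python
from urllib.parse import urlencode

# Assumed endpoint for the full list of app ids and names.
APP_LIST_URL = "https://api.steampowered.com/ISteamApps/GetAppList/v2/"


def store_page_url(app_id: int) -> str:
    """URL for one game's store page information (JSON)."""
    return "https://store.steampowered.com/api/appdetails?" + urlencode(
        {"appids": app_id}
    )


def reviews_url(app_id: int, cursor: str = "*", per_page: int = 100) -> str:
    """URL for one page of a game's reviews; pass the returned cursor to page on."""
    query = urlencode(
        {"json": 1, "cursor": cursor, "num_per_page": per_page, "filter": "recent"}
    )
    return f"https://store.steampowered.com/appreviews/{app_id}?{query}"


if __name__ == "__main__":
    # 281990 is the Stellaris app id mentioned elsewhere in this thread.
    print(APP_LIST_URL)
    print(store_page_url(281990))
    print(reviews_url(281990))
```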
| ddxv wrote:
| Thanks! That's perfect, just want somewhere to get started.
| DrammBA wrote:
| https://steamdb.info/faq/#how-are-we-getting-this-informatio...
|
| I found this explanation from steamdb that points to the
| various projects and libraries they use to gather all the data
| they have. It's not a how-to, but it has very useful info.
| aranw wrote:
| Nice! It would be nice however to see more detail about the data
| you collect and what exactly you provide on top of it using AI or
| through aggregation etc
| kmfrk wrote:
| I got some answers that weren't specifically about my questions
| in some instances. As someone who's just trying out the free
| demo, it's not a big deal, but maybe you can provide a way to
| flag answers so users can redeem their credits? It would probably
| increase retention and help people chase down bugs.
| Apreche wrote:
| Do you have data that https://steamdb.info/ doesn't have?
| noirscape wrote:
| Steamdb lacks an API for one, and the devs officially have a
| policy that they'll never make one, saying you should just
| scrape Steam directly instead of bugging them about it[0].
|
| It means that steamdb, while extraordinarily useful for casual
| prodding at what's stored on Valve's servers, isn't very good
| if you want to run data analysis or something like that on the
| metadata of Steam games at scale.
|
| Not sure if it's legal to charge for the raw scrape when OP
| doesn't seem to be affiliated with Valve, but that's not up to
| me to figure out.
|
| [0]: https://steamdb.info/faq/
| joseda-hg wrote:
| That seems pretty reasonable, it's their data, they just make
| useful visualizations
| seanw444 wrote:
| This whole time I was under the impression that SteamDB was
| owned by Valve. Huh.
| ghfhghg wrote:
| I guess the main differentiator over steamdb is getting the data
| in CSV?
|
| Might be good to clarify in the FAQ because the people I know who
| would pay for this are not the most techy types.
| bdd8f1df777b wrote:
| It seems to be missing reviews? I have always thought about
| building my own recommendation engine from steam data, given how
| steam's own recommendation never works for me.
| somenameforme wrote:
| Out of curiosity, what formula did you end up using for
| reviews:sales? I've looked into this a bunch and it's a very
| tough problem!
| giancarlostoro wrote:
| > Yeah, there's AI, but I added it because I found it easier to
| find answers I'm looking for. For the data scientists, you can
| download the CSV and go crazy.
|
| This is kind of the only way I use AI really, to summarize
| things, and extract details, then review from the raw sources to
| make sure the LLM isn't misleading me. I find myself using this
| approach instead of Googling for things since Google crippled
| their search the last few years; it feels like every year it's
| harder to find things with Google. I miss 2007 Google...
| dewey wrote:
| Give Kagi a try, it's basically Google before it went to shit.
| z3c0 wrote:
| I had to refresh before posting, because I wanted to see if
| someone else beat me to being _that_ HN commenter but...
|
| From the Terms of Service (emphasis mine):
|
| 6. Restrictions on Use
|
| You agree not to: Use the Service for any
| unlawful purpose. Attempt to reverse-engineer,
| modify, or *create derivative works of the Service.*
| Share, resell, or distribute downloadable data provided by the
| Service without explicit written permission.
|
| Do you intend to delineate the data provided by the service from
| "the Service" itself? It seems most fair that data received via
| Fair Use remains in that arena, pun fully intended.
|
| That aside, it's an intriguing dataset nonetheless, but I'd
| prefer to see a sample of the data before signing up.
| JadoJodo wrote:
| At a glance, it appears the product is the "chat with the data"
| feature; The CSV is free.
| DrammBA wrote:
| What I don't understand is the difference between 'Download
| all CSV data' in the free tier and 'Download CSV data' /
| 'Download raw data' in the paid member tier. It seems that
| the free CSV data is likely an extract or digest of the raw
| data offered as a sample.
| z3c0 wrote:
| I might be inclined to seek the raw data, should it be more
| cost effective than scraping Steam myself.
|
| Being a user, free, paid, or anonymous, can still be under
| the thumb of their ToS, especially so if they force a dialog
| in front of you to agree to the ToS while signing up. I'm
| merely pointing out hurdles to the OP that may obstruct some
| of the people they are trying to reach.
| akudha wrote:
| Steamdb.info displays graphs etc. Is that considered a
| "derivative work"?
|
| I am not sure what is considered derivative work and what isn't
| z3c0 wrote:
| IANAL but I am someone who deals heavily in 1) scraping and
| 2) data and the analysis, enrichment & brokerage thereof. As
| such, I like to consult this for anything regarding US
| Copyright law: https://www.copyright.gov/circs
|
| Circular 14 addresses derivative works, including those based
| on data: https://www.copyright.gov/circs/circ14.pdf
|
| Steamdb.info is a derivative work, yes. And scraping is
| usually accepted as Fair Use, so both services are presumably
| within their rights, but they have no claim to the underlying
| data, only their process of enrichment. If someone were to
| build a new service based on the data presented on either
| site, there's not much they could do to stop them... short of
| getting them to agree not to do so via their ToS.
|
| OpenAI is a great example of a company who built a derivative
| work on scraped data available under Fair Use, and then
| subsequently gated their data via their ToS. With such a
| popular precedent at play, I'd rather not use any services
| doing anything similar, especially when steamdb.info doesn't
| even have a ToS.
| akudha wrote:
| Thank you. Does this still hold good if steamdb was making
| money (ads, for example)?
|
| Also, I am wary of using big companies like OpenAI as
| precedent. Big companies can do whatever they want and get
| away with a lot of stuff that individuals and smaller
| companies can only dream of
| z3c0 wrote:
| Yes, within some limits, but if one were to set up a
| business like that, it's a very good idea to seek out a
| consultation from a local copyright lawyer to know
| exactly what one can and can't get away with. Datasets
| are addressed as a "collective work", which lumps them in
| with everything ranging from art books, to Hacker News, to
| scientific journals.
|
| Personally, I wouldn't sell anything I gathered from a
| publicly available source anyways, mostly out of
| principle, but doubly so if that source is as well-paid
| as Valve.
| gopher_space wrote:
| > Personally, I wouldn't sell anything I gathered from a
| publicly available source anyways, mostly out of
| principle, but doubly so if that source is as well-paid
| as Valve.
|
| Market reports are an entire industry, and people pay for
| them solely to avoid ingesting a tangential domain. It's
| ok to sell your transformations.
|
| My advice is free, my custom tooling is dirt cheap with
| public examples, and my finished product costs money
| every month. It's basically price tiers based on your
| interest level.
| csmets wrote:
| Thank you for highlighting this. I've updated the terms to
| align with the values of this service.
| bloomingkales wrote:
| Question for OP, or anyone that considered it:
|
| Do you think Steam reviews are coordinated?
| bluefirebrand wrote:
| I think for basically any possible online discussion, from
| Facebook to Hacker News to Steam Reviews, you should always
| keep in mind that some portion of it is _probably_ astroturfed,
| to some scale
|
| Anything from a small indie game to a huge AAA title, you can
| bet that the creators got their friends and family to post some
| nice reviews early, just to give it that positive bump
| bloomingkales wrote:
| I was specifically alarmed by what looked like review bombing
| of an indie game. I just can't imagine it. I need to write a
| small LLM plugin that collapses coordinated/astroturfed
| reviews.
| bluefirebrand wrote:
| The smaller the scale the easier to astroturf, honestly
|
| If there are only 20 reviews it's pretty easy for one
| person to review bomb on their own if they want to
|
| It gets much harder when there are 2 million reviews
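|
| A rough worked example of the scale argument above. The 20 and
| 2 million review counts come from the comment; the 90% starting
| score and the 10 added negative reviews are made-up numbers for
| illustration only.

```python
def positive_share_after(positive: int, total: int, added_negative: int) -> float:
    """Share of positive reviews once `added_negative` new negative reviews land."""
    return positive / (total + added_negative)


# Both hypothetical games start at 90% positive.
small_after = positive_share_after(18, 20, added_negative=10)
large_after = positive_share_after(1_800_000, 2_000_000, added_negative=10)

if __name__ == "__main__":
    print(f"20 reviews:        90.0% -> {small_after:.1%}")
    print(f"2,000,000 reviews: 90.0% -> {large_after:.4%}")
```

| Under these assumptions the same ten reviews drop the small game
| from 90% to 60% positive and leave the large one essentially
| unchanged.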
| shagie wrote:
| > Do you think Steam reviews are coordinated?
|
| Yes. It's not even a question. Steam flags outliers too.
|
| https://store.steampowered.com/app/281990/Stellaris/
|
| It got review bombed starting on Feb 14th because a _different_
| game that the company makes (HOI4) released DLC that upset the
| sensibilities of part of _that_ player base. (
| https://old.reddit.com/r/Stellaris/comments/1iqzih8/why_is_s...
| )
|
| ---
|
| There are Steam review bots for discord (
| https://www.codecks.io/steam-bot/ ) and that also encourages
| people who are members of a game's discord to leave a
| (positive) review.
|
| ---
|
| It's a certainty that reviews are coordinated through a number
| of different means.
| m00dy wrote:
| If you need to be a paid member to download csv file, then it is
| not free :) lol
| xerox13ster wrote:
| If you need to make an account and give this guy personal
| information (a digital commodity like oil) to see the data it's
| not free lmao
| stronglikedan wrote:
| > If you need to make an account and give this guy personal
| information
|
| In this case, you don't. That's just to weed out people who
| can't figure out temporary emails. I just used one to create
| an account without turning over any PI.
| xerox13ster wrote:
| > If you need to make an account
|
| > In this case, you don't.
|
| > I just used one to create an account
|
| Strange. Tell me, do you often struggle with such basic
| logic and reading comprehension?
| nickthegreek wrote:
| free tier allows you to download the csv.
| bitbasher wrote:
| You use the chat but the credit used isn't updated immediately in
| the lower left.
| stared wrote:
| Regarding Steam data, I am curious about how games are being
| played (hours spent) and, even more, about their co-occurrence
| (i.e., player X spent both time on game A and game B). I would
| love to make a visualization like
| https://p.migdal.pl/tagoverflow/?site=gaming&size=32, but for
| Steam data.
|
| Also, for deeper insight than sales volumes (e.g., game design,
| general trends, demographics, types of players), such things
| would be crucial.
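|
| A minimal sketch of the co-occurrence idea above (player X spent
| time on both game A and game B). The per-player playtime table,
| its shape, and the one-hour threshold are all assumptions for
| illustration; nothing in this thread says the dataset includes
| per-player hours.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(
    playtime: dict[str, dict[str, float]], min_hours: float = 1.0
) -> Counter:
    """Count, for each pair of games, how many players played both."""
    pairs: Counter = Counter()
    for games in playtime.values():
        # Keep games with meaningful playtime; sort so each pair has one key.
        played = sorted(g for g, hours in games.items() if hours >= min_hours)
        pairs.update(combinations(played, 2))
    return pairs


# Hypothetical sample data: player -> {game: hours played}.
sample = {
    "player_x": {"Game A": 12.0, "Game B": 3.5},
    "player_y": {"Game A": 40.0, "Game B": 0.2, "Game C": 8.0},
    "player_z": {"Game A": 2.0, "Game B": 6.0, "Game C": 1.5},
}
counts = co_occurrence(sample)

if __name__ == "__main__":
    for pair, n in counts.most_common():
        print(pair, n)
```

| The resulting pair counts are the edge weights a graph
| visualization like the one linked above would need.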
| ryanisnan wrote:
| I thought this would be a stavros post. Thanks for your efforts!
| eamsen wrote:
| Can you please provide examples for the raw data? As a user, I
| would like to know what I'm buying before paying.
| babuloseo wrote:
| looking into this thank you.
| thot_experiment wrote:
| > To download the raw scraped data you need to become a paid
| member
|
| If I have to pay to download the data how is it mine to download
| for free?
| rapfaria wrote:
| Do the HN crowd read "for free" on the title, click it, scan the
| page in a millisecond, see "Pricing" on the top, and come back to
| complain in the comments? Geez
| satiric wrote:
| Yes, it's disingenuous. Free means $0.00.
| happyopossum wrote:
| > I scrape Steam data every month and it's yours to download for
| free
|
| Does not line up with
|
| > To download the raw scraped data you need to become a paid
| member
|
| Sooo, clickbait or just plain dishonest?
| voodooEntity wrote:
| ye clickbait first i saw was pricing. this should be deleted
| (thread)
___________________________________________________________________
(page generated 2025-02-24 23:01 UTC)