[HN Gopher] Warning: $14k BigQuery charge in 2 hours
       ___________________________________________________________________
        
       Warning: $14k BigQuery charge in 2 hours
        
       Author : httparchive
       Score  : 117 points
       Date   : 2024-02-20 20:56 UTC (2 hours ago)
        
 (HTM) web link (discuss.httparchive.org)
 (TXT) w3m dump (discuss.httparchive.org)
        
       | httparchive wrote:
       | This website makes it seem like this "public" dataset is for the
       | community to use, but it is instead a for-profit money maker for
       | Google Cloud and you can lose tens of thousands of dollars.
       | 
       | Last week I ran a script on BigQuery for historical HTTP Archive
       | data and was billed $14,000 by Google Cloud with zero warning
       | whatsoever, and they won't remove the fee.
       | 
       | This official website should be updated to warn people Google is
       | apparently now hosting this dataset to make money. I don't think
       | that was the original mission, but that's what it is today,
       | there's basically zero customer support, and you can lose $14k in
       | the blink of an eye.
       | 
       | Academics, especially grad students, need to be aware of this
       | before they give a credit card number to Google. In fact, I'd
       | caution against using this dataset whatsoever with this new
       | business model attached.
        
         | MrDarcy wrote:
         | Did the cost estimate calculator provide an inaccurate
         | estimate?
         | 
         | https://cloud.google.com/bigquery/docs/best-practices-costs
         | 
         | Estimate query costs
         | 
         | BigQuery provides various methods to estimate cost:
         | 
         | Use the query dry run option to estimate costs before running a
         | query using the on-demand pricing model. Calculate the number
         | of bytes processed by various types of query. Get the monthly
         | cost based on projected usage by using the Google Cloud Pricing
         | Calculator.
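A minimal sketch of the dry-run approach quoted above, in Python. It assumes the `google-cloud-bigquery` client library and working credentials for `dry_run_bytes`. The $6.25 per TiB on-demand rate is an assumption passed in as a parameter, so check current pricing before relying on it. Both function names are made up for illustration.

```python
TIB = 2**40  # bytes in one tebibyte


def estimate_cost_usd(bytes_processed: int, usd_per_tib: float = 6.25) -> float:
    """Convert a bytes-processed figure into a rough on-demand cost.

    usd_per_tib is an assumed rate, not an authoritative price.
    """
    return bytes_processed / TIB * usd_per_tib


def dry_run_bytes(sql: str) -> int:
    """Ask BigQuery how many bytes a query would scan, without running it."""
    # Imported lazily so the arithmetic above works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

Something like `estimate_cost_usd(dry_run_bytes(sql))` would then give a rough dollar figure before anything is billed.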
        
           | cornel_io wrote:
           | When I use the BQ interface, it estimates the bytes for each
           | query in real time before I run it, does that turn off if the
           | query is too big? I guess that isn't directly a cost
           | estimate, but if I saw hundreds of TB I'd think twice before
           | hitting "Run"...
        
         | judge2020 wrote:
         | Do you run httparchive, or did you make your username
         | "httparchive" just because it's the subject of your post?
        
         | mike_d wrote:
         | > Google is apparently now hosting this dataset to make money
         | 
         | Public datasets are hosted for free by Google (Amazon has a
         | similar program) to take the burden off public projects.
         | 
         | You didn't pay for the data, you paid for the query you ran
         | against it.
        
           | Jgrubb wrote:
           | Well sure, but how do you query the data they're hosting for
           | free without using google services?
        
             | anon84873628 wrote:
             | Well, sure. But it is convenient to have lots of sample
             | data. Also you get the first TiB per month free in BQ.
             | 
             | Also note that anyone can make a dataset available for
             | public use, where they pay the storage and the consumer
             | pays the compute. The official Google datasets are just
             | curated and maintained by Google itself.
        
         | gnfargbl wrote:
         | The real issue here is that you didn't quite understand what
         | BigQuery was when you pressed the button.
         | 
         | What it is, roughly, is a publicly-accessible data
         | supercomputer. If you lost $14k in a blink of the eye, then I
         | would think you consumed at least $4k of Google's actual
         | resources -- maybe $7k. Maybe more. That thing can move some
         | _serious_ data, and you apparently moved around over 2PB.
         | 
         | Google bears some significant responsibility for not making the
         | cost transparent to you, it's true. But on the other hand,
         | don't they bear some significant credit for making such an
         | awesome power available to a lowly peon with a credit card?
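The "over 2PB" figure can be sanity-checked with a little arithmetic. This is a sketch that assumes an on-demand rate of $6.25 per TiB (an assumption, not something stated in the thread) and ignores any free tier.

```python
def tib_for_cost(cost_usd: float, usd_per_tib: float = 6.25) -> float:
    """How many TiB of scanning a given bill corresponds to."""
    return cost_usd / usd_per_tib


scanned_tib = tib_for_cost(14_000)  # TiB scanned for a $14k bill
scanned_pib = scanned_tib / 1024    # the same volume in PiB
print(scanned_tib, scanned_pib)
```

Under that assumed rate the bill works out to a bit over 2 PiB scanned, which is in line with the estimate above.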
        
       | mikeortman wrote:
       | The dataset _IS_ free to download, but running a query against it
       | on Google Cloud is what costs $$$. BigQuery is basically renting
       | servers to scan through the data, which is the fee
        
         | threeseed wrote:
         | Given how small the dataset is there is _no_ query that
         | justifies a $14k charge.
         | 
         | AWS charges $27/hour for a server with 3TB of memory. Enough to
         | run the queries in memory.
        
           | Symbiote wrote:
           | It's easy to make an enormous query by joining to other data
           | (or to the same data), or reading a lot of data.
           | 
           | A regex query on response_bodies would churn through 2.5TB of
           | data every time it's run.
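To put the 2.5TB-per-run figure in dollar terms, here is a rough sketch. It treats the 2.5TB as 2.5 TiB and assumes a $6.25 per TiB on-demand rate; both are assumptions for illustration.

```python
import math


def cost_per_run_usd(scan_tib: float, usd_per_tib: float = 6.25) -> float:
    """Cost of one query that scans scan_tib tebibytes."""
    return scan_tib * usd_per_tib


def runs_to_reach(bill_usd: float, scan_tib: float, usd_per_tib: float = 6.25) -> int:
    """Number of full-scan runs needed to accumulate a given bill."""
    return math.ceil(bill_usd / cost_per_run_usd(scan_tib, usd_per_tib))


per_run = cost_per_run_usd(2.5)   # dollars per full scan
runs = runs_to_reach(14_000, 2.5)  # scans needed to reach $14k
print(per_run, runs)
```

Under those assumptions, a script that repeats such a scan several hundred times is enough to reach a five-figure bill.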
        
           | darth_avocado wrote:
           | BQ charges you based on the volume of data being scanned. I
           | think this is a situation which involves scanning the whole
           | dataset again and again without fully understanding how it
           | works. I've worked with much larger datasets on BQ (petabyte
           | scale) and managed to not spend more than $1000 in an hour.
           | Also, BQ tells you how much data will be processed BEFORE you
           | run the query, which makes it easier to understand the cost
           | implications.
           | 
           | Again, you could fit the whole dataset in memory in an EC2
           | instance and do your thing.
        
         | treffer wrote:
         | The complaint says there should be a warning that processing
         | fees can be high. Go to the front page and check out the links.
         | Nothing really about cost. Someone follows that path and 14k
         | gone without a word about it. That's the path that people are
         | sent down from the website. It explicitly talks about using BQ
         | for analysis.
         | 
         | A simple "running queries over the whole dataset can cause
         | significant costs due to the size of the dataset" should be
         | enough. And I think that's a valid and fair point.
         | 
         | The whole part of accusing Google should just be ignored.
        
           | darth_avocado wrote:
           | The setup instructions mention what you're asking.
           | 
           | https://github.com/HTTPArchive/httparchive.org/blob/main/doc.
           | ..
        
             | treffer wrote:
             | I can't even find "cost" on that page. Only one rather tiny
             | side note that you could get past the free tier quota.
             | 
             | I don't think that's a proper warning on costs.
        
           | IshKebab wrote:
           | > The whole part of accusing Google should just be ignored.
           | 
           | I don't know. Google could trivially solve this problem by
           | imposing an opt-out warning on potentially expensive queries.
           | 
           | "It looks like your query might cost $14k. Are you sure?"
           | 
           | But money.
        
             | anon84873628 wrote:
             | It probably wasn't a single query costing $14k, but more
             | like 1k costing $14.
        
       | epanchin wrote:
       | To be brutally honest, it's badly considered queries like yours
       | that mean these services cannot be free.
        
         | threeseed wrote:
         | This comment is hilariously insane.
         | 
         | The idea that Google would give away BigQuery for free if only
         | people would write better SQL queries.
        
           | charcircuit wrote:
           | There are plenty of APIs on the internet that are free that
           | query a database for information. If queries are too
           | expensive it's not viable to run for free.
        
         | yjftsjthsd-h wrote:
         | Or, GCP could implement cost/resource/use limits, which would
         | allow them to give away whatever they wanted for free without
         | any concern about people over using it, while also allowing
         | people to avoid shooting their own feet off.
        
           | mulmen wrote:
           | I don't disagree but how does that work exactly? When you hit
           | the quota the query gets cancelled? That's definitely already
           | a feature of Redshift Spectrum with WLM. Does BigQuery offer
           | something similar?
        
             | yjftsjthsd-h wrote:
             | My first choice would be something like "this query will
             | cost $13953, which exceeds your default cap of $100; please
             | click the confirm button if you really want to run it".
             | (The dollars could be CPU-minutes or whatever if you want
             | to use resource based limits, which might play nicer with a
             | free tier)
             | 
             | Edit: rereading, I think this is actually for non-
             | interactive scripts, in which case yes it should just
             | cancel the query
             | 
             | Edit 2: https://news.ycombinator.com/item?id=39447499 was
             | kind enough to point out that the resource-based version of
             | this might actually exist, which is nice
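The behaviour proposed above can be sketched as a small decision function. This is a hypothetical guard written for illustration, not an existing GCP feature; the $100 default cap comes from the comment. As far as I know, the closest real setting is the `maximum_bytes_billed` option on a BigQuery query job, which works in bytes rather than dollars.

```python
class CostCapExceeded(Exception):
    """Raised when a non-interactive job would exceed the cap."""


def decide(estimated_usd: float, cap_usd: float = 100.0,
           interactive: bool = True, confirmed: bool = False) -> str:
    """Return "run" or "confirm"; raise for scripts that exceed the cap."""
    if estimated_usd <= cap_usd:
        return "run"
    if not interactive:
        # A background script has nobody to click "confirm", so cancel.
        raise CostCapExceeded(
            f"estimated ${estimated_usd:,.0f} exceeds cap of ${cap_usd:,.0f}"
        )
    return "run" if confirmed else "confirm"
```

An interactive user sees a confirmation prompt, while a background script fails fast instead of spending silently.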
        
         | httparchive wrote:
         | I've forgotten more Sql than most people ever learn. Time is
         | also valuable and I make trade-offs. Should I spend hours (eg.
         | $$$) to optimize or run a non-optimized query in the background
         | for a different cost? Well, I didn't think the
         | time/benefit/cost equation favored tuning, if I had known that
         | I'd have spent time on tuning. If you offer something for
         | "free" and then change the cost, and don't have any alerting
         | mechanisms to inefficient queries, it's impossible to evaluate
         | trade offs.
        
           | johnnyo wrote:
           | Can you post what a $14,000 SQL query looks like?
           | 
           | If nothing else, it can be an example in my SQL 101 course.
        
             | dabernathy89 wrote:
             | This would make a great educational blog post
        
           | kstrauser wrote:
           | > Time is also valuable and I make trade-offs.
           | 
           | I'd say!
        
       | MattGaiser wrote:
       | > Last week I ran a script on BigQuery for historical HTTP
       | Archive data and was billed $14,000 by Google Cloud with zero
       | warning whatsoever,
       | 
       | This comment kind of suggests that you do not understand how
       | BigQuery bills. The archive pays for the storage, but you have to
       | pay for the queries. You would also have had to attach a billing
       | account to run those queries. Running BigQuery searches is not
       | free.
       | 
       | Expensive lesson, but on the surface this one appears to be your
       | error.
        
         | madsbuch wrote:
         | It seems excessive to allow USD 14k spend on a newly created
         | account, or an account with no prior big spend. If I was
         | Google, I would not allow it without explicitly raising limits
         | or increasing quotas. Otherwise there is a big chance the
         | customer cannot pay and they just lost that resource - unless
         | you don't really have an expense for that resource and you use
         | predatory pricing.
        
           | rvnx wrote:
           | It's like predatory telcos who charge you "roaming data fees:
           | $4,500, but too bad you didn't check your online bill
           | before"
           | 
           | https://arstechnica.com/gadgets/2009/04/users-62000-data-
           | bil...
        
         | httparchive wrote:
         | Yes and no, I ran the script before and the fee wasn't that
         | high (they jacked it up last summer). Usually I have to jump
         | through a ton of hoops just to add more CPU cores to my VMs so
         | I "trusted" that GCP would warn me if I ever made an error.
         | 
         | One of the bigger issues is they charged my card before I
         | literally had any notice what the bill was - it wasn't even in
         | the dashboard yet. I would have terminated the script ASAP had
         | I gotten *any* warning.
        
       | mulmen wrote:
       | What was the query you ran?
        
         | httparchive wrote:
         | I was doing historical evaluation for a few sites, so I was
         | running a query for each month going back to 2016 for each
         | site. I've done this before with no real issues, and if I knew
         | the charges were rapidly exploding I'd have halted the script
         | immediately - but instead it ran for 2 hours and the first
         | notice I got was the CC charge.
        
           | Symbiote wrote:
           | My guess is you were querying all the data each time.
           | 
           | If you instead filter out the rows you are interested in
           | (e.g. the particular "few sites" by their URL) and put that
           | in a new table, querying the resulting, tiny table will be
           | very cheap.
        
           | eklitzke wrote:
           | I haven't looked at the exact schema for this dataset but for
           | this type of query pattern to be efficient the data would
           | need to be partitioned by date.^[1] I'm guessing that it's
           | not partitioned this way and therefore each of these queries
           | that was looking at "one month" of data was doing a full
           | table scan, so if you queried N months you did N table scans
           | even though the exact same query results could have been
           | achieved even without partitioning by doing one table scan
           | with some kind of aggregation (e.g. GROUP BY) clause.
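A rough cost model of the "N table scans versus one scan" point. Everything numeric here is an assumption for illustration: a 2.5 TiB table, three sites, a $6.25 per TiB rate, and a date range of January 2016 to February 2024. The real table sizes and the actual script are not known from the thread, so this is not a reconstruction of the $14k bill.

```python
def months_inclusive(start: tuple[int, int], end: tuple[int, int]) -> int:
    """Count calendar months from start to end, both (year, month), inclusive."""
    (y0, m0), (y1, m1) = start, end
    return (y1 - y0) * 12 + (m1 - m0) + 1


def scan_cost_usd(table_tib: float, scans: int, usd_per_tib: float = 6.25) -> float:
    """Cost of scanning an unpartitioned table `scans` times."""
    return table_tib * scans * usd_per_tib


months = months_inclusive((2016, 1), (2024, 2))
sites = 3  # hypothetical number of sites

# One full scan per (month, site) pair versus a single aggregated scan.
per_query_cost = scan_cost_usd(2.5, months * sites)
single_scan_cost = scan_cost_usd(2.5, 1)
print(months, per_query_cost, single_scan_cost)
```

The gap between the two numbers is the whole argument: without partitioning, cost scales with the number of queries, not with the amount of data you actually wanted.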
           | 
           | [1]: https://cloud.google.com/bigquery/docs/partitioned-
           | tables
        
           | mulmen wrote:
           | Can you be more specific? What filtering did you apply? How
           | many columns did you select?
        
       | summerlight wrote:
       | I frequently see this kind of surprising billing anecdote across
       | many cloud providers. Why don't they provide a way to set a hard
       | budget limit applied to the entire account? I tried to see what
       | can be done for GCP and this seems pretty daunting.
       | 
       | https://medium.com/@steffenjanbrouwer/how-to-set-a-hard-paym...
        
         | Bjartr wrote:
         | Because they aren't sufficiently incentivized to make giving
         | them your money harder.
        
         | klysm wrote:
         | What incentive do cloud providers have to give you that
         | ability? I think they greatly appreciate the ability to
         | accidentally spend a lot of money
        
           | summerlight wrote:
           | To prevent bad PR like this? When it goes viral, most
           | customer support teams escalate it to a higher level, then
           | they eventually cancel the bill.
        
             | IshKebab wrote:
             | Google does not care about bad PR like this. It doesn't
             | affect their biggest customers.
        
           | braza wrote:
           | tbh, I have worked with AWS for at least 10 years, and
           | recently their field support are quite prone to help avoid
           | those scenarios (e.g. helped to save hundreds of thousands in
           | a single-digit million account).
           | 
           | This was one of the main selling points for all portfolio
           | companies of the group to adopt AWS in their digital
           | transformation projects.
        
             | 0cf8612b2e1e wrote:
             | Limited use for a nobody who wants to run <$100 / year
             | cloud spend and does not have account managers.
             | 
             | I would love to kick the tires on some AWS stuff, but the
             | threat of unlimited ruin is not worth it. Sure, maybe the
             | gods would take pity on me and wipe the debt, but far
             | easier to just run with someone who caps costs. My toy
             | project can gladly go down if the alternative is a huge
             | unexpected bill.
        
           | rvnx wrote:
           | An unhappy customer won't come back.
           | 
           | The OP is probably a good person with strong interest in data
           | science and building projects.
           | 
           | If it'd be "oh here's your $500 charge, upgrade your quota
           | for more, 'ok fair enough, I did a mistake'", but $14k is not
           | ok without explicit quota upgrade.
        
         | thatoneguy wrote:
         | Google AppEngine used to have that but -- presumably in the
         | interest of additional profit -- they removed it. Now I have to
         | make do with an alert that warns me long after I could be
         | hypothetically bankrupted, i.e. in seconds.
        
         | deelowe wrote:
         | Because then we'd see articles about how the next start up
         | missed their opportunity whenever their site unexpectedly got
         | discussed on the latest Rogan episode and subsequently was
         | taken offline by the limits being tripped.
        
           | onion2k wrote:
           | Companies could make the limit optional and pass 100% of that
           | downside to the customer. 99.9% of customers would opt in.
        
           | sfn42 wrote:
           | I don't see the problem. Don't set a budget limit if you
           | don't want your app to go offline. Lots of people wouldn't
           | mind if their app went offline for a bit. They'd prefer to
           | not suddenly get a $10,000 bill
        
           | ghaff wrote:
           | There's no "right" answer. In one case, it's I checked the
           | wrong box and got a $14K bill. In the other case, it's I
           | checked the wrong box and my startup missed its one window.
           | There are in-between levels of alerting etc. for both
           | populations but they're probably unsatisfactory for the
           | extreme conditions.
           | 
           | To be clear: I'd be very in favor of the major cloud
           | providers having a "DO NOT! DO NOT! use this for production
           | mode and your content could be deleted at any time if you
           | screw up. But I suspect most people wouldn't use that."
        
         | ToucanLoucan wrote:
         | My cynical self sees it as how cloud providers aim to make the
         | most money: by making billing oblique and waiting for buzzword-
         | happy project leads to mandate stuff be put on their service
         | without understanding what the end billing will be.
         | 
         | I can't say that's for certain what it is. I just know a
         | hallmark of any business with recurring charges that are
         | otherwise incomprehensible is so they can hit you with the
         | charge after the fact, and you have little recourse to avoid
         | paying it without a ton of work for yourself or your team.
        
         | MrDarcy wrote:
         | Google does provide a way, project owners can set a custom
         | quota to limit costs.
        
           | yjftsjthsd-h wrote:
           | So your comment made me go look it up, and if you squint hard
           | that's kind of true...
           | 
           | https://cloud.google.com/billing/docs/how-
           | to/notify#cap_disa...
           | 
           | Notice that their "solution" is to tell you how if you want
           | you can spin up effectively your own custom service to watch
           | spend and if it goes over some threshold delete the entire
           | project[0] after some delay. This is the malicious compliance
           | version of letting you add a limit.
           | 
           | [0] At least, that's how I interpret "This example removes
           | Cloud Billing from your project, shutting down all resources.
           | Resources might not shut down gracefully, and might be
           | irretrievably deleted. There is no graceful recovery if you
           | disable Cloud Billing. You can re-enable Cloud Billing, but
           | there is no guarantee of service recovery and manual
           | configuration is required."
        
             | MrDarcy wrote:
             | I meant this, linked directly off the big query cost
             | estimation docs:
             | 
             | https://cloud.google.com/bigquery/docs/custom-quotas
             | 
             | > Custom quota is proactive, so you can't run an 11 TB
             | query if you have a 10 TB quota. Creating a custom quota on
             | query data lets you control costs at the project level or
             | at the user level.
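A toy model of what "proactive" means in the quoted passage: the check happens before the query is admitted, so a query that would cross the limit never runs. This is a sketch for illustration only, with made-up class names; it is not how Google implements custom quotas.

```python
class QuotaExceeded(Exception):
    """Raised when admitting a query would exceed the scan quota."""


class ScanQuota:
    """Tracks TiB scanned against a fixed limit (for example, per day)."""

    def __init__(self, limit_tib: float) -> None:
        self.limit_tib = limit_tib
        self.used_tib = 0.0

    def admit(self, query_tib: float) -> None:
        """Reject the query up front if it would cross the limit."""
        if self.used_tib + query_tib > self.limit_tib:
            raise QuotaExceeded(
                f"{query_tib} TiB requested, "
                f"{self.limit_tib - self.used_tib} TiB remaining"
            )
        self.used_tib += query_tib
```

With a 10 TiB limit, an 11 TiB query is rejected outright, and so is a 5 TiB query submitted after 6 TiB has already been used.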
        
               | yjftsjthsd-h wrote:
               | Oh, good catch! Yes, that does look like something that
               | can be coerced into limiting it. Having actually tried to
               | click through, it is very much not as simple as "don't
               | spend more than $X"; the doc points to
               | https://console.cloud.google.com/iam-admin/quotas and you
               | have to find and set the right quota, but yes that can
               | probably help.
        
             | beejiu wrote:
             | Isn't it the quota limit that you need to set?
             | https://cloud.google.com/bigquery/quotas#query_jobs
        
         | onion2k wrote:
         | The reasons are probably quite complicated, because some of
         | them are bound by hard technical limits to how quickly a system
         | can react and thus make a hard limit actually a _hard_ limit,
         | but realistically that's largely solvable just by making it a
         | softer hard limit (eg you set a limit of $1000 and the terms
         | say you pay that plus whatever is used before the limit kicks
         | in. More than $1000 but way less than $14000).
         | 
         | All of those technical reasons aside though, the commercial
         | reason is obvious - people's mistakes and overages are a great
         | source of revenue and profit. Companies refund the times where
         | it'd be enough to lose the customer, or when it hits HN, but
         | they make more money every time someone pays up. They have no
         | incentive to fix it. It's part of the business model.
        
           | beejiu wrote:
           | There are no conceivable "hard technical limits" that make
           | such a system difficult. It's 100% commercial.
        
             | londons_explore wrote:
             | oh there are - billing systems at scale almost exclusively
             | work on logs. Logs can take minutes or hours to aggregate
             | and transmit to a central place.
             | 
             | Ever notice how your "1GB" data plan sometimes lets you use
             | 5GB if you happen to be roaming in another country and
             | downloading something fast over 5G...? Same reason.
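The effect of log-based billing lag on a "soft hard limit" can be sketched with one line of arithmetic. The spend rate comes from the thread ($14k in 2 hours); the $1000 limit is from the comment further up, and the 15-minute lag is purely an assumed figure.

```python
def worst_case_bill(limit_usd: float, spend_per_hour_usd: float,
                    lag_hours: float) -> float:
    """Limit plus whatever is spent while usage logs are still in flight."""
    return limit_usd + spend_per_hour_usd * lag_hours


rate = 14_000 / 2  # dollars per hour, from "$14k in 2 hours"
bill = worst_case_bill(limit_usd=1_000, spend_per_hour_usd=rate, lag_hours=0.25)
print(bill)
```

Even with a lag of some minutes, the overshoot in this model stays far below the uncapped bill, which is the point of accepting a softer limit.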
        
               | beejiu wrote:
               | Then engineer it differently. If we can put man on the
               | moon, put 5 TB of data in 1 square-inch of plastic, build
               | self driving cars... then this is trivial.
        
           | londons_explore wrote:
           | There is also the fact that if a company has critical systems
           | go down because GCP hit some hard budget limit, it will be
           | reported in the press as "Netflix down globally due to issue
           | with Google Cloud".
           | 
           | Google doesn't want the bad press. Most real companies would
           | prefer to have a big bill when their product surges in
           | popularity than have unexpected downtime at the worst time.
        
         | httparchive wrote:
         | I learned of that billing limiting mechanism _after_ the $14k
         | was charged to my account. As designed.
        
         | twism wrote:
         | the setup of the budget limit isn't complicated. the linked
         | article goes thru putting the monitors/alerts on pubsub and
         | etc. which isn't mandatory.
        
       | fabian2k wrote:
       | The cloud isn't something I'd ever use my private credit card on,
       | there are just too many ways to screw it up if you're not very
       | careful and know what you're doing. I don't think I would have
       | hit this particular issue, but that is mainly because I've read a
       | bunch of stories of this kind and BigQuery is one of the things I
       | associate with "can get very expensive very quickly" based on
       | those.
       | 
       | I know the explanations and justifications for it, but for
       | personal use a service where I can't put a hard limit on usage is
       | simply not acceptable for me. It's just not worth the risk.
        
         | bugbuddy wrote:
         | There is a really easy fix to this problem: setting billing
         | limits. This can be done with almost all cloud providers and it
         | takes almost no time. These incidents just show a lack of
         | professionalism on the part of the person incurring the costs.
         | I personally did this on the first day I set up a cloud
         | computing account when I was still doing my BS in college. It
         | is not that hard folks. Set the billing limits.
        
           | blibble wrote:
           | > This can be done with almost all cloud providers and it
           | takes almost no time.
           | 
           | well, other than the three market leaders (GCP, AWS and
           | Azure)
        
             | crazygringo wrote:
             | I guess it depends what you mean?
             | 
             | They all support forms of limits, quotas and budget
             | notifications you can set, no?
             | 
             | Part of me understands that maybe you can't set a hard
             | billing limit because that means they'd need to immediately
             | delete all your instances, storage, and backups to prevent
             | any further costs for the month -- which is probably not
             | what people actually want.
             | 
             | But are the tools they provide insufficient?
        
             | bugbuddy wrote:
             | True these giants make their own lives easier and don't
             | implement much billing controls into the infrastructure. It
             | is your money so it is your responsibility to protect it.
             | Use billing alerts, hacks, and research things carefully
             | before jumping with both feet.
        
               | joepie91_ wrote:
               | > It is your money so it is your responsibility to
               | protect it.
               | 
               | This is victim blaming.
        
           | fabian2k wrote:
           | The main reason I'd use a personal account for one of the big
           | cloud providers would be to learn stuff. At that point a lack
           | of professionalism is kinda expected, because learning stuff
           | is the whole point.
           | 
           | And my understanding is that almost none of the ways of
           | setting limits are actual hard limits, but only alerts and
           | some hacked-together emergency abort scripts. Correct me if
           | I'm wrong, but can you actually limit the cost robustly for
           | services that spend that much money in an hour or so? Doesn't
           | help much if I get an email about it and read it two hours
           | afterwards.
        
             | bugbuddy wrote:
             | I understand the down votes but I would still say that
             | being aware of the rough estimate costs of each service you
             | are using is an integral part of an engineer's job. After
             | all, we care a lot about CPU cycles and those are measured
             | in femto dollars.
        
               | joepie91_ wrote:
               | You can call someone an 'engineer' with the associated
               | responsibilities when they are getting paid for what they
               | are doing like an engineer, in a setting that provides
               | them with the protections of an engineer.
               | 
               | Until that point, they are just an individual who got
               | screwed by disguised billing practices.
        
               | fabian2k wrote:
               | That's not sufficient, you also must not make mistakes.
               | 
               | I have very limited cloud experience, but I did make a
               | mistake that led to a rather slow but constant cost. The
               | amount was small enough to not be relevant in a
               | professional context, but the memorable part was that I
               | could not pinpoint the source easily with the AWS tools
               | and my limited understanding of them. The categories and
               | labels were too broad, and it took a bit until I figured
               | out what went wrong. There are certainly better tools to
               | investigate this, but I didn't know them. In the end it
               | was simply luck that the mistake still fell into an area
               | of insignificant amounts of money, but it could have
               | easily been significantly more if a few parameters had
               | been different for the same mistake
        
           | yjftsjthsd-h wrote:
           | > It is not that hard folks. Set the billing limits.
           | 
           | Excellent idea. Please describe how to create an account on
           | AWS or GCP that is not allowed to spend more than $100/mo.
           | Since it is "a really easy fix" and "takes almost no time" it
           | should be easy to explain, right?
        
             | kstrauser wrote:
             | https://aws.amazon.com/getting-started/hands-on/control-
             | your... tells you how to set up a budget, with
             | notifications when you're getting close.
             | 
             | That's probably enough for 99% of people, and if you're
             | highly motivated, you could make that trigger an SNS
             | notification that trips a circuit breaker.
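A minimal sketch of the circuit-breaker idea, with no real AWS calls. The class and function names are hypothetical, and the budget notification is simulated as a list of cumulative spend figures; in practice that signal would arrive asynchronously and late.

```python
class SpendCircuitBreaker:
    """Opens once reported spend reaches the budget; stays open after that."""

    def __init__(self, budget_usd: float) -> None:
        self.budget_usd = budget_usd
        self.tripped = False

    def on_notification(self, spend_usd: float) -> None:
        """Handle a budget notification carrying cumulative spend."""
        if spend_usd >= self.budget_usd:
            self.tripped = True

    def allow(self) -> bool:
        return not self.tripped


def run_jobs(jobs: list[str], breaker: SpendCircuitBreaker,
             spend_feed: list[float]) -> list[str]:
    """Run jobs in order, stopping as soon as the breaker has tripped."""
    done = []
    for job, spend_after in zip(jobs, spend_feed):
        if not breaker.allow():
            break
        done.append(job)  # stand-in for actually submitting the job
        breaker.on_notification(spend_after)
    return done
```

Note that the job that crosses the budget still runs; the breaker only prevents the ones after it, which matches the "alert, then react" nature of budget notifications.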
        
             | nonfamous wrote:
             | You can in Azure, easily. New Azure free accounts (which
             | most learners start with) have spending limits enabled by
             | default. https://learn.microsoft.com/azure/cost-management-
             | billing/ma...
        
         | blibble wrote:
         | if your country lets you set up and maintain an LLC easily this
         | is a reasonable way to manage the risk
         | 
         | a catastrophic mistake might result in the company going bust
         | and all the pain associated with that
         | 
         | but shouldn't lose you your home (assuming you acted properly)
        
           | questionacount wrote:
           | Is there a guide or someone I should talk to about how to do
           | this?
           | 
           | I've long wondered what I can do with an LLC to protect me
           | from debts like this but I don't know how to get more
           | information about it. Particularly as I'd be the sole owner I
           | don't really understand what the LLC does/doesn't do.
           | 
           | If you had just $1,000 (and made a few hundred a year) is it
           | worth doing?
        
           | crysin wrote:
           | IANAL, but this can still be risky in the US: if you're not
           | careful to demonstrate a clear separation of your business
           | funds and your personal funds, those pursuing you for money
           | owed can pierce the veil, and you lose a huge benefit of the
           | LLC.
        
         | httparchive wrote:
         | It wasn't personal use, it was for business - but I'm bootstrapping a
         | startup, so it's a very tough lesson to learn.
        
       | braza wrote:
       | To be honest, even the official guide [1] for BQ does not have
       | any information about query cost estimates, budgets, or
       | service-limit mechanisms [2].
       | 
       | I think the HTTP Archive team could set something in that regard.
       | 
       | PS: When I was an instructor for some cloud training in AWS, the
       | first 2 hours were only to set up billing and budgets to avoid
       | any kind of situation like this. No one would start training
       | without all those locks in place in the first place.
       | 
       | [1] -
       | https://github.com/HTTPArchive/httparchive.org/blob/main/doc...
       | [2] - https://cloud.google.com/bigquery/docs/best-practices-costs
        
         | httparchive wrote:
         | Yeah, I'm basically just having to write this off so it sucks
         | for me (a lot - I'm bootstrapping a start up), but I'm more
         | worried about other people (especially students) getting caught
         | up in what feels like a scam given the language on the website
         | not, ya know, mentioning the risk of being charged $14k.
        
           | ghaff wrote:
           | I understand the argument against hard circuit-breakers
           | (yeah, seems like a good idea, but had a good traffic spike
           | and I'm down). But it makes even me cautious with respect to
           | scenarios where I could just fat finger something. There are
           | some controls but there are no guarantees in most cases.
        
           | hobofan wrote:
           | The getting started guide linked by the website states:
           | 
           | > Note: The size of the tables you query are important
           | because BigQuery is billed based on the number of processed
           | data. There is 1TB of processed data included in the free
           | tier, so running a full scan query on one of the larger
           | tables can easily eat up your quota. This is where it becomes
           | important to design queries that process only the data you
           | wish to explore
           | 
           | Could this be a bigger warning? Sure.
           | 
           | Is something a scam just because they don't explain the
           | general implications of entering your payment information to
           | a usage-billed product? Not really.
        
       | darth_avocado wrote:
       | I am sorry but this seems to be more of a "TLDR; didn't read;"
       | situation. The http archive clearly mentions that the data is
       | available for offline processing or for querying online on BQ.
       | And in the "Getting started" section of the instructions, it
       | mentions multiple times how BQ will charge you. And even if
       | it wasn't mentioned anywhere, it's a little presumptuous to
       | assume a tool for processing data will not charge you money for
       | literally processing TBs of data again and again.
       | 
       | > Note: BigQuery has a free tier that you can use to get started
       | without enabling billing. At the time of this writing, the free
       | tier allows 10GB of storage and 1TB of data processing per month.
       | Google also provides a $300 credit for new accounts.
       | 
       | > Note: The size of the tables you query are important because
       | BigQuery is billed based on the number of processed data. There
       | is 1TB of processed data included in the free tier, so running a
       | full scan query on one of the larger tables can easily eat up
       | your quota. This is where it becomes important to design queries
       | that process only the data you wish to explore
       | 
       | > When we look at the results of this, you can see how much data
       | was processed during this query. Writing efficient queries limits
       | the number of bytes processed - which is helpful since that's how
       | BigQuery is billed. Note: There is 1TB free per month
       | 
       | https://github.com/HTTPArchive/httparchive.org/blob/main/doc...
        
         | httparchive wrote:
         | Yes, sure there's stuff I could have done better, and stayed up
         | all night looking at the fine print. But that's not the point -
         | this is a *warning* to other people who see the Internet Archive
         | logo, the words "public", and for some dumb reason also trust
         | Google. I'm hoping this doesn't happen to others, I learned a
         | costly lesson.
        
         | dabernathy89 wrote:
         | I'm on OP's side - even if I knew I'd be paying to run some
         | queries against this dataset, I never would have thought it
         | could reach 5 figures in such a short time. And you can't argue
         | that the billing is straightforward. The "Getting Started"
         | guide for the HTTP Archive doesn't even describe what indexes
         | are available/commonly used for limiting the scanned rows.
        
       | jeffparsons wrote:
       | Warning: most cloud providers (Google, Amazon, Microsoft) require
       | you to accept unlimited liability to use their services.
       | 
       | If you're running a business and you have lawyers, then fair
       | enough -- just play the game. But for individuals, it seems crazy
       | that so many of us accept this sort of thing. Good luck
       | contesting the charge with your credit card company when you
       | already agreed to a contract that said Google could bill you
       | thousands of dollars and then you used thousands of dollars worth
       | of their service.
       | 
       | Big cloud providers are not your friend. They do not care if they
       | destroy the lives of you and your family, unless it's happening
       | so often that it's making mainstream news.
       | 
       | My advice is to go and delete your cloud accounts, and only use
       | services that offer hard spending caps, and ideally prepaid
       | accounts.
       | 
       | Maybe this doesn't leave many options. Oh well. Maybe if you
       | can't afford big lawyers then you also can't afford the risks of
       | using big cloud.
        
         | httparchive wrote:
         | Yup, I'm already having to pay legal fees - which is why you
         | have a biz lawyer on retainer to start with - but I'm not sure
         | I have any standing.
        
           | jeffparsons wrote:
           | IANAL, but if this happened to me I would be gathering as
           | many examples as I could of this having happened to other
           | people. The angle being: Google knows this is a huge issue.
           | Effectively, they know that they have (presumably
           | accidentally) created a really dangerous trap for small
           | players, and have chosen to do nothing about it.
           | 
           | In some jurisdictions I think that reduces the legitimacy of
           | their claim that you actually owe them money.
           | 
           | EDIT: Even better, focus on the examples where Google
           | "forgave" the debt; you could argue that those examples prove
           | that Google knows it's at least partly their fault.
        
         | AdamJacobMuller wrote:
         | All of the cloud services I have are set up only with
         | privacy.com cards. I have each individual card limited to just
         | above what the monthly expected spend is. Even if there's a
         | (reasonable) spike I can see it and I have to take manual
         | action before the charge will go through.
         | 
         | Cannot recommend privacy.com enough.
        
           | ihattendorf wrote:
           | Doesn't stop them from trying to collect after the
           | transaction is declined. It's not a prepaid service, you're
           | agreeing to pay the charges _after_ you've used the service.
           | 
           | Will they pursue? Do they have enough info to pursue? Who
           | knows, but they can if they want to.
        
           | myself248 wrote:
           | That's not what privacy.com does or is for. They advertise
           | it, but I've had transactions blow right through the facade.
           | Specifically, the New York Times, after my trial subscription
           | ended and I watched the stupendously-expensive charges
           | bounce, they kept trying and eventually tried a different way
           | and it went through.
           | 
           | I emailed support, and here's what I got back:
           | 
           | > Hi, $firstname. I've been reviewing your dispute and wanted
           | to touch base with you to explain what happened.
           | 
           | > It appears that the disputed charge is a "force post" by
           | the merchant. This happens when a merchant cannot collect
           | funds for a transaction after repeated attempts and completes
           | the transaction without an authorization -- it's literally an
           | unauthorized transaction that's against payment card network
           | rules. It's a pretty sneaky move used by some merchants, and
           | unfortunately, it's not something Privacy can block.
        
         | romeros wrote:
         | This is just a single data point but I had a surprise bill with
         | Google. I talked to the support and got it waived off.
         | 
         | I used Amazon EC2 instances for years and I always felt in
         | control. There were never any surprises. I knew even in the
         | worst case situation I would be okay because I had faith in the
         | Amazon support. With Google I felt insecure. I never played
         | with any of Google cloud services since then.
         | 
         | Amazon's customer first policy is really true. They try their
         | absolute best to make sure there are no surprises to a great
         | extent. Even the UI is very intuitive.
        
           | theolivenbaum wrote:
           | Same here - incidentally was also one of the weirdest
           | interactions with customer support I've ever had. I suspect
           | the first point of contact was some sort of LLM/chatbot that
           | desperately wanted to make sure I was feeling fine and that
           | there was nothing to worry about. When I was forwarded to the
           | billing support team the interaction went back to normal -
           | couple of messages back and forth and some homework to set
           | the real budget limit (the quota is just for alarms) and they
           | waived the charge.
        
           | tw04 wrote:
           | >Amazon's customer first policy is really true.
           | 
           | Which part of customer first drove their egress fee policies?
        
             | zer00eyz wrote:
             | The part that was ALWAYS there.
             | 
             | Egress is basically all outbound traffic. The fee was
             | always this. Don't act shocked when it doesn't go down when
             | you have buyer's remorse.
        
       | neilv wrote:
       | Not the same thing, but: some pre-Web Usenet programs would have
       | warnings before "expensive" operations:
       | 
       | > _Version 4.3 patch 30 of rn's Pnews.SH (September 5, 1986,
       | published to support the new top-level groups) introduced the
       | "thousands of machines" message:_
       | 
       | > > _This program posts news to thousands of machines throughout
       | the entire civilized world. Your message will cost the net
       | hundreds if not thousands of dollars to send everywhere. Please
       | be sure you know what you are doing._
       | 
       | --
       | https://retrocomputing.stackexchange.com/questions/14763/wha...
        
       | bearjaws wrote:
       | Had this happen (but more like $100) analyzing the GitHub dataset
       | on GCP.
       | 
       | Honestly I was already concerned when it was taking more than 5
       | minutes to return a result.
       | 
       | Once I saw how slow it was I found my error, not querying the
       | sample dataset that was a fraction of the size, to make sure my
       | filtering worked.
        
       | KomoD wrote:
       | How big of a query do you have to make to get charged $14k...?
       | Isn't it billed by data transfer
        
         | Jgrubb wrote:
         | No, by amount of data scanned to answer the query. This would
         | be about 2.2PB worth of data.
         | 
         | Partitioning matters y'all!
        
           | KomoD wrote:
           | Ah, ok.
        
         | anon84873628 wrote:
         | On-demand pricing in the US charges $6.25 per TiB (and the
         | first TiB per month is free).
         | 
         | So $14k would be about 2,240 TiB.
         | 
         | I wonder what sort of partitioning and clustering is used for
         | the tables.
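
The arithmetic in the two comments above can be checked with a short
sketch. It assumes only the figures quoted in the thread ($6.25 per TiB
on demand in the US, first TiB per month free); the function names are
hypothetical and current pricing may differ.

```python
# Figures quoted in the thread; treat them as assumptions, not current pricing.
PRICE_PER_TIB_USD = 6.25   # on-demand price per TiB scanned (US)
FREE_TIB_PER_MONTH = 1.0   # free tier per month


def billed_tib(charge_usd, price_per_tib=PRICE_PER_TIB_USD):
    """TiB that must have been billed to produce a given charge."""
    return charge_usd / price_per_tib


def on_demand_cost_usd(scanned_tib,
                       price_per_tib=PRICE_PER_TIB_USD,
                       free_tib=FREE_TIB_PER_MONTH):
    """Cost of scanning `scanned_tib` in one month, after the free tier."""
    return max(scanned_tib - free_tib, 0.0) * price_per_tib


tib = billed_tib(14_000)
print(f"{tib:,.0f} TiB billed, about {tib / 1024:.2f} PiB")
```

Dividing by 1024 gives pebibytes, which is where the "about 2.2PB"
figure earlier in the thread comes from; in decimal petabytes the same
volume is somewhat larger.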
        
       | gelatocar wrote:
       | Have you tried getting in touch with GCP to see if they would
       | refund the charge? I've heard plenty of stories of cloud services
       | refunding large one-off accidental spends like this one.
        
         | zmarty wrote:
         | "Last week I ran a script on BigQuery for historical HTTP
         | Archive data and was billed $14,000 by Google Cloud with zero
         | warning whatsoever, *and they won't remove the fee.*"
        
       | mekoka wrote:
       | This is a matter of a user not having read some fine print,
       | which doesn't mean that they're necessarily at fault. The only
       | way to know which of the user, httparchive.org, or Google BQ is
       | most responsible is to know how often similar situations arise in
       | this specific context (i.e. using BQ by way of httparchive.org).
        
       | londons_explore wrote:
       | Just ask support. In almost all cases, they'll cancel the
       | charges.
        
       | Jgrubb wrote:
       | BQ lesson #1 - don't select * on a gigundous dataset.
       | 
       | Lesson #2 - if you select * on a gigundous dataset, make sure
       | it's on your employer's bill.
        
       | kshmir wrote:
       | Just set quotas by default...
        
       | tills13 wrote:
       | Sorry but I can't get over the hyperbole in
       | 
       | > This website makes it seem like this "public" dataset is for
       | the community to use, but it is instead a for-profit money maker
       | for Google Cloud and you can lose tens of thousands of dollars.
       | 
       | _you_ didn't understand what you were doing. HA's datasets
       | _are_ public and free. It is _not_ a "for-profit money maker for
       | Google Cloud". Sorry, sucks for you but blaming the restaurant
       | when you bit off more of the steak than you can chew is not how
       | this works.
        
       | francoismassot wrote:
       | BigQuery is just too costly...
       | 
       | Do you know if the dataset is public? We should just offer a
       | cheap alternative and ditch BigQuery.
        
       | nomilk wrote:
       | Could you explain the steps you went through that led to you
       | using BigQuery? The reason I ask is most of us probably use GCP
       | and only ever interact with BigQuery via GCP. But it seems your
       | entry point was a bit different to most (e.g. seems you might
       | have clicked on a link to GCP from HTTP Archive, or perhaps
       | something else?).
       | 
       | FWIW I use BigQuery a lot and as a rough guide I assume about 1c
       | per GB scanned. So if I query a dataset that's 1TB, that's about
       | $10. If the same data were stored on a relational db, the same
       | query would take about a day (or at least a good part of a day).
       | Because BigQuery returns a result so quickly (e.g. <1 minute) it
       | can be easy to miss the insane amount of work it did to get
       | there. So I could see someone accidentally putting that ~1 min (but
       | 1TB!) query into a loop or something, and boom, there's your $15k
       | bill. Accidents happen.
       | 
       | Also FWIW, although the big 3 clouds' pricing is tricky (since
       | there are so many services), I find it much better than that of
       | the PaaS providers built on top of the big 3 clouds. My
       | suspicion is that PaaS providers have a strong incentive to obscure
       | their pricing because customers can typically see what their
       | costs are (e.g. if they buy some compute from AWS at $0.16/hr and
       | sell it for $1.40/hr, that can be seen as a bit of a rip, hence
       | they try to obscure it). But I think the big 3 are not too bad at
       | this practice. It really bugs me when anyone deliberately
       | obscures their prices, and it's often an indicator of more shady
       | practices to come.
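
The "1TB query in a loop" scenario described above is the kind of
mistake a client-side spending guard could catch. The sketch below is
one possible shape for such a guard, not anything BigQuery provides:
ScanBudget and BudgetExceeded are hypothetical names, the $6.25 per TiB
price is the figure quoted elsewhere in this thread, and the free tier
is ignored. In practice the byte estimate would probably come from a
dry run of the query, and a per-query cap on billed bytes is a separate
server-side option worth looking at.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the next query would push spend past the budget."""


class ScanBudget:
    """Hypothetical client-side guard; tracks estimated on-demand spend."""

    def __init__(self, max_usd, price_per_tib=6.25):
        self.max_usd = max_usd
        self.price_per_tib = price_per_tib  # assumed price, from the thread
        self.spent_usd = 0.0

    def charge(self, estimated_bytes):
        """Record a query's estimated cost, or refuse it if over budget."""
        cost = estimated_bytes / 2**40 * self.price_per_tib
        if self.spent_usd + cost > self.max_usd:
            raise BudgetExceeded(
                f"next query (${cost:.2f}) would exceed ${self.max_usd:.2f}"
            )
        self.spent_usd += cost
        return cost


budget = ScanBudget(max_usd=100.0)
runs = 0
try:
    for _ in range(2400):        # a runaway loop, as in the comment above
        budget.charge(2**40)     # each iteration would scan about 1 TiB
        runs += 1                # the real query would run here
except BudgetExceeded as exc:
    print(f"stopped after {runs} queries: {exc}")
```

The guard only helps if every query goes through it and the estimates
are honest, so it complements provider-side quotas and budget alerts
rather than replacing them.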
        
       ___________________________________________________________________
       (page generated 2024-02-20 23:01 UTC)