[HN Gopher] Warning: $14k BigQuery charge in 2 hours
___________________________________________________________________
Warning: $14k BigQuery charge in 2 hours
Author : httparchive
Score : 117 points
Date : 2024-02-20 20:56 UTC (2 hours ago)
(HTM) web link (discuss.httparchive.org)
(TXT) w3m dump (discuss.httparchive.org)
| httparchive wrote:
| This website makes it seem like this "public" dataset is for the
| community to use, but it is instead a for-profit money maker for
| Google Cloud and you can lose tens of thousands of dollars.
|
| Last week I ran a script on BigQuery for historical HTTP Archive
| data and was billed $14,000 by Google Cloud with zero warning
| whatsoever, and they won't remove the fee.
|
| This official website should be updated to warn people Google is
| apparently now hosting this dataset to make money. I don't think
| that was the original mission, but that's what it is today,
| there's basically zero customer support, and you can lose $14k in
| the blink of an eye.
|
| Academics, especially grad students, need to be aware of this
| before they give a credit card number to Google. In fact, I'd
| caution against using this dataset whatsoever with this new
| business model attached.
| MrDarcy wrote:
| Did the cost estimate calculator provide an inaccurate
| estimate?
|
| https://cloud.google.com/bigquery/docs/best-practices-costs
|
| Estimate query costs
|
| BigQuery provides various methods to estimate cost:
|
| Use the query dry run option to estimate costs before running a
| query using the on-demand pricing model. Calculate the number
| of bytes processed by various types of query. Get the monthly
| cost based on projected usage by using the Google Cloud Pricing
| Calculator.
| cornel_io wrote:
| When I use the BQ interface, it estimates the bytes for each
| query in real time before I run it, does that turn off if the
| query is too big? I guess that isn't directly a cost
| estimate, but if I saw hundreds of TB I'd think twice before
| hitting "Run"...
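A minimal sketch of turning a byte estimate like the one described above into dollars. The $6.25 per TiB on-demand rate is an assumption, not a figure from the thread, so check the current pricing page. The `dry_run_bytes` helper assumes the `google-cloud-bigquery` client library and configured credentials; it is shown for orientation and is not exercised here.

```python
TIB = 2**40                  # bytes in one tebibyte
PRICE_PER_TIB_USD = 6.25     # assumed on-demand rate, not taken from the thread
FREE_TIB_PER_MONTH = 1.0     # monthly free allowance mentioned in the thread


def estimate_cost_usd(bytes_processed, free_tib_left=FREE_TIB_PER_MONTH):
    """Convert a byte estimate into an approximate on-demand charge."""
    tib = bytes_processed / TIB
    billable_tib = max(0.0, tib - free_tib_left)
    return billable_tib * PRICE_PER_TIB_USD


def dry_run_bytes(sql):
    """Ask BigQuery how many bytes a query would process, without running it.

    Assumes google-cloud-bigquery is installed and credentials are configured.
    """
    # Imported lazily so the arithmetic above works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

Under these assumptions, `estimate_cost_usd(dry_run_bytes(sql))` gives a rough figure to look at before hitting "Run".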
| judge2020 wrote:
| Do you run httparchive, or did you make your username
| "httparchive" just because it's the subject of your post?
| mike_d wrote:
| > Google is apparently now hosting this dataset to make money
|
| Public datasets are hosted for free by Google (Amazon has a
| similar program) to take the burden off public projects.
|
| You didn't pay for the data, you paid for the query you ran
| against it.
| Jgrubb wrote:
| Well sure, but how do you query the data they're hosting for
| free without using google services?
| anon84873628 wrote:
| Well, sure. But it is convenient to have lots of sample
| data. Also you get the first TiB per month free in BQ.
|
| Also note that anyone can make a dataset available for
| public use, where they pay the storage and the consumer
| pays the compute. The official Google datasets are just
| curated and maintained by Google itself.
| gnfargbl wrote:
| The real issue here is that you didn't quite understand what
| BigQuery was when you pressed the button.
|
| What it is, roughly, is a publicly-accessible data
| supercomputer. If you lost $14k in a blink of the eye, then I
| would think you consumed at least $4k of Google's actual
| resources -- maybe $7k. Maybe more. That thing can move some
| _serious_ data, and you apparently moved around over 2PB.
|
| Google bears some significant responsibility for not making the
| cost transparent to you, it's true. But on the other hand,
| don't they bear some significant credit for making such an
| awesome power available to a lowly peon with a credit card?
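As a rough check on the "over 2PB" figure, the charge can be converted back into data volume. This is only a back-of-envelope sketch; the $6.25 per TiB rate is an assumption and the actual rate applied to the bill is not stated in the thread.

```python
PRICE_PER_TIB_USD = 6.25  # assumed on-demand rate, not stated in the thread


def tib_for_charge(charge_usd, price_per_tib=PRICE_PER_TIB_USD):
    """Return the data volume (in TiB) that a given on-demand charge implies."""
    return charge_usd / price_per_tib


tib = tib_for_charge(14_000)
pib = tib / 1024  # 1 PiB = 1024 TiB
print(f"{tib:.0f} TiB, about {pib:.1f} PiB")
```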
| mikeortman wrote:
| The dataset _IS_ free to download, but running a query against it
| on Google Cloud is what costs $$$. BigQuery is basically renting
| servers to scan through the data, and that is the fee.
| threeseed wrote:
| Given how small the dataset is there is _no_ query that
| justifies a $14k charge.
|
| AWS charges $27/hour for a server with 3TB of memory. Enough to
| run the queries in memory.
| Symbiote wrote:
| It's easy to make an enormous query by joining to other data
| (or to the same data), or reading a lot of data.
|
| A regex query on response_bodies would churn through 2.5TB of
| data every time it's run.
| darth_avocado wrote:
| BQ charges you based on the volume of data being scanned. I
| think this is a situation which involves scanning the whole
| dataset again and again without fully understanding how it
| works. I've worked with much larger datasets on BQ (petabyte
| scale) and managed to not spend more than $1000 in an hour.
| Also, BQ tells you how much data will be processed BEFORE you
| run the query, which makes it easier to understand the cost
| implications.
|
| Again, you could fit the whole dataset in memory in an EC2
| instance and do your thing.
| treffer wrote:
| The complaint says there should be a warning that processing
| fees can be high. Go to the front page and check out the links.
| Nothing really about cost. Someone follows that path and 14k
| gone without a word about it. That's the path that people are
| sent down from the website. It explicitly talks about using BQ
| for analysis.
|
| A simple "running queries over the whole dataset can cause
| significant costs due to the size of the dataset" should be
| enough. And I think that's a valid and fair point.
|
| The whole part of accusing Google should just be ignored.
| darth_avocado wrote:
| The setup instructions mention what you're asking.
|
| https://github.com/HTTPArchive/httparchive.org/blob/main/doc...
| treffer wrote:
| I can't even find "cost" on that page. Only one rather tiny
| side note that you could get past the free tier quota.
|
| I don't think that's a proper warning on costs.
| IshKebab wrote:
| > The whole part of accusing Google should just be ignored.
|
| I don't know. Google could trivially solve this problem by
| imposing an opt-out warning on potentially expensive queries.
|
| "It looks like your query might cost $14k. Are you sure?"
|
| But money.
| anon84873628 wrote:
| It probably wasn't a single query costing $14k, but more
| like 1k costing $14.
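The "1k costing $14" guess lines up with the 2.5TB full-scan figure mentioned earlier in the thread. A small sketch of that arithmetic, assuming a $6.25 per TiB on-demand rate (an assumption, not a number from the thread) and one full scan per query:

```python
import math

TIB = 2**40
PRICE_PER_TIB_USD = 6.25   # assumed on-demand rate
TABLE_BYTES = 2.5e12       # the 2.5 TB full-scan size quoted in the thread
RUN_MINUTES = 120          # the script ran for about 2 hours

# Cost of one query that scans the whole table.
cost_per_scan = TABLE_BYTES / TIB * PRICE_PER_TIB_USD

# Number of such scans needed to reach a $14,000 bill, and the implied pace.
scans_to_reach_14k = math.ceil(14_000 / cost_per_scan)
scans_per_minute = scans_to_reach_14k / RUN_MINUTES

print(f"${cost_per_scan:.2f} per scan, {scans_to_reach_14k} scans, "
      f"{scans_per_minute:.1f} scans/minute")
```

Roughly a thousand full scans in two hours is well within reach of a script that loops over sites and months.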
| epanchin wrote:
| To be brutally honest, it's badly considered queries like yours
| that mean these services cannot be free.
| threeseed wrote:
| This comment is hilariously insane.
|
| The idea that Google would give away BigQuery for free if only
| people would write better SQL queries.
| charcircuit wrote:
| There are plenty of APIs on the internet that are free that
| query a database for information. If queries are too
| expensive it's not viable to run for free.
| yjftsjthsd-h wrote:
| Or, GCP could implement cost/resource/use limits, which would
| allow them to give away whatever they wanted for free without
| any concern about people over using it, while also allowing
| people to avoid shooting their own feet off.
| mulmen wrote:
| I don't disagree but how does that work exactly? When you hit
| the quota the query gets cancelled? That's definitely already
| a feature of Redshift Spectrum with WLM. Does BigQuery offer
| something similar?
| yjftsjthsd-h wrote:
| My first choice would be something like "this query will
| cost $13953, which exceeds your default cap of $100; please
| click the confirm button if you really want to run it".
| (The dollars could be CPU-minutes or whatever if you want
| to use resource based limits, which might play nicer with a
| free tier)
|
| Edit: rereading, I think this is actually for non-
| interactive scripts, in which case yes it should just
| cancel the query
|
| Edit 2: https://news.ycombinator.com/item?id=39447499 was
| kind enough to point out that the resource-based version of
| this might actually exist, which is nice
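The confirm-or-cancel behaviour described above can be sketched as a small client-side guard. All names here (`check_cost_cap`, `CostCapExceeded`, the `confirm` callback) are hypothetical; this is not an existing GCP feature, just an illustration of the idea.

```python
class CostCapExceeded(Exception):
    """Raised when a non-interactive job would exceed the spending cap."""


def check_cost_cap(estimated_usd, cap_usd=100.0, confirm=None):
    """Return True if the query may run, False if the user declined.

    `confirm` is a callable that takes a prompt string and returns a bool.
    Pass None for non-interactive scripts, which are cancelled instead.
    """
    if estimated_usd <= cap_usd:
        return True
    message = (f"This query will cost about ${estimated_usd:,.0f}, which "
               f"exceeds your cap of ${cap_usd:,.0f}.")
    if confirm is None:
        # Non-interactive: nobody can click "confirm", so refuse outright.
        raise CostCapExceeded(message)
    return bool(confirm(message + " Run anyway?"))
```

In an interactive tool, `confirm` could wrap `input()`; in a batch script, the exception stops the loop before the next query is submitted.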
| httparchive wrote:
| I've forgotten more SQL than most people ever learn. Time is
| also valuable and I make trade-offs. Should I spend hours (e.g.
| $$$) to optimize, or run a non-optimized query in the background
| for a different cost? Well, I didn't think the
| time/benefit/cost equation favored tuning; if I had known
| otherwise, I'd have spent time on tuning. If you offer something
| for "free" and then change the cost, and don't have any alerting
| mechanisms for inefficient queries, it's impossible to evaluate
| trade-offs.
| johnnyo wrote:
| Can you post what a $14,000 SQL query looks like?
|
| If nothing else, it can be an example in my SQL 101 course.
| dabernathy89 wrote:
| This would make a great educational blog post
| kstrauser wrote:
| > Time is also valuable and I make trade-offs.
|
| I'd say!
| MattGaiser wrote:
| > Last week I ran a script on BigQuery for historical HTTP
| Archive data and was billed $14,000 by Google Cloud with zero
| warning whatsoever,
|
| This comment kind of suggests that you do not understand how
| BigQuery bills. The archive pays for the storage, but you have to
| pay for the queries. You would also have had to attach a billing
| account to run those queries. Running BigQuery searches is not
| free.
|
| Expensive lesson, but on the surface this one appears to be your
| error.
| madsbuch wrote:
| It seems excessive to allow USD 14k spend on a newly created
| account, or an account with no prior big spend. If I was
| Google, I would not allow it without explicitly raising limits
| or increasing quotas. Otherwise there is a big chance the
| customer cannot pay and they just lost that resource - unless
| you don't really have an expense for that resource and you use
| predatory pricing.
| rvnx wrote:
| It's like predatory telcos who charge you "roaming data fees:
| $4,500, but too bad you didn't check your online bill
| before"
|
| https://arstechnica.com/gadgets/2009/04/users-62000-data-bil...
| httparchive wrote:
| Yes and no, I ran the script before and the fee wasn't that
| high (they jacked it up last summer). Usually I have to jump
| through a ton of hoops just to add more CPU cores to my VMs so
| I "trusted" that GCP would warn me if I ever made an error.
|
| One of the bigger issues is they charged my card before I
| literally had any notice what the bill was - it wasn't even in
| the dashboard yet. I would have terminated the script ASAP had
| I gotten *any* warning.
| mulmen wrote:
| What was the query you ran?
| httparchive wrote:
| I was doing historical evaluation for a few sites, so I was
| running a query for each month going back to 2016 for each
| site. I've done this before with no real issues, and if I knew
| the charges were rapidly exploding I'd have halted the script
| immediately - but instead it ran for 2 hours and the first
| notice I got was the CC charge.
| Symbiote wrote:
| My guess is you were querying all the data each time.
|
| If you instead filter out the rows you are interested in
| (e.g. the particular "few sites" by their URL) and put that
| in a new table, querying the resulting, tiny table will be
| very cheap.
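A minimal sketch of the extract-once pattern described above, using Python's built-in `sqlite3` as a stand-in for BigQuery so it runs anywhere. The table and column names (`pages`, `my_sites`, `url`, `month`, `bytes_total`) are made up for illustration and are not the HTTP Archive schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, month TEXT, bytes_total INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [(f"https://site{i}.example/", f"2016-{m:02d}", 1000 * i + m)
     for i in range(1000) for m in range(1, 13)],
)

# Step 1: pay for one pass over the big table, keeping only the sites of interest.
sites = ("https://site1.example/", "https://site2.example/")
conn.execute("CREATE TABLE my_sites (url TEXT, month TEXT, bytes_total INTEGER)")
conn.execute("INSERT INTO my_sites SELECT * FROM pages WHERE url IN (?, ?)", sites)

# Step 2: every follow-up query reads only the small extracted table.
big_rows = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
small_rows = conn.execute("SELECT COUNT(*) FROM my_sites").fetchone()[0]
january = conn.execute(
    "SELECT url, bytes_total FROM my_sites WHERE month = ? ORDER BY url",
    ("2016-01",),
).fetchall()
```

On a service billed by data scanned, only step 1 touches the full dataset; the per-month queries in step 2 are charged against the tiny table.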
| eklitzke wrote:
| I haven't looked at the exact schema for this dataset but for
| this type of query pattern to be efficient the data would
| need to be partitioned by date.^[1] I'm guessing that it's
| not partitioned this way and therefore each of these queries
| that was looking at "one month" of data was doing a full
| table scan, so if you queried N months you did N table scans
| even though the exact same query results could have been
| achieved even without partitioning by doing one table scan
| with some kind of aggregation (e.g. GROUP BY) clause.
|
| [1]: https://cloud.google.com/bigquery/docs/partitioned-
| tables
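The difference between N filtered scans and one grouped scan can be illustrated in plain Python. This is only a toy model with invented data, counting rows touched rather than bytes billed, but the ratio is the point.

```python
from collections import defaultdict

months = [f"2016-{m:02d}" for m in range(1, 13)]
# Invented data: 100 rows per month, each row a (month, size) pair.
rows = [(month, 10) for month in months for _ in range(100)]


def per_month_scans(rows, months):
    """One full table scan per month, as a per-month script would do."""
    totals, scanned = {}, 0
    for wanted in months:
        total = 0
        for month, size in rows:
            scanned += 1
            if month == wanted:
                total += size
        totals[wanted] = total
    return totals, scanned


def single_scan_group_by(rows):
    """One pass over the table, aggregating by month (like GROUP BY month)."""
    totals, scanned = defaultdict(int), 0
    for month, size in rows:
        scanned += 1
        totals[month] += size
    return dict(totals), scanned


totals_a, scanned_a = per_month_scans(rows, months)
totals_b, scanned_b = single_scan_group_by(rows)
```

Both approaches return the same totals, but the per-month version touches twelve times as many rows.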
| mulmen wrote:
| Can you be more specific? What filtering did you apply? How
| many columns did you select?
| summerlight wrote:
| I frequently see these kinds of surprising billing anecdotes across
| many cloud providers. Why don't they provide a way to set a hard
| budget limit that applies to the entire account? I tried to see what
| can be done for GCP and this seems pretty daunting.
|
| https://medium.com/@steffenjanbrouwer/how-to-set-a-hard-paym...
| Bjartr wrote:
| Because they aren't sufficiently incentivized to make giving
| them your money harder.
| klysm wrote:
| What incentive do cloud providers have to give you that
| ability? I think they greatly appreciate the ability to
| accidentally spend a lot of money
| summerlight wrote:
| To prevent bad PR like this? When it goes viral, customer
| support usually escalates it to a higher level, and then they
| eventually just cancel the bill.
| IshKebab wrote:
| Google does not care about bad PR like this. It doesn't
| affect their biggest customers.
| braza wrote:
| tbh, I have worked with AWS for at least 10 years, and
| recently their field support has been quite willing to help avoid
| those scenarios (e.g. helped to save hundreds of thousands in
| a single-digit million account).
|
| This was one of the main selling points for all portfolio
| companies of the group to adopt AWS in their digital
| transformation projects.
| 0cf8612b2e1e wrote:
| Limited use for a nobody who wants to run <$100 / year
| cloud spend and does not have account managers.
|
| I would love to kick the tires on some AWS stuff, but the
| threat of unlimited ruin is not worth it. Sure, maybe the
| gods would take pity on me and wipe the debt, but far
| easier to just run with someone who caps costs. My toy
| project can gladly go down if the alternative is a huge
| unexpected bill.
| rvnx wrote:
| An unhappy customer won't come back.
|
| The OP is probably a good person with strong interest in data
| science and building projects.
|
| If it were "oh, here's your $500 charge, upgrade your quota
| for more", then "OK, fair enough, I made a mistake"; but $14k is
| not OK without an explicit quota upgrade.
| thatoneguy wrote:
| Google AppEngine used to have that but -- presumably in the
| interest of additional profit -- they removed it. Now I have to
| make do with an alert that warns me long after I could be
| hypothetically bankrupted, i.e. in seconds.
| deelowe wrote:
| Because then we'd see articles about how the next start up
| missed their opportunity whenever their site unexpectedly got
| discussed on the latest Rogan episode and subsequently was
| taken offline by the limits being tripped.
| onion2k wrote:
| Companies could make the limit optional and pass 100% of that
| downside to the customer. 99.9% of customers would opt in.
| sfn42 wrote:
| I don't see the problem. Don't set a budget limit if you
| don't want your app to go offline. Lots of people wouldn't
| mind if their app went offline for a bit. They'd prefer to
| not suddenly get a $10,000 bill
| ghaff wrote:
| There's no "right" answer. In one case, it's "I checked the
| wrong box and got a $14K bill." In the other case, it's "I
| checked the wrong box and my startup missed its one window."
| There are in-between levels of alerting etc. for both
| populations but they're probably unsatisfactory for the
| extreme conditions.
|
| To be clear: I'd be very in favor of the major cloud
| providers having a "DO NOT! DO NOT! use this for production"
| mode where your content could be deleted at any time if you
| screw up. But I suspect most people wouldn't use that.
| ToucanLoucan wrote:
| My cynical self sees it as how cloud providers aim to make the
| most money: by making billing opaque and waiting for buzzword-happy
| project leads to mandate stuff be put on their service
| without understanding what the end billing will be.
|
| I can't say that's for certain what it is. I just know a
| hallmark of any business with recurring charges that are
| otherwise incomprehensible is so they can hit you with the
| charge after the fact, and you have little recourse to avoid
| paying it without a ton of work for yourself or your team.
| MrDarcy wrote:
| Google does provide a way, project owners can set a custom
| quota to limit costs.
| yjftsjthsd-h wrote:
| So your comment made me go look it up, and if you squint hard
| that's kind of true...
|
| https://cloud.google.com/billing/docs/how-
| to/notify#cap_disa...
|
| Notice that their "solution" is to tell you how if you want
| you can spin up effectively your own custom service to watch
| spend and if it goes over some threshold delete the entire
| project[0] after some delay. This is the malicious compliance
| version of letting you add a limit.
|
| [0] At least, that's how I interpret "This example removes
| Cloud Billing from your project, shutting down all resources.
| Resources might not shut down gracefully, and might be
| irretrievably deleted. There is no graceful recovery if you
| disable Cloud Billing. You can re-enable Cloud Billing, but
| there is no guarantee of service recovery and manual
| configuration is required."
| MrDarcy wrote:
| I meant this, linked directly off the big query cost
| estimation docs:
|
| https://cloud.google.com/bigquery/docs/custom-quotas
|
| > Custom quota is proactive, so you can't run an 11 TB
| query if you have a 10 TB quota. Creating a custom quota on
| query data lets you control costs at the project level or
| at the user level.
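The "proactive" behaviour in the quoted doc (an 11 TB query is refused under a 10 TB quota) can be modelled as a check made before the query starts. A hedged sketch with hypothetical names (`DailyQueryQuota`, `QuotaExceeded`); it illustrates the idea and is not BigQuery's implementation. In the Python client, `QueryJobConfig(maximum_bytes_billed=...)` is a related per-query guard.

```python
TB = 10**12


class QuotaExceeded(Exception):
    """Raised when a query would push usage past the configured quota."""


class DailyQueryQuota:
    """Tracks bytes scanned and refuses queries before they run."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, estimated_bytes):
        """Admit the query only if it fits in the remaining quota."""
        if self.used_bytes + estimated_bytes > self.limit_bytes:
            remaining = self.limit_bytes - self.used_bytes
            raise QuotaExceeded(
                f"query needs {estimated_bytes} bytes, only {remaining} left")
        self.used_bytes += estimated_bytes
```

Because the check happens on the estimate, a looping script fails on the first query that would cross the line instead of after the bill arrives.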
| yjftsjthsd-h wrote:
| Oh, good catch! Yes, that does look like something that
| can be coerced into limiting it. Having actually tried to
| click through, it is very much not as simple as "don't
| spend more than $X"; the doc points to
| https://console.cloud.google.com/iam-admin/quotas and you
| have to find and set the right quota, but yes that can
| probably help.
| beejiu wrote:
| Isn't it the quota limit that you need to set?
| https://cloud.google.com/bigquery/quotas#query_jobs
| onion2k wrote:
| The reasons are probably quite complicated, because some of
| them are bound by hard technical limits to how quickly a system
| can react and thus make a hard limit actually a _hard_ limit,
| but realistically that's largely solvable just by making it a
| softer hard limit (e.g. you set a limit of $1000 and the terms
| say you pay that plus whatever is used before the limit kicks
| in. More than $1000 but way less than $14000).
|
| All of those technical reasons aside though, the commercial
| reason is obvious - people's mistakes and overages are a great
| source of revenue and profit. Companies refund the times where
| it'd be enough to lose the customer, or when it hits HN, but
| they make more money every time someone pays up. They have no
| incentive to fix it. It's part of the business model.
| beejiu wrote:
| There are no conceivable "hard technical limits" that make
| such a system difficult. It's 100% commercial.
| londons_explore wrote:
| oh there are - billing systems at scale almost exclusively
| work on logs. Logs can take minutes or hours to aggregate
| and transmit to a central place.
|
| Ever notice how your "1GB" data plan sometimes lets you use
| 5GB if you happen to be roaming in another country and
| downloading something fast over 5G...? Same reason.
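The "softer hard limit" and the log-aggregation delay can be put together in a toy model: spending continues for the length of the billing lag after the cap is crossed. The function name and all numbers below are illustrative assumptions, not measurements of any real billing system.

```python
def final_bill(spend_per_min, cap_usd, lag_min, run_min):
    """Bill for a job that is stopped `lag_min` minutes after crossing the cap.

    If the job finishes on its own before the limiter reacts, the bill is
    simply the spend over the whole run.
    """
    minutes_to_cap = cap_usd / spend_per_min
    stopped_at = min(run_min, minutes_to_cap + lag_min)
    return spend_per_min * stopped_at


# $100/minute for up to 2 hours, $1000 cap, billing data 15 minutes stale.
print(final_bill(spend_per_min=100, cap_usd=1000, lag_min=15, run_min=120))
```

With these made-up numbers the overshoot equals the spend rate times the lag, which lands above the cap but far below an uncapped two-hour run.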
| beejiu wrote:
| Then engineer it differently. If we can put man on the
| moon, put 5 TB of data in 1 square-inch of plastic, build
| self driving cars... then this is trivial.
| londons_explore wrote:
| There is also the fact that if a company has critical systems
| go down because GCP hit some hard budget limit, it will be
| reported in the press as "Netflix down globally due to issue
| with Google Cloud".
|
| Google doesn't want the bad press. Most real companies would
| prefer to have a big bill when their product surges in
| popularity than have unexpected downtime at the worst time.
| httparchive wrote:
| I learned of that billing limiting mechanism _after_ the $14k
| was charged to my account. As designed.
| twism wrote:
| the setup of the budget limit isn't complicated. the linked
| article goes through putting the monitors/alerts on pubsub
| etc., which isn't mandatory.
| fabian2k wrote:
| The cloud isn't something I'd ever use my private credit card on,
| there are just too many ways to screw it up if you're not very
| careful and know what you're doing. I don't think I would have
| hit this particular issue, but that is mainly because I've read a
| bunch of stories of this kind and BigQuery is one of the things I
| associate with "can get very expensive very quickly" based on
| those.
|
| I know the explanations and justifications for it, but for
| personal use a service where I can't put a hard limit on usage is
| simply not acceptable for me. It's just not worth the risk.
| bugbuddy wrote:
| There is a really easy fix to this problem: setting billing
| limits. This can be done with almost all cloud providers and it
| takes almost no time. These incidents just show a lack of
| professionalism on the part of the person incurring the costs.
| I personally did it on the first day I set up a cloud computing
| account when I was still doing my BS in college. It is not that
| hard folks. Set the billing limits.
| blibble wrote:
| > This can be done with almost all cloud providers and it
| takes almost no time.
|
| well, other than the three market leaders (GCP, AWS and
| Azure)
| crazygringo wrote:
| I guess it depends what you mean?
|
| They all support forms of limits, quotas and budget
| notifications you can set, no?
|
| Part of me understands that maybe you can't set a hard
| billing limit because that means they'd need to immediately
| delete all your instances, storage, and backups to prevent
| any further costs for the month -- which is probably not
| what people actually want.
|
| But are the tools they provide insufficient?
| bugbuddy wrote:
| True, these giants make their own lives easier and don't
| implement many billing controls in the infrastructure. It
| is your money so it is your responsibility to protect it.
| Use billing alerts, hacks, and research things carefully
| before jumping with both feet.
| joepie91_ wrote:
| > It is your money so it is your responsibility to
| protect it.
|
| This is victim blaming.
| fabian2k wrote:
| The main reason I'd use a personal account for one of the big
| cloud providers would be to learn stuff. At that point a lack
| of professionalism is kinda expected, because learning stuff
| is the whole point.
|
| And my understanding is that almost none of the ways of
| setting limits are actual hard limits, but only alerts and
| some hacked-together emergency abort scripts. Correct me if
| I'm wrong, but can you actually limit the cost robustly for
| services that spend that much money in an hour or so? Doesn't
| help much if I get an email about it and read it two hours
| afterwards.
| bugbuddy wrote:
| I understand the down votes but I would still say that
| being aware of the rough estimate costs of each service you
| are using is an integral part of an engineer's job. After
| all, we care a lot about CPU cycles and those are measured
| in femto dollars.
| joepie91_ wrote:
| You can call someone an 'engineer' with the associated
| responsibilities when they are getting paid for what they
| are doing like an engineer, in a setting that provides
| them with the protections of an engineer.
|
| Until that point, they are just an individual who got
| screwed by disguised billing practices.
| fabian2k wrote:
| That's not sufficient, you also must not make mistakes.
|
| I have very limited cloud experience, but I did make a
| mistake that led to a rather slow but constant cost. The
| amount was small enough to not be relevant in a
| professional context, but the memorable part was that I
| could not pinpoint the source easily with the AWS tools
| and my limited understanding of them. The categories and
| labels were too broad, and it took a bit until I figured
| out what went wrong. There are certainly better tools to
| investigate this, but I didn't know them. In the end it
| was simply luck that the mistake still fell into an area
| of insignificant amounts of money, but it could have
| easily been significantly more if a few parameters had
| been different for the same mistake
| yjftsjthsd-h wrote:
| > It is not that hard folks. Set the billing limits.
|
| Excellent idea. Please describe how to create an account on
| AWS or GCP that is not allowed to spend more than $100/mo.
| Since it is "a really easy fix" and "takes almost no time" it
| should be easy to explain, right?
| kstrauser wrote:
| https://aws.amazon.com/getting-started/hands-on/control-
| your... tells you how to set up a budget, with
| notifications when you're getting close.
|
| That's probably enough for 99% of people, and if you're
| highly motivated, you could make that trigger an SNS
| notification that trips a circuit breaker.
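The "notification that trips a circuit breaker" idea can be sketched independently of any provider. Everything here is hypothetical (the `SpendCircuitBreaker` class and the shape of the alert); wiring it to a real budget notification is left out, and the breaker only blocks new work rather than stopping what is already running.

```python
class SpendCircuitBreaker:
    """Refuses to start new work once a budget alert says the budget is spent."""

    def __init__(self, budget_usd):
        self.budget_usd = budget_usd
        self.open = False  # "open" means the circuit is broken: no more work

    def on_budget_alert(self, spent_usd):
        """Handler for a budget notification carrying the amount spent so far."""
        if spent_usd >= self.budget_usd:
            self.open = True

    def run(self, job):
        """Run `job` (a zero-argument callable) unless the breaker has tripped."""
        if self.open:
            raise RuntimeError("circuit open: budget exhausted, refusing new work")
        return job()
```

A batch script would call `run` for each query, so a single alert stops the loop at the next iteration.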
| nonfamous wrote:
| You can in Azure, easily. New Azure free accounts (which
| most learners start with) have spending limits enabled by
| default. https://learn.microsoft.com/azure/cost-management-
| billing/ma...
| blibble wrote:
| if your country lets you set up and maintain an LLC easily this
| is a reasonable way to manage the risk
|
| a catastrophic mistake might result in the company going bust
| and all the pain associated with that
|
| but shouldn't lose you your home (assuming you acted properly)
| questionacount wrote:
| Is there a guide or someone I should talk to about how to do
| this?
|
| I've long wondered what I can do with an LLC to protect me
| from debts like this but I don't know how to get more
| information about it. Particularly as I'd be the sole owner I
| don't really understand what the llc does/doesn't do.
|
| If you had just 1000$ (and made a few hundred a year) is it
| worth doing?
| crysin wrote:
| IANAL but this can be risky in the US still because if you're
| not careful and demonstrate a clear separation of your
| business funds and your personal funds it can let those
| pursuing you for money owed to pierce the veil, thus losing a
| huge benefit of the LLC.
| httparchive wrote:
| It wasn't personal use, for business - but I'm bootstrapping a
| startup, so it's a very tough lesson to learn.
| braza wrote:
| To be honest, even the official guide [1] for BQ does not have
| any information about query cost, budgets, and service limit
| mechanisms [2].
|
| I think the HTTP Archive team could set something in that regard.
|
| PS: When I was an instructor for some cloud training in AWS, the
| first 2 hours were only to set up billing and budgets to avoid
| any kind of situation like this. No one would start training
| without all those locks in place in the first place.
|
| [1] -
| https://github.com/HTTPArchive/httparchive.org/blob/main/doc...
| [2] - https://cloud.google.com/bigquery/docs/best-practices-costs
| httparchive wrote:
| Yeah, I'm basically just having to write this off so it sucks
| for me (a lot - I'm bootstrapping a start up), but I'm more
| worried about other people (especially students) getting caught
| up in what feels like a scam given the language on the website
| not, ya know, mentioning the risk of being charged $14k.
| ghaff wrote:
| I understand the argument against hard circuit-breakers
| (yeah, seems like a good idea, but had a good traffic spike
| and I'm down). But it makes even me cautious with respect to
| scenarios where I could just fat finger something. There are
| some controls but there are no guarantees in most cases.
| hobofan wrote:
| The getting started guide linked by the website states:
|
| > Note: The size of the tables you query are important
| because BigQuery is billed based on the number of processed
| data. There is 1TB of processed data included in the free
| tier, so running a full scan query on one of the larger
| tables can easily eat up your quota. This is where it becomes
| important to design queries that process only the data you
| wish to explore
|
| Could this be a bigger warning? Sure.
|
| Is something a scam just because they don't explain the
| general implications of entering your payment information to
| a usage-billed product? Not really.
| darth_avocado wrote:
| I am sorry but this seems to be more of a "TLDR; didn't read;"
| situation. The http archive clearly mentions that the data is
| available for offline processing or for querying online on BQ.
| And in the "Getting started" section of the instructions, it is
| mentioned multiple times on how BQ will charge you. And even if
| it wasn't mentioned anywhere, it's a little presumptuous to
| assume a tool for processing data will not charge you money for
| literally processing TBs of data again and again.
|
| > Note: BigQuery has a free tier that you can use to get started
| without enabling billing. At the time of this writing, the free
| tier allows 10GB of storage and 1TB of data processing per month.
| Google also provides a $300 credit for new accounts.
|
| > Note: The size of the tables you query are important because
| BigQuery is billed based on the number of processed data. There
| is 1TB of processed data included in the free tier, so running a
| full scan query on one of the larger tables can easily eat up
| your quota. This is where it becomes important to design queries
| that process only the data you wish to explore
|
| > When we look at the results of this, you can see how much data
| was processed during this query. Writing efficient queries limits
| the number of bytes processed - which is helpful since that's how
| BigQuery is billed. Note: There is 1TB free per month
|
| https://github.com/HTTPArchive/httparchive.org/blob/main/doc...
| httparchive wrote:
| Yes, sure there's stuff I could have done better, and stayed up
| all night looking at the fine print. But that's not the point -
| this is a *warning* to other people who see the Internet Archive
| logo, the words "public", and for some dumb reason also trust
| Google. I'm hoping this doesn't happen to others, I learned a
| costly lesson.
| dabernathy89 wrote:
| I'm on OP's side - even if I knew I'd be paying to run some
| queries against this dataset, I never would have thought it
| could reach 5 figures in such a short time. And you can't argue
| that the billing is straightforward. The "Getting Started"
| guide for the HTTP Archive doesn't even describe what indexes
| are available/commonly used for limiting the scanned rows.
| jeffparsons wrote:
| Warning: most cloud providers (Google, Amazon, Microsoft) require
| you to accept unlimited liability to use their services.
|
| If you're running a business and you have lawyers, then fair
| enough -- just play the game. But for individuals, it seems crazy
| that so many of us accept this sort of thing. Good luck
| contesting the charge with your credit card company when you
| already agreed to a contract that said Google could bill you
| thousands of dollars and then you used thousands of dollars worth
| of their service.
|
| Big cloud providers are not your friend. They do not care if they
| destroy the lives of you and your family, unless it's happening
| so often that it's making mainstream news.
|
| My advice is to go and delete your cloud accounts, and only use
| services that offer hard spending caps, and ideally prepaid
| accounts.
|
| Maybe this doesn't leave many options. Oh well. Maybe if you
| can't afford big lawyers then you also can't afford the risks of
| using big cloud.
| httparchive wrote:
| Yup, I'm already having to pay legal fees - which is why you
| have a biz lawyer on retainer to start with - but I'm not sure
| I have any standing.
| jeffparsons wrote:
| IANAL, but if this happened to me I would be gathering as
| many examples as I could of this having happened to other
| people. The angle being: Google knows this is a huge issue.
| Effectively, they know that they have (presumably
| accidentally) created a really dangerous trap for small
| players, and have chosen to do nothing about it.
|
| In some jurisdictions I think that reduces the legitimacy of
| their claim that you actually owe them money.
|
| EDIT: Even better, focus on the examples where Google
| "forgave" the debt; you could argue that those examples prove
| that Google knows it's at least partly their fault.
| AdamJacobMuller wrote:
| All of the cloud services I have are set up only with
| privacy.com cards. I have each individual card limited to just
| above what the monthly expected spend is. Even if there's a
| (reasonable) spike I can see it and I have to take manual
| action before the charge will go through.
|
| Cannot recommend privacy.com enough.
| ihattendorf wrote:
| Doesn't stop them from trying to collect after the
| transaction is declined. It's not a prepaid service, you're
| agreeing to pay the charges _after_ you've used the service.
|
| Will they pursue? Do they have enough info to pursue? Who
| knows, but they can if they want to.
| myself248 wrote:
| That's not what privacy.com does or is for. They advertise
| it, but I've had transactions blow right through the facade.
| Specifically, the New York Times, after my trial subscription
| ended and I watched the stupendously-expensive charges
| bounce, they kept trying and eventually tried a different way
| and it went through.
|
| I emailed support, and here's what I got back:
|
| > Hi, $firstname. I've been reviewing your dispute and wanted
| to touch base with you to explain what happened.
|
| > It appears that the disputed charge is a "force post" by
| the merchant. This happens when a merchant cannot collect
| funds for a transaction after repeated attempts and completes
| the transaction without an authorization -- it's literally an
| unauthorized transaction that's against payment card network
| rules. It's a pretty sneaky move used by some merchants, and
| unfortunately, it's not something Privacy can block.
| romeros wrote:
| This is just a single data point but I had a surprise bill with
| Google. I talked to the support and got it waived off.
|
| I used Amazon EC2 instances for years and I always felt in
| control. There were never any surprises. I knew even in the
| worst case situation I would be okay because I had faith in the
| Amazon support. With Google I felt insecure. I never played
| with any of Google cloud services since then.
|
| Amazon's customer first policy is really true. They try their
| absolute best to make sure there are no surprises to a great
| extent. Even the UI is very intuitive.
| theolivenbaum wrote:
| Same here - incidentally was also one of the weirdest
| interactions with customer support I've ever had. I suspect
| the first point of contact was some sort of LLM/chatbot that
| desperately wanted to make sure I was feeling fine and that
| there was nothing to worry about. When I was forwarded to the
| billing support team the interaction went back to normal -
| couple of messages back and forth and some homework to set
| the real budget limit (the quota is just for alarms) and they
| waived the charge.
| tw04 wrote:
| >Amazon's customer first policy is really true.
|
| Which part of customer first drove their egress fee policies?
| zer00eyz wrote:
| The part that was ALWAYS there.
|
| Egress is basically all outbound traffic. The fee was
| always this. Don't act shocked when it doesn't go down when
| you have buyer's remorse.
| neilv wrote:
| Not the same thing, but: some pre-Web Usenet programs would have
| warnings before "expensive" operations:
|
| > _Version 4.3 patch 30 of rn's Pnews.SH (September 5, 1986,
| published to support the new top-level groups) introduced the
| "thousands of machines" message:_
|
| > > _This program posts news to thousands of machines throughout
| the entire civilized world. Your message will cost the net
| hundreds if not thousands of dollars to send everywhere. Please
| be sure you know what you are doing._
|
| --
| https://retrocomputing.stackexchange.com/questions/14763/wha...
| bearjaws wrote:
| Had this happen (but more like $100) analyzing the GitHub dataset
| on GCP.
|
| Honestly I was already concerned when it was taking more than 5
| minutes to return a result.
|
| Once I saw how slow it was I found my error, not querying the
| sample dataset that was a fraction of the size, to make sure my
| filtering worked.
| KomoD wrote:
| How big of a query do you have to make to get charged $14k...?
| Isn't it billed by data transfer?
| Jgrubb wrote:
| No, by amount of data scanned to answer the query. This would
| be about 2.2PB worth of data.
|
| Partitioning matters y'all!
| KomoD wrote:
| Ah, ok.
| anon84873628 wrote:
| On-demand pricing in the US charges $6.25 per TiB (and the
| first TiB per month is free).
|
| So $14k would be about 2,240 TiB.
|
| I wonder what sort of partitioning and clustering is used for
| the tables.
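|
| A rough sketch of that arithmetic, assuming the $6.25 per TiB rate
| and the 1 TiB free tier quoted above (both taken from this comment,
| not checked against current pricing). The function and constant
| names are made up for illustration:
|
```python
# Figures quoted in the comment above; treat them as assumptions.
PRICE_PER_TIB_USD = 6.25    # US on-demand rate
FREE_TIB_PER_MONTH = 1.0    # first TiB each month is free
TIB = 2 ** 40               # bytes in one TiB


def on_demand_cost_usd(bytes_scanned: int,
                       free_tib_left: float = FREE_TIB_PER_MONTH) -> float:
    """Estimate the charge for scanning `bytes_scanned` bytes."""
    billable_tib = max(bytes_scanned / TIB - free_tib_left, 0.0)
    return billable_tib * PRICE_PER_TIB_USD


def tib_for_charge(charge_usd: float) -> float:
    """Invert the price: how many billable TiB produce `charge_usd`."""
    return charge_usd / PRICE_PER_TIB_USD


if __name__ == "__main__":
    # Billable TiB behind a $14k bill.
    print(tib_for_charge(14_000))
    # Cost of scanning 2,241 TiB with the free TiB still unused.
    print(on_demand_cost_usd(2241 * TIB))
```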
| gelatocar wrote:
| Have you tried getting in touch with GCP to see if they would
| refund the charge? I've heard plenty of stories of cloud services
| refunding large one-off accidental spends like this one.
| zmarty wrote:
| "Last week I ran a script on BigQuery for historical HTTP
| Archive data and was billed $14,000 by Google Cloud with zero
| warning whatsoever, *and they won't remove the fee.*"
| mekoka wrote:
| This is a matter of a user not having read the fine print,
| which doesn't mean that they're necessarily at fault. The only
| way to know which of the user, httparchive.org, or Google BQ is
| most responsible is to know how often similar situations arise in
| this specific context (i.e. using BQ by way of httparchive.org).
| londons_explore wrote:
| Just ask support. In almost all cases, they'll cancel the
| charges.
| Jgrubb wrote:
| BQ lesson #1 - don't select * on a gigundous dataset.
|
| Lesson #2 - if you select * on a gigundous dataset, make sure
| it's on your employer's bill.
| kshmir wrote:
| Just set quotas by default...
| tills13 wrote:
| Sorry but I can't get over the hyperbole in
|
| > This website makes it seem like this "public" dataset is for
| the community to use, but it is instead a for-profit money maker
| for Google Cloud and you can lose tens of thousands of dollars.
|
| _you_ didn't understand what you were doing. HA's datasets
| _are_ public and free. It is _not_ a "for-profit money maker for
| Google Cloud". Sorry, sucks for you but blaming the restaurant
| when you bit off more of the steak than you can chew is not how
| this works.
| francoismassot wrote:
| BigQuery is just too costly...
|
| Do you know if the dataset is public? We should just offer a
| cheap alternative and ditch BigQuery.
| nomilk wrote:
| Could you explain the steps you went through that led to you
| using BigQuery? The reason I ask is most of us probably use GCP
| and only ever interact with BigQuery via GCP. But it seems your
| entry point was a bit different to most (e.g. seems you might
| have clicked on a link to GCP from HTTP Archive, or perhaps
| something else?).
|
| FWIW I use BigQuery a lot and as a rough guide I assume about 1c
| per GB scanned. So if I query a dataset that's 1TB, that's about
| $10. If the same data were stored on a relational db, the same
| query would take about a day (or at least a good part of a day).
| Because BigQuery returns a result so quickly (e.g. <1 minute) it
| can be easy to miss the insane amount of work it did to get
| there. So I could see someone accidentally putting that ~1m (but
| 1TB!) query into a loop or something, and boom, there's your $15k
| bill. Accidents happen.
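|
| To illustrate the loop scenario, here is a minimal client-side guard,
| assuming the rough 1c per GB figure above. ScanBudget,
| BudgetExceeded and run_loop are hypothetical names, not part of any
| BigQuery API, and the per-query size would have to come from an
| estimate made before each query runs.
|
```python
class BudgetExceeded(RuntimeError):
    """Raised before a query that would push spending past the cap."""


class ScanBudget:
    """Tracks estimated spend in whole cents to avoid float drift."""

    def __init__(self, limit_usd: float, cents_per_gb: int = 1):
        # cents_per_gb=1 is the commenter's rule of thumb, not a real rate.
        self.limit_cents = round(limit_usd * 100)
        self.cents_per_gb = cents_per_gb
        self.spent_cents = 0

    def reserve(self, gb_to_scan: int) -> float:
        """Book the estimated cost of one query, or refuse it."""
        cost = gb_to_scan * self.cents_per_gb
        if self.spent_cents + cost > self.limit_cents:
            raise BudgetExceeded(
                f"query would bring spend to ${(self.spent_cents + cost) / 100:.2f}"
            )
        self.spent_cents += cost
        return self.spent_cents / 100


def run_loop(budget: ScanBudget, gb_per_query: int, iterations: int) -> int:
    """Return how many queries fit under the cap before the guard trips."""
    completed = 0
    for _ in range(iterations):
        try:
            budget.reserve(gb_per_query)
        except BudgetExceeded:
            break
        completed += 1  # a real script would run the query here
    return completed


if __name__ == "__main__":
    budget = ScanBudget(limit_usd=50)
    print(run_loop(budget, gb_per_query=1000, iterations=1500))
```
|
| With no guard, 1,500 passes of the 1TB query come to $15,000 at
| this rate; with a $50 cap the loop stops after 5.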
|
| Also FWIW, although the big 3 clouds' pricing is tricky (since
| there are so many services), I find them much better than the
| PaaS built on top of the big 3 clouds. My
| suspicion is that the PaaS's have a strong incentive to obscure
| their pricing because customers can typically see what their
| costs are (e.g. if they buy some compute from AWS at $0.16/hr and
| sell it for $1.40/hr, that can be seen as a bit of a rip, hence
| they try to obscure it). But I think the big 3 are not too bad at
| this practice. It really bugs me when anyone deliberately
| obscures their prices, and it's often an indicator of more shady
| practices to come.
___________________________________________________________________
(page generated 2024-02-20 23:01 UTC)