[HN Gopher] Show HN: Cloud GPUs for Deep Learning - At 1/3 the Cost of AWS/GCP
___________________________________________________________________
Show HN: Cloud GPUs for Deep Learning - At 1/3 the Cost of AWS/GCP
Author : ilmoi
Score : 102 points
Date : 2021-03-17 15:56 UTC (7 hours ago)
(HTM) web link (gpu.land)
(TXT) w3m dump (gpu.land)
| wrongdonf wrote:
| I love a Dutch programmer
| ilmoi wrote:
| hah, I am indeed European, but not Dutch:) I did spend a year
| living in the Netherlands, though.
| ilmoi wrote:
| I'm a self-taught ML engineer, and when I was starting on my ML
| journey I was incredibly frustrated by cloud services like
| AWS/GCP: a) they were super expensive, and b) it took me longer
| to set up a working GPU instance than to learn to build my
| first model!
|
| So I built https://gpu.land/.
|
| It's a simple service that only does one thing: rents out Tesla
| V100s in the cloud.
|
| Why is it awesome?
|
| - It's dirt-cheap. You get a Tesla V100 for $0.99/hr, which is
| 1/3 the cost of AWS/GCP/Azure/[insert big cloud name].
|
| - It's dead simple. It takes 2 minutes from registration to a
| launched instance. Instances come pre-installed with everything
| you need for Deep Learning, including a 1-click Jupyter server.
|
| - It sports a retro, MS-DOS-like look. Because why not:)
|
| The most common question I get is: how is this so cheap? The
| answer is that AWS/GCP charge you a huge markup and I don't. In
| fact, I'm charging just enough to break even; I built this
| project really to give back to the community (and to learn some
| of the tech in the process).
|
| HN special: email me a few lines about yourself and what you're
| working on and get $10 in free credit. I'm at hi@gpu.land.
|
| Otherwise I'm around for any questions!
| ogiberstein wrote:
| Wow, sounds really useful! Will check it out
| typon wrote:
| Hey ilmoi,
|
| Do you support multi-node training? In particular, can I
| reserve, for example, 64 GPUs across 8 nodes and perform
| distributed training?
| ilmoi wrote:
| You can! All machines within a single account are inside a
| private VLAN, which means they all talk to each other (out of
| the box, no setup required), but nobody else on gpu.land sees
| them.
|
| See the "Are instances within my account connected?" item in
| the FAQ - https://gpu.land/faq
|
| If you're going to need 64 GPUs I'll have to increase your
| account limit (currently 16 GPUs per account). Email me at
| hi@gpu.land
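|
| If it helps, this is roughly what a multi-node job across two
| 8-GPU instances on the VLAN can look like. A minimal PyTorch
| sketch, not a gpu.land feature: the private IP, port and model
| are placeholders you'd swap for your own.
|
|   # train.py - launch on each node (node_rank 0 and 1) with:
|   #   python -m torch.distributed.launch --use_env \
|   #     --nproc_per_node=8 --nnodes=2 --node_rank=0 \
|   #     --master_addr=10.0.0.1 --master_port=29500 train.py
|   import os
|
|   import torch
|   import torch.distributed as dist
|   from torch.nn.parallel import DistributedDataParallel as DDP
|
|   def main():
|       # the launcher sets RANK/WORLD_SIZE/LOCAL_RANK for us
|       dist.init_process_group(backend="nccl")
|       local_rank = int(os.environ["LOCAL_RANK"])
|       torch.cuda.set_device(local_rank)
|
|       model = torch.nn.Linear(512, 10).cuda(local_rank)  # placeholder
|       model = DDP(model, device_ids=[local_rank])
|       # ... DataLoader with DistributedSampler, training loop ...
|
|   if __name__ == "__main__":
|       main()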
| etaioinshrdlu wrote:
| Can you use GeForce cards and then make us pinky swear to only
| use it for blockchain processing applications? This is what
| LeaderGPU does (https://www.leadergpu.com/#chose-best), and
| they had even lower pricing.
|
| Although lately they have had demand outstrip supply.
|
| Hetzner used to have GTX 1080 instances for about $100 a month,
| no longer though, and I'm lucky to be grandfathered in to about
| 8 of them.
|
| I told myself a couple of years ago that compute would only
| get cheaper over time. In reality, compute has gotten MORE
| expensive! I cannot match or scale the compute price I locked
| in a couple of years ago with Hetzner. Some of that is the
| crypto market raising GPU prices, but it is also NVIDIA's
| licensing making their cheaper cards unavailable in cloud
| servers...
| liuliu wrote:
| Thank you! This looks indeed cheap.
|
| Can you share more about data persistence / checkpointing? If
| I have a job that requires 8 V100s for 3 days, what kind of
| reliability am I looking at?
| ilmoi wrote:
| Exactly the same as if you were running an EC2 instance at
| AWS. The machines are hosted, maintained and managed in
| exactly the same way, in a whitelabel DC.
| sacheendra wrote:
| This is highly unlikely. Hyperscalers like AWS have custom
| power delivery, rack organization, redundant networking,
| etc., which make their instances reliable.
|
| Long-running jobs often use checkpoints, which require high-
| speed networking and storage, and I don't see an option for
| those. E.g., I can get EC2 instances with 100 Gbps
| networking.
|
| Great job starting the service! But I think you have a ways
| to go before reaching hyperscaler-level reliability.
| joecot wrote:
| I'm not very familiar with machine learning setups, but in
| general with hosts you trade SLA for pricing. You trade the
| 99.999% uptime and data consistency of hyperscaled hosts for
| cheaper pricing at smaller hosts. Assuming you can backup
| your data at checkpoints, consider running it on a setup like
| this for very cheap, and then back it up every hour/day to a
| consistently reliable data storage. AWS EC2 is very expensive
| for computing, but S3 is relatively cheap for storage.
| Backblaze B2 storage is also even cheaper, but with less
| guaranteed reliability than S3. But the odds of both this
| going down on you and Backblaze failing at the same time are
| pretty low.
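|
| As a sketch of that checkpoint-then-upload pattern (assuming
| PyTorch and boto3; the bucket name is made up):
|
|   import boto3
|   import torch
|
|   s3 = boto3.client("s3")
|
|   def save_checkpoint(model, optimizer, epoch, path="ckpt.pt"):
|       # local save is fast; the S3 copy is what survives the
|       # cheap instance disappearing
|       torch.save({"epoch": epoch,
|                   "model": model.state_dict(),
|                   "optim": optimizer.state_dict()}, path)
|       s3.upload_file(path, "my-training-checkpoints",
|                      "ckpt-epoch-%d.pt" % epoch)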
|
| I run a credit card processor on AWS, my important personal
| websites on Linode, and my fun time websites and video
| conferencing on a small Chicago host called Genesis Hosting
| with a website out of the 90s and dirt cheap pricing (but
| excellent support). Match your price and SLA with how much
| pain it's going to cause you if it goes down, and don't pay
| extra to put all your stuff on the same five-nines host if
| you don't really have to.
| ghoomketu wrote:
| For somebody who has zero knowledge of ML but has always been
| wary of the huge costs involved, this sure looks like a great
| offering!
|
| But I still don't know what it would cost to get something
| useful out of it. Can you (or anybody who knows about ML) tell
| me a very very ballpark amount of what costs it incurs to train
| a model like www.remove.bg that automatically removes
| background from photos? I'm not trying to build a clone, but
| I'm curious what sort of financial investment it takes to
| make such things.
| perennate wrote:
| Unless you're using unsupervised learning or can find a good
| dataset, most of the cost will be in labeling the data. I'm
| not too familiar with background removal; there may be some
| self-supervised/contrastive learning approaches, but
| generally they don't work as well as supervised learning.
| Even for a week of training on a single V100 here, the
| compute cost is under $200 ($0.99/hr x 24 x 7 ~= $166).
|
| Edit: maybe you can get some decent results just with COCO
| segmentation labels:
| https://towardsdatascience.com/background-removal-with-deep-...
| ashish01 wrote:
| Nice! I really like the top up feature. Really gives me peace
| of mind that I will not blow up my budget accidentally. I have
| been using https://datacrunch.io/ which has almost the same
| feature set.
| mraza007 wrote:
| Hey, I loved the product you made. As a student I'm always
| concerned about spending on cloud services, since they can be
| pricey. I'm curious, though: I've used Google Colab in the
| past, which charges $10/month, so I wanted to know how this
| compares to Colab.
|
| I would love to hear your thoughts on this
| chovybizzass wrote:
| was going to mine but you say not to.
| ilmoi wrote:
| thanks for reading the FAQ! Most people don't seem to
| bother:)
| Sanzig wrote:
| Is general CUDA development permitted, or can I only use
| this for deep learning?
|
| There's a simulation problem that I encounter often at work
| which should parallelize very well, and I was considering
| getting my feet wet with CUDA by writing a solver for it.
| ilmoi wrote:
| CUDA development is totally fine. The service was designed
| with DL in mind, but that shouldn't stop CUDA dev in any
| way.
| teruakohatu wrote:
| Are there any persistent storage options? I looked through
| the FAQ and comparisons and couldn't find any mention of it.
| iujjkfjdkkdkf wrote:
| Sorry if I missed it in the FAQ: do you have finite capacity,
| or do you have an agreement with the data center to bring on
| more GPUs as demand grows? My worry would be to wake up one
| day and find that all GPUs are in use.
|
| And (this may sound naive): I work with other companies' data
| and have a responsibility to them to keep it safe. Do you
| have any concerns with this being used for "professional"
| applications, or are you targeting research / hobby use?
|
| To be clear, I am dealing with things that my clients are
| comfortable with me working with on mainstream cloud providers,
| not state secrets or data with legislative requirements.
| artem_mazur wrote:
| Looks interesting! Cool!!
| dindresto wrote:
| How does the Tesla V100 compare to the Tesla P100, which Scaleway
| offers at a similar price of EUR1/hour?
|
| https://www.scaleway.com/en/gpu-instances/
| ilmoi wrote:
| Tesla P100 is the previous generation: P100 -> V100 -> A100,
| oldest to newest. You can see detailed benchmarking here -
| http://ai-benchmark.com/ranking_deeplearning.html
| knrz wrote:
| For others who may have been confused, like me, in order of
| GPU power:
|
| (Least) P100 < V100 < A100 (Most)
| ilmoi wrote:
| Sorry I should have been clearer in my reply. Thanks for
| clarifying!
| usmannk wrote:
| This is seriously cool. I want to mention that on AWS you can
| reliably get V100s for $0.918/hr (I do this all the time) by
| using spot instances. The spot price hasn't changed by even
| $0.01 for at least a year, so you're extremely unlikely to
| get an unexpected shutdown. Also, how did you manage to get
| V100s? What was that like?
| ilmoi wrote:
| Yep, you're correct that V100s are available as spot
| instances. If you're willing to write some extra code for
| checkpointing weights, you could easily go with them.
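|
| Something along these lines, polling the instance metadata
| endpoint for the two-minute interruption notice (a rough
| sketch; save_checkpoint is whatever your training code
| provides):
|
|   import time
|
|   import requests
|
|   # EC2 returns 404 here normally and 200 once an
|   # interruption has been scheduled (IMDSv1)
|   URL = ("http://169.254.169.254/latest/meta-data"
|          "/spot/instance-action")
|
|   def watch_for_interruption(save_checkpoint):
|       while True:
|           try:
|               if requests.get(URL, timeout=1).status_code == 200:
|                   save_checkpoint()  # ~2 min before shutdown
|                   return
|           except requests.exceptions.RequestException:
|               pass
|           time.sleep(5)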
|
| Actually, I'm renting V100s. Got lucky to know the right person
| at the right time:)
| usmannk wrote:
| Ah gotcha, did you have to move them to your DC and have them
| racked?
| mastermojo wrote:
| Very cool. Is there a way to persist disks at a cost and
| attach/detach Voltas for training?
|
| EDIT: found in the FAQ:
|
| Compute: $0.99/hr / 1x Tesla V100 (running instance only)
| Storage: $0.02/GB/month (running and stopped instances)
| ilmoi wrote:
| Yep - most people stop instances and come back whenever they're
| ready to train again. No charges for GPUs while the instance is
| stopped.
|
| If you have, say, a 200 GB hard drive, you're only paying
| $4/month for storage (200 GB x $0.02/GB/month).
| ffpip wrote:
| FYI, the site shows nothing until the scripts from stripe.com
| load. I know these breakages might not happen to all users,
| but the site shouldn't be completely broken until the browser
| connects to Stripe.
|
| https://images2.imgbox.com/b2/42/n87NuBRT_o.png
| ilmoi wrote:
| Do you think it's the scripts from Stripe that are blocking
| the view? I would have guessed it's Auth0.
|
| This was my first front-end project so any feedback 100%
| welcome.
| jandrese wrote:
| FWIW when I went to the site I had to whitelist scripts from
| js.stripe.com to get anything more than a black screen.
| ilmoi wrote:
| Ok, noted. That's for me to work on. Thanks for pointing it
| out.
| neil1 wrote:
| This is a pretty cool project
| maremmano wrote:
| I came for the GPU, I stayed for the vintage CSS
| iujjkfjdkkdkf wrote:
| One more comment / suggestion: I watched the video and saw
| there are lots of pre-configured environments. I see this a
| lot, but to be honest it's not that helpful for me, because
| I'm not doing development in your environment; I'm writing
| and checking the code somewhere else and then training on the
| GPUs.
|
| For the same reason, I'm only getting value out of the GPUs
| when I'm actually ready to train. So I would be much happier
| if I could push a Docker image, or maybe even a conda
| environment spec, along with my code, attach data storage,
| run e.g. train.py (or more likely a shell script that calls
| it) to completion, and then immediately release the GPUs.
| Everything else is just overhead: getting the environment
| right as quickly as possible, running my script, and shutting
| down the instance as soon as it's done. It would be awesome
| to have this kind of functionality, or is there a way to do
| that I missed? (A sketch of what I mean is below.)
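|
| Something like this is the flow I'm imagining (hypothetical:
| the host, image and paths are made up, and as far as I can
| tell gpu.land has no such API today):
|
|   import subprocess
|
|   HOST = "ubuntu@203.0.113.7"  # placeholder instance address
|
|   def run(cmd):
|       subprocess.run(["ssh", HOST, cmd], check=True)
|
|   # ship the image, train to completion, stop paying for GPUs
|   subprocess.run(
|       ["bash", "-c",
|        "docker save myimage | ssh %s 'docker load'" % HOST],
|       check=True)
|   run("docker run --gpus all -v /data:/data myimage"
|       " python train.py")
|   run("sudo poweroff")  # release the instance when done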
| paul_milovanov wrote:
| this is great, except apparently there's no support for
| containers? is that on the roadmap?
|
| thanks!
| 37ef_ced3 wrote:
| AVX-512 neural net inference on inexpensive, CPU-only cloud
| compute instances: https://NN-512.com
|
| An AVX-512 Skylake-X cloud compute instance costs $10 per
| CPU core per month at Vultr
| (https://www.vultr.com/products/cloud-compute/), and you can
| do about 18 DenseNet121 inferences per CPU core per second
| (in series, not batched) using tools like NN-512.
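|
| Back-of-the-envelope from those numbers (a sketch, not a
| benchmark):
|
|   cost_per_core_month = 10.00      # USD, Vultr Skylake-X
|   inferences_per_sec = 18          # DenseNet121, per core
|   secs_per_month = 30 * 24 * 3600
|
|   per_month = inferences_per_sec * secs_per_month  # ~46.7M
|   usd_per_million = cost_per_core_month / per_month * 1e6
|   print(round(usd_per_million, 2))  # ~0.21 USD per 1M inferences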
|
| GPU cloud compute is almost unbelievably expensive. Even Linode
| charges $1000 per month, or $1.50 per hour (look at the GPU
| plans: https://www.linode.com/pricing/#row--compute). It's really
| hard to keep that GPU saturated, which is what you need to do to
| get your money's worth
|
| As AVX-512 becomes better supported by Intel and AMD chips, it
| becomes more attractive as an alternative to expensive GPU
| instances for workloads with small amounts of inference mixed
| with other computation
| rckoepke wrote:
| What does this have to do with https://gpu.land/ ? I've been
| very impressed with NN-512 in your past postings, but I'm
| failing to see the direct relevance to the top-level post.
|
| Perhaps readers would benefit from an apples-to-apples
| comparison of a V100 to some CPU on a training-per-dollar
| metric?
| Preferably using something like MLPerf. You do mention
| inference but I think most people looking at https://gpu.land/
| will be far more interested in training rather than inference.
|
| I think the most direct competitor to this ShowHN would be
| https://vast.ai/
| ilmoi wrote:
| You are correct that vast.ai is pretty close.
|
| The biggest difference is probably security / guaranteed
| uptime. With vast you're getting what it says on the tin: a
| machine from a marketplace. It could come from anyone,
| anywhere, and there's no telling what else is running on it.
| Ours are hosted in a professional DC, managed and secured as
| they should be.
|
| If anyone's curious, there's a detailed comparison page with
| other platforms here - https://gpu.land/versus
| perennate wrote:
| I think Linode/Paperspace/AWS/GCP/etc. are closer to "direct
| competitors". I would be much more comfortable trusting a
| single entity that owns its servers than a GPU rental
| marketplace like vast.ai.
| 37ef_ced3 wrote:
| The suggestion is that you may not want to use a GPU if you
| can use AVX-512 CPUs for your machine learning workload.
| Reducing the cost of the GPU helps, but it's still relatively
| expensive
| akrymski wrote:
| Congrats on the launch! Cheaper GPUs are always welcome ;-)
| stuartbman wrote:
| This is a very cool project, but can I ask if there's a
| particular CSS package you used to get this site appearance?
| ilmoi wrote:
| Thanks! I used Tailwind CSS and built the theme from scratch,
| but I drew inspiration from:
|
| - https://nostalgic-css.github.io/NES.css/
|
| - https://jdan.github.io/98.css/
| DIVx0 wrote:
| Could this be used for bespoke cloud gaming? I was
| considering setting up a Parsec host with a Paperspace GPU. I
| want more control over the game client than GeForce Now or
| Stadia provides, and a better GPU and availability than
| Shadow.tech currently offers.
|
| *edit: I see that only Linux hosts are available at the
| moment, so that kills my use case, but I'll keep the question
| up for grins and giggles.
| ilmoi wrote:
| Not really. Firstly, it would be way too expensive for you
| (cloud gaming is like $10/mo; this is $10/10h). Secondly, the
| GPUs installed (Tesla V100s) are not designed for gaming, but
| rather for ML & scientific computing.
|
| You should check out some of the services people mention in
| https://www.reddit.com/r/cloudygamer/
| tombh wrote:
| These seem very similarly priced to AWS Spot Instances. But
| you're guaranteeing uptime?
| perennate wrote:
| In the FAQ it says:
|
| > Is my instance guaranteed?
|
| > Yes, unless you run out of credit. To prevent that from
| happening, be sure to set up automatic top-ups.
| samsammurphy wrote:
| Cooooool!
| DougN7 wrote:
| I know nothing about crypto mining, but why wouldn't this be
| allowed? How would you stop it?
| ilmoi wrote:
| You'll be making around $6/day mining, while paying $24/day
| for the service.
| ska wrote:
| See jsnell's point, which is worth considering and mitigating
| if your (CC) processor can't catch it.
| imhoguy wrote:
| Wouldn't pay off really, check the FAQ.
| jsnell wrote:
| Mining cryptocurrencies is a good way to launder stolen
| credit card numbers into cash. It is scalable (unlike most
| goods) and automatable (unlike most physical goods). It being
| unprofitable will not dissuade anyone from this use case: they
| are not paying with their own money to start with.
|
| So it is best to forbid it in the TOS and just autokill the
| miners, rather than wait for the chargebacks. Especially if the
| compute is being sold at cost, since there is no buffer of
| profits to balance out the abuse.
| ilmoi wrote:
| Wow, thanks for posting this! I definitely hadn't thought
| about this use case. So interesting.
|
| So we actually explicitly prohibit mining in the T&Cs; it's
| just that in the FAQ I wanted to be a bit more human about
| dissuading people.
|
| Also, there are quite a few protections built in at the
| network layer to make sure mining isn't possible: ports, IPs,
| DNS, even DPI. I learnt a lot about the early days of Bitcoin
| and mining protocols when I was building gpu.land:)
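|
| To give a flavor of just one layer (an illustrative sketch
| only, not our actual ruleset; the real filtering covers far
| more ports, pool IP lists and DPI signatures):
|
|   import subprocess
|
|   # Stratum mining pools commonly listen on ports like
|   # 3333/4444/9999; drop outbound TCP to them
|   for port in ("3333", "4444", "9999"):
|       subprocess.run(["iptables", "-A", "OUTPUT", "-p", "tcp",
|                       "--dport", port, "-j", "DROP"],
|                      check=True)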
|
| But again, thanks for sharing this. Really good to know.
| perennate wrote:
| The website design put me off at first, but the FAQ makes it
| clear that this is quite professional. They have information
| about physical security, GDPR compliance, and VLANs. I wish
| there were some information about storage, e.g. whether it's
| local disk or distributed block storage, and how many
| replicas. Very nice that unused credit is refundable.
| ilmoi wrote:
| Great point re storage, I'll be sure to add it to the FAQ. To
| answer your question: it's SSD block storage.
___________________________________________________________________
(page generated 2021-03-17 23:02 UTC)