[HN Gopher] Reverse Engineering Google Colab
       ___________________________________________________________________
        
       Reverse Engineering Google Colab
        
       Author : arjvik
       Score  : 98 points
       Date   : 2022-06-23 16:03 UTC (6 hours ago)
        
 (HTM) web link (dagshub.com)
 (TXT) w3m dump (dagshub.com)
        
       | a-dub wrote:
       | https://github.com/singhsidhukuldeep/Google-Colab-Shell
       | 
       | pops a terminal inline in the colab notebook, running on the
       | backing vm. super useful if you get tired of having to
       | shell-execute everything via the cell interface.
        
         | datageddon wrote:
         | There's also colab-ssh [1] that sets up an SSH tunnel (through
         | cloudflared) and allows you to connect from your ssh client in
         | your own terminal.
         | 
         | [1] https://github.com/WassimBenzarti/colab-ssh
        
       | BrianHenryIE wrote:
       | There's an active effort to (again) implement Swift on Colab:
       | 
       | https://github.com/philipturner/swift-colab/
        
       | [deleted]
        
       | cperry wrote:
       | Impressive work.
       | 
       | Just came here to note that we read all of our in-product
       | feedback submissions as well as GitHub issues:
       | https://github.com/googlecolab/colabtools/issues
       | 
       | If you've got feature requests or encounter bugs we appreciate
       | you filing!
        
         | DiogenesKynikos wrote:
         | Question: Why does Google not allow children to use Colab?
         | 
         | I can imagine plenty of teenagers interested in programming
         | would like to tinker on Colab. However, Google restricts the
         | service to people 18 and above.
        
           | cperry wrote:
           | Where are you seeing 18+ restrictions? I went through a lot
           | last year to get us approved for 13+, so we'd be good at
           | least down to middle school or so.
        
         | nalzok wrote:
         | Do you have a plan to expose some high-level API endpoints? I
         | have been dreaming about something like
         | `run.colab.research.google.com/<notebook_url>?runtime=gpu`
         | which executes a Colab notebook without human interference.
         | This can be extremely helpful in CI/CD environments when you
         | have a lot of notebooks to test, e.g. for
         | https://github.com/probml/pyprobml.
        
           | elashri wrote:
           | Colab is interactive by design. They even introduced CAPTCHAs
           | to make sure you don't start long training runs and then go
           | do something else.
        
           | cperry wrote:
           | No plans at this time; we try to prioritize interactive
           | compute features. But this would be really cool to do! Maybe
           | in the future.
        
       | sillysaurusx wrote:
       | It's about a bazillion times easier to reverse engineer colab if
       | you just SSH into it. You can set up a reverse proxy. I used
       | ngrok back in the day, but maybe they blocked it.
       | 
       | The most interesting thing was a custom binary that mounts your
       | Google Drive as a folder. I was able to copy it off Colab and use
       | it on my own Linux boxes, which was handy in an "oh neat, lookie
       | there" kind of way. I assume it'll break whenever they update
       | their API, but you'd still be able to just grab the new binary
       | from a random Colab instance.
       | 
       | There's also a custom script they run to set up everything, using
       | Node. It spawns a bunch of stuff that I've forgotten. (It was
       | 2019 when I was poking around, and a pandemic has a nice way of
       | wiping one's memory of ye olden hacking days. Still a bit sad I
       | never got to go to the tensorflow conference.)
       | 
       | Anyway just ssh in and ls -la / and you'll see one or two
       | interesting folders. You can rsync them down to your box and
       | examine at your leisure.
        
         | sva_ wrote:
         | It should be noted that this is against their rules, and you
         | might get worse instances if they somehow detect it.
         | 
         | https://research.google.com/colaboratory/faq.html#limitation...
        
           | teruakohatu wrote:
           | Doesn't Pro allow SSH?
        
             | sva_ wrote:
             | I looked at the license agreement, and it says under "5.
             | Restrictions"
             | 
             |  _> circumvent, reverse-engineer, modify, disable, or
             | otherwise tamper with any security technology that Google
             | uses to protect the Paid Service or encourage or help
             | anyone else to do so;
             | 
             | > access the Paid Service other than by means authorized by
             | Google; or _
             | 
             | I'm not sure what exactly they mean by "means authorized by
             | Google".
             | 
             | https://colab.research.google.com/pro/terms/v1
        
           | sillysaurusx wrote:
           | I think they're just trying to fight abuse. You can do
           | everything from colab that you can do from ssh anyway. It's
           | just faster to enter commands.
           | 
           | Good catch though. I didn't know that.
           | 
           | When I originally figured out how to ssh in, I kept it a
           | secret figuring that it'd be a matter of time till they
           | clamped down. Guess it took a few years, or I just missed it.
           | Bunch of us in the ML scene used to do it regularly, since
           | it's way easier to monitor a training run via tmux.
        
             | RyEgswuCsn wrote:
             | I think they do shut you out if you try to spin up any
             | process through "unauthorised" means. There have been many
             | projects that offer automated setup of SSH/VNC/VSCode on a
             | Colab instance, and in my experience Colab somehow manages
             | to shut off the connections soon after I start them.
        
           | dinvlad wrote:
           | I would imagine their threat model pretty much assumes anyone
           | can do anything on that host :-)
        
       | minimaxir wrote:
       | > However, it's incredibly difficult to harness the compute power
       | of Colab for anything beyond Jupyter notebooks. For Machine
       | Learning engineers that want to productionize their models and
       | bring them out of the notebook stage, this is a particularly
       | relevant issue; notebooks, while perfect for exploration, don't
       | play well with more advanced MLOps tools that codify the training
       | process into a formal pipeline.
       | 
       | That isn't what Colab is intended for. Google has better and more
       | productive tools for companies who can fit the bill, which is
       | getting cheaper over time.
       | 
       | AI Notebooks behave the same in practice as Google Colab, with
       | one-click on/off for model testing plus JupyterLab. If you want to
       | minimize costs via spot instances, you can deploy a Compute
       | Engine VM with the Deep Learning VM image, which also runs
       | JupyterLab on launch if you need that workflow, and saves time by
       | including your framework of choice. A spot VM with a T4 GPU is
       | about $0.18/hour.
        
       ___________________________________________________________________
       (page generated 2022-06-23 23:01 UTC)