https://github.com/kevmo314/scuda

# SCUDA: GPU-over-IP

SCUDA is a GPU-over-IP bridge that allows GPUs on remote machines to be attached to CPU-only machines.

## Demo

The demo below shows an NVIDIA GeForce RTX 4090 running on a remote machine (right pane). The left pane is a Mac running a Docker container with the NVIDIA utilities installed. The container runs `python3 -c "import torch; print(torch.cuda.is_available())"` to check whether CUDA is available. You can view the Docker image used here.

Screen.Recording.2024-10-08.at.8.27.07.PM.mp4
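## How it works

SCUDA's client is a shared library loaded with `LD_PRELOAD` (see Building from source below). It exports the same symbols as the CUDA libraries, so it intercepts the application's CUDA calls and routes them over TCP to the server running on the GPU host. Below is a minimal sketch of the interception half of that technique; it is illustrative only, the symbol chosen and the fallback behavior are assumptions for the example, and SCUDA's actual client lives in `client.cpp`.

```cpp
// shim.cpp -- an illustrative LD_PRELOAD shim, not SCUDA's actual client.
// Build: g++ -shared -fPIC -o libshim.so shim.cpp -ldl
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <cstdio>

// ABI-compatible stand-in for the enum in the real CUDA headers.
typedef int cudaError_t;

extern "C" cudaError_t cudaGetDeviceCount(int *count) {
    // Locate the "real" implementation further down the link chain.
    using real_fn = cudaError_t (*)(int *);
    static real_fn real =
        reinterpret_cast<real_fn>(dlsym(RTLD_NEXT, "cudaGetDeviceCount"));

    fprintf(stderr, "[shim] intercepted cudaGetDeviceCount\n");

    // A GPU-over-IP client would serialize the call here, send it over TCP
    // to the server, and return the server's answer to the caller.
    if (!real) {
        *count = 0; // no local runtime to defer to
        return 0;   // cudaSuccess
    }
    return real(count);
}
```

Preloading this into a dynamically linked CUDA application (`LD_PRELOAD=./libshim.so ./app`) logs each intercepted call; SCUDA's `libscuda.so` applies the same idea across the CUDA API surface, swapping the local call for a round trip to the server.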
## Local development

Make the local dev script executable:

```sh
chmod +x local.sh
```

It is also helpful to alias this local script in your bash profile:

```sh
alias s='/home/brodey/scuda-latest/local.sh'
```

The SCUDA server must be running before any client commands are issued:

```sh
s server
```

### Running the client

With the server above running:

```sh
s run
```

This rebuilds the client and runs `nvidia-smi` for you.

## Installation

To install SCUDA, run the server binary on the GPU host:

```sh
scuda -l 0.0.0.0:0
```

Then, on the client, run:

```sh
scuda <host>:<port>
```

## Building from source

```sh
nvcc -shared -o libscuda.so client.cpp
```

The library can then be preloaded:

```sh
LD_PRELOAD=libscuda.so nvidia-smi
```

By default, the client library passes calls straight through to the local CUDA libraries; in other words, it does not connect to a server. To connect to a server, create a file at `~/.config/scuda/host` containing the host you wish to connect to.
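For example (a minimal sketch; the address is a placeholder for your own server's host and port):

```sh
mkdir -p ~/.config/scuda
echo "<host>:<port>" > ~/.config/scuda/host
```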
## Motivations

The goal of SCUDA is to enable developers to easily interact with GPUs over a network in order to take advantage of various pools of distributed GPUs. TCP is obviously slower than a direct local connection, but we have plans to minimize the performance impact through various optimizations.

Some use cases / motivations:

1. Local testing - For testing purposes, the latency added by TCP is acceptable, as the goal is to verify compatibility and performance rather than to achieve the lowest latency. The remote GPU can still fully accelerate the application, allowing a developer to run tests they otherwise couldn't on their local setup.
2. Aggregated GPU pools - The goal is to centralize GPU management and resource allocation, making it easier to deploy and scale containerized applications that need GPU support without worrying about GPU availability. SCUDA will eventually handle capacity management and pooling.
3. Remote model training - Developers can train models from their laptops or low-power devices, using GPUs optimized for training, without needing to deploy a full VM or move the entire development environment to the remote machine.
4. Remote inferencing - Developers can set up their application locally but direct all CUDA calls for model inference to a remote GPU server. The application can thus process large batches of images or video frames using the remote GPU's acceleration.
5. Remote data processing - Developers can run operations like filtering, joining, and aggregating data directly on the remote GPU, with the results transferred back over the network. For example, matrix multiplication or other linear algebra on large datasets can be offloaded to a remote GPU while the scripts themselves run locally.
6. Remote fine-tuning - Developers can download a pre-trained model (e.g., ResNet) and fine-tune it. With SCUDA, training happens remotely: the library routes PyTorch CUDA calls over TCP to a remote GPU, allowing the developer to drive the fine-tuning process from their local machine or Jupyter Notebook environment.

## Future goals

See our TODO.

## Prior Art

This project is inspired by some existing proprietary solutions:

* https://www.thundercompute.com/
* https://www.juicelabs.co/
* https://en.wikipedia.org/wiki/RCUDA (that's where SCUDA's name comes from: S is the next letter after R!)

## Benchmarks

todo