https://github.com/kevmo314/scuda

# SCUDA: GPU-over-IP

SCUDA is a GPU-over-IP bridge that allows GPUs on remote machines to be attached to CPU-only machines.

## Demo

The demo below shows an NVIDIA GeForce RTX 4090 running on a remote machine (right pane). The left pane is a Mac running a Docker container with the NVIDIA utilities installed. The container runs `python3 -c "import torch; print(torch.cuda.is_available())"` to check whether CUDA is available. You can view the Docker image used here.

Screen.Recording.2024-10-08.at.8.27.07.PM.mp4
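## How it works

SCUDA's client is a shared library loaded with `LD_PRELOAD` (see Building from source below). It exports the same symbols as the CUDA libraries, so it intercepts the application's CUDA calls and routes them over TCP to the server running on the GPU host. Below is a minimal sketch of the interception half of that technique; it is illustrative only, the symbol chosen and the fallback behavior are assumptions for the example, and SCUDA's actual client lives in `client.cpp`.

```cpp
// shim.cpp -- an illustrative LD_PRELOAD shim, not SCUDA's actual client.
// Build: g++ -shared -fPIC -o libshim.so shim.cpp -ldl
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <cstdio>

// ABI-compatible stand-in for the enum in the real CUDA headers.
typedef int cudaError_t;

extern "C" cudaError_t cudaGetDeviceCount(int *count) {
    // Locate the "real" implementation further down the link chain.
    using real_fn = cudaError_t (*)(int *);
    static real_fn real =
        reinterpret_cast<real_fn>(dlsym(RTLD_NEXT, "cudaGetDeviceCount"));

    fprintf(stderr, "[shim] intercepted cudaGetDeviceCount\n");

    // A GPU-over-IP client would serialize the call here, send it over TCP
    // to the server, and return the server's answer to the caller.
    if (!real) {
        *count = 0; // no local runtime to defer to
        return 0;   // cudaSuccess
    }
    return real(count);
}
```

Preloading this into a dynamically linked CUDA application (`LD_PRELOAD=./libshim.so ./app`) logs each intercepted call; SCUDA's `libscuda.so` applies the same idea across the CUDA API surface, swapping the local call for a round trip to the server.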
## Local development

Make the local dev script executable:

```sh
chmod +x local.sh
```

It is also helpful to alias this local script in your bash profile:

```sh
alias s='/home/brodey/scuda-latest/local.sh'
```

The SCUDA server must be running before any client commands are issued:

```sh
s server
```

### Running the client

With the server above running:

```sh
s run
```

This rebuilds the client and runs `nvidia-smi` for you.

## Installation

To install SCUDA, run the server binary on the GPU host:

```sh
scuda -l 0.0.0.0:0
```

Then, on the client, run:

```sh
scuda <host>:<port>
```

## Building from source

```sh
nvcc -shared -o libscuda.so client.cpp
```

The library can then be preloaded:

```sh
LD_PRELOAD=libscuda.so nvidia-smi
```

By default, the client library passes calls straight through to the local CUDA libraries; in other words, it does not connect to a server. To connect to a server, create a file at `~/.config/scuda/host` containing the host you wish to connect to.
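For example (a minimal sketch; the address is a placeholder for your own server's host and port):

```sh
mkdir -p ~/.config/scuda
echo "<host>:<port>" > ~/.config/scuda/host
```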
## Motivations

The goal of SCUDA is to enable developers to easily interact with GPUs over a network in order to take advantage of various pools of distributed GPUs. TCP is obviously slower than a direct local connection, but we have plans to minimize the performance impact through various optimizations.

Some use cases / motivations:

1. Local testing - For testing purposes, the latency added by TCP is acceptable, as the goal is to verify compatibility and performance rather than to achieve the lowest latency. The remote GPU can still fully accelerate the application, allowing a developer to run tests they otherwise couldn't on their local setup.
2. Aggregated GPU pools - The goal is to centralize GPU management and resource allocation, making it easier to deploy and scale containerized applications that need GPU support without worrying about GPU availability. SCUDA will eventually handle capacity management and pooling.
3. Remote model training - Developers can train models from their laptops or low-power devices, using GPUs optimized for training, without needing to deploy a full VM or move the entire development environment to the remote machine.
4. Remote inferencing - Developers can set up their application locally but direct all CUDA calls for model inference to a remote GPU server. The application can thus process large batches of images or video frames using the remote GPU's acceleration.
5. Remote data processing - Developers can run operations like filtering, joining, and aggregating data directly on the remote GPU, with the results transferred back over the network. For example, matrix multiplication or other linear algebra on large datasets can be offloaded to a remote GPU while the scripts themselves run locally.
6. Remote fine-tuning - Developers can download a pre-trained model (e.g., ResNet) and fine-tune it. With SCUDA, training happens remotely: the library routes PyTorch CUDA calls over TCP to a remote GPU, allowing the developer to drive the fine-tuning process from their local machine or Jupyter Notebook environment.

## Future goals

See our TODO.

## Prior Art

This project is inspired by some existing proprietary solutions:

* https://www.thundercompute.com/
* https://www.juicelabs.co/
* https://en.wikipedia.org/wiki/RCUDA (that's where SCUDA's name comes from: S is the next letter after R!)

## Benchmarks

todo