# BanditPAM: Almost Linear-Time $k$-Medoids Clustering

This repo contains a high-performance implementation of BanditPAM from *BanditPAM: Almost Linear-Time k-Medoids Clustering*. The code can be called directly from Python, R, or C++.

If you use this software, please cite:

Mo Tiwari, Martin Jinye Zhang, James Mayclin, Sebastian Thrun, Chris Piech, Ilan Shomorony. "BanditPAM: Almost Linear Time k-medoids Clustering via Multi-Armed Bandits" Advances in Neural Information Processing Systems (NeurIPS) 2020.
```
@inproceedings{BanditPAM,
  title={BanditPAM: Almost Linear Time k-medoids Clustering via Multi-Armed Bandits},
  author={Tiwari, Mo and Zhang, Martin J and Mayclin, James and Thrun, Sebastian and Piech, Chris and Shomorony, Ilan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={368--374},
  year={2020}
}
```

## Requirements

TL;DR: run `pip install banditpam` or `install.packages("banditpam")` and jump to the examples.

If you have any difficulties, please see the platform-specific guides, and file a GitHub issue if you still have trouble.

## Further Reading

* Full paper
* 3-minute summary video
* Blog post
* Code
* PyPI
* Documentation

## Python Quickstart

### Install the repo and its dependencies

This can be done either through PyPI (recommended):

```
/BanditPAM/: pip install -r requirements.txt
/BanditPAM/: pip install banditpam
```

OR through the source code via:

```
/BanditPAM/: git submodule update --init --recursive
/BanditPAM/: cd headers/carma
/BanditPAM/: mkdir build && cd build && cmake -DCARMA_INSTALL_LIB=ON .. && sudo cmake --build . --config Release --target install
/BanditPAM/: cd ../../..
/BanditPAM/: pip install -r requirements.txt
/BanditPAM/: sudo pip install .
```
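After either install route, it can be useful to confirm that the package is visible to the Python interpreter you intend to use. The snippet below is a small sketch using only the standard library; `installed_version` is a hypothetical helper name and is not part of BanditPAM.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("banditpam")
    if version is None:
        print("banditpam is not installed in this environment")
    else:
        print(f"banditpam {version} is installed")
```

If the first message appears even though `pip install` succeeded, `pip` and `python` may be pointing at different environments.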
### Example 1: Synthetic data from a Gaussian Mixture Model

```python
from banditpam import KMedoids
import numpy as np
import matplotlib.pyplot as plt

# Generate data from a Gaussian Mixture Model with the given means:
np.random.seed(0)
n_per_cluster = 40
means = np.array([[0,0], [-5,5], [5,5]])
X = np.vstack([np.random.randn(n_per_cluster, 2) + mu for mu in means])

# Fit the data with BanditPAM:
kmed = KMedoids(n_medoids=3, algorithm="BanditPAM")
kmed.fit(X, 'L2')

print(kmed.average_loss)  # prints 1.2482391595840454
print(kmed.labels)  # prints cluster assignments [0] * 40 + [1] * 40 + [2] * 40

# Visualize the data and the medoids:
for p_idx, point in enumerate(X):
    if p_idx in map(int, kmed.medoids):
        plt.scatter(X[p_idx, 0], X[p_idx, 1], color='red', s=40)
    else:
        plt.scatter(X[p_idx, 0], X[p_idx, 1], color='blue', s=10)

plt.show()
```

### Example 2: MNIST and its medoids visualized via t-SNE

```python
# Start in the repository root directory, i.e. '/BanditPAM/'.
from banditpam import KMedoids
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Load the 1000-point subset of MNIST and calculate its t-SNE embeddings for visualization:
X = pd.read_csv('data/MNIST_1k.csv', sep=' ', header=None).to_numpy()
X_tsne = TSNE(n_components=2).fit_transform(X)

# Fit the data with BanditPAM:
kmed = KMedoids(n_medoids=10, algorithm="BanditPAM")
kmed.fit(X, 'L2')

# Visualize the data and the medoids via t-SNE:
for p_idx, point in enumerate(X):
    if p_idx in map(int, kmed.medoids):
        plt.scatter(X_tsne[p_idx, 0], X_tsne[p_idx, 1], color='red', s=40)
    else:
        plt.scatter(X_tsne[p_idx, 0], X_tsne[p_idx, 1], color='blue', s=5)

plt.show()
```

## R Examples

Please see here.

## Documentation

Documentation for BanditPAM can be found on Read the Docs.

## Building the C++ executable from source

Please note that it is NOT necessary to build the C++ executable from source to use the Python code above.
However, if you would like to use the C++ executable directly, follow the instructions below.

### Option 1: Building with Docker

We highly recommend building using Docker. One can download and install Docker by following the instructions at the Docker install page. Once you have Docker installed and the Docker Daemon is running, run the following commands:

```
/BanditPAM/scripts/docker$ chmod +x env_setup.sh
/BanditPAM/scripts/docker$ ./env_setup.sh
/BanditPAM/scripts/docker$ ./run_docker.sh
```

which will start a Docker instance with the necessary dependencies. Then:

```
/BanditPAM$ mkdir build && cd build
/BanditPAM/build$ cmake .. && make
```

This will create an executable named `BanditPAM` in `BanditPAM/build/src`.

### Option 2: Installing requirements and building directly

Building this repository requires four external requirements:

* CMake >= 3.17
* Armadillo >= 10.5.3
* OpenMP >= 2.5 (OpenMP is supported by default on most Linux platforms, and can be downloaded through homebrew on MacOS)
* CARMA >= 0.6.2

If installing these requirements from source, one can generally use the following procedure to install each requirement from the library's root folder (with armadillo used as an example here):

```
/armadillo$ mkdir build && cd build
/armadillo/build$ cmake .. && make && sudo make install
```

Note that CARMA has different installation instructions; see its instructions.

#### Platform-specific installation guides

Further installation information for MacOS, Linux, and Windows is available in the docs folder.

Ensure all the requirements above are installed and then run:

```
/BanditPAM$ mkdir build && cd build
/BanditPAM/build$ cmake .. && make
```

This will create an executable named `BanditPAM` in `BanditPAM/build/src`.
## C++ Usage

Once the executable has been built, it can be invoked with:

```
/BanditPAM/build/src/BanditPAM -f [path/to/input.csv] -k [number of clusters]
```

* `-f` is mandatory and specifies the path to the dataset
* `-k` is mandatory and specifies the number of clusters with which to fit the data

For example, if you ran `./env_setup.sh` and downloaded the MNIST dataset, you could run:

```
/BanditPAM/build/src/BanditPAM -f ../data/MNIST_1k.csv -k 10
```

The expected output in the command line will be:

```
Medoids: 694,168,306,714,324,959,527,251,800,737
```

## Implementing a custom distance metric

One of the advantages of $k$-medoids is that it works with arbitrary distance metrics; in fact, your "metric" need not even be a real metric -- it can be negative, asymmetric, and/or not satisfy the triangle inequality or homogeneity. Any pairwise dissimilarity function works with $k$-medoids.

This also allows for clustering of "exotic" objects like trees, graphs, natural language, and more -- settings where running $k$-means wouldn't even make sense. We talk about one such setting in the full paper.

The package currently supports a number of distance metrics, including all $L_p$ losses and cosine distance.

If you're willing to write a little C++, you only need to add a few lines to `kmedoids_algorithm.cpp` and `kmedoids_algorithm.hpp` to implement your distance metric / pairwise dissimilarity!

Then, be sure to re-install the repository with `pip install .` (note the trailing `.`).

The maintainers of this repository are working on permitting arbitrary dissimilarity metrics that users write in Python, as well; see #4.

## Testing

To run the full suite of tests, run in the root directory:

```
/BanditPAM$ python -m unittest discover -s tests
```

Alternatively, to run a "smaller" set of tests, from the main repo folder run `python tests/test_smaller.py`, or run `python tests/test_larger.py` for a set of longer, more intensive tests.
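When experimenting with a custom dissimilarity, a brute-force reference can help sanity-check results on very small inputs. The sketch below is plain NumPy and independent of the package: `total_loss` and `brute_force_medoids` are hypothetical helper names, the asymmetric dissimilarity is made up for illustration, and exhaustive search is not how BanditPAM works. It only illustrates that the $k$-medoids objective needs nothing more than a matrix of pairwise dissimilarities, which may be asymmetric.

```python
from itertools import combinations

import numpy as np


def total_loss(D, medoids):
    """Average dissimilarity from each point to its closest medoid.

    D[i, j] is the dissimilarity from point i to candidate medoid j;
    it does not need to be symmetric.
    """
    return D[:, list(medoids)].min(axis=1).mean()


def brute_force_medoids(D, k):
    """Try every size-k subset of points and keep the one with the lowest loss."""
    n = D.shape[0]
    return min(combinations(range(n), k), key=lambda m: total_loss(D, m))


# Made-up asymmetric dissimilarity on 1-D points:
# reaching a medoid with a larger value costs three times the gap.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
diff = points[None, :] - points[:, None]  # diff[i, j] = points[j] - points[i]
D = np.where(diff > 0, 3.0 * diff, -diff)

medoids = brute_force_medoids(D, k=2)
labels = D[:, list(medoids)].argmin(axis=1)  # index into `medoids` for each point
print("medoid indices:", medoids)
print("average loss:", total_loss(D, medoids))
print("labels:", labels.tolist())
```

Exhaustive search grows combinatorially with the number of points, so this is only practical for a handful of them; its value is as a ground truth to compare against on toy data.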
## Credits

* Mo Tiwari wrote the original Python implementation of BanditPAM and many features of the C++ implementation. Mo now maintains the C++ implementation.
* James Mayclin developed the initial C++ implementation of BanditPAM.

The original BanditPAM paper was published by Mo Tiwari, Martin Jinye Zhang, James Mayclin, Sebastian Thrun, Chris Piech, and Ilan Shomorony.

We would like to thank Jerry Quinn, David Durst, Geet Sethi, and Max Horton for helpful guidance regarding the C++ implementation.