[HN Gopher] Show HN: Want something better than k-means? Try Ban...
       ___________________________________________________________________
        
       Show HN: Want something better than k-means? Try BanditPAM
        
        Want something better than k-means? I'm happy to announce that
        our SOTA k-medoids algorithm from NeurIPS 2020, BanditPAM, is now
        publicly available! `pip install banditpam` or
        `install.packages("banditpam")` and you're good to go!

        k-means is one of the most widely used algorithms for clustering
        data. However, it has several limitations: a) it requires the L2
        distance for efficient clustering, which b) restricts the data
        you're clustering to vectors, and c) its cluster centers (the
        means) are generally not actual datapoints in the dataset.

        Unlike k-means, the k-medoids problem requires cluster centers
        to be actual datapoints, which permits greater interpretability
        of your cluster centers. k-medoids also supports arbitrary
        distance metrics, so your clustering can be more robust to
        outliers if you use a metric like L1. Despite these advantages,
        most people don't use k-medoids because prior algorithms were
        too slow.

        In our NeurIPS 2020 paper, BanditPAM, we sped up the best known
        algorithm from O(n^2) to O(n log n) using techniques from
        multi-armed bandits. We were inspired by prior research showing
        that many algorithms can be sped up by sampling the data
        intelligently instead of performing exhaustive computations.

        We've released our implementation, which is pip- and
        CRAN-installable. It's written in C++ for speed but callable
        from Python and R. It also supports parallelization and
        intelligent caching, at no extra complexity to end users. The
        interface matches the sklearn.cluster.KMeans interface, so
        minimal changes are necessary in existing code.
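
        For example, here's a minimal sketch of the Python interface
        (based on the package's README; exact parameter and attribute
        names may differ, so check the docs):

            from banditpam import KMedoids
            import numpy as np

            X = np.random.rand(1000, 10)  # toy data; any numeric matrix

            # k-medoids with k = 5 under the L1 distance
            kmed = KMedoids(n_medoids=5, algorithm="BanditPAM")
            kmed.fit(X, "L1")

            # indices of the chosen medoids (actual datapoints in X)
            print(kmed.medoids)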

        PyPI: https://pypi.org/project/banditpam
        CRAN: https://cran.r-project.org/web/packages/banditpam/index.html
        Repo: https://github.com/motiwari/BanditPAM
        Paper: https://arxiv.org/abs/2006.06856

        If you find our work valuable, please consider starring the repo
        or citing our work. These help us continue development on this
        project.

        I'm Mo Tiwari (motiwari.com), a PhD student in Computer Science
        at Stanford University. A special thanks to my collaborators on
        this project, Martin Jinye Zhang, James Mayclin, Sebastian
        Thrun, Chris Piech, and Ilan Shomorony, as well as the author of
        the R package, Balasubramanian Narasimhan.

        (This is my first time posting on HN; I've read the FAQ before
        posting, but please let me know if I broke any rules.)
        
       Author : motiwari
       Score  : 121 points
       Date   : 2023-04-04 20:16 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | alsodumb wrote:
        | Awesome work, and a great HN contribution! I loved your blog
        | post; it gave me the gist of everything I wanted to know
        | without reading the paper (I have a background in bandits).
       | 
       | I have a couple of non-technical questions:
       | 
        | 1. Can you share some background on how this work developed? I
        | am guessing there were many attempts to improve PAM over the
        | last three decades, right? And in hindsight, a bandit-based
        | approach seems like a natural thing to try. Did you start by
        | trying to improve PAM and then realize no one else had thought
        | of a probabilistic/randomized approach?
       | 
        | 2. Once you realized the multi-armed bandit approach was the
        | way to go, did implementing the idea and the empirical
        | evaluation take a lot of time? I am guessing most of the
        | effort went into proving the complexity guarantees, right?
       | 
        | 3. The paper has an interesting set of authors from diverse
        | areas - areas in which the k-medoids problem seems highly
        | relevant. This was partly why I asked question 1: was the
        | project motivated by the need for such an algorithm in those
        | application areas, or by looking for an area to apply the
        | insight that bandit-based approaches can actually perform
        | better?
       | 
       | Overall, I really like the life-cycle of the entire paper. It
       | started with a highly relevant and practical problem, gave an
       | intuitive algorithm that comes with complexity bounds, has an
       | accessible blog post to support the paper, and has what seems to
       | be a very efficient implementation that can directly be used in
       | production at scale. A lot of researchers miss the last part and
       | move on to the next project (I am guilty of that) - kudos to you
       | for spending time on the implementation! If you ever end up at
       | UIUC, I'd love to buy you a coffee (:
       | 
        | PS: I am a grad student at UIUC and was scrolling by and
        | stopped when I saw two familiar names: Ilan (I took Random
        | Processes with him and loved it) and, of course, Prof. Thrun -
        | who in robotics wouldn't know him? For those who don't, his
        | Probabilistic Robotics is a mandatory reference in every
        | robotics class.
        
         | motiwari wrote:
         | Thank you for the positive feedback! Will definitely take you
         | up on that coffee next time I visit Ilan :)
         | 
         | To answer your questions:
         | 
         | 1. When looking at k-medoids algorithms, we realized that PAM
         | hadn't been improved since the 80s! Other, faster algorithms
         | had been developed but sacrificed solution quality (clustering
         | loss) for speed. Simultaneously, prior work from my coauthors
         | had recognized that the 1-medoid problem could be solved via
          | randomized algorithms/multi-armed bandits -- much faster but
         | returning the same solution. Our key insight was that _every_
         | stage of PAM could be recast as a multi-armed bandit problem,
         | and that reusing information across different stages would
         | result in further speedups.
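          | 
          | Roughly, the 1-medoid version looks like this (a toy sketch
          | of the adaptive-sampling idea, not our actual
          | implementation):
          | 
          |     import numpy as np
          | 
          |     def one_medoid_bandit(X, batch=100, delta=0.01, seed=0):
          |         # Each point is an "arm"; its unknown value is its
          |         # mean distance to all other points. Sample reference
          |         # points in batches, keep confidence intervals, and
          |         # eliminate arms that cannot be the best.
          |         rng = np.random.default_rng(seed)
          |         n = len(X)
          |         alive = np.arange(n)  # candidate medoids in play
          |         sums = np.zeros(n)    # running sums of distances
          |         counts = np.zeros(n)  # samples drawn per candidate
          |         while len(alive) > 1 and counts[alive[0]] < n:
          |             refs = rng.integers(0, n, size=batch)
          |             for i in alive:
          |                 d = np.linalg.norm(X[i] - X[refs], axis=1)
          |                 sums[i] += d.sum()
          |                 counts[i] += batch
          |             means = sums[alive] / counts[alive]
          |             # Hoeffding-style radius; constants illustrative
          |             ci = np.sqrt(np.log(1 / delta) / counts[alive])
          |             best_ucb = (means + ci).min()
          |             alive = alive[means - ci <= best_ucb]
          |         # exact evaluation for the few survivors
          |         exact = [np.linalg.norm(X[i] - X, axis=1).mean()
          |                  for i in alive]
          |         return alive[int(np.argmin(exact))]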
         | 
         | 2. Actually, the complexity guarantees/theory were pretty easy
         | because we were able to use proof techniques that are common in
         | the multi-armed bandit literature. The hardest part was
         | implementing it in C++ and making it available to both Python
         | and R via bindings. For the original paper we did everything in
         | Python and measured sample complexity, but to make the
         | algorithm valuable for users we had to implement it in a more
         | performant language.
         | 
          | 3. To be honest, this project came about from a chance
          | meeting at a conference (ICML 2019). I was randomly
          | introduced to Martin Zhang (funnily enough, I met him for
          | the first time at the conference even though we were both at
          | Stanford). Martin and Ilan had deep expertise in multi-armed
          | bandits/randomized algorithms (and had solved the 1-medoid
          | problem), and I got really interested in their work when
          | talking with them, primarily because it seemed like a
          | straightforward win that would be useful for a lot of
          | people.
        
       | yeldarb wrote:
       | Is there a good heuristic for picking a reasonable number of
       | clusters automatically for an arbitrary set of vectors?
        
         | syntaxfree wrote:
          | You could always try a density-based clusterer like DBSCAN,
          | HDBSCAN, or OPTICS to determine a likely number of clusters.
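          | 
          | For instance, a quick sketch with sklearn (eps and
          | min_samples are placeholder values you'd tune per dataset):
          | 
          |     import numpy as np
          |     from sklearn.cluster import DBSCAN
          | 
          |     X = np.random.rand(300, 2)  # toy data
          |     labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)
          |     # DBSCAN marks noise as -1; the rest are cluster ids
          |     n_clusters = len(set(labels)) - (1 if -1 in labels else 0)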
        
         | motiwari wrote:
         | The elbow method is pretty common!
         | https://en.wikipedia.org/wiki/Elbow_method_(clustering)
         | 
          | You can also use a regularization criterion (AIC, BIC, or
          | another information criterion).
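          | 
          | For instance, a rough elbow-method sketch (I compute the
          | k-medoids loss by hand from the returned medoid indices, so
          | it doesn't depend on exact attribute names):
          | 
          |     import numpy as np
          |     from banditpam import KMedoids
          | 
          |     X = np.random.rand(500, 10)  # toy data
          |     losses = []
          |     for k in range(1, 11):
          |         kmed = KMedoids(n_medoids=k, algorithm="BanditPAM")
          |         kmed.fit(X, "L2")
          |         idx = np.asarray(kmed.medoids, dtype=int)
          |         d = np.linalg.norm(
          |             X[:, None, :] - X[idx][None, :, :], axis=2)
          |         losses.append(d.min(axis=1).mean())
          |     # plot losses vs. k; pick the elbow where gains flatten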
        
         | esalman wrote:
          | I usually go with the Davies-Bouldin index, but there are a
          | few methods:
          | 
          | Python/sklearn:
          | https://scikit-learn.org/stable/modules/clustering.html#clus...
          | 
          | R:
          | https://cran.r-hub.io/web/packages/clusterCrit/clusterCrit.p...
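          | 
          | E.g., a quick sweep with sklearn (lower Davies-Bouldin is
          | better):
          | 
          |     import numpy as np
          |     from sklearn.cluster import KMeans
          |     from sklearn.metrics import davies_bouldin_score
          | 
          |     X = np.random.rand(300, 5)  # toy data
          |     scores = {}
          |     for k in range(2, 11):
          |         labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
          |         scores[k] = davies_bouldin_score(X, labels)
          |     best_k = min(scores, key=scores.get)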
        
       | binarymax wrote:
        | Really interesting; I hadn't heard of k-medoids before!
       | 
       | Have you tried BanditPAM as an index creation technique for
       | approximate-nearest-neighbor search?
        
         | motiwari wrote:
         | Really funny that you mention that! Some of our more recent
         | work focuses on using adaptive sampling techniques in
         | approximate-nearest-neighbor search (actually, the related
         | problem of maximum inner product search:
         | https://ar5iv.org/abs/2212.07551).
         | 
         | We definitely think that our approach could be used to make an
         | index structure for ANN search directly, for example in
         | conjunction with Hierarchical Navigable Small World approaches.
        
           | eternalban wrote:
           | Thanks for sharing your great work!
           | 
            | Try this tool from arXiv Labs (just replace the "x" in the
            | original URL with a 5):
           | 
           | https://ar5iv.labs.arxiv.org/html/2006.06856
           | 
           | https://ar5iv.labs.arxiv.org/html/2212.07551
        
             | motiwari wrote:
             | Oh cool, neat trick, thanks!
        
       | cicdw wrote:
       | Your algorithm looks surprisingly similar to "Affinity
       | Propagation" which uses message passing techniques to
       | (approximately) optimize the binary integer programming problem.
       | Message passing algorithms have always fascinated me as they seem
       | to be related to deep structure in the original linear
       | programming problem.
       | 
       | For example, there are results for other binary problems that
       | show a relationship between fixed points of message passing and
       | optimal dual points to the relaxed linear programming problem
       | (see below for an example with maximum weighted independent
       | sets).
       | 
        | Back in the day, I spent a long time trying to directly relate
        | the affinity propagation messages to a coordinate-descent-type
        | algorithm on the dual for k-medoids, but despite the
        | similarity in structure I could never make it work.
       | 
       | I'm curious if you're familiar with this class of algorithms and
       | how they compare (both practically and theoretically) to the work
       | you've presented here? Thanks for sharing!
       | 
        | References:
        | 
        | - https://www.science.org/doi/10.1126/science.1136800
        | - https://arxiv.org/abs/0807.5091
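        | 
        | (For anyone who wants to compare empirically, sklearn ships an
        | implementation; like medoids, its exemplars are actual
        | datapoints:)
        | 
        |     import numpy as np
        |     from sklearn.cluster import AffinityPropagation
        | 
        |     X = np.random.rand(200, 4)  # toy data
        |     ap = AffinityPropagation(random_state=0).fit(X)
        |     print(ap.cluster_centers_indices_)  # exemplar indices in X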
        
         | motiwari wrote:
         | Interesting, thanks for the references! I'm not too familiar
         | with this line of work; let me read up on it and get back to
         | you
        
       | arc4n0n wrote:
       | Cool. Could this/similar sampling be used to initialize a
       | Gaussian Mixture Model and speed that up too?
        
       | vitorsr wrote:
        | Thanks for doing this; I will be sure to try it in ML
        | competitions. I really like that you used Armadillo, which is
        | something I personally want to do for my own projects.
        | 
        | Just out of technical curiosity, have you come across any
        | particular developments or empirical evidence on the use of
        | (invertible) data transformations to enhance clustering
        | results? I am currently researching a particular problem
        | within signal processing related to signal distribution
        | transforms, and I am particularly interested in reading about
        | potential applications. As an example (and since you mention
        | JPM partial funding), how would a copula transformation affect
        | the results of clustering (assuming an inverse exists, etc.,
        | and we apply the inverse transformation afterwards)?
        
         | motiwari wrote:
         | One thing that's important to note is that k-medoids supports
         | arbitrary distance metrics -- in fact, your dissimilarity
         | measure need not even be a metric (it can be negative,
         | asymmetric, not satisfy the triangle inequality, etc.)
         | 
          | An implication is that doing an invertible data
          | transformation and then clustering is equivalent to
          | clustering the original data with a different dissimilarity
          | measure. So you can often skip the transformation entirely
          | if you're willing to engineer your dissimilarity measure
          | instead.
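          | 
          | A toy illustration of that equivalence, where T is any
          | invertible map and d'(x, y) := d(T(x), T(y)) is the induced
          | dissimilarity:
          | 
          |     import numpy as np
          |     from scipy.spatial.distance import cdist
          | 
          |     rng = np.random.default_rng(0)
          |     X = rng.normal(size=(50, 3))
          |     T = lambda Z: np.sign(Z) * np.log1p(np.abs(Z))
          | 
          |     # L2 distances between transformed points ...
          |     D1 = cdist(T(X), T(X))
          |     # ... equal the induced dissimilarity on raw points
          |     D2 = cdist(X, X,
          |                metric=lambda a, b: np.linalg.norm(T(a) - T(b)))
          |     assert np.allclose(D1, D2)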
         | 
         | Without more details, it's hard to say exactly what would
         | happen to the clustering results under custom dissimilarity
         | measures or data transformations -- but our package supports
         | both use cases!
        
       ___________________________________________________________________
       (page generated 2023-04-05 23:00 UTC)