https://github.com/rentruewang/bocoel

rentruewang / bocoel

Bayesian Optimization as a Coverage Tool for Evaluating LLMs. 10 times faster and accurate evaluation (benchmarking) with just a few lines of modular code.
rentruewang.github.io/bocoel/

License: Apache-2.0

BoCoEL

Bayesian Optimization as a Coverage Tool for Evaluating Large Language Models

Why BoCoEL?

Large language models are expensive and slow behemoths, and evaluating them on gigantic modern datasets only makes it worse. If only there were a way to select a small but meaningful subset of the corpus and still obtain a highly accurate evaluation... Wait, that sounds like Bayesian optimization!

BoCoEL works in the following steps:

1. Encode each entry of the corpus into an embedding (far cheaper and faster than running an LLM, and reusable).
2. Use Bayesian optimization to select queries to evaluate.
3. Use the queries to retrieve entries from the corpus (via the encoded embeddings).
4. Profit.

The evaluations generated are easily managed by the provided manager utility.

Features

* Accurately evaluate large language models with just tens of samples from your selected corpus.
* Uses the power of Bayesian optimization to select an optimal set of samples for the language model to evaluate.
* Evaluates the corpus on the model in addition to evaluating the model on the corpus.
* Support for GPT2, Pythia, LLAMA and more through integration with huggingface transformers and datasets.
* Modular design.
* Efficient representations of the corpus / dataset, such as N-sphere representation or whitening of the latent space, to augment evaluation quality.

Give us a star!

Like what you see? Please consider giving this a star!

Bayesian Optimization

[Image: Bayesian optimization]

Simply put, Bayesian optimization aims to optimize either the exploration objective (the purple area in the image) or the exploitation objective (the height of the black dots). It uses Gaussian processes as a backbone for inference, and uses an acquisition function to decide where to sample next. See here for a more in-depth introduction.

Since Bayesian optimization works well with expensive-to-evaluate black-box models (read: LLMs), it is a natural fit for this use case. BoCoEL uses Bayesian optimization as a backbone for exploring the embedding space given by the corpus, which allows it to select a good subset that acts as a mini snapshot of the corpus.

Performance Implications

LLMs are painfully slow, especially generative ones (which is what "LLM" usually refers to), since sequence generation is sequential by nature. BoCoEL does require an embedder to encode the entire corpus, but embedders are faster than LLMs by orders of magnitude, so the encoding time is recovered by practically any reduction in the number of LLM evaluations.

Installation

I don't want optional dependencies:

    pip install bocoel

Give me the full experience (all optional dependencies):

    pip install "bocoel[all]"

Usage

See the folder examples/getting_started for a simple usage example that gets you started with just a few lines of code.

Develop with BoCoEL

Usage examples are under the folder examples.
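The selection loop described under "Why BoCoEL?" can be sketched in plain NumPy. This is an illustration only and does not use bocoel's API: the function names (`rbf_kernel`, `gp_posterior`, `select_subset`), the RBF kernel, the upper-confidence-bound acquisition, and the random candidate sampling are all assumptions made for the sketch, and `score_fn` is a hypothetical stand-in for running the LLM on a single corpus entry.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gp_posterior(x_seen, y_seen, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_ss = rbf_kernel(x_seen, x_seen) + noise * np.eye(len(x_seen))
    k_sq = rbf_kernel(x_seen, x_query)
    solved = np.linalg.solve(k_ss, k_sq)  # K^-1 k, shape (n_seen, n_query)
    mean = solved.T @ y_seen
    var = 1.0 - (k_sq * solved).sum(axis=0)  # the RBF prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def select_subset(embeddings, score_fn, budget=10, n_candidates=256, beta=2.0, seed=0):
    """Pick `budget` corpus entries and return (indices, mean score)."""
    rng = np.random.default_rng(seed)
    low, high = embeddings.min(axis=0), embeddings.max(axis=0)
    chosen, scores = [], []
    for _ in range(budget):
        # Step 2: propose a query point in embedding space (assumed UCB acquisition).
        candidates = rng.uniform(low, high, size=(n_candidates, embeddings.shape[1]))
        if chosen:
            y = np.array(scores)
            mean, std = gp_posterior(embeddings[chosen], y - y.mean(), candidates)
            query = candidates[np.argmax(mean + beta * std)]
        else:
            query = candidates[0]  # nothing observed yet: start anywhere
        # Step 3: retrieve the nearest corpus entry not selected so far.
        dists = np.linalg.norm(embeddings - query, axis=1)
        if chosen:
            dists[chosen] = np.inf
        idx = int(np.argmin(dists))
        # Step 4: evaluate the (expensive) model on that entry only.
        chosen.append(idx)
        scores.append(float(score_fn(idx)))
    return chosen, float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Step 1 stand-in: pretend these are precomputed corpus embeddings.
    corpus_embeddings = rng.normal(size=(500, 8))
    # Hypothetical metric standing in for an LLM evaluation of entry i.
    fake_llm_score = lambda i: float(corpus_embeddings[i, 0] > 0)
    indices, estimate = select_subset(corpus_embeddings, fake_llm_score, budget=20)
    print(len(indices), estimate)
```

Only `budget` entries are ever passed to `score_fn`, which is where the savings come from: the embeddings are computed once, while the expensive model runs on a handful of retrieved entries. The real library offers its own components for each step; see the examples folder for actual usage.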
API reference can be found here.

Contributing

Contributors wanted! Don't be shy. Feel free to file issues and PRs. For PRs, please follow the contributing guide and the code of conduct. Openness and inclusiveness are taken very seriously.

Roadmap: work in progress

* Simpler usage. I should provide a high-level wrapper for the entire library so that evaluations can be run in one line.
* Visualization module for the evaluation.
* Integration of alternative methods (random, kmedoids...) with Gaussian processes.
* Integration with more backends such as VLLM and OpenAI's API.
* Support for Python 3.11+.

License and Citation

The code is available under the Apache License. If you find this project helpful in your research, please cite this work as

@misc{bocoel2024,
    title = {BoCoEL: Bayesian Optimization as a Coverage Tool for Evaluating Large Language Models},
    url = {https://rentruewang.github.io/bocoel/research/},
    author = {Wang, RenChu and Chuang, Yung-Sung},
    month = {January},
    year = {2024}
}

Contributors

* @rentruewang
* @github-actions[bot]
* @voidism
* @doctryucsd
* @PsyDak-Meng
* @gauthameyunni