https://github.com/apache/datafusion-comet
apache/datafusion-comet (Public): Apache DataFusion Comet Spark Accelerator. Website: datafusion.apache.org/comet. License: Apache-2.0. 490 stars, 105 forks.

Apache DataFusion Comet

Apache DataFusion Comet is a high-performance accelerator for Apache Spark, built on top of the Apache DataFusion query engine. Comet is designed to significantly improve the performance of Apache Spark workloads on commodity hardware, and it integrates with the Spark ecosystem without requiring any code changes.

Benefits of Using Comet

Run Spark Queries at DataFusion Speeds

Comet delivers a performance speedup for many queries, enabling faster data processing and shorter time-to-insights.

The following chart shows the time it takes to run the 22 TPC-H queries against 100 GB of data in Parquet format using a single executor with 8 cores.
See the Comet Benchmarking Guide for details of the environment used for these benchmarks.

When using Comet, the overall run time is reduced from 649 seconds to 440 seconds, a 1.5x speedup.

Running the same queries with DataFusion standalone (without Spark) using the same number of cores results in a 3.9x speedup compared to Spark.

Comet is not yet achieving full DataFusion speeds in all cases, but with future work we aim to provide a 2x-4x speedup for many use cases.

[tpch_allqu]

Here is a breakdown showing relative performance of Spark, Comet, and DataFusion for each TPC-H query.

[tpch_queri]

The following chart shows how much Comet currently accelerates each query from the benchmark. Performance optimization is an ongoing task, and we welcome contributions from the community to help achieve even greater speedups in the future.

[tpch_queri]

These benchmarks can be reproduced in any environment using the documentation in the Comet Benchmarking Guide. We encourage you to run your own benchmarks.

Use Commodity Hardware

Comet runs on commodity hardware, eliminating the need for costly hardware upgrades or specialized hardware accelerators such as GPUs or FPGAs. By making better use of the hardware you already have, Comet keeps your Spark deployments cost-effective and scalable.

Spark Compatibility

Comet aims for 100% compatibility with all supported versions of Apache Spark, allowing you to integrate Comet into your existing Spark deployments and workflows. With no code changes required, you can benefit from Comet's acceleration without disrupting your Spark applications.

Tight Integration with Apache DataFusion

Comet tightly integrates with the core Apache DataFusion project and uses its execution engine. Because Comet builds directly on DataFusion, performance improvements in DataFusion can carry over to your Spark workloads.
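As a quick sanity check on the benchmark totals quoted above (649 seconds for Spark and 440 seconds with Comet), the following minimal Rust sketch shows how a speedup factor and the share of run time saved are derived from two wall-clock totals. The function names `speedup` and `time_saved_percent` are hypothetical helpers introduced for illustration only; they are not part of Comet or DataFusion.

```rust
/// Speedup of a run relative to a baseline: baseline time divided by new time.
/// (Hypothetical helper for illustration; not a Comet API.)
fn speedup(baseline_secs: f64, new_secs: f64) -> f64 {
    baseline_secs / new_secs
}

/// Percentage of the baseline run time that the faster run saves.
/// (Hypothetical helper for illustration; not a Comet API.)
fn time_saved_percent(baseline_secs: f64, new_secs: f64) -> f64 {
    (baseline_secs - new_secs) / baseline_secs * 100.0
}

fn main() {
    // Totals quoted in the README for the 22 TPC-H queries at 100 GB.
    let spark_secs = 649.0;
    let comet_secs = 440.0;

    println!("speedup: {:.2}x", speedup(spark_secs, comet_secs));
    println!(
        "time saved: {:.1}%",
        time_saved_percent(spark_secs, comet_secs)
    );
}
```

The same two functions can be applied to your own totals when you reproduce the benchmarks; the quoted 1.5x figure is this ratio rounded to one decimal place, which corresponds to roughly a third of the run time saved.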
Active Community

Comet has an active community of developers, contributors, and users dedicated to advancing the capabilities of Apache DataFusion and accelerating the performance of Apache Spark.

Getting Started

To get started with Apache DataFusion Comet, follow the installation instructions. Join the DataFusion Slack and Discord channels to connect with other users, ask questions, and share your experiences with Comet.

Contributing

We welcome contributions from the community to help improve and enhance Apache DataFusion Comet. Whether it's fixing bugs, adding new features, writing documentation, or optimizing performance, your contributions are invaluable in shaping the future of Comet. Check out our contributor guide to get started.

License

Apache DataFusion Comet is licensed under the Apache License 2.0. See the LICENSE.txt file for details.

Acknowledgments

We would like to express our gratitude to the Apache DataFusion community for their support and contributions to Comet. Together, we're building a faster, more efficient future for big data processing with Apache Spark.

About

Topics: rust, spark, arrow, datafusion

Watchers: 53

Contributors (38): @viirya, @andygrove, @sunchao, @advancedxy, @huaxingao, @comphead, @kazuyukitanimura, @snmvaughan, @parthchandra, @tshauck, @vaibhawvipul, @planga82, @edmondop, @leoluan2009, and 24 more

Languages: Rust 46.0%, Scala 39.6%, Java 13.9%, Other 0.5%