deepset-ai / haystack
https://github.com/deepset-ai/haystack

Haystack is an open source NLP framework that leverages Transformer models. It enables developers to implement production-ready neural search, question answering, semantic document search and summarization for a wide range of applications.

Website: deepset.ai/haystack
License: Apache-2.0

Haystack

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want to perform question answering or semantic document search, you can use the state-of-the-art NLP models in Haystack to provide unique search experiences and allow your users to query in natural language. Haystack is built in a modular fashion so that you can combine the best technology from other open-source projects such as Hugging Face's Transformers, Elasticsearch, or Milvus.

What to build with Haystack

* Ask questions in natural language and find granular answers in your documents.
* Perform semantic search and retrieve documents according to meaning, not keywords.
* Use off-the-shelf models or fine-tune them to your domain.
* Use user feedback to evaluate, benchmark, and continuously improve your live models.
* Leverage existing knowledge bases and better handle the long tail of queries that chatbots receive.
* Automate processes by automatically applying a list of questions to new documents and using the extracted answers.

Core Features

* Latest models: Use the latest transformer-based models (e.g., BERT, RoBERTa, MiniLM) for extractive QA, generative QA, and document retrieval.
* Modular: Multiple choices to fit your tech stack and use case.
  Pick your favorite database, file converter, or modeling framework.
* Pipelines: The Node and Pipeline design of Haystack allows for custom routing of queries to only the relevant components.
* Open: 100% compatible with Hugging Face's model hub. Tight interfaces to other frameworks (e.g., Transformers, FARM, sentence-transformers).
* Scalable: Scale to millions of docs via retrievers, production-ready backends like Elasticsearch / FAISS, and a FastAPI REST API.
* End-to-End: All tooling in one place: file conversion, cleaning, splitting, training, eval, inference, labeling, etc.
* Developer friendly: Easy to debug, extend and modify.
* Customizable: Fine-tune models to your domain or implement your custom DocumentStore.
* Continuous Learning: Collect new training data via user feedback in production and improve your models continuously.

Overview

* Docs: Overview, Components, Guides, API documentation
* Installation: How to install Haystack
* Tutorials: See what Haystack can do with our Notebooks & Scripts
* Quick Demo: Deploy a Haystack application with Docker Compose and a REST API
* Community: Slack, Twitter, Stack Overflow, GitHub Discussions
* Contributing: We welcome all contributions!
* Benchmarks: Speed & Accuracy of Retriever, Readers and DocumentStores
* Roadmap: Public roadmap of Haystack
* Blog: Read our articles on Medium
* Jobs: We're hiring! Have a look at our open positions

Installation

If you're interested in learning more about Haystack and using it as part of your application, we offer several options.

1. Installing from a package

You can install Haystack by using pip.

    pip3 install farm-haystack

Please check our page on PyPI for more information.

2. Installing from GitHub

You can also clone it from GitHub if you'd like to work with the master branch and check the latest features:

    git clone https://github.com/deepset-ai/haystack.git
    cd haystack
    pip install --editable .

To update your installation, do a git pull. Because of the --editable flag, changes to the source take effect immediately.

3. Installing on Windows

On Windows, you might need:

    pip install farm-haystack -f https://download.pytorch.org/whl/torch_stable.html

Tutorials

Follow our introductory tutorial to set up a question answering system using Python and start performing queries! Explore the rest of our tutorials to learn how to tweak pipelines, train models and perform evaluation.

* Tutorial 1 - Basic QA Pipeline: Jupyter notebook | Colab | Python
* Tutorial 2 - Fine-tuning a model on own data: Jupyter notebook | Colab | Python
* Tutorial 3 - Basic QA Pipeline without Elasticsearch: Jupyter notebook | Colab | Python
* Tutorial 4 - FAQ-style QA: Jupyter notebook | Colab | Python
* Tutorial 5 - Evaluation of the whole QA-Pipeline: Jupyter notebook | Colab | Python
* Tutorial 6 - Better Retrievers via "Dense Passage Retrieval": Jupyter notebook | Colab | Python
* Tutorial 7 - Generative QA via "Retrieval-Augmented Generation": Jupyter notebook | Colab | Python
* Tutorial 8 - Preprocessing: Jupyter notebook | Colab | Python
* Tutorial 9 - DPR Training: Jupyter notebook | Colab | Python
* Tutorial 10 - Knowledge Graph: Jupyter notebook | Colab | Python
* Tutorial 11 - Pipelines: Jupyter notebook | Colab | Python
* Tutorial 12 - Long-Form Question Answering: Jupyter notebook | Colab | Python
* Tutorial 13 - Question Generation: Jupyter notebook | Colab | Python
* Tutorial 14 - Query Classifier: Jupyter notebook | Colab | Python
* Tutorial 15 - TableQA: Jupyter notebook | Colab | Python

Quick Demo

Hosted

Try out our hosted Explore The World live demo. Ask any question on countries or capital cities and let Haystack return the answers to you.

Local

Start up a Haystack service via Docker Compose. You can then call it directly via the REST API or interact with it using the included Streamlit UI. Step-by-step guide:

1. Update/install Docker and Docker Compose, then launch Docker

    apt-get update && apt-get install docker && apt-get install docker-compose
    service docker start

2. Clone Haystack repository

    git clone https://github.com/deepset-ai/haystack.git

3. Pull images & launch demo app

    cd haystack
    docker-compose pull
    docker-compose up

    # Or on a GPU machine: docker-compose -f docker-compose-gpu.yml up

You should be able to see the following in your terminal window as part of the log output:

    .. ui_1 | You can now view your Streamlit app in your browser.
    .. ui_1 | External URL: http://192.168.108.218:8501
    .. haystack-api_1 | [2021-01-01 10:21:58 +0000] [17] [INFO] Application startup complete.

4. Open the Streamlit UI for Haystack by pointing your browser to the "External URL" from above.

You can then try different queries against a pre-defined set of indexed articles related to Game of Thrones.

Note: The following containers are started as a part of this demo:

* Haystack API: listens on port 8000
* DocumentStore (Elasticsearch): listens on port 9200
* Streamlit UI: listens on port 8501

Please note that the demo will publish the container ports to the outside world. We suggest that you review the firewall settings depending on your system setup and the security guidelines.

Community

There is a vibrant and active community around Haystack, and we interact with it regularly. If you have a feature request or a bug report, feel free to open an issue on GitHub. We regularly check these and you can expect a quick response. If you'd like to discuss a topic, or get more general advice on how to make Haystack work for your project, you can start a thread in GitHub Discussions or our Slack channel. We also check Twitter and Stack Overflow.

Contributing

We are very open to the community's contributions, be it a quick fix of a typo or a completely new feature! You don't need to be a Haystack expert to provide meaningful improvements.
To learn how to get started, check out our Contributor Guidelines first. You can also find instructions to run the tests locally there. Thanks so much to all those who have contributed to our project!
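
The local Quick Demo publishes three container ports (8000 for the Haystack API, 9200 for Elasticsearch, 8501 for the Streamlit UI). As a minimal sketch for checking that the demo came up, the following standard-library Python snippet tests whether each of those ports accepts TCP connections. It assumes the demo runs on the same machine (`localhost` is an assumption), and the names `DEMO_PORTS`, `port_is_open` and `demo_status` are hypothetical helpers for illustration, not part of Haystack.

```python
import socket

# Ports as listed in the Quick Demo note (service name -> published port).
DEMO_PORTS = {
    "Haystack API": 8000,
    "DocumentStore (Elasticsearch)": 9200,
    "Streamlit UI": 8501,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def demo_status(host: str = "localhost") -> dict:
    """Map each demo service name to whether its port is reachable.

    The default host is an assumption: the demo running on this machine.
    """
    return {name: port_is_open(host, port) for name, port in DEMO_PORTS.items()}


if __name__ == "__main__":
    for name, reachable in demo_status().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```

An open port only shows that something is listening; it does not confirm that the service behind it has finished starting. Running the same check from another machine is one rough way to see whether the published ports are exposed beyond the host, which is what the firewall note above is concerned with.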