deepset

:mag: Haystack is an open source NLP framework to interact with your data using Transformer models and LLMs (GPT-4, ChatGPT and alike). Haystack offers production-ready tools to quickly build complex decision making, question answering, semantic search, text generation applications, and more.

Stars: 11.8K
Forks: 1.49K
Open issues: 401
Closed issues: 2.36K
Last release: 6 months ago
Last commit: 5 months ago
Watchers: 11.8K
Total releases: 71
Total commits: 2.91K
Open PRs: 18
Closed PRs: 2.7K
License: apache-2.0
Offers premium version? NO
Proprietary? NO

⚠️ You are currently looking at the README of Haystack 2.0-Beta, an unstable version of what will eventually become Haystack 2.0. We are still maintaining Haystack 1.x, which is the version of Haystack you should use in production. Switch to Haystack 1.x, currently on 1.22.1 here.

Haystack is an end-to-end LLM framework that enables you to build applications powered by LLMs, Transformer models, vector search and more. Whether you want to perform retrieval-augmented generation (RAG), documentation search, question answering or answer generation, you can use state-of-the-art embedding models and LLMs with Haystack to build end-to-end NLP applications to solve your use case.

Quickstart

Haystack is built around the concept of pipelines. A pipeline is a powerful structure that performs an NLP task. It's made up of components connected together. For example, you can connect a retriever and a generator to build a Generative Question Answering pipeline that uses your own data.

First, run the minimal Haystack installation:

pip install haystack-ai

👉 To build a minimal RAG pipeline that uses GPT-4 on your own data, use the RAG Pipeline Recipe.

Core Concepts

⚛️ Components: Each Component does one thing, such as preprocessing documents, retrieving documents, or using a specific language model to answer questions. Components can .connect() to each other to form a complete pipeline.

🏃‍♀️ Pipelines: This is the standard Haystack structure that builds on top of your data to perform various NLP tasks such as retrieval augmented generation, question answering and more. Pipelines in Haystack are Directed Multigraphs composed of components. Components can receive inputs from other components and produce outputs that can be forwarded to other components.
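As a rough illustration of the idea of components wired into a directed graph, here is a minimal pure-Python sketch. It does not use Haystack's actual API: `ToyPipeline`, `retriever`, `prompt_builder`, and the `"component.socket"` naming are all hypothetical stand-ins chosen for this example.

```python
from collections import defaultdict, deque


class ToyPipeline:
    """Illustrative only: a tiny directed graph of callables, not Haystack's Pipeline."""

    def __init__(self):
        self.components = {}            # name -> callable returning a dict of outputs
        self.edges = defaultdict(list)  # sender -> [(output key, receiver, input key)]

    def add_component(self, name, func):
        self.components[name] = func

    def connect(self, sender, receiver):
        # Endpoints are written "component.socket", e.g. "retriever.documents".
        s_name, s_key = sender.split(".")
        r_name, r_key = receiver.split(".")
        self.edges[s_name].append((s_key, r_name, r_key))

    def run(self, inputs):
        # inputs maps a component name to the keyword arguments the caller supplies.
        pending = {name: dict(kwargs) for name, kwargs in inputs.items()}
        indegree = {name: 0 for name in self.components}
        for links in self.edges.values():
            for _, r_name, _ in links:
                indegree[r_name] += 1
        # Run components in topological order: one runs once all its senders have run.
        ready = deque(name for name, deg in indegree.items() if deg == 0)
        results = {}
        while ready:
            name = ready.popleft()
            results[name] = self.components[name](**pending.get(name, {}))
            for s_key, r_name, r_key in self.edges[name]:
                pending.setdefault(r_name, {})[r_key] = results[name][s_key]
                indegree[r_name] -= 1
                if indegree[r_name] == 0:
                    ready.append(r_name)
        return results


DOCS = ["Haystack builds pipelines from components.", "Paris is the capital of France."]


def retriever(query):
    # Naive keyword overlap stands in for a real retriever.
    words = set(query.lower().split())
    hits = [d for d in DOCS if words & set(d.lower().rstrip(".").split())]
    return {"documents": hits}


def prompt_builder(query, documents):
    context = "\n".join(documents)
    return {"prompt": f"Context:\n{context}\nQuestion: {query}"}


pipe = ToyPipeline()
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", prompt_builder)
pipe.connect("retriever.documents", "prompt_builder.documents")

question = "capital of France"
result = pipe.run({"retriever": {"query": question}, "prompt_builder": {"query": question}})
print(result["prompt_builder"]["prompt"])
```

In a real application the last step would hand the prompt to a generator component. The sketch stops at the prompt so that it runs without any model or API key.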

🗂️ DocumentStores: A DocumentStore is a database where you store your text data for Haystack to access. Haystack DocumentStores are available with Elasticsearch, OpenSearch, Weaviate, Pinecone, FAISS and more. For a full list of available DocumentStores, check out our documentation.
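To make the role of a DocumentStore concrete, the following pure-Python sketch models one as a small in-memory container with write, count, and metadata-filter operations. The names `ToyDocument` and `ToyInMemoryStore` and their methods are assumptions made for this illustration; they are not Haystack classes, and real stores add indexing, persistence, and retrieval on top.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical document record: text plus free-form metadata."""
    content: str
    meta: dict = field(default_factory=dict)


class ToyInMemoryStore:
    """Hypothetical stand-in for a document store; keeps everything in a list."""

    def __init__(self):
        self._docs = []

    def write_documents(self, documents):
        # Append the documents and report how many were written.
        self._docs.extend(documents)
        return len(documents)

    def count_documents(self):
        return len(self._docs)

    def filter_documents(self, **meta):
        # Return documents whose metadata matches every given key/value pair.
        return [d for d in self._docs
                if all(d.meta.get(k) == v for k, v in meta.items())]


store = ToyInMemoryStore()
store.write_documents([
    ToyDocument("Haystack is an LLM framework.", {"lang": "en"}),
    ToyDocument("Haystack ist ein LLM-Framework.", {"lang": "de"}),
    ToyDocument("Pipelines connect components.", {"lang": "en"}),
])
english = store.filter_documents(lang="en")
print(store.count_documents(), len(english))
```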

What to Build with Haystack

  • Build retrieval augmented generation (RAG) applications by using one of the available vector databases and customizing your LLM interaction. The sky is the limit 🚀
  • Perform Question Answering in natural language to find granular answers in your documents.
  • Perform semantic search and retrieve documents according to meaning.
  • Build applications that make complex decisions to answer complex queries, such as systems that resolve complex customer queries or search for knowledge across many disconnected resources.
  • Use off-the-shelf models or fine-tune them to your data.
  • Use user feedback to evaluate, benchmark, and continuously improve your models.
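The semantic search use case in the list above comes down to ranking documents by how close their embedding vectors are to the query's vector. The sketch below shows that ranking step with cosine similarity in plain Python. The three-dimensional vectors are hand-written placeholders standing in for real model embeddings, and the `cosine` and `rank` helpers are names invented for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, doc_vecs, top_k=2):
    """Return the names of the top_k documents most similar to the query."""
    scored = sorted(doc_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]


# Placeholder "embeddings"; a real system would compute these with an embedding model.
doc_vecs = {
    "pets": [0.9, 0.1, 0.0],
    "finance": [0.0, 0.2, 0.9],
    "vets": [0.7, 0.3, 0.1],
}
query_vec = [0.8, 0.2, 0.0]

top = rank(query_vec, doc_vecs)
print(top)
```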

Features

  • Latest models: Haystack allows you to use and compare models available from OpenAI, Cohere and Hugging Face, as well as your own local models or models hosted on SageMaker. Use the latest LLMs or Transformer-based models (for example: BERT, RoBERTa, MiniLM).
  • Modular: Multiple choices to fit your tech stack and use case. A wide choice of DocumentStores to store your data, file conversion tools and more.
  • Open: Integrated with Hugging Face's model hub, OpenAI, Cohere and various Azure services.
  • Scalable: Scale to millions of docs using retrievers and production-scale components like Elasticsearch and a FastAPI REST API.
  • End-to-End: All tooling in one place: file conversion, cleaning, splitting, training, eval, inference, labeling, and more.
  • Customizable: Fine-tune models to your domain or implement your custom Nodes.
  • Continuous Learning: Collect new training data from user feedback in production & improve your models continuously.

Resources

📒 Docs

Components, Pipeline Nodes, Guides, API Reference

🎓 Tutorials

See what Haystack can do with our Notebooks & Scripts

🎉 Integrations

The index of additional Haystack packages and components that can be installed separately.

🔰 Demos

A repository containing Haystack demo applications with Docker Compose and a REST API

🖖 Community

Discord, 𝕏 (Twitter), Stack Overflow, GitHub Discussions

💙 Contributing

We welcome all contributions!

🔭 Roadmap

Public roadmap of Haystack

📰 Blog

Learn about the latest with Haystack and NLP

☎️ Jobs

We're hiring! Have a look at our open positions

💾 Installation

For a detailed installation guide, see the official documentation. There you’ll find instructions for custom installations, including on Windows and Apple Silicon.

Basic Installation

Use pip to install a basic version of Haystack's latest release:

pip install haystack-ai

This command installs everything needed for basic Pipelines that use an in-memory DocumentStore and external LLM provider (e.g. OpenAI).
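One way to confirm the install worked is to ask Python's packaging metadata for the installed distribution. The sketch below uses only the standard library; `installed_version` is a helper name chosen for this example, and the distribution name `haystack-ai` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("haystack-ai")
if found is None:
    print("haystack-ai is not installed in this environment")
else:
    print(f"haystack-ai {found} is installed")
```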

If you want to try out the newest features that are not in an official release yet, you can install the unstable version from the main branch with the following command:

pip install git+https://github.com/deepset-ai/haystack.git@main#egg=haystack-ai

To make changes to the Haystack code, first clone this repo:

git clone https://github.com/deepset-ai/haystack.git

Then move into the cloned folder and install the project with pip, including the development dependencies:

cd haystack && pip install -e '.[dev]'

If you want to contribute to the Haystack repo, check our Contributor Guidelines first.

🔰 Demos

You can find some of our hosted demos, along with instructions to run them locally, in our haystack-demos repository:

💫 Reduce Hallucinations with Retrieval Augmentation - Generative QA with LLMs

🐥 Should I follow? - Summarizing tweets with LLMs

🌎 Explore The World - Extractive Question Answering

🖖 Community

If you have a feature request or a bug report, feel free to open an issue on GitHub. We regularly check these and you can expect a quick response. If you'd like to discuss a topic, or get more general advice on how to make Haystack work for your project, you can start a thread in GitHub Discussions or our Discord channel. We also check 𝕏 (Twitter) and Stack Overflow.

💙 Contributing

We are very open to the community's contributions - be it a quick fix of a typo, or a completely new feature! You don't need to be a Haystack expert to provide meaningful improvements. To learn how to get started, check out our Contributor Guidelines first.

Who Uses Haystack

Here's a list of projects and companies using Haystack. Want to add yours? Open a PR, add it to the list and let the world know that you use Haystack!
