Commit b6a19eb

committed
update tutorials to use transformers integration
1 parent bfcbabc commit b6a19eb

8 files changed

Lines changed: 1124 additions & 2146 deletions

index.toml

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -12,7 +12,7 @@ notebook = "27_First_RAG_Pipeline.ipynb"
1212
aliases = []
1313
completion_time = "10 min"
1414
created_at = 2023-12-05
15-
dependencies = ["datasets>=2.6.1", "sentence-transformers>=4.1.0", "mistral-haystack"]
15+
dependencies = ["datasets>=2.6.1", "sentence-transformers>=4.1.0", "mistral-haystack", "transformers-haystack"]
1616
featured = true
1717

1818
[[tutorial]]
@@ -35,7 +35,7 @@ notebook = "29_Serializing_Pipelines.ipynb"
3535
aliases = []
3636
completion_time = "10 min"
3737
created_at = 2024-01-29
38-
dependencies = ["transformers[torch]"]
38+
dependencies = ["transformers-haystack"]
3939

4040
[[tutorial]]
4141
title = "Preprocessing Different File Types"
@@ -98,7 +98,7 @@ notebook = "34_Extractive_QA_Pipeline.ipynb"
9898
aliases = []
9999
completion_time = "10 min"
100100
created_at = 2024-02-09
101-
dependencies = ["accelerate", "sentence-transformers", "datasets", "transformers<5"]
101+
dependencies = ["accelerate", "sentence-transformers", "datasets", "transformers<5", "transformers-haystack"]
102102

103103
[[tutorial]]
104104
title = "Evaluating RAG Pipelines"
@@ -154,7 +154,7 @@ notebook = "41_Query_Classification_with_TransformersTextRouter_and_Transformers
154154
aliases = []
155155
completion_time = "25 min"
156156
created_at = 2024-10-15
157-
dependencies = ["sentence-transformers>=4.1.0", "gradio", "torch", "sentencepiece", "datasets", "accelerate", "transformers<5"]
157+
dependencies = ["sentence-transformers>=4.1.0", "gradio", "torch", "sentencepiece", "datasets", "accelerate", "transformers<5", "transformers-haystack"]
158158

159159
[[tutorial]]
160160
title = "Retrieving a Context Window Around a Sentence"
@@ -258,6 +258,6 @@ notebook = "49_TurboQuant_Quantization_with_HuggingFace.ipynb"
258258
aliases = []
259259
completion_time = "20 min"
260260
created_at = 2026-03-30
261-
dependencies = ["haystack-ai", "turboquant-vllm", "transformers"]
261+
dependencies = ["haystack-ai", "turboquant-vllm", "transformers-haystack"]
262262
featured = false
263263
python_version = "3.12"

tutorials/27_First_RAG_Pipeline.ipynb

Lines changed: 1068 additions & 1105 deletions
Large diffs are not rendered by default.

tutorials/29_Serializing_Pipelines.ipynb

Lines changed: 10 additions & 149 deletions
Original file line number | Diff line number | Diff line change
@@ -5,15 +5,7 @@
55
"metadata": {
66
"id": "cFFW8D-weE2S"
77
},
8-
"source": [
9-
"# Tutorial: Serializing LLM Pipelines\n",
10-
"\n",
11-
"- **Level**: Beginner\n",
12-
"- **Time to complete**: 10 minutes\n",
13-
"- **Components Used**: [`HuggingFaceLocalChatGenerator`](https://docs.haystack.deepset.ai/docs/huggingfacelocalchatgenerator), [`ChatPromptBuilder`](https://docs.haystack.deepset.ai/docs/chatpromptbuilder)\n",
14-
"- **Prerequisites**: None\n",
15-
"- **Goal**: After completing this tutorial, you'll understand how to serialize and deserialize between YAML and Python code."
16-
]
8+
"source": "# Tutorial: Serializing LLM Pipelines\n\n- **Level**: Beginner\n- **Time to complete**: 10 minutes\n- **Components Used**: [`TransformersChatGenerator`](https://docs.haystack.deepset.ai/docs/transformerschatgenerator), [`ChatPromptBuilder`](https://docs.haystack.deepset.ai/docs/chatpromptbuilder)\n- **Prerequisites**: None\n- **Goal**: After completing this tutorial, you'll understand how to serialize and deserialize between YAML and Python code."
179
},
1810
{
1911
"cell_type": "markdown",
@@ -35,11 +27,7 @@
3527
"metadata": {
3628
"id": "TLaHxdJcfWtI"
3729
},
38-
"source": [
39-
"## Installing Haystack\n",
40-
"\n",
41-
"Install Haystack with `pip`:"
42-
]
30+
"source": "## Installing Haystack\n\nInstall Haystack and the [`transformers-haystack`](https://haystack.deepset.ai/integrations/huggingface) integration (which provides `TransformersChatGenerator`) with `pip`:"
4331
},
4432
{
4533
"cell_type": "code",
@@ -52,11 +40,7 @@
5240
"outputId": "e304450a-24e3-4ef8-e642-1fbb573e7d55"
5341
},
5442
"outputs": [],
55-
"source": [
56-
"%%bash\n",
57-
"\n",
58-
"pip install haystack-ai"
59-
]
43+
"source": "%%bash\n\npip install haystack-ai transformers-haystack"
6044
},
6145
{
6246
"cell_type": "markdown",
@@ -71,51 +55,12 @@
7155
},
7256
{
7357
"cell_type": "code",
74-
"execution_count": 3,
58+
"execution_count": null,
7559
"metadata": {
7660
"id": "odZJjD7KgO1g"
7761
},
78-
"outputs": [
79-
{
80-
"data": {
81-
"text/plain": [
82-
"<haystack.core.pipeline.pipeline.Pipeline object at 0x13cc77370>\n",
83-
"🚅 Components\n",
84-
" - builder: ChatPromptBuilder\n",
85-
" - llm: HuggingFaceLocalChatGenerator\n",
86-
"🛤️ Connections\n",
87-
" - builder.prompt -> llm.messages (List[ChatMessage])"
88-
]
89-
},
90-
"execution_count": 3,
91-
"metadata": {},
92-
"output_type": "execute_result"
93-
}
94-
],
95-
"source": [
96-
"from haystack import Pipeline\n",
97-
"from haystack.components.builders import ChatPromptBuilder\n",
98-
"from haystack.dataclasses import ChatMessage\n",
99-
"from haystack.components.generators.chat import HuggingFaceLocalChatGenerator\n",
100-
"\n",
101-
"template = [\n",
102-
" ChatMessage.from_user(\n",
103-
" \"\"\"\n",
104-
"Please create a summary about the following topic:\n",
105-
"{{ topic }}\n",
106-
"\"\"\"\n",
107-
" )\n",
108-
"]\n",
109-
"\n",
110-
"builder = ChatPromptBuilder(template=template)\n",
111-
"llm = HuggingFaceLocalChatGenerator(model=\"Qwen/Qwen2.5-1.5B-Instruct\", generation_kwargs={\"max_new_tokens\": 150})\n",
112-
"\n",
113-
"pipeline = Pipeline()\n",
114-
"pipeline.add_component(name=\"builder\", instance=builder)\n",
115-
"pipeline.add_component(name=\"llm\", instance=llm)\n",
116-
"\n",
117-
"pipeline.connect(\"builder.prompt\", \"llm.messages\")"
118-
]
62+
"outputs": [],
63+
"source": "from haystack import Pipeline\nfrom haystack.components.builders import ChatPromptBuilder\nfrom haystack.dataclasses import ChatMessage\nfrom haystack_integrations.components.generators.transformers import TransformersChatGenerator\n\ntemplate = [\n ChatMessage.from_user(\n \"\"\"\nPlease create a summary about the following topic:\n{{ topic }}\n\"\"\"\n )\n]\n\nbuilder = ChatPromptBuilder(template=template)\nllm = TransformersChatGenerator(model=\"Qwen/Qwen2.5-1.5B-Instruct\", generation_kwargs={\"max_new_tokens\": 150})\n\npipeline = Pipeline()\npipeline.add_component(name=\"builder\", instance=builder)\npipeline.add_component(name=\"llm\", instance=llm)\n\npipeline.connect(\"builder.prompt\", \"llm.messages\")"
11964
},
12065
{
12166
"cell_type": "code",
@@ -225,54 +170,7 @@
225170
"metadata": {
226171
"id": "0C7zGsUCGszq"
227172
},
228-
"source": [
229-
"You should get a pipeline YAML that looks like the following:\n",
230-
"\n",
231-
"```yaml\n",
232-
"components:\n",
233-
" builder:\n",
234-
" init_parameters:\n",
235-
" required_variables: null\n",
236-
" template:\n",
237-
" - _content:\n",
238-
" - text: '\n",
239-
"\n",
240-
" Please create a summary about the following topic:\n",
241-
"\n",
242-
" {{ topic }}\n",
243-
"\n",
244-
" '\n",
245-
" _meta: {}\n",
246-
" _name: null\n",
247-
" _role: user\n",
248-
" variables: null\n",
249-
" type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder\n",
250-
" llm:\n",
251-
" init_parameters:\n",
252-
" init_parameters:\n",
253-
" generation_kwargs:\n",
254-
" max_new_tokens: 150\n",
255-
" stop_sequences: []\n",
256-
" huggingface_pipeline_kwargs:\n",
257-
" device: cpu\n",
258-
" model: Qwen/Qwen2.5-1.5B-Instruct\n",
259-
" task: text-generation\n",
260-
" streaming_callback: null\n",
261-
" token:\n",
262-
" env_vars:\n",
263-
" - HF_API_TOKEN\n",
264-
" - HF_TOKEN\n",
265-
" strict: false\n",
266-
" type: env_var\n",
267-
" type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator\n",
268-
"connections:\n",
269-
"- receiver: llm.messages\n",
270-
" sender: builder.prompt\n",
271-
"max_runs_per_component: 100\n",
272-
"metadata: {}\n",
273-
"\n",
274-
"```"
275-
]
173+
"source": "You should get a pipeline YAML that looks like the following:\n\n```yaml\ncomponents:\n builder:\n init_parameters:\n required_variables: null\n template:\n - _content:\n - text: '\n\n Please create a summary about the following topic:\n\n {{ topic }}\n\n '\n _meta: {}\n _name: null\n _role: user\n variables: null\n type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder\n llm:\n init_parameters:\n chat_template: null\n enable_thinking: false\n generation_kwargs:\n max_new_tokens: 150\n huggingface_pipeline_kwargs:\n device: cpu\n model: Qwen/Qwen2.5-1.5B-Instruct\n task: text-generation\n streaming_callback: null\n token:\n env_vars:\n - HF_API_TOKEN\n - HF_TOKEN\n strict: false\n type: env_var\n tool_parsing_function: haystack_integrations.components.generators.transformers.chat.chat_generator.default_tool_parser\n tools: null\n type: haystack_integrations.components.generators.transformers.chat.chat_generator.TransformersChatGenerator\nconnections:\n- receiver: llm.messages\n sender: builder.prompt\nmax_runs_per_component: 100\nmetadata: {}\n\n```"
276174
},
277175
{
278176
"cell_type": "markdown",
@@ -287,49 +185,12 @@
287185
},
288186
{
289187
"cell_type": "code",
290-
"execution_count": 5,
188+
"execution_count": null,
291189
"metadata": {
292190
"id": "U332-VjovFfn"
293191
},
294192
"outputs": [],
295-
"source": [
296-
"yaml_pipeline = \"\"\"\n",
297-
"components:\n",
298-
" builder:\n",
299-
" init_parameters:\n",
300-
" template:\n",
301-
" - _content:\n",
302-
" - text: 'Please translate the following to French: \\n{{ sentence }}\\n'\n",
303-
" _meta: {}\n",
304-
" _name: null\n",
305-
" _role: user\n",
306-
" variables: null\n",
307-
" type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder\n",
308-
" llm:\n",
309-
" init_parameters:\n",
310-
" generation_kwargs:\n",
311-
" max_new_tokens: 150\n",
312-
" stop_sequences: []\n",
313-
" huggingface_pipeline_kwargs:\n",
314-
" device: cpu\n",
315-
" model: Qwen/Qwen2.5-1.5B-Instruct\n",
316-
" task: text-generation\n",
317-
" streaming_callback: null\n",
318-
" chat_template : \"{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ ' ' }}{% endif %}{% endfor %}{{ eos_token }}\"\n",
319-
" token:\n",
320-
" env_vars:\n",
321-
" - HF_API_TOKEN\n",
322-
" - HF_TOKEN\n",
323-
" strict: false\n",
324-
" type: env_var\n",
325-
" type: haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator\n",
326-
"connections:\n",
327-
"- receiver: llm.messages\n",
328-
" sender: builder.prompt\n",
329-
"max_runs_per_component: 100\n",
330-
"metadata: {}\n",
331-
"\"\"\""
332-
]
193+
"source": "yaml_pipeline = \"\"\"\ncomponents:\n builder:\n init_parameters:\n template:\n - _content:\n - text: 'Please translate the following to French: \\n{{ sentence }}\\n'\n _meta: {}\n _name: null\n _role: user\n variables: null\n type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder\n llm:\n init_parameters:\n generation_kwargs:\n max_new_tokens: 150\n stop_sequences: []\n huggingface_pipeline_kwargs:\n device: cpu\n model: Qwen/Qwen2.5-1.5B-Instruct\n task: text-generation\n streaming_callback: null\n chat_template : \"{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ ' ' }}{% endif %}{% endfor %}{{ eos_token }}\"\n token:\n env_vars:\n - HF_API_TOKEN\n - HF_TOKEN\n strict: false\n type: env_var\n type: haystack_integrations.components.generators.transformers.chat.chat_generator.TransformersChatGenerator\nconnections:\n- receiver: llm.messages\n sender: builder.prompt\nmax_runs_per_component: 100\nmetadata: {}\n\"\"\""
333194
},
334195
{
335196
"cell_type": "markdown",
@@ -428,4 +289,4 @@
428289
},
429290
"nbformat": 4,
430291
"nbformat_minor": 0
431-
}
292+
}
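
In the serialized YAML, the change above comes down to a new `type:` path for the `llm` component. A minimal sketch of applying the same rewrite to an already saved pipeline YAML follows, using plain string replacement. The two paths are copied from the diff; the function name and the sample YAML fragment are assumptions made for this example. The sketch does not check that the remaining `init_parameters` are accepted by `TransformersChatGenerator`, so the result should be reviewed before loading.

```python
# Type paths copied from the diff above.
OLD_TYPE = "haystack.components.generators.chat.hugging_face_local.HuggingFaceLocalChatGenerator"
NEW_TYPE = (
    "haystack_integrations.components.generators.transformers"
    ".chat.chat_generator.TransformersChatGenerator"
)


def migrate_generator_type(yaml_text: str) -> str:
    """Point `type:` entries at the integration class (hypothetical helper).

    Only the type path is rewritten; init parameters are left untouched.
    """
    return yaml_text.replace(f"type: {OLD_TYPE}", f"type: {NEW_TYPE}")


# Shortened fragment in the shape of the tutorial's YAML, for illustration only.
old_yaml = f"""
components:
  llm:
    init_parameters:
      generation_kwargs:
        max_new_tokens: 150
    type: {OLD_TYPE}
connections: []
"""

new_yaml = migrate_generator_type(old_yaml)
print(new_yaml)
```

Running the helper a second time leaves the text unchanged, since the old path no longer appears in it.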

tutorials/33_Hybrid_Retrieval.ipynb

Lines changed: 5 additions & 21 deletions
Original file line number | Diff line number | Diff line change
@@ -5,15 +5,7 @@
55
"metadata": {
66
"id": "kTas9ZQ7lXP7"
77
},
8-
"source": [
9-
"# Tutorial: Creating a Hybrid Retrieval Pipeline\n",
10-
"\n",
11-
"- **Level**: Intermediate\n",
12-
"- **Time to complete**: 15 minutes\n",
13-
"- **Components Used**: [`DocumentSplitter`](https://docs.haystack.deepset.ai/docs/documentsplitter), [`SentenceTransformersDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [`InMemoryDocumentStore`](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore), [`InMemoryBM25Retriever`](https://docs.haystack.deepset.ai/docs/inmemorybm25retriever), [`InMemoryEmbeddingRetriever`](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever), and [`TransformersSimilarityRanker`](https://docs.haystack.deepset.ai/docs/transformerssimilarityranker)\n",
14-
"- **Prerequisites**: None\n",
15-
"- **Goal**: After completing this tutorial, you will have learned about creating a hybrid retrieval and when it's useful."
16-
]
8+
"source": "# Tutorial: Creating a Hybrid Retrieval Pipeline\n\n- **Level**: Intermediate\n- **Time to complete**: 15 minutes\n- **Components Used**: [`DocumentSplitter`](https://docs.haystack.deepset.ai/docs/documentsplitter), [`SentenceTransformersDocumentEmbedder`](https://docs.haystack.deepset.ai/docs/sentencetransformersdocumentembedder), [`InMemoryDocumentStore`](https://docs.haystack.deepset.ai/docs/inmemorydocumentstore), [`InMemoryBM25Retriever`](https://docs.haystack.deepset.ai/docs/inmemorybm25retriever), [`InMemoryEmbeddingRetriever`](https://docs.haystack.deepset.ai/docs/inmemoryembeddingretriever), and [`SentenceTransformersSimilarityRanker`](https://docs.haystack.deepset.ai/docs/sentencetransformerssimilarityranker)\n- **Prerequisites**: None\n- **Goal**: After completing this tutorial, you will have learned about creating a hybrid retrieval and when it's useful."
179
},
1810
{
1911
"cell_type": "markdown",
@@ -230,24 +222,16 @@
230222
"metadata": {
231223
"id": "r8_jHzmosbC_"
232224
},
233-
"source": [
234-
"### 2) Rank the Results\n",
235-
"\n",
236-
"Use the [TransformersSimilarityRanker](https://docs.haystack.deepset.ai/docs/transformerssimilarityranker) that scores the relevancy of all retrieved documents for the given search query by using a cross encoder model. In this example, you will use [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) model to rank the retrieved documents but you can replace this model with other cross-encoder models on Hugging Face."
237-
]
225+
"source": "### 2) Rank the Results\n\nUse the [SentenceTransformersSimilarityRanker](https://docs.haystack.deepset.ai/docs/sentencetransformerssimilarityranker) that scores the relevancy of all retrieved documents for the given search query by using a cross encoder model. In this example, you will use [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) model to rank the retrieved documents but you can replace this model with other cross-encoder models on Hugging Face."
238226
},
239227
{
240228
"cell_type": "code",
241-
"execution_count": 9,
229+
"execution_count": null,
242230
"metadata": {
243231
"id": "cN0woIxHs4Ng"
244232
},
245233
"outputs": [],
246-
"source": [
247-
"from haystack.components.rankers import TransformersSimilarityRanker\n",
248-
"\n",
249-
"ranker = TransformersSimilarityRanker(model=\"BAAI/bge-reranker-base\")"
250-
]
234+
"source": "from haystack.components.rankers import SentenceTransformersSimilarityRanker\n\nranker = SentenceTransformersSimilarityRanker(model=\"BAAI/bge-reranker-base\")"
251235
},
252236
{
253237
"cell_type": "markdown",
@@ -533,4 +517,4 @@
533517
},
534518
"nbformat": 4,
535519
"nbformat_minor": 0
536-
}
520+
}
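
Much of the line-count reduction in both notebooks comes from `"source"` switching from a list of strings to a single string. The notebook format allows either form, so tooling that reads these cells should handle both. The sketch below shows one way to normalize a cell, using only the standard library; the helper name and the sample cells are assumptions for illustration, with the cell text taken from the ranker cell above.

```python
import json


def source_as_text(cell: dict) -> str:
    """Return a cell's source as one string, whichever form it is stored in."""
    source = cell.get("source", "")
    if isinstance(source, list):
        # List form: each item already carries its own trailing newline.
        return "".join(source)
    return source


# The same cell content in both encodings (sample data for this example).
list_form = json.loads(json.dumps({
    "cell_type": "code",
    "source": [
        "from haystack.components.rankers import SentenceTransformersSimilarityRanker\n",
        "\n",
        "ranker = SentenceTransformersSimilarityRanker(model=\"BAAI/bge-reranker-base\")",
    ],
}))
string_form = {
    "cell_type": "code",
    "source": (
        "from haystack.components.rankers import SentenceTransformersSimilarityRanker\n\n"
        "ranker = SentenceTransformersSimilarityRanker(model=\"BAAI/bge-reranker-base\")"
    ),
}

same = source_as_text(list_form) == source_as_text(string_form)
print(same)
```

Comparing cells through a helper like this makes it easier to separate real content changes, such as the renamed ranker class, from changes that only affect how the source is stored.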
