 This tutorial covers every feature of the Latence AI Python SDK: the **Data Intelligence Pipeline** for multi-stage document processing, **direct API access** to individual services, job management, credits, async usage, file handling, error handling, and configuration.

-> **Prerequisites**: Python 3.10+, a Latence API key from [app.latence.ai](https://app.latence.ai)
+> **Prerequisites**: Python 3.9+, a Latence API key from [app.latence.ai](https://app.latence.ai)

 ---

@@ -143,11 +143,12 @@ The `steps` dict lets you configure each stage. Keys are short aliases:
 |`knowledge_graph` or `ontology`| Relation Extraction |
 |`redaction`| PII Redaction |
 |`compression`| Text Compression |
-|`chunking`| Text Chunking |
 |`embedding`| Dense Embeddings |
 |`colbert`| ColBERT Embeddings |
 |`colpali`| ColPali Embeddings |

+> **Note**: Chunking is not available as a pipeline step. Use `client.experimental.chunking.chunk()` for standalone text chunking.
@@ -302,15 +293,36 @@ builder.strict() # disable auto-injection of services
 config = builder.build()
 ```

+> **Note**: Chunking is not available as a pipeline step -- `builder.chunking()` raises `NotImplementedError`. Use `client.experimental.chunking.chunk()` for standalone text chunking.
+
+### Pipeline execution model
+
+The pipeline worker executes services as a **directed acyclic graph (DAG)**, not a linear chain. Services that share the same parent can run concurrently:
+
+```
+                       ┌─── extraction ──── ontology
+                       │
+document_intelligence ─┼─── redaction
+                       │
+                       ├─── compression
+                       │
+                       ├─── embedding
+                       │
+                       ├─── colbert
+                       │
+                       └─── colpali
+```
+
+This means `extraction`, `redaction`, `compression`, and embedding services all run in parallel once `document_intelligence` completes. The SDK and worker automatically handle ordering -- you just declare which services you want.
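As a rough illustration of the scheduling described above, the sketch below starts each service as soon as its parent has finished, so siblings run concurrently. It is not the worker's implementation: `PARENTS`, `run_dag`, and `fake_service` are hypothetical names, the parent map is read off the diagram, and no Latence SDK calls are involved.

```python
import asyncio

# Parent of each service, read off the diagram above (None marks the root).
# This mapping is an assumption made for illustration only.
PARENTS = {
    "document_intelligence": None,
    "extraction": "document_intelligence",
    "ontology": "extraction",
    "redaction": "document_intelligence",
    "compression": "document_intelligence",
    "embedding": "document_intelligence",
    "colbert": "document_intelligence",
    "colpali": "document_intelligence",
}


async def run_dag(parents, run_service):
    """Run every service as soon as its parent has completed."""
    done = {name: asyncio.Event() for name in parents}

    async def run_one(name):
        parent = parents[name]
        if parent is not None:
            await done[parent].wait()  # block until the parent finishes
        await run_service(name)
        done[name].set()  # unblock any children

    # All services are scheduled at once; the events enforce the ordering.
    await asyncio.gather(*(run_one(name) for name in parents))


async def demo():
    order = []

    async def fake_service(name):
        await asyncio.sleep(0)  # stand-in for real work
        order.append(name)

    await run_dag(PARENTS, fake_service)
    return order


completion_order = asyncio.run(demo())
```

In this sketch `document_intelligence` always completes first and `ontology` never completes before `extraction`; the relative order of the sibling services is not fixed. Failure handling is left out to keep the example short.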
+
 ---

 ## 5. Pipeline: PipelineConfig Object

 For maximum control, construct a `PipelineConfig` directly:

 ```python
-from latence import PipelineConfig
-from latence._models.pipeline import ServiceConfig
+from latence import PipelineConfig, ServiceConfig

 config = PipelineConfig(
     services=[
@@ -352,8 +364,6 @@ steps:
   document_intelligence:
     mode: performance
     output_format: markdown
-    use_layout_detection: true
-    use_chart_recognition: false

   extraction:
     label_mode: hybrid
@@ -419,6 +429,62 @@ job.cancel()
 pkg = job.data_package
 ```

+### Resumable jobs
+
+If a pipeline fails partway through, it may enter `RESUMABLE` status. Completed stages are checkpointed and only remaining stages re-execute on resume:
+
+```python
+try:
+    pkg = job.wait_for_completion()
+except JobError as e:
+    if e.is_resumable:
+        print(f"Job failed at a stage but is resumable: {e.message}")
+        pkg = job.resume().wait_for_completion()
+    else:
+        raise
+```
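The example above resumes once. If a resumed job can fail again, a bounded retry loop may be more convenient. The sketch below shows one possible shape. `JobError` and `FakeJob` here are local stand-ins that mimic only the attributes and methods used in the snippet above (`is_resumable`, `message`, `resume()`, `wait_for_completion()`), and `wait_with_resume` and `max_resumes` are hypothetical names, not part of the SDK.

```python
class JobError(Exception):
    """Local stand-in for the SDK's JobError (only the attributes used above)."""

    def __init__(self, message, is_resumable):
        super().__init__(message)
        self.message = message
        self.is_resumable = is_resumable


class FakeJob:
    """Hypothetical job that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.resumes = 0

    def wait_for_completion(self):
        if self.failures > 0:
            self.failures -= 1
            raise JobError("stage failed", is_resumable=True)
        return "data-package"

    def resume(self):
        self.resumes += 1
        return self


def wait_with_resume(job, max_resumes=3):
    """Wait for a job, resuming it at most `max_resumes` times."""
    for attempt in range(max_resumes + 1):
        try:
            return job.wait_for_completion()
        except JobError as e:
            # Give up on non-resumable errors or once the budget is spent.
            if not e.is_resumable or attempt == max_resumes:
                raise
            job = job.resume()


job = FakeJob(failures=2)
pkg = wait_with_resume(job)
```

With a real job object the same loop would apply, assuming `resume()` returns a job that can be waited on, as the snippet above suggests.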
+
+### Intermediate results and report
+
+Access per-stage download URLs and the structured pipeline report while a job is running or after completion:
+
+```python
+# Per-stage download URLs (presigned B2 URLs to results.jsonl)