PDF to Text (pdf-text)
Extract text content from PDF documents using auto or render-based strategies.
Tags: Transform, binary, json
Minimal example
YAML

```yaml
actions:
  - pdf-text: {}
```

JSON

```json
{ "actions": [ { "pdf-text": {} } ] }
```
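The minimal example leaves every field at its default. To show several fields configured together, the following Python sketch builds the same `actions` structure with a few options filled in and prints it as JSON. The field names come from the field reference on this page. The values (the description text, `max-pages: 10`, and so on) are illustrative assumptions, and how `max-pages` interacts with `page-ranges` is not specified here.

```python
import json

# Illustrative values only; the keys are the pdf-text fields listed on this page.
pdf_text_options = {
    "description": "Extract text from selected pages",
    "strategy": "auto",
    "page-ranges": "1-3,7",
    "max-pages": 10,
    "emit-document-events": True,
}

# Same shape as the minimal example: a list of actions, each keyed by its type.
config = {"actions": [{"pdf-text": pdf_text_options}]}

print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the JSON form of the configuration. In YAML, the same options go as keys under `pdf-text:` instead of the empty `{}`.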
Fields
General
| Field | Type | Required | Description |
|---|---|---|---|
| description | string | No | Short summary displayed in the editor. |
| condition | lua-expression (string) | No | Lua expression that gates whether extraction runs. Example: `2 * count()` |
Extraction
| Field | Type | Required | Description |
|---|---|---|---|
| strategy | string | No | Extraction strategy to apply, such as `auto` or a render-based strategy. |
| max-pages | number (integer) | No | Maximum number of pages to analyze before stopping. |
| page-ranges | string | No | Pages to extract, as comma-separated page numbers and ranges; for example, `1-3,7` selects pages 1 through 3 plus page 7. |
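To make the `page-ranges` format concrete, here is a minimal Python sketch of how a spec such as `1-3,7` can be expanded into page numbers, with `max-pages` applied as a cap. The function name `parse_page_ranges` is hypothetical, and the semantics are assumptions rather than documented behavior: 1-based page numbers, inclusive ranges, duplicates dropped, and `max-pages` keeping only the first selected pages.

```python
from typing import List, Optional


def parse_page_ranges(spec: str, max_pages: Optional[int] = None) -> List[int]:
    """Expand a page-range spec such as "1-3,7" into [1, 2, 3, 7].

    Hypothetical helper. Assumed semantics: 1-based pages, inclusive ranges,
    duplicates dropped in first-seen order, max_pages keeps the first N pages.
    """
    pages: List[int] = []
    seen = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1-3,,7"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            candidates = range(start, end + 1)  # inclusive upper bound
        else:
            candidates = [int(part)]
        for page in candidates:
            if page < 1:
                raise ValueError(f"page numbers start at 1, got {page}")
            if page not in seen:
                seen.add(page)
                pages.append(page)
    if max_pages is not None:
        pages = pages[:max_pages]  # stop after max_pages selected pages
    return pages


print(parse_page_ranges("1-3,7"))               # [1, 2, 3, 7]
print(parse_page_ranges("1-3,7", max_pages=2))  # [1, 2]
```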
Rendering
| Field | Type | Required | Description |
|---|---|---|---|
| dpi | number (integer) | No | Rendering resolution in dots per inch (DPI); applies only to render-based extraction. |
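The cost of render-based extraction grows with `dpi`. PDF page sizes are expressed in points, and the default PDF user-space unit is 1/72 inch, so a page rasterized at a given DPI measures `points / 72 * dpi` pixels on each side. The sketch below assumes the renderer rasterizes the whole page at exactly the configured `dpi`. The helper name `rendered_size` is hypothetical.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page rasterized at `dpi` (assumes full-page rendering)."""
    width_px = round(width_pt / POINTS_PER_INCH * dpi)
    height_px = round(height_pt / POINTS_PER_INCH * dpi)
    return width_px, height_px


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(rendered_size(612, 792, 150))  # (1275, 1650)
print(rendered_size(612, 792, 300))  # (2550, 3300)
```

Doubling `dpi` from 150 to 300 quadruples the pixel count per page, from about 2.1 million to about 8.4 million pixels. Memory and rendering time therefore rise quickly at high settings.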
Quality Filters
| Field | Type | Required | Description |
|---|---|---|---|
| min-text-ratio | number (string) | No | Minimum ratio of text characters required for a page to be treated as non-garbled. |
| min-avg-word-len | number (string) | No | Minimum average word length required for a page to be treated as non-garbled. |
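The two quality filters can be read as thresholds on simple per-page statistics. This Python sketch shows one plausible way to compute them and flag a page as garbled. The exact definitions used by `pdf-text` are not given here, so the following are assumptions: "text ratio" is taken as the share of non-whitespace characters that are letters or digits, and "average word length" as the mean length of whitespace-separated tokens. The helper names and the threshold values `0.5` and `2.0` are hypothetical.

```python
from typing import Tuple


def page_metrics(text: str) -> Tuple[float, float]:
    """Return (text_ratio, avg_word_len) under the assumed definitions."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0, 0.0  # an empty page has no usable text
    # Share of non-whitespace characters that are letters or digits.
    text_ratio = sum(c.isalnum() for c in chars) / len(chars)
    # Non-whitespace characters exist, so split() yields at least one token.
    words = text.split()
    avg_word_len = sum(len(w) for w in words) / len(words)
    return text_ratio, avg_word_len


def is_garbled(text: str, min_text_ratio: float, min_avg_word_len: float) -> bool:
    """A page fails if either statistic falls below its threshold."""
    text_ratio, avg_word_len = page_metrics(text)
    return text_ratio < min_text_ratio or avg_word_len < min_avg_word_len


print(is_garbled("Extract text content from PDF documents", 0.5, 2.0))  # False
print(is_garbled("%$ #@ !~ ^& *( )_", 0.5, 2.0))                        # True
```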
Advanced
| Field | Type | Required | Description |
|---|---|---|---|
| emit-document-events | boolean | No | Emit a document-level event in addition to the per-page output. |