

PDF to Text (pdf-text)

Extract text content from PDF documents using auto or render-based strategies.

Tags: Transform, binary, json

Minimal example

YAML

actions:
  - pdf-text: {}

JSON

{
  "actions": [
    {
      "pdf-text": {}
    }
  ]
}


Fields


General

| Field | Type | Required | Description |
|---|---|---|---|
| description | string | No | Short summary displayed in the editor. |
| condition | lua-expression (string) | No | Conditional expression that gates whether extraction runs. Example: `2 * count()` |

Extraction

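| Field | Type | Required | Description |
|---|---|---|---|
| strategy | string | No | Extraction strategy to apply, e.g. `auto`. |
| max-pages | number (integer) | No | Maximum number of pages to analyze before stopping. Example: `42` |
| page-ranges | string | No | Pages to process, given as comma-separated page numbers and ranges, e.g. `1-3,7`. |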
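The `page-ranges` format is only documented by the example `1-3,7`. The Python sketch below shows one plausible reading of it, assuming 1-based page numbers, ranges that include both endpoints, and de-duplicated output in ascending order. The helper names `parse_page_ranges` and `select_pages`, and the choice to apply `max-pages` after range selection, are hypothetical and not confirmed by this page.

```python
from __future__ import annotations


def parse_page_ranges(spec: str) -> list[int]:
    """Expand a spec such as "1-3,7" into sorted, unique page numbers.

    Assumptions (not documented for pdf-text): pages are 1-based,
    "a-b" includes both a and b, and surrounding spaces are ignored.
    """
    pages: set[int] = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty items, e.g. a trailing comma
        if "-" in item:
            start_text, end_text = item.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {item!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(item)
            if page < 1:
                raise ValueError(f"invalid page number: {item!r}")
            pages.add(page)
    return sorted(pages)


def select_pages(spec: str, max_pages: int | None = None) -> list[int]:
    """Hypothetical combination of page-ranges with a max-pages cap."""
    pages = parse_page_ranges(spec)
    return pages if max_pages is None else pages[:max_pages]


print(select_pages("1-3,7"))               # [1, 2, 3, 7]
print(select_pages("1-3,7", max_pages=2))  # [1, 2]
```

Under these assumptions, overlapping items such as `2-3,3` collapse to a single set of pages, so the cap counts distinct pages rather than list entries.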

Rendering

| Field | Type | Required | Description |
|---|---|---|---|
| dpi | number (integer) | No | Resolution, in DPI, at which pages are rendered for render-based extraction. Example: `42` |
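The `dpi` value determines how large each rendered page image is. PDF page sizes are expressed in points, where one point is 1/72 inch, so a page's pixel size grows linearly with the DPI. The sketch below computes that size and the raw memory of an uncompressed 8-bit RGB image. The helper names and the round-up choice are illustrative assumptions; the renderer behind pdf-text may round or allocate differently.

```python
from __future__ import annotations

import math

POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def render_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel size of a page rendered at `dpi` (rounded up; an assumption)."""
    if dpi <= 0:
        raise ValueError("dpi must be a positive integer")
    # Multiply before dividing so whole-point page sizes stay exact.
    width_px = math.ceil(width_pt * dpi / POINTS_PER_INCH)
    height_px = math.ceil(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


def raw_rgb_bytes(width_px: int, height_px: int) -> int:
    """Uncompressed size of an 8-bit RGB bitmap (3 bytes per pixel)."""
    return width_px * height_px * 3


# US Letter is 612 x 792 points (8.5 x 11 inches).
for dpi in (150, 300):
    w, h = render_size(612, 792, dpi)
    print(dpi, (w, h), raw_rgb_bytes(w, h))
```

Doubling the DPI quadruples the pixel count: a US Letter page goes from 1275 x 1650 pixels at 150 DPI to 2550 x 3300 pixels at 300 DPI, which is worth keeping in mind together with `max-pages`.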

Quality Filters

| Field | Type | Required | Description |
|---|---|---|---|
| min-text-ratio | number (string) | No | Minimum ratio of text characters a page needs to be considered non-garbled. Examples: `42`, `1.2e-10` |
| min-avg-word-len | number (string) | No | Minimum average word length a page needs to be considered non-garbled. Examples: `42`, `1.2e-10` |
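This page does not define exactly how the two quality measures are computed. As a hedged illustration, the sketch below treats the text ratio as the share of letters and digits among non-whitespace characters, and the average word length as the mean length of whitespace-separated tokens; both definitions and all helper names are assumptions. Because the table lists both fields as `number (string)`, the thresholds are converted with `float()` so either a number or a numeric string is accepted.

```python
def text_ratio(text: str) -> float:
    """Share of letters/digits among non-whitespace chars (assumed definition)."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(ch.isalnum() for ch in visible) / len(visible)


def avg_word_len(text: str) -> float:
    """Mean length of whitespace-separated tokens (assumed definition)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


def looks_non_garbled(text: str, min_text_ratio, min_avg_word_len) -> bool:
    """Hypothetical page check mirroring min-text-ratio and min-avg-word-len.

    Thresholds may arrive as strings (the fields are typed "number (string)"),
    so both are converted with float() before comparing.
    """
    return (text_ratio(text) >= float(min_text_ratio)
            and avg_word_len(text) >= float(min_avg_word_len))


clean = "Quarterly revenue grew in every region."
spaced = "Q u a r t e r l y  r e v e n u e"  # letters split apart by extraction
print(looks_non_garbled(clean, "0.8", "3"))   # True
print(looks_non_garbled(spaced, "0.8", "3"))  # False
```

The second sample passes the ratio check but fails on word length, which is the kind of letter-by-letter extraction output an average-word-length filter is meant to catch.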

Advanced

| Field | Type | Required | Description |
|---|---|---|---|
| emit-document-events | boolean (bool) | No | Emit a document-level event alongside per-page output. |
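To see the field groups together, the Python sketch below assembles a fuller `pdf-text` action and serializes it to the same JSON shape as the minimal example. Every value is an illustrative placeholder rather than a documented default. Passing the quality thresholds as strings is an assumption based on their `number (string)` type, and `dpi` only matters when render-based extraction is used.

```python
import json

# Illustrative values only; none of these are documented defaults.
pdf_text_fields = {
    "description": "Extract text from the report's key pages",
    "strategy": "auto",
    "page-ranges": "1-3,7",
    "max-pages": 42,
    "dpi": 150,
    "min-text-ratio": "0.5",    # typed "number (string)" in the field table
    "min-avg-word-len": "2.5",  # typed "number (string)" in the field table
    "emit-document-events": True,
}

config = {"actions": [{"pdf-text": pdf_text_fields}]}

# json.dumps turns Python True into JSON true, matching the JSON form above.
print(json.dumps(config, indent=2))
```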