S3 (s3)

Stream data from an S3 object.

Tags: Object Store, Cloud, binary, json, raw

Minimal example

YAML
input:
  s3:
    bucket-name: ~
    object-names: []
JSON
{
  "input": {
    "s3": {
      "bucket-name": null,
      "object-names": []
    }
  }
}

Fields

| Field | Group | Type | Required | Description |
| --- | --- | --- | --- | --- |
| bucket-name | Location | string | | Bucket name. |
| object-names | Location | string[] | | Object names or prefixes to target. In list modes these are treated as search prefixes. An empty list could result in all objects being downloaded; take care with large buckets. |
| trigger | Trigger | Trigger | | Run on the provided cadence; if omitted, this input runs exactly once. |
| region | Location | string | | S3 region. |
| endpoint | Location | string | | S3 endpoint. |
| mode | Behavior | Mode | | Input behavior: list-and-download-objects enumerates objects and downloads them, list-objects only enumerates them, and download-objects downloads the named objects directly. Allowed values: list-and-download-objects, list-objects, download-objects. |
| ignore-linebreaks | Behavior | boolean (bool) | | Treat the entire object as a single event. When false, objects are split on newlines unless json=true instructs the runtime to parse JSON arrays. |
| payload-mode | Behavior | Payload Mode | | How payloads should be interpreted. Allowed values: auto, json, raw, binary. |
| preprocessors | Processing | Preprocessors[] | | Preprocessors applied to downloaded data before it is made available to the job. They run in the order specified. Allowed values: extension, gzip, parquet, base64. |
| access-key | Authentication | string | | Access Key ID. |
| secret-key | Authentication | string | | Secret Access Key. |
| security-token | Authentication | string | | Security token. |
| session-token | Authentication | string | | Session token. |
| role-arn | Authentication | string | | ARN of a role to assume using the credentials above. |
| object-name-field | Object Properties | field (string) | | The field in which to store the object name from an operation. Examples: data_field |
| creation-time-field | Object Properties | field (string) | | The field in which to store the object creation time. Examples: data_field |
| last-modified-field | Object Properties | field (string) | | The field in which to store the object last-modified time. Examples: data_field |
| content-length-field | Object Properties | field (string) | | The field in which to store the object content length. Examples: data_field |
| content-type-field | Object Properties | field (string) | | The field in which to store the object content type. Examples: data_field |
| etag-field | Object Properties | field (string) | | The field in which to store the object ETag. Examples: data_field |
| data-field | Object Properties | field (string) | | A field in which to nest the object data. Examples: data_field |
| timestamp-mode | Behavior | Timestamp Mode | | Derive a timestamp for each object, for filtering purposes, using the selected strategy (e.g., the last-modified time or a pattern in the object name). |
| maximum-age | Filtering | Maximum Age | | Ignore objects older than the provided duration (e.g., 5m, 1h30m). Leave empty to process all visible objects. |
| fingerprinting | Reliability | boolean (bool) | | Enable object fingerprinting to download each object only once, even across restarts. |
| maximum-fingerprint-age | Reliability | duration (string) | | How long to retain stored fingerprints before they are eligible for cleanup. |
| include-regex | Filtering | regex[] (string) | | Include objects matching the specified regular expressions. Examples: \d+[A-Z]* |
| exclude-regex | Filtering | regex[] (string) | | Exclude objects matching the specified regular expressions. Examples: \d+[A-Z]* |
| retry | Reliability | Retry | | How to retry failed operations (backoff, max attempts, etc.). |

Trigger

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| trigger | Trigger | | Run on the provided cadence; if omitted, this input runs exactly once. |

Location

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| bucket-name | string | | Bucket name. |
| object-names | string[] | | Object names or prefixes to target. In list modes these are treated as search prefixes. An empty list could result in all objects being downloaded; take care with large buckets. |
| region | string | | S3 region. |
| endpoint | string | | S3 endpoint. |
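
As an illustration of the Location fields, the sketch below targets a single prefix in a named region. The bucket name, prefix, region, and endpoint URL are hypothetical placeholders, and using endpoint to point at an S3-compatible service is an assumption rather than something this page states.

```yaml
input:
  s3:
    bucket-name: example-logs               # hypothetical bucket name
    object-names:
      - "app/2024/"                         # treated as a search prefix in list modes
    region: eu-west-1                       # example region
    endpoint: https://s3.example.internal   # hypothetical endpoint; omit if not needed
```

Keeping object-names non-empty avoids the case called out above, where an empty list could result in every object in the bucket being downloaded.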

Behavior

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| mode | Mode | | Input behavior: list-and-download-objects enumerates objects and downloads them, list-objects only enumerates them, and download-objects downloads the named objects directly. Allowed values: list-and-download-objects, list-objects, download-objects. |
| ignore-linebreaks | boolean (bool) | | Treat the entire object as a single event. When false, objects are split on newlines unless json=true instructs the runtime to parse JSON arrays. |
| payload-mode | Payload Mode | | How payloads should be interpreted. Allowed values: auto, json, raw, binary. |
| timestamp-mode | Timestamp Mode | | Derive a timestamp for each object, for filtering purposes, using the selected strategy (e.g., the last-modified time or a pattern in the object name). |
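
A minimal sketch of the Behavior fields, assuming a bucket of JSON documents where each object should become one event. The bucket name and prefix are hypothetical; the mode and payload-mode values are taken from the allowed values listed above.

```yaml
input:
  s3:
    bucket-name: example-logs        # hypothetical bucket name
    object-names:
      - "events/"                    # hypothetical prefix
    mode: list-and-download-objects  # enumerate matching objects, then download them
    payload-mode: json               # interpret each payload as JSON
    ignore-linebreaks: true          # one event per object instead of one per line
```

Switching mode to list-objects would, per the option descriptions, enumerate the objects without downloading their contents.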

Processing

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| preprocessors | Preprocessors[] | | Preprocessors applied to downloaded data before it is made available to the job. They run in the order specified. Allowed values: extension, gzip, parquet, base64. |
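
Because preprocessors run in the order given, the list order matters. The sketch below is one possible chain, assuming gzip-compressed binary objects that should be decompressed and then base64-encoded; the bucket name and prefix are hypothetical, and whether this particular combination suits a given job is an assumption.

```yaml
input:
  s3:
    bucket-name: example-archive   # hypothetical bucket name
    object-names:
      - "dumps/"                   # hypothetical prefix
    payload-mode: binary
    preprocessors:
      - gzip     # runs first: decompress the received data
      - base64   # runs second: encode the resulting binary data as base64
```

For buckets with mixed object types, the extension preprocessor is described as choosing the handling from the object name (.gz, .parquet), which may remove the need for an explicit chain.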

Authentication

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| access-key | string | | Access Key ID. |
| secret-key | string | | Secret Access Key. |
| security-token | string | | Security token. |
| session-token | string | | Session token. |
| role-arn | string | | ARN of a role to assume using the credentials above. |
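
A sketch of static credentials combined with an assumed role. Every value here is a placeholder: the key strings are not real credentials, and the account ID and role name in the ARN are hypothetical. How secrets are best supplied (for example via variables rather than literals) is not covered on this page.

```yaml
input:
  s3:
    bucket-name: example-logs       # hypothetical bucket name
    object-names:
      - "app/"                      # hypothetical prefix
    access-key: "<ACCESS_KEY_ID>"   # placeholder, not a real key
    secret-key: "<SECRET_KEY>"      # placeholder, not a real key
    # Role assumed using the access-key / secret-key pair above
    role-arn: "arn:aws:iam::123456789012:role/example-s3-reader"
```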

Object Properties

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| object-name-field | field (string) | | The field in which to store the object name from an operation. Examples: data_field |
| creation-time-field | field (string) | | The field in which to store the object creation time. Examples: data_field |
| last-modified-field | field (string) | | The field in which to store the object last-modified time. Examples: data_field |
| content-length-field | field (string) | | The field in which to store the object content length. Examples: data_field |
| content-type-field | field (string) | | The field in which to store the object content type. Examples: data_field |
| etag-field | field (string) | | The field in which to store the object ETag. Examples: data_field |
| data-field | field (string) | | A field in which to nest the object data. Examples: data_field |
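
The Object Properties fields can be used to attach object metadata to each event. In the sketch below, all of the output field names (s3_key, s3_modified, and so on) are hypothetical choices, as are the bucket name and prefix.

```yaml
input:
  s3:
    bucket-name: example-logs          # hypothetical bucket name
    object-names:
      - "app/"                         # hypothetical prefix
    object-name-field: s3_key          # object name stored under this field
    last-modified-field: s3_modified   # last-modified time
    content-length-field: s3_size      # content length
    etag-field: s3_etag                # object ETag
    data-field: payload                # object data nested under this field
```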

Filtering

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| maximum-age | Maximum Age | | Ignore objects older than the provided duration (e.g., 5m, 1h30m). Leave empty to process all visible objects. |
| include-regex | regex[] (string) | | Include objects matching the specified regular expressions. Examples: \d+[A-Z]* |
| exclude-regex | regex[] (string) | | Exclude objects matching the specified regular expressions. Examples: \d+[A-Z]* |
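
A sketch combining the three filters. The patterns, bucket name, and prefix are hypothetical. Two things are assumptions here: that the expressions are matched against the object name, and that maximum-age takes a nested value key (as the Maximum Age Fields table suggests) rather than a bare duration.

```yaml
input:
  s3:
    bucket-name: example-logs   # hypothetical bucket name
    object-names:
      - "app/"                  # hypothetical prefix
    maximum-age:
      value: 1h30m              # ignore objects older than 90 minutes
    include-regex:
      - '\.log(\.gz)?$'         # keep .log and .log.gz objects
    exclude-regex:
      - 'tmp/'                  # skip anything under a tmp/ path segment
```

Single-quoted YAML strings keep backslashes literal, which avoids double-escaping the regular expressions.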

Reliability

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| fingerprinting | boolean (bool) | | Enable object fingerprinting to download each object only once, even across restarts. |
| maximum-fingerprint-age | duration (string) | | How long to retain stored fingerprints before they are eligible for cleanup. |
| retry | Retry | | How to retry failed operations (backoff, max attempts, etc.). |
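
A sketch of fingerprinting for an input that polls the same prefix repeatedly, so that each object is downloaded only once. The bucket name, prefix, polling interval, and retention period are hypothetical, and the nesting of the trigger block is an assumption based on the Trigger tables on this page.

```yaml
input:
  s3:
    bucket-name: example-logs        # hypothetical bucket name
    object-names:
      - "app/"                       # hypothetical prefix
    trigger:
      interval:
        duration: 5m                 # poll every five minutes
    fingerprinting: true             # skip objects already downloaded, even after a restart
    maximum-fingerprint-age: 168h    # fingerprints become eligible for cleanup after 7 days
```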

Schema

Trigger - Cron - Window - Start Options

| Option | Name | Type | Description |
| --- | --- | --- | --- |
| start-time | Start Time | object | |
| tracked | Tracked | string | Examples: /path/to/file, c:\users\joe\data\file.txt |

Trigger - Interval - Window - Start Options

| Option | Name | Type | Description |
| --- | --- | --- | --- |
| start-time | Start Time | object | |
| tracked | Tracked | string | Examples: /path/to/file, c:\users\joe\data\file.txt |

Trigger Options

| Option | Name | Type | Description |
| --- | --- | --- | --- |
| message | Message | object | |
| cron | Cron | object | |
| interval | Interval | object | |

Timestamp Mode Options

| Option | Name | Type | Description |
| --- | --- | --- | --- |
| none | None | map | The default mode: do not filter based on timestamps. |
| last-modified | Last Modified | map | Filter objects on the last-modified timestamp reported by the service. |
| blob-name-pattern | Blob Name Pattern | string | Filter objects on a timestamp derived from the object name. For example: relevant-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/ |
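
A sketch of timestamp-based filtering using the last-modified option. The empty map for last-modified follows from its listed type (map) but is an assumption about the exact syntax, as is the nested value key under maximum-age; the bucket name and prefix are hypothetical.

```yaml
input:
  s3:
    bucket-name: example-logs   # hypothetical bucket name
    object-names:
      - "app/"                  # hypothetical prefix
    timestamp-mode:
      last-modified: {}         # use the service-reported last-modified time
    maximum-age:
      value: 1h                 # ignore objects whose timestamp is older than one hour
```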

Trigger - Message Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| limit | number (integer) | | The number of times to run the input. Examples: 42 |
| filter-kind | Filter Kind | | Specifies the kind of message to match, for example whether it originated from the "system" or from the "user". Allowed values: system, user, runtime-artifact-fetch, runtime-artifact-fetch-error, runtime-artifact-clear, runtime-artifact-clear-ack, runtime-artifact-fetch-reply, runtime-artifact-update, … |
| filter-source | Filter Source[] | | Specifies which process generated the message: a "server", "worker", or "job". Allowed values: job, worker, server. |
| filter-worker | string | | Specifies which worker to select. |
| filter-job | string | | Specifies the name of the job that the message came from. |
| filter-type | Filter Type[] | | Specifies the message types that should match. Allowed values: worker-licensed, worker-unlicensed, variable, variable-deleted, begin-shutting-down-job, begin-shutting-down-server, begin-shutting-down-worker, broadcast-job-thread-state, … |
| filter-tag | string | | Specifies that matched messages must carry a tag with a particular value. This only matches against user-generated messages. |
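
A sketch of a message trigger that runs the input when a tagged user message arrives. The tag value, bucket name, and prefix are hypothetical, and the nesting of message under trigger is an assumption based on the Trigger Options table.

```yaml
input:
  s3:
    bucket-name: example-logs       # hypothetical bucket name
    object-names:
      - "app/"                      # hypothetical prefix
    trigger:
      message:
        filter-kind: user           # only messages originating from the user
        filter-tag: reload-s3       # hypothetical tag; tags match user-generated messages only
        limit: 1                    # run the input once
```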

Trigger - Cron - Window - Start - Start Time Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| start-time | time-format (string) | | Allows the windowing to start at a specified time. Hint: %Y-%m-%d %H:%M:%S%.3f %z |
| highwatermark-file | path (string) | | File in which the timestamp is stored so that the job can resume after a restart. Examples: /path/to/file, c:\users\joe\data\file.txt |

Trigger - Cron - Window Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| size | duration (string) | | Window size. |
| offset | duration (string) | | Window offset. |
| start | Start | | File in which the timestamp is stored so that the job can resume after a restart. |

Trigger - Cron Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| cron | cron-expression (string) | | The Cron pattern. |
| immediate | boolean (bool) | | Run as soon as invoked, instead of waiting for the specified cron interval. |
| random-offset | duration (string) | | Sets a random offset to the schedule, then sticks to it. |
| window | Window | | Optional window definition when the schedule should only read a bounded range. |
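
A sketch of a cron trigger. The expression below assumes a classic five-field cron dialect (top of every hour); this page does not say which dialect the runtime accepts, so a variant with a seconds field may be required instead. The bucket name, prefix, and offset are hypothetical.

```yaml
input:
  s3:
    bucket-name: example-logs   # hypothetical bucket name
    object-names:
      - "app/"                  # hypothetical prefix
    trigger:
      cron:
        cron: "0 * * * *"       # assumed five-field syntax: minute 0 of every hour
        immediate: true         # also run once as soon as the input is invoked
        random-offset: 30s      # spread load with a fixed random offset
```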

Trigger - Interval - Window - Start - Start Time Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| start-time | time-format (string) | | Allows the windowing to start at a specified time. Hint: %Y-%m-%d %H:%M:%S%.3f %z |
| highwatermark-file | path (string) | | File in which the timestamp is stored so that the job can resume after a restart. Examples: /path/to/file, c:\users\joe\data\file.txt |

Trigger - Interval - Window Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| size | duration (string) | | Window size. |
| offset | duration (string) | | Window offset. |
| start | Start | | File in which the timestamp is stored so that the job can resume after a restart. |

Trigger - Interval Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| duration | duration (string) | | Duration to wait between events. |
| random-offset | duration (string) | | Sets a random offset to the schedule, then sticks to it. |
| window | Window | | Optional window definition when the interval should only cover a bounded range. |
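
A sketch of an interval trigger with a bounded window. The durations, bucket name, and prefix are hypothetical, and the exact semantics of size and offset (here read as "a 15-minute window, shifted by 1 minute") are an assumption, since this page only names the fields.

```yaml
input:
  s3:
    bucket-name: example-logs   # hypothetical bucket name
    object-names:
      - "app/"                  # hypothetical prefix
    trigger:
      interval:
        duration: 15m           # wait 15 minutes between runs
        window:
          size: 15m             # window size
          offset: 1m            # window offset
```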

Maximum Age Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| value | duration (string) | | |

Retry Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| timeout | time-interval (string) | | Timeout (e.g. 500ms, 2s). Default is 30. Examples: 500ms, 2h |
| retries | number (integer) | | Number of retries. Examples: 42 |
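
A sketch of a retry block using the two fields above. The values, bucket name, and prefix are hypothetical.

```yaml
input:
  s3:
    bucket-name: example-logs   # hypothetical bucket name
    object-names:
      - "app/"                  # hypothetical prefix
    retry:
      timeout: 10s              # per-operation timeout
      retries: 3                # number of retries
```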

Trigger - Message - Filter Kind Options

| Value | Description |
| --- | --- |
| system | System |
| user | User |
| runtime-artifact-fetch | Runtime Artifact Fetch |
| runtime-artifact-fetch-error | Runtime Artifact Fetch Error |
| runtime-artifact-clear | Runtime Artifact Clear |
| runtime-artifact-clear-ack | Runtime Artifact Clear Ack |
| runtime-artifact-fetch-reply | Runtime Artifact Fetch Reply |
| runtime-artifact-update | Runtime Artifact Update |
| runtime-artifact-update-ack | Runtime Artifact Update Ack |

Trigger - Message - Filter Source Options

| Value | Description |
| --- | --- |
| job | Job |
| worker | Worker |
| server | Server |

Trigger - Message - Filter Type Options

| Value | Description |
| --- | --- |
| worker-licensed | Worker Licensed |
| worker-unlicensed | Worker Unlicensed |
| variable | Variable |
| variable-deleted | Variable Deleted |
| begin-shutting-down-job | Begin Shutting Down Job |
| begin-shutting-down-server | Begin Shutting Down Server |
| begin-shutting-down-worker | Begin Shutting Down Worker |
| broadcast-job-thread-state | Broadcast Job Thread State |
| broadcast-server-thread-state | Broadcast Server Thread State |
| broadcast-worker-thread-state | Broadcast Worker Thread State |
| check-job-report-time | Check Job Report Time |
| check-worker-report-time | Check Worker Report Time |
| de-register-job-thread-dependency | De Register Job Thread Dependency |
| de-register-server-thread-dependency | De Register Server Thread Dependency |
| de-register-worker-thread-dependency | De Register Worker Thread Dependency |
| deployed-job-active | Deployed Job Active |
| deployed-job-removed | Deployed Job Removed |
| deployed-job-should-be-running | Deployed Job Should Be Running |
| deployment-phase | Deployment Phase |
| heart-beat | Heart Beat |
| initialise-internal-state | Initialise Internal State |
| initialise-job-states | Initialise Job States |
| job-batch-end | Job Batch End |
| job-backlog-update | Job Backlog Update |
| job-checkpoint-update | Job Checkpoint Update |
| job-deploy-ready | Job Deploy Ready |
| job-deployed | Job Deployed |
| job-document-end | Job Document End |
| job-document-start | Job Document Start |
| job-errors | Job Errors |
| job-execution-anomaly | Job Execution Anomaly |
| job-execution-status | Job Execution Status |
| job-emit-custom | Job Emit Custom |
| job-finished | Job Finished |
| job-idle | Job Idle |
| job-initiated | Job Initiated |
| job-is-processing | Job Is Processing |
| job-logs | Job Logs |
| job-metrics | Job Metrics |
| job-notifications | Job Notifications |
| job-removing | Job Removing |
| job-removed | Job Removed |
| job-remove-failed | Job Remove Failed |
| job-remove-ready | Job Remove Ready |
| job-replaced | Job Replaced |
| job-required | Job Required |
| job-run-ended | Job Run Ended |
| job-runtime-error | Job Runtime Error |
| job-runtime-settings | Job Runtime Settings |
| job-run-started | Job Run Started |
| job-running-docker | Job Running Docker |
| job-running-script | Job Running Script |
| job-running-subprocess | Job Running Subprocess |
| job-running-system-d | Job Running System D |
| job-settings | Job Settings |
| job-step-statistics | Job Step Statistics |
| job-started | Job Started |
| job-staged | Job Staged |
| job-state-transition | Job State Transition |
| job-shutting-down | Job Shutting Down |
| job-stopping | Job Stopping |
| job-stopped | Job Stopped |
| job-timed-out | Job Timed Out |
| job-suspicious-silence | Job Suspicious Silence |
| job-thread-state | Job Thread State |
| job-trace | Job Trace |
| job-trace-requires-samples | Job Trace Requires Samples |
| job-updated | Job Updated |
| job-worker-comms-error | Job Worker Comms Error |
| job-unstaged | Job Unstaged |
| license-state-changed | License State Changed |
| license-validation-failed | License Validation Failed |
| license-validation-ok | License Validation Ok |
| license-volume-violation | License Volume Violation |
| new-license | New License |
| override-job-coordinated-shutdown | Override Job Coordinated Shutdown |
| override-server-coordinated-shutdown | Override Server Coordinated Shutdown |
| override-worker-coordinated-shutdown | Override Worker Coordinated Shutdown |
| register-job-thread-dependency | Register Job Thread Dependency |
| register-server-thread-dependency | Register Server Thread Dependency |
| register-worker-thread-dependency | Register Worker Thread Dependency |
| run-job-failure | Run Job Failure |
| server-logs | Server Logs |
| server-metrics-batch | Server Metrics Batch |
| server-started | Server Started |
| server-starting | Server Starting |
| server-stopping | Server Stopping |
| server-thread-state | Server Thread State |
| server-worker-comms-error | Server Worker Comms Error |
| shutdown-jobs | Shutdown Jobs |
| shutdown-worker | Shutdown Worker |
| system-shutdown | System Shutdown |
| update-upstream-sync-for-job | Update Upstream Sync For Job |
| update-upstream-sync-for-worker | Update Upstream Sync For Worker |
| update-variable | Update Variable |
| user-alert | User Alert |
| user-generated | User Generated |
| user-notification | User Notification |
| worker-command-for-job | Worker Command For Job |
| worker-auth-lease-ack | Worker Auth Lease Ack |
| worker-connected | Worker Connected |
| worker-created | Worker Created |
| worker-debug-heart-beat | Worker Debug Heart Beat |
| worker-error | Worker Error |
| worker-first-seen | Worker First Seen |
| worker-heart-beat | Worker Heart Beat |
| worker-logs | Worker Logs |
| worker-metrics-batch | Worker Metrics Batch |
| worker-offline | Worker Offline |
| worker-requests-auth-lease | Worker Requests Auth Lease |
| worker-server-comms-error | Worker Server Comms Error |
| worker-settings | Worker Settings |
| worker-shutdown | Worker Shutdown |
| worker-shutting-down | Worker Shutting Down |
| worker-started | Worker Started |
| worker-state-uuid | Worker State Uuid |
| worker-stopping | Worker Stopping |
| worker-suspicious-silence | Worker Suspicious Silence |
| worker-system-information | Worker System Information |
| worker-thread-state | Worker Thread State |
| worker-updated | Worker Updated |
| worker-modified | Worker Modified |
| worker-removed | Worker Removed |
| context-changed | Context Changed |
| rerender-deployment | Rerender Deployment |
| job-killed | Job Killed |
| message-serviced | Message Serviced |
| failed-to-service-message | Failed To Service Message |
| worker-wants-initial-settings | Worker Wants Initial Settings |
| worker-wants-initial-settings-reply | Worker Wants Initial Settings Reply |
| worker-wants-deployed-jobs | Worker Wants Deployed Jobs |
| worker-wants-deployed-jobs-reply | Worker Wants Deployed Jobs Reply |
| worker-wants-job-configuration | Worker Wants Job Configuration |
| worker-wants-job-configuration-reply | Worker Wants Job Configuration Reply |
| job-wants-dslir-key | Job Wants Dslir Key |
| job-wants-dslir-key-reply | Job Wants Dslir Key Reply |
| worker-wants-dslir-key | Worker Wants Dslir Key |
| worker-wants-dslir-key-reply | Worker Wants Dslir Key Reply |
| job-wants-variables | Job Wants Variables |
| job-wants-variables-reply | Job Wants Variables Reply |
| job-wants-credentials | Job Wants Credentials |
| job-wants-credentials-reply | Job Wants Credentials Reply |
| job-wants-credentials-error | Job Wants Credentials Error |
| job-wants-secret-variables-reply | Job Wants Secret Variables Reply |
| job-credentials-invalidated | Job Credentials Invalidated |
| aggregator-health | Aggregator Health |
| worker-verification-token | Worker Verification Token |
| worker-requests-verification-token | Worker Requests Verification Token |
| runtime-artifact-update | Runtime Artifact Update |
| runtime-artifact-update-ack | Runtime Artifact Update Ack |
| runtime-artifact-clear | Runtime Artifact Clear |
| runtime-artifact-clear-ack | Runtime Artifact Clear Ack |
| runtime-artifact-fetch | Runtime Artifact Fetch |
| runtime-artifact-fetch-reply | Runtime Artifact Fetch Reply |
| runtime-artifact-fetch-error | Runtime Artifact Fetch Error |

Mode Options

| Value | Description |
| --- | --- |
| list-and-download-objects | List Objects and Download |
| list-objects | List Objects |
| download-objects | Download Given Objects |

Payload Mode Options

| Value | Description |
| --- | --- |
| auto | Auto |
| json | Json |
| raw | Raw |
| binary | Binary |

Preprocessors Options

| Value | Description |
| --- | --- |
| extension | Preprocess the object or blob based on the extension of its name (.gz, .parquet). |
| gzip | Gunzip (decompress) the received data. |
| parquet | Extract the received data as JSON rows from a Parquet file. |
| base64 | Encode the binary data as base64. |