S3 (s3)

Stream data from an S3 object.

Fields

| Field | Type | Required | Description |
|---|---|---|---|
| trigger | trigger | | How often to run the command. |
| mode | Mode | | The operating mode for this input. |
| ignore-linebreaks | boolean (bool) | | Treat the object as one event. |
| preprocessors | Preprocessors | | Preprocessors that process downloaded data before it is made available to the job. They run in the order in which they are specified. |
| timestamp-mode | Timestamp Mode | | Derive a timestamp for this object, for filtering purposes, based on the selected strategy. |
| maximum-age | string | | Remove any objects older than this many seconds from the candidate list. |
| fingerprinting | boolean (bool) | | Enable object fingerprinting, which causes an object to be downloaded only once. |
| maximum-fingerprint-age | duration (string) | | Remove any object fingerprints older than this from the tracker. |
| include-regex | string | | Include objects matching the specified regular expressions. |
| exclude-regex | string | | Exclude objects matching the specified regular expressions. |
| retry | Retry | | Timeout and retry settings (see Retry Fields). |
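
The candidate filtering described by maximum-age, include-regex and exclude-regex can be sketched roughly as follows. This is a minimal illustration, not the input's actual implementation: the function name `filter_candidates`, the `(name, last_modified)` tuple shape, and the use of unanchored `re.search` matching are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta, timezone


def filter_candidates(objects, include_regex=None, exclude_regex=None,
                      maximum_age_seconds=None, now=None):
    """Return the object names that survive the include/exclude/age filters.

    `objects` is a list of (name, last_modified) tuples; that shape is an
    assumption for this sketch, not something the source specifies.
    """
    now = now or datetime.now(timezone.utc)
    include = re.compile(include_regex) if include_regex else None
    exclude = re.compile(exclude_regex) if exclude_regex else None

    kept = []
    for name, last_modified in objects:
        # Keep only names matching include-regex (when one is given).
        if include and not include.search(name):
            continue
        # Drop names matching exclude-regex.
        if exclude and exclude.search(name):
            continue
        # Drop objects older than maximum-age (expressed in seconds).
        if maximum_age_seconds is not None:
            if now - last_modified > timedelta(seconds=maximum_age_seconds):
                continue
        kept.append(name)
    return kept
```

For example, calling it with `include_regex=r"^logs/"`, `exclude_regex=r"\.tmp$"` and `maximum_age_seconds=3600` keeps only recent, non-temporary objects under the `logs/` prefix.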

Authentication

| Field | Type | Required | Description |
|---|---|---|---|
| access-key | string | | Access Key ID. |
| secret-key | string | | Secret Access Key. |
| security-token | string | | Security Token. |
| session-token | string | | Session Token. |
| role-arn | string | | A Role ARN to assume using the credentials above. |

Location

| Field | Type | Required | Description |
|---|---|---|---|
| bucket-name | string | | The storage service container for created objects. |
| object-names | string | | The object names. When using list modes these are treated as search prefixes. |
| region | string | | S3 Region. |
| endpoint | string | | S3 Endpoint. |

Object properties

| Field | Type | Required | Description |
|---|---|---|---|
| object-name-field | event-field (string) | | The field that the object name from an operation should be stored in. |
| creation-time-field | event-field (string) | | The field that the object creation time should be stored in. |
| last-modified-field | event-field (string) | | The field that the object last modified time should be stored in. |
| content-length-field | event-field (string) | | The field that the object content length information should be stored in. |
| content-type-field | event-field (string) | | The field that the object content type information should be stored in. |
| etag-field | event-field (string) | | The field that the object ETag should be stored in. |
| data-field | event-field (string) | | A field that the object data should be nested in. |

Retry Fields

| Field | Type | Required | Description |
|---|---|---|---|
| count | integer | | The number of retry attempts. If unspecified, retries will continue indefinitely. |
| pause | string | | How long to pause before retrying. |
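
The interaction between count and pause can be illustrated with a small retry loop. This is only a sketch of the documented semantics: the helper name `run_with_retry` is hypothetical, the pause is taken as a number of seconds rather than a duration string, and `count` is assumed to mean retries after the first attempt.

```python
import time


def run_with_retry(operation, count=None, pause_seconds=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds or the retry budget is spent.

    count=None retries indefinitely, mirroring the documented behaviour
    when count is unspecified. `sleep` is injectable so the loop can be
    exercised without real waiting.
    """
    retries_used = 0
    while True:
        try:
            return operation()
        except Exception:
            # Give up once the allowed number of retries has been used.
            if count is not None and retries_used >= count:
                raise
            retries_used += 1
            # Pause before the next attempt.
            sleep(pause_seconds)
```

With `count=2`, the operation is attempted at most three times in total: the initial call plus two retries.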

Mode Options

| Value | Name | Description |
|---|---|---|
| list-and-download-objects | list-and-download-objects | List Objects and Download |
| list-objects | list-objects | List Objects |
| download-objects | download-objects | Download Given Objects |

Preprocessors Options

| Value | Name | Description |
|---|---|---|
| extension | extension | Preprocess the object or blob based on the extension of the object or blob name (.gz, .parquet) |
| gzip | gzip | Decompress (gunzip) the received data |
| parquet | parquet | Extract the received data as JSON rows from a parquet file |
| base64 | base64 | Encode the binary data as base64 |
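
As a rough illustration of running preprocessors in the order they are specified, the sketch below applies the gzip, base64 and extension options using only the Python standard library. The function name `preprocess` is hypothetical, the parquet step is deliberately left unimplemented because it needs a third-party reader, and the handling of unrecognised extensions (pass the data through unchanged) is an assumption.

```python
import base64
import gzip


def preprocess(data, object_name, preprocessors):
    """Apply each named preprocessor to `data` (bytes), in order."""
    for step in preprocessors:
        if step == "extension":
            # Choose a preprocessor from the object name's extension.
            if object_name.endswith(".gz"):
                step = "gzip"
            elif object_name.endswith(".parquet"):
                step = "parquet"
            else:
                continue  # assumption: unknown extensions pass through

        if step == "gzip":
            data = gzip.decompress(data)
        elif step == "base64":
            data = base64.b64encode(data)
        elif step == "parquet":
            # Not covered by this sketch; a parquet library would be needed.
            raise NotImplementedError("parquet is outside this sketch")
        else:
            raise ValueError(f"unknown preprocessor: {step}")
    return data
```

Because the steps are applied sequentially, `["gzip", "base64"]` decompresses first and then encodes the decompressed bytes.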

Timestamp Mode Options

| Value | Name | Description |
|---|---|---|
| none | none | The default mode; do not filter based on timestamps |
| last-modified | last-modified | Filter objects on the last-modified timestamp reported by the service |
| blob-name-pattern | blob-name-pattern | Filter objects on a timestamp derived from the object name, for example: relevant-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/ |
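
To show how the example pattern could yield a timestamp, the sketch below applies it to an object name with Python's `re` module, which supports the same `(?P<name>...)` named-group syntax. The doubled backslashes in the example above are assumed to be configuration-string escaping, so the raw string here uses single backslashes. The function name, the sample object name, and the choice of UTC are assumptions for illustration.

```python
import re
from datetime import datetime, timezone

# The documented example pattern, written as a Python raw string.
NAME_PATTERN = re.compile(
    r"=(?P<Y>[\d]{4,4})-(?P<m>[\d]{2,2})-(?P<d>[\d]{2,2})/"
)


def timestamp_from_name(object_name):
    """Return a UTC datetime built from the Y/m/d groups, or None if no match."""
    match = NAME_PATTERN.search(object_name)
    if match is None:
        return None
    return datetime(
        int(match["Y"]), int(match["m"]), int(match["d"]),
        tzinfo=timezone.utc,
    )
```

A hypothetical object name such as `logs/date=2024-03-15/part-0.json` matches on `=2024-03-15/`, while a name without that segment produces no timestamp.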