Google Cloud Storage (gcs)

Read objects from Google Cloud Storage.

Contents

Authentication

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| credentials | gcs_input:credentials | | Google Cloud credentials (service account JSON, workload identity, etc.). Supports literal values or context interpolation via PossibleContextString. |

Behavior

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| mode | Mode | | Input behavior. Use list-and-download-objects to enumerate objects and fetch their contents, list-objects to enumerate only, or download-objects to download specific names directly. |
| ignore-linebreaks | boolean (bool) | | Treat the entire object as a single event. When false, objects are split on newlines unless json=true instructs the runtime to parse JSON arrays. |
| timestamp-mode | Timestamp Mode | | Strategy for deriving a timestamp for each object, used for filtering (for example, the last-modified time or a pattern in the object name). |
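
The effect of ignore-linebreaks can be pictured with a small sketch. This is an illustration only, not the input's actual implementation: the function name split_object is hypothetical, and the UTF-8 decoding and the dropping of empty lines are assumptions.

```python
def split_object(data: bytes, ignore_linebreaks: bool) -> list[str]:
    """Return the events produced from one object's contents."""
    text = data.decode("utf-8")  # assumption: objects are UTF-8 text
    if ignore_linebreaks:
        # The whole object becomes a single event.
        return [text]
    # Otherwise each non-empty line becomes its own event
    # (skipping empty lines is an assumption of this sketch).
    return [line for line in text.splitlines() if line]
```

With ignore_linebreaks set to true, a two-line object yields one event; with false, it yields two.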

Filtering

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| maximum-age | duration (string) | | Ignore objects older than the provided duration (e.g., 5m, 1h30m). Leave empty to process all visible objects. |
| include-regex | string | | Include only objects matching the specified regular expression. |
| exclude-regex | string | | Exclude objects matching the specified regular expression. |
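
The three filters can be combined as in the following minimal sketch. It is not the input's real code: the helper names parse_duration and keep_object are hypothetical, the duration parser only handles hours, minutes, and seconds, and the choices of unanchored matching and of applying the exclusion after the inclusion are assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

_DURATION_PART = re.compile(r"(\d+)(h|m|s)")
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse durations such as '5m' or '1h30m' (h, m, s units only)."""
    parts = _DURATION_PART.findall(text)
    # Reject input that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def keep_object(
    name: str,
    timestamp: datetime,
    now: datetime,
    maximum_age: str = "",
    include_regex: str = "",
    exclude_regex: str = "",
) -> bool:
    """Return True if the object passes every configured filter."""
    # An empty maximum_age disables the age filter.
    if maximum_age and now - timestamp > parse_duration(maximum_age):
        return False
    # Assumption: patterns are searched anywhere in the object name.
    if include_regex and not re.search(include_regex, name):
        return False
    if exclude_regex and re.search(exclude_regex, name):
        return False
    return True
```

For example, with maximum_age set to 5m, an object stamped ten minutes ago is skipped regardless of its name.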

Location

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| bucket-name | string | | The GCS bucket to read from. |
| object-names | string | | Object names or prefixes to target. Leave empty to consider every object exposed by the selected mode. Take care with large buckets. |

Object properties

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| object-name-field | event-field (string) | | Field in which to store the object name. |
| creation-time-field | event-field (string) | | Field in which to store the object creation time. |
| last-modified-field | event-field (string) | | Field in which to store the object last-modified time. |
| content-length-field | event-field (string) | | Field in which to store the object content length. |
| content-type-field | event-field (string) | | Field in which to store the object content type. |
| etag-field | event-field (string) | | Field in which to store the object ETag. |
| data-field | event-field (string) | | Field in which to store the object data. By default, the data's fields are merged into the event where possible. |
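
As a rough illustration of how these settings shape an event, the sketch below copies each object property into the configured field and either nests the object data under data-field or merges it into the event. The function build_event and the dictionary keys are hypothetical and are not part of the input's API.

```python
def build_event(metadata: dict, data: dict, field_names: dict) -> dict:
    """Assemble one event from object metadata and parsed object data.

    field_names maps a property key (hypothetical keys such as
    "object-name", "etag", "data") to the configured event field name.
    """
    event = {}
    for prop, value in metadata.items():
        target = field_names.get(prop)
        if target:
            # Only properties with a configured field are recorded.
            event[target] = value
    data_field = field_names.get("data")
    if data_field:
        # data-field set: keep the object data under that single field.
        event[data_field] = data
    else:
        # data-field unset: merge the data's fields into the event.
        event.update(data)
    return event
```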

Processing

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| preprocessors | Preprocessors | | Preprocessors that process downloaded data before it is made available to the job. They run in the order in which they are specified. |

Reliability

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| fingerprinting | boolean (bool) | | Enable object fingerprinting to download each object only once, even across restarts. |
| maximum-fingerprint-age | duration (string) | | How long to retain stored fingerprints before they are eligible for cleanup. |
| retry | Retry | | How to retry failed operations (backoff, max attempts, etc.). |
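
The idea behind fingerprinting and maximum-fingerprint-age can be sketched as follows. This is a simplified model under stated assumptions: the class FingerprintStore is hypothetical, the fingerprint is assumed to be derived from the object name and ETag, and the store lives in memory, whereas surviving restarts would require persistent storage.

```python
import hashlib
from datetime import datetime, timedelta, timezone


class FingerprintStore:
    """In-memory record of objects that have already been downloaded."""

    def __init__(self, maximum_age: timedelta):
        self.maximum_age = maximum_age
        self._seen: dict[str, datetime] = {}

    @staticmethod
    def fingerprint(name: str, etag: str) -> str:
        # Assumption: name plus ETag identifies one version of an object.
        return hashlib.sha256(f"{name}\n{etag}".encode("utf-8")).hexdigest()

    def should_download(self, name: str, etag: str, now: datetime) -> bool:
        """Return True the first time an object version is seen."""
        key = self.fingerprint(name, etag)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True

    def cleanup(self, now: datetime) -> int:
        """Drop fingerprints older than maximum_age; return how many."""
        expired = [k for k, seen in self._seen.items() if now - seen > self.maximum_age]
        for key in expired:
            del self._seen[key]
        return len(expired)
```

Once a fingerprint has been cleaned up, the same object would be downloaded again if it is still listed, which is why maximum-fingerprint-age is usually paired with an age filter.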

Trigger

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| trigger | trigger | | Run the poller on the provided cadence; omit to run continuously via the job scheduler. |

Retry Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| count | integer | | The number of retry attempts. If unspecified, retries continue indefinitely. |
| pause | string | | How long to pause before retrying. |
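
The interaction of count and pause can be sketched as a simple loop. This is illustrative only: with_retry is a hypothetical helper, the pause is given in seconds rather than as a duration string, and retrying on every exception type is an assumption.

```python
import time


def with_retry(operation, count=None, pause_seconds=0.0, sleep=time.sleep):
    """Call operation until it succeeds or the retries are used up.

    count is the number of retries after the first attempt;
    None means retry indefinitely.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if count is not None and attempt >= count:
                # Retries exhausted: surface the last error.
                raise
            attempt += 1
            sleep(pause_seconds)
```

With count set to 2, an operation is attempted at most three times in total: once initially and twice as retries.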

Mode Options

| Value | Name | Description |
| --- | --- | --- |
| list-and-download-objects | list-and-download-objects | List Objects and Download |
| list-objects | list-objects | List Objects |
| download-objects | download-objects | Download Given Objects |

Timestamp Mode Options

| Value | Name | Description |
| --- | --- | --- |
| none | none | Default. Do not filter on timestamps. |
| last-modified | last-modified | Filter objects on the last-modified timestamp reported by the service. |
| blob-name-pattern | blob-name-pattern | Filter objects on a timestamp derived from the object name, for example: relevant-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/ |
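
To show how a timestamp could be derived from an object name, the sketch below applies the example pattern with its named groups Y, m, and d. It assumes the doubled backslashes in the example are configuration-file escaping (so the pattern is written here with single backslashes), treats the date as midnight UTC, and uses a hypothetical function name.

```python
import re
from datetime import datetime, timezone

# The example pattern, with backslashes un-doubled (assumption).
NAME_PATTERN = re.compile(r"=(?P<Y>[\d]{4,4})-(?P<m>[\d]{2,2})-(?P<d>[\d]{2,2})/")


def timestamp_from_name(name: str):
    """Return the date embedded in an object name, or None if absent."""
    match = NAME_PATTERN.search(name)
    if match is None:
        return None
    # Assumption: the embedded date is interpreted as midnight UTC.
    return datetime(
        int(match["Y"]), int(match["m"]), int(match["d"]), tzinfo=timezone.utc
    )
```

A name such as logs/date=2024-03-15/part-0.json would yield 15 March 2024, while a name without the pattern yields no timestamp.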

Preprocessors Options

| Value | Name | Description |
| --- | --- | --- |
| extension | extension | Preprocess the object based on the extension of its name (.gz, .parquet). |
| gzip | gzip | Decompress gzip-compressed data. |
| parquet | parquet | Extract the received data as JSON rows from a Parquet file. |
| base64 | base64 | Encode the binary data as base64. |
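
Running preprocessors in order amounts to function composition over the downloaded bytes. The sketch below models the gzip, base64, and extension options using only the Python standard library. The function names and the PREPROCESSORS registry are hypothetical, the extension handling covers .gz only, and parquet is omitted because it would need a third-party reader.

```python
import base64
import gzip


def gunzip(data: bytes) -> bytes:
    """Decompress gzip-compressed bytes."""
    return gzip.decompress(data)


def to_base64(data: bytes) -> bytes:
    """Encode binary data as base64."""
    return base64.b64encode(data)


# Hypothetical registry keyed by the option values listed above.
PREPROCESSORS = {"gzip": gunzip, "base64": to_base64}


def by_extension(name: str, data: bytes) -> bytes:
    """Pick a preprocessor from the object name (.gz only in this sketch)."""
    if name.endswith(".gz"):
        return gunzip(data)
    return data


def run_preprocessors(names: list[str], data: bytes) -> bytes:
    """Apply the named preprocessors in the order given."""
    for name in names:
        data = PREPROCESSORS[name](data)
    return data
```

Order matters: listing gzip before base64 decompresses first and then encodes the result, whereas the reverse order would fail on data that is not valid gzip after encoding.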