Google Cloud Storage (gcs)
Read objects from Google Cloud Storage.
Authentication
| Field | Type | Required | Description |
|---|---|---|---|
| credentials | gcs_input:credentials | ✅ | Google Cloud credentials (service account JSON, workload identity, etc.). Supports literal values or context interpolation via PossibleContextString. |

Behavior
| Field | Type | Required | Description |
|---|---|---|---|
| mode | Mode | | Input behavior: `list-and-download-objects` lists objects and downloads them, `list-objects` only lists them, and `download-objects` downloads the given object names directly. |
| ignore-linebreaks | boolean (bool) | | Treat the entire object as a single event. When false, objects are split on newlines unless json=true instructs the runtime to parse JSON arrays. |
| timestamp-mode | Timestamp Mode | | Strategy for deriving each object's timestamp for filtering purposes (e.g., the last-modified time reported by the service or a pattern in the object name). |

Filtering
| Field | Type | Required | Description |
|---|---|---|---|
| maximum-age | duration (string) | | Ignore objects older than the provided duration (e.g., 5m, 1h30m). Leave empty to process all visible objects. |
| include-regex | string | | Include objects matching the specified regular expressions. |
| exclude-regex | string | | Exclude objects matching the specified regular expressions. |

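
The three filters above can be pictured as a single predicate applied to each listed object. The following is only a sketch under stated assumptions, not the input's actual implementation: `parse_duration` and `keep_object` are hypothetical helpers, the regexes are assumed to be applied to the object name with search semantics, and an exclude match is assumed to win over an include match.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper: parse durations written like "5m" or "1h30m".
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything that is not a plain sequence of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def keep_object(name, timestamp, now,
                maximum_age="", include_regex="", exclude_regex=""):
    """Return True if the object passes every configured filter.

    Empty settings are treated as "not configured" (an assumption).
    """
    if maximum_age and now - timestamp > parse_duration(maximum_age):
        return False
    if include_regex and not re.search(include_regex, name):
        return False
    if exclude_regex and re.search(exclude_regex, name):
        return False
    return True


# Usage: a two-minute-old JSON object checked against maximum-age 5m.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=2)
print(keep_object("logs/a.json", recent, now,
                  maximum_age="5m", include_regex=r"\.json$"))
```

Which timestamp is compared against `maximum-age` would depend on the selected `timestamp-mode`; the sketch simply takes it as an argument.
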
Location
| Field | Type | Required | Description |
|---|---|---|---|
| bucket-name | string | ✅ | The GCS bucket to read from. |
| object-names | string | ✅ | Object names or prefixes to target. Leave empty to consider every object exposed by the selected mode; take care with large buckets. |

Object Properties
| Field | Type | Required | Description |
|---|---|---|---|
| object-name-field | event-field (string) | | The field in which to store the object name. |
| creation-time-field | event-field (string) | | The field in which to store the object creation time. |
| last-modified-field | event-field (string) | | The field in which to store the object last-modified time. |
| content-length-field | event-field (string) | | The field in which to store the object content length. |
| content-type-field | event-field (string) | | The field in which to store the object content type. |
| etag-field | event-field (string) | | The field in which to store the object ETag. |
| data-field | event-field (string) | | The field in which to store the object data. By default, the data's fields are merged into the event if possible. |

Processing
| Field | Type | Required | Description |
|---|---|---|---|
| preprocessors | Preprocessors | | Preprocessors to apply to downloaded data before it is made available to the job. They run in the order in which they are specified. |

Reliability
| Field | Type | Required | Description |
|---|---|---|---|
| fingerprinting | boolean (bool) | | Enable object fingerprinting to download each object only once, even across restarts. |
| maximum-fingerprint-age | duration (string) | | How long to retain stored fingerprints before they are eligible for cleanup. |
| retry | Retry | | How to retry failed operations (backoff, max attempts, etc.). |

Trigger
| Field | Type | Required | Description |
|---|---|---|---|
| trigger | trigger | | Run the poller on the provided cadence; omit to run continuously via the job scheduler. |

Retry Fields
| Field | Type | Required | Description |
|---|---|---|---|
| count | integer | | The number of retry attempts. If unspecified, retries continue indefinitely. |
| pause | string | | How long to pause before retrying. |

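
The interaction of `count` and `pause` can be illustrated with a small retry loop. This is a sketch of the general technique, not the input's actual retry logic: `run_with_retry` is a hypothetical function, `count` is assumed to mean the number of retries after the first attempt, and the pause is given in seconds here for simplicity, whereas the `pause` field is a string.

```python
import time


def run_with_retry(operation, count=None, pause_seconds=1.0, sleep=time.sleep):
    """Call operation() until it succeeds.

    count=None mirrors an unspecified `count`: keep retrying indefinitely.
    Otherwise the last error is re-raised once `count` retries are used up.
    """
    retries = 0
    while True:
        try:
            return operation()
        except Exception:
            if count is not None and retries >= count:
                raise
            retries += 1
            sleep(pause_seconds)  # fixed pause between attempts
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real waiting.
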
Mode Options
| Value | Name | Description |
|---|---|---|
| list-and-download-objects | list-and-download-objects | List Objects and Download |
| list-objects | list-objects | List Objects |
| download-objects | download-objects | Download Given Objects |

Timestamp Mode Options
| Value | Name | Description |
|---|---|---|
| none | none | The default mode; do not filter based on timestamps. |
| last-modified | last-modified | Filter objects on the last-modified timestamp reported by the service. |
| blob-name-pattern | blob-name-pattern | Filter objects on a timestamp derived from the object name, for example: `relevant-name-pattern: =(?P<Y>[\\d]{4,4})-(?P<m>[\\d]{2,2})-(?P<d>[\\d]{2,2})/` |

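
To make the `blob-name-pattern` example concrete, the sketch below applies the same named-group regex to an object name in Python. It rests on several assumptions: the doubled backslashes in the example are taken to be string escaping, so the Python raw string uses single ones; the groups `Y`, `m`, and `d` are read as year, month, and day; the result is interpreted as midnight UTC; and `timestamp_from_name` and the sample object name are purely illustrative.

```python
import re
from datetime import datetime, timezone

# Same pattern as the example above, written as a Python raw string.
NAME_PATTERN = re.compile(
    r"=(?P<Y>[\d]{4,4})-(?P<m>[\d]{2,2})-(?P<d>[\d]{2,2})/"
)


def timestamp_from_name(object_name):
    """Return the date embedded in the name, or None if the pattern is absent."""
    match = NAME_PATTERN.search(object_name)
    if match is None:
        return None
    return datetime(
        int(match["Y"]), int(match["m"]), int(match["d"]),
        tzinfo=timezone.utc,  # assumption: names encode UTC dates
    )


# Usage with a hypothetical date-partitioned object name.
print(timestamp_from_name("logs/dt=2024-03-15/part-0.json"))
```
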
Preprocessors Options
| Value | Name | Description |
|---|---|---|
| extension | extension | Preprocess the object or blob based on the extension of its name (.gz, .parquet). |
| gzip | gzip | Decompress the received gzip data. |
| parquet | parquet | Extract the received data as JSON rows from a Parquet file. |
| base64 | base64 | Encode the binary data as base64. |
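
Because preprocessors run in the order they are listed, a chain can be modeled as a sequence of byte-to-byte transformations. The following sketch uses only the Python standard library and covers `gzip`, `base64`, and the `.gz` case of `extension`; `parquet` is left out because it would need a third-party library. All function names here are hypothetical and do not reflect the input's internals.

```python
import base64
import gzip


def ungzip(data):
    return gzip.decompress(data)


def to_base64(data):
    return base64.b64encode(data)


# Hypothetical registry mapping preprocessor names to functions.
PREPROCESSORS = {"gzip": ungzip, "base64": to_base64}


def by_extension(object_name, data):
    """Pick a preprocessor from the object name; pass other data through."""
    if object_name.endswith(".gz"):
        return ungzip(data)
    return data


def apply_preprocessors(steps, object_name, data):
    """Apply each named step in order, feeding the output of one into the next."""
    for step in steps:
        if step == "extension":
            data = by_extension(object_name, data)
        else:
            data = PREPROCESSORS[step](data)
    return data


# Usage: decompress a .gz object by extension, then base64-encode the result.
payload = gzip.compress(b'{"a": 1}\n')
print(apply_preprocessors(["extension", "base64"], "events.json.gz", payload))
```
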