Microsoft Azure

Lyft Data supports reading from and writing to Azure Blob Storage.

Configure Lyft Data to read from Azure Blob Storage

Add the azure-blob input to a job. Key fields (job-spec names in kebab-case):

  • container-name – blob container to read from (required).
  • blob-names – list of blob names or prefixes. Leave empty to target the entire container when the selected mode allows listing.
  • mode – choose list, download, or list-and-download.
  • storage-account / storage-master-key – required credentials. These accept literals or context substitutions.
  • ignore-linebreaks – surface each blob as a single event instead of newline-delimited events.
  • timestamp-mode – derive timestamps from last-modified metadata or a pattern in the blob name (a second example below sketches both options).
  • include-regex / exclude-regex / maximum-age – filter candidates by pattern or by age (durations such as 8h, 2d).
  • fingerprinting / maximum-fingerprint-age – enable dedupe and tune fingerprint retention.
  • preprocessors – apply gzip/parquet/base64/extension handlers before events enter the pipeline.

Example: list and download CSV exports

input:
  azure-blob:
    container-name: reporting
    blob-names:
      - exports/daily/
    mode: list-and-download
    include-regex:
      - "\\.csv(\\.gz)?$"
    maximum-age: 4h
    fingerprinting: true
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}
    preprocessors:
      - extension
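
Example: download full blobs with metadata timestamps

The sketch below illustrates the ignore-linebreaks and timestamp-mode options from the list above. Treat it as a rough sketch rather than a verified configuration: the boolean form of ignore-linebreaks, the last-modified value for timestamp-mode, and the duration format for maximum-fingerprint-age are assumptions, not values confirmed on this page.

input:
  azure-blob:
    container-name: reporting
    blob-names:
      - exports/raw/
    mode: download
    # Assumed boolean; emit each blob as a single event instead of per line.
    ignore-linebreaks: true
    # Hypothetical value; the page only says timestamps can come from
    # last-modified metadata or a pattern in the blob name.
    timestamp-mode: last-modified
    fingerprinting: true
    # Assumed to take durations like 8h or 2d, matching maximum-age.
    maximum-fingerprint-age: 2d
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}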

Configure Lyft Data to write to Azure Blob Storage

Add the azure-blob output to a job. Key fields:

  • container-name – destination container (required).
  • blob-destination – literal name (name: ...) or field reference (field: ...).
  • mode – put uploads blobs; delete removes existing blobs.
  • disable-blob-name-guid, guid-prefix, guid-suffix – control the GUID that is added to uploaded blob names to keep them unique (see the second output example below). Disable it only when your blob names are already unique.
  • input-field – select the event field to upload; omit to serialize the entire event after preprocessors.
  • content-type – override the default text/plain content type.
  • batch & retry – configure batching and failure handling.
  • track-schema – keep __SCHEMA_NUMBER in sync when writing JSON payloads.
  • preprocessors – gzip/base64/extension handlers executed before upload.
  • storage-account / storage-master-key – credentials for writes.

Example: upload transformed data to Azure

output:
  azure-blob:
    container-name: processed
    blob-destination:
      name: exports/${{ event.partition }}/summary.json
    disable-blob-name-guid: true
    input-field: payload
    content-type: application/json
    preprocessors:
      - gzip
    track-schema: true
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}
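
Example: keep the GUID but add recognizable markers

When the generated GUID stays enabled, guid-prefix and guid-suffix let you wrap it in text you can recognize in the container. The snippet below is a sketch under assumptions: the prefix and suffix values are placeholders, and exactly how they combine with the GUID in the final blob name is not specified on this page.

output:
  azure-blob:
    container-name: processed
    blob-destination:
      name: exports/archive.json
    # Placeholder values; adjust to your own naming scheme.
    guid-prefix: run-
    guid-suffix: -v1
    content-type: application/json
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}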

Example: delete source blobs after successful processing

output:
  azure-blob:
    container-name: reporting
    blob-destination:
      field: blob_name
    mode: delete
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}

Delete operations expect the incoming event to include the blob name (for example from the Azure input). GUID prefixes are not applied when deleting.
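
Example: read and then delete in one job (sketch)

Putting the two halves together, a job can download blobs from a container and delete the same blobs once they have been processed. The following is only a sketch: it assumes the input and output sections shown above can live in the same job spec and that the input surfaces each blob's name in a field called blob_name, which you should confirm against the field reference.

input:
  azure-blob:
    container-name: reporting
    mode: list-and-download
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}
output:
  azure-blob:
    container-name: reporting
    blob-destination:
      # Assumed field name carrying the blob name from the input.
      field: blob_name
    mode: delete
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}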