Microsoft Azure

Lyft Data supports reading from and writing to Azure Blob Storage.

Configure Lyft Data to read from Azure Blob Storage

Add the azure-blob input to a job. Key fields (job-spec names in kebab-case):

  • container-name – blob container to read from (required).
  • blob-names – list of blob names or prefixes. Leave empty to target the entire container when the selected mode allows listing.
  • mode – choose list, download, or list-and-download.
  • storage-account / storage-master-key – required credentials. These accept literals or context substitutions.
  • ignore-linebreaks – surface each blob as a single event instead of newline-delimited events.
  • timestamp-mode – derive timestamps from last-modified metadata or a pattern in the blob name (a second example below sketches both options).
  • include-regex / exclude-regex / maximum-age – filter candidates by pattern or by age (durations such as 8h, 2d).
  • fingerprinting / maximum-fingerprint-age – enable dedupe and tune fingerprint retention.
  • preprocessors – apply gzip/parquet/base64/extension handlers before events enter the pipeline.

Example: list and download CSV exports

input:
  azure-blob:
    container-name: reporting
    blob-names:
      - exports/daily/
    mode: list-and-download
    include-regex:
      - "\\.csv(\\.gz)?$"
    maximum-age: 4h
    fingerprinting: true
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}
    preprocessors:
      - extension
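
Example: download full blobs with metadata timestamps

The sketch below illustrates the ignore-linebreaks and timestamp-mode options from the list above. Treat it as a rough sketch rather than a verified configuration: the boolean form of ignore-linebreaks, the last-modified value for timestamp-mode, and the duration format for maximum-fingerprint-age are assumptions, not values confirmed on this page.

input:
  azure-blob:
    container-name: reporting
    blob-names:
      - exports/raw/
    mode: download
    # Assumed boolean; emit each blob as a single event instead of per line.
    ignore-linebreaks: true
    # Hypothetical value; the page only says timestamps can come from
    # last-modified metadata or a pattern in the blob name.
    timestamp-mode: last-modified
    fingerprinting: true
    # Assumed to take durations like 8h or 2d, matching maximum-age.
    maximum-fingerprint-age: 2d
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}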

Configure Lyft Data to write to Azure Blob Storage

Add the azure-blob output to a job. Key fields:

  • container-name – destination container (required).
  • blob-destination – literal name (name: ...) or field reference (field: ...).
  • mode – put uploads blobs; delete removes existing blobs.
  • disable-blob-name-guid, guid-prefix, guid-suffix – control the GUID that is added to uploaded blob names to keep them unique (see the second output example below). Disable it only when your blob names are already unique.
  • input-field – select the event field to upload; omit to serialize the entire event after preprocessors.
  • content-type – override the default text/plain content type.
  • batch & retry – configure batching and failure handling.
  • track-schema – keep __SCHEMA_NUMBER in sync when writing JSON payloads.
  • preprocessors – gzip/base64/extension handlers executed before upload.
  • storage-account / storage-master-key – credentials for writes.

Example: upload transformed data to Azure

output:
  azure-blob:
    container-name: processed
    blob-destination:
      name: exports/${{ event.partition }}/summary.json
    disable-blob-name-guid: true
    input-field: payload
    content-type: application/json
    preprocessors:
      - gzip
    track-schema: true
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}
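
Example: keep the GUID but add recognizable markers

When the generated GUID stays enabled, guid-prefix and guid-suffix let you wrap it in text you can recognize in the container. The snippet below is a sketch under assumptions: the prefix and suffix values are placeholders, and exactly how they combine with the GUID in the final blob name is not specified on this page.

output:
  azure-blob:
    container-name: processed
    blob-destination:
      name: exports/archive.json
    # Placeholder values; adjust to your own naming scheme.
    guid-prefix: run-
    guid-suffix: -v1
    content-type: application/json
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}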

Example: delete source blobs after successful processing

output:
  azure-blob:
    container-name: reporting
    blob-destination:
      field: blob_name
    mode: delete
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}

Delete operations expect the incoming event to include the blob name (for example from the Azure input). GUID prefixes are not applied when deleting.
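
Example: read and then delete in one job (sketch)

Putting the two halves together, a job can download blobs from a container and delete the same blobs once they have been processed. The following is only a sketch: it assumes the input and output sections shown above can live in the same job spec and that the input surfaces each blob's name in a field called blob_name, which you should confirm against the field reference.

input:
  azure-blob:
    container-name: reporting
    mode: list-and-download
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}
output:
  azure-blob:
    container-name: reporting
    blob-destination:
      # Assumed field name carrying the blob name from the input.
      field: blob_name
    mode: delete
    storage-account: ${secrets.azure_account}
    storage-master-key: ${secrets.azure_key}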