Skip to content

FileStore (Local Object Store)

Lyft Data includes a local object-store implementation (file-store) that mirrors cloud object store behaviour while keeping data on disk.

Configure Lyft Data to read from FileStore

Add the file-store input to a job. Key fields (job-spec names in kebab-case):

  • path – root directory for the local bucket (required).
  • object-names – file names or prefixes relative to the root. Leave empty to target every file exposed by the selected mode.
  • mode – choose list, download, or list-and-download.
  • ignore-linebreaks – surface each file as a single event instead of newline-delimited events.
  • timestamp-mode – derive timestamps from file metadata or a pattern in the file name.
  • include-regex / exclude-regex / maximum-age – filter candidate files by pattern or by an age window (e.g., 10m, 1d).
  • fingerprinting / maximum-fingerprint-age – enable dedupe and control fingerprint retention.
  • preprocessors – gzip/parquet/base64/extension handlers executed after read.

Example: list and download gzipped NDJSON files from disk

input:
file-store:
path: /var/log/lyftdata
object-names:
- exports/
mode: list-and-download
include-regex:
- "\\.json(\.gz)?$"
fingerprinting: true
maximum-age: 2h
preprocessors:
- extension

Configure Lyft Data to write to FileStore

Add the file-store output to a job. Key fields:

  • path – root directory where objects are created (required).
  • object-name – literal file name (name: ...) or field reference (field: ...).
  • modeput writes files; delete removes them.
  • disable-object-name-guid, guid-prefix, guid-suffix – control the GUID prefix added to generated names.
  • input-field – event field to persist; omit to serialize the entire event after preprocessors.
  • batch & retry – configure batching and failure handling.
  • track-schema – maintain __SCHEMA_NUMBER when writing JSON payloads.
  • preprocessors – gzip/base64/extension handlers executed before writing.

Example: archive processed events locally

output:
file-store:
path: /var/lib/lyftdata/archive
object-name:
name: processed/${{ event.partition }}/summary.json
disable-object-name-guid: true
input-field: payload
preprocessors:
- gzip
track-schema: true

Example: delete successfully processed files

output:
file-store:
path: /var/log/lyftdata
object-name:
field: object_name
mode: delete

Delete operations expect the incoming event to include the file path relative to the root (as produced by the file-store input). Delete mode does not append GUID prefixes.