Ingesting Local Files
The files input tails a directory on the worker and turns each line into an event. This tutorial shows how to import an existing batch of files, how to switch the job into watch mode, and how to avoid reprocessing data when the worker restarts.
Prerequisites
- A worker with access to the directory you want to ingest (local path or mounted volume).
- Sample files containing one JSON record per line.
- Access to the Jobs visual editor.
1. Build a one-off import job
- Create a new job named `files-import` and choose Files as the input.
- Set Path to the directory or glob pattern that should be scanned, for example `C:/data/logs/*.json`.
- Enable JSON so the runtime parses each line into fields instead of wrapping it in `_raw`.
- Enable Stop reading after. This tells the input to exit once every matching file has been processed, which is perfect for backfills.
- (Optional) Set File path field to `source_path` if you want the event to include the file name. Use File basename to keep only the leaf name.
The files input fingerprints every object it reads (path plus metadata), so rerunning the job later will skip files it has already processed unless you explicitly reset the fingerprints.
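If you prefer to read the job as configuration, the settings above correspond roughly to the sketch below. The key names (`files`, `path`, `json`, `stop-reading-after`, `file-path-field`) are assumptions based on the UI labels, so consult the Files input reference for the exact spelling in your version.

```yaml
# Hypothetical one-off import job; option names mirror the UI labels and are not confirmed.
input:
  files:
    path: "C:/data/logs/*.json"     # directory or glob pattern to scan
    json: true                      # parse each line into fields instead of _raw
    stop-reading-after: true        # exit once every matching file has been processed
    file-path-field: source_path    # optional: record the originating file on each event
```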
2. Add transformations and an output
Attach the actions and output you need for your downstream system. For example:
```yaml
actions:
  - add:
      output-fields:
        ingested_at: "${time|now_time_iso}"

output:
  print:
    output: stdout
```

You can replace the Print output with S3, Splunk HEC, or another sink once validation is complete.
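For instance, swapping Print for an object-store sink might look like the sketch below; the `s3` output type and its `bucket` and `prefix` keys are assumptions here, so check your output reference for the exact names.

```yaml
# Hypothetical replacement output; key names are illustrative, not confirmed.
output:
  s3:
    bucket: my-ingest-bucket    # destination bucket (example value)
    prefix: logs/imported/      # object key prefix for written events
```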
3. Validate with Run & Trace
Use Run & Trace to execute the job once. The UI sends the current definition to /api/jobs/run, so the worker runs it transiently and returns each event and its trace. Confirm the expected number of files and records arrive, and inspect the metadata fields you added.
If you need to rerun the import from the top, set Start at beginning to true. Otherwise the fingerprints make repeat runs idempotent.
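In configuration terms, that rerun toggle would sit on the input; the key name below is an assumption mirroring the UI label.

```yaml
input:
  files:
    start-at-beginning: true   # assumed key: re-read matching files even if already fingerprinted
```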
4. Switch to continuous monitoring (optional)
To turn the job into a directory watcher instead of a one-off import:
- Disable Stop reading after so the runtime keeps listening for new files.
- (Optional) Reduce Run time limit or set an Output event limit during transient runs so tests stop promptly.
- Stage and deploy the job. Whenever a new file matching the glob appears, the worker processes it once and records its fingerprint.
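Relative to the backfill sketch in step 1, the watcher variant only changes the stop behaviour. Again, the key names are assumptions taken from the UI labels.

```yaml
# Hypothetical watch-mode variant of the files input.
input:
  files:
    path: "C:/data/logs/*.json"
    json: true
    stop-reading-after: false   # keep listening for new files instead of exiting after the scan
```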
5. Stage, deploy, and monitor
- Save, close the editor, and click Stage job.
- Deploy to a worker. The job immediately processes any unseen files, then either idles while waiting for new ones (watch mode) or exits once the backlog is drained (Stop reading after enabled).
- Monitor progress in Operate > Job status and inspect worker logs if files are skipped. Most often the fingerprints show the file was already processed or the glob did not match.
Operational tips
- Keep the fingerprint database (stored alongside the worker state directory) when upgrading workers so you do not reprocess old files by accident.
- Use Advanced scheduling with a message trigger if another pipeline should signal when to scan a directory.
- Pair the job with the Dealing with time guide to stamp ingestion timestamps or derive partitions for downstream storage.
- Follow the runbook patterns in Operate monitoring to alert when file backlogs build up or when the job encounters repeated read errors.