Polling an HTTP API
This tutorial walks through a minimal `http-poll` job that calls a public API, parses the JSON response, and prepares the pipeline for promotion. The same pattern applies to internal services once you swap the URL and credentials.
Prerequisites
- A running Lyft Data server with at least one worker attached.
- Outbound network access from the worker to `https://httpbin.org` (or your real API).
- An operator account with access to the Jobs visual editor.
1. Create the job and configure the request
- Open Jobs in the UI and click New job. Name it `http-sample`.
- In the visual editor, select the Input panel and choose HTTP Poll.
- Set URL to `https://httpbin.org/anything`. Leave Method as `GET` for now. (`http-poll` also supports POST/PUT/PATCH/DELETE plus custom methods, headers, query parameters, basic auth, bearer tokens, or API keys.)
- Expand the Response section and set:
  - Body field to `response_body` so the payload lands in that field.
  - Status field to `response_status` if you want to store the HTTP status code.
  - Enable JSON and Ignore line breaks because `httpbin` returns multi-line JSON. The runtime honours these flags when parsing responses and emits a single event per request.
- (Optional) Add headers, query parameters, or a request body to mimic your production call. The `http-poll` input merges all of these options into the outgoing request each time the trigger fires.
At this point the job issues a GET request and records the raw response body and status in the event payload.
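For orientation, here is a rough sketch of how the same input could read in the job definition view. Every key name below is illustrative, modelled on the UI labels above and the kebab-case style of the action snippets later in this tutorial, not a confirmed schema:

```yaml
input:
  http-poll:
    url: https://httpbin.org/anything    # illustrative key names throughout
    method: GET                          # default method
    response:
      body-field: response_body          # where the raw payload lands
      status-field: response_status      # where the HTTP status code lands
      json: true                         # parse the body as JSON
      ignore-line-breaks: true           # httpbin returns multi-line JSON
```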
2. Parse the response
Add actions to shape the HTTP payload:
```yaml
- json:
    input-field: response_body
- remove:
    fields:
      - response_body
```

This pipeline parses the JSON body into event fields (for example `json.args`, `json.headers`) and removes the raw string once it is no longer needed. You can now add filters, enrichments, or variable expansion just like any other job.
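For reference, `https://httpbin.org/anything` echoes the request back as a JSON object, so after the `json` action the event carries fields such as `json.method` and `json.url`. An abridged response body (headers trimmed, `origin` replaced with a documentation address) looks like this:

```json
{
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Host": "httpbin.org"
  },
  "json": null,
  "method": "GET",
  "origin": "203.0.113.10",
  "url": "https://httpbin.org/anything"
}
```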
3. Choose an output for validation
While testing, point the job at the Print output so you can inspect events in the Run & Trace panel:
```yaml
output:
  print:
    output: stdout
```

When you are ready for production, swap the output for S3, HTTP POST, Splunk HEC, or another destination.
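As a hypothetical example of that swap, an HTTP POST destination might be written along these lines; the `http-post` key and its fields are assumptions for illustration, not confirmed names:

```yaml
output:
  http-post:                                  # hypothetical output name, for illustration only
    url: https://collector.example.com/ingest # your real sink endpoint
```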
4. Test with Run & Trace
Click Run & Trace in the editor. The UI sends the current job definition to `/api/jobs/run`, so the worker executes it once without staging. Inspect the trace to confirm the request URL, status code, and parsed fields look right. Adjust headers or actions until the event matches your downstream schema.
5. Schedule the polling cadence
Back in the input configuration, open Trigger and choose:
- Interval with a value such as `5m` for simple polling, or
- Cron with a six-field expression (seconds, minutes, hours, day-of-month, month, day-of-week) when you need precise alignment.

`http-poll` also honours message-based triggers, so another pipeline can publish an internal message to fire this job on demand.
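As a sketch, the two schedule styles might read like this in the definition view; the `trigger`, `interval`, and `cron` key names are assumptions for illustration:

```yaml
trigger:
  interval: 5m            # illustrative key: poll every five minutes
  # or, using the six-field cron form (seconds minutes hours day-of-month month day-of-week):
  # cron: "0 */5 * * * *" # fire at second 0 of every fifth minute
```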
6. Stage, deploy, and monitor
- Save the job, close the editor, and click Stage job.
- Deploy to a worker and confirm it shows Running under Operate > Job status once the trigger fires.
- Watch worker logs for `http-poll` status entries. Retries follow the backoff settings you configured on the input (see the sketch after this list).
- When the pipeline is stable, replace the Print output with your production sink and follow the promotion steps in Deploying jobs.
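To show where those backoff settings could live, here is a purely hypothetical rendering of a retry block on the input; none of these key names are confirmed, so configure the real values through the input's backoff settings in the UI:

```yaml
input:
  http-poll:
    retry:                 # hypothetical block, for illustration only
      max-attempts: 5      # give up after five consecutive failures
      backoff: 30s         # wait 30s between attempts
```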
Next steps
- Add authentication using headers or bearer tokens before pointing at a real API (see the sketch after this list).
- Capture response headers by setting the Headers field; they arrive as a JSON object alongside the body and status.
- Combine with the guidance in Advanced scheduling and Dealing with time for backfills or sliding windows.
- Add alerting using the run health patterns in Operate monitoring.
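For the first item above, a bearer-token header added to the input might look like the sketch below. The header block layout and the `${API_TOKEN}` expansion syntax are assumptions, so check the variable expansion documentation for the exact form:

```yaml
input:
  http-poll:
    url: https://api.example.com/v1/items    # your real API
    headers:
      Authorization: "Bearer ${API_TOKEN}"   # assumed expansion syntax; keeps the token out of the job definition
```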