From Sample Job to Production Connector
This guide is for data engineers who have completed the Day 0 quick start and want to wire in real data. It focuses on choosing the right connector, validating transformations, and promoting a job safely.
1. Capture requirements (10 minutes)
- Business goal: what question or downstream system are you serving?
- Data source: protocol (files, object store, HTTP API, database dump), expected frequency, size, and authentication.
- Destination: target format and ingestion expectations (batch vs streaming, retention requirements).
Document these answers; they drive connector selection and batching decisions. One lightweight way to record them is sketched below.
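This can be as simple as a structured record in your team docs. A minimal sketch in Python (every field name here is illustrative, not a tool schema):

```python
# Hypothetical requirements record; all field names are illustrative.
from dataclasses import dataclass

@dataclass
class PipelineRequirements:
    business_goal: str       # question or downstream system served
    source_protocol: str     # e.g. "s3", "http-poll", "database-dump"
    frequency: str           # e.g. "hourly", "continuous"
    volume: str              # e.g. "~500 MB/day"
    auth_method: str         # e.g. "IAM role", "API key"
    destination_format: str  # target format the consumer expects
    delivery_mode: str       # "batch" or "streaming"
    retention: str           # e.g. "90 days"

reqs = PipelineRequirements(
    business_goal="feed clickstream events to the analytics warehouse",
    source_protocol="http-poll",
    frequency="every 5 minutes",
    volume="~500 MB/day",
    auth_method="API key",
    destination_format="gzipped NDJSON",
    delivery_mode="batch",
    retention="90 days",
)
```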
2. Pick the right input (10 minutes)
Use the Build catalog to choose an input:
- Object stores: S3, GCS, Azure Blob, or FileStore for on-prem.
- APIs: HTTP Poll or HTTP Server, depending on whether you pull data from the source or the source pushes it to you (the contrast is sketched after this list).
- Files: the file input for tailing logs or processing directories.
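If the push-versus-pull distinction is new, the contrast in plain Python looks roughly like this (the endpoint, port, and payload shape are made up for illustration):

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Pull model (HTTP Poll input): the pipeline controls the schedule and
# periodically fetches whatever the source has ready.
def pull_once(url: str) -> list[dict]:
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())

# Push model (HTTP Server input): the source controls the schedule; the
# pipeline exposes an endpoint and waits for deliveries.
class PushHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        records = json.loads(body)  # hand records to the pipeline here
        self.send_response(204)
        self.end_headers()

# HTTPServer(("", 8080), PushHandler).serve_forever() would start listening.
```

HTTP Poll suits sources that only answer requests; HTTP Server suits sources (webhooks, agents) that emit events on their own schedule.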
3. Design transformations (15 minutes)
- Map input fields to the schema you need downstream.
- Identify enrichment needs (lookups, timestamps, context values).
- Select actions: the Actions overview describes field edits, filters, scripts, and enrichers.
- Plan for error handling (discard vs reroute) and decide how outlier records are handled; a generic transformation sketch follows this list.
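As a language-neutral illustration of the mapping, enrichment, and discard-versus-reroute decisions above (this is not the product's action syntax; all names are invented):

```python
# Illustrative transform: field mapping, timestamp enrichment, reroute-on-error.
from datetime import datetime, timezone

FIELD_MAP = {"usr": "user_id", "evt": "event_type", "ts": "source_time"}

def transform(record: dict, dead_letter: list[dict]) -> dict | None:
    try:
        out = {new: record[old] for old, new in FIELD_MAP.items()}
        out["processed_at"] = datetime.now(timezone.utc).isoformat()  # enrichment
        out["pipeline"] = "clickstream-v1"                            # context value
        return out
    except KeyError:
        # Reroute malformed records rather than discarding them silently,
        # so outliers can be inspected later.
        dead_letter.append(record)
        return None

dlq: list[dict] = []
ok = transform({"usr": "42", "evt": "click", "ts": "2024-01-01T00:00:00Z"}, dlq)
bad = transform({"usr": "42"}, dlq)  # missing fields -> routed to dlq
```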
4. Configure the job in the visual editor (30 minutes)
- Start with a copy of the default job or create a new job in the editor.
- Swap the input for the real connector and fill required fields (bucket, keys, URL, credentials).
- Add actions for transformations and enrichment. Use add, convert, filter, or enrich as needed.
- Configure the output (S3, HTTP, FileStore, etc.) with batching if required; see the batching sketch after this list.
- Use Run & Trace with sample data to validate the end-to-end flow. Adjust until the output matches expectations.
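Batching semantics are worth pinning down before the first real run. A common pattern, sketched generically below (the thresholds are examples, not product defaults), is to flush on whichever of batch size or batch age is reached first:

```python
import time

class Batcher:
    """Generic size-or-age batching sketch; thresholds are illustrative."""

    def __init__(self, max_records: int = 500, max_age_s: float = 30.0):
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.buffer: list[dict] = []
        self.opened_at = time.monotonic()

    def add(self, record: dict) -> list[dict] | None:
        """Buffer a record; return a full batch once a flush threshold is hit."""
        self.buffer.append(record)
        too_big = len(self.buffer) >= self.max_records
        too_old = time.monotonic() - self.opened_at >= self.max_age_s
        if too_big or too_old:
            batch, self.buffer = self.buffer, []
            self.opened_at = time.monotonic()
            return batch  # caller ships this to S3/HTTP/FileStore
        return None
```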
5. Handle secrets and context (10 minutes)
- Use job context values for API keys or environment-dependent settings.
- Document required environment variables and verify they are covered in staging/production.
- Reference the context management guide for merge rules and overrides; a generic illustration of layered overrides follows this list.
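The guide is authoritative on merge order; as a sketch of the general idea (base values, per-environment overrides, then environment variables winning; this precedence is an assumption, not the tool's documented rule):

```python
# Illustrative context merge; key names and precedence are assumptions.
import os

base_context = {"api_url": "https://api.example.com", "batch_size": "500"}
env_overrides = {
    "staging": {"api_url": "https://staging.api.example.com"},
    "production": {},
}

def resolve_context(env: str) -> dict:
    ctx = {**base_context, **env_overrides.get(env, {})}
    # Secrets such as API keys come only from environment variables,
    # never from job files checked into version control.
    if "PIPELINE_API_KEY" in os.environ:
        ctx["api_key"] = os.environ["PIPELINE_API_KEY"]
    return ctx

print(resolve_context("staging"))
```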
6. Stage, test, and promote (20 minutes)
- Stage the job and deploy it to a non-production worker first.
- Validate metrics and logs after a sample run. Look for retry spikes, error counts, or record-shape mismatches; a minimal smoke check is sketched after this list.
- Update runbooks with monitoring requirements (dashboards, alerts). See the Monitoring guide for key metrics.
- When satisfied, deploy to production workers and monitor closely during the first full run.
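What "validate metrics and logs" means concretely depends on your monitoring stack. A minimal smoke check, assuming run metrics can be exported as a dict (all metric names and thresholds here are placeholders):

```python
# Hypothetical post-run smoke check; metric names are placeholders for
# whatever your monitoring stack actually exports.
def check_sample_run(metrics: dict) -> list[str]:
    problems = []
    if metrics.get("error_count", 0) > 0:
        problems.append(f"errors: {metrics['error_count']}")
    if metrics.get("retry_count", 0) > 0.01 * metrics.get("records_out", 1):
        problems.append("retry rate above 1%")
    in_, out = metrics.get("records_in", 0), metrics.get("records_out", 0)
    if in_ != out + metrics.get("records_rerouted", 0):
        problems.append("record counts do not balance (possible shape mismatch)")
    return problems

assert not check_sample_run(
    {"records_in": 1000, "records_out": 990, "records_rerouted": 10,
     "error_count": 0, "retry_count": 3}
)
```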
7. Share status and iterate
- Record the job’s purpose, owners, and SLA in your team docs.
- Schedule periodic reviews with downstream consumers to confirm the pipeline meets their needs.
- Add lessons learned back into the build tutorials so future jobs benefit.
Quick checklist
- Requirements documented (source, destination, schedule, success criteria)
- Input connector selected and tested with sample data
- Actions configured and validated using Run & Trace
- Output batching and delivery confirmed
- Context/environment variables defined
- Job staged, promoted, and monitored in production
Once you’re comfortable building manually, graduate to automation with the CI/CD guide. The actions, batching, and context patterns above form the foundation of every production job.