From Sample Job to Production Connector

This guide is for data engineers who have completed the Day 0 quick start and want to wire in real data. It focuses on choosing the right connector, validating transformations, and promoting a job safely.

1. Capture requirements (10 minutes)

  • Business goal: what question or downstream system are you serving?
  • Data source: protocol (files, object store, HTTP API, database dump), expected frequency, size, and authentication.
  • Destination: target format and ingestion expectations (batch vs streaming, retention requirements).

Document these answers; they drive connector selection and batching decisions.
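The answers from step 1 can be captured as structured data so nothing is missed before connector selection. A minimal sketch; every field name below is illustrative, not a required schema:

```python
# Illustrative requirements record for one job; adapt the fields to your team docs.
requirements = {
    "business_goal": "Feed daily sign-up counts to the analytics dashboard",
    "source": {
        "protocol": "object store",   # files, object store, HTTP API, database dump
        "frequency": "hourly",
        "expected_size_mb": 250,
        "auth": "IAM role",
    },
    "destination": {
        "format": "ndjson",
        "mode": "batch",              # batch vs streaming
        "retention_days": 30,
    },
}

def missing_sections(reqs):
    """Return the top-level answers that are still undocumented."""
    required = ("business_goal", "source", "destination")
    return [key for key in required if not reqs.get(key)]

print(missing_sections(requirements))  # [] once everything is documented
```

Keeping the record machine-readable makes it easy to review in a pull request alongside the job itself.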

2. Pick the right input (10 minutes)

Use the Build catalog to choose an input that matches the protocol, authentication method, and expected volume you documented in step 1. Verify the connector against sample data before wiring it into the job.
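Inspecting a sample payload from the source can inform which input you pick. A hedged sketch of a best-effort format sniff, assuming you can download one sample record set; the format labels are examples, not catalog names:

```python
import json

def guess_format(sample_text):
    """Best-effort guess of a sample payload's format from its first bytes."""
    stripped = sample_text.lstrip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass                       # looked like JSON but did not parse
    first_line = stripped.splitlines()[0] if stripped else ""
    if "," in first_line:
        return "csv"                   # crude heuristic: delimited header row
    return "unknown"

print(guess_format('{"user": "a"}'))   # json
```

A wrong format guess is cheap to catch here; it is much more expensive to catch after the job is deployed.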

3. Design transformations (15 minutes)

  • Map input fields to the schema you need downstream.
  • Identify enrichment needs (lookups, timestamps, context values).
  • Select actions: Actions overview describes field edits, filters, scripts, and enrichers.
  • Plan for error handling (discard vs reroute) and decide how outlier records are treated.
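The decisions above can be sketched as plain Python: a field mapping, a timestamp enrichment, a filter, and reroute-on-error handling. The field names and the reroute list are illustrative, not part of any product API:

```python
from datetime import datetime, timezone

FIELD_MAP = {"usr": "user_id", "ts": "event_time"}  # input field -> downstream schema

def transform(record, rerouted):
    """Map, filter, and enrich one record; reroute it on unexpected failure."""
    try:
        out = {FIELD_MAP[k]: v for k, v in record.items() if k in FIELD_MAP}
        if "user_id" not in out:       # filter: drop records we cannot key
            return None
        # enrichment: stamp ingestion time as a context value
        out["ingested_at"] = datetime.now(timezone.utc).isoformat()
        return out
    except Exception:
        rerouted.append(record)        # reroute instead of silently discarding
        return None

rerouted = []
print(transform({"usr": "u1", "ts": "2024-01-01"}, rerouted))
```

Writing the logic out like this first makes it easier to translate into the editor's actions one at a time.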

4. Configure the job in the visual editor (30 minutes)

  1. Start with a copy of the default job or create a new job in the editor.
  2. Swap the input for the real connector and fill required fields (bucket, keys, URL, credentials).
  3. Add actions for transformations and enrichment. Use add, convert, filter, or enrich as needed.
  4. Configure the output (S3, HTTP, file-store, etc.) with batching if required.
  5. Use Run & Trace with sample data to validate the end-to-end flow. Adjust until the output matches expectations.
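The editor steps above can be thought of as data: one input, a list of actions, and a batched output. A minimal sketch, assuming a dict-shaped job description; the connector names, field names, and batch settings are illustrative, not the editor's real schema:

```python
# Hypothetical job description mirroring steps 2-4 of the editor walkthrough.
job = {
    "input": {"type": "s3", "bucket": "raw-events", "prefix": "2024/"},
    "actions": [
        {"type": "convert", "field": "ts", "to": "timestamp"},
        {"type": "filter", "keep_if": "user_id present"},
        {"type": "enrich", "add": {"env": "staging"}},
    ],
    "output": {
        "type": "http",
        "url": "https://example.invalid/ingest",
        "batch": {"max_records": 500, "max_seconds": 30},
    },
}

def unconfigured(job):
    """List required sections that are still missing or empty."""
    return [part for part in ("input", "actions", "output") if not job.get(part)]

print(unconfigured(job))  # [] once input, actions, and output are all filled in
```

A check like this is a useful pre-flight before spending time in Run & Trace.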

5. Handle secrets and context (10 minutes)

  • Use job context values for API keys or environment-dependent settings.
  • Document required environment variables and verify they are covered in staging/production.
  • Reference the context management guide for merge rules and overrides.
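The "document and verify" step can be automated: declare the environment variables the job needs and check that each target environment provides them. A sketch; the variable names are examples, not a mandated convention:

```python
import os

REQUIRED_CONTEXT = ("API_KEY", "TARGET_URL")

def missing_context(env=os.environ):
    """Return the required context values that are unset or empty."""
    return [name for name in REQUIRED_CONTEXT if not env.get(name)]

# Verify against a staging-like environment before promoting:
staging_env = {"API_KEY": "test-key", "TARGET_URL": "https://staging.example"}
print(missing_context(staging_env))  # [] means every required value is set
```

Running the same check against staging and production catches the classic failure of a secret defined in one environment but not the other.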

6. Stage, test, and promote (20 minutes)

  • Stage the job and deploy it to a non-production worker first.
  • Validate metrics and logs after a sample run. Look for retries, error counts, or shape mismatches.
  • Update runbooks with monitoring requirements (dashboards, alerts). See the Monitoring guide for key metrics.
  • When satisfied, deploy to production workers and monitor closely during the first full run.
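The validation gate in this step can be made explicit: inspect a sample run's metrics and promote only if they look healthy. A hedged sketch; the metric names and thresholds are illustrative assumptions, not the product's real metric schema:

```python
def safe_to_promote(metrics, max_error_rate=0.01, max_retries=3):
    """Return True only if a sample run's metrics look healthy."""
    processed = metrics.get("records_out", 0)
    errors = metrics.get("errors", 0)
    total = processed + errors
    error_rate = errors / total if total else 1.0  # no data counts as a failure
    return error_rate <= max_error_rate and metrics.get("retries", 0) <= max_retries

sample_run = {"records_out": 10_000, "errors": 12, "retries": 1}
print(safe_to_promote(sample_run))
```

Encoding the promotion criteria this way keeps the "when satisfied" judgment consistent across jobs and reviewers.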

7. Share status and iterate

  • Record the job’s purpose, owners, and SLA in your team docs.
  • Schedule periodic reviews with downstream consumers to confirm the pipeline meets their needs.
  • Add lessons learned back into the build tutorials so future jobs benefit.

Quick checklist

  • Requirements documented (source, destination, schedule, success criteria)
  • Input connector selected and tested with sample data
  • Actions configured and validated using Run & Trace
  • Output batching and delivery confirmed
  • Context/environment variables defined
  • Job staged, promoted, and monitored in production

Once you’re comfortable building manually, graduate to automation with the CI/CD guide. The actions, batching, and context patterns above form the foundation of every production job.