AI is great at getting you from blank page to plausible architecture in minutes. That is useful. It is also exactly why teams get in trouble. A design that looks complete can still be missing the parts that matter most in production.
We ran into this the hard way on an event-ingestion pipeline. The generated DAG looked clean, unit tests passed, and backfill throughput was solid. Then a retry storm hit after an upstream timeout window. No crash. No red dashboard. Just duplicated business events entering downstream aggregates.
The pipeline did what we told it to do, not what we meant.
AI can draft structure quickly. Reliability still lives in state semantics, replay behavior, and contracts.
## What AI gets right quickly
For first-pass design, AI is a force multiplier. It drafts orchestration scaffolding, baseline transforms, and dependency wiring faster than most teams can do by hand. It also gives architecture conversations something concrete to critique.
That speed is real leverage, especially early in a project.
## What almost broke in production
The near miss came from a merge strategy that was logically valid but operationally fragile. The generated plan keyed updates on a non-deterministic combination of fields that could change between retries. During replay, equivalent events were treated as new rows.
That is a classic gap between happy-path correctness and stateful correctness.
| Decision area | Fast AI default | Production-safe pattern |
|---|---|---|
| Upsert key | Composite of mutable fields | Stable event identity plus source version |
| Retry handling | Re-run full batch on failure | Idempotent replay with dedupe window |
| Contract checks | Schema shape only | Schema plus semantic constraints |
| Publish condition | Task success | Task success plus quality gates |
The AI output was not wrong. It was incomplete for real operating conditions.
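The "idempotent replay with dedupe window" row above can be sketched in a few lines. This is an illustrative Python stand-in, not our production code: it bounds the window by event count rather than time, and the `DedupeWindow` name and `admit` method are hypothetical.

```python
from collections import OrderedDict

class DedupeWindow:
    """Drops events whose event_id was seen within the last `capacity`
    admissions. A bounded-memory stand-in for a time-based dedupe window."""

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self.seen: OrderedDict[str, None] = OrderedDict()

    def admit(self, event_id: str) -> bool:
        if event_id in self.seen:
            self.seen.move_to_end(event_id)  # refresh recency
            return False  # duplicate delivery: drop it
        self.seen[event_id] = None
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # evict the oldest entry
        return True

window = DedupeWindow(capacity=3)
deliveries = ["ev-1", "ev-2", "ev-1", "ev-3"]  # a retry redelivers ev-1
admitted = [e for e in deliveries if window.admit(e)]
assert admitted == ["ev-1", "ev-2", "ev-3"]
```

The point is that deduplication keys on stable event identity, not on a batch re-run succeeding.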
## The merge bug pattern in plain terms
This is the kind of logic that passes basic tests and later creates duplicate truth:
```sql
-- fragile: key includes fields that can shift between retries
MERGE INTO silver_events t
USING staging_events s
ON t.account_id = s.account_id
   AND t.event_type = s.event_type
   AND t.event_ts = s.event_ts
WHEN MATCHED THEN UPDATE SET t.payload = s.payload
WHEN NOT MATCHED THEN INSERT *;
```
This is the safer shape we moved to:
```sql
-- stable: deterministic identity + sequence-aware conflict handling
MERGE INTO silver_events t
USING (
    SELECT
        event_id,
        source_version,
        account_id,
        event_type,
        event_ts,
        payload
    FROM staging_events
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY event_id
        ORDER BY source_version DESC
    ) = 1
) s
ON t.event_id = s.event_id
WHEN MATCHED AND s.source_version > t.source_version THEN
    UPDATE SET
        source_version = s.source_version,
        payload = s.payload,
        event_ts = s.event_ts
WHEN NOT MATCHED THEN
    INSERT (event_id, source_version, account_id, event_type, event_ts, payload)
    VALUES (s.event_id, s.source_version, s.account_id, s.event_type, s.event_ts, s.payload);
```
This one change removed replay duplication and made incident triage much more straightforward.
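The same version-gated upsert can be exercised end to end in SQLite, which has no `MERGE` but supports the equivalent `ON CONFLICT ... DO UPDATE`. This is a minimal sketch of the pattern, not the production warehouse code; table and column names mirror the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE silver_events (
        event_id       TEXT PRIMARY KEY,
        source_version INTEGER NOT NULL,
        payload        TEXT NOT NULL
    )
""")

def upsert(event_id: str, source_version: int, payload: str) -> None:
    # Sequence-aware conflict handling: only strictly newer versions overwrite.
    conn.execute(
        """
        INSERT INTO silver_events (event_id, source_version, payload)
        VALUES (?, ?, ?)
        ON CONFLICT(event_id) DO UPDATE SET
            source_version = excluded.source_version,
            payload = excluded.payload
        WHERE excluded.source_version > silver_events.source_version
        """,
        (event_id, source_version, payload),
    )

for _ in range(2):  # replaying the whole batch simulates a retry storm
    upsert("ev-1", 1, "created")
    upsert("ev-1", 2, "updated")

rows = conn.execute(
    "SELECT event_id, source_version, payload FROM silver_events"
).fetchall()
assert rows == [("ev-1", 2, "updated")]  # one row, latest version, no duplicates
```

Running the batch twice produces the same final state, which is exactly the property the original key lacked.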
## The review contract we use now
We still use AI to design pipelines. We just require every generated design to pass a review contract before implementation:
- deterministic identity and idempotency path
- replay behavior under out-of-order and duplicate delivery
- explicit publish gates tied to data-quality checks
- runbook-ready observability fields for on-call triage
That keeps the velocity benefit without pretending generated code is production-complete.
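A contract like this is easiest to enforce when it is machine-checkable. The sketch below is hypothetical (the `REVIEW_CONTRACT` keys and `contract_gaps` helper are illustrative names, not a real tool), but it shows the shape: a generated design document must address every item before implementation starts.

```python
# Each key maps to one item of the review contract above.
REVIEW_CONTRACT = {
    "idempotency_key",       # deterministic identity and idempotency path
    "replay_semantics",      # behavior under out-of-order / duplicate delivery
    "publish_gates",         # quality checks that gate publishing
    "observability_fields",  # runbook-ready fields for on-call triage
}

def contract_gaps(design: dict) -> set[str]:
    """Return the contract items a generated design has not addressed."""
    return {key for key in REVIEW_CONTRACT if not design.get(key)}

draft = {
    "idempotency_key": "event_id",
    "replay_semantics": "latest source_version wins",
}
assert contract_gaps(draft) == {"publish_gates", "observability_fields"}
```

An empty gap set is the merge condition; anything else goes back to design.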
## What post-incident evidence should capture
A lightweight structured payload makes reliability reviews concrete:
```json
{
    "pipeline": "billing_event_ingest",
    "run_id": "run-2026-01-29-03",
    "failure_mode": "retry_replay_duplicate_inserts",
    "idempotency_key": "event_id",
    "affected_rows": 18274,
    "guardrail": "dedupe_by_event_id_and_latest_source_version",
    "status": "mitigated",
    "status": "mitigated"
}
```
Without this kind of record, teams repeat the same incident class with new tooling.
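Keeping these records useful means rejecting malformed ones at write time. A minimal validator might look like this; the `REQUIRED` field list simply mirrors the payload above, and the helper name is illustrative.

```python
# Required fields and their expected types, matching the payload above.
REQUIRED = {
    "pipeline": str,
    "run_id": str,
    "failure_mode": str,
    "idempotency_key": str,
    "affected_rows": int,
    "guardrail": str,
    "status": str,
}

def validate_incident(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is usable."""
    errors = []
    for field, expected_type in REQUIRED.items():
        if field not in record:
            errors.append(f"missing: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type: {field}")
    return errors

record = {
    "pipeline": "billing_event_ingest",
    "run_id": "run-2026-01-29-03",
    "failure_mode": "retry_replay_duplicate_inserts",
    "idempotency_key": "event_id",
    "affected_rows": 18274,
    "guardrail": "dedupe_by_event_id_and_latest_source_version",
    "status": "mitigated",
}
assert validate_incident(record) == []
```

Hooking a check like this into the post-incident workflow keeps failure-mode names and guardrails queryable across incidents.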
## Final note
AI is excellent at generating pipeline drafts. It is not a substitute for production engineering judgment. The teams that win use AI for speed, then enforce stateful correctness with the same rigor they would apply to handwritten systems.