AI is great at getting you from blank page to plausible architecture in minutes. That is useful. It is also exactly why teams get in trouble. A design that looks complete can still be missing the parts that matter most in production.
We ran into this the hard way on an event-ingestion pipeline. The generated DAG looked clean, unit tests passed, and backfill throughput was solid. Then a retry storm hit after an upstream timeout window. No crash. No red dashboard. Just duplicated business events entering downstream aggregates.
The pipeline did what we told it to do, not what we meant.
AI can draft structure quickly. Reliability still lives in state semantics, replay behavior, and contracts.
## What AI gets right quickly
For first-pass design, AI is a force multiplier. It drafts orchestration scaffolding, baseline transforms, and dependency wiring faster than most teams can do by hand. It also gives architecture conversations something concrete to critique.
That speed is real leverage, especially early in a project.
## What almost broke in production
The near miss came from a merge strategy that was logically valid but operationally fragile. The generated plan keyed updates on a non-deterministic combination of fields that could change between retries. During replay, equivalent events were treated as new rows.
That is a classic gap between happy-path correctness and stateful correctness.
| Decision area | Fast AI default | Production-safe pattern |
|---|---|---|
| Upsert key | Composite of mutable fields | Stable event identity plus source version |
| Retry handling | Re-run full batch on failure | Idempotent replay with dedupe window |
| Contract checks | Schema shape only | Schema plus semantic constraints |
| Publish condition | Task success | Task success plus quality gates |
The AI output was not wrong. It was incomplete for real operating conditions.
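The "idempotent replay with dedupe window" row above can be sketched in a few lines. This is an illustrative Python stand-in, not our production code: it bounds the window by event count rather than time, and the `DedupeWindow` name and `admit` method are hypothetical.

```python
from collections import OrderedDict

class DedupeWindow:
    """Drops events whose event_id was seen within the last `capacity`
    admissions. A bounded-memory stand-in for a time-based dedupe window."""

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self.seen: OrderedDict[str, None] = OrderedDict()

    def admit(self, event_id: str) -> bool:
        if event_id in self.seen:
            self.seen.move_to_end(event_id)  # refresh recency
            return False  # duplicate delivery: drop it
        self.seen[event_id] = None
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # evict the oldest entry
        return True

window = DedupeWindow(capacity=3)
deliveries = ["ev-1", "ev-2", "ev-1", "ev-3"]  # a retry redelivers ev-1
admitted = [e for e in deliveries if window.admit(e)]
assert admitted == ["ev-1", "ev-2", "ev-3"]
```

The point is that deduplication keys on stable event identity, not on a batch re-run succeeding.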
## The merge bug pattern in plain terms
This is the kind of logic that passes basic tests and later creates duplicate truth:
```sql
-- fragile: key includes fields that can shift between retries
MERGE INTO silver_events t
USING staging_events s
ON t.account_id = s.account_id
   AND t.event_type = s.event_type
   AND t.event_ts = s.event_ts
WHEN MATCHED THEN UPDATE SET t.payload = s.payload
WHEN NOT MATCHED THEN INSERT *;
```
This is the safer shape we moved to:
```sql
-- stable: deterministic identity + sequence-aware conflict handling
MERGE INTO silver_events t
USING (
    SELECT
        event_id,
        source_version,
        account_id,
        event_type,
        event_ts,
        payload
    FROM staging_events
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY event_id
        ORDER BY source_version DESC
    ) = 1
) s
ON t.event_id = s.event_id
WHEN MATCHED AND s.source_version > t.source_version THEN
    UPDATE SET
        source_version = s.source_version,
        payload = s.payload,
        event_ts = s.event_ts
WHEN NOT MATCHED THEN
    INSERT (event_id, source_version, account_id, event_type, event_ts, payload)
    VALUES (s.event_id, s.source_version, s.account_id, s.event_type, s.event_ts, s.payload);
```
This one change removed replay duplication and made incident triage much more straightforward.
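The same version-gated upsert can be exercised end to end in SQLite, which has no `MERGE` but supports the equivalent `ON CONFLICT ... DO UPDATE`. This is a minimal sketch of the pattern, not the production warehouse code; table and column names mirror the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE silver_events (
        event_id       TEXT PRIMARY KEY,
        source_version INTEGER NOT NULL,
        payload        TEXT NOT NULL
    )
""")

def upsert(event_id: str, source_version: int, payload: str) -> None:
    # Sequence-aware conflict handling: only strictly newer versions overwrite.
    conn.execute(
        """
        INSERT INTO silver_events (event_id, source_version, payload)
        VALUES (?, ?, ?)
        ON CONFLICT(event_id) DO UPDATE SET
            source_version = excluded.source_version,
            payload = excluded.payload
        WHERE excluded.source_version > silver_events.source_version
        """,
        (event_id, source_version, payload),
    )

for _ in range(2):  # replaying the whole batch simulates a retry storm
    upsert("ev-1", 1, "created")
    upsert("ev-1", 2, "updated")

rows = conn.execute(
    "SELECT event_id, source_version, payload FROM silver_events"
).fetchall()
assert rows == [("ev-1", 2, "updated")]  # one row, latest version, no duplicates
```

Running the batch twice produces the same final state, which is exactly the property the original key lacked.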
## The review contract we use now
We still use AI to design pipelines. We just require every generated design to pass a review contract before implementation:
- deterministic identity and idempotency path
- replay behavior under out-of-order and duplicate delivery
- explicit publish gates tied to data-quality checks
- runbook-ready observability fields for on-call triage
That keeps the velocity benefit without pretending generated code is production-complete.
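A contract like this is easiest to enforce when it is machine-checkable. The sketch below is hypothetical (the `REVIEW_CONTRACT` keys and `contract_gaps` helper are illustrative names, not a real tool), but it shows the shape: a generated design document must address every item before implementation starts.

```python
# Each key maps to one item of the review contract above.
REVIEW_CONTRACT = {
    "idempotency_key",       # deterministic identity and idempotency path
    "replay_semantics",      # behavior under out-of-order / duplicate delivery
    "publish_gates",         # quality checks that gate publishing
    "observability_fields",  # runbook-ready fields for on-call triage
}

def contract_gaps(design: dict) -> set[str]:
    """Return the contract items a generated design has not addressed."""
    return {key for key in REVIEW_CONTRACT if not design.get(key)}

draft = {
    "idempotency_key": "event_id",
    "replay_semantics": "latest source_version wins",
}
assert contract_gaps(draft) == {"publish_gates", "observability_fields"}
```

An empty gap set is the merge condition; anything else goes back to design.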
## What post-incident evidence should capture
A lightweight structured payload makes reliability reviews concrete:
```json
{
    "pipeline": "billing_event_ingest",
    "run_id": "run-2026-01-29-03",
    "failure_mode": "retry_replay_duplicate_inserts",
    "idempotency_key": "event_id",
    "affected_rows": 18274,
    "guardrail": "dedupe_by_event_id_and_latest_source_version",
    "status": "mitigated",
    "status": "mitigated"
}
```
Without this kind of record, teams repeat the same incident class with new tooling.
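Keeping these records useful means rejecting malformed ones at write time. A minimal validator might look like this; the `REQUIRED` field list simply mirrors the payload above, and the helper name is illustrative.

```python
# Required fields and their expected types, matching the payload above.
REQUIRED = {
    "pipeline": str,
    "run_id": str,
    "failure_mode": str,
    "idempotency_key": str,
    "affected_rows": int,
    "guardrail": str,
    "status": str,
}

def validate_incident(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is usable."""
    errors = []
    for field, expected_type in REQUIRED.items():
        if field not in record:
            errors.append(f"missing: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"wrong type: {field}")
    return errors

record = {
    "pipeline": "billing_event_ingest",
    "run_id": "run-2026-01-29-03",
    "failure_mode": "retry_replay_duplicate_inserts",
    "idempotency_key": "event_id",
    "affected_rows": 18274,
    "guardrail": "dedupe_by_event_id_and_latest_source_version",
    "status": "mitigated",
}
assert validate_incident(record) == []
```

Hooking a check like this into the post-incident workflow keeps failure-mode names and guardrails queryable across incidents.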
## Final note
AI is excellent at generating pipeline drafts. It is not a substitute for production engineering judgment. The teams that win use AI for speed, then enforce stateful correctness with the same rigor they would apply to handwritten systems.