Discussion about this post

The Next Evolution's avatar

Great follow-up article, and thanks for recognising that there is more to the challenge of AI than people realise at first glance.

You've mapped the estate problem clearly, Andrei. Reading it raised something I think sits just below even the visibility question: data we don't yet know we have, and can't plan for.

We have well-established frameworks for governing data we can identify and classify. Lineage, provenance, quality standards — these work when data has a known origin and a traceable path. Generative AI is creating a category of data object that our current vocabulary and governance models have no proper name for.

When AI models start populating data products and marketplaces with synthetically generated outputs, the lineage question becomes hard in a new way. Where did that data object come from? What rules produced it? Which model version, trained on what, with what parameters? A human analyst can be interviewed. A data source can be audited. A synthetic data object generated by an opaque model chain is something else entirely — and we don't yet have a clean answer to how you certify its provenance or its fitness for use downstream.

The governance work you've described is the right first move. But I think we've started at the top of the problem: AI as technology, then as a functional capability, then embedded in business processes. The harder layer is at the bottom — the data substrate everything rests on. That's where the real governance challenge lives, and it's the layer we haven't properly reached yet.

This matters beyond AI: if we can't solve synthetic data lineage now, at a scale we can still examine and understand, we won't stand a chance when quantum computing starts generating data objects at a complexity that makes today's generative AI output look manageable.

Ritavan's avatar

Important points!
