I saw Cloudera CMO Mary Wells post on LinkedIn this morning: "New research from Cloudera and Harvard Business Review Analytic Services shows most organizations are still formalizing the governance, quality, and strategy required to move from pilot to production." That line deserves attention. It is the problem statement for every CIO and CTO right now.
The ChatGPT Misconception
In my opinion, the ChatGPT moment (November 2022) created a persistent misconception in the business world. Everyone thinks AI works like this: you ask a question in a browser, you get an answer instantly, no expertise required. That works for consumer applications. In enterprises, it breaks down immediately.
When business leaders say "we want to use AI like ChatGPT," what they often miss is the difference between foundation models trained on broad internet data and enterprise AI trained on proprietary, business-specific data. The latter demands accuracy, governance, and trustworthiness. Without good data behind those queries, you get unreliable answers that damage credibility with customers, shareholders, and business partners.
The Current Enterprise Scenario
The numbers from the Cloudera and Harvard Business Review Analytic Services report are stark:
- 73% of respondents say their organizations should prioritize AI data quality more than they currently do.
- 73% agree that processing and preparing data for AI is challenging.
- 65% expect their organization's business processes will be augmented or replaced by agentic AI in the next two years.
That last statistic matters most. Agentic AI—autonomous systems that detect and fix problems without human intervention—requires clean, trusted data to function. You cannot automate what is broken. Yet most organizations have not formalized the governance and quality standards agentic AI demands.
The Data Shelf Life Problem
The report uses an analogy worth keeping in mind. Data is like oil—plentiful and potentially valuable, but requiring complex refining. One critical difference: oil can sit in storage indefinitely. Data cannot. Customer behaviors shift. Market conditions evolve. Regulatory landscapes change. The longer unprocessed data sits, the less accurately it assesses the present or forecasts the future.
I was on a webinar about this yesterday. One participant asked: "Our organization has 30 years of transaction data. How do we prepare it for AI?" The answer was direct: unless that data is already accurately tagged and cleaned, making it AI-ready with traditional methods could take more than three years. That is probably longer than the CIO will be in that position—or long enough to shorten their tenure.
This is the scale of the challenge many organizations face. Legacy data is valuable but undocumented, uncleaned, and often inconsistent. Digitizing and preparing it requires years of effort and skilled people. Most organizations underestimate this.
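To make the scale of that cleanup concrete: before any tagging or modeling, teams usually start by profiling what they actually have. Here is a minimal sketch in Python using only the standard library, with hypothetical field names—real readiness assessments also cover type consistency, duplicates, and referential integrity, which is where the years go.

```python
from collections import Counter

def profile_records(records, required_fields):
    """Return the fraction of records where each field is absent or blank.

    A crude first-pass completeness check for legacy tabular data;
    field names below are illustrative, not from any real schema.
    """
    missing = Counter()
    for rec in records:
        for field in required_fields:
            value = rec.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    total = len(records) or 1  # avoid division by zero on empty input
    return {field: missing[field] / total for field in required_fields}

# Tiny sample of "legacy" transaction rows with the usual gaps.
rows = [
    {"txn_id": "T1", "amount": "19.99", "customer_id": "C7"},
    {"txn_id": "T2", "amount": "", "customer_id": "C9"},   # blank amount
    {"txn_id": "T3", "amount": "5.00"},                    # no customer_id
]

report = profile_records(rows, ["txn_id", "amount", "customer_id"])
# txn_id is fully populated; amount and customer_id each miss 1 of 3 rows.
```

Even this trivial check surfaces the kind of gaps that, multiplied across decades of tables, explain multi-year remediation timelines.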
What Sergio Gago Gets Right
Sergio Gago, Chief Technology Officer at Cloudera, frames the real business value clearly:
"AI-ready data is fundamental to realizing transformative business insights and unlocking AI's full potential—the real business value obtained when proprietary data drives bespoke insight. The ability to leverage data that's sourced from proprietary, protected sources within an organization for use in AI models is essential for making sound business decisions and protecting credibility with customers, shareholders, and business partners."
— Sergio Gago, Chief Technology Officer, Cloudera
That is the point business leaders need to hear. It is not about having the most sophisticated model. It is about having proprietary data you trust enough to make decisions on. That requires governance, quality standards, and strategy.
What Cloudera Has Learned
I have not yet been briefed by Cloudera, though I am working on that. In the meantime, I asked my colleague Igor Ikonnikov, Advisory Fellow and Analyst at Info-Tech Research Group, what he makes of their market position.
Igor talks to CIOs and CTOs regularly about getting data ready for AI. Here is what he said:
"Since the early days of Data Lakes in mid-2010s, Cloudera has been offering a well thought-through platform that was more than a data storage solution—it enabled data cataloging, quality management and security enforcement. That is why they have remained relevant on the market up until now. Their proactive understanding of the challenges their customers are or will be facing is highly commendable and positions them as a reliable partner."
That trajectory matters. The market learned from earlier data lake failures, where organizations deployed massive repositories without governance and ended up with "data swamps"—expensive, untrusted, inaccessible collections of data. Platforms that survive do so because they embedded governance into the foundation, not as an afterthought.
Where to Go from Here
If you are connected to any AI implementation project—pilot, proof-of-concept, or production—read this report. It is direct about the gap between what organizations are doing and what they need to do.
Access the Full Report
Cloudera and Harvard Business Review: AI Data Readiness Report
If you want to discuss how this applies to your environment—data platform strategy, governance approaches, or preparing data for agentic AI—we work with CIOs and CTOs on exactly this. Info-Tech Research Group provides briefings on these topics. Unlike other analyst firms, we do not require a formal RFI process. You can request a briefing directly by connecting with me.
Thanks to Mary Wells for surfacing this on LinkedIn. It is the conversation that matters most right now.
