AWS Executive in Residence Blog
True Data-Centricity
We’ve heard that companies must become data-driven. They must treat data as an asset, govern it, improve its quality, and make it easily available across the enterprise. Perhaps these pronouncements are becoming tiresome. But if anything, they understate how much our view of data, compute, and the relationship between them is changing.
IT has always overseen both data and compute. We apply compute to data to derive business outcomes. Compute is a verb and data is a noun; compute is active and data is acted upon. At least until recently.
Compute has always been at the center of IT. We talk about applications and workloads. In cloud migrations we measure success by the number of workloads (compute) migrated. The main technologists of IT are software developers (compute) and operations folks (compute). Theoretical computer science talks about Turing machines (compute) and about programming languages. The history of computing is about the invention of increasingly sophisticated processing devices. And the business history of Silicon Valley is the story of developments in chips (compute) and applications (compute).
Yes, I’m exaggerating. Data structures have always been an important part of computer science as well—lists, trees, graphs and all that. But these were ephemeral, in-memory constructs. Relational database systems played a huge part in the history of IT—but arguably it was the database as a server (the compute) that was the focus, not the stuff it stored. IT departments had database engineers, if you could find them when you needed them. But the data itself was subsidiary to the compute.
We seem to be entering a new world centered on data rather than compute. Compute is now a utility: you get it from the cloud. You might even get it “serverless,” that is, as pure compute without reference to hardware. ML, on the other hand, is about data: the training data, the inference data, and, more importantly, the model parameters, which in the largest models number in the hundreds of billions. Instead of a programmer writing code (instructions for compute), ML puts those instructions into data in the form of the model’s parameters. The verbs are being overtaken by the nouns.
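To make the idea of instructions living in data concrete, here is a toy sketch. It assumes a single hypothetical function, predict, whose code never changes; the behavior comes entirely from the parameters passed in. The parameter values are hand-picked for illustration, whereas a real model would learn them from training data.

```python
def predict(params, inputs):
    """Generic 'compute': a weighted sum followed by a threshold."""
    weights, bias = params
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

# Hypothetical, hand-picked parameters (a real model would learn these).
AND_PARAMS = ([1.0, 1.0], -1.5)
OR_PARAMS = ([1.0, 1.0], -0.5)

cases = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Same code, different data, different logic.
and_outputs = [predict(AND_PARAMS, c) for c in cases]
or_outputs = [predict(OR_PARAMS, c) for c in cases]

print("AND:", and_outputs)
print("OR: ", or_outputs)
```

Nobody wrote an "and" or an "or" statement here. The logic is a noun, not a verb.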
It’s odd that IT recently talked about infrastructure as code, but today we have to think about code as data. Software might be eating the world, but data is eating software.
At AWS we say that data is your differentiator. Foundation models, open-source code, and cloud services are common across companies. But when you add your own data and embed AI into your business processes, you have something differentiated. You start from your own data and bring compute and language model services to it rather than vice versa.
Even synthetic data is increasingly important. Synthetic data is actually the opposite of data: records that describe nothing real. It has sometimes been hard to get enough test data for an application, especially when data includes PII. Now AI can create fictional data with fewer privacy concerns. And synthetic data can even be used to train models. It’s a data-centric world.
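As a rough illustration of fictional test data, the sketch below fabricates customer records that describe no real person. It is a simplification: a seeded random generator stands in for the AI model, and the field names, name lists, and the synthetic_customers function are all hypothetical.

```python
import random

# Hypothetical name pools; none of these records describe a real customer.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Edsger", "Barbara"]
LAST_NAMES = ["Rivera", "Okafor", "Lindqvist", "Tanaka", "Moreau"]

def synthetic_customers(count, seed=0):
    """Return `count` made-up customer records, reproducible for a given seed."""
    rng = random.Random(seed)
    records = []
    for i in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        records.append({
            "customer_id": f"TEST-{i:04d}",
            "name": f"{first} {last}",
            # example.com is a domain reserved for documentation and testing.
            "email": f"{first}.{last}{i}@example.com".lower(),
            "lifetime_spend": round(rng.uniform(0, 5000), 2),
        })
    return records

customers = synthetic_customers(5, seed=42)
for customer in customers:
    print(customer)
```

Because the generator is seeded, the same test data can be recreated on demand rather than copied out of production.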
Where is compute itself going? It is becoming agentic. Agentic workflows tie together these data-driven AI models to deliver business outcomes. The agentic script doesn’t include all the compute steps—it delegates the reasoning, acting, and observing to the AI models. The LLMs themselves, with all the logic stored in parameters, are provided by outside vendors. So the importance of code is declining in the enterprise.
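The reason, act, observe cycle can be sketched in a few lines. Everything named here is an assumption made for illustration: stub_model is a hard-coded stand-in for an external LLM, and the two tools are fake. The point is how little logic the script itself contains.

```python
def stub_model(goal, observations):
    """Stand-in for an external LLM: decides the next action from what it has seen."""
    if "inventory" not in observations:
        return ("check_inventory", None)
    if observations["inventory"] < 10 and "order" not in observations:
        return ("reorder", 50)
    return ("finish", None)

# Hypothetical tools the agent may call; each returns (observation_name, value).
TOOLS = {
    "check_inventory": lambda _arg: ("inventory", 4),
    "reorder": lambda qty: ("order", f"ordered {qty} units"),
}

def run_agent(goal, model, tools, max_steps=5):
    """The 'agentic script': a thin loop that delegates every decision to the model."""
    observations = {}
    trace = []
    for _ in range(max_steps):
        action, arg = model(goal, observations)  # reason
        trace.append(action)
        if action == "finish":
            break
        name, value = tools[action](arg)         # act
        observations[name] = value               # observe
    return trace, observations

trace, observations = run_agent("keep widgets in stock", stub_model, TOOLS)
print(trace)
print(observations)
```

The loop knows nothing about inventory or reordering. Swap in a different model and different tools and the same dozen lines serve a different business outcome.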
You might even say that compute is a nuisance today. Capital investment costs for compute can be immense at the scale we need. And compute infrastructure poses sustainability concerns.
So at the risk of oversimplifying, data is increasingly at the center of IT, and compute less so.
As a result, CIOs might want to make data central to their planning processes and to the skills mix on IT teams. They may need to pay closer attention to ensuring the company’s data use is rigorous and that the data is kept healthy and meaningful. They will want to extract data from corporate silos and lubricate its flow from place to place, department to department, bits to brains. And they must secure it—encrypt it, for example—and protect it, with adequate governance and identity and access management.
The cloud provides the forms of compute needed for this data-centric view of the world. AWS’s services can enforce privacy policies, move data from place to place, store it in forms appropriate to its nature (key-value, timeseries, graph, document, and relational databases), analyze it, and present it to ML models as RAG data. Tools like Amazon Quick Suite make data widely available within the bounds of a governance model. The verbs are easily available, so you can concentrate on the nouns.
Data is not just a first-class citizen of the IT world. In fact, it is the first-class citizen.