AWS Cloud Enterprise Strategy Blog
Data Governance in the Age of Generative AI
The exponential growth of enterprise data presents unprecedented opportunities for innovation, yet many organizations struggle to capitalize on these opportunities because of inadequate data governance. A robust governance framework is crucial for future-proofing the business and staying competitive.
Effective data governance rests on four pillars:
- Data visibility: Clarify available data assets to inform decision-making.
- Access control: Balance accessibility with security.
- Quality assurance: Ensure data reliability for accurate analytics.
- Ownership: Drive leadership commitment and organizational buy-in.
These pillars enable your organization to trust, use, and protect its data, which can help build your competitive advantage.
This post builds on the foundation laid in my previous post, “Your AI is Only as Good as Your Data,” and covers practical strategies you can use to establish an effective data governance framework.
Data Visibility: Strategic Transparency in Action
A key challenge in data governance is a lack of visibility. With data scattered across departments and systems, leaders often don’t know what data they have, where it’s stored, or who owns it. This fragmentation, worsened by new digital tools, mergers, and IoT data, often results in duplicate data efforts, inconsistencies, and extra work in data discovery and preparation.
Governing, securing, and leveraging data effectively without a centralized view is hard, and the gap becomes critical with generative AI, which relies on consistent, high-quality data. In AWS’s upcoming 2025 Chief Data Officer study, 39% of respondents cite data challenges such as cleaning, integration, and storage as barriers to using generative AI, while 49% are working on data quality and 46% are focusing on data integration.
Establish a Comprehensive Data Catalog
Start by creating a centralized data catalog to support all data governance initiatives. AWS Glue, a managed data integration service, can help by automatically cataloging data from on-premises databases, data lakes, and SaaS applications. Form a cross-functional team, led by the chief data officer or a data steward, to oversee the catalog initiative. Collaborate with stakeholders to identify critical data domains and sources, and work with subject matter experts to add business context, data quality metrics, and usage patterns. Set policies and taxonomies so that data assets are described consistently.
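To illustrate what consistent, policy-driven asset descriptions might look like, here is a minimal sketch in plain Python. It assumes a hypothetical taxonomy of domains and sensitivity levels and a hypothetical rule that descriptions must be at least 20 characters long; the `CatalogEntry` fields are illustrative and are not the AWS Glue Data Catalog schema.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabularies; real ones would come from your governance policies.
ALLOWED_DOMAINS = {"finance", "marketing", "operations"}
ALLOWED_SENSITIVITY = {"public", "internal", "confidential", "restricted"}


@dataclass
class CatalogEntry:
    """Business context attached to one technical data asset, such as a cataloged table."""
    name: str
    domain: str
    owner: str
    sensitivity: str
    description: str
    quality_metrics: dict = field(default_factory=dict)


def validate_entry(entry: CatalogEntry) -> list[str]:
    """Return the policy violations for an entry; an empty list means it is consistent."""
    problems = []
    if entry.domain not in ALLOWED_DOMAINS:
        problems.append(f"unknown domain: {entry.domain!r}")
    if entry.sensitivity not in ALLOWED_SENSITIVITY:
        problems.append(f"unknown sensitivity: {entry.sensitivity!r}")
    if not entry.owner:
        problems.append("missing owner")
    # Hypothetical rule: very short descriptions rarely carry useful business context.
    if len(entry.description.strip()) < 20:
        problems.append("description too short to be useful")
    return problems


invoices = CatalogEntry(
    name="finance.invoices",
    domain="finance",
    owner="finance-data-steward@example.com",
    sensitivity="confidential",
    description="One row per vendor invoice, loaded nightly from the ERP system.",
    quality_metrics={"completeness": 0.98},
)
print(validate_entry(invoices))  # []
```

A check like this could run whenever stewards add or edit an entry, so the taxonomy is enforced at the point of entry rather than cleaned up later.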
Use an Iterative Value-Based Approach
When addressing siloed data and sprawl, I recommend working iteratively, focusing on high-value and high-risk data assets. You might start with a pilot in one business area to avoid getting stuck mapping and unifying the entire data landscape. Demonstrating early success can help you build momentum and avoid the trap of gathering data without a clear purpose or consumer in mind.
Access Control: Balancing Access and Security
Managing access permissions becomes complex as organizations wrangle siloed data repositories and AI/ML models. When each data source and analytics asset has its own permission structure, cross-functional collaboration suffers, and employees can be left unable to access vital data and analytics owned by other departments.
Embrace a Federated Access Management Framework
Instead of enforcing one-size-fits-all access control, consider a federated data governance model. This approach lets data owners and stewards, the people most familiar with the data and its protection needs, manage access permissions.
In this model, data consumers request access directly from data owners who grant or deny access through a centralized management system. A data mesh platform team maintains this system to ensure consistent access policies and traceability while allowing experts to control their data assets. My colleague Matthias discusses this in his blog, “Data Lakes vs. Data Mesh: Navigating the Future of Organizational Data Strategies.”
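The request-and-approve flow described above can be sketched as a small in-memory registry. This is a hypothetical illustration, not the API of any AWS service: `AccessRequest`, `AccessRegistry`, and the owner names are assumptions. The design choice it highlights is the split of responsibilities: owners make the decisions, while the central system ensures that only owners can decide and records every decision for traceability.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessRequest:
    consumer: str  # who wants access
    asset: str     # which data asset
    purpose: str   # stated business purpose, kept for the audit trail


class AccessRegistry:
    """Central registry run by the platform team: it enforces and records, owners decide."""

    def __init__(self, owners: dict[str, str]):
        self.owners = owners                       # asset name -> owning steward
        self.grants: set[tuple[str, str]] = set()  # (consumer, asset) pairs with access
        self.audit_log: list[dict] = []

    def decide(self, request: AccessRequest, decided_by: str, approve: bool) -> bool:
        # Only the asset's owner may approve or deny a request.
        if self.owners.get(request.asset) != decided_by:
            raise PermissionError(f"{decided_by} does not own {request.asset}")
        if approve:
            self.grants.add((request.consumer, request.asset))
        # Record every decision, approved or denied, for traceability.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "consumer": request.consumer,
            "asset": request.asset,
            "purpose": request.purpose,
            "decided_by": decided_by,
            "approved": approve,
        })
        return approve

    def can_access(self, consumer: str, asset: str) -> bool:
        return (consumer, asset) in self.grants


# Hypothetical usage: a marketing analyst asks the finance steward for invoice data.
registry = AccessRegistry(owners={"finance.invoices": "finance-steward"})
request = AccessRequest("marketing-analyst", "finance.invoices", "campaign ROI analysis")
registry.decide(request, decided_by="finance-steward", approve=True)
print(registry.can_access("marketing-analyst", "finance.invoices"))  # True
```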
Invest in Robust Access Control and Auditing Capabilities
Invest in robust access control and auditing tools such as AWS Lake Formation, which can help you centrally define and enforce fine-grained permissions, down to the column and row level, and monitor and audit data access across your data ecosystem.
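As one concrete example, a column-level grant in Lake Formation can be issued through the boto3 Lake Formation client's `grant_permissions` call. The sketch builds the request parameters separately so they can be reviewed or tested without AWS access. The role ARN, database, table, and column names are hypothetical, and actually applying the grant assumes boto3 is installed and the caller has AWS credentials with permission to grant on that table.

```python
def build_column_grant(principal_arn: str, database: str, table: str,
                       columns: list[str]) -> dict:
    """Build keyword arguments for a column-level SELECT grant in AWS Lake Formation."""
    if not columns:
        raise ValueError("grant at least one column")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": sorted(set(columns)),  # deduplicated, stable order for review
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(grant: dict) -> None:
    """Send the grant to Lake Formation (requires boto3 and AWS credentials)."""
    import boto3  # imported here so build_column_grant works without boto3 installed

    boto3.client("lakeformation").grant_permissions(**grant)


# Hypothetical role and table: the analyst sees invoice amounts but no other columns.
grant = build_column_grant(
    "arn:aws:iam::111122223333:role/MarketingAnalyst",
    "finance",
    "invoices",
    ["invoice_id", "vendor_id", "amount"],
)
```

Keeping the grant as data makes it easy to store in version control and review like any other change before `apply_grant` is called.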
Commission a Comprehensive Data Security Review
Hire an external cybersecurity firm within 90 days to independently review your data security, including ethical hacking exercises. Use their findings to create a roadmap for enhanced access controls and monitoring. Establish regular security audits and penetration tests to help maintain compliance with regulations such as GDPR and CCPA in every region where you operate.
Who Is Governing Whom?
In my previous blog post, “Governing by Enabling: A Strategic Approach for Executives in Data Governance,” I suggested flipping the traditional data governance model on its head. Governing by enabling transforms governance from a set of restrictive controls into a catalyst for innovation, productivity, and data-driven decision-making.
Deploy Intuitive Self-Service Data Platforms
Create a control environment that empowers users rather than constrains them. Deploy intuitive, self-service data platforms and catalogs that make it easy for employees to discover, access, and use the information they need, while security, privacy, and compliance controls are integrated behind the scenes. By automating the enforcement of governance policies and making the mechanics invisible to users, you can turn data governance from a barrier to productivity into an enabler of innovation.
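One way to make policy enforcement invisible is to apply it in the data access layer: users simply query, and the platform masks whatever they are not cleared to see. The following sketch assumes hypothetical column tags and role clearances; a real platform would read these from its catalog and identity provider rather than from hard-coded dictionaries.

```python
# Hypothetical classification tags per column; a real platform would read these from its catalog.
COLUMN_TAGS = {"customer_id": "internal", "email": "pii", "lifetime_value": "internal"}
# Hypothetical clearances per role; a real platform would resolve these from its identity provider.
CLEARANCES = {"analyst": {"internal"}, "privacy_officer": {"internal", "pii"}}
MASK = "***"


def governed_view(rows: list[dict], role: str) -> list[dict]:
    """Return a copy of rows with every column the role is not cleared for masked."""
    allowed = CLEARANCES.get(role, set())  # unknown roles are cleared for nothing
    return [
        {
            # Untagged columns default to "restricted", which no role holds, so they are masked.
            column: value if COLUMN_TAGS.get(column, "restricted") in allowed else MASK
            for column, value in row.items()
        }
        for row in rows
    ]


rows = [{"customer_id": 42, "email": "a@example.com", "lifetime_value": 1200}]
print(governed_view(rows, "analyst"))
# [{'customer_id': 42, 'email': '***', 'lifetime_value': 1200}]
```

Because untagged columns are treated as restricted, the sketch fails closed: a column nobody has classified yet is masked rather than exposed.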
Quality Assurance: Garbage In, Garbage Out
Dispersing data across systems and platforms with different governance standards increases the likelihood of errors, inconsistencies, and data integrity issues, which can result in unreliable insights and poor decision-making. Inadequate oversight also makes effective internal and external audits difficult. Because these audits are crucial for demonstrating compliance with regulatory standards, weak oversight can leave you vulnerable to regulatory penalties, legal risks, and reputational damage.
Cleaning up inconsistent standards might require a lot of work: In the 2025 Chief Data Officer study, 59% of respondents agree that “The amount of work required to make our data suitable for generative AI implementations is daunting.”
Appoint Dedicated Data Quality Stewards
Identify experts in each major data domain to act as data quality stewards responsible for setting standards, implementing automated checks, and monitoring data health. Finance may appoint a steward for financial data accuracy, while marketing may have a steward to ensure customer data integrity. Choose individuals with deep knowledge of the data and strong communication and collaboration skills. Equip them with the authority, resources, and support necessary to drive successful data quality initiatives.
Implement Automated Data Quality Controls
Work with your data quality stewards to identify critical metrics such as accuracy, completeness, consistency, and timeliness. Invest in tools like AWS Glue Data Quality to automatically monitor and validate these metrics across your data pipelines, flagging any issues for swift remediation.
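To show what completeness, consistency, and timeliness mean in practice, here is a minimal plain-Python sketch that scores a batch of records and flags metrics that fall below agreed thresholds. It is an illustration only: the column names, allowed values, and thresholds are hypothetical, and AWS Glue Data Quality expresses comparable checks as rules in its own Data Quality Definition Language (DQDL) rather than as Python functions. Accuracy is left out because it usually requires comparison against a trusted reference source.

```python
from datetime import datetime, timedelta, timezone


def _present(value) -> bool:
    # Treat None and empty strings as missing values.
    return value not in (None, "")


def completeness(rows: list[dict], column: str) -> float:
    """Share of rows where the column has a value."""
    if not rows:
        return 0.0
    return sum(_present(r.get(column)) for r in rows) / len(rows)


def consistency(rows: list[dict], column: str, allowed: set) -> float:
    """Share of non-missing values that come from an agreed list of valid values."""
    values = [r[column] for r in rows if _present(r.get(column))]
    if not values:
        return 0.0
    return sum(v in allowed for v in values) / len(values)


def timeliness(rows: list[dict], column: str, max_age: timedelta, now: datetime) -> float:
    """Share of rows whose timestamp is no older than max_age."""
    if not rows:
        return 0.0
    fresh = sum(1 for r in rows if r.get(column) is not None and now - r[column] <= max_age)
    return fresh / len(rows)


def failing_metrics(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Names of metrics below their threshold, to be flagged for the data quality steward."""
    return sorted(name for name, value in metrics.items() if value < thresholds[name])


# Hypothetical customer records.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
rows = [
    {"email": "a@example.com", "country": "US", "updated": now - timedelta(days=1)},
    {"email": "", "country": "DE", "updated": now - timedelta(days=2)},
    {"email": "c@example.com", "country": "Germany", "updated": now - timedelta(days=40)},
    {"email": "d@example.com", "country": "FR", "updated": now - timedelta(days=3)},
]
metrics = {
    "completeness": completeness(rows, "email"),
    "consistency": consistency(rows, "country", {"US", "DE", "FR"}),
    "timeliness": timeliness(rows, "updated", timedelta(days=30), now),
}
thresholds = {"completeness": 0.95, "consistency": 0.70, "timeliness": 0.90}
print(failing_metrics(metrics, thresholds))  # ['completeness', 'timeliness']
```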
Establish Data Quality KPIs and Dashboards
Set specific data quality KPIs aligned with business objectives for each data product. Finance KPIs could include the accuracy of vendor information or the availability of invoice data for users. Marketing KPIs might focus on the accuracy of customer contacts, freshness of lead data, and consistency in campaign metrics. Integrate these KPIs into executive dashboards so that data quality stays visible and leaders prioritize it.
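A simple way to feed data quality KPIs into an executive dashboard is to compare each measured value with its target and roll the results up to one status per data product. The sketch assumes a hypothetical red/amber/green scheme with a 5-percentage-point amber margin; the data product names and values are made up for illustration, echoing the finance and marketing examples above.

```python
from dataclasses import dataclass

# Order used to pick the worst status for a data product.
SEVERITY = {"green": 0, "amber": 1, "red": 2}


@dataclass
class Kpi:
    data_product: str
    name: str
    actual: float  # measured value, as a fraction between 0 and 1
    target: float  # agreed target on the same scale

    def status(self, amber_margin: float = 0.05) -> str:
        """Green at or above target, amber within the margin below it, red otherwise."""
        if self.actual >= self.target:
            return "green"
        if self.actual >= self.target - amber_margin:
            return "amber"
        return "red"


def scorecard(kpis: list[Kpi]) -> dict[str, str]:
    """One status per data product: the worst status among its KPIs."""
    worst: dict[str, str] = {}
    for kpi in kpis:
        status = kpi.status()
        current = worst.get(kpi.data_product)
        if current is None or SEVERITY[status] > SEVERITY[current]:
            worst[kpi.data_product] = status
    return worst


# Hypothetical KPIs with made-up values.
kpis = [
    Kpi("finance.vendors", "vendor information accuracy", actual=0.97, target=0.99),
    Kpi("finance.invoices", "invoice data availability", actual=0.999, target=0.995),
    Kpi("marketing.leads", "lead data freshness", actual=0.80, target=0.95),
    Kpi("marketing.leads", "customer contact accuracy", actual=0.96, target=0.95),
]
print(scorecard(kpis))
# {'finance.vendors': 'amber', 'finance.invoices': 'green', 'marketing.leads': 'red'}
```

Reporting the worst status per data product keeps the executive view short while still surfacing any KPI that needs attention.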
Everyone Has a Role
The rapid advancement of generative AI has ushered in a new era of tremendous opportunity and significant risk. Your organization must grapple with the challenges it introduces, such as data privacy concerns and potential misuse. Embedding ethical considerations into your data governance strategies is crucial, as I explained in the blog post “Responsible AI Best Practices.”
Implement a Responsible AI Practice—Now
Set up an AI Ethics Board within 30 days, comprising representatives from legal, ethics, IT, data science, and key business units. Their first deliverable should be a set of guidelines for the ethical use of generative AI within your organization, due within 90 days. These guidelines should cover key responsible AI principles such as privacy, fairness, transparency, and accountability, aligned with industry best practices. They should also mandate AI ethics training for all employees involved in AI development or deployment.
Ownership: Building a Data-Driven Culture
Driving lasting change in an organization’s data management practices requires far more than new technologies or processes. Effective data governance relies on a profound cultural transformation that shifts how employees perceive and interact with data across the enterprise. Employees must view data controls as a strategic advantage rather than a necessary burden. As an executive spearheading data governance initiatives, you must be prepared to tackle the people-centric challenges that often pose the greatest barriers to success.
Secure Buy-In
Begin by securing buy-in from key stakeholders across the business. Communicate the strategic importance of data governance, highlighting how it enables better decision-making, improves operational efficiency, and unlocks new growth opportunities. Engage leaders from different functions to serve as champions.
Invest in data literacy training for all employees, fostering a shared understanding of the value of data and everyone’s role in maintaining data quality and integrity. Celebrate data-driven successes and recognize those actively contributing to the data governance initiative.
Shift from Data Ownership to Data Stewardship
Data ownership implies exclusive control, leading to siloed, protective behavior. Departments hoard data as a source of power rather than sharing it as a common resource.
Data stewardship reframes this: employees are responsible for properly managing, securing, and sharing data rather than hoarding it. Finance may restrict access to customer data, but as stewards, its team members must also make that data available to other departments that need it to serve customers effectively.
Make Data Governance a C-Suite Priority
Create a chief data officer role reporting to the CEO and establish a cross-functional Data Governance Council. Integrate governance metrics into corporate scorecards and executive evaluations, and allocate a dedicated budget as a strategic investment. This requires ongoing commitment and leadership from the top. By approaching data governance with foresight, diligence, and ethical principles, your leadership can set the standard for your organization and industry, shaping the future of data-driven business.
Conclusion: Become a Data Governance Leader
Effective data governance is an ongoing process, not an end state. But executives who commit to this journey can unlock significant competitive advantages.
You can transform your data into a strategic asset by addressing the core pillars of visibility, access, quality, and ownership. This is your opportunity to future-proof your organization and set the standard for your industry.