The EU AI Act is often discussed in terms of model behavior—transparency, accuracy, and human oversight. However, for the data leaders and compliance officers tasked with operationalizing the regulation, the real challenge lies deeper in the stack. It lies in the data layer.
This post provides a data compliance lens on the EU AI Act. We are moving beyond legal theory to discuss exactly what you must do with your datasets, unstructured repositories, governance frameworks, and audit trails to meet these new obligations.
This guide is designed for AI providers building models, deployers utilizing them in high-stakes environments, and the risk, compliance, and data governance professionals responsible for keeping the organization out of the crosshairs of regulators. To manage these risks effectively at scale, you need a robust AI Governance Platform that provides visibility into the data fueling your algorithms.
EU AI Act Basics (In 5 Minutes): Risk Tiers + Who Has Obligations
The EU AI Act does not treat all software equally. It applies a risk-based approach: the higher the risk to fundamental rights, the stricter the rules.
Risk tiers overview
- Unacceptable Risk: These systems are banned outright because they pose a clear threat to safety or rights. Examples include social scoring by governments, real-time remote biometric identification in public spaces by law enforcement (with narrow exceptions), and cognitive behavioral manipulation.
- High Risk: This is where the bulk of data compliance work lies. These systems are permitted but subject to heavy compliance obligations before they can enter the market. This includes AI used in critical infrastructure, education, employment, credit scoring, and law enforcement.
- Limited Risk: Systems subject to specific transparency obligations, such as chatbots or deepfakes. Users must be informed they are interacting with a machine.
- Minimal Risk: The vast majority of AI systems (e.g., spam filters, video games) fall here and face no new obligations.
For a deeper dive into the broader implications for US businesses, read our guide on Understanding the European Union’s Artificial Intelligence (AI) Act.
Roles: provider vs deployer (and why it changes your checklist)
Data governance obligations differ significantly depending on your role:
- Provider: You develop the AI system or have it developed and place it on the market under your own name. You bear the brunt of data quality and technical documentation requirements.
- Deployer: You use an AI system under your authority in the course of your professional activity. You must ensure the data you feed into the system is relevant and sufficiently representative for your specific context of use.
Compliance Timeline: The Dates Your Program Plan Should Anchor On
The regulation was published in the Official Journal (Regulation (EU) 2024/1689) in July 2024. While the Act entered into force in August 2024, its requirements apply in phases. Your compliance program should work backward from these milestones:
| Milestone | Date | Requirement |
| --- | --- | --- |
| 6 Months | Feb 2025 | Prohibitions on “Unacceptable Risk” AI systems apply. |
| 12 Months | Aug 2025 | Rules for General Purpose AI (GPAI) models apply. |
| 24 Months | Aug 2026 | Full application, including rules for High-Risk AI systems (Annex III). |
| 36 Months | Aug 2027 | Obligations for High-Risk systems already regulated under other EU product safety laws (Annex I). |
The Data Compliance Core: Article 10 “Data and Data Governance” (What It Demands)
For High-Risk AI systems, Article 10 is the center of gravity for data leaders. It explicitly states that training, validation, and testing datasets must be subject to appropriate data governance and management practices.
In plain English, you cannot simply scrape data from your file shares and dump it into a model. Article 10 demands:
- Relevance and Representativeness: Datasets must be relevant, sufficiently representative, and, to the best extent possible, free of errors in view of the intended purpose.
- Bias Mitigation: You must process data to detect and correct biases that could lead to discrimination.
- Governance Lifecycle: Governance is not a one-time check. It covers the entire lifecycle—from design choices and data collection to data preparation (cleaning, aggregation) and the formulation of assumptions.
Turning EU AI Act Data Requirements into Operational Controls
To comply with Article 10 and related transparency requirements, organizations must translate legal text into technical controls.
Data inventory + classification
The Goal: Prevent “Shadow AI” and the use of unknown, regulated data in training sets.
The Control: Implement automated discovery to scan unstructured repositories. You must know what data exists (PII, IP, sensitive attributes) and where it lives.
The Evidence: A comprehensive data catalog tagged by sensitivity and business value.
Learn more: Why is Data Classification Important for AI Readiness?
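A minimal sketch of this discovery control is shown below. The `classify_document` helper and the two regex detectors are purely illustrative assumptions; a production scanner uses far broader detection libraries and confidence scoring, but the shape of the control (scan, tag, assign sensitivity) is the same.

```python
import re

# Illustrative detectors only; real scanners cover many more data types.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify_document(text: str) -> dict:
    """Tag a document with the sensitive data types it contains
    and derive a sensitivity level for the data catalog."""
    hits = {label for label, rx in PATTERNS.items() if rx.search(text)}
    sensitivity = "restricted" if hits else "internal"
    return {"tags": sorted(hits), "sensitivity": sensitivity}

doc = "Contact jane.doe@example.com, SSN 123-45-6789."
print(classify_document(doc))
```

Run over an unstructured repository, the resulting tags become the evidence layer: a catalog entry per file, with sensitivity recorded before any file is eligible for a training set.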
Data lineage + traceability
The Goal: Prove the origin of your model’s knowledge to prevent copyright infringement or the use of poisoned data.
The Control: Map data flows from the source (e.g., SharePoint, S3 bucket) to the model input. Document all transformations.
The Evidence: Lineage diagrams and version control logs for training datasets.
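One lightweight way to make lineage provable is to record, for every transformation, the source, the operation, and a content hash of the resulting dataset. The sketch below assumes a simple append-only log of JSON records; field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def lineage_record(source_uri: str, transform: str, payload: bytes) -> dict:
    """One lineage entry: where the data came from, what was done to it,
    and a SHA-256 hash so this exact dataset version can be re-identified."""
    return {
        "source": source_uri,
        "transform": transform,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

rec = lineage_record("s3://training-data/contracts/", "dedup+strip-pii",
                     b"dataset v3 bytes")
print(json.dumps(rec, indent=2))
```

The content hash is the key design choice: diagrams show intent, but a hash lets an auditor confirm that the dataset in the technical file is byte-for-byte the one that was trained on.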
Data quality + representativeness
The Goal: Reduce errors and ensure the model performs well for all intended user groups.
The Control: Statistical analysis of datasets to identify gaps, outliers, or missing demographic groups relevant to the deployment context.
The Evidence: Data quality assessment reports filed alongside technical documentation.
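The representativeness check can be sketched as a simple proportion comparison: for each group relevant to the deployment context, compare its share of the dataset against the expected share and flag deviations. The attribute names and tolerance below are illustrative assumptions.

```python
from collections import Counter

def representativeness_gaps(records, attribute, expected_share, tolerance=0.05):
    """Flag groups whose share of the dataset deviates from the expected
    deployment-context share by more than `tolerance`."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, share in expected_share.items():
        actual = counts.get(group, 0) / total
        if abs(actual - share) > tolerance:
            gaps[group] = {"expected": share, "actual": round(actual, 3)}
    return gaps

# A dataset that is 90% EU records, deployed for a 50/50 EU/US user base:
data = [{"region": "EU"}] * 90 + [{"region": "US"}] * 10
print(representativeness_gaps(data, "region", {"EU": 0.5, "US": 0.5}))
```

The returned gap report, exported per training run, is exactly the kind of artifact that belongs in the data quality assessment filed with the technical documentation.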
Bias detection + mitigation workflow
The Goal: Prevent discriminatory outcomes prohibited by EU law.
The Control: Run bias testing on training data. If bias is found, apply remediation (e.g., re-weighting, oversampling) and re-test.
The Evidence: Records of bias testing results and documented decisions on remediation steps taken.
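Of the remediation techniques named above, re-weighting is the simplest to illustrate: assign each record an inverse-frequency weight so under-represented groups contribute equally during training. This is a sketch of one technique, not a complete bias-mitigation pipeline.

```python
from collections import Counter

def reweight(records, attribute):
    """Inverse-frequency sample weights so each group under `attribute`
    contributes equally to the training objective."""
    counts = Counter(r[attribute] for r in records)
    n_groups = len(counts)
    total = len(records)
    return [total / (n_groups * counts[r[attribute]]) for r in records]

# 20/80 imbalance: minority records get weight 2.5, majority 0.625.
data = [{"sex": "F"}] * 20 + [{"sex": "M"}] * 80
weights = reweight(data, "sex")
# Weighted totals per group are now equal: 20 * 2.5 == 80 * 0.625 == 50
```

Whatever technique is used, the re-test after remediation is what regulators will look for, so both the before and after metrics should be logged.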
Access control + logging
The Goal: Ensure only authorized personnel can manipulate training data, preventing tampering.
The Control: Role-Based Access Control (RBAC) on all data repositories used for AI. Enable detailed logging of who accessed or modified the data.
The Evidence: Access logs and security policy reviews.
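The RBAC-plus-logging pattern can be condensed to a single enforcement point: check the role's permissions and log every attempt, allowed or denied. The roles and permission sets below are illustrative, not a recommended policy.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("data-access")

# Illustrative role-to-permission mapping.
ROLES = {"data_steward": {"read", "write"}, "analyst": {"read"}}

def access(user: str, role: str, action: str, dataset: str) -> bool:
    """Enforce RBAC on a training-data repository and log every attempt,
    so denied attempts appear in the audit trail too."""
    allowed = action in ROLES.get(role, set())
    log.info("user=%s role=%s action=%s dataset=%s allowed=%s",
             user, role, action, dataset, allowed)
    if not allowed:
        raise PermissionError(f"{role} may not {action} {dataset}")
    return True
```

Logging the denials, not just the grants, is the detail auditors tend to probe: a tampering attempt that was blocked but never recorded is invisible evidence.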
Retention + deletion rules
The Goal: Adhere to data minimization principles and ensure defensibility.
The Control: Automated policies that delete training data when it is no longer needed or when a data subject exercises their “right to be forgotten.”
The Evidence: Certificates of destruction and retention policy configurations.
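An automated retention sweep reduces to one question per record: is it past its retention window, or has the data subject requested erasure? The 365-day window and field names below are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # assumed policy window

def expired(records, now=None):
    """Return IDs of records that are past the retention window or
    flagged by a data-subject erasure request."""
    now = now or datetime.now(timezone.utc)
    return [r["id"] for r in records
            if r.get("erasure_requested") or now - r["ingested_at"] > RETENTION]

now = datetime(2026, 1, 1, tzinfo=timezone.utc)
records = [
    {"id": "a", "ingested_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"id": "b", "ingested_at": datetime(2025, 12, 1, tzinfo=timezone.utc)},
    {"id": "c", "ingested_at": datetime(2025, 12, 1, tzinfo=timezone.utc),
     "erasure_requested": True},
]
print(expired(records, now))  # "a" is past the window; "c" was flagged for erasure
```

In practice the sweep's output feeds a deletion job, and the job's completion record becomes the certificate-of-destruction evidence.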
Third-party and vendor datasets
The Goal: Ensure purchased or external data meets the same high standards.
The Control: Contractual requirements for data provenance and quality checks on incoming third-party data.
The Evidence: Vendor due diligence reports and data sharing agreements.
For a deeper dive on managing these controls across disparate sources, explore our Centralized Data Management solutions.
EU AI Act Compliance Checklist (Data-Focused)
Checklist for AI providers (build/ship AI systems)
- Inventory systems + datasets: Maintain an up-to-date map of all data sources feeding your development pipeline.
- Define dataset governance: Establish written policies for data collection, cleaning, and labeling.
- Validate quality + bias controls: Perform statistical checks to ensure datasets are relevant, representative, and free of errors to the best extent possible.
- Maintain technical documentation: Generate detailed technical files (Annex IV) that describe the data used for training, validation, and testing.
Checklist for deployers (use AI systems in operations)
- Inventory usage + outputs: Track where High-Risk AI is deployed and what data it processes.
- Validate data sources used in deployment: Ensure the input data you feed the model is relevant to the system’s intended purpose (input data must match the training logic).
- Establish monitoring + incident workflows: Monitor for “drift” where data patterns change over time, potentially altering the system’s risk profile.
- Maintain governance evidence: Keep logs of operation to ensure traceability of results.
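The drift monitoring item above is commonly implemented with a distribution-comparison statistic. A widely used one is the Population Stability Index (PSI), sketched here on pre-binned feature distributions; the 0.2 alert threshold is a common rule of thumb, not a requirement from the Act.

```python
import math

def psi(expected, actual):
    """Population Stability Index between two binned distributions.
    A common rule of thumb flags significant drift above ~0.2."""
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, 1e-6), max(a, 1e-6)  # guard against log(0)
        score += (a - e) * math.log(a / e)
    return score

baseline = [0.25, 0.25, 0.25, 0.25]  # bin shares at deployment time
today    = [0.10, 0.20, 0.30, 0.40]  # bin shares observed in production
print(round(psi(baseline, today), 3))
```

A scheduled PSI check per input feature, with alerts wired into the incident workflow, gives deployers a defensible record that drift was actively monitored.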
Build an “Evidence Pack” So Compliance Is Provable (Not Just Aspirational)
Regulators do not operate on trust; they operate on evidence. If audited, your team must be able to produce an “Evidence Pack” quickly. This should include:
- Dataset Documentation: Summaries of data provenance, scope, and characteristics.
- Data Governance Policy + RACI: Who owns the data, and what rules govern it?
- Audit Logs: Proof of who accessed the data and model parameters.
- Change History: A record of how training datasets have evolved over time.
- Bias Remediation Records: Proof that you tested for bias and took steps to fix it.
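One practical way to make an evidence pack tamper-evident is a checksum manifest over its artifacts. The sketch below writes two illustrative files into a temporary folder and hashes them; the filenames are hypothetical stand-ins for the artifacts listed above.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def build_manifest(folder: Path) -> dict:
    """Checksum each evidence artifact so an auditor can verify that
    nothing in the pack was altered after the fact."""
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(folder.glob("*"))}

with tempfile.TemporaryDirectory() as d:
    folder = Path(d)
    (folder / "bias_report.txt").write_text("tested 2025-05-01: no gaps found")
    (folder / "lineage.json").write_text('{"source": "s3://training-data"}')
    manifest = build_manifest(folder)
    print(json.dumps(manifest, indent=2))
```

Storing the manifest (and its own hash) separately from the artifacts means any later edit to a report shows up immediately as a checksum mismatch.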
Tools like Comply360 are essential for automating the retention of these artifacts, ensuring you are audit-ready without manual scrambling.
The Hidden Risk: Unstructured Data Repositories as AI Governance Blind Spots
Many organizations focus their AI governance efforts on structured databases. However, the unstructured data—documents, PDFs, emails, and chats living in file shares and cloud drives—often presents the greatest risk.
This unstructured data is frequently used to ground RAG (Retrieval-Augmented Generation) models or fine-tune LLMs. Yet, it often lacks the metadata and classification required by the EU AI Act. If your AI tool accesses a SharePoint site containing sensitive PII or unvetted legacy data, you risk falling out of compliance with the Act's data quality and governance requirements.
Organizations must shine a light on these blind spots. You cannot govern what you cannot see. Compliance requires extending your data inventory and classification efforts into the unstructured “wild west” of your data estate.
How Congruity360 Supports EU AI Act Data Compliance
Congruity360 acts as the practical enabler of AI compliance by focusing on data readiness. We help organizations move beyond theoretical frameworks to defensible action.
- Discover + Classify: Our Classify360 Platform identifies sensitive, regulated, and high-risk data across your unstructured repositories, ensuring you know exactly what is feeding your AI.
- Reduce Risk: We enable policy-based actions to quarantine, delete, or secure data that should not be exposed to AI models.
- Support Audit Readiness: We provide the evidence trails and reporting necessary to prove that your data governance practices meet Article 10 requirements.
FAQs
When does the EU AI Act apply to my company?
The Act applies to any organization that places AI systems on the EU market or puts them into service in the EU, regardless of where the company is headquartered. If your AI output is used in the EU, you are likely in scope.
What is Article 10 about?
Article 10 outlines the data governance requirements for High-Risk AI systems. It mandates that training, validation, and testing datasets be subject to quality management, bias mitigation, and detailed documentation.
What data governance do we need for high-risk AI?
You need a governance framework that covers the entire data lifecycle: collection, cleaning, labeling, aggregation, and storage. You must also prove data is relevant, representative, and error-free.
How do we prove compliance during an audit?
You prove compliance by maintaining technical documentation (Annex IV) and automatically generated logs. You must be able to show the provenance of your data and the controls applied to it.
Does the EU AI Act cover unstructured data?
Yes. The Act applies to the datasets used to train or test AI, regardless of format. Unstructured data used in RAG models or training sets is fully in scope.
What are the penalties for non-compliance?
Penalties are severe. Non-compliance with prohibited AI practices can lead to fines of up to €35 million or 7% of global turnover. Violation of data governance obligations (Article 10) can result in fines up to €15 million or 3% of turnover.
Moving From Regulation to Data Readiness
The EU AI Act is a signal that the era of “move fast and break things” is over for AI data. The future belongs to organizations that treat data not just as a resource, but as a regulated asset requiring strict oversight. By focusing on deep visibility, classification, and operational control of your unstructured data today, you can turn compliance from a roadblock into a competitive advantage.