Microsoft Purview Data Governance provides a unified data catalog and data map that discovers, classifies, and inventories data assets across on-premises, multi-cloud, and SaaS sources. This layer answers the foundational question: where is sensitive data, and what kind is it?
Governance discovery feeds directly into the protection stack: labels applied by governance scanning inform DLP enforcement, and posture gaps surfaced by scanning inform DSPM risk scoring. The governance layer is discovery-focused; it does not enforce controls directly.
Managed by the Compliance team. Focuses on regulatory requirements: HIPAA, HITECH, PCI DSS, GDPR, SOX. Outputs are DLP policies, retention policies, audit reports, and eDiscovery holds. Primary tool: Purview Compliance Portal.
- DLP policy ownership
- Retention and records management
- eDiscovery and litigation hold
- Audit log review and reporting
Managed by the Data team. Focuses on data discoverability, cataloging, lineage, and classification consistency across the enterprise data estate. Primary tool: Purview Data Catalog / Unified Governance Portal.
- Data Map scanning and registration
- Data Catalog ownership and glossary
- Data classification at asset level
- Business glossary and lineage
The Purview Data Map is the foundational metadata store that registers, scans, and classifies data sources. It creates a living inventory of all data assets (tables, files, reports, databases) with sensitivity classifications, ownership, and lineage attached.
Supported Source Types
Azure Data Lake, Azure SQL, Azure Blob, Synapse Analytics, Azure Cosmos DB, Azure Data Factory lineage
SQL Server, Oracle, SAP HANA, Teradata, file shares (Windows/Linux) via Self-Hosted Integration Runtime (SHIR)
Amazon S3, Google Cloud Storage, Snowflake. Cross-cloud scanning requires network connectivity and registered credentials.
SharePoint, OneDrive, Exchange (email body and attachments). Scanned natively; no SHIR required.
Power BI, Salesforce, SAP S/4HANA, Erwin, Looker. Requires connector registration and credential management.
Windows file servers with DFSR replication are scanned via SHIR, with the MIP scanner agent deployed on member servers.
| Step | Action | Notes |
|---|---|---|
| 1 | Register SHIR in Purview portal | Integration Runtimes → New → Self-Hosted. Download installer key. |
| 2 | Install SHIR on dedicated Windows Server | Minimum: Windows Server 2016+, 4 vCPU, 8 GB RAM. Keep isolated from domain controllers. |
| 3 | Configure firewall outbound rules | Allow HTTPS (443) to purview.azure.com and servicebus.windows.net. No inbound required. |
| 4 | Register data source in Data Map | Sources β Register β select source type. Provide SHIR as runtime. |
| 5 | Create and run scan | Set classification rules, scanning scope, trigger (manual or scheduled). Review scan report. |
| 6 | Review classifications in Catalog | Asset details show detected sensitive info types and applied labels. Validate accuracy. |
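
The outbound rule in step 3 can be spot-checked from the SHIR host before the runtime is installed. The following is a minimal sketch using only the Python standard library; the function name `check_outbound` and the example hostnames are illustrative assumptions, not part of any Purview tooling, and the exact endpoints for a given account should be taken from the portal.

```python
import socket
from typing import Callable, Dict, Iterable


def check_outbound(
    hosts: Iterable[str],
    port: int = 443,
    timeout: float = 5.0,
    connect: Callable = socket.create_connection,
) -> Dict[str, bool]:
    """Return {host: reachable} for an outbound TCP connection on `port`.

    `connect` is injectable so the check can be exercised without a network.
    """
    results: Dict[str, bool] = {}
    for host in hosts:
        try:
            # A completed TCP handshake only shows that the firewall allows
            # the flow; it does not validate TLS or Purview authentication.
            conn = connect((host, port), timeout=timeout)
            conn.close()
            results[host] = True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            results[host] = False
    return results
```

On the SHIR host this might be called as `check_outbound(["contoso.purview.azure.com", "contoso.servicebus.windows.net"])`, where `contoso` is a placeholder account name. Any host mapped to `False` points at a missing outbound rule or a DNS problem.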
Every scanned source populates the catalog with asset entries. Each asset has: schema, sensitivity classification, owner, glossary terms, lineage, and scan history. Assets are searchable across the entire data estate.
Canonical definitions for business terms, linked to catalog assets. Ensures consistent vocabulary across data teams. Governance stewards own term definitions. Examples: "Member Account," "PII Asset," "Regulated Data."
Shows data flow from source to consumption: raw files → ETL pipelines → data warehouse → reports. Lineage enables impact analysis: who consumes a dataset and what breaks if it changes.
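
Impact analysis over lineage amounts to a graph traversal: starting from one asset, collect everything downstream of it. The sketch below illustrates the idea on a hand-written edge map; the asset names and the `downstream_assets` helper are hypothetical, and a real deployment would read lineage from the catalog rather than a literal dictionary.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_assets(lineage: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every asset reachable downstream of `start` (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(start)  # a cycle back to the start is not an impact on itself
    return seen


# Hypothetical lineage edges: producer -> list of direct consumers.
LINEAGE = {
    "raw/claims.csv": ["etl/claims_clean"],
    "etl/claims_clean": ["dw/fact_claims"],
    "dw/fact_claims": ["report/claims_dashboard", "report/audit_extract"],
}

impacted = downstream_assets(LINEAGE, "etl/claims_clean")
print(sorted(impacted))
# ['dw/fact_claims', 'report/audit_extract', 'report/claims_dashboard']
```

A change to `etl/claims_clean` therefore touches the warehouse table and both reports, which is the list of consumers to notify before the change ships.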
Classification results should be reviewed quarterly. False positives (e.g., test data classified as PII) should be corrected and the scan rule tuned. False negatives require SIT or classifier adjustment.
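
One way to make the quarterly review measurable is to have a reviewer check a sample of scanned assets by hand and compute precision and recall from the result. This is a minimal sketch under that assumption; the `ReviewedAsset` record and the sample data are hypothetical and do not reflect a Purview export format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ReviewedAsset:
    name: str
    flagged: bool    # the scan classified the asset as sensitive
    sensitive: bool  # the human reviewer's verdict


def review_metrics(sample: Iterable[ReviewedAsset]) -> Dict[str, Optional[float]]:
    """Count false positives/negatives and derive precision and recall."""
    items = list(sample)
    tp = sum(1 for a in items if a.flagged and a.sensitive)
    fp = sum(1 for a in items if a.flagged and not a.sensitive)
    fn = sum(1 for a in items if not a.flagged and a.sensitive)
    return {
        "false_positives": fp,  # candidates for scan rule tuning
        "false_negatives": fn,  # candidates for SIT or classifier adjustment
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }


# Hypothetical review sample.
SAMPLE = [
    ReviewedAsset("hr/payroll.xlsx", flagged=True, sensitive=True),
    ReviewedAsset("crm/members.csv", flagged=True, sensitive=True),
    ReviewedAsset("dw/dim_member", flagged=True, sensitive=True),
    ReviewedAsset("qa/test_members.csv", flagged=True, sensitive=False),
    ReviewedAsset("share/scan_0042.pdf", flagged=False, sensitive=True),
    ReviewedAsset("docs/readme.txt", flagged=False, sensitive=False),
]

print(review_metrics(SAMPLE))
```

Tracking these two numbers from quarter to quarter shows whether rule tuning is actually reducing false positives without increasing missed assets.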
- Start with targeted scopes (specific folders, databases) rather than full-system scans
- Schedule scans during off-peak hours; scans can be I/O intensive on source systems
- Run full scans monthly; incremental scans weekly for active repositories
- Review scan failure logs; failed scans silently miss assets without alerting by default
- Every asset should have a declared data owner β enforce this via catalog policy
- Owners, not the security team, are responsible for classification accuracy
- Build a quarterly data owner review process; catalog ownership decays without maintenance
- Use Collections in the catalog to align assets with business domains and assign ownership at scale
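
The ownership guidance above lends itself to a simple recurring check: list every asset with no declared owner, grouped by collection, so each gap lands with the right business domain. The sketch below assumes a hypothetical `Asset` record built from a catalog export; the field names and sample values are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Asset:
    name: str
    collection: str
    owner: Optional[str] = None


def unowned_by_collection(assets: Iterable[Asset]) -> Dict[str, List[str]]:
    """Group assets with a missing or blank owner by their collection."""
    gaps: Dict[str, List[str]] = defaultdict(list)
    for asset in assets:
        if not (asset.owner or "").strip():
            gaps[asset.collection].append(asset.name)
    # Sort for stable output in review reports.
    return {name: sorted(items) for name, items in sorted(gaps.items())}


# Hypothetical catalog export.
ASSETS = [
    Asset("dw/fact_claims", "Finance", owner="a.rivera"),
    Asset("dw/dim_member", "Finance"),
    Asset("share/hr_archive", "HR", owner="  "),
    Asset("crm/members.csv", "Sales", owner="j.chen"),
]

print(unowned_by_collection(ASSETS))
# {'Finance': ['dw/dim_member'], 'HR': ['share/hr_archive']}
```

Running a report like this ahead of each quarterly owner review gives domain leads a concrete list to close out.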