Module 03: Governance & Discovery

Data Governance

Purview Data Map · Data Catalog · Classification scanning · SHIR on-prem onboarding · Two-lane governance model

What Is Data Governance in Purview?

Microsoft Purview Data Governance provides a unified data catalog and data map that discovers, classifies, and inventories data assets across on-premises, multi-cloud, and SaaS sources. This layer answers the foundational question: where is sensitive data, and what kind is it?

Governance discovery feeds directly into the protection stack: labels applied by governance scanning inform DLP enforcement, and posture gaps surfaced by scanning inform Data Security Posture Management (DSPM) risk scoring. The governance layer is discovery-focused; it does not enforce controls directly.

Two-Lane Governance Model
🏛️ Lane 1: Compliance Governance

Managed by the Compliance team. Focuses on regulatory requirements: HIPAA, HITECH, PCI DSS, GDPR, SOX. Outputs are DLP policies, retention policies, audit reports, and eDiscovery holds. Primary tool: Purview Compliance Portal.

  • DLP policy ownership
  • Retention and records management
  • eDiscovery and litigation hold
  • Audit log review and reporting
🗺️ Lane 2: Data Asset Governance

Managed by the Data team. Focuses on data discoverability, cataloging, lineage, and classification consistency across the enterprise data estate. Primary tool: Purview Data Catalog / Unified Governance Portal.

  • Data Map scanning and registration
  • Data Catalog ownership and glossary
  • Data classification at asset level
  • Business glossary and lineage
Purview Data Map

The Purview Data Map is the foundational metadata store that registers, scans, and classifies data sources. It creates a living inventory of all data assets (tables, files, reports, databases) with sensitivity classifications, ownership, and lineage attached.
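Classification during a scan can be pictured as pattern rules applied to sampled column values. The sketch below is a simplified illustration, not Purview's classification engine: the rule names, the regexes, and the 0.6 match threshold are all hypothetical, and real classifiers can be considerably richer.

```python
import re

# Hypothetical rules: classification name -> regex a sampled value must fully match.
RULES = {
    "US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "EMAIL": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
}

# Hypothetical threshold: fraction of non-empty sampled values that must match a rule.
MIN_MATCH_RATIO = 0.6


def classify_column(values: list[str]) -> list[str]:
    """Return the rule names that match at least MIN_MATCH_RATIO of the non-empty sample."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return []
    matched = []
    for name, pattern in RULES.items():
        hits = sum(1 for v in sample if pattern.fullmatch(v))
        if hits / len(sample) >= MIN_MATCH_RATIO:
            matched.append(name)
    return matched


def classify_table(columns: dict[str, list[str]]) -> dict[str, list[str]]:
    """Classify each sampled column; columns with no matching rule are left out."""
    result = {}
    for column, values in columns.items():
        labels = classify_column(values)
        if labels:
            result[column] = labels
    return result
```

For example, a sampled `ssn` column where two of three values look like `123-45-6789` clears the threshold and is tagged `US_SSN`, while a free-text `notes` column is left unclassified. A threshold rather than a single match keeps one stray value from labeling a whole column.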

Supported Source Types

☁️ Azure

Azure Data Lake, Azure SQL, Azure Blob, Synapse Analytics, Azure Cosmos DB, Azure Data Factory lineage

🏒 On-Premises

SQL Server, Oracle, SAP HANA, Teradata, file shares (Windows/Linux) via Self-Hosted Integration Runtime (SHIR)

☁️ Multi-Cloud

Amazon S3, Google Cloud Storage, Snowflake. Cross-cloud scanning requires network connectivity and registered credentials.

Microsoft 365

SharePoint, OneDrive, Exchange (email body and attachments). Scanned natively; no SHIR required.

💼 SaaS

Power BI, Salesforce, SAP S/4HANA, Erwin, Looker. Requires connector registration and credential management.

🖥️ DFSR / File Shares

Windows file servers with DFSR replication scanned via SHIR with MIP scanner agent deployed on member servers.
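The source list above implies a first planning step: deciding which registered sources need a SHIR. The sketch below encodes only what this module states (listed on-premises sources and file shares go through SHIR; Microsoft 365 is scanned natively) and sends everything else to manual review. The lookup sets, function names, and the rule that any source reachable only from a private network is treated as a SHIR candidate are hypothetical assumptions, not an authoritative connector list.

```python
from typing import Optional

# Hypothetical lookup sets (lower-case source types) built from the list above.
SHIR_SOURCES = {"sql server", "oracle", "sap hana", "teradata", "file share", "dfsr file share"}
NATIVE_SOURCES = {"sharepoint", "onedrive", "exchange"}


def requires_shir(source_type: str, private_network: bool = False) -> Optional[bool]:
    """True = SHIR, False = native scan, None = not covered here; check the connector docs."""
    key = source_type.strip().lower()
    # Assumption: anything reachable only from a private network goes through SHIR.
    if private_network or key in SHIR_SOURCES:
        return True
    if key in NATIVE_SOURCES:
        return False
    return None


def plan_runtimes(sources: list[tuple[str, bool]]) -> dict[str, list[str]]:
    """Group (source_type, private_network) pairs into shir / native / review buckets."""
    plan: dict[str, list[str]] = {"shir": [], "native": [], "review": []}
    for source_type, private_network in sources:
        needed = requires_shir(source_type, private_network)
        if needed is True:
            plan["shir"].append(source_type)
        elif needed is False:
            plan["native"].append(source_type)
        else:
            plan["review"].append(source_type)
    return plan
```

Returning `None` instead of guessing keeps unfamiliar connectors (Snowflake, Salesforce, and so on) in an explicit review bucket rather than silently assigning them a runtime.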

SHIR On-Premises Onboarding
What is SHIR? The Self-Hosted Integration Runtime (SHIR) is a data movement agent installed on an on-premises or private network server. It enables the Purview Data Map to scan on-premises sources without exposing them to the public internet.
Step | Action | Notes
1 | Register SHIR in the Purview portal | Integration Runtimes → New → Self-Hosted. Download the installer and copy the authentication key.
2 | Install SHIR on a dedicated Windows Server | Minimum: Windows Server 2016 or later, 4 vCPU, 8 GB RAM. Do not install on a domain controller.
3 | Configure outbound firewall rules | Allow HTTPS (443) to *.purview.azure.com and *.servicebus.windows.net. No inbound rules are required.
4 | Register the data source in the Data Map | Sources → Register → select the source type. Specify the SHIR as the integration runtime.
5 | Create and run a scan | Set classification rules, scanning scope, and trigger (manual or scheduled). Review the scan report.
6 | Review classifications in the Catalog | Asset details show detected sensitive info types and applied labels. Validate accuracy.
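Steps 2 and 3 lend themselves to a preflight check before installation. The sketch below is hypothetical tooling, not part of the SHIR installer: `HostSpec` and the function names are illustrative, the minimums are the ones quoted in step 2, and the connectivity probe is a plain TCP connect using Python's standard `socket.create_connection`.

```python
import socket
from dataclasses import dataclass

# Minimums quoted in step 2 of the onboarding table.
MIN_WINDOWS_SERVER = 2016
MIN_VCPU = 4
MIN_RAM_GB = 8


@dataclass
class HostSpec:
    windows_server_version: int  # e.g. 2016, 2019, 2022
    vcpu: int
    ram_gb: float
    is_domain_controller: bool


def host_issues(spec: HostSpec) -> list[str]:
    """List every way the candidate host falls short of the step 2 requirements."""
    issues = []
    if spec.windows_server_version < MIN_WINDOWS_SERVER:
        issues.append(f"Windows Server {spec.windows_server_version} is older than {MIN_WINDOWS_SERVER}")
    if spec.vcpu < MIN_VCPU:
        issues.append(f"{spec.vcpu} vCPU is below the {MIN_VCPU} vCPU minimum")
    if spec.ram_gb < MIN_RAM_GB:
        issues.append(f"{spec.ram_gb} GB RAM is below the {MIN_RAM_GB} GB minimum")
    if spec.is_domain_controller:
        issues.append("host is a domain controller; use a dedicated server")
    return issues


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Step 3 smoke test: True if an outbound TCP connection to host:port opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False
```

Usage might look like `can_reach("contoso.purview.azure.com")`, where `contoso` is a placeholder for your own account name. A successful TCP connect only rules out a blocked port or a DNS failure; it does not prove that a proxy or TLS inspection will let SHIR traffic through.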
Data Catalog
Asset Registration

Every scanned source populates the catalog with asset entries. Each asset has: schema, sensitivity classification, owner, glossary terms, lineage, and scan history. Assets are searchable across the entire data estate.
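The asset entry described above maps naturally onto a record type plus a search filter. This is a minimal sketch with hypothetical field and function names; it is not the Purview catalog schema or search API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogAsset:
    """Hypothetical catalog entry carrying the attributes listed above."""
    name: str                                               # fully qualified asset name
    schema: list[str]                                       # column names
    classifications: set[str] = field(default_factory=set)
    owner: Optional[str] = None
    glossary_terms: set[str] = field(default_factory=set)
    upstream: set[str] = field(default_factory=set)         # lineage: assets this one derives from
    scan_history: list[str] = field(default_factory=list)   # e.g. ISO dates of scans


def search(assets: list[CatalogAsset], keyword: Optional[str] = None,
           classification: Optional[str] = None) -> list[CatalogAsset]:
    """Filter by classification and by a case-insensitive keyword in the name or columns."""
    kw = keyword.lower() if keyword else None
    hits = []
    for asset in assets:
        if classification and classification not in asset.classifications:
            continue
        if kw and kw not in asset.name.lower() and not any(kw in col.lower() for col in asset.schema):
            continue
        hits.append(asset)
    return hits
```

Searching by classification (for example every asset tagged `US_SSN`) is what lets the estate-wide "where is sensitive data?" question be answered from the catalog alone.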

Business Glossary

Canonical definitions for business terms, linked to catalog assets. Ensures consistent vocabulary across data teams. Governance stewards own term definitions. Examples: "Member Account," "PII Asset," "Regulated Data."
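The glossary rules above (one canonical definition per term, steward-owned, linked to assets) can be sketched as a small data structure. The classes and method names are hypothetical, and the case-insensitive duplicate check is an assumed way of enforcing "consistent vocabulary".

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    name: str
    definition: str
    steward: str                                   # governance steward who owns the definition
    assets: set[str] = field(default_factory=set)  # catalog assets linked to this term


class Glossary:
    """Hypothetical glossary: one canonical definition per term, editable only by its steward."""

    def __init__(self) -> None:
        self._terms: dict[str, GlossaryTerm] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Case-insensitive key, so "PII Asset" and "pii asset" cannot coexist.
        return name.strip().casefold()

    def add_term(self, name: str, definition: str, steward: str) -> None:
        key = self._key(name)
        if key in self._terms:
            raise ValueError(f"term already defined: {name!r}")
        self._terms[key] = GlossaryTerm(name.strip(), definition, steward)

    def update_definition(self, name: str, definition: str, editor: str) -> None:
        term = self._terms[self._key(name)]
        if editor != term.steward:
            raise PermissionError(f"{editor} is not the steward of {term.name!r}")
        term.definition = definition

    def link(self, name: str, asset: str) -> None:
        self._terms[self._key(name)].assets.add(asset)

    def assets_for(self, name: str) -> list[str]:
        return sorted(self._terms[self._key(name)].assets)
```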

Lineage Tracking

Shows data flow from source to consumption: raw files → ETL pipelines → data warehouse → reports. Lineage enables impact analysis: who consumes a dataset and what breaks if it changes.
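Impact analysis over lineage is a downstream graph traversal. The sketch below assumes lineage is available as a hypothetical adjacency map from each producer to its direct consumers; the asset names in the usage note are illustrative.

```python
from collections import deque


def downstream_impact(edges: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk of lineage edges (producer -> consumers) from `start`.

    Returns every asset that directly or indirectly consumes `start`, nearest first.
    The `seen` set keeps cyclic lineage from looping forever.
    """
    seen = {start}
    impacted = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for consumer in edges.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                impacted.append(consumer)
                queue.append(consumer)
    return impacted
```

With edges `raw/members.csv → etl/load_members → dw.members → {report/churn, report/kyc}`, changing `raw/members.csv` flags the pipeline, the warehouse table, and both reports, in that order.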

Classification Accuracy Review

Classification results should be reviewed quarterly. False positives (e.g., test data classified as PII) should be corrected and the scan rule tuned. False negatives require adjusting the sensitive information type (SIT) or classifier.
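A quarterly review can be summarized with standard precision and recall over a manually checked sample. In this sketch each review is a hypothetical `(predicted, actual)` pair for one classification, and the 0.9 thresholds that trigger tuning are assumptions, not Purview defaults.

```python
from collections import Counter
from typing import Iterable, Optional


def review_summary(reviews: Iterable[tuple[bool, bool]]) -> dict:
    """Tally (predicted, actual) pairs from a manual review of one classification."""
    counts = Counter()
    for predicted, actual in reviews:
        if predicted and actual:
            counts["tp"] += 1
        elif predicted:
            counts["fp"] += 1  # flagged but not really sensitive, e.g. test data
        elif actual:
            counts["fn"] += 1  # sensitive but missed by the scan
        else:
            counts["tn"] += 1
    flagged = counts["tp"] + counts["fp"]
    sensitive = counts["tp"] + counts["fn"]
    return {
        **{k: counts[k] for k in ("tp", "fp", "fn", "tn")},
        "precision": counts["tp"] / flagged if flagged else None,
        "recall": counts["tp"] / sensitive if sensitive else None,
    }


def tuning_actions(summary: dict, min_precision: float = 0.9, min_recall: float = 0.9) -> list[str]:
    """Map low precision or recall onto the remediation named in the review guidance."""
    actions = []
    precision: Optional[float] = summary["precision"]
    recall: Optional[float] = summary["recall"]
    if precision is not None and precision < min_precision:
        actions.append("correct false positives and tune the scan rule")
    if recall is not None and recall < min_recall:
        actions.append("adjust the SIT or classifier for false negatives")
    return actions
```

Low precision points at false positives (scan-rule tuning); low recall points at false negatives (SIT or classifier changes), which mirrors the split in the guidance above.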

Governance Best Practices
  • Start with targeted scopes (specific folders, databases) rather than full-system scans
  • Schedule scans during off-peak hours; scans can be I/O-intensive on source systems
  • Run full scans monthly; incremental scans weekly for active repositories
  • Review scan failure logs: by default a failed scan raises no alert, so the assets it should have covered are silently missed
  • Every asset should have a declared data owner; enforce this via catalog policy
  • Owners, not the security team, are responsible for classification accuracy
  • Build a quarterly data owner review process; catalog ownership decays without maintenance
  • Use Collections in the catalog to align assets with business domains and assign ownership at scale
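Several of these practices (declared owners, quarterly owner review, monthly full scans, watching for failed scans) can be checked mechanically. This sketch assumes a hypothetical `AssetHealth` record assembled from a catalog export and scan logs; the 31-day and 92-day limits are assumed approximations of "monthly" and "quarterly", and nothing here is a Purview API.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical limits approximating "full scans monthly" and "quarterly owner review".
FULL_SCAN_MAX_AGE = timedelta(days=31)
OWNER_REVIEW_MAX_AGE = timedelta(days=92)


@dataclass
class AssetHealth:
    """Hypothetical per-asset record, e.g. assembled from a catalog export and scan logs."""
    name: str
    owner: Optional[str]
    last_owner_review: Optional[date]
    last_full_scan: Optional[date]
    last_scan_failed: bool


def audit(assets: list[AssetHealth], today: date) -> list[tuple[str, str]]:
    """Return (asset, finding) pairs for the hygiene checks in the list above."""
    findings = []
    for a in assets:
        if not a.owner:
            findings.append((a.name, "no declared owner"))
        elif a.last_owner_review is None or today - a.last_owner_review > OWNER_REVIEW_MAX_AGE:
            findings.append((a.name, "owner review overdue"))
        if a.last_scan_failed:
            findings.append((a.name, "last scan failed"))
        if a.last_full_scan is None or today - a.last_full_scan > FULL_SCAN_MAX_AGE:
            findings.append((a.name, "no full scan in the last month"))
    return findings
```

Running a check like this on a schedule surfaces failed scans explicitly instead of relying on someone to read the failure logs.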
CNC Data Security Platform · Module 03

Data Governance

Purview Data Map · Data Catalog · SHIR Onboarding · Two-Lane Model · Classification Scanning
Two-Lane Governance Model

🏛️ Lane 1: Compliance

  • Managed by Compliance team
  • DLP · Retention · eDiscovery · Audit
  • Regulatory: HIPAA, HITECH, PCI DSS, GDPR, SOX
  • Primary: Purview Compliance Portal

🗺️ Lane 2: Data Asset

  • Managed by Data team
  • Data Map · Catalog · Lineage · Glossary
  • Classification at asset level
  • Primary: Purview Unified Governance Portal
Data Map Source Coverage

Cloud

Azure Data Lake, SQL, Blob, Synapse, Snowflake, Amazon S3, GCS

On-Premises

SQL Server, Oracle, SAP HANA, Teradata, Windows/Linux file shares via SHIR

Microsoft 365

SharePoint, OneDrive, Exchange, Power BI (no SHIR needed)

SHIR Onboarding: 6 Steps

Steps 1–3: Deploy

  • Register SHIR in the Purview portal; download the installer and copy the authentication key
  • Install on Windows Server 2016+, 4 vCPU, 8 GB RAM
  • Configure outbound HTTPS (443) to *.purview.azure.com and *.servicebus.windows.net

Steps 4–6: Scan

  • Register data source, assign SHIR runtime
  • Create scan with classification rules and scope
  • Review catalog assets and validate classifications
References: learn.microsoft.com/en-us/purview/purview-portal · learn.microsoft.com/en-us/purview/manage-integration-runtimes