Field Engineering Notes — CNC Data Security Platform

Evidence Classification Model

Every item on this page carries one or more of the following tags. Use these to decide how much weight to place on a recommendation before acting.

Official

Microsoft documentation or vendor product documentation. Highest trust level.

Field-tested

Guidance from an experienced engineer, MVP, or consultant, or from an operational deployment.

Anecdotal

Useful but not independently validated. Treat as a signal, not a fact.

Tenant-dependent

Depends on license, rollout, SKU, preview state, region, or tenant setting.

Risk

Could create an operational, privacy, cost, or audit problem if ignored.

Actionable

Should result in a task, test, control update, or runbook change.

Known Problems Matrix

Domain	Known Problem	Why It Matters	Recommended Action	Evidence	Confidence
UAL / O365 Management Activity	UAL is an audit evidence stream, not a complete investigation model. Events can be delayed, nested, or missing investigation state.	Does not reliably contain incident status, analyst assignment, false-positive classification, MTTR, or complete entity graph.	Use UAL for audit evidence. Enrich with Defender XDR incidents, Advanced Hunting, Sentinel, and workflow data for operational KPIs.	Official, Field-tested	High
Defender Advanced Hunting	30-day query window, 100K row limit, API rate constraints, 3-minute timeout per query.	Not a permanent KPI warehouse. Scheduled or on-demand queries will miss historical data if not materialized.	Use for enrichment and scheduled extraction. Materialize daily summaries into Sentinel/Log Analytics, ADX, Power BI dataset, or Splunk if 90-day or monthly executive reporting is required.	Official	High
DataSecurityEvents	High-value Purview security fact table, but availability, schema, and retention must be validated in each tenant.	May be the best available Purview DLP/security fact source, but cannot be assumed Live without tenant testing. Preview status means schema and availability can change.	Add as preferred enrichment target where available. Validate licensing (IRM opt-in required), retention period, schema completeness, and field reliability before marking Live.	Official, Tenant-dependent	Medium/High
Sentinel Ingestion Cost	Raw Defender/M365 event ingestion can become expensive without discipline.	Ingesting every available raw table creates uncontrolled cost without proportional reporting value. Log Analytics billing is volume-based.	Start with Defender XDR incidents/alerts and selected high-value event tables. Use data collection rules (DCRs), transformations, summary rules, and watchlists to reduce cost and noise before adding raw tables.	Official, Field-tested, Risk	High
Splunk Ingestion	API-based collection is easier to configure but rate-limited. Event Hub and storage paths scale better but increase Azure and Splunk operational complexity.	Splunk engineers receiving raw Purview exhaust without context will encounter sparse, nested, inconsistent events that require extensive parsing to produce enterprise reporting.	Send curated Microsoft control facts, incident summaries, and KPI outputs to Splunk rather than raw Purview exhaust. Normalize first in Sentinel/KQL, then forward curated records.	Field-tested, Risk, Actionable	High
DLP False Positives	Keyword-heavy and broad SIT policies generate high false-positive alert volumes.	Alert workflows become operationally useless if built on noisy policy design. Analysts stop responding. Block enforcement over a noisy policy causes user friction and business escalations.	No policy goes to Block until it has: baseline event volume, top false-positive clusters documented, business-owner review, exception design, user impact assessment, rollback plan, and executive-approved enforcement threshold.	Official, Field-tested, Actionable	High
Sensitivity Label Overuse	Over-labeling everything Confidential or Restricted destroys signal quality.	DLP, reporting, and investigation lose precision. Label volume becomes meaningless noise. KPIs based on "labeled vs. unlabeled" become unreliable.	Add label/SIT mismatch KPI, over-labeling trend report, and user/site-level label distribution report. Use label adoption analytics to find where auto-labeling is over-firing.	Field-tested, Actionable	High
Data Map Labels	Data Map sensitivity labels for many non-M365 sources are metadata-only labels, not file-applied protection.	Reports may incorrectly claim files are "protected" when only catalog metadata is labeled. Encryption and access control are not applied to the content object.	Distinguish "catalog metadata labeled" (Data Map) from "content object protected" (Information Protection / MIP label applied to file). Never report "file protected" unless protection is applied to the content object itself.	Official, Risk	High
Copilot DLP	Copilot DLP policy location and coverage are tenant, version, and content dependent. Coverage for file/email items has specific date and licensing requirements.	Organizations may overstate what Copilot DLP controls actually enforce. The DLP location for Copilot supports file items and emails sent on or after January 1, 2025; earlier content is not covered.	Validate supported file/email coverage, licensing, tenant rollout status, and actual policy behavior in a test tenant before marking this control Live. Do not promise universal Copilot grounding control.	Official, Tenant-dependent, Risk	Medium/High
Endpoint DLP	Browser, OS, app, and activity coverage can vary significantly across configurations.	A rule that works in Edge may not behave identically in another browser, on an unmanaged endpoint, or for a specific app group. Gap between policy intent and actual enforcement creates audit risk.	Pilot by OS, browser, activity type, app group, and device group. Record actual observed behavior from Activity Explorer before stating a policy is operational. Document untested combinations explicitly.	Official, Field-tested, Actionable	High
On-Premises Scanning	Self-hosted Integration Runtime (SHIR), Data Map scanning, Information Protection scanner, and on-premises DLP repositories are frequently confused.	Engineers may choose the wrong component for a given scenario, leading to either missing coverage or wasted deployment effort.	Use this split: SHIR → Data Map scan of supported on-prem/private sources; Information Protection scanner → file-share / SharePoint Server labeling and protection; Purview DLP on-premises → DLP enforcement on on-premises repositories at rest. Document which component is deployed and what it covers.	Official, Actionable	High
PII in SIEM Logs	DLP audit logs can contain matched content fragments (literal sensitive values), making the SIEM a regulated data store.	Sending raw matched values to Splunk, Sentinel, or Power BI may create a secondary regulated data store subject to the same data-protection requirements as the source system.	Minimize, mask, hash, or drop sensitive matched-value fragments in the export or ingestion pipeline before they land in Sentinel, Splunk, Power BI, or long-term storage. Where raw matched values are required for investigation, restrict access to Purview/Defender investigation surfaces using least-privilege RBAC and audit all access.	Risk, Actionable, Field-tested	High
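
The "PII in SIEM Logs" action (mask, hash, or drop matched-value fragments before export) could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the field names `matched_value` and `surrounding_context` are hypothetical placeholders, and the real DLP record schema has to be inspected per tenant. A keyed hash (HMAC) is used here as one possible choice so that equal values still correlate without the literal value leaving the pipeline.

```python
import hashlib
import hmac

# Hypothetical field names; inspect real DLP audit records in your tenant.
SENSITIVE_FIELDS = {"matched_value", "surrounding_context"}


def mask_record(record: dict, key: bytes) -> dict:
    """Return a copy of the record with sensitive fragments replaced by keyed hashes."""
    masked = {}
    for name, value in record.items():
        if name in SENSITIVE_FIELDS and value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            # Keep a truncated hex digest under a new name; the raw value is dropped.
            masked[name + "_hmac"] = digest.hexdigest()[:16]
        else:
            # Non-sensitive fields (and empty sensitive fields) pass through unchanged.
            masked[name] = value
    return masked


if __name__ == "__main__":
    raw = {"user": "analyst@example.com", "matched_value": "123-45-6789"}
    print(mask_record(raw, key=b"rotate-me"))
```

The key would need to live outside the SIEM (for example in a secrets store); otherwise low-entropy values such as card numbers could be recovered by brute force.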

Data Source Reliability Ratings

Not every data source should be treated equally. Use this when deciding which source to cite as evidence for a KPI or control.

Source	Best Use	Key Weakness	Audit evidence	Triage / ops	Executive KPI
Unified Audit Log	Activity evidence trail: who did what, when, on what workload	Weak for investigation lifecycle, triage state, and KPI enrichment	High	Medium	Medium (after normalization)
Defender XDR Incidents	Investigation lifecycle: alert status, assignment, triage, MTTA/MTTR	Not a full raw-event store; investigation state requires analyst activity	Medium	High	High (for investigation KPIs)
Defender Advanced Hunting	Enrichment and 30-day operational detail: entity correlation, SIT hits, label activity	30-day window, row/rate/timeout limits; not persistent KPI store	High (within 30-day window)	High	Medium (only if materialized)
DataSecurityEvents	Purview data-security policy-violation facts enriched with SIT, label, user context	Preview status; IRM opt-in required; schema and retention must be validated per tenant	High (if schema is stable)	High	Medium/High (validate availability first)
Sentinel Summary Tables	Pre-aggregated KPI outputs, normalized control facts, historical trending	Requires upfront engineering discipline to build and maintain summary rules	High	High	High
Power BI	Executive presentation, trend visualization, scheduled refresh, governed sharing	Not an authoritative source; only as reliable as its upstream KQL/data model	Low as source	Low	High (as presentation layer)
Splunk Raw M365 Logs	Enterprise SIEM correlation with non-Microsoft telemetry	Parsing complexity, ingestion noise, cost, and sparse Purview events without normalization	Medium	Medium	Medium (unless normalized)
Splunk Curated Facts	Enterprise SIEM correlation and audit reporting on normalized control outputs	Reliability depends on upstream Sentinel/KQL normalization quality	High	High	High
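
Several ratings above hinge on materialization: Advanced Hunting is only usable for executive KPIs if its 30-day detail is rolled up into a persistent summary. A minimal sketch of that roll-up step, assuming events have already been extracted as dictionaries; the field names `timestamp` and `policy` are hypothetical and the storage target (Log Analytics, ADX, a Power BI dataset, or Splunk) is left out.

```python
from collections import Counter
from datetime import datetime


def daily_summary(events):
    """Collapse raw event rows into one row per (day, policy) with an event count."""
    counts = Counter()
    for event in events:
        # Timestamps are assumed to be ISO 8601 strings without a trailing "Z".
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        counts[(day, event["policy"])] += 1
    return [
        {"day": day, "policy": policy, "event_count": n}
        for (day, policy), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    sample = [
        {"timestamp": "2025-03-01T09:15:00", "policy": "PCI"},
        {"timestamp": "2025-03-01T17:40:00", "policy": "PCI"},
        {"timestamp": "2025-03-02T08:00:00", "policy": "PII"},
    ]
    for row in daily_summary(sample):
        print(row)
```

Run on a schedule shorter than the 30-day window, the summary rows outlive the source data and stay small enough to retain for monthly or quarterly reporting.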

Field Validation Checklist — Before Marking Live

Every major control, KPI, or dashboard must pass all items before being marked Live.

Microsoft documentation verified for this control or KPI
Tenant licensing confirmed (SKU, add-on, preview opt-in)
Feature confirmed available in this tenant
Test policy deployed in audit / report-only mode
Event observed in Microsoft Purview portal
Alert observed in Microsoft Defender XDR
Event observed in Sentinel / Log Analytics or approved SIEM
Required fields parsed successfully (no nulls on critical columns)
False-positive sample reviewed with business owner
Business owner reviewed enforcement impact
Known limitations documented
Owner assigned with refresh cadence defined
Rollback or remediation path defined
Evidence package linked (SharePoint, ticket, or audit note)
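
The "all items before Live" rule can be expressed as a simple gate. The sketch below is illustrative only: the item strings are shortened from the checklist above, and the mapping of "no items passed" to Blank and "some items passed" to Partial is an assumption, since the page only states the condition for Live.

```python
CHECKLIST = (
    "Microsoft documentation verified",
    "Tenant licensing confirmed",
    "Feature confirmed available in tenant",
    "Test policy deployed in audit / report-only mode",
    "Event observed in Microsoft Purview portal",
    "Alert observed in Microsoft Defender XDR",
    "Event observed in Sentinel / Log Analytics or approved SIEM",
    "Required fields parsed successfully",
    "False-positive sample reviewed with business owner",
    "Business owner reviewed enforcement impact",
    "Known limitations documented",
    "Owner assigned with refresh cadence defined",
    "Rollback or remediation path defined",
    "Evidence package linked",
)


def outstanding(passed):
    """List checklist items that have not passed yet, in checklist order."""
    return [item for item in CHECKLIST if item not in passed]


def maturity_state(passed):
    """Return 'Live' only when every item passed (Blank/Partial split is assumed)."""
    unknown = set(passed) - set(CHECKLIST)
    if unknown:
        raise ValueError(f"unknown checklist items: {sorted(unknown)}")
    if not outstanding(passed):
        return "Live"
    return "Partial" if passed else "Blank"


if __name__ == "__main__":
    done = set(CHECKLIST[:10])
    print(maturity_state(done), outstanding(done))
```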

Recommendation Confidence Model

Use confidence levels to decide how much validation is needed before acting on a recommendation.

High Confidence

Official Microsoft documentation confirmed and observed in tenant validation. Implement with standard change management. Document evidence.

Medium Confidence

Official documentation exists but tenant validation is still pending. Implement in test/audit mode first. Validate before marking Live.

Low Confidence

Anecdotal or community guidance only. Useful as a signal or hypothesis. Do not implement without independent verification and documentation.

Do Not Implement

Known risk, unsupported capability, or feature not validated in this tenant. Document the gap instead. Escalate if blocking a required control.
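
Read literally, the four definitions above form a small decision rule. The sketch below encodes only what those definitions state; the function name and parameters are hypothetical. Combinations the definitions do not cover (for example, field-tested guidance with no official documentation) are deliberately rejected rather than guessed, because the page assigns those by judgment.

```python
def confidence(tags, tenant_validated, blocked=False):
    """Map evidence tags and validation status to a confidence level.

    tags: set of evidence tags, e.g. {"Official", "Tenant-dependent"}
    tenant_validated: True if the behavior was observed in tenant validation
    blocked: True for a known risk or unsupported capability
    """
    tags = set(tags)
    if blocked:
        return "Do Not Implement"
    if "Official" in tags:
        # Official documentation: High once observed in the tenant, else Medium.
        return "High" if tenant_validated else "Medium"
    if tags == {"Anecdotal"}:
        return "Low"
    raise ValueError("combination not covered by the written definitions")


if __name__ == "__main__":
    print(confidence({"Official", "Tenant-dependent"}, tenant_validated=False))
```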

Recommendation	Confidence	Basis
Use Defender XDR as primary DLP alert investigation surface	High	Official: Microsoft-recommended location for DLP alert investigation
Use Sentinel / Log Analytics / KQL for Microsoft-native telemetry normalization	High	Official: Native Defender XDR connector; KQL semantic model
Use Power BI for executive and audit dashboards over curated KQL outputs	High	Official: Microsoft-documented Log Analytics → Power BI integration
Send curated control facts and KPI summaries to Splunk, not raw Purview exhaust	High	Field-tested: Reduces raw-log parsing burden and ingestion cost
Use DataSecurityEvents as primary Purview DLP fact table in Advanced Hunting	Medium	Official, Tenant-dependent: High-value but validate availability, schema, retention
Use Copilot DLP policy location as a mature, production control	Medium	Official, Tenant-dependent: Feature is current but scope/coverage must be validated per tenant
Start DLP enforcement in Block mode before audit baseline is established	Do Not Implement	Risk: High false-positive risk; user disruption; business escalation without baseline data
Report Data Map catalog labels as "file protected" without confirming MIP encryption	Do Not Implement	Risk: Data Map labels are metadata-only for many non-M365 assets, not content protection
Use Reddit / community forum posts as implementation evidence in audit deliverables	Low	Anecdotal: Useful practitioner signal; not authoritative; must be validated independently

Final Architecture Position

Recommended Architecture — Microsoft-Core, Splunk-Consumer

Purview defines controls: DLP policies, sensitivity labels, SITs, retention, Insider Risk, and DSI. It is the control plane, not the reporting plane.

Defender XDR enriches and manages DLP alerts and incidents. It is the authoritative investigation surface for alert lifecycle, triage status, MTTA, MTTR, and false-positive classification.

Sentinel and Log Analytics normalize Microsoft security telemetry with KQL semantic functions, Watchlists, analytic rules, and Sentinel Workbooks for engineering and SOC operations.

Power BI presents executive and audit dashboards over curated KQL outputs from Log Analytics. It is the executive reporting plane — not a query engine over raw security logs.

Splunk consumes curated control facts, incident summaries, and KPI outputs for enterprise SIEM correlation. Splunk should not be forced to reverse-engineer raw Purview audit logs into executive DLP reporting. That is a source-model problem, not a Splunk failure.

Evidence Boundary Standard

⚠️ A dashboard is not evidence by itself

A screenshot of a dashboard, a PDF export, or a summary number in a report is not audit-defensible evidence on its own. Evidence requires all of the following components to be documented:

Source system — where the data originated (Purview, Defender XDR, Sentinel, UAL, Splunk)
Query logic — the KQL function, SPL search, or report definition used to produce the output
Refresh timestamp — when the data was last updated and what the refresh cadence is
Owner — the named person responsible for the control and the report
Control objective — what the report is intended to prove
Retention period — how long the evidence is kept and where it is stored
Known limitations — explicit documentation of what the report cannot prove
Maturity state — Blank / Partial / Live — never hidden

A KPI report showing partial data or zero events is still valid audit evidence — it proves the control framework exists and is being measured, even if the data pipeline is not yet fully operational. Do not suppress or hide partial reports. Document the gap, assign an owner, and track it in the KPI maturity register.
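
One way to keep the evidence boundary enforceable is to store the eight components as a structured record next to each report and flag whatever is missing. This is a minimal sketch under assumptions: the class and field names are hypothetical, all values are kept as plain strings, and a record in Blank or Partial state is treated as complete as long as the state itself is recorded.

```python
from dataclasses import dataclass, fields

MATURITY_STATES = {"Blank", "Partial", "Live"}


@dataclass
class EvidenceRecord:
    source_system: str = ""
    query_logic: str = ""
    refresh_timestamp: str = ""
    owner: str = ""
    control_objective: str = ""
    retention_period: str = ""
    known_limitations: str = ""
    maturity_state: str = ""


def missing_components(record):
    """Return the names of components that are empty or hold an invalid state."""
    missing = [f.name for f in fields(record) if not getattr(record, f.name).strip()]
    # A non-empty maturity state must be one of the three documented values.
    if record.maturity_state and record.maturity_state not in MATURITY_STATES:
        missing.append("maturity_state")
    return missing


if __name__ == "__main__":
    draft = EvidenceRecord(source_system="Defender XDR", maturity_state="Partial")
    print(missing_components(draft))
```

A report whose record returns a non-empty list is not suppressed; the list is what gets documented as the gap and assigned to an owner.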