Splunk Reporting Architecture — CNC Data Security Platform

📋 SPL Query Pack: 15 sections of ready-to-paste SPL — raw index validation, field extraction, fact table builders, KPI aggregations, and dashboard panel queries for all 7 fact families.

Recommended Architecture — Dual-Ingestion Model

Add-on 1

Splunk Add-on for Microsoft Office 365

Collects audit evidence: DLP.All, Audit.Exchange, Audit.SharePoint, Audit.General, Service Health, Message Trace, Entra ID metadata. This is the audit layer — not the complete investigation layer.

Add-on 2

Splunk Add-on for Microsoft Security

Collects Defender XDR incidents and alerts mapped to Splunk CIM. Provides: incident ID, severity, status, classification, assigned owner, detection source, evidence entities, related alerts. This is the investigation layer.

Enrichment Plane

Defender Advanced Hunting API

Scheduled AH exports or API pulls produce enriched datasets. 30-day lookback, 10K rows per API response (not 100K — common misconception; use pagination or incremental pulls), rate-limited, 3-minute query timeout. Treat DataSecurityEvents as Preview — requires IRM opt-in.
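The incremental-pull approach can be sketched as follows. This is a minimal Python sketch that assumes a scheduler (a modular input or an external script) persists a hypothetical `last_pull` watermark. It only plans the time bands and renders a KQL `where` clause. Authentication and the actual Advanced Hunting API call are left out. `time_bands`, `band_query`, and the 6-hour default are illustrative choices, not part of any Microsoft or Splunk API.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(days=30)   # Advanced Hunting only serves the last 30 days
WINDOW = timedelta(hours=6)     # illustrative band size; shrink it if a band nears the row cap


def time_bands(last_pull, now, window=WINDOW):
    """Split (last_pull, now] into consecutive, non-overlapping bands.

    The start is clamped to the 30-day lookback, so a stale watermark
    does not request data the API can no longer return.
    """
    start = max(last_pull, now - LOOKBACK)
    bands = []
    while start < now:
        end = min(start + window, now)
        bands.append((start, end))
        start = end
    return bands


def band_query(table, start, end):
    """Render a KQL filter for one band: lower bound exclusive, upper bound inclusive."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        f"{table} | where Timestamp > datetime({start.strftime(fmt)}) "
        f"and Timestamp <= datetime({end.strftime(fmt)})"
    )


if __name__ == "__main__":
    now = datetime(2024, 5, 2, 0, 0, tzinfo=timezone.utc)
    last_pull = datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc)  # hypothetical watermark
    for start, end in time_bands(last_pull, now):
        print(band_query("AlertInfo", start, end))
    # Advance the stored watermark to a band's end only after that band is indexed.
```

With an exclusive lower bound and an inclusive upper bound, adjacent bands neither overlap nor leave gaps. A re-run after a failure can therefore resume from the last committed band end.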

Data Flow

Source · Audit & Policy Plane

🏛️ Microsoft Purview / M365

Office 365 Management Activity API: Audit.Exchange · Audit.SharePoint · Audit.General · DLP.All

Source · Investigation & Enrichment Plane

🛡️ Microsoft Defender XDR

Incidents / Alerts API · Advanced Hunting API · Scheduled pulls · 10K rows/response · 3-min timeout

Splunk Add-on 1

Splunk Add-on for Microsoft Office 365

Splunk Add-on 2 + Enrichment

Splunk Add-on for Microsoft Security & Scheduled AH Export

Raw Audit / DLP Indexes

idx_ms_o365_audit · idx_ms_o365_dlp · idx_ms_service_health

Raw Defender / Hunting Indexes

idx_ms_defender_incidents · idx_ms_defender_alerts · idx_ms_defender_hunting

Layer 2 · Normalization

Purview Control Facts

idx_purview_control_facts

Layer 3 · KPI Summary Indexes

KPI Mart Indexes / Data Models

kpi_purview_health_daily · kpi_dlp_effectiveness_daily · kpi_investigation_operations_daily · kpi_exec_control_score_monthly · kpi_audit_evidence_monthly

🔧 Engineering

Index health · pipeline ops · schema drift

🔍 Investigation

Alert triage · MTTA/MTTR · SOC queue

📊 Executive

Control scores · compliance · audit evidence

Important caveat on the O365 Add-on: The audit feed proves that events occurred and supports control-operation evidence, but it often lacks the analyst-friendly context leadership expects — clean alert lifecycle, triage status, risk classification, ownership, incident grouping, and fully normalized policy-action fields. That context comes from Defender XDR.

Three Splunk Data Layers

Layer 1 — Raw Microsoft Telemetry

Keep raw JSON. Do not over-transform too early.

Index	Source
idx_ms_o365_audit	Office 365 Management Activity API — `Audit.Exchange`, `Audit.SharePoint`, `Audit.General`
idx_ms_o365_dlp	`DLP.All` feed
idx_ms_o365_retention	UAL retention policy, disposition, record declaration events (RecordType 50)
idx_ms_purview_insider_risk	UAL InsiderRiskManagement events (RecordTypes 306, 307, 308)
idx_ms_defender_incidents	Defender XDR incidents
idx_ms_defender_alerts	Defender XDR alerts
idx_ms_defender_hunting	Scheduled Advanced Hunting query outputs
idx_ms_service_health	O365 service health / communications

Layer 2 — Normalized Purview Control Facts

idx_purview_control_facts · purview_summary

Each row represents one control-relevant fact. idx_purview_control_facts is the unified normalized fact store for cross-family correlation and dashboard queries. purview_summary is the operational summary index populated by daily collect searches (see SPL Query Pack §8) — it feeds the KPI marts. Both serve Layer 2; use purview_summary as the write target for scheduled searches and idx_purview_control_facts as the curated read target for dashboards.

Field	Meaning
`event_time`	When the event happened
`ingest_time`	When Splunk received it
`source_plane`	UAL, DefenderIncident, DefenderAlert, AdvancedHunting
`workload`	Exchange, SharePoint, OneDrive, Teams, Endpoint, Browser, etc.
`policy_name`	DLP / label / IRM policy
`rule_name`	DLP rule
`rule_action`	Audit, notify, warn, block, restrict, override, etc.
`enforcement_mode`	Audit, warn, block, allow, etc.
`user_upn`	Actor
`recipient_domain`	External destination where available
`file_name` / `file_extension`	Content object and type
`sensitivity_label`	Label at time of event
`sit_names` / `sit_count`	Sensitive info types matched + count
`confidence`	SIT confidence where available
`alert_id` / `incident_id`	Defender / Purview alert and incident linkage
`severity`	Alert severity
`status`	Alert / incident status
`classification`	TP / FP / benign / unknown
`ticket_id`	ServiceNow / Jira etc.
`maturity_status`	Blank / Partial / Live
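Raw UAL DLP records arrive with `PolicyDetails[].Rules[].Actions[]` nested, and the SPL Query Pack handles that expansion with `spath` + `mvexpand`. The sketch below shows the same expansion in Python, so the row-per-action logic can be unit-tested outside Splunk. `PolicyDetails`, `Rules`, and `Actions` come from this document. `PolicyName`, `PolicyId`, `RuleName`, `CreationTime`, `Workload`, and `UserId` are assumed UAL field names, so verify them against your tenant's payloads.

```python
def coalesce(*values):
    """Return the first value that is not None, an empty string, or an empty list.

    Stricter than SPL coalesce(), which treats empty strings as non-null.
    """
    for value in values:
        if value not in (None, "", []):
            return value
    return None


def flatten_dlp(record):
    """Expand one DLP audit record into one fact row per policy / rule / action."""
    rows = []
    for policy in record.get("PolicyDetails") or []:
        for rule in policy.get("Rules") or []:
            # A rule match with no action still yields a row (rule_action=None),
            # so audit-only matches are not silently dropped.
            for action in rule.get("Actions") or [None]:
                rows.append({
                    "event_time": record.get("CreationTime"),
                    "workload": coalesce(record.get("Workload"), "Unknown"),
                    "user_upn": record.get("UserId"),
                    # Defensive fallback in case the policy name field is absent.
                    "policy_name": coalesce(policy.get("PolicyName"), policy.get("PolicyId")),
                    "rule_name": rule.get("RuleName"),
                    "rule_action": action,
                })
    return rows
```

The output rows line up with the `policy_name`, `rule_name`, and `rule_action` columns above. Action values are left raw here, and normalizing them is a separate step.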

Layer 3 — KPI Marts

Prevent dashboards from repeatedly parsing nested JSON.

Mart	Audience
kpi_purview_health_daily	Engineering
kpi_dlp_effectiveness_daily	Engineering / SOC
kpi_investigation_operations_daily	Investigations / SOC
kpi_retention_lifecycle_daily	Records Management / Compliance
kpi_insider_risk_daily	Investigations — privacy-controlled; RBAC required
kpi_exec_control_score_monthly	Executive
kpi_audit_evidence_monthly	Audit / Compliance

Lookup Tables

📋 label_taxonomy_lookup

label_id
label_name
reporting_label_family
protection_level
encryption_expected
external_sharing_allowed

📋 sit_family_lookup

sit_name
sit_family
regulated_data_type
executive_category
severity_modifier

📋 dlp_policy_lookup

policy_name
policy_owner
control_objective
workload_scope
deployment_status
expected_action
executive_category

📋 kpi_maturity_lookup

kpi_name
data_source
maturity_status
owner
known_gap
remediation_plan

KPI Recommendations by Audience

Every KPI must have a source system, owner, refresh cadence, and maturity state. Do not mark a KPI Live unless source, refresh, parsing, owner, and dashboard validation are all complete.
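The gating rule above can be encoded so that no KPI is promoted to Live by hand. This is a minimal sketch that assumes each KPI's five checks are tracked as booleans (the check names are hypothetical). The returned list of missing checks could populate the `known_gap` column of `kpi_maturity_lookup`.

```python
REQUIRED_CHECKS = ("source", "refresh", "parsing", "owner", "dashboard_validation")


def maturity_status(checks):
    """Return (status, missing_checks) for one KPI.

    Blank:   no connected source data.
    Live:    every required check is complete.
    Partial: anything in between.
    """
    missing = [name for name in REQUIRED_CHECKS if not checks.get(name)]
    if "source" in missing:
        return "Blank", missing
    if not missing:
        return "Live", missing
    return "Partial", missing
```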

Engineering KPIs

KPI	Source	Maturity	Notes
DLP event ingestion freshness	Splunk `_indextime` vs event time	LIVE	Proves pipeline health
UAL subscription / content availability	O365 Management Activity API	LIVE	Detects broken audit feed
DLP policy match volume	`DLP.All`, Advanced Hunting	LIVE	Basic activity control
Rule action distribution	UAL + AH enrichment	PARTIAL	Needs parsing / enrichment
SIT confidence distribution	AH / Purview event details	PARTIAL	May be inconsistent by workload
Label usage by workload	UAL + label events	PARTIAL	Requires label taxonomy mapping
Auto-labeling activity trend	UAL / Purview classification events	PARTIAL	Stronger with Activity Explorer export/API
SHIR / scanner health	Purview governance / Data Map telemetry	BLANK	Separate source — do not fake it
OCR pipeline status	Control register	BLANK	Mark "not deployed" until telemetry exists

Investigation KPIs

KPI	Source	Maturity	Notes
Alert volume by severity	Defender XDR alerts	LIVE	Clean executive / SOC metric
Incident volume by status	Defender XDR incidents	LIVE	Operational backlog
Triage queue depth	Defender incidents (Active + In Progress)	LIVE	Real-time operational view
Aging by severity	Defender incidents	LIVE	Critical for audit defensibility
Repeat offender users / entities	UAL + Defender evidence	LIVE	Strong SOC value
Mean time to triage (MTTA)	Defender incident status + assignment	PARTIAL	Requires lifecycle event capture
Mean time to resolve (MTTR)	Defender incident resolved timestamp	PARTIAL	Better via incident API
False-positive rate	Defender classification	PARTIAL	Requires analysts to classify
Top exfiltration vectors	DLP events + workload / action	PARTIAL	Needs normalization
Ticket creation latency	Defender alert time + ticket time	PARTIAL	Requires ITSM integration
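MTTA, MTTR, and the false-positive rate reduce to simple arithmetic once lifecycle timestamps and analyst classifications are captured. This is a minimal sketch that assumes incident rows already carry hypothetical `created`, `acknowledged`, and `resolved` timestamps. The PARTIAL rows above exist because those timestamps are not yet captured reliably. Reporting `classified_share` next to the FP rate shows how much of the queue the rate actually describes.

```python
from datetime import datetime, timedelta
from statistics import mean


def _hours(start, end):
    return (end - start).total_seconds() / 3600


def investigation_kpis(incidents):
    """Compute MTTA and MTTR (hours) and the FP rate from incident lifecycle records.

    Each incident is a dict with 'created', optional 'acknowledged' and
    'resolved' datetimes, and a 'classification' of TP / FP / benign / unknown.
    """
    tta = [_hours(i["created"], i["acknowledged"]) for i in incidents if i.get("acknowledged")]
    ttr = [_hours(i["created"], i["resolved"]) for i in incidents if i.get("resolved")]
    # Only analyst-classified incidents count toward the FP rate; "unknown" is excluded.
    classified = [i for i in incidents if i.get("classification") in ("TP", "FP", "benign")]
    false_pos = sum(1 for i in classified if i["classification"] == "FP")
    return {
        "mtta_hours": mean(tta) if tta else None,
        "mttr_hours": mean(ttr) if ttr else None,
        "fp_rate": false_pos / len(classified) if classified else None,
        "classified_share": len(classified) / len(incidents) if incidents else None,
    }


if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 8, 0)
    sample = [
        {"created": t0, "acknowledged": t0 + timedelta(hours=2),
         "resolved": t0 + timedelta(hours=10), "classification": "TP"},
        {"created": t0, "acknowledged": t0 + timedelta(hours=4), "classification": "FP"},
        {"created": t0, "classification": "unknown"},
    ]
    print(investigation_kpis(sample))
```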

Retention / Data Lifecycle KPIs

KPI	Source	Maturity	Notes
Retention policies deployed by workload	UAL retention policy events	BLANK	Proves scope coverage
Retention labels applied by label/workload	UAL label application events	BLANK	Proves label adoption
Records declared by workload	UAL record declaration events	BLANK	Records management activity
Items pending disposition review	UAL disposition events	BLANK	Disposition queue backlog
Average disposition review age	UAL disposition timestamps	BLANK	SLA compliance indicator
Disposition approvals / rejections	UAL disposition decision events	BLANK	Control execution evidence
Items eligible for deletion	UAL deletion eligibility events	BLANK	Lifecycle maturity signal
Items deleted after retention	UAL delete execution events	BLANK	Disposal execution proof
Items relabeled during disposition	UAL relabel / RelabelItem events	BLANK	Governance correction evidence
Workloads without retention coverage	Policy scope vs workload inventory	BLANK	Executive risk gap

Insider Risk Management KPIs

All IRM KPIs are aggregate only in shared dashboards. User-level details require approved investigation role access and Splunk RBAC controls.

KPI	Source	Maturity	Notes
IRM alerts by policy	UAL RecordType 307 (InsiderRiskManagementAlert)	BLANK	Policy activity baseline
IRM alerts by risk level	UAL RecordType 307	BLANK	Risk distribution
IRM cases opened	UAL RecordType 308 (InsiderRiskManagementCase)	BLANK	Investigation demand
IRM cases closed	UAL RecordType 308	BLANK	Throughput
IRM case aging	UAL RecordType 308 timestamps	BLANK	Backlog risk indicator
IRM false-positive rate	UAL RecordType 307 + case resolution (308)	BLANK	Tuning quality indicator
IRM risk activity volume	UAL RecordType 306 (InsiderRiskManagement — activity events)	BLANK	Exfiltration signals, policy matches, sequences
IRM escalation rate	UAL RecordTypes 307 → 308 correlation	BLANK	Alert → case promotion rate
IRM policy coverage	Policy inventory	BLANK	Deployment maturity
IRM alert-to-case conversion rate	RecordTypes 307 + 308 correlation	BLANK	Investigation selectivity
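The escalation and conversion KPIs both depend on correlating RecordType 307 alerts with RecordType 308 cases. The sketch below assumes, hypothetically, that case records expose the IDs of the alerts they were created from. Confirm which field actually carries that link in your tenant before relying on it. The output is per-policy aggregates only, in line with the aggregate-only rule for shared dashboards.

```python
from collections import Counter


def alert_to_case_conversion(alerts, case_alert_ids):
    """Per-policy alert-to-case conversion, aggregate counts only.

    alerts: iterable of (alert_id, policy_name) taken from RecordType 307 rows.
    case_alert_ids: iterable of alert IDs referenced by RecordType 308 cases
    (hypothetical linkage). No user identifiers enter or leave this function.
    """
    escalated = set(case_alert_ids)
    totals, promoted = Counter(), Counter()
    for alert_id, policy in alerts:
        totals[policy] += 1
        if alert_id in escalated:
            promoted[policy] += 1
    return {
        policy: {"alerts": n, "escalated": promoted[policy], "rate": promoted[policy] / n}
        for policy, n in totals.items()
    }
```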

Executive KPIs

KPI	Source	Maturity	Notes
Program coverage %	Control inventory + policy scope	PARTIAL	Needs authoritative scope inventory
Protected locations	Policy scope + workload coverage	PARTIAL	Do not infer from events alone
Control Health composite score	KPI mart	PARTIAL	Good exec rollup — see formula below
Risk exposure trend, 90 days	Alerts + DLP events	LIVE once retained	Strong leadership metric
NPI / PCI protected vs exposed	SIT + action outcome	PARTIAL	Needs SIT taxonomy mapping
Block / allow-with-override ratio	DLP enforcement / action	PARTIAL	Key effectiveness metric
Member-data incidents avoided	Blocked / restricted events	PARTIAL	Define carefully — avoid inflated claims
Retention deployment status	UAL / idx_ms_o365_retention	BLANK	Sourced via the retention fact family
IRM program active (T/F)	UAL RecordTypes 306/307/308	BLANK	Confirms IRM is deployed and generating signals
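The block / allow-with-override ratio and the member-data "avoided" figure can both be computed from the same normalized action column. This is a minimal sketch that assumes actions have already been normalized to the Phase 2 vocabulary and that events have been de-duplicated. It follows the engineering prompt's definition of the proxy (blocked, restricted, or warned sensitive events), and the result should be labeled as a proxy on the dashboard. The ratio is reported as `None` when there are no overrides, rather than as infinity.

```python
from collections import Counter

# The proxy counts only blocked, restricted, or warned sensitive events.
PROTECTIVE_ACTIONS = {"block", "restrict", "warn"}


def enforcement_summary(actions):
    """Summarize normalized actions for de-duplicated sensitive DLP events."""
    counts = Counter(actions)
    overrides = counts["allow-with-override"]
    return {
        "block_to_override_ratio": counts["block"] / overrides if overrides else None,
        "protected_events_proxy": sum(counts[a] for a in PROTECTIVE_ACTIONS),
    }
```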

Composite Score Model

Use a simple weighted score. Do not over-engineer it. Keep Health (is the control operating?) and Effectiveness (is the control reducing risk?) as separate axes — this is the correct framing for this project.

Control Health Score

Component	Weight
Ingestion freshness	20%
DLP policy telemetry present	20%
Alert pipeline active	20%
Policy / rule / action parse success	15%
Incident lifecycle completeness	15%
KPI maturity completeness	10%

Health Score = 0.20 × ingestion_freshness_score + 0.20 × dlp_event_presence_score + 0.20 × alert_pipeline_score + 0.15 × parsing_quality_score + 0.15 × incident_lifecycle_score + 0.10 × kpi_maturity_score

Effectiveness Score

Component	Weight
Sensitive events protected by block / restrict / warn	25%
High-risk events reduced over 90 days	20%
False-positive rate reduced	20%
Mean time to triage improved	15%
Repeat offenders reduced	10%
Override rate controlled	10%

Effectiveness Score = 0.25 × protected_events_score + 0.20 × high_risk_reduction_score + 0.20 × fp_reduction_score + 0.15 × mtta_improvement_score + 0.10 × repeat_offender_score + 0.10 × override_rate_score
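Both formulas are plain weighted sums, so one helper covers them. This is a minimal sketch that assumes every component has already been scaled to 0-100. The document does not fix a scale, so 0-100 is an assumption here. A missing component raises an error instead of counting as zero, so a gap shows up as a gap rather than as a lower score.

```python
HEALTH_WEIGHTS = {
    "ingestion_freshness_score": 0.20,
    "dlp_event_presence_score": 0.20,
    "alert_pipeline_score": 0.20,
    "parsing_quality_score": 0.15,
    "incident_lifecycle_score": 0.15,
    "kpi_maturity_score": 0.10,
}

EFFECTIVENESS_WEIGHTS = {
    "protected_events_score": 0.25,
    "high_risk_reduction_score": 0.20,
    "fp_reduction_score": 0.20,
    "mtta_improvement_score": 0.15,
    "repeat_offender_score": 0.10,
    "override_rate_score": 0.10,
}


def composite(scores, weights):
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = sorted(set(weights) - set(scores))
    if missing:
        raise ValueError(f"missing components: {missing}")
    for name in weights:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} outside 0-100: {scores[name]}")
    return round(sum(weights[name] * scores[name] for name in weights), 1)
```

For example, if ingestion freshness scores 50 and every other Health component scores 100, the Health Score is 0.20 × 50 + 0.80 × 100 = 90.0.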

Dashboard Package — 7 Dashboards

Audience — Engineering

Dashboard 1 · Purview Pipeline Health

UAL ingestion freshness
DLP event count by day
Defender alert count by day
Defender incident count by day
API gaps / zero-event days
Duplicate rate
Parse success rate
Events missing policy / rule / action
Splunk source / sourcetype / index status
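Three of the panels above (ingestion freshness, zero-event days, and events missing policy / rule / action) reduce to small calculations. The sketch below shows them in Python for clarity. It assumes daily counts and sampled fact rows have already been pulled from the summary index. The 5% threshold comes from this document's schema-drift mitigation, and the function names are hypothetical.

```python
from datetime import date, timedelta


def freshness_lag_minutes(event_epoch, index_epoch):
    """Ingestion lag in minutes, from epoch seconds (as in Splunk _time and _indextime)."""
    return (index_epoch - event_epoch) / 60


def zero_event_days(daily_counts, start, end):
    """Days in [start, end] with no events; a day absent from daily_counts counts as zero."""
    gaps, day = [], start
    while day <= end:
        if daily_counts.get(day, 0) == 0:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def null_rate_breaches(rows, fields, threshold=0.05):
    """Fields whose null/empty rate exceeds the threshold (5% by default)."""
    if not rows:
        return {}
    breaches = {}
    for field in fields:
        nulls = sum(1 for row in rows if row.get(field) in (None, ""))
        rate = nulls / len(rows)
        if rate > threshold:
            breaches[field] = rate
    return breaches


if __name__ == "__main__":
    counts = {date(2024, 5, 1): 1200, date(2024, 5, 3): 950}  # hypothetical daily DLP.All counts
    print(zero_event_days(counts, date(2024, 5, 1), date(2024, 5, 3)))  # 2024-05-02 is a gap
```

Treating a day missing from the counts as zero matters because a broken feed usually produces no row at all, not a row with a count of 0.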

Audience — Engineering / Security / Compliance

Dashboard 2 · DLP Control Effectiveness

DLP events by workload
DLP events by policy and rule
Actions: audit / warn / block / override
Block / allow-with-override ratio
SIT families: NPI / PCI / PII
SIT confidence distribution
Top external domains
Top users by sensitive activity
Top files by repeat policy hits
Label + SIT mismatch report

Audience — SOC / Investigations

Dashboard 3 · Investigation Operations

Active incidents by severity
Aging incidents by severity
MTTA / MTTR
False-positive rate
True-positive rate
Unassigned incidents
Reopened incidents
Top policies generating FPs
High-volume users
Multi-incident users / files / devices

Audience — Leadership

Dashboard 4 · Executive Control Scorecard

Control Health composite score
Effectiveness composite score
90-day exposure trend
Protected vs exposed sensitive activity
Block / restrict / warn / override ratio
Member-data protection trend
Program coverage by workload
Top 5 control gaps
KPI maturity: Blank / Partial / Live

Audience — Audit / Compliance

Dashboard 5 · Audit Evidence

Control objective
Data source
Evidence available
Last successful event
Last dashboard refresh
Owner
Gaps + remediation plan
Screenshot / export link
Monthly evidence package status

Reports without data are still evidence that the control framework exists, provided the report clearly shows the expected source, maturity state, gap, owner, and remediation path.

Audience — Records Management / Compliance

Dashboard 6 · Retention & Lifecycle Management

Retention policy coverage by workload
Retention label activity by label / workload
Records declared — standard & regulatory
Disposition queue — items pending review
Disposition approvals & rejections
Retention extensions
Items relabeled at disposition
Deletion-eligible content
Items deleted after retention
Lifecycle gaps by workload / site / mailbox

Audience — Investigations / Compliance (Privacy-Controlled)

Dashboard 7 · Insider Risk Management

IRM alerts by policy (aggregate)
IRM alerts by risk level
IRM cases by status
Case aging distribution
Alert-to-case conversion rate
False-positive rate trend
Scoped users count trend
Top triggering activity types
Policy coverage & deployment maturity
Privacy-safe executive summary panel

User-identifiable fields are restricted by Splunk RBAC. The executive view shows aggregate counts and risk bands only.

Minimum Viable Implementation — 4 Phases

Phase 1

Ingest and Prove Telemetry

Enable / verify Unified Audit Log
Configure Splunk Add-on for Microsoft Office 365
Ingest DLP.All, Audit.Exchange, Audit.SharePoint, Audit.General
Configure Splunk Add-on for Microsoft Security
Ingest Defender XDR incidents and alerts
Create ingestion health dashboard
Build Blank / Partial / Live maturity tags

Success: Splunk receives DLP + audit events daily. Defender incidents/alerts appear. Ingest freshness is measured. Data gaps are explicit and visible.

Phase 2

Normalize DLP Control Facts

Extract policy / rule / action fields from nested JSON
Normalize workload names
Normalize action outcomes: audit, notify, warn, allow, allow-with-override, block, restrict, encrypt/quarantine
Map SIT names → reporting families: NPI, PCI, PII, Financial, Credentials, Legal/privileged, Custom
Map labels → 4-label taxonomy: Public, Internal, Confidential, Restricted

Success: Top DLP policy/rule/action reports are reliable. Executives see "protected vs exposed." Engineers see FP clusters. Investigators pivot from dashboard to incident.
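The normalization step is essentially two lookups with an explicit fallback. This is a minimal sketch that assumes lower-cased exact matching. The raw action strings and the two SIT-to-family rows are illustrative examples, not a complete or authoritative mapping. In Splunk, the SIT mapping belongs in `sit_family_lookup`. Returning "unmapped" instead of guessing keeps new raw values visible rather than silently folding them into another bucket.

```python
# Normalized action vocabulary from Phase 2.
NORMALIZED_ACTIONS = {
    "audit", "notify", "warn", "allow", "allow-with-override",
    "block", "restrict", "encrypt/quarantine",
}

# Illustrative raw values only: build the real map from values observed in your data.
RAW_ACTION_MAP = {
    "notifyuser": "notify",
    "blockaccess": "block",
}

# Stand-in for sit_family_lookup; example rows, not a recommended taxonomy.
SIT_FAMILY = {
    "credit card number": "PCI",
    "u.s. social security number (ssn)": "NPI",
}


def normalize_action(raw):
    """Map a raw action string to the normalized vocabulary, or 'unmapped'."""
    key = (raw or "").strip().lower()
    if key in NORMALIZED_ACTIONS:
        return key
    return RAW_ACTION_MAP.get(key, "unmapped")


def sit_family(sit_name):
    """Return the reporting family for a SIT name, or 'Unmapped' so gaps stay visible."""
    return SIT_FAMILY.get((sit_name or "").strip().lower(), "Unmapped")
```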

Phase 3

Enrich from Defender Advanced Hunting

Scheduled AH queries: DLP rule matches, alert evidence, high-risk users, file/device pivots, label + SIT + policy combos
Summary export into Splunk
Correlation: UAL event ID ↔ alert ID ↔ incident ID ↔ file/user/device ↔ ticket ID
Incremental pulls — respect the 30-day window, the 10K-rows-per-response limit, and rate limits

Success: Analyst dashboards stop being log viewers. Policy effectiveness and triage status visible. FP rate and MTTR become measurable.

Phase 4

Executive Scorecards

Monthly executive dashboard
Audit evidence dashboard
Control Health composite score
90-day exposure trend
Block / override / allow trend
Top risky workflows: external email, anonymous sharing, unmanaged device download, removable media, cloud upload, Teams oversharing, browser upload to unapproved domains

Success: Executives assess risk in <5 minutes. Auditors trace every report to source + control objective. Engineers see broken controls. Investigators see what requires action today.

When to Consider Alternatives

Consider Microsoft Sentinel or Defender portal-native reporting if: Splunk cannot ingest Defender XDR incidents/alerts properly; the team needs native KQL over Microsoft security tables; Advanced Hunting data is easier to operationalize in Microsoft-native tooling; leadership accepts Power BI/Sentinel workbooks instead of Splunk dashboards; or cost/data-volume concerns make full Splunk indexing unattractive.

For this project, the best answer is: Keep Splunk, but feed it better data. Use UAL for audit evidence, Defender XDR for alert lifecycle, and Advanced Hunting for enrichment.

Microsoft / Splunk Integration — Known Frictions & Mitigations

Field reality: Many engineers report that Microsoft's data surfaces are progressively harder to consume outside of the Microsoft stack. This is not always intentional product sabotage — some friction is architectural (nested JSON, API rate limits, schema drift) and some is commercial (Microsoft has obvious incentive to keep workloads in Sentinel/Defender portal). The mitigations below are all production-validated approaches.

Friction Point	What Actually Happens	Mitigation
UAL event schema is deeply nested JSON	DLP events embed `PolicyDetails[].Rules[].Actions[]` as arrays-within-arrays. The Splunk O365 Add-on ingests raw JSON but does not auto-extract nested arrays into usable fields. Most engineers hit this in the first week and spend days writing `spath` + `mvexpand` chains.	Use `spath` + `mvexpand` (see SPL Query Pack §2). Write normalized results into a summary index daily. Never run nested JSON parsing in production dashboard queries — pre-compute. Keep raw events in a separate cold/warm tier.
O365 Management Activity API: multi-hour event latency	Audit events are not real-time. Microsoft's documentation states that most audit records can take up to 24 hours to become available, and this is guidance rather than a guaranteed SLA; typical observed latency is 1–12 hours for DLP events. SharePoint/OneDrive events have sometimes shown 24+ hour delays in documented incidents. Engineers expecting near-real-time SIEM data are surprised.	Do not use UAL for real-time alerting. Use Defender XDR incidents/alerts for operational alerting (they're faster). UAL is your audit evidence and compliance layer, not your SOC detection feed. Set dashboard refresh expectations accordingly.
Advanced Hunting API: 10K row limit per response	The API returns a maximum of 10,000 rows per query execution. The portal UI also caps results at 10,000 rows. There is no streaming endpoint for hunting query results. Queries that return more than 10K results are silently truncated unless you implement pagination or time-banded incremental pulls.	Implement time-banded incremental pulls (e.g., 6-hour windows). Use `Timestamp > {last_pull}` in your KQL to avoid a full re-scan. Schedule as a Splunk modular input or an external Python script. Store the last successful pull timestamp in the KV Store or a lookup.
Defender XDR SIEM connector deprecation / migration	Microsoft deprecated the legacy Defender for Endpoint SIEM API (siem.windows.com) in September 2024. The replacement is the Microsoft Defender XDR Streaming API (via Event Hubs) or the Microsoft Graph Security API. Teams using older Splunk add-on versions or legacy connector configs may find incidents/alerts silently stop flowing.	Use Splunk Add-on for Microsoft Security (not the legacy Defender for Endpoint add-on). Verify it is configured against the Defender XDR incidents/alerts API endpoints, not the deprecated SIEM API. Check Splunkbase for add-on version; the current certified version supports the Graph Security API.
Event Hubs as a required intermediary	Microsoft's recommended path for streaming Defender XDR and Purview signals at scale is via Azure Event Hubs. This adds an Azure infrastructure dependency (Event Hub namespace, consumer groups, connection strings, throughput units) that a pure Splunk shop may not have provisioned. The Splunk Add-on for Microsoft Cloud Services handles Event Hubs ingestion but requires separate configuration.	For moderate volume (under ~500K events/day), the polling-based O365 Add-on and Security Add-on are sufficient and simpler to operate. Switch to Event Hubs streaming only when polling lag becomes operationally unacceptable or when volume exceeds what the Management Activity API can serve within its rate limits.
O365 Management Activity API rate limits	The API enforces per-publisher, per-tenant rate limits. Heavy polling during high-event periods (e.g., large DLP scan sweeps, major incidents) can result in HTTP 429 throttling. The Splunk Add-on handles retries but silently queues content blobs — engineers often don't notice gaps until they check ingestion freshness metrics.	Monitor the `idx_ms_service_health` index and the Splunk Add-on internal logs (`index=_internal sourcetype=splunk_ta_o365`). Build ingestion gap detection into Dashboard 1 (Pipeline Health). Use separate content type subscriptions (DLP.All vs Audit.Exchange vs Audit.General) so a throttle on one feed doesn't block all audit data.
DLP event de-duplication	The Management Activity API content blob model means that a single DLP event can appear in multiple content blobs across polling windows, or be re-delivered after a transient API error. Engineers building DLP dashboards without de-duplication will over-count events, inflating policy match counts and exec metrics.	De-duplicate on stable event identifiers: `Id` (the audit record GUID in UAL) + `CreationTime` + `UserId` + `ObjectId`. Use `dedup` in dashboard queries or, better, de-duplicate at collect time in your summary index build (see SPL §8). Include `event_key = md5(...)` in your fact rows.
Purview Insider Risk data is not in UAL by default	RecordTypes 306, 307, and 308 (IRM) are not present in the standard UAL subscription. IRM audit records require: (1) IRM to be configured with active policies, (2) the tenant admin to have explicitly opted into IRM audit logging, and (3) the O365 Management Activity subscription to be configured for `Audit.General`, which is where IRM records appear. Some tenants never enable this.	Validate IRM audit record presence before building the IRM dashboard. Run the validation query (SPL §1.2) and confirm that RecordTypes 306/307/308 are present. If they are absent, IRM may not be active, IRM audit logging may be disabled, or the `Audit.General` subscription may be missing. Escalate to the Microsoft Purview admin to verify IRM policy state and audit configuration.
DataSecurityEvents (Advanced Hunting) requires IRM opt-in	The `DataSecurityEvents` table in Advanced Hunting is in Preview and requires a separate IRM opt-in. Tenants that have not completed this opt-in will see the table return zero results or not appear at all in the AH schema. This is often discovered after hours of SPL debugging.	Treat `DataSecurityEvents` as optional enrichment, not a primary data source. Mark any KPIs depending on it as BLANK until opt-in is confirmed and results are validated. Document this as a known gap in the KPI maturity matrix.
Sensitivity label events are sparse in UAL	UAL captures label change events (apply, change, remove) but does not provide a snapshot of how many files are currently labeled. There is no "current label inventory" stream in UAL. Engineers expecting a labeled-file-count metric from Splunk alone will be unable to build it from UAL events.	Use UAL label events for activity trending, change detection, and downgrade alerting. For labeled-file inventory, use the Microsoft Graph API (Content Discovery), Purview Content Explorer export, or Activity Explorer API. Import these as a scheduled lookup or reference dataset in Splunk — do not try to reconstruct inventory from event counts.
Microsoft schema changes without notice	Microsoft periodically renames or restructures fields in UAL JSON payloads, Defender API responses, and Advanced Hunting table schemas — often without versioned change notifications. Teams have experienced: DLP policy fields moving inside nested arrays, RecordType values being added mid-cycle, AH column renames, and Defender API response envelope changes breaking ingestion.	Never hard-code field names in production SPL without defensive `coalesce()` fallbacks. Subscribe to the Microsoft 365 Message Center and the Defender XDR changelog. Build schema validation into Dashboard 1 (Pipeline Health) — alert when expected fields have a null rate above 5%. Maintain a field validation saved search that runs weekly.
Microsoft favors Sentinel for Purview integration	Microsoft's native Purview + Defender integration story is built for Sentinel / Log Analytics. The Purview Audit connector, the Defender XDR connector, and the Microsoft 365 Defender data connector all have first-class Sentinel support. SPL equivalents require more engineering effort, and some data surfaces (e.g., Purview Communication Compliance, certain Defender Identity signals) have no published Splunk integration path.	Accept Sentinel as a complementary system for native Microsoft-to-Microsoft signal paths. Feed curated, normalized outputs from Sentinel into Splunk via Sentinel's SIEM forwarding or Event Hub bridge, rather than trying to replicate every Microsoft connector in Splunk. See the Recommendation page for the hybrid model.
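The collect-time de-duplication can be sketched as follows, mirroring the `event_key = md5(...)` idea in the de-duplication mitigation above. This is a minimal Python sketch that assumes the four identifiers listed there (`Id`, `CreationTime`, `UserId`, `ObjectId`) are present as top-level fields. The `|` separator is this sketch's own choice. In SPL, the equivalent would be an `eval` using `md5()` over the concatenated fields.

```python
import hashlib

KEY_FIELDS = ("Id", "CreationTime", "UserId", "ObjectId")


def event_key(record):
    """md5 over the stable identifiers, joined with a separator so that
    ('ab', 'c') and ('a', 'bc') produce different keys."""
    raw = "|".join(str(record.get(field) or "") for field in KEY_FIELDS)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def dedupe(records):
    """Keep the first occurrence of each event_key, preserving input order."""
    seen, unique = set(), []
    for record in records:
        key = event_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

md5 is used here only as a compact row key for duplicate detection, not for any security purpose.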

Bottom line for this project: The frictions above are real but solvable. The architecture in this document is designed around them — UAL is treated as audit evidence (not real-time SIEM), Defender XDR provides the investigation layer, Advanced Hunting provides enrichment at scheduled cadence, and Sentinel remains an option for signals that are easier to consume natively. The engineering effort required to work around Microsoft's API limitations is significant — plan for it explicitly in the engagement timeline.

Engineering Prompt — Splunk / Microsoft Team

Use with the Splunk / Microsoft engineering team. Covers the full architecture, field dictionary, lookup schemas, dashboard specs, KPI maturity definitions, and acceptance criteria.

📋 Full engineering prompt

You are a Microsoft Purview, Microsoft Defender XDR, and Splunk engineering team building enterprise-class, audit-defensible reporting for Purview DLP, sensitivity labeling, retention/control health, and investigation operations.

Project objective:
Build Splunk dashboards and KPI marts across two reporting axes:
1. Health — Is the control operating?
2. Effectiveness — Is the control reducing risk?

Primary source systems:
- Microsoft Purview Unified Audit Log / Office 365 Management Activity API
- Office 365 Management Activity API content types:
  - DLP.All
  - Audit.Exchange
  - Audit.SharePoint
  - Audit.General
  - Audit.AzureActiveDirectory where useful
- Microsoft Defender XDR incidents and alerts
- Microsoft Defender Advanced Hunting API
- Optional enrichment from ServiceNow/Jira ticketing, HR/user metadata, and label taxonomy reference tables

Required Splunk add-ons/connectors:
- Splunk Add-on for Microsoft Office 365
- Splunk Add-on for Microsoft Security
- Optional: Splunk Add-on for Microsoft Cloud Services if Event Hubs ingestion is used

Design principle:
Do not expect the Unified Audit Log to contain full investigation context. Treat UAL/O365 Management Activity as the audit evidence layer. Treat Defender XDR incidents/alerts and Advanced Hunting exports as the investigation and enrichment layer. Splunk is the reporting, normalization, correlation, and executive KPI layer.

Required raw indexes:
- idx_ms_o365_audit
- idx_ms_o365_dlp
- idx_ms_o365_retention         (UAL retention, disposition, records management events)
- idx_ms_purview_insider_risk   (UAL RecordTypes 306, 307, 308)
- idx_ms_defender_incidents
- idx_ms_defender_alerts
- idx_ms_defender_hunting
- idx_ms_service_health

Normalized fact families and sourcetypes:
- purview:dlp:fact              → purview_dlp_fact
- purview:label:fact            → purview_label_fact
- purview:retention:fact        → purview_retention_lifecycle_fact
- purview:insider_risk:fact     → purview_insider_risk_fact
- defender:incident:fact        → defender_incident_fact
- defender:alert:fact           → defender_alert_fact
- purview:control:fact          → purview_control_fact (unified)

Required curated indexes or summary indexes:
- idx_purview_control_facts
- kpi_purview_health_daily
- kpi_dlp_effectiveness_daily
- kpi_investigation_operations_daily
- kpi_exec_control_score_monthly
- kpi_audit_evidence_monthly
- kpi_retention_lifecycle_daily
- kpi_insider_risk_daily         (privacy-controlled; RBAC required before enabling)

Normalize the following fields into idx_purview_control_facts:
- event_time, ingest_time, source_plane, workload, operation
- policy_name, rule_name, rule_action, enforcement_mode
- user_upn, user_department, user_title
- recipient, recipient_domain, external_internal_flag
- file_name, file_extension, file_path, site_url
- device_name, device_id, ip_address
- sensitivity_label, sensitivity_label_id
- sit_names, sit_family, sit_count, confidence
- alert_id, incident_id, severity, status, classification, determination, assigned_to
- ticket_id, maturity_status

Create lookup tables:
1. label_taxonomy_lookup
   label_id | label_name | reporting_label_family | protection_level | encryption_expected | external_sharing_allowed

2. sit_family_lookup
   sit_name | sit_family | regulated_data_type | executive_category | severity_modifier

3. dlp_policy_lookup
   policy_name | policy_owner | control_objective | workload_scope | deployment_status | expected_action | executive_category

4. kpi_maturity_lookup
   kpi_name | data_source | maturity_status | owner | known_gap | remediation_plan

Build dashboards:

Dashboard 1: Purview Pipeline Health (Audience: Engineering)
- UAL ingestion freshness, DLP.All ingestion freshness
- Defender incident and alert ingestion freshness
- Service health events
- Event volume by source plane
- Parse success rate, missing policy/rule/action rate
- API or connector error count, zero-event days by feed

Dashboard 2: DLP Control Effectiveness (Audience: Engineering / Security / Compliance)
- DLP events by workload, policy, rule
- Action distribution: audit, notify, warn, block, allow, override, restrict
- Block / allow-with-override ratio
- SIT family distribution: NPI, PCI, PII, financial, credentials, legal, custom
- SIT confidence distribution
- Label + SIT mismatch report
- Top external domains, top risky users, top risky files
- 90-day risk exposure trend

Dashboard 3: Investigation Operations (Audience: SOC / Investigations)
- Incidents by severity and status
- Alerts by severity
- Active queue depth, aging by severity
- Mean time to acknowledge / triage (MTTA)
- Mean time to resolve (MTTR)
- False-positive rate and true-positive rate
- Unassigned and reopened incidents
- Top policies producing false positives
- Top entities appearing across multiple incidents

Dashboard 4: Executive Control Scorecard (Audience: Leadership)
- Control Health composite score and Effectiveness composite score
- Program coverage percentage
- Protected vs exposed sensitive activity
- Block/restrict/warn/override trend
- Member-data protection trend
- Top 5 control gaps
- KPI maturity: Blank, Partial, Live
- 90-day risk trend, month-over-month improvement

Dashboard 5: Audit Evidence (Audience: Audit / Compliance)
- Control objective, evidence source, dashboard name
- Current maturity state, last successful event, last refresh
- Owner, known gap, remediation plan
- Export / screenshot evidence status

Define KPI maturity states:
- Blank: dashboard/control exists, but source data is not yet available or connected.
- Partial: some data exists but coverage, parsing, or enrichment is incomplete.
- Live: data is connected, normalized, refreshed, and report-ready.
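The maturity states combine with the engineering constraint that a KPI is Live only when source, refresh, parsing, owner, and dashboard validation are all complete. The Python sketch below encodes that gating rule. The `KpiReadiness` checklist is a hypothetical structure, and you would populate it from your own validation evidence.

```python
from dataclasses import dataclass


@dataclass
class KpiReadiness:
    """Hypothetical readiness checklist for one KPI row in kpi_maturity_lookup."""
    source_data_present: bool
    refresh_ok: bool
    parsing_ok: bool
    owner_assigned: bool
    dashboard_validated: bool


def maturity_state(k: KpiReadiness) -> str:
    """Blank if no source data; Live only if every check passes; otherwise Partial."""
    if not k.source_data_present:
        return "Blank"
    checks = (k.refresh_ok, k.parsing_ok, k.owner_assigned, k.dashboard_validated)
    return "Live" if all(checks) else "Partial"


print(maturity_state(KpiReadiness(True, True, False, True, True)))  # Partial
```

Any single failed check demotes the KPI to Partial. A partly validated KPI therefore can never be reported as Live.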

Composite Health Score formula:
  0.20 * ingestion_freshness_score
+ 0.20 * dlp_event_presence_score
+ 0.20 * alert_pipeline_score
+ 0.15 * parsing_quality_score
+ 0.15 * incident_lifecycle_score
+ 0.10 * kpi_maturity_score

Composite Effectiveness Score formula:
  0.25 * protected_events_score
+ 0.20 * high_risk_reduction_score
+ 0.20 * fp_reduction_score
+ 0.15 * mtta_improvement_score
+ 0.10 * repeat_offender_score
+ 0.10 * override_rate_score
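Both composites are weighted sums whose weights total 1.00. The Python sketch below encodes the two weight sets as written above. The document does not state a scale for the components, so the sketch assumes each one is already normalized to a common scale, for example 0 to 100. The function and dictionary names are hypothetical.

```python
# Weights copied from the two formulas above.
HEALTH_WEIGHTS = {
    "ingestion_freshness_score": 0.20,
    "dlp_event_presence_score": 0.20,
    "alert_pipeline_score": 0.20,
    "parsing_quality_score": 0.15,
    "incident_lifecycle_score": 0.15,
    "kpi_maturity_score": 0.10,
}
EFFECTIVENESS_WEIGHTS = {
    "protected_events_score": 0.25,
    "high_risk_reduction_score": 0.20,
    "fp_reduction_score": 0.20,
    "mtta_improvement_score": 0.15,
    "repeat_offender_score": 0.10,
    "override_rate_score": 0.10,
}


def composite(scores: dict, weights: dict) -> float:
    """Weighted sum of component scores; refuses to score with missing components."""
    missing = sorted(weights.keys() - scores.keys())
    if missing:
        raise ValueError(f"missing components: {missing}")
    return sum(weight * scores[name] for name, weight in weights.items())


# Both weight sets sum to 1.0, so the composite stays on the components' scale.
for w in (HEALTH_WEIGHTS, EFFECTIVENESS_WEIGHTS):
    assert abs(sum(w.values()) - 1.0) < 1e-9
```

The function raises an error on a missing component instead of substituting zero. A component that is still Blank then surfaces as a gap and is never silently scored.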

Advanced Hunting enrichment requirements:
Create scheduled Advanced Hunting queries for:
- DLP rule matches by policy/rule/action
- AlertInfo joined to AlertEvidence
- User/file/device/entity evidence extraction
- High-risk user and repeat-entity detection
- Label + SIT + DLP policy combinations
- Endpoint/removable media/browser upload events where available
- DataSecurityEvents, where licensed (NOTE: Preview; requires Insider Risk Management opt-in)
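Scheduled Advanced Hunting pulls must respect the 30-day lookback, the per-response row cap, and the 3-minute timeout described in this document. One way to implement incremental pulls is time-window splitting, shown here as a substitute for pagination. The sketch queries a time window, and if the result reaches the cap it halves the window and retries each half. `run_query` is a hypothetical stand-in for a client that sends a time-bounded KQL query to the Advanced Hunting API. It is not a Microsoft SDK function.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

ROW_CAP = 10_000               # per-response row cap stated in this document
LOOKBACK = timedelta(days=30)  # Advanced Hunting lookback stated in this document

Window = Tuple[datetime, datetime]


def pull_incremental(
    run_query: Callable[[datetime, datetime], list],  # hypothetical API client call
    start: datetime,
    end: datetime,
    now: datetime,
    row_cap: int = ROW_CAP,
    min_window: timedelta = timedelta(minutes=1),
) -> Tuple[list, List[Window]]:
    """Pull rows for [start, end), halving any window whose result reaches row_cap.

    Returns (rows, capped_windows). capped_windows lists windows that still hit
    the cap at min_window size, i.e. results that may be truncated.
    """
    if start < now - LOOKBACK:
        raise ValueError("start is outside the 30-day lookback")
    rows, capped = [], []
    pending = [(start, end)]
    while pending:
        lo, hi = pending.pop()
        batch = run_query(lo, hi)
        if len(batch) >= row_cap:
            if hi - lo > min_window:
                mid = lo + (hi - lo) / 2
                # Push the later half first so the earlier half is processed next.
                pending.extend([(mid, hi), (lo, mid)])
                continue
            capped.append((lo, hi))
        rows.extend(batch)
    return rows, capped
```

The function returns capped windows instead of discarding them. Any window that may be truncated can then be logged as a data-quality gap.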

Engineering constraints:
- Advanced Hunting API: 30-day lookback, up to 10,000 rows per API response (paginate for larger result sets), rate-limited, 3-minute query timeout. Use incremental scheduled pulls. Note: the portal UI also caps at 10K rows per query run.
- Preserve raw events before normalization.
- Expect duplicate O365 Management Activity events — deduplicate using stable event identifiers: workload, operation, user, object, event time, source record ID.
- Do not inflate "incidents avoided." Define as blocked/restricted/warned sensitive events and label clearly as a proxy metric.
- Do not mark a KPI Live unless source, refresh, parsing, owner, and dashboard validation are complete.
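Deduplication belongs in the normalization layer, so raw events stay intact in the raw index. The Python sketch below builds a composite key and keeps the first occurrence of each key. It maps the identifiers listed above to Management Activity API common-schema fields: Workload, Operation, UserId, ObjectId, CreationTime, and Id. Treat that mapping as an assumption to verify against your add-on's field extractions. In SPL the equivalent is the `dedup` command over the same field list.

```python
from typing import Iterable, List

# Assumed mapping of the stable identifiers above to common-schema field names:
# workload, operation, user, object, event time, source record ID.
DEDUP_KEY = ("Workload", "Operation", "UserId", "ObjectId", "CreationTime", "Id")


def dedupe(events: Iterable[dict]) -> List[dict]:
    """Keep the first occurrence of each event key, preserving arrival order."""
    seen = set()
    unique = []
    for event in events:
        key = tuple(event.get(field) for field in DEDUP_KEY)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique
```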

Deliverables:
1. Data source inventory
2. Splunk ingestion map
3. Normalized field dictionary
4. Lookup table schemas
5. KPI maturity matrix
6. Dashboard wireframes
7. Initial SPL searches
8. Advanced Hunting KQL query pack
9. Audit evidence register
10. Gap / remediation log

Additional required domains — Data Retention / Data Lifecycle Management:
Ingest and normalize Microsoft Purview audit activities related to:
- retention policy configuration and publication
- retention label application, change, and removal
- record declaration (standard and regulatory)
- disposition review: pending, approved, rejected, extended
- item relabeling during disposition (RelabelItem)
- retention extension (ExtendRetention)
- deletion eligibility and deletion execution
- retention exceptions and hold status
- lifecycle workload coverage

Normalize into purview_retention_lifecycle_fact:
- retention_policy_id, retention_policy_name
- retention_label_id, retention_label_name, retention_label_action
- retention_action, retention_duration, retention_trigger
- retention_start_date, retention_expiration_date
- record_status, is_record, is_regulatory_record
- disposition_review_status, disposition_stage, disposition_reviewer
- disposition_decision, disposition_decision_time, disposition_comments
- delete_action, delete_eligibility_time, delete_execution_time
- retention_exception_reason, retention_hold_status
- lifecycle_workload, lifecycle_location
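Several of these fields are derived from others rather than read directly from the audit record. For example, retention_expiration_date follows from retention_start_date plus retention_duration. The Python sketch below makes two assumptions. First, durations have been normalized to days; Purview retention periods can also be set in months or years. Second, retention_trigger, such as item creation or last modification, has already been resolved to a concrete start date. The function names are hypothetical.

```python
from datetime import date, timedelta


def retention_expiration_date(retention_start_date: date, retention_duration_days: int) -> date:
    """retention_expiration_date = retention_start_date + retention_duration (days assumed)."""
    return retention_start_date + timedelta(days=retention_duration_days)


def days_to_expiration(retention_start_date: date, retention_duration_days: int, today: date) -> int:
    """Days until the retention period ends; negative means it has already ended."""
    expiry = retention_expiration_date(retention_start_date, retention_duration_days)
    return (expiry - today).days


print(retention_expiration_date(date(2024, 1, 1), 365))  # 2024-12-31
```

Expiration marks eligibility only. Whether deletion actually happens still depends on delete_action, disposition review, and hold status.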

Retention event discovery SPL (run before building parsers):
index=idx_ms_o365_audit sourcetype="o365:management:activity"
(Operation="*Retention*" OR Operation="*Disposition*" OR Operation="*Record*"
 OR Operation="*Label*" OR Operation="RelabelItem" OR Operation="ExtendRetention")
| stats count by Operation Workload RecordType | sort -count
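Once the discovery search returns counts by Operation, a first-pass triage can assign each operation to a lifecycle bucket before parsers are written. The Python sketch below reuses the wildcard patterns from the search and checks them in a fixed order. Disposition comes first, so a name that matches several patterns lands in the most specific bucket. The bucket names and their order are hypothetical and are not an authoritative Purview taxonomy.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical buckets, checked in order; patterns mirror the discovery search.
OPERATION_BUCKETS = (
    ("disposition", "*disposition*"),
    ("retention", "*retention*"),
    ("record", "*record*"),
    ("label", "*label*"),
)


def bucket_operation(operation: str) -> Optional[str]:
    """Assign a discovered Operation value to a first-pass lifecycle bucket."""
    op = operation.lower()  # lowercase to approximate SPL's case-insensitive value matching
    for bucket, pattern in OPERATION_BUCKETS:
        if fnmatchcase(op, pattern):
            return bucket
    return None


print(bucket_operation("RelabelItem"))  # label
```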

Additional required domains — Insider Risk Management:
Ingest and normalize Microsoft Purview Insider Risk Management audit records.
Management Activity API record types:
- 306 InsiderRiskManagement        (individual risk activity events — exfiltration signals, sequence triggers, policy matches)
- 307 InsiderRiskManagementAlert   (alert-level records — AlertId, PolicyName, Severity, AlertStatus)
- 308 InsiderRiskManagementCase    (case-level records — CaseId, CaseName, CaseStatus, Severity)
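The three record types have different grains: individual activity, alert, and case. Keeping them separate prevents alert and case counts from being inflated by activity events. The Python sketch below routes records on RecordType, using the numbers listed above. It accepts either an integer or a numeric string, because the way the value renders in Splunk may vary. The function name is hypothetical.

```python
from typing import Optional

# Record types as listed above, mapped to the fact-table grain for each.
IRM_RECORD_TYPES = {306: "activity", 307: "alert", 308: "case"}


def irm_record_grain(record: dict) -> Optional[str]:
    """Return 'activity', 'alert', or 'case' for IRM records, else None."""
    try:
        return IRM_RECORD_TYPES.get(int(record.get("RecordType")))
    except (TypeError, ValueError):
        return None  # missing or non-numeric RecordType
```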

Normalize into purview_insider_risk_fact:
- irm_case_id, irm_case_name, irm_alert_id
- irm_policy_id, irm_policy_name, irm_policy_template
- irm_risk_score, irm_risk_level
- irm_activity_type, irm_activity_time, irm_triggering_event
- irm_user_upn, irm_user_department, irm_user_role (privacy-controlled)
- irm_scoped_user_status
- irm_case_status, irm_alert_status
- irm_assigned_to, irm_review_status
- irm_resolution, irm_false_positive
- irm_notes_present, irm_privacy_redaction_state
- irm_escalated_to_investigation
- irm_created_time, irm_updated_time

IRM privacy requirement:
Apply Splunk RBAC before enabling user-level IRM fields in any dashboard.
Executive and engineering dashboards show aggregate counts, trends, risk bands, and status only.
User-identifiable IRM details are restricted to approved investigation roles.
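The privacy rule is default-deny: user-identifiable fields appear only for approved investigation roles. The Python sketch below illustrates the rule at the row level. It does not replace Splunk RBAC, which should restrict index access itself. The role name `irm_investigator` is hypothetical, and the field set is taken from the user-level fields in purview_insider_risk_fact.

```python
# Hypothetical role name; use whichever Splunk role is approved for IRM investigations.
USER_LEVEL_FIELDS = frozenset({"irm_user_upn", "irm_user_department", "irm_user_role"})
APPROVED_INVESTIGATION_ROLES = frozenset({"irm_investigator"})


def privacy_safe_row(row: dict, viewer_roles: set) -> dict:
    """Drop user-identifiable IRM fields unless the viewer holds an approved role."""
    if viewer_roles & APPROVED_INVESTIGATION_ROLES:
        return dict(row)
    return {field: value for field, value in row.items() if field not in USER_LEVEL_FIELDS}
```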

Updated purview_control_fact unified model must include:
- DLP detection
- Sensitivity labeling
- Retention and lifecycle activity
- Records management activity
- Disposition review
- Insider Risk alerts and cases
- Defender incidents and alerts
- Workflow state and ticket linkage
- KPI maturity tracking

Acceptance criteria:
- Splunk receives UAL audit data and O365 DLP (DLP.All) data.
- Splunk receives Defender XDR incidents and alerts.
- Retention/lifecycle audit events are discoverable in Splunk via idx_ms_o365_retention.
- Retention label and disposition activities are parsed where present in UAL.
- Retention & Lifecycle dashboard (Dashboard 6) exists.
- Insider Risk record types 306, 307, and 308 are searched and validated via idx_ms_purview_insider_risk.
- Insider Risk Management dashboard (Dashboard 7) exists.
- Insider Risk reporting is privacy-safe by default; RBAC applied.
- At least one dashboard exists for each audience: Engineering, Investigations, Executive, Audit, Records Management.
- Every KPI, including the Retention and Insider Risk KPIs, is tagged Blank, Partial, or Live.
- Every KPI has a source system, owner, refresh cadence, and known limitation.
- Dashboards distinguish Health from Effectiveness.
- Executives can view risk posture in under five minutes.
- Auditors can trace each report back to source telemetry and control objective.

References