Artifact 2 — KPI Reporting Architecture

Splunk Reporting Architecture

Dual-ingestion model · UAL audit evidence layer · Defender XDR investigation layer · Normalized KPI marts · 7 dashboard tiers

Recommended Architecture — Dual-Ingestion Model
Add-on 1
Splunk Add-on for Microsoft Office 365
Collects audit evidence: DLP.All, Audit.Exchange, Audit.SharePoint, Audit.General, Service Health, Message Trace, Entra ID metadata. This is the audit layer — not the complete investigation layer.
Add-on 2
Splunk Add-on for Microsoft Security
Collects Defender XDR incidents and alerts mapped to Splunk CIM. Provides: incident ID, severity, status, classification, assigned owner, detection source, evidence entities, related alerts. This is the investigation layer.
Enrichment Plane
Defender Advanced Hunting API
Scheduled AH exports or API pulls produce enriched datasets. 30-day lookback, 10K rows per API response (not 100K — common misconception; use pagination or incremental pulls), rate-limited, 3-minute query timeout. Treat DataSecurityEvents as Preview — requires IRM opt-in.
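The incremental-pull guidance above can be sketched as a window generator. This is a minimal sketch, not the add-on's behavior: the 6-hour band size, the function names `time_bands` and `kql_time_filter`, and the assumption that all timestamps are UTC are illustrative choices. Only the 30-day lookback comes from the constraints stated above.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple

LOOKBACK = timedelta(days=30)  # Advanced Hunting lookback stated above
BAND = timedelta(hours=6)      # hypothetical band size; tune to stay under the row cap


def time_bands(last_pull: datetime, now: datetime,
               band: timedelta = BAND) -> Iterator[Tuple[datetime, datetime]]:
    """Yield contiguous (start, end] windows from last_pull up to now."""
    # Never request data older than the lookback window.
    start = max(last_pull, now - LOOKBACK)
    while start < now:
        end = min(start + band, now)
        yield start, end
        start = end


def kql_time_filter(start: datetime, end: datetime) -> str:
    """Render a KQL clause restricting Timestamp to one band (UTC assumed)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (f"| where Timestamp > datetime({start.strftime(fmt)}) "
            f"and Timestamp <= datetime({end.strftime(fmt)})")


now = datetime(2025, 1, 31, 12, 0, tzinfo=timezone.utc)
bands = list(time_bands(now - timedelta(hours=13), now))
for start, end in bands:
    print(kql_time_filter(start, end))
```

Each band becomes one query execution; if a band still hits the row cap, the band size can be halved and the band retried.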
Data Flow
Source · Audit & Policy Plane
🏛️ Microsoft Purview / M365
Office 365 Management Activity API: Audit.Exchange · Audit.SharePoint · Audit.General · DLP.All
Source · Investigation & Enrichment Plane
🛡️ Microsoft Defender XDR
Incidents / Alerts API · Advanced Hunting API (scheduled pulls · 10K rows/response · 3 min timeout)
Splunk Add-on 1
Splunk Add-on for Microsoft Office 365
Splunk Add-on 2 + Enrichment
Splunk Add-on for Microsoft Security & Scheduled AH Export
Raw Audit / DLP Indexes
idx_ms_o365_audit · idx_ms_o365_dlp · idx_ms_service_health
Raw Defender / Hunting Indexes
idx_ms_defender_incidents · idx_ms_defender_alerts · idx_ms_defender_hunting
Layer 2 · Normalization
Purview Control Facts
idx_purview_control_facts
Layer 3 · KPI Summary Indexes
KPI Mart Indexes / Data Models
kpi_purview_health_daily · kpi_dlp_effectiveness_daily · kpi_investigation_operations_daily · kpi_exec_control_score_monthly · kpi_audit_evidence_monthly
🔧 Engineering: Index health · pipeline ops · schema drift
🔍 Investigation: Alert triage · MTTA/MTTR · SOC queue
📊 Executive: Control scores · compliance · audit evidence
Important caveat on the O365 Add-on: The audit feed proves that events occurred and supports control-operation evidence, but it often lacks the analyst-friendly context leadership expects — clean alert lifecycle, triage status, risk classification, ownership, incident grouping, and fully normalized policy-action fields. That context comes from Defender XDR.
Three Splunk Data Layers

Layer 1 — Raw Microsoft Telemetry Keep raw JSON. Do not over-transform too early.

Index | Source
idx_ms_o365_audit | Office 365 Management Activity API: Audit.Exchange, Audit.SharePoint, Audit.General
idx_ms_o365_dlp | DLP.All feed
idx_ms_o365_retention | UAL retention policy, disposition, record declaration events (RecordType 50)
idx_ms_purview_insider_risk | UAL InsiderRiskManagement events (RecordTypes 306, 307, 308)
idx_ms_defender_incidents | Defender XDR incidents
idx_ms_defender_alerts | Defender XDR alerts
idx_ms_defender_hunting | Scheduled Advanced Hunting query outputs
idx_ms_service_health | O365 service health / communications

Layer 2 — Normalized Purview Control Facts idx_purview_control_facts · purview_summary

Each row represents one control-relevant fact. idx_purview_control_facts is the unified normalized fact store for cross-family correlation and dashboard queries. purview_summary is the operational summary index populated by daily collect searches (see SPL Query Pack §8) — it feeds the KPI marts. Both serve Layer 2; use purview_summary as the write target for scheduled searches and idx_purview_control_facts as the curated read target for dashboards.

Field | Meaning
event_time | When the event happened
ingest_time | When Splunk received it
source_plane | UAL, DefenderIncident, DefenderAlert, AdvancedHunting
workload | Exchange, SharePoint, OneDrive, Teams, Endpoint, Browser, etc.
policy_name | DLP / label / IRM policy
rule_name | DLP rule
rule_action | Audit, notify, warn, block, restrict, override, etc.
enforcement_mode | Audit, warn, block, allow, etc.
user_upn | Actor
recipient_domain | External destination where available
file_name / file_extension | Content object and type
sensitivity_label | Label at time of event
sit_names / sit_count | Sensitive info types matched + count
confidence | SIT confidence where available
alert_id / incident_id | Defender / Purview alert and incident linkage
severity | Alert severity
status | Alert / incident status
classification | TP / FP / benign / unknown
ticket_id | ServiceNow / Jira etc.
maturity_status | Blank / Partial / Live

Layer 3 — KPI Marts Prevent dashboards from repeatedly parsing nested JSON

Mart | Audience
kpi_purview_health_daily | Engineering
kpi_dlp_effectiveness_daily | Engineering / SOC
kpi_investigation_operations_daily | Investigations / SOC
kpi_retention_lifecycle_daily | Records Management / Compliance
kpi_insider_risk_daily | Investigations (privacy-controlled; RBAC required)
kpi_exec_control_score_monthly | Executive
kpi_audit_evidence_monthly | Audit / Compliance

Lookup Tables

📋 label_taxonomy_lookup

label_id
label_name
reporting_label_family
protection_level
encryption_expected
external_sharing_allowed

📋 sit_family_lookup

sit_name
sit_family
regulated_data_type
executive_category
severity_modifier

📋 dlp_policy_lookup

policy_name
policy_owner
control_objective
workload_scope
deployment_status
expected_action
executive_category

📋 kpi_maturity_lookup

kpi_name
data_source
maturity_status
owner
known_gap
remediation_plan

KPI Recommendations by Audience
Every KPI must have a source system, owner, refresh cadence, and maturity state. Do not mark a KPI Live unless source, refresh, parsing, owner, and dashboard validation are all complete.
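The Live gate above can be expressed as a small check. This is a minimal sketch: the `KpiChecks` field names are hypothetical, and the rule that a connected source with incomplete checks maps to Partial is an assumed reading of the Blank / Partial / Live definitions, not a rule stated verbatim in this document.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class KpiChecks:
    """Hypothetical validation flags for one KPI."""
    source_connected: bool
    refresh_scheduled: bool
    parsing_validated: bool
    owner_assigned: bool
    dashboard_validated: bool


def maturity_status(checks: KpiChecks) -> str:
    """Return Blank, Partial, or Live for a KPI."""
    if all(astuple(checks)):
        return "Live"      # every gate passed
    if checks.source_connected:
        return "Partial"   # data exists, but at least one gate is open (assumed rule)
    return "Blank"         # no connected source yet


print(maturity_status(KpiChecks(True, True, True, True, False)))
```

Keeping the gate in one function makes it harder for a dashboard author to promote a KPI to Live by editing a lookup row by hand.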

Engineering KPIs

KPI | Source | Maturity | Notes
DLP event ingestion freshness | Splunk _indextime vs event time | LIVE | Proves pipeline health
UAL subscription / content availability | O365 Management Activity API | LIVE | Detects broken audit feed
DLP policy match volume | DLP.All, Advanced Hunting | LIVE | Basic activity control
Rule action distribution | UAL + AH enrichment | PARTIAL | Needs parsing / enrichment
SIT confidence distribution | AH / Purview event details | PARTIAL | May be inconsistent by workload
Label usage by workload | UAL + label events | PARTIAL | Requires label taxonomy mapping
Auto-labeling activity trend | UAL / Purview classification events | PARTIAL | Stronger with Activity Explorer export/API
SHIR / scanner health | Purview governance / Data Map telemetry | BLANK | Separate source; do not fake it
OCR pipeline status | Control register | BLANK | Mark "not deployed" until telemetry exists
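The ingestion freshness KPI compares index time with event time. The sketch below shows one way to summarize that lag outside Splunk, for example when validating a summary search. It assumes both timestamps are epoch seconds under the fact-table names `event_time` and `ingest_time`; the 24-hour threshold and the function name are illustrative.

```python
from statistics import median
from typing import Dict, List, Optional


def freshness_summary(events: List[Dict[str, float]],
                      max_lag_s: float = 24 * 3600) -> Optional[Dict[str, float]]:
    """Summarize ingest lag (ingest_time - event_time) for a batch of events."""
    lags = [e["ingest_time"] - e["event_time"] for e in events]
    if not lags:
        return None  # no events: report a gap rather than a fake zero lag
    within = sum(1 for lag in lags if lag <= max_lag_s)
    return {
        "median_lag_s": median(lags),
        "max_lag_s": max(lags),
        "share_within_threshold": within / len(lags),
    }


sample = [
    {"event_time": 0, "ingest_time": 3600},
    {"event_time": 0, "ingest_time": 7200},
    {"event_time": 0, "ingest_time": 90000},
]
summary = freshness_summary(sample)
print(summary)
```

Returning `None` for an empty batch keeps zero-event periods visible as gaps instead of reporting them as perfectly fresh.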

Investigation KPIs

KPI | Source | Maturity | Notes
Alert volume by severity | Defender XDR alerts | LIVE | Clean executive / SOC metric
Incident volume by status | Defender XDR incidents | LIVE | Operational backlog
Triage queue depth | Defender incidents (Active + In Progress) | LIVE | Real-time operational view
Aging by severity | Defender incidents | LIVE | Critical for audit defensibility
Repeat offender users / entities | UAL + Defender evidence | LIVE | Strong SOC value
Mean time to triage (MTTA) | Defender incident status + assignment | PARTIAL | Requires lifecycle event capture
Mean time to resolve (MTTR) | Defender incident resolved timestamp | PARTIAL | Better via incident API
False-positive rate | Defender classification | PARTIAL | Requires analysts to classify
Top exfiltration vectors | DLP events + workload / action | PARTIAL | Needs normalization
Ticket creation latency | Defender alert time + ticket time | PARTIAL | Requires ITSM integration
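The MTTA, MTTR, and false-positive rows above reduce to simple arithmetic once lifecycle timestamps and classifications are captured. This is a minimal sketch: the keys `created`, `first_assigned`, and `resolved` are hypothetical names for those timestamps, and excluding unclassified incidents from the false-positive denominator is an assumption to agree with the SOC.

```python
from datetime import datetime
from statistics import mean
from typing import Dict, List, Optional


def mean_hours(incidents: List[Dict], start_key: str, end_key: str) -> Optional[float]:
    """Mean elapsed hours between two timestamps, skipping incomplete incidents."""
    durations = [
        (i[end_key] - i[start_key]).total_seconds() / 3600
        for i in incidents
        if i.get(start_key) and i.get(end_key)
    ]
    return mean(durations) if durations else None


def false_positive_rate(incidents: List[Dict]) -> Optional[float]:
    """FP share among classified incidents (unknown / missing are excluded)."""
    classified = [i["classification"] for i in incidents
                  if i.get("classification") in {"TP", "FP", "benign"}]
    return classified.count("FP") / len(classified) if classified else None


incidents = [
    {"created": datetime(2025, 1, 1, 8), "first_assigned": datetime(2025, 1, 1, 9),
     "resolved": datetime(2025, 1, 1, 14), "classification": "TP"},
    {"created": datetime(2025, 1, 2, 8), "first_assigned": datetime(2025, 1, 2, 11),
     "resolved": datetime(2025, 1, 3, 8), "classification": "FP"},
    {"created": datetime(2025, 1, 3, 8), "first_assigned": None,
     "resolved": None, "classification": "unknown"},
]
mtta = mean_hours(incidents, "created", "first_assigned")
mttr = mean_hours(incidents, "created", "resolved")
fp_rate = false_positive_rate(incidents)
print(mtta, mttr, fp_rate)
```

The third incident shows why these KPIs stay PARTIAL: without assignment and classification data it contributes to neither metric.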

Retention / Data Lifecycle KPIs

KPI | Source | Maturity | Notes
Retention policies deployed by workload | UAL retention policy events | BLANK | Proves scope coverage
Retention labels applied by label/workload | UAL label application events | BLANK | Proves label adoption
Records declared by workload | UAL record declaration events | BLANK | Records management activity
Items pending disposition review | UAL disposition events | BLANK | Disposition queue backlog
Average disposition review age | UAL disposition timestamps | BLANK | SLA compliance indicator
Disposition approvals / rejections | UAL disposition decision events | BLANK | Control execution evidence
Items eligible for deletion | UAL deletion eligibility events | BLANK | Lifecycle maturity signal
Items deleted after retention | UAL delete execution events | BLANK | Disposal execution proof
Items relabeled during disposition | UAL relabel / RelabelItem events | BLANK | Governance correction evidence
Workloads without retention coverage | Policy scope vs workload inventory | BLANK | Executive risk gap

Insider Risk Management KPIs

All IRM KPIs are aggregate only in shared dashboards. User-level details require approved investigation role access and Splunk RBAC controls.
KPI | Source | Maturity | Notes
IRM alerts by policy | UAL RecordType 307 (InsiderRiskManagementAlert) | BLANK | Policy activity baseline
IRM alerts by risk level | UAL RecordType 307 | BLANK | Risk distribution
IRM cases opened | UAL RecordType 308 (InsiderRiskManagementCase) | BLANK | Investigation demand
IRM cases closed | UAL RecordType 308 | BLANK | Throughput
IRM case aging | UAL RecordType 308 timestamps | BLANK | Backlog risk indicator
IRM false-positive rate | UAL RecordType 307 + case resolution (308) | BLANK | Tuning quality indicator
IRM risk activity volume | UAL RecordType 306 (InsiderRiskManagement activity events) | BLANK | Exfiltration signals, policy matches, sequences
IRM escalation rate | UAL RecordTypes 307 → 308 correlation | BLANK | Alert → case promotion rate
IRM policy coverage | Policy inventory | BLANK | Deployment maturity
IRM alert-to-case conversion rate | RecordTypes 307 + 308 correlation | BLANK | Investigation selectivity

Executive KPIs

KPI | Source | Maturity | Notes
Program coverage % | Control inventory + policy scope | PARTIAL | Needs authoritative scope inventory
Protected locations | Policy scope + workload coverage | PARTIAL | Do not infer from events alone
Control Health composite score | KPI mart | PARTIAL | Good exec rollup; see formula below
Risk exposure trend, 90 days | Alerts + DLP events | LIVE once retained | Strong leadership metric
NPI / PCI protected vs exposed | SIT + action outcome | PARTIAL | Needs SIT taxonomy mapping
Block / allow-with-override ratio | DLP enforcement / action | PARTIAL | Key effectiveness metric
Member-data incidents avoided | Blocked / restricted events | PARTIAL | Define carefully; avoid inflated claims
Retention deployment status | UAL / idx_ms_o365_retention | BLANK | Now sourced via retention fact family
IRM program active (T/F) | UAL RecordTypes 306/307/308 | BLANK | Confirms IRM is deployed and generating signals
Composite Score Model
Use a simple weighted score. Do not over-engineer it. Keep Health (is the control operating?) and Effectiveness (is the control reducing risk?) as separate axes — this is the correct framing for this project.

Control Health Score

Component | Weight
Ingestion freshness | 20%
DLP policy telemetry present | 20%
Alert pipeline active | 20%
Policy / rule / action parse success | 15%
Incident lifecycle completeness | 15%
KPI maturity completeness | 10%
Health Score = 0.20 × ingestion_freshness_score + 0.20 × dlp_event_presence_score + 0.20 × alert_pipeline_score + 0.15 × parsing_quality_score + 0.15 × incident_lifecycle_score + 0.10 × kpi_maturity_score

Effectiveness Score

Component | Weight
Sensitive events protected by block / restrict / warn | 25%
High-risk events reduced over 90 days | 20%
False-positive rate reduced | 20%
Mean time to triage improved | 15%
Repeat offenders reduced | 10%
Override rate controlled | 10%
Effectiveness Score = 0.25 × protected_events_score + 0.20 × high_risk_reduction_score + 0.20 × fp_reduction_score + 0.15 × mtta_improvement_score + 0.10 × repeat_offender_score + 0.10 × override_rate_score
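Both composite formulas are plain weighted sums, so one helper can serve the two axes. The sketch below uses the weights from the tables above; the assumption that each component score is already normalized to a 0 to 100 scale is illustrative, as is the function name.

```python
from typing import Dict

HEALTH_WEIGHTS = {
    "ingestion_freshness_score": 0.20,
    "dlp_event_presence_score": 0.20,
    "alert_pipeline_score": 0.20,
    "parsing_quality_score": 0.15,
    "incident_lifecycle_score": 0.15,
    "kpi_maturity_score": 0.10,
}

EFFECTIVENESS_WEIGHTS = {
    "protected_events_score": 0.25,
    "high_risk_reduction_score": 0.20,
    "fp_reduction_score": 0.20,
    "mtta_improvement_score": 0.15,
    "repeat_offender_score": 0.10,
    "override_rate_score": 0.10,
}


def composite_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of component scores (each assumed to be on a 0-100 scale)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = weights.keys() - components.keys()
    if missing:
        # Fail loudly instead of silently scoring a missing component as zero.
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)


health = composite_score({
    "ingestion_freshness_score": 100,
    "dlp_event_presence_score": 100,
    "alert_pipeline_score": 50,
    "parsing_quality_score": 80,
    "incident_lifecycle_score": 60,
    "kpi_maturity_score": 40,
}, HEALTH_WEIGHTS)
print(round(health, 2))
```

Raising on a missing component keeps a Blank KPI from quietly dragging or inflating the executive number; the caller decides how to score it.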
Dashboard Package — 7 Tiers
Audience — Engineering
Dashboard 1 · Purview Pipeline Health
  • UAL ingestion freshness
  • DLP event count by day
  • Defender alert count by day
  • Defender incident count by day
  • API gaps / zero-event days
  • Duplicate rate
  • Parse success rate
  • Events missing policy / rule / action
  • Splunk source / sourcetype / index status
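The "events missing policy / rule / action" panel amounts to a per-field null-rate check. This is a minimal sketch over already-normalized fact rows; the 5% threshold mirrors the schema-drift guidance later in this document, and treating empty strings as null is an assumption.

```python
from typing import Dict, Iterable, List

REQUIRED_FIELDS = ("policy_name", "rule_name", "rule_action")


def null_rates(events: List[Dict], fields: Iterable[str] = REQUIRED_FIELDS) -> Dict[str, float]:
    """Share of events where each field is missing, None, or an empty string."""
    total = len(events)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for e in events if e.get(f) in (None, "")) / total
        for f in fields
    }


def fields_over_threshold(events: List[Dict], threshold: float = 0.05) -> List[str]:
    """Fields whose null rate exceeds the threshold, for alerting."""
    return sorted(f for f, rate in null_rates(events).items() if rate > threshold)


complete = {"policy_name": "PCI", "rule_name": "Card numbers", "rule_action": "block"}
broken = {"policy_name": "PCI", "rule_name": None, "rule_action": "block"}
batch = [complete] * 18 + [broken] * 2
print(fields_over_threshold(batch))
```

A sudden jump in the null rate for one field is usually the first visible symptom of an upstream schema change.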
Audience — Engineering / Security / Compliance
Dashboard 2 · DLP Control Effectiveness
  • DLP events by workload
  • DLP events by policy and rule
  • Actions: audit / warn / block / override
  • Block / allow-with-override ratio
  • SIT families: NPI / PCI / PII
  • SIT confidence distribution
  • Top external domains
  • Top users by sensitive activity
  • Top files by repeat policy hits
  • Label + SIT mismatch report
Audience — SOC / Investigations
Dashboard 3 · Investigation Operations
  • Active incidents by severity
  • Aging incidents by severity
  • MTTA / MTTR
  • False-positive rate
  • True-positive rate
  • Unassigned incidents
  • Reopened incidents
  • Top policies generating FPs
  • High-volume users
  • Multi-incident users / files / devices
Audience — Leadership
Dashboard 4 · Executive Control Scorecard
  • Control Health composite score
  • Effectiveness composite score
  • 90-day exposure trend
  • Protected vs exposed sensitive activity
  • Block / restrict / warn / override ratio
  • Member-data protection trend
  • Program coverage by workload
  • Top 5 control gaps
  • KPI maturity: Blank / Partial / Live
Audience — Records Management / Compliance
Dashboard 6 · Retention & Lifecycle Management
  • Retention policy coverage by workload
  • Retention label activity by label / workload
  • Records declared — standard & regulatory
  • Disposition queue — items pending review
  • Disposition approvals & rejections
  • Retention extensions
  • Items relabeled at disposition
  • Deletion-eligible content
  • Items deleted after retention
  • Lifecycle gaps by workload / site / mailbox
Audience — Investigations / Compliance (Privacy-Controlled)
Dashboard 7 · Insider Risk Management
  • IRM alerts by policy (aggregate)
  • IRM alerts by risk level
  • IRM cases by status
  • Case aging distribution
  • Alert-to-case conversion rate
  • False-positive rate trend
  • Scoped users count trend
  • Top triggering activity types
  • Policy coverage & deployment maturity
  • Privacy-safe executive summary panel

User-identifiable fields restricted by Splunk RBAC. Executive view shows aggregate counts and risk bands only.

Audience — Audit / Compliance
Dashboard 5 · Audit Evidence
  • Control objective
  • Data source
  • Evidence available
  • Last successful event
  • Last dashboard refresh
  • Owner
  • Gaps + remediation plan
  • Screenshot / export link
  • Monthly evidence package status

Reports without data are still evidence that the control framework exists, provided the report clearly shows the expected source, maturity state, gap, owner, and remediation path.

Minimum Viable Implementation — 4 Phases
Phase 1
Ingest and Prove Telemetry
  • Enable / verify Unified Audit Log
  • Configure Splunk Add-on for Microsoft Office 365
  • Ingest DLP.All, Audit.Exchange, Audit.SharePoint, Audit.General
  • Configure Splunk Add-on for Microsoft Security
  • Ingest Defender XDR incidents and alerts
  • Create ingestion health dashboard
  • Build Blank / Partial / Live maturity tags

Success: Splunk receives DLP + audit events daily. Defender incidents/alerts appear. Ingest freshness is measured. Data gaps are explicit and visible.

Phase 2
Normalize DLP Control Facts
  • Extract policy / rule / action fields from nested JSON
  • Normalize workload names
  • Normalize action outcomes: audit, notify, warn, allow, allow-with-override, block, restrict, encrypt/quarantine
  • Map SIT names → reporting families: NPI, PCI, PII, Financial, Credentials, Legal/privileged, Custom
  • Map labels → 4-label taxonomy: Public, Internal, Confidential, Restricted

Success: Top DLP policy/rule/action reports reliable. Executives see "protected vs exposed." Engineers see FP clusters. Investigators pivot from dashboard to incident.
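The policy / rule / action extraction in Phase 2 is a flattening of nested arrays into one row per action. The sketch below illustrates that logic outside Splunk. The `PolicyDetails`, `Rules`, and `Actions` keys follow the nesting described in this document, while `PolicyName`, `RuleName`, and `Id` are assumed field names that should be checked against real events.

```python
from typing import Dict, List


def flatten_dlp_event(event: Dict) -> List[Dict]:
    """Emit one row per (policy, rule, action) from a raw DLP audit event."""
    rows = []
    for policy in event.get("PolicyDetails") or []:
        for rule in policy.get("Rules") or []:
            # Keep rules with no recorded action as a row with a null action,
            # so they show up in "missing action" quality metrics.
            for action in rule.get("Actions") or [None]:
                rows.append({
                    "event_id": event.get("Id"),
                    "policy_name": policy.get("PolicyName"),
                    "rule_name": rule.get("RuleName"),
                    "rule_action": action,
                })
    return rows


raw = {
    "Id": "evt-1",
    "PolicyDetails": [{
        "PolicyName": "PCI Protection",
        "Rules": [
            {"RuleName": "Card numbers external", "Actions": ["BlockAccess", "NotifyUser"]},
            {"RuleName": "Card numbers internal", "Actions": []},
        ],
    }],
}
rows = flatten_dlp_event(raw)
for row in rows:
    print(row)
```

This mirrors what an spath + mvexpand chain does in SPL; running it once at summary-build time is what keeps dashboards from reparsing raw JSON.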

Phase 3
Enrich from Defender Advanced Hunting
  • Scheduled AH queries: DLP rule matches, alert evidence, high-risk users, file/device pivots, label + SIT + policy combos
  • Summary export into Splunk
  • Correlation: UAL event ID ↔ alert ID ↔ incident ID ↔ file/user/device ↔ ticket ID
  • Incremental pulls: respect the 30-day window, the 10K-rows-per-response limit, and rate limits

Success: Analyst dashboards stop being log viewers. Policy effectiveness and triage status visible. FP rate and MTTR become measurable.

Phase 4
Executive Scorecards
  • Monthly executive dashboard
  • Audit evidence dashboard
  • Control Health composite score
  • 90-day exposure trend
  • Block / override / allow trend
  • Top risky workflows: external email, anonymous sharing, unmanaged device download, removable media, cloud upload, Teams oversharing, browser upload to unapproved domains

Success: Executives assess risk in <5 minutes. Auditors trace every report to source + control objective. Engineers see broken controls. Investigators see what requires action today.

When to Consider Alternatives

Consider Microsoft Sentinel or Defender portal-native reporting if: Splunk cannot ingest Defender XDR incidents/alerts properly; the team needs native KQL over Microsoft security tables; Advanced Hunting data is easier to operationalize in Microsoft-native tooling; leadership accepts Power BI/Sentinel workbooks instead of Splunk dashboards; or cost/data-volume concerns make full Splunk indexing unattractive.

For this project, the best answer is: Keep Splunk, but feed it better data. Use UAL for audit evidence, Defender XDR for alert lifecycle, and Advanced Hunting for enrichment.

Microsoft / Splunk Integration — Known Frictions & Mitigations
Field reality: Many engineers report that Microsoft's data surfaces are progressively harder to consume outside of the Microsoft stack. This is not always intentional product sabotage — some friction is architectural (nested JSON, API rate limits, schema drift) and some is commercial (Microsoft has obvious incentive to keep workloads in Sentinel/Defender portal). The mitigations below are all production-validated approaches.
Friction Point | What Actually Happens | Mitigation
UAL event schema is deeply nested JSON | DLP events embed PolicyDetails[].Rules[].Actions[] as arrays-within-arrays. The Splunk O365 Add-on ingests raw JSON but does not auto-extract nested arrays into usable fields. Most engineers hit this in the first week and spend days writing spath + mvexpand chains. | Use spath + mvexpand (see SPL Query Pack §2). Write normalized results into a summary index daily. Never run nested JSON parsing in production dashboard queries; pre-compute instead. Keep raw events in a separate cold/warm tier.
O365 Management Activity API: 12-hour event delay | Audit events are not real-time. Microsoft's documented SLA for most audit records is up to 24 hours; typical observed latency is 1–12 hours for DLP events. SharePoint/OneDrive events have sometimes shown 24+ hour delays in documented incidents. Engineers expecting near-real-time SIEM data are surprised. | Do not use UAL for real-time alerting. Use Defender XDR incidents/alerts for operational alerting (they're faster). UAL is your audit evidence and compliance layer, not your SOC detection feed. Set dashboard refresh expectations accordingly.
Advanced Hunting API: 10K row limit per response | The API returns a maximum of 10,000 rows per query execution. The portal UI also caps results at 10,000 rows. There is no native streaming endpoint. Queries that return more than 10K results are silently truncated unless you implement pagination or time-banded incremental pulls. | Implement time-banded incremental pulls (e.g., 6-hour windows). Use Timestamp > {last_pull} in your KQL to avoid full re-scan. Schedule as a Splunk modular input or external Python script. Store the last successful pull timestamp in a KV Store or lookup.
Defender XDR SIEM connector deprecation / migration | Microsoft deprecated the legacy Defender for Endpoint SIEM API (siem.windows.com) in September 2024. The replacement is the Microsoft Defender XDR Streaming API (via Event Hubs) or the Microsoft Graph Security API. Teams using older Splunk add-on versions or legacy connector configs may find incidents/alerts silently stop flowing. | Use Splunk Add-on for Microsoft Security (not the legacy Defender for Endpoint add-on). Verify it is configured against the Defender XDR incidents/alerts API endpoints, not the deprecated SIEM API. Check Splunkbase for add-on version; the current certified version supports the Graph Security API.
Event Hubs as a required intermediary | Microsoft's recommended path for streaming Defender XDR and Purview signals at scale is via Azure Event Hubs. This adds an Azure infrastructure dependency (Event Hub namespace, consumer groups, connection strings, throughput units) that a pure Splunk shop may not have provisioned. The Splunk Add-on for Microsoft Cloud Services handles Event Hubs ingestion but requires separate configuration. | For moderate volume (under ~500K events/day), the polling-based O365 Add-on and Security Add-on are sufficient and simpler to operate. Switch to Event Hubs streaming only when polling lag becomes operationally unacceptable or when volume exceeds what the Management Activity API can serve within its rate limits.
O365 Management Activity API rate limits | The API enforces per-publisher, per-tenant rate limits. Heavy polling during high-event periods (e.g., large DLP scan sweeps, major incidents) can result in HTTP 429 throttling. The Splunk Add-on handles retries but silently queues content blobs, so engineers often don't notice gaps until they check ingestion freshness metrics. | Monitor the idx_ms_service_health index and the Splunk Add-on internal logs (index=_internal sourcetype=splunk_ta_o365). Build ingestion gap detection into Dashboard 1 (Pipeline Health). Use separate content type subscriptions (DLP.All vs Audit.Exchange vs Audit.General) so a throttle on one feed doesn't block all audit data.
DLP event de-duplication | The Management Activity API content blob model means that a single DLP event can appear in multiple content blobs across polling windows, or be re-delivered after a transient API error. Engineers building DLP dashboards without de-duplication will over-count events, inflating policy match counts and exec metrics. | De-duplicate on stable event identifiers: Id (the audit record GUID in UAL) + CreationTime + UserId + ObjectId. Use dedup in dashboard queries or, better, de-duplicate at collect time in your summary index build (see SPL §8). Include event_key = md5(...) in your fact rows.
Purview Insider Risk data is not in UAL by default | RecordTypes 306, 307, and 308 (IRM) are not present in the standard UAL subscription. IRM audit records require: (1) IRM to be configured and policies active, (2) the tenant admin to have explicitly opted into IRM audit logging, and (3) the O365 Management Activity subscription to be configured for Audit.General, which is where IRM records appear. Some tenants never enable this. | Validate IRM audit record presence before building the IRM dashboard. Run validation query (SPL §1.2) and confirm RecordTypes 306/307/308 are present. If absent, IRM may not be active, IRM audit logging may be disabled, or Audit.General subscription may be missing. Escalate to the Microsoft Purview admin to verify IRM policy state and audit configuration.
DataSecurityEvents (Advanced Hunting) requires IRM opt-in | The DataSecurityEvents table in Advanced Hunting is in Preview and requires a separate IRM opt-in. Tenants that have not completed this opt-in will see the table return zero results or not appear at all in the AH schema. This is often discovered after hours of SPL debugging. | Treat DataSecurityEvents as optional enrichment, not a primary data source. Mark any KPIs depending on it as BLANK until opt-in is confirmed and results are validated. Document this as a known gap in the KPI maturity matrix.
Sensitivity label events are sparse in UAL | UAL captures label change events (apply, change, remove) but does not provide a snapshot of how many files are currently labeled. There is no "current label inventory" stream in UAL. Engineers expecting a labeled-file-count metric from Splunk alone will be unable to build it from UAL events. | Use UAL label events for activity trending, change detection, and downgrade alerting. For labeled-file inventory, use the Microsoft Graph API (Content Discovery), Purview Content Explorer export, or Activity Explorer API. Import these as a scheduled lookup or reference dataset in Splunk; do not try to reconstruct inventory from event counts.
Microsoft schema changes without notice | Microsoft periodically renames or restructures fields in UAL JSON payloads, Defender API responses, and Advanced Hunting table schemas, often without versioned change notifications. Teams have experienced: DLP policy fields moving inside nested arrays, RecordType values being added mid-cycle, AH column renames, and Defender API response envelope changes breaking ingestion. | Never hard-code field names in production SPL without defensive coalesce() fallbacks. Subscribe to the Microsoft 365 Message Center and the Defender XDR changelog. Build schema validation into Dashboard 1 (Pipeline Health): alert when expected fields have a null rate above 5%. Maintain a field validation saved search that runs weekly.
Microsoft favors Sentinel for Purview integration | Microsoft's native Purview + Defender integration story is built for Sentinel / Log Analytics. The Purview Audit connector, the Defender XDR connector, and the Microsoft 365 Defender data connector all have first-class Sentinel support. SPL equivalents require more engineering effort, and some data surfaces (e.g., Purview Communication Compliance, certain Defender Identity signals) have no published Splunk integration path. | Accept Sentinel as a complementary system for native Microsoft-to-Microsoft signal paths. Feed curated, normalized outputs from Sentinel into Splunk via Sentinel's SIEM forwarding or Event Hub bridge, rather than trying to replicate every Microsoft connector in Splunk. See the Recommendation page for the hybrid model.
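The de-duplication mitigation in the table above can be sketched as a stable key plus a first-seen filter. This is a minimal sketch assuming the four identifier fields named there are present on each raw event; the `|` separator and the helper names are illustrative.

```python
import hashlib
from typing import Dict, Iterable, List

KEY_FIELDS = ("Id", "CreationTime", "UserId", "ObjectId")


def event_key(event: Dict) -> str:
    """MD5 over the stable identifier fields, joined with a separator."""
    parts = [str(event.get(field, "")) for field in KEY_FIELDS]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def dedupe(events: Iterable[Dict]) -> List[Dict]:
    """Keep the first occurrence of each event_key, preserving order."""
    seen = set()
    unique = []
    for event in events:
        key = event_key(event)
        if key not in seen:
            seen.add(key)
            unique.append({**event, "event_key": key})
    return unique


first = {"Id": "a1", "CreationTime": "2025-01-31T10:00:00", "UserId": "u@contoso.com", "ObjectId": "doc.xlsx"}
redelivered = dict(first)  # same record delivered again in a later content blob
other = {**first, "Id": "b2"}
facts = dedupe([first, redelivered, other])
print(len(facts))
```

MD5 is used here only as a compact fingerprint, matching the event_key convention in the table, not as a security control.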
Bottom line for this project: The frictions above are real but solvable. The architecture in this document is designed around them — UAL is treated as audit evidence (not real-time SIEM), Defender XDR provides the investigation layer, Advanced Hunting provides enrichment at scheduled cadence, and Sentinel remains an option for signals that are easier to consume natively. The engineering effort required to work around Microsoft's API limitations is significant — plan for it explicitly in the engagement timeline.
Engineering Prompt — Splunk / Microsoft Team

Use with the Splunk / Microsoft engineering team. Covers the full architecture, field dictionary, lookup schemas, dashboard specs, KPI maturity definitions, and acceptance criteria.

You are a Microsoft Purview, Microsoft Defender XDR, and Splunk engineering team building enterprise-class, audit-defensible reporting for Purview DLP, sensitivity labeling, retention/control health, and investigation operations.

Project objective:
Build Splunk dashboards and KPI marts across two reporting axes:
1. Health — Is the control operating?
2. Effectiveness — Is the control reducing risk?

Primary source systems:
- Microsoft Purview Unified Audit Log / Office 365 Management Activity API
- Office 365 Management Activity API content types:
  - DLP.All
  - Audit.Exchange
  - Audit.SharePoint
  - Audit.General
  - Audit.AzureActiveDirectory where useful
- Microsoft Defender XDR incidents and alerts
- Microsoft Defender Advanced Hunting API
- Optional enrichment from ServiceNow/Jira ticketing, HR/user metadata, and label taxonomy reference tables

Required Splunk add-ons/connectors:
- Splunk Add-on for Microsoft Office 365
- Splunk Add-on for Microsoft Security
- Optional: Splunk Add-on for Microsoft Cloud Services if Event Hubs ingestion is used

Design principle:
Do not expect the Unified Audit Log to contain full investigation context. Treat UAL/O365 Management Activity as the audit evidence layer. Treat Defender XDR incidents/alerts and Advanced Hunting exports as the investigation and enrichment layer. Splunk is the reporting, normalization, correlation, and executive KPI layer.

Required raw indexes:
- idx_ms_o365_audit
- idx_ms_o365_dlp
- idx_ms_o365_retention         (UAL retention, disposition, records management events)
- idx_ms_purview_insider_risk   (UAL RecordTypes 306, 307, 308)
- idx_ms_defender_incidents
- idx_ms_defender_alerts
- idx_ms_defender_hunting
- idx_ms_service_health

Normalized fact families and sourcetypes:
- purview:dlp:fact              → purview_dlp_fact
- purview:label:fact            → purview_label_fact
- purview:retention:fact        → purview_retention_lifecycle_fact
- purview:insider_risk:fact     → purview_insider_risk_fact
- defender:incident:fact        → defender_incident_fact
- defender:alert:fact           → defender_alert_fact
- purview:control:fact          → purview_control_fact (unified)

Required curated indexes or summary indexes:
- idx_purview_control_facts
- kpi_purview_health_daily
- kpi_dlp_effectiveness_daily
- kpi_investigation_operations_daily
- kpi_exec_control_score_monthly
- kpi_audit_evidence_monthly
- kpi_retention_lifecycle_daily
- kpi_insider_risk_daily         (privacy-controlled; RBAC required before enabling)

Normalize the following fields into idx_purview_control_facts:
- event_time, ingest_time, source_plane, workload, operation
- policy_name, rule_name, rule_action, enforcement_mode
- user_upn, user_department, user_title
- recipient, recipient_domain, external_internal_flag
- file_name, file_extension, file_path, site_url
- device_name, device_id, ip_address
- sensitivity_label, sensitivity_label_id
- sit_names, sit_family, sit_count, confidence
- alert_id, incident_id, severity, status, classification, determination, assigned_to
- ticket_id, maturity_status

Create lookup tables:
1. label_taxonomy_lookup
   label_id | label_name | reporting_label_family | protection_level | encryption_expected | external_sharing_allowed

2. sit_family_lookup
   sit_name | sit_family | regulated_data_type | executive_category | severity_modifier

3. dlp_policy_lookup
   policy_name | policy_owner | control_objective | workload_scope | deployment_status | expected_action | executive_category

4. kpi_maturity_lookup
   kpi_name | data_source | maturity_status | owner | known_gap | remediation_plan

Build dashboards:

Dashboard 1: Purview Pipeline Health (Audience: Engineering)
- UAL ingestion freshness, DLP.All ingestion freshness
- Defender incident and alert ingestion freshness
- Service health events
- Event volume by source plane
- Parse success rate, missing policy/rule/action rate
- API or connector error count, zero-event days by feed

Dashboard 2: DLP Control Effectiveness (Audience: Engineering / Security / Compliance)
- DLP events by workload, policy, rule
- Action distribution: audit, notify, warn, block, allow, override, restrict
- Block / allow-with-override ratio
- SIT family distribution: NPI, PCI, PII, financial, credentials, legal, custom
- SIT confidence distribution
- Label + SIT mismatch report
- Top external domains, top risky users, top risky files
- 90-day risk exposure trend

Dashboard 3: Investigation Operations (Audience: SOC / Investigations)
- Incidents by severity and status
- Alerts by severity
- Active queue depth, aging by severity
- Mean time to acknowledge / triage (MTTA)
- Mean time to resolve (MTTR)
- False-positive rate and true-positive rate
- Unassigned and reopened incidents
- Top policies producing false positives
- Top entities appearing across multiple incidents

Dashboard 4: Executive Control Scorecard (Audience: Leadership)
- Control Health composite score and Effectiveness composite score
- Program coverage percentage
- Protected vs exposed sensitive activity
- Block/restrict/warn/override trend
- Member-data protection trend
- Top 5 control gaps
- KPI maturity: Blank, Partial, Live
- 90-day risk trend, month-over-month improvement

Dashboard 5: Audit Evidence (Audience: Audit / Compliance)
- Control objective, evidence source, dashboard name
- Current maturity state, last successful event, last refresh
- Owner, known gap, remediation plan
- Export / screenshot evidence status

Define KPI maturity states:
- Blank: dashboard/control exists, but source data is not yet available or connected.
- Partial: some data exists but coverage, parsing, or enrichment is incomplete.
- Live: data is connected, normalized, refreshed, and report-ready.
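The three states can be assigned mechanically from a handful of readiness checks. This sketch combines the definitions above with the rule under Engineering constraints that a KPI is Live only when source, refresh, parsing, owner, and dashboard validation are all complete; the check names themselves are hypothetical.

```python
# Hypothetical check names mirroring the Live criteria in this document.
LIVE_CHECKS = (
    "source_connected",
    "refresh_ok",
    "parsing_ok",
    "owner_assigned",
    "dashboard_validated",
)

def maturity_state(checks):
    """Return Blank, Partial, or Live for one KPI's readiness checks."""
    if not checks.get("source_connected", False):
        return "Blank"  # no source data available or connected
    if all(checks.get(name, False) for name in LIVE_CHECKS):
        return "Live"
    return "Partial"  # some data exists, but at least one check fails

print(maturity_state({}))
print(maturity_state({"source_connected": True, "refresh_ok": True}))
print(maturity_state({name: True for name in LIVE_CHECKS}))
```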

Composite Health Score formula:
  0.20 * ingestion_freshness_score
+ 0.20 * dlp_event_presence_score
+ 0.20 * alert_pipeline_score
+ 0.15 * parsing_quality_score
+ 0.15 * incident_lifecycle_score
+ 0.10 * kpi_maturity_score

Composite Effectiveness Score formula:
  0.25 * protected_events_score
+ 0.20 * high_risk_reduction_score
+ 0.20 * fp_reduction_score
+ 0.15 * mtta_improvement_score
+ 0.10 * repeat_offender_score
+ 0.10 * override_rate_score
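Both composite formulas are weighted sums whose weights add up to 1.0, so one helper can compute either. The sketch below assumes every component score is already normalized to a 0-100 scale (the document does not state the scale) and fails loudly if a component is missing instead of silently treating it as zero.

```python
HEALTH_WEIGHTS = {
    "ingestion_freshness_score": 0.20,
    "dlp_event_presence_score": 0.20,
    "alert_pipeline_score": 0.20,
    "parsing_quality_score": 0.15,
    "incident_lifecycle_score": 0.15,
    "kpi_maturity_score": 0.10,
}

EFFECTIVENESS_WEIGHTS = {
    "protected_events_score": 0.25,
    "high_risk_reduction_score": 0.20,
    "fp_reduction_score": 0.20,
    "mtta_improvement_score": 0.15,
    "repeat_offender_score": 0.10,
    "override_rate_score": 0.10,
}

def composite_score(components, weights):
    """Weighted sum of component scores; weights must total 1.0."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = sorted(set(weights) - set(components))
    if missing:
        raise KeyError(f"missing component scores: {missing}")
    return sum(weights[name] * components[name] for name in weights)

# Illustrative component scores on an assumed 0-100 scale.
health_components = {
    "ingestion_freshness_score": 100,
    "dlp_event_presence_score": 100,
    "alert_pipeline_score": 50,
    "parsing_quality_score": 80,
    "incident_lifecycle_score": 60,
    "kpi_maturity_score": 40,
}
print(round(composite_score(health_components, HEALTH_WEIGHTS), 2))
```

Computing Health and Effectiveness from separate weight tables keeps the two scores distinct, which the acceptance criteria require of the dashboards.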

Advanced Hunting enrichment requirements:
Create scheduled Advanced Hunting queries for:
- DLP rule matches by policy/rule/action
- AlertInfo joined to AlertEvidence
- User/file/device/entity evidence extraction
- High-risk user and repeat-entity detection
- Label + SIT + DLP policy combinations
- Endpoint/removable media/browser upload events where available
- DataSecurityEvents where available and licensed/opted in (NOTE: Preview — requires IRM opt-in)
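Scheduled queries like these are usually run over small, non-overlapping time windows so that each run stays within the API limits stated in this document. The sketch below only plans the windows; it does not call the Advanced Hunting API. The one-hour default step and the checkpoint concept are assumptions, and the step should be tuned so a single window is unlikely to exceed the per-response row limit.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(days=30)  # Advanced Hunting lookback stated in this document

def pull_windows(checkpoint, now, step=timedelta(hours=1)):
    """Plan contiguous (start, end) windows from the last checkpoint to now."""
    # Data older than the lookback is no longer queryable, so clamp the start.
    start = max(checkpoint, now - LOOKBACK)
    windows = []
    while start < now:
        end = min(start + step, now)
        windows.append((start, end))
        start = end
    return windows

now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
recent = pull_windows(now - timedelta(hours=3), now)
print(len(recent), recent[0])
```

Persisting the end of the last successful window as the next checkpoint makes a failed run retryable without gaps or overlap.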

Engineering constraints:
- Advanced Hunting API: 30-day lookback, up to 10,000 rows per API response (paginate for larger result sets), rate-limited, 3-minute query timeout. Use incremental scheduled pulls. Note: the portal UI also caps at 10K rows per query run.
- Preserve raw events before normalization.
- Expect duplicate O365 Management Activity events — deduplicate using stable event identifiers: workload, operation, user, object, event time, source record ID.
- Do not inflate "incidents avoided." Define the metric as the count of blocked, restricted, or warned sensitive events, and label it clearly as a proxy metric.
- Do not mark a KPI Live unless source, refresh, parsing, owner, and dashboard validation are complete.
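The deduplication constraint above can be sketched as a composite key built from the listed identifiers. The raw field names used here (Workload, Operation, UserId, ObjectId, CreationTime, Id) follow the Management Activity common schema; confirm them against the events actually indexed, since this is an illustration of the idea and not the add-on's own behavior.

```python
# workload, operation, user, object, event time, source record ID
KEY_FIELDS = ("Workload", "Operation", "UserId", "ObjectId", "CreationTime", "Id")

def dedup_key(event):
    """Build a hashable key from the stable identifiers of one event."""
    return tuple(event.get(field) for field in KEY_FIELDS)

def deduplicate(events):
    """Keep the first occurrence of each event, preserving input order."""
    seen = set()
    unique = []
    for event in events:
        key = dedup_key(event)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique

# Illustrative events only; the second is a redelivered copy of the first.
base = {
    "Workload": "Exchange",
    "Operation": "DlpRuleMatch",
    "UserId": "user@example.com",
    "ObjectId": "msg-1",
    "CreationTime": "2024-01-10T09:00:00",
    "Id": "a1",
}
events = [base, dict(base), {**base, "Id": "a2", "ObjectId": "msg-2"}]
print(len(deduplicate(events)))
```

Running this step on a copy of the data, and never on the raw index, keeps it compatible with the requirement to preserve raw events before normalization.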

Deliverables:
1. Data source inventory
2. Splunk ingestion map
3. Normalized field dictionary
4. Lookup table schemas
5. KPI maturity matrix
6. Dashboard wireframes
7. Initial SPL searches
8. Advanced Hunting KQL query pack
9. Audit evidence register
10. Gap / remediation log

Additional required domains — Data Retention / Data Lifecycle Management:
Ingest and normalize Microsoft Purview audit activities related to:
- retention policy configuration and publication
- retention label application, change, and removal
- record declaration (standard and regulatory)
- disposition review: pending, approved, rejected, extended
- item relabeling during disposition (RelabelItem)
- retention extension (ExtendRetention)
- deletion eligibility and deletion execution
- retention exceptions and hold status
- lifecycle workload coverage

Normalize into purview_retention_lifecycle_fact:
- retention_policy_id, retention_policy_name
- retention_label_id, retention_label_name, retention_label_action
- retention_action, retention_duration, retention_trigger
- retention_start_date, retention_expiration_date
- record_status, is_record, is_regulatory_record
- disposition_review_status, disposition_stage, disposition_reviewer
- disposition_decision, disposition_decision_time, disposition_comments
- delete_action, delete_eligibility_time, delete_execution_time
- retention_exception_reason, retention_hold_status
- lifecycle_workload, lifecycle_location
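A normalizer for a few of these fields might look like the following. This is a sketch under stated assumptions: `LabelName` and `DispositionStatus` are hypothetical raw field names, and mapping the acting `UserId` to `disposition_reviewer` is a guess to validate with the discovery search; only the target field names and the four disposition states come from this document.

```python
FACT_FIELDS = (
    "retention_label_name",
    "retention_action",
    "disposition_review_status",
    "disposition_reviewer",
    "lifecycle_workload",
)

# Target fact field -> raw field. Raw names are assumptions, not confirmed schema.
RAW_FIELD_MAP = {
    "retention_label_name": "LabelName",   # hypothetical
    "retention_action": "Operation",
    "disposition_reviewer": "UserId",      # assumes the actor is the reviewer
    "lifecycle_workload": "Workload",
}

DISPOSITION_STATES = {"pending", "approved", "rejected", "extended"}

def normalize_retention_event(raw):
    """Map one raw audit event to a partial retention lifecycle fact."""
    fact = {field: raw.get(RAW_FIELD_MAP.get(field, "")) for field in FACT_FIELDS}
    # Accept only the disposition states named in this document.
    status = str(raw.get("DispositionStatus", "")).strip().lower()  # hypothetical field
    fact["disposition_review_status"] = status if status in DISPOSITION_STATES else None
    return fact

raw = {
    "Operation": "ExtendRetention",
    "Workload": "SharePoint",
    "UserId": "reviewer@example.com",
    "LabelName": "Finance-7y",
    "DispositionStatus": "Extended",
}
print(normalize_retention_event(raw))
```

Emitting every target field, with None where the source is silent, makes missing coverage visible in the parse-quality panels instead of hiding it.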

Retention event discovery SPL (run before building parsers):
index=idx_ms_o365_audit sourcetype="o365:management:activity"
(Operation="*Retention*" OR Operation="*Disposition*" OR Operation="*Record*"
 OR Operation="*Label*" OR Operation="RelabelItem" OR Operation="ExtendRetention")
| stats count by Operation Workload RecordType | sort -count

Additional required domains — Insider Risk Management:
Ingest and normalize Microsoft Purview Insider Risk Management audit records.
Management Activity API record types:
- 306 InsiderRiskManagement        (individual risk activity events — exfiltration signals, sequence triggers, policy matches)
- 307 InsiderRiskManagementAlert   (alert-level records — AlertId, PolicyName, Severity, AlertStatus)
- 308 InsiderRiskManagementCase    (case-level records — CaseId, CaseName, CaseStatus, Severity)

Normalize into purview_insider_risk_fact:
- irm_case_id, irm_case_name, irm_alert_id
- irm_policy_id, irm_policy_name, irm_policy_template
- irm_risk_score, irm_risk_level
- irm_activity_type, irm_activity_time, irm_triggering_event
- irm_user_upn, irm_user_department, irm_user_role (privacy-controlled)
- irm_scoped_user_status
- irm_case_status, irm_alert_status
- irm_assigned_to, irm_review_status
- irm_resolution, irm_false_positive
- irm_notes_present, irm_privacy_redaction_state
- irm_escalated_to_investigation
- irm_created_time, irm_updated_time

IRM privacy requirement:
Apply Splunk RBAC before enabling user-level IRM fields in any dashboard.
Executive and engineering dashboards show aggregate counts, trends, risk bands, and status only.
User-identifiable IRM details are restricted to approved investigation roles.
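The privacy rule can be illustrated as a filter applied before a record reaches any non-investigation audience. This sketch is not a substitute for Splunk RBAC; the role name `irm_investigator` and the value written to `irm_privacy_redaction_state` are hypothetical, and the restricted fields are the ones marked privacy-controlled in the fact model above.

```python
USER_LEVEL_FIELDS = ("irm_user_upn", "irm_user_department", "irm_user_role")
APPROVED_ROLES = {"irm_investigator"}  # hypothetical role name

def redact_for_role(record, role):
    """Return a copy of the record with user-level fields removed unless approved."""
    if role in APPROVED_ROLES:
        return dict(record)
    redacted = {k: v for k, v in record.items() if k not in USER_LEVEL_FIELDS}
    redacted["irm_privacy_redaction_state"] = "redacted"
    return redacted

# Illustrative record only.
record = {
    "irm_alert_id": "alert-1",
    "irm_risk_level": "high",
    "irm_user_upn": "user@example.com",
    "irm_user_department": "Finance",
    "irm_user_role": "Analyst",
}
print(redact_for_role(record, "executive"))
```

Defaulting to redaction for any unrecognized role matches the requirement that reporting is privacy-safe by default.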

Updated purview_control_fact unified model must include:
- DLP detection
- Sensitivity labeling
- Retention and lifecycle activity
- Records management activity
- Disposition review
- Insider Risk alerts and cases
- Defender incidents and alerts
- Workflow state and ticket linkage
- KPI maturity tracking

Acceptance criteria:
- Splunk receives UAL/O365 DLP data.
- Splunk receives Defender XDR incidents and alerts.
- Retention/lifecycle audit events are discoverable in Splunk via idx_ms_o365_retention.
- Retention label and disposition activities are parsed where present in UAL.
- Retention & Lifecycle dashboard (Dashboard 6) exists.
- Insider Risk record types 306, 307, and 308 are searched and validated via idx_ms_purview_insider_risk.
- Insider Risk Management dashboard (Dashboard 7) exists.
- Insider Risk reporting is privacy-safe by default; RBAC applied.
- Retention and Insider Risk KPIs are tagged Blank, Partial, or Live.
- At least one dashboard exists for each audience: Engineering, Investigations, Executive, Audit, Records Management.
- Every KPI is tagged Blank, Partial, or Live.
- Every KPI has a source system, owner, refresh cadence, and known limitation.
- Dashboards distinguish Health from Effectiveness.
- Executives can view risk posture in under five minutes.
- Auditors can trace each report back to source telemetry and control objective.
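Several of these criteria are mechanical and can be checked against the KPI register before sign-off. The sketch below covers only the two per-KPI criteria (a valid maturity tag, plus source system, owner, refresh cadence, and known limitation); the dictionary keys are hypothetical column names for whatever register the team maintains.

```python
REQUIRED_FIELDS = ("source_system", "owner", "refresh_cadence", "known_limitation")
MATURITY_STATES = {"Blank", "Partial", "Live"}

def kpi_gaps(kpi):
    """List the acceptance gaps for one KPI register entry."""
    gaps = [field for field in REQUIRED_FIELDS if not kpi.get(field)]
    if kpi.get("maturity_status") not in MATURITY_STATES:
        gaps.append("maturity_status")
    return gaps

# Illustrative register entries only.
register = [
    {
        "kpi_name": "dlp_block_rate",
        "source_system": "DLP.All",
        "owner": "security-eng",
        "refresh_cadence": "daily",
        "known_limitation": "endpoint events not yet ingested",
        "maturity_status": "Partial",
    },
    {"kpi_name": "irm_open_cases", "owner": "investigations", "maturity_status": "TBD"},
]

for kpi in register:
    print(kpi["kpi_name"], kpi_gaps(kpi))
```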

References

Splunk Reporting Architecture

CNC Data Security Platform — Artifact 2: KPI Reporting

Model: Dual-ingestion — UAL (audit evidence) + Defender XDR (investigation) + Advanced Hunting (enrichment) → Splunk normalization → KPI marts → 7 dashboard tiers

Add-ons: Splunk Add-on for Microsoft Office 365 · Splunk Add-on for Microsoft Security

Raw indexes: idx_ms_o365_audit · idx_ms_o365_dlp · idx_ms_o365_retention · idx_ms_purview_insider_risk · idx_ms_defender_incidents · idx_ms_defender_alerts · idx_ms_defender_hunting · idx_ms_service_health

Fact families: purview_dlp_fact · purview_label_fact · purview_retention_lifecycle_fact · purview_insider_risk_fact · defender_incident_fact · defender_alert_fact · purview_control_fact (unified)

KPI marts: kpi_purview_health_daily · kpi_dlp_effectiveness_daily · kpi_investigation_operations_daily · kpi_exec_control_score_monthly · kpi_audit_evidence_monthly · kpi_retention_lifecycle_daily · kpi_insider_risk_daily

Dashboards: 1 Pipeline Health (Eng) · 2 DLP Effectiveness (Eng/Security/Compliance) · 3 Investigation Ops (SOC) · 4 Executive Scorecard (Leadership) · 5 Audit Evidence (Audit) · 6 Retention & Lifecycle (Records Mgmt) · 7 Insider Risk (Investigations — privacy-controlled)

Maturity states: Blank (no data) · Partial (incomplete) · Live (connected, normalized, report-ready)