The Core Problem
Your CFO asks why the AI denied a loan. The chain-of-thought log says: "Denied due to insufficient credit history." But Anthropic research shows this might be a post-hoc rationalization, not the actual reason.
The epistemological problem: You cannot audit what the model "thinks." You can only audit what it did—inputs, outputs, retrievals, tool calls. Trust Architecture shifts from introspection to externalized observability.
What Regulations Actually Require
| Regulation | Key Requirement | Technical Implication |
|---|---|---|
| EU AI Act Art. 12 | Automatic event logging over system lifetime | 100% request/response capture with timestamps, KB versions |
| EU AI Act Art. 13 | Log interpretation mechanisms | Compliance dashboards, not raw files |
| EU AI Act | 10-year retention | WORM cold storage architecture |
| GDPR Art. 22 | No "solely automated" decisions with legal effects | Human-in-loop with actual authority to override |
| GDPR Art. 5(1)(e) | Data kept no longer than necessary | Conflicts with 10-year retention → dual-store pattern |
| NYC Law 144 | Annual bias audit, four-fifths rule | Impact ratio < 0.8 = disparate impact flag |
| SEC (proposed) | "Eliminate or neutralize" AI conflicts | Audit trails proving unbiased recommendations |
GDPR "Right to Explanation": Since neural network logic is opaque, shift to counterfactual explanations: "Denied because debt-to-income was X; if Y, would have been approved."
Why Self-Explanation Fails
Chain-of-thought is unfaithful. When models were given biasing cues, they adopted biased answers but invented logical justifications that omitted the bias entirely.
Attention isn't explanation. Jain & Wallace (2019) showed that attention weights don't reliably correlate with feature importance: different attention distributions can yield identical predictions.
What works instead:
- Counterfactual testing: Same input, varied demographics → observe output differences directly (see the sketch after this list)
- Externalized reasoning: Force models to write executable code (auditable)
- Traceability: "Input A → Output B using KB Version C" beats hallucinated explanations
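To make the first bullet concrete, here is a minimal sketch of counterfactual bias testing wired to the four-fifths rule from NYC Law 144. `score_applicant` is a toy stand-in for whatever model is under audit, and the group names and incomes are synthetic.

```python
from collections import defaultdict

def score_applicant(profile: dict) -> bool:
    """Stand-in for the model under audit; returns True when the applicant is selected.
    Replace with a real call to the scoring model or LLM pipeline being tested."""
    # Toy rule that (deliberately) favors group "A" so the flag fires below.
    threshold = 45_000 if profile["group"] == "A" else 80_000
    return profile["income"] >= threshold

def impact_ratios(profiles: list[dict]) -> dict[str, float]:
    """Selection rate per group divided by the highest group's rate.
    A ratio below 0.8 flags potential disparate impact (four-fifths rule)."""
    selected, total = defaultdict(int), defaultdict(int)
    for p in profiles:
        total[p["group"]] += 1
        selected[p["group"]] += score_applicant(p)
    rates = {g: selected[g] / total[g] for g in total}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}

# Counterfactual pairs: identical applications, only the demographic attribute varies.
base = [{"income": 30_000 + 500 * i} for i in range(200)]
profiles = [{**b, "group": g} for b in base for g in ("A", "B")]

for group, ratio in impact_ratios(profiles).items():
    flag = "disparate impact flag" if ratio < 0.8 else "ok"
    print(f"group {group}: impact ratio {ratio:.2f} ({flag})")
```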
The Standards Stack
| Layer | Standard | Covers |
|---|---|---|
| Security Baseline | SOC 2 | Processing integrity, confidentiality, change management |
| AI Governance | ISO 42001 | 38 Annex A controls: data provenance, impact assessment, lifecycle |
| Risk Framework | NIST AI RMF | Map → Measure → Manage → Govern |
Integration: NIST to identify risks, ISO 42001 to structure management, SOC 2 to prove controls.
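One way to make that integration concrete is a control map kept in code or config, linking each identified risk to the ISO 42001 control area that manages it and the SOC 2 evidence that proves it. The entries below are illustrative, not an authoritative crosswalk between the standards.

```python
# Illustrative mapping: NIST AI RMF risk -> ISO/IEC 42001 control area -> SOC 2 evidence.
CONTROL_MAP = [
    {
        "risk": "Training/RAG data of unknown provenance",       # surfaced in NIST Map
        "iso42001_control": "Data provenance management",
        "soc2_evidence": "Change-managed data pipeline with reviewed approvals",
    },
    {
        "risk": "Harmful or non-compliant model outputs",        # surfaced in NIST Measure
        "iso42001_control": "AI system impact assessment",
        "soc2_evidence": "Processing-integrity monitoring reports and incident tickets",
    },
]
```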
The Auditable Box Pattern
AI Gateway Architecture
Application Layer → AI Gateway (Compliance Enforcement: logging, PII redaction, policy enforcement) → Model Providers
Gateway functions: 100% logging with Article 12 metadata, PII redaction before prompts leave enclave, policy enforcement (jailbreaks, permissions), rate limiting.
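A minimal sketch of that request path, assuming a generic `call_model` provider client and regex-based redaction; the audit-record fields follow the Article 12 metadata above but are not a prescribed schema.

```python
import json
import re
import uuid
from datetime import datetime, timezone

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
BLOCKED = ("ignore previous instructions",)  # toy jailbreak policy

def redact(text: str) -> str:
    """Strip obvious PII before the prompt leaves the enclave (illustrative patterns only)."""
    return SSN_RE.sub("[SSN]", EMAIL_RE.sub("[EMAIL]", text))

def call_model(prompt: str) -> str:
    """Stand-in for the upstream model provider call."""
    return f"(model response to: {prompt[:40]}...)"

def gateway(user_id: str, prompt: str, kb_version: str, model: str = "provider/model-x") -> str:
    if any(b in prompt.lower() for b in BLOCKED):
        raise PermissionError("policy violation: prompt blocked")
    clean = redact(prompt)
    response = call_model(clean)
    audit_record = {                      # 100% capture with Article 12-style metadata
        "request_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "model": model,
        "kb_version": kb_version,
        "prompt_redacted": clean,
        "response": response,
    }
    print(json.dumps(audit_record))      # ship to an append-only log store in practice
    return response

gateway("u-17", "Summarize the credit policy for jane@example.com", kb_version="kb-2024-06-01")
```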
Dual-Store Architecture
Resolves 10-year retention vs GDPR erasure conflict:
| Tier | Content | Retention | Access |
|---|---|---|---|
| Hot | Full request/response | 30-90 days | Engineering |
| Cold Archive | Anonymized logs + metadata | 7-10 years | Compliance, Auditors |
| PII Vault | User ID ↔ Anon ID mapping | Policy-defined | DPO, Legal only |
Erasure flow: Delete Hot + PII Vault; Cold Archive stays intact (no longer personal data).
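A sketch of that erasure flow, with in-memory dicts standing in for the three tiers; the store names and `erase_user` helper are illustrative.

```python
# In-memory stand-ins for the three tiers; in production these are separate systems
# (hot log store, WORM cold archive, access-controlled PII vault).
hot_store = {"req-1": {"user_id": "u-17", "prompt": "...", "response": "..."}}
cold_archive = {"req-1": {"anon_id": "anon-9f3", "metadata": {"kb_version": "kb-2024-06-01"}}}
pii_vault = {"u-17": "anon-9f3"}  # the only link between a person and the archive

def erase_user(user_id: str) -> None:
    """GDPR erasure: drop hot logs and the identity mapping.
    The cold archive keeps anonymized records, which stop being personal data
    once the vault mapping is gone."""
    for req_id, record in list(hot_store.items()):
        if record["user_id"] == user_id:
            del hot_store[req_id]
    pii_vault.pop(user_id, None)

erase_user("u-17")
assert not hot_store and "u-17" not in pii_vault
assert "req-1" in cold_archive  # long-term retention preserved, now unlinkable
```

Because the vault held the only link between the user and the anonymized ID, deleting it severs the archive from the person without touching the WORM tier.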
Platform Comparison
Enterprise LLM Observability
| Feature | LangSmith | Langfuse | Helicone | Arize |
|---|---|---|---|---|
| SOC 2 Type II | Yes | Yes | Yes | Yes |
| HIPAA/BAA | BAA offered | Self-host | Self-host | |
| Self-Hosting | | Yes | Yes | |
Self-hosted options (Langfuse, Helicone) keep logs in your VPC. See Agent Observability for implementation patterns.
Sector Patterns
Healthcare (HIPAA): Zero retention pattern—process in volatile memory, push to EHR, retain nothing. Abridge exemplifies this. BAAs required (LangSmith, OpenAI Enterprise, Claude Enterprise offer them).
Legal/Finance: Harvey uses SOC 2 + ISO 27001, no-training guarantee, regional data sovereignty. Permission-aware RAG filters documents by AD permissions before LLM sees them.
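A sketch of the permission-aware RAG step, with a placeholder `Document` shape and a hard-coded group set standing in for the AD/IdP lookup.

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]   # mirrored from the source system's ACLs

def permitted(docs: list[Document], user_groups: set[str]) -> list[Document]:
    """Drop any retrieved document the user could not open in the source system,
    *before* the prompt is assembled, so the LLM never sees it."""
    return [d for d in docs if d.allowed_groups & user_groups]

retrieved = [
    Document("d1", "Engagement letter terms...", frozenset({"legal-all"})),
    Document("d2", "M&A deal room memo...", frozenset({"deal-team-7"})),
]
context = permitted(retrieved, user_groups={"legal-all"})
prompt = "Answer using only this context:\n" + "\n---\n".join(d.text for d in context)
print(prompt)  # contains d1 only; d2 is filtered out by ACL
```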
Enterprise Providers: Claude Enterprise (500K context, SSO, audit logs, no-training commitment) and ChatGPT Enterprise (Compliance API for audit export, data isolation) both designed for regulated use.
The Bottom Line
The pattern: Log everything, rely on traceability over introspection, implement dual-store to resolve regulatory conflicts, design with the auditor as primary user.
The agents that survive regulated industries will prove, years later, exactly what they did and why.
Related Reading
Agent Observability: Monitoring AI Systems in Production
Evaluation ends at deployment. Observability begins. Distributed tracing, guardrails, and the monitoring stack that keeps production agents reliable.
The Agent Safety Stack: Defense-in-Depth for Autonomous AI
Agents that take actions have different risk profiles than chatbots. Here is the defense-in-depth architecture: prompt injection defense, red teaming, kill switches, and guardrail benchmarks.
The HITL Firewall: How Human Oversight Doubles Your AI ROI
Full autonomy is a myth for high-stakes tasks. Smart thresholds with human review deliver 85% cost reduction at 98% accuracy. Here are the approval patterns that work.
Harvey: The $8B Legal AI That BigLaw Actually Trusts
How Harvey became the category-defining legal AI by solving what ChatGPT couldn't: data privacy through the Vault, 0.2% hallucination rate through citation-backed generation, and workflow integration at 4,000-lawyer firms. The definitive case for vertical AI.