Compliance blueprint for GPU inference infrastructure. Cryptographic witness anchors for Triton, Dynamo, and NIM. Zero added inference latency.
Who this is for: Infrastructure architects, compliance officers, and MLOps teams operating GPU clusters for AI training and inference. Applicable to any organization running dedicated GPU infrastructure that serves regulated markets (EU, US federal, financial services, healthcare, defense).
Context: GPU clusters purpose-built for AI training and inference are becoming the defining infrastructure of the decade. They produce predictions the way power plants produce electricity. And like power plants, they need metering. The EU AI Act (most obligations apply from August 2, 2026) requires high-risk AI systems to log events automatically (Art. 12). Today, no standard compliance layer exists for GPU inference infrastructure.
An AI factory is a large-scale GPU cluster purpose-built for training and serving AI models. Unlike general-purpose cloud compute, AI factories are optimized for one thing: turning data into predictions at scale.
The core components of a modern AI factory:
At scale, a single AI factory can serve millions of inference requests per day across dozens of models. Sovereign AI deployments (national GPU clusters in France, Japan, India, Singapore, Saudi Arabia) are multiplying this pattern globally.
AI factories are subject to a growing web of regulations that demand evidence of responsible AI operation. The common thread: regulators want to know what the AI did, when, why, and whether it was operating within bounds.
| Regulation | Key Requirement | AI Factory Impact |
|---|---|---|
| EU AI Act Art. 12 | Automatic logging of AI system behavior | Every inference must produce a traceable record |
| EU AI Act Art. 9 | Risk management system | Guardrail decisions must be documented |
| EU AI Act Art. 11 | Technical documentation | Model identity, weights, and versions must be tracked |
| EU AI Act Art. 15 | Accuracy, robustness, cybersecurity | Drift detection and adversarial testing evidence required |
| EU AI Act Art. 50 | Transparency for AI-generated content | Content provenance marking at the inference layer |
| EO 14110 Sec. 4.2 | Dual-use foundation model reporting | Training compute and model capability documentation |
| NIST AI RMF | MAP / MEASURE / MANAGE / GOVERN | Continuous monitoring of AI system behavior |
| CMMC Level 2+ | CUI protection, access control, audit | Classified inference requires hardware attestation |
| SR 11-7 | Model risk management | Drift detection, baseline comparison, validation evidence |
| Colorado AI Act | Algorithmic impact assessment | Automated decision documentation (effective Jan 2027) |
Enforcement timeline: Most EU AI Act obligations apply from August 2, 2026. The Colorado AI Act takes effect January 1, 2027. CMMC 2.0 rulemaking is final. SR 11-7 examinations are ongoing. The compliance window is closing.
AI factories have world-class infrastructure for running models. They have no standard infrastructure for proving what those models did.
You would never run a data center without audit logs. AI factories are the new data centers, and most of them have no compliance audit trail.
SWT3 is a cryptographic witnessing protocol that sits alongside the inference pipeline and produces compliance evidence without interfering with model serving.
What SWT3 is:
What SWT3 is not:
For every inference event, SWT3 produces a Witness Anchor: a compact, cryptographically signed record containing:
The anchor fingerprint formula is locked and produces identical results across Python, TypeScript, Rust, C#, and Ruby SDKs. Raw prompts and responses never leave the inference host. Only hashes are transmitted.
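The hash-only design described above can be illustrated with a minimal sketch. The locked SWT3 fingerprint formula is not reproduced in this document, so `mint_anchor`, its field names, and the canonical-JSON digest below are assumptions chosen for illustration only; signing is omitted.

```python
import hashlib
import json


def sha256_hex(text: str) -> str:
    """Hash text locally so only the digest needs to leave the host."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def mint_anchor(prompt: str, response: str, model_id: str, timestamp_ms: int) -> dict:
    """Build a hypothetical anchor record (not the real SWT3 formula)."""
    record = {
        "model_id": model_id,
        "prompt_hash": sha256_hex(prompt),
        "response_hash": sha256_hex(response),
        "timestamp_ms": timestamp_ms,
    }
    # Canonical JSON (sorted keys, fixed separators) keeps the serialized
    # bytes identical no matter which language produced the record.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["fingerprint"] = sha256_hex(canonical)
    return record


anchor = mint_anchor("What is 2+2?", "4", "demo-model-v1", 1_700_000_000_000)
print(sorted(anchor))  # field names only; no raw prompt or response is stored
```

Fixing the serialization before hashing is one plausible way to get identical fingerprints across SDKs in different languages, since each implementation hashes the same byte string.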
SWT3 runs parallel to the inference path. It observes completion events and mints anchors in background threads. The inference pipeline is never blocked.
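The off-path pattern can be sketched with a queue and a daemon thread. This is an illustration of the general technique, not the SDK's internals: the `witness` decorator, the `minted` list standing in for the ledger, and the event shape are all hypothetical.

```python
import functools
import queue
import threading

_anchor_queue: "queue.Queue" = queue.Queue()
minted = []  # stands in for the compliance ledger in this sketch


def _worker() -> None:
    """Drain completion events and mint anchors off the request path."""
    while True:
        name, n_results = _anchor_queue.get()
        try:
            minted.append({"fn": name, "n_results": n_results})
        finally:
            _anchor_queue.task_done()


threading.Thread(target=_worker, daemon=True).start()


def witness(fn):
    """Hypothetical decorator: run the call, then enqueue a completion event."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        # put_nowait does not block on an unbounded queue, so the caller
        # only pays for the enqueue, not for minting.
        _anchor_queue.put_nowait((fn.__name__, len(result)))
        return result
    return wrapper


@witness
def execute(requests):
    return [r.upper() for r in requests]


out = execute(["a", "b"])
_anchor_queue.join()  # demo only: wait for the background worker to catch up
print(out, minted)
```

In a real server the `join()` call would be dropped; it is here only so the demo can inspect `minted` deterministically.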
Three integration points for AI factories:
Decorate any Triton Python backend with @witness_execute(). The decorator intercepts the execute() method, hashes request/response tensors, and mints an anchor after each batch completes.
from swt3_ai.adapters.triton import witness_execute

class TritonPythonModel:
    @witness_execute()
    def execute(self, requests):
        # your inference logic unchanged
        return responses
Wrap streaming inference endpoints with @witness_endpoint(). Chunks pass through untouched in real-time. The anchor mints after stream completion from accumulated data.
from swt3_ai.adapters.dynamo import witness_endpoint

@witness_endpoint()
@dynamo_endpoint()
async def generate(self, request):
    async for chunk in self.backend.generate(request):
        yield chunk
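To make the pass-through behavior concrete, here is a minimal sketch of a streaming wrapper written with the standard library only. `witness_stream` is a hypothetical stand-in, not the real `witness_endpoint`; it hashes chunks incrementally (equivalent to hashing the accumulated stream) and records an anchor only once the stream is exhausted.

```python
import asyncio
import functools
import hashlib

anchors = []  # stands in for the compliance ledger in this sketch


def witness_stream(fn):
    """Hypothetical streaming wrapper: forward chunks, anchor at the end."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        digest = hashlib.sha256()
        count = 0
        async for chunk in fn(*args, **kwargs):
            digest.update(chunk.encode("utf-8"))
            count += 1
            yield chunk  # forwarded to the caller unchanged
        # Reached only after the underlying stream is exhausted.
        anchors.append({"response_hash": digest.hexdigest(), "chunks": count})
    return wrapper


@witness_stream
async def generate(prompt):
    for token in ("Hello", ", ", "world"):
        yield token


async def main():
    return [chunk async for chunk in generate("hi")]


received = asyncio.run(main())
print(received, anchors)
```

Note that in this sketch a consumer that abandons the stream early never triggers the anchor; a production wrapper would need a policy for partial streams.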
For agent runtimes that produce OCSF v1.7.0 events, the OpenShell adapter consumes structured logs and mints anchors from five event classes: network activity, process activity, file activity, configuration changes, and security findings.
from swt3_ai.adapters.openshell import OpenShellWitness
observer = OpenShellWitness()
observer.process_event(ocsf_event) # mints anchor from event
Each AI factory compliance concern maps to one or more SWT3 procedures from the Unified Compliance Taxonomy.
| AI Factory Concern | Procedure | What It Witnesses | Regulation |
|---|---|---|---|
| Inference provenance | AI-INF.1 | Prompt hash, response hash, model ID, timestamp | Art. 12 |
| Model identity | AI-MDL.1 | Model weight file integrity (SHA-256) | Art. 11 |
| Adapter stack | AI-MDL.6 | LoRA/QLoRA adapter versions and hashes | Art. 11 |
| Quantization | AI-MDL.7 | Quantization method, bit depth, calibration | Art. 15 |
| Hardware attestation | AI-HW.1 | GPU inventory, health, topology | SI-7, CMMC |
| Hardware root of trust | AI-HW.3 | TPM 2.0 / Pluton PCR register state | CMMC |
| Guardrail decisions | AI-GRD.1 | Policy check result, rule version | Art. 9 |
| Agent tool calls | AI-TOOL.1 | Tool name, input hash, output hash | Art. 14 |
| Access control | AI-ACC.1 | Credential routing, authorization gate | AC-2 |
| Model drift | AI-DRIFT.1 | Statistical divergence from baseline | Art. 15, SR 11-7 |
| Adversarial testing | AI-REDTEAM.1 | Red team scope, findings, methodology | Art. 9(8) |
| Supply chain | AI-SBOM.1 | Component manifest, dependency hashes | EO 14028 |
| Content provenance | AI-MARK.1 | AI-generated content marking | Art. 50 |
| Audit trail integrity | AI-AUDIT.1 | Tamper-evident log with Merkle rollup | Art. 12 |
| Data governance | AI-DATA.1 | Input classification, data source | Art. 10 |
AI-INF.1 (inference provenance): every inference request produces a witness anchor binding the prompt hash, response hash, model identifier, and millisecond-precision timestamp into a single cryptographic fingerprint. This is the foundational procedure for Article 12 automatic logging compliance.
Request the tenant's daily Merkle root for any date range. Each root covers all inference anchors for that day. Verify any individual anchor's inclusion via the proof API.
AI-HW.1 (hardware attestation) witnesses the GPU/accelerator inventory, health status, memory topology, and interconnect configuration of the inference host. For AI factories, this provides the link between a compliance record and the physical hardware that produced it.
Cross-reference AI-HW.1 anchors with AI-INF.1 anchors to trace any inference back to the specific hardware that served it. The hardware attestation anchor includes GPU UUIDs.
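The cross-reference can be sketched as a lookup for the most recent hardware anchor on the same host at or before the inference time. The anchor schema is not given in this document, so `host_id`, `timestamp_ms`, and `gpu_uuids` are assumed field names, and the sample records are invented for the demo.

```python
from typing import Optional

# Invented sample AI-HW.1 anchors; field names are assumptions.
hardware_anchors = [
    {"host_id": "node-a", "timestamp_ms": 1000, "gpu_uuids": ["GPU-1111", "GPU-2222"]},
    {"host_id": "node-a", "timestamp_ms": 5000, "gpu_uuids": ["GPU-1111", "GPU-3333"]},
    {"host_id": "node-b", "timestamp_ms": 2000, "gpu_uuids": ["GPU-4444"]},
]


def hardware_for(inference: dict, hw_anchors: list) -> Optional[dict]:
    """Return the latest hardware anchor for the host that is not newer
    than the inference anchor, or None if no attestation covers it."""
    candidates = [
        hw for hw in hw_anchors
        if hw["host_id"] == inference["host_id"]
        and hw["timestamp_ms"] <= inference["timestamp_ms"]
    ]
    return max(candidates, key=lambda hw: hw["timestamp_ms"], default=None)


inference_anchor = {"host_id": "node-a", "timestamp_ms": 3000}
match = hardware_for(inference_anchor, hardware_anchors)
print(match["gpu_uuids"] if match else "no attestation on record")
```

An inference with no preceding hardware anchor returns `None`, which an auditor would likely treat as a gap in the evidence chain rather than silently falling back to a later attestation.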
AI-DRIFT.1 (model drift) witnesses statistical divergence metrics comparing current model behavior against a registered baseline. For AI factories serving multiple model versions, drift detection provides continuous evidence that models are operating within validated bounds.
For SR 11-7 examinations, drift anchors provide the continuous monitoring evidence that model risk management requires. Look for the baseline_id and divergence_score in the anchor factors.
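The document does not say which divergence metric SWT3 uses. As one common choice in model risk management, the sketch below computes a Population Stability Index (PSI) over binned output distributions and places it in a factors dictionary; the metric, the binning, and the `baseline-demo` identifier are assumptions.

```python
import math


def population_stability_index(baseline, current, eps=1e-6):
    """PSI between two binned distributions given as probability lists."""
    if len(baseline) != len(current):
        raise ValueError("distributions must use the same bins")
    psi = 0.0
    for b, c in zip(baseline, current):
        # Clamp empty bins so the logarithm stays defined.
        b, c = max(b, eps), max(c, eps)
        psi += (c - b) * math.log(c / b)
    return psi


baseline = [0.25, 0.25, 0.25, 0.25]  # registered baseline distribution
current = [0.10, 0.20, 0.30, 0.40]   # distribution observed in production

factors = {
    "baseline_id": "baseline-demo",  # hypothetical identifier
    "divergence_score": round(population_stability_index(baseline, current), 4),
}
print(factors)
```

Identical distributions score exactly zero, and the score grows as probability mass shifts between bins, which is what makes it usable as a threshold-based drift signal.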
AI-AUDIT.1 (audit trail integrity) witnesses the integrity of the compliance audit trail itself. Daily Merkle rollups bind all anchors for a tenant into a single root hash. Any tampering with historical records invalidates the Merkle proof chain.
The Merkle root is the single artifact that proves the entire day's compliance record is intact. Request it via the /api/v1/merkle endpoint.
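How a daily root can prove both integrity and inclusion is easiest to see in a small sketch. The tree construction SWT3 actually uses is not specified here; the version below assumes SHA-256 leaves and duplicates the last node on odd levels, and the helper names are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level):
    if len(level) % 2:
        level = level + [level[-1]]  # duplicate the last node on odd levels
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Fold hashed leaves pairwise until a single root remains."""
    if not leaves:
        raise ValueError("at least one anchor is required")
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    level = [_h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_inclusion(leaf, proof, root):
    """Recompute the path to the root using only the leaf and its proof."""
    node = _h(leaf)
    for sibling, is_left in proof:
        node = _h(sibling + node) if is_left else _h(node + sibling)
    return node == root


day_anchors = [f"anchor-{i}".encode() for i in range(5)]
root = merkle_root(day_anchors)
proof = merkle_proof(day_anchors, 3)
print(verify_inclusion(b"anchor-3", proof, root))
print(verify_inclusion(b"anchor-3-tampered", proof, root))
```

The verifier needs only the anchor, a proof whose length is logarithmic in the number of anchors, and the published root, so an auditor can check one record without receiving the rest of the day's data.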
SWT3 integrates with AI factories at two levels, depending on the deployment model and compliance requirements.
Install the Python SDK and add a decorator to the inference backend. Best for single-model deployments or teams with direct access to the inference code.
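pip install swt3-ai
# Set connection string
export SWT3_DSN=https://axm_live_xxx@sovereign.tenova.io/MY_ENCLAVE

# Add one decorator to the Triton backend
from swt3_ai.adapters.triton import witness_execute

class TritonPythonModel:
    @witness_execute()
    def execute(self, requests):
        # your inference logic unchanged
        return responses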
Best for: Single model, direct code access, fastest integration path.
Deploy a separate container alongside the inference server that consumes completion events from the Triton metrics endpoint or Dynamo event stream. No changes to the inference code.
Best for: Multi-model Kubernetes clusters, existing CI/CD pipelines, teams that cannot modify model code.
Both deployment options use the same configuration pattern:
# Option A: Connection string (recommended)
SWT3_DSN=https://axm_live_xxx@sovereign.tenova.io/MY_ENCLAVE
# Option B: Individual environment variables
SWT3_ENDPOINT=https://sovereign.tenova.io
SWT3_API_KEY=axm_live_xxx
SWT3_TENANT_ID=MY_ENCLAVE
# Optional: Set clearing level (0=Analytics, 1=Standard, 2=Sensitive, 3=Classified)
SWT3_CLEARING_LEVEL=1
If no configuration is set, all SWT3 adapters operate as transparent no-ops. You can instrument your code today and activate witnessing when ready.
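The two configuration options carry the same three values, which the following sketch makes explicit by parsing the connection string with the standard library. `load_config` is a hypothetical helper, not an SDK function, and returning `None` to represent the unconfigured no-op state is an assumption about how that state might be modeled.

```python
import os
from typing import Optional
from urllib.parse import urlsplit


def load_config(env=os.environ) -> Optional[dict]:
    """Resolve settings from SWT3_DSN, else from the individual variables."""
    dsn = env.get("SWT3_DSN")
    if dsn:
        parts = urlsplit(dsn)
        host = parts.hostname or ""
        if parts.port:
            host = f"{host}:{parts.port}"
        return {
            "endpoint": f"{parts.scheme}://{host}",
            "api_key": parts.username,           # the user-info part of the URL
            "tenant_id": parts.path.strip("/"),  # the path names the tenant
        }
    keys = ("SWT3_ENDPOINT", "SWT3_API_KEY", "SWT3_TENANT_ID")
    if all(env.get(key) for key in keys):
        return {
            "endpoint": env["SWT3_ENDPOINT"],
            "api_key": env["SWT3_API_KEY"],
            "tenant_id": env["SWT3_TENANT_ID"],
        }
    return None  # nothing configured: adapters would behave as no-ops


from_dsn = load_config({"SWT3_DSN": "https://axm_live_xxx@sovereign.tenova.io/MY_ENCLAVE"})
from_vars = load_config({
    "SWT3_ENDPOINT": "https://sovereign.tenova.io",
    "SWT3_API_KEY": "axm_live_xxx",
    "SWT3_TENANT_ID": "MY_ENCLAVE",
})
print(from_dsn == from_vars)
```

Passing the environment as a parameter keeps the helper easy to test without mutating `os.environ`.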
SWT3 ships 14 industry compliance profiles that pre-configure the required procedures, clearing levels, and signing tiers for specific regulatory environments. Four profiles are directly relevant to AI factory operators:
| Profile | Framework | Clearing Level | Key Procedures |
|---|---|---|---|
| defense-govcon | CMMC + NIST 800-171 | 3 (Classified) | AI-HW.1, AI-HW.3, AI-INF.1, AI-ACC.1, AI-SBOM.1 |
| fintech-model-risk | SR 11-7 | 2 (Sensitive) | AI-DRIFT.1, AI-BASE.1, AI-INF.1, AI-AUDIT.1 |
| healthcare-clinical | HIPAA | 2 (Sensitive) | AI-CONSENT.1, AI-DATA.1, AI-INF.1, AI-EXPL.1 |
| autonomous-systems | Safety-critical | 3 (Classified) | AI-SAFE.1, AI-ROBUST.1, AI-HW.1, AI-CHAIN.1 |
Profiles are YAML files that can be loaded via swt3 init --profile defense-govcon or by setting SWT3_PROFILE in the environment.
Three steps to compliance-metered inference:
# Python
pip install swt3-ai
# TypeScript
npm install @tenova/swt3-ai
# Triton backend (one decorator)
from swt3_ai.adapters.triton import witness_execute

class TritonPythonModel:
    @witness_execute()
    def execute(self, requests):
        return responses

# Or Dynamo endpoint (one decorator)
from swt3_ai.adapters.dynamo import witness_endpoint

@witness_endpoint()
async def generate(self, request):
    async for chunk in self.backend.generate(request):
        yield chunk
# Run the zero-config demo to see anchors locally
python -m swt3_ai.demo
# Verify an anchor fingerprint
# https://sovereign.tenova.io/verify
For connected mode (cloud ledger), set the SWT3_DSN environment variable and anchors will stream to the compliance ledger automatically.