Who this is for: Infrastructure architects, MLOps teams, and compliance officers at organizations that run AI inference across multiple accelerator vendors. Applicable to any enterprise routing workloads across NVIDIA GPU, Google TPU, AMD MI, AWS Trainium, or Intel Gaudi infrastructure.
Why this matters now: The AI compute market is fragmenting. Enterprises increasingly route inference across competing silicon to optimize cost and latency. When workloads cross vendor boundaries, no single vendor's monitoring tools can produce a unified compliance trail. Auditors need vendor-neutral evidence that spans the full silicon stack.
Contents
1. The Cross-Silicon Compliance Gap
2. Multi-Vendor Silicon Landscape
3. How SWT3 Hardware Discovery Works
4. Regulatory Mapping
5. Multi-Silicon Profile Quick Start
6. Cross-Silicon Drift Detection
7. Factor Encoding Reference
8. References

1. The Cross-Silicon Compliance Gap
When an enterprise runs its fraud detection model on an NVIDIA H100 cluster in Virginia and its customer service model on Google TPUs in a third-party data center in New York, it faces a question no single vendor can answer: "Can you prove, to a regulator, that both deployments maintained continuous compliance?"
NVIDIA's tooling monitors NVIDIA clusters. Google's tooling monitors Google TPU clusters. Neither can attest what happened on the other's silicon. The enterprise is left assembling compliance evidence from two incompatible telemetry stacks, two different log formats, and two vendor consoles that require separate authentication.
This gap widens as silicon diversity increases. AMD MI300X offers compelling price-performance for inference. AWS Trainium is cost-optimized for training. Intel Gaudi targets inference throughput. Organizations that optimize across these architectures gain economic advantage but lose compliance coherence.
SWT3 solves this by operating at the application layer, above the silicon. The fingerprint formula produces identical cryptographic attestations regardless of which accelerator processes the inference. An auditor verifying an anchor cannot tell from the fingerprint whether it was minted on an NVIDIA GPU, a Google TPU, or an AMD MI300X. The ai_context records which silicon was present, but the cryptographic proof is vendor-neutral by design.
2. Multi-Vendor Silicon Landscape
| Vendor | Accelerator | Discovery Method | SWT3 Detection |
|---|---|---|---|
| NVIDIA | H100, H200, B200, A100, L4, T4 | pynvml / nvidia-smi | Auto |
| Google | TPU v5p, v5e, v6e, Trillium | JAX devices (Python) / TPU_NAME env (Node.js) | Auto |
| AMD | MI300X, MI325X, MI250 | rocm-smi | Auto |
| AWS | Trainium2, Trainium, Inferentia2 | neuron-ls | Auto |
| Intel | Gaudi3, Gaudi2 | hl-smi | Auto |
| Generic | Any PCI accelerator | /sys/bus/pci/devices class scan | Auto |
All discovery is automatic. The SDK tries each path in priority order and returns the first successful detection. No configuration required. No vendor-specific dependencies. If the vendor's CLI tools are present on the host, the SDK finds them. If not, it falls through to the next path gracefully.
The Fragmentation Accelerates
In a single week (June 24-25, 2026), OpenAI unveiled Jalapeno -- a custom inference ASIC built with Broadcom, targeting 50% lower cost-per-token than NVIDIA GPUs -- and IBM announced the world's first sub-1nm transistor architecture. Enterprises will increasingly run mixed fleets: NVIDIA for training, custom silicon for inference, cloud-native accelerators for burst. Each vendor has a commercial interest in their workloads appearing compliant. Independent cross-silicon attestation is the only architecture that scales without conflicts of interest.
3. How SWT3 Hardware Discovery Works
When you call witness.witness_hardware(), the SDK executes a priority-ordered discovery chain:
- NVIDIA -- Tries pynvml (structured API, optional dependency), then nvidia-smi subprocess
- Google TPU -- Tries JAX device enumeration (Python) or TPU_NAME environment variable (Node.js)
- AMD -- Tries rocm-smi subprocess with CSV output
- AWS -- Tries neuron-ls subprocess with JSON output
- Intel -- Tries hl-smi subprocess with CSV output
- PCI fallback -- Scans /sys/bus/pci/devices for accelerator class codes (0x0302, 0x1200)
The first path that returns results wins. All hardware identifiers (UUIDs, bus IDs, serial numbers) are SHA-256 hashed at discovery time. Raw values never leave the discovery module.
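The chain above can be sketched as a list of probe functions tried in order, with identifiers hashed before they are returned. This is a minimal illustration, not the SDK's implementation: `run_cli`, `probe_nvidia_smi`, `discover`, and the shape of the returned dictionary are hypothetical names chosen for this example. Only the `nvidia-smi` probe is written out; the other vendors would follow the same pattern.

```python
import hashlib
import shutil
import subprocess
from typing import Callable, Optional


def hash_identifier(raw: str) -> str:
    """Return the SHA-256 hex digest of a raw hardware identifier."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def run_cli(args: list[str], timeout: float = 5.0) -> Optional[str]:
    """Run a vendor CLI and return stdout, or None if it is absent or fails."""
    if shutil.which(args[0]) is None:
        return None  # tool not installed: fall through to the next probe
    try:
        result = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout


def probe_nvidia_smi() -> Optional[dict]:
    """Hypothetical probe: list GPU names and hash their UUIDs immediately."""
    out = run_cli(["nvidia-smi", "--query-gpu=name,uuid", "--format=csv,noheader"])
    if not out or not out.strip():
        return None
    units = []
    for line in out.strip().splitlines():
        name, _, uuid = line.partition(",")
        units.append({"name": name.strip(), "id_hash": hash_identifier(uuid.strip())})
    return {
        "silicon_vendor": "nvidia",
        "discovery_method": "nvidia-smi",
        "accelerators": units,
    }


def discover(probes: list[Callable[[], Optional[dict]]]) -> Optional[dict]:
    """Try each probe in priority order and return the first non-empty result."""
    for probe in probes:
        result = probe()
        if result:
            return result
    return None
```

A caller would pass the probes in the documented priority order, for example `discover([probe_nvidia_smi, ...])`. Because each probe returns `None` when its tool is missing, a host without a given vendor's CLI simply moves on to the next path.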
Zero Dependency, Zero Trust
Every discovery path uses subprocess calls to vendor CLI tools or reads from sysfs. No vendor SDKs are required at runtime. No network calls are made during discovery. No hardware identifiers are transmitted in cleartext. The only optional dependency is pynvml (for richer NVIDIA metadata), and it is not required.
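As an illustration of the sysfs path, the sketch below scans a PCI device directory for the two class codes named earlier (0x0302 and 0x1200). It assumes the standard Linux layout in which each device directory has a `class` file holding a 24-bit hex value such as `0x030200`, where the top 16 bits are the class and subclass. The function name and return shape are hypothetical, not the SDK's API.

```python
import hashlib
from pathlib import Path

# Class codes from the PCI fallback step: 0x0302 (3D controller)
# and 0x1200 (processing accelerator).
ACCELERATOR_CLASS_CODES = {0x0302, 0x1200}


def scan_pci_accelerators(root: str = "/sys/bus/pci/devices") -> list[dict]:
    """Return one record per PCI device whose class code marks it as an accelerator."""
    base = Path(root)
    if not base.is_dir():
        return []  # not Linux, or sysfs unavailable
    found = []
    for dev in sorted(base.iterdir()):
        try:
            raw = (dev / "class").read_text().strip()
            code = int(raw, 16)  # e.g. "0x030200" -> 0x030200
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
        class_code = code >> 8  # drop the programming-interface byte
        if class_code in ACCELERATOR_CLASS_CODES:
            found.append({
                "class_code": f"0x{class_code:04x}",
                # Hash the bus address so the raw value is never returned.
                "bus_id_hash": hashlib.sha256(dev.name.encode("utf-8")).hexdigest(),
            })
    return found
```

The `root` parameter exists only so the function can be exercised against a temporary directory; on a real host the default path would be used.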
3.1 What Gets Recorded
The witness anchor's ai_context (at clearing level 0 or 1) records:
- silicon_vendor -- which vendor's silicon was detected (nvidia, google, amd, aws, intel, mixed)
- discovery_method -- which tool produced the data (pynvml, nvidia-smi, jax, rocm-smi, neuron-ls, hl-smi, tpu-env, pci)
- topology -- cluster topology (single, multi-gpu, DGX-H100, NVL72, multi-node)
- accelerator_count -- how many accelerator units were detected
- accelerators[] -- per-unit details (name, memory, vendor, family, hashed IDs)
At clearing level 2 and above, the context is stripped (sensitive clearing). The fingerprint itself contains the accelerator count and topology code but not the vendor identity, preserving the vendor-neutral verification property.
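A rough sketch of how the context might be assembled and stripped by clearing level follows. The field names come from the list above; the function name, the input dictionary, and the choice to represent a stripped context as `None` are assumptions made for illustration.

```python
from typing import Optional


def build_ai_context(detection: dict, clearing_level: int) -> Optional[dict]:
    """Return the ai_context for clearing levels 0-1, or None when it is stripped.

    Assumption: "stripped" is modeled here as omitting the context entirely.
    """
    if clearing_level >= 2:
        return None  # sensitive clearing: no vendor or per-unit detail is kept
    accelerators = detection.get("accelerators", [])
    return {
        "silicon_vendor": detection["silicon_vendor"],
        "discovery_method": detection["discovery_method"],
        "topology": detection["topology"],
        "accelerator_count": len(accelerators),
        "accelerators": accelerators,  # per-unit details with hashed IDs only
    }


example = {
    "silicon_vendor": "amd",
    "discovery_method": "rocm-smi",
    "topology": "multi-gpu",
    "accelerators": [{"name": "MI300X", "id_hash": "ab12"}, {"name": "MI300X", "id_hash": "cd34"}],
}
context_for_auditor = build_ai_context(example, clearing_level=1)
context_when_sensitive = build_ai_context(example, clearing_level=2)
```

Here `context_for_auditor` carries the vendor and per-unit detail, while `context_when_sensitive` is `None`, mirroring the split between auditor-visible context and the vendor-neutral fingerprint.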
4. Regulatory Mapping
| Regulation | Requirement | Cross-Silicon Relevance | SWT3 Procedure |
|---|---|---|---|
| EU AI Act Art. 12 | Automatic logging of AI system behavior | Logs must capture which compute substrate processed each inference, regardless of vendor | AI-HW.1 |
| EU AI Act Art. 15 | Accuracy, robustness, cybersecurity | Model behavior may vary across silicon (floating point precision, kernel implementations) | AI-DRIFT.1, AI-PERF.1 |
| NIST AI RMF MG-3.2 | Resource allocation monitoring | Resource characteristics change when workloads move between accelerator families | AI-HW.1, AI-ENV.1 |
| EO 14110 Sec. 4.2 | Compute reporting for dual-use models | Reporting thresholds are compute-agnostic; attestation must cover all silicon used | AI-HW.1, AI-SUPPLY.1 |
| CMMC Level 3 | CUI processing environment controls | CUI processed on different silicon substrates requires consistent environment attestation | AI-HW.1, AI-ENV.1 |
5. Multi-Silicon Profile Quick Start
Create a .swt3.yaml file in your project root:
    profile: multi-silicon
    api_key_env: SWT3_API_KEY
    tenant_id: YOUR_TENANT
    agent_id: inference-router-v1
The multi-silicon profile activates 8 required procedures covering hardware attestation, inference provenance, drift detection, environment attestation, performance validation, supply chain risk, model integrity, and audit logging. Hardware attestation is required and re-attests every 30 minutes.
Then witness hardware at service startup:
    # Python
    from swt3_ai import Witness

    witness = Witness.from_config()
    witness.witness_hardware()  # Auto-detects any accelerator vendor

    // TypeScript
    import { Witness } from "@tenova/swt3-ai";

    const witness = Witness.fromConfig();
    witness.witnessHardware(); // Auto-detects any accelerator vendor
That is the entire integration. No vendor-specific configuration. No conditional imports. The SDK discovers whatever silicon is present and records it in the witness anchor.
Optional: Enforce Expected Silicon
If your deployment should only run on a specific vendor's silicon, add a runtime profile to your config:
    hardware:
      require_attestation: true
      runtime_profile:
        expected_silicon_vendor: nvidia
        min_gpu_count: 8
        expected_topology: DGX-H100
The SDK logs a warning if the actual hardware doesn't match the profile. In advisory mode, inference continues. In strict mode, the policy violation is also recorded, and inference still continues. The witness is never a blocker -- it attests what happened, it does not enforce what should happen.
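The advisory and strict behavior can be pictured with a small comparison routine. This is a sketch under stated assumptions: `RuntimeProfile`, `check_profile`, `evaluate`, and the result keys are hypothetical, and only the three profile fields shown in the config are taken from the text.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("swt3.sketch")  # hypothetical logger name


@dataclass
class RuntimeProfile:
    expected_silicon_vendor: str
    min_gpu_count: int
    expected_topology: str


def check_profile(profile: RuntimeProfile, detected: dict) -> list[str]:
    """Return a human-readable description of each mismatch."""
    mismatches = []
    if detected["silicon_vendor"] != profile.expected_silicon_vendor:
        mismatches.append(
            f"vendor {detected['silicon_vendor']} != {profile.expected_silicon_vendor}"
        )
    if detected["accelerator_count"] < profile.min_gpu_count:
        mismatches.append(
            f"count {detected['accelerator_count']} < {profile.min_gpu_count}"
        )
    if detected["topology"] != profile.expected_topology:
        mismatches.append(
            f"topology {detected['topology']} != {profile.expected_topology}"
        )
    return mismatches


def evaluate(profile: RuntimeProfile, detected: dict, strict: bool) -> dict:
    """Warn on mismatch; record a violation only in strict mode; never block."""
    mismatches = check_profile(profile, detected)
    for message in mismatches:
        logger.warning("hardware profile mismatch: %s", message)
    return {
        "mismatches": mismatches,
        "policy_violation_recorded": strict and bool(mismatches),
        "blocked": False,  # the witness attests, it does not enforce
    }
```

The `blocked` flag is always `False` by construction, which is the point of the design: a mismatch changes what is attested, not whether inference runs.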
6. Cross-Silicon Drift Detection
When a model migrates from one silicon architecture to another, its behavior may change. Floating point precision differs between CUDA cores and TPU matrix units. Quantization kernels produce different rounding. Attention implementations vary across vendor libraries.
These differences are typically small but can be compliance-relevant for high-risk AI systems where output reproducibility matters (EU AI Act Art. 15).
SWT3 detects silicon migration through the hardware attestation chain:
- AI-HW.1 records the silicon vendor at service startup
- AI-DRIFT.1 monitors model output for statistical drift over time
- If a deployment's silicon vendor changes between attestation windows, the anchor chain shows the transition: the silicon_vendor field changes from one anchor to the next
- Any concurrent drift in model outputs (detected by AI-DRIFT.1) can be correlated to the silicon change
An auditor reviewing the anchor chain can see: "Model X was running on NVIDIA H100 from January through March (stable drift metrics), migrated to Google TPU v5p in April (drift spike), then stabilized on TPU by May." The evidence trail is automatic.
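The correlation an auditor performs can be sketched as a walk over consecutive anchors. The anchor shape used here (`timestamp`, `silicon_vendor`, `drift_score`) and the threshold are assumptions for illustration; the real anchor format and drift metric are defined by the protocol, not by this example.

```python
def find_silicon_transitions(anchors: list[dict], drift_threshold: float) -> list[dict]:
    """Report each point where silicon_vendor changes between consecutive anchors,
    and whether the drift score at that anchor exceeds the threshold."""
    transitions = []
    for prev, curr in zip(anchors, anchors[1:]):
        if prev["silicon_vendor"] != curr["silicon_vendor"]:
            transitions.append({
                "at": curr["timestamp"],
                "from": prev["silicon_vendor"],
                "to": curr["silicon_vendor"],
                "drift_spike": curr["drift_score"] > drift_threshold,
            })
    return transitions


# Hypothetical chain mirroring the auditor narrative above.
chain = [
    {"timestamp": "2026-01", "silicon_vendor": "nvidia", "drift_score": 0.02},
    {"timestamp": "2026-03", "silicon_vendor": "nvidia", "drift_score": 0.03},
    {"timestamp": "2026-04", "silicon_vendor": "google", "drift_score": 0.21},
    {"timestamp": "2026-05", "silicon_vendor": "google", "drift_score": 0.04},
]
transitions = find_silicon_transitions(chain, drift_threshold=0.10)
```

With this sample chain, `transitions` holds a single entry for the April anchor, marking the move from `nvidia` to `google` together with a drift spike.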
7. Factor Encoding Reference
The AI-HW.1 procedure encodes three factors into the witness fingerprint:
| Factor | Value | Meaning |
|---|---|---|
| factor_a | Accelerator count | Number of accelerator units detected (GPUs, TPU chips, Gaudi cards, Neuron devices) |
| factor_b | Health status (1 or 0) | 1 if accelerators detected and topology matches expected; 0 if no accelerators or topology mismatch |
| factor_c | Topology code (0-3) | 0 = single, 1 = multi-GPU/DGX, 2 = NVL/multi-node, 3 = unknown |
The silicon vendor is not encoded in the factors. This is intentional: the fingerprint is vendor-neutral, meaning a verifier can confirm the attestation without knowing which vendor's silicon produced it. The vendor identity is recorded in the ai_context observations at clearing level 0-1 for auditors who need the detail.
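The table can be read as a small encoding function. In the sketch below, the mapping from topology names to codes is inferred from the table's labels, and treating an unset expected topology as a match is an assumption. `encode_factors` is a hypothetical name, and the fingerprint formula itself is not reproduced here.

```python
from typing import Optional

# Inferred from the table: 0 = single, 1 = multi-GPU/DGX, 2 = NVL/multi-node.
TOPOLOGY_CODES = {
    "single": 0,
    "multi-gpu": 1,
    "DGX-H100": 1,
    "NVL72": 2,
    "multi-node": 2,
}
UNKNOWN_TOPOLOGY = 3


def encode_factors(
    accelerator_count: int,
    topology: str,
    expected_topology: Optional[str] = None,
) -> tuple[int, int, int]:
    """Return (factor_a, factor_b, factor_c) for a hardware detection.

    Note that the vendor is not an input: the factors are vendor-neutral.
    """
    factor_a = accelerator_count
    # Assumption: with no expected topology configured, topology counts as matching.
    topology_ok = expected_topology is None or topology == expected_topology
    factor_b = 1 if accelerator_count > 0 and topology_ok else 0
    factor_c = TOPOLOGY_CODES.get(topology, UNKNOWN_TOPOLOGY)
    return (factor_a, factor_b, factor_c)
```

Under these assumptions, `encode_factors(8, "DGX-H100", "DGX-H100")` returns `(8, 1, 1)`, and the same eight units with an unexpected `"multi-gpu"` topology return `(8, 0, 1)`.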
8. References
- Regulation (EU) 2024/1689 -- EU AI Act, Art. 12 (automatic logging), Art. 15 (accuracy and robustness)
- EU AI Act Omnibus Agreement (May 7, 2026) -- timeline changes for high-risk systems
- NIST AI Risk Management Framework 1.0 -- MG-3.2 (resource allocation monitoring)
- Executive Order 14110 -- Sec. 4.2 (compute reporting thresholds)
Related SWT3 Guides
- EU AI Act Omnibus Agreement -- comprehensive analysis of all Omnibus amendments
- Bias Detection Witnessing -- sensitive data processing for bias correction
- AI Factory Compliance Blueprint -- GPU cluster metering with SWT3
- SWT3 Protocol Specification -- formal specification with ABNF grammar
- Design Rationale -- why every protocol decision was made
- Live Demo Audit Portal -- interactive compliance evidence