How to Evaluate AI Systems Against the NIST AI RMF Using Cryptographic Witness Evidence

Audience: You are an assessor or auditor evaluating an AI system's risk management practices against the NIST AI RMF (AI 100-1). This walkthrough maps AI RMF functions and categories to SWT3 procedures that generate cryptographic compliance evidence. Each category includes verification steps and common findings to accelerate your assessment.

Protocol note: SWT3 is an industry-agnostic cryptographic witness protocol. The evidence described in this guide is identical regardless of the AI system's application domain, deployment model, or organizational context. This walkthrough maps that universal evidence to NIST AI RMF functions and categories so assessors can verify risk management practices using independently verifiable cryptographic proof.

1. Overview
2. How to Use This Walkthrough
3. GOVERN Function
   GOVERN 1.1 -- Dual-Use and Oversight Policies
   GOVERN 1.2 -- Technical Environment and Provenance
   GOVERN 1.3 -- Supply Chain and Multi-Agent Governance
   GOVERN 1.4 -- Audit and Logging
   GOVERN 1.5 -- Trust, Authorization, and Guardrails
   GOVERN 1.7 -- Transparency and Documentation
   GOVERN 2.1 -- Roles and Responsibilities
   GOVERN 2.2 -- Governance Mechanisms
   GOVERN 4.1 -- Human Oversight Mechanisms
   GOVERN 6.1 -- Policy Enforcement
4. MAP Function
   MAP 1.1 -- System Identification and Inventory
   MAP 2.1 -- Risk Identification
   MAP 2.3 -- Fairness, Explainability, and Data Provenance
   MAP 3.5 -- Data Governance
   MAP 4.1 -- Data Lineage
   MAP 5.2 -- Impact Assessment
5. MEASURE Function
   MEASURE 2.5 -- Performance, Fairness, and Explainability Metrics
   MEASURE 2.6 -- Drift, Robustness, and Model Integrity
   MEASURE 3.1 -- Red Team and Supply Chain Testing
6. MANAGE Function
   MANAGE 1.3 -- Model Lifecycle
   MANAGE 2.2 -- Cybersecurity
   MANAGE 2.3 -- Safety and Security Controls
   MANAGE 2.4 -- Access Control and Revocation
   MANAGE 3.1 -- Incident Response
   MANAGE 3.2 -- Autonomous Operations and Incident Management
   MANAGE 4.1 -- Human Oversight, Post-Market, and Violations
7. Anchor Anatomy
8. Assessment Resources

1. Overview

The NIST AI Risk Management Framework (AI 100-1) organizes AI governance into four functions: GOVERN, MAP, MEASURE, and MANAGE. Each function contains categories and subcategories that describe organizational practices for trustworthy AI.

The SWT3 protocol generates cryptographic witness anchors for 106 AI-specific procedures across 55 namespaces. This walkthrough maps these procedures to 27 NIST AI RMF categories across all four functions. For each category, you will find the specific procedures that produce evidence, what to verify, and what findings to watch for.

Coverage summary: 106 SWT3 procedures across 55 namespaces map to 27 AI RMF categories. GOVERN receives the deepest coverage (37 procedures across 10 categories), reflecting the framework's emphasis on organizational governance as the foundation for trustworthy AI.

2. How to Use This Walkthrough

Step 1: Identify the AI RMF categories in scope for your assessment.

Not all categories apply to every system. A narrow-purpose classifier may only touch MAP 1.1 and MEASURE 2.5, while a general-purpose model deployed across business units may require full coverage.

Step 2: For each category, locate the SWT3 procedures in this guide.

Each category section lists the procedures that generate relevant evidence. Check the organization's witness ledger for anchors matching those procedure IDs.

Step 3: Verify anchors using the public verification endpoint.

Navigate to /verify and enter any SWT3 anchor fingerprint to confirm it has not been tampered with. Batch verification is available for enclave-wide integrity checks.

Step 4: Document gaps where procedures exist but no anchors have been minted.

A procedure without a corresponding anchor indicates the organization has not yet generated evidence for that requirement. This is a finding, not necessarily a failure -- the practice may exist without being witnessed.

Assessor tip: Use the interactive assessment tool to filter procedures to the NIST AI RMF and track completion status during your evaluation.
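
Steps 2 and 4 amount to a set comparison between the procedures a category expects and the procedure IDs present in the ledger. The following is a minimal sketch of that comparison, not part of the SWT3 tooling. It assumes the ledger has been exported as a list of records with a `procedure_id` field; that field name, the `CATEGORY_PROCEDURES` mapping, and the `find_gaps` helper are all hypothetical and shown only for illustration.

```python
# Hypothetical excerpt of the category-to-procedure mapping from this guide.
CATEGORY_PROCEDURES = {
    "GOVERN 1.1": ["AI-DUALUSE.1", "AI-GOV.1", "AI-HITL.1"],
    "GOVERN 1.4": ["AI-AUDIT.1", "AI-AUDIT.2", "AI-LOG.1"],
}


def find_gaps(in_scope, ledger_anchors):
    """Return {category: [procedure IDs with no anchor in the ledger]}."""
    # Assumption: each exported ledger record exposes a "procedure_id" field.
    witnessed = {anchor["procedure_id"] for anchor in ledger_anchors}
    gaps = {}
    for category in in_scope:
        missing = [p for p in CATEGORY_PROCEDURES[category] if p not in witnessed]
        if missing:
            gaps[category] = missing
    return gaps


# Illustrative ledger export: three procedures have been witnessed.
ledger = [
    {"procedure_id": "AI-GOV.1"},
    {"procedure_id": "AI-AUDIT.1"},
    {"procedure_id": "AI-LOG.1"},
]
gaps = find_gaps(["GOVERN 1.1", "GOVERN 1.4"], ledger)
print(gaps)
```

Each entry in the result is a candidate finding to document under Step 4. A missing anchor shows only that the practice has not been witnessed, so follow up before treating it as a failure.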

3. GOVERN Function

The GOVERN function establishes and maintains the organizational policies, processes, and accountability structures needed for trustworthy AI. It is the largest function in the AI RMF and receives the broadest SWT3 procedure coverage because governance decisions ripple through every other function.

SWT3 maps 37 procedures across 10 GOVERN categories. Evidence in this function tends to be organizational rather than technical, so look for anchors that witness policy versions, approval gates, and role assignments rather than model metrics.

GOVERN 1.1 -- Dual-Use and Oversight Policies

What this requires: The organization must establish policies addressing the potential for AI systems to be repurposed for unintended uses, including dual-use scenarios. Governance structures must define oversight responsibilities and human-in-the-loop requirements for high-risk applications.

Procedure	Title	What It Witnesses
`AI-DUALUSE.1`	Dual-Use Risk Classification	Records the classification decision for systems with potential dual-use applications, including risk tier and justification.
`AI-GOV.1`	Governance Policy Version	Witnesses the active governance policy version, including approval authority and effective date.
`AI-HITL.1`	Human-in-the-Loop Declaration	Records whether human oversight is required for the system and at what decision points it is enforced.

How to verify

1. Request the organization's AI governance policy and confirm a current AI-GOV.1 anchor exists with a matching policy version hash.

2. For systems with dual-use potential, verify that AI-DUALUSE.1 was minted before operational deployment, not retroactively.

3. Confirm the HITL declaration in AI-HITL.1 matches the actual operational workflow. Interview operators to validate.

Common finding: Organizations frequently have governance policies but no AI-DUALUSE.1 classification on record. If the system has potential for repurposing beyond its stated intent, the absence of a dual-use classification is a material gap. The assessor determines which repurposing scenarios are relevant based on the system's capabilities and deployment context.

GOVERN 1.2 -- Technical Environment and Provenance

What this requires: The organization must document the technical environment in which AI systems operate, including hardware, software dependencies, model provenance, and adapter configurations. This supports reproducibility and incident investigation.

Procedure	Title	What It Witnesses
`AI-ENV.1`	Runtime Environment	Records the runtime environment configuration, including OS, framework versions, and dependency manifests.
`AI-ENV.2`	Deployment Environment	Witnesses the deployment target (cloud region, container image, orchestrator) at the time of release.
`AI-HW.1`	Hardware Attestation	Records the compute hardware (GPU model, memory, accelerator) used during inference or training.
`AI-HW.3`	Hardware Runtime Profile	Witnesses runtime hardware utilization, thermal state, and error rates during model execution.
`AI-MDL.5`	Model Weights Hash	Records the SHA-256 hash of model weight files, establishing a tamper-evident baseline.
`AI-MDL.6`	Adapter Stack	Witnesses the LoRA, QLoRA, or fine-tuning adapter configuration active at inference time.

How to verify

1. Compare the AI-MDL.5 weight hash in the ledger against a fresh hash of the model files currently in production. Any mismatch indicates undocumented model changes.

2. Verify AI-ENV.1 and AI-ENV.2 anchors exist for each deployment environment. Organizations running in multiple regions should have per-region anchors.

3. If the system uses fine-tuned adapters, confirm AI-MDL.6 anchors record the adapter lineage (base model, training dataset reference, adapter version).

Assessor tip: AI-MDL.5 (weight hashing) is the single most important provenance control. If the organization can demonstrate continuous weight integrity via a chain of AI-MDL.5 anchors, it has strong evidence for model tamper detection.
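
The fresh-hash comparison in verification step 1 can be done with a few lines of standard-library Python. This sketch assumes the anchored value is the hex-encoded SHA-256 of a single weight file. How SWT3 combines hashes for sharded or multi-file models is not described in this guide, so confirm the hashing scope with the organization first. The function names are illustrative.

```python
import hashlib
import os
import tempfile


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def weights_match(path, anchored_hash):
    """Compare a fresh hash with the hex digest recorded in the ledger."""
    return sha256_file(path) == anchored_hash.strip().lower()


# Demonstration with a throwaway file standing in for a weight file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example weights")
    demo_path = tmp.name
anchored = hashlib.sha256(b"example weights").hexdigest()
print(weights_match(demo_path, anchored))
os.remove(demo_path)
```

Run the hash on the production host, or on a copy whose transfer you have verified, so that the comparison reflects the files actually being served.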

GOVERN 1.3 -- Supply Chain and Multi-Agent Governance

What this requires: AI supply chains must be documented, including third-party model providers, data sources, and downstream consumers. For multi-agent systems, the organization must define how agent-to-agent interactions are governed and what chain-of-custody rules apply.

Procedure	Title	What It Witnesses
`AI-CHAIN.1`	Chain-of-Custody Start	Records the beginning of a witnessed inference chain, establishing a cycle_id for correlation.
`AI-CHAIN.2`	Chain-of-Custody Link	Witnesses each subsequent step in a multi-model or multi-agent pipeline, linking to the parent cycle_id.
`AI-GOV.6`	Supply Chain Attestation	Records the declared provenance of third-party models, datasets, and API dependencies.
`AI-MULTI.1`	Multi-Agent Coordination	Witnesses the coordination protocol between agents, including role assignments and escalation rules.

How to verify

1. For multi-agent systems, request the full chain of AI-CHAIN.1 and AI-CHAIN.2 anchors for a sample transaction. Verify that cycle_ids form a complete, unbroken chain from start to finish.

2. Check that AI-GOV.6 supply chain attestations exist for every third-party model in the system's AI SBOM.

3. For AI-MULTI.1, confirm that the witnessed coordination protocol matches the actual runtime behavior by comparing with system logs.

Common finding: Organizations using orchestration frameworks (LangChain, CrewAI, AutoGen) often have no chain-of-custody witnessing. Each agent call is an unwitnessed handoff. If the system routes between multiple models or agents, the absence of AI-CHAIN.1/AI-CHAIN.2 anchors is a significant gap.
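
The chain check in verification step 1 can be sketched as a reachability test: every AI-CHAIN.2 link should trace back, through witnessed parents, to a single AI-CHAIN.1 start. The record layout below is an assumption made for illustration. It supposes each link carries its own `cycle_id` plus a `parent_cycle_id`; the actual anchor schema may differ.

```python
def chain_problems(anchors):
    """List reasons a sampled transaction's chain of custody is incomplete."""
    starts = [a for a in anchors if a["procedure_id"] == "AI-CHAIN.1"]
    links = [a for a in anchors if a["procedure_id"] == "AI-CHAIN.2"]
    problems = []
    if len(starts) != 1:
        problems.append(f"expected 1 AI-CHAIN.1 anchor, found {len(starts)}")

    # Grow the set of cycle_ids reachable from the start until nothing changes.
    known = {a["cycle_id"] for a in starts}
    pending = list(links)
    progressed = True
    while pending and progressed:
        progressed = False
        for link in list(pending):
            if link["parent_cycle_id"] in known:
                known.add(link["cycle_id"])
                pending.remove(link)
                progressed = True

    # Anything left over has a missing or unwitnessed parent.
    for link in pending:
        problems.append(f"{link['cycle_id']} has no witnessed path to the chain start")
    return problems


sample = [
    {"procedure_id": "AI-CHAIN.1", "cycle_id": "c1"},
    {"procedure_id": "AI-CHAIN.2", "cycle_id": "c2", "parent_cycle_id": "c1"},
    {"procedure_id": "AI-CHAIN.2", "cycle_id": "c3", "parent_cycle_id": "c2"},
]
print(chain_problems(sample))
```

An empty list means the sampled chain is unbroken under these assumptions. Dropping the middle link from `sample` would report `c3` as orphaned, which is the pattern an unwitnessed agent handoff produces.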

GOVERN 1.4 -- Audit and Logging

What this requires: The organization must maintain audit trails for AI system decisions, including who accessed the system, what inputs were provided, and what outputs were generated. Logging must be tamper-evident and time-stamped.

Procedure	Title	What It Witnesses
`AI-AUDIT.1`	Audit Trail Integrity	Witnesses the integrity of the AI audit log, including log completeness and tamper-detection status.
`AI-AUDIT.2`	External Timestamp Attestation	Records an RFC 3161 timestamp from an external Timestamp Authority, proving the audit record existed at a specific time.
`AI-LOG.1`	Inference Logging	Witnesses that inference inputs and outputs are being captured, including the log retention policy in effect.

How to verify

1. Request a sample of AI-AUDIT.1 anchors and verify they are generated on a regular cadence (daily or per-session).

2. For organizations claiming tamper-evident logging, check for AI-AUDIT.2 anchors with RFC 3161 timestamps. These provide independent third-party proof of log existence.

3. Verify AI-LOG.1 confirms active inference logging. Check the retention period factor -- many organizations log inferences but purge them too quickly for meaningful audit.

Assessor tip: AI-AUDIT.2 with an RFC 3161 timestamp is the strongest form of audit evidence because the timestamp comes from an external, independent authority. If the organization has these, prioritize verifying the TSA certificate chain.
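
To check the cadence described in verification step 1, sort the anchor timestamps and look for intervals longer than the claimed period. This is a minimal sketch that assumes timestamps are available as ISO 8601 strings with an explicit UTC offset; the `cadence_gaps` helper is illustrative and not part of any SWT3 tooling.

```python
from datetime import datetime, timedelta


def cadence_gaps(timestamps, max_gap):
    """Return (previous, next) pairs where anchors are further apart than max_gap."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]


# Illustrative AI-AUDIT.1 timestamps for an organization claiming daily anchors.
audit_timestamps = [
    "2025-01-01T00:00:00+00:00",
    "2025-01-02T00:00:00+00:00",
    "2025-01-05T00:00:00+00:00",
]
gaps = cadence_gaps(audit_timestamps, timedelta(days=1))
for previous, following in gaps:
    print(f"no anchor between {previous.isoformat()} and {following.isoformat()}")
```

In practice, allow some tolerance above the nominal period (for example, 26 hours for a daily cadence) so that normal scheduling jitter is not reported as a gap.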

GOVERN 1.5 -- Trust, Authorization, and Guardrails

What this requires: AI systems must implement authorization controls, trust boundaries, and operational guardrails. Access to model capabilities must be gated, and the system must prevent unauthorized or unsafe outputs through input/output filtering.

Procedure	Title	What It Witnesses
`AI-AUTO.2`	Autonomous Generation Depth	Records the depth of autonomous generation (e.g., recursive code generation), including halt conditions.
`AI-CONSENT.1`	Consent Verification	Witnesses that user consent was obtained before AI processing, including the consent mechanism and scope.
`AI-GOV.4`	Policy Compliance Gate	Records a pre-inference policy check result, confirming the request was authorized against the active policy.
`AI-GRD.1`	Input Guardrail	Witnesses input filtering decisions, including blocked prompts and the guardrail rule that triggered.
`AI-GRD.2`	Output Guardrail	Records output filtering decisions, including content that was blocked or modified before delivery.
`AI-TRUST.1`	Trust Verification	Witnesses the trust verification result for an agent or service requesting model access.
`AI-TRUST.2`	Credential Presentation	Records the credential presented by an agent, including credential type, issuer, and expiration.

How to verify

1. Check that AI-GRD.1 and AI-GRD.2 anchors exist and are being generated at inference time, not just during initial deployment.

2. For systems requiring consent under applicable regulations, verify AI-CONSENT.1 anchors exist for each data subject interaction or confirm a batch consent mechanism is witnessed.

3. Verify AI-GOV.4 (policy compliance gate) fires before inference. The anchor timestamp must precede the corresponding inference anchor. If the gate fires after inference, it is not a gate -- it is a post-hoc check.

4. For agent-to-agent communication, verify that AI-TRUST.1 and AI-TRUST.2 form a matched pair for each trust negotiation.

Common finding: Guardrails (AI-GRD.1, AI-GRD.2) exist in the system configuration but are not witnessed at runtime. The guardrail may be active, but without anchors, there is no cryptographic proof it was enforced for any specific inference.
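
The ordering rule in verification step 3 can be tested mechanically once gate and inference anchors are paired up. The sketch below assumes each anchor exposes an ISO 8601 `timestamp` and a shared `request_id` that correlates an AI-GOV.4 gate with its AI-INF.1 inference. Both field names are hypothetical; use whatever correlation key the organization's ledger actually provides.

```python
from datetime import datetime


def gate_violations(anchors):
    """Map request_id to the reason its policy gate does not precede inference."""
    gates, inferences = {}, {}
    for anchor in anchors:
        stamp = datetime.fromisoformat(anchor["timestamp"])
        if anchor["procedure_id"] == "AI-GOV.4":
            gates[anchor["request_id"]] = stamp
        elif anchor["procedure_id"] == "AI-INF.1":
            inferences[anchor["request_id"]] = stamp

    violations = {}
    for request_id, inference_time in inferences.items():
        gate_time = gates.get(request_id)
        if gate_time is None:
            violations[request_id] = "no gate anchor"
        elif gate_time >= inference_time:
            violations[request_id] = "gate fired at or after inference"
    return violations


sample = [
    {"procedure_id": "AI-GOV.4", "request_id": "r1", "timestamp": "2025-03-01T10:00:00+00:00"},
    {"procedure_id": "AI-INF.1", "request_id": "r1", "timestamp": "2025-03-01T10:00:01+00:00"},
    {"procedure_id": "AI-GOV.4", "request_id": "r2", "timestamp": "2025-03-01T10:00:05+00:00"},
    {"procedure_id": "AI-INF.1", "request_id": "r2", "timestamp": "2025-03-01T10:00:03+00:00"},
    {"procedure_id": "AI-INF.1", "request_id": "r3", "timestamp": "2025-03-01T10:00:07+00:00"},
]
print(gate_violations(sample))
```

Here `r2` is a post-hoc check rather than a gate, and `r3` was never gated at all. The same comparison pattern applies wherever this guide asks you to confirm that one anchor predates another.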

GOVERN 1.7 -- Transparency and Documentation

What this requires: Organizations must maintain comprehensive documentation of AI system capabilities, limitations, intended uses, and known risks. Transparency measures must include model cards, data sheets, and clear labeling of AI-generated content.

Procedure	Title	What It Witnesses
`AI-CHR.1`	Agent Charter	Records the agent's declared purpose, constraints, and operational boundaries.
`AI-DATA.2`	Training Data Attestation	Witnesses the training data provenance, including dataset identifiers and preprocessing steps.
`AI-GRD.3`	Policy Version Binding	Records which specific policy version governs a model's guardrail configuration.
`AI-LIC.1`	License Compliance	Witnesses the license terms under which a model or dataset is used, including open-weight governance obligations.
`AI-MARK.1`	AI Content Marking	Records whether AI-generated content is labeled as such, including the marking mechanism.
`AI-SKILL.1`	Skill Manifest	Witnesses the declared capabilities (skills) of an AI agent, providing a machine-readable capability inventory.
`AI-TOOL.1`	Tool Usage	Records which external tools an agent invoked, including tool identity and the inputs provided.
`AI-TRANS.1`	Transparency Report	Witnesses the publication or update of a transparency report, including scope and reporting period.
`AI-WATERMARK.1`	Output Watermark	Records whether AI-generated outputs carry a digital watermark and the watermarking method used.

How to verify

1. Check that AI-CHR.1 (agent charter) exists for every agent in the system. The charter should predate the agent's first inference anchor.

2. For systems using open-weight models (LLaMA, Mistral, Falcon), verify that AI-LIC.1 records the license type and any use restrictions. Many open-weight licenses prohibit specific applications.

3. Confirm AI-MARK.1 anchors demonstrate active content marking. For systems in scope of the EU AI Act that generate synthetic content, marking is a legal requirement under Article 50, whether or not the system is classified as high-risk.

4. Review AI-TOOL.1 anchors for completeness. If the agent can invoke 12 tools but only 3 have witnessed usage, ask why the other 9 are not covered.

Framework intersection: GOVERN 1.7 transparency requirements overlap significantly with EU AI Act Article 11 (Technical Documentation) and Article 50 (Transparency Obligations for Providers and Deployers of Certain AI Systems). If the organization is also subject to the EU AI Act, AI-MARK.1 and AI-TRANS.1 evidence serves both frameworks.

GOVERN 2.1 -- Roles and Responsibilities

What this requires: The organization must define and document roles and responsibilities for AI risk management, including who is accountable for model performance, who approves deployment decisions, and who monitors ongoing operations.

Procedure	Title	What It Witnesses
`AI-GOV.2`	Responsibility Assignment	Records the assignment of AI governance roles, including the responsible individual and their scope of authority.
`AI-INF.3`	Inference Authorization	Witnesses the identity of the individual or service that authorized a specific inference or batch of inferences.

How to verify

1. Confirm AI-GOV.2 anchors name specific individuals or role-holders, not generic teams. Accountability requires named parties.

2. For high-risk systems, verify AI-INF.3 records show that authorized personnel (not just service accounts) approved inference operations.

GOVERN 2.2 -- Governance Mechanisms

What this requires: The organization must implement mechanisms (boards, committees, automated checks) that enforce governance decisions throughout the AI lifecycle.

Procedure	Title	What It Witnesses
`AI-GOV.7`	Governance Board Decision	Records decisions made by an AI governance board or ethics committee, including the decision rationale and vote outcome.

How to verify

1. Check that AI-GOV.7 anchors exist for major governance decisions (model deployment approval, risk acceptance, decommissioning).

2. Verify the cadence of governance board meetings by examining the timestamp distribution of AI-GOV.7 anchors.

GOVERN 4.1 -- Human Oversight Mechanisms

What this requires: The organization must implement effective mechanisms for human oversight of AI systems, including the ability to intervene, override, or shut down AI operations when necessary.

Procedure	Title	What It Witnesses
`AI-HITL.3`	Human Override Record	Records instances where a human overrode an AI decision, including the override reason and the original AI output.

How to verify

1. Request AI-HITL.3 anchors and verify that overrides are being recorded. A system with zero overrides over months of operation may indicate the override mechanism is unused or unavailable.

2. Cross-reference override frequency with the system's risk tier. High-risk systems with very few overrides warrant closer examination of the oversight process.

GOVERN 6.1 -- Policy Enforcement

What this requires: Governance policies must be actively enforced, not merely documented. The organization must demonstrate that policy violations are detected, escalated, and resolved.

Procedure	Title	What It Witnesses
`AI-GOV.5`	Policy Enforcement Action	Records an enforcement action taken in response to a policy violation, including the violation type and remediation.

How to verify

1. Verify AI-GOV.5 anchors exist. The absence of any enforcement actions may indicate either excellent compliance or a failure to detect violations.

2. Cross-reference AI-GOV.5 records with AI-VIO.1 (violation detection) anchors from the MANAGE function to confirm violations flow through to enforcement.

4. MAP Function

The MAP function focuses on understanding the AI system's context, identifying risks, and characterizing the system's capabilities and limitations. It is the foundation for risk-informed decisions in MEASURE and MANAGE. SWT3 maps 16 procedures across 6 MAP categories.

Evidence in this function should demonstrate that the organization understands what the system does, who it affects, and what can go wrong. Look for anchors that document system identity, data provenance, and impact assessments.

MAP 1.1 -- System Identification and Inventory

What this requires: The organization must maintain an inventory of AI systems, including unique identifiers, intended purposes, deployment status, and technical specifications. Each system must be identifiable and trackable throughout its lifecycle.

Procedure	Title	What It Witnesses
`AI-GOV.3`	System Registration	Records the registration of an AI system in the organization's inventory, including purpose and classification.
`AI-ID.1`	Agent Identity	Witnesses the cryptographic identity of an AI agent, binding it to a persistent identifier (agent_id).
`AI-MDL.4`	Model Card	Records the model card contents, including architecture, training data summary, and performance benchmarks.
`AI-SBOM.1`	AI Software Bill of Materials	Witnesses the full dependency tree of the AI system, including model providers, libraries, and data sources.

How to verify

1. Confirm every AI system in scope has an AI-ID.1 anchor. The agent_id should be consistent across all subsequent anchors for that system.

2. Verify AI-SBOM.1 exists and is current (minted within the last 90 days or after any dependency change).

3. Check that AI-MDL.4 model card anchors cover all models in production, not just the primary model.

Common finding: Organizations maintain an internal spreadsheet of AI systems but have no AI-GOV.3 or AI-ID.1 anchors. Without cryptographic identity binding, there is no tamper-evident link between the system in the inventory and the system generating inferences.
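
The currency test for AI-SBOM.1 in verification step 2 combines two conditions. The sketch below reads that step as requiring both: the anchor is no older than 90 days, and it was minted at or after the most recent dependency change. That reading, the function name, and the timestamp format (ISO 8601 with a UTC offset) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def sbom_is_current(sbom_minted, last_dependency_change, now,
                    max_age=timedelta(days=90)):
    """True if the SBOM anchor is recent and postdates the last dependency change."""
    minted = datetime.fromisoformat(sbom_minted)
    if now - minted > max_age:
        return False
    if last_dependency_change is not None:
        return minted >= datetime.fromisoformat(last_dependency_change)
    return True


# Illustrative assessment date and anchor timestamps.
assessment_date = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(sbom_is_current("2025-05-01T00:00:00+00:00",
                      "2025-04-15T00:00:00+00:00",
                      assessment_date))
```

The date of the last dependency change has to come from somewhere other than the SBOM anchor itself, such as the organization's change-management records or lockfile history.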

MAP 2.1 -- Risk Identification

What this requires: The organization must identify and document risks associated with AI systems, including technical risks (model failure, data poisoning), societal risks (bias, discrimination), and operational risks (availability, misuse).

Procedure	Title	What It Witnesses
`AI-RISK.1`	Risk Assessment	Records a risk assessment outcome, including identified risks, likelihood, impact, and mitigation status.

How to verify

1. Confirm AI-RISK.1 anchors are generated at deployment and on a recurring schedule (at least quarterly for high-risk systems).

2. Review the risk categories captured. A risk assessment that only covers technical risks but ignores fairness, privacy, and societal impact is incomplete under the AI RMF.

MAP 2.3 -- Fairness, Explainability, and Data Provenance

What this requires: The organization must characterize the AI system's behavior with respect to fairness, explainability, and data quality. This includes understanding how the model makes decisions, whether it exhibits bias, and whether its data sources are reliable.

Procedure	Title	What It Witnesses
`AI-EXPL.2`	Explainability Method	Records the explainability technique applied (SHAP, LIME, attention maps) and whether it is available to end users.
`AI-FAIR.2`	Bias Detection Method	Witnesses the bias detection methodology used, including protected attributes tested and statistical tests applied.
`AI-FAIR.3`	Bias Mitigation Action	Records specific actions taken to mitigate detected bias, including the technique and its measured effect.
`AI-INF.1`	Inference Record	Witnesses an individual inference event, including model version, clearing level, and response metadata.
`AI-MDL.2`	Model Version	Records the active model version at inference time, enabling traceability from output to specific model state.
`AI-RAG.1`	RAG Provenance	Witnesses the retrieval-augmented generation context, including source documents retrieved and relevance scores.

How to verify

1. For systems making consequential decisions about individuals, verify AI-FAIR.2 anchors document the protected attributes tested. The absence of fairness testing evidence is a material gap. The assessor determines which use cases are consequential based on the system's context.

2. Check that AI-EXPL.2 records the explainability method. Systems that claim to be "explainable" should have anchored evidence of which technique is used and when.

3. For RAG systems, verify AI-RAG.1 anchors exist. These prove the system can trace its outputs to specific source documents, which is critical for factual accuracy claims.

Assessor tip: AI-INF.1 and AI-MDL.2 together form the minimum viable inference trail. If the organization has these two, you can trace any output back to a specific model version at a specific time. Start your evidence review here.

MAP 3.5 -- Data Governance

What this requires: The organization must implement data governance practices that address data quality, representativeness, and fitness for purpose. Data used to train, test, or operate AI systems must be documented and managed.

Procedure	Title	What It Witnesses
`AI-DATA.1`	Data Quality Attestation	Witnesses the data quality assessment outcome, including completeness, accuracy, and representativeness metrics.
`AI-DATA.4`	Synthetic Data Declaration	Records whether synthetic data was used in training or testing, including the generation method and proportion.

How to verify

1. Confirm AI-DATA.1 anchors exist for each training dataset and are refreshed when data is updated or augmented.

2. If the organization uses synthetic data, verify AI-DATA.4 documents the percentage of synthetic vs. real data and the generation methodology.

MAP 4.1 -- Data Lineage

What this requires: The organization must maintain data lineage records that trace data from source to model input, including transformations, filtering, and aggregation steps.

Procedure	Title	What It Witnesses
`AI-DATA.3`	Data Lineage Record	Records the full lineage of a dataset, including source systems, transformation pipeline, and version history.

How to verify

1. Request AI-DATA.3 anchors and trace the lineage from raw source to model input. Every transformation step should be documented.

2. Verify the lineage record includes data retention and deletion policies, not just the processing pipeline.

MAP 5.2 -- Impact Assessment

What this requires: The organization must conduct impact assessments for AI systems, particularly those that affect individuals' rights, safety, or opportunities. Assessments should consider both intended and unintended consequences.

Procedure	Title	What It Witnesses
`AI-DPIA.1`	Data Protection Impact Assessment	Records the completion of a DPIA, including the assessment scope, identified risks, and mitigation measures.
`AI-IMPACT.1`	Societal Impact Assessment	Witnesses a broader impact assessment covering societal, environmental, and equity dimensions.

How to verify

1. Verify AI-DPIA.1 was completed before deployment, not retroactively. The anchor timestamp should predate the first AI-INF.1 inference anchor.

2. For high-impact systems, check that AI-IMPACT.1 covers environmental and equity dimensions, not just privacy and security.

Framework intersection: AI-DPIA.1 evidence satisfies both NIST AI RMF MAP 5.2 and GDPR Article 35 (DPIA). If the organization processes EU personal data, this procedure generates dual-framework evidence from a single witness event.

5. MEASURE Function

The MEASURE function addresses quantitative and qualitative assessment of AI system performance, reliability, and trustworthiness. Evidence in this function is inherently technical, covering metrics, drift detection, and adversarial testing. SWT3 maps 14 procedures across 3 MEASURE categories.

This is where the "show your work" principle applies most directly. Governance policies (GOVERN) and risk identification (MAP) set expectations; MEASURE proves the system meets them.

MEASURE 2.5 -- Performance, Fairness, and Explainability Metrics

What this requires: The organization must define and regularly evaluate metrics for AI system performance, fairness, and explainability. Metrics must be documented, tracked over time, and compared against acceptable thresholds.

Procedure	Title	What It Witnesses
`AI-EXPL.1`	Explainability Score	Records the quantitative explainability score for a model or inference, including the scoring methodology.
`AI-FAIR.1`	Fairness Metric	Witnesses the measured fairness metric (disparate impact ratio, equalized odds, demographic parity) and the threshold applied.
`AI-INF.2`	Inference Quality	Records quality metrics for inferences (confidence score, latency, token count), enabling performance trend analysis.
`AI-PERF.1`	Performance Benchmark	Witnesses the results of a formal performance benchmark, including the benchmark suite, dataset, and scores achieved.
`AI-RAG.2`	RAG Relevance Score	Records the relevance scoring of retrieved documents in a RAG pipeline, measuring retrieval quality.
`AI-SKILL.2`	Memory Context	Witnesses the memory/context window utilization of an agent, including what was retained and what was discarded.

How to verify

1. Request a time series of AI-FAIR.1 anchors and verify the organization has defined acceptable thresholds. Measuring fairness without a threshold is measurement without accountability.

2. Check AI-PERF.1 benchmark results against the model card (AI-MDL.4). Performance claims in the model card should be supported by anchored benchmark evidence.

3. For RAG systems, verify AI-RAG.2 relevance scores are within acceptable ranges. Consistently low relevance scores indicate retrieval quality issues that affect output accuracy.

Common finding: Organizations track performance metrics in internal dashboards but do not generate SWT3 anchors for them. This means the metrics exist but are not cryptographically witnessed, so their integrity cannot be verified by an external assessor.
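
As a worked example of one metric named in AI-FAIR.1, the disparate impact ratio divides the selection rate of a protected group by that of a reference group and compares the result with a threshold. The sketch below uses 0.8, the familiar "four-fifths" rule of thumb, purely as an illustration. The threshold that matters for your assessment is the one the organization has defined and anchored. The function name and the counts are made up for the example.

```python
def disparate_impact_ratio(protected_selected, protected_total,
                           reference_selected, reference_total):
    """Selection rate of the protected group divided by the reference group's rate."""
    protected_rate = protected_selected / protected_total
    reference_rate = reference_selected / reference_total
    return protected_rate / reference_rate


# Illustrative counts: 30 of 100 protected applicants selected, 50 of 100 reference.
ratio = disparate_impact_ratio(30, 100, 50, 100)
threshold = 0.8  # illustrative only; use the organization's anchored threshold
meets_threshold = ratio >= threshold
print(f"ratio={ratio:.2f}, meets threshold: {meets_threshold}")
```

When reviewing a time series of AI-FAIR.1 anchors, recomputing the ratio from the underlying counts, where they are available, confirms that the anchored value and the anchored threshold refer to the same metric definition.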

MEASURE 2.6 -- Drift, Robustness, and Model Integrity

What this requires: The organization must monitor AI systems for drift (changes in model behavior over time), assess robustness against adversarial inputs, and verify model integrity against known baselines.

Procedure	Title	What It Witnesses
`AI-BASE.1`	Baseline Establishment	Records the establishment of a performance baseline, including the baseline metrics and the conditions under which they were measured.
`AI-DRIFT.1`	Drift Detection	Witnesses a drift detection event, including the drift magnitude, direction, and whether thresholds were exceeded.
`AI-MDL.3`	Model Validation	Records model validation results against acceptance criteria, including test datasets and pass/fail determination.
`AI-MDL.7`	Quantization Record	Witnesses model quantization parameters (INT8, FP16, GPTQ), documenting precision-performance tradeoffs.
`AI-ROBUST.1`	Robustness Test	Records the results of adversarial robustness testing, including attack types simulated and the model's resilience.
`AI-SKILL.3`	Reward Model	Witnesses the reward model configuration and alignment metrics for RLHF-trained systems.

How to verify

1. Verify that AI-BASE.1 was established before deployment and that AI-DRIFT.1 anchors reference the baseline. Drift is meaningless without a baseline to drift from.

2. Check AI-DRIFT.1 frequency. For production systems, drift detection should run at least weekly. For high-frequency systems, daily or continuous monitoring is expected.

3. If the model has been quantized (common for edge deployment), verify AI-MDL.7 documents the quantization method and any accuracy impact measured against the full-precision baseline.

4. For RLHF systems, verify AI-SKILL.3 documents the reward model version and alignment metrics. Reward model changes can silently alter system behavior.

Assessor tip: The AI-BASE.1 to AI-DRIFT.1 chain is the strongest evidence of continuous monitoring. Ask for the full chain: baseline establishment, regular drift checks, and any remediation actions taken when drift thresholds were exceeded.
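
The baseline linkage in verification step 1 can be checked by confirming that every AI-DRIFT.1 anchor points at an AI-BASE.1 anchor minted earlier. The sketch below assumes records carry an `anchor_id`, an ISO 8601 `timestamp`, and, for drift anchors, a `baseline_id` reference. All three field names are hypothetical stand-ins for whatever the ledger export provides.

```python
from datetime import datetime


def unanchored_drift_checks(anchors):
    """Return (anchor_id, reason) for drift anchors lacking a valid earlier baseline."""
    baselines = {
        a["anchor_id"]: datetime.fromisoformat(a["timestamp"])
        for a in anchors
        if a["procedure_id"] == "AI-BASE.1"
    }
    findings = []
    for a in anchors:
        if a["procedure_id"] != "AI-DRIFT.1":
            continue
        baseline_time = baselines.get(a.get("baseline_id"))
        if baseline_time is None:
            findings.append((a["anchor_id"], "references no known baseline"))
        elif baseline_time >= datetime.fromisoformat(a["timestamp"]):
            findings.append((a["anchor_id"], "baseline minted after drift check"))
    return findings


sample = [
    {"procedure_id": "AI-BASE.1", "anchor_id": "b1", "timestamp": "2025-01-01T00:00:00+00:00"},
    {"procedure_id": "AI-DRIFT.1", "anchor_id": "d1", "baseline_id": "b1",
     "timestamp": "2025-01-08T00:00:00+00:00"},
    {"procedure_id": "AI-DRIFT.1", "anchor_id": "d2",
     "timestamp": "2025-01-15T00:00:00+00:00"},
    {"procedure_id": "AI-DRIFT.1", "anchor_id": "d3", "baseline_id": "b1",
     "timestamp": "2024-12-25T00:00:00+00:00"},
]
print(unanchored_drift_checks(sample))
```

A drift anchor with no resolvable baseline does not prove that monitoring is broken, but it does mean the reported drift magnitude cannot be tied to a witnessed reference point.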

MEASURE 3.1 -- Red Team and Supply Chain Testing

What this requires: The organization must conduct adversarial testing (red teaming) of AI systems and assess supply chain risks through structured testing programs.

Procedure	Title	What It Witnesses
`AI-REDTEAM.1`	Red Team Exercise	Records the execution and findings of an AI red team exercise, including attack scenarios tested and vulnerabilities discovered.
`AI-SUPPLY.1`	Supply Chain Assessment	Witnesses a supply chain risk assessment, including third-party dependencies evaluated and risk ratings assigned.

How to verify

1. Verify AI-REDTEAM.1 anchors exist and were generated before production deployment or at regular intervals (at least annually).

2. Check that AI-SUPPLY.1 covers all critical dependencies identified in the AI-SBOM.1 (from MAP 1.1). A supply chain assessment that misses key dependencies is incomplete.

3. Cross-reference red team findings with AI-GRD.1/AI-GRD.2 (GOVERN 1.5) to verify that discovered vulnerabilities led to guardrail updates.
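
The coverage comparison in verification step 2 is a set difference between the critical components in the SBOM and the dependencies named in the supply chain assessment. This sketch assumes the AI-SBOM.1 content can be exported as records with a `name` and a `critical` flag, and that AI-SUPPLY.1 yields a list of assessed dependency names. Those shapes are assumptions, since the guide does not define the anchor payloads.

```python
def unassessed_dependencies(sbom_components, assessed_names):
    """Return critical SBOM components that the supply chain assessment omitted."""
    assessed = set(assessed_names)
    return sorted(
        component["name"]
        for component in sbom_components
        if component.get("critical") and component["name"] not in assessed
    )


# Illustrative SBOM export and assessment scope.
sbom = [
    {"name": "base-model-provider", "critical": True},
    {"name": "embedding-api", "critical": True},
    {"name": "logging-library", "critical": False},
]
assessed = ["base-model-provider"]
print(unassessed_dependencies(sbom, assessed))
```

Any name returned here is a critical dependency that the organization listed in its own SBOM but did not evaluate, which makes the supply chain assessment incomplete on its face.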

6. MANAGE Function

The MANAGE function addresses the ongoing operation, maintenance, and incident response for AI systems. Evidence here demonstrates that the organization does not just deploy and forget -- it actively manages risks throughout the system's operational life. SWT3 maps 13 procedures across 7 MANAGE categories.

MANAGE 1.3 -- Model Lifecycle

What this requires: The organization must manage AI models across their full lifecycle, including development, testing, deployment, monitoring, updating, and decommissioning.

Procedure	Title	What It Witnesses
`AI-MDL.1`	Model Registration	Records the formal registration of a model for production use, including approval authority and deployment constraints.

How to verify

1. Verify AI-MDL.1 exists for every model in production. Cross-reference with the system inventory (AI-GOV.3) to ensure no unregistered models are operating.

2. Check the approval chain. The anchor should reference who authorized the model for production use.

MANAGE 2.2 -- Cybersecurity

What this requires: The organization must address cybersecurity risks specific to AI systems, including adversarial attacks, model theft, data poisoning, and prompt injection.

Procedure	Title	What It Witnesses
`AI-CYBER.1`	AI-Specific Threat Assessment	Records an AI-specific cybersecurity threat assessment, including threats evaluated and countermeasures deployed.

How to verify

1. Verify AI-CYBER.1 covers AI-specific threats (prompt injection, model inversion, training data extraction), not just general IT security threats.

2. Check that the threat assessment is current. AI threat landscapes evolve rapidly; an assessment older than 6 months should be refreshed.

MANAGE 2.3 -- Safety and Security Controls

What this requires: The organization must implement safety and security controls proportional to the AI system's risk level. Controls must be documented, tested, and maintained.

Procedure	Title	What It Witnesses
`AI-SAFE.1`	Safety Boundary	Records the safety boundaries defined for the AI system, including operational limits and fail-safe conditions.
`AI-SEC.1`	Security Control Attestation	Witnesses the security control posture of the AI system, including encryption, access controls, and network isolation.
`AI-SEC.2`	Vulnerability Assessment	Records the results of an AI-focused vulnerability assessment, including findings and remediation status.

How to verify

1. Confirm AI-SAFE.1 defines concrete, measurable boundaries (not vague "best effort" language). Safety boundaries should specify what happens when limits are reached.

2. Verify AI-SEC.1 and AI-SEC.2 are generated on a regular cadence. Security attestation should align with the organization's overall vulnerability management cycle.

Common finding: Organizations conduct general IT vulnerability assessments but do not assess AI-specific vulnerabilities (model poisoning, inference manipulation, embedding attacks). AI-SEC.2 should cover threats unique to AI systems, not just the infrastructure they run on.

MANAGE 2.4 -- Access Control and Revocation

What this requires: The organization must control access to AI systems and maintain the ability to revoke access or retract AI outputs when necessary.

Procedure	Title	What It Witnesses
`AI-ACC.1`	Access Control Decision	Records an access control decision for the AI system, including the requester, resource, and authorization result.
`AI-REV.1`	Anchor Revocation	Witnesses the revocation of a previously issued witness anchor, including the revocation reason code and the target fingerprint.

How to verify

1. Verify AI-ACC.1 anchors demonstrate active access control enforcement, not just policy documentation.

2. Check for AI-REV.1 anchors. If the organization has never issued a revocation, ask whether they have a documented revocation procedure and whether it has been tested. Seven reason codes are supported: unspecified, model_recall, policy_violation, data_contamination, consent_withdrawal, regulatory_order, error_correction.

3. If revocations exist, verify the revoked anchors are flagged in the public verification endpoint. Navigate to /verify and confirm revoked anchors display their revocation status.

Assessor tip: AI-REV.1 is the "undo button" for the SWT3 protocol. Organizations that have exercised revocation demonstrate a mature incident response capability. Ask for the revocation log and review the reason codes used.
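
When reviewing a revocation log, it can help to tally the reason codes and flag any value outside the seven listed above. This is a rough sketch that assumes the log has been exported as a simple list of reason code strings; the export format and the function name are assumptions, not SWT3 features.

```python
from collections import Counter

# The seven reason codes named in this guide.
SUPPORTED_REASON_CODES = frozenset({
    "unspecified",
    "model_recall",
    "policy_violation",
    "data_contamination",
    "consent_withdrawal",
    "regulatory_order",
    "error_correction",
})


def summarize_revocations(reason_codes):
    """Return (counts per code, sorted list of codes not in the supported set)."""
    counts = Counter(reason_codes)
    unknown = sorted(code for code in counts if code not in SUPPORTED_REASON_CODES)
    return counts, unknown


# Hypothetical log extract for illustration only.
log = ["model_recall", "error_correction", "model_recall", "typo_code"]
counts, unknown = summarize_revocations(log)
print(dict(counts))
print(unknown)
```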

MANAGE 3.1 -- Incident Response

What this requires: The organization must have an incident response plan specific to AI systems, including procedures for detecting, containing, and recovering from AI-related incidents.

Procedure	Title	What It Witnesses
`AI-IR.1`	Incident Response Activation	Records the activation of an AI incident response plan, including the incident classification and initial containment actions.

How to verify

1. Verify an AI-specific incident response plan exists and is distinct from the general IT incident response plan.

2. Check for AI-IR.1 anchors from tabletop exercises or drills, not just real incidents. A tested plan is more credible than an untested one.

MANAGE 3.2 -- Autonomous Operations and Incident Management

What this requires: For AI systems with autonomous decision-making capabilities, the organization must define operational boundaries and maintain incident management processes that account for autonomous behavior.

Procedure	Title	What It Witnesses
`AI-AUTO.1`	Autonomous Decision Record	Records an autonomous decision made without human intervention, including the decision rationale and confidence level.
`AI-INCIDENT.1`	AI Incident Record	Witnesses the details of an AI-related incident, including impact, root cause, and corrective actions taken.

How to verify

1. For autonomous systems, verify AI-AUTO.1 anchors are generated for each autonomous decision. The anchor volume should match the number of autonomous decisions the system actually made over the same period.

2. Check AI-INCIDENT.1 records for completeness: root cause analysis, corrective actions, and follow-up verification should all be documented.

3. Cross-reference AI-INCIDENT.1 with AI-IR.1 to verify that incidents triggered the incident response plan as expected.
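
The volume comparison in step 1 can be approximated by comparing daily counts. The sketch below assumes two dictionaries keyed by date string: AI-AUTO.1 anchors per day from the ledger, and autonomous decisions per day from the system's own logs. Both inputs, the function name, and the 1% tolerance are hypothetical.

```python
def volume_mismatches(anchors_per_day, decisions_per_day, tolerance=0.01):
    """Return {day: (anchor_count, decision_count)} for days that disagree.

    A day is reported when the two counts differ by more than `tolerance`
    (a fraction of that day's decision count).
    """
    mismatches = {}
    for day in sorted(set(anchors_per_day) | set(decisions_per_day)):
        anchors = anchors_per_day.get(day, 0)
        decisions = decisions_per_day.get(day, 0)
        allowed = tolerance * max(decisions, 1)
        if abs(anchors - decisions) > allowed:
            mismatches[day] = (anchors, decisions)
    return mismatches


# Hypothetical counts for illustration only.
anchor_counts = {"2026-06-01": 1000, "2026-06-02": 400}
decision_counts = {"2026-06-01": 1005, "2026-06-02": 900, "2026-06-03": 50}
print(volume_mismatches(anchor_counts, decision_counts))
```

Days with decisions but no anchors at all are reported too, which is usually the more serious finding.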

MANAGE 4.1 -- Human Oversight, Post-Market, and Violations

What this requires: The organization must maintain human oversight mechanisms, conduct post-market monitoring of deployed AI systems, and track policy violations or system failures.

Procedure	Title	What It Witnesses
`AI-HITL.2`	Human Review Outcome	Records the outcome of a human review of AI outputs, including the reviewer's assessment and any corrections applied.
`AI-PMM.1`	Post-Market Monitoring	Witnesses a post-market monitoring report, including performance metrics observed in production and any anomalies detected.
`AI-VIO.1`	Violation Detection	Records the detection of a policy or safety violation by the AI system, including the violation type and severity.

How to verify

1. Verify AI-PMM.1 anchors exist on a regular cadence (monthly or quarterly). Post-market monitoring is not a one-time activity.

2. Cross-reference AI-VIO.1 (violation detection) with AI-GOV.5 (GOVERN 6.1, policy enforcement). Violations that are detected but never followed by an enforcement action indicate a governance gap.

3. For AI-HITL.2, check how review volume is distributed across reviewers. If one person reviews all AI outputs, examine whether the review is meaningful or a rubber-stamp process.

Common finding: Post-market monitoring (AI-PMM.1) is the most frequently missing procedure in production AI systems. Organizations invest heavily in pre-deployment testing but have no structured process for monitoring the system after it is live. This is a critical gap for the MANAGE function.
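
The reviewer check in step 3 can be quantified as the share of reviews handled by the busiest reviewer. This sketch assumes you can extract one reviewer identifier per AI-HITL.2 anchor; the identifier field and the function name are hypothetical.

```python
from collections import Counter


def reviewer_concentration(reviewer_ids):
    """Return (busiest reviewer, share of all reviews), or None if there are no reviews."""
    counts = Counter(reviewer_ids)
    total = sum(counts.values())
    if total == 0:
        return None
    reviewer, reviews = counts.most_common(1)[0]
    return reviewer, reviews / total


# Hypothetical reviewer IDs for illustration only.
reviews = ["r-01", "r-01", "r-01", "r-02"]
print(reviewer_concentration(reviews))
```

A high share is not a finding on its own; it is a prompt to ask how much time that reviewer has per output.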

Lifecycle Chain Evidence (v6.0)

SWT3 v6.0 introduces lifecycle chains: multi-anchor sequences linked by a shared cycle ID that trace operational events from initiation through resolution. Three new procedures provide structured evidence for risk management processes, operational monitoring, and model lifecycle management.

Procedure	Title	AI RMF Category	What It Witnesses
`AI-EMRG.1`	Emergency Override	GOVERN 1.3, MANAGE 3.1	Records the full lifecycle of an emergency override: trigger condition, human authorization, system state change, and post-event review. Provides evidence for risk management processes and incident response readiness.
`AI-DRIFT.2`	Consequence-Mapped Drift	MEASURE 2.6	Witnesses drift detection with consequence thresholds that map statistical drift to downstream risk impact. Each drift event is assessed not just by magnitude but by its operational consequences, providing evidence for continuous monitoring practices.
`AI-ASSESS.1`	Champion-Challenger Assessment	MANAGE 2.2, MANAGE 1.3	Records structured model evaluation where a production model (champion) is compared against a candidate (challenger) across defined criteria. Provides evidence for model lifecycle management and cybersecurity assessment through systematic model validation.

How to verify lifecycle chains

1. Lifecycle chains share a common cycle ID. Filter the ledger by procedure and look for anchors with matching cycle IDs. Each chain should contain multiple anchors representing sequential stages.

2. A complete chain has a start anchor, intermediate anchors, and a resolution anchor. An incomplete chain indicates an open lifecycle event.

3. Cross-reference: AI-EMRG.1 chains should correlate with AI-IR.1 (incident response). AI-DRIFT.2 chains should reference AI-BASE.1 baselines. AI-ASSESS.1 chains should produce updated AI-PERF.1 metrics.

Assessor tip: Lifecycle chains provide process evidence rather than point-in-time attestations. An AI-DRIFT.1 anchor proves drift was measured. An AI-DRIFT.2 chain proves drift was detected, its consequences were mapped to specific risks, a response was selected, and the outcome was verified. This distinction matters for MEASURE 2.6 assessments where continuous monitoring must demonstrate both detection and response.
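
Steps 1 and 2 above amount to grouping anchors by cycle ID and checking each group for its expected stages. The sketch below is illustrative only: it assumes each exported anchor record is a dictionary with `cycle_id` and `stage` keys, and that stages are labeled `start`, `intermediate`, and `resolution`. Those field names and labels are assumptions, not the SWT3 ledger schema.

```python
from collections import defaultdict

# Assumed stage labels; the actual ledger may use different names.
REQUIRED_STAGES = ("start", "intermediate", "resolution")


def incomplete_chains(anchors):
    """Return {cycle_id: [missing stages]} for chains lacking a required stage."""
    stages_by_cycle = defaultdict(set)
    for anchor in anchors:
        stages_by_cycle[anchor["cycle_id"]].add(anchor["stage"])

    report = {}
    for cycle_id, stages in stages_by_cycle.items():
        missing = [stage for stage in REQUIRED_STAGES if stage not in stages]
        if missing:
            report[cycle_id] = missing
    return report


# Hypothetical ledger extract for illustration only.
ledger = [
    {"cycle_id": "c-1", "stage": "start"},
    {"cycle_id": "c-1", "stage": "intermediate"},
    {"cycle_id": "c-1", "stage": "resolution"},
    {"cycle_id": "c-2", "stage": "start"},
    {"cycle_id": "c-2", "stage": "intermediate"},
]
print(incomplete_chains(ledger))
```

A chain that is missing only its resolution stage corresponds to the open lifecycle event described in step 2.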

7. Anchor Anatomy

Every SWT3 Witness Anchor follows a deterministic format that encodes the deployment tier, infrastructure provider, procedure namespace, specific procedure, verdict, timestamp, and a 12-character SHA-256 fingerprint. Here is an annotated example for an AI RMF-relevant procedure:

SWT3-E-AWS-AI-DRIFT1-PASS-1780300000-a3f8c92b1d07

Segment	Field	Meaning
`SWT3-E`	Tier	E = Enclave deployment tier (self-hosted, full control)
`AWS`	Provider	Infrastructure provider where the AI system operates
`AI`	UCT Namespace	AI procedures namespace within the Unified Control Taxonomy
`DRIFT1`	Procedure	AI-DRIFT.1, drift detection (MEASURE 2.6)
`PASS`	Verdict	The procedure's acceptance criteria were met
`1780300000`	Epoch	Unix timestamp when the anchor was minted
`a3f8c92b1d07`	Fingerprint	First 12 characters of SHA-256 hash over witness data

Assessor tip: To verify any anchor, paste its fingerprint at /verify. The verifier will confirm the anchor exists in the ledger, display its full metadata, and flag any revocation status. Batch verification is available for enclave-wide integrity checks.
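
For assessors who handle anchors in bulk, the segment table above can be turned into a small parser. The pattern below is inferred from the single annotated example in this section, so treat the exact grammar (character classes, segment lengths other than the 12-character fingerprint) as an assumption rather than the protocol specification. The function name `parse_anchor` is hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the example anchor; not an official grammar.
ANCHOR_PATTERN = re.compile(
    r"^SWT3-(?P<tier>[A-Z])"
    r"-(?P<provider>[A-Z0-9]+)"
    r"-(?P<namespace>[A-Z]+)"
    r"-(?P<family>[A-Z]+)(?P<number>\d+)"
    r"-(?P<verdict>[A-Z]+)"
    r"-(?P<epoch>\d+)"
    r"-(?P<fingerprint>[0-9a-f]{12})$"
)


def parse_anchor(anchor):
    """Split an anchor string into its segments, or raise ValueError."""
    match = ANCHOR_PATTERN.match(anchor)
    if match is None:
        raise ValueError(f"not a recognizable SWT3 anchor: {anchor!r}")
    fields = match.groupdict()
    epoch = int(fields["epoch"])
    return {
        "tier": fields["tier"],
        "provider": fields["provider"],
        # Rebuild the dotted procedure ID used in this guide, e.g. AI-DRIFT.1.
        "procedure": f"{fields['namespace']}-{fields['family']}.{fields['number']}",
        "verdict": fields["verdict"],
        "epoch": epoch,
        "minted_at": datetime.fromtimestamp(epoch, tz=timezone.utc),
        "fingerprint": fields["fingerprint"],
    }


parsed = parse_anchor("SWT3-E-AWS-AI-DRIFT1-PASS-1780300000-a3f8c92b1d07")
print(parsed["procedure"], parsed["verdict"], parsed["minted_at"].isoformat())
```

Parsing only checks that the string is well formed. It does not confirm that the anchor exists in the ledger or that the fingerprint matches the witness data; that still requires the verifier.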

8. Assessment Resources

Interactive Assessment Tools

Assessment tracker -- Filter by NIST AI RMF and track procedure completion during your evaluation.

Assessment checklist -- Printable checklist of all AI RMF-mapped procedures with pass/fail columns.

Anchor verifier -- Verify individual anchors or run enclave-wide integrity checks.

Related Guides

Assessment Playbook -- Operational playbook for conducting AI compliance assessments across all frameworks.

NIST CI AI Profile Crosswalk -- Mapping between the NIST Cybersecurity for IoT/AI profile and SWT3 procedures.

Assessor Evidence Matrix -- Complete evidence matrix showing what each procedure produces and where to find it.

SDK documentation: For technical integration details, visit the SDK documentation page. The SWT3 SDK is available for Python (pip install swt3-ai), TypeScript (npm install @tenova/swt3-ai), Rust, C#, and Ruby.
