Skip to content

CNCF AI Conformance

Overview

This directory contains evidence for CNCF Kubernetes AI Conformance certification. Each submission certifies a specific product on a specific Kubernetes distribution, with evidence collected using AICR as the validation tooling.

Note: It is the product deployed on a Kubernetes platform that is conformant. AICR serves as the deployment and validation tooling (similar to sonobuoy for K8s conformance), while the certified product is the AI inference/training platform.

Submissions

VersionProductPlatformStatusEvidence
v1.35NVIDIA NIMEKS9/9 PASSv1.35/nim-eks/

Directory Structure

docs/conformance/cncf/
├── index.md                          # This file
└── v1.35/                            # Kubernetes version
    └── nim-eks/                      # Product + platform (mirrors CNCF repo)
        ├── PRODUCT.yaml              # CNCF submission metadata
        ├── README.md                 # Submission overview + results table
        └── evidence/                 # Behavioral evidence files
            ├── index.md
            ├── dra-support.md
            ├── gang-scheduling.md
            ├── secure-accelerator-access.md
            ├── accelerator-metrics.md
            ├── ai-service-metrics.md
            ├── inference-gateway.md
            ├── robust-operator.md
            ├── pod-autoscaling.md
            └── cluster-autoscaling.md

pkg/evidence/scripts/                 # Evidence collection script + test manifests
├── collect-evidence.sh
└── manifests/
    ├── dra-gpu-test.yaml
    ├── gang-scheduling-test.yaml
    └── hpa-gpu-test.yaml

Usage

Evidence collection has two steps:

Structural Validation (CI)

aicr validate checks component health, CRDs, and constraints for CI:

bash
# Structural validation + evidence rendering
aicr validate -r recipe.yaml \
  --phase conformance --evidence-dir ./evidence

CNCF Submission Evidence

Add --cncf-submission to collect detailed behavioral evidence for CNCF AI Conformance submission. This deploys GPU workloads, captures command outputs, workload logs, nvidia-smi output, and Prometheus queries:

bash
# Collect all behavioral evidence
aicr validate --phase conformance \
  --evidence-dir ./evidence --cncf-submission

# Collect specific features
aicr validate --phase conformance \
  --evidence-dir ./evidence --cncf-submission -f dra -f hpa

Alternatively, run the evidence collection script directly:

bash
./pkg/evidence/scripts/collect-evidence.sh all
./pkg/evidence/scripts/collect-evidence.sh dra

Note: The --cncf-submission flag deploys GPU workloads and takes ~5-10 minutes. The evidence collection script automatically detects the AI workload type (NIM inference, Dynamo inference, or Kubeflow training) and collects appropriate metrics and operator evidence.

Two Modes

aicr validate --phase conformance--cncf-submission
PurposeCI pass/failCNCF submission evidence
Speed~3 minutes~5-10 minutes
Deploys workloadsYes (DRA, gang, HPA, secure access)Yes (all + GPU stress test)
OutputPass/fail + diagnostic artifactsDetailed behavioral evidence (command outputs, logs, metrics)
DRA GPU allocation testDeploys pod, verifies GPU access + isolationSame + nvidia-smi output capture
Gang scheduling testDeploys PodGroup, verifies co-schedulingSame + worker logs
HPA autoscalingMetrics API + scale-up validationCUDA GPU stress test + scale-up
MetricsCustom metrics API data-path verificationDCGM exporter + Prometheus queries
GatewayCondition verification (Accepted, Programmed)Same
Webhook testRejection test with invalid CRSame
Cluster autoscalingCloud node group validationCloud-provider autoscaler API

Released under the Apache 2.0 License.