Kubernetes Deployment
Deploy the AICR API Server in your Kubernetes cluster for self-hosted recipe generation.
Overview
API Server deployment enables self-hosted recipe generation:
- Isolated deployment: Recipe data stays within your infrastructure
- Custom recipes: Modify embedded recipe data (see recipes/)
- High availability: Deploy multiple replicas with load balancing
- Observability: Prometheus /metrics endpoint and structured logging
API Server scope:
- Recipe generation from query parameters (query mode)
- Does not capture snapshots (use agent Job or CLI)
- Generates bundles via POST /v1/bundle
- Does not analyze snapshots (query mode only)
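As a sketch, query-mode bundle generation can be exercised with curl. The intent and platform query parameters below are assumptions that mirror the CLI's --intent/--platform flags; verify the actual parameter names against the API Reference.

```shell
# Hypothetical query-mode call; parameter names mirror the CLI flags
# and should be verified against the API Reference before use.
curl -X POST \
  "http://aicrd.aicr.svc.cluster.local/v1/bundle?intent=training&platform=kubeflow" \
  -o bundle.out
```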
Agent deployment (separate component):
- Kubernetes Job captures cluster configuration
- Writes snapshot to ConfigMap via Kubernetes API
- Requires RBAC: ServiceAccount with ConfigMap create/update permissions
- See Agent Deployment
Typical workflow:
- Deploy agent Job → Captures snapshot → Writes to ConfigMap
- CLI reads ConfigMap → Generates recipe → Writes to file or ConfigMap
- CLI reads recipe → Generates bundle → Writes to filesystem
- Apply bundle to cluster (Helm install, kubectl apply)
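The steps above can be sketched as CLI commands; these mirror the examples later in this guide, and the final apply step depends on what the bundle actually contains.

```shell
# 1. Capture a snapshot into a ConfigMap (the agent Job does this
#    in-cluster; this is the CLI equivalent)
aicr snapshot --output cm://gpu-operator/aicr-snapshot

# 2. Generate a recipe from the snapshot
aicr recipe --snapshot cm://gpu-operator/aicr-snapshot \
  --intent training --platform kubeflow \
  --output recipe.yaml

# 3. Generate a deployable bundle from the recipe
aicr bundle --recipe recipe.yaml --output ./bundles

# 4. Apply the bundle (Helm install or kubectl apply, depending on contents)
kubectl apply -f ./bundles/
```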
Quick Start
Deploy with Kustomize
# Create namespace
kubectl create namespace aicr
# Deploy API server (save the manifest from the Deployment section below as aicrd-deployment.yaml)
kubectl apply -f aicrd-deployment.yaml
# Check deployment
kubectl get pods -n aicr
kubectl get svc -n aicr
Deploy with Helm
Status: Helm chart not yet available. Use Kustomize or manual deployment.
Manual Deployment
1. Create Namespace
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: aicr
labels:
    app: aicrd
kubectl apply -f namespace.yaml
2. Create Deployment
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: aicrd
namespace: aicr
labels:
app: aicrd
spec:
replicas: 3
selector:
matchLabels:
app: aicrd
template:
metadata:
labels:
app: aicrd
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        fsGroup: 65532
        seccompProfile:
          type: RuntimeDefault  # required by the restricted Pod Security Standard
containers:
- name: api-server
        image: ghcr.io/nvidia/aicrd:latest  # pin a release tag (e.g., v0.8.0) in production
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 8080
protocol: TCP
env:
- name: PORT
value: "8080"
- name: LOG_LEVEL
value: "info"
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 10
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
            drop: ["ALL"]
kubectl apply -f deployment.yaml
3. Create Service
# service.yaml
apiVersion: v1
kind: Service
metadata:
name: aicrd
namespace: aicr
labels:
app: aicrd
spec:
type: ClusterIP
selector:
app: aicrd
ports:
- name: http
port: 80
targetPort: http
    protocol: TCP
kubectl apply -f service.yaml
4. Create Ingress (Optional)
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: aicrd
namespace: aicr
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
ingressClassName: nginx
tls:
- hosts:
- aicr.yourdomain.com
secretName: aicr-tls
rules:
- host: aicr.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: aicrd
port:
              number: 80
kubectl apply -f ingress.yaml
Agent Deployment
Deploy the AICR Agent as a Kubernetes Job to automatically capture cluster configuration.
1. Create RBAC Resources
# agent-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: aicr
namespace: gpu-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: aicr
namespace: gpu-operator
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: aicr
namespace: gpu-operator
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: aicr
subjects:
- kind: ServiceAccount
name: aicr
namespace: gpu-operator # Must match ServiceAccount namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: aicr
rules:
- apiGroups: [""]
resources: ["nodes", "pods"]
verbs: ["get", "list"]
- apiGroups: ["nvidia.com"]
resources: ["clusterpolicies"]
verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: aicr
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: aicr
subjects:
- kind: ServiceAccount
name: aicr
  namespace: gpu-operator
kubectl apply -f agent-rbac.yaml
2. Create Agent Job
# agent-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: aicr
namespace: gpu-operator
labels:
app: aicr-agent
spec:
template:
metadata:
labels:
app: aicr-agent
spec:
serviceAccountName: aicr
restartPolicy: Never
containers:
- name: aicr
image: ghcr.io/nvidia/aicr:latest
imagePullPolicy: IfNotPresent
command:
- aicr
- snapshot
- --output
- cm://gpu-operator/aicr-snapshot
        securityContext:
          privileged: true
          runAsUser: 0
          runAsGroup: 0
        volumeMounts:
        - name: systemd
          mountPath: /run/systemd
          readOnly: true
      hostPID: true
      hostNetwork: true
      hostIPC: true
      volumes:
      - name: systemd
        hostPath:
          path: /run/systemd
          type: Directory
Note: hostPID, hostNetwork, and hostIPC are pod-level fields, and the systemd volume is mounted into the container so the SystemD collector can read /run/systemd. The agent defaults to privileged mode, which is required for the GPU, SystemD, and OS collectors. For PSS-restricted namespaces where only the Kubernetes collector is needed, use --privileged=false when deploying via the CLI. See Agent Deployment for details.
kubectl apply -f agent-job.yaml
# Wait for completion
kubectl wait --for=condition=complete job/aicr -n gpu-operator --timeout=5m
# Verify ConfigMap was created
kubectl get configmap aicr-snapshot -n gpu-operator
# View snapshot data
kubectl get configmap aicr-snapshot -n gpu-operator -o jsonpath='{.data.snapshot\.yaml}'
3. Generate Recipe from ConfigMap
# Using CLI (local or in another Job)
aicr recipe --snapshot cm://gpu-operator/aicr-snapshot \
--intent training \
--platform kubeflow \
--output recipe.yaml
# Or write recipe back to ConfigMap
aicr recipe --snapshot cm://gpu-operator/aicr-snapshot \
--intent training \
--platform kubeflow \
--output cm://gpu-operator/aicr-recipe
4. Generate Bundle
# From file
aicr bundle --recipe recipe.yaml --output ./bundles
# From ConfigMap
aicr bundle --recipe cm://gpu-operator/aicr-recipe --output ./bundles
E2E Testing
Validate the complete workflow:
# Run all CLI integration tests (no cluster needed)
make e2e
# Run cluster-based E2E tests (requires Kind cluster)
make e2e-tilt
CLI tests use Kyverno Chainsaw for declarative YAML assertions. See tests/chainsaw/README.md for details.
Configuration Options
Environment Variables
| Variable | Default | Description |
|---|---|---|
| PORT | 8080 | HTTP server port |
| LOG_LEVEL | info | Logging level: debug, info, warn, error |
| RATE_LIMIT | 100 | Requests per second |
| RATE_BURST | 200 | Burst capacity |
| READ_TIMEOUT | 30s | HTTP read timeout |
| WRITE_TIMEOUT | 30s | HTTP write timeout |
| IDLE_TIMEOUT | 60s | HTTP idle timeout |
Note: The API server uses structured JSON logging to stderr. The CLI supports three logging modes (CLI/Text/JSON), but the API server always uses JSON for consistent log aggregation.
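For example, to enable debug logging and raise the rate limit, set the variables on the api-server container in the Deployment (the values here are illustrative):

```yaml
env:
- name: PORT
  value: "8080"
- name: LOG_LEVEL
  value: "debug"
- name: RATE_LIMIT
  value: "200"
- name: READ_TIMEOUT
  value: "60s"
```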
ConfigMap for Custom Recipe Data (Advanced)
Note: This example shows the concept of mounting custom recipe data. The actual recipe format uses a base-plus-overlay architecture. See recipes/ for the current schema (overlays/*.yaml, including base.yaml).
# configmap.yaml - Example showing custom recipe data mounting
apiVersion: v1
kind: ConfigMap
metadata:
name: aicr-recipe-data
namespace: aicr
data:
overlays/base.yaml: |
# Your custom base recipe
apiVersion: aicr.nvidia.com/v1alpha1
kind: RecipeMetadata
# ... (see recipes/overlays/base.yaml for schema)
Mount in deployment:
spec:
template:
spec:
volumes:
- name: recipe-data
configMap:
name: aicr-recipe-data
containers:
- name: api-server
volumeMounts:
- name: recipe-data
mountPath: /data
env:
- name: RECIPE_DATA_PATH
          value: /data
High Availability
Horizontal Pod Autoscaler
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: aicrd
namespace: aicr
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: aicrd
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
        periodSeconds: 15
kubectl apply -f hpa.yaml
Pod Disruption Budget
# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: aicrd
namespace: aicr
spec:
minAvailable: 2
selector:
matchLabels:
      app: aicrd
kubectl apply -f pdb.yaml
Monitoring
Prometheus ServiceMonitor
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: aicrd
namespace: aicr
labels:
app: aicrd
spec:
selector:
matchLabels:
app: aicrd
endpoints:
- port: http
path: /metrics
interval: 30s
    scrapeTimeout: 10s
kubectl apply -f servicemonitor.yaml
Grafana Dashboard
Key panels:
- Request rate (by status code)
- Request duration (p50, p95, p99)
- Error rate
- Rate limit rejections
- Active connections
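A starting point for alerting on the error-rate panel, assuming the server exposes a standard HTTP request counter. The metric name aicrd_http_requests_total is hypothetical; check the /metrics endpoint for the real names before deploying this rule.

```yaml
# prometheusrule.yaml (metric name is an assumption; verify against /metrics)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: aicrd
  namespace: aicr
spec:
  groups:
  - name: aicrd
    rules:
    - alert: AicrdHighErrorRate
      expr: |
        sum(rate(aicrd_http_requests_total{code=~"5.."}[5m]))
          / sum(rate(aicrd_http_requests_total[5m])) > 0.05
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "aicrd 5xx error rate above 5% for 10 minutes"
```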
Security
Network Policies
# networkpolicy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: aicrd
namespace: aicr
spec:
podSelector:
matchLabels:
app: aicrd
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector: {}
ports:
- protocol: TCP
port: 8080
egress:
- to:
- namespaceSelector: {}
ports:
- protocol: TCP
port: 53 # DNS
- to:
- namespaceSelector:
matchLabels:
name: kube-system
ports:
- protocol: TCP
      port: 443 # Kubernetes API
Pod Security Standards
# Add to namespace
apiVersion: v1
kind: Namespace
metadata:
name: aicr
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
RBAC (If API server needs K8s access)
# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: aicrd
namespace: aicr
---
# role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: aicrd
rules:
- apiGroups: [""]
resources: ["nodes", "pods"]
verbs: ["get", "list"]
---
# rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: aicrd
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: aicrd
subjects:
- kind: ServiceAccount
name: aicrd
  namespace: aicr
Troubleshooting
Check Pod Status
# Pod status
kubectl get pods -n aicr
# Describe pod
kubectl describe pod -n aicr -l app=aicrd
# View logs
kubectl logs -n aicr -l app=aicrd
# Follow logs
kubectl logs -n aicr -l app=aicrd -f
Check Service
# Service status
kubectl get svc -n aicr
# Endpoints
kubectl get endpoints -n aicr
# Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
curl http://aicrd.aicr.svc.cluster.local/health
Check Ingress
# Ingress status
kubectl get ingress -n aicr
# Describe ingress
kubectl describe ingress aicrd -n aicr
# Check cert-manager certificate
kubectl get certificate -n aicr
Performance Issues
# Check resource usage
kubectl top pods -n aicr
# Check HPA status
kubectl get hpa -n aicr
# Check metrics
kubectl exec -n aicr -it deploy/aicrd -- \
wget -qO- http://localhost:8080/metrics
Connection Refused
- Check service exists: kubectl get svc -n aicr
- Check endpoints: kubectl get endpoints -n aicr
- Check pod is ready: kubectl get pods -n aicr
- Check readiness probe: kubectl describe pod -n aicr <pod-name>
Rate Limiting
Check rate limit settings:
kubectl exec -n aicr deploy/aicrd -- env | grep RATE
Adjust via deployment:
env:
- name: RATE_LIMIT
value: "200" # Increase limit
- name: RATE_BURST
  value: "400"
Upgrading
Rolling Update
# Update image
kubectl set image deployment/aicrd \
api-server=ghcr.io/nvidia/aicrd:v0.8.0 \
-n aicr
# Watch rollout
kubectl rollout status deployment/aicrd -n aicr
# Rollback if needed
kubectl rollout undo deployment/aicrd -n aicr
Blue-Green Deployment
# Deploy new version
kubectl apply -f deployment-v2.yaml
# Switch service
kubectl patch service aicrd -n aicr \
-p '{"spec":{"selector":{"version":"v2"}}}'
# Delete old deployment
kubectl delete deployment aicrd-v1 -n aicr
Backup and Disaster Recovery
Export Configuration
# Export all resources
kubectl get all -n aicr -o yaml > aicr-backup.yaml
# Export specific resources
kubectl get deployment,service,ingress -n aicr -o yaml > aicr-config.yaml
Restore from Backup
# Restore namespace and resources
kubectl apply -f aicr-backup.yaml
Cost Optimization
Resource Limits
Start with minimal resources:
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
    memory: 256Mi
Monitor and adjust based on usage.
Vertical Pod Autoscaler (Optional)
# vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: aicrd
namespace: aicr
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: aicrd
updatePolicy:
    updateMode: "Auto"
Note: VPA requires the Vertical Pod Autoscaler components to be installed in the cluster, and should not be combined with the HPA above on the same CPU/memory metrics.
See Also
- API Reference - API endpoint documentation
- Automation - CI/CD integration
- Data Flow - Understanding data architecture
- API Server Architecture - Internal architecture