# API Server Architecture

The `aicrd` server provides HTTP REST API access to AICR configuration recipe generation and bundle creation.
## Overview
The API server provides HTTP REST access to Steps 2 and 4 of the AICR workflow – recipe generation and bundle creation. It is a production-ready HTTP service built on Go's net/http with middleware for rate limiting, metrics, request tracking, and graceful shutdown.
### Four-Step Workflow Context

```
┌──────────────┐      ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│   Snapshot   │─────▶│    Recipe    │─────▶│   Validate   │─────▶│    Bundle    │
└──────────────┘      └──────────────┘      └──────────────┘      └──────────────┘
 CLI/Agent only         API Server            CLI only              API Server
```

**API Server Capabilities**:
- Recipe generation (Step 2) via the `GET /v1/recipe` endpoint
- Bundle creation (Step 4) via the `POST /v1/bundle` endpoint
- Query mode only – generates recipes from environment parameters
- Health and metrics endpoints for Kubernetes deployment
- Production-ready HTTP server with middleware stack
- Supply chain security with SLSA Build Level 3 attestations

**API Server Limitations**:
- No snapshot capture – use CLI `aicr snapshot` or the Kubernetes Agent
- No snapshot mode – cannot analyze captured snapshots (query mode only)
- No validation – use CLI `aicr validate` to check constraints against snapshots
- No ConfigMap integration – the API server doesn't read/write ConfigMaps
**API Server Configuration**:
- Criteria allowlists – restrict allowed values for accelerator, service, intent, and OS via environment variables
- Value overrides – supported via `?set=bundler:path=value` query parameters on `/v1/bundle`
- Node scheduling – supported via `?system-node-selector` and `?accelerated-node-selector` query parameters

For the complete workflow, use the CLI, which supports:
- All four steps: snapshot → recipe → validate → bundle
- ConfigMap I/O: `cm://namespace/name` URIs
- Agent deployment: Kubernetes Job with RBAC
- E2E testing: Chainsaw tests in `tests/chainsaw/cli/`
## Architecture Diagram

### Request Flow

Complete request flow with middleware.

## Component Details
### Entry Point: `cmd/aicrd/main.go`

Minimal entry point:

```go
package main

import (
	"log"

	"github.com/NVIDIA/aicr/pkg/api"
)

func main() {
	if err := api.Serve(); err != nil {
		log.Fatal(err)
	}
}
```

### API Package: `pkg/api/server.go`
Responsibilities:
- Initialize structured logging
- Parse criteria allowlists from environment variables
- Create recipe builder with allowlist configuration
- Create bundle handler with allowlist configuration
- Setup HTTP routes
- Configure server with middleware
- Handle graceful shutdown
**Key Features**:
- Version info injection via ldflags: `version`, `commit`, `date`
- Routes: `/v1/recipe` → recipe handler, `/v1/bundle` → bundle handler
- Criteria allowlists parsed from `AICR_ALLOWED_*` environment variables
- Server configured with production defaults
- Graceful shutdown on SIGINT/SIGTERM
**Initialization Flow**:

```go
func Serve() error {
	// 1. Setup logging
	logging.SetDefaultStructuredLogger(name, version)

	// 2. Parse allowlists from environment
	allowLists, err := recipe.ParseAllowListsFromEnv()
	if err != nil {
		return fmt.Errorf("failed to parse allowlists: %w", err)
	}

	// 3. Create recipe handler with allowlists
	rb := recipe.NewBuilder(
		recipe.WithVersion(version),
		recipe.WithAllowLists(allowLists),
	)

	// 4. Create bundle handler with allowlists
	bb, err := bundler.New(
		bundler.WithAllowLists(allowLists),
	)

	// 5. Setup routes and start server
	// ...
}
```

### Server Infrastructure: `pkg/server/`
Production-ready HTTP server implementation with 10 files:
#### Core Components

server.go (217 lines)
- Server struct with config, HTTP server, rate limiter, ready state
- Functional options pattern for configuration
- Graceful shutdown using `signal.NotifyContext` and `errgroup`
- Default root handler listing available routes
config.go (72 lines)
- Configuration struct with sensible defaults
- Environment variable support (PORT)
- Timeout configuration (read, write, idle, shutdown)
- Rate limiting parameters
middleware.go (123 lines)
- Middleware chain builder
- Request ID middleware (UUID generation/validation)
- Rate limiting middleware (token bucket)
- Panic recovery middleware
- Logging middleware (structured logs)
health.go (60 lines)
- `/health` – Liveness probe (always returns 200)
- `/ready` – Readiness probe (returns 503 when not ready)
- JSON response with status and timestamp
errors.go (49 lines)
- Standardized error response structure
- Error codes (RATE_LIMIT_EXCEEDED, INTERNAL_ERROR, etc.)
- WriteError helper with request ID tracking
metrics.go (90 lines)
- Prometheus metrics:
  - `aicr_http_requests_total` – counter by method, path, status
  - `aicr_http_request_duration_seconds` – histogram by method, path
  - `aicr_http_requests_in_flight` – gauge
  - `aicr_rate_limit_rejects_total` – counter
  - `aicr_panic_recoveries_total` – counter
context.go (8 lines)
- Context key type for request ID storage
doc.go (200 lines)
- Comprehensive package documentation
- Usage examples
- API endpoint descriptions
- Error handling documentation
- Deployment examples
## Request Processing Pipeline

### Recipe Handler: `pkg/recipe/handler.go`

HTTP handler for the recipe generation endpoint. Supports both GET (query parameters) and POST (criteria body) methods.

#### Handler Flow
```go
func (b *Builder) HandleRecipes(w http.ResponseWriter, r *http.Request) {
	var criteria *Criteria
	var err error

	// 1. Route based on HTTP method
	switch r.Method {
	case http.MethodGet:
		// 2a. Parse query parameters for GET
		criteria, err = ParseCriteriaFromRequest(r)
	case http.MethodPost:
		// 2b. Parse request body for POST (JSON or YAML)
		criteria, err = ParseCriteriaFromBody(r.Body, r.Header.Get("Content-Type"))
		defer r.Body.Close()
	default:
		// Reject other methods with 405 Method Not Allowed
		w.Header().Set("Allow", "GET, POST")
		return
	}

	// 3. Validate criteria format; respond 400 with error details on failure
	if err := criteria.Validate(); err != nil {
		return
	}

	// 4. Validate against allowlists (if configured); respond 400 with
	//    the allowed values in the error details on failure
	if b.AllowLists != nil {
		if err := b.AllowLists.ValidateCriteria(criteria); err != nil {
			return
		}
	}

	// 5. Build recipe; respond 500 on failure
	recipe, err := b.BuildFromCriteria(r.Context(), criteria)
	if err != nil {
		return
	}

	// 6. Set cache headers
	w.Header().Set("Cache-Control", "public, max-age=600")

	// 7. Respond with JSON
	serializer.RespondJSON(w, http.StatusOK, recipe)
}
```

#### POST Request Body Format
POST requests accept a `RecipeCriteria` resource (Kubernetes-style):

```yaml
kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: my-criteria
spec:
  service: eks
  accelerator: gb200
  os: ubuntu
  intent: training
```

Supported content types:
- `application/json` – JSON format
- `application/x-yaml` – YAML format
#### Query Parameter Parsing
| Parameter | Type | Validation | Example |
|-----------|------|------------|--------|
| `service` | ServiceType | Enum: eks, gke, aks, oke, any | `service=eks` |
| `accelerator` | AcceleratorType | Enum: h100, gb200, a100, l40, any | `accelerator=h100` |
| `gpu` | AcceleratorType | Alias for accelerator | `gpu=h100` |
| `intent` | IntentType | Enum: training, inference, any | `intent=training` |
| `os` | OSType | Enum: ubuntu, rhel, cos, amazonlinux, any | `os=ubuntu` |
| `nodes` | int | >= 0 | `nodes=8` |
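The validation rules in the table can be sketched as a pair of helpers (the function names are illustrative; the enum values come from the table):

```go
package main

import (
	"fmt"
	"strconv"
)

// validateParam checks an enum parameter; "any" and the empty string are accepted.
func validateParam(name, value string, allowed []string) error {
	if value == "" || value == "any" {
		return nil
	}
	for _, a := range allowed {
		if value == a {
			return nil
		}
	}
	return fmt.Errorf("invalid %s: must be one of %v, any", name, allowed)
}

// validateNodes enforces the "nodes >= 0" rule from the table.
func validateNodes(value string) (int, error) {
	if value == "" {
		return 0, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid nodes: must be an integer >= 0")
	}
	return n, nil
}
```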
### Recipe Builder: `pkg/recipe/builder.go`
Shared with CLI - same logic as described in CLI architecture.
## API Endpoints
### Recipe Generation
**Endpoints**:
- `GET /v1/recipe` - Generate recipe from query parameters
- `POST /v1/recipe` - Generate recipe from criteria body
#### GET Method
**Query Parameters**:
- `service` - Kubernetes service type (eks, gke, aks, oke)
- `accelerator` - GPU/accelerator type (h100, gb200, a100, l40)
- `gpu` - Alias for accelerator (backwards compatibility)
- `intent` - Workload intent (training, inference)
- `os` - Operating system family (ubuntu, rhel, cos, amazonlinux)
- `nodes` - Number of GPU nodes (0 = any/unspecified)
#### POST Method
**Content Types**: `application/json`, `application/x-yaml`
**Request Body**: `RecipeCriteria` resource with kind, apiVersion, metadata, and spec fields.
```yaml
kind: RecipeCriteria
apiVersion: aicr.nvidia.com/v1alpha1
metadata:
  name: my-criteria
spec:
  service: eks
  accelerator: h100
  intent: training
```

**Response**: `200 OK`
```json
{
  "apiVersion": "aicr.nvidia.com/v1alpha1",
  "kind": "Recipe",
  "metadata": {
    "version": "v1.0.0",
    "created": "2025-12-25T12:00:00Z",
    "appliedOverlays": [
      "base",
      "eks",
      "eks-training",
      "gb200-eks-training"
    ]
  },
  "criteria": {
    "service": "eks",
    "accelerator": "gb200",
    "intent": "training",
    "os": "any"
  },
  "componentRefs": [
    {
      "name": "gpu-operator",
      "version": "v25.3.3",
      "order": 1
    }
  ],
  "constraints": {
    "driver": {
      "version": "580.82.07"
    }
  }
}
```

**Error Response**: `400 Bad Request`
```json
{
  "code": "INVALID_REQUEST",
  "message": "invalid gpu type: must be one of h100, gb200, a100, l40, any",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-25T12:00:00Z",
  "retryable": false
}
```

**Rate Limited**: `429 Too Many Requests`
```json
{
  "code": "RATE_LIMIT_EXCEEDED",
  "message": "Rate limit exceeded",
  "details": {
    "limit": 100,
    "burst": 200
  },
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2025-12-25T12:00:00Z",
  "retryable": true
}
```

**Headers**:
- `X-Request-Id` – Unique request identifier
- `X-RateLimit-Limit` – Total requests allowed per second
- `X-RateLimit-Remaining` – Requests remaining in current window
- `X-RateLimit-Reset` – Unix timestamp when window resets
- `Cache-Control` – Caching policy (`public, max-age=600`)
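The token-bucket behavior behind these rate-limit headers can be illustrated with a minimal stdlib-only sketch (the server itself uses `golang.org/x/time/rate`; this is a simplified stand-in):

```go
package main

import (
	"sync"
	"time"
)

// tokenBucket admits up to `burst` requests immediately and refills at
// `rate` tokens per second; a request with no token gets a 429.
type tokenBucket struct {
	mu     sync.Mutex
	tokens float64
	last   time.Time
	rate   float64 // tokens added per second
	burst  float64 // bucket capacity
}

func newTokenBucket(ratePerSec, burst float64) *tokenBucket {
	return &tokenBucket{tokens: burst, last: time.Now(), rate: ratePerSec, burst: burst}
}

// allow consumes one token if available.
func (b *tokenBucket) allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}
```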
### Health Check

**Endpoint**: `GET /health`

**Response**: `200 OK`

```json
{
  "status": "healthy",
  "timestamp": "2025-12-25T12:00:00Z"
}
```

### Readiness Check
**Endpoint**: `GET /ready`

**Response**: `200 OK` (ready) or `503 Service Unavailable` (not ready)

```json
{
  "status": "ready",
  "timestamp": "2025-12-25T12:00:00Z"
}
```

### Metrics
**Endpoint**: `GET /metrics`

**Response**: Prometheus text format

```
# HELP aicr_http_requests_total Total number of HTTP requests
# TYPE aicr_http_requests_total counter
aicr_http_requests_total{method="GET",path="/v1/recipe",status="200"} 1234

# HELP aicr_http_request_duration_seconds HTTP request latency in seconds
# TYPE aicr_http_request_duration_seconds histogram
aicr_http_request_duration_seconds_bucket{method="GET",path="/v1/recipe",le="0.005"} 1000
aicr_http_request_duration_seconds_sum{method="GET",path="/v1/recipe"} 12.34
aicr_http_request_duration_seconds_count{method="GET",path="/v1/recipe"} 1234

# HELP aicr_http_requests_in_flight Current number of HTTP requests being processed
# TYPE aicr_http_requests_in_flight gauge
aicr_http_requests_in_flight 5

# HELP aicr_rate_limit_rejects_total Total number of requests rejected due to rate limiting
# TYPE aicr_rate_limit_rejects_total counter
aicr_rate_limit_rejects_total 42

# HELP aicr_panic_recoveries_total Total number of panics recovered in HTTP handlers
# TYPE aicr_panic_recoveries_total counter
aicr_panic_recoveries_total 0
```

### Root
**Endpoint**: `GET /`

**Response**: `200 OK`

```json
{
  "service": "aicrd",
  "version": "v1.0.0",
  "routes": [
    "/v1/recipe"
  ]
}
```

## Usage Examples
### cURL Examples

```bash
# Basic recipe request
curl "http://localhost:8080/v1/recipe?os=ubuntu&gpu=h100"

# Full specification
curl "http://localhost:8080/v1/recipe?os=ubuntu&service=eks&accelerator=gb200&intent=training&nodes=8"

# With request ID
curl -H "X-Request-Id: 550e8400-e29b-41d4-a716-446655440000" \
  "http://localhost:8080/v1/recipe?os=ubuntu&gpu=h100"

# Health check
curl http://localhost:8080/health

# Readiness check
curl http://localhost:8080/ready

# Metrics
curl http://localhost:8080/metrics
```

## Demo API Server Deployment
> **Note**: This section describes the demonstration deployment of the `aicrd` API server for testing and development purposes only. It is not a production service. Users should self-host the `aicrd` API server in their own infrastructure for production use. See the Kubernetes Deployment section below for deployment guidance.
### Example: Google Cloud Run
The demo API server is deployed to Google Cloud Run as an example of how to deploy aicrd:
Demo Configuration:
- Platform: Google Cloud Run (fully managed serverless)
- Authentication: Public access (for demo purposes)
- Auto-scaling: 0-100 instances based on load
- Region: `us-west1`
CI/CD Pipeline (on-tag.yaml):
Supply Chain Security:
- SLSA Build Level 3 compliance
- Signed SBOMs in SPDX format
- Attestations logged in Rekor transparency log
- Verification: `gh attestation verify oci://ghcr.io/nvidia/aicrd:TAG --owner nvidia`
**Demo Monitoring**:
- Health endpoint: `/health`
- Readiness endpoint: `/ready`
- Prometheus metrics: `/metrics`
- Request tracing with `X-Request-Id` headers
Scaling Behavior (demo):
- Min instances: 0 (scales to zero when idle)
- Max instances: 100 (automatic scaling)
- Cold start: 2-3 seconds
- Request timeout: 30 seconds
- Concurrency: 80 requests per instance
Cloud Run Benefits (for reference):
- Zero operational overhead
- Automatic HTTPS with managed certificates
- Built-in DDoS protection
- Pay-per-use pricing (scales to zero)
- Global load balancing
## Client Libraries

**Go Client**:

```go
import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

func getRecipe(os, gpu string) (*Recipe, error) {
	baseURL := "http://localhost:8080/v1/recipe"
	params := url.Values{}
	params.Add("os", os)
	params.Add("gpu", gpu)

	resp, err := http.Get(baseURL + "?" + params.Encode())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %d", resp.StatusCode)
	}

	var recipe Recipe
	if err := json.NewDecoder(resp.Body).Decode(&recipe); err != nil {
		return nil, err
	}
	return &recipe, nil
}
```

**Python Client**:
```python
import requests

def get_recipe(os, gpu):
    url = "http://localhost:8080/v1/recipe"
    params = {"os": os, "gpu": gpu}
    response = requests.get(url, params=params)
    response.raise_for_status()
    return response.json()

# Usage
recipe = get_recipe("ubuntu", "h100")
print(f"Applied {len(recipe['metadata']['appliedOverlays'])} overlays")
```

## Kubernetes Deployment
### Deployment Manifest

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: aicrd
  template:
    metadata:
      labels:
        app: aicrd
    spec:
      containers:
      - name: server
        image: ghcr.io/nvidia/aicrd:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: PORT
          value: "8080"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  selector:
    app: aicrd
  ports:
  - name: http
    port: 80
    targetPort: http
  type: ClusterIP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  selector:
    matchLabels:
      app: aicrd
  endpoints:
  - port: http
    path: /metrics
    interval: 30s
```

### Ingress with TLS
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aicrd
  namespace: aicr-system
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - api.aicr.nvidia.com
    secretName: aicr-api-tls
  rules:
  - host: api.aicr.nvidia.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: aicrd
            port:
              number: 80
```

### HorizontalPodAutoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: aicrd
  namespace: aicr-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: aicrd
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: aicr_http_requests_in_flight
      target:
        type: AverageValue
        averageValue: "50"
```

## Performance Characteristics
### Throughput
- Rate Limit: 100 requests/second per instance (configurable)
- Burst: 200 requests (configurable)
- Target Latency: p50 <10ms, p99 <50ms
- Max Concurrent: Limited by rate limiter
### Resource Usage
- CPU: ~50m idle, ~200m at 100 req/s
- Memory: ~100MB baseline, ~200MB at peak
- Disk: None (stateless, embedded recipe data)
### Scalability
- Horizontal: Fully stateless, linear scaling
- Vertical: Recipe store cached in memory (sync.Once)
- Load Balancing: Round-robin or least-connections
### Caching Strategy
- Recipe Store: Loaded once per process, cached globally
- Client-Side: 10-minute cache via Cache-Control header (`max-age=600`)
- CDN: Recommended for public-facing deployments
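The once-per-process load noted above can be sketched with `sync.Once`; the store contents and names here are stand-ins, not the real package API:

```go
package main

import "sync"

// recipeStore lazily loads its data exactly once per process, even under
// concurrent access, mirroring the "loaded once, cached globally" strategy.
type recipeStore struct {
	once  sync.Once
	data  map[string]string
	loads int // counts actual loads, for illustration
}

func (s *recipeStore) load() {
	s.loads++
	// Stand-in for parsing the embedded recipe YAML.
	s.data = map[string]string{"eks/h100": "recipe-a"}
}

// Get triggers the one-time load on first use, then serves from memory.
func (s *recipeStore) Get(key string) (string, bool) {
	s.once.Do(s.load)
	v, ok := s.data[key]
	return v, ok
}
```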
## Error Handling

### Error Response Format

All errors follow a consistent JSON structure; `retryable` is `true` or `false` depending on the error code:

```json
{
  "code": "ERROR_CODE",
  "message": "Human-readable error message",
  "details": {"key": "value"},
  "requestId": "uuid",
  "timestamp": "2025-12-25T12:00:00Z",
  "retryable": false
}
```

### Error Codes
| Code | HTTP Status | Description | Retryable |
|---|---|---|---|
| `RATE_LIMIT_EXCEEDED` | 429 | Too many requests | Yes |
| `INVALID_REQUEST` | 400 | Invalid parameters or disallowed criteria value | No |
| `METHOD_NOT_ALLOWED` | 405 | Wrong HTTP method | No |
| `INTERNAL_ERROR` | 500 | Server error | Yes |
| `SERVICE_UNAVAILABLE` | 503 | Not ready | Yes |
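The table above can be captured in code as a lookup; this is a sketch, not the actual definitions in errors.go:

```go
package main

import "net/http"

// codeInfo records each error code's HTTP status and retryability, per the table.
type codeInfo struct {
	Status    int
	Retryable bool
}

var errorCodes = map[string]codeInfo{
	"RATE_LIMIT_EXCEEDED": {http.StatusTooManyRequests, true},
	"INVALID_REQUEST":     {http.StatusBadRequest, false},
	"METHOD_NOT_ALLOWED":  {http.StatusMethodNotAllowed, false},
	"INTERNAL_ERROR":      {http.StatusInternalServerError, true},
	"SERVICE_UNAVAILABLE": {http.StatusServiceUnavailable, true},
}
```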
Allowlist Validation Error Example:
When a request uses a criteria value not in the configured allowlist:
```json
{
  "code": "INVALID_REQUEST",
  "message": "accelerator type not allowed",
  "details": {
    "requested": "gb200",
    "allowed": ["h100", "l40"]
  },
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2026-01-27T12:00:00Z",
  "retryable": false
}
```

### Error Handling Strategy
- Validation Errors: Return 400 with specific error message
- Rate Limiting: Return 429 with Retry-After header
- Panics: Recover, log, return 500
- Context Cancellation: Return early, cleanup resources
- Resource Exhaustion: Rate limiting prevents this
## Security

### Attack Mitigation
Rate Limiting:
- Token bucket algorithm prevents abuse
- Per-instance limit (shared across all clients)
- Configurable limits and burst
Header Attacks:
- 64KB header size limit
- 5-second header read timeout
- Prevents slowloris attacks
Resource Exhaustion:
- Request timeouts (read, write, idle)
- In-flight request limits
- Graceful shutdown prevents connection drops
Input Validation:
- Strict enum validation
- Version string parsing with bounds
- UUID validation for request IDs
### Production Considerations
TLS:
- Use reverse proxy (nginx, Envoy) for TLS termination
- Or add TLS support to server (future enhancement)
Authentication:
- Add API key middleware (future enhancement)
- Or use service mesh mTLS (Istio, Linkerd)
Authorization:
- Currently none (public API)
- Could add rate limits per API key
Monitoring:
- Prometheus metrics for observability
- Request ID tracking for distributed tracing
- Structured logging for debugging
## Monitoring & Observability

### Prometheus Metrics

**Request Metrics**:
- `aicr_http_requests_total` – Total requests by method, path, status
- `aicr_http_request_duration_seconds` – Request latency histogram
- `aicr_http_requests_in_flight` – Current active requests
**Error Metrics**:
- `aicr_rate_limit_rejects_total` – Rate limit rejections
- `aicr_panic_recoveries_total` – Panic recoveries
### Grafana Dashboard

Example queries:

```
# Request rate
rate(aicr_http_requests_total[5m])

# Error rate
rate(aicr_http_requests_total{status=~"5.."}[5m])

# Latency percentiles
histogram_quantile(0.99, rate(aicr_http_request_duration_seconds_bucket[5m]))

# Rate limit rejections
rate(aicr_rate_limit_rejects_total[5m])
```

### Alerting Rules
```yaml
groups:
- name: aicrd
  rules:
  - alert: HighErrorRate
    expr: rate(aicr_http_requests_total{status=~"5.."}[5m]) > 0.05
    for: 5m
    annotations:
      summary: High error rate on aicrd
  - alert: HighLatency
    expr: histogram_quantile(0.99, rate(aicr_http_request_duration_seconds_bucket[5m])) > 0.1
    for: 5m
    annotations:
      summary: High latency on aicrd
  - alert: HighRateLimitRejects
    expr: rate(aicr_rate_limit_rejects_total[5m]) > 10
    for: 5m
    annotations:
      summary: High rate limit rejections
```

### Distributed Tracing
Request ID tracking enables correlation:
- Client sends request with `X-Request-Id` header
- Server logs all operations with request ID
- Response includes the same `X-Request-Id`
- Client can correlate logs across services
Future: OpenTelemetry integration for full tracing
## Testing Strategy

### Unit Tests
- Handler validation logic
- Middleware functionality
- Error response formatting
- Query parsing
### Integration Tests
- Full HTTP request/response cycle
- Rate limiting behavior
- Graceful shutdown
- Health/ready endpoints
### Load Tests
- Sustained load at rate limit
- Burst handling
- Latency under load
- Memory stability
### Example Test

```go
func TestRecipeHandler(t *testing.T) {
	// Create test server
	builder := recipe.NewBuilder()
	handler := builder.HandleRecipes

	// Create test request
	req := httptest.NewRequest(
		"GET",
		"/v1/recipe?os=ubuntu&gpu=h100",
		nil,
	)
	w := httptest.NewRecorder()

	// Execute handler
	handler(w, req)

	// Verify response
	assert.Equal(t, http.StatusOK, w.Code)

	var resp recipe.Recipe
	err := json.Unmarshal(w.Body.Bytes(), &resp)
	assert.NoError(t, err)
	assert.Equal(t, "ubuntu", resp.Criteria.OS)
}
```

## Dependencies
### External Libraries
- `net/http` – Standard HTTP server
- `golang.org/x/time/rate` – Rate limiting
- `golang.org/x/sync/errgroup` – Concurrent error handling
- `github.com/prometheus/client_golang` – Prometheus metrics
- `github.com/google/uuid` – UUID generation
- `gopkg.in/yaml.v3` – Recipe store parsing
- `log/slog` – Structured logging

### Internal Packages
- `pkg/recipe` – Recipe building logic
- `pkg/measurement` – Data model
- `pkg/version` – Semantic versioning
- `pkg/serializer` – JSON response formatting
- `pkg/logging` – Logging configuration
## Build & Deployment

### Automated CI/CD Pipeline
Production builds are automated through GitHub Actions workflows. When a semantic version tag is pushed (e.g., v0.8.12), the on-tag.yaml workflow:
- Validates code with Go CI (tests + linting)
- Builds multi-platform binaries and container images with GoReleaser and ko
- Generates SBOMs (SPDX for binaries and for containers)
- Attests images with SLSA v1.0 provenance and SBOM attestations
- Deploys to Google Cloud Run with Workload Identity Federation
Supply Chain Security:
- SLSA Build Level 3 compliance
- Cosign keyless signing with Fulcio + Rekor
- GitHub Attestation API for provenance
- Multi-platform builds: darwin/linux × amd64/arm64
**Verify Release Artifacts**:

```bash
# Get latest release tag
export TAG=$(curl -s https://api.github.com/repos/NVIDIA/aicr/releases/latest | jq -r '.tag_name')

# Verify attestations
gh attestation verify oci://ghcr.io/nvidia/aicrd:${TAG} --owner nvidia
```

For detailed CI/CD architecture, see CONTRIBUTING.md and the Architecture Overview.
### Local Build Configuration

For local development and testing:

```make
VERSION ?= $(shell git describe --tags --always --dirty)
COMMIT ?= $(shell git rev-parse --short HEAD)
DATE ?= $(shell date -u +%Y-%m-%dT%H:%M:%SZ)

LDFLAGS := -X github.com/NVIDIA/aicr/pkg/api.version=$(VERSION)
LDFLAGS += -X github.com/NVIDIA/aicr/pkg/api.commit=$(COMMIT)
LDFLAGS += -X github.com/NVIDIA/aicr/pkg/api.date=$(DATE)

go build -ldflags="$(LDFLAGS)" -o bin/aicrd ./cmd/aicrd
```

### Container Image
Production images are built with ko (automated in CI/CD). For local development:

```dockerfile
FROM golang:1.26-alpine AS builder
WORKDIR /app
COPY . .
RUN go build -ldflags="-X github.com/NVIDIA/aicr/pkg/api.version=v1.0.0" \
    -o /bin/aicrd ./cmd/aicrd

FROM alpine:3.19
RUN apk --no-cache add ca-certificates
COPY --from=builder /bin/aicrd /usr/local/bin/
EXPOSE 8080
ENTRYPOINT ["aicrd"]
```

**Note**: Production images use a distroless base (`gcr.io/distroless/static`) for minimal attack surface.
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `PORT` | 8080 | Server port |
| `AICR_ALLOWED_ACCELERATORS` | (none) | Comma-separated list of allowed GPU types (e.g., `h100,l40`). If not set, all types allowed. |
| `AICR_ALLOWED_SERVICES` | (none) | Comma-separated list of allowed K8s services (e.g., `eks,gke`). If not set, all services allowed. |
| `AICR_ALLOWED_INTENTS` | (none) | Comma-separated list of allowed intents (e.g., `training`). If not set, all intents allowed. |
| `AICR_ALLOWED_OS` | (none) | Comma-separated list of allowed OS types (e.g., `ubuntu,rhel`). If not set, all OS types allowed. |
Criteria Allowlists:
When allowlist environment variables are configured, the API server validates incoming requests against the allowed values. This enables operators to restrict the API to specific configurations.
```bash
# Start server with restricted accelerators
export AICR_ALLOWED_ACCELERATORS=h100,l40
export AICR_ALLOWED_SERVICES=eks,gke
./aicrd

# Server logs on startup:
# INFO criteria allowlists configured accelerators=2 services=2 intents=0 os_types=0
# DEBUG criteria allowlists loaded accelerators=["h100","l40"] services=["eks","gke"] intents=[] os_types=[]
```

**Validation behavior**:
- Requests with disallowed values return HTTP 400 with error details
- The `any` value is always allowed regardless of allowlist
- Both `/v1/recipe` and `/v1/bundle` endpoints enforce allowlists
- CLI (`aicr`) is not affected by allowlists
## Future Enhancements

### Short-Term (< 3 months)

#### Authentication & Authorization

**Rationale**: Protect API from unauthorized access, enable usage tracking

**Implementation**: API key middleware with HMAC-SHA256 verification

**Example**:

```go
func APIKeyMiddleware(validKeys map[string]string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := r.Header.Get("X-API-Key")
			if _, ok := validKeys[key]; !ok {
				http.Error(w, "Invalid API key", http.StatusUnauthorized)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}
```

Reference: HTTP Authentication
#### CORS Support

**Use Case**: Enable browser-based clients (web dashboards)

**Implementation**: `rs/cors` middleware with configurable origins

**Configuration**:

```go
c := cors.New(cors.Options{
	AllowedOrigins:   []string{"https://dashboard.example.com"},
	AllowedMethods:   []string{"GET", "POST", "OPTIONS"},
	AllowedHeaders:   []string{"Content-Type", "X-API-Key"},
	AllowCredentials: true,
	MaxAge:           86400, // 24 hours
})
handler := c.Handler(mux)
```

Reference: CORS Specification
#### Response Compression

**Benefit**: Reduce bandwidth by 70-80% for JSON responses

**Implementation**: `gziphandler` middleware with quality threshold

```go
import "github.com/NYTimes/gziphandler"

handler := gziphandler.GzipHandler(mux) // Only compresses responses > 1KB
```

**Trade-off**: CPU usage (+5-10%) vs bandwidth savings

Reference: gziphandler

#### Native TLS Support
**Rationale**: Eliminate need for reverse proxy in simple deployments

**Implementation**: `http.ListenAndServeTLS` with Let's Encrypt integration

```go
import "golang.org/x/crypto/acme/autocert"

m := &autocert.Manager{
	Prompt:     autocert.AcceptTOS,
	Cache:      autocert.DirCache("/var/cache/aicr"),
	HostPolicy: autocert.HostWhitelist("api.example.com"),
}
srv := &http.Server{
	Addr:      ":https",
	TLSConfig: m.TLSConfig(),
	Handler:   handler,
}
srv.ListenAndServeTLS("", "")
```

Reference: autocert Package
#### API Versioning

**Use Case**: Support a /v2 API with breaking changes while maintaining /v1

**Pattern**: URL-based versioning with version-specific handlers

```go
v1 := http.NewServeMux()
v1.HandleFunc("/recipe", handleRecipeV1)

v2 := http.NewServeMux()
v2.HandleFunc("/recipe", handleRecipeV2)

mux := http.NewServeMux()
mux.Handle("/v1/", http.StripPrefix("/v1", v1))
mux.Handle("/v2/", http.StripPrefix("/v2", v2))
```

Reference: API Versioning Best Practices
### Mid-Term (3-6 months)

#### OpenTelemetry Integration

**Use Case**: Distributed tracing across services

**Implementation**: OTLP exporter with automatic instrumentation

```go
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/trace"
)

func initTracer() (*trace.TracerProvider, error) {
	exporter, err := otlptracehttp.New(context.Background(),
		otlptracehttp.WithEndpoint("otel-collector:4318"),
		otlptracehttp.WithInsecure(),
	)
	if err != nil {
		return nil, err
	}
	tp := trace.NewTracerProvider(
		trace.WithBatcher(exporter),
		trace.WithResource(/* service name */),
	)
	otel.SetTracerProvider(tp)
	return tp, nil
}
```

Reference: OpenTelemetry Go
#### Recipe Caching

**Benefit**: 95%+ cache hit rate for repeated queries

**Implementation**: Redis with TTL, fallback to recipe builder

```go
import "github.com/redis/go-redis/v9"

func getRecipe(ctx context.Context, key string) (*recipe.Recipe, error) {
	// Try cache first
	cached, err := rdb.Get(ctx, key).Result()
	if err == nil {
		var r recipe.Recipe
		json.Unmarshal([]byte(cached), &r)
		return &r, nil
	}

	// Cache miss - build recipe
	r, err := builder.BuildRecipe(ctx, params)
	if err != nil {
		return nil, err
	}

	// Cache with 1 hour TTL
	data, _ := json.Marshal(r)
	rdb.Set(ctx, key, data, time.Hour)
	return r, nil
}
```

Reference: go-redis
#### GraphQL API

**Rationale**: Enable clients to request only needed fields

**Implementation**: `graphql-go` with recipe schema

```graphql
type Query {
  recipe(
    os: String!
    osVersion: String
    gpu: String!
    service: String
  ): Recipe
}

type Recipe {
  request: RequestInfo!
  measurements: [Measurement!]!
  context: RecipeContext
}
```

**Trade-off**: Added complexity vs flexible querying

Reference: GraphQL Go
### Long-Term (6-12 months)

#### gRPC Support

**Benefit**: 5-10x better performance, smaller payloads

**Implementation**: Protobuf definition with streaming support

```protobuf
service RecipeService {
  rpc GetRecipe(RecipeRequest) returns (Recipe);
  rpc StreamRecipes(stream RecipeRequest) returns (stream Recipe);
  rpc GetSnapshot(SnapshotRequest) returns (Snapshot);
}

message RecipeRequest {
  string os = 1;
  string os_version = 2;
  string gpu = 3;
  string service = 4;
}
```

**Deployment**: Run HTTP/2 and gRPC on the same port with `cmux`

Reference: gRPC Go

#### Multi-Tenancy
**Use Case**: SaaS deployment with per-customer isolation

**Implementation**: Tenant ID from API key, separate rate limits

```go
type TenantRateLimiter struct {
	limiters map[string]*rate.Limiter
	mu       sync.RWMutex
}

func (t *TenantRateLimiter) Allow(tenantID string) bool {
	t.mu.RLock()
	limiter, exists := t.limiters[tenantID]
	t.mu.RUnlock()
	if !exists {
		t.mu.Lock()
		limiter = rate.NewLimiter(rate.Limit(100), 200) // Per-tenant
		t.limiters[tenantID] = limiter
		t.mu.Unlock()
	}
	return limiter.Allow()
}
```

**Database**: Separate recipe stores per tenant
#### Admin API

**Use Case**: Runtime configuration updates without restart

**Endpoints**:
- `POST /admin/config/rate-limit` – Update rate limits
- `POST /admin/config/log-level` – Change log verbosity
- `GET /admin/debug/pprof` – CPU/memory profiling
- `POST /admin/cache/flush` – Clear recipe cache

**Security**: Separate admin API key with IP allowlist
#### Feature Flags

**Rationale**: A/B testing, gradual rollouts, instant rollback

**Implementation**: LaunchDarkly or custom flag service

```go
import "github.com/launchdarkly/go-server-sdk/v7"

func handleRecipe(w http.ResponseWriter, r *http.Request) {
	user := ldclient.NewUser(getUserID(r))

	// Check feature flag
	if client.BoolVariation("use-optimized-builder", user, false) {
		// Use new optimized recipe builder
		recipe = optimizedBuilder.Build(params)
	} else {
		// Fall back to stable builder
		recipe = stableBuilder.Build(params)
	}
}
```

Reference: LaunchDarkly Go SDK
## Production Deployment Patterns

### Pattern 1: Kubernetes with Horizontal Pod Autoscaler

**Use Case**: Auto-scale API servers based on request rate
Deployment Manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
name: aicrd
namespace: aicr
spec:
replicas: 3 # Initial replicas
selector:
matchLabels:
app: aicrd
template:
metadata:
labels:
app: aicrd
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
serviceAccountName: aicrd
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: api-server
image: ghcr.io/nvidia/aicrd:latest # Or use specific tag like v0.8.12
ports:
- name: http
containerPort: 8080
protocol: TCP
- name: metrics
containerPort: 9090
protocol: TCP
env:
- name: PORT
value: "8080"
- name: LOG_LEVEL
value: "info"
# Criteria allowlists (optional - omit to allow all values)
- name: AICR_ALLOWED_ACCELERATORS
value: "h100,l40,a100"
- name: AICR_ALLOWED_SERVICES
value: "eks,gke,aks"
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 10
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 2
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumeMounts:
- name: recipes
mountPath: /etc/aicr/recipes
readOnly: true
- name: tmp
mountPath: /tmp
volumes:
- name: recipes
configMap:
name: aicr-recipes
- name: tmp
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: aicrd
namespace: aicr
spec:
type: ClusterIP
ports:
- name: http
port: 80
targetPort: http
protocol: TCP
selector:
app: aicrd
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: aicrd-hpa
namespace: aicr
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: aicrd
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "100"
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100 # Double pods
periodSeconds: 60
- type: Pods
value: 4 # Add 4 pods
periodSeconds: 60
selectPolicy: Max
scaleDown:
stabilizationWindowSeconds: 300 # 5 min cooldown
policies:
- type: Percent
value: 50 # Remove 50% of pods
periodSeconds: 60

Ingress with TLS:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: aicrd
namespace: aicr
annotations:
cert-manager.io/cluster-issuer: "letsencrypt-prod"
nginx.ingress.kubernetes.io/limit-rps: "20"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- api.aicr.example.com
secretName: aicr-api-tls
rules:
- host: api.aicr.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: aicrd
port:
name: http

Pattern 2: Service Mesh with mTLS
Use Case: Zero-trust security with automatic mTLS encryption
Istio VirtualService:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: aicrd
namespace: aicr
spec:
hosts:
- aicrd.aicr.svc.cluster.local
- api.aicr.example.com
gateways:
- aicr-gateway
http:
- match:
- uri:
prefix: /v1/recipe
route:
- destination:
host: aicrd
port:
number: 80
timeout: 10s
retries:
attempts: 3
perTryTimeout: 3s
retryOn: 5xx,reset,connect-failure
headers:
response:
add:
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Strict-Transport-Security: max-age=31536000
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: aicrd
namespace: aicr
spec:
host: aicrd
trafficPolicy:
tls:
mode: ISTIO_MUTUAL # mTLS between services
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 50
http2MaxRequests: 100
maxRequestsPerConnection: 2
outlierDetection:
consecutiveErrors: 5
interval: 30s
baseEjectionTime: 30s
maxEjectionPercent: 50
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
name: aicrd
namespace: aicr
spec:
selector:
matchLabels:
app: aicrd
mtls:
mode: STRICT # Require mTLS
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: aicrd
namespace: aicr
spec:
selector:
matchLabels:
app: aicrd
action: ALLOW
rules:
- from:
- source:
namespaces: ["aicr", "monitoring"]
to:
- operation:
methods: ["GET", "POST"]
paths: ["/v1/*", "/health", "/metrics"]

Pattern 3: Load Balancer with Health Checks
Use Case: Bare-metal deployment with HAProxy
HAProxy Configuration:
global
log /dev/log local0
maxconn 4096
user haproxy
group haproxy
daemon
defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5s
timeout client 30s
timeout server 30s
retries 3
option redispatch
frontend aicr_api_frontend
bind *:443 ssl crt /etc/ssl/certs/aicr-api.pem
bind *:80
redirect scheme https if !{ ssl_fc }
# Rate limiting
stick-table type ip size 100k expire 30s store http_req_rate(10s)
http-request track-sc0 src
http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }
# Security headers
http-response set-header Strict-Transport-Security "max-age=31536000"
http-response set-header X-Content-Type-Options "nosniff"
default_backend aicr_api_backend
backend aicr_api_backend
balance roundrobin
option httpchk GET /health
http-check expect status 200
server api1 10.0.1.10:8080 check inter 10s fall 3 rise 2 maxconn 100
server api2 10.0.1.11:8080 check inter 10s fall 3 rise 2 maxconn 100
server api3 10.0.1.12:8080 check inter 10s fall 3 rise 2 maxconn 100

Pattern 4: Blue-Green Deployment
Use Case: Zero-downtime updates with instant rollback
Kubernetes Service Switching:
#!/bin/bash
# Blue-green deployment script
set -euo pipefail
NAMESPACE=aicr
APP=aicrd
NEW_VERSION=$1
# Deploy green version
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: ${APP}-green
namespace: ${NAMESPACE}
spec:
replicas: 3
selector:
matchLabels:
app: ${APP}
version: green
template:
metadata:
labels:
app: ${APP}
version: green
spec:
containers:
- name: api-server
image: ghcr.io/nvidia/${APP}:${NEW_VERSION}
# ... same spec as blue ...
EOF
# Wait for green to be ready
kubectl rollout status deployment/${APP}-green -n ${NAMESPACE}
# Run smoke tests
# (assumes a ${APP}-green Service exists and this script runs inside the cluster)
GREEN_IP=$(kubectl get svc ${APP}-green -n ${NAMESPACE} -o jsonpath='{.spec.clusterIP}')
curl -f "http://${GREEN_IP}/health" || (echo "Health check failed" && exit 1)
curl -f "http://${GREEN_IP}/v1/recipe?os=ubuntu&gpu=h100" || (echo "Recipe test failed" && exit 1)
# Switch service to green
kubectl patch service ${APP} -n ${NAMESPACE} -p '{"spec":{"selector":{"version":"green"}}}'
echo "Switched to green (${NEW_VERSION})"
echo "Monitor for 10 minutes, then delete blue deployment"
echo "Rollback: kubectl patch service ${APP} -n ${NAMESPACE} -p '{\"spec\":{\"selector\":{\"version\":\"blue\"}}}'"
# Optional: Auto-delete blue after monitoring period
# sleep 600
# kubectl delete deployment ${APP}-blue -n ${NAMESPACE}

Reliability Patterns
Circuit Breaker
Use Case: Prevent cascading failures when recipe store is slow
Implementation:
import "github.com/sony/gobreaker"
var (
recipeStoreBreaker *gobreaker.CircuitBreaker
)
func init() {
settings := gobreaker.Settings{
Name: "RecipeStore",
MaxRequests: 3, // Half-open state allows 3 requests
Interval: 60 * time.Second, // Reset counts every 60s
Timeout: 30 * time.Second, // Stay open for 30s
ReadyToTrip: func(counts gobreaker.Counts) bool {
failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
return counts.Requests >= 10 && failureRatio >= 0.6
},
OnStateChange: func(name string, from gobreaker.State, to gobreaker.State) {
log.Info("Circuit breaker state changed",
"name", name,
"from", from,
"to", to,
)
},
}
recipeStoreBreaker = gobreaker.NewCircuitBreaker(settings)
}
func handleRecipe(w http.ResponseWriter, r *http.Request) {
result, err := recipeStoreBreaker.Execute(func() (interface{}, error) {
return buildRecipe(r.Context(), params)
})
if err != nil {
if errors.Is(err, gobreaker.ErrOpenState) {
http.Error(w, "Service temporarily unavailable", http.StatusServiceUnavailable)
return
}
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
rec := result.(*recipe.Recipe) // avoid shadowing the recipe package
json.NewEncoder(w).Encode(rec)
}

Reference: gobreaker
Bulkhead Pattern
Use Case: Isolate resources for different endpoints
Implementation:
import "golang.org/x/sync/semaphore"
var (
// Separate semaphores for different endpoints
recipeSem = semaphore.NewWeighted(100) // 100 concurrent recipe requests
bundleSem = semaphore.NewWeighted(10) // 10 concurrent bundle requests
)
func handleRecipeWithBulkhead(w http.ResponseWriter, r *http.Request) {
// Acquire from recipe bulkhead
if !recipeSem.TryAcquire(1) {
http.Error(w, "Too many requests", http.StatusTooManyRequests)
return
}
defer recipeSem.Release(1)
// Process request
handleRecipe(w, r)
}
func handleBundleWithBulkhead(w http.ResponseWriter, r *http.Request) {
// Acquire from bundle bulkhead (more expensive operation)
if !bundleSem.TryAcquire(1) {
http.Error(w, "Too many requests", http.StatusTooManyRequests)
return
}
defer bundleSem.Release(1)
handleBundle(w, r)
}

Benefit: Recipe slowness doesn't affect the bundle endpoint
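If pulling in golang.org/x/sync is undesirable, the same bulkhead idea can be expressed with a plain buffered channel, whose capacity bounds concurrency. A sketch, equivalent in effect to the semaphore version above:

```go
package main

import (
	"fmt"
	"net/http"
)

// slots is a bulkhead: the channel's capacity bounds how many
// requests may be in flight for this endpoint at once.
var slots = make(chan struct{}, 100)

func withBulkhead(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		select {
		case slots <- struct{}{}: // acquired a slot
			defer func() { <-slots }() // release on return
			next(w, r)
		default: // bulkhead full: shed load instead of queueing
			http.Error(w, "Too many requests", http.StatusTooManyRequests)
		}
	}
}

func main() {
	fmt.Println(cap(slots)) // prints 100
}
```

The non-blocking `select` matches `TryAcquire` semantics: a full bulkhead rejects immediately rather than queueing requests behind a slow endpoint.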
Retry with Exponential Backoff
Use Case: Resilient calls to external APIs (recipe store, etc.)
Implementation:
import "github.com/cenkalti/backoff/v4"
func fetchRecipeWithRetry(ctx context.Context, key string) (*recipe.Recipe, error) {
var r *recipe.Recipe
operation := func() error {
var err error
r, err = recipeStore.Get(ctx, key)
// Don't retry on 404
if errors.Is(err, ErrNotFound) {
return backoff.Permanent(err)
}
return err
}
// Exponential backoff: 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s
bo := backoff.NewExponentialBackOff()
bo.InitialInterval = 100 * time.Millisecond
bo.MaxInterval = 5 * time.Second
bo.MaxElapsedTime = 30 * time.Second
err := backoff.Retry(operation, backoff.WithContext(bo, ctx))
return r, err
}

Reference: backoff
Graceful Degradation
Use Case: Serve stale/cached data when primary source fails
Implementation:
var (
recipeCacheTTL = 1 * time.Hour
recipeCache = sync.Map{}
)
type cachedRecipe struct {
recipe *recipe.Recipe
timestamp time.Time
}
func handleRecipeWithFallback(w http.ResponseWriter, r *http.Request) {
key := buildCacheKey(r)
// Try primary source
recipe, err := buildRecipe(r.Context(), params)
if err == nil {
// Cache successful response
recipeCache.Store(key, cachedRecipe{
recipe: recipe,
timestamp: time.Now(),
})
json.NewEncoder(w).Encode(recipe)
return
}
// Primary failed - try cache
if cached, ok := recipeCache.Load(key); ok {
cr := cached.(cachedRecipe)
age := time.Since(cr.timestamp)
log.Warn("Serving stale recipe",
"key", key,
"age", age,
"error", err,
)
w.Header().Set("X-Cache", "stale")
w.Header().Set("X-Cache-Age", age.String())
json.NewEncoder(w).Encode(cr.recipe)
return
}
// No cache available
http.Error(w, "Service unavailable", http.StatusServiceUnavailable)
}

Performance Optimization
Connection Pooling
HTTP Client with Keep-Alive:
var httpClient = &http.Client{
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
DisableCompression: false,
ForceAttemptHTTP2: true,
},
Timeout: 10 * time.Second,
}
// Reuse client for all outbound requests
resp, err := httpClient.Get("https://recipe-store.example.com/recipes")

Response Caching
In-Memory Cache with TTL:
import "github.com/patrickmn/go-cache"
var (
responseCache = cache.New(5*time.Minute, 10*time.Minute)
)
func handleRecipeWithCache(w http.ResponseWriter, r *http.Request) {
key := buildCacheKey(r)
// Check cache
if cached, found := responseCache.Get(key); found {
w.Header().Set("X-Cache", "hit")
w.Header().Set("Content-Type", "application/json")
w.Write(cached.([]byte))
return
}
// Cache miss - build recipe
recipe, err := buildRecipe(r.Context(), params)
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
// Serialize and cache
data, _ := json.Marshal(recipe)
responseCache.Set(key, data, cache.DefaultExpiration)
w.Header().Set("X-Cache", "miss")
w.Header().Set("Content-Type", "application/json")
w.Write(data)
}

Request Coalescing
Deduplicate Concurrent Identical Requests:
import "golang.org/x/sync/singleflight"
var requestGroup singleflight.Group
func handleRecipeWithCoalescing(w http.ResponseWriter, r *http.Request) {
key := buildCacheKey(r)
// Deduplicate requests with same key
result, err, shared := requestGroup.Do(key, func() (interface{}, error) {
return buildRecipe(r.Context(), params)
})
if shared {
w.Header().Set("X-Request-Coalesced", "true")
}
if err != nil {
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
json.NewEncoder(w).Encode(result)
}

Benefit: 10 concurrent identical requests = 1 recipe build
Memory Profiling
Enable the pprof endpoint in the server (Go):
import _ "net/http/pprof"
go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()

Capture and analyze a heap profile (shell):
# Capture heap profile
curl http://localhost:6060/debug/pprof/heap > heap.prof
# Analyze
go tool pprof heap.prof
(pprof) top10
(pprof) list buildRecipe
# Check for memory leaks:
# compare two profiles taken 5 minutes apart
go tool pprof -base heap1.prof heap2.prof
(pprof) top10 # Shows allocations between profiles

Security Hardening
Rate Limiting Per IP
import "golang.org/x/time/rate"
type ipRateLimiter struct {
limiters map[string]*rate.Limiter
mu sync.RWMutex
rate rate.Limit
burst int
}
func newIPRateLimiter(r rate.Limit, b int) *ipRateLimiter {
return &ipRateLimiter{
limiters: make(map[string]*rate.Limiter),
rate: r,
burst: b,
}
}
func (i *ipRateLimiter) getLimiter(ip string) *rate.Limiter {
i.mu.RLock()
limiter, exists := i.limiters[ip]
i.mu.RUnlock()
if !exists {
i.mu.Lock()
// Re-check under the write lock: another goroutine may have
// created the limiter between RUnlock and Lock
if limiter, exists = i.limiters[ip]; !exists {
limiter = rate.NewLimiter(i.rate, i.burst)
i.limiters[ip] = limiter
// Cleanup old limiters (simple implementation)
if len(i.limiters) > 10000 {
i.limiters = make(map[string]*rate.Limiter)
i.limiters[ip] = limiter // keep the limiter just created
}
}
i.mu.Unlock()
}
return limiter
}
func (i *ipRateLimiter) middleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
ip := getClientIP(r)
limiter := i.getLimiter(ip)
if !limiter.Allow() {
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
return
}
next.ServeHTTP(w, r)
})
}
func getClientIP(r *http.Request) string {
// Check X-Forwarded-For header (behind proxy)
xff := r.Header.Get("X-Forwarded-For")
if xff != "" {
ips := strings.Split(xff, ",")
return strings.TrimSpace(ips[0])
}
// Fall back to RemoteAddr
ip, _, _ := net.SplitHostPort(r.RemoteAddr)
return ip
}

Input Validation
import "github.com/go-playground/validator/v10"
var validate = validator.New()
type RecipeRequest struct {
OS string `validate:"required,oneof=ubuntu rhel cos"`
OSVersion string `validate:"omitempty,semver"`
GPU string `validate:"required,oneof=h100 gb200 a100 l40"`
Service string `validate:"omitempty,oneof=eks gke aks self-managed"`
}
func handleRecipe(w http.ResponseWriter, r *http.Request) {
req := RecipeRequest{
OS: r.URL.Query().Get("os"),
OSVersion: r.URL.Query().Get("osv"),
GPU: r.URL.Query().Get("gpu"),
Service: r.URL.Query().Get("service"),
}
if err := validate.Struct(req); err != nil {
validationErrors := err.(validator.ValidationErrors)
http.Error(w, validationErrors.Error(), http.StatusBadRequest)
return
}
// Proceed with validated input
}

Security Headers Middleware
func securityHeadersMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// HSTS
w.Header().Set("Strict-Transport-Security",
"max-age=31536000; includeSubDomains; preload")
// Prevent MIME sniffing
w.Header().Set("X-Content-Type-Options", "nosniff")
// Prevent clickjacking
w.Header().Set("X-Frame-Options", "DENY")
// XSS protection
w.Header().Set("X-XSS-Protection", "1; mode=block")
// CSP
w.Header().Set("Content-Security-Policy",
"default-src 'none'; script-src 'self'; connect-src 'self'; img-src 'self'; style-src 'self';")
// Referrer policy
w.Header().Set("Referrer-Policy", "strict-origin-when-cross-origin")
next.ServeHTTP(w, r)
})
}

Observability
Custom Metrics
import "github.com/prometheus/client_golang/prometheus"
var (
recipeBuildDuration = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "aicr_recipe_build_duration_seconds",
Help: "Time to build recipe",
Buckets: prometheus.ExponentialBuckets(0.001, 2, 12), // 1ms to 4s
},
[]string{"os", "gpu", "service"},
)
recipeCacheHits = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "aicr_recipe_cache_hits_total",
Help: "Number of recipe cache hits",
},
[]string{"cache_type"},
)
activeConnections = prometheus.NewGauge(
prometheus.GaugeOpts{
Name: "aicr_active_connections",
Help: "Number of active HTTP connections",
},
)
)
func init() {
prometheus.MustRegister(
recipeBuildDuration,
recipeCacheHits,
activeConnections,
)
}
func handleRecipe(w http.ResponseWriter, r *http.Request) {
start := time.Now()
defer func() {
duration := time.Since(start).Seconds()
recipeBuildDuration.WithLabelValues(
params.OS,
params.GPU,
params.Service,
).Observe(duration)
}()
// Check cache
if cached, found := cache.Get(key); found {
recipeCacheHits.WithLabelValues("memory").Inc()
// ...
}
// Build recipe
// ...
}

Structured Logging with Context
import "log/slog"
func handleRecipe(w http.ResponseWriter, r *http.Request) {
start := time.Now()
// Create logger with request context
logger := slog.With(
"request_id", r.Header.Get("X-Request-ID"),
"remote_addr", r.RemoteAddr,
"user_agent", r.UserAgent(),
)
logger.Info("Handling recipe request",
"os", params.OS,
"gpu", params.GPU,
)
recipe, err := buildRecipe(r.Context(), params)
if err != nil {
logger.Error("Failed to build recipe",
"error", err,
"params", params,
)
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
logger.Info("Recipe built successfully",
"measurement_count", len(recipe.Measurements),
"duration_ms", time.Since(start).Milliseconds(),
)
json.NewEncoder(w).Encode(recipe)
}

Distributed Tracing
import (
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
"go.opentelemetry.io/otel/trace"
)
func handleRecipe(w http.ResponseWriter, r *http.Request) {
ctx := r.Context()
tracer := otel.Tracer("aicrd")
ctx, span := tracer.Start(ctx, "handleRecipe",
trace.WithAttributes(
attribute.String("os", params.OS),
attribute.String("gpu", params.GPU),
),
)
defer span.End()
// Propagate context to child operations
recipe, err := buildRecipeWithTrace(ctx, params)
if err != nil {
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
http.Error(w, err.Error(), http.StatusInternalServerError)
return
}
span.SetAttributes(
attribute.Int("measurement_count", len(recipe.Measurements)),
)
json.NewEncoder(w).Encode(recipe)
}
func buildRecipeWithTrace(ctx context.Context, params Params) (*recipe.Recipe, error) {
tracer := otel.Tracer("aicrd")
ctx, span := tracer.Start(ctx, "buildRecipe")
defer span.End()
// Build recipe with traced context
return builder.Build(ctx, params)
}

References
Official Documentation
- net/http Package - Go standard HTTP library
- golang.org/x/time/rate - Token bucket rate limiter
- errgroup - Concurrent error handling
- context Package - Request cancellation and deadlines
- slog Package - Structured logging
Production Patterns
- Kubernetes Patterns - Deployment, scaling, networking
- Twelve-Factor App - Cloud-native application principles
- Google SRE Book - Site reliability engineering
- Release Engineering - Deployment best practices
HTTP & APIs
- HTTP/2 in Go - HTTP/2 server push
- RESTful API Design - Google Cloud API design guide
- OpenAPI Specification - API documentation standard
- API Versioning - Version management strategies
Observability
- Prometheus Go Client - Metrics collection
- OpenTelemetry Go - Distributed tracing
- Grafana Dashboards - Metrics visualization
- Jaeger Tracing - Distributed tracing backend
Security
- OWASP API Security - API security risks
- HTTP Security Headers - Security header reference
- Rate Limiting Strategies - Google Cloud guide
- mTLS in Kubernetes - Istio mutual TLS
Performance
- Go Performance Tips - Optimization techniques
- pprof Profiler - CPU and memory profiling
- High Performance Go - Dave Cheney's workshop
- Go Memory Model - Concurrency guarantees
Reliability
- Circuit Breaker Pattern - Failure isolation
- Retry with Backoff - Resilient retries
- Chaos Engineering - Resilience testing principles
- SLOs and Error Budgets - Reliability targets