Contributing
Thank you for your interest in contributing to NVIDIA AICR! We welcome contributions from developers of all backgrounds and experience levels.
Table of Contents
- Code of Conduct
- Getting Started
- How to Contribute
- Design Principles
- Pull Request Process
- Developer Certificate of Origin
- Tips for Contributors
Code of Conduct
This project follows NVIDIA's commitment to fostering an open and welcoming environment. Please be respectful and professional in all interactions. See CODE_OF_CONDUCT.md for details.
Getting Started
Before contributing:
- Read the Getting Started Guide to understand the project
- Check existing issues to avoid duplicates
- Review the security policy for security-related contributions
- Set up your development environment following DEVELOPMENT.md
How to Contribute
Reporting Bugs
- Use the bug report template
- Describe the issue clearly with steps to reproduce
- Include system information (OS, Go version, Kubernetes version)
- Attach logs or screenshots if applicable
- Check if the issue already exists before creating a new one
Suggesting Enhancements
- Use the feature request template
- Clearly describe the proposed feature and its use case
- Explain how it benefits the project and users
- Provide examples or mockups if applicable
Improving Documentation
- Fix typos, clarify instructions, or add examples
- Update README.md for user-facing changes
- Update API documentation when endpoints change
- Ensure code comments are accurate and helpful
Contributing Code
- Fix bugs, add features, or improve performance
- Follow the development workflow in DEVELOPMENT.md
- Ensure all tests pass and code meets quality standards
- Write tests for new functionality
Go dependencies (vendor)
This project vendors Go dependencies. After changing go.mod or go.sum, run make tidy (which runs go mod vendor) and commit go.mod, go.sum, and the vendor/ directory. CI will fail if vendor/ is out of sync.
Adding Validation Constraints
AICR uses a validator framework to check cluster state against requirements. To add new validation constraints:
Quick Start:
# Generate all necessary files
make generate-validator ARGS="--constraint Deployment.my-app.version --phase deployment --description 'Validates my-app version'"
This creates three files with TODOs guiding implementation:
- Helper functions with validation logic
- Unit tests with table-driven test cases
- Integration test with automatic registration
Next Steps:
- Implement the TODOs in generated files
- Add comprehensive test cases
- Run make test - registration validation ensures completeness
- Submit PR - CI enforces all requirements
See the Validator Development Guide for a complete walkthrough with examples, an architecture overview, and troubleshooting.
Design Principles
These principles guide all design decisions in AICR. When faced with trade-offs, these principles take precedence.
Local Development Equals CI
The same tools, same versions, and same validation run locally and in CI.
What: Tool versions are centralized in .settings.yaml. Both make tools-setup (local) and GitHub Actions use this single source of truth. make qualify runs the exact same checks as CI.
Why: "Works on my machine" is not acceptable. If a contributor can run make qualify locally and it passes, CI will pass. This eliminates surprise failures and reduces feedback loops.
Adoption Comes from Idiomatic Experience
The system integrates into how users already work. We provide validated configuration, not a new operational model.
What: AICR outputs standard formats (Helm values, Kubernetes manifests) that work with existing tools (kubectl, ArgoCD, Flux). Users don't need to learn "the AICR way" of deploying.
Why: If adoption requires retraining users on a new workflow, our design has failed. Value comes from correctness, not from lock-in.
Correctness Must Be Reproducible
Given the same inputs, the same system version must always produce the same result (e.g. recipe, bundle artifacts).
What: No hidden state, no implicit defaults, no non-deterministic behavior. A recipe/bundle/image digest generated using the same version of aicr today must be identical to one generated tomorrow.
Why: Reproducibility is a prerequisite for debugging, validation, and trust. If users can't reproduce a result, they can't trust it.
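One common source of non-determinism in Go is map iteration order, which is randomized on every run. The following is a minimal sketch, not AICR's actual rendering code: renderValues, digest, and the sample keys are hypothetical names chosen for illustration. It shows how sorting keys before serializing keeps the output, and therefore its digest, identical across runs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// renderValues serializes a flat key/value map in sorted key order.
// Ranging over the map directly would make the output order vary
// between runs, because Go randomizes map iteration.
func renderValues(values map[string]string) string {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s: %s\n", k, values[k])
	}
	return b.String()
}

// digest returns the hex-encoded SHA-256 of the rendered output.
func digest(rendered string) string {
	sum := sha256.Sum256([]byte(rendered))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical values, for illustration only.
	values := map[string]string{
		"mode":           "training",
		"gpu.count":      "8",
		"driver.version": "example-1",
	}
	first := digest(renderValues(values))
	second := digest(renderValues(values))
	fmt.Println(first == second) // prints "true"
}
```

The same idea applies to any other hidden input, such as timestamps or environment variables: either remove it from the output or make it an explicit input.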
Metadata Is Separate from Consumption
Validated configuration exists independent of how it is rendered, packaged, or deployed.
What: Recipes define what is correct. Bundlers and deployers determine how to deliver it (Helm, ArgoCD, raw manifests). The recipe doesn't change based on the deployment mechanism.
Why: This prevents tight coupling of correctness to a specific tool, workflow, or delivery mechanism. Users can adopt new deployment tools without re-validating their configurations.
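This separation can be pictured as one read-only data type consumed by several interchangeable renderers. The sketch below is an assumption about shape only: Recipe, Renderer, valuesFile, and flagList are hypothetical names and do not reflect AICR's actual types or output formats.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Recipe is a hypothetical stand-in for validated configuration.
type Recipe struct {
	Name   string
	Values map[string]string
}

// Renderer is a hypothetical delivery mechanism. It reads a Recipe
// and never modifies it.
type Renderer interface {
	Render(r Recipe) string
}

// sortedKeys returns the map keys in a stable order.
func sortedKeys(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// valuesFile renders the recipe as "key: "value"" lines.
type valuesFile struct{}

func (valuesFile) Render(r Recipe) string {
	var b strings.Builder
	for _, k := range sortedKeys(r.Values) {
		fmt.Fprintf(&b, "%s: %q\n", k, r.Values[k])
	}
	return b.String()
}

// flagList renders the same recipe as command-line style flags.
type flagList struct{}

func (flagList) Render(r Recipe) string {
	parts := make([]string, 0, len(r.Values))
	for _, k := range sortedKeys(r.Values) {
		parts = append(parts, fmt.Sprintf("--set %s=%s", k, r.Values[k]))
	}
	return strings.Join(parts, " ")
}

func main() {
	recipe := Recipe{
		Name:   "training",
		Values: map[string]string{"gpu.count": "8", "mode": "training"},
	}
	// The same recipe feeds every renderer unchanged.
	for _, r := range []Renderer{valuesFile{}, flagList{}} {
		fmt.Println(r.Render(recipe))
	}
}
```

Adding a new delivery mechanism in this model means adding another Renderer, with no change to the recipe itself.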
Recipe Specialization Requires Explicit Intent
More specific recipes are never matched unless explicitly requested. Generic intent cannot silently resolve to specialized configurations.
What: If a user requests a "training" recipe, they get the training configuration. The system never silently upgrades to a more specific variant (e.g., "training-distributed-horovod") without explicit opt-in.
Why: This prevents accidental misconfiguration and preserves user control. Surprises in infrastructure configuration are dangerous.
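One way to express this rule in code is to treat every criterion a recipe declares as something the query must state explicitly. The following is a simplified sketch under that assumption; the matches function and the "intent" and "framework" keys are hypothetical and are not AICR's actual matching logic.

```go
package main

import "fmt"

// matches reports whether a recipe may be selected for a query.
// Every criterion the recipe declares must appear in the query with
// the same value. A criterion missing from the query is never treated
// as a wildcard, so a generic query cannot select a specialized recipe.
func matches(recipe, query map[string]string) bool {
	for key, want := range recipe {
		got, ok := query[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical criteria, for illustration only.
	generic := map[string]string{"intent": "training"}
	specialized := map[string]string{"intent": "training", "framework": "horovod"}

	query := map[string]string{"intent": "training"}
	fmt.Println(matches(generic, query))     // prints "true"
	fmt.Println(matches(specialized, query)) // prints "false"

	// The specialized recipe matches only after explicit opt-in.
	explicit := map[string]string{"intent": "training", "framework": "horovod"}
	fmt.Println(matches(specialized, explicit)) // prints "true"
}
```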
Trust Requires Verifiable Provenance
Trust is established through evidence, not assertions. Every released artifact carries verifiable proof of origin and build process.
What: All releases include SLSA Build Level 3 provenance, SBOM attestations, and Sigstore signatures. Users can verify exactly which commit, workflow, and build produced any artifact.
Why: This underpins supply-chain security, compliance, and confidence. "Trust us" is not a security model.
Pull Request Process
Before Submitting
Ensure all checks pass:
make qualify
Update documentation if needed:
- README.md for user-facing changes
- DEVELOPMENT.md for developer workflow changes
- Code comments and godoc for API changes
Commit with required provenance:
# External contributors (DCO sign-off required)
git commit -s -m "feat: add network collector

- Implement NetworkCollector interface
- Add unit tests with 80% coverage
- Update factory registration

Fixes #123"

# NVIDIA org members / automation (DCO sign-off exempt)
git commit -S -m "feat: add network collector"
External contributors must use -s. NVIDIA organization members are exempt from DCO bot sign-off checks and should use cryptographic signing (-S).
Creating the Pull Request
- Push your branch and open a PR against main
- Fill out the PR template completely:
- Summary: Brief description of changes
- Type of Change: Bug fix, feature, breaking change, etc.
- Testing: What testing was performed
- Checklist: Verify all items
Review Process
Automated Checks run via GitHub Actions:
- Go tests with race detector
- golangci-lint
- YAML linting
- Security scans (Anchore in CI, Grype in make scan)
- Coverage tracking
- E2E tests
Maintainer Review covers:
- Correctness and functionality
- Code style and Go idioms
- Test coverage and quality
- Documentation completeness
Address Feedback by pushing new commits:
git commit -s -m "address review: improve error handling"  # external contributors
# or
git commit -S -m "address review: improve error handling"  # NVIDIA org members / automation
git push origin your-branch
Merge: Once the PR is approved and CI passes, a maintainer will merge it.
Issue and PR Lifecycle
Automated bots manage the lifecycle of issues and pull requests:
| Day | Action |
|---|---|
| 0 | Issue/PR opened, needs-triage label added to issues |
| 14 | Inactive PRs receive a reminder comment |
| 30 | Inactive PRs marked lifecycle/stale |
| 44 | Stale PRs auto-closed |
| 60 | Inactive issues marked lifecycle/stale |
| 74 | Stale issues auto-closed |
| 90+ | Closed issues/PRs locked |
To prevent auto-close: Add the lifecycle/frozen label. PRs with do-not-merge are also exempt.
After Merging
# Update your local repository
git checkout main
git pull upstream main
# Delete your feature branch
git branch -d your-branch
git push origin --delete your-branch
Developer Certificate of Origin
Contributions must satisfy the Developer Certificate of Origin (DCO) policy. External contributors (non-NVIDIA organization members) must include a DCO sign-off on each commit. NVIDIA organization members are exempt from DCO bot sign-off checks and should use cryptographic signing (-S).
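Conceptually, a sign-off check looks for a "Signed-off-by: Name <email>" line in each commit message. The sketch below is a simplified illustration and not the DCO bot's implementation (a real check may, for example, also compare the sign-off with the commit author); hasSignOff is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// hasSignOff reports whether a commit message contains a line of the
// form "Signed-off-by: Name <email>". This is a simplified check.
func hasSignOff(message string) bool {
	const prefix = "Signed-off-by: "
	for _, line := range strings.Split(message, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		identity := strings.TrimPrefix(line, prefix)
		open := strings.Index(identity, "<")
		// Require a non-empty name followed by a non-empty address
		// in angle brackets.
		if open > 0 && strings.HasSuffix(identity, ">") && open < len(identity)-2 {
			return true
		}
	}
	return false
}

func main() {
	signed := "feat: add network collector\n\nSigned-off-by: Jane Developer <jane@example.com>"
	unsigned := "feat: add network collector"
	fmt.Println(hasSignOff(signed), hasSignOff(unsigned)) // prints "true false"
}
```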
How to Sign Off (External Contributors)
Add the -s flag to your commit:
git commit -s -m "Your commit message"
This adds a "Signed-off-by" line:
Signed-off-by: Jane Developer <jane@example.com>
Configure Git Identity for Sign-off
The -s flag builds the sign-off line from your configured Git name and email, so set them before committing:
git config user.name "Your Name"
git config user.email "your.email@example.com"
Amending Commits
If you forget to sign off:
git commit --amend --signoff
git push --force-with-lease origin your-branch
NVIDIA Org Members and Automation
NVIDIA organization members are exempt from DCO bot sign-off checks (.github/dco.yml). Use cryptographic commit signing:
git commit -S -m "Your commit message"
What You're Certifying
By signing off, you certify the Developer Certificate of Origin 1.1:
Developer's Certificate of Origin 1.1
By making a contribution to this project, I certify that:
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or
(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.
(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.
Tips for Contributors
First-Time Contributors
Recommended starting points:
- Start with issues labeled good first issue
- Read existing code in the package you're modifying before writing
- Run make tools-check to verify your environment
- Study the Design Principles section
Good first contributions:
- Documentation improvements (typos, clarifications)
- Adding test cases to existing tests
- Improving error messages with better context
Writing Good Commit Messages
Short summary (50 chars or less)
More detailed explanation if needed. Wrap at 72 characters.
Explain the problem being solved and why this approach was chosen.
- Bullet points are fine
- Use present tense ("Add feature" not "Added feature")
- Reference issues: "Fixes #123" or "Related to #456"
Signed-off-by: Your Name <your@email.com>
Code Style
- Follow existing patterns in the codebase
- Use pkg/errors for error handling (not fmt.Errorf)
- Always check ctx.Done() in loops and long operations
- Write table-driven tests for multiple test cases
- Use functional options for configuration
Getting Help
- GitHub Issues: Create an issue with the "question" label
- Existing Issues: Search for similar questions first
- Recent PRs: Look at merged PRs for examples
Additional Resources
- DEVELOPMENT.md - Development setup, architecture, and tooling
- Getting Started Guide - Project overview and quick start
- Documentation Overview - System overview and glossary
- Architecture Documentation - Architecture documentation
Thank you for contributing to NVIDIA AICR! Your efforts help improve GPU-accelerated infrastructure for everyone.