CI/CD Pipeline
Complete guide to DataEngineX continuous integration and release automation.
Quick Links: CI Workflow · Release Automation · Troubleshooting · Quick Reference
Table of Contents
- Overview
- Project Structure
- Continuous Integration (CI)
- Release Automation
- Rollback Procedures
- Pipeline Metrics
- CI/CD Evolution
- Troubleshooting
- Best Practices
- Related Documentation
- Quick Reference
Overview
DEX is a pure Python library published to PyPI. The pipeline is:
- CI: Automated testing, linting, and security scanning on every PR
- Release: Push a v{X.Y.Z} tag to main → release.yml builds, publishes, and creates a GitHub Release with CycloneDX SBOM
graph LR
Dev[Developer] --> PR[Create PR to dev]
PR --> CI[CI: Quality/Test/Security]
CI --> Review[Code Review]
Review --> MergeDev[Merge to dev]
MergeDev --> PRMain[PR dev → main]
PRMain --> MergeMain[Merge to main]
MergeMain --> Tag[Push tag vX.Y.Z]
Tag --> Release[release.yml]
Release --> Build[Build wheel + sdist]
Build --> PyPI[Publish to PyPI<br/>Trusted Publishing OIDC]
Build --> GHRelease[GitHub Release<br/>+ CycloneDX SBOM]
style CI fill:#e1f5ff
style Release fill:#f8f5ff
style PyPI fill:#d4edda
style GHRelease fill:#d4edda
Project Structure
DEX is a single-package repo:
| Component | Location | Purpose | Release |
|---|---|---|---|
| dataenginex | src/dataenginex/ | Core framework (API, middleware, storage, ML) | PyPI (v{version}) |
Unified Testing
The root pyproject.toml defines the package and test config:
- name = "dataenginex", version = "<current>" (see pyproject.toml)
- [tool.hatch.build.targets.wheel] packages = ["src/dataenginex"]
- Dependency groups: dev (required), data (PySpark), notebook (pandas), ml (sentence-transformers)
CI workflow (ci.yml) runs two sequential jobs on every push and PR, plus one scheduled job:
- quality job: uv sync --group ml + poe quality (lint + imports-check + typecheck + security audit)
- test job (needs quality): poe test-cov-core, which runs pytest with coverage and uploads to Codecov
- test-compat job: weekly schedule only, Python 3.11/3.12 compatibility matrix
- concurrency: cancel-in-progress: true, so stale runs are cancelled on a new push
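The quality-then-test ordering can be reproduced locally before pushing. The following is a minimal sketch, not the project's own tooling: the helper names `run` and `ci_local` and the `DRY_RUN` switch are hypothetical, and it assumes the poe tasks are invoked through `uv run` as in the Local Commands section.

```shell
# Sketch: mirror the CI quality -> test ordering on a developer machine.
# With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

ci_local() {
    # Each step runs only if the previous one succeeded,
    # like the "needs: quality" dependency of the test job.
    run uv sync --group ml &&
    run uv run poe quality &&
    run uv run poe test-cov-core
}

# Usage: DRY_RUN=1 ci_local   (preview)   or   ci_local   (execute)
```

Chaining with `&&` stops at the first failing step, so a lint error is reported without waiting for the test suite.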
Release Automation
- Release: Push tag v{X.Y.Z} to main → release.yml triggers three jobs: build wheel + sdist, then publish to PyPI via OIDC trusted publishing and create a GitHub Release with the CycloneDX SBOM attached
Continuous Integration (CI)
Workflow: .github/workflows/ci.yml
Triggers:
- Push to main or dev branches
- Pull requests targeting main or dev
Jobs:
1. Code Quality (quality job)
Requirements: Must pass before the test job starts.
2. Tests (test job)
Coverage threshold: 80%. Results uploaded to Codecov with flags: dataenginex.
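How the 80% threshold is enforced inside poe test-cov-core is not shown here. As an illustration of the gate logic only, the sketch below compares a coverage percentage with a minimum; the function name `coverage_gate` is hypothetical. pytest-cov's `--cov-fail-under` option offers the same check natively, if the project uses it.

```shell
# coverage_gate PERCENT [THRESHOLD]
# Exit status 0 when PERCENT >= THRESHOLD (default 80), otherwise 1.
coverage_gate() {
    awk -v pct="$1" -v min="${2:-80}" 'BEGIN { exit !(pct + 0 >= min + 0) }'
}

# Usage: coverage_gate 83.4 && echo "coverage OK"
```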
3. Python Compatibility (test-compat job)
Runs on a weekly schedule only (not on every PR). Tests against Python 3.11 and 3.12 to catch compatibility regressions before they affect users on older versions.
4. Security Scans
Runs via the shared reusable workflow at .github/workflows/security.yml:
- Trivy: Misconfiguration and secret scan. Results are uploaded to the GitHub Security tab; a HIGH/CRITICAL misconfiguration gate blocks the job
- CodeQL: Static analysis, handled by GitHub's default setup (results in Security tab)
Results: Available in the GitHub Security tab.
Release Automation
Workflow: .github/workflows/release.yml
Trigger: Push a tag matching v[0-9]+.[0-9]+.[0-9]+ to main
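A tag that does not match the pattern never starts the workflow, so it can help to check the name before pushing. The sketch below assumes the trigger is a GitHub Actions tag filter (a glob in which the dots are literal) and expresses the same shape as an extended regular expression; the function name `is_release_tag` is hypothetical.

```shell
# Returns 0 when the argument has the form vX.Y.Z (digits only), else 1.
is_release_tag() {
    printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Usage: is_release_tag v1.2.3 && git tag v1.2.3
```

Pre-release suffixes such as v1.2.3-rc1 are rejected by this check; whether the real filter accepts them depends on how release.yml anchors its pattern.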
Jobs:
- build: uv build → upload wheel + sdist as artifact
- publish-pypi: download artifact → pypa/gh-action-pypi-publish (OIDC trusted publishing, no API token needed)
- github-release: generate CycloneDX SBOM → gh release create with SBOM attached
How to release:
# After merging to main, create and push the tag
git tag v1.2.3
git push origin v1.2.3
# Monitor the release workflow
gh run list --workflow=release.yml --limit 5
gh run watch
PyPI trusted publishing: Configured at pypi.org/manage/project/dataenginex/settings/publishing/. Environment name: pypi. No API tokens are needed; the workflow authenticates with GitHub OIDC.
Flow:
feature → PR to dev → PR to main → merge → git tag vX.Y.Z → push tag → release.yml → PyPI + GitHub Release
Rollback Procedures
Rollback a PyPI Release
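A version published to PyPI cannot be replaced: even if a release is deleted, its version number and filenames cannot be reused. Instead, you can:
- Yank the release on PyPI (marks it as broken; pip install skips it unless that exact version is pinned)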
- Publish a patch release with the fix:
# The pre-commit hook auto-bumps the patch version on commit
git commit -m "fix: revert breaking change"
git push origin main
git tag v<new-patch>
git push origin v<new-patch>
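The v<new-patch> placeholder has to be filled in by hand. A minimal sketch for deriving it from the previous tag follows; the function name `next_patch` is hypothetical, and it assumes the version written by the pre-commit hook is exactly one patch level above the latest tag, which should be confirmed against pyproject.toml.

```shell
# next_patch vX.Y.Z -> prints vX.Y.(Z+1)
next_patch() {
    printf '%s\n' "${1#v}" | awk -F. '{ printf "v%d.%d.%d\n", $1, $2, $3 + 1 }'
}

# Usage (inside the repository):
#   next_patch "$(git describe --tags --abbrev=0)"
```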
Rollback a Git Tag
# Delete tag locally and remotely
git tag -d v<version>
git push origin :refs/tags/v<version>
# Delete the GitHub release via gh CLI
gh release delete v<version> --yes
Pipeline Metrics
Build Times
- CI (Lint + Test): ~2 minutes
- Package validation: ~1 minute
- PyPI publish: ~2 minutes
Success Rates (Target)
- CI Pass Rate: >95%
- Release Success Rate: >99%
Monitoring
# Recent CI runs
gh run list --workflow ci.yml --limit 10
# Recent releases
gh run list --workflow release.yml --limit 10
# Failed builds
gh run list --workflow release.yml --status failure
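The listing commands above show individual runs but not the pass rate that the targets refer to. The sketch below computes it from a stream of run conclusions, one per line; the function name `pass_rate` is hypothetical, and the sample size of 50 in the usage line is an arbitrary choice.

```shell
# Reads one run conclusion per line on stdin and prints the percentage
# that equal "success". Blank lines (runs still in progress) are ignored.
pass_rate() {
    awk '$0 != "" { total++ }
         $0 == "success" { ok++ }
         END {
             if (total == 0) { print "n/a" }
             else { printf "%.1f\n", 100 * ok / total }
         }'
}

# Usage (requires an authenticated gh CLI):
#   gh run list --workflow ci.yml --limit 50 --json conclusion --jq '.[].conclusion' | pass_rate
```

Cancelled runs count against the rate in this sketch; filter them out first if the concurrency setting cancels many stale runs.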
CI/CD Evolution
Current State
- Automated CI with lint, test, type checks
- Security scanning (CodeQL, Trivy)
- Automated PyPI release on tag push
- Package validation (wheel + twine check)
- GitHub Pages documentation deployment
Future Enhancements
- E2E smoke tests: Post-release validation (install from PyPI and run examples)
- Slack notifications: Release status updates
- Release notes: Auto-generated from commits
- Canary releases: TestPyPI smoke test before PyPI promotion
Troubleshooting
CI Fails with Lint Errors
PyPI Publish Not Triggering
- Confirm tag v{X.Y.Z} was pushed to main (not dev)
- Verify PyPI trusted publisher matches: workflow release.yml, environment pypi
- View workflow logs: gh run list --workflow release.yml
Package Build Fails
# Build locally to diagnose
uv build
twine check dist/*
# Verify the package imports and reports the expected version
uv run python -c "import dataenginex; print(dataenginex.__version__)"
Best Practices
Development Workflow
- Create feature branch from dev
- Develop and test locally
- Run quality checks before committing: uv run poe lint, uv run poe typecheck, uv run poe test
- Create PR targeting dev
- Wait for CI to pass
- Get code review approval
- Merge to dev → integration testing
- Create release PR from dev → main
- Merge to main → bump version if releasing
Commit Messages
Use conventional commits for clarity:
feat: add new endpoint for data processing
fix: resolve memory leak in pipeline
chore: update dependencies
docs: improve deployment runbook
test: add integration tests for API
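The format can also be checked mechanically, for example from a commit-msg hook. The sketch below accepts only the five types listed above, with an optional scope in parentheses; the function name `is_conventional` is hypothetical, and the Conventional Commits convention allows further types that a real check would probably include.

```shell
# Returns 0 when the first line of the message is "<type>: <text>" or
# "<type>(<scope>): <text>" with one of the types used in this guide.
is_conventional() {
    printf '%s\n' "$1" | head -n 1 |
        grep -Eq '^(feat|fix|chore|docs|test)(\([a-z0-9_-]+\))?: .+'
}

# Usage: is_conventional "$(git log -1 --format=%s)" || echo "rewrite the subject line"
```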
PR Guidelines
- Keep PRs small: <500 lines of code
- Single purpose: One feature/fix per PR
- Test coverage: Include tests for new code
- Documentation: Update docs for API changes
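The size guideline can be checked before opening a PR. The following sketch sums added and deleted lines from git's numstat output; the function name `changed_lines` is hypothetical, and comparing against origin/dev in the usage line assumes the PR targets dev.

```shell
# Sums added + deleted lines from `git diff --numstat` output on stdin.
# Binary files are reported as "-" by git and count as 0 here.
changed_lines() {
    awk '{ total += $1 + $2 } END { print total + 0 }'
}

# Usage: git diff --numstat origin/dev...HEAD | changed_lines
```

This counts every changed line, including tests and documentation, so it overestimates "lines of code" for PRs that follow the other guidelines.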
Related Documentation
Next Steps:
- Deployment Runbook (in infradex repo) - Release procedures
- Observability - Monitor applications built on DEX
- Contributing Guide - Development workflow
Quick Reference
Workflows Overview
| Workflow | Trigger | Purpose | File |
|---|---|---|---|
| CI | push main/dev, PRs to main/dev | Code quality (poe quality) + tests + weekly compat | ci.yml |
| Security | push main/dev, PRs to main/dev | Trivy (misconfig + secrets) + CodeQL (default setup) | security.yml |
| Release | Push tag v*.*.* to main | Build → PyPI (trusted publishing) + GitHub Release + CycloneDX SBOM | release.yml |
Local Commands
# Local development
uv lock
uv sync
uv run poe test
uv run poe lint
# Local with all dependencies (data + notebook)
uv sync --group data --group notebook
uv run poe test-cov
# Create PR
gh pr create --title "feat: add feature" --body "Description"
# Trigger optional integration tests
gh pr edit <pr-number> --add-label full-test
# Check CI status
gh pr checks <pr-number>
# Monitor CI
gh run list --workflow ci.yml
gh run view <run-id> --log
# Release: push tag to trigger release.yml
git tag v<version> && git push origin v<version>
gh run list --workflow release.yml