CI/CD Pipeline
Complete guide to DataEngineX continuous integration and release automation.
Quick Links: CI Workflow · Release Automation · PyPI Publishing · Troubleshooting · Quick Reference
Table of Contents
- Overview
- Project Structure
- Continuous Integration (CI)
- Release Automation
- PyPI Publishing
- Rollback Procedures
- Pipeline Metrics
- CI/CD Evolution
- Troubleshooting
- Best Practices
- Related Documentation
- Quick Reference
Overview
DEX is a pure Python library published to PyPI. The pipeline is:
- CI: Automated testing, linting, and security scanning on every PR
- Release: Automated tagging and GitHub release creation on version bumps
- PyPI Publish: Automated publishing triggered by GitHub releases
graph LR
Dev[Developer] --> PR[Create PR]
PR --> CI[CI: Lint/Test/Security]
CI --> Review[Code Review]
Review --> MergeMain[Merge to main]
MergeMain --> VersionBump{Version bump?}
VersionBump -->|Yes| Release[release-dataenginex.yml<br/>Create tag + release]
Release --> PyPI[pypi-publish.yml<br/>Publish to PyPI]
style CI fill:#e1f5ff
style Release fill:#f8f5ff
style PyPI fill:#d4edda
Project Structure
DEX is a single-package repo:
| Component | Location | Purpose | Release |
|---|---|---|---|
| dataenginex | src/dataenginex/ | Core framework (API, middleware, storage, ML) | PyPI (v{version}) |
Unified Testing
The root pyproject.toml defines the package and test config:
- name = "dataenginex", version = "<current>" (see pyproject.toml)
- [tool.hatch.build.targets.wheel] packages = ["src/dataenginex"]
- Dependency groups: dev (required), data (PySpark/Airflow), notebook (pandas), ml (sentence-transformers), dashboard (streamlit)
CI workflow (ci.yml) runs in a single job (poe lint → poe typecheck → pytest):
- Single ci job: uv sync --all-extras + poe lint + poe typecheck + pytest --cov
- concurrency: cancel-in-progress: true (stale runs are cancelled on a new push)
- paths-ignore (skips CI on doc-only changes)
Release Automation
- Release automation: release-please.yml reads conventional commits → creates Release PR (bumps pyproject.toml + CHANGELOG.md + uv.lock) → on merge creates v{version} tag + GitHub Release
- Post-release: release-dataenginex.yml generates CycloneDX SBOM and attaches it to the release
- PyPI publishing (pypi-publish.yml): Triggered by GitHub release published → detects changes in src/dataenginex/ since last v* tag → publishes to TestPyPI then PyPI
Continuous Integration (CI)
Workflow: .github/workflows/ci.yml
Triggers:
- Push to main or dev branches
- Pull requests targeting main or dev
Jobs:
1. Lint and Test
Runs code quality checks and the test suite in the single ci job: poe lint, poe typecheck, then pytest --cov.
Requirements: All checks must pass before merge
2. Security Scans
Runs in parallel via .github/workflows/security.yml:
- CodeQL: Static analysis for security vulnerabilities
- Semgrep: OWASP Top 10 and best practice checks
Results: Available in GitHub Security tab
3. Integration Test (Optional)
Optional job for full dependency coverage (PySpark, Airflow, Pandas):
Trigger:
- Manual: gh workflow run ci.yml
- Label: Add the full-test label to the pull request
What it does:
# Installs the dev, data, and notebook dependency groups
uv sync --group dev --group data --group notebook
# Runs full test suite (may take longer)
uv run poe test-cov
Use case: Validate changes to data pipelines, ML models, or when adding new dependencies to data or notebook groups.
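The two triggers above amount to a small decision rule. The sketch below is illustrative only and is not taken from ci.yml: the function name `should_run_integration` is hypothetical, and it assumes manual runs arrive as `workflow_dispatch` events while labelled runs arrive as `pull_request` events.

```python
def should_run_integration(event_name: str, pr_labels: list[str]) -> bool:
    """Return True when the optional integration job should run (hypothetical helper)."""
    # Assumption: runs started with `gh workflow run ci.yml` are workflow_dispatch events.
    if event_name == "workflow_dispatch":
        return True
    # Pull requests opt in by carrying the full-test label.
    return event_name == "pull_request" and "full-test" in pr_labels
```

Under these assumptions, a push to main or dev never triggers the optional job, which keeps the default CI run short.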
Release Automation
DataEngineX Releases
Workflow: .github/workflows/release-dataenginex.yml
Trigger: release: types: [published] (fires when release-please creates a GitHub Release)
What it does:
- Generates CycloneDX SBOM for the release
- Attaches sbom-dataenginex-{version}.json to the GitHub Release
How to release DataEngineX:
Releases are fully automated via release-please. Push conventional commits to main; release-please creates the Release PR and tag.
# Monitor release-please PR
gh pr list --label "autorelease: pending"
# After merging the Release PR, monitor post-release workflows
gh run list --workflow=pypi-publish.yml --limit 5
gh run list --workflow=release-dataenginex.yml --limit 5
PyPI Publishing
Workflow: .github/workflows/pypi-publish.yml
Trigger: GitHub release published (the release is created by release-please.yml when the Release PR is merged)
What it does:
- Receives the GitHub release event for the DataEngineX release
- Detects whether files under src/dataenginex/ actually changed since the previous v{version} tag
- If changes found:
  - Builds wheel distributions
  - Publishes to TestPyPI (dry-run)
  - Promotes to PyPI (stable semver tags only, not pre-release)
- If no changes: skips publishing with an informational message
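The change-detection step can be pictured as a filter over the paths that differ between the previous tag and the release commit. This is a minimal sketch, not the workflow's actual implementation: the function name `package_changed` is hypothetical, and it assumes the input is the text printed by `git diff --name-only <previous-tag> HEAD`, one path per line.

```python
PACKAGE_PREFIX = "src/dataenginex/"


def package_changed(diff_name_only: str, prefix: str = PACKAGE_PREFIX) -> bool:
    """Return True if any changed path lies under the package directory.

    Assumption: `diff_name_only` is the output of
    `git diff --name-only <previous-tag> HEAD`, one path per line.
    """
    paths = (line.strip() for line in diff_name_only.splitlines())
    # Blank lines are ignored; only paths inside the package count as a change.
    return any(path.startswith(prefix) for path in paths if path)
```

A Release PR that touches only pyproject.toml, CHANGELOG.md, and uv.lock would yield False here, which corresponds to the "skips publishing" branch.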
Publish gates:
- Only publishes if code actually changed (not just version bump in other files)
- TestPyPI first for dry-run verification
- PyPI promotion requires a stable semver tag: vMAJOR.MINOR.PATCH (not v1.2.3-rc1)
- Pre-release tags: publish to TestPyPI only
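Taken together, the gates reduce to a small function of the tag and the change-detection result. The sketch below is one way to express them under stated assumptions: `publish_targets` and the target names "testpypi" and "pypi" are illustrative labels, not identifiers from pypi-publish.yml.

```python
import re

# vMAJOR.MINOR.PATCH with no pre-release or build suffix.
STABLE_TAG = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def publish_targets(tag: str, code_changed: bool) -> list[str]:
    """List the package indexes a release tag would be published to (hypothetical helper)."""
    if not code_changed:
        return []  # nothing under the package changed: skip publishing
    targets = ["testpypi"]  # TestPyPI always goes first as the dry run
    if STABLE_TAG.fullmatch(tag):
        targets.append("pypi")  # only stable tags are promoted
    return targets
```

Using `fullmatch` rather than `match` is what rejects suffixed tags such as v1.2.3-rc1, since the suffix would otherwise be ignored.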
Automatic flow:
conventional commits → main → release-please Release PR → merge → v{version} tag + GitHub Release → pypi-publish.yml → PyPI
Manual trigger (if needed):
gh workflow run pypi-publish.yml -f tag=v<version>
Rollback Procedures
Rollback a PyPI Release
A version uploaded to PyPI cannot be replaced: even if a release is deleted, its file names cannot be reused. Instead:
- Yank the release on PyPI (marks it as broken; pip install avoids it by default)
- Publish a patch release with the fix:
# Bump version in pyproject.toml (e.g., 0.6.1)
git commit -m "fix: revert breaking change"
git push origin main
Rollback a Git Tag
# Delete tag locally and remotely
git tag -d v<version>
git push origin :refs/tags/v<version>
# Delete the GitHub release via gh CLI
gh release delete v<version> --yes
Pipeline Metrics
Build Times
- CI (Lint + Test): ~2 minutes
- Package validation: ~1 minute
- PyPI publish: ~2 minutes
Success Rates (Target)
- CI Pass Rate: >95%
- Release Success Rate: >99%
Monitoring
# Recent CI runs
gh run list --workflow ci.yml --limit 10
# Recent releases
gh run list --workflow release-dataenginex.yml --limit 10
# Failed builds
gh run list --workflow pypi-publish.yml --status failure
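To compare actual results against the pass-rate targets, the run history can be summarized with a few lines of Python. This is a rough sketch: `pass_rate` is a hypothetical helper, and it assumes the input is JSON produced by `gh run list --workflow ci.yml --limit 100 --json conclusion`, where each entry carries a `conclusion` field.

```python
import json


def pass_rate(runs_json: str) -> float:
    """Percentage of decided runs that succeeded (hypothetical helper)."""
    runs = json.loads(runs_json)
    # Assumption: only "success" and "failure" count; in-progress, cancelled,
    # and skipped runs are left out of the denominator.
    decided = [
        run["conclusion"]
        for run in runs
        if run.get("conclusion") in ("success", "failure")
    ]
    if not decided:
        return 0.0
    return 100.0 * decided.count("success") / len(decided)
```

Whether cancelled runs should count against the target is a policy choice; this sketch excludes them because cancel-in-progress cancels stale runs routinely.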
CI/CD Evolution
Current State
- Automated CI with lint, test, type checks
- Security scanning (CodeQL, Semgrep)
- Automated PyPI release on version bump
- Package validation (wheel + twine check)
- GitHub Pages documentation deployment
Future Enhancements
- E2E smoke tests: Post-release validation (install from PyPI and run examples)
- SonarCloud integration: Code quality gates
- Slack notifications: Release status updates
- Release notes: Auto-generated from commits
- Canary releases: TestPyPI smoke test before PyPI promotion
Troubleshooting
CI Fails with Lint Errors
Reproduce the failure locally with uv run poe lint, fix the reported issues, and push again.
PyPI Publish Not Triggering
- Verify the version bump is in the root pyproject.toml (not elsewhere)
- Confirm the push was to the main branch (not dev)
- Check that release-please.yml ran and created a GitHub release
- View workflow logs:
gh run list --workflow pypi-publish.yml
Package Build Fails
# Build locally to diagnose
uv build
twine check dist/*
# Verify the installed package imports and reports the expected version
uv run python -c "import dataenginex; print(dataenginex.__version__)"
Best Practices
Development Workflow
- Create a feature branch from dev
- Develop and test locally
- Run quality checks before committing: uv run poe lint, uv run poe typecheck, uv run poe test
- Create a PR targeting dev
- Wait for CI to pass
- Get code review approval
- Merge to dev → integration testing
- Create a release PR from dev → main
- Merge to main → bump version if releasing
Commit Messages
Use conventional commits for clarity:
feat: add new endpoint for data processing
fix: resolve memory leak in pipeline
chore: update dependencies
docs: improve deployment runbook
test: add integration tests for API
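Because the Release PR is driven by these prefixes, it can help to see how a commit header maps to a version bump. The sketch below follows the general Conventional Commits convention (fix → patch, feat → minor, breaking change → major); it is an approximation, not release-please's implementation, whose rules (especially for 0.x versions) may differ. The name `bump_for` is hypothetical.

```python
import re

# type, optional (scope), optional "!" breaking marker, then ": description"
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]+\))?(?P<bang>!)?: .+")


def bump_for(message: str) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for one commit message."""
    header = message.splitlines()[0] if message else ""
    match = HEADER.match(header)
    if match is None:
        return "none"  # not a conventional commit header
    if match["bang"] or "BREAKING CHANGE:" in message:
        return "major"
    # chore, docs, test and similar types do not trigger a release on their own.
    return {"feat": "minor", "fix": "patch"}.get(match["type"], "none")
```

For the examples above, the feat commit maps to a minor bump, the fix commit to a patch bump, and the chore, docs, and test commits to no bump.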
PR Guidelines
- Keep PRs small: fewer than 500 lines of code
- Single purpose: One feature/fix per PR
- Test coverage: Include tests for new code
- Documentation: Update docs for API changes
Related Documentation
Next Steps:
- Deployment Runbook (in infradex repo) - Release procedures
- Observability - Monitor applications built on DEX
- Contributing Guide - Development workflow
Quick Reference
Workflows Overview
| Workflow | Trigger | Purpose | File |
|---|---|---|---|
| CI | push main/dev, PRs to main/dev | Lint, test, type-check | ci.yml |
| Security | push main/dev, PRs to main/dev | CodeQL + Semgrep scans | security.yml |
| Release Please | push main | Create/update Release PR with version bump + CHANGELOG | release-please.yml |
| Release DataEngineX | GitHub release published | Generate + attach CycloneDX SBOM | release-dataenginex.yml |
| PyPI Publish | GitHub release published | Detect changes + publish to TestPyPI/PyPI | pypi-publish.yml |
Local Commands
# Local development
uv lock
uv sync
uv run poe test
uv run poe lint
# Local with all dependencies (data + notebook)
uv sync --group data --group notebook
uv run poe test-cov
# Create PR
gh pr create --title "feat: add feature" --body "Description"
# Trigger optional integration tests
gh pr edit <pr-number> --add-label full-test
# Check CI status
gh pr checks <pr-number>
# Monitor CI
gh run list --workflow ci.yml
gh run view <run-id> --log
# Manual PyPI publish
gh workflow run pypi-publish.yml -f tag=v<version>
# Promote to production (dev → main PR)
./scripts/promote.sh