CI/CD Pipeline¶
Complete guide to DataEngineX continuous integration and deployment automation.
Quick Links: CI Workflow ยท CD Workflow ยท Troubleshooting ยท Quick Reference
๐ Table of Contents¶
- Overview
- Origin: Dual-Project Structure
- Continuous Integration (CI)
- Continuous Deployment (CD)
- Release Automation (Matrix Approach)
- PyPI Publishing
- Deployment Flow
- GitOps with ArgoCD
- Image Promotion Strategy
- Rollback Procedures
- Pipeline Metrics
- CI/CD Evolution
- Troubleshooting
- Best Practices
- Related Documentation
- Quick Reference
Overview¶
DEX uses a GitOps-based CI/CD pipeline with: - CI: Automated testing, linting, and security scanning on every PR - CD: Automated Docker builds and deployment manifest updates - ArgoCD: GitOps-based continuous deployment to Kubernetes
graph LR
Dev[Developer] --> PR[Create PR]
PR --> CI[CI: Lint/Test/Security]
CI --> Review[Code Review]
Review --> MergeDev[Merge to dev]
Review --> MergeMain[Merge to main]
MergeDev --> BuildDev[CD: Build Image]
BuildDev --> UpdateDev[CD: Update dev manifest]
UpdateDev --> ArgoDev[ArgoCD: Sync dex-dev]
MergeMain --> BuildMain[CD: Build Image]
BuildMain --> UpdateMain[CD: Update prod manifest]
UpdateMain --> ArgoMain[ArgoCD: Sync dex]
style CI fill:#e1f5ff
style BuildDev fill:#fff3cd
style BuildMain fill:#fff3cd
style ArgoDev fill:#d4edda
style ArgoMain fill:#d4edda
Project Structure¶
DEX is dual-project:
| Component | Location | Purpose | Release |
|---|---|---|---|
| DataEngineX (core) | packages/dataenginex/ |
Core framework (API, middleware, storage) | PyPI (independently versioned) |
| CareerDEX (app) | src/careerdex/ |
Job matching application | Docker app (versioned with root pyproject.toml) |
Unified Testing¶
The root pyproject.toml orchestrates all tests:
- Imports dataenginex>=0.4.0 as a dependency (editable path: packages/dataenginex)
- Defines app package build target under [tool.hatch.build.targets.wheel] packages = ["src/careerdex"]
- Declares dependency groups: dev (required), data (PySpark/Airflow), notebook (pandas)
CI workflow (ci.yml) runs both projects together in a single pipeline:
- lint-and-test job: uv sync + poe lint/test-cov (tests both dataenginex + careerdex with dev deps only)
- integration-test job (optional, label/dispatch): uv sync --group data --group notebook (full stack)
Separate Validation¶
- Package validation (
package-validation.yml): Runs on every push tomain/devand package-related PR changes โ builds wheel + twine check (CD dependency gate) - Release automation (matrix):
release-dataenginex.yml: Watchespackages/dataenginex/pyproject.tomlfor version changes โ createsdataenginex-vX.Y.Ztag + releaserelease-careerdex.yml: Watches rootpyproject.tomlfor version changes โ createscareerdex-vX.Y.Ztag + release- PyPI publishing (
pypi-publish.yml): Triggered by DataEngineX release โ detects changes inpackages/dataenginex/since last tag โ publishes to PyPI
Continuous Integration (CI)¶
Workflow: .github/workflows/ci.yml
Triggers:
- Push to main or dev branches
- Pull requests targeting main or dev
Jobs:
1. Lint and Test¶
Runs code quality checks and test suite:
Requirements: All checks must pass before merge
2. Security Scans¶
Runs in parallel via .github/workflows/security.yml:
- CodeQL: Static analysis for security vulnerabilities
- Semgrep: OWASP Top 10 and best practice checks
Results: Available in GitHub Security tab
3. Integration Test (Optional)¶
Optional job for full dependency coverage (PySpark, Airflow, Pandas):
Trigger:
- Manual: gh workflow run ci.yml
- Label: Add full-test label to pull request
What it does:
# Installs all dependency groups
uv sync --group dev --group data --group notebook
# Runs full test suite (may take longer)
uv run poe test-cov
Use case: Validate changes to data pipelines, ML models, or when adding new dependencies to data or notebook groups.
Continuous Deployment (CD)¶
Workflow: .github/workflows/cd.yml
Trigger: workflow_run on main/dev after required upstream workflows complete successfully for the same commit SHA (Continuous Integration, Security Scans, Package Validation)
Jobs:
1. Build and Push Docker Image¶
Builds immutable Docker image with SHA tag:
# Image naming convention
ghcr.io/thedataenginex/dex:sha-<8-char-commit-sha>
# Example
ghcr.io/thedataenginex/dex:sha-a1b2c3d4
Tags Applied:
- sha-XXXXXXXX - Immutable SHA tag (always)
- v<project_version> - Semantic release tag for main branch builds only
- latest - Latest main branch build (main only)
- dev - Moving tag for dev branch builds only
Registry: GitHub Container Registry (ghcr.io)
Build Cache: GitHub Actions cache for faster builds
2. Update Dev Manifest (dev branch only)¶
Automatically updates dev environment when changes merge to dev:
# Updates: infra/argocd/overlays/dev/kustomization.yaml
images:
- name: thedataenginex/dex
newTag: sha-a1b2c3d4 # โ Updated by CD
Commit Message: chore: update dev image to sha-XXXXXXXX [skip ci]
Result: ArgoCD detects change and syncs dex-dev namespace
3. Update Prod Manifest (main branch only)¶
Automatically updates prod when changes merge to main:
# Updates:
# - infra/argocd/overlays/prod/kustomization.yaml
images:
- name: thedataenginex/dex
newTag: sha-a1b2c3d4 # โ Updated by CD
Commit Message: chore: update main image to sha-XXXXXXXX [skip ci]
Result: ArgoCD syncs dex namespace
If protected branch rules reject direct push, CD falls back to creating a promotion PR (or issue when PR creation is not permitted), and reports deployment as pending manual approval rather than false success.
4. Security Scan¶
Runs Trivy vulnerability scan on built image:
Results: Uploaded to GitHub Security tab as SARIF report
Severity Thresholds: - CRITICAL: Block deployment (manual review required) - HIGH: Alert but allow deployment - MEDIUM/LOW: Informational
Release Automation (Matrix Approach)¶
DEX uses parallel, independent release workflows for each package:
DataEngineX Releases¶
Workflow: .github/workflows/release-dataenginex.yml
Trigger: Version change in packages/dataenginex/pyproject.toml on main branch
What it does:
1. Detects version bump in packages/dataenginex/pyproject.toml
2. Extracts version (e.g., 0.5.0)
3. Creates git tag: dataenginex-v0.5.0
4. Creates GitHub release โ automatically triggers pypi-publish.yml
5. Publishes to TestPyPI/PyPI
How to release DataEngineX:
# Update version in packages/dataenginex/pyproject.toml
version = "0.5.0"
# Commit and push
git add packages/dataenginex/pyproject.toml
git commit -m "chore: bump dataenginex to 0.5.0"
git push origin main
CareerDEX Releases¶
Workflow: .github/workflows/release-careerdex.yml
Trigger: Version change in root pyproject.toml on main branch
What it does:
1. Detects version bump in root pyproject.toml
2. Extracts version (e.g., 0.3.6)
3. Creates git tag: careerdex-v0.3.6
4. Creates GitHub release for the app
5. No PyPI publish (app is Docker-based, not a library)
How to release CareerDEX:
# Update version in root pyproject.toml
version = "0.3.6"
# Commit and push
git add pyproject.toml
git commit -m "chore: bump careerdex to 0.3.6"
git push origin main
PyPI Publishing¶
Workflow: .github/workflows/pypi-publish.yml
Trigger: GitHub release published (from release-dataenginex.yml)
What it does:
1. Receives GitHub release event from DataEngineX release
2. Detects if files under packages/dataenginex/ actually changed since previous dataenginex-vX.Y.Z tag
3. If changes found:
- Builds wheel distributions
- Publishes to TestPyPI (dry-run)
- Promotes to PyPI (stable semver tags only, not pre-release)
4. If no changes: skips publishing with informational message
Publish gates:
- Only publishes if code actually changed (not just version bump in other files)
- TestPyPI first for dry-run verification
- PyPI promotion requires stable semver tag: dataenginex-vMAJOR.MINOR.PATCH (not dataenginex-v1.2.3-rc1)
- Pre-release tags: publish to TestPyPI only
Automatic flow:
DataEngineX version bump โ release-dataenginex.yml โ GitHub release โ pypi-publish.yml โ PyPI
Manual trigger (if needed):
Deployment Flow¶
Dev Environment Flow¶
sequenceDiagram
participant Dev as Developer
participant GH as GitHub
participant CI as CI Pipeline
participant CD as CD Pipeline
participant GHCR as ghcr.io
participant Argo as ArgoCD
participant K8s as Kubernetes
Dev->>GH: Push to dev branch
GH->>CI: Trigger CI workflow
CI->>CI: Run tests, lint, security
CI-->>GH: โ CI passes
GH->>CD: Trigger CD workflow
CD->>GHCR: Build & push image (sha-XXXXXXXX)
CD->>GH: Commit/push dev kustomization.yaml update
GH->>Argo: Git change detected
Argo->>K8s: Sync dex-dev namespace
K8s-->>Argo: โ Sync complete
Prod Environment Flow¶
sequenceDiagram
participant Dev as Developer
participant GH as GitHub
participant CI as CI Pipeline
participant CD as CD Pipeline
participant GHCR as ghcr.io
participant Argo as ArgoCD
participant K8s as Kubernetes
Dev->>GH: Merge to main (promote from dev)
GH->>CI: Trigger CI workflow
CI->>CI: Run tests, lint, security
CI-->>GH: โ CI passes
GH->>CD: Trigger CD workflow
CD->>GHCR: Build & push image (sha-XXXXXXXX)
CD->>GH: Commit/push prod kustomization.yaml update
GH->>Argo: Git change detected
Argo->>K8s: Sync dex
K8s-->>Argo: โ Sync complete
GitOps with ArgoCD¶
ArgoCD Applications¶
# Dev application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: dex-dev
spec:
source:
repoURL: https://github.com/TheDataEngineX/DEX
targetRevision: dev # โ Tracks dev branch
path: infra/argocd/overlays/dev
destination:
namespace: dex-dev
syncPolicy:
automated:
prune: true
selfHeal: true
# Prod application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: dex
spec:
source:
repoURL: https://github.com/TheDataEngineX/DEX
targetRevision: main # โ Tracks main branch
path: infra/argocd/overlays/prod
destination:
namespace: dex
syncPolicy:
automated:
prune: true
selfHeal: true
Sync Policies¶
- Auto-sync: Enabled for all environments
- Self-heal: ArgoCD automatically corrects manual kubectl changes
- Prune: Removes resources deleted from git
Monitoring Deployments¶
# Watch ArgoCD sync status
argocd app get dex-dev
argocd app get dex
# View sync history
argocd app history dex-dev
# Manual sync (if needed)
argocd app sync dex-dev --prune
Image Promotion Strategy¶
Why SHA Tags?¶
- Immutable: Same image from dev โ prod
- Traceable: Links to exact git commit
- Auditable: Clear promotion history in git
- Rollback-friendly: Easy to revert to previous SHA
Promotion Flow¶
graph TD
Build[Build sha-a1b2c3d4] --> Dev[Deploy to Dev]
Dev --> DevTest{Dev Tests Pass?}
DevTest -->|No| DevFix[Fix Issues]
DevFix --> Build
DevTest -->|Yes| Prod[Promote to Prod]
Prod --> Monitor[Monitor Prod]
Manual Promotion (Dev โ Prod)¶
Use the promotion script to create a PR from dev into main:
# Branch promotion (dev โ main)
./scripts/promote.sh
# Or promote a specific image tag to prod
./scripts/promote.sh --image-tag sha-a1b2c3d4
# Auto-merge after checks pass
./scripts/promote.sh --auto-merge
Rollback Procedures¶
Quick Rollback (Dev)¶
# Find previous image
git log --oneline infra/argocd/overlays/dev/kustomization.yaml
# Revert to previous commit
git revert HEAD
git push origin dev
# ArgoCD auto-syncs to previous image
Controlled Rollback (Prod)¶
# 1. Identify last good image
LAST_GOOD="sha-xyz78901"
git log infra/argocd/overlays/prod/kustomization.yaml
# 2. Update to last good image
sed -i "s|newTag:.*|newTag: $LAST_GOOD|g" infra/argocd/overlays/prod/kustomization.yaml
# 3. Emergency commit to main
git add infra/argocd/overlays/prod/kustomization.yaml
git commit -m "fix: rollback prod to $LAST_GOOD"
git push origin main
# ArgoCD syncs within 3 minutes (or force sync)
argocd app sync dex
Emergency Manual Rollback¶
If ArgoCD is unavailable:
# Direct kubectl update
kubectl set image deployment/dex dex=ghcr.io/thedataenginex/dex:sha-xyz78901 -n dex
kubectl rollout status deployment/dex -n dex
# Update git to match (after recovery)
Pipeline Metrics¶
Build Times¶
- CI (Lint + Test): ~2 minutes
- Docker Build: ~3 minutes (with cache)
- ArgoCD Sync: ~30 seconds
Total Dev Deployment: ~6 minutes from merge
Success Rates (Target)¶
- CI Pass Rate: >95%
- CD Success Rate: >99%
- Deployment Success Rate: >99%
Monitoring¶
# Recent CI runs
gh run list --workflow ci.yml --limit 10
# Recent deployments
argocd app history dex-dev --limit 10
# Failed builds
gh run list --workflow cd.yml --status failure
Troubleshooting¶
CI Fails with Lint Errors¶
Image Not Building¶
# Check CD workflow logs
gh run view --log
# Verify Docker build locally
docker build -t dex:local .
# Check registry authentication
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin
ArgoCD Not Syncing¶
# Check application status
argocd app get dex-dev
# View recent sync errors
argocd app get dex-dev --refresh
# Force sync
argocd app sync dex-dev --prune --force
# Check git repo connection
argocd repo list
Image Not Updating in Kubernetes¶
# Verify image in kustomization
cat infra/argocd/overlays/dev/kustomization.yaml
# Check if ArgoCD sees the change
argocd app diff dex-dev
# Verify image exists in registry
docker pull ghcr.io/thedataenginex/dex:sha-XXXXXXXX
# Check pod image
kubectl get pod -n dex-dev -o jsonpath='{.items[0].spec.containers[0].image}'
Security Considerations¶
Image Scanning¶
- Pre-deployment: Trivy scan in CD pipeline
- Runtime: Falco monitors container behavior
- Registry: GHCR vulnerability scanning enabled
Secrets Management¶
- Never commit secrets to git
- Use Kubernetes Secrets for runtime config
- Rotate regularly: Database credentials, API keys
Supply Chain Security¶
- Signed commits: Required for prod deployments
- SBOM: Generated with each build
- Provenance: Image build attestation
Best Practices¶
Development Workflow¶
- Create feature branch from
dev - Develop and test locally
- Run quality checks before committing:
uv run poe lint,uv run poe typecheck,uv run poe test - Create PR targeting
dev - Wait for CI to pass
- Get code review approval
- Merge to dev โ Auto-deploys to dev environment
- Verify in dev environment
- Create release PR from
devโmain - Merge to main โ Auto-deploys to prod
Commit Messages¶
Use conventional commits for clarity:
feat: add new endpoint for data processing
fix: resolve memory leak in pipeline
chore: update dependencies
docs: improve deployment runbook
test: add integration tests for API
PR Guidelines¶
- Keep PRs small: <500 lines of code
- Single purpose: One feature/fix per PR
- Test coverage: Include tests for new code
- Documentation: Update docs for API changes
Deployment Safety¶
- Deploy during business hours (for prod)
- Monitor for 15 minutes after deployment
- Keep rollback plan ready
- Communicate in team channel before prod deploy
CI/CD Evolution¶
Current State โ ¶
- Automated CI with lint, test, type checks
- Automated CD with Docker builds
- GitOps deployment with ArgoCD
- Security scanning (CodeQL, Trivy, Semgrep)
- Automated dev deployments
- Automated prod manifest updates
Future Enhancements ๐¶
- Canary deployments: Gradual rollout to prod
- Blue-green deployments: Zero-downtime releases
- E2E smoke tests: Post-deployment validation
- Performance testing: Load tests in dev
- SonarCloud integration: Code quality gates
- Slack notifications: Deployment status updates
- Automated rollback: On health check failures
- Release notes: Auto-generated from commits
Related Documentation¶
Next Steps: - Deployment Runbook - Deploy and rollback procedures - Local K8s Setup - Kubernetes & ArgoCD setup - Observability - Monitor deployments
Related Topics: - SDLC Overview - Development lifecycle - Local K8s Setup - Test locally - Contributing Guide - Development workflow
Quick Reference¶
Workflows Overview¶
| Workflow | Trigger | Purpose | File |
|---|---|---|---|
| CI (Primary) | push main/dev, PRs to main/dev |
Lint, test, type-check (dev deps) | .github/workflows/ci.yml |
| CI (Integration) | PR label full-test or manual dispatch |
Full test (data + notebook groups) | .github/workflows/ci.yml |
| Security | push main/dev, PRs to main/dev |
CodeQL + Semgrep scans | .github/workflows/security.yml |
| Package | Changes to packages/dataenginex/** |
Build wheel + twine check (dataenginex only) | .github/workflows/package-validation.yml |
| CD | workflow_run after CI + Security + Package Validation succeed on main/dev |
Build Docker image, update GitOps manifests, verify deployment | .github/workflows/cd.yml |
| Release DataEngineX | Version change in packages/dataenginex/pyproject.toml on main |
Extract version, create dataenginex-vX.Y.Z tag + release |
.github/workflows/release-dataenginex.yml |
| Release CareerDEX | Version change in root pyproject.toml on main |
Extract version, create careerdex-vX.Y.Z tag + release |
.github/workflows/release-careerdex.yml |
| PyPI Publish | GitHub release (DataEngineX) published | Detect changes + publish dataenginex to TestPyPI/PyPI | .github/workflows/pypi-publish.yml |
Local Commands¶
# Local development
uv lock
uv sync
uv run poe test
uv run poe lint
# Local with all dependencies (data + notebook)
uv sync --group data --group notebook
uv run poe test-cov
# Create PR
gh pr create --title "feat: add feature" --body "Description"
# Trigger optional integration tests
gh pr edit <pr-number> --add-label full-test
# Check CI status
gh pr checks <pr-number>
# View CD logs
gh run list --workflow cd.yml
gh run view <run-id> --log
# Monitor deployments
argocd app get dex-dev
kubectl get pods -n dex-dev
kubectl logs -n dex-dev -l app=dex -f
# Promote to production
./scripts/promote.sh
# Rollback
git revert HEAD
git push origin dev # or main