CausalIQ Workflow - Development Roadmap & Progress

Single source of truth for all development planning and progress tracking

Last updated: 2025-11-18

Current Status: Phase 1-2 Complete - Action Framework + Workflow Engine [99% COMPLETE]

🎯 Major Achievement: Complete Action Registry System + Dynamic Plugin Discovery

Key Breakthrough: We've successfully implemented the complete action registry system with automatic discovery, enabling a true plugin ecosystem for causal discovery workflows. The registry provides zero-configuration action discovery, automatic registration, and seamless integration with the workflow execution engine.

Latest Achievement: Complete ActionRegistry implementation with auto-discovery via import-time introspection, comprehensive test action package demonstrating the plugin pattern, and full integration with WorkflowExecutor for end-to-end action execution. The system now supports dynamic action loading from external packages without circular dependencies.

Action Registry Highlights:

  • ✅ Zero-configuration discovery - Actions are discovered automatically via import-time introspection using the 'CausalIQAction' convention
  • ✅ Plugin architecture - Complete test action package demonstrating external action development patterns
  • ✅ Seamless integration - WorkflowExecutor now executes actions via ActionRegistry with full parameter mapping
  • ✅ Comprehensive validation - Registry validates action availability and provides detailed error reporting
  • ✅ Production-ready pattern - External packages export a 'CausalIQAction' class and become immediately available in workflows
  • ✅ Complete documentation - Registry API documentation with usage examples and architecture notes
  • ✅ 100% test coverage maintained - Registry and integration fully tested, including edge cases
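The convention-based discovery described above can be sketched roughly as follows. This is a minimal illustration, not the project's ActionRegistry: the method names discover and get, the run method on the action, and keying actions by module name are all assumptions made for the example. Only the 'CausalIQAction' export name comes from this roadmap.

```python
import sys
import types


class ActionRegistry:
    """Hypothetical registry mapping a module name to its exported action class."""

    CONVENTION = "CausalIQAction"  # export name taken from the roadmap text

    def __init__(self):
        self._actions = {}

    def discover(self, modules=None):
        """Scan imported modules for a class exported under the convention name."""
        modules = dict(sys.modules) if modules is None else modules
        for name, module in modules.items():
            candidate = getattr(module, self.CONVENTION, None)
            if isinstance(candidate, type):
                self._actions[name] = candidate
        return self

    def get(self, name):
        """Return the action class for `name`, or raise a descriptive KeyError."""
        try:
            return self._actions[name]
        except KeyError:
            available = ", ".join(sorted(self._actions)) or "none"
            raise KeyError(f"Unknown action '{name}'. Available: {available}") from None


class CausalIQAction:
    """Stand-in for the class an external plugin package would export."""

    def run(self, inputs):
        return {"status": "ok", **inputs}


# Simulate an already-imported plugin package (module name is hypothetical).
plugin = types.ModuleType("my_plugin")
plugin.CausalIQAction = CausalIQAction

registry = ActionRegistry().discover({"my_plugin": plugin})
result = registry.get("my_plugin")().run({"dataset": "asia"})
```

Passing an explicit module mapping keeps the example deterministic. A real registry would more likely scan sys.modules or package entry points.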

Implementation Highlights:

  • ✅ Action framework foundation - Abstract base classes with type-safe input/output specifications
  • ✅ GraphML format adoption - Design decision for causal graph representation (DAGs, PDAGs, CPDAGs, MAGs, PAGs)
  • ✅ Matrix variable architecture - Schema support for parameterized experiments
  • ✅ GitHub Actions-inspired syntax - Familiar workflow patterns with schema validation
  • ✅ WorkflowExecutor class - 99-line implementation featuring YAML workflow parsing, matrix expansion, and comprehensive validation

Phase 1 Features (Month 1): Action Framework Foundation ✅ 100% Complete

✅ Foundation Infrastructure [COMPLETED]

  • [x] Testing framework - Comprehensive pytest setup covering unit, functional, integration (112/112 tests passing)
  • [x] CI/CD workflow - GitHub Actions workflow with linting, formatting, type checking
  • [x] Code quality - Black, isort, flake8, MyPy integration with 100% compliance
  • [x] Documentation structure - MkDocs integration with restructured API documentation
  • [x] API documentation - Comprehensive 6-page API reference with Google-style docstrings
  • [x] Development environment - Complete workspace setup with proper tooling
  • [x] Configuration foundation - JSON Schema-based workflow validation established
  • [x] Test policy compliance - Function-based test structure with single-line comments

✅ Action Framework [COMPLETED]

  • [x] Action base classes - Abstract Action class with type-safe input/output specifications
  • [x] Error handling - ActionExecutionError and ActionValidationError with comprehensive context
  • [x] Input/output specification - ActionInput dataclass for type hints and validation
  • [x] Reference implementation - DummyStructureLearnerAction demonstrating framework patterns
  • [x] GraphML format decision - Adopted GraphML as standard for causal graph representation
  • [x] Matrix variable support - Actions receive dataset, algorithm, and parameter inputs

✅ Workflow Schema Integration [COMPLETED]

  • [x] GitHub Actions-inspired syntax - Familiar workflow patterns adapted for causal discovery
  • [x] Matrix strategy support - Parameterized experiments with matrix variable expansion
  • [x] Path construction fields - data_root, output_root, id fields for organizing experiment outputs
  • [x] Action parameters - with blocks for passing parameters to actions
  • [x] Schema validation - JSON Schema validation with comprehensive error reporting
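To make the schema fields above concrete, here is a hypothetical workflow shown as the Python dict that parsing the YAML might produce, together with a tiny required-field check. The field values, the action name, and the rule that steps is required are assumptions for illustration. The project's real validation uses JSON Schema, which this sketch does not reproduce.

```python
# Hypothetical workflow document, as a parsed-YAML dict.
workflow = {
    "id": "asia-comparison",
    "description": "Compare algorithms on the asia dataset",
    "data_root": "data",
    "output_root": "results",
    "matrix": {"dataset": ["asia"], "algorithm": ["pc", "ges"]},
    "steps": [
        {
            "name": "Learn structure",
            "uses": "dummy-structure-learner",  # hypothetical action name
            "with": {"dataset": "{{dataset}}", "algorithm": "{{algorithm}}"},
        }
    ],
}

# Assumed required fields; id and description are named in this roadmap.
REQUIRED_FIELDS = ("id", "description", "steps")


def missing_fields(doc):
    """Return the required top-level fields that are absent from a workflow dict."""
    return [field for field in REQUIRED_FIELDS if field not in doc]
```

Calling missing_fields(workflow) returns an empty list here, while a document containing only an id would report description and steps as missing.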

Phase 2 Features (Current): Workflow Execution Engine [95% Complete]

✅ CI-Style Workflow Engine [COMPLETED]

  • [x] WorkflowExecutor class - Complete 99-line implementation with comprehensive testing (112 total tests, 100% coverage)
  • [x] Workflow parser - Parse GitHub Actions-style YAML workflows with schema validation
  • [x] Matrix expansion - Convert matrix variables into individual experiment jobs using cartesian product
  • [x] Path construction - Dynamic file path generation from matrix variables with flexible templating
  • [x] Schema validation - JSON Schema validation with corrected $schema/$id fields and required id/description
  • [x] Error handling - Comprehensive validation and parsing error management
  • [x] Template variable system - Full template validation with context checking and error reporting
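Matrix expansion and path construction, as listed above, might look like the following sketch. The function names and the path template are hypothetical, and the actual WorkflowExecutor may differ. The cartesian product is the only detail taken from the roadmap.

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per combination of matrix values (cartesian product)."""
    names = list(matrix)
    return [dict(zip(names, values)) for values in product(*(matrix[n] for n in names))]


def build_path(template, job):
    """Fill {{name}} placeholders in a path template from one job's variables."""
    path = template
    for name, value in job.items():
        path = path.replace("{{" + name + "}}", str(value))
    return path


jobs = expand_matrix({"dataset": ["asia", "sachs"], "sample_size": [100, 1000]})
paths = [
    build_path("results/{{dataset}}/n{{sample_size}}/graph.graphml", job)
    for job in jobs
]
```

Two datasets and two sample sizes yield four jobs, each mapped to its own output path, for example results/asia/n100/graph.graphml for the first job.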

✅ Documentation Infrastructure [COMPLETED]

  • [x] API restructure - Separated API documentation into focused, navigable pages
  • [x] Google-style docstrings - Complete class variable documentation for comprehensive API coverage
  • [x] Cross-linking - Proper navigation between API sections with back/forward links
  • [x] Usage examples - Comprehensive examples covering basic to advanced usage patterns
  • [x] MkDocs integration - Updated navigation structure with proper page organization
  • [x] CI integration - Documentation builds without warnings or broken links

🔄 Research Reproducibility Platform [STREAMLINED 3-COMMIT APPROACH]

Architectural Focus: Optimized path to working CLI with external actions. Start with dynamic action discovery and build incrementally.

Commit 1: Template Variable Validation [COMPLETED]

  • [x] Template extraction - Parse {{variable}} patterns from action parameters
  • [x] Context validation - Verify template variables exist in matrix + workflow properties
  • [x] Error reporting - Clear errors for unknown/malformed template variables
  • [x] Comprehensive tests - Cover valid, invalid, and malformed template scenarios
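A minimal sketch of the template checks in Commit 1, assuming placeholders are simple identifiers inside double braces. The regular expression, function name, and error wording are illustrative assumptions, not the project's implementation.

```python
import re

# Matches well-formed placeholders such as {{dataset}} or {{ sample_size }}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def validate_templates(params, context):
    """Return error messages for placeholders found in string parameters."""
    errors = []
    for key, value in params.items():
        if not isinstance(value, str):
            continue  # only strings can contain templates
        # Braces that remain after removing valid placeholders are malformed.
        leftover = PLACEHOLDER.sub("", value)
        if "{{" in leftover or "}}" in leftover:
            errors.append(f"{key}: malformed template in '{value}'")
        for name in PLACEHOLDER.findall(value):
            if name not in context:
                errors.append(f"{key}: unknown template variable '{name}'")
    return errors


context = {"dataset", "algorithm", "id"}  # matrix variables + workflow properties
params = {"data": "{{dataset}}.csv", "alg": "{{algoritm}}", "bad": "{{dataset", "n": 5}
errors = validate_templates(params, context)
```

Here the misspelled algoritm is reported as unknown and the unclosed {{dataset as malformed, while the valid data entry and the non-string n pass silently.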

Commit 2: Documentation & Test Infrastructure [COMPLETED]

  • [x] API documentation restructure - Separated into focused pages (Actions, Registry, Workflow, Schema, CLI, Examples)
  • [x] Google-style docstrings - Complete class variable documentation for API generation
  • [x] Test policy compliance - Converted major test files to function-based structure with single-line comments
  • [x] 100% test coverage - Maintained comprehensive coverage across 112 tests
  • [x] MkDocs integration - Updated navigation and cross-linking structure
  • [x] Documentation quality - Eliminated broken links and warnings

Commit 3: Action Registry & Step Execution Engine [COMPLETED]

  • [x] ActionRegistry class - Centralized registry for dynamic action discovery via import-time introspection
  • [x] Dynamic discovery - Load actions from imported packages using convention-over-configuration
  • [x] Step executor - Complete integration with WorkflowExecutor for uses: action step execution
  • [x] Action execution - Full mapping of workflow with: blocks to action inputs with validation
  • [x] Error handling - Comprehensive action discovery and execution error management
  • [x] Plugin architecture - Zero-configuration plugin system with test action package demonstrating the pattern
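The step execution flow in Commit 3 (resolve the uses: name, map the with: block to action inputs, run the action) could be sketched like this. EchoAction, render, execute_step, and the plain dict standing in for ActionRegistry are all hypothetical names introduced for the example.

```python
class EchoAction:
    """Hypothetical action that returns the inputs it was given."""

    def run(self, inputs):
        return {"received": inputs}


def render(value, job):
    """Substitute {{name}} placeholders in string values with job variables."""
    if isinstance(value, str):
        for name, replacement in job.items():
            value = value.replace("{{" + name + "}}", str(replacement))
    return value


def execute_step(step, job, registry):
    """Look up the step's action by its `uses` name and run it with rendered inputs."""
    name = step["uses"]
    if name not in registry:
        raise LookupError(f"Step uses unknown action '{name}'")
    inputs = {key: render(value, job) for key, value in step.get("with", {}).items()}
    return registry[name]().run(inputs)


registry = {"echo": EchoAction}  # a plain dict stands in for the ActionRegistry
step = {"uses": "echo", "with": {"data": "{{dataset}}.csv", "alpha": 0.05}}
output = execute_step(step, {"dataset": "asia"}, registry)
```

Non-string inputs such as alpha pass through unchanged, so only string parameters take part in template substitution.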

Commit 4: CLI Implementation & Mode Support 🔑 NEXT

  • [ ] causaliq-workflow command - Complete CLI with workflow file execution using ActionRegistry
  • [ ] Mode-based operation - --mode=dry-run|run|compare for validation, execution, and testing
  • [ ] Parameter injection - CLI parameters available as template variables via WorkflowExecutor
  • [ ] CLI error handling - User-friendly error reporting for workflow and action failures
  • [ ] Workflow validation - Pre-execution validation with clear error messages via ActionRegistry
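Since the CLI is not yet implemented, the following is only one possible shape for it, built on argparse. The command name and the --mode choices come from the list above. The --param flag, the dry-run default, and the helper names are assumptions made for this sketch.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical causaliq-workflow command line."""
    parser = argparse.ArgumentParser(prog="causaliq-workflow")
    parser.add_argument("workflow", help="path to a workflow YAML file")
    parser.add_argument(
        "--mode",
        choices=["dry-run", "run", "compare"],
        default="dry-run",  # assumed default: validate without executing
        help="validation, execution, or testing",
    )
    parser.add_argument(
        "--param",
        action="append",
        default=[],
        metavar="NAME=VALUE",
        help="extra template variable (hypothetical flag, repeatable)",
    )
    return parser


def parse_params(pairs):
    """Turn ['a=1', 'b=2'] into {'a': '1', 'b': '2'}."""
    params = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"Expected NAME=VALUE, got '{pair}'")
        params[name] = value
    return params


args = build_parser().parse_args(["experiment.yml", "--mode=run", "--param", "alpha=0.05"])
template_vars = parse_params(args.param)
```

The resulting template_vars dict could then be merged into the template context so that CLI parameters become available as {{alpha}} in action parameters.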

Commit 5: External Package Integration & Demo

  • [ ] causaliq-discovery package - Simple structure learning action (PC algorithm)
  • [ ] Entry point registration - Dynamic discovery working with external package
  • [ ] End-to-end workflow - Complete example: CLI → ActionRegistry → External action → Results
  • [ ] Real algorithm execution - PC structure learning with actual data processing
  • [ ] Output standardization - GraphML files and standardized result formats
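For the GraphML output mentioned above, a learned DAG could be serialized with the standard library alone. This sketch handles only fully directed graphs. The function name and the example variables are hypothetical, and how the project will encode PDAG, CPDAG, MAG, or PAG edge types is not specified in this roadmap.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def dag_to_graphml(nodes, edges):
    """Serialize a directed graph to a GraphML string."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    for node in nodes:
        ET.SubElement(graph, "node", {"id": node})
    for source, target in edges:
        ET.SubElement(graph, "edge", {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


xml_text = dag_to_graphml(
    ["smoke", "lung", "bronc"],
    [("smoke", "lung"), ("smoke", "bronc")],
)
```

The string can be written to a .graphml file and read back by GraphML-aware tools.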

Milestone Achievement: After these 5 commits, CausalIQ Workflow will support:

  • ✅ Complete documentation infrastructure - Comprehensive API reference with proper navigation
  • ✅ Template variable validation - Full context checking and error reporting
  • ✅ Test infrastructure compliance - Function-based test structure with 100% coverage
  • ✅ Dynamic action discovery - Load actions from external packages via import-time introspection
  • [ ] Complete CLI interface - Full causaliq-workflow command with mode support and parameter injection
  • [ ] External action execution - Real structure learning via causaliq-discovery package
  • [ ] Conservative execution - Skip work if outputs exist, enabling safe workflow restarts
  • [ ] Research reproducibility foundation - Ready for causaliq-papers integration
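The conservative execution item (skip work if outputs exist) can be illustrated with a small helper. The function name and return values are hypothetical, and a real implementation would probably also need to handle partially written outputs.

```python
from pathlib import Path
import tempfile


def run_if_missing(output_path, produce):
    """Call produce() and write its text result only when the output is absent."""
    output_path = Path(output_path)
    if output_path.exists():
        return "skipped"
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(produce())
    return "executed"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "asia" / "pc" / "graph.graphml"
    first = run_if_missing(target, lambda: "<graphml/>")   # output absent: runs
    second = run_if_missing(target, lambda: "<graphml/>")  # output present: skipped
```

Restarting an interrupted workflow then repeats only the jobs whose output files are still missing.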

🔮 Future Enhancements [When Proven Necessary]

  • Parallel Jobs Support: Add jobs: syntax and parallel execution when performance demands it
  • DASK Integration: Step-level parallelization for computationally intensive actions
  • Formal Parameter Schemas: Optional workflow input definitions for enhanced validation when workflows become complex
  • [ ] Data file handling - Read actual CSV datasets and produce results
  • [ ] Algorithm parameters - Support real algorithm configuration options

Milestone Achievement: After these 8 commits, CausalIQ Workflow will support:

  • ✅ Parallel job execution with dependency management
  • ✅ DASK-powered task parallelization within steps
  • ✅ Workflow composition via calling with parameters
  • ✅ Intelligent action optimization with dry-run and caching
  • ✅ Research reproducibility platform foundation for causaliq-papers integration

🔮 Research Reproducibility Ecosystem [FUTURE - Integration with causaliq-papers]

Vision: CausalIQ Workflow serves as the execution engine for a comprehensive research reproducibility platform.

causaliq-papers Integration Architecture:

# High-level research reproducibility workflow
causaliq-papers replicate peters2023causal --target=figure3

# causaliq-papers processes paper dependencies and generates:
├── workflow-dependencies.yml    # Analyzes what's needed for figure3
├── optimized-reproduction.yml   # Generates minimal workflow-of-workflows
└── execution-plan.json         # Dependency graph for execution

# Then calls causaliq-workflow to execute:
causaliq-workflow run optimized-reproduction.yml --target=figure3

Workflow-of-Workflows Pattern:

  • Paper reproduction = Top-level workflow calling component workflows
  • Dependency resolution = causaliq-papers analyzes workflow graph to minimize execution
  • Asset targeting = Generate only requested paper assets (tables, figures, results)
  • Intelligence integration = Actions optimize across the entire workflow graph
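Dependency resolution and asset targeting, as described above, amount to selecting the part of the workflow graph that a requested asset depends on and ordering it for execution. The sketch below uses the standard-library graphlib module (Python 3.9+). The workflow names and the plan_for_target function are hypothetical, and causaliq-papers may approach this differently.

```python
from graphlib import TopologicalSorter


def plan_for_target(dependencies, target):
    """Return the workflows needed for `target`, ordered so dependencies run first."""
    needed = set()
    stack = [target]
    while stack:  # collect the target and everything it transitively depends on
        current = stack.pop()
        if current in needed:
            continue
        needed.add(current)
        stack.extend(dependencies.get(current, ()))
    subgraph = {name: set(dependencies.get(name, ())) for name in needed}
    return list(TopologicalSorter(subgraph).static_order())


# Each key depends on the workflows in its list.
deps = {
    "figure3": ["learn-graphs"],
    "table1": ["learn-graphs", "score-graphs"],
    "score-graphs": ["learn-graphs"],
    "learn-graphs": ["download-data"],
}
plan = plan_for_target(deps, "figure3")
```

Requesting figure3 leaves table1 and score-graphs out of the plan entirely, which is the minimization the pattern aims for.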

⏸️ Algorithm Integration [FUTURE - After Working Workflow]

  • [ ] Advanced algorithms - Additional causal discovery algorithms beyond PC/GES
  • [ ] Package plugins - bnlearn (R), Tetrad (Java), causal-learn (Python) integration
  • [ ] Cross-language bridges - rpy2, py4j integration for R/Java algorithm access
  • [ ] Algorithm benchmarking - Systematic comparison across algorithm implementations

Success Metrics - Phase 1 ✅ + Phase 2 ✅

  • Framework Foundation: Action framework with type-safe interfaces implemented
  • Schema Architecture: GitHub Actions-inspired workflow syntax with matrix support
  • Reference Implementation: DummyStructureLearnerAction proving framework viability
  • Format Decision: GraphML adopted as standard for causal graph representation
  • Workflow Parsing: Complete WorkflowExecutor with YAML parsing and matrix expansion
  • Path Construction: Dynamic file path generation from matrix variables
  • Schema Validation: Corrected JSON Schema with proper $id field and field requirements
  • Test Coverage: 100% coverage maintained across 112 comprehensive tests
  • Documentation Infrastructure: Complete API reference with structured navigation
  • Template System: Full template variable validation and context checking
  • Code Quality: Policy-compliant test structure with function-based organization
  • Action Registry: Complete plugin architecture with auto-discovery and validation
  • Plugin System: Zero-configuration action packages with production-ready patterns

Next Milestone: Functional Causal Discovery Workflow

Target: Complete working workflow capable of executing real causal discovery experiments

Success Criteria:

  • Execute complete workflows from command line
  • Support real structure learning algorithms (PC, GES)
  • Handle matrix expansion with parallel step execution
  • Generate organized experimental outputs with GraphML graphs
  • Maintain 100% test coverage and CI compliance
  • Comprehensive API documentation with proper navigation

Timeline: 2 focused commits remaining to transition from framework to working research tool

  • ✅ Required/optional section validation per pattern
  • ✅ Hierarchical field validation with detailed error reporting
  • ✅ Flexible validation schemas defined in external YAML
  • [x] Flexible workflow patterns - 5 patterns supporting diverse research needs
    • ✅ Series pattern for comparative research (algorithm comparison across datasets/parameters)
    • ✅ Task pattern for sequential operations (preprocessing → algorithm → analysis)
    • ✅ Mixed pattern combining multiple approaches
    • ✅ Workflow pattern for DAG-based workflows with dependencies
    • ✅ Longitudinal_research pattern for temporal causal discovery studies
  • [ ] Configuration inheritance - Create workflows based on templates with overrides

✅ CI-Style Workflow Engine [COMPLETED]

  • [x] Workflow parser - Parse GitHub Actions-style YAML workflows
  • [x] Matrix expansion - Convert matrix variables into individual experiment jobs
  • [x] Path construction - Dynamic file path generation from matrix variables
  • [x] Schema validation - JSON Schema validation with required id/description fields
  • [x] WorkflowExecutor class - Complete 99-line implementation with comprehensive testing
  • [ ] Step execution - Execute workflow steps with action-based architecture
  • [ ] Environment management - Handle workflow environment variables and context
  • [ ] Conditional execution - Support if: conditions in workflow steps
  • [ ] Artifact handling - Manage inputs/outputs between workflow steps

⏸️ DASK Task Graph Integration [PENDING]

  • [ ] Matrix job expansion - Convert matrix configs into DASK task graphs
  • [ ] Dependency management - Handle job dependencies with DASK
  • [ ] Local cluster management - Setup and manage local DASK clusters
  • [ ] Progress monitoring - Track workflow execution with real-time updates
  • [ ] Resource estimation - Estimate compute requirements for planning

⏸️ Configuration Migration [PENDING]

  • [ ] CI workflow validation - Ensure CI workflows validate correctly
  • [ ] Documentation update - Update all docs to reflect CI workflow approach

Phase 2 Features (Month 2): Research Integration [NOT STARTED]

⏸️ Algorithm Package Integration

  • [ ] R bnlearn integration - Execute R bnlearn algorithms via rpy2
    • Matrix-driven algorithm selection: algorithm: ["pc", "iamb", "gs"]
  • [ ] Java Tetrad integration - Integration with Java-based Tetrad via py4j
    • Cross-language workflow steps with data serialization
  • [ ] Python causal-learn - Direct integration with Python algorithms
    • Native Python execution within workflow steps
  • [ ] Package discovery - Automatic detection of available packages
  • [ ] Dependency validation - Check required packages before workflow execution

⏸️ Dataset Management with CI Patterns

  • [ ] Zenodo integration - Dataset download as workflow action
    • uses: zenodo-download@v1 action pattern
  • [ ] Dataset caching - Local storage and reuse with cache actions
  • [ ] Matrix dataset expansion - Multiple datasets in workflow matrix
    • matrix: {dataset: ["asia", "sachs"], sample_size: [100, 1000]}
  • [ ] Dataset transformations - Preprocessing steps as workflow actions

⏸️ Advanced Matrix Workflows

  • [ ] Cross-product expansion - Full matrix combinations with intelligent batching
  • [ ] Conditional matrices - Include/exclude matrix combinations based on conditions
  • [ ] Matrix job dependencies - Sequential and parallel matrix job orchestration
  • [ ] Result aggregation - Collect and combine results across matrix jobs
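Conditional matrices are still a planned feature, so the following is only one way they might work: expand the full cross product, then drop combinations that match an exclude rule, loosely in the spirit of GitHub Actions matrix exclusions. The function name and the rule format are assumptions for this sketch.

```python
from itertools import product


def expand_with_exclusions(matrix, exclude=()):
    """Expand a matrix, dropping combinations that match any exclude rule."""
    names = list(matrix)
    jobs = [dict(zip(names, values)) for values in product(*matrix.values())]

    def matches(job, rule):
        # A rule matches when every key/value pair it names agrees with the job.
        return all(job.get(key) == value for key, value in rule.items())

    return [job for job in jobs if not any(matches(job, rule) for rule in exclude)]


jobs = expand_with_exclusions(
    {"algorithm": ["pc", "ges"], "sample_size": [100, 1000]},
    exclude=[{"algorithm": "ges", "sample_size": 100}],
)
```

Of the four possible combinations, three remain: the ges run at sample size 100 is filtered out before any job is scheduled.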

⏸️ LLM Integration as Actions

  • [ ] Model averaging action - LLM-guided model averaging as reusable action
  • [ ] Hypothesis generation - LLM analysis steps in workflow
  • [ ] Result interpretation - LLM post-processing actions
  • [ ] Research workflow templates - Pre-built workflows for common research patterns

Phase 3 Features (Month 3): Production CI Features [NOT STARTED]

⏸️ Advanced Workflow Management

  • [ ] Workflow queuing - Manage multiple concurrent workflows like CI runners
  • [ ] Pause/resume - Interrupt and restart workflows with state preservation
  • [ ] Workflow artifacts - Persistent storage and retrieval of workflow outputs
  • [ ] Workflow caching - Cache intermediate results for faster re-runs
  • [ ] Branch/PR workflows - Different workflows for different experiment branches

⏸️ Enterprise CI Features

  • [ ] Secrets management - Secure handling of API keys and credentials
  • [ ] Environment isolation - Containerized execution environments
  • [ ] Resource limits - CPU, memory, and time limits per workflow/job
  • [ ] Approval workflows - Human approval steps for expensive experiments
  • [ ] Scheduled workflows - Cron-style scheduled execution

⏸️ Monitoring and Observability

  • [ ] Workflow status dashboard - Real-time workflow execution monitoring
  • [ ] Job logs and traces - Detailed logging with searchable history
  • [ ] Performance metrics - Resource usage, timing, and efficiency tracking
  • [ ] Alert integration - Notifications for workflow success/failure
  • [ ] Audit trail - Complete execution history for reproducibility

⏸️ Results and Artifacts

  • [ ] Standardized outputs - Replace pickle files with structured formats
  • [ ] Version tracking - Track algorithm versions and parameter changes
  • [ ] Result comparison - Compare outputs across workflow runs
  • [ ] Export capabilities - Multiple output formats (CSV, JSON, HDF5)
  • [ ] Reproducibility metadata - Complete metadata for result reproduction

Success Criteria by Phase

Phase 1 Success Metrics

  • [ ] Execute GitHub Actions-style YAML workflows locally
  • [ ] Matrix expansion generates individual causal discovery jobs
  • [ ] Package-level algorithm integration (bnlearn, Tetrad, causal-learn)
  • [ ] DASK task graph execution with progress monitoring
  • [ ] Jinja2 template processing for workflow variables

Phase 2 Success Metrics

  • [ ] Multi-language workflows (R, Java, Python) in single configuration
  • [ ] Automatic dataset download and matrix expansion across datasets
  • [ ] LLM integration actions for model averaging and analysis
  • [ ] Advanced matrix workflows with conditional execution
  • [ ] Research workflow templates for common causal discovery patterns

Phase 3 Success Metrics

  • [ ] Production-grade workflow queue management
  • [ ] Enterprise features: secrets, isolation, limits, approvals
  • [ ] Comprehensive monitoring dashboard with real-time status
  • [ ] Standardized result formats with complete reproducibility metadata
  • [ ] Foundation ready for large-scale research deployment

Post Three-Month Features (Research Phase)

Q2 2026: Advanced Research Features

  • Workflow marketplace - Sharing and discovering research workflow templates
  • Interactive notebooks - Jupyter integration with workflow execution
  • Publication workflows - Generate reproducible research outputs automatically
  • Domain knowledge integration - Expert knowledge as workflow conditions

Q3-Q4 2026: Migration and Scale

  • Multi-machine execution - Distributed workflows across compute clusters
  • Cloud provider integration - AWS, GCP, Azure workflow runners
  • GPU acceleration - Support for GPU-accelerated algorithms
  • Web interface - Browser-based workflow designer and monitor

Beyond 2026: Advanced Capabilities

  • Workflow orchestration - Complex multi-stage research workflows
  • Real-time collaboration - Multiple researchers on shared workflows
  • AI-assisted optimization - Automated hyperparameter and workflow tuning
  • Integration ecosystem - Plugins for major research tools and platforms

This roadmap leverages the familiar GitHub Actions paradigm while building a powerful platform specifically designed for causal discovery research workflows.