Action Auto-Discovery Architecture

Overview

The action architecture provides reusable, automatically discoverable workflow components modeled on GitHub Actions. Actions are zero-configuration plugins: once a package is installed, its action is available to workflows, with no registry files or manual setup required.

Auto-Discovery Action Framework

How Actions Are Found and Used

The Discovery Lifecycle

Installation Phase: Developer installs action package (pip install my-action)
Discovery Phase: Framework scans Python environment for action packages
Registration Phase: Actions are automatically registered under their module name (my_action)
Execution Phase: Workflows reference actions by that name (uses: "my_action")

Convention-Based Action Definition

Actions follow a simple naming convention for automatic discovery:

# my_action/__init__.py - Must export class named 'CausalIQAction'
from causaliq_workflow.action import Action

class CausalIQAction(Action):  # Must be named 'CausalIQAction'
    name = "my-action"
    version = "1.0.0"
    description = "Performs custom analysis"

    def run(self, inputs):
        # Implementation here
        return {"status": "complete"}

Base Action Interface

from abc import ABC, abstractmethod
from typing import Dict, Any, Optional
from dataclasses import dataclass
import semantic_version

@dataclass
class ActionInput:
    """Define action input specification."""
    name: str
    description: str
    required: bool = False
    default: Any = None
    type_hint: str = "Any"

@dataclass
class ActionOutput:
    """Define action output specification."""
    name: str
    description: str
    value: Any

class Action(ABC):
    """Base class for all workflow actions."""

    # Action metadata
    name: str = ""
    version: str = "1.0.0"
    description: str = ""
    author: str = ""

    # Input/output specifications
    inputs: Dict[str, ActionInput] = {}
    outputs: Dict[str, str] = {}  # name -> description mapping

    @abstractmethod
    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Execute action with validated inputs, return outputs."""
        pass

    def validate_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Validate and process input values."""
        validated = {}

        for input_name, input_spec in self.inputs.items():
            if input_spec.required and input_name not in inputs:
                raise ValueError(f"Required input '{input_name}' missing for action {self.name}")

            value = inputs.get(input_name, input_spec.default)
            validated[input_name] = value

        return validated

    def format_outputs(self, raw_outputs: Dict[str, Any]) -> Dict[str, ActionOutput]:
        """Format raw outputs with metadata."""
        formatted = {}

        for name, value in raw_outputs.items():
            description = self.outputs.get(name, f"Output from {self.name}")
            formatted[name] = ActionOutput(
                name=name,
                description=description,
                value=value
            )

        return formatted
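
To illustrate how the three methods fit together, the sketch below runs a toy action through validation, execution, and output formatting. The `Action`, `ActionInput`, and `ActionOutput` definitions are trimmed stand-ins for the classes above so that the example runs on its own; `WordCountAction` and the `execute` helper are hypothetical names, not part of the framework.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ActionInput:
    name: str
    description: str
    required: bool = False
    default: Any = None


@dataclass
class ActionOutput:
    name: str
    description: str
    value: Any


class Action(ABC):
    """Trimmed stand-in for the base class shown above."""
    name: str = ""
    inputs: Dict[str, ActionInput] = {}
    outputs: Dict[str, str] = {}

    @abstractmethod
    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]: ...

    def validate_inputs(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        validated = {}
        for input_name, spec in self.inputs.items():
            if spec.required and input_name not in inputs:
                raise ValueError(
                    f"Required input '{input_name}' missing for action {self.name}")
            validated[input_name] = inputs.get(input_name, spec.default)
        return validated

    def format_outputs(self, raw: Dict[str, Any]) -> Dict[str, ActionOutput]:
        return {
            key: ActionOutput(key, self.outputs.get(key, f"Output from {self.name}"), value)
            for key, value in raw.items()
        }


class WordCountAction(Action):  # hypothetical example action
    name = "word-count"
    inputs = {
        "text": ActionInput("text", "Text to analyse", required=True),
        "min_length": ActionInput("min_length", "Shortest word to count", default=1),
    }
    outputs = {"count": "Number of words at least min_length long"}

    def run(self, inputs):
        words = [w for w in inputs["text"].split() if len(w) >= inputs["min_length"]]
        return {"count": len(words)}


def execute(action: Action, raw_inputs: Dict[str, Any]) -> Dict[str, ActionOutput]:
    """Hypothetical helper: validate, run, then wrap the outputs."""
    validated = action.validate_inputs(raw_inputs)
    return action.format_outputs(action.run(validated))


result = execute(WordCountAction(), {"text": "causal graphs are fun", "min_length": 4})
```

Note that `validate_inputs`, as defined above, only passes through keys declared in `inputs`. An action that reads an undeclared key inside `run` would not see it if the framework calls `run` with the validated dictionary.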

Auto-Discovery Action Registry

The registry automatically discovers and manages actions without configuration:

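import pkgutil
import importlib
from typing import Dict, Type, Any

from causaliq_workflow.action import Action


class ActionRegistryError(Exception):
    """Raised when a requested action is not registered."""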

class ActionRegistry:
    """Automatically discover and manage workflow actions."""

    def __init__(self):
        self._actions: Dict[str, Type[Action]] = {}
        self._discover_actions()  # Automatic discovery on initialization

    def _discover_actions(self):
        """Scan Python environment for action packages."""

        # Iterate through all importable modules
        for finder, module_name, ispkg in pkgutil.iter_modules():
            try:
                # Attempt to import the module
                module = importlib.import_module(module_name)

                # Check if module exports a 'CausalIQAction' class
                if hasattr(module, 'CausalIQAction'):
                    action_class = getattr(module, 'CausalIQAction')

                    # Verify it's a proper Action subclass
                    if (isinstance(action_class, type) and 
                        issubclass(action_class, Action) and 
                        action_class != Action):

                        # Register using module name as action identifier
                        self._actions[module_name] = action_class

            except Exception:
                # Skip modules that fail to import for any reason
                continue

    def get_available_actions(self) -> Dict[str, Type[Action]]:
        """Return copy of available actions."""
        return self._actions.copy()

    def get_action_class(self, action_name: str) -> Type[Action]:
        """Get action class by name."""
        if action_name not in self._actions:
            raise ActionRegistryError(f"Action '{action_name}' not found. Available actions: {list(self._actions.keys())}")
        return self._actions[action_name]

How Discovery Works Step-by-Step

Registry Initialization: When ActionRegistry() is created, discovery starts automatically
Module Scanning: Uses pkgutil.iter_modules() to iterate through the top-level modules on the import path
Safe Import: Attempts to import each module, skipping those that fail
Convention Check: Looks for a class named 'CausalIQAction' in each module
Validation: Ensures the CausalIQAction class is a subclass of the base Action class
Registration: Maps module name to Action class for workflow lookup
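
The steps above can be sketched as a small function that applies the convention check to a list of candidate module names. This is a simplified illustration rather than the framework's code: `discover`, `BaseAction`, and the in-memory `demo_action` module are hypothetical, the exported class name is a parameter, and the candidate list is passed in explicitly so the example does not import every module in the environment.

```python
import importlib
import sys
import types
from typing import Dict, Iterable, Type


class BaseAction:
    """Stand-in for the framework's base Action class."""


def discover(candidates: Iterable[str],
             class_name: str = "CausalIQAction") -> Dict[str, Type[BaseAction]]:
    """Import each candidate module and keep those that follow the convention."""
    found: Dict[str, Type[BaseAction]] = {}
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except Exception:
            # Safe import: a module that fails to import is skipped
            continue
        candidate = getattr(module, class_name, None)
        # Validation: must be a class that subclasses (but is not) the base
        if (isinstance(candidate, type)
                and issubclass(candidate, BaseAction)
                and candidate is not BaseAction):
            found[module_name] = candidate
    return found


# Build a fake installed package in memory so the sketch is self-contained
demo = types.ModuleType("demo_action")


class CausalIQAction(BaseAction):
    name = "demo-action"


demo.CausalIQAction = CausalIQAction
sys.modules["demo_action"] = demo

actions = discover(["demo_action", "json", "no_such_module_xyz"])
```

Here `json` imports successfully but exports no matching class, and `no_such_module_xyz` fails to import, so only `demo_action` ends up in the result.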

Action Package Development Workflow

Step 1: Create Standard Python Package

mkdir my_custom_action
cd my_custom_action

Step 2: Define Package Structure

my_custom_action/
├── pyproject.toml
├── my_custom_action/
│   └── __init__.py  # Must export 'CausalIQAction' class
└── README.md

Step 3: Implement Action Convention

# my_custom_action/__init__.py
from causaliq_workflow.action import Action

class CausalIQAction(Action):  # Must be named 'CausalIQAction'
    name = "my-custom-action"
    version = "1.0.0" 
    description = "Custom analysis action"

    def run(self, inputs):
        # Action implementation
        result = self.perform_analysis(inputs['data'])
        return {"analysis_result": result}

Step 4: Install and Use Immediately

pip install my_custom_action
causaliq-workflow my-experiment.yml  # Action automatically available


## Auto-Discovery Action Examples

### Example 1: Simple Analysis Action

**Package: causaliq_analysis**

```python
# causaliq_analysis/__init__.py
from causaliq_workflow.action import Action
import pandas as pd
import networkx as nx

class CausalIQAction(Action):  # Auto-discovered by this name
    name = "causaliq-analysis"
    version = "1.0.0"
    description = "Basic causal graph analysis"

    def run(self, inputs):
        """Analyze causal graph structure."""
        graph_path = inputs['graph_path']
        graph = nx.read_graphml(graph_path)

        analysis = {
            "nodes": len(graph.nodes),
            "edges": len(graph.edges), 
            "density": nx.density(graph),
            "is_dag": nx.is_directed_acyclic_graph(graph)
        }

        return {"analysis": analysis}
```

Usage in Workflow:

steps:
  - name: "Analyze Graph"
    uses: "causaliq_analysis"  # Automatically discovered
    with:
      graph_path: "/results/learned_graph.xml"
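
A rough sketch of how a runner might turn a step like the one above into a call on a discovered action. The registry here is a plain dictionary, and `run_step` and `EchoAction` are hypothetical; the framework's actual step execution is not shown in this document.

```python
from typing import Any, Dict


class EchoAction:
    """Hypothetical action used only for this illustration."""

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {"received": sorted(inputs)}


# Stand-in for the mapping built by discovery: module name -> action class
registry = {"echo_action": EchoAction}


def run_step(step: Dict[str, Any]) -> Dict[str, Any]:
    """Look up the action named by 'uses' and run it with the 'with' inputs."""
    action_name = step["uses"]
    if action_name not in registry:
        raise KeyError(
            f"Action '{action_name}' not found. Available: {sorted(registry)}")
    action = registry[action_name]()
    return action.run(step.get("with", {}))


# A step shaped like the YAML above, once parsed into a dictionary
step = {
    "name": "Analyze Graph",
    "uses": "echo_action",
    "with": {"graph_path": "/results/learned_graph.xml"},
}
outputs = run_step(step)
```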

Example 2: Data Loading Action

Package: causaliq_data

# causaliq_data/__init__.py  
from causaliq_workflow.action import Action
import pandas as pd
from pathlib import Path

class CausalIQAction(Action):
    name = "causaliq-data"
    version = "2.1.0"
    description = "Load and preprocess causal datasets"

    def run(self, inputs):
        """Load dataset with optional preprocessing."""
        dataset_name = inputs['dataset']
        sample_size = inputs.get('sample_size')

        # Load from standard datasets
        if dataset_name == "asia":
            data = self._load_asia_network()
        elif dataset_name == "cancer":
            data = self._load_cancer_network()
        else:
            # Load from file path
            data = pd.read_csv(dataset_name)

        # Apply sampling if requested
        if sample_size and sample_size < len(data):
            data = data.sample(n=sample_size, random_state=42)

        output_path = inputs['output_path']
        data.to_csv(f"{output_path}/data.csv", index=False)

        return {
            "data_path": f"{output_path}/data.csv",
            "rows": len(data),
            "columns": len(data.columns)
        }

Example 3: Algorithm Bridge Action

Package: causaliq_pc_algorithm

# causaliq_pc_algorithm/__init__.py
import json

from causaliq_workflow.action import Action
import pandas as pd
import networkx as nx

class CausalIQAction(Action):
    name = "causaliq-pc-algorithm" 
    version = "1.5.2"
    description = "PC algorithm for causal structure learning"

    def run(self, inputs):
        """Execute PC algorithm."""
        data_path = inputs['data_path']
        alpha = inputs.get('alpha', 0.05)
        output_path = inputs['output_path']

        # Load data
        data = pd.read_csv(data_path)

        # Run PC algorithm (implementation details omitted)
        graph = self._execute_pc_algorithm(data, alpha)

        # Save results
        nx.write_graphml(graph, f"{output_path}/graph.xml")

        # Generate metadata
        metadata = {
            "algorithm": "pc",
            "alpha": alpha,
            "nodes": len(graph.nodes),
            "edges": len(graph.edges)
        }

        with open(f"{output_path}/metadata.json", 'w') as f:
            json.dump(metadata, f, indent=2)

        return {
            "graph_path": f"{output_path}/graph.xml",
            "metadata_path": f"{output_path}/metadata.json",
            "edge_count": len(graph.edges)
        }

        # ...
            columns = dataset.columns.tolist()
            np.random.shuffle(columns)
            randomised = dataset[columns]
            transformation_log.append(f"Shuffled column order: {' -> '.join(columns)}")

        elif strategy == "subsample":
            subsample_size = min(len(dataset) // 2, 1000)
            randomised = dataset.sample(n=subsample_size).reset_index(drop=True)
            transformation_log.append(f"Subsampled to {subsample_size} rows")

        elif strategy == "bootstrap":
            randomised = dataset.sample(n=len(dataset), replace=True).reset_index(drop=True)
            transformation_log.append("Bootstrap resampling applied")

        else:
            raise ValueError(f"Unknown randomisation strategy: {strategy}")

        return {
            "randomised_dataset": randomised,
            "transformation_log": transformation_log
        }
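
The strategies in the fragment above can be tried out without pandas. The sketch below applies the same three ideas (column shuffle, subsample, bootstrap) to a list of row dictionaries using only the standard library. The function name `randomise`, the `"shuffle_columns"` strategy label, and the seeded `random.Random` are assumptions made for this illustration.

```python
import random
from typing import Any, Dict, List

Row = Dict[str, Any]


def randomise(rows: List[Row], strategy: str, seed: int = 0) -> Dict[str, Any]:
    """Apply one randomisation strategy and record what was done."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    log: List[str] = []

    if strategy == "shuffle_columns":  # label assumed for this sketch
        columns = list(rows[0]) if rows else []
        rng.shuffle(columns)
        result = [{col: row[col] for col in columns} for row in rows]
        log.append(f"Shuffled column order: {' -> '.join(columns)}")
    elif strategy == "subsample":
        size = min(len(rows) // 2, 1000)
        result = rng.sample(rows, size)  # sampling without replacement
        log.append(f"Subsampled to {size} rows")
    elif strategy == "bootstrap":
        result = rng.choices(rows, k=len(rows))  # sampling with replacement
        log.append("Bootstrap resampling applied")
    else:
        raise ValueError(f"Unknown randomisation strategy: {strategy}")

    return {"randomised_dataset": result, "transformation_log": log}


rows = [{"a": i, "b": i * 2, "c": i * 3} for i in range(10)]
sub = randomise(rows, "subsample")
boot = randomise(rows, "bootstrap")
shuf = randomise(rows, "shuffle_columns")
```

Column shuffling changes only the key order of each row, so the values are untouched; subsampling and bootstrapping change which rows appear and how often.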

Algorithm Execution Actions

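import time
from typing import Any, Dict

import networkx as nx
import pandas as pd

# ActionInput is assumed to be importable from the same module as Action
from causaliq_workflow.action import Action, ActionInput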

class CausalDiscoveryAction(Action):
    """Execute causal discovery algorithm from various packages."""

    name = "causal-discovery"
    version = "1.0.0"
    description = "Run causal discovery algorithm with automatic package detection"

    inputs = {
        "algorithm": ActionInput("algorithm", "Algorithm name (pc, ges, lingam, etc.)", required=True),
        "package": ActionInput("package", "Algorithm package (bnlearn, tetrad, causal-learn, auto)", default="auto"),
        "data": ActionInput("data", "Input dataset", required=True),
        "parameters": ActionInput("parameters", "Algorithm-specific parameters", default={})
    }

    outputs = {
        "learned_graph": "Learned causal graph as NetworkX DiGraph",
        "algorithm_info": "Information about algorithm execution",
        "performance_metrics": "Execution time, memory usage, convergence info"
    }

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Execute causal discovery algorithm."""
        algorithm = inputs["algorithm"].lower()
        package = inputs["package"]
        data = inputs["data"]
        parameters = inputs["parameters"]

        # Auto-detect package if needed
        if package == "auto":
            package = self._detect_best_package(algorithm)

        # Execute algorithm
        start_time = time.time()

        if package == "bnlearn":
            learned_graph, algo_info = self._execute_bnlearn(algorithm, data, parameters)
        elif package == "tetrad":
            learned_graph, algo_info = self._execute_tetrad(algorithm, data, parameters)
        elif package == "causal-learn":
            learned_graph, algo_info = self._execute_causal_learn(algorithm, data, parameters)
        else:
            raise ValueError(f"Unsupported package: {package}")

        execution_time = time.time() - start_time

        performance_metrics = {
            "execution_time_seconds": execution_time,
            "algorithm": algorithm,
            "package": package,
            "num_variables": len(data.columns),
            "num_samples": len(data),
            "num_edges": learned_graph.number_of_edges()
        }

        return {
            "learned_graph": learned_graph,
            "algorithm_info": algo_info,
            "performance_metrics": performance_metrics
        }

    def _detect_best_package(self, algorithm: str) -> str:
        """Detect best available package for algorithm."""
        algorithm_packages = {
            "pc": ["bnlearn", "causal-learn", "tetrad"],
            "ges": ["causal-learn", "tetrad"], 
            "lingam": ["causal-learn"],
            "iamb": ["bnlearn"],
            "gs": ["bnlearn"]
        }

        preferred_packages = algorithm_packages.get(algorithm, ["causal-learn"])

        # Check availability and return first available
        for package in preferred_packages:
            if self._is_package_available(package):
                return package

        raise RuntimeError(f"No available package found for algorithm: {algorithm}")

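    def _execute_bnlearn(self, algorithm: str, data: pd.DataFrame, 
                        parameters: Dict) -> tuple:
        """Execute algorithm using R bnlearn."""
        try:
            import rpy2.robjects as ro
            from rpy2.robjects import pandas2ri
        except ImportError as exc:
            raise RuntimeError("rpy2 is required to run bnlearn algorithms") from exc
        ...  # R bridge implementation omitted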
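
`_detect_best_package` calls `_is_package_available`, which is not shown. One plausible way to write such a check uses `importlib.util.find_spec`, which reports whether a module can be found without importing it. The mapping from package label to importable module name below is an assumption for illustration (for example, that bnlearn is reached through `rpy2` and that Tetrad is reached through a module named `pytetrad`); adjust it to the bridges actually in use.

```python
import importlib.util
from typing import Dict, List

# Assumed mapping from package label to the Python module that must be importable
MODULE_FOR_PACKAGE: Dict[str, str] = {
    "bnlearn": "rpy2",              # assumption: bnlearn is driven through rpy2
    "causal-learn": "causallearn",  # assumption: import name of causal-learn
    "tetrad": "pytetrad",           # assumption
}


def is_package_available(package: str,
                         modules: Dict[str, str] = MODULE_FOR_PACKAGE) -> bool:
    """Return True if the module backing `package` can be found."""
    module_name = modules.get(package)
    if module_name is None:
        return False
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken parent packages or odd module state
        return False


def first_available(preferred: List[str],
                    modules: Dict[str, str] = MODULE_FOR_PACKAGE) -> str:
    """Return the first available package, mirroring _detect_best_package."""
    for package in preferred:
        if is_package_available(package, modules):
            return package
    raise RuntimeError(f"No available package found among: {preferred}")


# Demonstration with standard-library modules standing in for real backends
demo_modules = {"missing": "no_such_module_xyz", "present": "json"}
choice = first_available(["missing", "present"], demo_modules)
```

A check like this only confirms that the Python side is importable. For an R bridge it says nothing about whether R or the bnlearn R package is installed, so a real implementation may need a further probe.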

Benefits of Auto-Discovery Architecture

For Action Developers

Zero Configuration Setup

No registry management: No need to maintain configuration files or plugin registries
Standard Python patterns: Use familiar pyproject.toml, pip install, and package structure
Immediate availability: Actions become available as soon as the package is installed
Simple convention: Just export a class named 'CausalIQAction' from the package

Development Workflow

Create package: Standard Python package with pyproject.toml
Implement action: Export 'CausalIQAction' class following the interface
Test locally: pip install -e . for development testing
Publish: Standard PyPI publishing or GitHub releases
Use immediately: Actions are discovered the next time a workflow is run, with no extra setup

For Workflow Authors

Seamless Integration

Familiar syntax: Uses standard GitHub Actions-style uses: "action-name"
No configuration: No need to declare or configure actions before use
Version management: Standard semantic versioning through package versions
Dependency handling: pip installs each action's declared Python dependencies automatically
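
Because actions are ordinary packages, the installed version of an action can be read from package metadata with the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and how the framework itself reports versions is not specified here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# A distribution that is not installed yields None instead of an exception
missing = installed_version("no-such-distribution-xyz")
```

The argument is the distribution name used with pip, which may differ from the importable module name.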

Ecosystem Growth

Organic discovery: New actions become available automatically
Community contributions: Easy for community to create and share actions
Quality assurance: Actions are regular Python packages with standard testing
Documentation: Standard Python documentation tools apply

For the Framework

Architectural Benefits

Reduced complexity: No registry files, configuration, or plugin management code
Robustness: Discovery failures don't break the system (graceful degradation)
Performance: Discovery runs once, at registry initialization, rather than on every lookup
Maintainability: Less framework code means easier maintenance

Ecosystem Integration

Standard distribution: Uses PyPI and standard Python packaging
Cross-platform: Works wherever Python works
Version compatibility: Standard semantic versioning for compatibility management
Testing integration: Actions can be tested with standard Python testing tools

Auto-Discovery Implementation Patterns

Cross-Language Bridges

Actions can bridge to R, Java, and other languages:

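# causaliq_bnlearn/__init__.py
from causaliq_workflow.action import Action
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

class CausalIQAction(Action):
    name = "causaliq-bnlearn"

    # bnlearn structure-learning functions this action is willing to call
    ALGORITHMS = {'pc.stable', 'gs', 'iamb'}

    def __init__(self):
        # Initialize R environment once
        ro.r('library(bnlearn)')

    def run(self, inputs):
        # Bridge to R bnlearn package
        algorithm = inputs['algorithm']  # 'pc.stable', 'gs', 'iamb'
        if algorithm not in self.ALGORITHMS:
            # Never interpolate unchecked input into R code
            raise ValueError(f"Unknown algorithm: {algorithm}")

        # Convert the pandas DataFrame to an R data.frame on assignment
        with localconverter(ro.default_converter + pandas2ri.converter):
            ro.globalenv['data'] = inputs['data']

        ro.r(f'result <- {algorithm}(data)')
        return {"graph": self._convert_to_networkx(ro.r('result'))}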

Algorithm Collections

Single packages can provide multiple related algorithms:

# causaliq_constraint_based/__init__.py
class CausalIQAction(Action):
    name = "causaliq-constraint-based"

    def run(self, inputs):
        algorithm = inputs['algorithm']

        if algorithm == 'pc':
            return self._run_pc(inputs)
        elif algorithm == 'fci':
            return self._run_fci(inputs)
        elif algorithm == 'cfci':
            return self._run_cfci(inputs)
        else:
            raise ValueError(f"Unknown algorithm: {algorithm}")
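
As the number of algorithms in a collection grows, the `if`/`elif` chain can be replaced by a lookup table. The sketch below shows that variant with placeholder implementations; the class name and the `_run_*` bodies are hypothetical and return dummy values.

```python
from typing import Any, Callable, Dict


class ConstraintBasedCollection:
    """Illustrative collection that dispatches on inputs['algorithm']."""

    name = "causaliq-constraint-based"

    def __init__(self) -> None:
        # Map algorithm names to the bound methods that implement them
        self._dispatch: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
            "pc": self._run_pc,
            "fci": self._run_fci,
        }

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        algorithm = inputs["algorithm"]
        try:
            handler = self._dispatch[algorithm]
        except KeyError:
            raise ValueError(f"Unknown algorithm: {algorithm}") from None
        return handler(inputs)

    def _run_pc(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {"algorithm": "pc"}   # placeholder result

    def _run_fci(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {"algorithm": "fci"}  # placeholder result


collection = ConstraintBasedCollection()
pc_result = collection.run({"algorithm": "pc"})
```

Adding an algorithm then means adding one method and one table entry, and the set of supported names is available for error messages or documentation.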

This auto-discovery architecture lets actions be developed, shared, and used with minimal friction while keeping the robustness and reliability that scientific workflows need.