Core Concepts

CausalIQ Workflow uses a declarative YAML syntax inspired by GitHub Actions. This section explains the fundamental building blocks.

Workflows

A workflow is a YAML file defining a sequence of steps to execute. Every workflow has a `steps:` section containing one or more steps.

# workflow.yml
steps:
  - name: "Merge Graphs"
    uses: "causaliq-analysis"
    with:
      action: "merge_graphs"
      input: "results/graphs.db"
      output: "results/merged.db"

Workflow-Level Properties

| Property | Required | Description |
| --- | --- | --- |
| `id` | No | Unique identifier for the workflow |
| `description` | No | Human-readable description |
| `root_dir` | No | Base directory for relative paths (default: current directory) |
| `steps` | Yes | List of steps to execute |
| `matrix` | No | Parameter matrix for expansion (see below) |

Custom properties defined at the workflow level become available as template variables in step parameters.

Steps

Each step performs a single unit of work. Steps execute sequentially.

steps:
  - name: "Evaluate Graphs"
    uses: "causaliq-analysis"
    with:
      action: "evaluate_graph"
      input: "results/graphs.db"
      reference: "networks/asia/true.graphml"

Step Properties

| Property | Required | Description |
| --- | --- | --- |
| `name` | Yes | Human-readable step name |
| `uses` | Yes | CausalIQ package providing the action |
| `with` | Yes | Action parameters |

The `uses` property specifies which CausalIQ package provides the action (e.g., `causaliq-analysis`, `causaliq-discovery`, `causaliq-knowledge`).

Actions

Actions are the reusable operations provided by CausalIQ packages. Each action has specific parameters documented in its package.

The `action` parameter within the `with:` block specifies which action to run:

with:
  action: "merge_graphs"    # Required: which action to perform
  input: "results/graphs.db"
  output: "results/merged.db"

Actions follow one of three patterns — create, update, or aggregate — which determine their input/output behaviour. See Action Patterns.

Matrix Expansion

The matrix feature runs the workflow's steps once for every combination of the listed parameter values, which is essential for comparative experiments.

matrix:
  network: [asia, cancer, alarm]
  sample_size: [100, 1000]

steps:
  - name: "Learn Structure"
    uses: "causaliq-discovery"
    with:
      action: "learn_structure"
      network: "{{network}}"
      sample_size: "{{sample_size}}"
      output: "results/graphs.db"

This creates 6 jobs (3 networks × 2 sample sizes), each with a unique combination of network and sample_size.
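
The expansion described above is a Cartesian product of the matrix dimensions. The following Python sketch illustrates the idea only; `expand_matrix` is a hypothetical helper written for this example, not part of CausalIQ Workflow, and the job ordering shown is simply that of `itertools.product`.

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per combination of matrix values (Cartesian product)."""
    names = list(matrix)
    value_lists = [matrix[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Same dimensions as the workflow example above.
matrix = {
    "network": ["asia", "cancer", "alarm"],
    "sample_size": [100, 1000],
}

jobs = expand_matrix(matrix)
print(len(jobs))  # 6
for job in jobs:
    print(job)    # e.g. {'network': 'asia', 'sample_size': 100}
```

Each resulting dict corresponds to one job, and its keys are the names that step parameters can reference as template variables.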

Matrix Behaviour

- Each combination produces a separate execution.
- Matrix values are available as template variables.
- Results are stored with matrix values as cache keys.

Null Values and Dimension Matching

A null value on either side (target or entry) means the dimension is not applicable. It is treated as a wildcard and always matches. A key that is missing on the entry side (that is, a dimension absent from the input cache) is also treated as a wildcard, so that caches with fewer dimensions can be consumed by broader matrices.

matrix:
  network: [asia, cancer]
  llm_model: [null]          # wildcard — match any llm_model
  sample_size: [1K, 10K]

| Scenario | Target | Entry | Matches? |
| --- | --- | --- | --- |
| Target wildcard | `null` | any value | Yes |
| Entry wildcard | `"claude"` | `null` | Yes |
| Both null | `null` | `null` | Yes |
| Missing key in entry | `"1K"` | (absent) | Yes |
| Concrete match | `"asia"` | `"asia"` | Yes |
| Concrete mismatch | `"asia"` | `"alarm"` | No |
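
The matching rule in the table can be expressed as a short predicate. This is a minimal Python sketch, not the library's implementation: `dimensions_match` and the `_ABSENT` sentinel are hypothetical names, and the sketch only checks dimensions named by the target, since the text does not say how extra keys on the entry side are handled.

```python
_ABSENT = object()  # sentinel meaning "key missing from the entry"


def dimensions_match(target, entry):
    """Return True if a cache entry matches a target matrix combination.

    A dimension matches when either side is None (null wildcard), when the
    entry lacks the key entirely, or when both concrete values are equal.
    """
    for name, wanted in target.items():
        found = entry.get(name, _ABSENT)
        if wanted is None or found is None or found is _ABSENT:
            continue  # wildcard: this dimension always matches
        if wanted != found:
            return False  # concrete mismatch
    return True


print(dimensions_match({"llm_model": None}, {"llm_model": "claude"}))      # target wildcard
print(dimensions_match({"llm_model": "claude"}, {"llm_model": None}))      # entry wildcard
print(dimensions_match({"sample_size": "1K"}, {}))                         # missing key in entry
print(dimensions_match({"network": "asia"}, {"network": "alarm"}))         # concrete mismatch
```

Only the last call returns `False`; the others fall under one of the wildcard rows.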

!!! tip "Separate caches for separate sources"
    When aggregating entries from different sources (e.g. FGES and LLM PDGs), store them in separate caches rather than one shared cache with null dimensions. Use the list syntax for `input:` to read from multiple caches:

    ```yaml
    input:
      - results/fges-pdgs.db
      - results/llm-pdgs.db
    ```

    This avoids entries from one source unintentionally matching targets intended for the other source via null wildcard matching.

Template Variables

Template variables use `{{variable}}` syntax to reference workflow properties and matrix values within step parameters.

matrix:
  network: [asia, cancer]

steps:
  - name: "Learn Structure"
    uses: "causaliq-discovery"
    with:
      action: "learn_structure"
      network: "{{network}}"
      dataset: "data/{{network}}_10k.csv"
      output: "results/graphs.db"
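
To make the substitution concrete, here is a minimal Python sketch of how a `{{variable}}` placeholder could be replaced from a context dict. The `render` function and the strict pattern (no whitespace inside the braces) are assumptions made for this example and may differ from what CausalIQ Workflow actually accepts.

```python
import re

# Matches {{name}} where name is letters, digits or underscores.
_TEMPLATE = re.compile(r"\{\{(\w+)\}\}")


def render(value, context):
    """Replace each {{name}} in value with str(context[name])."""
    return _TEMPLATE.sub(lambda m: str(context[m.group(1)]), value)


context = {"network": "asia"}  # one matrix combination
print(render("{{network}}", context))               # asia
print(render("data/{{network}}_10k.csv", context))  # data/asia_10k.csv
```

With the `cancer` combination, the same parameter would render as `data/cancer_10k.csv`.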

Variable Resolution Order

Variables are resolved in this order:

1. Workflow properties: `id`, `description`, custom workflow-level values
2. Matrix variables: values from the current matrix combination
3. Entry metadata: for UPDATE pattern steps, values from cache entry metadata
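
One way to picture this lookup order is a layered mapping. The Python sketch below uses `collections.ChainMap` and assumes that an earlier source wins when the same name appears in more than one source; the text above lists the order but does not state this precedence explicitly, and the `algorithm` metadata key is hypothetical.

```python
from collections import ChainMap

workflow_props = {"id": "demo", "description": "Example workflow"}
matrix_values = {"network": "asia", "sample_size": 1000}
# Hypothetical metadata attached to a cache entry (UPDATE pattern only).
entry_metadata = {"algorithm": "fges", "network": "cancer"}

# ChainMap searches its mappings left to right and returns the first hit.
context = ChainMap(workflow_props, matrix_values, entry_metadata)

print(context["id"])         # demo   (workflow property)
print(context["network"])    # asia   (matrix value found before entry metadata)
print(context["algorithm"])  # fges   (only present in entry metadata)
```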

Validation

Template variables are validated at parse time for CREATE and AGGREGATE patterns. Unknown variables cause a validation error:

WorkflowExecutionError: Unknown template variables: unknown_var
Available context: id, description, network

For UPDATE patterns, variables that are not found in the workflow context are not rejected at parse time. They are deferred to runtime and resolved from entry metadata.
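
The check described in this section can be sketched in Python as follows. This is an illustration under assumptions, not the actual implementation: `check_templates` is a hypothetical function, the `WorkflowExecutionError` class here is a local stand-in for the error named above, and the pattern is passed as a plain lowercase string.

```python
import re

_TEMPLATE = re.compile(r"\{\{(\w+)\}\}")


class WorkflowExecutionError(Exception):
    """Local stand-in for the error type shown in the message above."""


def check_templates(params, context, pattern):
    """Return unknown variable names, raising unless the pattern is 'update'."""
    used = {
        name
        for value in params.values()
        if isinstance(value, str)
        for name in _TEMPLATE.findall(value)
    }
    unknown = sorted(used - set(context))
    if unknown and pattern != "update":
        raise WorkflowExecutionError(
            f"Unknown template variables: {', '.join(unknown)}\n"
            f"Available context: {', '.join(context)}"
        )
    return unknown  # for 'update', these are deferred to entry metadata


context = {"id": "demo", "description": "Example", "network": "asia"}
params = {"dataset": "data/{{unknown_var}}.csv"}

print(check_templates(params, context, "update"))  # ['unknown_var']
try:
    check_templates(params, context, "create")
except WorkflowExecutionError as err:
    print(err)
```

For the `create` call, the printed message has the same two-line shape as the error shown above.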