# Workflow Caching
CausalIQ Workflow uses SQLite-based caches to store step results, enabling conservative execution and reproducibility.
## What is a Workflow Cache?
A workflow cache is a .db file containing:
- Entries: Individual results from workflow steps
- Metadata: Key-value pairs describing each entry (algorithm, parameters, metrics)
- Objects: Named data objects (graphs, tables, traces) stored with each entry
Caches are the primary mechanism for passing results between workflow steps and enabling conservative execution.
## Cache Entries
Each entry in a cache has:
| Component | Description |
|---|---|
| Key | Matrix values that uniquely identify the entry |
| Metadata | Dictionary of properties (algorithm, scores, timestamps) |
| Objects | Named data items (e.g., graph, trace, summary) |
### Entry Keys
Entries are keyed by their matrix values. For a workflow with:
```yaml
matrix:
  network: [asia, cancer]
  sample_size: [100, 1000]
```
Each entry is identified by a unique {network, sample_size} combination.
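The key space can be pictured as the cartesian product of the matrix values. A minimal illustration (not the library's internals — the `matrix` dictionary below simply mirrors the YAML above):

```python
from itertools import product

# Hypothetical matrix mirroring the YAML definition above
matrix = {
    "network": ["asia", "cancer"],
    "sample_size": [100, 1000],
}

# Each entry key is one combination of matrix values
keys = [
    dict(zip(matrix.keys(), values))
    for values in product(*matrix.values())
]

print(len(keys))   # 4
print(keys[0])     # {'network': 'asia', 'sample_size': 100}
```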
### Entry Metadata
Metadata is a flat dictionary stored with each entry. Actions add their results here:
```json
{
  "network": "asia",
  "sample_size": 1000,
  "algorithm": "pc",
  "node_count": 8,
  "edge_count": 8,
  "f1_score": 0.857,
  "evaluate_graph": {"completed": "2026-03-15T10:23:45"}
}
```
The presence of action-specific metadata (e.g., `evaluate_graph`) indicates that the action has been applied to the entry.
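That presence check can be sketched as a simple key lookup. The helper below is illustrative, not part of the library's API:

```python
# Hypothetical helper: an action is considered applied when its
# namespaced metadata key is present on the entry's metadata.
def action_completed(metadata: dict, action: str) -> bool:
    return action in metadata

entry_metadata = {
    "network": "asia",
    "algorithm": "pc",
    "evaluate_graph": {"completed": "2026-03-15T10:23:45"},
}

print(action_completed(entry_metadata, "evaluate_graph"))  # True
print(action_completed(entry_metadata, "merge_results"))   # False
```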
### Entry Objects
Objects are named data items stored with an entry:
| Name | Format | Description |
|---|---|---|
| graph | GraphML | Learned or generated graph |
| trace | JSON | Algorithm iteration history |
| summary | JSON | Statistical summary table |
Objects are stored as content strings with a format identifier.
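Conceptually, each object pairs a content string with its format identifier. The record below is only a sketch; the field names are assumptions, not the actual storage schema:

```python
from dataclasses import dataclass

# Illustrative record only; not the WorkflowCache storage schema.
@dataclass
class CacheObject:
    name: str      # e.g. "graph", "trace", "summary"
    format: str    # format identifier, e.g. "graphml" or "json"
    content: str   # serialized content string

graph_obj = CacheObject(
    name="graph",
    format="graphml",
    content='<?xml version="1.0"?><graphml>...</graphml>',
)
print(graph_obj.format)  # graphml
```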
## Conservative Execution
By default, workflows execute conservatively — skipping work that has already been completed:
- Create steps: Skip if entry with matching key exists
- Update steps: Skip if action metadata already present on entry
- Aggregate steps: Skip if output entry with matching key exists
This enables:
- Resumable workflows: Restart interrupted workflows without re-running completed steps
- Incremental updates: Add new analysis to existing results
- Efficient iteration: Modify workflow and re-run without starting from scratch
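The skip rules for the three step types can be sketched as a single decision function. Everything here is illustrative: the fake cache stands in for the real one, and only the `exists(key)`/`get(key)` calls mirror the Python API shown later on this page:

```python
class FakeEntry:
    def __init__(self, metadata):
        self.metadata = metadata

class FakeCache:
    """Stand-in for WorkflowCache, keyed by sorted matrix items."""
    def __init__(self, entries):
        self._entries = entries
    def exists(self, key):
        return tuple(sorted(key.items())) in self._entries
    def get(self, key):
        return self._entries[tuple(sorted(key.items()))]

def should_skip(step_type, cache, key, action=None):
    # Create/aggregate steps skip when the keyed entry already exists;
    # update steps also require the action's metadata to be present.
    if step_type in ("create", "aggregate"):
        return cache.exists(key)
    if step_type == "update":
        return cache.exists(key) and action in cache.get(key).metadata
    return False

key = {"network": "asia", "sample_size": 1000}
cache = FakeCache({tuple(sorted(key.items())): FakeEntry({"algorithm": "pc"})})
print(should_skip("create", cache, key))                    # True
print(should_skip("update", cache, key, "evaluate_graph"))  # False
```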
### Bypassing Conservative Execution
Use `--mode=force` to re-run all steps regardless of existing results:

```shell
cqflow run workflow.yml --mode=force
```
## Cache Files
Cache files are self-contained SQLite databases:
```text
results/
├── graphs.db     # Learned graphs from discovery
├── evaluated.db  # Graphs with evaluation metrics
└── merged.db     # Aggregated results
```
### Cache Location
Specify cache paths relative to the workflow's root_dir:
```yaml
root_dir: "/experiments/project-001"

steps:
  - name: "Learn"
    uses: "causaliq-discovery"
    with:
      action: "learn_structure"
      output: "results/graphs.db"  # → /experiments/project-001/results/graphs.db
```
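The resolution rule amounts to joining the relative path onto `root_dir`. A minimal sketch (the helper name is hypothetical, not the library's implementation):

```python
from pathlib import PurePosixPath

# Relative cache paths are resolved against the workflow's root_dir.
def resolve_cache_path(root_dir: str, output: str) -> PurePosixPath:
    return PurePosixPath(root_dir) / output

print(resolve_cache_path("/experiments/project-001", "results/graphs.db"))
# /experiments/project-001/results/graphs.db
```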
## Exporting and Importing
Caches can be exported to open formats for sharing and archival:
```shell
# Export to directory
cqflow export-cache -i results/graphs.db -o ./exported

# Export to zip
cqflow export-cache -i results/graphs.db -o results.zip

# Import from export
cqflow import-cache -i ./exported -o results/restored.db
```
Exported format uses:
- JSON for metadata
- GraphML for graph objects
- JSON for other objects
This enables interoperability with external tools and long-term archival in open formats.
## Cache Schema Consistency
When using a cache across multiple workflow runs, the matrix dimensions must
remain consistent. Adding or removing matrix variables in a workflow that
writes to an existing cache raises a `MatrixSchemaError`.
To change matrix dimensions, either:
- Use a new cache file
- Export, delete, and re-import the cache
- Delete the cache and regenerate
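The consistency rule can be sketched as a set comparison of dimension names. The exception class here merely stands in for the real `MatrixSchemaError`; the helper is hypothetical:

```python
class MatrixSchemaError(Exception):
    """Stand-in for the real exception raised on a dimension mismatch."""

def check_matrix_schema(cached_dims, workflow_dims):
    # The workflow's matrix variables must match those already
    # recorded in the cache, regardless of order.
    if set(cached_dims) != set(workflow_dims):
        raise MatrixSchemaError(
            f"matrix dimensions {sorted(workflow_dims)} do not match "
            f"cached dimensions {sorted(cached_dims)}"
        )

# Matching dimensions pass silently
check_matrix_schema({"network", "sample_size"}, {"sample_size", "network"})

# Removing a dimension raises
try:
    check_matrix_schema({"network", "sample_size"}, {"network"})
except MatrixSchemaError as err:
    print(err)
```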
## Python API
For programmatic cache access, see the Workflow Cache API.
```python
from causaliq_workflow.cache import WorkflowCache

with WorkflowCache("results/graphs.db") as cache:
    # Check if entry exists
    key = {"network": "asia", "sample_size": 1000}
    if cache.exists(key):
        entry = cache.get(key)
        print(entry.metadata["f1_score"])
```