# Workflow Integration
This guide explains how to use CausalIQ Knowledge as part of automated CausalIQ Workflows for reproducible causal discovery experiments.
## Overview

CausalIQ Knowledge integrates with causaliq-workflow through Python entry points. When both packages are installed, the `causaliq-knowledge` action is automatically available in workflow files.
## Installation

Install causaliq-knowledge alongside causaliq-workflow from PyPI; development versions are published to Test PyPI.
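Assuming both packages are published under these names, a typical install looks like this (the Test PyPI command follows the standard packaging guidance of pulling dependencies from the main index):

```shell
# Install the released packages from PyPI
pip install causaliq-knowledge causaliq-workflow

# Or install a development version from Test PyPI
pip install --index-url https://test.pypi.org/simple/ \
    --extra-index-url https://pypi.org/simple/ causaliq-knowledge
```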
## Quick Start

### 1. Create a Network Context

Create `models/smoking.json`:
```json
{
  "schema_version": "2.0",
  "network": "smoking",
  "domain": "epidemiology",
  "variables": [
    {
      "name": "smoking",
      "llm_name": "tobacco_use",
      "type": "binary",
      "short_description": "Patient smoking status"
    },
    {
      "name": "lung_cancer",
      "llm_name": "malignancy",
      "type": "binary",
      "short_description": "Lung cancer diagnosis"
    },
    {
      "name": "genetics",
      "llm_name": "genetic_risk",
      "type": "categorical",
      "states": ["low", "medium", "high"],
      "short_description": "Genetic risk factors"
    }
  ]
}
```
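Once saved, a context file can be sanity-checked with a few lines of stdlib Python (no CausalIQ code involved; the local `smoking.json` path here is just for illustration):

```python
import json

# The network context from step 1, written and re-read to mimic models/smoking.json
context = {
    "schema_version": "2.0",
    "network": "smoking",
    "domain": "epidemiology",
    "variables": [
        {"name": "smoking", "llm_name": "tobacco_use", "type": "binary",
         "short_description": "Patient smoking status"},
        {"name": "lung_cancer", "llm_name": "malignancy", "type": "binary",
         "short_description": "Lung cancer diagnosis"},
        {"name": "genetics", "llm_name": "genetic_risk", "type": "categorical",
         "states": ["low", "medium", "high"],
         "short_description": "Genetic risk factors"},
    ],
}

with open("smoking.json", "w") as f:
    json.dump(context, f, indent=2)

with open("smoking.json") as f:
    loaded = json.load(f)

# Sanity checks: every variable is described, and categoricals declare states
for var in loaded["variables"]:
    assert "name" in var and "short_description" in var
    if var["type"] == "categorical":
        assert "states" in var, f'{var["name"]} is missing "states"'

names = [v["name"] for v in loaded["variables"]]
print(names)  # ['smoking', 'lung_cancer', 'genetics']
```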
### 2. Create a Workflow File

Create `workflow.yaml`:
```yaml
description: "Generate causal graph for smoking study"
id: "smoking-graph"

steps:
  - name: "Generate Graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/smoking.json"
      output: "results/smoking_graph.json"
      llm_cache: "cache/smoking.db"
      llm_model: "groq/llama-3.1-8b-instant"
```
### 3. Run the Workflow

```shell
# Validate without executing
causaliq-workflow workflow.yaml --mode dry-run

# Execute the workflow
causaliq-workflow workflow.yaml --mode run
```
## Action Parameters

The `causaliq-knowledge` action supports the `generate_graph` operation:
| Parameter | Required | Default | Description |
|---|---|---|---|
| `action` | Yes | - | Must be `generate_graph` |
| `network_context` | Yes | - | Path to network context JSON file |
| `output` | Yes | - | Output: `.json` file path or `none` for stdout |
| `llm_cache` | Yes | - | Cache: `.db` file path or `none` to disable |
| `llm_model` | No | `groq/llama-3.1-8b-instant` | LLM model identifier |
| `prompt_detail` | No | `standard` | Detail level: `minimal`, `standard`, `rich` |
| `use_benchmark_names` | No | `false` | Use original benchmark variable names |
| `llm_temperature` | No | `0.1` | LLM temperature (0.0-2.0) |
## LLM Model Identifiers

Models must include a provider prefix:

- Groq: `groq/llama-3.1-8b-instant`, `groq/llama-3.1-70b-versatile`
- Gemini: `gemini/gemini-2.5-flash`, `gemini/gemini-2.0-flash`
- OpenAI: `openai/gpt-4o-mini`, `openai/gpt-4o`
- Anthropic: `anthropic/claude-sonnet-4-20250514`
- DeepSeek: `deepseek/deepseek-chat`, `deepseek/deepseek-reasoner`
- Mistral: `mistral/mistral-small-latest`
- Ollama: `ollama/llama3`, `ollama/mistral`
## Prompt Detail Levels

- `minimal`: Variable names only (tests general LLM knowledge)
- `standard`: Names, types, states, and short descriptions
- `rich`: Full context including extended descriptions
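Purely as an illustration of what each level adds (the actual prompt text CausalIQ Knowledge generates will differ, and the `description` key for extended descriptions is an assumption), a toy renderer:

```python
def render_variable(var: dict, detail: str) -> str:
    """Toy sketch of rendering one variable at a given prompt detail level."""
    if detail == "minimal":
        return var["name"]  # name only
    line = f'{var["name"]} ({var["type"]}'
    if "states" in var:
        line += ": " + "/".join(var["states"])
    line += f') - {var["short_description"]}'
    if detail == "rich" and "description" in var:  # hypothetical extended field
        line += f'. {var["description"]}'
    return line

var = {
    "name": "genetics",
    "type": "categorical",
    "states": ["low", "medium", "high"],
    "short_description": "Genetic risk factors",
}
print(render_variable(var, "minimal"))
# genetics
print(render_variable(var, "standard"))
# genetics (categorical: low/medium/high) - Genetic risk factors
```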
## Workflow Examples

### Comparing Multiple LLM Models

```yaml
description: "Compare graph generation across LLM providers"
id: "model-comparison"
workflow_cache: "results/{{id}}_cache.db"

matrix:
  model:
    - "groq/llama-3.1-8b-instant"
    - "gemini/gemini-2.5-flash"
    - "deepseek/deepseek-chat"

steps:
  - name: "Generate Graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/cancer.json"
      llm_cache: "cache/llm_cache.db"
      llm_model: "{{model}}"
      # Results written to workflow_cache with key: {model}
```
This generates 3 graphs, one for each model, stored in the Workflow Cache.
### Comparing Prompt Detail Levels

```yaml
description: "Compare prompt detail levels"
id: "detail-comparison"
workflow_cache: "results/{{id}}_cache.db"

matrix:
  detail:
    - "minimal"
    - "standard"
    - "rich"

steps:
  - name: "Generate Graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/asia.json"
      llm_cache: "cache/asia_llm.db"
      llm_model: "groq/llama-3.1-8b-instant"
      prompt_detail: "{{detail}}"
```
### Multi-Network Analysis

```yaml
description: "Generate graphs for benchmark networks"
id: "benchmark-analysis"
workflow_cache: "results/{{id}}_cache.db"

matrix:
  network:
    - "asia"
    - "cancer"
    - "earthquake"
    - "survey"

steps:
  - name: "Generate Graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/{{network}}/{{network}}.json"
      llm_cache: "cache/{{network}}_llm.db"
```
### Full Comparison Matrix

```yaml
description: "Full model × detail × network comparison"
id: "full-comparison"
workflow_cache: "results/{{id}}_cache.db"

matrix:
  network:
    - "asia"
    - "cancer"
  model:
    - "groq/llama-3.1-8b-instant"
    - "gemini/gemini-2.5-flash"
  detail:
    - "minimal"
    - "standard"

steps:
  - name: "Generate Graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/{{network}}.json"
      llm_cache: "cache/llm_cache.db"
      llm_model: "{{model}}"
      prompt_detail: "{{detail}}"
```
This generates 8 graphs (2 networks × 2 models × 2 detail levels), all stored in a single Workflow Cache with matrix values as keys.
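The matrix expansion is a plain Cartesian product; sketched in Python for the workflow above:

```python
from itertools import product

# Matrix values from the full-comparison workflow above
networks = ["asia", "cancer"]
models = ["groq/llama-3.1-8b-instant", "gemini/gemini-2.5-flash"]
details = ["minimal", "standard"]

# One run per combination, mirroring how the workflow engine expands the matrix
runs = [
    {"network": n, "model": m, "detail": d}
    for n, m, d in product(networks, models, details)
]
print(len(runs))  # 8
```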
## Action Output

When a workflow step completes successfully, it returns:
```json
{
  "status": "success",
  "edges_count": 5,
  "variables_count": 8,
  "output_file": "results/cancer_graph.json",
  "cache_stats": {
    "cache_hits": 2,
    "cache_misses": 6
  }
}
```
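A follow-up step or script can consume this result directly; for example, deriving a cache hit rate from `cache_stats` (a sketch assuming the JSON shape shown above):

```python
# Step result as returned above
result = {
    "status": "success",
    "edges_count": 5,
    "variables_count": 8,
    "output_file": "results/cancer_graph.json",
    "cache_stats": {"cache_hits": 2, "cache_misses": 6},
}

stats = result["cache_stats"]
total = stats["cache_hits"] + stats["cache_misses"]
hit_rate = stats["cache_hits"] / total if total else 0.0
print(f"cache hit rate: {hit_rate:.0%}")  # cache hit rate: 25%
```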
In dry-run mode, it returns validation results without executing:
```json
{
  "status": "skipped",
  "reason": "dry-run mode",
  "validated_params": {
    "network_context": "models/cancer.json",
    "output": "results/graph.json",
    "llm_cache": "cache/cancer.db"
  }
}
```
## Output File Format

The generated graph JSON file contains:
```json
{
  "edges": [
    {"source": "smoking", "target": "lung_cancer", "confidence": 0.95},
    {"source": "genetics", "target": "lung_cancer", "confidence": 0.8}
  ],
  "variables": ["smoking", "lung_cancer", "genetics"],
  "reasoning": "Based on epidemiological evidence...",
  "metadata": {
    "model": "groq/llama-3.1-8b-instant",
    "prompt_detail": "standard",
    "timestamp": "2026-02-04T10:30:00Z"
  }
}
```
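Because the output is plain JSON, post-processing is straightforward; for example, keeping only high-confidence edges (a sketch assuming the shape above; the 0.9 threshold is arbitrary):

```python
# Graph output as written by generate_graph (shape as documented above)
graph = {
    "edges": [
        {"source": "smoking", "target": "lung_cancer", "confidence": 0.95},
        {"source": "genetics", "target": "lung_cancer", "confidence": 0.8},
    ],
    "variables": ["smoking", "lung_cancer", "genetics"],
}

# Keep only edges the LLM is highly confident about
threshold = 0.9
strong_edges = [e for e in graph["edges"] if e["confidence"] >= threshold]
print([(e["source"], e["target"]) for e in strong_edges])
# [('smoking', 'lung_cancer')]
```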
## Environment Setup

### API Keys

Set environment variables for your chosen LLM providers:
```shell
# Groq (free tier)
export GROQ_API_KEY="your-groq-key"

# Google Gemini (free tier)
export GEMINI_API_KEY="your-gemini-key"

# OpenAI
export OPENAI_API_KEY="your-openai-key"

# Anthropic
export ANTHROPIC_API_KEY="your-anthropic-key"
```
See LLM Provider Setup for details.
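Before launching a long matrix run, a quick pre-flight check can confirm the right key is set. This sketch covers only the providers listed above (Ollama runs locally and needs no key):

```python
import os

# Environment variables each provider expects (matching the exports above)
PROVIDER_KEYS = {
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def missing_key(model: str, env=None) -> list:
    """Return the API-key variable still unset for a provider-prefixed model."""
    env = os.environ if env is None else env
    provider = model.split("/", 1)[0]
    key = PROVIDER_KEYS.get(provider)
    return [key] if key and key not in env else []

print(missing_key("groq/llama-3.1-8b-instant", env={}))  # ['GROQ_API_KEY']
```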
### Directory Structure

A typical project structure:
```
my-project/
├── models/
│   ├── asia.json
│   ├── cancer.json
│   └── smoking.json
├── workflows/
│   ├── generate_graphs.yaml
│   └── compare_models.yaml
├── results/        # Generated output files
├── cache/          # LLM response cache
└── .env            # API keys (add to .gitignore)
```
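The layout above can be scaffolded in a couple of commands (the `.gitignore` line guards against committing API keys):

```shell
# Create the standard project directories
mkdir -p models workflows results cache

# Keep API keys out of version control
touch .env
grep -qx '.env' .gitignore 2>/dev/null || echo '.env' >> .gitignore
```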
## Troubleshooting
### Action Not Found

Ensure both packages are installed in the same environment; `pip list` should show both `causaliq-knowledge` and `causaliq-workflow`.
### Schema Validation Error

Upgrade to the latest causaliq-workflow, e.g. `pip install --upgrade causaliq-workflow`.
### Invalid LLM Model

Include the provider prefix in `llm_model`: use `groq/llama-3.1-8b-instant`, not `llama-3.1-8b-instant`.
### Cache Path Error

`llm_cache` must be a file path ending in `.db`, or the literal value `none` to disable caching.
## Next Steps
- Network Context Format - Define variables
- CLI Reference - Command-line usage
- API Reference - Programmatic access