Generate Graph

The generate_graph action asks an LLM to propose causal edges, together with the probability that each edge exists and the probability of each possible orientation. It returns a Probability Dependency Graph (PDG) encapsulating this information. These graphs can be used to support, or be compared with, the statistical structure learning algorithms that will be provided in the causaliq-discovery package.

This is a create action (see workflow action patterns), meaning it creates a new output entry from each matched input entry.

Parameters

| Parameter | CLI | Default | Description |
|---|---|---|---|
| network_context | -n | None | Path to network context JSON file |
| output | -o | None | Output directory (CLI) or workflow cache (.db) |
| llm_cache | -c | None | LLM cache: .db file path, or none to disable |
| llm_model | -m | groq/llama-3.1-8b-instant | LLM model identifier |
| llm_temperature | -t | 0.1 | LLM temperature (0.0-2.0) |
| llm_max_tokens | | 4000 | Maximum tokens in LLM response (100-100000) |
| llm_timeout | | 120.0 | LLM request timeout in seconds (10-600) |
| llm_seed | | None | Seed index for multi-sampling (busts cache) |
| prompt_detail | -p | standard | Detail level: minimal, standard, rich |
| use_benchmark_names | -b | false | Use benchmark names instead of LLM names |

Notes:

  • Values must be supplied for all parameters without a default
  • In CLI, parameter names use hyphens (e.g., --network-context, --llm-cache)
  • The llm_cache parameter is required for both CLI and workflow usage. Use none to disable caching (not recommended for production)
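
For example, the same parameters can be spelled out with long-form CLI names (flags inferred from the hyphen convention above; values are illustrative):

cqknow generate-graph --network-context asia.json --llm-cache cache.db \
    --output results/ --llm-model groq/llama-3.1-8b-instant \
    --llm-temperature 0.1 --llm-max-tokens 4000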

How It Works

Step 1: Load Network Context

The network context JSON file defines the variables and domain context for the graph generation. See Network Context Format for the complete specification.

Step 2: Generate Edge Queries

For each pair of variables, the LLM is asked to estimate:

  • Probability of a directed edge in each direction
  • Probability of an undirected edge
  • Probability of no edge

The prompt detail level controls how much context is provided:

| Level | Includes |
|---|---|
| minimal | Variable names only |
| standard | Names, types, states, short descriptions |
| rich | Full context including extended descriptions |
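
Whatever the detail level, each pair ultimately yields the four probabilities listed above. Illustratively (the field names and values below are not the package's actual response schema), a single response for one pair might reduce to:

{
    "forward": 0.70,
    "backward": 0.10,
    "undirected": 0.05,
    "none": 0.15
}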

Step 3: Aggregate to PDG

Responses are aggregated into a Probability Dependency Graph (PDG) in which each candidate edge (variable pair) carries four probability values that sum to one:

\[P(forward) + P(backward) + P(undirected) + P(none) = 1.0\]
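
As an illustration of this aggregation (a sketch only, not the package's actual code), the following Python averages several sampled responses for one variable pair and renormalises so the four values sum to one. Multiple samples per pair can be obtained via the llm_seed parameter:

def aggregate_pair(samples):
    # samples: list of dicts with keys forward/backward/undirected/none
    keys = ("forward", "backward", "undirected", "none")
    mean = {k: sum(s[k] for s in samples) / len(samples) for k in keys}
    total = sum(mean.values())  # guard against responses that do not sum to 1
    return {k: v / total for k, v in mean.items()}

samples = [
    {"forward": 0.70, "backward": 0.10, "undirected": 0.05, "none": 0.15},
    {"forward": 0.60, "backward": 0.20, "undirected": 0.05, "none": 0.15},
]
print(aggregate_pair(samples))
# {'forward': 0.65, 'backward': 0.15, 'undirected': 0.05, 'none': 0.15}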

Step 4: Return Results

The PDG is saved along with generation metadata including:

| Metadata | Description |
|---|---|
| model | LLM model used |
| provider | LLM provider (groq, gemini, etc.) |
| prompt_detail | Detail level used |
| tokens_input | Total input tokens |
| tokens_output | Total output tokens |
| cost_usd | Estimated API cost |
| latency_ms | Total generation time |
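
An illustrative metadata.json (all values are placeholders):

{
    "model": "groq/llama-3.1-8b-instant",
    "provider": "groq",
    "prompt_detail": "standard",
    "tokens_input": 5200,
    "tokens_output": 1800,
    "cost_usd": 0.0004,
    "latency_ms": 2300
}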

CLI Usage

Basic Usage

Generate a causal graph from a network context file:

cqknow generate-graph -n asia.json -c cache.db -o results/

This creates:

  • results/graph.graphml — The generated PDG
  • results/metadata.json — Generation metadata

With Specific Model

Use a different LLM model:

cqknow generate-graph -n asia.json -c cache.db -o results/ \
    -m gemini/gemini-2.5-flash

Rich Prompt Context

Provide more context to the LLM for better results:

cqknow generate-graph -n asia.json -c cache.db -o results/ \
    -p rich

Test Benchmark Memorisation

Use original benchmark names to test if the LLM has memorised the structure:

cqknow generate-graph -n asia.json -c cache.db -o results/ \
    --use-benchmark-names

Without Output Files

Print results to stderr without writing files:

cqknow generate-graph -n asia.json -c none -o none

Workflow Usage

In a CausalIQ workflow, generate_graph operates as a CREATE action that generates a new PDG entry in the workflow cache:

steps:
  - name: "Generate LLM graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/asia.json"
      output: "results/asia.db"
      llm_cache: "cache/llm_cache.db"
      llm_model: "groq/llama-3.1-8b-instant"

Comparing Multiple Models

description: "Compare graph generation across LLM providers"
id: "model-comparison"
workflow_cache: "results/{{id}}_cache.db"

matrix:
  model:
    - "groq/llama-3.1-8b-instant"
    - "gemini/gemini-2.5-flash"
    - "deepseek/deepseek-chat"

steps:
  - name: "Generate Graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/cancer.json"
      output: "results/cancer.db"
      llm_cache: "cache/llm_cache.db"
      llm_model: "{{model}}"

Comparing Prompt Detail Levels

description: "Compare prompt detail levels"
id: "detail-comparison"
workflow_cache: "results/{{id}}_cache.db"

matrix:
  detail:
    - "minimal"
    - "standard"
    - "rich"

steps:
  - name: "Generate Graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/asia.json"
      output: "results/asia.db"
      llm_cache: "cache/asia_llm.db"
      llm_model: "groq/llama-3.1-8b-instant"
      prompt_detail: "{{detail}}"

Multi-Network Analysis

description: "Generate graphs for benchmark networks"
id: "benchmark-analysis"
workflow_cache: "results/{{id}}_cache.db"

matrix:
  network:
    - "asia"
    - "cancer"
    - "earthquake"
    - "survey"

steps:
  - name: "Generate Graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/{{network}}/{{network}}.json"
      output: "results/{{network}}.db"
      llm_cache: "cache/{{network}}_llm.db"

LLM Model Identifiers

Models must include a provider prefix. Use cqknow list-models to see available models.

| Provider | Example Models |
|---|---|
| Groq | groq/llama-3.1-8b-instant, groq/llama-3.1-70b-versatile |
| Gemini | gemini/gemini-2.5-flash, gemini/gemini-2.0-flash |
| OpenAI | openai/gpt-4o-mini, openai/gpt-4o |
| Anthropic | anthropic/claude-sonnet-4-20250514 |
| DeepSeek | deepseek/deepseek-chat, deepseek/deepseek-reasoner |
| Mistral | mistral/mistral-small-latest |
| Ollama | ollama/llama3, ollama/mistral |

LLM Caching

The llm_cache parameter specifies a SQLite database for caching LLM API responses. This:

  • Reduces costs by avoiding redundant API calls
  • Speeds up re-runs of experiments
  • Enables reproducibility by storing responses

Use LLM Cache Management commands to inspect, export, and import cache contents.
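
Conceptually, each request is keyed on its content, so an identical request is answered from the database rather than the API. A minimal Python sketch of the idea (the actual cache schema may differ):

import hashlib, json, sqlite3

conn = sqlite3.connect("llm_cache.db")
conn.execute("CREATE TABLE IF NOT EXISTS responses (key TEXT PRIMARY KEY, response TEXT)")

def cached_call(request, call_llm):
    # Hash the full request so any change (model, prompt, seed) misses the cache
    key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
    row = conn.execute("SELECT response FROM responses WHERE key = ?", (key,)).fetchone()
    if row:  # cache hit: no API call, no cost
        return json.loads(row[0])
    response = call_llm(request)  # cache miss: call the provider
    conn.execute("INSERT INTO responses VALUES (?, ?)", (key, json.dumps(response)))
    conn.commit()
    return response

This is also why llm_seed "busts" the cache: a different seed index changes the request key, forcing a fresh sample.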

Full Comparison Matrix

description: "Full model × detail × network comparison"
id: "full-comparison"
workflow_cache: "results/{{id}}_cache.db"

matrix:
  network:
    - "asia"
    - "cancer"
  model:
    - "groq/llama-3.1-8b-instant"
    - "gemini/gemini-2.5-flash"
  detail:
    - "minimal"
    - "standard"

steps:
  - name: "Generate Graph"
    uses: "causaliq-knowledge"
    with:
      action: "generate_graph"
      network_context: "models/{{network}}.json"
      llm_cache: "cache/llm_cache.db"
      llm_model: "{{model}}"
      prompt_detail: "{{detail}}"

This generates 8 graphs (2 networks × 2 models × 2 detail levels), all stored in a single Workflow Cache with matrix values as keys.

Action Output

When a workflow step completes successfully, it returns:

{
    "status": "success",
    "edges_count": 5,
    "variables_count": 8,
    "output_file": "results/cancer_graph.json",
    "cache_stats": {
        "cache_hits": 2,
        "cache_misses": 6
    }
}

In dry-run mode, it returns validation results without executing:

{
    "status": "skipped",
    "reason": "dry-run mode",
    "validated_params": {
        "network_context": "models/cancer.json",
        "output": "results/graph.json",
        "llm_cache": "cache/cancer.db"
    }
}

Output File Format

The generated graph is saved as a Probability Dependency Graph (PDG) in GraphML format. Each edge carries separate existence and orientation probabilities:

<edge source="smoking" target="lung_cancer">
  <data key="existence">0.95</data>
  <data key="orientation">0.85</data>
</edge>

Where:

  • existence: Probability that a causal relationship exists (0.0-1.0)
  • orientation: Confidence that the direction is source→target vs reverse (0.5 = uncertain, >0.5 = forward, <0.5 = reverse)
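
For downstream analysis the probabilities can be read back with standard XML tooling. A minimal Python sketch, assuming the data keys appear literally as in the snippet above (GraphML files normally declare key ids that map to attribute names, so adapt as needed):

import xml.etree.ElementTree as ET

NS = "{http://graphml.graphdrawing.org/xmlns}"  # default GraphML namespace
tree = ET.parse("results/graph.graphml")
for edge in tree.getroot().iter(NS + "edge"):
    data = {d.get("key"): float(d.text) for d in edge.iter(NS + "data")}
    print(edge.get("source"), "->", edge.get("target"),
          "existence:", data.get("existence"),
          "orientation:", data.get("orientation"))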

Environment Setup

API Keys

Set environment variables for your chosen LLM providers:

# Groq (free tier)
export GROQ_API_KEY="your-groq-key"

# Google Gemini (free tier)
export GEMINI_API_KEY="your-gemini-key"

# OpenAI
export OPENAI_API_KEY="your-openai-key"

# Anthropic
export ANTHROPIC_API_KEY="your-anthropic-key"

See LLM Provider Setup for details.

Directory Structure

A typical project structure:

my-project/
├── models/
│   ├── asia.json
│   ├── cancer.json
│   └── smoking.json
├── workflows/
│   ├── generate_graphs.yaml
│   └── compare_models.yaml
├── results/           # Generated output files
├── cache/             # LLM response cache
└── .env               # API keys (add to .gitignore)
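
An illustrative .env file (keep it listed in .gitignore; variable names as in the API Keys section above):

GROQ_API_KEY=your-groq-key
GEMINI_API_KEY=your-gemini-key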

Troubleshooting

Action Not Found

ActionRegistryError: Action 'causaliq-knowledge' not found

Ensure both packages are installed in the same environment:

pip install causaliq-knowledge causaliq-workflow

Schema Validation Error

WorkflowExecutionError: Schema file not found

Upgrade to the latest causaliq-workflow:

pip install --upgrade causaliq-workflow

Invalid LLM Model

ValueError: LLM model must start with provider prefix

Include the provider prefix in llm_model:

# Wrong
llm_model: "llama-3.1-8b-instant"

# Correct
llm_model: "groq/llama-3.1-8b-instant"

Cache Path Error

ValueError: llm_cache must be 'none' or a path ending with .db

Use .db extension or none:

# Wrong
llm_cache: "cache/data"

# Correct
llm_cache: "cache/data.db"
# Or
llm_cache: "none"

Next Steps