Graph Generator

The generator module provides the GraphGenerator class for generating complete causal graphs from network context using LLMs.

Import Pattern

from causaliq_knowledge.graph import (
    GraphGenerator,
    GraphGeneratorConfig,
    NetworkContext,
    GeneratedGraph,
    PromptDetail,
    OutputFormat,
)
from causaliq_core.cache import TokenCache

Overview

GraphGenerator orchestrates the full graph generation workflow:

  1. Create a generator with model and configuration
  2. Optionally set up caching with TokenCache
  3. Generate graphs from variable dictionaries or NetworkContext files
  4. Receive structured GeneratedGraph objects with edges and metadata

Quick Start

Here's a complete working example:

from causaliq_knowledge.graph import (
    GraphGenerator,
    GraphGeneratorConfig,
    NetworkContext,
    PromptDetail,
    OutputFormat,
)

# Create generator with model identifier
# Format: "provider/model_name"
generator = GraphGenerator(model="groq/llama-3.1-8b-instant")

# Option 1: Generate from a list of variables
graph = generator.generate_graph(
    variables=[
        {"name": "smoking"},
        {"name": "lung_cancer"},
        {"name": "age"},
    ],
    domain="oncology",
)

# Option 2: Generate from a network context file
context = NetworkContext.load("research/models/my_model.json")
graph = generator.generate_from_context(context)

# Access the results
print(f"Generated {len(graph.edges)} edges")
for edge in graph.edges:
    print(f"  {edge.source} -> {edge.target} ({edge.confidence:.2f})")

# Access metadata
print(f"Model: {graph.metadata.model}")
print(f"Latency: {graph.metadata.latency_ms}ms")
print(f"Cost: ${graph.metadata.cost_usd:.4f}")

Configuration

GraphGeneratorConfig

Configuration dataclass for graph generation parameters.

from causaliq_knowledge.graph import GraphGeneratorConfig, PromptDetail, OutputFormat

config = GraphGeneratorConfig(
    temperature=0.1,              # LLM sampling temperature
    max_tokens=2000,              # Maximum response tokens
    timeout=60.0,                 # Request timeout in seconds
    output_format=OutputFormat.EDGE_LIST,  # or ADJACENCY_MATRIX
    prompt_detail=PromptDetail.STANDARD,   # MINIMAL, STANDARD, or RICH
    use_llm_names=True,           # Use llm_name field from specs
    request_id="",                # Optional request identifier
)

Attributes:

Attribute      Type          Default    Description
temperature    float         0.1        LLM sampling temperature (lower = more deterministic)
max_tokens     int           2000       Maximum tokens in LLM response
timeout        float         60.0       Request timeout in seconds
output_format  OutputFormat  EDGE_LIST  Response format (EDGE_LIST or ADJACENCY_MATRIX)
prompt_detail  PromptDetail  STANDARD   Detail level (MINIMAL, STANDARD, or RICH)
use_llm_names  bool          True       Use llm_name instead of benchmark name
request_id     str           ""         Optional identifier for requests
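
The request_id attribute lets you tag requests for later identification;
per the API reference below, it is stored in the request metadata. A small
sketch (the identifier string here is an arbitrary example):

# Tag all requests from this generator with an experiment identifier
config = GraphGeneratorConfig(
    prompt_detail=PromptDetail.MINIMAL,
    request_id="exp-minimal-001",  # hypothetical run label
)
generator = GraphGenerator(model="groq/llama-3.1-8b-instant", config=config)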

Creating a Generator

from causaliq_knowledge.graph import (
    GraphGenerator,
    GraphGeneratorConfig,
    PromptDetail,
)
from causaliq_core.cache import TokenCache

# Basic creation with just a model
generator = GraphGenerator(model="groq/llama-3.1-8b-instant")

# With custom configuration
config = GraphGeneratorConfig(
    temperature=0.2,
    prompt_detail=PromptDetail.RICH,
)
generator = GraphGenerator(model="gemini/gemini-2.0-flash", config=config)

# With caching enabled
cache = TokenCache(db_path="graph_cache.db")
generator = GraphGenerator(
    model="openai/gpt-4o",
    config=config,
    cache=cache,
)

# Or set cache after creation
generator = GraphGenerator(model="anthropic/claude-sonnet-4-20250514")
generator.set_cache(cache, use_cache=True)

Generating from Variables

Use generate_graph() when you have a list of variable dictionaries:

from causaliq_knowledge.graph import GraphGenerator, PromptDetail

generator = GraphGenerator(model="groq/llama-3.1-8b-instant")

# Minimal - just variable names
graph = generator.generate_graph(
    variables=[
        {"name": "smoking"},
        {"name": "lung_cancer"},
        {"name": "age"},
        {"name": "genetics"},
    ],
    domain="oncology",
)

# With more context
graph = generator.generate_graph(
    variables=[
        {
            "name": "smoking",
            "type": "binary",
            "description": "Whether the patient smokes",
        },
        {
            "name": "lung_cancer",
            "type": "binary",
            "description": "Diagnosis of lung cancer",
        },
    ],
    domain="oncology",
    level=PromptDetail.RICH,  # Override config's prompt_detail
)

# Access results
for edge in graph.edges:
    print(f"{edge.source} -> {edge.target}")
    print(f"  Confidence: {edge.confidence}")
    print(f"  Rationale: {edge.rationale}")

Generating from Network Context

Use generate_from_context() when you have a JSON network context file:

from causaliq_knowledge.graph import GraphGenerator, NetworkContext, PromptDetail

generator = GraphGenerator(model="gemini/gemini-2.0-flash")

# Load the network context
context = NetworkContext.load("research/models/asia/asia.json")

# Generate with default settings from config
graph = generator.generate_from_context(context)

# Override settings for this specific call
graph = generator.generate_from_context(
    context=context,
    level=PromptDetail.MINIMAL,
    use_llm_names=False,  # Use benchmark names instead
)

Supported LLM Providers

GraphGenerator supports all providers via the provider/model format:

Provider   Example Model String
Anthropic  anthropic/claude-sonnet-4-20250514
DeepSeek   deepseek/deepseek-chat
Gemini     gemini/gemini-2.0-flash
Groq       groq/llama-3.1-8b-instant
Mistral    mistral/mistral-large-latest
Ollama     ollama/llama3.2
OpenAI     openai/gpt-4o

# Using different providers
gen_groq = GraphGenerator(model="groq/llama-3.1-8b-instant")
gen_gemini = GraphGenerator(model="gemini/gemini-2.0-flash")
gen_openai = GraphGenerator(model="openai/gpt-4o")
gen_anthropic = GraphGenerator(model="anthropic/claude-sonnet-4-20250514")

Caching

GraphGenerator integrates with TokenCache for response caching:

from causaliq_knowledge.graph import GraphGenerator
from causaliq_core.cache import TokenCache

# Create cache and generator
cache = TokenCache(db_path="graph_cache.db")
generator = GraphGenerator(
    model="gemini/gemini-2.0-flash",
    cache=cache,
)

# First call - hits the LLM
graph1 = generator.generate_graph(
    variables=[{"name": "A"}, {"name": "B"}],
    domain="test",
)
print(f"From cache: {graph1.metadata.from_cache}")  # False

# Second call with same inputs - uses cache
graph2 = generator.generate_graph(
    variables=[{"name": "A"}, {"name": "B"}],
    domain="test",
)
print(f"From cache: {graph2.metadata.from_cache}")  # True

# Disable caching for specific generator
generator.set_cache(cache, use_cache=False)

Prompt Detail Levels

Control the amount of context provided to the LLM:

Level                  Description
PromptDetail.MINIMAL   Variable names only
PromptDetail.STANDARD  Names, types, and brief descriptions
PromptDetail.RICH      Full descriptions, roles, states, and constraints

from causaliq_knowledge.graph import GraphGenerator, GraphGeneratorConfig, PromptDetail

# Set at config level (default for all calls)
config = GraphGeneratorConfig(prompt_detail=PromptDetail.MINIMAL)
generator = GraphGenerator(model="groq/llama-3.1-8b-instant", config=config)

# Override per call
graph = generator.generate_graph(
    variables=[{"name": "A"}, {"name": "B"}],
    domain="test",
    level=PromptDetail.RICH,  # Use rich for this call only
)

Output Formats

Choose between edge list and adjacency matrix output:

from causaliq_knowledge.graph import GraphGenerator, GraphGeneratorConfig, OutputFormat

# Edge list format (default)
config = GraphGeneratorConfig(output_format=OutputFormat.EDGE_LIST)
generator = GraphGenerator(model="groq/llama-3.1-8b-instant", config=config)

# Adjacency matrix format
config = GraphGeneratorConfig(output_format=OutputFormat.ADJACENCY_MATRIX)
generator = GraphGenerator(model="groq/llama-3.1-8b-instant", config=config)

Working with Results

GeneratedGraph

The result of generation is a GeneratedGraph object:

graph = generator.generate_graph(...)

# Access edges
for edge in graph.edges:
    print(f"Source: {edge.source}")
    print(f"Target: {edge.target}")
    print(f"Confidence: {edge.confidence}")
    print(f"Rationale: {edge.rationale}")

# Access metadata
meta = graph.metadata
print(f"Model: {meta.model}")
print(f"Provider: {meta.provider}")
print(f"Timestamp: {meta.timestamp}")
print(f"Latency: {meta.latency_ms}ms")
print(f"Input tokens: {meta.input_tokens}")
print(f"Output tokens: {meta.output_tokens}")
print(f"Cost: ${meta.cost_usd:.6f}")
print(f"From cache: {meta.from_cache}")

Generator Statistics

generator = GraphGenerator(model="groq/llama-3.1-8b-instant")

# After some generations...
stats = generator.get_stats()
print(f"Model: {stats['model']}")
print(f"Call count: {stats['call_count']}")
print(f"Client call count: {stats['client_call_count']}")

Complete Example

Here's a full example showing a typical workflow:

"""Generate a causal graph from a model specification."""

from pathlib import Path

from causaliq_knowledge.graph import (
    GraphGenerator,
    GraphGeneratorConfig,
    ModelLoader,
    PromptDetail,
    OutputFormat,
)
from causaliq_core.cache import TokenCache


def main():
    # Set up caching
    cache = TokenCache(db_path=Path("cache/graph_cache.db"))

    # Configure the generator
    config = GraphGeneratorConfig(
        temperature=0.1,
        max_tokens=2000,
        prompt_detail=PromptDetail.STANDARD,
        output_format=OutputFormat.EDGE_LIST,
    )

    # Create generator
    generator = GraphGenerator(
        model="groq/llama-3.1-8b-instant",
        config=config,
        cache=cache,
    )

    # Load the network context
    context = NetworkContext.load("research/models/asia/asia.json")

    # Generate graph
    graph = generator.generate_from_context(context)

    # Display results
    print(f"\nGenerated {len(graph.edges)} edges:")
    for edge in graph.edges:
        print(f"  {edge.source} -> {edge.target} ({edge.confidence:.2f})")

    # Show metadata
    print(f"\nMetadata:")
    print(f"  Model: {graph.metadata.provider}/{graph.metadata.model}")
    print(f"  Latency: {graph.metadata.latency_ms}ms")
    print(f"  Tokens: {graph.metadata.input_tokens} in, "
          f"{graph.metadata.output_tokens} out")
    print(f"  Cost: ${graph.metadata.cost_usd:.6f}")
    print(f"  Cached: {graph.metadata.from_cache}")


if __name__ == "__main__":
    main()

API Reference

Graph generator using LLM providers.

This module provides the GraphGenerator class for generating complete causal graphs from variable specifications using LLM providers.

Classes:

GraphGeneratorConfig dataclass

GraphGeneratorConfig(
    temperature: float = 0.1,
    max_tokens: int = 2000,
    timeout: float = 60.0,
    output_format: OutputFormat = EDGE_LIST,
    prompt_detail: PromptDetail = STANDARD,
    use_llm_names: bool = True,
    request_id: str = "",
)

Configuration for the GraphGenerator.

Attributes:

  • temperature (float) –

    LLM sampling temperature (lower = more deterministic).

  • max_tokens (int) –

    Maximum tokens in LLM response.

  • timeout (float) –

    Request timeout in seconds.

  • output_format (OutputFormat) –

    Desired output format (edge_list or adjacency_matrix).

  • prompt_detail (PromptDetail) –

    Detail level for variable information in prompts.

  • use_llm_names (bool) –

    Use llm_name instead of benchmark name in prompts.

  • request_id (str) –

    Optional identifier for requests (stored in metadata).

GraphGenerator

GraphGenerator(
    model: str = "groq/llama-3.1-8b-instant",
    config: Optional[GraphGeneratorConfig] = None,
    cache: Optional["TokenCache"] = None,
)

Generate causal graphs from network context using LLMs.

This class provides methods for generating complete causal graphs from NetworkContext objects or variable dictionaries. It supports all LLM providers available in causaliq-knowledge and integrates with the TokenCache for efficient caching of requests.

Example

from causaliq_knowledge.graph import NetworkContext
from causaliq_knowledge.graph.generator import GraphGenerator

# Load network context
context = NetworkContext.load("asia.json")

# Create generator
generator = GraphGenerator(model="groq/llama-3.1-8b-instant")

# Generate graph
graph = generator.generate_from_context(context)
print(f"Generated {len(graph.edges)} edges")

Parameters:

  • model

    (str, default: 'groq/llama-3.1-8b-instant' ) –

    LLM model identifier with provider prefix. Supported:

    - "groq/llama-3.1-8b-instant" (Groq API)
    - "gemini/gemini-2.5-flash" (Google Gemini)
    - "openai/gpt-4o" (OpenAI)
    - "anthropic/claude-3-5-sonnet-20241022" (Anthropic)
    - "deepseek/deepseek-chat" (DeepSeek)
    - "mistral/mistral-small-latest" (Mistral)
    - "ollama/llama3.2:1b" (Local Ollama)

  • config

    (Optional[GraphGeneratorConfig], default: None ) –

    Generation configuration. Uses defaults if None.

  • cache

    (Optional['TokenCache'], default: None ) –

    TokenCache instance for caching. Disabled if None.

Raises:

  • ValueError

    If the model prefix is not supported.

Methods:

call_count property

call_count: int

Return the number of generation calls made.

config property

Return the generator configuration.

model property

model: str

Return the model identifier.

generate_from_context

generate_from_context(
    context: "NetworkContext",
    level: Optional[PromptDetail] = None,
    output_format: Optional[OutputFormat] = None,
    system_prompt: Optional[str] = None,
    use_llm_names: Optional[bool] = None,
) -> GeneratedGraph

Generate a causal graph from a NetworkContext.

Convenience method that extracts variables and domain from the context automatically.

Parameters:

  • context

    ('NetworkContext') –

    The network context.

  • level

    (Optional[PromptDetail], default: None ) –

    View level for context. Uses config default if None.

  • output_format

    (Optional[OutputFormat], default: None ) –

    Output format. Uses config default if None.

  • system_prompt

    (Optional[str], default: None ) –

    Custom system prompt (optional).

  • use_llm_names

    (Optional[bool], default: None ) –

    Use llm_name instead of benchmark name. Uses config default if None.

Returns:

  • GeneratedGraph

    The generated causal graph with edges and metadata.

Raises:

  • ValueError

    If LLM response cannot be parsed.

generate_graph

generate_graph(
    variables: List[Dict[str, Any]],
    level: Optional[PromptDetail] = None,
    domain: Optional[str] = None,
    output_format: Optional[OutputFormat] = None,
    system_prompt: Optional[str] = None,
) -> GeneratedGraph

Generate a causal graph from variable dictionaries.

Parameters:

  • variables

    (List[Dict[str, Any]]) –

    List of variable dictionaries with at least "name".

  • level

    (Optional[PromptDetail], default: None ) –

    View level for context. Uses config default if None.

  • domain

    (Optional[str], default: None ) –

    Optional domain context for the query.

  • output_format

    (Optional[OutputFormat], default: None ) –

    Output format. Uses config default if None.

  • system_prompt

    (Optional[str], default: None ) –

    Custom system prompt (optional).

Returns:

  • GeneratedGraph

    The generated causal graph with edges and metadata.

Raises:

  • ValueError

    If LLM response cannot be parsed.

get_stats

get_stats() -> Dict[str, Any]

Get statistics about generation calls.

Returns:

  • Dict[str, Any]

    Dict with call_count, model, and client stats.

set_cache

set_cache(cache: Optional['TokenCache'], use_cache: bool = True) -> None

Configure caching for this generator.

Parameters:

  • cache

    (Optional['TokenCache']) –

    TokenCache instance for caching, or None to disable.

  • use_cache

    (bool, default: True ) –

    Whether to use the cache (default True).