
Graph PDG Module

The PDG (Probabilistic Dependency Graph) class represents a probability distribution over edge states between node pairs. Unlike SDG, which stores a single deterministic edge type per node pair, PDG stores probabilities for each possible edge state.

Overview

PDG is designed for:

  • Graph averaging: Combining multiple structure learning runs
  • Uncertainty representation: Representing structural uncertainty in causal graphs
  • LLM fusion: Integrating LLM-generated graphs with statistical methods

The PDG class is independent of the SDG class hierarchy (not a subclass) as it represents a fundamentally different concept: a distribution over graphs rather than a single graph.
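To make the graph-averaging use case concrete, here is a minimal sketch that estimates per-pair edge-state probabilities as relative frequencies across several learned graphs. It does not use causaliq_core. The input format (each run as a dict mapping ordered node pairs to "->" or "--"), the function name `average_graphs` and the simple frequency-counting rule are all assumptions for illustration, not the library's own averaging method.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical input format: each learned graph maps an ordered pair (x, y)
# to "->" (meaning x -> y) or "--" (meaning x -- y); absent pairs mean no edge.
Run = Dict[Tuple[str, str], str]
STATES = ("forward", "backward", "undirected", "none")


def average_graphs(
    nodes: List[str], runs: List[Run]
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Estimate edge-state probabilities per canonical pair as run frequencies."""
    if not runs:
        raise ValueError("need at least one learned graph")
    result = {}
    # Canonical pairs: source sorts alphabetically before target.
    for source, target in combinations(sorted(nodes), 2):
        counts: Counter = Counter()
        for run in runs:
            stored, reverse = run.get((source, target)), run.get((target, source))
            if stored == "->":
                counts["forward"] += 1
            elif reverse == "->":
                counts["backward"] += 1
            elif "--" in (stored, reverse):
                counts["undirected"] += 1
            else:
                counts["none"] += 1
        result[(source, target)] = {s: counts[s] / len(runs) for s in STATES}
    return result
```

Each inner dict uses the four EdgeProbabilities field names, so it could be unpacked as `EdgeProbabilities(**probs)`; the documented sum-to-one check is tolerance-based, which should absorb floating-point rounding in the frequencies.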

Classes

EdgeProbabilities

Probability distribution over edge states between two nodes.

Stores probabilities for each possible edge state:

  • forward: P(source -> target) directed edge in stored direction
  • backward: P(target -> source) directed edge opposite to stored
  • undirected: P(source -- target) undirected edge
  • none: P(no edge between source and target)

Properties and methods:

  • p_exist: Probability that any edge exists (sum of forward, backward, undirected)
  • p_directed: Probability of a directed edge (sum of forward, backward)
  • most_likely_state(): Returns the most probable edge state

Usage:

from causaliq_core.graph import EdgeProbabilities

# Create edge probability distribution
probs = EdgeProbabilities(
    forward=0.6,
    backward=0.2,
    undirected=0.1,
    none=0.1
)

# Query properties
print(probs.p_exist)      # 0.9 (probability edge exists)
print(probs.p_directed)   # 0.8 (probability edge is directed)
print(probs.most_likely_state())  # "forward"

PDG

Probabilistic Dependency Graph - distribution over SDG structures.

Features:

  • Stores probability distributions for each node pair
  • Supports threshold-based graph extraction
  • GraphML I/O for serialisation

Usage:

from causaliq_core.graph import PDG, EdgeProbabilities

# Create a PDG
nodes = ["A", "B", "C"]
edges = {
    ("A", "B"): EdgeProbabilities(forward=0.8, none=0.2),
    ("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3, none=0.1),
}
pdg = PDG(nodes, edges)

# Query edge probabilities
probs = pdg.get_probabilities("A", "B")
print(probs.forward)  # 0.8

# Extract graph at threshold
pdag = pdg.to_pdag(threshold=0.5)

Reference

Probabilistic Dependency Graph (PDG)

This module provides PDG (Probabilistic Dependency Graph), which represents a probability distribution over edge states between node pairs. Unlike SDG, which stores a single deterministic edge type per node pair, PDG stores probabilities for each possible edge state.

PDG is designed for:

  • Graph averaging from multiple structure learning runs
  • Fusing LLM-generated graphs with statistical structure learning
  • Representing uncertainty in causal graph structure

The PDG class is independent of the SDG class hierarchy (not a subclass) as it represents uncertainty over graphs rather than a single graph.

Classes:

  • EdgeProbabilities

    Probability distribution over edge states between two nodes.

  • PDG

    Probabilistic Dependency Graph - distribution over SDG structures.

Classes

EdgeProbabilities dataclass

EdgeProbabilities(
    forward: float = 0.0,
    backward: float = 0.0,
    undirected: float = 0.0,
    none: float = 1.0,
)

Probability distribution over edge states between two nodes.

Stores probabilities for each possible edge state. The edge is stored with source node alphabetically before target node (canonical form).

Attributes:

  • forward (float) –

    P(source -> target) directed edge in stored direction.

  • backward (float) –

    P(target -> source) directed edge opposite to stored.

  • undirected (float) –

    P(source -- target) undirected edge.

  • none (float) –

    P(no edge between source and target).

Raises:

  • ValueError

    If probabilities do not sum to 1.0 (within tolerance).
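The documented sum-to-one check can be sketched with a hypothetical stand-in class that mirrors the EdgeProbabilities fields and defaults. The absolute tolerance of 1e-6 is an assumption (the value used by causaliq_core is not stated here), and whether the library also rejects negative values is not documented, so the sketch checks only the sum.

```python
import math
from dataclasses import dataclass

TOLERANCE = 1e-6  # assumed tolerance; the library's actual value is not documented here


@dataclass
class EdgeProbsSketch:
    """Hypothetical stand-in mirroring the documented EdgeProbabilities fields and defaults."""

    forward: float = 0.0
    backward: float = 0.0
    undirected: float = 0.0
    none: float = 1.0

    def __post_init__(self) -> None:
        # Validate once the dataclass __init__ has populated all four fields.
        total = self.forward + self.backward + self.undirected + self.none
        if not math.isclose(total, 1.0, abs_tol=TOLERANCE):
            raise ValueError(f"Probabilities must sum to 1.0, got {total}")
```

Because `none` defaults to 1.0, setting only `forward=0.8` gives a total of 1.8 and fails validation; this is why the PDG examples pass `none=0.2` explicitly.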

Example

>>> probs = EdgeProbabilities(
...     forward=0.6, backward=0.2, undirected=0.1, none=0.1
... )
>>> probs.p_exist
0.9
>>> probs.p_directed
0.8

Methods:

  • most_likely_state

    Return the most likely edge state.

Attributes
p_exist property
p_exist: float

Probability that any edge exists between the nodes.

Returns:

  • float

    Sum of forward, backward, and undirected probabilities.

p_directed property
p_directed: float

Probability of a directed edge (either direction).

Returns:

  • float

    Sum of forward and backward probabilities.

Functions
__post_init__
__post_init__() -> None

Validate probabilities sum to 1.0.

most_likely_state
most_likely_state() -> str

Return the most likely edge state.

Returns:

  • str

    One of "forward", "backward", "undirected", or "none".
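The two properties and `most_likely_state()` are simple functions of the four stored probabilities. The sketch below reproduces the documented definitions on a hypothetical stand-in class; it breaks ties by returning the first state in field order, which is an assumption because the library's tie-breaking rule is not documented.

```python
from dataclasses import dataclass


@dataclass
class EdgeProbsSketch:
    """Hypothetical stand-in with the four documented edge-state fields."""

    forward: float = 0.0
    backward: float = 0.0
    undirected: float = 0.0
    none: float = 1.0

    @property
    def p_exist(self) -> float:
        # Any edge at all: forward + backward + undirected.
        return self.forward + self.backward + self.undirected

    @property
    def p_directed(self) -> float:
        # A directed edge in either direction.
        return self.forward + self.backward

    def most_likely_state(self) -> str:
        states = {
            "forward": self.forward,
            "backward": self.backward,
            "undirected": self.undirected,
            "none": self.none,
        }
        # max() returns the first key with the highest value (assumed tie-break).
        return max(states, key=states.get)
```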

PDG

PDG(nodes: List[str], edges: Optional[Dict[Tuple[str, str], EdgeProbabilities]] = None)

Probabilistic Dependency Graph - distribution over SDG structures.

Represents uncertainty over causal graph structure by storing probability distributions for each possible edge between node pairs. Unlike SDG, PDAG, and DAG, which represent single deterministic graphs, PDG captures structural uncertainty.

PDG is not a subclass of SDG because it represents a fundamentally different concept: a distribution over graphs rather than a single graph.

Parameters:

  • nodes
    (List[str]) –

    List of node names in the graph.

  • edges
    (Optional[Dict[Tuple[str, str], EdgeProbabilities]], default: None ) –

    Dictionary mapping (source, target) pairs to EdgeProbabilities. Node pairs must be in canonical order (source < target alphabetically).

Attributes:

  • nodes (List[str]) –

    Graph nodes in alphabetical order.

  • edges (Dict[Tuple[str, str], EdgeProbabilities]) –

    Edge probabilities {(source, target): EdgeProbabilities}.

Raises:

  • TypeError

    If nodes or edges have invalid types.

  • ValueError

    If edge keys are not in canonical order or reference unknown nodes.
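The documented constructor rules (edge keys must reference known nodes and be in canonical order; nodes are kept in alphabetical order) can be sketched as a standalone check. The function name `validate_edge_keys` is hypothetical, and requiring `nodes` to be a list of str is an assumption based on the `List[str]` annotation; the library may accept other sequences.

```python
from typing import Dict, List, Tuple


def validate_edge_keys(nodes: List[str], edges: Dict[Tuple[str, str], object]) -> List[str]:
    """Hypothetical check mirroring the documented PDG constructor rules.

    Returns the nodes sorted alphabetically; raises TypeError for badly typed
    input and ValueError for keys that reference unknown nodes or are not in
    canonical (source < target) order.
    """
    if not isinstance(nodes, list) or not all(isinstance(n, str) for n in nodes):
        raise TypeError("nodes must be a list of str")
    known = set(nodes)
    for key in edges:
        if not (isinstance(key, tuple) and len(key) == 2):
            raise TypeError(f"edge key must be a (source, target) tuple: {key!r}")
        source, target = key
        if source not in known or target not in known:
            raise ValueError(f"edge {key!r} references an unknown node")
        # Plain string comparison gives the alphabetical (canonical) order.
        if not source < target:
            raise ValueError(f"edge {key!r} is not in canonical order")
    return sorted(nodes)
```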

Example

>>> from causaliq_core.graph.pdg import PDG, EdgeProbabilities
>>> nodes = ["A", "B", "C"]
>>> edges = {
...     ("A", "B"): EdgeProbabilities(forward=0.8, none=0.2),
...     ("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3,
...                                   none=0.1),
... }
>>> pdg = PDG(nodes, edges)
>>> pdg.get_probabilities("A", "B").forward
0.8

Parameters:

  • nodes
    (List[str]) –

    List of node names.

  • edges
    (Optional[Dict[Tuple[str, str], EdgeProbabilities]], default: None ) –

    Optional dictionary of edge probabilities. Keys must be tuples (source, target) where source < target alphabetically.

Methods:

  • get_probabilities

    Get edge probabilities between two nodes.

  • set_probabilities

    Set edge probabilities between two nodes.

  • node_pairs

    Iterate over all possible node pairs in canonical order.

  • existing_edges

    Iterate over node pairs with non-zero edge probability.

  • __len__

    Return number of node pairs with explicit probabilities.

  • __eq__

    Check equality with another PDG.

  • __str__

    Return human-readable description of the PDG.

  • __repr__

    Return detailed representation of the PDG.

  • compress

    Compress PDG to compact binary representation.

  • decompress

    Decompress PDG from compact binary representation.

Functions
get_probabilities
get_probabilities(node_a: str, node_b: str) -> EdgeProbabilities

Get edge probabilities between two nodes.

Node order is handled automatically: the returned forward/backward values are relative to the canonical (alphabetical) ordering of the pair, not to the order in which the nodes are passed.

Parameters:

  • node_a (str) –

    First node name.

  • node_b (str) –

    Second node name.

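Returns:

  • EdgeProbabilities

    Edge probabilities for the pair, with forward/backward relative to canonical (alphabetical) order.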

Raises:

  • ValueError

    If either node is not in the graph.
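Because forward/backward are always relative to canonical order, a caller who thinks in terms of "P(X -> Y)" must swap them when X sorts after Y. A minimal sketch, using a hypothetical stand-in class and hypothetical helper names `canonical` and `p_arc`; it assumes canonical order is plain Python string comparison.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Probs:
    """Hypothetical stand-in for EdgeProbabilities (canonical-order fields)."""

    forward: float = 0.0
    backward: float = 0.0
    undirected: float = 0.0
    none: float = 1.0


def canonical(node_a: str, node_b: str) -> Tuple[Tuple[str, str], bool]:
    """Return the pair in canonical order and whether the caller's order was reversed."""
    if node_a < node_b:
        return (node_a, node_b), False
    return (node_b, node_a), True


def p_arc(probs: Probs, node_a: str, node_b: str) -> float:
    """P(node_a -> node_b) in the caller's order, given probs for the canonical pair."""
    _, reversed_order = canonical(node_a, node_b)
    # forward always means canonical source -> target, so swap when the caller reversed the pair.
    return probs.backward if reversed_order else probs.forward
```

Assuming get_probabilities behaves as documented, `p_arc(pdg.get_probabilities("C", "A"), "C", "A")` would give P(C -> A).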

set_probabilities
set_probabilities(node_a: str, node_b: str, probs: EdgeProbabilities) -> None

Set edge probabilities between two nodes.

Handles node ordering automatically.

Parameters:

  • node_a (str) –

    First node name.

  • node_b (str) –

    Second node name.

  • probs (EdgeProbabilities) –

    Edge probabilities to set.

Raises:

  • ValueError

    If either node is not in the graph.

  • TypeError

    If probs is not EdgeProbabilities.

node_pairs
node_pairs() -> Iterator[Tuple[str, str]]

Iterate over all possible node pairs in canonical order.

Yields:

  • Tuple[str, str]

    Tuples (source, target) where source < target alphabetically.
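Iterating over all canonical pairs amounts to taking 2-combinations of the sorted node list, so a graph with n nodes has n(n-1)/2 pairs. A sketch with itertools; this standalone function is an illustration, not the library's implementation.

```python
from itertools import combinations
from typing import Iterator, List, Tuple


def node_pairs(nodes: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield every unordered node pair once, as (source, target) with source < target."""
    # Sorting first guarantees combinations() emits each pair in canonical order.
    yield from combinations(sorted(nodes), 2)
```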

existing_edges
existing_edges() -> Iterator[Tuple[str, str, EdgeProbabilities]]

Iterate over node pairs with non-zero edge probability.

Yields:

  • Tuple[str, str, EdgeProbabilities]

    Tuples (source, target, probs) where p_exist > 0.
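A sketch of the same filter over a plain dict, assuming (hypothetically) that probabilities are stored as (forward, backward, undirected, none) tuples keyed by canonical pair. It sorts the pairs for deterministic output; the library's iteration order is not documented.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical storage: canonical pair -> (forward, backward, undirected, none).
Probs = Tuple[float, float, float, float]


def existing_edges(edges: Dict[Tuple[str, str], Probs]) -> Iterator[Tuple[str, str, Probs]]:
    """Yield (source, target, probs) for pairs where an edge has non-zero probability."""
    for (source, target), probs in sorted(edges.items()):
        forward, backward, undirected, _none = probs
        if forward + backward + undirected > 0:  # i.e. p_exist > 0
            yield source, target, probs
```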

__len__
__len__() -> int

Return number of node pairs with explicit probabilities.

__eq__
__eq__(other: object) -> bool

Check equality with another PDG.

__str__
__str__() -> str

Return human-readable description of the PDG.

__repr__
__repr__() -> str

Return detailed representation of the PDG.

compress
compress() -> bytes

Compress PDG to compact binary representation.

Format:

  • 2 bytes: number of nodes (uint16, big-endian)
  • For each node: 2 bytes name length + UTF-8 encoded name
  • 2 bytes: number of edge pairs with probabilities (uint16)
  • For each edge pair:
      • 2 bytes: source node index (uint16)
      • 2 bytes: target node index (uint16)
      • 3 bytes: p_forward (4 significant figures: mantissa + exponent)
      • 3 bytes: p_backward (4 significant figures: mantissa + exponent)
      • 3 bytes: p_undirected (4 significant figures: mantissa + exponent)

Probabilities are encoded with 4 significant figures using a mantissa (0-9999) and exponent format: value = mantissa × 10^exp. The p_none value is derived as 1.0 - (forward + backward + undirected).
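The page gives value = mantissa × 10^exp with a 0-9999 mantissa in 3 bytes, but not how those bytes are split. The sketch below assumes a big-endian uint16 mantissa followed by a signed int8 exponent, normalised so the mantissa has 4 significant figures; `encode_prob`, `decode_prob` and `p_none` are hypothetical names, and the real byte layout may differ.

```python
import math
import struct


def encode_prob(value: float) -> bytes:
    """Pack a probability as 3 bytes: uint16 mantissa + int8 exponent (assumed layout)."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"probability out of range: {value}")
    if value == 0.0:
        return struct.pack(">Hb", 0, 0)
    # Pick the exponent so the mantissa has 4 significant figures (1000-9999).
    # Values below about 1e-125 would overflow the int8 exponent (struct.error).
    exp = math.floor(math.log10(value)) - 3
    mantissa = round(value * 10 ** -exp)
    if mantissa >= 10000:  # rounding carried into a fifth digit, e.g. 0.99996
        mantissa = round(mantissa / 10)
        exp += 1
    return struct.pack(">Hb", mantissa, exp)


def decode_prob(data: bytes) -> float:
    """Unpack 3 bytes back to mantissa x 10^exp."""
    mantissa, exp = struct.unpack(">Hb", data)
    # Parsing the decimal string avoids an extra binary rounding step.
    return float(f"{mantissa}e{exp}")


def p_none(forward: float, backward: float, undirected: float) -> float:
    """The documented derivation of the unstored p_none value."""
    return 1.0 - (forward + backward + undirected)
```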

Returns:

  • bytes

    Compact binary representation of the PDG.

Raises:

  • ValueError

    If graph has more than 65535 nodes or edge pairs.
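The 65535 limit follows from the uint16 counts and indices in the compress() format. A sketch of packing just the node table from that format, with the limit check; `pack_node_table` is hypothetical, and treating name lengths as big-endian UTF-8 byte counts is an assumption (the format states big-endian only for the node count).

```python
import struct
from typing import List

MAX_UINT16 = 65535


def pack_node_table(nodes: List[str]) -> bytes:
    """Pack a uint16 node count, then (uint16 length + UTF-8 name) for each node."""
    if len(nodes) > MAX_UINT16:
        raise ValueError(f"too many nodes for a uint16 count: {len(nodes)}")
    parts = [struct.pack(">H", len(nodes))]
    for name in nodes:
        encoded = name.encode("utf-8")
        if len(encoded) > MAX_UINT16:
            raise ValueError(f"node name too long: {name[:20]!r}...")
        # Assumption: the length prefix counts encoded bytes, not characters.
        parts.append(struct.pack(">H", len(encoded)) + encoded)
    return b"".join(parts)
```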

decompress classmethod
decompress(data: bytes) -> PDG

Decompress PDG from compact binary representation.

Parameters:

  • data (bytes) –

    Binary data from PDG.compress().

Returns:

  • PDG

    Reconstructed PDG instance.

Raises:

  • TypeError

    If data is not bytes.

  • ValueError

    If data is invalid or corrupted.
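A sketch of the first step of decompression: reading the node table back from the compress() layout and turning low-level failures into the documented TypeError/ValueError. `unpack_node_table` is hypothetical, stops before the edge records, and assumes big-endian UTF-8 byte-count prefixes for names.

```python
import struct
from typing import List, Tuple


def unpack_node_table(data: bytes) -> Tuple[List[str], int]:
    """Read the node header; return (nodes, offset of the first byte after it)."""
    if not isinstance(data, bytes):
        raise TypeError("data must be bytes")
    try:
        (count,) = struct.unpack_from(">H", data, 0)
        offset = 2
        nodes = []
        for _ in range(count):
            (length,) = struct.unpack_from(">H", data, offset)
            offset += 2
            raw = data[offset:offset + length]
            if len(raw) != length:
                raise ValueError("truncated node name")
            nodes.append(raw.decode("utf-8"))
            offset += length
    except (struct.error, UnicodeDecodeError) as exc:
        # Buffer too short or bad UTF-8: report as corrupted input.
        raise ValueError(f"invalid or corrupted data: {exc}") from exc
    return nodes, offset
```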