Graph PDG Module¶
The PDG (Probabilistic Dependency Graph) class represents a probability
distribution over edge states between node pairs. Unlike SDG, which stores
a single deterministic edge type, PDG stores probabilities for each possible
edge state.
Overview¶
PDG is designed for:
- Graph averaging: Combining multiple structure learning runs
- Uncertainty representation: Representing structural uncertainty in causal graphs
- LLM fusion: Integrating LLM-generated graphs with statistical methods
The PDG class is independent of the SDG class hierarchy (not a subclass) as it represents a fundamentally different concept: a distribution over graphs rather than a single graph.
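As an illustration of the graph-averaging use case listed above, the following minimal sketch turns the edge state reported for one node pair across several structure learning runs into relative frequencies. The function name `average_edge_states` and the sample data are hypothetical and are not part of causaliq_core; the library's own averaging procedure may differ.

```python
from collections import Counter

# The four edge states described in this module
STATES = ("forward", "backward", "undirected", "none")


def average_edge_states(observed):
    """Convert per-run edge states for one node pair into relative frequencies."""
    if not observed:
        raise ValueError("at least one run is required")
    counts = Counter(observed)
    total = len(observed)
    # Counter returns 0 for states that never occurred
    return {state: counts[state] / total for state in STATES}


# Hypothetical edge state for the pair (A, B) in four learning runs
runs = ["forward", "forward", "backward", "none"]
probs = average_edge_states(runs)
print(probs)
```

The resulting four values sum to 1.0, so they have the shape expected by the EdgeProbabilities fields described below.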
Classes¶
EdgeProbabilities¶
Probability distribution over edge states between two nodes.
Stores probabilities for each possible edge state:
- forward: P(source -> target) directed edge in the stored direction
- backward: P(target -> source) directed edge opposite to the stored direction
- undirected: P(source -- target) undirected edge
- none: P(no edge between source and target)
Properties:
- p_exist: Probability that any edge exists (sum of forward, backward, undirected)
- p_directed: Probability of a directed edge (sum of forward, backward)
- most_likely_state(): Returns the most probable edge state
Usage:
from causaliq_core.graph import EdgeProbabilities
# Create edge probability distribution
probs = EdgeProbabilities(
forward=0.6,
backward=0.2,
undirected=0.1,
none=0.1
)
# Query properties
print(probs.p_exist) # 0.9 (probability edge exists)
print(probs.p_directed) # 0.8 (probability edge is directed)
print(probs.most_likely_state()) # "forward"
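To make the derived quantities concrete, here is a stand-in sketch of how p_exist, p_directed and most_likely_state() could be computed from the four stored probabilities. `EdgeProbsSketch` is a hypothetical class written for illustration only; it is not the library's implementation, and the tie-breaking behaviour (first state in declaration order wins) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class EdgeProbsSketch:
    """Illustrative stand-in for EdgeProbabilities (not the library class)."""

    forward: float = 0.0
    backward: float = 0.0
    undirected: float = 0.0
    none: float = 1.0

    @property
    def p_exist(self) -> float:
        # Any edge: forward, backward or undirected
        return self.forward + self.backward + self.undirected

    @property
    def p_directed(self) -> float:
        # Directed edge in either direction
        return self.forward + self.backward

    def most_likely_state(self) -> str:
        states = {
            "forward": self.forward,
            "backward": self.backward,
            "undirected": self.undirected,
            "none": self.none,
        }
        # Assumption: ties resolve to the first state in the order above
        return max(states, key=states.get)


sketch = EdgeProbsSketch(forward=0.6, backward=0.2, undirected=0.1, none=0.1)
print(sketch.most_likely_state())
```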
PDG¶
Probabilistic Dependency Graph - distribution over SDG structures.
Features:
- Stores probability distributions for each node pair
- Supports threshold-based graph extraction
- GraphML I/O for serialisation
Usage:
from causaliq_core.graph import PDG, EdgeProbabilities
# Create a PDG
nodes = ["A", "B", "C"]
edges = {
("A", "B"): EdgeProbabilities(forward=0.8, none=0.2),
("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3, none=0.1),
}
pdg = PDG(nodes, edges)
# Query edge probabilities
probs = pdg.get_probabilities("A", "B")
print(probs.forward) # 0.8
# Extract graph at threshold
pdag = pdg.to_pdag(threshold=0.5)
Reference¶
Probabilistic Dependency Graph (PDG)
This module provides PDG (Probabilistic Dependency Graph), which represents a probability distribution over edge states between node pairs. Unlike SDG, which stores a single deterministic edge type, PDG stores probabilities for each possible edge state.
PDG is designed for:
- Graph averaging from multiple structure learning runs
- Fusing LLM-generated graphs with statistical structure learning
- Representing uncertainty in causal graph structure
The PDG class is independent of the SDG class hierarchy (not a subclass) as it represents uncertainty over graphs rather than a single graph.
Classes:
- EdgeProbabilities: Probability distribution over edge states between two nodes.
- PDG: Probabilistic Dependency Graph - distribution over SDG structures.
Classes¶
EdgeProbabilities (dataclass)¶
EdgeProbabilities(
forward: float = 0.0,
backward: float = 0.0,
undirected: float = 0.0,
none: float = 1.0,
)
Probability distribution over edge states between two nodes.
Stores probabilities for each possible edge state. The edge is stored in canonical form, with the source node alphabetically before the target node.
Attributes:
- forward (float): P(source -> target) directed edge in stored direction.
- backward (float): P(target -> source) directed edge opposite to stored.
- undirected (float): P(source -- target) undirected edge.
- none (float): P(no edge between source and target).
Raises:
- ValueError: If probabilities do not sum to 1.0 (within tolerance).
Example
>>> probs = EdgeProbabilities(
...     forward=0.6, backward=0.2, undirected=0.1, none=0.1
... )
>>> probs.p_exist
0.9
>>> probs.p_directed
0.8
Methods:
- __post_init__: Validate probabilities sum to 1.0.
- most_likely_state: Return the most likely edge state.
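The validation performed in __post_init__ can be pictured with the following sketch. `ValidatedEdgeProbs` is a hypothetical stand-in, and the tolerance of 1e-9 is an assumption; the documentation only states that the sum is checked "within tolerance" and does not give the value.

```python
import math
from dataclasses import dataclass


@dataclass
class ValidatedEdgeProbs:
    """Illustrative stand-in showing sum-to-one validation."""

    forward: float = 0.0
    backward: float = 0.0
    undirected: float = 0.0
    none: float = 1.0

    def __post_init__(self) -> None:
        total = self.forward + self.backward + self.undirected + self.none
        # Assumed tolerance; the real class may use a different one
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"probabilities sum to {total}, expected 1.0")


ok = ValidatedEdgeProbs(forward=0.8, none=0.2)

try:
    # none keeps its default of 1.0, so the total is 1.8
    ValidatedEdgeProbs(forward=0.8)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Because none defaults to 1.0, any call that sets another probability also has to pass none explicitly, as the usage examples in this page do.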
Attributes¶
p_exist (property)¶
Probability that any edge exists between the nodes.
Returns:
- float: Sum of forward, backward, and undirected probabilities.
p_directed (property)¶
Probability of a directed edge (either direction).
Returns:
- float: Sum of forward and backward probabilities.
Functions¶
most_likely_state¶
Return the most likely edge state.
Returns:
- str: One of "forward", "backward", "undirected", or "none".
PDG¶
PDG(nodes: List[str], edges: Optional[Dict[Tuple[str, str], EdgeProbabilities]] = None)
Probabilistic Dependency Graph - distribution over SDG structures.
Represents uncertainty over causal graph structure by storing probability distributions for each possible edge between node pairs. Unlike SDG, PDAG, and DAG, which represent single deterministic graphs, PDG captures structural uncertainty.
PDG is not a subclass of SDG because it represents a fundamentally different concept: a distribution over graphs rather than a single graph.
Parameters:
- nodes (List[str]): List of node names in the graph.
- edges (Optional[Dict[Tuple[str, str], EdgeProbabilities]], default: None): Dictionary mapping (source, target) pairs to EdgeProbabilities. Node pairs should be in canonical order (source < target alphabetically).
Attributes:
- nodes (List[str]): Graph nodes in alphabetical order.
- edges (Dict[Tuple[str, str], EdgeProbabilities]): Edge probabilities {(source, target): EdgeProbabilities}.
Raises:
- TypeError: If nodes or edges have invalid types.
- ValueError: If edge keys are not in canonical order or reference unknown nodes.
Example
>>> from causaliq_core.graph.pdg import PDG, EdgeProbabilities
>>> nodes = ["A", "B", "C"]
>>> edges = {
...     ("A", "B"): EdgeProbabilities(forward=0.8, none=0.2),
...     ("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3,
...                                   none=0.1),
... }
>>> pdg = PDG(nodes, edges)
>>> pdg.get_probabilities("A", "B").forward
0.8
Parameters:
- nodes (List[str]): List of node names.
- edges (Optional[Dict[Tuple[str, str], EdgeProbabilities]], default: None): Optional dictionary of edge probabilities. Keys must be tuples (source, target) where source < target alphabetically.
Methods:
- get_probabilities: Get edge probabilities between two nodes.
- set_probabilities: Set edge probabilities between two nodes.
- node_pairs: Iterate over all possible node pairs in canonical order.
- existing_edges: Iterate over node pairs with non-zero edge probability.
- __len__: Return number of node pairs with explicit probabilities.
- __eq__: Check equality with another PDG.
- __str__: Return human-readable description of the PDG.
- __repr__: Return detailed representation of the PDG.
- compress: Compress PDG to compact binary representation.
- decompress: Decompress PDG from compact binary representation.
Functions¶
get_probabilities¶
get_probabilities(node_a: str, node_b: str) -> EdgeProbabilities
Get edge probabilities between two nodes.
Handles node ordering automatically: the two nodes may be given in either order, and the returned forward and backward probabilities are expressed relative to the alphabetical (canonical) ordering of the pair.
Parameters:
- node_a (str): First node name.
- node_b (str): Second node name.
Returns:
- EdgeProbabilities: EdgeProbabilities for the node pair. If no explicit probabilities are stored, returns EdgeProbabilities(none=1.0).
Raises:
- ValueError: If either node is not in the graph.
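The lookup behaviour described for get_probabilities can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function `lookup`, the plain 4-tuple representation (forward, backward, undirected, none) and the `NO_EDGE` constant are all hypothetical stand-ins for the real PDG and EdgeProbabilities objects.

```python
from typing import Dict, List, Tuple

# Stand-in for EdgeProbabilities: (forward, backward, undirected, none)
Probs = Tuple[float, float, float, float]
NO_EDGE: Probs = (0.0, 0.0, 0.0, 1.0)


def lookup(
    nodes: List[str],
    edges: Dict[Tuple[str, str], Probs],
    node_a: str,
    node_b: str,
) -> Probs:
    """Return stored probabilities for a pair given in either order."""
    for node in (node_a, node_b):
        if node not in nodes:
            raise ValueError(f"unknown node: {node}")
    # Canonical key: alphabetically smaller node first
    key = (node_a, node_b) if node_a < node_b else (node_b, node_a)
    # Pairs without explicit probabilities default to "no edge"
    return edges.get(key, NO_EDGE)


nodes = ["A", "B", "C"]
edges = {("A", "B"): (0.8, 0.0, 0.0, 0.2)}
print(lookup(nodes, edges, "B", "A"))
```

In this sketch the result for ("B", "A") is the same tuple as for ("A", "B"), so forward still means A -> B regardless of argument order.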
set_probabilities¶
set_probabilities(node_a: str, node_b: str, probs: EdgeProbabilities) -> None
Set edge probabilities between two nodes.
Handles node ordering automatically.
Parameters:
- node_a (str): First node name.
- node_b (str): Second node name.
- probs (EdgeProbabilities): Edge probabilities to set.
Raises:
- ValueError: If either node is not in the graph.
- TypeError: If probs is not EdgeProbabilities.
node_pairs¶
Iterate over all possible node pairs in canonical order.
Yields:
- Tuple[str, str]: Tuples (source, target) where source < target alphabetically.
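One simple way to produce every node pair in canonical order is `itertools.combinations` over the sorted node names. The helper `canonical_pairs` below is a hypothetical illustration of that idea and is not taken from the library.

```python
from itertools import combinations
from typing import Iterable, Iterator, Tuple


def canonical_pairs(nodes: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield each unordered pair once, smaller name first."""
    # Sorting first guarantees source < target in every yielded tuple
    yield from combinations(sorted(nodes), 2)


pairs = list(canonical_pairs(["C", "A", "B"]))
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

For n nodes this yields n * (n - 1) / 2 pairs, which is the number of positions a PDG could hold probabilities for.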
existing_edges¶
existing_edges() -> Iterator[Tuple[str, str, EdgeProbabilities]]
Iterate over node pairs with non-zero edge probability.
Yields:
- Tuple[str, str, EdgeProbabilities]: Tuples (source, target, probs) where p_exist > 0.
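The p_exist > 0 filter can be illustrated with a short generator. As before, this is a sketch with hypothetical names: probabilities are represented as plain (forward, backward, undirected, none) tuples, and iterating in sorted key order is an assumption, since the documentation does not state the iteration order.

```python
from typing import Dict, Iterator, Tuple

# Stand-in for EdgeProbabilities: (forward, backward, undirected, none)
Probs = Tuple[float, float, float, float]


def edges_with_support(
    edges: Dict[Tuple[str, str], Probs],
) -> Iterator[Tuple[str, str, Probs]]:
    """Yield (source, target, probs) for pairs where some edge is possible."""
    for (source, target), probs in sorted(edges.items()):
        forward, backward, undirected, _none = probs
        # p_exist is the sum of the three edge-present states
        if forward + backward + undirected > 0:
            yield source, target, probs


stored = {
    ("A", "C"): (0.0, 0.0, 0.0, 1.0),  # explicit entry, but no edge mass
    ("A", "B"): (0.8, 0.0, 0.0, 0.2),
}
supported = list(edges_with_support(stored))
print(supported)
```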
compress¶
Compress PDG to compact binary representation.
Format:
- 2 bytes: number of nodes (uint16, big-endian)
- For each node: 2 bytes name length + UTF-8 encoded name
- 2 bytes: number of edge pairs with probabilities (uint16)
- For each edge pair:
  - 2 bytes: source node index (uint16)
  - 2 bytes: target node index (uint16)
  - 3 bytes: p_forward (4 s.f. mantissa + exponent)
  - 3 bytes: p_backward (4 s.f. mantissa + exponent)
  - 3 bytes: p_undirected (4 s.f. mantissa + exponent)
Probabilities are encoded with 4 significant figures using a mantissa (0-9999) and exponent format: value = mantissa × 10^exp. The p_none value is derived as 1.0 - (forward + backward + undirected).
Returns:
- bytes: Compact binary representation of the PDG.
Raises:
- ValueError: If graph has more than 65535 nodes or edge pairs.
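The following sketch shows one possible way to pack the node header and a single 3-byte probability with Python's `struct` module. The exact bit layout of the 3-byte field is not given in the format description, so the split used here (a big-endian uint16 mantissa followed by a signed 8-bit exponent) is an assumption, as is the use of big-endian for the name length. The function names are hypothetical and the real compress() output may differ.

```python
import struct
from typing import List


def encode_prob(p: float) -> bytes:
    """Pack p as mantissa (0-9999) and exponent so that p ~ mantissa * 10**exp."""
    if p == 0.0:
        return struct.pack(">Hb", 0, 0)
    # "%.3e" keeps 4 significant figures, e.g. 0.8 -> "8.000e-01"
    digits, exp = f"{p:.3e}".split("e")
    mantissa = int(digits.replace(".", ""))  # "8.000" -> 8000
    exponent = int(exp) - 3                  # shift for the 3 decimals removed
    # Assumed layout: uint16 mantissa + int8 exponent (exponent must fit -128..127)
    return struct.pack(">Hb", mantissa, exponent)


def decode_prob(raw: bytes) -> float:
    """Inverse of encode_prob under the same assumed layout."""
    mantissa, exponent = struct.unpack(">Hb", raw)
    return mantissa * 10.0 ** exponent


def encode_nodes(nodes: List[str]) -> bytes:
    """Pack the node count, then each name as length-prefixed UTF-8."""
    if len(nodes) > 0xFFFF:
        raise ValueError("more than 65535 nodes")
    out = struct.pack(">H", len(nodes))
    for name in nodes:
        data = name.encode("utf-8")
        out += struct.pack(">H", len(data)) + data
    return out


header = encode_nodes(["A", "BC"])
packed = encode_prob(0.8)
print(len(packed), decode_prob(packed))
```

With forward, backward and undirected each stored this way, p_none does not need its own field: a decoder can recover it as 1.0 - (forward + backward + undirected), as described above.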