Graph PDG Module¶
The PDG (Probabilistic Dependency Graph) class represents a probability
distribution over edge states between node pairs. Unlike SDG which stores
a single deterministic edge type, PDG stores probabilities for each possible
edge state.
Overview¶
PDG is designed for:
- Graph averaging: Combining multiple structure learning runs
- Uncertainty representation: Representing structural uncertainty in causal graphs
- LLM fusion: Integrating LLM-generated graphs with statistical methods
The PDG class is independent of the SDG class hierarchy (not a subclass) as it represents a fundamentally different concept: a distribution over graphs rather than a single graph.
Classes¶
EdgeProbabilities¶
Probability distribution over edge states between two nodes.
Stores probabilities for each possible edge state:
forward: P(source -> target) directed edge in stored directionbackward: P(target -> source) directed edge opposite to storedundirected: P(source -- target) undirected edgenone: P(no edge between source and target)
Properties:
p_exist: Probability that any edge exists (sum of forward, backward, undirected)p_directed: Probability of a directed edge (sum of forward, backward)most_likely_state(): Returns the most probable edge state
Usage:
from causaliq_core.graph import EdgeProbabilities
# Create edge probability distribution
probs = EdgeProbabilities(
forward=0.6,
backward=0.2,
undirected=0.1,
none=0.1
)
# Query properties
print(probs.p_exist) # 0.9 (probability edge exists)
print(probs.p_directed) # 0.8 (probability edge is directed)
print(probs.most_likely_state()) # "forward"
PDG¶
Probabilistic Dependency Graph - distribution over SDG structures.
Features:
- Stores probability distributions for each node pair
- Greedy DAG extraction with cycle avoidance
- Compact binary compression/decompression
- GraphML I/O for serialisation
Usage:
from causaliq_core.graph import PDG, EdgeProbabilities
# Create a PDG
nodes = ["A", "B", "C"]
edges = {
("A", "B"): EdgeProbabilities(forward=0.8, none=0.2),
("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3, none=0.1),
}
pdg = PDG(nodes, edges)
# Query edge probabilities
probs = pdg.get_probabilities("A", "B")
print(probs.forward) # 0.8
# Extract a DAG greedily (edges ranked by probability)
result = pdg.to_dag_greedy(threshold=0.5)
print(result.dag) # The extracted DAG
print(result.edges_included) # Number of edges added
GreedyDAGResult¶
NamedTuple returned by PDG.to_dag_greedy().
Fields:
dag: The extractedDAG.edges_included: Number of edges added to the DAG.edges_skipped_cycle: Edges skipped to avoid cycles.edges_skipped_threshold: Edges below the probability threshold.tie_breaks_applied: Edges where alphabetical tie-breaking was used.
PDG.to_dag_greedy(threshold=0.0)¶
Extract a DAG from the PDG using a greedy algorithm.
Algorithm:
- For each edge pair, compute effective forward and backward probabilities by splitting the undirected probability equally between directions.
- Choose the direction with the higher effective probability.
- For ties (effective forward equals effective backward), use alphabetical ordering (source < target, i.e. forward direction).
- Sort candidate edges by descending probability.
- Greedily add edges, skipping any that would create a cycle (checked via ancestor tracking with propagation to descendants).
Parameters:
threshold(float): Minimump_existto consider an edge (default 0.0).
Returns:
GreedyDAGResult: The extracted DAG and extraction statistics.
Usage:
result = pdg.to_dag_greedy(threshold=0.5)
print(result.edges_included) # 2
print(result.edges_skipped_cycle) # 0
print(result.tie_breaks_applied) # 0
PDG.compress() / PDG.decompress(data)¶
Compact binary serialisation of a PDG.
compress() encodes the PDG to a compact binary format where each
probability is stored in 3 bytes (4 significant figures). The p_none
value is derived as 1.0 - (forward + backward + undirected).
decompress() reconstructs a PDG from the binary representation.
Usage:
Reference¶
Probabilistic Dependency Graph (PDG)
This module provides PDG (Probabilistic Dependency Graph) which represents a probability distribution over edge states between node pairs. Unlike SDG which stores a single deterministic edge type, PDG stores probabilities for each possible edge state.
PDG is designed for: - Graph averaging from multiple structure learning runs - Fusing LLM-generated graphs with statistical structure learning - Representing uncertainty in causal graph structure
The PDG class is independent of the SDG class hierarchy (not a subclass) as it represents uncertainty over graphs rather than a single graph.
Classes:
-
GreedyDAGResult–Result from PDG.to_dag_greedy().
-
EdgeProbabilities–Probability distribution over edge states between two nodes.
-
PDG–Probabilistic Dependency Graph - distribution over SDG structures.
Classes¶
GreedyDAGResult
¶
Result from PDG.to_dag_greedy().
Attributes:
-
dag(DAG) –The extracted DAG.
-
edges_included(int) –Number of edges added to the DAG.
-
edges_skipped_cycle(int) –Edges skipped to avoid cycles.
-
edges_skipped_threshold(int) –Edges below threshold.
-
tie_breaks_applied(int) –Edges where alphabetical tie-break was used.
EdgeProbabilities
dataclass
¶
EdgeProbabilities(
forward: float = 0.0,
backward: float = 0.0,
undirected: float = 0.0,
none: float = 1.0,
)
Probability distribution over edge states between two nodes.
Stores probabilities for each possible edge state. The edge is stored with source node alphabetically before target node (canonical form).
Attributes:
-
forward(float) –P(source -> target) directed edge in stored direction.
-
backward(float) –P(target -> source) directed edge opposite to stored.
-
undirected(float) –P(source -- target) undirected edge.
-
none(float) –P(no edge between source and target).
Raises:
-
ValueError–If probabilities do not sum to 1.0 (within tolerance).
Example
probs = EdgeProbabilities( ... forward=0.6, backward=0.2, undirected=0.1, none=0.1 ... ) probs.p_exist 0.9 probs.p_directed 0.8
Methods:
-
__post_init__–Validate probabilities sum to 1.0.
-
most_likely_state–Return the most likely edge state.
Attributes¶
p_exist
property
¶
Probability that any edge exists between the nodes.
Returns:
-
float–Sum of forward, backward, and undirected probabilities.
p_directed
property
¶
Probability of a directed edge (either direction).
Returns:
-
float–Sum of forward and backward probabilities.
Functions¶
most_likely_state
¶
Return the most likely edge state.
Returns:
-
str–One of "forward", "backward", "undirected", or "none".
PDG
¶
PDG(nodes: List[str], edges: Optional[Dict[Tuple[str, str], EdgeProbabilities]] = None)
Probabilistic Dependency Graph - distribution over SDG structures.
Represents uncertainty over causal graph structure by storing probability distributions for each possible edge between node pairs. Unlike SDG, PDAG, and DAG which represent single deterministic graphs, PDG captures structural uncertainty.
PDG is not a subclass of SDG because it represents a fundamentally different concept: a distribution over graphs rather than a single graph.
Parameters:
-
(nodes¶List[str]) –List of node names in the graph.
-
(edges¶Optional[Dict[Tuple[str, str], EdgeProbabilities]], default:None) –Dictionary mapping (source, target) pairs to EdgeProbabilities. Node pairs should be in canonical order (source < target alphabetically).
Attributes:
-
nodes(List[str]) –Graph nodes in alphabetical order.
-
edges(Dict[Tuple[str, str], EdgeProbabilities]) –Edge probabilities {(source, target): EdgeProbabilities}.
Raises:
-
TypeError–If nodes or edges have invalid types.
-
ValueError–If edge keys are not in canonical order or reference unknown nodes.
Example
from causaliq_core.graph.pdg import PDG, EdgeProbabilities nodes = ["A", "B", "C"] edges = { ... ("A", "B"): EdgeProbabilities(forward=0.8, none=0.2), ... ("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3, ... none=0.1), ... } pdg = PDG(nodes, edges) pdg.get_probabilities("A", "B").forward 0.8
Parameters:
-
(nodes¶List[str]) –List of node names.
-
(edges¶Optional[Dict[Tuple[str, str], EdgeProbabilities]], default:None) –Optional dictionary of edge probabilities. Keys must be tuples (source, target) where source < target alphabetically.
Methods:
-
get_probabilities–Get edge probabilities between two nodes.
-
set_probabilities–Set edge probabilities between two nodes.
-
node_pairs–Iterate over all possible node pairs in canonical order.
-
existing_edges–Iterate over node pairs with non-zero edge probability.
-
__len__–Return number of node pairs with explicit probabilities.
-
__eq__–Check equality with another PDG.
-
__str__–Return human-readable description of the PDG.
-
__repr__–Return detailed representation of the PDG.
-
to_dag_greedy–Extract a DAG from PDG using greedy algorithm.
-
compress–Compress PDG to compact binary representation.
-
decompress–Decompress PDG from compact binary representation.
Functions¶
get_probabilities
¶
get_probabilities(node_a: str, node_b: str) -> EdgeProbabilities
Get edge probabilities between two nodes.
Handles node ordering automatically - returns probabilities with forward/backward relative to alphabetical ordering.
Parameters:
Returns:
-
EdgeProbabilities–EdgeProbabilities for the node pair. If no explicit probabilities
-
EdgeProbabilities–stored, returns EdgeProbabilities(none=1.0).
Raises:
-
ValueError–If either node is not in the graph.
set_probabilities
¶
set_probabilities(node_a: str, node_b: str, probs: EdgeProbabilities) -> None
Set edge probabilities between two nodes.
Handles node ordering automatically.
Parameters:
-
(node_a¶str) –First node name.
-
(node_b¶str) –Second node name.
-
(probs¶EdgeProbabilities) –Edge probabilities to set.
Raises:
-
ValueError–If either node is not in the graph.
-
TypeError–If probs is not EdgeProbabilities.
node_pairs
¶
Iterate over all possible node pairs in canonical order.
Yields:
-
Tuple[str, str]–Tuples (source, target) where source < target alphabetically.
existing_edges
¶
existing_edges() -> Iterator[Tuple[str, str, EdgeProbabilities]]
Iterate over node pairs with non-zero edge probability.
Yields:
-
Tuple[str, str, EdgeProbabilities]–Tuples (source, target, probs) where p_exist > 0.
to_dag_greedy
¶
to_dag_greedy(threshold: float = 0.0) -> GreedyDAGResult
Extract a DAG from PDG using greedy algorithm.
Greedily adds edges in order of probability, choosing directions based on effective probability (splitting undirected probability equally between forward and backward). Edges that would create cycles are skipped.
For ties where effective forward equals effective backward, alphabetical ordering is used (source < target).
Parameters:
-
(threshold¶float, default:0.0) –Minimum p_exist to consider an edge (default 0.0).
Returns:
-
GreedyDAGResult–GreedyDAGResult containing the DAG and extraction statistics.
Example
pdg = PDG(["A", "B", "C"], { ... ("A", "B"): EdgeProbabilities(forward=0.7, none=0.3), ... ("A", "C"): EdgeProbabilities(forward=0.5, none=0.5), ... }) result = pdg.to_dag_greedy() result.edges_included 2
compress
¶
Compress PDG to compact binary representation.
Format: - 2 bytes: number of nodes (uint16, big-endian) - For each node: 2 bytes name length + UTF-8 encoded name - 2 bytes: number of edge pairs with probabilities (uint16) - For each edge pair: - 2 bytes: source node index (uint16) - 2 bytes: target node index (uint16) - 3 bytes: p_forward (4 s.f. mantissa + exponent) - 3 bytes: p_backward (4 s.f. mantissa + exponent) - 3 bytes: p_undirected (4 s.f. mantissa + exponent)
Probabilities are encoded with 4 significant figures using a mantissa (0-9999) and exponent format: value = mantissa × 10^exp. The p_none value is derived as 1.0 - (forward + backward + undirected).
Returns:
-
bytes–Compact binary representation of the PDG.
Raises:
-
ValueError–If graph has more than 65535 nodes or edge pairs.