Graph PDG Module¶

The PDG (Probabilistic Dependency Graph) class represents a probability distribution over edge states between node pairs. Unlike SDG which stores a single deterministic edge type, PDG stores probabilities for each possible edge state.

Overview¶

PDG is designed for:

Graph averaging: Combining multiple structure learning runs
Uncertainty representation: Representing structural uncertainty in causal graphs
LLM fusion: Integrating LLM-generated graphs with statistical methods

The PDG class is independent of the SDG class hierarchy (not a subclass) as it represents a fundamentally different concept: a distribution over graphs rather than a single graph.

Classes¶

`EdgeProbabilities`¶

Probability distribution over edge states between two nodes.

Stores probabilities for each possible edge state:

forward: P(source -> target) directed edge in stored direction
backward: P(target -> source) directed edge opposite to stored
undirected: P(source -- target) undirected edge
none: P(no edge between source and target)

Properties:

p_exist: Probability that any edge exists (sum of forward, backward, undirected)
p_directed: Probability of a directed edge (sum of forward, backward)
most_likely_state(): Returns the most probable edge state

Usage:

from causaliq_core.graph import EdgeProbabilities

# Create edge probability distribution
probs = EdgeProbabilities(
    forward=0.6,
    backward=0.2,
    undirected=0.1,
    none=0.1
)

# Query properties
print(probs.p_exist)      # 0.9 (probability edge exists)
print(probs.p_directed)   # 0.8 (probability edge is directed)
print(probs.most_likely_state())  # "forward"

`PDG`¶

Probabilistic Dependency Graph - distribution over SDG structures.

Features:

Stores probability distributions for each node pair
Greedy DAG extraction with cycle avoidance
Compact binary compression/decompression
GraphML I/O for serialisation

Usage:

from causaliq_core.graph import PDG, EdgeProbabilities

# Create a PDG
nodes = ["A", "B", "C"]
edges = {
    ("A", "B"): EdgeProbabilities(forward=0.8, none=0.2),
    ("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3, none=0.1),
}
pdg = PDG(nodes, edges)

# Query edge probabilities
probs = pdg.get_probabilities("A", "B")
print(probs.forward)  # 0.8

# Extract a DAG greedily (edges ranked by probability)
result = pdg.to_dag_greedy(threshold=0.5)
print(result.dag)             # The extracted DAG
print(result.edges_included)  # Number of edges added

`GreedyDAGResult`¶

NamedTuple returned by PDG.to_dag_greedy().

Fields:

dag: The extracted DAG.
edges_included: Number of edges added to the DAG.
edges_skipped_cycle: Edges skipped to avoid cycles.
edges_skipped_threshold: Edges below the probability threshold.
tie_breaks_applied: Edges where alphabetical tie-breaking was used.

`PDG.to_dag_greedy(threshold=0.0)`¶

Extract a DAG from the PDG using a greedy algorithm.

Algorithm:

For each edge pair, compute effective forward and backward probabilities by splitting the undirected probability equally between directions.
Choose the direction with the higher effective probability.
For ties (effective forward equals effective backward), use alphabetical ordering (source < target, i.e. forward direction).
Sort candidate edges by descending probability.
Greedily add edges, skipping any that would create a cycle (checked via ancestor tracking with propagation to descendants).

Parameters:

threshold (float): Minimum p_exist to consider an edge (default 0.0).

Returns:

GreedyDAGResult: The extracted DAG and extraction statistics.

Usage:

result = pdg.to_dag_greedy(threshold=0.5)
print(result.edges_included)        # 2
print(result.edges_skipped_cycle)    # 0
print(result.tie_breaks_applied)     # 0

`PDG.compress()` / `PDG.decompress(data)`¶

Compact binary serialisation of a PDG.

compress() encodes the PDG to a compact binary format where each probability is stored in 3 bytes (4 significant figures). The p_none value is derived as 1.0 - (forward + backward + undirected). decompress() reconstructs a PDG from the binary representation.

Usage:

blob = pdg.compress()
pdg2 = PDG.decompress(blob)

Reference¶

Probabilistic Dependency Graph (PDG)

This module provides PDG (Probabilistic Dependency Graph) which represents a probability distribution over edge states between node pairs. Unlike SDG which stores a single deterministic edge type, PDG stores probabilities for each possible edge state.

PDG is designed for: - Graph averaging from multiple structure learning runs - Fusing LLM-generated graphs with statistical structure learning - Representing uncertainty in causal graph structure

The PDG class is independent of the SDG class hierarchy (not a subclass) as it represents uncertainty over graphs rather than a single graph.

Classes:

GreedyDAGResult –

Result from PDG.to_dag_greedy().
EdgeProbabilities –

Probability distribution over edge states between two nodes.
PDG –

Probabilistic Dependency Graph - distribution over SDG structures.

Classes¶

GreedyDAGResult ¶

Result from PDG.to_dag_greedy().

Attributes:

dag (DAG) –

The extracted DAG.
edges_included (int) –

Number of edges added to the DAG.
edges_skipped_cycle (int) –

Edges skipped to avoid cycles.
edges_skipped_threshold (int) –

Edges below threshold.
tie_breaks_applied (int) –

Edges where alphabetical tie-break was used.

EdgeProbabilities `dataclass` ¶

EdgeProbabilities(
    forward: float = 0.0,
    backward: float = 0.0,
    undirected: float = 0.0,
    none: float = 1.0,
)

Probability distribution over edge states between two nodes.

Stores probabilities for each possible edge state. The edge is stored with source node alphabetically before target node (canonical form).

Attributes:

forward (float) –

P(source -> target) directed edge in stored direction.
backward (float) –

P(target -> source) directed edge opposite to stored.
undirected (float) –

P(source -- target) undirected edge.
none (float) –

P(no edge between source and target).

Raises:

ValueError –

If probabilities do not sum to 1.0 (within tolerance).

Example

probs = EdgeProbabilities( ... forward=0.6, backward=0.2, undirected=0.1, none=0.1 ... ) probs.p_exist 0.9 probs.p_directed 0.8

Methods:

__post_init__ –

Validate probabilities sum to 1.0.
most_likely_state –

Return the most likely edge state.

Attributes¶

p_exist `property` ¶

p_exist: float

Probability that any edge exists between the nodes.

Returns:

float –

Sum of forward, backward, and undirected probabilities.

p_directed `property` ¶

p_directed: float

Probability of a directed edge (either direction).

Returns:

float –

Sum of forward and backward probabilities.

Functions¶

__post_init__ ¶

__post_init__() -> None

Validate probabilities sum to 1.0.

most_likely_state ¶

most_likely_state() -> str

Return the most likely edge state.

Returns:

str –

One of "forward", "backward", "undirected", or "none".

PDG ¶

PDG(nodes: List[str], edges: Optional[Dict[Tuple[str, str], EdgeProbabilities]] = None)

Probabilistic Dependency Graph - distribution over SDG structures.

Represents uncertainty over causal graph structure by storing probability distributions for each possible edge between node pairs. Unlike SDG, PDAG, and DAG which represent single deterministic graphs, PDG captures structural uncertainty.

PDG is not a subclass of SDG because it represents a fundamentally different concept: a distribution over graphs rather than a single graph.

Parameters:

nodes ¶
(List[str]) –

List of node names in the graph.
edges ¶
(Optional[Dict[Tuple[str, str], EdgeProbabilities]], default: None ) –

Dictionary mapping (source, target) pairs to EdgeProbabilities. Node pairs should be in canonical order (source < target alphabetically).

Attributes:

nodes (List[str]) –

Graph nodes in alphabetical order.
edges (Dict[Tuple[str, str], EdgeProbabilities]) –

Edge probabilities {(source, target): EdgeProbabilities}.

Raises:

TypeError –

If nodes or edges have invalid types.
ValueError –

If edge keys are not in canonical order or reference unknown nodes.

Example

from causaliq_core.graph.pdg import PDG, EdgeProbabilities nodes = ["A", "B", "C"] edges = { ... ("A", "B"): EdgeProbabilities(forward=0.8, none=0.2), ... ("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3, ... none=0.1), ... } pdg = PDG(nodes, edges) pdg.get_probabilities("A", "B").forward 0.8

Parameters:

nodes ¶
(List[str]) –

List of node names.
edges ¶
(Optional[Dict[Tuple[str, str], EdgeProbabilities]], default: None ) –

Optional dictionary of edge probabilities. Keys must be tuples (source, target) where source < target alphabetically.

Methods:

get_probabilities –

Get edge probabilities between two nodes.
set_probabilities –

Set edge probabilities between two nodes.
node_pairs –

Iterate over all possible node pairs in canonical order.
existing_edges –

Iterate over node pairs with non-zero edge probability.
__len__ –

Return number of node pairs with explicit probabilities.
__eq__ –

Check equality with another PDG.
__str__ –

Return human-readable description of the PDG.
__repr__ –

Return detailed representation of the PDG.
to_dag_greedy –

Extract a DAG from PDG using greedy algorithm.
compress –

Compress PDG to compact binary representation.
decompress –

Decompress PDG from compact binary representation.

Functions¶

get_probabilities ¶

get_probabilities(node_a: str, node_b: str) -> EdgeProbabilities

Get edge probabilities between two nodes.

Handles node ordering automatically - returns probabilities with forward/backward relative to alphabetical ordering.

Parameters:

node_a ¶ (str) –

First node name.
node_b ¶ (str) –

Second node name.

Returns:

EdgeProbabilities –

EdgeProbabilities for the node pair. If no explicit probabilities
EdgeProbabilities –

stored, returns EdgeProbabilities(none=1.0).

Raises:

ValueError –

If either node is not in the graph.

set_probabilities ¶

set_probabilities(node_a: str, node_b: str, probs: EdgeProbabilities) -> None

Set edge probabilities between two nodes.

Handles node ordering automatically.

Parameters:

node_a ¶ (str) –

First node name.
node_b ¶ (str) –

Second node name.
probs ¶ (EdgeProbabilities) –

Edge probabilities to set.

Raises:

ValueError –

If either node is not in the graph.
TypeError –

If probs is not EdgeProbabilities.

node_pairs ¶

node_pairs() -> Iterator[Tuple[str, str]]

Iterate over all possible node pairs in canonical order.

Yields:

Tuple[str, str] –

Tuples (source, target) where source < target alphabetically.

existing_edges ¶

existing_edges() -> Iterator[Tuple[str, str, EdgeProbabilities]]

Iterate over node pairs with non-zero edge probability.

Yields:

Tuple[str, str, EdgeProbabilities] –

Tuples (source, target, probs) where p_exist > 0.

len ¶

__len__() -> int

Return number of node pairs with explicit probabilities.

eq ¶

__eq__(other: object) -> bool

Check equality with another PDG.

str ¶

__str__() -> str

Return human-readable description of the PDG.

repr ¶

__repr__() -> str

Return detailed representation of the PDG.

to_dag_greedy ¶

to_dag_greedy(threshold: float = 0.0) -> GreedyDAGResult

Extract a DAG from PDG using greedy algorithm.

Greedily adds edges in order of probability, choosing directions based on effective probability (splitting undirected probability equally between forward and backward). Edges that would create cycles are skipped.

For ties where effective forward equals effective backward, alphabetical ordering is used (source < target).

Parameters:

threshold ¶ (float, default: 0.0 ) –

Minimum p_exist to consider an edge (default 0.0).

Returns:

GreedyDAGResult –

GreedyDAGResult containing the DAG and extraction statistics.

Example

pdg = PDG(["A", "B", "C"], { ... ("A", "B"): EdgeProbabilities(forward=0.7, none=0.3), ... ("A", "C"): EdgeProbabilities(forward=0.5, none=0.5), ... }) result = pdg.to_dag_greedy() result.edges_included 2

compress ¶

compress() -> bytes

Compress PDG to compact binary representation.

Format: - 2 bytes: number of nodes (uint16, big-endian) - For each node: 2 bytes name length + UTF-8 encoded name - 2 bytes: number of edge pairs with probabilities (uint16) - For each edge pair: - 2 bytes: source node index (uint16) - 2 bytes: target node index (uint16) - 3 bytes: p_forward (4 s.f. mantissa + exponent) - 3 bytes: p_backward (4 s.f. mantissa + exponent) - 3 bytes: p_undirected (4 s.f. mantissa + exponent)

Probabilities are encoded with 4 significant figures using a mantissa (0-9999) and exponent format: value = mantissa × 10^exp. The p_none value is derived as 1.0 - (forward + backward + undirected).

Returns:

bytes –

Compact binary representation of the PDG.

Raises:

ValueError –

If graph has more than 65535 nodes or edge pairs.

decompress `classmethod` ¶

decompress(data: bytes) -> PDG

Decompress PDG from compact binary representation.

Parameters:

data ¶ (bytes) –

Binary data from PDG.compress().

Returns:

PDG –

Reconstructed PDG instance.

Raises:

TypeError –

If data is not bytes.
ValueError –

If data is invalid or corrupted.

Graph PDG Module¶

Overview¶

Classes¶

`EdgeProbabilities`¶

`PDG`¶

`GreedyDAGResult`¶

`PDG.to_dag_greedy(threshold=0.0)`¶

`PDG.compress()` / `PDG.decompress(data)`¶

Reference¶

Classes¶

GreedyDAGResult ¶

EdgeProbabilities `dataclass` ¶

Attributes¶

p_exist `property` ¶

p_directed `property` ¶

Functions¶

__post_init__ ¶

most_likely_state ¶

PDG ¶

`nodes` ¶

`edges` ¶

`nodes` ¶

`edges` ¶

Functions¶

get_probabilities ¶

set_probabilities ¶

node_pairs ¶

existing_edges ¶

len ¶

eq ¶

str ¶

repr ¶

to_dag_greedy ¶

compress ¶

decompress `classmethod` ¶

Graph PDG Module¶

Overview¶

Classes¶

EdgeProbabilities¶

PDG¶

GreedyDAGResult¶

PDG.to_dag_greedy(threshold=0.0)¶

PDG.compress() / PDG.decompress(data)¶

Reference¶

Classes¶

GreedyDAGResult ¶

EdgeProbabilities dataclass ¶

Attributes¶

p_exist property ¶

p_directed property ¶

Functions¶

__post_init__ ¶

most_likely_state ¶

PDG ¶

nodes ¶

edges ¶

nodes ¶

edges ¶

Functions¶

get_probabilities ¶

set_probabilities ¶

node_pairs ¶

existing_edges ¶

__len__ ¶

__eq__ ¶

__str__ ¶

__repr__ ¶

to_dag_greedy ¶

compress ¶

decompress classmethod ¶

`EdgeProbabilities`¶

`PDG`¶

`GreedyDAGResult`¶

`PDG.to_dag_greedy(threshold=0.0)`¶

`PDG.compress()` / `PDG.decompress(data)`¶

EdgeProbabilities `dataclass` ¶

p_exist `property` ¶

p_directed `property` ¶

`nodes` ¶

`edges` ¶

`nodes` ¶

`edges` ¶

len ¶

eq ¶

str ¶

repr ¶

decompress `classmethod` ¶