Graph Module Overview¶
The causaliq_core.graph module provides graph-related classes and utilities for representing different types of graphs used in causal discovery algorithms, including directed acyclic graphs (DAGs), partially directed acyclic graphs (PDAGs), and summary dependence graphs (SDGs).
Core Components¶
SDG - Summary Dependence Graph¶
Base graph class supporting mixed edge types:
- Directed, undirected, and bidirected edges
- General graph operations and validation
- Foundation for specialized graph types
PDAG - Partially Directed Acyclic Graph¶
Specialized graph for causal discovery:
- Directed and undirected edges (no bidirected)
- Represents uncertainty in edge orientation
- Used in constraint-based causal discovery
DAG - Directed Acyclic Graph¶
Fully oriented causal structures:
- Only directed edges
- Represents definite causal relationships
- Topological ordering and string representation
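The topological ordering mentioned above can be illustrated with the standard library alone. This is a minimal sketch, not the DAG class's own implementation: the `parents` mapping and the `topological_order` helper are hypothetical names used only for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def topological_order(parents):
    """Return nodes so that every parent precedes its children.

    `parents` maps each node to the set of its parents (a hypothetical
    input format, not the causaliq_core API). Raises ValueError if the
    directed edges contain a cycle, i.e. the graph is not a DAG.
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        raise ValueError("graph contains a directed cycle") from exc


# A -> B, A -> C, B -> C
order = topological_order({"A": set(), "B": {"A"}, "C": {"A", "B"}})
```

A graph admits such an ordering only when it has no directed cycle, which is why the same check can also act as a DAG validity test.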
PDG - Probabilistic Dependency Graph¶
Probability distributions over graph structures:
- Stores probabilities for each edge state (forward, backward, undirected, none)
- Used for graph averaging and uncertainty representation
- Independent of SDG hierarchy (represents distribution over graphs)
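One way to picture graph averaging is to count how often each edge state occurs for a node pair across a collection of graphs. The sketch below assumes a simple stand-in representation (sets of `(x, y, kind)` tuples) and hypothetical helper names; it does not use the PDG class and is not necessarily how the library builds a PDG.

```python
from collections import Counter
from itertools import combinations

STATES = ("forward", "backward", "undirected", "none")


def edge_state(graph, a, b):
    """Classify the pair (a, b), with a < b, in one graph.

    `graph` is a hypothetical stand-in: a set of (x, y, kind) tuples
    where kind is "->" (x to y) or "--" (undirected).
    """
    if (a, b, "->") in graph:
        return "forward"
    if (b, a, "->") in graph:
        return "backward"
    if (a, b, "--") in graph or (b, a, "--") in graph:
        return "undirected"
    return "none"


def average_graphs(nodes, graphs):
    """Return {(a, b): {state: relative frequency}} over all graphs."""
    result = {}
    for a, b in combinations(sorted(nodes), 2):
        counts = Counter(edge_state(g, a, b) for g in graphs)
        result[(a, b)] = {s: counts[s] / len(graphs) for s in STATES}
    return result


graphs = [
    {("A", "B", "->")},
    {("A", "B", "->")},
    {("B", "A", "->")},
    {("A", "B", "--")},
]
averaged = average_graphs(["A", "B"], graphs)
```

Each per-pair result sums to 1.0 by construction, matching the four edge states listed above.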
Graph Conversion Functions¶
Transform between graph representations:
- dag_to_pdag() - DAG to equivalence class PDAG
- pdag_to_cpdag() - Complete a PDAG to CPDAG form
- extend_pdag() - Extend PDAG to consistent DAG
- is_cpdag() - Check if PDAG is completed
- dict_to_adjmat() - Convert dictionary to adjacency matrix DataFrame
I/O Functions¶
Common I/O Functions¶
Unified interface for reading and writing graphs:
- read() - Automatically detects format and reads graphs
- write() - Automatically detects format and writes graphs
- Supports .csv (Bayesys) and .tetrad (Tetrad) formats
- Available directly from causaliq_core.graph for convenience
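Format detection of this kind is commonly driven by the file extension. The following sketch shows that idea under stated assumptions: `detect_format`, the `FORMATS` table, its labels, and the case-insensitive matching are all hypothetical and are not taken from the library.

```python
from pathlib import Path

# Extensions named in the text; the labels are illustrative only.
FORMATS = {".csv": "bayesys", ".tetrad": "tetrad"}


def detect_format(path):
    """Return a format label for a file path based on its suffix."""
    suffix = Path(path).suffix.lower()  # case-insensitivity is an assumption
    if suffix not in FORMATS:
        raise ValueError(f"unsupported graph file extension: {suffix!r}")
    return FORMATS[suffix]
```

A unified read() or write() could then dispatch to the matching format-specific function based on the returned label.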
Bayesys Format I/O¶
CSV-based graph file format:
- read() - Read graphs from Bayesys CSV files
- write() - Write graphs to Bayesys CSV format
Tetrad Format I/O¶
Native Tetrad software format:
- read() - Read graphs from Tetrad format files
- write() - Write graphs to Tetrad format
- Supports both DAGs and PDAGs
GraphML Format I/O¶
XML-based graph format:
- read() - Read SDG/PDAG/DAG from GraphML
- write() - Write graphs to GraphML format
- read_pdg() / write_pdg() - PDG serialisation
- Supports file paths and file-like objects
Constants¶
BAYESYS_VERSIONS¶
List of supported Bayesys versions for graph comparison semantics.
Value: ['v1.3', 'v1.5+']
Usage:
from causaliq_core.graph import BAYESYS_VERSIONS
# Check version compatibility
version = "v1.3"  # example value to test
if version in BAYESYS_VERSIONS:
    print(f"Version {version} is supported")
Functions¶
adjmat(columns)¶
Create an adjacency matrix with specified entries.
Parameters:
- columns (dict): Data for matrix specified by column, where each key is a column name and each value is a list of integers representing edge types
Returns:
- DataFrame: The adjacency matrix with proper indexing
Raises:
- TypeError: If argument types are incorrect
- ValueError: If values specified are invalid (wrong lengths or invalid edge codes)
Usage:
from causaliq_core.graph import adjmat, EdgeType
# Create a simple adjacency matrix
columns = {
    'A': [0, 1, 0],  # No edge, directed edge, no edge
    'B': [0, 0, 1],  # No edge, no edge, directed edge
    'C': [0, 0, 0],  # No edge, no edge, no edge
}
adj_matrix = adjmat(columns)
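The error conditions listed for adjmat() can be pictured as a validation pass over the columns dictionary. This is only a sketch under assumptions: `check_columns` is a hypothetical helper, the default `valid_codes` set is invented for illustration, and the library may validate differently.

```python
def check_columns(columns, valid_codes=frozenset({0, 1})):
    """Validate a {column name: list of int codes} mapping.

    `valid_codes` is an assumption for this sketch; the real set of
    edge codes is defined by the library's EdgeType enumeration.
    """
    if not isinstance(columns, dict):
        raise TypeError("columns must be a dict")
    size = len(columns)  # a square matrix needs one entry per column
    for name, values in columns.items():
        if not isinstance(name, str) or not isinstance(values, list):
            raise TypeError("keys must be str and values must be lists")
        if len(values) != size:
            raise ValueError(f"column {name!r} must have {size} entries")
        if any(not isinstance(v, int) or v not in valid_codes
               for v in values):
            raise ValueError(f"column {name!r} has an invalid edge code")


check_columns({"A": [0, 1, 0], "B": [0, 0, 1], "C": [0, 0, 0]})
```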
Classes¶
EdgeMark¶
Enumeration of supported 'ends' of an edge in a graph.
Values:
- NONE = 0: No marking on edge end
- LINE = 1: Line marking (e.g., for undirected edges)
- ARROW = 2: Arrow marking (e.g., for directed edges)
- CIRCLE = 3: Circle marking (e.g., for partial direction)
Usage:
from causaliq_core.graph import EdgeMark
# Check edge marking
edge_end = EdgeMark.ARROW  # example value to test
if edge_end == EdgeMark.ARROW:
    print("This end is directed")
EdgeType¶
Enumeration of supported edge types and their symbols, combining start and end markings.
Structure:
Each edge type is defined as a tuple containing:
- (value, start_mark, end_mark, symbol)
Usage:
from causaliq_core.graph import EdgeType, EdgeMark
# Access edge type components
edge = EdgeType.DIRECTED
print(f"Symbol: {edge.symbol}")
print(f"Start: {edge.start_mark}")
print(f"End: {edge.end_mark}")
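The tuple structure described above maps naturally onto a Python Enum whose members carry extra attributes. The sketch below is a stand-in, not the library's definition: the EdgeMark values follow this document, but `EdgeTypeSketch`, its member codes, its symbols, and the `code` attribute name are assumptions.

```python
from enum import Enum


class EdgeMark(Enum):
    """Edge-end markings, with the values listed in this document."""
    NONE = 0
    LINE = 1
    ARROW = 2
    CIRCLE = 3


class EdgeTypeSketch(Enum):
    """Hypothetical stand-in for EdgeType; codes and symbols are assumed."""
    DIRECTED = (1, EdgeMark.LINE, EdgeMark.ARROW, "->")
    UNDIRECTED = (2, EdgeMark.LINE, EdgeMark.LINE, "-")

    def __init__(self, code, start_mark, end_mark, symbol):
        # Enum unpacks each member's tuple into these arguments.
        self.code = code
        self.start_mark = start_mark
        self.end_mark = end_mark
        self.symbol = symbol


edge = EdgeTypeSketch.DIRECTED
```

With this pattern, `edge.value` is still the full tuple, while the named attributes give readable access to each component.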
Reference¶
Graph-related enums and utilities for CausalIQ Core.
Modules:
- convert
- dag
- enums – Graph-related enumerations for CausalIQ Core.
- io – Graph I/O module for reading and writing various graph file formats.
- pdag
- pdg – Probabilistic Dependency Graph (PDG)
- sdg – Simple Dependency Graph (SDG)
Classes:
- NotDAGError – Indicate graph is not a DAG when one is expected.
- EdgeMark – Supported 'ends' of an edge in a graph.
- EdgeType – Supported edge types and their symbols.
- NotPDAGError – Indicate graph is not a PDAG when one is expected.
- PDG – Probabilistic Dependency Graph - distribution over SDG structures.
- EdgeProbabilities – Probability distribution over edge states between two nodes.
Classes¶
NotDAGError¶
Indicate graph is not a DAG when one is expected.
EdgeMark¶
Supported 'ends' of an edge in a graph.
EdgeType¶
Supported edge types and their symbols.
NotPDAGError¶
Indicate graph is not a PDAG when one is expected.
PDG¶
PDG(nodes: List[str], edges: Optional[Dict[Tuple[str, str], EdgeProbabilities]] = None)
Probabilistic Dependency Graph - distribution over SDG structures.
Represents uncertainty over causal graph structure by storing probability distributions for each possible edge between node pairs. Unlike SDG, PDAG, and DAG which represent single deterministic graphs, PDG captures structural uncertainty.
PDG is not a subclass of SDG because it represents a fundamentally different concept: a distribution over graphs rather than a single graph.
Parameters:
- nodes (List[str]) – List of node names in the graph.
- edges (Optional[Dict[Tuple[str, str], EdgeProbabilities]], default: None) – Dictionary mapping (source, target) pairs to EdgeProbabilities. Node pairs should be in canonical order (source < target alphabetically).
Attributes:
- nodes (List[str]) – Graph nodes in alphabetical order.
- edges (Dict[Tuple[str, str], EdgeProbabilities]) – Edge probabilities {(source, target): EdgeProbabilities}.
Raises:
- TypeError – If nodes or edges have invalid types.
- ValueError – If edge keys are not in canonical order or reference unknown nodes.
Example
>>> from causaliq_core.graph.pdg import PDG, EdgeProbabilities
>>> nodes = ["A", "B", "C"]
>>> edges = {
...     ("A", "B"): EdgeProbabilities(forward=0.8, none=0.2),
...     ("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3,
...                                   none=0.1),
... }
>>> pdg = PDG(nodes, edges)
>>> pdg.get_probabilities("A", "B").forward
0.8
Methods:
- get_probabilities – Get edge probabilities between two nodes.
- set_probabilities – Set edge probabilities between two nodes.
- node_pairs – Iterate over all possible node pairs in canonical order.
- existing_edges – Iterate over node pairs with non-zero edge probability.
- __len__ – Return number of node pairs with explicit probabilities.
- __eq__ – Check equality with another PDG.
- __str__ – Return human-readable description of the PDG.
- __repr__ – Return detailed representation of the PDG.
- compress – Compress PDG to compact binary representation.
- decompress – Decompress PDG from compact binary representation.
Functions¶
get_probabilities¶
get_probabilities(node_a: str, node_b: str) -> EdgeProbabilities
Get edge probabilities between two nodes.
Handles node ordering automatically: the two nodes may be given in either order, and the returned forward/backward probabilities are always relative to the alphabetical ordering of the pair.
Parameters:
- node_a (str) – First node name.
- node_b (str) – Second node name.
Returns:
- EdgeProbabilities – Probabilities for the node pair. If no explicit probabilities are stored, returns EdgeProbabilities(none=1.0).
Raises:
- ValueError – If either node is not in the graph.
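The ordering behaviour described for get_probabilities can be sketched with a plain dictionary keyed by canonical pairs. The `lookup` helper and the inner dicts are hypothetical stand-ins for the PDG method and for EdgeProbabilities; membership checks on the node list are omitted for brevity.

```python
def lookup(edges, node_a, node_b):
    """Return the stored state probabilities for a pair in either order.

    `edges` is a plain dict keyed by (source, target) with
    source < target; the inner dicts stand in for EdgeProbabilities.
    """
    key = (node_a, node_b) if node_a < node_b else (node_b, node_a)
    # Pairs without an explicit entry default to "no edge".
    return edges.get(key, {"none": 1.0})


edges = {("A", "B"): {"forward": 0.8, "none": 0.2}}
same = lookup(edges, "B", "A") == lookup(edges, "A", "B")
missing = lookup(edges, "A", "C")
```

Because the key is normalised before the lookup, "forward" keeps meaning A -> B regardless of the argument order.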
set_probabilities¶
set_probabilities(node_a: str, node_b: str, probs: EdgeProbabilities) -> None
Set edge probabilities between two nodes.
Handles node ordering automatically.
Parameters:
- node_a (str) – First node name.
- node_b (str) – Second node name.
- probs (EdgeProbabilities) – Edge probabilities to set.
Raises:
- ValueError – If either node is not in the graph.
- TypeError – If probs is not EdgeProbabilities.
node_pairs¶
Iterate over all possible node pairs in canonical order.
Yields:
- Tuple[str, str] – Tuples (source, target) where source < target alphabetically.
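Canonical pair enumeration of this kind can be expressed with itertools. The function below is a standalone sketch that takes the node list as an argument; it is an assumption about one reasonable implementation, not the method's actual code.

```python
from itertools import combinations


def node_pairs(nodes):
    """Yield (source, target) pairs with source < target alphabetically."""
    yield from combinations(sorted(nodes), 2)


pairs = list(node_pairs(["C", "A", "B"]))
```

For n nodes this yields n(n-1)/2 pairs, so three nodes give three pairs.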
existing_edges¶
existing_edges() -> Iterator[Tuple[str, str, EdgeProbabilities]]
Iterate over node pairs with non-zero edge probability.
Yields:
- Tuple[str, str, EdgeProbabilities] – Tuples (source, target, probs) where p_exist > 0.
compress¶
Compress PDG to compact binary representation.
Format:
- 2 bytes: number of nodes (uint16, big-endian)
- For each node: 2 bytes name length + UTF-8 encoded name
- 2 bytes: number of edge pairs with probabilities (uint16)
- For each edge pair:
  - 2 bytes: source node index (uint16)
  - 2 bytes: target node index (uint16)
  - 3 bytes: p_forward (4 s.f. mantissa + exponent)
  - 3 bytes: p_backward (4 s.f. mantissa + exponent)
  - 3 bytes: p_undirected (4 s.f. mantissa + exponent)
Probabilities are encoded with 4 significant figures using a mantissa (0-9999) and exponent format: value = mantissa × 10^exp. The p_none value is not stored; it is derived as 1.0 - (forward + backward + undirected).
Returns:
- bytes – Compact binary representation of the PDG.
Raises:
- ValueError – If graph has more than 65535 nodes or edge pairs.
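The 3-byte probability encoding can be sketched as follows. The split of the three bytes is an assumption (a big-endian uint16 mantissa followed by a signed 8-bit exponent), as are the function names and the underflow handling; the source only states that a 0-9999 mantissa and an exponent are used.

```python
import math
import struct


def encode_probability(p):
    """Pack p into 3 bytes as a 4 s.f. mantissa and a base-10 exponent.

    Assumed layout (not confirmed by the source): uint16 big-endian
    mantissa followed by a signed 8-bit exponent.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p == 0.0:
        return struct.pack(">Hb", 0, 0)
    exp = math.floor(math.log10(p)) - 3   # leaves 4 digits in the mantissa
    if exp < -128:                        # too small for int8: store zero
        return struct.pack(">Hb", 0, 0)
    mantissa = round(p * 10 ** -exp)
    if mantissa >= 10000:                 # rounding spilled into a 5th digit
        mantissa //= 10
        exp += 1
    return struct.pack(">Hb", mantissa, exp)


def decode_probability(data):
    """Invert encode_probability: value = mantissa x 10^exp."""
    mantissa, exp = struct.unpack(">Hb", data)
    return mantissa * 10.0 ** exp


encoded = encode_probability(0.8)
decoded = decode_probability(encoded)
```

Values with more than four significant figures are rounded, so a round trip is exact only up to that precision.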
EdgeProbabilities dataclass¶
EdgeProbabilities(
forward: float = 0.0,
backward: float = 0.0,
undirected: float = 0.0,
none: float = 1.0,
)
Probability distribution over edge states between two nodes.
Stores probabilities for each possible edge state. The edge is stored with source node alphabetically before target node (canonical form).
Attributes:
- forward (float) – P(source -> target) directed edge in stored direction.
- backward (float) – P(target -> source) directed edge opposite to stored.
- undirected (float) – P(source -- target) undirected edge.
- none (float) – P(no edge between source and target).
Raises:
- ValueError – If probabilities do not sum to 1.0 (within tolerance).
Example
>>> probs = EdgeProbabilities(
...     forward=0.6, backward=0.2, undirected=0.1, none=0.1
... )
>>> probs.p_exist
0.9
>>> probs.p_directed
0.8
Methods:
- __post_init__ – Validate probabilities sum to 1.0.
- most_likely_state – Return the most likely edge state.
Attributes¶
p_exist property¶
Probability that any edge exists between the nodes.
Returns:
- float – Sum of forward, backward, and undirected probabilities.
p_directed property¶
Probability of a directed edge (either direction).
Returns:
- float – Sum of forward and backward probabilities.
Functions¶
most_likely_state¶
Return the most likely edge state.
Returns:
- str – One of "forward", "backward", "undirected", or "none".
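The documented behaviour of EdgeProbabilities (sum validation, derived properties, most likely state) can be mirrored in a small stand-in dataclass. `EdgeProbsSketch` is hypothetical: the tolerance value and the tie-breaking order are assumptions, since the source does not specify either.

```python
from dataclasses import dataclass
import math


@dataclass
class EdgeProbsSketch:
    """Hypothetical stand-in mirroring the documented fields."""
    forward: float = 0.0
    backward: float = 0.0
    undirected: float = 0.0
    none: float = 1.0

    def __post_init__(self):
        total = self.forward + self.backward + self.undirected + self.none
        # Tolerance is assumed; the source only says "within tolerance".
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"probabilities sum to {total}, expected 1.0")

    @property
    def p_exist(self):
        return self.forward + self.backward + self.undirected

    @property
    def p_directed(self):
        return self.forward + self.backward

    def most_likely_state(self):
        # Ties resolve to the first state in this order (an assumption).
        states = ("forward", "backward", "undirected", "none")
        return max(states, key=lambda s: getattr(self, s))


probs = EdgeProbsSketch(forward=0.6, backward=0.2, undirected=0.1, none=0.1)
state = probs.most_likely_state()
```

Note that because `none` defaults to 1.0, any non-zero edge probability must be paired with an explicit `none` value for the sum check to pass.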
Implementation Notes¶
These classes and functions provide a standardised way to represent and manipulate different graph types commonly used in causal discovery algorithms. The hierarchy (SDG → PDAG → DAG) reflects increasing constraints on edge types and graph structure.
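The narrowing of permitted edge types along the hierarchy can be sketched as a simple classification. The helper name and the symbols "->", "-" and "<->" are assumptions for illustration, and the acyclicity requirement of DAGs and PDAGs is deliberately not checked here.

```python
ALLOWED = {
    "DAG": {"->"},
    "PDAG": {"->", "-"},
    "SDG": {"->", "-", "<->"},
}


def narrowest_type(edge_symbols):
    """Name the most constrained graph type that allows these symbols.

    Only edge types are compared; cycle checks are omitted.
    """
    symbols = set(edge_symbols)
    for name in ("DAG", "PDAG", "SDG"):  # most to least constrained
        if symbols <= ALLOWED[name]:
            return name
    unknown = sorted(symbols - ALLOWED["SDG"])
    raise ValueError(f"unknown edge symbols: {unknown}")
```

Every set of symbols accepted for a DAG is also accepted for a PDAG and an SDG, which is the sense in which each level adds constraints to the one before it.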