Graph Module Overview

The causaliq_core.graph module provides graph-related classes and utilities for representing different types of graphs used in causal discovery algorithms, including directed acyclic graphs (DAGs), partially directed acyclic graphs (PDAGs), and summary dependence graphs (SDGs).

Core Components

SDG - Summary Dependence Graph

Base graph class supporting mixed edge types:

  • Directed, undirected, and bidirected edges
  • General graph operations and validation
  • Foundation for specialized graph types

PDAG - Partially Directed Acyclic Graph

Specialized graph for causal discovery:

  • Directed and undirected edges (no bidirected)
  • Represents uncertainty in edge orientation
  • Used in constraint-based causal discovery

DAG - Directed Acyclic Graph

Fully oriented causal structures:

  • Only directed edges
  • Represents definite causal relationships
  • Topological ordering and string representation
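As a rough illustration of what topological ordering means for a DAG, the sketch below uses only the Python standard library (`graphlib`, Python 3.9+). It is not the `DAG` class from this module; the `topological_order` helper and its parent-set input format are assumptions made for the example.

```python
from graphlib import CycleError, TopologicalSorter


def topological_order(parents: dict[str, set[str]]) -> list[str]:
    """Return nodes so that every parent precedes its children.

    `parents` maps each node to the set of nodes with an edge into it
    (hypothetical input format, chosen for this sketch only).
    Raises ValueError if the directed edges contain a cycle.
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError("graph is not a DAG") from err


# Chain A -> B -> C: each node lists its parents.
order = topological_order({"A": set(), "B": {"A"}, "C": {"B"}})
print(order)
```

A graph with a directed cycle has no such ordering, which is why the sketch turns `CycleError` into a failure rather than returning a partial order.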

PDG - Probabilistic Dependency Graph

Probability distributions over graph structures:

  • Stores probabilities for each edge state (forward, backward, undirected, none)
  • Used for graph averaging and uncertainty representation
  • Independent of SDG hierarchy (represents distribution over graphs)
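One way to picture graph averaging is to count how often each edge state occurs for a node pair across several candidate graphs. The sketch below assumes a simplified, hypothetical representation (each graph is a dict mapping a canonical node pair to a state name) and does not use the module's `PDG` class.

```python
from collections import Counter

# The four edge states listed above.
STATES = ("forward", "backward", "undirected", "none")


def average_edge_states(graphs, pair):
    """Estimate state probabilities for one node pair as relative frequencies.

    A pair missing from a graph is counted as "none" (an assumption).
    """
    counts = Counter(graph.get(pair, "none") for graph in graphs)
    return {state: counts[state] / len(graphs) for state in STATES}


# Four hypothetical graphs over nodes A and B.
graphs = [
    {("A", "B"): "forward"},
    {("A", "B"): "forward"},
    {("A", "B"): "undirected"},
    {},  # no edge between A and B
]
probs = average_edge_states(graphs, ("A", "B"))
print(probs)
```

The resulting four values sum to 1.0, which matches the constraint that `EdgeProbabilities` enforces.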

Graph Conversion Functions

Transform between graph representations:

  • dag_to_pdag() - DAG to equivalence class PDAG
  • pdag_to_cpdag() - Complete a PDAG to CPDAG form
  • extend_pdag() - Extend PDAG to consistent DAG
  • is_cpdag() - Check if PDAG is completed
  • dict_to_adjmat() - Convert dictionary to adjacency matrix DataFrame

I/O Functions

Common I/O Functions

Unified interface for reading and writing graphs:

  • read() - Automatically detects format and reads graphs
  • write() - Automatically detects format and writes graphs
  • Supports .csv (Bayesys) and .tetrad (Tetrad) formats
  • Available directly from causaliq_core.graph for convenience
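Format detection by file suffix could look roughly like the sketch below. The `detect_format` helper, the format labels, and the case-insensitive matching are assumptions for illustration; the source only states that `.csv` maps to Bayesys and `.tetrad` to Tetrad.

```python
from pathlib import Path

# Suffix-to-format table taken from the formats listed above.
FORMAT_BY_SUFFIX = {".csv": "bayesys", ".tetrad": "tetrad"}


def detect_format(path: str) -> str:
    """Return a format label for a graph file based on its suffix."""
    suffix = Path(path).suffix.lower()  # case-insensitive match is an assumption
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported graph file suffix: {suffix!r}") from None


print(detect_format("graphs/asia.csv"))
print(detect_format("graphs/asia.tetrad"))
```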

Bayesys Format I/O

CSV-based graph file format:

  • read() - Read graphs from Bayesys CSV files
  • write() - Write graphs to Bayesys CSV format

Tetrad Format I/O

Native Tetrad software format:

  • read() - Read graphs from Tetrad format files
  • write() - Write graphs to Tetrad format
  • Supports both DAGs and PDAGs

GraphML Format I/O

XML-based graph format:

  • read() - Read SDG/PDAG/DAG from GraphML
  • write() - Write graphs to GraphML format
  • read_pdg() / write_pdg() - PDG serialisation
  • Supports file paths and file-like objects

Constants

BAYESYS_VERSIONS

List of supported Bayesys versions for graph comparison semantics.

Value: ['v1.3', 'v1.5+']

Usage:

from causaliq_core.graph import BAYESYS_VERSIONS

# Check version compatibility
version = "v1.3"  # example value
if version in BAYESYS_VERSIONS:
    print(f"Version {version} is supported")

Functions

adjmat(columns)

Create an adjacency matrix with specified entries.

Parameters:

  • columns (dict): Data for matrix specified by column, where each key is a column name and each value is a list of integers representing edge types

Returns:

  • DataFrame: The adjacency matrix with proper indexing

Raises:

  • TypeError: If argument types are incorrect
  • ValueError: If values specified are invalid (wrong lengths or invalid edge codes)

Usage:

from causaliq_core.graph import adjmat, EdgeType

# Create a simple adjacency matrix
columns = {
    'A': [0, 1, 0],  # No edge, directed edge, no edge
    'B': [0, 0, 1],  # No edge, no edge, directed edge  
    'C': [0, 0, 0]   # No edge, no edge, no edge
}
adj_matrix = adjmat(columns)

Classes

EdgeMark

Enumeration of supported 'ends' of an edge in a graph.

Values:

  • NONE = 0: No marking on edge end
  • LINE = 1: Line marking (e.g., for undirected edges)
  • ARROW = 2: Arrow marking (e.g., for directed edges)
  • CIRCLE = 3: Circle marking (e.g., for partial direction)

Usage:

from causaliq_core.graph import EdgeMark

# Check edge marking
edge_end = EdgeMark.ARROW  # example value
if edge_end == EdgeMark.ARROW:
    print("This end is directed")

EdgeType

Enumeration of supported edge types and their symbols, combining start and end markings.

Structure:

Each edge type is defined as a tuple of the form (value, start_mark, end_mark, symbol).

Usage:

from causaliq_core.graph import EdgeType, EdgeMark

# Access edge type components
edge = EdgeType.DIRECTED
print(f"Symbol: {edge.symbol}")
print(f"Start: {edge.start_mark}")
print(f"End: {edge.end_mark}")
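To show how an enum can carry a tuple of attributes per member, here is a self-contained sketch. The classes are stand-ins, not the module's own definitions: the member names other than `DIRECTED`, the numeric codes, the start/end marks, and the symbols are all assumptions, and the first tuple element is exposed as `code` only because `Enum` reserves `value` for the whole tuple.

```python
from enum import Enum


class EdgeMark(Enum):
    """Stand-in for the edge-end markings listed above."""

    NONE = 0
    LINE = 1
    ARROW = 2
    CIRCLE = 3


class EdgeType(Enum):
    """Stand-in enum; every member definition here is hypothetical."""

    # (code, start_mark, end_mark, symbol)
    UNDIRECTED = (1, EdgeMark.LINE, EdgeMark.LINE, "-")
    DIRECTED = (2, EdgeMark.LINE, EdgeMark.ARROW, "->")
    BIDIRECTED = (3, EdgeMark.ARROW, EdgeMark.ARROW, "<->")

    def __init__(self, code, start_mark, end_mark, symbol):
        # Enum unpacks each member's tuple into these named attributes.
        self.code = code
        self.start_mark = start_mark
        self.end_mark = end_mark
        self.symbol = symbol


edge = EdgeType.DIRECTED
print(edge.symbol, edge.start_mark, edge.end_mark)
```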

Reference

Graph-related enums and utilities for CausalIQ Core.

Modules:

  • convert
  • dag
  • enums

    Graph-related enumerations for CausalIQ Core.

  • io

    Graph I/O module for reading and writing various graph file formats.

  • pdag
  • pdg

    Probabilistic Dependency Graph (PDG)

  • sdg

    Simple Dependency Graph (SDG)

Classes:

  • NotDAGError

    Indicate graph is not a DAG when one is expected.

  • EdgeMark

    Supported 'ends' of an edge in a graph.

  • EdgeType

    Supported edge types and their symbols.

  • NotPDAGError

    Indicate graph is not a PDAG when one is expected.

  • PDG

    Probabilistic Dependency Graph - distribution over SDG structures.

  • EdgeProbabilities

    Probability distribution over edge states between two nodes.

Classes

NotDAGError

Indicate graph is not a DAG when one is expected.

EdgeMark

Supported 'ends' of an edge in a graph.

EdgeType

Supported edge types and their symbols.

NotPDAGError

Indicate graph is not a PDAG when one is expected.

PDG

PDG(nodes: List[str], edges: Optional[Dict[Tuple[str, str], EdgeProbabilities]] = None)

Probabilistic Dependency Graph - distribution over SDG structures.

Represents uncertainty over causal graph structure by storing probability distributions for each possible edge between node pairs. Unlike SDG, PDAG, and DAG which represent single deterministic graphs, PDG captures structural uncertainty.

PDG is not a subclass of SDG because it represents a fundamentally different concept: a distribution over graphs rather than a single graph.

Parameters:

  • nodes
    (List[str]) –

    List of node names in the graph.

  • edges
    (Optional[Dict[Tuple[str, str], EdgeProbabilities]], default: None ) –

    Dictionary mapping (source, target) pairs to EdgeProbabilities. Node pairs should be in canonical order (source < target alphabetically).

Attributes:

  • nodes (List[str]) –

    Graph nodes in alphabetical order.

  • edges (Dict[Tuple[str, str], EdgeProbabilities]) –

    Edge probabilities {(source, target): EdgeProbabilities}.

Raises:

  • TypeError

    If nodes or edges have invalid types.

  • ValueError

    If edge keys are not in canonical order or reference unknown nodes.
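The two `ValueError` conditions above can be sketched as a standalone check. The `check_edge_keys` helper is hypothetical, and it treats "alphabetical" as Python's default string comparison, which is an assumption about how the library orders names.

```python
def check_edge_keys(nodes, edges):
    """Raise ValueError for unknown nodes or non-canonical (source, target) keys."""
    known = set(nodes)
    for source, target in edges:
        if source not in known or target not in known:
            raise ValueError(f"unknown node in edge ({source!r}, {target!r})")
        if not source < target:
            raise ValueError(
                f"edge ({source!r}, {target!r}) is not in canonical order"
            )


nodes = ["A", "B", "C"]
check_edge_keys(nodes, {("A", "B"): None, ("A", "C"): None})  # passes silently

for bad_key in [("B", "A"), ("A", "Z")]:
    try:
        check_edge_keys(nodes, {bad_key: None})
    except ValueError as err:
        print(err)
```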

Example

from causaliq_core.graph.pdg import PDG, EdgeProbabilities

nodes = ["A", "B", "C"]
edges = {
    ("A", "B"): EdgeProbabilities(forward=0.8, none=0.2),
    ("A", "C"): EdgeProbabilities(forward=0.6, backward=0.3, none=0.1),
}
pdg = PDG(nodes, edges)
pdg.get_probabilities("A", "B").forward  # 0.8

Parameters:

  • nodes
    (List[str]) –

    List of node names.

  • edges
    (Optional[Dict[Tuple[str, str], EdgeProbabilities]], default: None ) –

    Optional dictionary of edge probabilities. Keys must be tuples (source, target) where source < target alphabetically.

Methods:

  • get_probabilities

    Get edge probabilities between two nodes.

  • set_probabilities

    Set edge probabilities between two nodes.

  • node_pairs

    Iterate over all possible node pairs in canonical order.

  • existing_edges

    Iterate over node pairs with non-zero edge probability.

  • __len__

    Return number of node pairs with explicit probabilities.

  • __eq__

    Check equality with another PDG.

  • __str__

    Return human-readable description of the PDG.

  • __repr__

    Return detailed representation of the PDG.

  • compress

    Compress PDG to compact binary representation.

  • decompress

    Decompress PDG from compact binary representation.

Functions
get_probabilities
get_probabilities(node_a: str, node_b: str) -> EdgeProbabilities

Get edge probabilities between two nodes.

Handles node ordering automatically - returns probabilities with forward/backward relative to alphabetical ordering.

Parameters:

  • node_a (str) –

    First node name.

  • node_b (str) –

    Second node name.

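Returns:

  • EdgeProbabilities

    Edge probabilities for the node pair, with forward/backward relative to alphabetical node order.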

Raises:

  • ValueError

    If either node is not in the graph.
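Order-independent lookup can be illustrated with a plain dictionary keyed by canonical pairs. The helpers below are hypothetical and use dicts in place of `EdgeProbabilities`; what the real method returns for a pair with no stored probabilities is not stated here, so the sketch simply returns `None`.

```python
def canonical_pair(node_a, node_b):
    """Return the two names as (source, target) with source < target."""
    return (node_a, node_b) if node_a < node_b else (node_b, node_a)


def lookup(edges, node_a, node_b):
    """Fetch stored probabilities regardless of argument order."""
    return edges.get(canonical_pair(node_a, node_b))


edges = {("A", "B"): {"forward": 0.8, "none": 0.2}}

# Both calls hit the same entry; "forward" always means A -> B here.
print(lookup(edges, "A", "B"))
print(lookup(edges, "B", "A"))
```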

set_probabilities
set_probabilities(node_a: str, node_b: str, probs: EdgeProbabilities) -> None

Set edge probabilities between two nodes.

Handles node ordering automatically.

Parameters:

  • node_a (str) –

    First node name.

  • node_b (str) –

    Second node name.

  • probs (EdgeProbabilities) –

    Edge probabilities to set.

Raises:

  • ValueError

    If either node is not in the graph.

  • TypeError

    If probs is not EdgeProbabilities.

node_pairs
node_pairs() -> Iterator[Tuple[str, str]]

Iterate over all possible node pairs in canonical order.

Yields:

  • Tuple[str, str]

    Tuples (source, target) where source < target alphabetically.
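A minimal way to generate such pairs is `itertools.combinations` over the sorted node names. This standalone function is a sketch of the idea, not the method's actual implementation.

```python
from itertools import combinations


def node_pairs(nodes):
    """Yield every unordered node pair once, as (source, target) with source < target."""
    return combinations(sorted(nodes), 2)


pairs = list(node_pairs(["C", "A", "B"]))
print(pairs)
```

For n nodes this yields n(n-1)/2 pairs, so three nodes give three pairs.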

existing_edges
existing_edges() -> Iterator[Tuple[str, str, EdgeProbabilities]]

Iterate over node pairs with non-zero edge probability.

Yields:

  • Tuple[str, str, EdgeProbabilities]

    Tuples (source, target, probs) where p_exist > 0.

__len__
__len__() -> int

Return number of node pairs with explicit probabilities.

__eq__
__eq__(other: object) -> bool

Check equality with another PDG.

__str__
__str__() -> str

Return human-readable description of the PDG.

__repr__
__repr__() -> str

Return detailed representation of the PDG.

compress
compress() -> bytes

Compress PDG to compact binary representation.

Format:

  • 2 bytes: number of nodes (uint16, big-endian)
  • For each node: 2 bytes name length + UTF-8 encoded name
  • 2 bytes: number of edge pairs with probabilities (uint16)
  • For each edge pair:
      • 2 bytes: source node index (uint16)
      • 2 bytes: target node index (uint16)
      • 3 bytes: p_forward (4 s.f. mantissa + exponent)
      • 3 bytes: p_backward (4 s.f. mantissa + exponent)
      • 3 bytes: p_undirected (4 s.f. mantissa + exponent)

Probabilities are encoded with 4 significant figures using a mantissa (0-9999) and exponent format: value = mantissa × 10^exp. The p_none value is derived as 1.0 - (forward + backward + undirected).
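The mantissa and exponent scheme can be sketched as follows. The split of the 3 bytes into a 2-byte unsigned mantissa and a 1-byte signed exponent, and the function names, are assumptions; the description above only fixes the total size and the 4 significant figures.

```python
import math
import struct


def encode_probability(p: float) -> bytes:
    """Pack p as a big-endian uint16 mantissa (0-9999) plus an int8 exponent."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if p == 0.0:
        return struct.pack(">Hb", 0, 0)
    exp = math.floor(math.log10(p)) - 3  # leaves four digits in the mantissa
    mantissa = round(p / 10.0 ** exp)
    if mantissa == 10000:  # rounding spilled into a fifth digit
        mantissa, exp = 1000, exp + 1
    # struct raises an error if exp does not fit in a signed byte.
    return struct.pack(">Hb", mantissa, exp)


def decode_probability(data: bytes) -> float:
    """Invert encode_probability: value = mantissa * 10**exp."""
    mantissa, exp = struct.unpack(">Hb", data)
    return mantissa * 10.0 ** exp


packed = encode_probability(0.123456)
print(len(packed), decode_probability(packed))
```

Because only four significant figures survive, a round trip returns 0.123456 as approximately 0.1235; deriving p_none from the other three values avoids storing a fourth field.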

Returns:

  • bytes

    Compact binary representation of the PDG.

Raises:

  • ValueError

    If graph has more than 65535 nodes or edge pairs.

decompress classmethod
decompress(data: bytes) -> PDG

Decompress PDG from compact binary representation.

Parameters:

  • data (bytes) –

    Binary data from PDG.compress().

Returns:

  • PDG

    Reconstructed PDG instance.

Raises:

  • TypeError

    If data is not bytes.

  • ValueError

    If data is invalid or corrupted.

EdgeProbabilities dataclass

EdgeProbabilities(
    forward: float = 0.0,
    backward: float = 0.0,
    undirected: float = 0.0,
    none: float = 1.0,
)

Probability distribution over edge states between two nodes.

Stores probabilities for each possible edge state. The edge is stored with source node alphabetically before target node (canonical form).

Attributes:

  • forward (float) –

    P(source -> target) directed edge in stored direction.

  • backward (float) –

    P(target -> source) directed edge opposite to stored.

  • undirected (float) –

    P(source -- target) undirected edge.

  • none (float) –

    P(no edge between source and target).

Raises:

  • ValueError

    If probabilities do not sum to 1.0 (within tolerance).

Example

probs = EdgeProbabilities(
    forward=0.6, backward=0.2, undirected=0.1, none=0.1
)
probs.p_exist  # 0.9
probs.p_directed  # 0.8

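Methods:

  • __post_init__

    Validate probabilities sum to 1.0.

  • most_likely_state

    Return the most likely edge state.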

Attributes
p_exist property
p_exist: float

Probability that any edge exists between the nodes.

Returns:

  • float

    Sum of forward, backward, and undirected probabilities.

p_directed property
p_directed: float

Probability of a directed edge (either direction).

Returns:

  • float

    Sum of forward and backward probabilities.

Functions
__post_init__
__post_init__() -> None

Validate probabilities sum to 1.0.

most_likely_state
most_likely_state() -> str

Return the most likely edge state.

Returns:

  • str

    One of "forward", "backward", "undirected", or "none".
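The behaviour described for this dataclass can be approximated in a few lines. The class below is a stand-in named `EdgeProbabilitiesSketch`, not the library's code: the tolerance of 1e-9 and the tie-breaking rule in `most_likely_state` (first state in declaration order wins) are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeProbabilitiesSketch:
    """Stand-in for EdgeProbabilities, for illustration only."""

    forward: float = 0.0
    backward: float = 0.0
    undirected: float = 0.0
    none: float = 1.0

    def __post_init__(self) -> None:
        total = self.forward + self.backward + self.undirected + self.none
        if not math.isclose(total, 1.0, abs_tol=1e-9):  # tolerance is assumed
            raise ValueError(f"probabilities sum to {total}, expected 1.0")

    @property
    def p_exist(self) -> float:
        """Probability that any edge exists."""
        return self.forward + self.backward + self.undirected

    @property
    def p_directed(self) -> float:
        """Probability of a directed edge in either direction."""
        return self.forward + self.backward

    def most_likely_state(self) -> str:
        states = {
            "forward": self.forward,
            "backward": self.backward,
            "undirected": self.undirected,
            "none": self.none,
        }
        # max() returns the first maximal key, so ties favour earlier states.
        return max(states, key=states.get)


probs = EdgeProbabilitiesSketch(forward=0.6, backward=0.2, undirected=0.1, none=0.1)
print(probs.most_likely_state())
```

Note that with the default `none=1.0`, setting only `forward=0.8` would fail validation; `none` has to be lowered to 0.2 at the same time.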

Implementation Notes

These classes and functions provide a standardised way to represent and manipulate different graph types commonly used in causal discovery algorithms. The hierarchy (SDG → PDAG → DAG) reflects increasing constraints on edge types and graph structure.