Bayesian Networks Module

The Bayesian Networks module provides comprehensive functionality for working with probabilistic graphical models. It includes classes for representing Bayesian Networks, conditional node distributions, and various I/O formats.

Overview

The causaliq_core.bn module consists of several key components:

  • Core Classes: BN for network representation and the BNFit interface for parameter estimation and data access
  • Distributions: Conditional node distribution implementations
  • I/O Operations: Reading and writing BNs in various formats

Main Classes

BN

The main Bayesian Network class that combines a DAG structure with conditional probability distributions.

BNFit

An interface for Bayesian Network parameter estimation and data access. Classes implementing it supply the data that BN.fit uses to learn parameters.

Distribution Types

CPT (Conditional Probability Table)

Discrete conditional probability distributions for categorical variables.

LinGauss (Linear Gaussian)

Continuous conditional distributions for normally distributed variables.

I/O Formats

DSC Format

Reading and writing Bayesian Networks in DSC format.

XDSL Format

Reading and writing Bayesian Networks in GeNIe XDSL format.

Key Features

  • Probabilistic Inference: Compute marginal and conditional probabilities
  • Parameter Learning: Fit network parameters from data
  • Multiple Formats: Support for DSC and XDSL file formats
  • Flexible Distributions: Both discrete (CPT) and continuous (LinGauss) distributions
  • Graph Integration: Built on the causaliq_core.graph DAG structure

Example Usage

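from causaliq_core.bn import BN, CPT
from causaliq_core.graph import DAG

# Create a simple DAG with a single edge A -> B
dag = DAG(['A', 'B'], [('A', 'B')])

# Define conditional distributions using the CPT(pmfs) form documented
# below: a {value: prob} PMF for the parentless node A, and a list of
# ({parent: value}, {value: prob}) tuples for B.
# Assumption: each spec is a (class, spec) pair, the form CPT.fit returns.
cnd_specs = {
    'A': (CPT, {'T': 0.3, 'F': 0.7}),
    'B': (CPT, [
        ({'A': 'T'}, {'T': 0.9, 'F': 0.1}),
        ({'A': 'F'}, {'T': 0.2, 'F': 0.8}),
    ]),
}

# Create Bayesian Network
bn = BN(dag, cnd_specs)

# Compute the marginal distribution over A and B
marginals = bn.marginals(['A', 'B'])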
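
As a hand check on what a marginal for B should look like in a network like this one, the sum over the parent's values can be written out in plain Python. This is only a sketch and makes no causaliq_core calls; it assumes the example's probabilities mean P(A=T) = 0.3, P(B=T | A=T) = 0.9 and P(B=T | A=F) = 0.2.

```python
# Plain-Python hand check; no causaliq_core calls are made here.
p_a = {'T': 0.3, 'F': 0.7}
# Assumed reading of the example: P(B | A=T) and P(B | A=F)
p_b_given_a = {
    'T': {'T': 0.9, 'F': 0.1},
    'F': {'T': 0.2, 'F': 0.8},
}

# P(B=b) = sum over a of P(A=a) * P(B=b | A=a)
p_b = {
    b: sum(p_a[a] * p_b_given_a[a][b] for a in p_a)
    for b in ('T', 'F')
}
print(p_b)
```

Under those assumptions P(B=T) is 0.3 × 0.9 + 0.7 × 0.2, so any marginal reported for B can be compared against this sum.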

Module Structure

bn

Bayesian Networks module for CausalIQ Core.

This module provides classes and utilities for working with Bayesian Networks, including conditional node distributions and their implementations.

Modules:

  • bn
  • bnfit
  • dist

    Distribution classes for Bayesian Network nodes.

  • io

    I/O module for Bayesian Network file formats.

Classes:

  • BN

    Base class for Bayesian Networks.

  • BNFit

    Interface for Bayesian Network parameter estimation and data access.

  • CPT

    Base class for conditional probability tables.

  • LinGauss

    Conditional Linear Gaussian Distribution.

  • NodeValueCombinations

    Iterable over all combinations of node values

BN

BN(dag: DAG, cnd_specs: Dict[str, Any], estimated_pmfs: Dict[str, Any] = {})

Base class for Bayesian Networks.

Bayesian Networks have a DAG and an associated probability distribution defined by CPTs.

Parameters:

  • dag
    (DAG) –

    DAG for the Bayesian Network.

  • cnd_specs
    (Dict[str, Any]) –

    Specification of each conditional node distribution.

  • estimated_pmfs
    (Dict[str, Any], default: {} ) –

    Number of PMFs that had to be estimated for each node.

Attributes:

  • dag

    BN's DAG.

  • cnds

    Conditional distributions for each node {node: CND}.

  • free_params

    Total number of free parameters in BN.

  • estimated_pmfs

    Number of estimated pmfs for each node.

Raises:

  • TypeError

    If arguments have invalid types.

  • ValueError

    If arguments have invalid values.

Methods:

  • __eq__

    Compare another BN with this one.

  • fit

    Alternative instantiation of BN using data to implicitly define the conditional probability data.

  • generate_cases

    Generate specified number of random data cases for this BN.

  • global_distribution

    Generate the global probability distribution for the BN.

  • lnprob_case

    Return log of probability of set of node values (case) occurring.

  • marginal_distribution

    Generate a marginal probability distribution for a specified node and its parents.

  • marginals

    Return marginal distribution for specified nodes.

  • rename

    Rename nodes in place according to name map.

__eq__
__eq__(other: object) -> bool

Compare another BN with this one.

Parameters:

  • other
    (object) –

    The other BN to compare with this one.

Returns:

  • bool

    True, if other BN is same as this one.

fit classmethod
fit(dag: DAG, data: BNFit) -> BN

Alternative instantiation of BN using data to implicitly define the conditional probability data.

Parameters:

  • dag
    (DAG) –

    DAG for the Bayesian Network.

  • data
    (BNFit) –

    Data to fit CPTs to.

Returns:

  • BN

    A new BN instance fitted to the data.

Raises:

  • TypeError

    If arguments have invalid types.

  • ValueError

    If arguments have invalid values.

generate_cases
generate_cases(
    n: int, outfile: Optional[str] = None, pseudo: bool = True
) -> DataFrame

Generate specified number of random data cases for this BN.

Parameters:

  • n
    (int) –

    Number of cases to generate.

  • outfile
    (Optional[str], default: None ) –

    Name of file to write the generated cases to.

  • pseudo
    (bool, default: True ) –

    Whether to produce pseudo-random (i.e. repeatable) cases; otherwise truly random cases are produced.

Returns:

  • DataFrame

    Random data cases.

Raises:

  • TypeError

    If arguments not of correct type.

  • ValueError

    If invalid number of rows requested.

  • FileNotFoundError

    If outfile in nonexistent folder.
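
One common way to generate cases from a Bayesian Network is forward (ancestral) sampling: visit the nodes in topological order and draw each value from its conditional distribution given the values already drawn for its parents. The sketch below illustrates that idea in plain Python. It is not the library's implementation; the function name `sample_cases`, the seed handling and the dict-based CPT layout are all assumptions made for illustration.

```python
import random

# Hypothetical two-node network A -> B, with CPTs held in plain dicts
ORDER = ['A', 'B']                      # a topological order of the nodes
PARENTS = {'A': [], 'B': ['A']}
CPTS = {
    'A': {(): {'T': 0.3, 'F': 0.7}},
    'B': {('T',): {'T': 0.9, 'F': 0.1}, ('F',): {'T': 0.2, 'F': 0.8}},
}


def sample_cases(n, seed=None):
    """Draw n cases by forward sampling; a fixed seed gives repeatable cases."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        case = {}
        for node in ORDER:
            # Look up the PMF for the parent values drawn so far
            key = tuple(case[p] for p in PARENTS[node])
            pmf = CPTS[node][key]
            values = list(pmf)
            weights = [pmf[v] for v in values]
            case[node] = rng.choices(values, weights=weights)[0]
        cases.append(case)
    return cases


cases = sample_cases(5, seed=42)
```

Calling `sample_cases` twice with the same seed returns the same cases, which is the sense of "repeatable" used for the pseudo parameter above.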

global_distribution
global_distribution() -> DataFrame

Generate the global probability distribution for the BN.

Returns:

  • DataFrame

    Global distribution in descending probability (and then by ascending values).
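
The global (joint) distribution of a discrete network is the product of each node's conditional probability, evaluated for every combination of node values. A minimal plain-Python sketch of that calculation, and of the ordering described above, follows. The two-node network and the function name are illustrative assumptions, and the result is a list of tuples rather than a DataFrame.

```python
from itertools import product

VALUES = {'A': ['F', 'T'], 'B': ['F', 'T']}
P_A = {'T': 0.3, 'F': 0.7}
P_B_GIVEN_A = {'T': {'T': 0.9, 'F': 0.1}, 'F': {'T': 0.2, 'F': 0.8}}


def global_distribution():
    """Return [((a, b), prob), ...] sorted by descending prob, then values."""
    rows = []
    for a, b in product(VALUES['A'], VALUES['B']):
        # Chain rule: P(A=a, B=b) = P(A=a) * P(B=b | A=a)
        rows.append(((a, b), P_A[a] * P_B_GIVEN_A[a][b]))
    # Sort key: negative probability first, then values ascending
    return sorted(rows, key=lambda row: (-row[1], row[0]))


dist = global_distribution()
for values, prob in dist:
    print(values, prob)
```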

lnprob_case
lnprob_case(case_values: Dict[str, Any], base: Union[int, str] = 10) -> Optional[float]

Return log of probability of set of node values (case) occurring.

Parameters:

  • case_values
    (Dict[str, Any]) –

    Value for each node {node: value}.

  • base
    (Union[int, str], default: 10 ) –

    Logarithm base to use - 2, 10 or 'e'.

Returns:

  • Optional[float]

    Log of probability of case occurring, or None if case has zero probability.

Raises:

  • TypeError

    If arguments wrong type.

  • ValueError

    If arguments have invalid values.
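
By the chain rule for Bayesian Networks, the log probability of a complete case is the sum of the logs of each node's conditional probability given its parents' values. The sketch below works that out in plain Python, mirroring the documented base options (2, 10 or 'e') and the None result for zero-probability cases. The network and the function name `log_prob_case` are assumptions for illustration, not the library's code.

```python
import math

P_A = {'T': 0.3, 'F': 0.7}
P_B_GIVEN_A = {'T': {'T': 1.0, 'F': 0.0}, 'F': {'T': 0.2, 'F': 0.8}}


def log_prob_case(case, base=10):
    """Log probability of {'A': a, 'B': b}, or None if the case is impossible."""
    probs = [P_A[case['A']], P_B_GIVEN_A[case['A']][case['B']]]
    if any(p == 0.0 for p in probs):
        return None
    log = {2: math.log2, 10: math.log10, 'e': math.log}[base]
    # Sum of logs equals the log of the product of probabilities
    return sum(log(p) for p in probs)


print(log_prob_case({'A': 'F', 'B': 'T'}))   # log10 of 0.7 * 0.2
print(log_prob_case({'A': 'T', 'B': 'F'}))   # impossible case under this CPT
```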

marginal_distribution
marginal_distribution(node: str, parents: Optional[List[str]] = None) -> DataFrame

Generate a marginal probability distribution for a specified node and its parents, in the same format as that returned by the pandas crosstab function.

Parameters:

  • node
    (str) –

    Node for which distribution required.

  • parents
    (Optional[List[str]], default: None ) –

    Parents of node.

Returns:

  • DataFrame

    Marginal distribution with parental value combos as columns, and node values as rows.

marginals
marginals(nodes: List[str]) -> DataFrame

Return marginal distribution for specified nodes.

Parameters:

  • nodes
    (List[str]) –

    Nodes for which marginal distribution required.

Returns:

  • DataFrame

    Marginal distribution in the same format as that returned by the pandas crosstab function.

Raises:

  • TypeError

    If arguments have bad type.

  • ValueError

    If arguments contain bad values.

rename
rename(name_map: Dict[str, str]) -> None

Rename nodes in place according to name map.

Parameters:

  • name_map
    (Dict[str, str]) –

    Name mapping {name: new name}.

Raises:

  • TypeError

    With bad arg type.

  • ValueError

    With bad arg values e.g. unknown node names.

BNFit

Interface for Bayesian Network parameter estimation and data access.

This interface provides the essential methods required for fitting conditional probability tables (CPT) and linear Gaussian models in Bayesian Networks, as well as data access methods for the BN class.

Implementing classes should provide:

  • A constructor that accepts a df=DataFrame parameter for BN compatibility
  • All abstract methods defined below
  • Properties for data access (.nodes, .sample, .node_types)

Methods:

  • marginals

    Return marginal counts for a node and its parents.

  • values

    Return the (float) values for specified nodes.

  • write

    Write data to file.

Attributes:

  • N (int) –

    Total sample size.

  • node_types (Dict[str, str]) –

    Node type mapping for each variable.

  • node_values (Dict[str, Dict]) –

    Node value counts for categorical variables.

  • nodes (Tuple[str, ...]) –

    Column names in the dataset.

  • sample (Any) –

    Access to underlying data sample.

N abstractmethod property writable
N: int

Total sample size.

Returns:

  • int

    Current sample size being used.

node_types abstractmethod property
node_types: Dict[str, str]

Node type mapping for each variable.

Returns:

  • Dict[str, str]

    Dictionary mapping node names to their types, in the format {node: 'category' | 'continuous'}.

node_values abstractmethod property writable
node_values: Dict[str, Dict]

Node value counts for categorical variables.

Returns:

  • Dict[str, Dict]

    Values and their counts of categorical nodes in sample, in the format {node1: {val1: count1, val2: count2, ...}, ...}.

nodes abstractmethod property
nodes: Tuple[str, ...]

Column names in the dataset.

Returns:

  • Tuple[str, ...]

    Tuple of node names (column names) in the dataset.

sample abstractmethod property
sample: Any

Access to underlying data sample.

Returns:

  • Any

    The underlying DataFrame or data structure for direct access. Used for operations like .unique() on columns.

marginals abstractmethod
marginals(node: str, parents: Dict, values_reqd: bool = False) -> Tuple

Return marginal counts for a node and its parents.

Parameters:

  • node
    (str) –

    Node for which marginals required.

  • parents
    (Dict) –

    Dictionary {node: parents} for non-orphan nodes.

  • values_reqd
    (bool, default: False ) –

    Whether parent and child values required.

Returns:

  • Tuple

    Tuple of counts, and optionally, values:

    • ndarray counts: 2D array, rows=child, cols=parents
    • int maxcol: Maximum number of parental values
    • tuple rowval: Child values for each row
    • tuple colval: Parent combo (dict) for each col

Raises:

  • TypeError

    For bad argument types.
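
To make the shape of the returned counts concrete, here is a plain-Python sketch that tallies a small hypothetical sample into a table with one row per child value and one column per parent combination. It is a simplification rather than an implementation of the interface: it takes the parents as a list instead of the documented {node: parents} dict, returns nested lists instead of an ndarray, and omits maxcol.

```python
from collections import Counter

# Hypothetical sample: each dict is one observed case
rows = [
    {'A': 'T', 'B': 'T'}, {'A': 'T', 'B': 'T'}, {'A': 'T', 'B': 'F'},
    {'A': 'F', 'B': 'F'}, {'A': 'F', 'B': 'F'}, {'A': 'F', 'B': 'T'},
]


def marginal_counts(node, parents, rows):
    """Return (counts, rowval, colval): rows are child values, cols parent combos."""
    tally = Counter(
        (row[node], tuple(row[p] for p in parents)) for row in rows
    )
    rowval = tuple(sorted({child for child, _ in tally}))
    combos = sorted({combo for _, combo in tally})
    colval = tuple(dict(zip(parents, combo)) for combo in combos)
    # A missing (child, combo) pair counts as zero in a Counter
    counts = [[tally[(child, combo)] for combo in combos] for child in rowval]
    return counts, rowval, colval


counts, rowval, colval = marginal_counts('B', ['A'], rows)
print(counts, rowval, colval)
```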

values abstractmethod
values(nodes: Tuple[str, ...]) -> ndarray

Return the (float) values for specified nodes.

Suitable for passing into e.g. linear regression fitting.

Parameters:

  • nodes
    (Tuple[str, ...]) –

    Nodes for which data required.

Returns:

  • ndarray

    Numpy array of values, each column for a node.

Raises:

  • TypeError

    If bad argument type.

  • ValueError

    If bad argument value.

write abstractmethod
write(filename: str) -> None

Write data to file.

Parameters:

  • filename
    (str) –

    Path to output file.

Raises:

  • TypeError

    If filename is not a string.

  • FileNotFoundError

    If output directory doesn't exist.

CPT

CPT(
    pmfs: Union[Dict[str, float], List[Tuple[Dict[str, str], Dict[str, float]]]],
    estimated: int = 0,
)

Base class for conditional probability tables.

Parameters:

  • pmfs
    (Union[Dict[str, float], List[Tuple[Dict[str, str], Dict[str, float]]]]) –

    A pmf of {value: prob} for parentless nodes OR list of tuples ({parent: value}, {value: prob}).

  • estimated
    (int, default: 0 ) –

    How many PMFs were estimated.

Attributes:

  • cpt

    Internal representation of the CPT. {node_values: prob} for parentless node, otherwise {parental_values as frozenset: {node_values: prob}}.

  • estimated

    Number of PMFs that were estimated.

  • values

    Values which node can take.
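
The two shapes described for the cpt attribute can be pictured with ordinary dicts. This sketch assumes the frozenset key holds (parent, value) pairs, which the description above does not spell out; the node names and the `lookup` helper are hypothetical.

```python
# Parentless node: a single PMF {value: prob}
cpt_a = {'T': 0.3, 'F': 0.7}

# Node with parents: one PMF per combination of parental values.
# A frozenset of (parent, value) pairs is hashable and order-independent.
cpt_c = {
    frozenset({('A', 'T'), ('B', 'F')}): {'T': 0.9, 'F': 0.1},
    frozenset({('A', 'F'), ('B', 'F')}): {'T': 0.2, 'F': 0.8},
}


def lookup(cpt, parental_values):
    """Return the PMF for the given {parent: value} dict."""
    return cpt[frozenset(parental_values.items())]


# The order in which the parents are listed does not matter
pmf = lookup(cpt_c, {'B': 'F', 'A': 'T'})
print(pmf)
```

Keying on a frozenset means a lookup succeeds however the caller orders the parents, which a tuple key would not guarantee.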

Raises:

  • TypeError

    If arguments are of wrong type.

  • ValueError

    If arguments have invalid or conflicting values.

Methods:

  • __eq__

    Return whether two CPTs are the same allowing for probability rounding errors.

  • __str__

    Human-friendly description of the contents of the CPT.

  • cdist

    Return conditional probabilities of node values for specified parental values.

  • fit

    Constructs a CPT (Conditional Probability Table) from data.

  • node_values

    Return node values (states) of node CPT relates to.

  • param_ratios

    Returns distribution of parameter ratios across all parental values for each combination of possible node values.

  • parents

    Return parents of node CPT relates to.

  • random_value

    Generate a random value for a node given the value of its parents.

  • to_spec

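    Returns external specification format of CPT, renaming nodes according to a name map.

  • validate_cnds

    Checks that all CNDs in graph are consistent with one another and with graph structure.

  • validate_parents

    Checks every CPT's parents and parental values are consistent with other relevant CPTs and the DAG structure.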

__eq__
__eq__(other: object) -> bool

Return whether two CPTs are the same allowing for probability rounding errors.

Parameters:

  • other
    (CPT) –

    CPT to compare to self.

Returns:

  • bool

    Whether the CPTs are practically the same.

__str__
__str__() -> str

Human-friendly description of the contents of the CPT.

Returns:

  • str

    String representation of the CPT contents.

cdist
cdist(parental_values: Optional[Dict[str, str]] = None) -> Dict[str, float]

Return conditional probabilities of node values for specified parental values.

Parameters:

  • parental_values
    (Optional[Dict[str, str]], default: None ) –

    Parental values for which pmf required

Raises:

  • TypeError

    If args are of wrong type.

  • ValueError

    If args have invalid or conflicting values.

fit classmethod
fit(
    node: str,
    parents: Optional[Tuple[str, ...]],
    data: Union[BNFit, Any],
    autocomplete: bool = True,
) -> Tuple[Tuple[type, Dict[str, Any]], Optional[int]]

Constructs a CPT (Conditional Probability Table) from data.

Parameters:

  • node
    (str) –

    Node that CPT applies to.

  • parents
    (Optional[Tuple[str, ...]]) –

    Parents of node.

  • data
    (Union[BNFit, Any]) –

    Data to fit CPT to.

  • autocomplete
    (bool, default: True ) –

    Whether to ensure CPT data contains entries for

Returns:

  • Tuple[Tuple[type, Dict[str, Any]], Optional[int]]

    Tuple of (cnd_spec, estimated_pmfs) where cnd_spec is (CPT class, cpt_spec for CPT()) and estimated_pmfs is the number of estimated PMFs (int).
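
A simple way to picture fitting a CPT is maximum-likelihood estimation: count how often each node value occurs under each combination of parental values, then normalise each set of counts to sum to one. The sketch below does this in plain Python and emits the list-of-tuples pmfs form that the CPT constructor documents. The function name `fit_pmfs` and the sample are hypothetical, and the library's own handling of unseen combinations (the autocomplete option) is not modelled.

```python
from collections import Counter, defaultdict

# Hypothetical sample: each dict is one observed case
rows = (
    [{'A': 'T', 'B': 'T'}] * 3 + [{'A': 'T', 'B': 'F'}]
    + [{'A': 'F', 'B': 'F'}] * 3 + [{'A': 'F', 'B': 'T'}]
)


def fit_pmfs(node, parents, rows):
    """Return [({parent: value}, {value: prob}), ...] estimated from counts."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[node]] += 1
    pmfs = []
    for combo in sorted(counts):
        total = sum(counts[combo].values())
        # Relative frequency of each node value under this parent combo
        pmf = {v: c / total for v, c in sorted(counts[combo].items())}
        pmfs.append((dict(zip(parents, combo)), pmf))
    return pmfs


pmfs = fit_pmfs('B', ['A'], rows)
print(pmfs)
```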

node_values
node_values() -> List[str]

Return node values (states) of node CPT relates to.

Returns:

  • List[str]

    Node values in alphabetical order.

param_ratios
param_ratios() -> None

Returns distribution of parameter ratios across all parental values for each combination of possible node values.

Returns:

  • dict

    {(node value pair): (param ratios across parents)}.

parents
parents() -> List[str]

Return parents of node CPT relates to.

Returns:

  • List[str]

    Parent node names in alphabetical order.

random_value
random_value(pvs: Optional[Dict[str, str]]) -> str

Generate a random value for a node given the value of its parents.

Parameters:

  • pvs
    (Optional[Dict[str, str]]) –

    Parental values, {parent1: value1, ...}.

Returns:

  • str

    Random value for node.

to_spec
to_spec(name_map: Dict[str, str]) -> Dict[str, Any]

Returns external specification format of CPT, renaming nodes according to a name map.

Parameters:

  • name_map
    (Dict[str, str]) –

    Map of node names {old: new}.

Returns:

  • Dict[str, Any]

    CPT specification with renamed nodes.

Raises:

  • TypeError

    If bad arg type.

  • ValueError

    If bad arg value, e.g. coeff keys not in map.

validate_cnds classmethod
validate_cnds(
    nodes: List[str], cnds: Dict[str, CND], parents: Dict[str, List[str]]
) -> None

Checks that all CNDs in graph are consistent with one another and with graph structure.

Parameters:

  • nodes
    (list) –

    BN nodes.

  • cnds
    (dict) –

    Set of CNDs for the BN, {node: cnd}.

  • parents
    (dict) –

    Parents of non-orphan nodes, {node: parents}.

Raises:

  • TypeError

    If invalid types used in arguments.

  • ValueError

    If any inconsistent values found.

validate_parents
validate_parents(
    node: str, parents: Dict[str, List[str]], node_values: Dict[str, List[str]]
) -> None

Checks every CPT's parents and parental values are consistent with other relevant CPTs and the DAG structure.

Parameters:

  • node
    (str) –

    Name of node.

  • parents
    (Dict[str, List[str]]) –

    Parents of all nodes {node: parents}.

  • node_values
    (Dict[str, List[str]]) –

    Values of each cat. node {node: values}.

Raises:

  • ValueError

    If parent mismatch or missing parental

LinGauss

LinGauss(lg: Dict[str, Any])

Conditional Linear Gaussian Distribution.

Parameters:

  • lg
    (Dict[str, Any]) –

    Specification of Linear Gaussian in following form: {'coeffs': {node: coeff}, 'mean': mean, 'sd': sd}.

Attributes:

  • coeffs

    Linear coefficient of parents {parent: coeff}.

  • mean

    Mean of Gaussian noise (aka intercept, mu).

  • sd

    S.D. of Gaussian noise (aka sigma).

Raises:

  • TypeError

    If called with bad arg types.

  • ValueError

    If called with bad arg values.

Methods:

  • __eq__

    Return whether two CNDs are the same allowing for probability rounding errors.

  • __str__

    Human-friendly formula description of the Linear Gaussian.

  • cdist

    Return conditional distribution for specified parental values.

  • fit

    Fit a Linear Gaussian to data.

  • parents

    Return parents of node CND relates to.

  • random_value

    Generate a random value for a node given the value of its parents.

  • to_spec

    Returns external specification format of LinGauss, renaming nodes according to a name map.

  • validate_parents

    Check LinGauss coeff keys consistent with parents in DAG.

__eq__
__eq__(other: object) -> bool

Return whether two CNDs are the same allowing for probability rounding errors.

Parameters:

  • other
    (CND) –

    CND to compare to self.

Returns:

  • bool

    Whether LinGauss objects are the same up to 10 sf.
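
One way to compare two Linear Gaussian specifications "up to 10 sf" is to round every number to 10 significant figures before testing equality. The sketch below shows that approach on spec dicts of the form {'coeffs': ..., 'mean': ..., 'sd': ...}. Whether the library rounds in exactly this way is an assumption, and the helper names are hypothetical.

```python
def round_sf(x, sf=10):
    """Round x to sf significant figures via scientific notation."""
    return float(f'{x:.{sf - 1}e}')


def same_spec(lg1, lg2, sf=10):
    """Compare two Linear Gaussian spec dicts up to sf significant figures."""
    if set(lg1['coeffs']) != set(lg2['coeffs']):
        return False
    pairs = [(lg1['mean'], lg2['mean']), (lg1['sd'], lg2['sd'])]
    pairs += [(lg1['coeffs'][k], lg2['coeffs'][k]) for k in lg1['coeffs']]
    return all(round_sf(a, sf) == round_sf(b, sf) for a, b in pairs)


lg_a = {'coeffs': {'A': 1.5}, 'mean': 2.0, 'sd': 0.5}
lg_b = {'coeffs': {'A': 1.5}, 'mean': 2.0 + 1e-13, 'sd': 0.5}
lg_c = {'coeffs': {'A': 1.5}, 'mean': 2.001, 'sd': 0.5}
print(same_spec(lg_a, lg_b), same_spec(lg_a, lg_c))
```

Rounding-based comparison can still separate two values that straddle a rounding boundary, so a relative tolerance such as math.isclose is a reasonable alternative.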

__str__
__str__() -> str

Human-friendly formula description of the Linear Gaussian.

Returns:

  • str

    String representation of the Linear Gaussian formula.

cdist
cdist(parental_values: Optional[Dict[str, float]] = None) -> Tuple[float, float]

Return conditional distribution for specified parental values.

Parameters:

  • parental_values
    (Optional[Dict[str, float]], default: None ) –

    Parental values for which dist. required

Returns:

  • Tuple[float, float]

    Tuple of (mean, sd) of child Gaussian distribution.

Raises:

  • TypeError

    If args are of wrong type.

  • ValueError

    If args have invalid or conflicting values.
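
In the standard linear Gaussian model, the child's conditional mean is the intercept plus the sum of each parent's value multiplied by its coefficient, while the standard deviation does not depend on the parents. The sketch below computes that (mean, sd) pair from a spec dict in the documented {'coeffs': ..., 'mean': ..., 'sd': ...} form. The function name is hypothetical, and the code illustrates the model rather than reproducing the library's method.

```python
def conditional_dist(lg, parental_values=None):
    """Return (mean, sd) of the child given {parent: value}."""
    parental_values = parental_values or {}
    # Conditional mean = intercept + sum of coeff * parent value
    mean = lg['mean'] + sum(
        coeff * parental_values[parent]
        for parent, coeff in lg['coeffs'].items()
    )
    return mean, lg['sd']


lg = {'coeffs': {'A': 2.0, 'B': -0.5}, 'mean': 1.0, 'sd': 0.25}
mean, sd = conditional_dist(lg, {'A': 3.0, 'B': 4.0})
print(mean, sd)
```

Here the mean works out as 1.0 + 2.0 × 3.0 − 0.5 × 4.0; for a parentless node the coeffs dict is empty and the mean is just the intercept.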

fit classmethod
fit(
    node: str,
    parents: Optional[Tuple[str, ...]],
    data: Union[Pandas, BNFit],
    autocomplete: bool = True,
) -> Tuple[Tuple[type, Dict[str, Any]], Optional[int]]

Fit a Linear Gaussian to data.

Parameters:

  • node
    (str) –

    Node that Linear Gaussian applies to.

  • parents
    (Optional[Tuple[str, ...]]) –

    Parents of node.

  • data
    (Union[Pandas, BNFit]) –

    Data to fit Linear Gaussian to.

  • autocomplete
    (bool, default: True ) –

    Not used for Linear Gaussian.

Returns:

  • Tuple[Tuple[type, Dict[str, Any]], Optional[int]]

    Tuple of (lg_spec, None) where lg is (LinGauss class, lg_spec).

Raises:

  • TypeError

    With bad arg types.

  • ValueError

    With bad arg values.
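
For a node with a single parent, fitting a Linear Gaussian amounts to ordinary least squares: the coefficient is the slope, the mean is the intercept, and sd is estimated from the residuals. The sketch below uses the closed-form solution in plain Python and returns a spec in the documented {'coeffs': ..., 'mean': ..., 'sd': ...} form. The function name and data are hypothetical, and dividing the residual sum of squares by n (rather than by the degrees of freedom) is an assumption; the library may estimate sd differently.

```python
import math


def fit_single_parent(parent, xs, ys):
    """Least-squares fit of y = mean + coeff * x with Gaussian noise."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    coeff = sxy / sxx
    mean = my - coeff * mx
    residuals = [y - (mean + coeff * x) for x, y in zip(xs, ys)]
    # Assumption: sd is the root mean squared residual (divisor n)
    sd = math.sqrt(sum(r * r for r in residuals) / n)
    return {'coeffs': {parent: coeff}, 'mean': mean, 'sd': sd}


spec = fit_single_parent('A', [0.0, 1.0, 2.0, 3.0], [1.5, 2.5, 5.5, 6.5])
print(spec)
```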

parents
parents() -> List[str]

Return parents of node CND relates to.

Returns:

  • List[str]

    Parent node names in alphabetical order.

random_value
random_value(pvs: Optional[Dict[str, float]]) -> float

Generate a random value for a node given the value of its parents.

Parameters:

  • pvs
    (Optional[Dict[str, float]]) –

    Parental values, {parent1: value1, ...}.

Returns:

  • float

    Random value for node.

to_spec
to_spec(name_map: Dict[str, str]) -> Dict[str, Any]

Returns external specification format of LinGauss, renaming nodes according to a name map.

Parameters:

  • name_map
    (Dict[str, str]) –

    Map of node names {old: new}.

Returns:

  • Dict[str, Any]

    LinGauss specification with renamed nodes.

Raises:

  • TypeError

    If bad arg type.

  • ValueError

    If bad arg value, e.g. coeff keys not in map.

validate_parents
validate_parents(
    node: str, parents: Dict[str, List[str]], node_values: Dict[str, List[str]]
) -> None

Check LinGauss coeff keys consistent with parents in DAG.

Parameters:

  • node
    (str) –

    Name of node.

  • parents
    (dict) –

    Parents of all nodes defined in DAG.

  • node_values
    (dict) –

    Values of each cat. node [UNUSED].

NodeValueCombinations

NodeValueCombinations(node_values: Dict[str, List[str]], sort: bool = True)

Iterable over all combinations of node values.

Parameters:

  • node_values
    (dict) –

    Allowed values for each node {node: [values]}.

  • sort
    (bool, default: True ) –

    Whether to sort node names and values into alphabetic order.

Methods:

  • __iter__

    Returns the initialised iterator

  • __next__

    Generate the next node value combination

__iter__
__iter__() -> NodeValueCombinations

Returns the initialised iterator.

Returns:

  • NodeValueCombinations

    The iterator.

__next__
__next__() -> Dict[str, str]

Generate the next node value combination.

Returns:

  • dict

    Next node value combination {node: value}.

Raises:

  • StopIteration

    When all combinations have been returned.
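
Iterating over every combination of node values is a Cartesian product, which can be sketched with itertools.product. The generator below is a stand-in for illustration, not the class itself. The function name is hypothetical, and the order in which combinations come out (the last node varying fastest) is a property of itertools.product rather than something the documentation above states.

```python
from itertools import product


def node_value_combinations(node_values, sort=True):
    """Yield every {node: value} combination of the allowed node values."""
    nodes = sorted(node_values) if sort else list(node_values)
    value_lists = [
        sorted(node_values[n]) if sort else node_values[n] for n in nodes
    ]
    for combo in product(*value_lists):
        yield dict(zip(nodes, combo))


combos = list(node_value_combinations({'B': ['1', '0'], 'A': ['y', 'n']}))
for combo in combos:
    print(combo)
```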