Bayesian Networks Module¶

The Bayesian Networks module provides comprehensive functionality for working with probabilistic graphical models. It includes classes for representing Bayesian Networks, conditional node distributions, and various I/O formats.

Overview¶

The causaliq_core.bn module consists of several key components:

Core Classes: Main BN and BNFit classes for network representation
Distributions: Conditional node distribution implementations
I/O Operations: Reading and writing BNs in various formats

Main Classes¶

BN ¶

The main Bayesian Network class that combines a DAG structure with conditional probability distributions.

BNFit ¶

A fitted Bayesian Network with learned parameters from data.

Distribution Types¶

CPT (Conditional Probability Table)¶

Discrete conditional probability distributions for categorical variables.

LinGauss (Linear Gaussian)¶

Continuous conditional distributions for normally distributed variables.

I/O Formats¶

DSC Format ¶

Reading and writing Bayesian Networks in DSC format.

XDSL Format ¶

Reading and writing Bayesian Networks in GeNIe XDSL format.

Key Features¶

Probabilistic Inference: Compute marginal and conditional probabilities
Parameter Learning: Fit network parameters from data
Multiple Formats: Support for DSC and XDSL file formats
Flexible Distributions: Both discrete (CPT) and continuous (LinGauss) distributions
Graph Integration: Built on the causaliq_core.graph DAG structure

Example Usage¶

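from causaliq_core.bn import BN, CPT
from causaliq_core.graph import DAG

# Create a simple DAG with a single edge A -> B
dag = DAG(['A', 'B'], [('A', 'B')])

# Define conditional distributions. CPT takes a single pmfs argument, so each
# spec below is a (class, pmfs) pair, the cnd_spec shape described under
# CPT.fit; treat the exact cnd_specs layout as an assumption.
cnd_specs = {
    'A': (CPT, {'T': 0.3, 'F': 0.7}),
    'B': (CPT, [
        ({'A': 'T'}, {'T': 0.9, 'F': 0.1}),
        ({'A': 'F'}, {'T': 0.2, 'F': 0.8}),
    ]),
}

# Create Bayesian Network
bn = BN(dag, cnd_specs)

# Compute the joint marginal distribution over A and B
marginals = bn.marginals(['A', 'B'])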
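
To make the marginals concrete, the following library-independent sketch computes P(B) by hand for a two-node network A -> B. It assumes P(A=T) = 0.3, P(B=T | A=T) = 0.9 and P(B=T | A=F) = 0.2; the variable names are illustrative and nothing here calls causaliq_core.

```python
# Hand-computed marginal for A -> B (plain Python, no causaliq_core calls).
p_a = {"T": 0.3, "F": 0.7}
p_b_given_a = {
    "T": {"T": 0.9, "F": 0.1},  # P(B | A=T)
    "F": {"T": 0.2, "F": 0.8},  # P(B | A=F)
}

# P(B=b) = sum over a of P(A=a) * P(B=b | A=a)
p_b = {
    b: sum(p_a[a] * p_b_given_a[a][b] for a in p_a)
    for b in ("T", "F")
}
print(p_b)
```

The two entries of p_b sum to one, which is a quick sanity check on any hand-built table.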

Module Structure¶

bn ¶

Bayesian Networks module for CausalIQ Core.

This module provides classes and utilities for working with Bayesian Networks, including conditional node distributions and their implementations.

Modules:

bn –
bnfit –
dist –

Distribution classes for Bayesian Network nodes.
io –

I/O module for Bayesian Network file formats.

Classes:

BN –

Base class for Bayesian Networks.
BNFit –

Interface for Bayesian Network parameter estimation and data access.
CPT –

Base class for conditional probability tables.
LinGauss –

Conditional Linear Gaussian Distribution.
NodeValueCombinations –

Iterable over all combinations of node values

BN ¶

BN(dag: DAG, cnd_specs: Dict[str, Any], estimated_pmfs: Dict[str, Any] = {})

Base class for Bayesian Networks.

Bayesian Networks have a DAG and an associated probability distribution defined by CPTs.

Parameters:

dag ¶
(DAG) –

DAG for the Bayesian Network.
cnd_specs ¶
(Dict[str, Any]) –

Specification of each conditional node distribution.
estimated_pmfs ¶
(Dict[str, Any], default: {} ) –

Number of PMFs that had to be estimated for each node.

Attributes:

dag –

BN's DAG.
cnds –

Conditional distributions for each node {node: CND}.
free_params –

Total number of free parameters in BN.
estimated_pmfs –

Number of estimated pmfs for each node.

Raises:

TypeError –

If arguments have invalid types.
ValueError –

If arguments have invalid values.

Methods:

__eq__ –

Compare another BN with this one.
fit –

Alternative instantiation of BN using data to implicitly define the conditional probability data.
generate_cases –

Generate specified number of random data cases for this BN.
global_distribution –

Generate the global probability distribution for the BN.
lnprob_case –

Return log of probability of set of node values (case) occurring.
marginal_distribution –

Generate a marginal probability distribution for a specified node and its parents.
marginals –

Return marginal distribution for specified nodes.
rename –

Rename nodes in place according to name map.

__eq__ ¶

__eq__(other: object) -> bool

Compare another BN with this one.

Parameters:

other ¶
(object) –

The other BN to compare with this one.

Returns:

bool –

True, if other BN is same as this one.

fit `classmethod` ¶

fit(dag: DAG, data: BNFit) -> BN

Alternative instantiation of BN using data to implicitly define the conditional probability data.

Parameters:

dag ¶
(DAG) –

DAG for the Bayesian Network.
data ¶
(BNFit) –

Data to fit CPTs to.

Returns:

BN –

A new BN instance fitted to the data.

Raises:

TypeError –

If arguments have invalid types.
ValueError –

If arguments have invalid values.

generate_cases ¶

generate_cases(
    n: int, outfile: Optional[str] = None, pseudo: bool = True
) -> DataFrame

Generate specified number of random data cases for this BN.

Parameters:

n ¶
(int) –

Number of cases to generate.
outfile ¶
(Optional[str], default: None ) –

Name of file to write instance to.
pseudo ¶
(bool, default: True ) –

Whether to produce pseudo-random (i.e. repeatable) cases; otherwise truly random cases are produced.

Returns:

DataFrame –

Random data cases.

Raises:

TypeError –

If arguments not of correct type.
ValueError –

If invalid number of rows requested.
FileNotFoundError –

If outfile in nonexistent folder.
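
As an illustration of how random cases can be drawn from a network, here is a minimal forward-sampling sketch for a two-node network A -> B. It is not the library's implementation: the probabilities, the function names and the seed parameter are all hypothetical, and a fixed seed merely stands in for the repeatable behaviour that pseudo=True describes.

```python
import random


def sample_case(rng):
    """Draw one case by sampling A first, then B given A (parents first)."""
    a = "T" if rng.random() < 0.3 else "F"
    p_b_true = 0.9 if a == "T" else 0.2
    b = "T" if rng.random() < p_b_true else "F"
    return {"A": a, "B": b}


def generate_cases_sketch(n, seed=None):
    """Return n sampled cases; a fixed seed makes the run repeatable."""
    if not isinstance(n, int) or isinstance(n, bool):
        raise TypeError("n must be an int")
    if n < 1:
        raise ValueError("n must be positive")
    rng = random.Random(seed)
    return [sample_case(rng) for _ in range(n)]


first = generate_cases_sketch(5, seed=42)
second = generate_cases_sketch(5, seed=42)
print(first == second)  # same seed, same cases
```

Sampling each node only after its parents is what lets every draw use the correct conditional distribution.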

global_distribution ¶

global_distribution() -> DataFrame

Generate the global probability distribution for the BN.

Returns:

DataFrame –

Global distribution in descending probability (and then by ascending values).
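
The ordering described above can be sketched in plain Python for a two-node network A -> B. The probabilities and the list-of-dicts layout are assumptions made for illustration; the real method returns a DataFrame.

```python
# Enumerate the joint distribution of A -> B and order it as described:
# descending probability first, then ascending node values.
p_a = {"T": 0.3, "F": 0.7}
p_b_given_a = {
    "T": {"T": 0.9, "F": 0.1},
    "F": {"T": 0.2, "F": 0.8},
}

rows = []
for a in sorted(p_a):
    for b in sorted(p_b_given_a[a]):
        # Chain rule: P(A=a, B=b) = P(A=a) * P(B=b | A=a)
        rows.append({"A": a, "B": b, "prob": p_a[a] * p_b_given_a[a][b]})

rows.sort(key=lambda r: (-r["prob"], r["A"], r["B"]))
for row in rows:
    print(row)
```

The number of rows grows exponentially with the number of nodes, so full enumeration is only practical for small networks.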

lnprob_case ¶

lnprob_case(case_values: Dict[str, Any], base: Union[int, str] = 10) -> Optional[float]

Return log of probability of set of node values (case) occurring.

Parameters:

case_values ¶
(Dict[str, Any]) –

Value for each node {node: value}.
base ¶
(Union[int, str], default: 10 ) –

Logarithm base to use - 2, 10 or 'e'.

Returns:

Optional[float] –

Log of probability of case occurring, or None if case has zero probability.

Raises:

TypeError –

If arguments wrong type.
ValueError –

If arguments have invalid values.
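
A minimal sketch of the calculation, assuming a two-node network A -> B with made-up probabilities: the case probability is the product of each node's conditional probability, and its logarithm is taken in the requested base. The function name and tables are hypothetical and do not come from causaliq_core.

```python
import math

p_a = {"T": 0.3, "F": 0.7}
p_b_given_a = {
    "T": {"T": 1.0, "F": 0.0},  # B=F is impossible when A=T
    "F": {"T": 0.2, "F": 0.8},
}


def lnprob_case_sketch(case, base=10):
    """Log probability of a full case, or None if the case is impossible."""
    if base not in (2, 10, "e"):
        raise ValueError("base must be 2, 10 or 'e'")
    prob = p_a[case["A"]] * p_b_given_a[case["A"]][case["B"]]
    if prob == 0.0:
        return None  # log(0) is undefined
    if base == "e":
        return math.log(prob)
    return math.log(prob, base)


possible = lnprob_case_sketch({"A": "T", "B": "T"})
impossible = lnprob_case_sketch({"A": "T", "B": "F"})
print(possible, impossible)
```

Returning None for a zero-probability case avoids a math domain error and lets callers tell an impossible case from a merely unlikely one.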

marginal_distribution ¶

marginal_distribution(node: str, parents: Optional[List[str]] = None) -> DataFrame

Generate a marginal probability distribution for a specified node and its parents, in the same format as that returned by the pandas crosstab function.

Parameters:

node ¶
(str) –

Node for which distribution required.
parents ¶
(Optional[List[str]], default: None ) –

Parents of node.

Returns:

DataFrame –

Marginal distribution with parental value combos as columns, and node values as rows.

marginals ¶

marginals(nodes: List[str]) -> DataFrame

Return marginal distribution for specified nodes.

Parameters:

nodes ¶
(List[str]) –

Nodes for which marginal distribution required.

Returns:

DataFrame –

Marginal distribution in the same format as that returned by the pandas crosstab function.

Raises:

TypeError –

If arguments have bad type.
ValueError –

If arguments contain bad values.

rename ¶

rename(name_map: Dict[str, str]) -> None

Rename nodes in place according to name map.

Parameters:

name_map ¶
(Dict[str, str]) –

Name mapping {name: new name}.

Raises:

TypeError –

With bad arg type.
ValueError –

With bad arg values e.g. unknown node names.

BNFit ¶

Interface for Bayesian Network parameter estimation and data access.

This interface provides the essential methods required for fitting conditional probability tables (CPT) and linear Gaussian models in Bayesian Networks, as well as data access methods for the BN class.

Implementing classes should provide:

- A constructor that accepts a df=DataFrame parameter for BN compatibility
- All abstract methods defined below
- Properties for data access (.nodes, .sample, .node_types)

Methods:

marginals –

Return marginal counts for a node and its parents.
values –

Return the (float) values for specified nodes.
write –

Write data to file.

Attributes:

N (int) –

Total sample size.
node_types (Dict[str, str]) –

Node type mapping for each variable.
node_values (Dict[str, Dict]) –

Node value counts for categorical variables.
nodes (Tuple[str, ...]) –

Column names in the dataset.
sample (Any) –

Access to underlying data sample.

N `abstractmethod` `property` `writable` ¶

N: int

Total sample size.

Returns:

int –

Current sample size being used.

node_types `abstractmethod` `property` ¶

node_types: Dict[str, str]

Node type mapping for each variable.

Returns:

Dict[str, str] –

Dictionary mapping node names to their types, in the format {node: 'category' | 'continuous'}.

node_values `abstractmethod` `property` `writable` ¶

node_values: Dict[str, Dict]

Node value counts for categorical variables.

Returns:

Dict[str, Dict] –

Values and their counts of categorical nodes in sample, in the format {node1: {val1: count1, val2: count2, ...}, ...}.

nodes `abstractmethod` `property` ¶

nodes: Tuple[str, ...]

Column names in the dataset.

Returns:

Tuple[str, ...] –

Tuple of node names (column names) in the dataset.

sample `abstractmethod` `property` ¶

sample: Any

Access to underlying data sample.

Returns:

Any –

The underlying DataFrame or data structure for direct access. Used for operations like .unique() on columns.

marginals `abstractmethod` ¶

marginals(node: str, parents: Dict, values_reqd: bool = False) -> Tuple

Return marginal counts for a node and its parents.

Parameters:

node ¶
(str) –

Node for which marginals required.
parents ¶
(Dict) –

Dictionary {node: parents} for non-orphan nodes.
values_reqd ¶
(bool, default: False ) –

Whether parent and child values required.

Returns:

Tuple –

Tuple of counts and, optionally, values:
- ndarray counts: 2D array, rows=child, cols=parents
- int maxcol: Maximum number of parental values
- tuple rowval: Child values for each row
- tuple colval: Parent combo (dict) for each col

Raises:

TypeError –

For bad argument types.
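
The shape of the returned counts can be illustrated with a plain-Python sketch. It is deliberately simplified: it takes a list of row dicts and a list of parent names rather than the documented {node: parents} dictionary, uses nested lists instead of an ndarray, and omits maxcol. All names are hypothetical.

```python
from collections import Counter


def marginal_counts_sketch(rows, node, parents):
    """Count child values (table rows) against parent combos (table columns)."""
    counts = Counter(
        (r[node], tuple(r[p] for p in parents)) for r in rows
    )
    rowval = tuple(sorted({child for child, _ in counts}))
    combos = sorted({combo for _, combo in counts})
    colval = tuple(dict(zip(parents, combo)) for combo in combos)
    # Counter returns 0 for child/parent pairings never seen in the data.
    table = [[counts[(child, combo)] for combo in combos] for child in rowval]
    return table, rowval, colval


data = [
    {"A": "T", "B": "T"},
    {"A": "T", "B": "T"},
    {"A": "T", "B": "F"},
    {"A": "F", "B": "F"},
]
table, rowval, colval = marginal_counts_sketch(data, "B", ["A"])
print(table, rowval, colval)
```

Each column of the table holds the counts from which one conditional pmf of the child can be estimated.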

values `abstractmethod` ¶

values(nodes: Tuple[str, ...]) -> ndarray

Return the (float) values for specified nodes.

Suitable for passing into e.g. linear regression fitting.

Parameters:

nodes ¶
(Tuple[str, ...]) –

Nodes for which data required.

Returns:

ndarray –

Numpy array of values, each column for a node.

Raises:

TypeError –

If bad argument type.
ValueError –

If bad argument value.

write `abstractmethod` ¶

write(filename: str) -> None

Write data to file.

Parameters:

filename ¶
(str) –

Path to output file.

Raises:

TypeError –

If filename is not a string.
FileNotFoundError –

If output directory doesn't exist.

CPT ¶

CPT(
    pmfs: Union[Dict[str, float], List[Tuple[Dict[str, str], Dict[str, float]]]],
    estimated: int = 0,
)

Base class for conditional probability tables.

Parameters:

pmfs ¶
(Union[Dict[str, float], List[Tuple[Dict[str, str], Dict[str, float]]]]) –

A pmf of {value: prob} for parentless nodes OR list of tuples ({parent: value}, {value: prob}).
estimated ¶
(int, default: 0 ) –

How many PMFs were estimated.

Attributes:

cpt –

Internal representation of the CPT. {node_values: prob} for parentless node, otherwise {parental_values as frozenset: {node_values: prob}}.
estimated –

Number of PMFs that were estimated.
values –

Values which node can take.

Raises:

TypeError –

If arguments are of wrong type.
ValueError –

If arguments have invalid or conflicting values.
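
The two accepted pmfs forms, and the frozenset-keyed internal layout described for the cpt attribute, can be sketched as follows. This is an illustration under assumptions rather than the library's code: the function name is hypothetical, and the sum-to-one check is only a guess at one kind of invalid value that might be rejected.

```python
def normalise_pmfs_sketch(pmfs):
    """Map either documented pmfs form onto a dict-based table."""
    if isinstance(pmfs, dict):
        # Parentless node: a single {value: prob} pmf.
        table = dict(pmfs)
        to_check = [table]
    elif isinstance(pmfs, list):
        # Node with parents: key each pmf by its frozen parental values.
        table = {frozenset(pv.items()): dict(pmf) for pv, pmf in pmfs}
        to_check = list(table.values())
    else:
        raise TypeError("pmfs must be a dict or a list of tuples")
    for pmf in to_check:
        if abs(sum(pmf.values()) - 1.0) > 1e-9:
            raise ValueError("each pmf must sum to 1")
    return table


orphan = normalise_pmfs_sketch({"T": 0.3, "F": 0.7})
child = normalise_pmfs_sketch([
    ({"A": "T"}, {"T": 0.9, "F": 0.1}),
    ({"A": "F"}, {"T": 0.2, "F": 0.8}),
])
print(child[frozenset({"A": "T"}.items())]["T"])
```

Freezing the parental values makes them hashable, so a lookup does not depend on the order in which parents are listed.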

Methods:

__eq__ –

Return whether two CPTs are the same, allowing for probability rounding errors.
__str__ –

Human-friendly description of the contents of the CPT.
cdist –

Return conditional probabilities of node values for specified parental values.
fit –

Constructs a CPT (Conditional Probability Table) from data.
node_values –

Return node values (states) of node CPT relates to.
param_ratios –

Returns distribution of parameter ratios across all parental values for each combination of possible node values.
parents –

Return parents of node CPT relates to.
random_value –

Generate a random value for a node given the value of its parents.
to_spec –

Returns external specification format of CPT, renaming nodes according to a name map.
validate_cnds –

Checks that all CNDs in graph are consistent with one another and with graph structure.
validate_parents –

Checks every CPT's parents and parental values are consistent with other relevant CPTs and the DAG structure.

__eq__ ¶

__eq__(other: object) -> bool

Return whether two CPTs are the same, allowing for probability rounding errors.

Parameters:

other ¶
(CPT) –

CPT to compare to self.

Returns:

bool –

Whether the CPTs are practically the same.
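
A tolerance-based comparison of this kind might look like the sketch below. The function name and the tolerance values are assumptions chosen for illustration; the tolerance the library actually applies to CPTs is not stated here.

```python
import math


def pmfs_close(pmf1, pmf2, rel_tol=1e-10):
    """True when two {value: prob} dicts match up to rounding error."""
    return pmf1.keys() == pmf2.keys() and all(
        math.isclose(pmf1[v], pmf2[v], rel_tol=rel_tol, abs_tol=1e-12)
        for v in pmf1
    )


# 0.1 + 0.2 is not exactly 0.3 in floating point, but it is close enough.
print(pmfs_close({"T": 0.1 + 0.2, "F": 0.7}, {"T": 0.3, "F": 0.7}))
print(pmfs_close({"T": 0.31, "F": 0.69}, {"T": 0.3, "F": 0.7}))
```

Exact equality on floats would report two CPTs as different merely because their probabilities were computed along different arithmetic paths.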

__str__ ¶

__str__() -> str

Human-friendly description of the contents of the CPT.

Returns:

str –

String representation of the CPT contents.

cdist ¶

cdist(parental_values: Optional[Dict[str, str]] = None) -> Dict[str, float]

Return conditional probabilities of node values for specified parental values.

Parameters:

parental_values ¶
(Optional[Dict[str, str]], default: None ) –

Parental values for which pmf required

Raises:

TypeError –

If args are of wrong type.
ValueError –

If args have invalid or conflicting values.

fit `classmethod` ¶

fit(
    node: str,
    parents: Optional[Tuple[str, ...]],
    data: Union[BNFit, Any],
    autocomplete: bool = True,
) -> Tuple[Tuple[type, Dict[str, Any]], Optional[int]]

Constructs a CPT (Conditional Probability Table) from data.

Parameters:

node ¶
(str) –

Node that CPT applies to.
parents ¶
(Optional[Tuple[str, ...]]) –

Parents of node.
data ¶
(Union[BNFit, Any]) –

Data to fit CPT to.
autocomplete ¶
(bool, default: True ) –

Whether to ensure CPT data contains entries for

Returns:

Tuple[Tuple[type, Dict[str, Any]], Optional[int]] –

Tuple of (cnd_spec, estimated_pmfs), where cnd_spec is (CPT class, cpt_spec for CPT()) and estimated_pmfs is an int giving the number of estimated pmfs.
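
Fitting a CPT from data amounts to turning counts into relative frequencies. The sketch below shows that maximum-likelihood step in plain Python and returns the result in the pmfs form that CPT() accepts. It is a simplified assumption of the idea only: the function name and row-dict input are hypothetical, and it does not model autocomplete or the counting of estimated pmfs.

```python
from collections import Counter, defaultdict


def fit_pmfs_sketch(rows, node, parents=()):
    """Estimate pmfs for node from row dicts by relative frequency."""
    values = sorted({r[node] for r in rows})
    if not parents:
        counts = Counter(r[node] for r in rows)
        return {v: counts[v] / len(rows) for v in values}
    by_combo = defaultdict(Counter)
    for r in rows:
        by_combo[tuple((p, r[p]) for p in parents)][r[node]] += 1
    pmfs = []
    for combo in sorted(by_combo):
        total = sum(by_combo[combo].values())
        pmf = {v: by_combo[combo][v] / total for v in values}
        pmfs.append((dict(combo), pmf))
    return pmfs


data = [
    {"A": "T", "B": "T"},
    {"A": "T", "B": "T"},
    {"A": "T", "B": "F"},
    {"A": "F", "B": "F"},
]
pmfs_a = fit_pmfs_sketch(data, "A")
pmfs_b = fit_pmfs_sketch(data, "B", ("A",))
print(pmfs_a)
print(pmfs_b)
```

Parent combinations that never occur in the data get no entry in this sketch, which is the gap that an autocomplete-style option would need to fill.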

node_values ¶

node_values() -> List[str]

Return node values (states) of node CPT relates to.

Returns:

List[str] –

Node values in alphabetical order.

param_ratios ¶

param_ratios() -> None

Returns distribution of parameter ratios across all parental values for each combination of possible node values.

Returns:

dict –

{(node value pair): (param ratios across parents)}

parents ¶

parents() -> List[str]

Return parents of node CPT relates to.

Returns:

List[str] –

Parent node names in alphabetical order.

random_value ¶

random_value(pvs: Optional[Dict[str, str]]) -> str

Generate a random value for a node given the value of its parents.

Parameters:

pvs ¶
(Optional[Dict[str, str]]) –

Parental values, {parent1: value1, ...}.

Returns:

str –

Random value for node.

to_spec ¶

to_spec(name_map: Dict[str, str]) -> Dict[str, Any]

Returns external specification format of CPT, renaming nodes according to a name map.

Parameters:

name_map ¶
(Dict[str, str]) –

Map of node names {old: new}.

Returns:

Dict[str, Any] –

CPT specification with renamed nodes.

Raises:

TypeError –

If bad arg type.
ValueError –

If bad arg value, e.g. coeff keys not in map.

validate_cnds `classmethod` ¶

validate_cnds(
    nodes: List[str], cnds: Dict[str, CND], parents: Dict[str, List[str]]
) -> None

Checks that all CNDs in graph are consistent with one another and with graph structure.

Parameters:

nodes ¶
(list) –

BN nodes.
cnds ¶
(dict) –

Set of CNDs for the BN, {node: cnd}.
parents ¶
(dict) –

Parents of non-orphan nodes, {node: parents}.

Raises:

TypeError –

If invalid types used in arguments.
ValueError –

If any inconsistent values found.

validate_parents ¶

validate_parents(
    node: str, parents: Dict[str, List[str]], node_values: Dict[str, List[str]]
) -> None

Checks every CPT's parents and parental values are consistent with other relevant CPTs and the DAG structure.

Parameters:

node ¶
(str) –

Name of node.
parents ¶
(Dict[str, List[str]]) –

Parents of all nodes {node: parents}.
node_values ¶
(Dict[str, List[str]]) –

Values of each cat. node {node: values}.

Raises:

ValueError –

If parent mismatch or missing parental

LinGauss ¶

LinGauss(lg: Dict[str, Any])

Conditional Linear Gaussian Distribution.

Parameters:

lg ¶
(Dict[str, Any]) –

Specification of Linear Gaussian in following form: {'coeffs': {node: coeff}, 'mean': mean, 'sd': sd}.

Attributes:

coeffs –

Linear coefficient of parents {parent: coeff}.
mean –

Mean of Gaussian noise (aka intercept, mu).
sd –

S.D. of Gaussian noise (aka sigma).

Raises:

TypeError –

If called with bad arg types.
ValueError –

If called with bad arg values.

Methods:

__eq__ –

Return whether two CNDs are the same, allowing for probability rounding errors.
__str__ –

Human-friendly formula description of the Linear Gaussian.
cdist –

Return conditional distribution for specified parental values.
fit –

Fit a Linear Gaussian to data.
parents –

Return parents of node CND relates to.
random_value –

Generate a random value for a node given the value of its parents.
to_spec –

Returns external specification format of LinGauss, renaming nodes according to a name map.
validate_parents –

Check LinGauss coeff keys consistent with parents in DAG.

__eq__ ¶

__eq__(other: object) -> bool

Return whether two CNDs are the same, allowing for probability rounding errors.

Parameters:

other ¶
(CND) –

CND to compare to self.

Returns:

bool –

Whether the LinGauss objects are the same up to 10 sf.

__str__ ¶

__str__() -> str

Human-friendly formula description of the Linear Gaussian.

Returns:

str –

String representation of the Linear Gaussian formula.

cdist ¶

cdist(parental_values: Optional[Dict[str, float]] = None) -> Tuple[float, float]

Return conditional distribution for specified parental values.

Parameters:

parental_values ¶
(Optional[Dict[str, float]], default: None ) –

Parental values for which dist. required

Returns:

Tuple[float, float] –

Tuple of (mean, sd) of child Gaussian distribution.

Raises:

TypeError –

If args are of wrong type.
ValueError –

If args have invalid or conflicting values.
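
For a linear Gaussian, the conditional mean is usually the intercept plus the coefficient-weighted sum of the parental values, with the standard deviation unchanged. The sketch below applies that standard formula to a spec in the {'coeffs', 'mean', 'sd'} form; it is an assumption about the calculation and the function name is hypothetical.

```python
def lingauss_cdist_sketch(lg, parental_values=None):
    """Return (mean, sd) of the child given its parents' values."""
    parental_values = parental_values or {}
    if set(parental_values) != set(lg["coeffs"]):
        raise ValueError("parental values must match coefficient keys")
    # mean = intercept + sum(coeff_i * parent_value_i)
    mean = lg["mean"] + sum(
        coeff * parental_values[parent]
        for parent, coeff in lg["coeffs"].items()
    )
    return (mean, lg["sd"])


lg = {"coeffs": {"A": 2.0, "B": -1.0}, "mean": 0.5, "sd": 1.5}
mean, sd = lingauss_cdist_sketch(lg, {"A": 1.0, "B": 3.0})
print(mean, sd)
```

A parentless node has empty coeffs, so the same function returns the intercept and sd unchanged when called without parental values.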

fit `classmethod` ¶

fit(
    node: str,
    parents: Optional[Tuple[str, ...]],
    data: Union[Pandas, BNFit],
    autocomplete: bool = True,
) -> Tuple[Tuple[type, Dict[str, Any]], Optional[int]]

Fit a Linear Gaussian to data.

Parameters:

node ¶
(str) –

Node that Linear Gaussian applies to.
parents ¶
(Optional[Tuple[str, ...]]) –

Parents of node.
data ¶
(Union[Pandas, BNFit]) –

Data to fit Linear Gaussian to.
autocomplete ¶
(bool, default: True ) –

Not used for Linear Gaussian.

Returns:

Tuple[Tuple[type, Dict[str, Any]], Optional[int]] –

Tuple of (lg_spec, None) where lg is (LinGauss class, lg_spec).

Raises:

TypeError –

With bad arg types.
ValueError –

With bad arg values.
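
Fitting a linear Gaussian is a regression problem. The single-parent sketch below uses closed-form ordinary least squares and returns a spec in the {'coeffs', 'mean', 'sd'} form. It is an illustration only: the function name is hypothetical, and dividing the residual sum of squares by n is an assumption, since the library may use a different estimator.

```python
import math


def fit_lingauss_sketch(xs, ys, parent="X"):
    """Least-squares fit of y = mean + coeff * x with Gaussian noise."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    coeff = sxy / sxx
    intercept = mean_y - coeff * mean_x
    residuals = [y - (intercept + coeff * x) for x, y in zip(xs, ys)]
    sd = math.sqrt(sum(r * r for r in residuals) / n)
    return {"coeffs": {parent: coeff}, "mean": intercept, "sd": sd}


# Noise-free data generated from y = 1 + 2x.
spec = fit_lingauss_sketch([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(spec)
```

With several parents, the same idea extends to multiple regression, which is where an array of node values such as the one returned by BNFit.values becomes useful.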

parents ¶

parents() -> List[str]

Return parents of node CND relates to.

Returns:

List[str] –

Parent node names in alphabetical order.

random_value ¶

random_value(pvs: Optional[Dict[str, float]]) -> float

Generate a random value for a node given the value of its parents.

Parameters:

pvs ¶
(Optional[Dict[str, float]]) –

Parental values, {parent1: value1, ...}.

Returns:

float –

Random value for node.

to_spec ¶

to_spec(name_map: Dict[str, str]) -> Dict[str, Any]

Returns external specification format of LinGauss, renaming nodes according to a name map.

Parameters:

name_map ¶
(Dict[str, str]) –

Map of node names {old: new}.

Returns:

Dict[str, Any] –

LinGauss specification with renamed nodes.

Raises:

TypeError –

If bad arg type.
ValueError –

If bad arg value, e.g. coeff keys not in map.

validate_parents ¶

validate_parents(
    node: str, parents: Dict[str, List[str]], node_values: Dict[str, List[str]]
) -> None

Check LinGauss coeff keys consistent with parents in DAG.

Parameters:

node ¶
(str) –

Name of node.
parents ¶
(dict) –

Parents of all nodes defined in DAG.
node_values ¶
(dict) –

Values of each cat. node (unused).

NodeValueCombinations ¶

NodeValueCombinations(node_values: Dict[str, List[str]], sort: bool = True)

Iterable over all combinations of node values.

Parameters:

node_values ¶
(dict) –

Allowed values for each node {node: [values]}.
sort ¶
(bool, default: True ) –

Whether to sort node names and values into alphabetic order.

Methods:

__iter__ –

Returns the initialised iterator
__next__ –

Generate the next node value combination

__iter__ ¶

__iter__() -> NodeValueCombinations

Returns the initialised iterator.

Returns:

NodeValueCombinations –

The iterator.

__next__ ¶

__next__() -> Dict[str, str]

Generate the next node value combination.

Returns:

Dict[str, str] –

Next node value combination {node: value}.

Raises:

StopIteration –

When all combinations have been returned.
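
An iterable with this interface can be sketched on top of itertools.product. The class below is a hypothetical stand-in, not the library's implementation, and the order in which the real class yields combinations is an assumption.

```python
from itertools import product


class NodeValueCombinationsSketch:
    """Iterate over every {node: value} combination of the given values."""

    def __init__(self, node_values, sort=True):
        names = sorted(node_values) if sort else list(node_values)
        self._names = names
        self._values = [
            sorted(node_values[n]) if sort else list(node_values[n])
            for n in names
        ]
        self._iter = None

    def __iter__(self):
        # (Re)initialise so the object can be iterated more than once.
        self._iter = product(*self._values)
        return self

    def __next__(self):
        if self._iter is None:
            self._iter = product(*self._values)
        combo = next(self._iter)  # raises StopIteration when exhausted
        return dict(zip(self._names, combo))


combos = list(NodeValueCombinationsSketch({"B": ["1", "0"], "A": ["y", "n"]}))
print(combos)
```

Sorting names and values gives a stable enumeration order regardless of how the input dictionary was built.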
