Bayesian Networks Module¶
The Bayesian Networks module provides comprehensive functionality for working with probabilistic graphical models. It includes classes for representing Bayesian Networks, conditional node distributions, and various I/O formats.
Overview¶
The causaliq_core.bn module consists of several key components:
- Core Classes: Main BN and BNFit classes for network representation
- Distributions: Conditional node distribution implementations
- I/O Operations: Reading and writing BNs in various formats
Main Classes¶
BN¶
The main Bayesian Network class that combines a DAG structure with conditional probability distributions.
BNFit¶
A fitted Bayesian Network with learned parameters from data.
Distribution Types¶
CPT (Conditional Probability Table)¶
Discrete conditional probability distributions for categorical variables.
LinGauss (Linear Gaussian)¶
Continuous conditional distributions for normally distributed variables.
I/O Formats¶
DSC Format¶
Reading and writing Bayesian Networks in DSC format.
XDSL Format¶
Reading and writing Bayesian Networks in GeNIe XDSL format.
Key Features¶
- Probabilistic Inference: Compute marginal and conditional probabilities
- Parameter Learning: Fit network parameters from data
- Multiple Formats: Support for DSC and XDSL file formats
- Flexible Distributions: Both discrete (CPT) and continuous (LinGauss) distributions
- Graph Integration: Built on the causaliq_core.graph DAG structure
Example Usage¶
from causaliq_core.bn import BN, CPT
from causaliq_core.graph import DAG
# Create a simple DAG
dag = DAG(['A', 'B'], [('A', 'B')])
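# Define conditional distributions as (class, spec) pairs, using the
# CPT(pmfs) layout documented below; the mapping of the probabilities
# to values of A is assumed
cnd_specs = {
    'A': (CPT, {'T': 0.3, 'F': 0.7}),
    'B': (CPT, [({'A': 'T'}, {'T': 0.9, 'F': 0.1}),
                ({'A': 'F'}, {'T': 0.2, 'F': 0.8})]),
}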
# Create Bayesian Network
bn = BN(dag, cnd_specs)
# Compute marginals
marginals = bn.marginals(['A', 'B'])
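To make clear what the marginals in this example represent, the following sketch computes them by hand in plain Python, without using causaliq_core. It uses the probabilities from the example, reading them as P(B | A='T') = (0.9, 0.1) and P(B | A='F') = (0.2, 0.8); that reading is an assumption, and the variable names are illustrative only.

```python
# Hand-computed marginals for the two-node network A -> B.
# Assumption: P(B | A='T') = (0.9, 0.1) and P(B | A='F') = (0.2, 0.8).
p_a = {'T': 0.3, 'F': 0.7}
p_b_given_a = {
    'T': {'T': 0.9, 'F': 0.1},
    'F': {'T': 0.2, 'F': 0.8},
}

# P(B=b) = sum over a of P(A=a) * P(B=b | A=a)
p_b = {
    b: sum(p_a[a] * p_b_given_a[a][b] for a in p_a)
    for b in ('T', 'F')
}

print(p_b)  # approximately {'T': 0.41, 'F': 0.59}
```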
Module Structure¶
bn
¶
Bayesian Networks module for CausalIQ Core.
This module provides classes and utilities for working with Bayesian Networks, including conditional node distributions and their implementations.
Modules:
- bn
- bnfit
- dist – Distribution classes for Bayesian Network nodes.
- io – I/O module for Bayesian Network file formats.
Classes:
- BN – Base class for Bayesian Networks.
- BNFit – Interface for Bayesian Network parameter estimation and data access.
- CPT – Base class for conditional probability tables.
- LinGauss – Conditional Linear Gaussian Distribution.
- NodeValueCombinations – Iterable over all combinations of node values.
BN
¶
BN(dag: DAG, cnd_specs: Dict[str, Any], estimated_pmfs: Dict[str, Any] = {})
Base class for Bayesian Networks.
Bayesian Networks have a DAG and an associated probability distribution defined by CPTs.
Parameters:
- dag (DAG) – DAG for the Bayesian Network.
- cnd_specs (Dict[str, Any]) – Specification of each conditional node distribution.
- estimated_pmfs (Dict[str, Any], default: {}) – Number of PMFs that had to be estimated for each node.
Attributes:
- dag – BN's DAG.
- cnds – Conditional distributions for each node {node: CND}.
- free_params – Total number of free parameters in BN.
- estimated_pmfs – Number of estimated PMFs for each node.
Raises:
- TypeError – If arguments have invalid types.
- ValueError – If arguments have invalid values.
Methods:
- __eq__ – Compare another BN with this one.
- fit – Alternative instantiation of BN using data to implicitly define the conditional probability data.
- generate_cases – Generate specified number of random data cases for this BN.
- global_distribution – Generate the global probability distribution for the BN.
- lnprob_case – Return log of probability of set of node values (case) occurring.
- marginal_distribution – Generate a marginal probability distribution for a specified node and its parents.
- marginals – Return marginal distribution for specified nodes.
- rename – Rename nodes in place according to name map.
__eq__
¶
__eq__(other: object) -> bool
Compare another BN with this one.
Parameters:
- other (object) – The other BN to compare with this one.
Returns:
- bool – True if the other BN is the same as this one.
fit
classmethod
¶
Alternative instantiation of BN using data to implicitly define the conditional probability data.
Parameters:
Returns:
- BN – A new BN instance fitted to the data.
Raises:
- TypeError – If arguments have invalid types.
- ValueError – If arguments have invalid values.
generate_cases
¶
Generate specified number of random data cases for this BN.
Parameters:
- n (int) – Number of cases to generate.
- outfile (Optional[str], default: None) – Name of file to write instance to.
- pseudo (bool, default: True) – Whether to produce pseudo-random (i.e. repeatable) cases rather than truly random ones.
Returns:
- DataFrame – Random data cases.
Raises:
- TypeError – If arguments are not of the correct type.
- ValueError – If an invalid number of rows is requested.
- FileNotFoundError – If outfile is in a nonexistent folder.
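As an illustration of how repeatable random cases can be drawn from a network, here is a minimal sketch of ancestral (forward) sampling over a hypothetical two-node network A -> B. It is not the library's implementation: the dictionaries, the helper `sample_value` and the `seed` argument are all assumptions made for this example, and the result is a list of dicts rather than a DataFrame.

```python
import random

# Hypothetical two-node network A -> B, written as plain dicts.
P_A = {'T': 0.3, 'F': 0.7}
P_B_GIVEN_A = {'T': {'T': 0.9, 'F': 0.1}, 'F': {'T': 0.2, 'F': 0.8}}


def sample_value(pmf, rng):
    """Draw one value from a {value: prob} pmf."""
    values = list(pmf)
    return rng.choices(values, weights=[pmf[v] for v in values], k=1)[0]


def generate_cases(n, seed=None):
    """Sample n cases in topological order (parents before children)."""
    if not isinstance(n, int) or isinstance(n, bool):
        raise TypeError('n must be an int')
    if n < 1:
        raise ValueError('n must be positive')
    rng = random.Random(seed)  # a fixed seed gives repeatable cases
    cases = []
    for _ in range(n):
        a = sample_value(P_A, rng)
        b = sample_value(P_B_GIVEN_A[a], rng)
        cases.append({'A': a, 'B': b})
    return cases


repeatable = generate_cases(5, seed=42)
print(repeatable == generate_cases(5, seed=42))  # same seed, same cases
```

Sampling each node only after its parents is what lets every draw use the correct conditional distribution.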
global_distribution
¶
Generate the global probability distribution for the BN.
Returns:
- DataFrame – Global distribution in descending probability (and then by ascending values).
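The ordering described here can be sketched by enumerating every joint assignment of a small network and sorting the result. The network below is hypothetical and the output is a plain list of tuples rather than a DataFrame, so treat it as an illustration of the ordering only.

```python
from itertools import product

# Hypothetical two-node network A -> B.
P_A = {'T': 0.3, 'F': 0.7}
P_B_GIVEN_A = {'T': {'T': 0.9, 'F': 0.1}, 'F': {'T': 0.2, 'F': 0.8}}

# Joint probability P(A=a, B=b) = P(A=a) * P(B=b | A=a)
rows = [
    ((a, b), P_A[a] * P_B_GIVEN_A[a][b])
    for a, b in product(P_A, ('T', 'F'))
]

# Sort by descending probability, then by ascending values.
rows.sort(key=lambda row: (-row[1], row[0]))

for values, prob in rows:
    print(values, round(prob, 2))
```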
lnprob_case
¶
lnprob_case(case_values: Dict[str, Any], base: Union[int, str] = 10) -> Optional[float]
Return the log of the probability of a set of node values (a case) occurring.
Parameters:
- case_values (Dict[str, Any]) – Value for each node {node: value}.
- base (Union[int, str], default: 10) – Logarithm base to use: 2, 10 or 'e'.
Returns:
- Optional[float] – Log of probability of case occurring, or None if case has zero probability.
Raises:
- TypeError – If arguments are of the wrong type.
- ValueError – If arguments have invalid values.
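A minimal sketch of the calculation, assuming the case probability factorises over the network as a product of one conditional probability per node. The network, the `LOG_FUNCS` table and the function body are hypothetical; only the argument names and the None-for-zero-probability behaviour follow the description above.

```python
import math

# Hypothetical network A -> B in which B='T' is impossible when A='F'.
P_A = {'T': 0.3, 'F': 0.7}
P_B_GIVEN_A = {'T': {'T': 0.9, 'F': 0.1}, 'F': {'T': 0.0, 'F': 1.0}}

LOG_FUNCS = {2: math.log2, 10: math.log10, 'e': math.log}


def lnprob_case(case_values, base=10):
    """Log probability of a complete case, or None if it has zero probability."""
    if base not in LOG_FUNCS:
        raise ValueError("base must be 2, 10 or 'e'")
    log = LOG_FUNCS[base]
    a, b = case_values['A'], case_values['B']
    factors = (P_A[a], P_B_GIVEN_A[a][b])
    if any(p == 0 for p in factors):
        return None  # log of zero is undefined
    # Summing logs avoids underflow when many small factors are multiplied.
    return sum(log(p) for p in factors)


print(lnprob_case({'A': 'T', 'B': 'T'}))  # log10 of 0.3 * 0.9
print(lnprob_case({'A': 'F', 'B': 'T'}))  # None
```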
marginal_distribution
¶
Generate a marginal probability distribution for a specified node and its parents, in the same format returned by the Pandas crosstab function.
Parameters:
- node (str) – Node for which distribution required.
- parents (Optional[List[str]], default: None) – Parents of node.
Returns:
- DataFrame – Marginal distribution with parental value combos as columns, and node values as rows.
marginals
¶
marginals(nodes: List[str]) -> DataFrame
Return marginal distribution for specified nodes.
Parameters:
- nodes (List[str]) – Nodes for which marginal distribution required.
Returns:
- DataFrame – Marginal distribution in same format returned by Pandas crosstab function.
Raises:
- TypeError – If arguments have bad type.
- ValueError – If arguments contain bad values.
BNFit
¶
Interface for Bayesian Network parameter estimation and data access.
This interface provides the essential methods required for fitting conditional probability tables (CPT) and linear Gaussian models in Bayesian Networks, as well as data access methods for the BN class.
Implementing classes should provide:
- A constructor that accepts df=DataFrame parameter for BN compatibility
- All abstract methods defined below
- Properties for data access (.nodes, .sample, .node_types)
Methods:
- marginals – Return marginal counts for a node and its parents.
- values – Return the (float) values for specified nodes.
- write – Write data to file.
Attributes:
- N (int) – Total sample size.
- node_types (Dict[str, str]) – Node type mapping for each variable.
- node_values (Dict[str, Dict]) – Node value counts for categorical variables.
- nodes (Tuple[str, ...]) – Column names in the dataset.
- sample (Any) – Access to underlying data sample.
N
abstractmethod
property
writable
¶
Total sample size.
Returns:
- int – Current sample size being used.
node_types
abstractmethod
property
¶
Node type mapping for each variable.
Returns:
- Dict[str, str] – Dictionary mapping node names to their types, in the format {node: 'category' | 'continuous'}.
node_values
abstractmethod
property
writable
¶
Node value counts for categorical variables.
Returns:
- Dict[str, Dict] – Values and their counts of categorical nodes in sample, in the format {node1: {val1: count1, val2: count2, ...}, ...}.
nodes
abstractmethod
property
¶
Column names in the dataset.
Returns:
- Tuple[str, ...] – Tuple of node names (column names) in the dataset.
sample
abstractmethod
property
¶
Access to underlying data sample.
Returns:
- Any – The underlying DataFrame or data structure for direct access. Used for operations like .unique() on columns.
marginals
abstractmethod
¶
marginals(node: str, parents: Dict, values_reqd: bool = False) -> Tuple
Return marginal counts for a node and its parents.
Parameters:
- node (str) – Node for which marginals required.
- parents (Dict) – Dictionary {node: parents} for non-orphan nodes.
- values_reqd (bool, default: False) – Whether parent and child values required.
Returns:
- Tuple – Tuple of counts and, optionally, values:
  - ndarray counts: 2D array, rows=child, cols=parents
  - int maxcol: Maximum number of parental values
  - tuple rowval: Child values for each row
  - tuple colval: Parent combo (dict) for each col
Raises:
- TypeError – For bad argument types.
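The shape of the counts can be illustrated with a small stdlib-only sketch. It is not an implementation of the interface: the function name `marginal_counts` is hypothetical, it takes the parents of one node as a list (rather than the {node: parents} dictionary described above), returns nested lists instead of an ndarray, and omits maxcol.

```python
from collections import Counter


def marginal_counts(rows, node, parents):
    """Return (counts, rowval, colval) for a node given its parents.

    counts[i][j] is the number of rows where the node takes rowval[i]
    and the parents take the combination colval[j].
    """
    rowval = tuple(sorted({r[node] for r in rows}))
    combos = sorted({tuple(r[p] for p in parents) for r in rows})
    colval = tuple(dict(zip(parents, combo)) for combo in combos)
    tally = Counter((r[node], tuple(r[p] for p in parents)) for r in rows)
    counts = [[tally[(value, combo)] for combo in combos] for value in rowval]
    return counts, rowval, colval


data = [
    {'A': 'T', 'B': 'T'},
    {'A': 'T', 'B': 'T'},
    {'A': 'T', 'B': 'F'},
    {'A': 'F', 'B': 'F'},
]
counts, rowval, colval = marginal_counts(data, 'B', ['A'])
print(counts)  # [[1, 1], [0, 2]]
print(rowval)  # ('F', 'T')
print(colval)  # ({'A': 'F'}, {'A': 'T'})
```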
values
abstractmethod
¶
values(nodes: Tuple[str, ...]) -> ndarray
Return the (float) values for specified nodes.
Suitable for passing into e.g. linear regression fitting.
Parameters:
- nodes (Tuple[str, ...]) – Nodes for which data required.
Returns:
- ndarray – Numpy array of values, each column for a node.
Raises:
- TypeError – If bad argument type.
- ValueError – If bad argument value.
CPT
¶
CPT(
pmfs: Union[Dict[str, float], List[Tuple[Dict[str, str], Dict[str, float]]]],
estimated: int = 0,
)
Base class for conditional probability tables.
Parameters:
- pmfs (Union[Dict[str, float], List[Tuple[Dict[str, str], Dict[str, float]]]]) – A pmf of {value: prob} for parentless nodes OR list of tuples ({parent: value}, {value: prob}).
- estimated (int, default: 0) – How many PMFs were estimated.
Attributes:
- cpt – Internal representation of the CPT. {node_values: prob} for parentless node, otherwise {parental_values as frozenset: {node_values: prob}}.
- estimated – Number of PMFs that were estimated.
- values – Values which node can take.
Raises:
- TypeError – If arguments are of wrong type.
- ValueError – If arguments have invalid or conflicting values.
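The two accepted pmfs layouts and the frozenset-keyed internal table can be sketched as follows. This is a hypothetical helper, not the CPT constructor: in particular, keying the table by a frozenset of (parent, value) pairs and the tolerance used in the sum check are assumptions.

```python
import math


def build_cpt(pmfs):
    """Turn a pmfs specification into a lookup table.

    Parentless node: {value: prob}.
    Node with parents: [({parent: value}, {value: prob}), ...].
    """
    def check(pmf):
        # Each pmf must be a proper distribution.
        if not math.isclose(sum(pmf.values()), 1.0, abs_tol=1e-9):
            raise ValueError('pmf probabilities must sum to 1')

    if isinstance(pmfs, dict):
        check(pmfs)
        return dict(pmfs)
    if not isinstance(pmfs, list):
        raise TypeError('pmfs must be a dict or a list of tuples')

    table = {}
    for parental_values, pmf in pmfs:
        check(pmf)
        # A frozenset is hashable and ignores the order of the parents.
        table[frozenset(parental_values.items())] = dict(pmf)
    return table


orphan = build_cpt({'T': 0.3, 'F': 0.7})
child = build_cpt([
    ({'A': 'T'}, {'T': 0.9, 'F': 0.1}),
    ({'A': 'F'}, {'T': 0.2, 'F': 0.8}),
])
print(child[frozenset({'A': 'T'}.items())]['T'])  # 0.9
```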
Methods:
- __eq__ – Return whether two CPTs are the same allowing for probability rounding errors.
- __str__ – Human-friendly description of the contents of the CPT.
- cdist – Return conditional probabilities of node values for specified parental values.
- fit – Constructs a CPT (Conditional Probability Table) from data.
- node_values – Return node values (states) of node CPT relates to.
- param_ratios – Returns distribution of parameter ratios across all parental values for each combination of possible node values.
- parents – Return parents of node CPT relates to.
- random_value – Generate a random value for a node given the value of its parents.
- to_spec – Returns external specification format of CPT, renaming nodes according to a name map.
- validate_cnds – Checks that all CNDs in graph are consistent with one another and with graph structure.
- validate_parents – Checks every CPT's parents and parental values are consistent with other relevant CPTs and the DAG structure.
__eq__
¶
Return whether two CPTs are the same, allowing for probability rounding errors.
Parameters:
- other (CPT) – CPT to compare with self.
Returns:
- bool – Whether the CPTs are practically the same.
__str__
¶
Human-friendly description of the contents of the CPT.
Returns:
- str – String representation of the CPT contents.
cdist
¶
cdist(parental_values: Optional[Dict[str, str]] = None) -> Dict[str, float]
Return conditional probabilities of node values for specified parental values.
Parameters:
- parental_values (Optional[Dict[str, str]], default: None) – Parental values for which pmf required.
Raises:
- TypeError – If args are of wrong type.
- ValueError – If args have invalid or conflicting values.
fit
classmethod
¶
fit(
node: str,
parents: Optional[Tuple[str, ...]],
data: Union[BNFit, Any],
autocomplete: bool = True,
) -> Tuple[Tuple[type, Dict[str, Any]], Optional[int]]
Constructs a CPT (Conditional Probability Table) from data.
Parameters:
- node (str) – Node that CPT applies to.
- parents (Optional[Tuple[str, ...]]) – Parents of node.
- data (Union[BNFit, Any]) – Data to fit CPT to.
- autocomplete (bool, default: True) – Whether to ensure CPT data contains entries for
Returns:
- Tuple[Tuple[type, Dict[str, Any]], Optional[int]] – Tuple of (cnd_spec, estimated_pmfs), where cnd_spec is (CPT class, cpt_spec for CPT()) and estimated_pmfs is an int giving the number of estimated PMFs.
node_values
¶
Return node values (states) of node CPT relates to.
Returns:
- List[str] – Node values in alphabetical order.
param_ratios
¶
Returns distribution of parameter ratios across all parental values for each combination of possible node values.
Returns:
- dict – {(node value pair): (param ratios across parents)}
parents
¶
Return parents of node CPT relates to.
Returns:
- List[str] – Parent node names in alphabetical order.
random_value
¶
random_value(pvs: Optional[Dict[str, str]]) -> str
Generate a random value for a node given the value of its parents.
Parameters:
- pvs (Optional[Dict[str, str]]) – Parental values, {parent1: value1, ...}.
Returns:
- str – Random value for node.
to_spec
¶
to_spec(name_map: Dict[str, str]) -> Dict[str, Any]
Returns external specification format of CPT, renaming nodes according to a name map.
Parameters:
- name_map (Dict[str, str]) – Map of node names {old: new}.
Returns:
- Dict[str, Any] – CPT specification with renamed nodes.
Raises:
- TypeError – If bad arg type.
- ValueError – If bad arg value, e.g. coeff keys not in map.
validate_cnds
classmethod
¶
Checks that all CNDs in graph are consistent with one another and with graph structure.
Parameters:
- nodes (list) – BN nodes.
- cnds (dict) – Set of CNDs for the BN, {node: cnd}.
- parents (dict) – Parents of non-orphan nodes, {node: parents}.
Raises:
- TypeError – If invalid types used in arguments.
- ValueError – If any inconsistent values found.
validate_parents
¶
validate_parents(
node: str, parents: Dict[str, List[str]], node_values: Dict[str, List[str]]
) -> None
Checks every CPT's parents and parental values are consistent with other relevant CPTs and the DAG structure.
Parameters:
- node (str) – Name of node.
- parents (Dict[str, List[str]]) – Parents of all nodes {node: parents}.
- node_values (Dict[str, List[str]]) – Values of each categorical node {node: values}.
Raises:
- ValueError – If parent mismatch or missing parental
LinGauss
¶
LinGauss(lg: Dict[str, Any])
Conditional Linear Gaussian Distribution.
Parameters:
- lg (Dict[str, Any]) – Specification of Linear Gaussian in following form: {'coeffs': {node: coeff}, 'mean': mean, 'sd': sd}.
Attributes:
- coeffs – Linear coefficient of parents {parent: coeff}.
- mean – Mean of Gaussian noise (aka intercept, mu).
- sd – S.D. of Gaussian noise (aka sigma).
Raises:
- TypeError – If called with bad arg types.
- ValueError – If called with bad arg values.
Methods:
- __eq__ – Return whether two CNDs are the same allowing for probability rounding errors.
- __str__ – Human-friendly formula description of the Linear Gaussian.
- cdist – Return conditional distribution for specified parental values.
- fit – Fit a Linear Gaussian to data.
- parents – Return parents of node CND relates to.
- random_value – Generate a random value for a node given the value of its parents.
- to_spec – Returns external specification format of LinGauss, renaming nodes according to a name map.
- validate_parents – Check LinGauss coeff keys consistent with parents in DAG.
__eq__
¶
Return whether two CNDs are the same, allowing for probability rounding errors.
Parameters:
- other (CND) – CND to compare with self.
Returns:
- bool – Whether the LinGauss objects are the same up to 10 significant figures.
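One way to compare two Linear Gaussian specifications to 10 significant figures is to round every number before comparing. The sketch below works on plain spec dictionaries of the form {'coeffs': ..., 'mean': ..., 'sd': ...}; the helper names are hypothetical and the rounding approach is an assumption, not necessarily how the library performs the comparison.

```python
def round_sf(x, sf=10):
    """Round x to the given number of significant figures."""
    return float(f'{x:.{sf}g}')


def lg_equal(lg1, lg2, sf=10):
    """Compare two Linear Gaussian specs up to sf significant figures."""
    if set(lg1['coeffs']) != set(lg2['coeffs']):
        return False
    pairs = [(lg1['mean'], lg2['mean']), (lg1['sd'], lg2['sd'])]
    pairs += [(lg1['coeffs'][p], lg2['coeffs'][p]) for p in lg1['coeffs']]
    return all(round_sf(a, sf) == round_sf(b, sf) for a, b in pairs)


first = {'coeffs': {'A': 0.1 + 0.2}, 'mean': 1.0, 'sd': 0.5}
second = {'coeffs': {'A': 0.3}, 'mean': 1.0, 'sd': 0.5}
print(0.1 + 0.2 == 0.3)         # False: floating-point rounding error
print(lg_equal(first, second))  # True: equal to 10 significant figures
```

Rounding-based comparison can still separate two values that straddle a rounding boundary, so a tolerance check such as math.isclose is a common alternative.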
__str__
¶
Human-friendly formula description of the Linear Gaussian.
Returns:
- str – String representation of the Linear Gaussian formula.
cdist
¶
cdist(parental_values: Optional[Dict[str, float]] = None) -> Tuple[float, float]
Return conditional distribution for specified parental values.
Parameters:
- parental_values (Optional[Dict[str, float]], default: None) – Parental values for which distribution required.
Returns:
- Tuple[float, float] – Tuple of (mean, sd) of child Gaussian distribution.
Raises:
- TypeError – If args are of wrong type.
- ValueError – If args have invalid or conflicting values.
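For a linear Gaussian node, the conditional mean is the intercept plus the coefficient-weighted parental values, while the standard deviation does not depend on the parents. A minimal sketch on a plain spec dictionary follows; the function is a hypothetical stand-in written for this example, not the LinGauss method itself.

```python
def cdist(lg, parental_values=None):
    """Return (mean, sd) of the child Gaussian for given parental values."""
    pvs = parental_values or {}
    if set(pvs) != set(lg['coeffs']):
        raise ValueError('parental values must match the coefficient keys')
    # mean = intercept + sum over parents of coeff * parent value
    mean = lg['mean'] + sum(coeff * pvs[p] for p, coeff in lg['coeffs'].items())
    return (mean, lg['sd'])


spec = {'coeffs': {'A': 2.0}, 'mean': 1.0, 'sd': 0.5}
print(cdist(spec, {'A': 3.0}))  # (7.0, 0.5)
```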
fit
classmethod
¶
fit(
node: str,
parents: Optional[Tuple[str, ...]],
data: Union[Pandas, BNFit],
autocomplete: bool = True,
) -> Tuple[Tuple[type, Dict[str, Any]], Optional[int]]
Fit a Linear Gaussian to data.
Parameters:
- node (str) – Node that Linear Gaussian applies to.
- parents (Optional[Tuple[str, ...]]) – Parents of node.
- data (Union[Pandas, BNFit]) – Data to fit Linear Gaussian to.
- autocomplete (bool, default: True) – Not used for Linear Gaussian.
Returns:
- Tuple[Tuple[type, Dict[str, Any]], Optional[int]] – Tuple of (lg, None), where lg is (LinGauss class, lg_spec).
Raises:
- TypeError – With bad arg types.
- ValueError – With bad arg values.
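To show what fitting a Linear Gaussian involves, here is a stdlib-only sketch of ordinary least squares for a node with a single parent. Everything in it is an assumption made for illustration: the function name, the restriction to one parent, and the use of the root-mean-square residual (dividing by n) as the standard deviation. The library may estimate these differently.

```python
import math
from statistics import fmean


def fit_single_parent(parent, xs, ys):
    """Least-squares fit of y = mean + coeff * x, returned as an lg spec."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need at least two paired observations')
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError('parent values must not all be equal')
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    coeff = sxy / sxx
    intercept = my - coeff * mx
    residuals = [y - (intercept + coeff * x) for x, y in zip(xs, ys)]
    # Assumed estimator: root-mean-square residual (divides by n).
    sd = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return {'coeffs': {parent: coeff}, 'mean': intercept, 'sd': sd}


# Noise-free data generated from y = 1 + 2x
lg_spec = fit_single_parent('A', [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(lg_spec)
```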
parents
¶
Return parents of node CND relates to.
Returns:
- List[str] – Parent node names in alphabetical order.
random_value
¶
random_value(pvs: Optional[Dict[str, float]]) -> float
Generate a random value for a node given the value of its parents.
Parameters:
- pvs (Optional[Dict[str, float]]) – Parental values, {parent1: value1, ...}.
Returns:
- float – Random value for node.
to_spec
¶
to_spec(name_map: Dict[str, str]) -> Dict[str, Any]
Returns external specification format of LinGauss, renaming nodes according to a name map.
Parameters:
- name_map (Dict[str, str]) – Map of node names {old: new}.
Returns:
- Dict[str, Any] – LinGauss specification with renamed nodes.
Raises:
- TypeError – If bad arg type.
- ValueError – If bad arg value, e.g. coeff keys not in map.
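The renaming step can be sketched on a plain spec dictionary as below. The function is a hypothetical stand-in, not the LinGauss method; it only illustrates how coefficient keys are mapped through name_map and why a missing key is treated as a bad value.

```python
def to_spec(lg, name_map):
    """Return a copy of an lg spec with coefficient keys renamed."""
    if not isinstance(name_map, dict):
        raise TypeError('name_map must be a dict')
    missing = set(lg['coeffs']) - set(name_map)
    if missing:
        raise ValueError(f'coeff keys not in map: {sorted(missing)}')
    return {
        'coeffs': {name_map[p]: c for p, c in lg['coeffs'].items()},
        'mean': lg['mean'],
        'sd': lg['sd'],
    }


original = {'coeffs': {'A': 2.0, 'B': -1.0}, 'mean': 1.0, 'sd': 0.5}
renamed = to_spec(original, {'A': 'X', 'B': 'Y', 'C': 'Z'})
print(renamed)  # {'coeffs': {'X': 2.0, 'Y': -1.0}, 'mean': 1.0, 'sd': 0.5}
```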
validate_parents
¶
validate_parents(
node: str, parents: Dict[str, List[str]], node_values: Dict[str, List[str]]
) -> None
Check LinGauss coeff keys consistent with parents in DAG.
Parameters:
- node (str) – Name of node.
- parents (dict) – Parents of all nodes defined in the DAG.
- node_values (dict) – Values of each categorical node (unused).
NodeValueCombinations
¶
Iterable over all combinations of node values.
Parameters:
- node_values (dict) – Allowed values for each node {node: [values]}.
- sort (bool) – Whether to sort node names and values into alphabetical order.
Methods:
__iter__
¶
__iter__() -> NodeValueCombinations
Returns the initialised iterator.
Returns:
- NodeValueCombinations – The iterator.
__next__
¶
Generate the next node value combination.
Returns:
- dict – Next node value combination {node: value}.
Raises:
- StopIteration – When all combinations have been returned.
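The same iteration can be approximated with a generator built on itertools.product. This is a sketch of the idea rather than the class itself: the function name is hypothetical, and the order in which the real iterator yields combinations is not stated in this documentation, so the order shown here is an assumption.

```python
from itertools import product


def node_value_combinations(node_values, sort=True):
    """Yield every {node: value} combination of the allowed node values."""
    nodes = sorted(node_values) if sort else list(node_values)
    value_lists = [
        sorted(node_values[n]) if sort else list(node_values[n])
        for n in nodes
    ]
    for combo in product(*value_lists):
        yield dict(zip(nodes, combo))


combos = list(node_value_combinations({'B': ['1', '0'], 'A': ['y', 'n']}))
for combo in combos:
    print(combo)
# {'A': 'n', 'B': '0'}
# {'A': 'n', 'B': '1'}
# {'A': 'y', 'B': '0'}
# {'A': 'y', 'B': '1'}
```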