Score - Scoring Functions for Causal Structure Learning¶
The score module provides scoring functions for evaluating Bayesian Networks and DAGs. It implements entropy-based and Bayesian scoring methods for both categorical and continuous (Gaussian) data.
Score Types¶
Categorical Data Scores¶
Entropy-Based Scores¶
- loglik: Log-likelihood score
- bic: Bayesian Information Criterion
- aic: Akaike Information Criterion
Bayesian Scores¶
- bde: Bayesian Dirichlet Equivalent (with hyperparameters)
- k2: K2 score
- bdj: Bayesian Dirichlet with Jeffreys prior
- bds: Bayesian Dirichlet Sparse
Gaussian Data Scores¶
- bic-g: Gaussian BIC
- bge: Bayesian Gaussian Equivalent
- loglik-g: Gaussian log-likelihood
Constants¶
Available Scores¶
ENTROPY_SCORES = ["loglik", "bic", "aic"]
BAYESIAN_SCORES = ["bde", "k2", "bdj", "bds"]
GAUSSIAN_SCORES = ["bic-g", "bge", "loglik-g"]
Score Parameters¶
SCORES = {
    "loglik": {"base"},
    "loglik-g": {"base"},
    "aic": {"base", "k"},
    "bic": {"base", "k"},
    "bic-g": {"base", "k"},
    "bge": set(),
    "bde": {"iss", "prior"},
    "bds": {"iss", "prior"},
    "bdj": set(),
    "k2": set(),
}
SCORE_PARAMS = {
    "base": "e",
    "k": 1,
    "iss": 1,
    "prior": "uniform",
    "unistate_ok": True,
}
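Each entry in SCORES lists the parameters a score type accepts, and SCORE_PARAMS supplies the default value for each parameter. The hypothetical helper below (a sketch, not the library's code; the module's own validation lives in check_score_params(), documented later) illustrates how the two fit together, assuming both constants are importable from the module:
from causaliq_data.score import SCORES, SCORE_PARAMS

# Hypothetical helper: merge user-supplied values over the documented
# defaults, keeping only the parameters the requested scores accept
def merged_params(types, user_params):
    required = set().union(*(SCORES[t] for t in types))
    merged = {p: SCORE_PARAMS[p] for p in required}
    merged.update({k: v for k, v in user_params.items() if k in required})
    return merged

merged_params(["bic", "bde"], {"iss": 5})
# -> {'base': 'e', 'k': 1, 'iss': 5, 'prior': 'uniform'} (key order may vary)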
Core Functions¶
Node Scoring¶
node_score(node, parents, types, params, data) -> Dict[str, float]¶
Computes specified score types for a single node given its parents.
Arguments:
- node: Node name to score
- parents: Dictionary mapping nodes to their parent lists
- types: Score type(s) to compute (string or list)
- params: Score parameters dictionary
- data: Data object implementing the Data interface
Returns:
- Dictionary mapping score types to computed values
categorical_node_score(node, parents, types, params, data, counts_reqd=False)¶
Returns decomposable scores for a categorical node with specified parents.
Arguments:
- node: Node name to score
- parents: Parents dictionary
- types: List of score types required
- params: Score parameters
- data: Data object
- counts_reqd: Whether to return marginal count information
Returns:
- Dictionary of scores, or tuple of (scores, counts_info) if counts_reqd=True
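For example, setting counts_reqd=True also returns the marginal count information (a sketch, assuming data_obj is a categorical Data object as in the usage examples below):
from causaliq_data.score import categorical_node_score

# Sketch: obtain scores plus marginal count information for one node
scores, counts_info = categorical_node_score(
    node="X1",
    parents={"X1": ["X0"]},
    types=["bic"],
    params={"base": "e", "k": 1},
    data=data_obj,
    counts_reqd=True,
)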
gaussian_node_score(node, parents, types, params, data) -> Dict[str, float]¶
Computes Gaussian scores for continuous nodes.
Arguments:
- node: Node name to score
- parents: Parents dictionary
- types: List of Gaussian score types
- params: Score parameters
- data: Data object
Returns:
- Dictionary mapping score types to values
DAG and Network Scoring¶
dag_score(dag, data, types, params) -> DataFrame¶
Returns per-node scores for a complete DAG given data.
Arguments:
- dag: DAG object to score
- data: Data object (not Oracle type)
- types: Score type(s) required (string or list)
- params: Score parameters dictionary
Returns:
- DataFrame with nodes as rows and score types as columns
Raises:
- TypeError: For invalid argument types
- ValueError: For invalid argument values or data/DAG mismatch
bn_score(bn, N, types, params) -> DataFrame¶
Returns per-node scores for a Bayesian Network using oracle scoring.
Arguments:
- bn: BN object to score
- N: Dataset size to assume
- types: Score type(s) required
- params: Score parameters dictionary
Returns:
- DataFrame with nodes as rows and score types as columns
Raises:
- TypeError: For invalid argument types
- ValueError: For invalid score types or non-positive N
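A sketch of oracle scoring, assuming my_bn is a BN object:
from causaliq_data.score import bn_score

# Sketch: score the BN as if N = 1000 cases had been observed
df_scores = bn_score(
    bn=my_bn,
    N=1000,
    types=["bic", "bde"],
    params={"base": "e", "iss": 1},
)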
Specialized Score Functions¶
bayesian_score(N_ijk, q_i, type, params) -> float¶
Computes Bayesian-based scores for marginal counts of a single node.
Arguments:
- N_ijk: 2D array of instance counts for node i, parental combo j, and node value k
- q_i: Maximum possible number of parental value combinations
- type: Bayesian score type (bde, k2, bdj, bds)
- params: Parameters dictionary including 'iss' (imaginary sample size)
Returns:
- Computed Bayesian score value
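For background, the standard uniform-prior BDe (BDeu) form of this score spreads the imaginary sample size 'iss' evenly over the q_i x r_i count cells and sums log-Gamma terms over parental combinations j and node values k. The sketch below implements that textbook formula and may differ in detail from this module's internals:
import numpy as np
from scipy.special import gammaln

def bde_sketch(N_ijk, q_i, iss=1):
    # N_ijk rows are parental combinations j, columns are node values k
    r_i = N_ijk.shape[1]          # number of values the node can take
    a_jk = iss / (q_i * r_i)      # uniform Dirichlet cell hyperparameter
    a_j = iss / q_i               # prior total per parental combination
    N_j = N_ijk.sum(axis=1)       # observed total per combination
    return float((gammaln(a_j) - gammaln(a_j + N_j)).sum()
                 + (gammaln(a_jk + N_ijk) - gammaln(a_jk)).sum())

bde_sketch(np.array([[10, 2], [3, 9]]), q_i=2)  # log BDeu score with iss=1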
entropy_scores(Nijk, types, params, N, free_params) -> Dict[str, float]¶
Returns entropy-based scores for marginal counts.
Arguments:
- Nijk: 2D array of instance counts
- types: Entropy score types required
- params: Parameters including logarithm 'base'
- N: Number of cases (instances)
- free_params: Number of free parameters
Returns:
- Dictionary of requested entropy scores
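For reference, these scores follow the standard definitions sketched below, using natural logarithms and penalising BIC by (k/2) * log(N) * free_params and AIC by k * free_params; the module's exact conventions may differ:
import numpy as np

def entropy_scores_sketch(Nijk, N, free_params, k=1):
    # Natural logs throughout (base='e'); a different 'base' would
    # rescale every term by 1 / log(base)
    N_j = Nijk.sum(axis=1, keepdims=True)   # totals per parental combination
    nz = Nijk > 0                           # empty cells contribute zero
    N_jk = Nijk[nz].astype(float)
    row_totals = np.broadcast_to(N_j, Nijk.shape)[nz]
    loglik = float((N_jk * np.log(N_jk / row_totals)).sum())
    return {"loglik": loglik,
            "bic": loglik - 0.5 * k * np.log(N) * free_params,
            "aic": loglik - k * free_params}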
bayesian_gaussian_score(node, parents, params, data) -> float¶
Computes Bayesian Gaussian Equivalent (BGE) score for a node.
Arguments:
- node: Node to score
- parents: Parents dictionary
- params: Score parameters
- data: Data object
Returns:
- BGE score value
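A sketch of a call, assuming continuous_data is a Gaussian Data object; per the SCORES table above, bge takes no extra parameters:
from causaliq_data.score import bayesian_gaussian_score

# Sketch: BGE score for a continuous node
bge = bayesian_gaussian_score(
    node="Y1",
    parents={"Y1": ["Y0"]},
    params={},
    data=continuous_data,
)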
entropy_gaussian_score(node, parents, params, data) -> Dict[str, float]¶
Returns entropy-based scores for Gaussian nodes.
Arguments:
- node: Node to score
- parents: Parents dictionary
- params: Score parameters
- data: Data object
Returns:
- Dictionary with 'bic-g' and 'loglik-g' scores
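A sketch of a call, again assuming continuous_data is a Gaussian Data object:
from causaliq_data.score import entropy_gaussian_score

# Sketch: returns both 'bic-g' and 'loglik-g' for the node
g_scores = entropy_gaussian_score(
    node="Y1",
    parents={"Y1": ["Y0"]},
    params={"base": "e", "k": 1},
    data=continuous_data,
)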
Utility Functions¶
check_score_params(params, scores=None) -> Dict[str, Any]¶
Validates and completes score parameters with defaults.
Arguments:
- params: Parameters dictionary to validate
- scores: Optional list of score types to validate against
Returns:
- Validated and completed parameters dictionary
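A sketch showing defaults being filled in for the bde score:
from causaliq_data.score import check_score_params

# Sketch: 'prior' is unspecified, so the documented default
# ('uniform') should be filled in alongside the supplied 'iss'
params = check_score_params({"iss": 5}, scores=["bde"])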
Raises:
- TypeError: For invalid parameter types
- ValueError: For invalid parameter values
free_params(graph, data, debug=False) -> int¶
Calculates the total number of free parameters in a graph structure.
Arguments:
- graph: Graph object (DAG or PDAG)
- data: Data object
- debug: Whether to print debug information
Returns:
- Total number of free parameters
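For a categorical node with r states whose parents have cardinalities r_p, the node contributes (r - 1) * prod(r_p) free parameters, and the graph total is the sum over nodes. A hypothetical sketch of that arithmetic (not the library's code, which works from Graph and Data objects):
from math import prod

# Hypothetical sketch: free parameters of a categorical DAG, where 'card'
# maps each node to its number of states and 'parents' to its parent list
def free_params_sketch(card, parents):
    return sum((card[n] - 1) * prod(card[p] for p in parents.get(n, []))
               for n in card)

free_params_sketch({"A": 2, "B": 3}, {"B": ["A"]})  # 1 + 2*2 = 5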
Usage Examples¶
Basic Node Scoring¶
from causaliq_data.score import node_score
# Score a node with categorical data
scores = node_score(
    node="X1",
    parents={"X1": ["X0", "X2"]},
    types=["bic", "bde"],
    params={"base": "e", "iss": 1},
    data=data_obj,
)
DAG Scoring¶
from causaliq_data.score import dag_score
# Score entire DAG
df_scores = dag_score(
    dag=my_dag,
    data=my_data,
    types=["bic", "loglik"],
    params={"base": "e"},
)
Gaussian Scoring¶
from causaliq_data.score import gaussian_node_score
# Score continuous node
gaussian_scores = gaussian_node_score(
    node="Y1",
    parents={"Y1": ["Y0"]},
    types=["bic-g", "bge"],
    params={"base": "e"},
    data=continuous_data,
)
Notes¶
- Score functions automatically determine appropriate score types based on data type
- Parameters are validated and defaults applied via check_score_params()
- The BGE implementation follows bnlearn defaults with some simplifications
- Entropy scores require sufficient sample sizes for reliable estimates
- Single-valued variables will raise errors unless unistate_ok=True