Data - Abstract Base Class¶
The Data class serves as the abstract base class for all data adapters in CausalIQ Data. It extends the BNFit interface from causaliq-core and provides the foundation for the plug-in data adapter architecture.
Class Definition¶
class Data(BNFit):
    """Top level data object that implements BNFit interface.

    Extends BNFit interface with additional methods needed for
    causal structure learning algorithms.
    """
Key Attributes¶
- elapsed: Elapsed time for operations
- order: Order in which nodes should be processed
- ext_to_orig: Mapping from external to original node names
- orig_to_ext: Mapping from original to external node names
- dstype: Overall dataset type (categorical/continuous/mixed)
Core Methods¶
Node Order Management¶
set_order(order: Tuple[str, ...]) -> None¶
Sets the processing order of nodes to the specified sequence.
Arguments:
- order: New processing order as a tuple of external node names
Raises:
- TypeError: For invalid argument types
- ValueError: If order contains invalid node names
get_order() -> Tuple[str, ...]¶
Returns the current processing order using external node names.
randomise_order(seed: int) -> None¶
Randomizes the processing order of nodes using the specified seed.
Arguments:
- seed: Randomization seed (must be non-negative)
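The order-management behaviour described above can be sketched with a small stand-in class. This is an illustration only, not the CausalIQ Data implementation: `OrderDemo` is a hypothetical name, the seeded `random.Random(seed).shuffle` call is an assumption about how the shuffle might be done, and the sketch assumes that a valid order is a permutation of all node names and that a negative seed raises ValueError.

```python
import random
from typing import Tuple


class OrderDemo:
    """Hypothetical stand-in mimicking the documented order methods."""

    def __init__(self, nodes: Tuple[str, ...]) -> None:
        self.nodes = tuple(nodes)
        self.order = tuple(nodes)

    def set_order(self, order: Tuple[str, ...]) -> None:
        # TypeError for invalid argument types, as documented
        if not isinstance(order, tuple) or not all(
            isinstance(n, str) for n in order
        ):
            raise TypeError("order must be a tuple of strings")
        # Assumption: a valid order is a permutation of the node names
        if sorted(order) != sorted(self.nodes):
            raise ValueError("order contains invalid node names")
        self.order = order

    def get_order(self) -> Tuple[str, ...]:
        return self.order

    def randomise_order(self, seed: int) -> None:
        if not isinstance(seed, int) or isinstance(seed, bool):
            raise TypeError("seed must be an int")
        # Assumption: the exception type for a negative seed
        if seed < 0:
            raise ValueError("seed must be non-negative")
        shuffled = list(self.nodes)
        # Assumption: a seeded shuffle; the real RNG may differ
        random.Random(seed).shuffle(shuffled)
        self.order = tuple(shuffled)


demo = OrderDemo(("A", "B", "C", "D"))
demo.set_order(("D", "C", "B", "A"))

demo.randomise_order(seed=7)
first = demo.get_order()
demo.randomise_order(seed=7)
second = demo.get_order()  # the same seed reproduces the same order
```

Shuffling a fresh copy of the node names, rather than the current order, is what makes the result depend only on the seed in this sketch.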
Name Randomisation¶
randomise_names(seed: Optional[int]) -> None¶
Randomizes node names for sensitivity testing. When seed=None, reverts to original names.
Arguments:
- seed: Randomization seed or None to revert
Abstract Methods¶
The following methods must be implemented by concrete subclasses:
Sample Management¶
- set_N(N, seed, random_selection): Set working sample size with optional randomization
- _update_sample(old_N, old_ext_to_orig): Update sample after parameter changes
Data Access¶
- marginals(node, parents, values_reqd): Get marginal distributions
- values(nodes): Return values for specified nodes
- as_df(): Convert to pandas DataFrame representation
Persistence¶
- write(filename): Write data to file
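To illustrate what "must be implemented by concrete subclasses" means in practice, here is a sketch using Python's standard `abc` module. Whether Data itself relies on `abc.ABC` is an assumption. `DemoBase` and `ListData` are hypothetical names, only two of the abstract methods are shown, and the return type chosen for `values` (a list of row tuples) is illustrative rather than what the library returns.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class DemoBase(ABC):
    """Hypothetical base declaring two of the documented abstract methods."""

    @abstractmethod
    def values(self, nodes: Tuple[str, ...]) -> List[tuple]:
        """Return the rows of values for the specified nodes."""

    @abstractmethod
    def write(self, filename: str) -> None:
        """Write the data to a file."""


class ListData(DemoBase):
    """Hypothetical adapter backed by a dict of column lists."""

    def __init__(self, columns: Dict[str, list]) -> None:
        self.columns = columns

    def values(self, nodes: Tuple[str, ...]) -> List[tuple]:
        # Zip the requested columns into one tuple per sample row
        return list(zip(*(self.columns[n] for n in nodes)))

    def write(self, filename: str) -> None:
        # Write a simple CSV: header row, then one line per sample
        names = tuple(self.columns)
        with open(filename, "w", encoding="utf-8") as f:
            f.write(",".join(names) + "\n")
            for row in self.values(names):
                f.write(",".join(str(v) for v in row) + "\n")


data = ListData({"A": [0, 1, 1], "B": ["x", "y", "x"]})
rows = data.values(("B", "A"))

try:
    DemoBase()  # a class with unimplemented abstract methods
    instantiated = True
except TypeError:
    instantiated = False  # instantiation is refused
```

With this pattern a subclass that forgets one of the abstract methods fails at construction time instead of later, when the missing method is first called.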
Properties (from BNFit)¶
Node Information¶
- nodes: Internal (original) node names
- node_types: Node type mapping (e.g., {node1: type1, ...})
- node_values: Value counts for categorical nodes
Sample Information¶
- N: Current sample size being used
- sample: Access to underlying data sample
Type System¶
DatasetType Enumeration¶
class DatasetType(StrEnum):
    CATEGORICAL = "categorical"  # All categorical variables
    CONTINUOUS = "continuous"    # All float variables
    MIXED = "mixed"              # Mixed categorical and numeric
VariableType Enumeration¶
class VariableType(StrEnum):
    INT16 = "int16"
    INT32 = "int32"
    INT64 = "int64"
    FLOAT32 = "float32"
    FLOAT64 = "float64"
    CATEGORY = "category"
Internal Methods¶
_set_dstype() -> None¶
Determines the overall dataset type from individual node types. Sets dstype to:
- CATEGORICAL: If all variables are categorical
- CONTINUOUS: If all variables are numeric
- MIXED: If variables are mixed types
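The classification rules above can be sketched as a small standalone function. This is a hedged illustration, not the library's code: `infer_dstype` is a hypothetical helper, the enum is re-declared with `(str, Enum)` as a portable stand-in for StrEnum, every type other than "category" is treated as numeric, and rejecting an empty mapping with ValueError is an assumption.

```python
from enum import Enum
from typing import Dict


class DatasetType(str, Enum):
    """Stand-in for the documented enumeration."""

    CATEGORICAL = "categorical"
    CONTINUOUS = "continuous"
    MIXED = "mixed"


def infer_dstype(node_types: Dict[str, str]) -> DatasetType:
    """Hypothetical helper mirroring the rules listed above."""
    if not node_types:
        # Assumption: a dataset without nodes has no meaningful type
        raise ValueError("node_types must not be empty")

    # Collapse each variable type into one of two kinds
    kinds = {
        "categorical" if t == "category" else "numeric"
        for t in node_types.values()
    }
    if kinds == {"categorical"}:
        return DatasetType.CATEGORICAL
    if kinds == {"numeric"}:
        return DatasetType.CONTINUOUS
    return DatasetType.MIXED


all_cat = infer_dstype({"A": "category", "B": "category"})
all_num = infer_dstype({"A": "float32", "B": "int16"})
mixed = infer_dstype({"A": "category", "B": "float64"})
```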
_generate_random_names(seed: Optional[int]) -> None¶
Generates randomized external names for nodes using the format X###NNNNNN where:
- ### is a zero-padded random integer
- NNNNNN is the first 6 characters of the original name
When seed=None, reverts mappings back to original names.
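The naming scheme can be illustrated with a short standalone function. This is a sketch under stated assumptions rather than the actual `_generate_random_names` logic: `random_names` is a hypothetical helper, the three-digit padding is inferred from the "###" in the format, and drawing unique integers by shuffling the node indices with `random.Random(seed)` is an assumption.

```python
import random
from typing import Dict, Optional, Tuple


def random_names(
    nodes: Tuple[str, ...], seed: Optional[int]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (ext_to_orig, orig_to_ext) mappings for the given nodes."""
    if seed is None:
        # Revert: every node maps to itself in both directions
        identity = {n: n for n in nodes}
        return identity, dict(identity)

    # Assumption: unique integers obtained by shuffling node indices
    numbers = list(range(len(nodes)))
    random.Random(seed).shuffle(numbers)

    # Assumption: three-digit zero padding, then the first 6 characters
    orig_to_ext = {
        node: f"X{num:03d}{node[:6]}"
        for node, num in zip(nodes, numbers)
    }
    # The reverse mapping is the exact inverse of the forward one
    ext_to_orig = {ext: orig for orig, ext in orig_to_ext.items()}
    return ext_to_orig, orig_to_ext


nodes = ("smoking", "cancer", "xray")
ext_to_orig, orig_to_ext = random_names(nodes, seed=42)
reverted_ext, reverted_orig = random_names(nodes, seed=None)
```

Using distinct integers keeps the external names unique even when two original names share their first six characters.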
Usage Patterns¶
Basic Inheritance¶
class CustomData(Data):
    def __init__(self, source):
        super().__init__()
        # Initialize from source

    def set_N(self, N, seed=None, random_selection=False):
        # Implement sample size setting
        ...

    def marginals(self, node, parents, values_reqd=False):
        # Implement marginal computation
        ...

    # ... implement other abstract methods
Name Randomization Workflow¶
# Run algorithm with original names to obtain a baseline
# (run_algorithm and compare_sensitivity are placeholders)
results_original = run_algorithm(data)

# Randomize names for sensitivity testing
data.randomise_names(seed=42)

# Run algorithm with randomized names
results_randomized = run_algorithm(data)

# Restore original names
data.randomise_names(seed=None)

# Compare results
compare_sensitivity(results_original, results_randomized)
Integration with BNFit¶
Because the Data class implements the BNFit interface, concrete data adapters can be used wherever causaliq-core components expect a BNFit object:
- Parameter Estimation: Compatible with BN parameter fitting algorithms
- Score Calculation: Provides necessary marginals for score-based learning
- Constraint Testing: Supports conditional independence testing workflows