I/O Operations
The I/O utilities module provides file and path handling helpers, including path validation and writing DataFrames to CSV with numerical rounding and optional gzip compression.
Overview
The causaliq_core.utils.io module provides:
- Path Validation: Checking that a path exists and is a file or directory as expected
- Enhanced DataFrame Writing: CSV output with compression and numerical formatting
- Cross-Platform Support: Consistent behavior across operating systems
- Error Handling: Clear error messages for common I/O issues
Functions
is_valid_path()
Validates that a path exists and matches the expected type (file or directory).
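from causaliq_core.utils.io import is_valid_path
# is_valid_path() returns True or raises FileNotFoundError, so wrap
# calls in try/except instead of testing the return value
# Check if file exists
try:
    is_valid_path('data/network.dsc', is_file=True)
    print("File exists and is accessible")
except FileNotFoundError:
    print("File not found")
# Check if directory exists
try:
    is_valid_path('output/', is_file=False)
    print("Directory exists and is accessible")
except FileNotFoundError:
    print("Directory not found")
# Default behavior (is_file=True) checks for a file
try:
    is_valid_path('important_file.txt')
    print("File is valid")
except FileNotFoundError:
    print("File not found")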
Parameters:
- path (str): Full path to validate
- is_file (bool): Whether path should be a file (True) or directory (False) (default: True)
Returns:
- bool: True if path exists and matches expected type
Raises:
- TypeError: If arguments have invalid types
- FileNotFoundError: If path doesn't exist or doesn't match expected type
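The documented contract (return True, otherwise raise) can be approximated with the standard library. The sketch below is a hypothetical stand-in named `check_path`. It assumes the checks map onto `os.path.isfile` and `os.path.isdir`, and it is not the library's source.

```python
import os


def check_path(path, is_file=True):
    """Hypothetical stand-in for is_valid_path(); not the library code.

    Returns True when `path` exists and is the expected kind of entry,
    otherwise raises, mirroring the documented contract.
    """
    # Reject arguments of the wrong type up front
    if not isinstance(path, str) or not isinstance(is_file, bool):
        raise TypeError("check_path: bad argument types")
    # A directory fails a file check and vice versa
    exists = os.path.isfile(path) if is_file else os.path.isdir(path)
    if not exists:
        kind = "file" if is_file else "directory"
        raise FileNotFoundError(f"{kind} not found: {path}")
    return True
```

Because failure is signalled by an exception rather than a False return, callers use try/except rather than an `if` test.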
write_dataframe()
Writes a DataFrame to CSV with significant-figure rounding of numeric values, optional gzip compression, and argument validation.
from causaliq_core.utils.io import write_dataframe
import pandas as pd
# Create sample data
df = pd.DataFrame({
    'measurement': [1.234567, 2.789012, 3.456789],
    'category': ['A', 'B', 'C'],
    'value': [10.123456789, 20.987654321, 30.555555555]
})
# Basic usage
write_dataframe(df, 'output.csv')
# With numerical formatting (3 significant figures)
write_dataframe(df, 'formatted.csv', sf=3)
# With compression
write_dataframe(df, 'compressed.csv.gz', compress=True)
# Preserve original DataFrame (default)
write_dataframe(df, 'output.csv', preserve=True)
# Modify DataFrame in-place (faster for large data)
write_dataframe(df, 'output.csv', preserve=False)
# Custom zero threshold
write_dataframe(df, 'output.csv', sf=4, zero=1e-6)
Parameters:
- df (DataFrame): Pandas DataFrame to write
- filename (str): Output file path
- compress (bool): Whether to gzip compress the output (default: False)
- sf (int): Number of significant figures for numerical formatting (default: 10)
- zero (float, optional): Absolute values below this threshold are written as zero (default: 10^(-sf))
- preserve (bool): Whether to leave the original DataFrame unchanged (default: True)
Returns:
- None
Raises:
- TypeError: If arguments have invalid types
- ValueError: If sf or zero parameters are out of valid ranges
- FileNotFoundError: If destination directory doesn't exist
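For comparison, a broadly similar file can be written with plain pandas, whose `DataFrame.to_csv` accepts `float_format` and `compression` arguments. `write_rounded_csv` below is a hypothetical approximation, not the library's implementation. It ignores the `zero` threshold and the `preserve` option, and `index=False` is an assumption about the output layout.

```python
import pandas as pd


def write_rounded_csv(df, filename, sf=10, compress=False):
    """Hypothetical plain-pandas approximation of write_dataframe()."""
    df.to_csv(
        filename,
        index=False,  # assumption: no index column in the output
        float_format=f"%.{sf}g",  # sf significant figures for float columns
        compression="gzip" if compress else None,
    )
```

Here `to_csv` only formats the text it writes, so the caller's DataFrame is never modified.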
Features
Numerical Formatting
The write_dataframe() function rounds numeric values to sf significant figures before writing, and writes absolute values below the zero threshold as zero:
import pandas as pd
from causaliq_core.utils.io import write_dataframe
# Data with varying precision
df = pd.DataFrame({
    'high_precision': [1.23456789012345, 2.98765432109876],
    'low_precision': [1.2, 3.4],
    'scientific': [1.23e-8, 4.56e12]
})
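# Format to 3 significant figures; an explicit zero threshold stops
# 1.23e-8 being written as 0 (the default threshold is 10^(-3) here)
write_dataframe(df, 'formatted.csv', sf=3, zero=1e-10)
# Results: 1.23, 2.99, 1.20, 3.40, 1.23e-08, 4.56e+12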
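The rounding rule can be sketched in pure Python. `round_sf` is a hypothetical helper. It assumes that absolute values below `zero` become 0, using the default of 10^(-sf) documented for `write_dataframe()`. All other values are rounded to `sf` significant figures with Python's `g` format. The library's exact rule may differ.

```python
def round_sf(x, sf=10, zero=None):
    """Round x to sf significant figures (hypothetical helper)."""
    # Documented default threshold: 10^(-sf)
    if zero is None:
        zero = 10.0 ** (-sf)
    if abs(x) < zero:
        return 0.0
    # The 'g' format keeps sf significant digits, switching to
    # exponent notation for very large or very small magnitudes
    return float(f"{x:.{sf}g}")
```

Under this assumed rule, a tiny value such as 1.23e-8 is written as 0 when `sf=3`, unless `zero` is set below it.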
Compression Support
Setting compress=True gzip-compresses the output, which is useful for large datasets:
import random
import pandas as pd
from causaliq_core.utils.io import write_dataframe
# Large dataset
large_df = pd.DataFrame({
    'data': range(100000),
    'values': [random.random() for _ in range(100000)]
})
# Compressed output (typically much smaller than the plain CSV)
write_dataframe(large_df, 'large_data.csv.gz', compress=True)
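The compressed file can be read back without pandas using the standard library. This assumes that `compress=True` produces a standard gzip stream, since the documented behavior is to "gzip compress the output". `read_gzip_csv` is a hypothetical helper. `pd.read_csv` can also read `.gz` files directly.

```python
import csv
import gzip


def read_gzip_csv(filename):
    """Return the rows of a gzip-compressed CSV as a list of dicts."""
    # 'rt' opens the decompressed stream in text mode; newline='' is
    # what the csv module expects for correct line handling
    with gzip.open(filename, "rt", newline="") as fh:
        return list(csv.DictReader(fh))
```

All values come back as strings. Convert them explicitly (for example with `float()`) where numbers are needed.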
Memory Efficiency
Control memory usage with the preserve parameter:
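import pandas as pd
from causaliq_core.utils.io import write_dataframe
huge_df = pd.DataFrame({'x': [i / 3 for i in range(1_000_000)]})
small_df = pd.DataFrame({'x': [1 / 3, 2 / 3]})
# For large DataFrames, avoid copying
write_dataframe(huge_df, 'output.csv', preserve=False)
# Original DataFrame may be modified for efficiency
# For small DataFrames, preserve original
write_dataframe(small_df, 'output.csv', preserve=True)  # Default
# Original DataFrame remains unchanged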
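The trade-off behind `preserve` is the usual copy-versus-mutate choice. The plain-pandas sketch below shows it under one assumption: that `write_dataframe` rounds numeric columns before writing and, with `preserve=False`, does so on the caller's DataFrame. `round_numeric_columns` is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_float_dtype


def round_numeric_columns(df, sf=10, preserve=True):
    """Round float columns to sf significant figures (hypothetical)."""
    # preserve=True pays for one copy so the caller's frame is untouched;
    # preserve=False rewrites the columns of the frame that was passed in
    out = df.copy() if preserve else df
    for col in out.columns:
        if is_float_dtype(out[col]):
            out[col] = out[col].map(lambda x: float(f"{x:.{sf}g}"))
    return out
```

With `preserve=False` the returned object is the caller's own DataFrame, so its float columns are already rounded afterwards.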
Error Handling
The I/O utilities report problems by raising standard Python exceptions (TypeError, ValueError, FileNotFoundError) that callers can catch:
import pandas as pd
from causaliq_core.utils.io import write_dataframe, is_valid_path
df = pd.DataFrame({'value': [1.23456, 7.891011]})
# Handle path validation errors
try:
    is_valid_path('nonexistent/path.txt')
except FileNotFoundError as e:
    print(f"Path error: {e}")
# Handle DataFrame writing errors
try:
    write_dataframe(df, '/invalid/path/output.csv')
except FileNotFoundError:
    print("Destination directory doesn't exist")
try:
    write_dataframe(df, 'output.csv', sf=50)  # Invalid sf
except ValueError as e:
    print(f"Parameter error: {e}")
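A missing destination folder raises `FileNotFoundError`, so a pipeline can check the folder, or create it, before writing. `ensure_parent_dir` is a hypothetical helper built on `pathlib`. It is not part of `causaliq_core`.

```python
from pathlib import Path


def ensure_parent_dir(filename, create=False):
    """Check (or create) the folder that filename will be written into."""
    parent = Path(filename).resolve().parent
    if not parent.is_dir():
        if not create:
            raise FileNotFoundError(f"destination folder does not exist: {parent}")
        # parents=True also creates any missing intermediate folders
        parent.mkdir(parents=True, exist_ok=True)
    return parent
```

Calling it with `create=False` before `write_dataframe` surfaces the same error earlier. With `create=True` the write can then proceed.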
Usage Patterns
Data Pipeline Integration
from causaliq_core.utils.io import write_dataframe
import pandas as pd
from pathlib import Path
def save_analysis_results(df, output_dir, filename, compress_large=True):
    """Save analysis results with appropriate formatting."""
    # Ensure output directory exists
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    # Full output path
    full_path = output_path / filename
    # Determine if compression is needed
    should_compress = compress_large and len(df) > 10000
    if should_compress:
        full_path = full_path.with_suffix('.csv.gz')
    # Write with appropriate settings
    write_dataframe(
        df,
        str(full_path),
        compress=should_compress,
        sf=4,  # 4 significant figures for analysis data
        preserve=True  # Keep original data unchanged
    )
    print(f"Results saved to {full_path}")
    return full_path
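The path and compression decision in `save_analysis_results` is pure logic. It can be factored out and unit-tested without touching the file system. `plan_output` is a hypothetical refactoring that uses the same 10,000-row threshold as the function above.

```python
from pathlib import Path


def plan_output(output_dir, filename, n_rows, compress_large=True,
                threshold=10000):
    """Return (path, compress) for a result file, mirroring the logic above."""
    compress = compress_large and n_rows > threshold
    path = Path(output_dir) / filename
    if compress:
        # with_suffix replaces the last suffix, so 'results.csv'
        # becomes 'results.csv.gz'
        path = path.with_suffix(".csv.gz")
    return path, compress
```

`save_analysis_results` could then call `plan_output(output_dir, filename, len(df), compress_large)` and keep only the directory creation and the write.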
Validation Workflow
from causaliq_core.utils.io import is_valid_path
def validate_inputs(file_paths):
    """Validate that all required input files exist."""
    missing_files = []
    for path in file_paths:
        try:
            is_valid_path(path, is_file=True)
        except FileNotFoundError:
            missing_files.append(path)
    if missing_files:
        raise FileNotFoundError(f"Missing required files: {missing_files}")
    print("All input files validated successfully")
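When only existence matters, the same check can be written without handling an exception for each file. The sketch below uses pathlib. `find_missing_files` and `validate_inputs_simple` are hypothetical names, and `Path.is_file()` stands in for `is_valid_path(path, is_file=True)`.

```python
from pathlib import Path


def find_missing_files(file_paths):
    """Return the paths that do not point to an existing regular file."""
    return [p for p in file_paths if not Path(p).is_file()]


def validate_inputs_simple(file_paths):
    """Raise FileNotFoundError listing every missing input file."""
    missing = find_missing_files(file_paths)
    if missing:
        raise FileNotFoundError(f"Missing required files: {missing}")
```

A directory path counts as missing here, which matches the file-only check in the original workflow.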
API Reference
io
IO-related utilities for file and path handling.
Classes:
- FileFormatError: Exception raised when a file format is invalid or unsupported.
Functions:
- is_valid_path: Check if path is a string and it exists.
- write_dataframe: Write DataFrame to CSV with numeric rounding and compression options.
FileFormatError
Exception raised when a file format is invalid or unsupported.
is_valid_path
Check if path is a string and it exists.
Parameters:
- path (str): Full path name of file or directory.
- is_file (bool, default: True): Whether path should be a file (otherwise a directory).
Returns:
- bool: True if path is valid and exists.
Raises:
- TypeError: If arguments have bad types.
- FileNotFoundError: If path is not found or is not of the expected type.
write_dataframe
write_dataframe(
    df: DataFrame,
    filename: str,
    compress: bool = False,
    sf: int = 10,
    zero: Optional[float] = None,
    preserve: bool = True,
) -> None
Write DataFrame to CSV with numeric rounding and compression options.
Parameters:
- df (DataFrame): DataFrame to write.
- filename (str): Full path of output file.
- compress (bool, default: False): Whether to gzip compress the file.
- sf (int, default: 10): Number of significant figures to retain for numeric values.
- zero (Optional[float], default: None): Absolute values below this are counted as zero (None means 10^(-sf)).
- preserve (bool, default: True): Whether df is left unchanged (True leaves the original unmodified).
Raises:
- TypeError: If argument types are incorrect.
- ValueError: If sf or zero parameters are invalid.
- FileNotFoundError: If destination folder does not exist.