Skip to content

I/O Operations

The I/O utilities module provides enhanced file and path handling functionality, including robust path validation and DataFrame writing with advanced formatting options.

Overview

The causaliq_core.utils.io module provides:

  • Path Validation: Robust checking of file and directory paths
  • Enhanced DataFrame Writing: CSV output with compression and numerical formatting
  • Cross-Platform Support: Consistent behavior across operating systems
  • Error Handling: Clear error messages for common I/O issues

Functions

is_valid_path()

Validates that a path exists and matches the expected type (file or directory).

from causaliq_core.utils.io import is_valid_path

# Check if file exists
if is_valid_path('data/network.dsc', is_file=True):
    print("File exists and is accessible")

# Check if directory exists  
if is_valid_path('output/', is_file=False):
    print("Directory exists and is accessible")

# Default behavior checks for file
try:
    is_valid_path('important_file.txt')
    print("File is valid")
except FileNotFoundError:
    print("File not found")

Parameters:

  • path (str): Full path to validate
  • is_file (bool): Whether path should be a file (True) or directory (False)

Returns:

  • bool: True if path exists and matches expected type

Raises:

  • TypeError: If arguments have invalid types
  • FileNotFoundError: If path doesn't exist or doesn't match expected type

write_dataframe()

Enhanced DataFrame writing with numerical formatting, compression, and validation.

from causaliq_core.utils.io import write_dataframe
import pandas as pd

# Create sample data
df = pd.DataFrame({
    'measurement': [1.234567, 2.789012, 3.456789],
    'category': ['A', 'B', 'C'],
    'value': [10.123456789, 20.987654321, 30.555555555]
})

# Basic usage
write_dataframe(df, 'output.csv')

# With numerical formatting (3 significant figures)
write_dataframe(df, 'formatted.csv', sf=3)

# With compression
write_dataframe(df, 'compressed.csv.gz', compress=True)

# Preserve original DataFrame (default)
write_dataframe(df, 'output.csv', preserve=True)

# Modify DataFrame in-place (faster for large data)
write_dataframe(df, 'output.csv', preserve=False)

# Custom zero threshold
write_dataframe(df, 'output.csv', sf=4, zero=1e-6)

Parameters:

  • df (DataFrame): Pandas DataFrame to write
  • filename (str): Output file path
  • compress (bool): Whether to gzip compress the output (default: False)
  • sf (int): Number of significant figures for numerical formatting (default: 10)
  • zero (float, optional): Values below this threshold are treated as zero (default: 10^(-sf))
  • preserve (bool): Whether to preserve original DataFrame unchanged (default: True)

Returns:

  • None

Raises:

  • TypeError: If arguments have invalid types
  • ValueError: If sf or zero parameters are out of valid ranges
  • FileNotFoundError: If destination directory doesn't exist

Features

Numerical Formatting

The write_dataframe() function provides sophisticated numerical formatting:

import pandas as pd
from causaliq_core.utils.io import write_dataframe

# Data with varying precision
df = pd.DataFrame({
    'high_precision': [1.23456789012345, 2.98765432109876],
    'low_precision': [1.2, 3.4],
    'scientific': [1.23e-8, 4.56e12]
})

# Format to 3 significant figures
write_dataframe(df, 'formatted.csv', sf=3)
# Results: 1.23, 2.99, 1.20, 3.40, 1.23e-08, 4.56e+12

Compression Support

Automatic compression for large datasets:

# Large dataset
large_df = pd.DataFrame({
    'data': range(100000),
    'values': [random.random() for _ in range(100000)]
})

# Compressed output (much smaller file size)
write_dataframe(large_df, 'large_data.csv.gz', compress=True)

Memory Efficiency

Control memory usage with the preserve parameter:

# For large DataFrames, avoid copying
write_dataframe(huge_df, 'output.csv', preserve=False)
# Original DataFrame may be modified for efficiency

# For small DataFrames, preserve original
write_dataframe(small_df, 'output.csv', preserve=True)  # Default
# Original DataFrame remains unchanged

Error Handling

The I/O utilities provide comprehensive error handling:

from causaliq_core.utils.io import write_dataframe, is_valid_path

# Handle path validation errors
try:
    is_valid_path('nonexistent/path.txt')
except FileNotFoundError as e:
    print(f"Path error: {e}")

# Handle DataFrame writing errors
try:
    write_dataframe(df, '/invalid/path/output.csv')
except FileNotFoundError:
    print("Destination directory doesn't exist")

try:
    write_dataframe(df, 'output.csv', sf=50)  # Invalid sf
except ValueError as e:
    print(f"Parameter error: {e}")

Usage Patterns

Data Pipeline Integration

from causaliq_core.utils.io import write_dataframe, is_valid_path
import pandas as pd
from pathlib import Path

def save_analysis_results(df, output_dir, filename, compress_large=True):
    """Save analysis results with appropriate formatting."""

    # Ensure output directory exists
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)

    # Full output path
    full_path = output_path / filename

    # Determine if compression is needed
    should_compress = compress_large and len(df) > 10000
    if should_compress:
        full_path = full_path.with_suffix('.csv.gz')

    # Write with appropriate settings
    write_dataframe(
        df, 
        str(full_path),
        compress=should_compress,
        sf=4,  # 4 significant figures for analysis data
        preserve=True  # Keep original data unchanged
    )

    print(f"Results saved to {full_path}")
    return full_path

Validation Workflow

from causaliq_core.utils.io import is_valid_path

def validate_inputs(file_paths):
    """Validate all required input files exist."""

    missing_files = []
    for path in file_paths:
        try:
            is_valid_path(path, is_file=True)
        except FileNotFoundError:
            missing_files.append(path)

    if missing_files:
        raise FileNotFoundError(f"Missing required files: {missing_files}")

    print("All input files validated successfully")

API Reference

io

IO-related utilities for file and path handling.

Classes:

  • FileFormatError

    Exception raised when a file format is invalid or unsupported.

Functions:

  • is_valid_path

    Check if path is a string and it exists.

  • write_dataframe

    Write DataFrame to CSV with numeric rounding and compression options.

FileFormatError

Exception raised when a file format is invalid or unsupported.

is_valid_path

is_valid_path(path: str, is_file: bool = True) -> bool

Check if path is a string and it exists.

Parameters:

  • path
    (str) –

    Full path name of file or directory.

  • is_file
    (bool, default: True ) –

    Should path be a file (otherwise a directory).

Returns:

  • bool

    True if path is valid and exists.

Raises:

  • TypeError

    If arguments have bad types.

  • FileNotFoundError

    If path is not found.

write_dataframe

write_dataframe(
    df: DataFrame,
    filename: str,
    compress: bool = False,
    sf: int = 10,
    zero: Optional[float] = None,
    preserve: bool = True,
) -> None

Write DataFrame to CSV with numeric rounding and compression options.

Parameters:

  • df
    (DataFrame) –

    DataFrame to write.

  • filename
    (str) –

    Full path of output file.

  • compress
    (bool, default: False ) –

    Whether to gzip compress the file.

  • sf
    (int, default: 10 ) –

    Number of significant figures to retain for numeric values.

  • zero
    (Optional[float], default: None ) –

    Absolute values below this counted as zero.

  • preserve
    (bool, default: True ) –

    Whether df is left unchanged (True conserves original).

Raises:

  • TypeError

    If argument types incorrect.

  • ValueError

    If sf or zero parameters are invalid.

  • FileNotFoundError

    If destination folder does not exist.