Cache Module (Core)

SQLite-backed caching infrastructure with shared token dictionary for efficient storage.

Future Migration

This module provides core caching capability that will migrate to causaliq-core. LLM-specific code (LLMEntryEncoder) stays in causaliq-knowledge.

Overview

The cache module provides:

  • TokenCache - SQLite-backed cache with connection management
  • EntryEncoder - Abstract base class for pluggable type-specific encoders
  • JsonEncoder - Tokenised encoder for JSON-serialisable data (50-70% compression)

Component      Migration Target
TokenCache     causaliq-core
EntryEncoder   causaliq-core
JsonEncoder    causaliq-core

For LLM-specific caching, see LLM Cache which provides LLMEntryEncoder with structured data types for requests and responses.

Design Philosophy

The cache uses SQLite for storage, providing:

  • Fast indexed key lookup
  • Built-in concurrency via SQLite locking
  • In-memory mode via :memory: for testing
  • Incremental updates without rewriting the whole file

See Caching Architecture for full design details.

Usage

Basic In-Memory Cache

from causaliq_knowledge.cache import TokenCache

# In-memory cache (fast, non-persistent)
with TokenCache(":memory:") as cache:
    assert cache.table_exists("tokens")
    assert cache.table_exists("cache_entries")

File-Based Persistent Cache

from causaliq_knowledge.cache import TokenCache

# File-based cache (persistent)
with TokenCache("my_cache.db") as cache:
    # Data persists across sessions
    print(f"Entries: {cache.entry_count()}")
    print(f"Tokens: {cache.token_count()}")

Transaction Support

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    # Transactions auto-commit on success, rollback on exception
    with cache.transaction() as cursor:
        cursor.execute("INSERT INTO tokens (token) VALUES (?)", ("example",))

Token Dictionary

The cache maintains a shared token dictionary for cross-entry compression. Encoders use this to convert strings to compact integer IDs:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    # Get or create token IDs (used by encoders)
    id1 = cache.get_or_create_token("hello")  # Returns 1
    id2 = cache.get_or_create_token("world")  # Returns 2
    id1_again = cache.get_or_create_token("hello")  # Returns 1 (cached)

    # Look up token by ID (used by decoders)
    token = cache.get_token(1)  # Returns "hello"

Storing and Retrieving Entries

Cache entries are stored as binary blobs with a hash key and entry type:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    # Store an entry
    cache.put("abc123", "llm", b"response data")

    # Check if entry exists
    if cache.exists("abc123", "llm"):
        # Retrieve entry
        data = cache.get("abc123", "llm")  # Returns b"response data"

    # Store with metadata
    cache.put("def456", "llm", b"data", metadata=b"extra info")
    result = cache.get_with_metadata("def456", "llm")
    # result = (b"data", b"extra info")

    # Delete entry
    cache.delete("abc123", "llm")

Auto-Encoding with Registered Encoders

Register an encoder to automatically encode/decode entries:

from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder

with TokenCache(":memory:") as cache:
    # Register encoder for "json" entry type
    cache.register_encoder("json", JsonEncoder())

    # Store data (auto-encoded)
    cache.put_data("hash1", "json", {"role": "user", "content": "Hello"})

    # Retrieve data (auto-decoded)
    data = cache.get_data("hash1", "json")
    # data = {"role": "user", "content": "Hello"}

    # Store with metadata
    cache.put_data("hash2", "json", 
                   {"response": "Hi!"}, 
                   metadata={"latency_ms": 150})
    result = cache.get_data_with_metadata("hash2", "json")
    # result = ({"response": "Hi!"}, {"latency_ms": 150})

Exporting and Importing Entries

Export cache entries to files for backup, migration, or sharing. Import entries from files into a cache:

from pathlib import Path
from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder

# Export entries to directory
with TokenCache("my_cache.db") as cache:
    cache.register_encoder("json", JsonEncoder())

    # Export all entries of type "json" to directory
    # Creates one file per entry: {hash}.json
    count = cache.export_entries(Path("./export"), "json")
    print(f"Exported {count} entries")

# Import entries from directory
with TokenCache("new_cache.db") as cache:
    cache.register_encoder("json", JsonEncoder())

    # Import all .json files from directory
    # Uses filename (without extension) as hash key
    count = cache.import_entries(Path("./export"), "json")
    print(f"Imported {count} entries")

Export behaviour:

  • Creates output directory if it doesn't exist
  • Writes each entry to {hash}.{ext} (e.g., abc123.json)
  • Uses encoder's export() method for human-readable format
  • Returns count of exported entries

Import behaviour:

  • Reads all files in directory (skips subdirectories)
  • Uses filename stem as hash key (e.g., abc123.json → key abc123)
  • Uses encoder's import_() method to parse content
  • Returns count of imported entries
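
The behaviours above compose into a simple round trip. A minimal sketch, assuming ./export contains only the exported files:

from pathlib import Path
from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder

# Export from one cache, then import into a fresh one
with TokenCache("my_cache.db") as source:
    source.register_encoder("json", JsonEncoder())
    exported = source.export_entries(Path("./export"), "json")

with TokenCache(":memory:") as target:
    target.register_encoder("json", JsonEncoder())
    imported = target.import_entries(Path("./export"), "json")

assert imported == exported  # every exported file is re-imported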

API Reference

TokenCache

TokenCache(db_path: str | Path)

SQLite-backed cache with shared token dictionary.

Attributes:

  • db_path

    Path to SQLite database file, or ":memory:" for in-memory.

  • conn (Connection) –

    SQLite connection (None until open() called or context entered).

Example

with TokenCache(":memory:") as cache: ... cache.put("abc123", "test", b"hello") ... data = cache.get("abc123", "test")

Parameters:

  • db_path

    (str | Path) –

    Path to SQLite database file. Use ":memory:" for in-memory database (fast, non-persistent).

Methods:

  • open

    Open the database connection and initialise schema.

  • close

    Close the database connection.

  • transaction

    Context manager for a database transaction.

  • table_exists

    Check if a table exists in the database.

  • entry_count

    Count cache entries, optionally filtered by type.

  • token_count

    Count tokens in the dictionary.

  • get_or_create_token

    Get token ID, creating a new entry if needed.

  • get_token

    Get token string by ID.

  • register_encoder

    Register an encoder for a specific entry type.

  • get_encoder

    Get the registered encoder for an entry type.

  • has_encoder

    Check if an encoder is registered for an entry type.

  • put

    Store a cache entry.

  • get

    Retrieve a cache entry.

  • get_with_metadata

    Retrieve a cache entry with its metadata.

  • exists

    Check if a cache entry exists.

  • delete

    Delete a cache entry.

  • put_data

    Store data using the registered encoder for the entry type.

  • get_data

    Retrieve and decode data using the registered encoder.

  • get_data_with_metadata

    Retrieve and decode data with metadata using registered encoder.

  • export_entries

    Export cache entries to human-readable files.

  • import_entries

    Import human-readable files into the cache.

is_open property

is_open: bool

Check if the cache connection is open.

is_memory property

is_memory: bool

Check if this is an in-memory database.
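
A minimal lifecycle sketch using open(), close() and these properties:

from causaliq_knowledge.cache import TokenCache

cache = TokenCache(":memory:")
assert not cache.is_open       # conn is None until open() is called
cache.open()
assert cache.is_open
assert cache.is_memory         # ":memory:" path selects in-memory mode
cache.close()
assert not cache.is_open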

conn property

conn: Connection

Get the database connection, raising if not connected.

open

open() -> TokenCache

Open the database connection and initialise schema.

Returns:

  • TokenCache

    Self, so the call can be chained (e.g. TokenCache(path).open()).

Raises:

  • RuntimeError

    If already connected.

close

close() -> None

Close the database connection.

transaction

transaction() -> Iterator[Cursor]

Context manager for a database transaction.

Commits on success, rolls back on exception.

Yields:

  • Cursor

    SQLite cursor for executing statements.
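
A minimal sketch of the rollback guarantee; the inserted token value is illustrative:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    try:
        with cache.transaction() as cursor:
            cursor.execute("INSERT INTO tokens (token) VALUES (?)", ("tmp",))
            raise RuntimeError("simulated failure")  # aborts the transaction
    except RuntimeError:
        pass
    assert cache.token_count() == 0  # the insert was rolled back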

table_exists

table_exists(table_name: str) -> bool

Check if a table exists in the database.

Parameters:

  • table_name

    (str) –

    Name of the table to check.

Returns:

  • bool

    True if table exists, False otherwise.

entry_count

entry_count(entry_type: str | None = None) -> int

Count cache entries, optionally filtered by type.

Parameters:

  • entry_type

    (str | None, default: None ) –

    If provided, count only entries of this type.

Returns:

  • int

    Number of matching entries.
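
A small sketch of type filtering; hashes and entry types are illustrative:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    cache.put("h1", "llm", b"a")
    cache.put("h2", "llm", b"b")
    cache.put("h3", "json", b"c")
    assert cache.entry_count() == 3       # all entries
    assert cache.entry_count("llm") == 2  # only "llm" entries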

token_count

token_count() -> int

Count tokens in the dictionary.

Returns:

  • int

    Number of tokens.

get_or_create_token

get_or_create_token(token: str) -> int

Get token ID, creating a new entry if needed.

This method is used by encoders to compress strings to integer IDs. The token dictionary grows dynamically as new tokens are encountered.

Parameters:

  • token

    (str) –

    The string token to look up or create.

Returns:

  • int

    Integer ID for the token (1-65535 range).

Raises:

  • ValueError

    If token dictionary exceeds uint16 capacity.

get_token

get_token(token_id: int) -> str | None

Get token string by ID.

This method is used by decoders to expand integer IDs back to strings.

Parameters:

  • token_id

    (int) –

    The integer ID to look up.

Returns:

  • str | None

    The token string, or None if not found.
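
A round-trip sketch pairing get_or_create_token with get_token; the unknown ID is illustrative:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    token_id = cache.get_or_create_token("hello")
    assert cache.get_or_create_token("hello") == token_id  # stable ID
    assert cache.get_token(token_id) == "hello"            # decoder lookup
    assert cache.get_token(9999) is None                   # unallocated ID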

register_encoder

register_encoder(entry_type: str, encoder: EntryEncoder) -> None

Register an encoder for a specific entry type.

Once registered, put_data() and get_data() will automatically encode/decode entries of this type using the registered encoder.

Parameters:

  • entry_type

    (str) –

    Type identifier (e.g. 'llm', 'json', 'score').

  • encoder

    (EntryEncoder) –

    EntryEncoder instance for this type.

Example

>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     cache.put_data("key1", "json", {"msg": "hello"})

get_encoder

get_encoder(entry_type: str) -> EntryEncoder | None

Get the registered encoder for an entry type.

Parameters:

  • entry_type

    (str) –

    Type identifier to look up.

Returns:

  • EntryEncoder | None

    The registered encoder, or None if not registered.

has_encoder

has_encoder(entry_type: str) -> bool

Check if an encoder is registered for an entry type.

Parameters:

  • entry_type

    (str) –

    Type identifier to check.

Returns:

  • bool

    True if encoder is registered, False otherwise.
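
A brief sketch of the registry queries; the entry types are illustrative:

from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder

with TokenCache(":memory:") as cache:
    assert not cache.has_encoder("json")     # nothing registered yet
    cache.register_encoder("json", JsonEncoder())
    assert cache.has_encoder("json")
    assert isinstance(cache.get_encoder("json"), JsonEncoder)
    assert cache.get_encoder("llm") is None  # unregistered type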

put

put(hash: str, entry_type: str, data: bytes, metadata: bytes | None = None) -> None

Store a cache entry.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry (e.g. SHA-256 truncated).

  • entry_type

    (str) –

    Type of entry (e.g. 'llm', 'graph', 'score').

  • data

    (bytes) –

    Binary data to store.

  • metadata

    (bytes | None, default: None ) –

    Optional binary metadata.

get

get(hash: str, entry_type: str) -> bytes | None

Retrieve a cache entry.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry to retrieve.

Returns:

  • bytes | None

    Binary data if found, None otherwise.

get_with_metadata

get_with_metadata(hash: str, entry_type: str) -> tuple[bytes, bytes | None] | None

Retrieve a cache entry with its metadata.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry to retrieve.

Returns:

  • tuple[bytes, bytes | None] | None

    Tuple of (data, metadata) if found, None otherwise.
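
A short sketch of the tuple return, including the no-metadata and missing-entry cases (values illustrative):

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    cache.put("k1", "test", b"data", metadata=b"meta")
    cache.put("k2", "test", b"data")
    assert cache.get_with_metadata("k1", "test") == (b"data", b"meta")
    assert cache.get_with_metadata("k2", "test") == (b"data", None)
    assert cache.get_with_metadata("missing", "test") is None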

exists

exists(hash: str, entry_type: str) -> bool

Check if a cache entry exists.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry to check.

Returns:

  • bool

    True if entry exists, False otherwise.

delete

delete(hash: str, entry_type: str) -> bool

Delete a cache entry.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry to delete.

Returns:

  • bool

    True if entry was deleted, False if it didn't exist.
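
A minimal sketch of the boolean return (hash and type illustrative):

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    cache.put("abc", "test", b"x")
    assert cache.delete("abc", "test") is True   # entry removed
    assert cache.delete("abc", "test") is False  # already gone
    assert not cache.exists("abc", "test")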

put_data

put_data(hash: str, entry_type: str, data: Any, metadata: Any | None = None) -> None

Store data using the registered encoder for the entry type.

This method automatically encodes the data using the encoder registered for the given entry_type. Use put() for raw bytes.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry (must have registered encoder).

  • data

    (Any) –

    Data to encode and store.

  • metadata

    (Any | None, default: None ) –

    Optional metadata to encode and store.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

Example

with TokenCache(":memory:") as cache: ... cache.register_encoder("json", JsonEncoder()) ... cache.put_data("abc", "json", {"key": "value"})

get_data

get_data(hash: str, entry_type: str) -> Any | None

Retrieve and decode data using the registered encoder.

This method automatically decodes the data using the encoder registered for the given entry_type. Use get() for raw bytes.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry (must have registered encoder).

Returns:

  • Any | None

    Decoded data if found, None otherwise.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

Example

with TokenCache(":memory:") as cache: ... cache.register_encoder("json", JsonEncoder()) ... cache.put_data("abc", "json", {"key": "value"}) ... data = cache.get_data("abc", "json")

get_data_with_metadata

get_data_with_metadata(hash: str, entry_type: str) -> tuple[Any, Any | None] | None

Retrieve and decode data with metadata using registered encoder.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry (must have registered encoder).

Returns:

  • tuple[Any, Any | None] | None

    Tuple of (decoded_data, decoded_metadata) if found, None otherwise.
    The metadata element may be None if none was stored.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

export_entries

export_entries(output_dir: Path, entry_type: str, fmt: str | None = None) -> int

Export cache entries to human-readable files.

Each entry is exported to a separate file named {hash}.{ext} where ext is determined by the format or encoder's default_export_format.

Parameters:

  • output_dir

    (Path) –

    Directory to write exported files to. Created if it doesn't exist.

  • entry_type

    (str) –

    Type of entries to export (must have registered encoder).

  • fmt

    (str | None, default: None ) –

    Export format (e.g. 'json', 'yaml'). If None, uses the encoder's default_export_format.

Returns:

  • int

    Number of entries exported.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

Example

>>> from pathlib import Path
>>> from causaliq_knowledge.cache import TokenCache
>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     cache.put_data("abc123", "json", {"key": "value"})
...     count = cache.export_entries(Path("./export"), "json")
...     # Creates ./export/abc123.json

import_entries

import_entries(input_dir: Path, entry_type: str) -> int

Import human-readable files into the cache.

Each file is imported with its stem (filename without extension) used as the cache hash. The encoder's import_() method reads the file and the data is encoded before storage.

Parameters:

  • input_dir

    (Path) –

    Directory containing files to import.

  • entry_type

    (str) –

    Type to assign to imported entries (must have registered encoder).

Returns:

  • int

    Number of entries imported.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

  • FileNotFoundError

    If input_dir doesn't exist.

Example

>>> from pathlib import Path
>>> from causaliq_knowledge.cache import TokenCache
>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     count = cache.import_entries(Path("./import"), "json")
...     # Imports all files from ./import as "json" entries

EntryEncoder

The EntryEncoder abstract base class defines the interface for pluggable cache encoders. Each encoder handles a specific entry type (e.g., LLM requests, embeddings, documents).

Creating a Custom Encoder

from pathlib import Path
from typing import Any

from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import EntryEncoder


class MyEncoder(EntryEncoder):
    """Example encoder for custom data types."""

    @property
    def default_export_format(self) -> str:
        """File extension used for exported files."""
        return "json"

    def encode(self, data: dict, token_cache: TokenCache) -> bytes:
        """Convert data to bytes for storage."""
        # Use token_cache.get_or_create_token() for string compression
        return b"encoded"

    def decode(self, blob: bytes, token_cache: TokenCache) -> dict:
        """Convert bytes back to original data."""
        # Use token_cache.get_token() to restore strings
        return {"decoded": True}

    def export(self, data: Any, path: Path) -> None:
        """Write data to a human-readable file (JSON here)."""
        path.write_text('{"decoded": true}')

    def import_(self, path: Path) -> Any:
        """Read data back from a human-readable file."""
        return {"decoded": True}

Encoder Interface

EntryEncoder

Abstract base class for type-specific cache entry encoders.

Encoders handle:

  • Encoding data to compact binary format for storage
  • Decoding binary data back to original structure
  • Exporting to human-readable formats (JSON, GraphML, etc.)
  • Importing from human-readable formats

Encoders may use the shared token dictionary in TokenCache for cross-entry compression of repeated strings.

Example

>>> class MyEncoder(EntryEncoder):
...     def encode(self, data, token_cache):
...         return json.dumps(data).encode()
...     def decode(self, blob, token_cache):
...         return json.loads(blob.decode())
...     # ... export/import methods

Methods:

  • encode

    Encode data to binary format.

  • decode

    Decode binary data back to original structure.

  • export

    Export data to human-readable file format.

  • import_

    Import data from human-readable file format.

Attributes:

default_export_format property

default_export_format: str

Default file extension for exports (e.g. 'json', 'graphml').

encode abstractmethod

encode(data: Any, token_cache: TokenCache) -> bytes

Encode data to binary format.

Parameters:

  • data

    (Any) –

    The data to encode (type depends on encoder).

  • token_cache

    (TokenCache) –

    Cache instance for shared token dictionary.

Returns:

  • bytes

    Compact binary representation.

decode abstractmethod

decode(blob: bytes, token_cache: TokenCache) -> Any

Decode binary data back to original structure.

Parameters:

  • blob

    (bytes) –

    Binary data from cache.

  • token_cache

    (TokenCache) –

    Cache instance for shared token dictionary.

Returns:

  • Any

    Decoded data in original format.

export abstractmethod

export(data: Any, path: Path) -> None

Export data to human-readable file format.

Parameters:

  • data

    (Any) –

    The data to export (decoded format).

  • path

    (Path) –

    Destination file path.

import_ abstractmethod

import_(path: Path) -> Any

Import data from human-readable file format.

Parameters:

  • path

    (Path) –

    Source file path.

Returns:

  • Any

    Imported data ready for encoding.