Cache Module (Core)

SQLite-backed caching infrastructure with shared token dictionary for efficient storage.

Future Migration

This module provides core caching capability that will migrate to causaliq-core. LLM-specific code (LLMEntryEncoder) stays in causaliq-knowledge.

Overview

The cache module provides:

  • TokenCache - SQLite-backed cache with connection management
  • EntryEncoder - Abstract base class for pluggable type-specific encoders
  • JsonEncoder - Tokenised encoder for JSON-serialisable data (50-70% compression)

Component      Migration Target
TokenCache     causaliq-core
EntryEncoder   causaliq-core
JsonEncoder    causaliq-core

For LLM-specific caching, see LLM Cache which provides LLMEntryEncoder with structured data types for requests and responses.

Design Philosophy

The cache uses SQLite for storage, providing:

  • Fast indexed key lookup
  • Built-in concurrency via SQLite locking
  • In-memory mode via :memory: for testing
  • Incremental updates without rewriting the whole file

See Caching Architecture for full design details.

Usage

Basic In-Memory Cache

from causaliq_knowledge.cache import TokenCache

# In-memory cache (fast, non-persistent)
with TokenCache(":memory:") as cache:
    assert cache.table_exists("tokens")
    assert cache.table_exists("cache_entries")

File-Based Persistent Cache

from causaliq_knowledge.cache import TokenCache

# File-based cache (persistent)
with TokenCache("my_cache.db") as cache:
    # Data persists across sessions
    print(f"Entries: {cache.entry_count()}")
    print(f"Tokens: {cache.token_count()}")

Transaction Support

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    # Transactions auto-commit on success, rollback on exception
    with cache.transaction() as cursor:
        cursor.execute("INSERT INTO tokens (token) VALUES (?)", ("example",))

Token Dictionary

The cache maintains a shared token dictionary for cross-entry compression. Encoders use this to convert strings to compact integer IDs:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    # Get or create token IDs (used by encoders)
    id1 = cache.get_or_create_token("hello")  # Returns 1
    id2 = cache.get_or_create_token("world")  # Returns 2
    id1_again = cache.get_or_create_token("hello")  # Returns 1 (cached)

    # Look up token by ID (used by decoders)
    token = cache.get_token(1)  # Returns "hello"

Storing and Retrieving Entries

Cache entries are stored as binary blobs with a hash key and entry type:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    # Store an entry
    cache.put("abc123", "llm", b"response data")

    # Check if entry exists
    if cache.exists("abc123", "llm"):
        # Retrieve entry
        data = cache.get("abc123", "llm")  # Returns b"response data"

    # Store with metadata
    cache.put("def456", "llm", b"data", metadata=b"extra info")
    result = cache.get_with_metadata("def456", "llm")
    # result = (b"data", b"extra info")

    # Delete entry
    cache.delete("abc123", "llm")

Auto-Encoding with Registered Encoders

Register an encoder to automatically encode/decode entries:

from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder

with TokenCache(":memory:") as cache:
    # Register encoder for "json" entry type
    cache.register_encoder("json", JsonEncoder())

    # Store data (auto-encoded)
    cache.put_data("hash1", "json", {"role": "user", "content": "Hello"})

    # Retrieve data (auto-decoded)
    data = cache.get_data("hash1", "json")
    # data = {"role": "user", "content": "Hello"}

    # Store with metadata
    cache.put_data("hash2", "json", 
                   {"response": "Hi!"}, 
                   metadata={"latency_ms": 150})
    result = cache.get_data_with_metadata("hash2", "json")
    # result = ({"response": "Hi!"}, {"latency_ms": 150})

Exporting and Importing Entries

Export cache entries to files for backup, migration, or sharing. Import entries from files into a cache:

from pathlib import Path
from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder

# Export entries to directory
with TokenCache("my_cache.db") as cache:
    cache.register_encoder("json", JsonEncoder())

    # Export all entries of type "json" to directory
    # Creates one file per entry: {hash}.json
    count = cache.export_entries(Path("./export"), "json")
    print(f"Exported {count} entries")

# Import entries from directory
with TokenCache("new_cache.db") as cache:
    cache.register_encoder("json", JsonEncoder())

    # Import all .json files from directory
    # Uses filename (without extension) as hash key
    count = cache.import_entries(Path("./export"), "json")
    print(f"Imported {count} entries")

Export behaviour:

  • Creates output directory if it doesn't exist
  • Writes each entry to {hash}.{ext} (e.g., abc123.json)
  • Uses encoder's export() method for human-readable format
  • Returns count of exported entries

Import behaviour:

  • Reads all files in directory (skips subdirectories)
  • Uses filename stem as hash key (e.g., abc123.json → key abc123)
  • Uses encoder's import_() method to parse content
  • Returns count of imported entries
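
The behaviours above compose into a simple round trip. A minimal sketch, assuming ./export contains only the exported files:

from pathlib import Path
from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder

# Export from one cache, then import into a fresh one
with TokenCache("my_cache.db") as source:
    source.register_encoder("json", JsonEncoder())
    exported = source.export_entries(Path("./export"), "json")

with TokenCache(":memory:") as target:
    target.register_encoder("json", JsonEncoder())
    imported = target.import_entries(Path("./export"), "json")

assert imported == exported  # every exported file is re-imported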

API Reference

TokenCache

TokenCache(db_path: str | Path)

SQLite-backed cache with shared token dictionary.

Attributes:

  • db_path

    Path to SQLite database file, or ":memory:" for in-memory.

  • conn (Connection) –

    SQLite connection (None until open() called or context entered).

Example

with TokenCache(":memory:") as cache: ... cache.put("abc123", "test", b"hello") ... data = cache.get("abc123", "test")

Parameters:

  • db_path

    (str | Path) –

    Path to SQLite database file. Use ":memory:" for in-memory database (fast, non-persistent).

Methods:

  • open

    Open the database connection and initialise schema.

  • close

    Close the database connection.

  • transaction

    Context manager for a database transaction.

  • table_exists

    Check if a table exists in the database.

  • entry_count

    Count cache entries, optionally filtered by type.

  • token_count

    Count tokens in the dictionary.

  • get_or_create_token

    Get token ID, creating a new entry if needed.

  • get_token

    Get token string by ID.

  • register_encoder

    Register an encoder for a specific entry type.

  • get_encoder

    Get the registered encoder for an entry type.

  • has_encoder

    Check if an encoder is registered for an entry type.

  • put

    Store a cache entry.

  • get

    Retrieve a cache entry.

  • get_with_metadata

    Retrieve a cache entry with its metadata.

  • exists

    Check if a cache entry exists.

  • delete

    Delete a cache entry.

  • put_data

    Store data using the registered encoder for the entry type.

  • get_data

    Retrieve and decode data using the registered encoder.

  • get_data_with_metadata

    Retrieve and decode data with metadata using registered encoder.

  • export_entries

    Export cache entries to human-readable files.

  • import_entries

    Import human-readable files into the cache.

is_open property

is_open: bool

Check if the cache connection is open.

is_memory property

is_memory: bool

Check if this is an in-memory database.
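
A minimal lifecycle sketch using open(), close() and these properties:

from causaliq_knowledge.cache import TokenCache

cache = TokenCache(":memory:")
assert not cache.is_open       # conn is None until open() is called
cache.open()
assert cache.is_open
assert cache.is_memory         # ":memory:" path selects in-memory mode
cache.close()
assert not cache.is_open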

conn property

conn: Connection

Get the database connection, raising if not connected.

open

open() -> TokenCache

Open the database connection and initialise schema.

Returns:

  • TokenCache

    Self, so the call can be chained (e.g. TokenCache(path).open()).

Raises:

  • RuntimeError

    If already connected.

close

close() -> None

Close the database connection.

transaction

transaction() -> Iterator[Cursor]

Context manager for a database transaction.

Commits on success, rolls back on exception.

Yields:

  • Cursor

    SQLite cursor for executing statements.
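
A minimal sketch of the rollback guarantee; the inserted token value is illustrative:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    try:
        with cache.transaction() as cursor:
            cursor.execute("INSERT INTO tokens (token) VALUES (?)", ("tmp",))
            raise RuntimeError("simulated failure")  # aborts the transaction
    except RuntimeError:
        pass
    assert cache.token_count() == 0  # the insert was rolled back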

table_exists

table_exists(table_name: str) -> bool

Check if a table exists in the database.

Parameters:

  • table_name

    (str) –

    Name of the table to check.

Returns:

  • bool

    True if table exists, False otherwise.

entry_count

entry_count(entry_type: str | None = None) -> int

Count cache entries, optionally filtered by type.

Parameters:

  • entry_type

    (str | None, default: None ) –

    If provided, count only entries of this type.

Returns:

  • int

    Number of matching entries.
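
A small sketch of type filtering; hashes and entry types are illustrative:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    cache.put("h1", "llm", b"a")
    cache.put("h2", "llm", b"b")
    cache.put("h3", "json", b"c")
    assert cache.entry_count() == 3       # all entries
    assert cache.entry_count("llm") == 2  # only "llm" entries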

token_count

token_count() -> int

Count tokens in the dictionary.

Returns:

  • int

    Number of tokens.

get_or_create_token

get_or_create_token(token: str) -> int

Get token ID, creating a new entry if needed.

This method is used by encoders to compress strings to integer IDs. The token dictionary grows dynamically as new tokens are encountered.

Parameters:

  • token

    (str) –

    The string token to look up or create.

Returns:

  • int

    Integer ID for the token (1-65535 range).

Raises:

  • ValueError

    If token dictionary exceeds uint16 capacity.

get_token

get_token(token_id: int) -> str | None

Get token string by ID.

This method is used by decoders to expand integer IDs back to strings.

Parameters:

  • token_id

    (int) –

    The integer ID to look up.

Returns:

  • str | None

    The token string, or None if not found.
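
A round-trip sketch pairing get_or_create_token with get_token; the unknown ID is illustrative:

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    token_id = cache.get_or_create_token("hello")
    assert cache.get_or_create_token("hello") == token_id  # stable ID
    assert cache.get_token(token_id) == "hello"            # decoder lookup
    assert cache.get_token(9999) is None                   # unallocated ID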

register_encoder

register_encoder(entry_type: str, encoder: EntryEncoder) -> None

Register an encoder for a specific entry type.

Once registered, put_data() and get_data() will automatically encode/decode entries of this type using the registered encoder.

Parameters:

  • entry_type

    (str) –

    Type identifier (e.g. 'llm', 'json', 'score').

  • encoder

    (EntryEncoder) –

    EntryEncoder instance for this type.

Example

>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     cache.put_data("key1", "json", {"msg": "hello"})

get_encoder

get_encoder(entry_type: str) -> EntryEncoder | None

Get the registered encoder for an entry type.

Parameters:

  • entry_type

    (str) –

    Type identifier to look up.

Returns:

  • EntryEncoder | None

    The registered encoder, or None if not registered.

has_encoder

has_encoder(entry_type: str) -> bool

Check if an encoder is registered for an entry type.

Parameters:

  • entry_type

    (str) –

    Type identifier to check.

Returns:

  • bool

    True if encoder is registered, False otherwise.
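
A brief sketch of the registry queries; the entry types are illustrative:

from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder

with TokenCache(":memory:") as cache:
    assert not cache.has_encoder("json")     # nothing registered yet
    cache.register_encoder("json", JsonEncoder())
    assert cache.has_encoder("json")
    assert isinstance(cache.get_encoder("json"), JsonEncoder)
    assert cache.get_encoder("llm") is None  # unregistered type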

put

put(hash: str, entry_type: str, data: bytes, metadata: bytes | None = None) -> None

Store a cache entry.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry (e.g. SHA-256 truncated).

  • entry_type

    (str) –

    Type of entry (e.g. 'llm', 'graph', 'score').

  • data

    (bytes) –

    Binary data to store.

  • metadata

    (bytes | None, default: None ) –

    Optional binary metadata.

get

get(hash: str, entry_type: str) -> bytes | None

Retrieve a cache entry.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry to retrieve.

Returns:

  • bytes | None

    Binary data if found, None otherwise.

get_with_metadata

get_with_metadata(hash: str, entry_type: str) -> tuple[bytes, bytes | None] | None

Retrieve a cache entry with its metadata.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry to retrieve.

Returns:

  • tuple[bytes, bytes | None] | None

    Tuple of (data, metadata) if found, None otherwise.
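
A short sketch of the tuple return, including the no-metadata and missing-entry cases (values illustrative):

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    cache.put("k1", "test", b"data", metadata=b"meta")
    cache.put("k2", "test", b"data")
    assert cache.get_with_metadata("k1", "test") == (b"data", b"meta")
    assert cache.get_with_metadata("k2", "test") == (b"data", None)
    assert cache.get_with_metadata("missing", "test") is None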

exists

exists(hash: str, entry_type: str) -> bool

Check if a cache entry exists.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry to check.

Returns:

  • bool

    True if entry exists, False otherwise.

delete

delete(hash: str, entry_type: str) -> bool

Delete a cache entry.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry to delete.

Returns:

  • bool

    True if entry was deleted, False if it didn't exist.
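
A minimal sketch of the boolean return (hash and type illustrative):

from causaliq_knowledge.cache import TokenCache

with TokenCache(":memory:") as cache:
    cache.put("abc", "test", b"x")
    assert cache.delete("abc", "test") is True   # entry removed
    assert cache.delete("abc", "test") is False  # already gone
    assert not cache.exists("abc", "test")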

put_data

put_data(hash: str, entry_type: str, data: Any, metadata: Any | None = None) -> None

Store data using the registered encoder for the entry type.

This method automatically encodes the data using the encoder registered for the given entry_type. Use put() for raw bytes.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry (must have registered encoder).

  • data

    (Any) –

    Data to encode and store.

  • metadata

    (Any | None, default: None ) –

    Optional metadata to encode and store.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

Example

with TokenCache(":memory:") as cache: ... cache.register_encoder("json", JsonEncoder()) ... cache.put_data("abc", "json", {"key": "value"})

get_data

get_data(hash: str, entry_type: str) -> Any | None

Retrieve and decode data using the registered encoder.

This method automatically decodes the data using the encoder registered for the given entry_type. Use get() for raw bytes.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry (must have registered encoder).

Returns:

  • Any | None

    Decoded data if found, None otherwise.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

Example

with TokenCache(":memory:") as cache: ... cache.register_encoder("json", JsonEncoder()) ... cache.put_data("abc", "json", {"key": "value"}) ... data = cache.get_data("abc", "json")

get_data_with_metadata

get_data_with_metadata(hash: str, entry_type: str) -> tuple[Any, Any | None] | None

Retrieve and decode data with metadata using registered encoder.

Parameters:

  • hash

    (str) –

    Unique identifier for the entry.

  • entry_type

    (str) –

    Type of entry (must have registered encoder).

Returns:

  • tuple[Any, Any | None] | None

    Tuple of (decoded_data, decoded_metadata) if found, None otherwise.
    The metadata element may be None if none was stored.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

export_entries

export_entries(output_dir: Path, entry_type: str, fmt: str | None = None) -> int

Export cache entries to human-readable files.

Each entry is exported to a separate file named {hash}.{ext} where ext is determined by the format or encoder's default_export_format.

Parameters:

  • output_dir

    (Path) –

    Directory to write exported files to. Created if it doesn't exist.

  • entry_type

    (str) –

    Type of entries to export (must have registered encoder).

  • fmt

    (str | None, default: None ) –

    Export format (e.g. 'json', 'yaml'). If None, uses the encoder's default_export_format.

Returns:

  • int

    Number of entries exported.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

Example

>>> from pathlib import Path
>>> from causaliq_knowledge.cache import TokenCache
>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     cache.put_data("abc123", "json", {"key": "value"})
...     count = cache.export_entries(Path("./export"), "json")
...     # Creates ./export/abc123.json

import_entries

import_entries(input_dir: Path, entry_type: str) -> int

Import human-readable files into the cache.

Each file is imported with its stem (filename without extension) used as the cache hash. The encoder's import_() method reads the file and the data is encoded before storage.

Parameters:

  • input_dir

    (Path) –

    Directory containing files to import.

  • entry_type

    (str) –

    Type to assign to imported entries (must have registered encoder).

Returns:

  • int

    Number of entries imported.

Raises:

  • KeyError

    If no encoder is registered for entry_type.

  • FileNotFoundError

    If input_dir doesn't exist.

Example

>>> from pathlib import Path
>>> from causaliq_knowledge.cache import TokenCache
>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     count = cache.import_entries(Path("./import"), "json")
...     # Imports all files from ./import as "json" entries

EntryEncoder

The EntryEncoder abstract base class defines the interface for pluggable cache encoders. Each encoder handles a specific entry type (e.g., LLM requests, embeddings, documents).

Creating a Custom Encoder

from pathlib import Path
from typing import Any

from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import EntryEncoder


class MyEncoder(EntryEncoder):
    """Example encoder for custom data types."""

    @property
    def default_export_format(self) -> str:
        """File extension used for exported files."""
        return "json"

    def encode(self, data: dict, token_cache: TokenCache) -> bytes:
        """Convert data to bytes for storage."""
        # Use token_cache.get_or_create_token() for string compression
        return b"encoded"

    def decode(self, blob: bytes, token_cache: TokenCache) -> dict:
        """Convert bytes back to original data."""
        # Use token_cache.get_token() to restore strings
        return {"decoded": True}

    def export(self, data: Any, path: Path) -> None:
        """Write data to a human-readable file (JSON here)."""
        path.write_text('{"decoded": true}')

    def import_(self, path: Path) -> Any:
        """Read data back from a human-readable file."""
        return {"decoded": True}

Encoder Interface

EntryEncoder

Abstract base class for type-specific cache entry encoders.

Encoders handle:

  • Encoding data to compact binary format for storage
  • Decoding binary data back to original structure
  • Exporting to human-readable formats (JSON, GraphML, etc.)
  • Importing from human-readable formats

Encoders may use the shared token dictionary in TokenCache for cross-entry compression of repeated strings.

Example

>>> class MyEncoder(EntryEncoder):
...     def encode(self, data, token_cache):
...         return json.dumps(data).encode()
...     def decode(self, blob, token_cache):
...         return json.loads(blob.decode())
...     # ... export/import methods

Methods:

  • encode

    Encode data to binary format.

  • decode

    Decode binary data back to original structure.

  • export

    Export data to human-readable file format.

  • import_

    Import data from human-readable file format.

Attributes:

default_export_format property

default_export_format: str

Default file extension for exports (e.g. 'json', 'graphml').

encode abstractmethod

encode(data: Any, token_cache: TokenCache) -> bytes

Encode data to binary format.

Parameters:

  • data

    (Any) –

    The data to encode (type depends on encoder).

  • token_cache

    (TokenCache) –

    Cache instance for shared token dictionary.

Returns:

  • bytes

    Compact binary representation.

decode abstractmethod

decode(blob: bytes, token_cache: TokenCache) -> Any

Decode binary data back to original structure.

Parameters:

  • blob

    (bytes) –

    Binary data from cache.

  • token_cache

    (TokenCache) –

    Cache instance for shared token dictionary.

Returns:

  • Any

    Decoded data in original format.

export abstractmethod

export(data: Any, path: Path) -> None

Export data to human-readable file format.

Parameters:

  • data

    (Any) –

    The data to export (decoded format).

  • path

    (Path) –

    Destination file path.

import_ abstractmethod

import_(path: Path) -> Any

Import data from human-readable file format.

Parameters:

  • path

    (Path) –

    Source file path.

Returns:

  • Any

    Imported data ready for encoding.