Cache Module (Core)¶
SQLite-backed caching infrastructure with shared token dictionary for efficient storage.
Future Migration
This module provides core caching capability that will migrate to
causaliq-core. LLM-specific code (LLMEntryEncoder)
stays in causaliq-knowledge.
Overview¶
The cache module provides:
- TokenCache - SQLite-backed cache with connection management
- EntryEncoder - Abstract base class for pluggable type-specific encoders
- JsonEncoder - Tokenised encoder for JSON-serialisable data (50-70% compression)
| Component | Migration Target |
|---|---|
| TokenCache | causaliq-core |
| EntryEncoder | causaliq-core |
| JsonEncoder | causaliq-core |
For LLM-specific caching, see LLM Cache, which provides
LLMEntryEncoder with structured data types for requests and responses.
Design Philosophy¶
The cache uses SQLite for storage, providing:
- Fast indexed key lookup
- Built-in concurrency via SQLite locking
- In-memory mode via :memory: for testing
- Incremental updates without rewriting the whole file
See Caching Architecture for full design details.
Usage¶
Basic In-Memory Cache¶
from causaliq_knowledge.cache import TokenCache
# In-memory cache (fast, non-persistent)
with TokenCache(":memory:") as cache:
    assert cache.table_exists("tokens")
    assert cache.table_exists("cache_entries")
File-Based Persistent Cache¶
from causaliq_knowledge.cache import TokenCache
# File-based cache (persistent)
with TokenCache("my_cache.db") as cache:
    # Data persists across sessions
    print(f"Entries: {cache.entry_count()}")
    print(f"Tokens: {cache.token_count()}")
Transaction Support¶
from causaliq_knowledge.cache import TokenCache
with TokenCache(":memory:") as cache:
    # Transactions auto-commit on success, rollback on exception
    with cache.transaction() as cursor:
        cursor.execute("INSERT INTO tokens (token) VALUES (?)", ("example",))
Token Dictionary¶
The cache maintains a shared token dictionary for cross-entry compression. Encoders use this to convert strings to compact integer IDs:
from causaliq_knowledge.cache import TokenCache
with TokenCache(":memory:") as cache:
    # Get or create token IDs (used by encoders)
    id1 = cache.get_or_create_token("hello")        # Returns 1
    id2 = cache.get_or_create_token("world")        # Returns 2
    id1_again = cache.get_or_create_token("hello")  # Returns 1 (cached)

    # Look up token by ID (used by decoders)
    token = cache.get_token(1)  # Returns "hello"
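To see why a shared dictionary enables cross-entry compression, here is a minimal sketch of how an encoder might pack repeated strings into 2-byte token IDs. The encode_strings/decode_strings helpers and the struct-based wire format are illustrative assumptions, not JsonEncoder's actual encoding:

import struct

from causaliq_knowledge.cache import TokenCache

def encode_strings(strings: list[str], cache: TokenCache) -> bytes:
    # Each string becomes a 2-byte ID in the shared dictionary
    # (documented IDs are in the 1-65535 range, so uint16 suffices).
    ids = [cache.get_or_create_token(s) for s in strings]
    return struct.pack(f"<{len(ids)}H", *ids)

def decode_strings(blob: bytes, cache: TokenCache) -> list[str]:
    ids = struct.unpack(f"<{len(blob) // 2}H", blob)
    return [cache.get_token(i) for i in ids]

with TokenCache(":memory:") as cache:
    blob = encode_strings(["role", "user", "role"], cache)
    # 3 tokens x 2 bytes; the repeated string reuses its dictionary ID
    assert len(blob) == 6
    assert decode_strings(blob, cache) == ["role", "user", "role"]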
Storing and Retrieving Entries¶
Cache entries are stored as binary blobs with a hash key and entry type:
from causaliq_knowledge.cache import TokenCache
with TokenCache(":memory:") as cache:
    # Store an entry
    cache.put("abc123", "llm", b"response data")

    # Check if entry exists
    if cache.exists("abc123", "llm"):
        # Retrieve entry
        data = cache.get("abc123", "llm")  # Returns b"response data"

    # Store with metadata
    cache.put("def456", "llm", b"data", metadata=b"extra info")
    result = cache.get_with_metadata("def456", "llm")
    # result = (b"data", b"extra info")

    # Delete entry
    cache.delete("abc123", "llm")
Auto-Encoding with Registered Encoders¶
Register an encoder to automatically encode/decode entries:
from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder
with TokenCache(":memory:") as cache:
    # Register encoder for "json" entry type
    cache.register_encoder("json", JsonEncoder())

    # Store data (auto-encoded)
    cache.put_data("hash1", "json", {"role": "user", "content": "Hello"})

    # Retrieve data (auto-decoded)
    data = cache.get_data("hash1", "json")
    # data = {"role": "user", "content": "Hello"}

    # Store with metadata
    cache.put_data(
        "hash2",
        "json",
        {"response": "Hi!"},
        metadata={"latency_ms": 150},
    )
    result = cache.get_data_with_metadata("hash2", "json")
    # result = ({"response": "Hi!"}, {"latency_ms": 150})
Exporting and Importing Entries¶
Export cache entries to files for backup, migration, or sharing. Import entries from files into a cache:
from pathlib import Path
from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import JsonEncoder
# Export entries to directory
with TokenCache("my_cache.db") as cache:
    cache.register_encoder("json", JsonEncoder())

    # Export all entries of type "json" to directory
    # Creates one file per entry: {hash}.json
    count = cache.export_entries(Path("./export"), "json")
    print(f"Exported {count} entries")

# Import entries from directory
with TokenCache("new_cache.db") as cache:
    cache.register_encoder("json", JsonEncoder())

    # Import all .json files from directory
    # Uses filename (without extension) as hash key
    count = cache.import_entries(Path("./export"), "json")
    print(f"Imported {count} entries")
Export behaviour:
- Creates output directory if it doesn't exist
- Writes each entry to {hash}.{ext} (e.g., abc123.json)
- Uses encoder's export() method for human-readable format
- Returns count of exported entries
Import behaviour:
- Reads all files in directory (skips subdirectories)
- Uses filename stem as hash key (e.g., abc123.json → key abc123)
- Uses encoder's import_() method to parse content
- Returns count of imported entries
API Reference¶
TokenCache¶
TokenCache(db_path: str | Path)
SQLite-backed cache with shared token dictionary.
Attributes:
- db_path – Path to SQLite database file, or ":memory:" for in-memory.
- conn (Connection) – SQLite connection (None until open() called or context entered).
Example
>>> with TokenCache(":memory:") as cache:
...     cache.put("abc123", "test", b"hello")
...     data = cache.get("abc123", "test")
Parameters:
- db_path (str | Path) – Path to SQLite database file. Use ":memory:" for in-memory database (fast, non-persistent).
Methods:
- open – Open the database connection and initialise schema.
- close – Close the database connection.
- transaction – Context manager for a database transaction.
- table_exists – Check if a table exists in the database.
- entry_count – Count cache entries, optionally filtered by type.
- token_count – Count tokens in the dictionary.
- get_or_create_token – Get token ID, creating a new entry if needed.
- get_token – Get token string by ID.
- register_encoder – Register an encoder for a specific entry type.
- get_encoder – Get the registered encoder for an entry type.
- has_encoder – Check if an encoder is registered for an entry type.
- put – Store a cache entry.
- get – Retrieve a cache entry.
- get_with_metadata – Retrieve a cache entry with its metadata.
- exists – Check if a cache entry exists.
- delete – Delete a cache entry.
- put_data – Store data using the registered encoder for the entry type.
- get_data – Retrieve and decode data using the registered encoder.
- get_data_with_metadata – Retrieve and decode data with metadata using registered encoder.
- export_entries – Export cache entries to human-readable files.
- import_entries – Import human-readable files into the cache.
open¶
open() -> TokenCache
Open the database connection and initialise schema.
Returns:
- TokenCache – self for method chaining.
Raises:
- RuntimeError – If already connected.
transaction¶
Context manager for a database transaction.
Commits on success, rolls back on exception.
Yields:
- Cursor – SQLite cursor for executing statements.
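The usage section above only shows the success path; the following sketch demonstrates the documented rollback behaviour with an illustrative forced failure:

>>> with TokenCache(":memory:") as cache:
...     before = cache.token_count()
...     try:
...         with cache.transaction() as cursor:
...             cursor.execute("INSERT INTO tokens (token) VALUES (?)", ("oops",))
...             raise RuntimeError("forced failure mid-transaction")
...     except RuntimeError:
...         pass
...     # The INSERT was rolled back, so the token count is unchanged
...     assert cache.token_count() == before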
table_exists¶
table_exists(table_name: str) -> bool
Check if a table exists in the database.
Parameters:
- table_name (str) – Name of the table to check.
Returns:
- bool – True if table exists, False otherwise.
entry_count¶
entry_count(entry_type: str | None = None) -> int
Count cache entries, optionally filtered by type.
Parameters:
- entry_type (str | None, default: None) – If provided, count only entries of this type.
Returns:
- int – Number of matching entries.
get_or_create_token¶
get_or_create_token(token: str) -> int
Get token ID, creating a new entry if needed.
This method is used by encoders to compress strings to integer IDs. The token dictionary grows dynamically as new tokens are encountered.
Parameters:
- token (str) – The string token to look up or create.
Returns:
- int – Integer ID for the token (1-65535 range).
Raises:
- ValueError – If token dictionary exceeds uint16 capacity.
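A brief illustration, consistent with the usage section (IDs are assigned sequentially starting at 1, and repeated lookups return the same ID):

Example

>>> with TokenCache(":memory:") as cache:
...     tid = cache.get_or_create_token("hello")          # first token, ID 1
...     assert cache.get_or_create_token("hello") == tid  # stable ID
...     assert cache.get_token(tid) == "hello"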
get_token¶
get_token(token_id: int) -> str | None
Get token string by ID.
This method is used by decoders to expand integer IDs back to strings.
Parameters:
- token_id (int) – The integer ID to look up.
Returns:
- str | None – The token string, or None if not found.
register_encoder¶
register_encoder(entry_type: str, encoder: EntryEncoder) -> None
Register an encoder for a specific entry type.
Once registered, put_data() and get_data() will automatically
encode/decode entries of this type using the registered encoder.
Parameters:
- entry_type (str) – Type identifier (e.g. 'llm', 'json', 'score').
- encoder (EntryEncoder) – EntryEncoder instance for this type.
Example
>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     cache.put_data("key1", "json", {"msg": "hello"})
get_encoder¶
get_encoder(entry_type: str) -> EntryEncoder | None
Get the registered encoder for an entry type.
Parameters:
- entry_type (str) – Type identifier to look up.
Returns:
- EntryEncoder | None – The registered encoder, or None if not registered.
has_encoder¶
has_encoder(entry_type: str) -> bool
Check if an encoder is registered for an entry type.
Parameters:
- entry_type (str) – Type identifier to check.
Returns:
- bool – True if encoder is registered, False otherwise.
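A short illustration of the two lookup methods, assuming the JsonEncoder used throughout this page:

>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     assert cache.has_encoder("json")
...     assert not cache.has_encoder("xml")
...     assert cache.get_encoder("xml") is None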
put¶
put(hash: str, entry_type: str, data: bytes, metadata: bytes | None = None) -> None
Store a cache entry.
get¶
get(hash: str, entry_type: str) -> bytes | None
Retrieve a cache entry.
get_with_metadata¶
get_with_metadata(hash: str, entry_type: str) -> tuple[bytes, bytes | None] | None
Retrieve a cache entry with its metadata.
exists¶
exists(hash: str, entry_type: str) -> bool
Check if a cache entry exists.
delete¶
delete(hash: str, entry_type: str) -> bool
Delete a cache entry.
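These five methods operate on raw bytes and need no registered encoder; a brief round trip consistent with the usage section above:

>>> with TokenCache(":memory:") as cache:
...     cache.put("abc123", "llm", b"response data")
...     assert cache.exists("abc123", "llm")
...     assert cache.get("abc123", "llm") == b"response data"
...     deleted = cache.delete("abc123", "llm")
...     assert not cache.exists("abc123", "llm")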
put_data¶
put_data(hash: str, entry_type: str, data: Any, metadata: Any | None = None) -> None
Store data using the registered encoder for the entry type.
This method automatically encodes the data using the encoder
registered for the given entry_type. Use put() for raw bytes.
Parameters:
- hash (str) – Unique identifier for the entry.
- entry_type (str) – Type of entry (must have registered encoder).
- data (Any) – Data to encode and store.
- metadata (Any | None, default: None) – Optional metadata to encode and store.
Raises:
- KeyError – If no encoder is registered for entry_type.
Example
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     cache.put_data("abc", "json", {"key": "value"})
get_data¶
get_data(hash: str, entry_type: str) -> Any | None
Retrieve and decode data using the registered encoder.
This method automatically decodes the data using the encoder
registered for the given entry_type. Use get() for raw bytes.
Parameters:
- hash (str) – Unique identifier for the entry.
- entry_type (str) – Type of entry (must have registered encoder).
Returns:
- Any | None – Decoded data if found, None otherwise.
Raises:
- KeyError – If no encoder is registered for entry_type.
Example
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     cache.put_data("abc", "json", {"key": "value"})
...     data = cache.get_data("abc", "json")
get_data_with_metadata¶
get_data_with_metadata(hash: str, entry_type: str) -> tuple[Any, Any | None] | None
Retrieve and decode data with metadata using registered encoder.
Parameters:
- hash (str) – Unique identifier for the entry.
- entry_type (str) – Type of entry (must have registered encoder).
Returns:
- tuple[Any, Any | None] | None – Tuple of (decoded_data, decoded_metadata) if found, None otherwise. metadata may be None if not stored.
Raises:
- KeyError – If no encoder is registered for entry_type.
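For consistency with the sibling methods, an illustrative example (the metadata values are arbitrary):

Example

>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     cache.put_data("abc", "json", {"key": "value"}, metadata={"ms": 150})
...     result = cache.get_data_with_metadata("abc", "json")
...     # result == ({"key": "value"}, {"ms": 150})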
export_entries¶
export_entries(output_dir: Path, entry_type: str, fmt: str | None = None) -> int
Export cache entries to human-readable files.
Each entry is exported to a separate file named {hash}.{ext} where
ext is determined by the format or encoder's default_export_format.
Parameters:
- output_dir (Path) – Directory to write exported files to. Created if it doesn't exist.
- entry_type (str) – Type of entries to export (must have registered encoder).
- fmt (str | None, default: None) – Export format (e.g. 'json', 'yaml'). If None, uses the encoder's default_export_format.
Returns:
- int – Number of entries exported.
Raises:
- KeyError – If no encoder is registered for entry_type.
Example
>>> from pathlib import Path
>>> from causaliq_knowledge.cache import TokenCache
>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     cache.put_data("abc123", "json", {"key": "value"})
...     count = cache.export_entries(Path("./export"), "json")
...     # Creates ./export/abc123.json
import_entries¶
import_entries(input_dir: Path, entry_type: str) -> int
Import human-readable files into the cache.
Each file is imported with its stem (filename without extension) used as the cache hash. The encoder's import_() method reads the file and the data is encoded before storage.
Parameters:
- input_dir (Path) – Directory containing files to import.
- entry_type (str) – Type to assign to imported entries (must have registered encoder).
Returns:
- int – Number of entries imported.
Raises:
- KeyError – If no encoder is registered for entry_type.
- FileNotFoundError – If input_dir doesn't exist.
Example
>>> from pathlib import Path
>>> from causaliq_knowledge.cache import TokenCache
>>> from causaliq_knowledge.cache.encoders import JsonEncoder
>>> with TokenCache(":memory:") as cache:
...     cache.register_encoder("json", JsonEncoder())
...     count = cache.import_entries(Path("./import"), "json")
...     # Imports all files from ./import as "json" entries
EntryEncoder¶
The EntryEncoder abstract base class defines the interface for pluggable
cache encoders. Each encoder handles a specific entry type (e.g., LLM
requests, embeddings, documents).
Creating a Custom Encoder¶
from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.cache.encoders import EntryEncoder
class MyEncoder(EntryEncoder):
    """Example encoder for custom data types."""

    @property
    def default_export_format(self) -> str:
        # File extension used by export_entries when fmt is None
        # (assumes the base class does not supply a default)
        return "json"

    def encode(self, data: dict, cache: TokenCache) -> bytes:
        """Convert data to bytes for storage."""
        # Use cache.get_or_create_token() for string compression
        return b"encoded"

    def decode(self, data: bytes, cache: TokenCache) -> dict:
        """Convert bytes back to original data."""
        # Use cache.get_token() to restore strings
        return {"decoded": True}

    def export(self, data: bytes, cache: TokenCache, fmt: str) -> str:
        """Export to human-readable format (json, yaml)."""
        return '{"decoded": true}'

    def import_(self, data: str, cache: TokenCache, fmt: str) -> bytes:
        """Import from human-readable format."""
        return b"encoded"
Encoder Interface¶
EntryEncoder¶
Abstract base class for type-specific cache entry encoders.
Encoders handle:
- Encoding data to compact binary format for storage
- Decoding binary data back to original structure
- Exporting to human-readable formats (JSON, GraphML, etc.)
- Importing from human-readable formats
Encoders may use the shared token dictionary in TokenCache for cross-entry compression of repeated strings.
Example
>>> class MyEncoder(EntryEncoder):
...     def encode(self, data, token_cache):
...         return json.dumps(data).encode()
...     def decode(self, blob, token_cache):
...         return json.loads(blob.decode())
...     # ... export/import methods
Methods:
- encode – Encode data to binary format.
- decode – Decode binary data back to original structure.
- export – Export data to human-readable file format.
- import_ – Import data from human-readable file format.
Attributes:
- default_export_format (str) – Default file extension for exports (e.g. 'json', 'graphml').
default_export_format property¶
Default file extension for exports (e.g. 'json', 'graphml').
encode abstractmethod¶
encode(data: Any, token_cache: TokenCache) -> bytes
Encode data to binary format.
Parameters:
- data (Any) – The data to encode (type depends on encoder).
- token_cache (TokenCache) – Cache instance for shared token dictionary.
Returns:
- bytes – Compact binary representation.
decode abstractmethod¶
decode(blob: bytes, token_cache: TokenCache) -> Any
Decode binary data back to original structure.
Parameters:
- blob (bytes) – Binary data from cache.
- token_cache (TokenCache) – Cache instance for shared token dictionary.
Returns:
- Any – Decoded data in original format.