Cache Object Model
Overview
This document defines the model for objects stored within workflow cache entries. Objects are the primary data artefacts produced by actions, such as graphs, traces, and statistical summaries.
Design Goals
| Goal | Description |
|---|---|
| Type-based lookup | Actions find objects by semantic type, not arbitrary names |
| Multiple objects | Entries can hold multiple objects (e.g., graph + trace) |
| Clear provenance | Action metadata references the object(s) it created |
| Format flexibility | Same object type can be serialised in different formats |
| Simple constraints | One object per type per entry avoids ambiguity |
Object Model
CacheObject Structure
Each object in a cache entry has three components:
| Field | Description | Example |
|---|---|---|
type |
Semantic object type | "pdg", "dag", "cpdag", "trace" |
format |
Serialisation format | "graphml", "json", "csv" |
content |
Serialised data | GraphML string, JSON string, etc. |
Example object:
CacheObject(
type="pdg",
format="graphml",
content="<?xml version='1.0'?>..."
)
Supported Object Types
| Type | Description | Typical Format |
|---|---|---|
dag |
Directed Acyclic Graph | graphml |
pdag |
Partially Directed Acyclic Graph (CPDAG) | graphml |
pdg |
Probabilistic Dependency Graph | graphml |
trace |
Algorithm execution trace | json, csv |
summary |
Statistical summary | json |
confidences |
Edge confidence scores | json |
New types can be added without schema changes.
One Object Per Type Constraint
Rule: Each cache entry may contain at most one object of each type.
This constraint:
- Enables unambiguous lookup by type
- Simplifies action interfaces (request type, get exactly one result)
- Avoids naming conflicts and arbitrary identifier proliferation
Valid entry:
{
"objects": {
"pdg": {"type": "pdg", "format": "graphml", "content": "..."},
"trace": {"type": "trace", "format": "json", "content": "..."}
}
}
Invalid entry (two objects of same type):
{
"objects": {
"reference_dag": {"type": "dag", ...},
"learnt_dag": {"type": "dag", ...}
}
}
If an action needs to produce multiple objects of the same conceptual type,
use qualified types (e.g., reference_dag, learnt_dag) or separate entries.
Storage Structure
Entry Layout
Objects are stored in a dictionary keyed by type:
{
"matrix_values": {"network": "asia", "sample_size": "1k"},
"metadata": {
"created_at": "2026-03-15T09:00:00Z",
"causaliq-analysis": {
"merge_graphs": {
"action": "merge_graphs",
"timestamp": "2026-03-15T09:00:00Z",
"num_graphs": 25,
"output": {"type": "pdg", "format": "graphml"}
}
}
},
"objects": {
"pdg": {
"type": "pdg",
"format": "graphml",
"content": "<?xml version='1.0'?>..."
}
}
}
Action Metadata Output Reference
Each action that creates or modifies objects records its output in metadata:
"merge_graphs": {
"action": "merge_graphs",
"timestamp": "...",
"output": {"type": "pdg", "format": "graphml"}
}
For CHANGE actions that modify entries:
"best_graph": {
"action": "best_graph",
"timestamp": "...",
"input": {"type": "pdg"},
"output": {"type": "dag", "format": "graphml"}
}
This provides clear provenance: which action created which object.
Action Patterns
CREATE Actions
CREATE actions produce new cache entries with objects:
# merge_graphs creates PDG from input DAGs
def run_merge_graphs(...) -> Tuple[str, Dict, List[Dict]]:
pdg = merge(input_dags)
return (
"success",
{"num_graphs": len(input_dags)},
[{"type": "pdg", "format": "graphml", "content": pdg_xml}]
)
CHANGE Actions
CHANGE actions transform objects within existing entries:
# best_graph reads PDG, creates DAG
def run_best_graph(entry: CacheEntry, ...) -> CacheEntry:
pdg = entry.get_object_by_type("pdg")
dag = extract_best_dag(pdg)
entry.add_object("dag", "graphml", dag_xml)
return entry
Object Lookup
Actions request objects by type:
# Get PDG from entry
pdg_obj = entry.get_object_by_type("pdg")
if pdg_obj is None:
raise ActionExecutionError("Entry missing required 'pdg' object")
pdg = graphml.read_pdg(StringIO(pdg_obj.content))
Migration from Current Model
Current Model
# Objects keyed by arbitrary name
entry.objects = {
"merged_pdg": CacheObject(type="graphml", content="...")
}
New Model
# Objects keyed by semantic type
entry.objects = {
"pdg": CacheObject(type="pdg", format="graphml", content="...")
}
Migration Steps
- Phase 1: Update
CacheObjectmodel - Rename
type→format - Add
typefield for semantic type -
Update serialisation/deserialisation
-
Phase 2: Update action metadata
- Add
outputreference to action metadata -
Update metadata writing in all actions
-
Phase 3: Update object access
- Change from name-based to type-based lookup
-
Update all actions that read objects
-
Phase 4: Update tests and regenerate caches
Backward Compatibility
During migration, support reading old-format caches:
def get_object_by_type(self, obj_type: str) -> CacheObject | None:
# New format: keyed by type
if obj_type in self.objects:
return self.objects[obj_type]
# Legacy format: scan for matching name pattern
for name, obj in self.objects.items():
if obj_type in name: # e.g., "merged_pdg" contains "pdg"
return obj
return None
Implementation Checklist
- [ ] Update
CacheObjectdataclass incache/entry.py - [ ] Add
get_object_by_type()method toCacheEntry - [ ] Update
to_storage()andfrom_storage()methods - [ ] Update action return format in all providers
- [ ] Add output reference to action metadata
- [ ] Update
best_graphto CHANGE pattern - [ ] Update tests
- [ ] Regenerate test caches
See Also
- Workflow Cache Design - Overall cache architecture
- Action Architecture - Action patterns