Filter Expression Utilities¶
Safe filter expression evaluation for metadata filtering in workflows and aggregation operations.
This module provides functions for evaluating Python-like filter expressions
against metadata dictionaries. Evaluation uses the simpleeval library, which
avoids the security risks of eval().
Core Functions¶
evaluate_filter¶
evaluate_filter(expression: str, metadata: Dict[str, Any]) -> bool
Evaluate filter expression against metadata dictionary.
The expression uses Python syntax with metadata field names as variables. Supports comparison operators (==, !=, >, <, >=, <=), boolean operators (and, or, not), membership testing (in), and parentheses for grouping.
Parameters:
- expression (str) – Filter expression string.
- metadata (Dict[str, Any]) – Metadata dictionary with field values.
Returns:
- bool – True if metadata matches the filter expression, False otherwise.
Raises:
- FilterSyntaxError – If expression has invalid syntax.
- FilterExpressionError – If evaluation fails (e.g., undefined variable).
- TypeError – If expression is not a string or metadata is not a dict.
Examples:
>>> metadata = {"network": "asia", "sample_size": 1000, "status": "ok"}
>>> evaluate_filter("network == 'asia'", metadata)
True
>>> evaluate_filter("sample_size > 500 and status == 'ok'", metadata)
True
>>> evaluate_filter("network in ['asia', 'alarm']", metadata)
True
>>> evaluate_filter("not network == 'sports'", metadata)
True
validate_filter¶
validate_filter(expression: str) -> None
Validate filter expression syntax without evaluating.
Checks that the expression can be parsed. It does not verify that variable names exist; that is checked during evaluation.
Parameters:
- expression (str) – Filter expression string.
Raises:
- FilterSyntaxError – If expression has invalid syntax.
- TypeError – If expression is not a string.
Example
>>> validate_filter("network == 'asia'")  # OK
>>> validate_filter("network ==")  # Raises FilterSyntaxError
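The parse-only check described above can be illustrated with the standard library alone. This is a minimal sketch, assuming that syntax validity means "parses as a single Python expression"; `sketch_is_valid_syntax` is a hypothetical helper, not part of this module, and it returns a bool where validate_filter raises FilterSyntaxError.

```python
import ast


def sketch_is_valid_syntax(expression: str) -> bool:
    """Return True if the text parses as one Python expression.

    Hypothetical stand-in for illustration only; the real
    validate_filter raises FilterSyntaxError instead of returning False.
    """
    if not isinstance(expression, str):
        raise TypeError("expression must be a string")
    try:
        # mode="eval" accepts exactly one expression and never executes it.
        ast.parse(expression, mode="eval")
    except SyntaxError:
        return False
    return True


ok = sketch_is_valid_syntax("network == 'asia'")
broken = sketch_is_valid_syntax("network ==")
# Unknown names still parse; existence is only checked at evaluation time.
unknown_name = sketch_is_valid_syntax("missing_field > 1")
```

As in the description above, an expression that references an unknown field still passes a parse-only check.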
get_filter_variables¶
get_filter_variables(expression: str) -> Set[str]
Extract variable names used in a filter expression.
Parses the expression and returns the set of variable names referenced. Useful for validating that required metadata fields are present.
Parameters:
- expression (str) – Filter expression string.
Returns:
- Set[str] – Set of variable names used in the expression.
Raises:
- FilterSyntaxError – If expression has invalid syntax.
- TypeError – If expression is not a string.
Example
get_filter_variables("network == 'asia' and sample_size > 500")
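One way to picture the extraction, and the "required fields present" check it enables, is a short walk over the parsed expression. This is a sketch using only the standard `ast` module; `sketch_filter_variables` and `missing_fields` are hypothetical names, and excluding called function names such as `len` is an assumption about the intended behaviour, not something the documentation states.

```python
import ast
from typing import Any, Dict, Set


def sketch_filter_variables(expression: str) -> Set[str]:
    """Collect identifier names used as variables in an expression."""
    tree = ast.parse(expression, mode="eval")
    # Names used as call targets (e.g. len) are treated as functions here.
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - called


def missing_fields(expression: str, metadata: Dict[str, Any]) -> Set[str]:
    """Return variables the expression needs that metadata does not supply."""
    return sketch_filter_variables(expression) - set(metadata)


needed = sketch_filter_variables("network == 'asia' and sample_size > 500")
absent = missing_fields(
    "network == 'asia' and sample_size > 500", {"network": "asia"}
)
```

Running such a check before evaluation gives an early, specific message about absent fields instead of a failure during evaluation.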
filter_entries¶
filter_entries(
    entries: List[Dict[str, Any]], expression: str, metadata_key: str = "metadata"
) -> List[Dict[str, Any]]
Filter a list of entries by metadata expression.
Convenience function to filter a list of cache entry dictionaries by a filter expression applied to each entry's metadata.
Parameters:
- entries (List[Dict[str, Any]]) – List of entry dictionaries.
- expression (str) – Filter expression string.
- metadata_key (str, default: 'metadata') – Key in entry dict containing metadata.
Returns:
- List[Dict[str, Any]] – List of entries where metadata matches the filter.
Raises:
- FilterSyntaxError – If expression has invalid syntax.
- FilterExpressionError – If evaluation fails.
- TypeError – If arguments have invalid types.
Example
>>> entries = [
...     {"metadata": {"network": "asia", "size": 100}},
...     {"metadata": {"network": "alarm", "size": 200}},
... ]
>>> filter_entries(entries, "network == 'asia'")
[{'metadata': {'network': 'asia', 'size': 100}}]
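To make the role of metadata_key concrete, here is a minimal sketch, assuming each entry stores its metadata under a configurable key. `sketch_filter_entries` and its `matches` callable are hypothetical stand-ins: the real function takes an expression string and evaluates it, whereas this sketch takes a plain Python predicate so that it runs without the library.

```python
from typing import Any, Callable, Dict, List

Entry = Dict[str, Any]


def sketch_filter_entries(
    entries: List[Entry],
    matches: Callable[[Dict[str, Any]], bool],
    metadata_key: str = "metadata",
) -> List[Entry]:
    """Keep the entries whose metadata satisfies the predicate."""
    # The predicate sees only the nested metadata dict, not the whole entry.
    return [entry for entry in entries if matches(entry[metadata_key])]


# Entries that keep their metadata under a non-default key, "meta".
entries = [
    {"meta": {"network": "asia", "size": 100}},
    {"meta": {"network": "alarm", "size": 200}},
]
kept = sketch_filter_entries(
    entries, lambda meta: meta["network"] == "asia", metadata_key="meta"
)
```

The whole entry is returned, not just its metadata, which matches the return value shown in the example above.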
resolve_random_calls¶
resolve_random_calls(
    expression: str, all_metadata: List[Dict[str, Any]]
) -> Tuple[str, Dict[str, Any]]
Pre-resolve random() calls in a filter expression.
Finds VAR in random(count, seed) patterns, collects the
distinct values of VAR across all_metadata, selects
count of them using the hardware-stable RandomIntegers
sequence, and returns a rewritten expression plus a dictionary
of pre-computed sets to inject as extra names during
evaluation.
Parameters:
- expression (str) – Filter expression, possibly containing random(count, seed) calls.
- all_metadata (List[Dict[str, Any]]) – List of flat metadata dictionaries from all entries in the population.
Returns:
- Tuple[str, Dict[str, Any]] – Tuple of (resolved_expression, extra_names). extra_names should be merged into each entry's metadata when calling evaluate_filter.
Raises:
- FilterExpressionError – If fewer distinct values exist than the requested count.
Example
>>> metas = [{"seed": i} for i in range(25)]
>>> expr, names = resolve_random_calls(
...     "seed in random(10, 0)", metas
... )
>>> len(names)
1
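The pre-resolution step can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: `sketch_resolve_random` and the `_random_N` placeholder names are hypothetical, a regular expression stands in for whatever parsing the library uses, `random.Random` stands in for the RandomIntegers sequence (so the values chosen will differ from the library's), and ValueError stands in for FilterExpressionError.

```python
import random
import re
from typing import Any, Dict, List, Tuple

# Matches "VAR in random(count, seed)" with integer literal arguments.
_RANDOM_CALL = re.compile(r"(\w+)\s+in\s+random\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def sketch_resolve_random(
    expression: str, all_metadata: List[Dict[str, Any]]
) -> Tuple[str, Dict[str, Any]]:
    """Rewrite random() calls into references to pre-computed value sets."""
    extra_names: Dict[str, Any] = {}

    def replace(match: re.Match) -> str:
        var = match.group(1)
        count, seed = int(match.group(2)), int(match.group(3))
        # Distinct values of VAR across the population, in a stable order.
        population = sorted({m[var] for m in all_metadata if var in m})
        if len(population) < count:
            raise ValueError(
                f"only {len(population)} distinct values of {var!r}"
            )
        # Stand-in sampler; NOT the hardware-stable RandomIntegers sequence.
        chosen = set(random.Random(seed).sample(population, count))
        name = f"_random_{len(extra_names)}"  # hypothetical placeholder name
        extra_names[name] = chosen
        return f"{var} in {name}"

    return _RANDOM_CALL.sub(replace, expression), extra_names


metas = [{"seed": i} for i in range(25)]
expr, names = sketch_resolve_random("seed in random(10, 0)", metas)
```

Each random() call becomes a membership test against one injected name, which is why the example above reports a single entry in names.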
Exceptions¶
FilterExpressionError¶
Raised when filter expression evaluation fails.
FilterSyntaxError¶
Raised when filter expression has invalid syntax.
Expression Syntax¶
Filter expressions use Python syntax with the following supported operators:
| Category | Operators |
|---|---|
| Comparison | ==, !=, >, <, >=, <= |
| Boolean | and, or, not |
| Membership | in |
| Grouping | () |
Allowed functions: len, str, int, float, bool, abs,
min, max, random
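To show why a restricted grammar like this is safe to evaluate, the sketch below walks a parsed expression and accepts only whitelisted node types. It is a standard-library illustration of the general technique, not how this module works internally (the module uses simpleeval); `sketch_evaluate` is a hypothetical name, function calls are deliberately left out, and the built-in NameError and ValueError stand in for the module's own exceptions.

```python
import ast
import operator
from typing import Any, Dict

# Comparison and membership operators from the table above.
_COMPARE = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Gt: operator.gt,
    ast.Lt: operator.lt,
    ast.GtE: operator.ge,
    ast.LtE: operator.le,
    ast.In: lambda item, container: item in container,
}


def sketch_evaluate(expression: str, names: Dict[str, Any]) -> bool:
    """Evaluate a filter-like expression by walking a whitelisted AST."""

    def visit(node: ast.AST) -> Any:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in names:
                raise NameError(f"undefined variable: {node.id}")
            return names[node.id]
        if isinstance(node, (ast.List, ast.Tuple)):
            return [visit(item) for item in node.elts]
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not visit(node.operand)
        if isinstance(node, ast.BoolOp):
            if isinstance(node.op, ast.And):
                return all(visit(value) for value in node.values)
            return any(visit(value) for value in node.values)
        if isinstance(node, ast.Compare):
            left = visit(node.left)
            # Handles chained comparisons such as 1 < x <= 10.
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _COMPARE:
                    raise ValueError(f"unsupported operator: {type(op).__name__}")
                right = visit(comparator)
                if not _COMPARE[type(op)](left, right):
                    return False
                left = right
            return True
        # Calls, attribute access, subscripts, etc. are rejected outright.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return bool(visit(ast.parse(expression, mode="eval")))


metadata = {"network": "asia", "sample_size": 1000, "status": "ok"}
matched = sketch_evaluate(
    "(network == 'alarm' or sample_size >= 1000) and status != 'failed'",
    metadata,
)
```

Because anything outside the whitelist raises an error, an input such as `__import__('os')` is refused rather than executed, which is the property that plain eval() lacks.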
Random Sampling¶
The random(count, seed) function enables reproducible random
selection within filter expressions. When used as
VAR in random(count, seed), it selects count values from the
distinct population of VAR across all entries, using a
hardware-stable random sequence.
from causaliq_core.utils import filter_entries
# Each entry keeps its metadata under the default "metadata" key
entries = [
    {"metadata": {"seed": 1, "network": "asia"}},
    {"metadata": {"seed": 5, "network": "asia"}},
    {"metadata": {"seed": 10, "network": "asia"}},
    {"metadata": {"seed": 15, "network": "asia"}},
]
# Select 2 of the 4 distinct seed values (reproducible with seed=42)
result = filter_entries(
    entries, "seed in random(2, 42)"
)
# Returns the 2 entries whose seed values were chosen
random() calls are pre-resolved by resolve_random_calls() before
expression evaluation, replacing them with concrete value sets.
Use filter_entries() for automatic resolution, or call
resolve_random_calls() directly for manual control.
Usage Examples¶
Basic Filtering¶
from causaliq_core.utils import evaluate_filter
metadata = {"network": "asia", "sample_size": 1000, "status": "completed"}
# Simple equality
evaluate_filter("network == 'asia'", metadata) # True
# Numeric comparison
evaluate_filter("sample_size >= 500", metadata) # True
# Boolean combination
evaluate_filter("network == 'asia' and sample_size > 500", metadata) # True
Validating Expressions¶
from causaliq_core.utils import validate_filter, FilterSyntaxError
# Valid expression
validate_filter("x > 5 and y == 'value'") # No exception
# Invalid syntax
try:
    validate_filter("x ==")  # Missing right operand
except FilterSyntaxError as e:
    print(f"Invalid: {e}")
Extracting Variables¶
from causaliq_core.utils import get_filter_variables
# Get the variables referenced in an expression
# (named "variables" to avoid shadowing the built-in vars())
variables = get_filter_variables("network == 'asia' and sample_size > 500")
print(variables)  # {'network', 'sample_size'} (set order may vary)
Filtering Collections¶
from causaliq_core.utils import filter_entries
# Each entry keeps its metadata under the default "metadata" key
entries = [
    {"metadata": {"network": "asia", "sample_size": 100}},
    {"metadata": {"network": "asia", "sample_size": 1000}},
    {"metadata": {"network": "alarm", "sample_size": 500}},
]
# Filter to asia entries with sample_size > 500
result = filter_entries(entries, "network == 'asia' and sample_size > 500")
# Returns: [{"metadata": {"network": "asia", "sample_size": 1000}}]
Workflow Integration¶
Filter expressions are commonly used in workflow configurations:
actions:
  merge_graphs:
    input: discovery_results.db
    filter: network == 'asia' and status == 'completed'
    output: merged_graphs.db
The filter is applied to cache entry metadata before aggregation.
Random Sampling in Workflows¶
steps:
  - name: "Evaluate Subset"
    uses: "causaliq-analysis"
    with:
      action: "evaluate_graph"
      input: "results/graphs.db"
      filter: "seed in random(5, 42)"
      output: "results/evaluation.db"
This selects 5 distinct values of the seed metadata field from the input cache entries. The second argument, 42, seeds the selection, so the same 5 values are chosen on every run.