Filter Expression Utilities
Safe filter expression evaluation for metadata filtering in workflows and aggregation operations.
This module provides functions for evaluating Python-like filter expressions
against metadata dictionaries. Expressions are evaluated with the simpleeval
library, which avoids the security risks of eval().
Core Functions
evaluate_filter
evaluate_filter(expression: str, metadata: Dict[str, Any]) -> bool
Evaluate filter expression against metadata dictionary.
The expression uses Python syntax with metadata field names as variables. Supports comparison operators (==, !=, >, <, >=, <=), boolean operators (and, or, not), membership testing (in), and parentheses for grouping.
Parameters:
- expression (str) – Filter expression string.
- metadata (Dict[str, Any]) – Metadata dictionary with field values.
Returns:
- bool – True if metadata matches the filter expression, False otherwise.
Raises:
- FilterSyntaxError – If expression has invalid syntax.
- FilterExpressionError – If evaluation fails (e.g., undefined variable).
- TypeError – If expression is not a string or metadata is not a dict.
Examples:
>>> metadata = {"network": "asia", "sample_size": 1000, "status": "ok"}
>>> evaluate_filter("network == 'asia'", metadata)
True
>>> evaluate_filter("sample_size > 500 and status == 'ok'", metadata)
True
>>> evaluate_filter("network in ['asia', 'alarm']", metadata)
True
>>> evaluate_filter("not network == 'sports'", metadata)
True
validate_filter
validate_filter(expression: str) -> None
Validate filter expression syntax without evaluating.
Checks only that the expression can be parsed. It does not verify that the variable names exist; that is checked during evaluation.
Parameters:
- expression (str) – Filter expression string.
Raises:
- FilterSyntaxError – If expression has invalid syntax.
- TypeError – If expression is not a string.
Example:
>>> validate_filter("network == 'asia'")  # OK
>>> validate_filter("network ==")  # Raises FilterSyntaxError
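A parse-only check of this kind can be sketched with the standard library's ast module. This is an illustration of the idea, not the module's actual implementation: check_syntax is a hypothetical name, and the FilterSyntaxError class below is a local stand-in for the real exception.

```python
import ast


class FilterSyntaxError(Exception):
    """Local stand-in for the module's exception of the same name."""


def check_syntax(expression: str) -> None:
    """Raise FilterSyntaxError if the expression cannot be parsed."""
    if not isinstance(expression, str):
        raise TypeError("expression must be a string")
    try:
        # mode="eval" accepts a single expression and rejects statements.
        ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise FilterSyntaxError(f"Invalid filter syntax: {exc.msg}") from exc


check_syntax("network == 'asia'")  # parses, so no exception

try:
    check_syntax("network ==")  # missing right operand
except FilterSyntaxError as err:
    print(f"Invalid: {err}")
```

Because nothing is evaluated, a check like this can run before any metadata is available, for example when a workflow configuration is first loaded.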
get_filter_variables
get_filter_variables(expression: str) -> Set[str]
Extract variable names used in a filter expression.
Parses the expression and returns the set of variable names referenced. Useful for validating that required metadata fields are present.
Parameters:
- expression (str) – Filter expression string.
Returns:
- Set[str] – Set of variable names used in the expression.
Raises:
- FilterSyntaxError – If expression has invalid syntax.
- TypeError – If expression is not a string.
Example:
>>> get_filter_variables("network == 'asia' and sample_size > 500")
{'network', 'sample_size'}
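To make the "required fields present" use case concrete, here is a minimal standard-library sketch of name extraction followed by a missing-field check. The helper names extract_names and missing_fields are hypothetical, and excluding the allowed function names from the result is an assumption; the real get_filter_variables may treat them differently.

```python
import ast
from typing import Any, Dict, Set

# Assumption: names of allowed functions are not reported as variables.
ALLOWED_FUNCTIONS = {"len", "str", "int", "float", "bool", "abs", "min", "max"}


def extract_names(expression: str) -> Set[str]:
    """Collect every bare name referenced in the expression."""
    tree = ast.parse(expression, mode="eval")
    return {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id not in ALLOWED_FUNCTIONS
    }


def missing_fields(expression: str, metadata: Dict[str, Any]) -> Set[str]:
    """Return the names the expression needs that metadata does not provide."""
    return extract_names(expression) - set(metadata)


metadata = {"network": "asia", "status": "ok"}
print(missing_fields("network == 'asia' and sample_size > 500", metadata))
# prints {'sample_size'}
```

Running a check like this before evaluation lets a caller report all missing fields at once rather than failing on the first undefined variable.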
filter_entries
filter_entries(
    entries: List[Dict[str, Any]], expression: str, metadata_key: str = "metadata"
) -> List[Dict[str, Any]]
Filter a list of entries by metadata expression.
Convenience function to filter a list of cache entry dictionaries by a filter expression applied to each entry's metadata.
Parameters:
- entries (List[Dict[str, Any]]) – List of entry dictionaries.
- expression (str) – Filter expression string.
- metadata_key (str, default: 'metadata') – Key in entry dict containing metadata.
Returns:
- List[Dict[str, Any]] – List of entries where metadata matches the filter.
Raises:
- FilterSyntaxError – If expression has invalid syntax.
- FilterExpressionError – If evaluation fails.
- TypeError – If arguments have invalid types.
Example:
>>> entries = [
...     {"metadata": {"network": "asia", "size": 100}},
...     {"metadata": {"network": "alarm", "size": 200}},
... ]
>>> filter_entries(entries, "network == 'asia'")
[{'metadata': {'network': 'asia', 'size': 100}}]
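The role of metadata_key can be illustrated with a small stand-alone sketch. It is not the library's code: filter_by_metadata is a hypothetical helper that takes a plain Python predicate in place of a filter expression, and it assumes every entry contains the metadata key.

```python
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]


def filter_by_metadata(
    entries: List[Dict[str, Any]],
    predicate: Callable[[Metadata], bool],
    metadata_key: str = "metadata",
) -> List[Dict[str, Any]]:
    """Keep whole entries whose metadata satisfies the predicate."""
    # Assumption: every entry has metadata_key; a missing key raises KeyError.
    return [entry for entry in entries if predicate(entry[metadata_key])]


entries = [
    {"id": 1, "meta": {"network": "asia", "size": 100}},
    {"id": 2, "meta": {"network": "alarm", "size": 200}},
]
asia = filter_by_metadata(
    entries, lambda m: m["network"] == "asia", metadata_key="meta"
)
print(asia)  # [{'id': 1, 'meta': {'network': 'asia', 'size': 100}}]
```

Note that the test is applied to the nested metadata dictionary, while the result contains the full entries, including any fields outside the metadata.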
Exceptions
FilterExpressionError
Raised when filter expression evaluation fails.
FilterSyntaxError
Raised when filter expression has invalid syntax.
Expression Syntax
Filter expressions use Python syntax with the following supported operators:
| Category | Operators |
|---|---|
| Comparison | ==, !=, >, <, >=, <= |
| Boolean | and, or, not |
| Membership | in |
| Grouping | () |
Allowed functions: len, str, int, float, bool, abs, min, max
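One way to read the operator table and function list is as a whitelist: anything not listed is rejected rather than executed. The sketch below illustrates that idea using only the standard library's ast module. It is not how simpleeval or this module is implemented, the names evaluate and matches are hypothetical, and it raises built-in exceptions instead of the module's own.

```python
import ast
import operator
from typing import Any, Dict

# Whitelisted comparison and membership operators from the table above.
COMPARISONS = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Gt: operator.gt,
    ast.Lt: operator.lt,
    ast.GtE: operator.ge,
    ast.LtE: operator.le,
    ast.In: lambda left, right: left in right,
}
FUNCTIONS = {f.__name__: f for f in (len, str, int, float, bool, abs, min, max)}


def evaluate(node: ast.AST, names: Dict[str, Any]) -> Any:
    """Recursively evaluate a parsed expression, rejecting unlisted nodes."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, names)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in names:
            raise NameError(f"undefined variable: {node.id}")
        return names[node.id]
    if isinstance(node, (ast.List, ast.Tuple)):
        return [evaluate(item, names) for item in node.elts]
    if isinstance(node, ast.BoolOp):
        # The generator is lazy, so all()/any() short-circuit like and/or.
        values = (evaluate(value, names) for value in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not evaluate(node.operand, names)
    if isinstance(node, ast.Compare):
        left = evaluate(node.left, names)
        for op, comparator in zip(node.ops, node.comparators):
            if type(op) not in COMPARISONS:
                raise ValueError(f"operator not allowed: {type(op).__name__}")
            right = evaluate(comparator, names)
            if not COMPARISONS[type(op)](left, right):
                return False
            left = right  # supports chained comparisons such as a < b < c
        return True
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        if node.func.id in FUNCTIONS and not node.keywords:
            args = [evaluate(arg, names) for arg in node.args]
            return FUNCTIONS[node.func.id](*args)
    raise ValueError(f"expression element not allowed: {type(node).__name__}")


def matches(expression: str, metadata: Dict[str, Any]) -> bool:
    """Parse the expression and evaluate it against metadata."""
    return bool(evaluate(ast.parse(expression, mode="eval"), metadata))


metadata = {"network": "asia", "sample_size": 1000, "status": "ok"}
print(matches("network in ['asia', 'alarm'] and sample_size > 500", metadata))  # True
print(matches("not (status == 'ok')", metadata))  # False
```

The design choice worth noting is the final raise: attribute access, arithmetic, and calls to any function outside the list all fall through to it, so an expression such as `__import__('os')` is refused instead of run.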
Usage Examples
Basic Filtering
from causaliq_core.utils import evaluate_filter
metadata = {"network": "asia", "sample_size": 1000, "status": "completed"}
# Simple equality
evaluate_filter("network == 'asia'", metadata) # True
# Numeric comparison
evaluate_filter("sample_size >= 500", metadata) # True
# Boolean combination
evaluate_filter("network == 'asia' and sample_size > 500", metadata) # True
Validating Expressions
from causaliq_core.utils import validate_filter, FilterSyntaxError
# Valid expression
validate_filter("x > 5 and y == 'value'") # No exception
# Invalid syntax
try:
    validate_filter("x ==")  # Missing right operand
except FilterSyntaxError as e:
    print(f"Invalid: {e}")
Extracting Variables
from causaliq_core.utils import get_filter_variables
# Get variables referenced in expression
variables = get_filter_variables("network == 'asia' and sample_size > 500")
print(variables)  # {'network', 'sample_size'}
Filtering Collections
from causaliq_core.utils import filter_entries
# Each entry holds its metadata under the default "metadata" key
entries = [
    {"metadata": {"network": "asia", "sample_size": 100}},
    {"metadata": {"network": "asia", "sample_size": 1000}},
    {"metadata": {"network": "alarm", "sample_size": 500}},
]
# Filter to asia entries with sample_size > 500
result = filter_entries(entries, "network == 'asia' and sample_size > 500")
# Returns: [{"metadata": {"network": "asia", "sample_size": 1000}}]
Workflow Integration
Filter expressions are commonly used in workflow configurations:
actions:
  merge_graphs:
    input: discovery_results.db
    filter: network == 'asia' and status == 'completed'
    output: merged_graphs.db
The filter is applied to cache entry metadata before aggregation.