Filter Expression Utilities

Safe filter expression evaluation for metadata filtering in workflows and aggregation operations.

This module provides functions for evaluating Python-like filter expressions against metadata dictionaries. Expressions are evaluated with the simpleeval library, which avoids the security risks of Python's built-in eval().

Core Functions

evaluate_filter

evaluate_filter(expression: str, metadata: Dict[str, Any]) -> bool

Evaluate filter expression against metadata dictionary.

The expression uses Python syntax with metadata field names as variables. Supports comparison operators (==, !=, >, <, >=, <=), boolean operators (and, or, not), membership testing (in), and parentheses for grouping.

Parameters:

  • expression

    (str) –

    Filter expression string.

  • metadata

    (Dict[str, Any]) –

    Metadata dictionary with field values.

Returns:

  • bool

    True if metadata matches the filter expression, False otherwise.

Raises:

  • FilterSyntaxError

    If expression has invalid syntax.

  • FilterExpressionError

    If evaluation fails (e.g., undefined variable).

  • TypeError

    If expression is not a string or metadata is not a dict.

Examples:

>>> metadata = {"network": "asia", "sample_size": 1000, "status": "ok"}
>>> evaluate_filter("network == 'asia'", metadata)
True
>>> evaluate_filter("sample_size > 500 and status == 'ok'", metadata)
True
>>> evaluate_filter("network in ['asia', 'alarm']", metadata)
True
>>> evaluate_filter("not network == 'sports'", metadata)
True

validate_filter

validate_filter(expression: str) -> None

Validate filter expression syntax without evaluating.

Checks that the expression can be parsed. Does not verify that variable names exist - that is checked during evaluation.

Parameters:

  • expression

    (str) –

    Filter expression string.

Raises:

  • FilterSyntaxError

    If expression has invalid syntax.

  • TypeError

    If expression is not a string.

Example

>>> validate_filter("network == 'asia'")  # OK
>>> validate_filter("network ==")  # Raises FilterSyntaxError
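
To make the "parse but do not evaluate" idea concrete, here is a minimal sketch built on the standard library's ast module. It is a stand-in for illustration only, not this module's implementation (which relies on simpleeval), and the names check_syntax and SketchSyntaxError are hypothetical.

```python
import ast


class SketchSyntaxError(ValueError):
    """Hypothetical stand-in for FilterSyntaxError."""


def check_syntax(expression: str) -> None:
    """Raise if the expression cannot be parsed; it is never evaluated."""
    if not isinstance(expression, str):
        raise TypeError("expression must be a string")
    try:
        # mode="eval" accepts a single expression, not statements
        ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise SketchSyntaxError(f"invalid filter syntax: {exc.msg}") from exc


check_syntax("network == 'asia'")    # parses, so nothing is raised
check_syntax("undefined_name > 5")   # also passes: names are not resolved here
```

Because only parsing happens, an expression that refers to a field missing from the metadata still passes this check, which matches the behaviour described above.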

get_filter_variables

get_filter_variables(expression: str) -> Set[str]

Extract variable names used in a filter expression.

Parses the expression and returns the set of variable names referenced. Useful for validating that required metadata fields are present.

Parameters:

  • expression

    (str) –

    Filter expression string.

Returns:

  • Set[str]

    Set of variable names used in the expression.

Raises:

  • FilterSyntaxError

    If expression has invalid syntax.

  • TypeError

    If expression is not a string.

Example

>>> get_filter_variables("network == 'asia' and sample_size > 500")
{'network', 'sample_size'}
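
One way to picture the "required fields are present" check is the sketch below. It uses the standard library's ast module as a stand-in for the real extraction, so names_in_expression and missing_fields are hypothetical helpers rather than part of this module. This naive version would also report function names such as len; whether get_filter_variables excludes them is not stated here.

```python
import ast
from typing import Any, Dict, Set


def names_in_expression(expression: str) -> Set[str]:
    """Collect every bare name in the parsed expression (hypothetical helper)."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def missing_fields(expression: str, metadata: Dict[str, Any]) -> Set[str]:
    """Names the expression references that the metadata does not define."""
    return names_in_expression(expression) - set(metadata)


metadata = {"network": "asia", "status": "ok"}
expression = "network == 'asia' and sample_size > 500"
print(sorted(missing_fields(expression, metadata)))  # prints ['sample_size']
```

Running a check like this before evaluation lets a caller report all absent fields at once instead of failing on the first undefined variable.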

filter_entries

filter_entries(
    entries: List[Dict[str, Any]], expression: str, metadata_key: str = "metadata"
) -> List[Dict[str, Any]]

Filter a list of entries by metadata expression.

Convenience function to filter a list of cache entry dictionaries by a filter expression applied to each entry's metadata.

Parameters:

  • entries

    (List[Dict[str, Any]]) –

    List of entry dictionaries.

  • expression

    (str) –

    Filter expression string.

  • metadata_key

    (str, default: 'metadata' ) –

    Key in entry dict containing metadata.

Returns:

  • List[Dict[str, Any]]

    List of entries where metadata matches the filter.

Raises:

Example

>>> entries = [
...     {"metadata": {"network": "asia", "size": 100}},
...     {"metadata": {"network": "alarm", "size": 200}},
... ]
>>> filter_entries(entries, "network == 'asia'")
[{'metadata': {'network': 'asia', 'size': 100}}]
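
The role of metadata_key can be sketched as follows. This is an assumption-laden illustration, not the module's code: filter_entries_sketch is a hypothetical name, it takes a predicate instead of an expression string so that it runs without the library, and it skips entries that lack the metadata key (the real function's behaviour in that case is not documented here).

```python
from typing import Any, Callable, Dict, List

Entry = Dict[str, Any]


def filter_entries_sketch(
    entries: List[Entry],
    matches: Callable[[Dict[str, Any]], bool],
    metadata_key: str = "metadata",
) -> List[Entry]:
    """Keep entries whose metadata satisfies `matches`.

    Assumption: entries without `metadata_key` are skipped, not an error.
    """
    return [
        entry
        for entry in entries
        if metadata_key in entry and matches(entry[metadata_key])
    ]


entries = [
    {"metadata": {"network": "asia", "size": 100}},
    {"metadata": {"network": "alarm", "size": 200}},
    {"payload": "no metadata here"},
]
kept = filter_entries_sketch(entries, lambda m: m.get("network") == "asia")
print(kept)  # prints [{'metadata': {'network': 'asia', 'size': 100}}]
```

With the real module, the predicate would presumably be a call to evaluate_filter with the expression and each entry's metadata. The whole entry is returned, not just its metadata.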

Exceptions

FilterExpressionError

Raised when filter expression evaluation fails.

FilterSyntaxError

Raised when filter expression has invalid syntax.

Expression Syntax

Filter expressions use Python syntax with the following supported operators:

Category      Operators
Comparison    ==, !=, >, <, >=, <=
Boolean       and, or, not
Membership    in
Grouping      ()

Allowed functions: len, str, int, float, bool, abs, min, max
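
To show what restricting evaluation to these operators and functions can look like, the sketch below walks the expression's syntax tree with the standard library's ast module and rejects anything not listed. It is a simplified stand-in, not the simpleeval-based implementation this module uses; evaluate_sketch is a hypothetical name, and it raises plain KeyError and ValueError rather than the module's exceptions.

```python
import ast
import operator
from typing import Any, Dict

# Operators listed above, mapped to their Python implementations.
COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.Lt: operator.lt,
    ast.GtE: operator.ge, ast.LtE: operator.le,
    ast.In: lambda a, b: a in b,
}
# The allowed functions listed above, keyed by name.
FUNCTIONS = {f.__name__: f for f in (len, str, int, float, bool, abs, min, max)}


def evaluate_sketch(expression: str, metadata: Dict[str, Any]) -> bool:
    """Evaluate a filter by walking its AST, rejecting unlisted syntax."""

    def ev(node: ast.AST) -> Any:
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return metadata[node.id]  # KeyError for an undefined field
        if isinstance(node, (ast.List, ast.Tuple)):
            return [ev(item) for item in node.elts]
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not ev(node.operand)
        if isinstance(node, ast.BoolOp):
            # Generator keeps short-circuiting; the result is a plain bool,
            # a simplification of Python's operand-returning and/or.
            values = (ev(v) for v in node.values)
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Compare):
            left = ev(node.left)
            for op, right_node in zip(node.ops, node.comparators):
                if type(op) not in COMPARE:
                    raise ValueError(f"unsupported operator: {type(op).__name__}")
                right = ev(right_node)
                if not COMPARE[type(op)](left, right):
                    return False
                left = right  # supports chains such as 1 < x <= 3
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS and not node.keywords):
            return FUNCTIONS[node.func.id](*(ev(arg) for arg in node.args))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return bool(ev(ast.parse(expression, mode="eval").body))


metadata = {"network": "asia", "sample_size": 1000, "tags": ["a", "b"]}
print(evaluate_sketch("network in ['asia', 'alarm'] and not sample_size < 500", metadata))  # prints True
print(evaluate_sketch("len(tags) >= 2 and max(sample_size, 500) == 1000", metadata))  # prints True
```

The allow-list approach means a call such as __import__('os') is rejected because the function name is not in the table, which is the kind of input that makes eval() unsafe.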

Usage Examples

Basic Filtering

from causaliq_core.utils import evaluate_filter

metadata = {"network": "asia", "sample_size": 1000, "status": "completed"}

# Simple equality
evaluate_filter("network == 'asia'", metadata)  # True

# Numeric comparison
evaluate_filter("sample_size >= 500", metadata)  # True

# Boolean combination
evaluate_filter("network == 'asia' and sample_size > 500", metadata)  # True

Validating Expressions

from causaliq_core.utils import validate_filter, FilterSyntaxError

# Valid expression
validate_filter("x > 5 and y == 'value'")  # No exception

# Invalid syntax
try:
    validate_filter("x ==")  # Missing right operand
except FilterSyntaxError as e:
    print(f"Invalid: {e}")

Extracting Variables

from causaliq_core.utils import get_filter_variables

# Get variables referenced in expression
# (named `variables` to avoid shadowing the built-in vars())
variables = get_filter_variables("network == 'asia' and sample_size > 500")
print(variables)  # {'network', 'sample_size'}

Filtering Collections

from causaliq_core.utils import filter_entries

# Each entry holds its metadata under the default metadata_key, "metadata"
entries = [
    {"metadata": {"network": "asia", "sample_size": 100}},
    {"metadata": {"network": "asia", "sample_size": 1000}},
    {"metadata": {"network": "alarm", "sample_size": 500}},
]

# Filter to asia entries with sample_size > 500
result = filter_entries(entries, "network == 'asia' and sample_size > 500")
# Returns: [{"metadata": {"network": "asia", "sample_size": 1000}}]

Workflow Integration

Filter expressions are commonly used in workflow configurations:

actions:
  merge_graphs:
    input: discovery_results.db
    filter: network == 'asia' and status == 'completed'
    output: merged_graphs.db

The filter is applied to cache entry metadata before aggregation.