Filter Expression Utilities

Safe filter expression evaluation for metadata filtering in workflows and aggregation operations.

This module provides functions for evaluating Python-like filter expressions against metadata dictionaries. Evaluation uses the simpleeval library, which avoids the security risks of eval().

Core Functions

evaluate_filter

evaluate_filter(expression: str, metadata: Dict[str, Any]) -> bool

Evaluate filter expression against metadata dictionary.

The expression uses Python syntax with metadata field names as variables. Supports comparison operators (==, !=, >, <, >=, <=), boolean operators (and, or, not), membership testing (in), and parentheses for grouping.

Parameters:

  • expression

    (str) –

    Filter expression string.

  • metadata

    (Dict[str, Any]) –

    Metadata dictionary with field values.

Returns:

  • bool

    True if metadata matches the filter expression, False otherwise.

Raises:

  • FilterSyntaxError

    If expression has invalid syntax.

  • FilterExpressionError

    If evaluation fails (e.g., undefined variable).

  • TypeError

    If expression is not a string or metadata is not a dict.

Examples:

>>> metadata = {"network": "asia", "sample_size": 1000, "status": "ok"}
>>> evaluate_filter("network == 'asia'", metadata)
True
>>> evaluate_filter("sample_size > 500 and status == 'ok'", metadata)
True
>>> evaluate_filter("network in ['asia', 'alarm']", metadata)
True
>>> evaluate_filter("not network == 'sports'", metadata)
True

validate_filter

validate_filter(expression: str) -> None

Validate filter expression syntax without evaluating.

Checks that the expression can be parsed. It does not verify that variable names exist; that is checked during evaluation.

Parameters:

  • expression

    (str) –

    Filter expression string.

Raises:

  • FilterSyntaxError

    If expression has invalid syntax.

  • TypeError

    If expression is not a string.

Example

validate_filter("network == 'asia'")  # OK
validate_filter("network ==")  # Raises FilterSyntaxError
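The split between syntax checking and evaluation can be sketched with the standard-library parser. This is an assumption-laden stand-in, not the library's code: sketch_validate is a hypothetical name, and it raises the built-in SyntaxError where validate_filter raises FilterSyntaxError.

```python
import ast


def sketch_validate(expression: str) -> None:
    """Raise if the expression cannot be parsed (illustrative stand-in)."""
    if not isinstance(expression, str):
        raise TypeError("expression must be a string")
    # mode="eval" accepts a single expression and rejects statements.
    ast.parse(expression, mode="eval")


# Parses cleanly even though no metadata defines "undefined_name":
# name lookup only happens at evaluation time.
sketch_validate("undefined_name > 1")

try:
    sketch_validate("network ==")  # missing right operand
except SyntaxError as exc:
    print(f"Invalid: {exc.msg}")
```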

get_filter_variables

get_filter_variables(expression: str) -> Set[str]

Extract variable names used in a filter expression.

Parses the expression and returns the set of variable names referenced. Useful for validating that required metadata fields are present.

Parameters:

  • expression

    (str) –

    Filter expression string.

Returns:

  • Set[str]

    Set of variable names used in the expression.

Raises:

  • FilterSyntaxError

    If expression has invalid syntax.

  • TypeError

    If expression is not a string.

Example

get_filter_variables("network == 'asia' and sample_size > 500")  # {'network', 'sample_size'}
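One way to picture variable extraction, and the "required fields" check it enables, is to walk the parsed expression and collect name nodes. The sketch below uses only the standard library; sketch_variables is a hypothetical name, and excluding names that are called as functions (such as len) is a choice made here, not confirmed behaviour of get_filter_variables.

```python
import ast
from typing import Set


def sketch_variables(expression: str) -> Set[str]:
    """Collect variable names referenced in an expression (illustrative)."""
    tree = ast.parse(expression, mode="eval")
    # Names used as call targets (e.g. len) are treated as functions,
    # not as metadata fields. This is an assumption of the sketch.
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return names - called


# Check up front that a metadata dictionary has every field the filter needs.
required = sketch_variables("network == 'asia' and sample_size > 500")
metadata = {"network": "asia"}
missing = required - set(metadata)
print(sorted(missing))  # ['sample_size']
```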

filter_entries

filter_entries(
    entries: List[Dict[str, Any]], expression: str, metadata_key: str = "metadata"
) -> List[Dict[str, Any]]

Filter a list of entries by metadata expression.

Convenience function to filter a list of cache entry dictionaries by a filter expression applied to each entry's metadata.

Parameters:

  • entries

    (List[Dict[str, Any]]) –

    List of entry dictionaries.

  • expression

    (str) –

    Filter expression string.

  • metadata_key

    (str, default: 'metadata' ) –

    Key in entry dict containing metadata.

Returns:

  • List[Dict[str, Any]]

    List of entries where metadata matches the filter.

Example

>>> entries = [
...     {"metadata": {"network": "asia", "size": 100}},
...     {"metadata": {"network": "alarm", "size": 200}},
... ]
>>> filter_entries(entries, "network == 'asia'")
[{'metadata': {'network': 'asia', 'size': 100}}]
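The role of metadata_key can be illustrated with a small stand-in that takes a plain Python predicate in place of a filter expression. The function name sketch_filter_entries, the "meta" key, and the "id" field are hypothetical, and skipping entries that lack the metadata key is an assumption rather than documented behaviour.

```python
from typing import Any, Callable, Dict, List


def sketch_filter_entries(
    entries: List[Dict[str, Any]],
    predicate: Callable[[Dict[str, Any]], bool],
    metadata_key: str = "metadata",
) -> List[Dict[str, Any]]:
    """Keep entries whose metadata satisfies the predicate (illustrative)."""
    # Assumption: entries lacking metadata_key are skipped, not errors.
    return [
        entry
        for entry in entries
        if metadata_key in entry and predicate(entry[metadata_key])
    ]


entries = [
    {"id": 1, "meta": {"network": "asia", "size": 100}},
    {"id": 2, "meta": {"network": "alarm", "size": 200}},
]
kept = sketch_filter_entries(
    entries, lambda m: m["network"] == "asia", metadata_key="meta"
)
print([entry["id"] for entry in kept])  # [1]
```

Note that whole entries are returned, not just their metadata, which matches the example output above.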

resolve_random_calls

resolve_random_calls(
    expression: str, all_metadata: List[Dict[str, Any]]
) -> Tuple[str, Dict[str, Any]]

Pre-resolve random() calls in a filter expression.

Finds VAR in random(count, seed) patterns, collects the distinct values of VAR across all_metadata, selects count of them using the hardware-stable RandomIntegers sequence, and returns a rewritten expression plus a dictionary of pre-computed sets to inject as extra names during evaluation.

Parameters:

  • expression

    (str) –

    Filter expression, possibly containing random(count, seed) calls.

  • all_metadata

    (List[Dict[str, Any]]) –

    List of flat metadata dictionaries from all entries in the population.

Returns:

  • Tuple[str, Dict[str, Any]]

    Tuple of (resolved_expression, extra_names). extra_names should be merged into each entry's metadata when calling evaluate_filter.

Example

>>> metas = [{"seed": i} for i in range(25)]
>>> expr, names = resolve_random_calls(
...     "seed in random(10, 0)", metas
... )
>>> len(names)
1
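The pre-resolution step described above can be sketched as a regex rewrite. This is a rough illustration under stated assumptions, not the library's code: sketch_resolve_random and the _random_N placeholder names are hypothetical, and Python's random.Random is swapped in for the hardware-stable RandomIntegers sequence, so the values chosen will differ from those the library selects.

```python
import random
import re
from typing import Any, Dict, List, Tuple

# Matches "VAR in random(count, seed)" with integer literal arguments.
_RANDOM_CALL = re.compile(r"(\w+)\s+in\s+random\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def sketch_resolve_random(
    expression: str, all_metadata: List[Dict[str, Any]]
) -> Tuple[str, Dict[str, Any]]:
    """Replace random() calls with named, pre-computed value sets."""
    extra_names: Dict[str, Any] = {}

    def substitute(match: "re.Match[str]") -> str:
        var = match.group(1)
        count, seed = int(match.group(2)), int(match.group(3))
        # Distinct values of VAR, sorted so the draw ignores entry order.
        population = sorted({m[var] for m in all_metadata if var in m})
        # Stand-in for the library's hardware-stable sequence (assumption).
        rng = random.Random(seed)
        chosen = rng.sample(population, min(count, len(population)))
        name = f"_random_{len(extra_names)}"  # hypothetical placeholder
        extra_names[name] = set(chosen)
        return f"{var} in {name}"

    return _RANDOM_CALL.sub(substitute, expression), extra_names


metas = [{"seed": i} for i in range(25)]
expr, names = sketch_resolve_random("seed in random(10, 0)", metas)
print(expr)  # seed in _random_0

# Merging the extra names into each entry's metadata makes the rewritten
# expression an ordinary membership test.
selected = [m for m in metas if m["seed"] in names["_random_0"]]
print(len(selected))  # 10
```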

Exceptions

FilterExpressionError

Raised when filter expression evaluation fails.

FilterSyntaxError

Raised when filter expression has invalid syntax.

Expression Syntax

Filter expressions use Python syntax with the following supported operators:

Category      Operators
Comparison    ==, !=, >, <, >=, <=
Boolean       and, or, not
Membership    in
Grouping      ()

Allowed functions: len, str, int, float, bool, abs, min, max, random
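Because expressions follow Python syntax, Python's operator precedence determines how they group, assuming the evaluator parses them with the standard Python grammar (as simpleeval does). The helper below, same_parse, is a hypothetical name used only to compare how two expressions parse.

```python
import ast


def same_parse(a: str, b: str) -> bool:
    """True if two expressions parse to the same syntax tree."""
    tree_a = ast.dump(ast.parse(a, mode="eval"))
    tree_b = ast.dump(ast.parse(b, mode="eval"))
    return tree_a == tree_b


# "not" applies to the whole comparison, not just to the name.
print(same_parse("not network == 'sports'", "not (network == 'sports')"))  # True

# "and" binds tighter than "or", so parentheses are needed to override it.
print(same_parse("a or b and c", "a or (b and c)"))  # True
print(same_parse("a or b and c", "(a or b) and c"))  # False
```

When a filter mixes and with or, explicit parentheses make the intended grouping unambiguous to readers.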

Random Sampling

The random(count, seed) function enables reproducible random selection within filter expressions. When used as VAR in random(count, seed), it selects count values from the distinct population of VAR across all entries, using a hardware-stable random sequence.

from causaliq_core.utils import filter_entries

entries = [
    {"metadata": {"seed": 1, "network": "asia"}},
    {"metadata": {"seed": 5, "network": "asia"}},
    {"metadata": {"seed": 10, "network": "asia"}},
    {"metadata": {"seed": 15, "network": "asia"}},
]

# Select 2 of the distinct seed values; 42 seeds the random sequence
result = filter_entries(
    entries, "seed in random(2, 42)"
)
# Returns the 2 entries whose seed values were chosen (reproducible)

random() calls are pre-resolved by resolve_random_calls() before expression evaluation, replacing them with concrete value sets. Use filter_entries() for automatic resolution, or call resolve_random_calls() directly for manual control.

Usage Examples

Basic Filtering

from causaliq_core.utils import evaluate_filter

metadata = {"network": "asia", "sample_size": 1000, "status": "completed"}

# Simple equality
evaluate_filter("network == 'asia'", metadata)  # True

# Numeric comparison
evaluate_filter("sample_size >= 500", metadata)  # True

# Boolean combination
evaluate_filter("network == 'asia' and sample_size > 500", metadata)  # True

Validating Expressions

from causaliq_core.utils import validate_filter, FilterSyntaxError

# Valid expression
validate_filter("x > 5 and y == 'value'")  # No exception

# Invalid syntax
try:
    validate_filter("x ==")  # Missing right operand
except FilterSyntaxError as e:
    print(f"Invalid: {e}")

Extracting Variables

from causaliq_core.utils import get_filter_variables

# Get variables referenced in expression
# (named "variables" to avoid shadowing the built-in vars())
variables = get_filter_variables("network == 'asia' and sample_size > 500")
print(variables)  # {'network', 'sample_size'}

Filtering Collections

from causaliq_core.utils import filter_entries

# Each entry holds its metadata under the default "metadata" key
entries = [
    {"metadata": {"network": "asia", "sample_size": 100}},
    {"metadata": {"network": "asia", "sample_size": 1000}},
    {"metadata": {"network": "alarm", "sample_size": 500}},
]

# Filter to asia entries with sample_size > 500
result = filter_entries(entries, "network == 'asia' and sample_size > 500")
# Returns: [{"metadata": {"network": "asia", "sample_size": 1000}}]

Workflow Integration

Filter expressions are commonly used in workflow configurations:

actions:
  merge_graphs:
    input: discovery_results.db
    filter: network == 'asia' and status == 'completed'
    output: merged_graphs.db

The filter is applied to cache entry metadata before aggregation.

Random Sampling in Workflows

steps:
  - name: "Evaluate Subset"
    uses: "causaliq-analysis"
    with:
      action: "evaluate_graph"
      input: "results/graphs.db"
      filter: "seed in random(5, 42)"
      output: "results/evaluation.db"

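This selects 5 of the distinct values of the seed metadata field found in the input cache entries. The second argument (42) seeds the random sequence, so the same 5 values are chosen on every run.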