Filter Expression Utilities¶

Safe filter expression evaluation for metadata filtering in workflows and aggregation operations.

This module provides functions for evaluating Python-like filter expressions against metadata dictionaries. Evaluation uses the simpleeval library, which avoids the security risks of Python's built-in eval().

Core Functions¶

evaluate_filter ¶

evaluate_filter(expression: str, metadata: Dict[str, Any]) -> bool

Evaluate filter expression against metadata dictionary.

The expression uses Python syntax with metadata field names as variables. Supports comparison operators (==, !=, >, <, >=, <=), boolean operators (and, or, not), membership testing (in), and parentheses for grouping.

Parameters:

expression ¶
(str) –

Filter expression string.
metadata ¶
(Dict[str, Any]) –

Metadata dictionary with field values.

Returns:

bool –

True if metadata matches the filter expression, False otherwise.

Raises:

FilterSyntaxError –

If expression has invalid syntax.
FilterExpressionError –

If evaluation fails (e.g., undefined variable).
TypeError –

If expression is not a string or metadata is not a dict.

Examples:

>>> metadata = {"network": "asia", "sample_size": 1000, "status": "ok"}
>>> evaluate_filter("network == 'asia'", metadata)
True
>>> evaluate_filter("sample_size > 500 and status == 'ok'", metadata)
True
>>> evaluate_filter("network in ['asia', 'alarm']", metadata)
True
>>> evaluate_filter("not network == 'sports'", metadata)
True

validate_filter ¶

validate_filter(expression: str) -> None

Validate filter expression syntax without evaluating.

Checks that the expression can be parsed. Does not verify that variable names exist - that is checked during evaluation.

Parameters:

expression ¶
(str) –

Filter expression string.

Raises:

FilterSyntaxError –

If expression has invalid syntax.
TypeError –

If expression is not a string.

Example

>>> validate_filter("network == 'asia'")  # OK
>>> validate_filter("network ==")  # Raises FilterSyntaxError
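To make the "parse only, do not evaluate" behaviour concrete, here is a minimal sketch built on the standard-library `ast` module. It is an illustration under stated assumptions, not the module's implementation: `sketch_validate` and `SketchSyntaxError` are hypothetical stand-ins for `validate_filter` and `FilterSyntaxError`.

```python
import ast


class SketchSyntaxError(ValueError):
    """Hypothetical stand-in for FilterSyntaxError."""


def sketch_validate(expression: str) -> None:
    """Raise if the expression cannot be parsed; variable names are never looked up."""
    if not isinstance(expression, str):
        raise TypeError("expression must be a string")
    try:
        # mode="eval" accepts a single expression, which is what a filter is.
        ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise SketchSyntaxError(f"invalid filter syntax: {exc.msg}") from exc


sketch_validate("network == 'asia'")   # parses, so nothing is raised
sketch_validate("no_such_field > 5")   # also passes: names are not checked here
```

Because only parsing happens, an expression that references a field missing from the metadata still validates; that failure would surface later, at evaluation time.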

get_filter_variables ¶

get_filter_variables(expression: str) -> Set[str]

Extract variable names used in a filter expression.

Parses the expression and returns the set of variable names referenced. Useful for validating that required metadata fields are present.

Parameters:

expression ¶
(str) –

Filter expression string.

Returns:

Set[str] –

Set of variable names used in the expression.

Raises:

FilterSyntaxError –

If expression has invalid syntax.
TypeError –

If expression is not a string.

Example

>>> get_filter_variables("network == 'asia' and sample_size > 500")
{'network', 'sample_size'}
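The "required fields are present" check mentioned above can be sketched as follows. This is a standard-library approximation, not the module's code: `sketch_variables` and `missing_fields` are hypothetical helpers, and skipping names that appear in call position (such as `len`) is an assumption about how function names might be treated.

```python
import ast
from typing import Any, Dict, Set


def sketch_variables(expression: str) -> Set[str]:
    """Collect names used as values, skipping names that are being called."""
    tree = ast.parse(expression, mode="eval")
    # Remember the nodes sitting in call position, e.g. the `len` in len(x).
    called = {id(node.func) for node in ast.walk(tree) if isinstance(node, ast.Call)}
    return {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and id(node) not in called
    }


def missing_fields(expression: str, metadata: Dict[str, Any]) -> Set[str]:
    """Return the variables the expression needs that the metadata lacks."""
    return sketch_variables(expression) - set(metadata)


print(missing_fields("network == 'asia' and status == 'ok'", {"network": "asia"}))
# {'status'}
```

Running a check like this before filtering lets a caller report every missing field at once instead of failing on the first undefined variable during evaluation.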

filter_entries ¶

filter_entries(
    entries: List[Dict[str, Any]], expression: str, metadata_key: str = "metadata"
) -> List[Dict[str, Any]]

Filter a list of entries by metadata expression.

Convenience function to filter a list of cache entry dictionaries by a filter expression applied to each entry's metadata.

Parameters:

entries ¶
(List[Dict[str, Any]]) –

List of entry dictionaries.
expression ¶
(str) –

Filter expression string.
metadata_key ¶
(str, default: 'metadata' ) –

Key in entry dict containing metadata.

Returns:

List[Dict[str, Any]] –

List of entries where metadata matches the filter.

Raises:

FilterSyntaxError –

If expression has invalid syntax.
FilterExpressionError –

If evaluation fails.
TypeError –

If arguments have invalid types.

Example

>>> entries = [
...     {"metadata": {"network": "asia", "size": 100}},
...     {"metadata": {"network": "alarm", "size": 200}},
... ]
>>> filter_entries(entries, "network == 'asia'")
[{'metadata': {'network': 'asia', 'size': 100}}]

resolve_random_calls ¶

resolve_random_calls(
    expression: str, all_metadata: List[Dict[str, Any]]
) -> Tuple[str, Dict[str, Any]]

Pre-resolve random() calls in a filter expression.

Finds patterns of the form VAR in random(count, seed), collects the distinct values of VAR across all_metadata, and selects count of them using the hardware-stable RandomIntegers sequence. Returns the rewritten expression together with a dictionary of pre-computed sets to inject as extra names during evaluation.

Parameters:

expression ¶
(str) –

Filter expression, possibly containing random(count, seed) calls.
all_metadata ¶
(List[Dict[str, Any]]) –

List of flat metadata dictionaries from all entries in the population.

Returns:

Tuple[str, Dict[str, Any]] –

Tuple of (resolved_expression, extra_names). extra_names should be merged into each entry's metadata when calling evaluate_filter.

Raises:

FilterExpressionError –

If fewer distinct values exist than the requested count.

Example

>>> metas = [{"seed": i} for i in range(25)]
>>> expr, names = resolve_random_calls(
...     "seed in random(10, 0)", metas
... )
>>> len(names)
1

Exceptions¶

FilterExpressionError ¶

Raised when filter expression evaluation fails.

FilterSyntaxError ¶

Raised when filter expression has invalid syntax.

Expression Syntax¶

Filter expressions use Python syntax with the following supported operators:

Category	Operators
Comparison	`==`, `!=`, `>`, `<`, `>=`, `<=`
Boolean	`and`, `or`, `not`
Membership	`in`
Grouping	`()`

Allowed functions: len, str, int, float, bool, abs, min, max, random
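As an illustration of why a restricted grammar is safe, the sketch below evaluates the operators listed above by walking the parsed expression and rejecting every node type that is not whitelisted. The module itself delegates to simpleeval; this standard-library version, with the hypothetical name `sketch_evaluate`, is only a simplified approximation (for example, `random` is left out because it is pre-resolved separately, and undefined variables raise a plain `NameError`).

```python
import ast
import operator
from typing import Any, Dict

# Whitelisted comparison operators, keyed by AST node type.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.Lt: operator.lt,
    ast.GtE: operator.ge, ast.LtE: operator.le,
    ast.In: lambda left, right: left in right,
}
# Whitelisted functions, looked up by name.
_FUNCTIONS = {f.__name__: f for f in (len, str, int, float, bool, abs, min, max)}


def sketch_evaluate(expression: str, metadata: Dict[str, Any]) -> bool:
    """Evaluate a filter by walking its AST; unlisted syntax is rejected."""

    def visit(node: ast.AST) -> Any:
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in metadata:
                raise NameError(f"undefined variable: {node.id}")
            return metadata[node.id]
        if isinstance(node, (ast.List, ast.Tuple)):
            return [visit(item) for item in node.elts]
        if isinstance(node, ast.BoolOp):
            values = (visit(v) for v in node.values)  # lazy, so and/or short-circuit
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not visit(node.operand)
        if isinstance(node, ast.Compare):
            left = visit(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in _COMPARE:
                    raise ValueError(f"unsupported operator: {type(op).__name__}")
                right = visit(comparator)
                if not _COMPARE[type(op)](left, right):
                    return False
                left = right  # supports chained comparisons such as 1 < x < 5
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCTIONS and not node.keywords):
            return _FUNCTIONS[node.func.id](*(visit(arg) for arg in node.args))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return bool(visit(ast.parse(expression, mode="eval")))


metadata = {"network": "asia", "sample_size": 1000, "status": "ok"}
print(sketch_evaluate("sample_size > 500 and status == 'ok'", metadata))  # True
```

Attribute access, subscripts, lambdas and calls to any other name all fall through to the final `raise`, so an expression such as `__import__('os')` is refused rather than executed.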

Random Sampling¶

The random(count, seed) function enables reproducible random selection within filter expressions. When used as VAR in random(count, seed), it selects count values from the distinct population of VAR across all entries, using a hardware-stable random sequence.

from causaliq_core.utils import filter_entries

entries = [
    {"metadata": {"seed": 1, "network": "asia"}},
    {"metadata": {"seed": 5, "network": "asia"}},
    {"metadata": {"seed": 10, "network": "asia"}},
    {"metadata": {"seed": 15, "network": "asia"}},
]

# Select 2 random seeds (deterministic with seed=42)
result = filter_entries(
    entries, "seed in random(2, 42)"
)
# Returns 2 entries with reproducibly chosen seed values

random() calls are pre-resolved by resolve_random_calls() before expression evaluation, replacing them with concrete value sets. Use filter_entries() for automatic resolution, or call resolve_random_calls() directly for manual control.
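The pre-resolution step can be pictured with the sketch below. It is a hedged approximation, not the module's code: `sketch_resolve` is a hypothetical stand-in for `resolve_random_calls`, the `_random_0` placeholder name is invented for illustration, and Python's `random.Random` is swapped in for the hardware-stable RandomIntegers sequence, so the values chosen will differ from the real function's.

```python
import random
import re
from typing import Any, Dict, List, Tuple

# Matches "VAR in random(count, seed)" with integer literal arguments.
_RANDOM_CALL = re.compile(r"\b([A-Za-z_]\w*)\s+in\s+random\(\s*(\d+)\s*,\s*(\d+)\s*\)")


def sketch_resolve(
    expression: str, all_metadata: List[Dict[str, Any]]
) -> Tuple[str, Dict[str, Any]]:
    """Replace each random() call with a named, pre-computed set of values."""
    extra_names: Dict[str, Any] = {}

    def replace(match: re.Match) -> str:
        var, count, seed = match.group(1), int(match.group(2)), int(match.group(3))
        # Distinct values of VAR across the whole population, in a stable order.
        population = sorted({m[var] for m in all_metadata if var in m})
        if len(population) < count:
            raise ValueError(
                f"only {len(population)} distinct values of {var!r}, need {count}"
            )
        # Stand-in RNG (assumption): the real module uses RandomIntegers instead.
        chosen = random.Random(seed).sample(population, count)
        name = f"_random_{len(extra_names)}"  # hypothetical placeholder name
        extra_names[name] = set(chosen)
        return f"{var} in {name}"

    return _RANDOM_CALL.sub(replace, expression), extra_names


metas = [{"seed": i} for i in range(25)]
expr, names = sketch_resolve("seed in random(10, 0)", metas)
print(expr)        # seed in _random_0
print(len(names))  # 1
```

Resolving once against the full population, rather than per entry, is what keeps the selection consistent: every entry is then tested against the same pre-computed set.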

Usage Examples¶

Basic Filtering¶

from causaliq_core.utils import evaluate_filter

metadata = {"network": "asia", "sample_size": 1000, "status": "completed"}

# Simple equality
evaluate_filter("network == 'asia'", metadata)  # True

# Numeric comparison
evaluate_filter("sample_size >= 500", metadata)  # True

# Boolean combination
evaluate_filter("network == 'asia' and sample_size > 500", metadata)  # True

Validating Expressions¶

from causaliq_core.utils import validate_filter, FilterSyntaxError

# Valid expression
validate_filter("x > 5 and y == 'value'")  # No exception

# Invalid syntax
try:
    validate_filter("x ==")  # Missing right operand
except FilterSyntaxError as e:
    print(f"Invalid: {e}")

Extracting Variables¶

from causaliq_core.utils import get_filter_variables

# Get variables referenced in expression
# (named `variables` to avoid shadowing the built-in vars())
variables = get_filter_variables("network == 'asia' and sample_size > 500")
print(variables)  # {'network', 'sample_size'}

Filtering Collections¶

from causaliq_core.utils import filter_entries

entries = [
    {"metadata": {"network": "asia", "sample_size": 100}},
    {"metadata": {"network": "asia", "sample_size": 1000}},
    {"metadata": {"network": "alarm", "sample_size": 500}},
]

# Filter to asia entries with sample_size > 500
result = filter_entries(entries, "network == 'asia' and sample_size > 500")
# Returns: [{"metadata": {"network": "asia", "sample_size": 1000}}]

Workflow Integration¶

Filter expressions are commonly used in workflow configurations:

actions:
  merge_graphs:
    input: discovery_results.db
    filter: network == 'asia' and status == 'completed'
    output: merged_graphs.db

The filter is applied to cache entry metadata before aggregation.

Random Sampling in Workflows¶

steps:
  - name: "Evaluate Subset"
    uses: "causaliq-analysis"
    with:
      action: "evaluate_graph"
      input: "results/graphs.db"
      filter: "seed in random(5, 42)"
      output: "results/evaluation.db"

This selects 5 random seed values (deterministically with seed 42) from the input cache entries.
