LLM Client Base Interface¶
Abstract base class and common types for all LLM vendor clients. This module defines the interface that all vendor-specific clients must implement, ensuring consistent behavior across different LLM providers.
Overview¶
The base client module provides:
- BaseLLMClient - Abstract base class defining the client interface
- LLMConfig - Base configuration dataclass for all clients
- LLMResponse - Unified response format from any LLM provider
Caching Support¶
BaseLLMClient includes built-in caching integration:
- set_cache() - Configure a TokenCache for response caching
- cached_completion() - Make completion requests with automatic caching
- _build_cache_key() - Generate deterministic cache keys (SHA-256); see the sketch after this list
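How the key is derived is internal to BaseLLMClient, but the general idea is to hash the request parameters deterministically. A minimal sketch of that idea, assuming (not guaranteed) that the key covers the model, messages, and sampling options:

```python
import hashlib
import json


def build_cache_key(model: str, messages: list, temperature: float,
                    max_tokens: int) -> str:
    """Illustrative only: derive a deterministic SHA-256 key for a request."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # sort_keys and fixed separators make the serialisation deterministic
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Identical requests therefore map to identical keys, which is what lets cached_completion() recognise a repeat request.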
Design Philosophy¶
We use vendor-specific API clients rather than wrapper libraries like LiteLLM or LangChain. This provides:
- Minimal dependencies (httpx only for HTTP)
- Reliable and predictable behavior
- Easy debugging without abstraction layers
- Full control over API interactions
The abstract interface ensures that all vendor clients behave consistently, making it easy to swap providers or add new ones.
Usage¶
Vendor-specific clients inherit from BaseLLMClient:
from causaliq_knowledge.llm import (
    BaseLLMClient,
    LLMConfig,
    LLMResponse,
    GroqClient,
    GeminiClient,
)

# All clients share the same interface
def query_llm(client: BaseLLMClient, prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    response = client.completion(messages)
    return response.content

# Works with any client
groq = GroqClient()
gemini = GeminiClient()

result1 = query_llm(groq, "What is 2+2?")
result2 = query_llm(gemini, "What is 2+2?")
Caching LLM Responses¶
Enable caching to avoid redundant API calls:
from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.llm import GroqClient, LLMConfig

# Create a persistent cache
with TokenCache("llm_cache.db") as cache:
    client = GroqClient(LLMConfig(model="llama-3.1-8b-instant"))
    client.set_cache(cache)

    messages = [{"role": "user", "content": "What is Python?"}]

    # First call - hits API, stores in cache
    response1 = client.cached_completion(messages)

    # Second call - returns from cache, no API call
    response2 = client.cached_completion(messages)

    assert response1.content == response2.content
    assert client.call_count == 1  # Only one API call made
The cache uses the LLMEntryEncoder automatically, storing:
- Request details (model, messages, temperature, max_tokens)
- Response content
- Metadata (provider, token counts, cost, latency)
Each cached entry captures latency timing automatically using time.perf_counter(),
enabling performance analysis across providers and models.
See LLM Cache for details on the cache entry structure.
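The perf_counter() timing mentioned above is the standard pattern for wall-clock measurement of a call. A minimal sketch of that pattern (not the library's actual code, reusing the client and messages from the caching example above):

```python
import time

start = time.perf_counter()
response = client.cached_completion(messages)  # API call or cache hit
latency_ms = (time.perf_counter() - start) * 1000.0

print(f"call took {latency_ms:.1f} ms")  # cache hits should be much faster
```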
Creating a Custom Client¶
To add support for a new LLM provider, implement the `BaseLLMClient` interface. The abstract members are completion, call_count, provider_name, is_available, and list_models:
```python
from causaliq_knowledge.llm import BaseLLMClient, LLMConfig, LLMResponse


class MyCustomClient(BaseLLMClient):
    def __init__(self, config: LLMConfig) -> None:
        self.config = config
        self._total_calls = 0

    @property
    def provider_name(self) -> str:
        return "my_provider"

    @property
    def call_count(self) -> int:
        return self._total_calls

    def completion(self, messages, **kwargs) -> LLMResponse:
        # Implement the provider API call here
        self._total_calls += 1
        return LLMResponse(
            content="response text",
            model=self.config.model,
            input_tokens=10,
            output_tokens=20,
        )

    def is_available(self) -> bool:
        # Cloud providers: check the API key; local providers: check the server
        return True

    def list_models(self) -> list:
        # Return the models accessible with the current configuration
        return [self.config.model]
```
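Because MyCustomClient satisfies the shared interface, it works with provider-agnostic code such as the query_llm helper from the Usage section above (the model name here is just a placeholder):

```python
client = MyCustomClient(LLMConfig(model="my-model"))

if client.is_available():
    print(query_llm(client, "Hello"))  # goes through the common interface
    print(client.call_count)           # 1
```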
LLMConfig¶
LLMConfig dataclass ¶
LLMConfig(
model: str,
temperature: float = 0.1,
max_tokens: int = 500,
timeout: float = 30.0,
api_key: Optional[str] = None,
)
Base configuration for all LLM clients.
This dataclass defines common configuration options shared by all LLM provider clients. Vendor-specific clients may extend this with additional options.
Attributes:
- model (str) – Model identifier (provider-specific format).
- temperature (float) – Sampling temperature (0.0 = deterministic, 1.0 = creative).
- max_tokens (int) – Maximum tokens in the response.
- timeout (float) – Request timeout in seconds.
- api_key (Optional[str]) – API key for authentication (optional, can use env var).
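For example, a client can be configured with explicit overrides rather than the defaults; the api_key value below is a placeholder, and omitting it falls back to the provider's environment variable:

```python
config = LLMConfig(
    model="llama-3.1-8b-instant",
    temperature=0.0,   # fully deterministic sampling
    max_tokens=200,
    timeout=10.0,
    api_key="...",     # placeholder; optional if the env var is set
)
```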
LLMResponse¶
LLMResponse dataclass ¶
LLMResponse(
content: str,
model: str,
input_tokens: int = 0,
output_tokens: int = 0,
cost: float = 0.0,
raw_response: Optional[Dict[str, Any]] = None,
)
Standard response from any LLM client.
This dataclass provides a unified response format across all LLM providers, abstracting away provider-specific response structures.
Attributes:
- content (str) – The text content of the response.
- model (str) – The model that generated the response.
- input_tokens (int) – Number of input/prompt tokens used.
- output_tokens (int) – Number of output/completion tokens generated.
- cost (float) – Estimated cost of the request (if available).
- raw_response (Optional[Dict[str, Any]]) – The original provider-specific response (for debugging).
Methods:
- parse_json – Parse content as JSON, handling common formatting issues.
parse_json ¶
Parse content as JSON, handling common formatting issues.
LLMs sometimes wrap JSON in markdown code blocks. This method handles those cases and attempts to extract valid JSON.
Returns:
- Optional[Dict[str, Any]] – Parsed JSON as dict, or None if parsing fails.
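For instance, content wrapped in a markdown code fence can still be recovered; the response below is constructed by hand purely for illustration:

```python
response = LLMResponse(
    content='```json\n{"answer": 4}\n```',
    model="example-model",
)

data = response.parse_json()
if data is not None:
    print(data["answer"])  # 4
else:
    print("content was not valid JSON")
```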
BaseLLMClient¶
BaseLLMClient ¶
Abstract base class for LLM clients.
All LLM vendor clients (OpenAI, Anthropic, Groq, Gemini, Llama, etc.) must implement this interface to ensure consistent behavior across the codebase.
This abstraction allows:
- Easy addition of new LLM providers
- Consistent API for all providers
- Provider-agnostic code in higher-level modules
- Simplified testing with mock implementations
Example

    class MyClient(BaseLLMClient):
        def completion(self, messages, **kwargs):
            # Implementation here
            pass

    client = MyClient(config)
    msgs = [{"role": "user", "content": "Hello"}]
    response = client.completion(msgs)
    print(response.content)
Methods:
- cached_completion – Make a completion request with caching.
- complete_json – Make a completion request and parse response as JSON.
- completion – Make a chat completion request.
- is_available – Check if the LLM provider is available and configured.
- list_models – List available models from the provider.
- set_cache – Configure caching for this client.
Attributes:
- cache (Optional['TokenCache']) – Return the configured cache, if any.
- call_count (int) – Return the number of API calls made by this client.
- model_name (str) – Return the model name being used.
- provider_name (str) – Return the name of the LLM provider.
- use_cache (bool) – Return whether caching is enabled.
call_count abstractmethod property ¶
Return the number of API calls made by this client.
Returns:
- int – Total number of completion calls made.
model_name property ¶
Return the model name being used.
Returns:
- str – Model identifier string.
provider_name abstractmethod property ¶
Return the name of the LLM provider.
Returns:
- str – Provider name (e.g., "openai", "anthropic", "groq").
cached_completion ¶
cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
Make a completion request with caching.
If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- **kwargs (Any, default: {}) – Provider-specific options (temperature, max_tokens, etc.).
Returns:
- LLMResponse – LLMResponse with the generated content and metadata.
complete_json ¶
complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]
Make a completion request and parse response as JSON.
Convenience method that calls completion() and attempts to parse the response content as JSON.
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- **kwargs (Any, default: {}) – Provider-specific options passed to completion().
Returns:
- tuple[Optional[Dict[str, Any]], LLMResponse] – Tuple of (parsed JSON dict or None, raw LLMResponse).
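A typical call unpacks the tuple and checks the parsed value before using it; the client and prompt below are illustrative:

```python
messages = [
    {"role": "system", "content": "Reply with a single JSON object."},
    {"role": "user", "content": "What is 2+2? Answer as {\"answer\": <int>}."},
]

data, response = client.complete_json(messages)
if data is None:
    # fall back to the raw text when the model's output was not valid JSON
    print("unparseable response:", response.content)
else:
    print(data["answer"])
```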
completion abstractmethod ¶
completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
Make a chat completion request.
This is the core method that sends a request to the LLM provider and returns a standardized response.
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys. Roles can be: "system", "user", "assistant".
- **kwargs (Any, default: {}) – Provider-specific options (temperature, max_tokens, etc.) that override the config defaults.
Returns:
- LLMResponse – LLMResponse with the generated content and metadata.
Raises:
- ValueError – If the API request fails or returns an error.
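As an illustration of what a concrete override might look like, the sketch below assumes an OpenAI-compatible chat-completions endpoint reached with httpx; the URL, payload shape, and response fields are assumptions about that style of API, not this library's actual vendor code. It could serve as a drop-in body for MyCustomClient.completion from the example above:

```python
import httpx

def completion(self, messages, **kwargs) -> LLMResponse:
    payload = {
        "model": self.config.model,
        "messages": messages,
        "temperature": kwargs.get("temperature", self.config.temperature),
        "max_tokens": kwargs.get("max_tokens", self.config.max_tokens),
    }
    resp = httpx.post(
        "https://api.example.com/v1/chat/completions",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {self.config.api_key}"},
        json=payload,
        timeout=self.config.timeout,
    )
    if resp.status_code != 200:
        raise ValueError(f"API request failed: {resp.status_code} {resp.text}")
    body = resp.json()
    self._total_calls += 1
    return LLMResponse(
        content=body["choices"][0]["message"]["content"],
        model=body.get("model", self.config.model),
        input_tokens=body.get("usage", {}).get("prompt_tokens", 0),
        output_tokens=body.get("usage", {}).get("completion_tokens", 0),
        raw_response=body,
    )
```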
is_available abstractmethod ¶
Check if the LLM provider is available and configured.
This method checks whether the client can make API calls:
- For cloud providers: checks if the API key is set
- For local providers: checks if the server is running
Returns:
- bool – True if the provider is available and ready for requests.
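A common pattern is to probe the configured clients and use the first one that reports itself available; the candidate list here is illustrative:

```python
candidates = [GroqClient(), GeminiClient()]

client = next((c for c in candidates if c.is_available()), None)
if client is None:
    raise RuntimeError("no LLM provider is configured")

print(f"using {client.provider_name} ({client.model_name})")
```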
list_models abstractmethod ¶
List available models from the provider.
Queries the provider's API to get the list of models accessible with the current API key or configuration. Results are filtered by the user's subscription/access level.
Returns:
- List[str] – List of model identifiers available for use.
Raises:
- ValueError – If the API request fails.
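Since list_models() raises ValueError when the provider request fails, callers typically guard it; this sketch reuses the client selected in the previous example:

```python
try:
    models = client.list_models()
except ValueError as exc:
    print(f"could not list models from {client.provider_name}: {exc}")
else:
    for name in models:
        print(name)
```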