LLM Client Base Interface

Abstract base class and common types for all LLM vendor clients. This module defines the interface that all vendor-specific clients must implement, ensuring consistent behavior across different LLM providers.

Overview

The base client module provides:

  • BaseLLMClient - Abstract base class defining the client interface
  • LLMConfig - Base configuration dataclass for all clients
  • LLMResponse - Unified response format from any LLM provider

Caching Support

BaseLLMClient includes built-in caching integration:

  • set_cache() - Configure a TokenCache for response caching
  • cached_completion() - Make completion requests with automatic caching
  • _build_cache_key() - Generate deterministic cache keys (SHA-256)
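
A deterministic key can be derived by hashing a canonical serialisation of the request. The sketch below illustrates the idea using the request fields that the cache stores (model, messages, temperature, max_tokens); the exact fields and format used by _build_cache_key() are internal and may differ:

import hashlib
import json

def build_cache_key(model, messages, temperature, max_tokens):
    # Serialise the request deterministically: sorted keys, no extra whitespace
    payload = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    # SHA-256 of the canonical payload gives a stable, collision-resistant key
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()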

Design Philosophy

We use vendor-specific API clients rather than wrapper libraries like LiteLLM or LangChain. This provides:

  • Minimal dependencies (httpx only for HTTP)
  • Reliable and predictable behavior
  • Easy debugging without abstraction layers
  • Full control over API interactions

The abstract interface ensures that all vendor clients behave consistently, making it easy to swap providers or add new ones.

Usage

Vendor-specific clients inherit from BaseLLMClient:

from causaliq_knowledge.llm import (
    BaseLLMClient,
    LLMConfig,
    LLMResponse,
    GroqClient,
    GeminiClient,
)

# All clients share the same interface
def query_llm(client: BaseLLMClient, prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    response = client.completion(messages)
    return response.content

# Works with any client
groq = GroqClient()
gemini = GeminiClient()

result1 = query_llm(groq, "What is 2+2?")
result2 = query_llm(gemini, "What is 2+2?")

Caching LLM Responses

Enable caching to avoid redundant API calls:

from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.llm import GroqClient, LLMConfig

# Create a persistent cache
with TokenCache("llm_cache.db") as cache:
    client = GroqClient(LLMConfig(model="llama-3.1-8b-instant"))
    client.set_cache(cache)

    messages = [{"role": "user", "content": "What is Python?"}]

    # First call - hits API, stores in cache
    response1 = client.cached_completion(messages)

    # Second call - returns from cache, no API call
    response2 = client.cached_completion(messages)

    assert response1.content == response2.content
    assert client.call_count == 1  # Only one API call made

The cache uses the LLMEntryEncoder automatically, storing:

  • Request details (model, messages, temperature, max_tokens)
  • Response content
  • Metadata (provider, token counts, cost, latency)

Each cached entry captures latency timing automatically using time.perf_counter(), enabling performance analysis across providers and models.
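
For illustration, the timing around an uncached call is conceptually equivalent to the following (a sketch only; how the measured latency is stored inside the cache entry is handled by the encoder):

import time

start = time.perf_counter()
response = client.completion(messages)  # the real API call
latency_ms = (time.perf_counter() - start) * 1000.0

# The elapsed time is recorded in the cached entry's metadata, enabling
# latency comparisons across providers and models.
print(f"{client.provider_name}/{client.model_name}: {latency_ms:.1f} ms")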

See LLM Cache for details on the cache entry structure.

Creating a Custom Client

To add support for a new LLM provider, implement the BaseLLMClient interface. The sketch below implements all five abstract members (completion, call_count, provider_name, is_available, list_models); the placeholder bodies mark where the provider-specific API calls belong:

from typing import Any, Dict, List

from causaliq_knowledge.llm import BaseLLMClient, LLMConfig, LLMResponse

class MyCustomClient(BaseLLMClient):
    def __init__(self, config: LLMConfig) -> None:
        super().__init__(config)  # initialise shared base-class state
        self.config = config
        self._total_calls = 0

    @property
    def provider_name(self) -> str:
        return "my_provider"

    @property
    def call_count(self) -> int:
        return self._total_calls

    def completion(
        self, messages: List[Dict[str, str]], **kwargs: Any
    ) -> LLMResponse:
        # Make the provider-specific API call here
        self._total_calls += 1
        return LLMResponse(
            content="response text",
            model=self.config.model,
            input_tokens=10,
            output_tokens=20,
        )

    def is_available(self) -> bool:
        # e.g. check that an API key is set or a local server is running
        return True

    def list_models(self) -> List[str]:
        # Query the provider for the models accessible to this client
        return [self.config.model]

LLMConfig

LLMConfig dataclass

LLMConfig(
    model: str,
    temperature: float = 0.1,
    max_tokens: int = 500,
    timeout: float = 30.0,
    api_key: Optional[str] = None,
)

Base configuration for all LLM clients.

This dataclass defines common configuration options shared by all LLM provider clients. Vendor-specific clients may extend this with additional options.

Attributes:

  • model (str) –

    Model identifier (provider-specific format).

  • temperature (float) –

    Sampling temperature (0.0=deterministic, 1.0=creative).

  • max_tokens (int) –

    Maximum tokens in the response.

  • timeout (float) –

    Request timeout in seconds.

  • api_key (Optional[str]) –

    API key for authentication (optional, can use env var).
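
For example, a client can be configured with deterministic sampling and a longer timeout than the defaults (the model identifier below is illustrative):

from causaliq_knowledge.llm import LLMConfig

config = LLMConfig(
    model="llama-3.1-8b-instant",  # provider-specific model identifier
    temperature=0.0,               # fully deterministic sampling
    max_tokens=200,
    timeout=60.0,                  # allow up to 60 seconds per request
)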

LLMResponse

LLMResponse dataclass

LLMResponse(
    content: str,
    model: str,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cost: float = 0.0,
    raw_response: Optional[Dict[str, Any]] = None,
)

Standard response from any LLM client.

This dataclass provides a unified response format across all LLM providers, abstracting away provider-specific response structures.

Attributes:

  • content (str) –

    The text content of the response.

  • model (str) –

    The model that generated the response.

  • input_tokens (int) –

    Number of input/prompt tokens used.

  • output_tokens (int) –

    Number of output/completion tokens generated.

  • cost (float) –

    Estimated cost of the request (if available).

  • raw_response (Optional[Dict[str, Any]]) –

    The original provider-specific response (for debugging).

Methods:

  • parse_json

    Parse content as JSON, handling common formatting issues.

parse_json

parse_json() -> Optional[Dict[str, Any]]

Parse content as JSON, handling common formatting issues.

LLMs sometimes wrap JSON in markdown code blocks. This method handles those cases and attempts to extract valid JSON.

Returns:

  • Optional[Dict[str, Any]]

    Parsed JSON as dict, or None if parsing fails.
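
For example, content wrapped in a markdown code block can still be recovered (illustrative values; the fenced style shown is one of the cases the method is documented to handle):

from causaliq_knowledge.llm import LLMResponse

response = LLMResponse(
    content='```json\n{"answer": 4}\n```',
    model="example-model",
)

data = response.parse_json()
if data is not None:
    print(data["answer"])  # 4
else:
    print("content was not valid JSON")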

BaseLLMClient

BaseLLMClient

BaseLLMClient(config: LLMConfig)

Abstract base class for LLM clients.

All LLM vendor clients (OpenAI, Anthropic, Groq, Gemini, Llama, etc.) must implement this interface to ensure consistent behavior across the codebase.

This abstraction allows:

  • Easy addition of new LLM providers
  • Consistent API for all providers
  • Provider-agnostic code in higher-level modules
  • Simplified testing with mock implementations

Example

class MyClient(BaseLLMClient):
    def completion(self, messages, **kwargs):
        # Implementation here
        pass

client = MyClient(config)
msgs = [{"role": "user", "content": "Hello"}]
response = client.completion(msgs)
print(response.content)

Parameters:

  • config

    (LLMConfig) –

    Configuration for the LLM client.

Methods:

  • cached_completion

    Make a completion request with caching.

  • complete_json

    Make a completion request and parse response as JSON.

  • completion

    Make a chat completion request.

  • is_available

    Check if the LLM provider is available and configured.

  • list_models

    List available models from the provider.

  • set_cache

    Configure caching for this client.

Attributes:

  • cache (Optional['TokenCache']) –

    Return the configured cache, if any.

  • call_count (int) –

    Return the number of API calls made by this client.

  • model_name (str) –

    Return the model name being used.

  • provider_name (str) –

    Return the name of the LLM provider.

  • use_cache (bool) –

    Return whether caching is enabled.

cache property

cache: Optional['TokenCache']

Return the configured cache, if any.

call_count abstractmethod property

call_count: int

Return the number of API calls made by this client.

Returns:

  • int

    Total number of completion calls made.

model_name property

model_name: str

Return the model name being used.

Returns:

  • str

    Model identifier string.

provider_name abstractmethod property

provider_name: str

Return the name of the LLM provider.

Returns:

  • str

    Provider name (e.g., "openai", "anthropic", "groq").

use_cache property

use_cache: bool

Return whether caching is enabled.

cached_completion

cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a completion request with caching.

If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.

Parameters:

  • messages

    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs

    (Any, default: {} ) –

    Provider-specific options (temperature, max_tokens, etc.)

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.

complete_json

complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]

Make a completion request and parse response as JSON.

Convenience method that calls completion() and attempts to parse the response content as JSON.

Parameters:

  • messages

    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs

    (Any, default: {} ) –

    Provider-specific options passed to completion().

Returns:

  • tuple[Optional[Dict[str, Any]], LLMResponse]

    Tuple of (parsed JSON dict or None, raw LLMResponse).
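
A typical call unpacks the tuple and falls back to the raw text when parsing fails (sketch; client is any BaseLLMClient and the prompt is illustrative):

messages = [
    {"role": "user", "content": 'Respond with JSON: {"capital": "..."}'}
]

data, response = client.complete_json(messages)
if data is not None:
    print(data["capital"])
else:
    # Parsing failed, but the raw text is still available
    print(response.content)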

completion abstractmethod

completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a chat completion request.

This is the core method that sends a request to the LLM provider and returns a standardized response.

Parameters:

  • messages

    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys. Roles can be: "system", "user", "assistant".

  • **kwargs

    (Any, default: {} ) –

    Provider-specific options (temperature, max_tokens, etc.) that override the config defaults.

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.

Raises:

  • ValueError

    If the API request fails or returns an error.

is_available abstractmethod

is_available() -> bool

Check if the LLM provider is available and configured.

This method checks whether the client can make API calls:

  • For cloud providers: checks if an API key is set
  • For local providers: checks if the server is running

Returns:

  • bool

    True if the provider is available and ready for requests.

list_models abstractmethod

list_models() -> List[str]

List available models from the provider.

Queries the provider's API to get the list of models accessible with the current API key or configuration. Results are filtered by the user's subscription/access level.

Returns:

  • List[str]

    List of model identifiers available for use.

Raises:

  • ValueError

    If the API request fails.
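
Together with is_available, this supports a simple startup check (sketch; client is any configured BaseLLMClient):

if client.is_available():
    for model in client.list_models():
        print(model)
else:
    print(f"{client.provider_name} is not available")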

set_cache

set_cache(cache: Optional['TokenCache'], use_cache: bool = True) -> None

Configure caching for this client.

Parameters:

  • cache

    (Optional['TokenCache']) –

    TokenCache instance for caching, or None to disable.

  • use_cache

    (bool, default: True ) –

    Whether to use the cache (default True).