LLM Client Base Interface¶
Abstract base class and common types for all LLM vendor clients. This module defines the interface that all vendor-specific clients must implement, ensuring consistent behavior across different LLM providers.
Overview¶
The base client module provides:
- BaseLLMClient - Abstract base class defining the client interface
- LLMConfig - Base configuration dataclass for all clients
- LLMResponse - Unified response format from any LLM provider
Caching Support¶
BaseLLMClient includes built-in caching integration:
- set_cache() - Configure a TokenCache for response caching
- cached_completion() - Make completion requests with automatic caching
- _build_cache_key() - Generate deterministic cache keys (SHA-256); see the sketch after this list
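How the key is derived is internal to BaseLLMClient, but the general idea is to hash the request parameters deterministically. A minimal sketch of that idea, assuming (not guaranteed) that the key covers the model, messages, and sampling options:

```python
import hashlib
import json


def build_cache_key(model: str, messages: list, temperature: float,
                    max_tokens: int) -> str:
    """Illustrative only: derive a deterministic SHA-256 key for a request."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    # sort_keys and fixed separators make the serialisation deterministic
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Identical requests therefore map to identical keys, which is what lets cached_completion() recognise a repeat request.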
Design Philosophy¶
We use vendor-specific API clients rather than wrapper libraries like LiteLLM or LangChain. This provides:
- Minimal dependencies (httpx only for HTTP)
- Reliable and predictable behavior
- Easy debugging without abstraction layers
- Full control over API interactions
The abstract interface ensures that all vendor clients behave consistently, making it easy to swap providers or add new ones.
Usage¶
Vendor-specific clients inherit from BaseLLMClient:
from causaliq_knowledge.llm import (
    BaseLLMClient,
    LLMConfig,
    LLMResponse,
    GroqClient,
    GeminiClient,
)

# All clients share the same interface
def query_llm(client: BaseLLMClient, prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    response = client.completion(messages)
    return response.content

# Works with any client
groq = GroqClient()
gemini = GeminiClient()

result1 = query_llm(groq, "What is 2+2?")
result2 = query_llm(gemini, "What is 2+2?")
Caching LLM Responses¶
Enable caching to avoid redundant API calls:
from causaliq_knowledge.cache import TokenCache
from causaliq_knowledge.llm import GroqClient, LLMConfig

# Create a persistent cache
with TokenCache("llm_cache.db") as cache:
    client = GroqClient(LLMConfig(model="llama-3.1-8b-instant"))
    client.set_cache(cache)

    messages = [{"role": "user", "content": "What is Python?"}]

    # First call - hits API, stores in cache
    response1 = client.cached_completion(messages)

    # Second call - returns from cache, no API call
    response2 = client.cached_completion(messages)

    assert response1.content == response2.content
    assert client.call_count == 1  # Only one API call made
The cache uses the LLMEntryEncoder automatically, storing:
- Request details (model, messages, temperature, max_tokens)
- Response content
- Metadata (provider, token counts, cost, latency)
Each cached entry captures latency timing automatically using time.perf_counter(),
enabling performance analysis across providers and models.
See LLM Cache for details on the cache entry structure.
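The perf_counter() timing mentioned above is the standard pattern for wall-clock measurement of a call. A minimal sketch of that pattern (not the library's actual code, reusing the client and messages from the caching example above):

```python
import time

start = time.perf_counter()
response = client.cached_completion(messages)  # API call or cache hit
latency_ms = (time.perf_counter() - start) * 1000.0

print(f"call took {latency_ms:.1f} ms")  # cache hits should be much faster
```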
Creating a Custom Client¶
To add support for a new LLM provider, implement the `BaseLLMClient` interface. The abstract members are completion, call_count, provider_name, is_available, and list_models:
```python
from causaliq_knowledge.llm import BaseLLMClient, LLMConfig, LLMResponse


class MyCustomClient(BaseLLMClient):
    def __init__(self, config: LLMConfig) -> None:
        self.config = config
        self._total_calls = 0

    @property
    def provider_name(self) -> str:
        return "my_provider"

    @property
    def call_count(self) -> int:
        return self._total_calls

    def completion(self, messages, **kwargs) -> LLMResponse:
        # Implement the provider API call here
        self._total_calls += 1
        return LLMResponse(
            content="response text",
            model=self.config.model,
            input_tokens=10,
            output_tokens=20,
        )

    def is_available(self) -> bool:
        # Cloud providers: check the API key; local providers: check the server
        return True

    def list_models(self) -> list:
        # Return the models accessible with the current configuration
        return [self.config.model]
```
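Because MyCustomClient satisfies the shared interface, it works with provider-agnostic code such as the query_llm helper from the Usage section above (the model name here is just a placeholder):

```python
client = MyCustomClient(LLMConfig(model="my-model"))

if client.is_available():
    print(query_llm(client, "Hello"))  # goes through the common interface
    print(client.call_count)           # 1
```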
LLMConfig¶
LLMConfig dataclass ¶
LLMConfig(
model: str,
temperature: float = 0.1,
max_tokens: int = 500,
timeout: float = 30.0,
api_key: Optional[str] = None,
)
Base configuration for all LLM clients.
This dataclass defines common configuration options shared by all LLM provider clients. Vendor-specific clients may extend this with additional options.
Attributes:
- model (str) – Model identifier (provider-specific format).
- temperature (float) – Sampling temperature (0.0 = deterministic, 1.0 = creative).
- max_tokens (int) – Maximum tokens in the response.
- timeout (float) – Request timeout in seconds.
- api_key (Optional[str]) – API key for authentication (optional, can use env var).
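For example, a client can be configured with explicit overrides rather than the defaults; the api_key value below is a placeholder, and omitting it falls back to the provider's environment variable:

```python
config = LLMConfig(
    model="llama-3.1-8b-instant",
    temperature=0.0,   # fully deterministic sampling
    max_tokens=200,
    timeout=10.0,
    api_key="...",     # placeholder; optional if the env var is set
)
```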
LLMResponse¶
LLMResponse dataclass ¶
LLMResponse(
content: str,
model: str,
input_tokens: int = 0,
output_tokens: int = 0,
cost: float = 0.0,
raw_response: Optional[Dict[str, Any]] = None,
)
Standard response from any LLM client.
This dataclass provides a unified response format across all LLM providers, abstracting away provider-specific response structures.
Attributes:
- content (str) – The text content of the response.
- model (str) – The model that generated the response.
- input_tokens (int) – Number of input/prompt tokens used.
- output_tokens (int) – Number of output/completion tokens generated.
- cost (float) – Estimated cost of the request (if available).
- raw_response (Optional[Dict[str, Any]]) – The original provider-specific response (for debugging).
Methods:
- parse_json – Parse content as JSON, handling common formatting issues.
parse_json ¶
Parse content as JSON, handling common formatting issues.
LLMs sometimes wrap JSON in markdown code blocks. This method handles those cases and attempts to extract valid JSON.
Returns:
- Optional[Dict[str, Any]] – Parsed JSON as dict, or None if parsing fails.
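For instance, content wrapped in a markdown code fence can still be recovered; the response below is constructed by hand purely for illustration:

```python
response = LLMResponse(
    content='```json\n{"answer": 4}\n```',
    model="example-model",
)

data = response.parse_json()
if data is not None:
    print(data["answer"])  # 4
else:
    print("content was not valid JSON")
```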
BaseLLMClient¶
BaseLLMClient ¶
Abstract base class for LLM clients.
All LLM vendor clients (OpenAI, Anthropic, Groq, Gemini, Llama, etc.) must implement this interface to ensure consistent behavior across the codebase.
This abstraction allows:
- Easy addition of new LLM providers
- Consistent API for all providers
- Provider-agnostic code in higher-level modules
- Simplified testing with mock implementations
Example

    class MyClient(BaseLLMClient):
        def completion(self, messages, **kwargs):
            # Implementation here
            pass

    client = MyClient(config)
    msgs = [{"role": "user", "content": "Hello"}]
    response = client.completion(msgs)
    print(response.content)
Methods:
- cached_completion – Make a completion request with caching.
- complete_json – Make a completion request and parse response as JSON.
- completion – Make a chat completion request.
- is_available – Check if the LLM provider is available and configured.
- list_models – List available models from the provider.
- set_cache – Configure caching for this client.
Attributes:
- cache (Optional['TokenCache']) – Return the configured cache, if any.
- call_count (int) – Return the number of API calls made by this client.
- model_name (str) – Return the model name being used.
- provider_name (str) – Return the name of the LLM provider.
- use_cache (bool) – Return whether caching is enabled.
call_count abstractmethod property ¶
Return the number of API calls made by this client.
Returns:
- int – Total number of completion calls made.
model_name property ¶
Return the model name being used.
Returns:
- str – Model identifier string.
provider_name abstractmethod property ¶
Return the name of the LLM provider.
Returns:
- str – Provider name (e.g., "openai", "anthropic", "groq").
cached_completion ¶
cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
Make a completion request with caching.
If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- **kwargs (Any, default: {}) – Provider-specific options (temperature, max_tokens, etc.).
Returns:
- LLMResponse – LLMResponse with the generated content and metadata.
complete_json ¶
complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]
Make a completion request and parse response as JSON.
Convenience method that calls completion() and attempts to parse the response content as JSON.
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- **kwargs (Any, default: {}) – Provider-specific options passed to completion().
Returns:
- tuple[Optional[Dict[str, Any]], LLMResponse] – Tuple of (parsed JSON dict or None, raw LLMResponse).
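A typical call unpacks the tuple and checks the parsed value before using it; the client and prompt below are illustrative:

```python
messages = [
    {"role": "system", "content": "Reply with a single JSON object."},
    {"role": "user", "content": "What is 2+2? Answer as {\"answer\": <int>}."},
]

data, response = client.complete_json(messages)
if data is None:
    # fall back to the raw text when the model's output was not valid JSON
    print("unparseable response:", response.content)
else:
    print(data["answer"])
```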
completion abstractmethod ¶
completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
Make a chat completion request.
This is the core method that sends a request to the LLM provider and returns a standardized response.
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys. Roles can be: "system", "user", "assistant".
- **kwargs (Any, default: {}) – Provider-specific options (temperature, max_tokens, etc.) that override the config defaults.
Returns:
- LLMResponse – LLMResponse with the generated content and metadata.
Raises:
- ValueError – If the API request fails or returns an error.
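As an illustration of what a concrete override might look like, the sketch below assumes an OpenAI-compatible chat-completions endpoint reached with httpx; the URL, payload shape, and response fields are assumptions about that style of API, not this library's actual vendor code. It could serve as a drop-in body for MyCustomClient.completion from the example above:

```python
import httpx

def completion(self, messages, **kwargs) -> LLMResponse:
    payload = {
        "model": self.config.model,
        "messages": messages,
        "temperature": kwargs.get("temperature", self.config.temperature),
        "max_tokens": kwargs.get("max_tokens", self.config.max_tokens),
    }
    resp = httpx.post(
        "https://api.example.com/v1/chat/completions",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {self.config.api_key}"},
        json=payload,
        timeout=self.config.timeout,
    )
    if resp.status_code != 200:
        raise ValueError(f"API request failed: {resp.status_code} {resp.text}")
    body = resp.json()
    self._total_calls += 1
    return LLMResponse(
        content=body["choices"][0]["message"]["content"],
        model=body.get("model", self.config.model),
        input_tokens=body.get("usage", {}).get("prompt_tokens", 0),
        output_tokens=body.get("usage", {}).get("completion_tokens", 0),
        raw_response=body,
    )
```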
is_available abstractmethod ¶
Check if the LLM provider is available and configured.
This method checks whether the client can make API calls:
- For cloud providers: checks if the API key is set
- For local providers: checks if the server is running
Returns:
- bool – True if the provider is available and ready for requests.
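A common pattern is to probe the configured clients and use the first one that reports itself available; the candidate list here is illustrative:

```python
candidates = [GroqClient(), GeminiClient()]

client = next((c for c in candidates if c.is_available()), None)
if client is None:
    raise RuntimeError("no LLM provider is configured")

print(f"using {client.provider_name} ({client.model_name})")
```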
list_models abstractmethod ¶
List available models from the provider.
Queries the provider's API to get the list of models accessible with the current API key or configuration. Results are filtered by the user's subscription/access level.
Returns:
- List[str] – List of model identifiers available for use.
Raises:
- ValueError – If the API request fails.
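Since list_models() raises ValueError when the provider request fails, callers typically guard it; this sketch reuses the client selected in the previous example:

```python
try:
    models = client.list_models()
except ValueError as exc:
    print(f"could not list models from {client.provider_name}: {exc}")
else:
    for name in models:
        print(name)
```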