Gemini Client API Reference

A direct Google Gemini API client. It implements the BaseLLMClient interface and uses httpx to communicate with Google's Generative Language API.

Overview

The Gemini client provides:

  • Direct HTTP communication with Google's Generative Language API
  • An implementation of the BaseLLMClient abstract interface
  • Automatic conversion from OpenAI-style messages to Gemini format
  • JSON response parsing with error handling
  • Call counting for usage tracking
  • Configurable timeout settings

Usage

from causaliq_knowledge.llm import GeminiClient, GeminiConfig

# Create client with custom config
config = GeminiConfig(
    model="gemini-2.5-flash",
    temperature=0.1,
    max_tokens=500,
)
client = GeminiClient(config=config)

# Make a completion request (OpenAI-style messages)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
response = client.completion(messages)
print(response.content)

# Parse JSON response
json_data = response.parse_json()

Environment Variables

The Gemini client reads its API key from the GEMINI_API_KEY environment variable when one is not supplied via GeminiConfig:

export GEMINI_API_KEY=your_api_key_here
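
The key can also be supplied from Python rather than the shell. A minimal sketch, assuming it is set before the client is constructed:

import os

# Equivalent to the shell export above; must run before client creation
os.environ["GEMINI_API_KEY"] = "your_api_key_here"

# Alternatively, pass the key explicitly and bypass the environment
from causaliq_knowledge.llm import GeminiClient, GeminiConfig
client = GeminiClient(GeminiConfig(api_key="your_api_key_here"))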

GeminiConfig

GeminiConfig dataclass

GeminiConfig(
    model: str = "gemini-2.5-flash",
    temperature: float = 0.1,
    max_tokens: int = 500,
    timeout: float = 30.0,
    api_key: Optional[str] = None,
)

Configuration for Gemini API client.

Extends LLMConfig with Gemini-specific defaults.

Attributes:

  • model (str) –

    Gemini model identifier (default: gemini-2.5-flash).

  • temperature (float) –

    Sampling temperature (default: 0.1).

  • max_tokens (int) –

    Maximum response tokens (default: 500).

  • timeout (float) –

    Request timeout in seconds (default: 30.0).

  • api_key (Optional[str]) –

    Gemini API key (falls back to GEMINI_API_KEY env var).

Methods:

  • __post_init__

    Set API key from environment if not provided.

api_key class-attribute instance-attribute

api_key: Optional[str] = None

max_tokens class-attribute instance-attribute

max_tokens: int = 500

model class-attribute instance-attribute

model: str = 'gemini-2.5-flash'

temperature class-attribute instance-attribute

temperature: float = 0.1

timeout class-attribute instance-attribute

timeout: float = 30.0

__post_init__

__post_init__() -> None

Set API key from environment if not provided.
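
A minimal illustration of this fallback (the key values are placeholders):

import os
from causaliq_knowledge.llm import GeminiConfig

os.environ["GEMINI_API_KEY"] = "your_api_key_here"

config = GeminiConfig()  # api_key=None, so __post_init__ reads the env var
assert config.api_key == "your_api_key_here"

explicit = GeminiConfig(api_key="other_key")  # an explicit key is kept as-is
assert explicit.api_key == "other_key"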

GeminiClient

GeminiClient

GeminiClient(config: Optional[GeminiConfig] = None)

Direct Gemini API client.

Implements the BaseLLMClient interface for Google's Gemini API. Uses httpx for HTTP requests.

Example

config = GeminiConfig(model="gemini-2.5-flash")
client = GeminiClient(config)
msgs = [{"role": "user", "content": "Hello"}]
response = client.completion(msgs)
print(response.content)

Parameters:

  • config

    (Optional[GeminiConfig], default: None ) –

    Gemini configuration. If None, uses defaults with API key from GEMINI_API_KEY environment variable.

Attributes:

BASE_URL class-attribute instance-attribute

BASE_URL = 'https://generativelanguage.googleapis.com/v1beta/models'

_total_calls instance-attribute

_total_calls = 0

cache property

cache: Optional['TokenCache']

Return the configured cache, if any.

call_count property

call_count: int

Return the number of API calls made.

config instance-attribute

config = config or GeminiConfig()

model_name property

model_name: str

Return the model name being used.

Returns:

  • str

    Model identifier string.

provider_name property

provider_name: str

Return the provider name.

use_cache property

use_cache: bool

Return whether caching is enabled.

_build_cache_key

_build_cache_key(
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> str

Build a deterministic cache key for the request.

Creates a SHA-256 hash from the model, messages, temperature, and max_tokens. The hash is truncated to 16 hex characters (64 bits).

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • temperature
    (Optional[float], default: None ) –

    Sampling temperature (defaults to config value).

  • max_tokens
    (Optional[int], default: None ) –

    Maximum tokens (defaults to config value).

Returns:

  • str

    16-character hex string cache key.
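
A standalone sketch of this derivation; the exact serialization used internally is not documented here, so the sorted-key JSON encoding below is an assumption:

import hashlib
import json
from typing import Dict, List, Optional

def build_cache_key_sketch(
    model: str,
    messages: List[Dict[str, str]],
    temperature: Optional[float],
    max_tokens: Optional[int],
) -> str:
    # Assumption: fields are serialized deterministically before hashing
    payload = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
    )
    # SHA-256 truncated to 16 hex characters (64 bits), as documented
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]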

cached_completion

cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a completion request with caching.

If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Provider-specific options (temperature, max_tokens, etc.)

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.
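
Typical usage, sketched below. Construction of the TokenCache itself is not documented on this page, so it is elided:

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
cache = ...  # a TokenCache instance; see the cache module for construction
client.set_cache(cache, use_cache=True)

messages = [{"role": "user", "content": "What is 2+2?"}]
first = client.cached_completion(messages)   # API call, result cached
second = client.cached_completion(messages)  # same key, served from cache
print(client.call_count)  # expected 1: the repeat did not hit the API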

complete_json

complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]

Make a completion request and parse response as JSON.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Override config options passed to completion().

Returns:

  • tuple[Optional[Dict[str, Any]], LLMResponse]

    Tuple of (parsed JSON dict or None, raw LLMResponse).
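
For example, unpacking the tuple and handling a reply that fails to parse (the prompt is illustrative):

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
messages = [
    {"role": "system", "content": "Reply with a single JSON object."},
    {"role": "user", "content": 'Give the answer to 2+2 as {"answer": ...}.'},
]
data, response = client.complete_json(messages)
if data is None:
    # The reply was not valid JSON; the raw response is still available
    print(response.content)
else:
    print(data["answer"])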

completion

completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a chat completion request to Gemini.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Override config options (temperature, max_tokens).

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.

Raises:

  • ValueError

    If the API request fails.
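
A sketch of per-call overrides and the documented failure mode:

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
messages = [{"role": "user", "content": "Summarise httpx in one sentence."}]
try:
    # temperature and max_tokens override the config for this call only
    response = client.completion(messages, temperature=0.0, max_tokens=100)
    print(response.content)
except ValueError as exc:
    # Raised if the API request fails (e.g. invalid key or network error)
    print(f"Gemini request failed: {exc}")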

is_available

is_available() -> bool

Check if Gemini API is available.

Returns:

  • bool

    True if GEMINI_API_KEY is configured.
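
A common guard before issuing requests:

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
if client.is_available():
    response = client.completion([{"role": "user", "content": "Hello"}])
    print(response.content)
else:
    print("GEMINI_API_KEY is not configured; skipping Gemini calls.")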

list_models

list_models() -> List[str]

List available models from Gemini API.

Queries the Gemini API to get models accessible with the current API key. Filters to only include models that support generateContent.

Returns:

  • List[str]

    List of model identifiers (e.g., ['gemini-2.5-flash', ...]).

Raises:

  • ValueError

    If the API request fails.
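
For example, checking that the configured model is accessible to the current key (output varies by key):

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
try:
    models = client.list_models()  # only models supporting generateContent
    print(client.model_name in models)
    print(models[:3])
except ValueError as exc:
    print(f"Could not list models: {exc}")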

set_cache

set_cache(cache: Optional['TokenCache'], use_cache: bool = True) -> None

Configure caching for this client.

Parameters:

  • cache
    (Optional['TokenCache']) –

    TokenCache instance for caching, or None to disable.

  • use_cache
    (bool, default: True ) –

    Whether to use the cache (default True).

Message Format Conversion

The client automatically converts OpenAI-style messages to Gemini's format:

OpenAI Role   Gemini Role
system        System instruction (separate field)
user          user
assistant     model
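
A standalone sketch of this mapping; the client's internal helper is not shown on this page, so the function name and exact payload shape below are assumptions based on Gemini's v1beta request format:

from typing import Any, Dict, List

def to_gemini_payload(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Illustrative conversion of OpenAI-style messages to Gemini format."""
    system_parts: List[Dict[str, str]] = []
    contents: List[Dict[str, Any]] = []
    for msg in messages:
        if msg["role"] == "system":
            # System messages move to a separate systemInstruction field
            system_parts.append({"text": msg["content"]})
        else:
            # "assistant" maps to "model"; "user" stays "user"
            role = "model" if msg["role"] == "assistant" else "user"
            contents.append(
                {"role": role, "parts": [{"text": msg["content"]}]}
            )
    payload: Dict[str, Any] = {"contents": contents}
    if system_parts:
        payload["systemInstruction"] = {"parts": system_parts}
    return payload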

Supported Models

Google Gemini provides a generous free tier:

Model              Description           Free Tier
gemini-2.5-flash   Fast and efficient    ✅ Yes
gemini-2.5-pro     Most capable          ✅ Limited
gemini-1.5-flash   Previous generation   ✅ Yes

See Google AI documentation for the full list of available models.