Gemini Client API Reference¶
Direct Google Gemini API client. This client implements the BaseLLMClient interface using httpx to communicate directly with Google's Generative Language API.
Overview¶
The Gemini client provides:
- Direct HTTP communication with Google's Generative Language API
- Implementation of the `BaseLLMClient` abstract interface
- Automatic conversion from OpenAI-style messages to Gemini format
- JSON response parsing with error handling
- Call counting for usage tracking
- Configurable timeout settings
Usage¶
```python
from causaliq_knowledge.llm import GeminiClient, GeminiConfig

# Create client with custom config
config = GeminiConfig(
    model="gemini-2.5-flash",
    temperature=0.1,
    max_tokens=500,
)
client = GeminiClient(config=config)

# Make a completion request (OpenAI-style messages)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
response = client.completion(messages)
print(response.content)

# Parse JSON response
json_data = response.parse_json()
```
Environment Variables¶
The Gemini client requires the `GEMINI_API_KEY` environment variable to be set.
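For example, the key can be supplied via the environment before constructing a client (the value below is a placeholder, not a real key):

```python
import os

# Placeholder value; substitute your real Gemini API key.
os.environ["GEMINI_API_KEY"] = "your-api-key-here"

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()  # default config picks up GEMINI_API_KEY
```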
GeminiConfig¶

GeminiConfig (dataclass)¶

```python
GeminiConfig(
    model: str = "gemini-2.5-flash",
    temperature: float = 0.1,
    max_tokens: int = 500,
    timeout: float = 30.0,
    api_key: Optional[str] = None,
)
```
Configuration for Gemini API client.
Extends LLMConfig with Gemini-specific defaults.
Attributes:

- `model` (str) – Gemini model identifier (default: "gemini-2.5-flash").
- `temperature` (float) – Sampling temperature (default: 0.1).
- `max_tokens` (int) – Maximum response tokens (default: 500).
- `timeout` (float) – Request timeout in seconds (default: 30.0).
- `api_key` (Optional[str]) – Gemini API key (falls back to the GEMINI_API_KEY environment variable).

Methods:

- `__post_init__` – Set API key from environment if not provided.
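The documented fallback behaves roughly as in this sketch (the real implementation may differ):

```python
import os
from dataclasses import dataclass
from typing import Optional

# Sketch of the documented __post_init__ fallback, not the actual source.
@dataclass
class _ConfigSketch:
    api_key: Optional[str] = None

    def __post_init__(self) -> None:
        # Fall back to GEMINI_API_KEY when no key was passed explicitly.
        if self.api_key is None:
            self.api_key = os.environ.get("GEMINI_API_KEY")
```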
GeminiClient¶

```python
GeminiClient(config: Optional[GeminiConfig] = None)
```

Direct Gemini API client.

Implements the BaseLLMClient interface for Google's Gemini API. Uses httpx for HTTP requests.

Example:

```python
config = GeminiConfig(model="gemini-2.5-flash")
client = GeminiClient(config)
msgs = [{"role": "user", "content": "Hello"}]
response = client.completion(msgs)
print(response.content)
```
Parameters:

- `config` (Optional[GeminiConfig], default: None) – Gemini configuration. If None, uses defaults with API key from the GEMINI_API_KEY environment variable.
Methods:

- `_build_cache_key` – Build a deterministic cache key for the request.
- `cached_completion` – Make a completion request with caching.
- `complete_json` – Make a completion request and parse response as JSON.
- `completion` – Make a chat completion request to Gemini.
- `is_available` – Check if Gemini API is available.
- `list_models` – List available models from Gemini API.
- `set_cache` – Configure caching for this client.

Attributes:

- `BASE_URL`
- `_total_calls`
- `cache` (Optional['TokenCache']) – Return the configured cache, if any.
- `call_count` (int) – Return the number of API calls made.
- `config`
- `model_name` (str) – Return the model name being used.
- `provider_name` (str) – Return the provider name.
- `use_cache` (bool) – Return whether caching is enabled.
BASE_URL (class-attribute, instance-attribute)¶

model_name (property)¶

Return the model name being used.

Returns:

- `str` – Model identifier string.
_build_cache_key¶

```python
_build_cache_key(
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> str
```

Build a deterministic cache key for the request.

Creates a SHA-256 hash from the model, messages, temperature, and max_tokens. The hash is truncated to 16 hex characters (64 bits).

Parameters:

- `messages` (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- `temperature` (Optional[float], default: None) – Sampling temperature (defaults to config value).
- `max_tokens` (Optional[int], default: None) – Maximum tokens (defaults to config value).

Returns:

- `str` – 16-character hex string cache key.
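For illustration, a key with these properties can be derived as in the sketch below; the exact field ordering and serialisation used internally are assumptions:

```python
import hashlib
import json

# Sketch only: the real _build_cache_key may serialise fields differently.
def build_cache_key(model, messages, temperature, max_tokens):
    payload = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,  # deterministic ordering for identical inputs
    )
    # SHA-256 digest truncated to 16 hex characters (64 bits)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```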
cached_completion¶

```python
cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
```

Make a completion request with caching.

If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.

Parameters:

- `messages` (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- `**kwargs` (Any, default: {}) – Provider-specific options (temperature, max_tokens, etc.).

Returns:

- `LLMResponse` – LLMResponse with the generated content and metadata.
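A usage sketch; the arguments that `set_cache` accepts are not shown in this reference, so the cache setup is left as a comment:

```python
from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
# client.set_cache(...)  # configure a TokenCache first (arguments assumed)

messages = [{"role": "user", "content": "What is 2+2?"}]
first = client.cached_completion(messages)   # API call, result cached
second = client.cached_completion(messages)  # served from cache, no API call
```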
complete_json¶

```python
complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]
```

Make a completion request and parse response as JSON.

Parameters:

- `messages` (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- `**kwargs` (Any, default: {}) – Override config options passed to completion().

Returns:

- `tuple[Optional[Dict[str, Any]], LLMResponse]` – Tuple of (parsed JSON dict or None, raw LLMResponse).
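Based on the documented return type, a typical call looks like this sketch (the prompt is illustrative):

```python
from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
data, raw = client.complete_json(
    [{"role": "user", "content": 'Answer as JSON: {"answer": <number>}. What is 2+2?'}]
)
if data is not None:
    print(data["answer"])
else:
    # Parsing failed; fall back to the raw response text
    print("Not valid JSON:", raw.content)
```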
completion¶

```python
completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
```

Make a chat completion request to Gemini.

Parameters:

- `messages` (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- `**kwargs` (Any, default: {}) – Override config options (temperature, max_tokens).

Returns:

- `LLMResponse` – LLMResponse with the generated content and metadata.

Raises:

- `ValueError` – If the API request fails.
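Since failures are raised as `ValueError`, calls can be guarded as in this sketch:

```python
from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
try:
    response = client.completion(
        [{"role": "user", "content": "Hello"}],
        temperature=0.0,  # per-call override of the config value
    )
    print(response.content)
except ValueError as exc:
    print(f"Gemini request failed: {exc}")
```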
is_available¶

Check if Gemini API is available.

Returns:

- `bool` – True if GEMINI_API_KEY is configured.

list_models¶

List available models from Gemini API.

Queries the Gemini API to get models accessible with the current API key. Filters to only include models that support generateContent.

Returns:

- `List[str]` – List of model identifiers (e.g., ['gemini-2.5-flash', ...]).

Raises:

- `ValueError` – If the API request fails.
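Together, these support a quick availability check before querying models; a minimal sketch:

```python
from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
if client.is_available():
    print(client.list_models())  # e.g. ['gemini-2.5-flash', ...]
else:
    print("Set GEMINI_API_KEY to enable the Gemini client.")
```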
Message Format Conversion¶
The client automatically converts OpenAI-style messages to Gemini's format:
| OpenAI Role | Gemini Role |
|---|---|
| `system` | System instruction (separate field) |
| `user` | `user` |
| `assistant` | `model` |
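A sketch of this mapping (the helper name and exact payload shape are assumptions based on the public Gemini REST format, which takes a `contents` list plus a separate system instruction):

```python
def to_gemini(messages):
    # Sketch: the client's internal conversion may differ in detail.
    system_instruction = None
    contents = []
    for msg in messages:
        if msg["role"] == "system":
            system_instruction = msg["content"]  # separate field, not a turn
        else:
            role = "model" if msg["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    return system_instruction, contents
```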
Supported Models¶
Google Gemini provides a generous free tier:
| Model | Description | Free Tier |
|---|---|---|
| `gemini-2.5-flash` | Fast and efficient | ✅ Yes |
| `gemini-2.5-pro` | Most capable | ✅ Limited |
| `gemini-1.5-flash` | Previous generation | ✅ Yes |
See Google AI documentation for the full list of available models.