Gemini Client API Reference

A direct Google Gemini API client. It implements the BaseLLMClient interface and uses httpx to communicate with Google's Generative Language API.

Overview

The Gemini client provides:

  • Direct HTTP communication with Google's Generative Language API
  • An implementation of the BaseLLMClient abstract interface
  • Automatic conversion from OpenAI-style messages to Gemini format
  • JSON response parsing with error handling
  • Call counting for usage tracking
  • Configurable timeout settings

Usage

from causaliq_knowledge.llm import GeminiClient, GeminiConfig

# Create client with custom config
config = GeminiConfig(
    model="gemini-2.5-flash",
    temperature=0.1,
    max_tokens=500,
)
client = GeminiClient(config=config)

# Make a completion request (OpenAI-style messages)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
response = client.completion(messages)
print(response.content)

# Parse JSON response
json_data = response.parse_json()

Environment Variables

The Gemini client reads its API key from the GEMINI_API_KEY environment variable when one is not supplied via GeminiConfig:

export GEMINI_API_KEY=your_api_key_here
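
The key can also be supplied from Python rather than the shell. A minimal sketch, assuming it is set before the client is constructed:

import os

# Equivalent to the shell export above; must run before client creation
os.environ["GEMINI_API_KEY"] = "your_api_key_here"

# Alternatively, pass the key explicitly and bypass the environment
from causaliq_knowledge.llm import GeminiClient, GeminiConfig
client = GeminiClient(GeminiConfig(api_key="your_api_key_here"))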

GeminiConfig

GeminiConfig dataclass

GeminiConfig(
    model: str = "gemini-2.5-flash",
    temperature: float = 0.1,
    max_tokens: int = 500,
    timeout: float = 30.0,
    api_key: Optional[str] = None,
)

Configuration for Gemini API client.

Extends LLMConfig with Gemini-specific defaults.

Attributes:

  • model (str) –

    Gemini model identifier (default: gemini-2.5-flash).

  • temperature (float) –

    Sampling temperature (default: 0.1).

  • max_tokens (int) –

    Maximum response tokens (default: 500).

  • timeout (float) –

    Request timeout in seconds (default: 30.0).

  • api_key (Optional[str]) –

    Gemini API key (falls back to GEMINI_API_KEY env var).

Methods:

  • __post_init__

    Set API key from environment if not provided.

api_key class-attribute instance-attribute

api_key: Optional[str] = None

max_tokens class-attribute instance-attribute

max_tokens: int = 500

model class-attribute instance-attribute

model: str = 'gemini-2.5-flash'

temperature class-attribute instance-attribute

temperature: float = 0.1

timeout class-attribute instance-attribute

timeout: float = 30.0

__post_init__

__post_init__() -> None

Set API key from environment if not provided.
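
A minimal illustration of this fallback (the key values are placeholders):

import os
from causaliq_knowledge.llm import GeminiConfig

os.environ["GEMINI_API_KEY"] = "your_api_key_here"

config = GeminiConfig()  # api_key=None, so __post_init__ reads the env var
assert config.api_key == "your_api_key_here"

explicit = GeminiConfig(api_key="other_key")  # an explicit key is kept as-is
assert explicit.api_key == "other_key"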

GeminiClient

GeminiClient

GeminiClient(config: Optional[GeminiConfig] = None)

Direct Gemini API client.

Implements the BaseLLMClient interface for Google's Gemini API. Uses httpx for HTTP requests.

Example

config = GeminiConfig(model="gemini-2.5-flash")
client = GeminiClient(config)
msgs = [{"role": "user", "content": "Hello"}]
response = client.completion(msgs)
print(response.content)

Parameters:

  • config

    (Optional[GeminiConfig], default: None ) –

    Gemini configuration. If None, uses defaults with API key from GEMINI_API_KEY environment variable.

Attributes:

BASE_URL class-attribute instance-attribute

BASE_URL = 'https://generativelanguage.googleapis.com/v1beta/models'

_total_calls instance-attribute

_total_calls = 0

cache property

cache: Optional['TokenCache']

Return the configured cache, if any.

call_count property

call_count: int

Return the number of API calls made.

config instance-attribute

config = config or GeminiConfig()

model_name property

model_name: str

Return the model name being used.

Returns:

  • str

    Model identifier string.

provider_name property

provider_name: str

Return the provider name.

use_cache property

use_cache: bool

Return whether caching is enabled.

_build_cache_key

_build_cache_key(
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> str

Build a deterministic cache key for the request.

Creates a SHA-256 hash from the model, messages, temperature, and max_tokens. The hash is truncated to 16 hex characters (64 bits).

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • temperature
    (Optional[float], default: None ) –

    Sampling temperature (defaults to config value).

  • max_tokens
    (Optional[int], default: None ) –

    Maximum tokens (defaults to config value).

Returns:

  • str

    16-character hex string cache key.
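
A standalone sketch of this derivation; the exact serialization used internally is not documented here, so the sorted-key JSON encoding below is an assumption:

import hashlib
import json
from typing import Dict, List, Optional

def build_cache_key_sketch(
    model: str,
    messages: List[Dict[str, str]],
    temperature: Optional[float],
    max_tokens: Optional[int],
) -> str:
    # Assumption: fields are serialized deterministically before hashing
    payload = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
    )
    # SHA-256 truncated to 16 hex characters (64 bits), as documented
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]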

cached_completion

cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a completion request with caching.

If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Provider-specific options (temperature, max_tokens, etc.)

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.
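
Typical usage, sketched below. Construction of the TokenCache itself is not documented on this page, so it is elided:

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
cache = ...  # a TokenCache instance; see the cache module for construction
client.set_cache(cache, use_cache=True)

messages = [{"role": "user", "content": "What is 2+2?"}]
first = client.cached_completion(messages)   # API call, result cached
second = client.cached_completion(messages)  # same key, served from cache
print(client.call_count)  # expected 1: the repeat did not hit the API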

complete_json

complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]

Make a completion request and parse response as JSON.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Override config options passed to completion().

Returns:

  • tuple[Optional[Dict[str, Any]], LLMResponse]

    Tuple of (parsed JSON dict or None, raw LLMResponse).
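
For example, unpacking the tuple and handling a reply that fails to parse (the prompt is illustrative):

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
messages = [
    {"role": "system", "content": "Reply with a single JSON object."},
    {"role": "user", "content": 'Give the answer to 2+2 as {"answer": ...}.'},
]
data, response = client.complete_json(messages)
if data is None:
    # The reply was not valid JSON; the raw response is still available
    print(response.content)
else:
    print(data["answer"])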

completion

completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a chat completion request to Gemini.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Override config options (temperature, max_tokens).

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.

Raises:

  • ValueError

    If the API request fails.
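
A sketch of per-call overrides and the documented failure mode:

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
messages = [{"role": "user", "content": "Summarise httpx in one sentence."}]
try:
    # temperature and max_tokens override the config for this call only
    response = client.completion(messages, temperature=0.0, max_tokens=100)
    print(response.content)
except ValueError as exc:
    # Raised if the API request fails (e.g. invalid key or network error)
    print(f"Gemini request failed: {exc}")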

is_available

is_available() -> bool

Check if Gemini API is available.

Returns:

  • bool

    True if GEMINI_API_KEY is configured.
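
A common guard before issuing requests:

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
if client.is_available():
    response = client.completion([{"role": "user", "content": "Hello"}])
    print(response.content)
else:
    print("GEMINI_API_KEY is not configured; skipping Gemini calls.")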

list_models

list_models() -> List[str]

List available models from Gemini API.

Queries the Gemini API to get models accessible with the current API key. Filters to only include models that support generateContent.

Returns:

  • List[str]

    List of model identifiers (e.g., ['gemini-2.5-flash', ...]).

Raises:

  • ValueError

    If the API request fails.
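
For example, checking that the configured model is accessible to the current key (output varies by key):

from causaliq_knowledge.llm import GeminiClient

client = GeminiClient()
try:
    models = client.list_models()  # only models supporting generateContent
    print(client.model_name in models)
    print(models[:3])
except ValueError as exc:
    print(f"Could not list models: {exc}")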

set_cache

set_cache(cache: Optional['TokenCache'], use_cache: bool = True) -> None

Configure caching for this client.

Parameters:

  • cache
    (Optional['TokenCache']) –

    TokenCache instance for caching, or None to disable.

  • use_cache
    (bool, default: True ) –

    Whether to use the cache (default True).

Message Format Conversion

The client automatically converts OpenAI-style messages to Gemini's format:

OpenAI Role   Gemini Role
system        System instruction (separate field)
user          user
assistant     model
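
A standalone sketch of this mapping; the client's internal helper is not shown on this page, so the function name and exact payload shape below are assumptions based on Gemini's v1beta request format:

from typing import Any, Dict, List

def to_gemini_payload(messages: List[Dict[str, str]]) -> Dict[str, Any]:
    """Illustrative conversion of OpenAI-style messages to Gemini format."""
    system_parts: List[Dict[str, str]] = []
    contents: List[Dict[str, Any]] = []
    for msg in messages:
        if msg["role"] == "system":
            # System messages move to a separate systemInstruction field
            system_parts.append({"text": msg["content"]})
        else:
            # "assistant" maps to "model"; "user" stays "user"
            role = "model" if msg["role"] == "assistant" else "user"
            contents.append(
                {"role": role, "parts": [{"text": msg["content"]}]}
            )
    payload: Dict[str, Any] = {"contents": contents}
    if system_parts:
        payload["systemInstruction"] = {"parts": system_parts}
    return payload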

Supported Models

Google Gemini provides a generous free tier:

Model              Description           Free Tier
gemini-2.5-flash   Fast and efficient    ✅ Yes
gemini-2.5-pro     Most capable          ✅ Limited
gemini-1.5-flash   Previous generation   ✅ Yes

See Google AI documentation for the full list of available models.