Groq Client API Reference

Direct Groq API client for fast LLM inference. This client implements the BaseLLMClient interface and uses httpx to communicate with the Groq API.

Overview

The Groq client provides:

  • Direct HTTP communication with Groq's API
  • Implementation of the BaseLLMClient abstract interface
  • JSON response parsing with error handling
  • Call counting for usage tracking
  • Configurable timeout and retry settings

Usage

from causaliq_knowledge.llm import GroqClient, GroqConfig

# Create client with custom config
config = GroqConfig(
    model="llama-3.1-8b-instant",
    temperature=0.1,
    max_tokens=500,
)
client = GroqClient(config=config)

# Make a completion request
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
response = client.completion(messages)
print(response.content)

# Parse JSON response
json_data = response.parse_json()

Environment Variables

The Groq client requires the GROQ_API_KEY environment variable to be set:

export GROQ_API_KEY=your_api_key_here
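
If the key cannot be set in the shell, it can also be supplied from Python. A minimal sketch (the placeholder value is illustrative):

import os

from causaliq_knowledge.llm import GroqConfig

# Option 1: set the environment variable before creating the client.
os.environ["GROQ_API_KEY"] = "your_api_key_here"

# Option 2: pass the key explicitly; GroqConfig only falls back to
# GROQ_API_KEY when api_key is not provided.
config = GroqConfig(api_key="your_api_key_here")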

GroqConfig

GroqConfig dataclass

GroqConfig(
    model: str = "llama-3.1-8b-instant",
    temperature: float = 0.1,
    max_tokens: int = 500,
    timeout: float = 30.0,
    api_key: Optional[str] = None,
)

Configuration for Groq API client.

Extends LLMConfig with Groq-specific defaults.

Attributes:

  • model (str) –

    Groq model identifier (default: llama-3.1-8b-instant).

  • temperature (float) –

    Sampling temperature (default: 0.1).

  • max_tokens (int) –

    Maximum response tokens (default: 500).

  • timeout (float) –

    Request timeout in seconds (default: 30.0).

  • api_key (Optional[str]) –

    Groq API key (falls back to GROQ_API_KEY env var).

Methods:

  • __post_init__

    Set API key from environment if not provided.

api_key class-attribute instance-attribute

api_key: Optional[str] = None

max_tokens class-attribute instance-attribute

max_tokens: int = 500

model class-attribute instance-attribute

model: str = 'llama-3.1-8b-instant'

temperature class-attribute instance-attribute

temperature: float = 0.1

timeout class-attribute instance-attribute

timeout: float = 30.0

__post_init__

__post_init__() -> None

Set API key from environment if not provided.
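
The fallback behaviour is roughly equivalent to the following sketch, shown here out of its dataclass context (the actual implementation may differ in detail):

import os

def __post_init__(self) -> None:
    # Fall back to the GROQ_API_KEY environment variable when no key was given.
    if self.api_key is None:
        self.api_key = os.environ.get("GROQ_API_KEY")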

GroqClient

GroqClient

GroqClient(config: Optional[GroqConfig] = None)

Direct Groq API client.

Implements the BaseLLMClient interface for Groq's API. Uses httpx for HTTP requests.

Example

config = GroqConfig(model="llama-3.1-8b-instant")
client = GroqClient(config)
msgs = [{"role": "user", "content": "Hello"}]
response = client.completion(msgs)
print(response.content)

Parameters:

  • config

    (Optional[GroqConfig], default: None ) –

    Groq configuration. If None, uses defaults with API key from GROQ_API_KEY environment variable.

Methods:

  • _build_cache_key – Build a deterministic cache key for the request.
  • cached_completion – Make a completion request with caching.
  • complete_json – Make a completion request and parse the response as JSON.
  • completion – Make a chat completion request to Groq.
  • is_available – Check if the Groq API is available.
  • list_models – List available models from the Groq API.
  • set_cache – Configure caching for this client.

Attributes:

  • BASE_URL – Groq API base URL.
  • _total_calls – Internal counter of API calls made.
  • cache – The configured cache, if any.
  • call_count – Number of API calls made.
  • config – Active configuration (GroqConfig).
  • model_name – Model identifier string.
  • provider_name – Provider name.
  • use_cache – Whether caching is enabled.

BASE_URL class-attribute instance-attribute

BASE_URL = 'https://api.groq.com/openai/v1'

_total_calls instance-attribute

_total_calls = 0

cache property

cache: Optional['TokenCache']

Return the configured cache, if any.

call_count property

call_count: int

Return the number of API calls made.

config instance-attribute

config = config or GroqConfig()

model_name property

model_name: str

Return the model name being used.

Returns:

  • str

    Model identifier string.

provider_name property

provider_name: str

Return the provider name.

use_cache property

use_cache: bool

Return whether caching is enabled.
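
These read-only properties can be inspected at any time, for example:

print(client.call_count)   # number of API calls made so far
print(client.model_name)   # e.g. "llama-3.1-8b-instant"
print(client.use_cache)    # whether caching is currently enabled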

_build_cache_key

_build_cache_key(
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> str

Build a deterministic cache key for the request.

Creates a SHA-256 hash from the model, messages, temperature, and max_tokens. The hash is truncated to 16 hex characters (64 bits).

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • temperature
    (Optional[float], default: None ) –

    Sampling temperature (defaults to config value).

  • max_tokens
    (Optional[int], default: None ) –

    Maximum tokens (defaults to config value).

Returns:

  • str

    16-character hex string cache key.
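
The exact serialisation is not specified here, but a deterministic key of this kind can be produced along the following lines (a sketch; the real method may order or encode the fields differently):

import hashlib
import json

def build_cache_key(model, messages, temperature, max_tokens):
    # Serialise the request fields deterministically, hash them with SHA-256,
    # and keep the first 16 hex characters (64 bits).
    payload = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]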

cached_completion

cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a completion request with caching.

If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Provider-specific options (temperature, max_tokens, etc.).

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.
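
Typical usage, assuming TokenCache is importable from the same package (the exact import path is not shown in this reference):

cache = TokenCache()
client.set_cache(cache)

response = client.cached_completion(messages)  # first call hits the Groq API
response = client.cached_completion(messages)  # identical request is served from the cache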

complete_json

complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]

Make a completion request and parse response as JSON.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Override config options passed to completion().

Returns:

  • tuple[Optional[Dict[str, Any]], LLMResponse]

    Tuple of (parsed JSON dict or None, raw LLMResponse).
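
For example, to request and consume a JSON reply (the prompt and key name are illustrative):

messages = [
    {"role": "system", "content": "Reply only with JSON."},
    {"role": "user", "content": 'Give {"answer": <the sum of 2 and 2>}'},
]
data, response = client.complete_json(messages)
if data is not None:
    print(data["answer"])
else:
    print("Model did not return valid JSON:", response.content)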

completion

completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a chat completion request to Groq.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Override config options (temperature, max_tokens).

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.

Raises:

  • ValueError

    If the API request fails.
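
Per-call overrides take precedence over the values in GroqConfig, for example:

response = client.completion(
    messages,
    temperature=0.0,  # override the configured sampling temperature
    max_tokens=100,   # override the configured response length limit
)
print(response.content)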

is_available

is_available() -> bool

Check if Groq API is available.

Returns:

  • bool

    True if GROQ_API_KEY is configured.
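
This is useful as a guard before making requests:

if client.is_available():
    response = client.completion(messages)
else:
    print("GROQ_API_KEY is not configured")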

list_models

list_models() -> List[str]

List available models from Groq API.

Queries the Groq API to get models accessible with the current API key. Filters to only include text generation models.

Returns:

  • List[str]

    List of model identifiers (e.g., ['llama-3.1-8b-instant', ...]).

Raises:

  • ValueError

    If the API request fails.
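
For example, to print every text generation model the current API key can access:

for model_id in client.list_models():
    print(model_id)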

set_cache

set_cache(cache: Optional['TokenCache'], use_cache: bool = True) -> None

Configure caching for this client.

Parameters:

  • cache
    (Optional['TokenCache']) –

    TokenCache instance for caching, or None to disable.

  • use_cache
    (bool, default: True ) –

    Whether to use the cache (default True).
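
For example, where cache is a TokenCache instance (import path not shown in this reference):

client.set_cache(cache)                   # enable caching
client.set_cache(cache, use_cache=False)  # keep the cache configured but bypass it
client.set_cache(None)                    # remove the cache entirely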

Supported Models

Groq provides fast inference for open-source models:

| Model                   | Description             | Free Tier |
|-------------------------|-------------------------|-----------|
| llama-3.1-8b-instant    | Fast Llama 3.1 8B model | ✅ Yes    |
| llama-3.1-70b-versatile | Larger Llama 3.1 model  | ✅ Yes    |
| mixtral-8x7b-32768      | Mixtral MoE model       | ✅ Yes    |

See the Groq documentation for the full list of available models.