Groq Client API Reference

Direct Groq API client for fast LLM inference. This client implements the BaseLLMClient interface and uses httpx to communicate with the Groq API.

Overview

The Groq client provides:

  • Direct HTTP communication with Groq's API
  • Implementation of the BaseLLMClient abstract interface
  • JSON response parsing with error handling
  • Call counting for usage tracking
  • Configurable timeout and retry settings

Usage

from causaliq_knowledge.llm import GroqClient, GroqConfig

# Create client with custom config
config = GroqConfig(
    model="llama-3.1-8b-instant",
    temperature=0.1,
    max_tokens=500,
)
client = GroqClient(config=config)

# Make a completion request
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
response = client.completion(messages)
print(response.content)

# Parse JSON response
json_data = response.parse_json()

Environment Variables

The Groq client requires the GROQ_API_KEY environment variable to be set:

export GROQ_API_KEY=your_api_key_here
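
If the key cannot be set in the shell, it can also be supplied from Python. A minimal sketch (the placeholder value is illustrative):

import os

from causaliq_knowledge.llm import GroqConfig

# Option 1: set the environment variable before creating the client.
os.environ["GROQ_API_KEY"] = "your_api_key_here"

# Option 2: pass the key explicitly; GroqConfig only falls back to
# GROQ_API_KEY when api_key is not provided.
config = GroqConfig(api_key="your_api_key_here")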

GroqConfig

GroqConfig dataclass

GroqConfig(
    model: str = "llama-3.1-8b-instant",
    temperature: float = 0.1,
    max_tokens: int = 500,
    timeout: float = 30.0,
    api_key: Optional[str] = None,
)

Configuration for Groq API client.

Extends LLMConfig with Groq-specific defaults.

Attributes:

  • model (str) –

    Groq model identifier (default: llama-3.1-8b-instant).

  • temperature (float) –

    Sampling temperature (default: 0.1).

  • max_tokens (int) –

    Maximum response tokens (default: 500).

  • timeout (float) –

    Request timeout in seconds (default: 30.0).

  • api_key (Optional[str]) –

    Groq API key (falls back to GROQ_API_KEY env var).

Methods:

  • __post_init__

    Set API key from environment if not provided.

api_key class-attribute instance-attribute

api_key: Optional[str] = None

max_tokens class-attribute instance-attribute

max_tokens: int = 500

model class-attribute instance-attribute

model: str = 'llama-3.1-8b-instant'

temperature class-attribute instance-attribute

temperature: float = 0.1

timeout class-attribute instance-attribute

timeout: float = 30.0

__post_init__

__post_init__() -> None

Set API key from environment if not provided.
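
The fallback behaviour is roughly equivalent to the following sketch, shown here out of its dataclass context (the actual implementation may differ in detail):

import os

def __post_init__(self) -> None:
    # Fall back to the GROQ_API_KEY environment variable when no key was given.
    if self.api_key is None:
        self.api_key = os.environ.get("GROQ_API_KEY")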

GroqClient

GroqClient

GroqClient(config: Optional[GroqConfig] = None)

Direct Groq API client.

Implements the BaseLLMClient interface for Groq's API. Uses httpx for HTTP requests.

Example

config = GroqConfig(model="llama-3.1-8b-instant")
client = GroqClient(config)
msgs = [{"role": "user", "content": "Hello"}]
response = client.completion(msgs)
print(response.content)

Parameters:

  • config

    (Optional[GroqConfig], default: None ) –

    Groq configuration. If None, uses defaults with API key from GROQ_API_KEY environment variable.

Methods:

  • _build_cache_key – Build a deterministic cache key for the request.
  • cached_completion – Make a completion request with caching.
  • complete_json – Make a completion request and parse the response as JSON.
  • completion – Make a chat completion request to Groq.
  • is_available – Check if the Groq API is available.
  • list_models – List available models from the Groq API.
  • set_cache – Configure caching for this client.

Attributes:

  • BASE_URL – Groq API base URL.
  • _total_calls – Internal counter of API calls made.
  • cache – The configured cache, if any.
  • call_count – Number of API calls made.
  • config – Active configuration (GroqConfig).
  • model_name – Model identifier string.
  • provider_name – Provider name.
  • use_cache – Whether caching is enabled.

BASE_URL class-attribute instance-attribute

BASE_URL = 'https://api.groq.com/openai/v1'

_total_calls instance-attribute

_total_calls = 0

cache property

cache: Optional['TokenCache']

Return the configured cache, if any.

call_count property

call_count: int

Return the number of API calls made.

config instance-attribute

config = config or GroqConfig()

model_name property

model_name: str

Return the model name being used.

Returns:

  • str

    Model identifier string.

provider_name property

provider_name: str

Return the provider name.

use_cache property

use_cache: bool

Return whether caching is enabled.
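
These read-only properties can be inspected at any time, for example:

print(client.call_count)   # number of API calls made so far
print(client.model_name)   # e.g. "llama-3.1-8b-instant"
print(client.use_cache)    # whether caching is currently enabled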

_build_cache_key

_build_cache_key(
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> str

Build a deterministic cache key for the request.

Creates a SHA-256 hash from the model, messages, temperature, and max_tokens. The hash is truncated to 16 hex characters (64 bits).

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • temperature
    (Optional[float], default: None ) –

    Sampling temperature (defaults to config value).

  • max_tokens
    (Optional[int], default: None ) –

    Maximum tokens (defaults to config value).

Returns:

  • str

    16-character hex string cache key.
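
The exact serialisation is not specified here, but a deterministic key of this kind can be produced along the following lines (a sketch; the real method may order or encode the fields differently):

import hashlib
import json

def build_cache_key(model, messages, temperature, max_tokens):
    # Serialise the request fields deterministically, hash them with SHA-256,
    # and keep the first 16 hex characters (64 bits).
    payload = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]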

cached_completion

cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a completion request with caching.

If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Provider-specific options (temperature, max_tokens, etc.).

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.
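
Typical usage, assuming TokenCache is importable from the same package (the exact import path is not shown in this reference):

cache = TokenCache()
client.set_cache(cache)

response = client.cached_completion(messages)  # first call hits the Groq API
response = client.cached_completion(messages)  # identical request is served from the cache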

complete_json

complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]

Make a completion request and parse response as JSON.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Override config options passed to completion().

Returns:

  • tuple[Optional[Dict[str, Any]], LLMResponse]

    Tuple of (parsed JSON dict or None, raw LLMResponse).
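
For example, to request and consume a JSON reply (the prompt and key name are illustrative):

messages = [
    {"role": "system", "content": "Reply only with JSON."},
    {"role": "user", "content": 'Give {"answer": <the sum of 2 and 2>}'},
]
data, response = client.complete_json(messages)
if data is not None:
    print(data["answer"])
else:
    print("Model did not return valid JSON:", response.content)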

completion

completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse

Make a chat completion request to Groq.

Parameters:

  • messages
    (List[Dict[str, str]]) –

    List of message dicts with "role" and "content" keys.

  • **kwargs
    (Any, default: {} ) –

    Override config options (temperature, max_tokens).

Returns:

  • LLMResponse

    LLMResponse with the generated content and metadata.

Raises:

  • ValueError

    If the API request fails.
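
Per-call overrides take precedence over the values in GroqConfig, for example:

response = client.completion(
    messages,
    temperature=0.0,  # override the configured sampling temperature
    max_tokens=100,   # override the configured response length limit
)
print(response.content)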

is_available

is_available() -> bool

Check if Groq API is available.

Returns:

  • bool

    True if GROQ_API_KEY is configured.
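
This is useful as a guard before making requests:

if client.is_available():
    response = client.completion(messages)
else:
    print("GROQ_API_KEY is not configured")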

list_models

list_models() -> List[str]

List available models from Groq API.

Queries the Groq API to get models accessible with the current API key. Filters to only include text generation models.

Returns:

  • List[str]

    List of model identifiers (e.g., ['llama-3.1-8b-instant', ...]).

Raises:

  • ValueError

    If the API request fails.
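
For example, to print every text generation model the current API key can access:

for model_id in client.list_models():
    print(model_id)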

set_cache

set_cache(cache: Optional['TokenCache'], use_cache: bool = True) -> None

Configure caching for this client.

Parameters:

  • cache
    (Optional['TokenCache']) –

    TokenCache instance for caching, or None to disable.

  • use_cache
    (bool, default: True ) –

    Whether to use the cache (default True).
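
For example, where cache is a TokenCache instance (import path not shown in this reference):

client.set_cache(cache)                   # enable caching
client.set_cache(cache, use_cache=False)  # keep the cache configured but bypass it
client.set_cache(None)                    # remove the cache entirely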

Supported Models

Groq provides fast inference for open-source models:

| Model                   | Description             | Free Tier |
|-------------------------|-------------------------|-----------|
| llama-3.1-8b-instant    | Fast Llama 3.1 8B model | ✅ Yes    |
| llama-3.1-70b-versatile | Larger Llama 3.1 model  | ✅ Yes    |
| mixtral-8x7b-32768      | Mixtral MoE model       | ✅ Yes    |

See the Groq documentation for the full list of available models.