Groq Client API Reference¶
Direct Groq API client for fast LLM inference. This client implements the BaseLLMClient interface using httpx to communicate directly with the Groq API.
Overview¶
The Groq client provides:
- Direct HTTP communication with Groq's API
- Implements the BaseLLMClient abstract interface
- JSON response parsing with error handling
- Call counting for usage tracking
- Configurable timeout and retry settings
Usage¶
```python
from causaliq_knowledge.llm import GroqClient, GroqConfig

# Create client with custom config
config = GroqConfig(
    model="llama-3.1-8b-instant",
    temperature=0.1,
    max_tokens=500,
)
client = GroqClient(config=config)

# Make a completion request
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
response = client.completion(messages)
print(response.content)

# Parse JSON response
json_data = response.parse_json()
```
Environment Variables¶
The Groq client requires the GROQ_API_KEY environment variable to be set, unless an API key is passed explicitly via GroqConfig.
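A minimal sketch of the two ways to supply the key (the key value below is a placeholder, not a real credential):

```python
import os

from causaliq_knowledge.llm import GroqClient, GroqConfig

# Option 1: rely on the environment variable (set here only for illustration;
# normally it is exported in the shell or CI environment).
os.environ["GROQ_API_KEY"] = "gsk-your-key-here"  # placeholder value
client = GroqClient()

# Option 2: pass the key explicitly via GroqConfig.
client = GroqClient(GroqConfig(api_key=os.environ["GROQ_API_KEY"]))
```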
GroqConfig¶
GroqConfig dataclass ¶
```python
GroqConfig(
    model: str = "llama-3.1-8b-instant",
    temperature: float = 0.1,
    max_tokens: int = 500,
    timeout: float = 30.0,
    api_key: Optional[str] = None,
)
```
Configuration for Groq API client.
Extends LLMConfig with Groq-specific defaults.
Attributes:
- model (str) – Groq model identifier (default: llama-3.1-8b-instant).
- temperature (float) – Sampling temperature (default: 0.1).
- max_tokens (int) – Maximum response tokens (default: 500).
- timeout (float) – Request timeout in seconds (default: 30.0).
- api_key (Optional[str]) – Groq API key (falls back to GROQ_API_KEY env var).
Methods:
- __post_init__ – Set API key from environment if not provided.
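For example, since every field has a default, only the values being changed need to be passed (defaults taken from the signature above):

```python
from causaliq_knowledge.llm import GroqConfig

# Override only the sampling settings; everything else keeps its default.
config = GroqConfig(temperature=0.0, max_tokens=200)
print(config.model)    # llama-3.1-8b-instant
print(config.timeout)  # 30.0
# api_key was not given, so __post_init__ falls back to GROQ_API_KEY.
```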
GroqClient¶
GroqClient ¶
```python
GroqClient(config: Optional[GroqConfig] = None)
```
Direct Groq API client.
Implements the BaseLLMClient interface for Groq's API. Uses httpx for HTTP requests.
Example

```python
config = GroqConfig(model="llama-3.1-8b-instant")
client = GroqClient(config)
msgs = [{"role": "user", "content": "Hello"}]
response = client.completion(msgs)
print(response.content)
```
Parameters:
- config (Optional[GroqConfig], default: None) – Groq configuration. If None, uses defaults with API key from GROQ_API_KEY environment variable.
Methods:
- _build_cache_key – Build a deterministic cache key for the request.
- cached_completion – Make a completion request with caching.
- complete_json – Make a completion request and parse response as JSON.
- completion – Make a chat completion request to Groq.
- is_available – Check if Groq API is available.
- list_models – List available models from Groq API.
- set_cache – Configure caching for this client.
Attributes:
- BASE_URL
- _total_calls
- cache (Optional['TokenCache']) – Return the configured cache, if any.
- call_count (int) – Return the number of API calls made.
- config
- model_name (str) – Return the model name being used.
- provider_name (str) – Return the provider name.
- use_cache (bool) – Return whether caching is enabled.
model_name property ¶
Return the model name being used.
Returns:
- str – Model identifier string.
_build_cache_key ¶

```python
_build_cache_key(
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> str
```

Build a deterministic cache key for the request.
Creates a SHA-256 hash from the model, messages, temperature, and max_tokens. The hash is truncated to 16 hex characters (64 bits).
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- temperature (Optional[float], default: None) – Sampling temperature (defaults to config value).
- max_tokens (Optional[int], default: None) – Maximum tokens (defaults to config value).
Returns:
- str – 16-character hex string cache key.
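The exact implementation is internal, but a key with the documented properties could be built roughly as follows; the payload layout is an assumption, while the inputs, SHA-256 hashing, and 16-character truncation come from the description above:

```python
import hashlib
import json
from typing import Dict, List, Optional


def build_cache_key_sketch(
    model: str,
    messages: List[Dict[str, str]],
    temperature: Optional[float],
    max_tokens: Optional[int],
) -> str:
    # Serialise the request deterministically (sorted keys, fixed separators)
    # so identical requests always hash to the same key.
    payload = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    # SHA-256, truncated to 16 hex characters (64 bits).
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```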
cached_completion ¶

```python
cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
```

Make a completion request with caching.
If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- **kwargs (Any, default: {}) – Provider-specific options (temperature, max_tokens, etc.).
Returns:
- LLMResponse – LLMResponse with the generated content and metadata.
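A sketch of a cached workflow, assuming set_cache accepts a TokenCache instance (its exact signature and the cache's import path are not documented on this page):

```python
from causaliq_knowledge.llm import GroqClient

client = GroqClient()

# Hypothetical: `token_cache` is a TokenCache instance created elsewhere.
client.set_cache(token_cache)

messages = [{"role": "user", "content": "What is 2+2?"}]
first = client.cached_completion(messages)   # calls the Groq API and caches the result
second = client.cached_completion(messages)  # identical request, served from the cache
print(client.call_count)  # number of API calls made so far
```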
complete_json ¶

```python
complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]
```

Make a completion request and parse response as JSON.
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- **kwargs (Any, default: {}) – Override config options passed to completion().
Returns:
- tuple[Optional[Dict[str, Any]], LLMResponse] – Tuple of (parsed JSON dict or None, raw LLMResponse).
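A minimal usage sketch; because the parsed value is None when the response is not valid JSON, check it before use:

```python
from causaliq_knowledge.llm import GroqClient

client = GroqClient()

messages = [
    {"role": "system", "content": "Reply with a JSON object only."},
    {"role": "user", "content": 'Answer 2+2 as {"answer": <number>}.'},
]

json_data, response = client.complete_json(messages)
if json_data is None:
    # Parsing failed; fall back to the raw text in the LLMResponse.
    print("Could not parse JSON:", response.content)
else:
    print(json_data.get("answer"))
```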
completion ¶

```python
completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
```

Make a chat completion request to Groq.
Parameters:
- messages (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- **kwargs (Any, default: {}) – Override config options (temperature, max_tokens).
Returns:
- LLMResponse – LLMResponse with the generated content and metadata.
Raises:
- ValueError – If the API request fails.
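Because completion raises ValueError on API failure, calls can be wrapped accordingly (a short sketch):

```python
from causaliq_knowledge.llm import GroqClient

client = GroqClient()
messages = [{"role": "user", "content": "What is 2+2?"}]

try:
    # Per-call overrides take precedence over the GroqConfig values.
    response = client.completion(messages, temperature=0.0, max_tokens=100)
    print(response.content)
except ValueError as exc:
    print(f"Groq request failed: {exc}")
```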
is_available ¶
Check if Groq API is available.
Returns:
- bool – True if GROQ_API_KEY is configured.
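For example, guarding calls when the key might be absent:

```python
from causaliq_knowledge.llm import GroqClient

client = GroqClient()

if client.is_available():
    response = client.completion([{"role": "user", "content": "Hello"}])
    print(response.content)
else:
    print("GROQ_API_KEY is not configured; skipping Groq calls.")
```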
list_models ¶
List available models from Groq API.
Queries the Groq API to get models accessible with the current API key. Filters to only include text generation models.
Returns:
- List[str] – List of model identifiers (e.g., ['llama-3.1-8b-instant', ...]).
Raises:
- ValueError – If the API request fails.
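A short sketch of discovering the available models at runtime:

```python
from causaliq_knowledge.llm import GroqClient

client = GroqClient()

try:
    models = client.list_models()
    print(f"{len(models)} text-generation models available:")
    for model_id in models:
        print(" -", model_id)
except ValueError as exc:
    print(f"Could not list models: {exc}")
```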
Supported Models¶
Groq provides fast inference for open-source models:
| Model | Description | Free Tier |
|---|---|---|
| llama-3.1-8b-instant | Fast Llama 3.1 8B model | ✅ Yes |
| llama-3.1-70b-versatile | Larger Llama 3.1 model | ✅ Yes |
| mixtral-8x7b-32768 | Mixtral MoE model | ✅ Yes |
See Groq documentation for the full list of available models.
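Any of the models above can be selected via GroqConfig, for example:

```python
from causaliq_knowledge.llm import GroqClient, GroqConfig

# Use the larger Llama 3.1 model instead of the default 8B variant.
client = GroqClient(GroqConfig(model="llama-3.1-70b-versatile"))
print(client.model_name)  # llama-3.1-70b-versatile
```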