Ollama Client API Reference¶
Client for a locally running Ollama server, used to run Llama and other open-source models without cloud APIs. It implements the BaseLLMClient interface and uses httpx to communicate with the local Ollama server.
Overview¶
The Ollama client provides:
- Local LLM inference without API keys or internet access
- An implementation of the `BaseLLMClient` abstract interface
- Support for Llama 3.2, Llama 3.1, Mistral, and other models
- JSON response parsing with error handling
- Call counting for usage tracking
- Availability checking via the `is_available()` method
Prerequisites¶
- Install Ollama from ollama.com/download
- Pull a model, e.g. `ollama pull llama3.2:1b`
- Ensure Ollama is running (it usually auto-starts after installation)
Usage¶
```python
from causaliq_knowledge.llm import OllamaClient, OllamaConfig

# Create client with default config (llama3.2:1b on localhost:11434)
client = OllamaClient()

# Or with custom config
config = OllamaConfig(
    model="llama3.1:8b",
    temperature=0.1,
    max_tokens=500,
    timeout=120.0,  # Local inference can be slow
)
client = OllamaClient(config=config)

# Check if Ollama is available
if client.is_available():
    # Make a completion request
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"},
    ]
    response = client.completion(messages)
    print(response.content)
else:
    print("Ollama not running or model not installed")
```
Using with LLMKnowledge Provider¶
```python
from causaliq_knowledge.llm import LLMKnowledge

# Use local Ollama for causal queries
provider = LLMKnowledge(models=["ollama/llama3.2:1b"])
result = provider.query_edge("smoking", "lung_cancer")
print(f"Exists: {result.exists}, Confidence: {result.confidence}")

# Mix local and cloud models for consensus
provider = LLMKnowledge(
    models=[
        "ollama/llama3.2:1b",
        "groq/llama-3.1-8b-instant",
    ],
    consensus_strategy="weighted_vote",
)
```
OllamaConfig¶
OllamaConfig dataclass¶

```python
OllamaConfig(
    model: str = "llama3.2:1b",
    temperature: float = 0.1,
    max_tokens: int = 500,
    timeout: float = 120.0,
    api_key: Optional[str] = None,
    base_url: str = "http://localhost:11434",
)
```
Configuration for Ollama API client.
Extends LLMConfig with Ollama-specific defaults.
Attributes:

- `model` (str) – Ollama model identifier (default: llama3.2:1b).
- `temperature` (float) – Sampling temperature (default: 0.1).
- `max_tokens` (int) – Maximum response tokens (default: 500).
- `timeout` (float) – Request timeout in seconds (default: 120.0, since local inference can be slow).
- `api_key` (Optional[str]) – Not used for Ollama (local server).
- `base_url` (str) – Ollama server URL (default: http://localhost:11434).
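If the Ollama server is not on the default localhost address, `base_url` can point the client elsewhere. A minimal sketch, assuming a server reachable on another machine (the host, port, and timeout below are illustrative, not defaults):

```python
from causaliq_knowledge.llm import OllamaClient, OllamaConfig

# Illustrative values: an Ollama server running on another machine on the LAN.
config = OllamaConfig(
    model="llama3.2:1b",
    base_url="http://192.168.1.50:11434",
    timeout=180.0,  # extra headroom for a slower remote host
)
client = OllamaClient(config=config)
```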
OllamaClient¶

```python
OllamaClient(config: Optional[OllamaConfig] = None)
```
Local Ollama API client.
Implements the BaseLLMClient interface for a locally running Ollama server. Uses httpx for HTTP requests to the local Ollama API.
Ollama provides an OpenAI-compatible API for running open-source models like Llama locally without requiring API keys or internet access.
Example:

```python
config = OllamaConfig(model="llama3.2:1b")
client = OllamaClient(config)
msgs = [{"role": "user", "content": "Hello"}]
response = client.completion(msgs)
print(response.content)
```
Parameters:
- `config` (Optional[OllamaConfig], default: None) – Ollama configuration. If None, uses defaults connecting to localhost:11434 with the llama3.2:1b model.
Methods:
- `_build_cache_key` – Build a deterministic cache key for the request.
- `cached_completion` – Make a completion request with caching.
- `complete_json` – Make a completion request and parse response as JSON.
- `completion` – Make a chat completion request to Ollama.
- `is_available` – Check if Ollama server is running and model is available.
- `list_models` – List installed models from Ollama.
- `set_cache` – Configure caching for this client.
Attributes:
- `_total_calls`
- `cache` (Optional['TokenCache']) – Return the configured cache, if any.
- `call_count` (int) – Return the number of API calls made.
- `config`
- `model_name` (str) – Return the model name being used.
- `provider_name` (str) – Return the provider name.
- `use_cache` (bool) – Return whether caching is enabled.
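A quick sketch of reading these properties on a fresh client; the commented values are assumptions about the defaults, not guaranteed outputs:

```python
from causaliq_knowledge.llm import OllamaClient

client = OllamaClient()

print(client.provider_name)  # provider identifier (assumed to be "ollama")
print(client.model_name)     # "llama3.2:1b" with the default config
print(client.call_count)     # assumed 0 before any requests are made
print(client.use_cache)      # assumed False until a cache is configured via set_cache
```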
model_name property¶

Return the model name being used.

Returns:

- str – Model identifier string.
_build_cache_key¶

```python
_build_cache_key(
    messages: List[Dict[str, str]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> str
```
Build a deterministic cache key for the request.
Creates a SHA-256 hash from the model, messages, temperature, and max_tokens. The hash is truncated to 16 hex characters (64 bits).
Parameters:
- `messages` (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- `temperature` (Optional[float], default: None) – Sampling temperature (defaults to config value).
- `max_tokens` (Optional[int], default: None) – Maximum tokens (defaults to config value).

Returns:

- str – 16-character hex string cache key.
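The exact serialisation is internal to the client, but a key with these properties could be built roughly as in the sketch below (the JSON encoding and field names are assumptions; only the SHA-256 hash and 16-character truncation are documented):

```python
import hashlib
import json
from typing import Dict, List


def example_cache_key(
    model: str,
    messages: List[Dict[str, str]],
    temperature: float,
    max_tokens: int,
) -> str:
    # Serialise everything that affects the response deterministically,
    # then hash and keep the first 16 hex characters (64 bits).
    payload = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
```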
cached_completion¶

```python
cached_completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
```
Make a completion request with caching.
If caching is enabled and a cached response exists, returns the cached response without making an API call. Otherwise, makes the API call and caches the result.
Parameters:
- `messages` (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- `**kwargs` (Any, default: {}) – Provider-specific options (temperature, max_tokens, etc.).

Returns:

- LLMResponse – LLMResponse with the generated content and metadata.
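For example, repeating an identical request should be served from the cache on the second call. This sketch assumes a cache has already been configured with `set_cache` (see that method for its exact arguments):

```python
from causaliq_knowledge.llm import OllamaClient

client = OllamaClient()
# Assumes caching was enabled beforehand, e.g. client.set_cache(...)

messages = [{"role": "user", "content": "Name one cause of lung cancer."}]

# First call goes to the Ollama server and the response is cached.
first = client.cached_completion(messages)

# An identical request is answered from the cache, with no API call.
second = client.cached_completion(messages)
print(second.content)
```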
complete_json¶

```python
complete_json(
    messages: List[Dict[str, str]], **kwargs: Any
) -> tuple[Optional[Dict[str, Any]], LLMResponse]
```
Make a completion request and parse response as JSON.
Parameters:
- `messages` (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- `**kwargs` (Any, default: {}) – Override config options passed to completion().

Returns:

- tuple[Optional[Dict[str, Any]], LLMResponse] – Tuple of (parsed JSON dict or None, raw LLMResponse).
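Since the parsed value is None whenever the model's reply is not valid JSON, check it before use. A small example (the prompt is illustrative):

```python
from causaliq_knowledge.llm import OllamaClient

client = OllamaClient()
prompt = (
    'Reply with JSON only, e.g. {"answer": "yes"}. '
    "Does smoking cause lung cancer?"
)
messages = [{"role": "user", "content": prompt}]

parsed, raw = client.complete_json(messages)
if parsed is not None:
    print(parsed.get("answer"))
else:
    # Fall back to the raw text when JSON parsing fails.
    print("Unparseable reply:", raw.content)
```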
completion¶

```python
completion(messages: List[Dict[str, str]], **kwargs: Any) -> LLMResponse
```
Make a chat completion request to Ollama.
Parameters:
- `messages` (List[Dict[str, str]]) – List of message dicts with "role" and "content" keys.
- `**kwargs` (Any, default: {}) – Override config options (temperature, max_tokens).

Returns:

- LLMResponse – LLMResponse with the generated content and metadata.

Raises:

- ValueError – If the API request fails or Ollama is not running.
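Per-call keyword arguments override the values in `OllamaConfig`, and failures surface as `ValueError`, so a defensive call might look like this (the override values are illustrative):

```python
from causaliq_knowledge.llm import OllamaClient

client = OllamaClient()
messages = [{"role": "user", "content": "Does rainfall cause soil erosion?"}]

try:
    # Override temperature and max_tokens for this call only.
    response = client.completion(messages, temperature=0.0, max_tokens=200)
    print(response.content)
except ValueError as exc:
    print(f"Ollama request failed: {exc}")
```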
is_available¶

Check if Ollama server is running and model is available.

Returns:

- bool – True if Ollama is running and the configured model exists.
list_models¶
List installed models from Ollama.
Queries the local Ollama server to get installed models. Unlike cloud providers, this returns only models the user has explicitly pulled/installed.
Returns:

- List[str] – List of model identifiers (e.g., ['llama3.2:1b', ...]).

Raises:

- ValueError – If the Ollama server is not running.
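A typical use is verifying that the configured model has actually been pulled before starting a long run; a minimal sketch:

```python
from causaliq_knowledge.llm import OllamaClient

client = OllamaClient()

try:
    installed = client.list_models()  # raises ValueError if the server is down
except ValueError:
    print("Ollama is not running; start the app or run `ollama serve`.")
else:
    if client.model_name not in installed:
        print(f"Model missing; run: ollama pull {client.model_name}")
```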
Supported Models¶
Ollama supports many open-source models. Recommended for causal queries:
| Model | Size | RAM Needed | Quality |
|---|---|---|---|
| `llama3.2:1b` | ~1.3GB | 4GB+ | Good for simple queries |
| `llama3.2` | ~2GB | 6GB+ | Better reasoning |
| `llama3.1:8b` | ~4.7GB | 10GB+ | Best quality |
| `mistral` | ~4GB | 8GB+ | Good alternative |
See Ollama Library for all available models.
Troubleshooting¶
"Could not connect to Ollama"
- Ensure Ollama is installed and running
- Run
ollama servein a terminal, or start the Ollama app - Check that nothing else is using port 11434
"Model not found"
- Run
ollama pull <model-name>to download the model - Run
ollama listto see installed models
Slow responses
- Local inference is CPU/GPU bound
- Use smaller models like
llama3.2:1b - Increase the timeout in
OllamaConfig - Consider using GPU acceleration if available