CausalIQ Knowledge User Guide¶
What is CausalIQ Knowledge?¶
CausalIQ Knowledge is a Python package that provides knowledge services for causal discovery workflows. It enables you to query Large Language Models (LLMs) about potential causal relationships between variables, helping to resolve uncertainty in learned causal graphs.
Primary Use Case¶
When averaging multiple causal graphs learned from data subsamples, some edges may be uncertain - appearing in some graphs but not others, or with inconsistent directions. CausalIQ Knowledge helps resolve this uncertainty by querying LLMs about:
- Whether a causal relationship exists between two variables
- The direction of causation (A→B or B→A)
Quick Start¶
Installation¶
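Assuming the package is published under the distribution name `causaliq-knowledge` (matching its import name `causaliq_knowledge`), it can be installed with pip:

```shell
pip install causaliq-knowledge
```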
Basic Usage¶
```python
from causaliq_knowledge.llm import LLMKnowledge

# Initialize with Groq (default, free tier)
knowledge = LLMKnowledge(models=["groq/llama-3.1-8b-instant"])

# Query about a potential edge
result = knowledge.query_edge(
    node_a="smoking",
    node_b="lung_cancer",
    context={"domain": "epidemiology"}
)

print(f"Exists: {result.exists}")
print(f"Direction: {result.direction}")
print(f"Confidence: {result.confidence}")
print(f"Reasoning: {result.reasoning}")
```
Using Gemini (Free Tier)¶
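Usage with Gemini follows the same pattern as Groq; a minimal sketch assuming the `gemini/gemini-2.5-flash` model identifier used in the multi-model example below, with `GEMINI_API_KEY` set in your environment:

```python
from causaliq_knowledge.llm import LLMKnowledge

# Requires the GEMINI_API_KEY environment variable (see LLM Provider Setup)
knowledge = LLMKnowledge(models=["gemini/gemini-2.5-flash"])
result = knowledge.query_edge("smoking", "lung_cancer")
print(f"Direction: {result.direction}")
```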
Multi-Model Consensus¶
```python
# Query multiple models for more robust answers
knowledge = LLMKnowledge(
    models=["groq/llama-3.1-8b-instant", "gemini/gemini-2.5-flash"],
    consensus_strategy="weighted_vote"
)
```
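As an illustration only (this is a hypothetical sketch, not the package's internal code), a weighted vote can be thought of as summing each model's confidence per answer and picking the answer with the highest total:

```python
from collections import defaultdict

def weighted_vote(answers):
    """Sketch of weighted-vote consensus.

    answers: list of (direction, confidence) pairs, one per model.
    Returns the winning direction and its share of total confidence.
    """
    scores = defaultdict(float)
    for direction, confidence in answers:
        scores[direction] += confidence
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

# Two models agree on A->B with high confidence; one dissents
direction, share = weighted_vote([("A->B", 0.9), ("A->B", 0.6), ("B->A", 0.7)])
```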
LLM Provider Setup¶
CausalIQ Knowledge uses direct vendor-specific API clients (not wrapper libraries) to communicate with LLM providers. This approach provides reliability and minimal dependencies. Currently supported:
- Groq: Free tier with fast inference
- Google Gemini: Generous free tier
- OpenAI: GPT-4o and other models
- Anthropic: Claude models
- DeepSeek: DeepSeek-V3 and R1 models
- Mistral: Mistral AI models
- Ollama: Local LLMs (free, runs locally)
Free Options¶
Groq (Free Tier - Very Fast)¶
Groq offers a generous free tier with extremely fast inference:
- Sign up at console.groq.com
- Create an API key
- Set the environment variable (see Storing API Keys)
- Use in code:
```python
from causaliq_knowledge.llm import LLMKnowledge

knowledge = LLMKnowledge(models=["groq/llama-3.1-8b-instant"])
result = knowledge.query_edge("smoking", "lung_cancer")
```
Available Groq models: `groq/llama-3.1-8b-instant`, `groq/llama-3.1-70b-versatile`, `groq/mixtral-8x7b-32768`
Google Gemini (Free Tier)¶
Google offers free access to Gemini models:
- Sign up at aistudio.google.com/apikey
- Create an API key
- Set the `GEMINI_API_KEY` environment variable
- Use in code:
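A minimal usage sketch (the model name `gemini/gemini-2.5-flash` is taken from the Quick Start consensus example):

```python
from causaliq_knowledge.llm import LLMKnowledge

knowledge = LLMKnowledge(models=["gemini/gemini-2.5-flash"])
result = knowledge.query_edge("smoking", "lung_cancer")
```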
OpenAI¶
OpenAI provides GPT-4o and other models:
- Sign up at platform.openai.com
- Create an API key
- Set the `OPENAI_API_KEY` environment variable
Anthropic¶
Anthropic provides Claude models:
- Sign up at console.anthropic.com
- Create an API key
- Set the `ANTHROPIC_API_KEY` environment variable
DeepSeek¶
DeepSeek offers high-quality models at competitive prices:
- Sign up at platform.deepseek.com
- Create an API key
- Set the `DEEPSEEK_API_KEY` environment variable
Mistral¶
Mistral AI provides models with EU data sovereignty:
- Sign up at console.mistral.ai
- Create an API key
- Set the `MISTRAL_API_KEY` environment variable
Ollama (Local)¶
Run models locally with Ollama (no API key needed):
- Install Ollama from ollama.ai
- Pull a model: `ollama pull llama3`
- Use in code:
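A minimal usage sketch, assuming the `ollama/<model>` prefix shown in the CLI example later in this guide:

```python
from causaliq_knowledge.llm import LLMKnowledge

# No API key needed: Ollama serves the model locally
knowledge = LLMKnowledge(models=["ollama/llama3"])
result = knowledge.query_edge("smoking", "lung_cancer")
```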
Storing API Keys¶
Option 1: User Environment Variables (Recommended)¶
Set permanently for your user account on Windows:
```powershell
[Environment]::SetEnvironmentVariable("GROQ_API_KEY", "your-key", "User")
[Environment]::SetEnvironmentVariable("GEMINI_API_KEY", "your-key", "User")
```
On Linux/macOS, add to your ~/.bashrc or ~/.zshrc:
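For example (the same keys as the Windows commands above; values are placeholders):

```shell
export GROQ_API_KEY="your-key"
export GEMINI_API_KEY="your-key"
```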
Restart your terminal for changes to take effect.
Option 2: Project .env File¶
Create a .env file in your project root:
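For example (placeholder values; include whichever provider keys you need):

```
GROQ_API_KEY=your-key
GEMINI_API_KEY=your-key
```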
Important: Add .env to your .gitignore so keys aren't committed to version control!
Option 3: Password Manager¶
Store API keys in a password manager (LastPass, 1Password, Bitwarden, etc.) as a Secure Note. This provides:
- Encrypted backup of your keys
- Access from any machine
- Secure sharing with team members if needed
Copy keys from your password manager when setting environment variables.
Verifying Your Setup¶
Test your configuration with the CLI:
```shell
# Using the installed CLI
cqknow query smoking lung_cancer --model groq/llama-3.1-8b-instant

# Or with Python module
python -m causaliq_knowledge.cli query smoking lung_cancer --model ollama/llama3
```
Cache Management¶
CausalIQ Knowledge includes a caching system that stores LLM responses to avoid redundant API calls. You can inspect, export, and import your cache using the CLI:
```shell
# View cache statistics
cqknow cache stats ./llm_cache.db

# Export cache entries to human-readable JSON files
cqknow cache export ./llm_cache.db ./export_dir

# Export to a zip archive for sharing
cqknow cache export ./llm_cache.db ./export.zip

# Import cache entries (auto-detects entry types)
cqknow cache import ./new_cache.db ./export.zip
```
Exported files use human-readable names like gpt4_smoking_lung_edge_a1b2.json.
The cache stores:
- Entry count: Number of cached LLM responses
- Token count: Total tokens across all cached entries
For programmatic cache usage, see Caching Architecture.
What's Next?¶
- Architecture Overview - Understand how the package works
- LLM Integration Design - Detailed design documentation
- API Reference - Full API documentation