# Quick Start

Get Cachecore running in under five minutes. By the end, your OpenAI calls will route through the caching gateway with a single one-line change to your client configuration.
## 1. Create an account
Go to app.cachecore.it and sign in with Google. Your account and a default project are created automatically.
## 2. Upload your OpenAI API key

In your project dashboard, paste your OpenAI key (`sk-...`). Cachecore validates it with OpenAI, then stores it in AWS Secrets Manager. The key is never logged or displayed again.
## 3. Copy your Cachecore token

Your project page shows a Cachecore API token (`cc_live_...`). This JWT authenticates your requests to the gateway and encodes your tenant namespace.
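Because the token is a JWT, its payload can be inspected locally without any secret key. A minimal sketch (the `tenant` claim name below is illustrative, not Cachecore's actual schema):

```python
import base64
import json


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


# Build an illustrative unsigned token; a real Cachecore token has a real signature.
sample = ".".join(
    base64.urlsafe_b64encode(json.dumps(part).encode()).rstrip(b"=").decode()
    for part in [{"alg": "HS256"}, {"tenant": "proj_demo"}]
) + ".sig"

print(jwt_payload(sample))  # → {'tenant': 'proj_demo'}
```

This is read-only introspection; the gateway still verifies the signature on every request.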
## 4. Swap one line

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cachecore.it/v1",
    api_key="cc_live_xxxxx.eyJ...",  # your Cachecore token
)

response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "What is semantic caching?"}],
)
print(response.choices[0].message.content)
```
### Node.js (OpenAI SDK)

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.cachecore.it/v1",
  apiKey: "cc_live_xxxxx.eyJ...", // your Cachecore token
});

const response = await client.chat.completions.create({
  model: "gpt-5.4-mini",
  messages: [{ role: "user", content: "What is semantic caching?" }],
});
console.log(response.choices[0].message.content);
```
### cURL

```bash
curl -X POST https://api.cachecore.it/v1/chat/completions \
  -H "Authorization: Bearer cc_live_xxxxx.eyJ..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5.4-mini","messages":[{"role":"user","content":"What is semantic caching?"}]}'
```
## 5. Verify caching
Send the same request twice. Check the response headers:
| Header | Values | Description |
|--------|--------|-------------|
| `X-Cache` | `HIT_L1`, `HIT_L2`, `MISS` | Cache result |
| `X-Cache-Similarity` | 0.00–1.00 | Cosine similarity of the matched prompt (`HIT_L2` only) |
| `X-Cache-Age` | integer (seconds) | Age of the cached entry |
The first request returns `MISS` and is forwarded to OpenAI. The second returns `HIT_L1` in ~5ms.
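With the Python SDK, the raw headers are reachable via the SDK's `with_raw_response` accessor. A small sketch that interprets them, using the header names from the table above (the wording of the summaries is our own):

```python
def describe_cache(headers) -> str:
    """Summarize Cachecore's caching headers from a response header mapping."""
    status = headers.get("X-Cache", "MISS")
    if status == "HIT_L1":
        return f"exact hit, cached {headers.get('X-Cache-Age', '?')}s ago"
    if status == "HIT_L2":
        return f"semantic hit (similarity {headers.get('X-Cache-Similarity', '?')})"
    return "miss: forwarded to OpenAI"


# With the OpenAI Python SDK, fetch headers alongside the parsed body:
#   raw = client.chat.completions.with_raw_response.create(model=..., messages=...)
#   print(describe_cache(raw.headers))
#   completion = raw.parse()

print(describe_cache({"X-Cache": "HIT_L2", "X-Cache-Similarity": "0.93"}))
# → semantic hit (similarity 0.93)
```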
## 6. Check your dashboard
Open app.cachecore.it to see hit rate, tokens saved, and latency improvements.
## Next steps
- How It Works: understand the request pipeline
- Python Client: add tenant isolation and dependency invalidation
- Response Headers: full header reference