# Quick Start

Get Cachecore running in under five minutes. By the end, your OpenAI calls will route through the caching gateway with a single one-line change to your client configuration.
## 1. Create an account
Go to app.cachecore.it and sign in with Google. Your account and a default project are created automatically.
## 2. Upload your OpenAI API key

In your project dashboard, paste your OpenAI key (`sk-...`). Cachecore validates it with OpenAI, then stores it in AWS Secrets Manager. The key is never logged or displayed again.
## 3. Copy your Cachecore token

Your project page shows a Cachecore API token (`cc_live_...`). This JWT authenticates your requests to the gateway and encodes your tenant namespace.
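Because the token is a JWT, its payload can be inspected locally without any secret key. A minimal sketch (the `tenant` claim name below is illustrative, not Cachecore's actual schema):

```python
import base64
import json


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


# Build an illustrative unsigned token; a real Cachecore token has a real signature.
sample = ".".join(
    base64.urlsafe_b64encode(json.dumps(part).encode()).rstrip(b"=").decode()
    for part in [{"alg": "HS256"}, {"tenant": "proj_demo"}]
) + ".sig"

print(jwt_payload(sample))  # → {'tenant': 'proj_demo'}
```

This is read-only introspection; the gateway still verifies the signature on every request.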
## 4. Swap one line

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cachecore.it/v1",
    api_key="cc_live_xxxxx.eyJ...",  # your Cachecore token
)

response = client.chat.completions.create(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "What is semantic caching?"}],
)
print(response.choices[0].message.content)
```
### Node.js (OpenAI SDK)

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.cachecore.it/v1",
  apiKey: "cc_live_xxxxx.eyJ...", // your Cachecore token
});

const response = await client.chat.completions.create({
  model: "gpt-5.4-mini",
  messages: [{ role: "user", content: "What is semantic caching?" }],
});
console.log(response.choices[0].message.content);
```
### cURL

```bash
curl -X POST https://api.cachecore.it/v1/chat/completions \
  -H "Authorization: Bearer cc_live_xxxxx.eyJ..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5.4-mini","messages":[{"role":"user","content":"What is semantic caching?"}]}'
```
## 5. Verify caching
Send the same request twice. Check the response headers:
| Header | Values | Description |
|--------|--------|-------------|
| `X-Cache` | `HIT_L1`, `HIT_L2`, `MISS` | Cache result |
| `X-Cache-Similarity` | 0.00–1.00 | Cosine similarity of the matched prompt (`HIT_L2` only) |
| `X-Cache-Age` | integer (seconds) | Age of the cached entry |
The first request returns `MISS` and is forwarded to OpenAI. The second returns `HIT_L1` in ~5ms.
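With the Python SDK, the raw headers are reachable via the SDK's `with_raw_response` accessor. A small sketch that interprets them, using the header names from the table above (the wording of the summaries is our own):

```python
def describe_cache(headers) -> str:
    """Summarize Cachecore's caching headers from a response header mapping."""
    status = headers.get("X-Cache", "MISS")
    if status == "HIT_L1":
        return f"exact hit, cached {headers.get('X-Cache-Age', '?')}s ago"
    if status == "HIT_L2":
        return f"semantic hit (similarity {headers.get('X-Cache-Similarity', '?')})"
    return "miss: forwarded to OpenAI"


# With the OpenAI Python SDK, fetch headers alongside the parsed body:
#   raw = client.chat.completions.with_raw_response.create(model=..., messages=...)
#   print(describe_cache(raw.headers))
#   completion = raw.parse()

print(describe_cache({"X-Cache": "HIT_L2", "X-Cache-Similarity": "0.93"}))
# → semantic hit (similarity 0.93)
```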
## 6. Check your dashboard
Open app.cachecore.it to see hit rate, tokens saved, and latency improvements.
## Next steps
- How It Works: understand the request pipeline
- Python Client: add tenant isolation and dependency invalidation
- Response Headers: full header reference