Weights & Biases Inference
https://weave-docs.wandb.ai/quickstart-inference
LiteLLM supports all models available through the W&B Inference service. To use a model, add the wandb/ prefix to its name in LiteLLM requests: model=wandb/<any-model-on-wandb-inference-dashboard>. The full list of supported models is at https://docs.wandb.ai/guides/inference/models/
API Key
You can get an API key for W&B Inference at https://wandb.ai/authorize
import os
# Set the API key as an environment variable
os.environ['WANDB_API_KEY'] = "insert-your-wandb-api-key"
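If the key is expected to come from the shell environment rather than being set in code, it can help to check for it up front, so that a missing key fails with a clear message instead of a 401 at request time. The following is a minimal sketch; `require_wandb_api_key` is a hypothetical helper name and is not part of LiteLLM.

```python
import os


def require_wandb_api_key(env=None):
    """Return WANDB_API_KEY, or raise if it is unset or blank.

    Hypothetical helper for illustration; not a LiteLLM function.
    `env` defaults to os.environ and can be any mapping (useful for testing).
    """
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WANDB_API_KEY is not set; create a key at https://wandb.ai/authorize"
        )
    return key
```

Calling `require_wandb_api_key()` once at startup is usually enough; LiteLLM itself still reads the key from the environment variable.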
Sample Usage: Text Generation
from litellm import completion
import os
os.environ['WANDB_API_KEY'] = "insert-your-wandb-api-key"
response = completion(
    model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {
            "role": "user",
            "content": "What character was Wall-e in love with?",
        }
    ],
    max_tokens=10,
    response_format={ "type": "json_object" },
    seed=123,
    temperature=0.6,  # either set temperature or `top_p`
    top_p=0.01,  # to get as deterministic results as possible
)
print(response)
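Printing the whole response shows metadata as well as the reply. The sketch below pulls out just the reply text and tries to parse it as JSON, since the request asks for a JSON object. It assumes the response follows the OpenAI-style layout (`choices[0].message.content`); `extract_json_reply` is a hypothetical helper name, and the stand-in object replaces a live call so the snippet runs offline.

```python
import json
from types import SimpleNamespace


def extract_json_reply(response):
    """Return (parsed, raw_text) from an OpenAI-style chat completion response.

    parsed is None when the text is not valid JSON, for example when the
    reply was cut off by a small max_tokens value.
    """
    text = response.choices[0].message.content or ""
    try:
        return json.loads(text), text
    except json.JSONDecodeError:
        return None, text


# Stand-in with the same attribute layout, used instead of a live API call.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content='{"character": "EVE"}'))]
)
parsed, raw = extract_json_reply(fake_response)
print(parsed)
```

With `max_tokens=10` as in the example above, a JSON reply can easily be truncated, which is why the helper returns the raw text alongside the parsed value.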
Sample Usage: Streaming
from litellm import completion
import os
os.environ['WANDB_API_KEY'] = "insert-your-wandb-api-key"
response = completion(
    model="wandb/Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {
            "role": "user",
            "content": "What character was Wall-e in love with?",
        }
    ],
    stream=True,
    max_tokens=10,
    response_format={ "type": "json_object" },
    seed=123,
    temperature=0.6,  # either set temperature or `top_p`
    top_p=0.01,  # to get as deterministic results as possible
)
for chunk in response:
    print(chunk)
The above examples may not work if the model has been taken offline. Check the full list of available models at https://docs.wandb.ai/guides/inference/models/.
Usage with LiteLLM Proxy Server
Here's how to call a W&B Inference model through the LiteLLM Proxy Server:
- Modify the config.yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: wandb/<your-model-name>  # add wandb/ prefix to use W&B Inference as provider
      api_key: api-key                 # W&B Inference API key used for this model
- Start the proxy
$ litellm --config /path/to/config.yaml
- Send Request to LiteLLM Proxy Server
Two clients are shown: the OpenAI Python SDK (v1.0.0+), followed by curl.
import openai
client = openai.OpenAI(
    api_key="litellm-proxy-key",             # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)
response = client.chat.completions.create(
    model="my-model",
    messages = [
        {
            "role": "user",
            "content": "What character was Wall-e in love with?"
        }
    ],
)
print(response)
curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer litellm-proxy-key' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "my-model",
    "messages": [
        {
        "role": "user",
        "content": "What character was Wall-e in love with?"
        }
    ]
}'
Supported Parameters
The W&B Inference provider supports the following parameters:
Chat Completion Parameters
| Parameter | Type | Description | 
|---|---|---|
| frequency_penalty | number | Penalizes new tokens based on their frequency in the text | 
| function_call | string/object | Controls how the model calls functions | 
| functions | array | List of functions for which the model may generate JSON inputs | 
| logit_bias | map | Modifies the likelihood of specified tokens | 
| max_tokens | integer | Maximum number of tokens to generate | 
| n | integer | Number of completions to generate | 
| presence_penalty | number | Penalizes tokens based on whether they already appear in the text so far | 
| response_format | object | Format of the response, e.g., {"type": "json_object"} | 
| seed | integer | Sampling seed for deterministic results | 
| stop | string/array | Sequences where the API will stop generating tokens | 
| stream | boolean | Whether to stream the response | 
| temperature | number | Controls randomness (0-2) | 
| top_p | number | Controls nucleus sampling | 
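When request options come from user input or a shared config, it can be useful to check them against the table above before sending. The sketch below hard-codes the parameter names from the table; `SUPPORTED_CHAT_PARAMS` and `split_supported` are hypothetical names, and required arguments such as `model` and `messages` are outside its scope. The `top_k` key is only an example of a name that is not in the table.

```python
# Optional parameter names copied from the table above.
SUPPORTED_CHAT_PARAMS = frozenset({
    "frequency_penalty", "function_call", "functions", "logit_bias",
    "max_tokens", "n", "presence_penalty", "response_format", "seed",
    "stop", "stream", "temperature", "top_p",
})


def split_supported(params):
    """Split optional request parameters into (supported_dict, unsupported_names)."""
    supported = {k: v for k, v in params.items() if k in SUPPORTED_CHAT_PARAMS}
    unsupported = sorted(k for k in params if k not in SUPPORTED_CHAT_PARAMS)
    return supported, unsupported


supported, unsupported = split_supported({"temperature": 0.6, "seed": 123, "top_k": 40})
print(supported)
print(unsupported)
```

The supported dictionary can then be unpacked into the call, for example `completion(model=..., messages=..., **supported)`, while the unsupported names are logged or rejected.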
Error Handling
The integration uses standard LiteLLM error handling. The following errors are commonly encountered with the W&B Inference API:
| Error Code | Message | Cause | Solution | 
|---|---|---|---|
| 401 | Authentication failed | Your authentication credentials are incorrect or your W&B project entity and/or name are incorrect. | Ensure you're using the correct API key and that your W&B project name and entity are correct. | 
| 403 | Country, region, or territory not supported | Accessing the API from an unsupported location. | See Geographic restrictions. | 
| 429 | Concurrency limit reached for requests | Too many concurrent requests. | Reduce the number of concurrent requests or increase your limits. For more information, see Usage information and limits. | 
| 429 | You exceeded your current quota, please check your plan and billing details | Out of credits or reached monthly spending cap. | Get more credits or increase your limits. For more information, see Usage information and limits. | 
| 429 | W&B Inference isn't available for personal accounts. | The request comes from a personal account, which doesn't have access to W&B Inference. | Switch to a non-personal account. Follow the instructions below for a workaround. | 
| 500 | The server had an error while processing your request | Internal server error. | Retry after a brief wait and contact support if it persists. | 
| 503 | The engine is currently overloaded, please try again later | Server is experiencing high traffic. | Retry your request after a short delay. | 
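Several rows above recommend retrying after a short delay. The following is a minimal sketch of retrying with exponential backoff. It assumes the raised exception exposes an integer `status_code` attribute, as OpenAI-style exceptions do; `call_with_retry` and `RETRYABLE_STATUS` are hypothetical names, not LiteLLM APIs. The sketch does not inspect the error message, so a 429 caused by an exhausted quota or a personal account will simply fail again after the last attempt.

```python
import time

# Status codes from the table for which a retry may help.
RETRYABLE_STATUS = {429, 500, 503}


def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn() and retry on exceptions whose status_code is retryable.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Non-retryable errors, and the error from the final attempt, are re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

To use it, wrap the request in a zero-argument function, such as a lambda around the `completion(...)` call, and pass that to `call_with_retry`.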
Error 429: Personal entities unsupported
This error occurs when the user is on a personal account, which doesn't have access to W&B Inference. Switch to a non-personal account; if one isn't available, create a Team to get one.
Once done, add the openai-project header to your request as shown below:
response = completion(
    model="...",
    extra_headers={"openai-project": "team_name/project_name"},
    # ... remaining arguments, such as messages
)
For more information, see Personal entities unsupported.
You can find more ways of using custom headers with LiteLLM at https://docs.litellm.ai/docs/proxy/request_headers.