Baseten Chat Completions
`baseten_chat_completions` - Create a chat completion using an OpenAI-compatible API.

**Supported Models:**

- `deepseek-ai/DeepSeek-V3-0324` - DeepSeek V3 0324 (164k context) 🧠
- `deepseek-ai/DeepSeek-V3.1` - DeepSeek V3.1 (164k context) 🧠
- `zai-org/GLM-4.6` - GLM 4.6 (200k context) 🧠
- `zai-org/GLM-4.7` - GLM 4.7 (200k context) 🧠
- `moonshotai/Kimi-K2-Instruct-0905` - Kimi K2 0905 (128k context)
- `moonshotai/Kimi-K2-Thinking` - Kimi K2 Thinking (262k context) 🧠 always-on
- `moonshotai/Kimi-K2.5` - Kimi K2.5 (262k context)
- `openai/gpt-oss-120b` - OpenAI GPT OSS 120B (128k context)

🧠 = Reasoning model. Use the `reasoning_effort` parameter (low/medium/high) to control thinking depth. Responses include a `reasoning_content` field with the chain-of-thought. Supports streaming, tool calling, and structured outputs.
When to Use
Use this tool when you need an OpenAI-compatible chat completion from one of the supported models listed above. For reasoning models (🧠), set the `reasoning_effort` parameter (low/medium/high) to control thinking depth; the response includes a `reasoning_content` field with the chain-of-thought. The tool supports streaming, tool calling, and structured outputs, and is part of the Baseten Model APIs provider on xpay✦.
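For reasoning models, the answer and the chain-of-thought arrive in the same message. Assuming the response follows the standard OpenAI chat-completion shape (`choices[0].message`) plus the `reasoning_content` field described above, splitting the two might look like this sketch:

```python
# Sketch: separate the final answer from the chain-of-thought.
# The choices/message layout follows the OpenAI chat-completions format;
# `reasoning_content` is the extra field this API adds for 🧠 models.
def split_completion(response):
    message = response["choices"][0]["message"]
    # Non-reasoning models omit reasoning_content, so use .get()
    return message["content"], message.get("reasoning_content")

# Hypothetical response payload, for illustration only:
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "4",
            "reasoning_content": "2 + 2 = 4",
        }
    }]
}

answer, thinking = split_completion(sample)
# answer == "4", thinking == "2 + 2 = 4"
```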
MCP Connection
Connect to xpay✦ to access this tool:
```json
{
  "mcpServers": {
    "xpay": {
      "url": "https://mcp.xpay.sh/mcp?key=YOUR_API_KEY"
    }
  }
}
```
For Claude Code:

```shell
claude mcp add --transport http xpay "https://mcp.xpay.sh/mcp?key=YOUR_API_KEY"
```
How to Execute
Use the xpay✦ meta-tools to run this tool:
- `xpay_details` - Get the full input schema: `xpay_details("baseten/baseten_chat_completions")`
- `xpay_run` - Execute: `xpay_run("baseten/baseten_chat_completions", { ...inputs })`
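As a sketch, the `inputs` object passed to `xpay_run` might be built like this. Only `model` and `messages` are required; whether `messages` must be a JSON-encoded string (as its `string` type in the parameter table suggests) or a native list is an assumption here - confirm with `xpay_details` before relying on it.

```python
import json

# Hypothetical inputs for:
#   xpay_run("baseten/baseten_chat_completions", inputs)
# `messages` is JSON-encoded because the parameter table types it as a
# string; verify the real schema via xpay_details.
inputs = {
    "model": "deepseek-ai/DeepSeek-V3.1",
    "messages": json.dumps([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize MCP in one sentence."},
    ]),
    "reasoning_effort": "low",  # honored by 🧠 models only
    "max_tokens": 512,
    "temperature": 0.7,
}
```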
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| top_logprobs | number | No | Top logprobs to return (0-20) |
| reasoning_effort | string | No | Reasoning depth for supported models (low/medium/high). Default: medium. Supported on: DeepSeek V3.1, DeepSeek V3 0324, GLM 4.7, GLM 4.6, Kimi K2 Thinking |
| logit_bias | object | No | Token ID to bias map (-100 to 100) |
| seed | number | No | Random seed |
| bad | string | No | Words to avoid |
| skip_special_tokens | boolean | No | Remove special tokens |
| documents | string | No | Documents for RAG |
| presence_penalty | number | No | Penalize tokens by presence |
| echo | boolean | No | Prepend last message to output |
| top_p_min | number | No | Min dynamic top_p |
| early_stopping | boolean | No | Stop when n candidates found |
| tools | string | No | Functions the model can call |
| logprobs | boolean | No | Return log probabilities |
| top_p | number | No | Nucleus sampling (0-1) |
| frequency_penalty | number | No | Penalize tokens by frequency (default: 0) |
| response_format | object | No | Response format type |
| truncate_prompt_tokens | number | No | Truncate prompt to N tokens |
| best_of | number | No | Candidates to generate (only 1 supported) |
| stream | boolean | No | Stream responses |
| top_k | number | No | Top-K sampling |
| disaggregated_params | object | No | Advanced distributed inference params |
| temperature | number | No | Sampling temperature (0-4) |
| tool_choice | string | No | Tool calling mode |
| model | string | Yes | Model slug (e.g., deepseek-ai/DeepSeek-V3.1) |
| ignore_eos | boolean | No | Continue past EOS token |
| chat_template | string | No | Custom Jinja template |
| max_tokens | number | No | Max tokens (default: 4096) |
| add_generation_prompt | boolean | No | Add generation prompt from template |
| n | number | No | Number of completions (only 1 supported) |
| min_tokens | number | No | Minimum tokens before stopping |
| min_p | number | No | Min probability threshold |
| spaces_between_special_tokens | boolean | No | Add spaces between special tokens |
| chat_template_args | object | No | Chat template arguments |
| stop | string | No | Stop sequences |
| parallel_tool_calls | boolean | No | Allow parallel tool calls |
| include_stop_str_in_output | boolean | No | Include stop string in output |
| messages | string | Yes | Conversation messages with role and content |
| bad_token_ids | string | No | Token IDs to avoid |
| stream_options | object | No | Stream options |
| user | string | No | End-user identifier |
| repetition_penalty | number | No | Repetition penalty |
| length_penalty | number | No | Length penalty for beam search |
| stop_token_ids | string | No | Token IDs that stop generation |
| add_special_tokens | boolean | No | Add special tokens like BOS |
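Several structured parameters above (`tools`, `stop`, `documents`, `bad_token_ids`) are typed as strings, which suggests JSON-encoded values. A hedged sketch of a tool-calling request under that assumption - the `get_weather` function is purely hypothetical:

```python
import json

# Sketch of a tool-calling request body. `tools` is JSON-encoded because
# the parameter table types it as a string; check xpay_details for the
# authoritative schema. The function definition follows the OpenAI
# function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

inputs = {
    "model": "zai-org/GLM-4.6",
    "messages": json.dumps([
        {"role": "user", "content": "What's the weather in Paris?"}
    ]),
    "tools": json.dumps(tools),
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
```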
Pricing
- Cost: $0.01/call (flat rate)
- Balance check: use `xpay_balance` to check remaining credits
- Get your API key at xpay.tools ($5 free credits included)
Related Skills
- Baseten Model APIs (all tools) - 1 tool
Links
- Tool page: https://xpay.tools/baseten/baseten-chat-completions/
- Provider: https://xpay.tools/baseten/
- All tools: https://xpay.tools/explore

