Baseten Chat Completions
`baseten_chat_completions` - Create a chat completion using an OpenAI-compatible API.

**Supported Models:**

- `deepseek-ai/DeepSeek-V3-0324` - DeepSeek V3 0324 (164k context) 🧠
- `deepseek-ai/DeepSeek-V3.1` - DeepSeek V3.1 (164k context) 🧠
- `zai-org/GLM-4.6` - GLM 4.6 (200k context) 🧠
- `zai-org/GLM-4.7` - GLM 4.7 (200k context) 🧠
- `moonshotai/Kimi-K2-Instruct-0905` - Kimi K2 0905 (128k context)
- `moonshotai/Kimi-K2-Thinking` - Kimi K2 Thinking (262k context) 🧠 always-on
- `moonshotai/Kimi-K2.5` - Kimi K2.5 (262k context)
- `openai/gpt-oss-120b` - OpenAI GPT OSS 120B (128k context)

🧠 = Reasoning model. Use the `reasoning_effort` parameter (low/medium/high) to control thinking depth. Responses include a `reasoning_content` field with the chain-of-thought. Supports streaming, tool calling, and structured outputs.
When to Use
Use this tool when you need an OpenAI-compatible chat completion from one of the supported models listed above. For reasoning models (🧠), set the `reasoning_effort` parameter (low/medium/high) to control thinking depth; the response includes a `reasoning_content` field with the chain-of-thought. The tool supports streaming, tool calling, and structured outputs, and is part of the Baseten Model APIs provider on xpay✦.
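For reasoning models, the answer and the chain-of-thought arrive in the same message. Assuming the response follows the standard OpenAI chat-completion shape (`choices[0].message`) plus the `reasoning_content` field described above, splitting the two might look like this sketch:

```python
# Sketch: separate the final answer from the chain-of-thought.
# The choices/message layout follows the OpenAI chat-completions format;
# `reasoning_content` is the extra field this API adds for 🧠 models.
def split_completion(response):
    message = response["choices"][0]["message"]
    # Non-reasoning models omit reasoning_content, so use .get()
    return message["content"], message.get("reasoning_content")

# Hypothetical response payload, for illustration only:
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "4",
            "reasoning_content": "2 + 2 = 4",
        }
    }]
}

answer, thinking = split_completion(sample)
# answer == "4", thinking == "2 + 2 = 4"
```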
MCP Connection
Connect to xpay✦ to access this tool:
```json
{
  "mcpServers": {
    "xpay": {
      "url": "https://mcp.xpay.sh/mcp?key=YOUR_API_KEY"
    }
  }
}
```
For Claude Code:

```shell
claude mcp add --transport http xpay "https://mcp.xpay.sh/mcp?key=YOUR_API_KEY"
```
How to Execute
Use the xpay✦ meta-tools to run this tool:
- `xpay_details` - Get the full input schema: `xpay_details("baseten/baseten_chat_completions")`
- `xpay_run` - Execute: `xpay_run("baseten/baseten_chat_completions", { ...inputs })`
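As a sketch, the `inputs` object passed to `xpay_run` might be built like this. Only `model` and `messages` are required; whether `messages` must be a JSON-encoded string (as its `string` type in the parameter table suggests) or a native list is an assumption here - confirm with `xpay_details` before relying on it.

```python
import json

# Hypothetical inputs for:
#   xpay_run("baseten/baseten_chat_completions", inputs)
# `messages` is JSON-encoded because the parameter table types it as a
# string; verify the real schema via xpay_details.
inputs = {
    "model": "deepseek-ai/DeepSeek-V3.1",
    "messages": json.dumps([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize MCP in one sentence."},
    ]),
    "reasoning_effort": "low",  # honored by 🧠 models only
    "max_tokens": 512,
    "temperature": 0.7,
}
```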
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| top_logprobs | number | No | Top logprobs to return (0-20) |
| reasoning_effort | string | No | Reasoning depth for supported models (low/medium/high). Default: medium. Supported on: DeepSeek V3.1, DeepSeek V3 0324, GLM 4.7, GLM 4.6, Kimi K2 Thinking |
| logit_bias | object | No | Token ID to bias map (-100 to 100) |
| seed | number | No | Random seed |
| bad | string | No | Words to avoid |
| skip_special_tokens | boolean | No | Remove special tokens |
| documents | string | No | Documents for RAG |
| presence_penalty | number | No | Penalize tokens by presence |
| echo | boolean | No | Prepend last message to output |
| top_p_min | number | No | Min dynamic top_p |
| early_stopping | boolean | No | Stop when n candidates found |
| tools | string | No | Functions the model can call |
| logprobs | boolean | No | Return log probabilities |
| top_p | number | No | Nucleus sampling (0-1) |
| frequency_penalty | number | No | Penalize tokens by frequency (default: 0) |
| response_format | object | No | Response format type |
| truncate_prompt_tokens | number | No | Truncate prompt to N tokens |
| best_of | number | No | Candidates to generate (only 1 supported) |
| stream | boolean | No | Stream responses |
| top_k | number | No | Top-K sampling |
| disaggregated_params | object | No | Advanced distributed inference params |
| temperature | number | No | Sampling temperature (0-4) |
| tool_choice | string | No | Tool calling mode |
| model | string | Yes | Model slug (e.g., deepseek-ai/DeepSeek-V3.1) |
| ignore_eos | boolean | No | Continue past EOS token |
| chat_template | string | No | Custom Jinja template |
| max_tokens | number | No | Max tokens (default: 4096) |
| add_generation_prompt | boolean | No | Add generation prompt from template |
| n | number | No | Number of completions (only 1 supported) |
| min_tokens | number | No | Minimum tokens before stopping |
| min_p | number | No | Min probability threshold |
| spaces_between_special_tokens | boolean | No | Add spaces between special tokens |
| chat_template_args | object | No | Chat template arguments |
| stop | string | No | Stop sequences |
| parallel_tool_calls | boolean | No | Allow parallel tool calls |
| include_stop_str_in_output | boolean | No | Include stop string in output |
| messages | string | Yes | Conversation messages with role and content |
| bad_token_ids | string | No | Token IDs to avoid |
| stream_options | object | No | Stream options |
| user | string | No | End-user identifier |
| repetition_penalty | number | No | Repetition penalty |
| length_penalty | number | No | Length penalty for beam search |
| stop_token_ids | string | No | Token IDs that stop generation |
| add_special_tokens | boolean | No | Add special tokens like BOS |
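Several structured parameters above (`tools`, `stop`, `documents`, `bad_token_ids`) are typed as strings, which suggests JSON-encoded values. A hedged sketch of a tool-calling request under that assumption - the `get_weather` function is purely hypothetical:

```python
import json

# Sketch of a tool-calling request body. `tools` is JSON-encoded because
# the parameter table types it as a string; check xpay_details for the
# authoritative schema. The function definition follows the OpenAI
# function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

inputs = {
    "model": "zai-org/GLM-4.6",
    "messages": json.dumps([
        {"role": "user", "content": "What's the weather in Paris?"}
    ]),
    "tools": json.dumps(tools),
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
```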
Pricing
- Cost: $0.01/call (flat rate)
- Balance check: use `xpay_balance` to check remaining credits
- Get your API key at xpay.tools ($5 free credits included)
Related Skills
- Baseten Model APIs (all tools) - 1 tool
Links
- Tool page: https://xpay.tools/baseten/baseten-chat-completions/
- Provider: https://xpay.tools/baseten/
- All tools: https://xpay.tools/explore

