Zai Chat Completion
zai_chat_completionCreate a chat completion model that generates AI replies for given conversation messages. It supports multimodal inputs (text, images, audio, video, file), offers configurable parameters (like temperature, max tokens, tool use), and supports both streaming and non-streaming output modes.
When to Use
Use this tool when you need to create a chat completion model that generates ai replies for given conversation messages. it supports multimodal inputs (text, images, audio, video, file), offers configurable parameters (like temperature, max tokens, tool use), and supports both streaming and non-streaming output modes.. This is part of the Z.ai API provider on xpay✦.
MCP Connection
Connect to xpay✦ to access this tool (and 9+ others):
{
"mcpServers": {
"xpay": {
"url": "https://mcp.xpay.sh/mcp?key=YOUR_API_KEY"
}
}
}
For Claude Code:
claude mcp add --transport http xpay "https://mcp.xpay.sh/mcp?key=YOUR_API_KEY"
How to Execute
Use the xpay✦ meta-tools to run this tool:
xpay_details— Get full input schema:xpay_details("zai/zai_chat_completion")xpay_run— Execute:xpay_run("zai/zai_chat_completion", { ...inputs })
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
max_tokens | integer | No | The maximum number of tokens for model output, the GLM-4.6 series supports 128K maximum output, the GLM-4.5 series supports 96K maximum output, the GLM-4.5v series supports 16K maximum output, GLM-4-32B-0414-128K supports 16K maximum output. |
do_sample | boolean | No | When do_sample is true, sampling strategy is enabled; when do_sample is false, sampling strategy parameters such as temperature and top_p will not take effect. |
thinking | object | No | Only supported by GLM-4.5 series and higher models. This parameter is used to control whether the model enable the chain of thought. |
tools | array | No | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported. |
tool_stream | boolean | No | Whether to enable streaming response for Function Calls. Default value is false. Only supported by GLM-4.6. Refer the Stream Tool Call |
top_p | number | No | Another method of temperature sampling, value range is: [0.01, 1.0]. The GLM-4.6, GLM-4.5 series default value is 0.95, GLM-4-32B-0414-128K default value is 0.9. |
response_format | object | No | Specifies the response format of the model. Defaults to text. Supports two formats:{ "type": "text" } plain text mode, returns natural language text, { "type": "json_object" } JSON mode, returns valid JSON data. When using JSON mode, it’s recommended to clearly request JSON output in the prompt. |
stop | array | No | Stop word list. Generation stops when the model encounters any specified string. Currently, only one stop word is supported, in the format ["stop_word1"]. |
stream | boolean | No | This parameter should be set to false or omitted when using synchronous call. It indicates that the model returns all content at once after generating all content. Default value is false. If set to true, the model will return the generated content in chunks via standard Event Stream. When the Event Stream ends, a data: [DONE] message will be returned. |
user_id | string | No | Unique ID for the end user, 6–128 characters. Avoid using sensitive information. |
temperature | number | No | Sampling temperature, controls the randomness of the output, must be a positive number within the range: [0.0, 1.0]. The GLM-4.6 series default value is 1.0, GLM-4.5 series default value is 0.6, GLM-4-32B-0414-128K default value is 0.75. |
messages | array | Yes | The current conversation message list as the model’s prompt input, provided in JSON array format, e.g.,{“role”: “user”, “content”: “Hello”}. Possible message types include system messages, user messages, assistant messages, and tool messages. Note: The input must not consist of system messages or assistant messages only. |
tool_choice | string | No | Controls how the model selects a tool. Used to control how the model selects which function to call. This is only applicable when the tool type is function. The default value is auto, and only auto is supported. |
model | string | Yes | The model code to be called. GLM-4.6 are the latest flagship model series, foundational models specifically designed for agent applications. |
request_id | string | No | Passed by the user side, needs to be unique; used to distinguish each request. If not provided by the user side, the platform will generate one by default. |
Pricing
- Cost: $0.02/call
- Balance check: Use
xpay_balanceto check remaining credits - Get your API key at xpay.tools — $5 free credits included
Related Skills
- Z.ai API (all tools) — 10 tools
- Zai Generate Image — $0.02/call
- Zai Web Reader — $0.02/call
- Zai File Upload — $0.02/call
- Zai Retrieve Result — $0.01/call
- Zai Conversation History — $0.02/call
Links
- Tool page: https://xpay.tools/zai/zai-chat-completion/
- Provider: https://xpay.tools/zai/
- All tools: https://xpay.tools/explore
How to Execute
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
max_tokens | integer | No | The maximum number of tokens for model output, the GLM-4.6 series supports 128K maximum output, the GLM-4.5 series supports 96K maximum output, the GLM-4.5v series supports 16K maximum output, GLM-4-32B-0414-128K supports 16K maximum output. |
do_sample | boolean | No | When do_sample is true, sampling strategy is enabled; when do_sample is false, sampling strategy parameters such as temperature and top_p will not take effect. |
thinking | object | No | Only supported by GLM-4.5 series and higher models. This parameter is used to control whether the model enable the chain of thought. |
tools | array | No | A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported. |
tool_stream | boolean | No | Whether to enable streaming response for Function Calls. Default value is false. Only supported by GLM-4.6. Refer the [Stream Tool Call](/guides/tools/stream-tool) |
top_p | number | No | Another method of temperature sampling, value range is: `[0.01, 1.0]`. The GLM-4.6, GLM-4.5 series default value is `0.95`, GLM-4-32B-0414-128K default value is `0.9`. |
response_format | object | No | Specifies the response format of the model. Defaults to text. Supports two formats:{ "type": "text" } plain text mode, returns natural language text, { "type": "json_object" } JSON mode, returns valid JSON data. When using JSON mode, it’s recommended to clearly request JSON output in the prompt. |
stop | array | No | Stop word list. Generation stops when the model encounters any specified string. Currently, only one stop word is supported, in the format ["stop_word1"]. |
stream | boolean | No | This parameter should be set to false or omitted when using synchronous call. It indicates that the model returns all content at once after generating all content. Default value is false. If set to true, the model will return the generated content in chunks via standard Event Stream. When the Event Stream ends, a `data: [DONE]` message will be returned. |
user_id | string | No | Unique ID for the end user, 6–128 characters. Avoid using sensitive information. |
temperature | number | No | Sampling temperature, controls the randomness of the output, must be a positive number within the range: `[0.0, 1.0]`. The GLM-4.6 series default value is `1.0`, GLM-4.5 series default value is `0.6`, GLM-4-32B-0414-128K default value is `0.75`. |
messages | array | Yes | The current conversation message list as the model’s prompt input, provided in JSON array format, e.g.,`{“role”: “user”, “content”: “Hello”}`. Possible message types include system messages, user messages, assistant messages, and tool messages. Note: The input must not consist of system messages or assistant messages only. |
tool_choice | string | No | Controls how the model selects a tool. Used to control how the model selects which function to call. This is only applicable when the tool type is function. The default value is auto, and only auto is supported. |
model | string | Yes | The model code to be called. GLM-4.6 are the latest flagship model series, foundational models specifically designed for agent applications. |
request_id | string | No | Passed by the user side, needs to be unique; used to distinguish each request. If not provided by the user side, the platform will generate one by default. |
Install Skill
Pricing
Cost
$0.02/call
Model
Flat rate
Provider
Z.ai API

