Zai Chat Completion SKILL.md — Install for Claude Code, Cursor & Cline | Z.ai API

Skill

Zai Chat Completion

zai_chat_completion

Create a chat completion model that generates AI replies for given conversation messages. It supports multimodal inputs (text, images, audio, video, file), offers configurable parameters (like temperature, max tokens, tool use), and supports both streaming and non-streaming output modes.

$0.02/call

Flat rate

Z.ai API

Raw SKILL.md Run This Tool

When to Use

Use this tool when you need to create a chat completion model that generates ai replies for given conversation messages. it supports multimodal inputs (text, images, audio, video, file), offers configurable parameters (like temperature, max tokens, tool use), and supports both streaming and non-streaming output modes.. This is part of the Z.ai API provider on xpay✦.

MCP Connection

Connect to xpay✦ to access this tool (and 9+ others):

{
  "mcpServers": {
    "xpay": {
      "url": "https://mcp.xpay.sh/mcp?key=YOUR_API_KEY"
    }
  }
}

For Claude Code:

claude mcp add --transport http xpay "https://mcp.xpay.sh/mcp?key=YOUR_API_KEY"

How to Execute

Use the xpay✦ meta-tools to run this tool:

xpay_details — Get full input schema: xpay_details("zai/zai_chat_completion")
xpay_run — Execute: xpay_run("zai/zai_chat_completion", { ...inputs })

Input Parameters

Parameter	Type	Required	Description
`max_tokens`	integer	No	The maximum number of tokens for model output, the GLM-4.6 series supports 128K maximum output, the GLM-4.5 series supports 96K maximum output, the GLM-4.5v series supports 16K maximum output, GLM-4-32B-0414-128K supports 16K maximum output.
`do_sample`	boolean	No	When do_sample is true, sampling strategy is enabled; when do_sample is false, sampling strategy parameters such as temperature and top_p will not take effect.
`thinking`	object	No	Only supported by GLM-4.5 series and higher models. This parameter is used to control whether the model enable the chain of thought.
`tools`	array	No	A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
`tool_stream`	boolean	No	Whether to enable streaming response for Function Calls. Default value is false. Only supported by GLM-4.6. Refer the Stream Tool Call
`top_p`	number	No	Another method of temperature sampling, value range is: `[0.01, 1.0]`. The GLM-4.6, GLM-4.5 series default value is `0.95`, GLM-4-32B-0414-128K default value is `0.9`.
`response_format`	object	No	Specifies the response format of the model. Defaults to text. Supports two formats:{ "type": "text" } plain text mode, returns natural language text, { "type": "json_object" } JSON mode, returns valid JSON data. When using JSON mode, it’s recommended to clearly request JSON output in the prompt.
`stop`	array	No	Stop word list. Generation stops when the model encounters any specified string. Currently, only one stop word is supported, in the format ["stop_word1"].
`stream`	boolean	No	This parameter should be set to false or omitted when using synchronous call. It indicates that the model returns all content at once after generating all content. Default value is false. If set to true, the model will return the generated content in chunks via standard Event Stream. When the Event Stream ends, a `data: [DONE]` message will be returned.
`user_id`	string	No	Unique ID for the end user, 6–128 characters. Avoid using sensitive information.
`temperature`	number	No	Sampling temperature, controls the randomness of the output, must be a positive number within the range: `[0.0, 1.0]`. The GLM-4.6 series default value is `1.0`, GLM-4.5 series default value is `0.6`, GLM-4-32B-0414-128K default value is `0.75`.
`messages`	array	Yes	The current conversation message list as the model’s prompt input, provided in JSON array format, e.g.,`{“role”: “user”, “content”: “Hello”}`. Possible message types include system messages, user messages, assistant messages, and tool messages. Note: The input must not consist of system messages or assistant messages only.
`tool_choice`	string	No	Controls how the model selects a tool. Used to control how the model selects which function to call. This is only applicable when the tool type is function. The default value is auto, and only auto is supported.
`model`	string	Yes	The model code to be called. GLM-4.6 are the latest flagship model series, foundational models specifically designed for agent applications.
`request_id`	string	No	Passed by the user side, needs to be unique; used to distinguish each request. If not provided by the user side, the platform will generate one by default.

Pricing

Cost: $0.02/call
Balance check: Use xpay_balance to check remaining credits
Get your API key at xpay.tools — $5 free credits included

Related Skills

Z.ai API (all tools) — 10 tools
Zai Generate Image — $0.02/call
Zai Web Reader — $0.02/call
Zai File Upload — $0.02/call
Zai Retrieve Result — $0.01/call
Zai Conversation History — $0.02/call

Links

Tool page: https://xpay.tools/zai/zai-chat-completion/
Provider: https://xpay.tools/zai/
All tools: https://xpay.tools/explore

How to Execute

// 1. Get full schema

xpay_details("zai/zai_chat_completion")

// 2. Execute

xpay_run("zai/zai_chat_completion", { ...inputs })

Input Parameters

Parameter	Type	Required	Description
max_tokens	integer	No	The maximum number of tokens for model output, the GLM-4.6 series supports 128K maximum output, the GLM-4.5 series supports 96K maximum output, the GLM-4.5v series supports 16K maximum output, GLM-4-32B-0414-128K supports 16K maximum output.
do_sample	boolean	No	When do_sample is true, sampling strategy is enabled; when do_sample is false, sampling strategy parameters such as temperature and top_p will not take effect.
thinking	object	No	Only supported by GLM-4.5 series and higher models. This parameter is used to control whether the model enable the chain of thought.
tools	array	No	A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
tool_stream	boolean	No	Whether to enable streaming response for Function Calls. Default value is false. Only supported by GLM-4.6. Refer the [Stream Tool Call](/guides/tools/stream-tool)
top_p	number	No	Another method of temperature sampling, value range is: `[0.01, 1.0]`. The GLM-4.6, GLM-4.5 series default value is `0.95`, GLM-4-32B-0414-128K default value is `0.9`.
response_format	object	No	Specifies the response format of the model. Defaults to text. Supports two formats:{ "type": "text" } plain text mode, returns natural language text, { "type": "json_object" } JSON mode, returns valid JSON data. When using JSON mode, it’s recommended to clearly request JSON output in the prompt.
stop	array	No	Stop word list. Generation stops when the model encounters any specified string. Currently, only one stop word is supported, in the format ["stop_word1"].
stream	boolean	No	This parameter should be set to false or omitted when using synchronous call. It indicates that the model returns all content at once after generating all content. Default value is false. If set to true, the model will return the generated content in chunks via standard Event Stream. When the Event Stream ends, a `data: [DONE]` message will be returned.
user_id	string	No	Unique ID for the end user, 6–128 characters. Avoid using sensitive information.
temperature	number	No	Sampling temperature, controls the randomness of the output, must be a positive number within the range: `[0.0, 1.0]`. The GLM-4.6 series default value is `1.0`, GLM-4.5 series default value is `0.6`, GLM-4-32B-0414-128K default value is `0.75`.
messages	array	Yes	The current conversation message list as the model’s prompt input, provided in JSON array format, e.g.,`{“role”: “user”, “content”: “Hello”}`. Possible message types include system messages, user messages, assistant messages, and tool messages. Note: The input must not consist of system messages or assistant messages only.
tool_choice	string	No	Controls how the model selects a tool. Used to control how the model selects which function to call. This is only applicable when the tool type is function. The default value is auto, and only auto is supported.
model	string	Yes	The model code to be called. GLM-4.6 are the latest flagship model series, foundational models specifically designed for agent applications.
request_id	string	No	Passed by the user side, needs to be unique; used to distinguish each request. If not provided by the user side, the platform will generate one by default.

Other Z.ai API Skills

Zai Generate Image — $0.02 Zai Web Reader — $0.02 Zai File Upload — $0.02 Zai Retrieve Result — $0.01 Zai Conversation History — $0.02 Zai Retrieve Result Post — $0.02 Zai Agent Chat — $0.02 Zai Generate Videoasync — $0.02

Install Skill

Claude Code

claude /install-skill https://xpay.tools/skills/zai/zai-chat-completion/SKILL.md

CLI

npx @xpaysh/cli install zai/zai-chat-completion

Manual

curl -o SKILL.md https://xpay.tools/skills/zai/zai-chat-completion/SKILL.md

Pricing

Cost

$0.02/call

Model

Flat rate

Provider

Z.ai API

When to Use

Input Parameters

Parameter

Type

Required

Description

max_tokens

integer

The maximum number of tokens for model output, the GLM-4.6 series supports 128K maximum output, the GLM-4.5 series supports 96K maximum output, the GLM-4.5v series supports 16K maximum output, GLM-4-32B-0414-128K supports 16K maximum output.

do_sample

boolean

When do_sample is true, sampling strategy is enabled; when do_sample is false, sampling strategy parameters such as temperature and top_p will not take effect.

thinking

object

Only supported by GLM-4.5 series and higher models. This parameter is used to control whether the model enable the chain of thought.

tools

array

A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.

tool_stream

boolean

Whether to enable streaming response for Function Calls. Default value is false. Only supported by GLM-4.6. Refer the Stream Tool Call

top_p

number

Another method of temperature sampling, value range is: [0.01, 1.0]. The GLM-4.6, GLM-4.5 series default value is 0.95, GLM-4-32B-0414-128K default value is 0.9.

response_format

object

Specifies the response format of the model. Defaults to text. Supports two formats:{ "type": "text" } plain text mode, returns natural language text, { "type": "json_object" } JSON mode, returns valid JSON data. When using JSON mode, it’s recommended to clearly request JSON output in the prompt.

stop

array

Stop word list. Generation stops when the model encounters any specified string. Currently, only one stop word is supported, in the format ["stop_word1"].

stream

boolean

This parameter should be set to false or omitted when using synchronous call. It indicates that the model returns all content at once after generating all content. Default value is false. If set to true, the model will return the generated content in chunks via standard Event Stream. When the Event Stream ends, a data: [DONE] message will be returned.

user_id

string

Unique ID for the end user, 6–128 characters. Avoid using sensitive information.

temperature

number

Sampling temperature, controls the randomness of the output, must be a positive number within the range: [0.0, 1.0]. The GLM-4.6 series default value is 1.0, GLM-4.5 series default value is 0.6, GLM-4-32B-0414-128K default value is 0.75.

messages

array

Yes

The current conversation message list as the model’s prompt input, provided in JSON array format, e.g.,{“role”: “user”, “content”: “Hello”}. Possible message types include system messages, user messages, assistant messages, and tool messages. Note: The input must not consist of system messages or assistant messages only.

tool_choice

string

Controls how the model selects a tool. Used to control how the model selects which function to call. This is only applicable when the tool type is function. The default value is auto, and only auto is supported.

model

string

Yes

The model code to be called. GLM-4.6 are the latest flagship model series, foundational models specifically designed for agent applications.

request_id

string

Passed by the user side, needs to be unique; used to distinguish each request. If not provided by the user side, the platform will generate one by default.

Parameter

Type

Required

Description

max_tokens

integer

do_sample

boolean

When do_sample is true, sampling strategy is enabled; when do_sample is false, sampling strategy parameters such as temperature and top_p will not take effect.

thinking

object

Only supported by GLM-4.5 series and higher models. This parameter is used to control whether the model enable the chain of thought.

tools

array

tool_stream

boolean

Whether to enable streaming response for Function Calls. Default value is false. Only supported by GLM-4.6. Refer the [Stream Tool Call](/guides/tools/stream-tool)

top_p

number

Another method of temperature sampling, value range is: `[0.01, 1.0]`. The GLM-4.6, GLM-4.5 series default value is `0.95`, GLM-4-32B-0414-128K default value is `0.9`.

response_format

object

stop

array

Stop word list. Generation stops when the model encounters any specified string. Currently, only one stop word is supported, in the format ["stop_word1"].

stream

boolean

user_id

string

Unique ID for the end user, 6–128 characters. Avoid using sensitive information.

temperature

number

Sampling temperature, controls the randomness of the output, must be a positive number within the range: `[0.0, 1.0]`. The GLM-4.6 series default value is `1.0`, GLM-4.5 series default value is `0.6`, GLM-4-32B-0414-128K default value is `0.75`.

messages

array

Yes

The current conversation message list as the model’s prompt input, provided in JSON array format, e.g.,`{“role”: “user”, “content”: “Hello”}`. Possible message types include system messages, user messages, assistant messages, and tool messages. Note: The input must not consist of system messages or assistant messages only.

tool_choice

string

model

string

Yes

The model code to be called. GLM-4.6 are the latest flagship model series, foundational models specifically designed for agent applications.

request_id

string

Passed by the user side, needs to be unique; used to distinguish each request. If not provided by the user side, the platform will generate one by default.