Tool

Perplexity Chat Completions

Name: Perplexity Chat Completions
Brand: Perplexity API
SKU: perplexity/perplexity-chat-completions
Price: 0.02 USD
Availability: InStock

perplexity_chat_completions

Generates a model’s response for the given chat conversation.

Pricing

Per call: $0.02
Pricing model: flat

Pay only for what you use. No subscriptions.

Inputs

Parameter	Type
reasoning_effort	string
return_images	boolean
search_mode	string
presence_penalty	number
search_after_date_filter	string
disable_search	boolean
web_search_options	object
top_p	number
last_updated_before_filter	string
frequency_penalty	number
response_format	object
stream	boolean
top_k	number
temperature	number
model *	string
search_domain_filter	string
last_updated_after_filter	string
return_related_questions	boolean
search_before_date_filter	string
max_tokens	integer
enable_search_classifier	boolean
language_preference	string
messages *	array
search_recency_filter	string
media_response	object

* marks a required input.

Input Parameters

reasoning_effort

Perplexity-Specific: Controls how much computational effort the AI dedicates to each query for deep research models. 'low' provides faster, simpler answers with reduced token usage, 'medium' offers a balanced approach, and 'high' delivers deeper, more thorough responses with increased token usage. This parameter directly impacts the amount of reasoning tokens consumed. WARNING: This parameter is ONLY applicable for sonar-deep-research. Defaults to 'medium' when used with sonar-deep-research.

return_images

search_mode

Controls the search mode: 'academic' prioritizes scholarly sources, 'sec' prioritizes SEC filings, and 'web' uses general web search.

presence_penalty

OpenAI Compatible: Positive values increase the likelihood of discussing new topics. Applies a penalty to tokens that have already appeared in the text, encouraging the model to talk about new concepts. Values typically range from 0 (no penalty) to 2.0 (strong penalty). Higher values reduce repetition but may lead to more off-topic text.

search_after_date_filter

Perplexity-Specific: Filters search results to only include content published after this date. Format should be %m/%d/%Y (e.g. 3/1/2025)

disable_search

web_search_options

Perplexity-Specific: Configuration for using web search in model responses.

top_p

OpenAI Compatible: The nucleus sampling threshold, valued between 0 and 1. Controls the diversity of generated text by sampling only from the smallest set of most likely tokens whose cumulative probability reaches the top_p value. Lower values (e.g., 0.5) make the output more focused and deterministic, while higher values (e.g., 0.95) allow for more diverse outputs. Often used as an alternative to temperature.

last_updated_before_filter

Perplexity-Specific: Filters search results to only include content last updated before this date. Format should be %m/%d/%Y (e.g. 3/1/2025)

frequency_penalty

OpenAI Compatible: Decreases likelihood of repetition based on prior frequency. Applies a penalty to tokens based on how frequently they've appeared in the text so far. Values typically range from 0 (no penalty) to 2.0 (strong penalty). Higher values (e.g., 1.5) reduce repetition of the same words and phrases. Useful for preventing the model from getting stuck in loops.

response_format

Enables structured JSON output formatting.

stream

top_k

OpenAI Compatible: The number of tokens to keep for top-k filtering. Limits the model to consider only the k most likely next tokens at each step. Lower values (e.g., 20) make the output more focused and deterministic, while higher values allow for more diverse outputs. A value of 0 disables this filter. Often used in conjunction with top_p to control output randomness.

temperature

The amount of randomness in the response, valued between 0 and 2. Lower values (e.g., 0.1) make the output more focused, deterministic, and less creative. Higher values (e.g., 1.5) make the output more random and creative. Use lower values for factual/information retrieval tasks and higher values for creative applications.

model

The name of the model that will complete your prompt. Choose from our available Sonar models: sonar (lightweight search), sonar-pro (advanced search), sonar-deep-research (exhaustive research), or sonar-reasoning-pro (premier reasoning).

search_domain_filter

A list of domains to limit search results to. Currently limited to 20 domains for allowlisting and denylisting. To denylist a domain, add a - at the beginning of the domain string.

last_updated_after_filter

Perplexity-Specific: Filters search results to only include content last updated after this date. Format should be %m/%d/%Y (e.g. 3/1/2025)

return_related_questions

search_before_date_filter

Perplexity-Specific: Filters search results to only include content published before this date. Format should be %m/%d/%Y (e.g. 3/1/2025)

max_tokens

OpenAI Compatible: The maximum number of completion tokens returned by the API. Controls the length of the model's response. If the response would exceed this limit, it will be truncated. Higher values allow for longer responses but may increase processing time and costs.

enable_search_classifier

language_preference

Perplexity-Specific: Specifies the preferred language of the response content (e.g., English, Korean, Spanish). This parameter is supported only by the sonar and sonar-pro models. Using it with other models is on a best-effort basis and may not produce consistent results.

messages

A list of messages comprising the conversation so far.

search_recency_filter

Perplexity-Specific: Filters search results based on time (e.g., 'week', 'day').

media_response

Perplexity-Specific: Configuration for controlling media content in responses, such as videos and images. Use the overrides property to enable specific media types.
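The date and domain filters above have formatting rules that are easy to get wrong, so here is a minimal Python sketch of how the values might be prepared. The helper names format_filter_date and build_domain_filter are hypothetical. The sketch assumes the unpadded date form shown in the examples (3/1/2025), treats the 20-domain limit as a cap on the whole list, and uses one mode (allow or deny) per call. The Inputs list types search_domain_filter as a string, so passing a Python list is also an assumption about how the tool accepts it.

```python
from datetime import date

MAX_DOMAINS = 20  # limit stated in the search_domain_filter description


def format_filter_date(d: date) -> str:
    """Render a date as month/day/year without zero padding, e.g. 3/1/2025."""
    # strftime("%m/%d/%Y") would zero-pad to 03/01/2025; the documented
    # example is unpadded, so this sketch builds the string by hand.
    return f"{d.month}/{d.day}/{d.year}"


def build_domain_filter(domains, deny=False):
    """Return a domain filter list; deny=True prefixes each domain with '-'."""
    prefix = "-" if deny else ""
    result = [f"{prefix}{domain}" for domain in domains]
    if len(result) > MAX_DOMAINS:
        raise ValueError(f"at most {MAX_DOMAINS} domains are allowed")
    return result


# Hypothetical arguments for the tool; the prompt text is illustrative only.
arguments = {
    "model": "sonar",
    "messages": [{"role": "user", "content": "Summarize recent SEC guidance."}],
    "search_after_date_filter": format_filter_date(date(2025, 3, 1)),
    "search_domain_filter": build_domain_filter(["reddit.com"], deny=True),
}

print(arguments["search_after_date_filter"])  # 3/1/2025
print(arguments["search_domain_filter"])      # ['-reddit.com']
```

Whether the API also accepts zero-padded dates is not stated here, so the sketch stays with the form the descriptions show.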

Cost per run

Execution cost: $0.02

Deducted from your xPay allowance

About Perplexity Chat Completions

Perplexity Chat Completions on xpay — Sonar models with built-in web search

Perplexity's /chat/completions endpoint runs the Sonar family of models: search-grounded LLMs that answer questions with live web context and inline citations. It's an OpenAI-compatible API surface (messages, temperature, max_tokens, stream) with one critical difference: by default, each response is grounded in a fresh web search and includes the source URLs the model used. The disable_search input suggests that search can be switched off per request.

xpay exposes perplexity_chat_completions as a single MCP-callable tool. You pass messages and a model; you get back the assistant message plus citations, billed per call.
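For readers who want to see what an MCP tool call looks like on the wire, the following Python sketch builds a JSON-RPC 2.0 tools/call request for this tool. An MCP client normally constructs this message for you. The helper name build_tool_call and the sample prompt are hypothetical, and xpay's server URL, transport, and authentication are not described here, so they are left out.

```python
import json


def build_tool_call(arguments: dict, request_id: int = 1) -> dict:
    """Wrap tool arguments in an MCP tools/call JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "perplexity_chat_completions",
            "arguments": arguments,
        },
    }


# Only the two required inputs (model and messages) are supplied here.
payload = build_tool_call({
    "model": "sonar",
    "messages": [{"role": "user", "content": "What changed in Python 3.13?"}],
})

print(json.dumps(payload, indent=2))
```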

Available Sonar models

Model	Use case	Notes
sonar	Cheap, fast, search-grounded answers	Best default for chat agents that need fresh web context
sonar-pro	Higher-quality answers, deeper search	Use when accuracy matters more than latency
sonar-reasoning	Chain-of-thought + web search	For multi-step questions, comparisons, analyses
sonar-reasoning-pro	Highest quality reasoning + search	Premium tier; use for research-grade outputs
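One way to encode the table above in an agent is a small selection helper. This Python sketch is a simplification: the function name pick_sonar_model and its two boolean flags are hypothetical, and real routing would likely weigh cost and latency more carefully.

```python
def pick_sonar_model(multi_step: bool = False, quality_first: bool = False) -> str:
    """Map two coarse requirements onto the model names in the table."""
    if multi_step:
        # Reasoning tiers: chain-of-thought plus web search.
        return "sonar-reasoning-pro" if quality_first else "sonar-reasoning"
    # Plain search-grounded tiers.
    return "sonar-pro" if quality_first else "sonar"


print(pick_sonar_model())                 # sonar
print(pick_sonar_model(multi_step=True))  # sonar-reasoning
```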

Request shape (OpenAI-compatible)

{
  "model": "sonar",
  "messages": [
    {"role": "system", "content": "Be precise and cite sources."},
    {"role": "user", "content": "What's the latest on the SEC's stance on RWA tokenization?"}
  ],
  "max_tokens": 800,
  "temperature": 0.2
}

The response includes choices[0].message.content plus a citations array of URLs the model used. Your agent can render the answer + source links without a separate retrieval step.
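As an illustration of reading those two fields, the Python sketch below pulls the answer and citations out of a response and renders them together. The sample_response dictionary is made up for the example and contains only the fields named above; a real response carries more fields, and how xpay's MCP layer wraps the payload is not described here.

```python
def extract_answer(response: dict) -> tuple[str, list[str]]:
    """Return the assistant text and the list of citation URLs."""
    content = response["choices"][0]["message"]["content"]
    # Fall back to an empty list if no citations field is present.
    citations = response.get("citations", [])
    return content, citations


def render_with_sources(response: dict) -> str:
    """Format the answer followed by a numbered list of source URLs."""
    content, citations = extract_answer(response)
    lines = [content]
    if citations:
        lines.append("")
        lines.append("Sources:")
        lines.extend(f"[{i}] {url}" for i, url in enumerate(citations, start=1))
    return "\n".join(lines)


# Made-up response containing only the documented fields.
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Example answer [1]."}}],
    "citations": ["https://example.com/source"],
}

print(render_with_sources(sample_response))
```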

When Perplexity Sonar is the right model

Up-to-date answers. Anthropic Claude and OpenAI GPT-4 don't have native live web search; Sonar does.
Citation requirements. Compliance, research, or content workflows where every claim needs a source URL.
Replacing a RAG pipeline for cases where the corpus is "the open web" and you don't want to manage your own search index.

When to choose something else

For pure reasoning over your own documents, use a frontier model + your own retrieval (Tavily, Exa, Jina Reader). Sonar's search is open-web; you can't constrain it to your corpus.
For coding tasks, GPT-4o, Claude Sonnet 4.6, or DeepSeek-R1 outperform Sonar.
For very high volume, you'll save 30–60% running a frontier model + a cheap search API separately.

Pricing

Per-call pricing on xpay reflects Perplexity's per-token rates plus a small markup: roughly $0.005–$0.05 per call, depending on model tier and on prompt and response length. New accounts get $5 in free credit.
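To put the free credit in perspective, this short Python sketch works out how many calls $5 covers at the low and high ends of the quoted range and at the $0.02 flat rate listed for this tool. The function name calls_covered is hypothetical, and the figures are plain arithmetic on the prices stated on this page, not a quote.

```python
from decimal import Decimal

FREE_CREDIT = Decimal("5.00")  # free credit for new accounts


def calls_covered(credit: Decimal, price_per_call: Decimal) -> int:
    """Number of whole calls a credit balance pays for at a given price."""
    return int(credit // price_per_call)


for price in (Decimal("0.005"), Decimal("0.02"), Decimal("0.05")):
    print(f"${price} per call -> {calls_covered(FREE_CREDIT, price)} calls")
```

At these prices the credit covers 1000, 250, and 100 calls respectively.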

Why xpay vs. direct Perplexity API?

No Perplexity API account. xpay holds the upstream key.
MCP-native. Your agent discovers and calls perplexity_chat_completions as one of dozens of tools, no HTTP wiring on your side.
Unified billing across Perplexity, OpenAI, Anthropic, search APIs, and 60+ other providers.
Per-call pricing instead of Perplexity's monthly minimums for Pro tier.

Frequently Asked Questions

What is the Perplexity chat completions endpoint?

It is Perplexity's OpenAI-compatible /chat/completions API for the Sonar family of search-grounded LLMs. You pass messages and a model name; the response includes both the assistant message and the citation URLs the model used.

Which Perplexity Sonar model should I use?

Use sonar for cheap, fast, search-grounded chat. Use sonar-pro when answer quality matters more than latency. Use sonar-reasoning for multi-step questions or comparisons. Use sonar-reasoning-pro for research-grade outputs where you want the highest accuracy.

How is this different from calling Perplexity directly?

Same upstream API and same response shape. xpay differences: no Perplexity API account needed, per-call billing instead of monthly minimums, MCP-native (your agent discovers and runs the tool without HTTP code), unified billing with 60+ other providers.

Does the response include citations?

Yes. Every Sonar response includes a citations field listing the source URLs the model used. You can render answer + sources without running a separate retrieval pass.

Can I stream the response?

The MCP wrapping is request/response; streaming is not exposed through xpay's MCP layer today. If you need streaming, call Perplexity directly. If your agent runs in an MCP client, request/response is usually fine.

How much does it cost per call?

Roughly $0.005–$0.05 per call depending on model tier and prompt/response length. xpay pricing reflects Perplexity per-token rates plus a small markup. New accounts get $5 in free credit.

Can I constrain Sonar to my own documents instead of the open web?

No. Sonar searches the open web. If you need retrieval over a private corpus, use a frontier LLM (Claude, GPT-4o) plus your own retrieval (Tavily, Exa, Jina Reader, or your vector store).

