Perplexity Chat Completions
perplexity_chat_completions
Generates a model’s response for the given chat conversation.
Pricing
Per call: $0.02
Model: flat
Pay only for what you use. No subscriptions.
Inputs
- reasoning_effort: string
- return_images: boolean
- search_mode: string
- presence_penalty: number
- search_after_date_filter: string
- disable_search: boolean
- web_search_options: object
- top_p: number
- last_updated_before_filter: string
- frequency_penalty: number
- response_format: object
- stream: boolean
- top_k: number
- temperature: number
- model *: string
- search_domain_filter: string
- last_updated_after_filter: string
- return_related_questions: boolean
- search_before_date_filter: string
- max_tokens: integer
- enable_search_classifier: boolean
- language_preference: string
- messages *: array
- search_recency_filter: string
- media_response: object
Cost per run
Execution cost: $0.02
About Perplexity Chat Completions
Perplexity Chat Completions on xpay — Sonar models with built-in web search
Perplexity's /chat/completions endpoint runs the Sonar family of models — search-grounded LLMs that answer questions with live web context and inline citations. It's an OpenAI-compatible API surface (messages, temperature, max_tokens, stream) with one critical difference: every response is grounded in a fresh web search and includes the source URLs the model used.
xpay exposes perplexity_chat_completions as a single MCP-callable tool. You pass messages and a model, and you get back the assistant message plus citations, billed per call.
Available Sonar models
| Model | Use case | Notes |
|---|---|---|
| sonar | Cheap, fast, search-grounded answers | Best default for chat agents that need fresh web context |
| sonar-pro | Higher-quality answers, deeper search | Use when accuracy matters more than latency |
| sonar-reasoning | Chain-of-thought + web search | For multi-step questions, comparisons, analyses |
| sonar-reasoning-pro | Highest quality reasoning + search | Premium tier; use for research-grade outputs |
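The table above can be read as a two-question decision: does the task need multi-step reasoning, and does quality matter more than latency and cost? The helper below is a minimal sketch of that reading. The function name and its two boolean flags are hypothetical, introduced only for illustration; the model names are the ones listed in the table.

```python
def pick_sonar_model(needs_reasoning: bool, prefer_quality: bool) -> str:
    """Map two coarse requirements onto a Sonar model name from the table.

    Hypothetical helper: the flags are an illustrative simplification,
    not part of the Perplexity or xpay API.
    """
    if needs_reasoning:
        # Multi-step questions, comparisons, analyses.
        return "sonar-reasoning-pro" if prefer_quality else "sonar-reasoning"
    # Plain search-grounded answers.
    return "sonar-pro" if prefer_quality else "sonar"


if __name__ == "__main__":
    print(pick_sonar_model(needs_reasoning=False, prefer_quality=False))
    print(pick_sonar_model(needs_reasoning=True, prefer_quality=True))
```

In practice the choice may depend on more than two flags (budget, latency targets), so treat this as a starting point rather than a rule.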
Request shape (OpenAI-compatible)
{
  "model": "sonar",
  "messages": [
    {"role": "system", "content": "Be precise and cite sources."},
    {"role": "user", "content": "What's the latest on the SEC's stance on RWA tokenization?"}
  ],
  "max_tokens": 800,
  "temperature": 0.2
}
The response includes choices[0].message.content plus a citations array of URLs the model used. Your agent can render the answer + source links without a separate retrieval step.
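As a rough illustration of the request and response shapes described above, the sketch below builds the request body and pulls the answer and citations out of a response. It makes no network call. The function names and the sample response (including its placeholder text and example.com URLs) are assumptions made up for this example; only the field names `model`, `messages`, `max_tokens`, `temperature`, `choices[0].message.content`, and `citations` come from the text.

```python
from typing import Any


def build_request(question: str,
                  system_prompt: str = "Be precise and cite sources.",
                  model: str = "sonar",
                  max_tokens: int = 800,
                  temperature: float = 0.2) -> dict[str, Any]:
    """Assemble a request body in the shape shown above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def extract_answer(response: dict[str, Any]) -> tuple[str, list[str]]:
    """Return (answer text, citation URLs) from a response dict."""
    content = response["choices"][0]["message"]["content"]
    # Treat a missing citations field as an empty list rather than an error.
    citations = list(response.get("citations", []))
    return content, citations


def render_with_sources(response: dict[str, Any]) -> str:
    """Format the answer followed by a numbered source list."""
    content, citations = extract_answer(response)
    if not citations:
        return content
    sources = "\n".join(f"[{i}] {url}" for i, url in enumerate(citations, start=1))
    return f"{content}\n\nSources:\n{sources}"


# Hypothetical response, hand-written for illustration only.
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Placeholder answer."}}],
    "citations": ["https://example.com/a", "https://example.com/b"],
}

if __name__ == "__main__":
    print(build_request("What changed this week?"))
    print(render_with_sources(sample_response))
```

Keeping extraction separate from rendering lets an agent reuse the citation list elsewhere, for example to store sources alongside the answer.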
When Perplexity Sonar is the right model
- Up-to-date answers. Anthropic Claude and OpenAI GPT-4 don't have native live web search; Sonar does.
- Citation requirements. Compliance, research, or content workflows where every claim needs a source URL.
- Replacing a RAG pipeline for cases where the corpus is "the open web" and you don't want to manage your own search index.
When to choose something else
- For pure reasoning over your own documents, use a frontier model + your own retrieval (Tavily, Exa, Jina Reader). Sonar's search is open-web; you can't constrain it to your corpus.
- For coding tasks, GPT-4o, Claude Sonnet 4.6, or DeepSeek-R1 outperform Sonar.
- For very high volume, you'll save 30–60% running a frontier model + a cheap search API separately.
Pricing
Per-call pricing on xpay reflects Perplexity's per-token rates plus a small markup. Roughly $0.005–$0.05 per call depending on prompt and response length and model tier. New accounts get $5 in free credit.
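To put the figures above in perspective, the following sketch estimates how many calls the $5 free credit would cover at the low and high ends of the quoted range, and at the flat $0.02 per-call price listed at the top of this page. Amounts are held as integer thousandths of a dollar to avoid floating-point rounding; the function and constant names are hypothetical.

```python
def calls_covered(credit_milli_usd: int, cost_per_call_milli_usd: int) -> int:
    """Whole number of calls a credit covers; amounts are in 1/1000 USD."""
    if cost_per_call_milli_usd <= 0:
        raise ValueError("cost per call must be positive")
    return credit_milli_usd // cost_per_call_milli_usd


FREE_CREDIT = 5_000  # $5.00 expressed in thousandths of a dollar

# Per-call costs taken from the page: $0.005, $0.02 (listed flat price), $0.05.
PRICE_POINTS = {"low end": 5, "listed flat": 20, "high end": 50}

if __name__ == "__main__":
    for label, cost in PRICE_POINTS.items():
        print(f"{label}: {calls_covered(FREE_CREDIT, cost)} calls")
```

By this arithmetic the credit covers between 100 and 1,000 calls across the quoted range, and 250 calls at $0.02 each.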
Why xpay vs. direct Perplexity API?
- No Perplexity API account. xpay holds the upstream key.
- MCP-native. Your agent discovers and calls perplexity_chat_completions as one of dozens of tools, no HTTP wiring on your side.
- Unified billing across Perplexity, OpenAI, Anthropic, search APIs, and 60+ other providers.
- Per-call pricing instead of Perplexity's monthly minimums for Pro tier.
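For readers curious what "MCP-native" looks like on the wire: an MCP client invokes a tool by sending a JSON-RPC 2.0 message with the method `tools/call`. The sketch below only constructs that message for perplexity_chat_completions. An MCP client library would normally build and send it for you, and how xpay's server is reached (transport, URL, authentication) is not described here, so nothing in this example connects to anything. The helper name is hypothetical.

```python
import json
from itertools import count
from typing import Any

# Simple incrementing JSON-RPC request ids for this sketch.
_request_ids = count(1)


def tools_call_request(tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP `tools/call` JSON-RPC 2.0 request (construction only)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


example = tools_call_request(
    "perplexity_chat_completions",
    {
        "model": "sonar",
        "messages": [{"role": "user", "content": "Summarize today's top AI news."}],
    },
)

if __name__ == "__main__":
    print(json.dumps(example, indent=2))
```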
Frequently Asked Questions
It is Perplexity's OpenAI-compatible /chat/completions API for the Sonar family of search-grounded LLMs. You pass messages and a model name; the response includes both the assistant message and the citation URLs the model used.
Use sonar for cheap, fast, search-grounded chat. Use sonar-pro when answer quality matters more than latency. Use sonar-reasoning for multi-step questions or comparisons. Use sonar-reasoning-pro for research-grade outputs where you want the highest accuracy.
Same upstream API and same response shape. xpay differences: no Perplexity API account needed, per-call billing instead of monthly minimums, MCP-native (your agent discovers and runs the tool without HTTP code), unified billing with 60+ other providers.
Every Sonar response includes a citations field listing the source URLs the model used. You can render answer + sources without running a separate retrieval pass.
The MCP wrapping is request/response; streaming is not exposed through xpay's MCP layer today. If you need streaming, call Perplexity directly. If your agent runs in an MCP client, request/response is usually fine.
Roughly $0.005–$0.05 per call depending on model tier and prompt/response length. xpay pricing reflects Perplexity per-token rates plus a small markup. New accounts get $5 in free credit.
Sonar cannot be restricted to a private corpus; it searches the open web. If you need retrieval over a private corpus, use a frontier LLM (Claude, GPT-4o) plus your own retrieval (Tavily, Exa, Jina Reader, or your vector store).

