
Perplexity Agent API: The Managed Runtime Developers Have Been Waiting For


Essential Points

  • Perplexity Agent API provides unified access to OpenAI, Anthropic, Google, xAI, and NVIDIA models through a single API spec
  • Built-in web_search and fetch_url tools give agents real-time internet access without additional infrastructure; search costs $0.005 per call
  • Four pre-configured presets (fast-search, pro-search, deep-research, advanced-deep-research) eliminate manual model and tool tuning for the most common use cases
  • Every preset is fully overridable: swap the model, extend max_steps, or restrict search to specific domains within a single API call

Most AI agent stacks force you to manage separate API keys, orchestration layers, and retrieval infrastructure. The Perplexity Agent API removes those layers entirely. It is a multi-provider, interoperable runtime that handles model routing, tool execution, and reasoning control through one unified interface, with transparent per-request cost reporting built in.

One API, Every Major Model Provider

The Agent API supports models from OpenAI, Anthropic, Google, xAI, NVIDIA, and more, all accessible through a single client.responses.create() call. You specify a model using the provider/model-name format, for example nvidia/nemotron-3-super-120b-a12b or openai/gpt-5.4, and the API handles authentication and response normalization across providers without requiring you to maintain separate API keys.

The endpoint is available at POST https://api.perplexity.ai/v1/agent, and it also accepts POST /v1/responses as an alias for full OpenAI SDK compatibility. This means existing OpenAI-based codebases can migrate to Perplexity multi-provider routing with minimal changes.
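A minimal sketch of what that migration looks like, assuming the OpenAI Python SDK pointed at Perplexity's base URL via the /v1/responses alias. The environment variable name PERPLEXITY_API_KEY and the prompt are illustrative; the helper only builds the request kwargs, and the actual network call is isolated in run() so nothing here hits the API.

```python
import os

def agent_request(model: str, prompt: str) -> dict:
    """Build kwargs for client.responses.create(); the same shape works
    for openai/, anthropic/, google/, xai/, or nvidia/ model names."""
    provider, _, name = model.partition("/")
    assert provider and name, "expected provider/model-name format"
    return {"model": model, "input": prompt}

def run(model: str, prompt: str) -> str:
    # Lazy import keeps the sketch runnable without the SDK installed.
    from openai import OpenAI
    client = OpenAI(
        base_url="https://api.perplexity.ai/v1",  # /v1/responses alias
        api_key=os.environ["PERPLEXITY_API_KEY"],
    )
    return client.responses.create(**agent_request(model, prompt)).output_text

kwargs = agent_request("nvidia/nemotron-3-super-120b-a12b", "Hello")
print(kwargs["model"])
```

Because only the base URL and model string change, an existing OpenAI codebase keeps its call sites intact.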

Three Built-In Tools That Remove Infrastructure Overhead

The Agent API ships three tool categories: web_search, fetch_url, and custom function calling. You enable any tool by adding it to the tools array in your request, and the model autonomously decides when to invoke each tool based on your instructions.

web_search performs live internet searches with advanced filtering:

  • search_domain_filter: Restrict or exclude specific domains (up to 20), using a - prefix to exclude (e.g., -reddit.com)
  • search_recency_filter: Limit results by time period: day, week, month, or year
  • search_after_date / search_before_date: Target specific publication date ranges
  • max_tokens_per_page: Control content retrieved per search result to manage token costs

Pricing: web_search costs $0.005 per search call plus token costs. fetch_url costs $0.0005 per fetch plus token costs. Custom function calling adds no extra cost beyond standard token pricing.

fetch_url retrieves full content from a specific URL rather than search snippets, making it the right choice when you already know the source you need to analyze. Combining both tools gives agents a two-phase research pattern: search broadly to find relevant pages, then fetch the most relevant results for full-content analysis.
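Putting the two tools and the filter parameters together, a request body for that two-phase pattern might look like the sketch below. The field names follow the parameters listed above; the model, prompt, and filter values are illustrative assumptions.

```python
# web_search config: allowlist a domain, exclude another with the "-" prefix,
# limit results to the past month, and cap per-result content to control cost.
web_search = {
    "type": "web_search",
    "search_domain_filter": ["clinicaltrials.gov", "-reddit.com"],
    "search_recency_filter": "month",
    "max_tokens_per_page": 1024,
}

# fetch_url needs no extra config; the model supplies the URL when it calls it.
fetch_url = {"type": "fetch_url"}

request_body = {
    "model": "openai/gpt-5.1",
    "input": "Find and summarize recent phase-3 trial results for semaglutide.",
    "tools": [web_search, fetch_url],  # model decides when to invoke each
}
print(len(request_body["tools"]))
```

With both tools enabled, the model can search broadly first and then fetch the most promising pages in full.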

The Four Presets and Exactly What Each One Does

Presets are pre-configured model setups that bundle a specific model, token limits, reasoning step count, and tool access into a single preset parameter. Here is exactly what each preset contains:

| Preset | Model | Max Steps | Max Tokens | Tools | Best For |
|---|---|---|---|---|---|
| fast-search | xai/grok-4-1-fast-non-reasoning | 1 | 3K | web_search | Quick factual lookups, minimal latency |
| pro-search | openai/gpt-5.1 | 3 | 3K | web_search, fetch_url | Researched answers with tool use for most queries |
| deep-research | openai/gpt-5.2 | 10 | 10K | web_search, fetch_url | Complex, multi-step analysis requiring extensive research |
| advanced-deep-research | anthropic/claude-opus-4-6 | 10 | 10K | web_search, fetch_url | Institutional-grade research with maximum depth and sophisticated source coverage |

The system prompt tokens vary per preset: fast-search uses approximately 1,240 prompt tokens, pro-search approximately 1,502, deep-research approximately 3,267, and advanced-deep-research approximately 3,500. Higher prompt token counts reflect more detailed behavioral guidance and search strategies built into each preset.
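Selecting a preset replaces the model/steps/tools tuning entirely; the request carries just the preset name and your input. A minimal sketch, assuming a preset field in the request body (the prompt is illustrative):

```python
def preset_request(preset: str, prompt: str) -> dict:
    """Build a request body that delegates model, steps, and tools
    to one of the four documented presets."""
    valid = {"fast-search", "pro-search", "deep-research",
             "advanced-deep-research"}
    if preset not in valid:
        raise ValueError(f"unknown preset: {preset}")
    return {"preset": preset, "input": prompt}

body = preset_request("deep-research",
                      "Compare the last three drafts of the EU AI Act.")
print(body["preset"])
```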

How Preset Customization Works in Practice

Every preset default is overridable. You can swap the model while keeping the preset’s optimized system prompt, extend max_steps beyond the preset default, or restrict web_search to specific trusted domains, all within a single API call.

For example, using pro-search with a domain filter that restricts web_search to clinicaltrials.gov and fda.gov gives you a research-grade search agent scoped entirely to medical regulatory data, without writing a custom orchestration layer. Overriding max_steps to 5 on pro-search extends its default reasoning depth of 3 steps without switching to the heavier deep-research preset.

When you override a parameter, the preset’s other defaults stay in effect. Swapping the model on pro-search to anthropic/claude-sonnet-4-6 preserves the web_search and fetch_url tools, the optimized system prompt, and the 3-step reasoning limit.
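The override semantics can be sketched as a simple dict merge: parameters you pass win, and everything else keeps the preset default. The PRO_SEARCH_DEFAULTS dict below mirrors the pro-search row of the preset table; the merge behavior is my assumption about how the API resolves overrides, not a documented internal.

```python
# Defaults mirroring the pro-search preset row above.
PRO_SEARCH_DEFAULTS = {
    "model": "openai/gpt-5.1",
    "max_steps": 3,
    "tools": ["web_search", "fetch_url"],
}

def with_overrides(defaults: dict, **overrides) -> dict:
    """Later keys replace earlier ones; untouched defaults survive."""
    return {**defaults, **overrides}

cfg = with_overrides(
    PRO_SEARCH_DEFAULTS,
    model="anthropic/claude-sonnet-4-6",  # swap the model
    max_steps=5,                           # deepen reasoning
)
print(cfg["model"], cfg["max_steps"], cfg["tools"])
```

Note that the tools list and any parameter you did not name pass through unchanged, matching the behavior described above.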

Custom Function Calling: Connecting Agents to Your Own Systems

Beyond built-in tools, the Agent API supports custom function calling, which allows models to invoke your own databases, business logic, or third-party APIs during a conversation. The flow is deterministic and follows six steps:

  1. Define your function with a name, description, and JSON Schema for parameters
  2. Send your prompt with function definitions in the tools array
  3. The model returns a function_call item when it decides to invoke your function
  4. Execute the function in your own code
  5. Return the result as a function_call_output item in the next request
  6. The model uses the result to generate its final response

The arguments field returned in function calls is a JSON string, not a parsed object. Always parse it with json.loads() in Python or JSON.parse() in JavaScript before passing arguments to your function. Function descriptions drive the model’s decision about when to call each function, so specific and accurate descriptions directly improve agent reliability.
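Steps 3 through 5 can be sketched as follows. The get_order_status function and the exact shape of the function_call item are illustrative assumptions; the point being demonstrated is that arguments arrives as a JSON string and must be parsed before dispatch.

```python
import json

def get_order_status(order_id: str) -> dict:
    """Stand-in for your own database or business logic."""
    return {"order_id": order_id, "status": "shipped"}

# Step 3: a function_call item as the model might return it.
function_call = {
    "type": "function_call",
    "call_id": "call_123",
    "name": "get_order_status",
    "arguments": '{"order_id": "A-1001"}',  # JSON string, NOT a parsed dict
}

# Step 4: parse the arguments, then execute locally.
args = json.loads(function_call["arguments"])
result = get_order_status(**args)

# Step 5: package the result for the next request turn.
output_item = {
    "type": "function_call_output",
    "call_id": function_call["call_id"],   # must match the call it answers
    "output": json.dumps(result),
}
print(output_item["output"])
```

Skipping the json.loads step and passing the raw string is the most common failure mode in this flow.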

Multi-Turn State and Output Handling

The API supports multi-turn conversations through previous_response_id, which allows agents to maintain context across sequential requests without resending the full conversation history. The response.output_text convenience property aggregates all text content from the response output array, eliminating the need to iterate through response.output manually.

Transparent pricing is built into every API response. Each call returns exact input token counts, output token counts, and total cost in USD with no markup over direct provider pricing. This makes per-workflow cost tracking straightforward without requiring a separate monitoring layer.

Choosing the Right Preset for Your Workflow

  • Use fast-search when speed is the primary constraint and questions are factual and self-contained
  • Use pro-search for standard development and research queries where web-grounded, multi-step answers are expected
  • Use deep-research for competitive analysis, technical deep dives, or regulatory research requiring extended reasoning across many sources
  • Use advanced-deep-research when maximum source coverage and sophisticated analysis are required, such as institutional reports, systematic reviews, or complex competitive intelligence

Considerations Before Production Deployment

The Agent API is a cloud-hosted execution environment. Unrestricted web_search without domain filters can produce high token costs at scale; setting max_tokens_per_page and search_domain_filter on production workflows is essential for cost control. Custom function calling requires your own server-side execution and error handling between API turns, which adds implementation overhead compared to using built-in presets alone.

Frequently Asked Questions (FAQs)

What is the Perplexity Agent API?

The Perplexity Agent API is a multi-provider, interoperable API specification for building LLM-powered applications. It provides unified access to models from OpenAI, Anthropic, Google, xAI, NVIDIA, and other providers through one interface, with built-in tools for real-time web search and transparent per-request cost reporting.

What models can I access through the Perplexity Agent API?

The API supports models from OpenAI, Anthropic, Google, xAI, NVIDIA, and additional providers. You specify models using a provider/model-name format such as openai/gpt-5.4 or anthropic/claude-opus-4-6. No separate API keys are needed for each provider; the Perplexity API handles authentication across all supported providers.

What are the four Perplexity Agent API presets?

The four presets are fast-search (xAI Grok, 1 reasoning step, web search only), pro-search (GPT-5.1, 3 steps, web search and fetch URL), deep-research (GPT-5.2, 10 steps, both tools, 10K max tokens), and advanced-deep-research (Claude Opus 4.6, 10 steps, both tools, 10K max tokens). All presets are customizable by overriding individual parameters.

How much does the web_search tool cost?

The web_search tool costs $5.00 per 1,000 search calls, which equals $0.005 per individual search, plus standard token costs for processing the results. The fetch_url tool costs $0.50 per 1,000 requests ($0.0005 per fetch) plus token costs. Custom function calling has no additional cost beyond standard token pricing.

Can I restrict web search to specific domains?

Yes, the search_domain_filter parameter allows you to specify up to 20 domains as an allowlist or denylist. Use the - prefix to exclude a domain, for example -reddit.com. You can also filter by date range using search_after_date and search_before_date, or by recency using search_recency_filter with values of day, week, month, or year.

What is the difference between web_search and fetch_url?

web_search finds relevant pages across the internet and returns content snippets from multiple sources, making it suited for broad research and current news. fetch_url retrieves the full content of a specific URL you already know, making it the right choice when you need to analyze a particular document or verify a specific source. Combining both tools in one request gives comprehensive research coverage.

Does the Perplexity Agent API work with the OpenAI SDK?

Yes, the Agent API accepts requests at POST /v1/responses as an alias to its native endpoint, providing direct OpenAI SDK compatibility. This allows existing OpenAI-based codebases to route through Perplexity’s multi-provider infrastructure with minimal code changes, while gaining access to models beyond OpenAI’s own lineup.

Mohammad Kashif
Senior Technology Analyst and Writer at AdwaitX, specializing in the convergence of Mobile Silicon, Generative AI, and Consumer Hardware. Moving beyond spec sheets, his reviews rigorously test "real-world" metrics analyzing sustained battery efficiency, camera sensor behavior, and long-term software support lifecycles. Kashif’s data-driven approach helps enthusiasts and professionals distinguish between genuine innovation and marketing hype, ensuring they invest in devices that offer lasting value.
