Debugging OpenAI Tool Calls in Next.js

Tool calls (also known as function calling) let an OpenAI model ask your server to run code—fetch data, hit internal APIs, query a database—and then incorporate the results into its response. In Next.js, this typically happens in a Route Handler (app/api/*/route.ts) or API Route (pages/api/*), where you: (1) send the model a tool schema, (2) receive a tool call request, (3) execute the tool, and (4) send the tool result back to the model for a final answer.

When something goes wrong, failures can be subtle: arguments that do not match your schema, tools that never run due to routing or edge/runtime constraints, streaming responses that hide intermediate events, or model outputs that look “reasonable” but are actually missing tool results. This guide focuses on reliable, verifiable techniques to debug tool calls end-to-end in Next.js—from local development to production observability.

A clear mental model

Before adding logs, make sure you can pinpoint the stage that is failing. In a typical flow, problems cluster into these categories:

  • Request construction: the tool schema is invalid, incomplete, or not actually sent to the model.
  • Model decision: the model does not choose to call a tool (or calls the wrong one).
  • Argument parsing: tool arguments are missing, malformed JSON, or do not match your validation rules.
  • Tool execution: network/database errors, timeouts, runtime limitations (Node.js vs Edge), or permission issues.
  • Return-to-model step: tool results are not sent back correctly, or are too large/unexpected in shape.
  • Streaming/UI integration: intermediate tool call events are not surfaced, so it looks like “nothing happened.”

Good debugging isolates one stage at a time and adds observability where it matters: inputs, outputs, and timing at each boundary.

Start with a known-good baseline Route Handler

A baseline implementation makes debugging easier because you can compare behavior as you add complexity (multiple tools, streaming, auth, DB calls). Below is a minimal pattern showing: request parsing, tool declaration, tool execution, and structured logging. Adapt it to your OpenAI SDK version and your preferred response style (JSON or streaming).

// app/api/chat/route.ts
import { NextResponse } from "next/server";

// Prefer environment variables for secrets.
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;

export const runtime = "nodejs"; // Tools often rely on Node APIs; choose explicitly.

function safeJsonParse(input: string) {
  try {
    return { ok: true as const, value: JSON.parse(input) };
  } catch (e) {
    return { ok: false as const, error: e };
  }
}

export async function POST(req: Request) {
  const requestId = crypto.randomUUID();
  const startedAt = Date.now();

  if (!OPENAI_API_KEY) {
    return NextResponse.json(
      { error: "Missing OPENAI_API_KEY" },
      { status: 500 }
    );
  }

  let body: any;
  try {
    body = await req.json();
  } catch {
    return NextResponse.json({ error: "Invalid JSON body" }, { status: 400 });
  }

  // Example: expect messages from the client.
  const messages = body?.messages;
  if (!Array.isArray(messages)) {
    return NextResponse.json(
      { error: "Body must include messages[]" },
      { status: 400 }
    );
  }

  console.log("[chat] request", {
    requestId,
    messageCount: messages.length,
  });

  // Pseudocode: replace with the OpenAI SDK call you use.
  // 1) Send model + tools.
  // 2) If a tool call is requested, parse args and run tool.
  // 3) Send tool results back to the model.

  // The key debugging idea: log at every boundary.

  const toolName = "get_time";
  const toolArgsJson = JSON.stringify({ tz: "UTC" });

  console.log("[chat] tool_call_received", {
    requestId,
    toolName,
    toolArgsJson,
  });

  const parsed = safeJsonParse(toolArgsJson);
  if (!parsed.ok) {
    console.log("[chat] tool_args_parse_error", {
      requestId,
      toolName,
      error: String(parsed.error),
    });
    return NextResponse.json(
      { error: "Tool arguments were not valid JSON" },
      { status: 400 }
    );
  }

  // Execute tool (example)
  const toolResult = {
    now: new Date().toISOString(),
    tz: parsed.value.tz ?? "UTC",
  };

  console.log("[chat] tool_executed", {
    requestId,
    toolName,
    durationMs: Date.now() - startedAt,
  });

  // Return something deterministic while building out the rest.
  return NextResponse.json({ requestId, toolResult });
}

Even if your real flow differs (for example, you stream a response to the browser), keep the same idea: a requestId, explicit runtime, and logs for “tool call received,” “arguments parsed,” “tool executed,” and “tool result sent back.”

Add the right logs (and avoid logging sensitive data)

Tool-call debugging is mostly an observability problem. You want to answer: What did we send? What did the model request? What did we execute? What did we return? Do this with structured logs and consistent IDs, while keeping secrets and private user data out of logs.

  • Correlate events: generate a requestId per request and include it in every log line.
  • Log shapes, not raw content: store counts and keys (e.g., messageCount: 6 or toolArgsKeys: ["city", "units"]) instead of full user messages or secrets.
  • Log timing: record durations for the model call, tool execution, and overall request time to spot timeouts and slow tools.
  • Log decision points: whether the model chose a tool, which tool, and why you accepted/rejected the call (validation failure, allowlist failure, etc.).

If you need deeper inspection locally, temporarily log redacted payloads (for example, truncate strings, strip tokens/keys), and remove those logs before production.
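
As a concrete sketch of the redaction idea (helper names like truncateForLog and summarizeArgs are hypothetical, not part of any SDK), you can log argument shapes and truncated previews instead of raw payloads:

```typescript
// Hypothetical helpers for redacted logging: log shapes and truncated
// previews, never full payloads or secrets.
const MAX_LOG_STRING = 200;

export function truncateForLog(value: string, max = MAX_LOG_STRING): string {
  return value.length > max
    ? `${value.slice(0, max)}…(+${value.length - max} chars)`
    : value;
}

export function summarizeArgs(args: unknown): { keys: string[]; preview: string } {
  const keys =
    args && typeof args === "object" ? Object.keys(args as object) : [];
  return { keys, preview: truncateForLog(JSON.stringify(args)) };
}

// Usage: console.log("[chat] tool_args", { requestId, ...summarizeArgs(args) });
```
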

Validate tool arguments with a schema (and surface errors clearly)

Many tool failures are not “model errors”—they are argument mismatches. The model can produce arguments that are syntactically valid JSON but semantically wrong (missing required fields, wrong types, unexpected values). Treat tool inputs as untrusted and validate them.

A robust approach is to define a single source of truth for each tool’s input shape (for example, with a schema validator). Then: (1) validate, (2) if invalid, return an error object that your application can render, and (3) optionally re-prompt the model with a concise validation error so it can retry with corrected arguments.

// Example pattern (schema validation pseudocode)
// - Validate args before running the tool
// - Return a consistent error shape

type ToolError = {
  type: "tool_validation_error" | "tool_runtime_error";
  message: string;
  details?: unknown;
};

function validateGetWeatherArgs(
  args: any
): { ok: true; value: { city: string } } | { ok: false; error: ToolError } {
  if (!args || typeof args !== "object") {
    return { ok: false, error: { type: "tool_validation_error", message: "Arguments must be an object" } };
  }
  if (typeof args.city !== "string" || args.city.trim() === "") {
    return { ok: false, error: { type: "tool_validation_error", message: "city is required" } };
  }
  return { ok: true, value: { city: args.city } };
}

Make validation failures easy to spot in logs by using a dedicated event name (for example tool_args_invalid) and include a small, non-sensitive error summary.
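
When you do re-prompt the model after a validation failure, the error travels back as a tool-result message. A minimal sketch, assuming the Chat Completions message shape (tool results use the "tool" role with a tool_call_id):

```typescript
type ToolErrorPayload = {
  type: "tool_validation_error" | "tool_runtime_error";
  message: string;
};

// Build the message that reports a tool failure back to the model so it can
// retry with corrected arguments. Keep the content concise: the model only
// needs enough detail to fix what it sent.
export function toolErrorMessage(toolCallId: string, error: ToolErrorPayload) {
  return {
    role: "tool" as const,
    tool_call_id: toolCallId,
    content: JSON.stringify({ error: error.type, message: error.message }),
  };
}
```

Appending this message (plus the assistant message containing the original tool call) to the conversation gives the model a clear, machine-readable reason to correct itself.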

Enforce a tool allowlist and handle “unknown tool” calls

If you expose multiple tools, always implement a server-side allowlist: the tool name coming from the model must map to a known handler. This prevents accidental execution of unintended operations and makes debugging clearer when the model requests a tool that does not exist (often due to naming mismatches).

const tools = {
  get_time: async (args: { tz?: string }) => ({ now: new Date().toISOString(), tz: args.tz ?? "UTC" }),
  // add more tools here
} satisfies Record<string, (args: any) => Promise<any>>;

async function runTool(toolName: string, args: any) {
  const fn = (tools as Record<string, any>)[toolName];
  if (!fn) {
    return {
      ok: false as const,
      error: { type: "tool_validation_error", message: `Unknown tool: ${toolName}` },
    };
  }

  try {
    const result = await fn(args);
    return { ok: true as const, result };
  } catch (e) {
    return {
      ok: false as const,
      error: { type: "tool_runtime_error", message: "Tool execution failed", details: String(e) },
    };
  }
}

Debugging streaming: make tool-call events visible

When you stream model output to the client, tool calls may happen mid-stream. If your UI only displays final tokens, you can miss the fact that a tool call was requested (or that your server rejected it). For effective debugging, your server should emit explicit events to the client (or to logs) when tool calls start and finish.

  • Server logs: log when a tool call is detected, when arguments are finalized, when execution begins, and when it completes.
  • Client-visible events: consider Server-Sent Events (SSE) or a structured stream format so the UI can render “Calling tool…” states.
  • Backpressure and timeouts: streaming can hide slow tool executions; measure and surface durations.

A practical debugging technique is to build a “debug mode” that, when enabled for your own sessions, streams an additional event channel (or extra fields in SSE) containing tool-call metadata (names, durations, validation outcomes) with sensitive fields redacted.
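
One way to sketch that client-visible channel (event names here are illustrative, not a standard): format tool-call metadata as SSE frames, which are plain-text blocks of event:/data: lines terminated by a blank line.

```typescript
// Format one Server-Sent Events frame carrying tool-call debug metadata.
// Event names like "tool_call_started" are illustrative, not a standard.
export function sseDebugEvent(
  name: string,
  payload: Record<string, unknown>
): string {
  return `event: ${name}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Usage inside a streaming handler (sketch):
// controller.enqueue(new TextEncoder().encode(
//   sseDebugEvent("tool_call_started", { requestId, toolName })
// ));
```

On the client, an EventSource listener for each event name can then render "Calling tool…" states without parsing the main token stream.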

Check your runtime: Node.js vs Edge can change tool behavior

In Next.js, the runtime you deploy to (Node.js or Edge) affects what APIs you can use and how networking behaves. Some tool implementations depend on Node-only libraries (certain database drivers, filesystem access, native modules). If you see tools failing only after deployment, confirm your route’s runtime is explicitly set and compatible with your tool code.

  • Set export const runtime = "nodejs" for routes that depend on Node APIs.
  • Avoid Node-only dependencies in Edge routes unless you are sure they are supported.
  • Log the runtime and environment (development vs production) once per request for quicker triage.
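
The "log the runtime once per request" idea can be sketched like this (the helper name is hypothetical; process.version is defined in the Node.js runtime but typically absent on Edge, which is exactly what makes it useful for triage):

```typescript
// Hypothetical triage helper: record which runtime and environment actually
// served the request, so "works locally, fails deployed" is quick to spot.
export function runtimeInfo(requestId: string) {
  const info = {
    requestId,
    nodeEnv:
      (typeof process !== "undefined" && process.env.NODE_ENV) || "unknown",
    // process.version exists in Node.js; Edge runtimes typically lack it.
    runtime:
      typeof process !== "undefined" && process.version
        ? `node ${process.version}`
        : "edge-or-unknown",
  };
  console.log("[chat] runtime_info", info);
  return info;
}
```
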

Common failure modes and how to diagnose them quickly

  • The model never calls a tool: verify the tool schema is actually included in the request; ensure the tool is relevant and described clearly; confirm you are using a model/configuration that supports tool calls in your SDK.
  • Tool name mismatch: the model requests getTime but your server expects get_time; enforce allowlists and keep names stable and simple.
  • Arguments are not valid JSON: add safe parsing with a clear error path; log a truncated version of the raw arguments for debugging (avoid full content in production).
  • Arguments validate but tool fails: isolate the tool by running it independently (unit test or a direct server call); add timing logs and capture upstream errors.
  • Works locally, fails in production: check runtime (Edge vs Node), environment variables, network egress rules, and timeouts; add structured error logs with requestId to correlate.
  • Streaming hides the tool step: add explicit events or UI indicators for tool-call phases; ensure tool execution errors are not swallowed.

Testing strategy: debug faster with isolation and replay

Tool-call issues are easier to debug when you can reproduce them deterministically. Two practical patterns help:

  1. Tool unit tests: test each tool handler with valid and invalid arguments, and simulate upstream failures (timeouts, 500s).
  2. Request replay: store a redacted version of failing inputs (messages and tool-call metadata) in a secure location so you can replay the same scenario against a staging environment.

When replaying, keep a strict redaction policy (remove secrets, tokens, and sensitive user data). This lets you debug the tool pipeline without compromising privacy.

Production observability checklist

Once tool calls are in production, you need fast answers during incidents. The following checklist keeps debugging overhead low while preserving safety:

  • A requestId propagated across server logs and returned to the client for support tickets.
  • A tool-call audit trail: tool name, validation status, execution duration, and high-level error types.
  • Redaction by default: never log API keys, tokens, or full user prompts in production.
  • Timeouts for tool execution and upstream calls; log when timeouts happen.
  • Feature flags: enable/disable individual tools quickly without redeploying.
  • Alerting on tool failure rates and latency spikes (based on your platform’s logging/metrics stack).
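
The timeout bullet can be sketched as a wrapper around any tool promise. This is a minimal version; production code might additionally abort the underlying request with an AbortController so the slow work actually stops:

```typescript
// Minimal timeout wrapper: reject (and log) if a tool exceeds its budget.
export function withTimeout<T>(
  promise: Promise<T>,
  ms: number,
  label = "tool"
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      console.log("[chat] tool_timeout", { label, ms });
      reject(new Error(`${label} timed out after ${ms}ms`));
    }, ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

// Usage (sketch): await withTimeout(runTool(name, args), 5_000, name);
```
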

Wrap-up: debug tool calls by instrumenting boundaries

Debugging OpenAI tool calls in Next.js is easiest when you treat your app like a pipeline with explicit boundaries: model request → tool call → argument validation → tool execution → tool result → final model response. Add structured logs and timing at each boundary, validate all tool inputs, enforce an allowlist, and make streaming tool-call events visible. With those fundamentals in place, most issues become straightforward to reproduce, diagnose, and fix.

Last Updated 3/27/2026