Voice Mode

Voice mode turns your product into a hands-free copilot. Users speak naturally and the assistant moves them through your routes and fires the same tool calls your APIs already expose — no separate widget, no separate integration. One trigger pill exposes a chat icon, a voice icon, or both, controlled by a single mode prop. Voice runs entirely on the host page over WebRTC against the OpenAI Realtime API; no extra iframe is mounted and tool calls flow through the same handlers as chat.

Voice and chat share the same getConfig and onToolCall handlers. Wire them up once on the provider and both surfaces inherit them.

The `mode` prop

| Value | Trigger renders | Behaviour |
| --- | --- | --- |
| `"chat"` (default) | Logo + chat icon | Opens the chat iframe |
| `"voice"` | Logo + voice icon | Starts a voice session on click |
| `"both"` | Logo + chat icon + voice icon | Either surface, one mount, shared handlers |

When mode excludes voice, no voice session is constructed — there's no extra WebRTC machinery or asset cost.
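
The mode table can be read as a small lookup. The following is a minimal sketch rather than SDK code: `YakMode`, `TriggerLayout`, and `triggerFor` are hypothetical names chosen for illustration, and the only facts taken from this page are the three mode values, the default, and the icons each one renders.

```typescript
// Illustrative only: YakMode, TriggerLayout and triggerFor are hypothetical
// names, not exports of any Yak package.
type YakMode = "chat" | "voice" | "both";

interface TriggerLayout {
  icons: Array<"logo" | "chat" | "voice">;
  needsVoiceSession: boolean; // false means no voice/WebRTC setup is needed
}

function triggerFor(mode: YakMode = "chat"): TriggerLayout {
  const chat = mode === "chat" || mode === "both";
  const voice = mode === "voice" || mode === "both";

  // The logo is always shown; the chat and voice icons depend on the mode.
  const icons: TriggerLayout["icons"] = ["logo"];
  if (chat) icons.push("chat");
  if (voice) icons.push("voice");

  return { icons, needsVoiceSession: voice };
}
```

Calling `triggerFor()` with no argument models the default `"chat"` mode, where no voice session is constructed.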

Quick start

import { YakProvider, YakWidget } from "@yak-io/nextjs/client";

<YakProvider
  appId={process.env.NEXT_PUBLIC_YAK_APP_ID!}
  mode="both"
>
  {children}
  <YakWidget />
</YakProvider>

import { createYakProvider } from "@yak-io/vue";

createYakProvider({
  appId: "your-app-id",
  mode: "both",
});

import { createYakProvider } from "@yak-io/svelte";

const yak = createYakProvider({
  appId: "your-app-id",
  mode: "both",
});

import { createYakProvider } from "@yak-io/angular";

this.yak = createYakProvider({
  appId: "your-app-id",
  mode: "both",
});

// plugins/yak.client.ts
import { createYakProvider } from "@yak-io/nuxt";

export default defineNuxtPlugin((nuxtApp) => {
  const yak = createYakProvider({
    appId: "your-app-id",
    mode: "both",
  });
  nuxtApp.provide("yak", yak);
});

import { YakEmbed } from "@yak-io/javascript";

const embed = new YakEmbed({
  appId: "your-app-id",
  mode: "both",
});
embed.mount();

Tool calls in voice

Voice tool calls flow through your existing handlers — the assistant decides which tool to call based on the same routes and tool definitions returned from getConfig. No extra wiring is required.

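import { YakProvider, YakWidget } from "@yak-io/nextjs/client";

<YakProvider
  appId="your-app-id"
  mode="both"
  getConfig={async () => {
    // Routes and tool definitions, shared by chat and voice.
    const res = await fetch("/api/yak");
    return res.json();
  }}
  onToolCall={async (name, args) => {
    // Forward the tool call to your backend; voice and chat both land here.
    const res = await fetch("/api/yak", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ name, args }),
    });
    const data = await res.json();
    // Surface backend failures to the assistant as a thrown error.
    if (!data.ok) throw new Error(data.error);
    return data.result;
  }}
>
  {children}
  <YakWidget />
</YakProvider>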
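
The client snippet above expects the endpoint to answer with either `{ ok: true, result }` or `{ ok: false, error }`. One way to produce that envelope on the server is sketched below. It is a sketch under assumptions: the registry `tools`, the example tool `getOrderStatus`, and the function `dispatchToolCall` are hypothetical names, not part of any Yak package, and the handlers are synchronous only to keep the example short.

```typescript
// Hypothetical sketch: the registry, the tool name and the dispatcher are
// made up for illustration and are not part of the Yak SDK.
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => unknown;
type ToolResponse =
  | { ok: true; result: unknown }
  | { ok: false; error: string };

const tools: Record<string, ToolHandler> = {
  // Example tool; a real handler would call into your own API.
  getOrderStatus: (args) => ({ orderId: args.orderId, status: "shipped" }),
};

function dispatchToolCall(name: string, args: ToolArgs): ToolResponse {
  // Only look up own properties so names like "toString" are rejected.
  const handler = Object.prototype.hasOwnProperty.call(tools, name)
    ? tools[name]
    : undefined;
  if (!handler) return { ok: false, error: `Unknown tool: ${name}` };

  try {
    return { ok: true, result: handler(args) };
  } catch (err) {
    // Convert thrown errors into the error envelope the client checks for.
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

A route handler would parse `{ name, args }` from the POST body, call `dispatchToolCall(name, args)`, and return the result as JSON. Because voice and chat share `onToolCall`, the same dispatcher serves both.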

Conversation history, insights & memory

Voice conversations are first-class: they persist, build insights, and feed customer memory exactly like chat.

If you pass a signed user to the provider, it applies to voice too — no extra props. When a verified user starts a voice session, Yak:

- persists the transcript server-side against that user, so it appears in their history and powers insights, and
- recalls their memory into the session as it connects, so the assistant greets returning users already aware of their standing facts and recent moments, using the same recall as chat.

Anonymous voice sessions (no user) still persist per session when conversation storage is on, but aren't tied to an identifiable user. Persistence follows your app's Store conversations, Insights, and Memory settings, just like chat — turn storage off and voice runs without writing anything.
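
The persistence rules in this section reduce to three cases. The sketch below models them as a pure function; it is a hypothetical illustration of the behaviour described here, not Yak's implementation, and `VoiceSessionContext`, `TranscriptScope`, and `transcriptScope` are invented names.

```typescript
// Hypothetical model of the rules described in this section;
// none of these names come from the Yak SDK.
interface VoiceSessionContext {
  storeConversations: boolean; // the app's "Store conversations" setting
  verifiedUserId?: string;     // present when a signed user was passed to the provider
}

type TranscriptScope =
  | { kind: "none" }                  // storage off: nothing is written
  | { kind: "session" }               // anonymous: persisted per session
  | { kind: "user"; userId: string }; // verified: persisted against the user

function transcriptScope(ctx: VoiceSessionContext): TranscriptScope {
  if (!ctx.storeConversations) return { kind: "none" };
  if (ctx.verifiedUserId) return { kind: "user", userId: ctx.verifiedUserId };
  return { kind: "session" };
}
```

The storage check comes first on purpose: with storage off, a voice session writes nothing even for a verified user.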

Voice transcripts are text only; Yak never stores the audio. Memory is recalled once, when the session is minted (a session-start snapshot), whereas chat re-checks it every turn.

Programmatic control

useYak() (and its framework equivalents) exposes voice methods alongside chat:

import { useYak } from "@yak-io/react";

function VoiceButton() {
  const { voiceState, voiceToggle, voiceIsActive, voiceLoading } = useYak();

  return (
    <button onClick={() => void voiceToggle()} disabled={voiceLoading}>
      {voiceIsActive ? `Stop (${voiceState})` : "Start voice"}
    </button>
  );
}

| Method / property | Type | Description |
| --- | --- | --- |
| `voiceState` | `"idle" \| "connecting" \| "listening" \| "thinking" \| "speaking" \| "error"` | Current session state |
| `voiceMachine` | `VoiceMachine` | Full snapshot including `errorMessage` when state is `"error"` |
| `voiceIsActive` | `boolean` | `true` while connecting, listening, thinking, or speaking |
| `voiceLoading` | `boolean` | `true` while the session is connecting (`voiceState === "connecting"`); show a spinner |
| `voiceStart()` | `Promise<void>` | Start a session; must be invoked from a user gesture |
| `voiceStop()` | `Promise<void>` | Stop the current session |
| `voiceToggle()` | `Promise<void>` | Start if idle/error, stop if active |
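
The boolean flags and the toggle behaviour follow directly from `voiceState`. The sketch below spells out those relationships as pure functions. The state strings are taken from the table; `isActive`, `isLoading`, and `toggleAction` are hypothetical helper names, shown only to make the derivation explicit, since the hook already exposes the equivalent values.

```typescript
// Sketch of the relationships in the table above. The state strings come from
// the docs; the helper names are hypothetical.
type VoiceState =
  | "idle"
  | "connecting"
  | "listening"
  | "thinking"
  | "speaking"
  | "error";

// Mirrors voiceIsActive: true for every state except "idle" and "error".
const isActive = (s: VoiceState): boolean =>
  s === "connecting" || s === "listening" || s === "thinking" || s === "speaking";

// Mirrors voiceLoading: true only while the session is connecting.
const isLoading = (s: VoiceState): boolean => s === "connecting";

// What a toggle does from each state: start from idle/error, stop otherwise.
const toggleAction = (s: VoiceState): "start" | "stop" =>
  isActive(s) ? "stop" : "start";
```

Note that `"connecting"` counts as active, so a toggle during connection stops the session rather than starting a second one.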

Permissions

Voice requires microphone access. The browser prompts the user on the first session, so start it directly from a user gesture (a click on the voice icon, or a handler that calls voiceStart()) to give getUserMedia transient activation. If permission is denied, the state transitions to "error" with a descriptive errorMessage.
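
If you surface microphone failures in your own UI, it can help to map the rejection from `getUserMedia` to a readable message. The sketch below assumes the standard `DOMException` names that browsers use for `getUserMedia` failures; `describeMicError` is a hypothetical helper and its message wording is illustrative, not the text Yak puts in `errorMessage`.

```typescript
// Hypothetical helper: the message wording is illustrative and is not the
// text Yak reports in errorMessage.
function describeMicError(err: unknown): string {
  // getUserMedia rejects with a DOMException; read its name defensively.
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";

  switch (name) {
    case "NotAllowedError": // the user or a policy denied permission
      return "Microphone permission was denied.";
    case "NotFoundError": // no audio input device is available
      return "No microphone was found.";
    case "NotReadableError": // the device exists but could not be read
      return "The microphone is in use or unavailable.";
    default:
      return "Could not start the microphone.";
  }
}
```

In a browser this would typically sit in the `catch` of a click handler that calls `navigator.mediaDevices.getUserMedia({ audio: true })`, so the request keeps its user activation.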

Voice mode is currently in beta. Latency and recognition quality depend on the OpenAI Realtime API; rate limits and pricing apply per session.

By default a voice session opens with a spoken greeting, which uses voice minutes. You can customize or disable it per app — see Greeting & Intro.

Next steps

Greeting & Intro — choose a generated, fixed, or silent opening for voice and chat
Styling — tune the trigger pill's colors, position, and color mode
Programmatic Control — drive the widget from your own buttons and shortcuts
Tool Adapters — connect your APIs so the assistant can act on voice requests too