Voice Mode
Voice mode turns your product into a hands-free copilot. Users speak naturally and the assistant moves them through your routes and fires the same tool calls your APIs already expose — no separate widget, no separate integration. One trigger pill exposes a chat icon, a voice icon, or both, controlled by a single mode prop. Voice runs entirely on the host page over WebRTC against the OpenAI Realtime API; no extra iframe is mounted and tool calls flow through the same handlers as chat.
Voice and chat share the same getConfig and onToolCall handlers. Wire them up once on the provider and both surfaces inherit them.
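Because both surfaces call the same handler, a tool handler does not need to know whether a request was typed or spoken. The sketch below shows one way to organise that: a name-keyed registry behind a single function with the same (name, args) shape as the onToolCall prop. The registry, the tool names (orders.list, ui.setTheme), and their return values are hypothetical and exist only for illustration; they are not part of the library.

```typescript
// Hypothetical tool registry: names and implementations are illustrative only.
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => Promise<unknown> | unknown;

const tools = new Map<string, ToolHandler>([
  ["orders.list", (args) => ({ orders: [], status: args.status ?? "all" })],
  ["ui.setTheme", (args) => ({ theme: String(args.theme ?? "light") })],
]);

// Same (name, args) shape as the onToolCall prop. Chat and voice would both
// end up here, because the handler is registered once on the provider.
async function onToolCall(name: string, args: ToolArgs): Promise<unknown> {
  const handler = tools.get(name);
  if (!handler) {
    // Reject unknown tools instead of silently returning undefined.
    throw new Error(`Unknown tool: ${name}`);
  }
  return handler(args);
}
```

Keeping the registry in one place means a tool added for chat is automatically reachable from voice, with no surface-specific branching in the handler.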
The mode prop
| Value | Trigger renders | Behaviour |
|---|---|---|
| "chat" (default) | Logo + chat icon | Opens the chat iframe |
| "voice" | Logo + voice icon | Starts a voice session on click |
| "both" | Logo + chat icon + voice icon | Either surface, one mount, shared handlers |
When mode excludes voice, no voice session is constructed — there's no extra WebRTC machinery or asset cost.
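The mapping in the table above can be modelled as a small pure function, which is handy if your own UI needs to react to the configured mode (for example, to decide whether to show a microphone hint). This is a sketch of the documented behaviour, not the library's internals; planTrigger and TriggerPlan are hypothetical names.

```typescript
// Hypothetical helper: models the documented mode table, not library code.
type YakMode = "chat" | "voice" | "both";

interface TriggerPlan {
  icons: Array<"logo" | "chat" | "voice">;
  constructsVoiceSession: boolean;
}

function planTrigger(mode: YakMode = "chat"): TriggerPlan {
  const chat = mode === "chat" || mode === "both";
  const voice = mode === "voice" || mode === "both";

  // The logo is always rendered; the other icons depend on the mode.
  const icons: TriggerPlan["icons"] = ["logo"];
  if (chat) icons.push("chat");
  if (voice) icons.push("voice");

  // A voice session is only constructed when the mode includes voice.
  return { icons, constructsVoiceSession: voice };
}
```

With the default mode, planTrigger() yields the logo and chat icon only and reports that no voice session is constructed.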
Quick start
import { YakProvider, YakWidget } from "@yak-io/nextjs/client";

<YakProvider
  appId={process.env.NEXT_PUBLIC_YAK_APP_ID!}
  mode="both"
>
  {children}
  <YakWidget />
</YakProvider>

import { createYakProvider } from "@yak-io/vue";

createYakProvider({
  appId: "your-app-id",
  mode: "both",
});

import { createYakProvider } from "@yak-io/svelte";

const yak = createYakProvider({
  appId: "your-app-id",
  mode: "both",
});

import { createYakProvider } from "@yak-io/angular";

this.yak = createYakProvider({
  appId: "your-app-id",
  mode: "both",
});

// plugins/yak.client.ts
import { createYakProvider } from "@yak-io/nuxt";

export default defineNuxtPlugin((nuxtApp) => {
  const yak = createYakProvider({
    appId: "your-app-id",
    mode: "both",
  });
  nuxtApp.provide("yak", yak);
});

import { YakEmbed } from "@yak-io/javascript";

const embed = new YakEmbed({
  appId: "your-app-id",
  mode: "both",
});
embed.mount();

Tool calls in voice
Voice tool calls flow through your existing handlers — the assistant decides which tool to call based on the same routes and tool definitions returned from getConfig. No extra wiring is required.
<YakProvider
  appId="your-app-id"
  mode="both"
  getConfig={async () => {
    const res = await fetch("/api/yak");
    return res.json();
  }}
  onToolCall={async (name, args) => {
    const res = await fetch("/api/yak", {
      method: "POST",
      // Declare the body as JSON so the server parses it correctly
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ name, args }),
    });
    const data = await res.json();
    // Treat a non-ok response as a failed tool call
    if (!data.ok) throw new Error(data.error);
    return data.result;
  }}
>
  {children}
  <YakWidget />
</YakProvider>

Programmatic control
useYak() (and its framework equivalents) exposes voice methods alongside chat:
import { useYak } from "@yak-io/react";

function VoiceButton() {
  const { voiceState, voiceToggle, voiceIsActive, voiceLoading } = useYak();
  return (
    <button onClick={() => void voiceToggle()} disabled={voiceLoading}>
      {voiceIsActive ? `Stop (${voiceState})` : "Start voice"}
    </button>
  );
}

| Method / property | Type | Description |
|---|---|---|
| voiceState | "idle" \| "connecting" \| "listening" \| "thinking" \| "speaking" \| "error" | Current session state |
| voiceMachine | VoiceMachine | Full snapshot including errorMessage when state is "error" |
| voiceIsActive | boolean | true while connecting, listening, thinking, or speaking |
| voiceLoading | boolean | true while the session is connecting (voiceState === "connecting"); show a spinner |
| voiceStart() | Promise<void> | Start a session; must be invoked from a user gesture |
| voiceStop() | Promise<void> | Stop the current session |
| voiceToggle() | Promise<void> | Start if idle/error, stop if active |
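The derived flags in the table above follow directly from voiceState. The sketch below restates those rules as pure functions, which can be useful when unit-testing a custom voice button without mounting the provider. The library already computes voiceIsActive and voiceLoading for you; isActive, isLoading, toggleAction, and buttonLabel are hypothetical names, and the label text is placeholder copy.

```typescript
type VoiceState =
  | "idle"
  | "connecting"
  | "listening"
  | "thinking"
  | "speaking"
  | "error";

// Active while connecting, listening, thinking, or speaking.
function isActive(state: VoiceState): boolean {
  return state !== "idle" && state !== "error";
}

// Loading only while the session is connecting.
function isLoading(state: VoiceState): boolean {
  return state === "connecting";
}

// What a toggle is documented to do: start if idle/error, stop if active.
function toggleAction(state: VoiceState): "start" | "stop" {
  return isActive(state) ? "stop" : "start";
}

// Hypothetical label logic for a custom button.
function buttonLabel(state: VoiceState): string {
  if (isLoading(state)) return "Connecting...";
  if (state === "error") return "Retry voice";
  return isActive(state) ? `Stop (${state})` : "Start voice";
}
```

For example, buttonLabel("listening") returns "Stop (listening)" and toggleAction("error") returns "start".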
Permissions
Voice requires microphone access. The browser prompts the user on the first session. Call voiceStart() directly from a user gesture, such as a click handler, or let the user click the voice icon, so that getUserMedia runs with transient activation. If permission is denied, the state transitions to "error" and voiceMachine carries a descriptive errorMessage.
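The exact wording of errorMessage is not specified here. If your app requests the microphone itself (for example, on a settings screen) and you want your own copy, a minimal sketch is to map the error names that getUserMedia rejects with to user-facing text. The error names are the standard ones; describeMicError and all message strings are hypothetical.

```typescript
// Standard getUserMedia rejection names mapped to placeholder UI copy.
const micErrorCopy = new Map<string, string>([
  ["NotAllowedError", "Microphone access was blocked. Allow it in your browser's site settings and try again."],
  ["NotFoundError", "No microphone was found. Connect one and try again."],
  ["NotReadableError", "The microphone is in use by another app or could not be read."],
]);

const fallbackCopy = "Voice could not start. Please try again.";

// Accepts whatever was caught; anything without a known name gets the fallback.
function describeMicError(err: unknown): string {
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  return micErrorCopy.get(name) ?? fallbackCopy;
}
```

A permission denial surfaces as NotAllowedError, so that entry is the one most users will see; the fallback covers everything else.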
Voice mode is currently in beta. Latency and recognition quality depend on the OpenAI Realtime API; rate limits and pricing apply per session.
By default a voice session opens with a spoken greeting, which uses voice minutes. You can customize or disable it per app — see Greeting & Intro.
Next steps
- Greeting & Intro — choose a generated, fixed, or silent opening for voice and chat
- Styling — tune the trigger pill's colors, position, and color mode
- Programmatic Control — drive the widget from your own buttons and shortcuts
- Tool Adapters — connect your APIs so the assistant can act on voice requests too