AI Voice Agents Nov 2024 · 10 min read

Building Production AI Voice Agents with Vapi

How to integrate Vapi and Retell with OpenAI to create business-grade voice receptionists that connect seamlessly with CRMs and calendars.

Zubair Hussain Full Stack Developer thezubairh@gmail.com

Why AI Voice Agents Are Now Viable

A year ago, voice AI was a novelty — robotic, laggy, and frustrating. Today, with sub-300ms latency from providers like Vapi and Retell, combined with GPT-4o's natural turn-taking, we've crossed the threshold where these agents are genuinely useful for business workflows.

In this article I'll walk you through building a production-grade AI voice receptionist — one that can handle inbound calls, book appointments, update CRM records, and hand off to humans when needed. This is based on real systems I've shipped for clients.

The magic isn't in the AI model — it's in the orchestration layer that connects voice to real business data.

Tools We'll Use

🎙️ Vapi
Handles telephony, STT/TTS, and real-time conversation management. Best developer experience in the space.

🔁 Retell AI
Alternative to Vapi with a slightly different latency profile. Great for high-volume outbound dialing.

🧠 OpenAI GPT-4o
The reasoning core. Handles intent detection, slot-filling, and deciding when to call external tools.

⚙️ Node.js
Our backend runtime. Fast enough for real-time webhook handling, excellent ecosystem for API integrations.

System Architecture

Before writing any code, it's worth understanding how data flows in a voice agent system. A caller dials in, the telephony layer streams audio, STT converts it to text, your LLM decides what to do, and TTS streams audio back — all within a few hundred milliseconds.

📞 Caller → Vapi / Retell → STT → GPT-4o → Your Webhook → CRM / Calendar → Response → TTS → Caller

Your Node.js webhook server is the brain of the operation. Vapi calls it whenever the AI needs to take an action — booking a slot, looking up a customer record, or transferring the call. Keeping this server fast (under 800ms) is critical to conversation quality.
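One way to hold that budget is to race every downstream call against a deadline. The sketch below is a minimal illustration, not part of Vapi or any SDK mentioned here: `withTimeout` is a hypothetical helper that resolves to a fallback value when the real work takes too long, and the delays are simulated.

```javascript
// Hypothetical helper: resolve with `fallback` if `promise` takes longer than `ms`.
function withTimeout(promise, ms, fallback) {
  let timer;
  const deadline = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Example: a slow lookup (simulated with a 50 ms delay) against a 10 ms budget.
const slowLookup = new Promise((resolve) => setTimeout(() => resolve('real data'), 50));
withTimeout(slowLookup, 10, 'fallback').then((value) => console.log(value)); // prints "fallback"
```

In a real handler the fallback would probably be a short message the agent can speak ("let me check on that"), and the budget would sit somewhat below 800 ms to leave room for network overhead. A rejected promise still rejects, so error handling stays separate from the deadline.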

Implementation

1 · Setting Up the Vapi Assistant

Start by creating an assistant via the Vapi API. The key parameters are the system prompt, voice selection, and your webhook URL where function calls will be dispatched.

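import { VapiClient } from '@vapi-ai/server-sdk';

const vapi = new VapiClient({ token: process.env.VAPI_API_KEY });

// SYSTEM_PROMPT is defined in the next section. The tool definitions
// (checkAvailability, bookAppointment, lookupCustomer, transferCall) are
// assumed to be declared elsewhere in this module.
const assistant = await vapi.assistants.create({
  name: 'ReceptionistBot',
  model: {
    provider: 'openai',
    model: 'gpt-4o',
    // The system prompt is passed as the first chat message.
    messages: [{ role: 'system', content: SYSTEM_PROMPT }],
    // checkAvailability is included because the prompt tells the agent to call it.
    tools: [checkAvailability, bookAppointment, lookupCustomer, transferCall],
  },
  voice: {
    provider: '11labs', // Vapi's provider ID for ElevenLabs
    voiceId: 'rachel',
  },
  serverUrl: 'https://your-api.com/vapi/webhook', // receives tool calls and call events
});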
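The entries in the `tools` array are not shown above. As a rough sketch, a tool definition in the OpenAI function-calling style could look like the following. The parameter names are my assumptions, chosen to match the details the system prompt asks the agent to confirm; check the exact tool schema against the current Vapi API reference before relying on it.

```javascript
// Hypothetical tool definition in the OpenAI function-calling style.
// The parameter names below are assumptions made for this example.
const bookAppointment = {
  type: 'function',
  function: {
    name: 'bookAppointment',
    description: 'Book an appointment after the caller has confirmed all details.',
    parameters: {
      type: 'object',
      properties: {
        name: { type: 'string', description: 'Full name of the caller' },
        date: { type: 'string', description: 'Date in YYYY-MM-DD format' },
        time: { type: 'string', description: '24-hour start time, HH:MM' },
        reason: { type: 'string', description: 'Reason for the visit' },
      },
      required: ['name', 'date', 'time', 'reason'],
    },
  },
};

console.log(bookAppointment.function.name);
```

Descriptions matter more than they look: the model decides when to call a tool, and how to fill each slot, largely from this text.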

2 · The System Prompt

Your prompt is where personality and business logic live. Be explicit about the agent's role, what it can and cannot do, and how it should handle edge cases like angry customers or questions outside its scope.

const SYSTEM_PROMPT = `
You are Alex, a professional receptionist at Acme Dental Clinic.

Your role:
- Answer inbound calls warmly and professionally
- Book, reschedule, or cancel appointments
- Look up patient records to personalize responses
- Transfer to a human agent for billing or emergencies

Rules:
- Never make up appointment slots — always call checkAvailability first
- If the caller seems distressed, offer to transfer immediately
- Keep responses concise — this is a phone call, not a chat
- Always confirm details before booking: name, date, time, reason
`;
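If you run the same agent for several clients, it can help to generate the prompt from per-client settings instead of hand-editing a template string. This is a small sketch under that assumption; `buildSystemPrompt` and its option names are hypothetical.

```javascript
// Hypothetical helper: build the receptionist prompt from per-client settings.
function buildSystemPrompt({ agentName, businessName, duties, rules }) {
  const bullets = (items) => items.map((item) => `- ${item}`).join('\n');
  return [
    `You are ${agentName}, a professional receptionist at ${businessName}.`,
    '',
    'Your role:',
    bullets(duties),
    '',
    'Rules:',
    bullets(rules),
  ].join('\n');
}

const prompt = buildSystemPrompt({
  agentName: 'Alex',
  businessName: 'Acme Dental Clinic',
  duties: ['Answer inbound calls warmly and professionally', 'Book, reschedule, or cancel appointments'],
  rules: ['Never make up appointment slots', 'Keep responses concise'],
});
console.log(prompt);
```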

3 · Webhook Handler

When the AI decides to call a tool, Vapi sends a POST request to your webhook. Your job is to execute the function and return a result within ~800ms.

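import express from 'express';
import { bookAppointmentInCal, checkAvailability } from './calendar.js';
// getCRMContact is assumed to be exported from crm.js alongside upsertCRMContact.
import { upsertCRMContact, getCRMContact } from './crm.js';
// logCallSummary is assumed to live in a logging module that is not shown here.
import { logCallSummary } from './logging.js';

const app = express();
app.use(express.json());

app.post('/vapi/webhook', async (req, res) => {
  // Vapi wraps server events in a `message` object; fall back to the raw body otherwise.
  const { type, call, toolCallList = [] } = req.body.message ?? req.body;

  if (type === 'tool-calls') {
    const results = await Promise.all(
      toolCallList.map(async (toolCall) => {
        const { name, arguments: rawArgs } = toolCall.function;
        // Arguments may arrive as a JSON string or as an already-parsed object.
        const args = typeof rawArgs === 'string' ? JSON.parse(rawArgs) : rawArgs;

        try {
          if (name === 'checkAvailability') {
            const slots = await checkAvailability(args);
            return { toolCallId: toolCall.id, result: JSON.stringify(slots) };
          }

          if (name === 'bookAppointment') {
            // Write to the CRM only after the booking has succeeded.
            const slot = await bookAppointmentInCal(args);
            await upsertCRMContact({ phone: call.customer.number, ...args });
            return { toolCallId: toolCall.id, result: JSON.stringify(slot) };
          }

          if (name === 'lookupCustomer') {
            const customer = await getCRMContact(args.phone);
            return { toolCallId: toolCall.id, result: JSON.stringify(customer) };
          }

          // Every tool call needs a result, including ones this server does not know.
          return { toolCallId: toolCall.id, result: `Unknown tool: ${name}` };
        } catch (err) {
          console.error(`Tool ${name} failed:`, err);
          return { toolCallId: toolCall.id, result: 'The request failed. Offer to transfer the caller.' };
        }
      })
    );

    return res.json({ results });
  }

  // Handle call-end events for logging
  if (type === 'end-of-call-report') {
    await logCallSummary(call);
  }

  res.json({ received: true });
});

app.listen(process.env.PORT ?? 3000);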
💡 Run your calendar and CRM calls in parallel with Promise.all wherever possible. Sequential awaits compound latency, and the difference between a natural conversation and an awkward one is often just 300ms.
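To make the difference concrete, here is a small sketch with two simulated lookups. `fakeCalendarLookup` and `fakeCrmLookup` are placeholders, and the 100 ms delays stand in for real network calls.

```javascript
// Placeholder async calls; the delays simulate network latency.
const delay = (ms, value) => new Promise((resolve) => setTimeout(() => resolve(value), ms));
const fakeCalendarLookup = () => delay(100, { slots: ['09:00', '09:30'] });
const fakeCrmLookup = () => delay(100, { name: 'Jane Doe' });

// Sequential: total time is roughly the sum of both calls.
async function loadSequential() {
  const calendar = await fakeCalendarLookup();
  const customer = await fakeCrmLookup();
  return { calendar, customer };
}

// Parallel: both calls start immediately, so total time is roughly the slower one.
async function loadParallel() {
  const [calendar, customer] = await Promise.all([fakeCalendarLookup(), fakeCrmLookup()]);
  return { calendar, customer };
}
```

This only applies to calls that are independent of each other. A booking followed by a CRM write is a reasonable case for staying sequential, since the CRM should only record appointments that were actually booked.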

CRM & Calendar Integrations

The voice layer is only as useful as the data it can read and write. Here's how I wire up the two most common integrations for business receptionists.

Google Calendar — Slot Checking

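import { google } from 'googleapis';

// getOAuthClient and findFreeSlots are helpers assumed to be defined elsewhere in this module.
const calendar = google.calendar({ version: 'v3', auth: getOAuthClient() });

export async function checkAvailability({ date, durationMins = 30 }) {
  // `date` is expected as YYYY-MM-DD. Without an offset these timestamps are parsed
  // in the server's local timezone, so run the server in the clinic's timezone
  // or append an explicit offset.
  const dayStart = new Date(`${date}T08:00:00`);
  const dayEnd   = new Date(`${date}T18:00:00`);

  const { data } = await calendar.freebusy.query({
    requestBody: {
      timeMin: dayStart.toISOString(),
      timeMax: dayEnd.toISOString(),
      items: [{ id: process.env.CALENDAR_ID }],
    },
  });

  // Treat a missing calendar entry as "no busy blocks" instead of throwing.
  const busy = data.calendars?.[process.env.CALENDAR_ID]?.busy ?? [];
  return findFreeSlots(dayStart, dayEnd, busy, durationMins);
}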
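`findFreeSlots` is not shown in the snippet. One possible implementation, assuming fixed-length slots aligned to the start of the working day and busy blocks in the free/busy response shape (`{ start, end }` as ISO strings), is sketched here:

```javascript
// Hypothetical implementation of findFreeSlots.
// `busy` is assumed to look like [{ start: ISO string, end: ISO string }, ...].
function findFreeSlots(dayStart, dayEnd, busy, durationMins) {
  const stepMs = durationMins * 60 * 1000;
  const busyRanges = busy.map(({ start, end }) => [Date.parse(start), Date.parse(end)]);
  const slots = [];

  for (let t = dayStart.getTime(); t + stepMs <= dayEnd.getTime(); t += stepMs) {
    const slotEnd = t + stepMs;
    // Two intervals overlap when each one starts before the other ends.
    const overlaps = busyRanges.some(([busyStart, busyEnd]) => t < busyEnd && busyStart < slotEnd);
    if (!overlaps) {
      slots.push({ start: new Date(t).toISOString(), end: new Date(slotEnd).toISOString() });
    }
  }
  return slots;
}

const free = findFreeSlots(
  new Date('2024-11-04T08:00:00Z'),
  new Date('2024-11-04T10:00:00Z'),
  [{ start: '2024-11-04T08:30:00Z', end: '2024-11-04T09:15:00Z' }],
  30
);
console.log(free.map((slot) => slot.start));
// [ '2024-11-04T08:00:00.000Z', '2024-11-04T09:30:00.000Z' ]
```

A slot that ends exactly when a busy block starts counts as free. For a phone agent it is usually worth returning only the first few slots, since reading ten options aloud is tedious for the caller.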

HubSpot CRM — Contact Upsert

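import { Client } from '@hubspot/api-client';

const hubspot = new Client({ accessToken: process.env.HUBSPOT_TOKEN });

export async function upsertCRMContact({ phone, name, email, appointmentTime }) {
  // Look for an existing contact with this exact phone string.
  const existing = await hubspot.crm.contacts.searchApi.doSearch({
    filterGroups: [{ filters: [{ propertyName: 'phone', operator: 'EQ', value: phone }] }],
    limit: 1,
  });

  // First word becomes the first name; everything after it becomes the last name,
  // so multi-word surnames are not truncated.
  const [firstname, ...rest] = (name ?? '').trim().split(/\s+/);
  const props = {
    phone,
    firstname: firstname || undefined,
    lastname: rest.join(' ') || undefined,
    email,
    // Assumed to be a custom property; it must already exist in your HubSpot account.
    last_appointment_booked: appointmentTime,
  };

  if (existing.total > 0) {
    return hubspot.crm.contacts.basicApi.update(existing.results[0].id, { properties: props });
  }
  return hubspot.crm.contacts.basicApi.create({ properties: props });
}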
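The `EQ` filter only matches when the stored phone string is identical to the one you search for, so it pays to normalize numbers before both searching and saving. The helper below is a rough sketch, not a replacement for a proper phone library: `normalizePhone` is hypothetical, and it assumes that 10-digit numbers without a country code are US numbers.

```javascript
// Hypothetical helper: reduce a phone number to a "+<digits>" form before searching or saving.
// Assumption: 10-digit numbers without a country code belong to `defaultCountryCode`.
function normalizePhone(raw, defaultCountryCode = '1') {
  const trimmed = String(raw).trim();
  const digits = trimmed.replace(/\D/g, '');
  if (digits.length === 0) return null;
  if (trimmed.startsWith('+')) return `+${digits}`;
  if (digits.length === 10) return `+${defaultCountryCode}${digits}`;
  return `+${digits}`;
}

console.log(normalizePhone('(415) 555-0132'));   // +14155550132
console.log(normalizePhone('+44 20 7946 0958')); // +442079460958
```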

Production Checklist

  1. Add a fallback transfer to a human agent for any unhandled intent or if tool calls fail — callers should never hit a dead end.
  2. Log every call with a full transcript and tool call history to your database. You'll need this for debugging and compliance.
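  3. Set up a queue for webhook retries. Vapi will retry on 5xx, so make your handlers idempotent with a stable deduplication key: call.id works for once-per-call events such as the end-of-call report, while tool calls need their own toolCall.id because a single call can trigger several.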
  4. Monitor latency per-tool in your APM. Anything over 600ms is a conversation killer and needs caching or query optimization.
  5. Test with real phone calls before launch — TTS sounds different over telephony compression than in browser previews.
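For the idempotency item, the core idea fits in a few lines. This is a sketch that assumes an in-memory `Map` is acceptable for illustration; `runOnce` is a hypothetical helper, the key would be whatever stable ID the event carries (a call ID or tool call ID), and a real deployment would more likely keep this state in Redis or a database so it survives restarts.

```javascript
// Hypothetical in-memory deduplication keyed by a stable event ID.
const completed = new Map();

async function runOnce(key, work) {
  // A retry (or a concurrent duplicate) reuses the stored promise instead of redoing the work.
  if (completed.has(key)) return completed.get(key);
  const pending = work();
  completed.set(key, pending);
  try {
    return await pending;
  } catch (err) {
    completed.delete(key); // Failed attempts stay retryable.
    throw err;
  }
}

// Example: the same key delivered twice only books once.
let bookings = 0;
const book = async () => {
  bookings += 1;
  return { confirmed: true };
};
Promise.all([runOnce('toolcall_123', book), runOnce('toolcall_123', book)])
  .then(() => console.log(bookings)); // prints 1
```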

Production voice agents live or die by their fallback paths. Happy-path testing isn't enough — hammer the edge cases.
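A fallback path can be as simple as a wrapper that turns any tool failure into an instruction the model can act on. The sketch below assumes tool results are returned as `{ toolCallId, result }` objects, as in the webhook handler; `runToolSafely` and the wording of the instruction are hypothetical.

```javascript
// Hypothetical wrapper: a failing tool never surfaces as an error to the caller.
// The result string is what the model reads, so it says what to do next.
async function runToolSafely(toolCallId, handler, args) {
  try {
    const data = await handler(args);
    return { toolCallId, result: JSON.stringify(data) };
  } catch (err) {
    console.error(`Tool call ${toolCallId} failed:`, err);
    return {
      toolCallId,
      result: JSON.stringify({
        error: true,
        instruction: 'Apologize briefly and offer to transfer the caller to a human agent.',
      }),
    };
  }
}
```

Whether the model follows the instruction depends on the system prompt, so it is worth testing this path on real calls with the downstream service deliberately switched off.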

Retell as an Alternative

If you're running high-volume outbound campaigns, Retell's concurrency model and built-in retry logic make it a strong choice. The integration pattern is nearly identical — swap the SDK and adjust webhook field names. The biggest practical difference is that Retell gives you a built-in agent dashboard for non-technical stakeholders, which clients often appreciate.

For inbound receptionists where conversation quality matters most, I still lean toward Vapi — the latency is marginally better and the developer tooling is ahead. Benchmark both against your specific use case before committing.

Wrapping Up

AI voice agents have moved from demos to genuine business infrastructure. The stack is surprisingly approachable: Vapi handles the telephony complexity, GPT-4o handles the reasoning, and your Node.js webhook is where business logic lives. The hard work is in the integrations — fast, reliable, idempotent.

If you're building something similar or want to talk through an architecture, reach out. I'm always happy to dig into the details.