How to Implement Twilio Media Streams API for Voice Transcription in JavaScript


October 31, 2024

This step-by-step guide shows how to use the Twilio Media Streams API with JavaScript (Node.js) to transcribe phone calls in real time.


 

Set Up the Environment for Twilio Media Streams

 

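  • Install Node.js. An actively supported LTS release is the safest choice for compatibility with the Twilio SDK and the other packages used in this guide.
  • Install the Twilio Node.js SDK, Express, the 'ws' WebSocket library, and the Google Cloud Speech-to-Text client by running this command in your project directory:

 

npm install twilio express ws @google-cloud/speech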

 

Configure Your Twilio Client

 

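  • Import the Twilio module and, if you plan to call Twilio's REST API, initialize a client with your Account SID and Auth Token. The TwiML used later in this guide is built with the module alone and does not need the client.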

 

const twilio = require('twilio');
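// Read credentials from environment variables instead of hard-coding them.
// Named twilioClient so it does not clash with the Speech-to-Text client later on.
const twilioClient = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);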
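
Hard-coded credentials are easy to leak through version control. A minimal sketch of failing fast at startup when they are missing is shown below; loadTwilioCredentials is a hypothetical helper (not part of the Twilio SDK), and the variable names TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN are a common convention assumed here. The format check relies only on Account SIDs being 34 characters long and starting with "AC".

```javascript
// Hypothetical helper: read Twilio credentials from an environment-like object
// and fail fast with a clear message instead of letting the SDK reject them later.
function loadTwilioCredentials(env) {
  const accountSid = env.TWILIO_ACCOUNT_SID;
  const authToken = env.TWILIO_AUTH_TOKEN;

  if (!accountSid || !authToken) {
    throw new Error('Set TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN before starting the server');
  }
  // Account SIDs are "AC" followed by 32 hexadecimal characters.
  if (!/^AC[0-9a-fA-F]{32}$/.test(accountSid)) {
    throw new Error('TWILIO_ACCOUNT_SID does not look like an Account SID');
  }
  return { accountSid, authToken };
}

// Usage in the server: const { accountSid, authToken } = loadTwilioCredentials(process.env);
```

In the server itself you would call loadTwilioCredentials(process.env) once and pass the two values to twilio(...).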

 

Set Up WebSocket Server

 

  • Twilio sends the call audio to your application over a WebSocket connection, so you need a WebSocket server running on top of a Node.js HTTP server.
  • Use the 'ws' library for WebSocket communication.

 

const WebSocket = require('ws');
const http = require('http');

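const server = http.createServer();
// Only accept WebSocket upgrades on /ws, the path used in the TwiML <Stream> URL
const wss = new WebSocket.Server({ server, path: '/ws' });

wss.on('connection', (ws) => {
    console.log('New WebSocket connection established');

    ws.on('message', (message) => {
        // Twilio sends JSON text frames (connected, start, media, stop, ...), not raw audio
        const msg = JSON.parse(message.toString());
        if (msg.event !== 'media') {
            // Log control events; media frames arrive many times per second
            console.log(`Received event: ${msg.event}`);
        }
        // Audio from 'media' events is handled in the transcription step
    });

    ws.on('close', () => {
        console.log('WebSocket connection closed');
    });
});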

server.listen(8080, () => {
    console.log('Listening on port 8080');
});
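
Twilio does not send raw audio over this socket. Each WebSocket message is a JSON text frame whose event field identifies its type: connected when the socket opens, start with call and media-format metadata, media with base64-encoded 8 kHz mu-law audio in media.payload, stop when the stream ends, plus others such as mark. The sketch below normalizes one frame; parseTwilioMessage is a hypothetical helper that reads only the fields named here, so check Twilio's Media Streams message reference for the full schema.

```javascript
// Hypothetical helper: normalize one Twilio Media Streams message.
// `raw` may be a string or a Buffer (the 'ws' library delivers Buffers by default).
function parseTwilioMessage(raw) {
  const msg = JSON.parse(raw.toString());

  switch (msg.event) {
    case 'connected':
      return { event: 'connected' };
    case 'start':
      // Metadata about the call and the audio format.
      return {
        event: 'start',
        streamSid: msg.streamSid,
        callSid: msg.start.callSid,
        mediaFormat: msg.start.mediaFormat,
      };
    case 'media':
      // Decode the base64 payload into raw mu-law bytes (one byte per sample).
      return {
        event: 'media',
        streamSid: msg.streamSid,
        track: msg.media.track,
        audio: Buffer.from(msg.media.payload, 'base64'),
      };
    case 'stop':
      return { event: 'stop', streamSid: msg.streamSid };
    default:
      // e.g. 'mark': pass the event name through so callers can log or ignore it.
      return { event: msg.event, streamSid: msg.streamSid };
  }
}
```

Inside the 'message' handler you could then branch on the returned event and hand the audio Buffer of media events to the transcription code.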

 

Integrate Twilio Media Streams with WebSocket

 

  • Return TwiML from your Voice webhook that uses <Connect><Stream> to send the call's audio to your WebSocket server. Replace your-ngrok-url with your own public host.

 

const express = require('express');
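const VoiceResponse = twilio.twiml.VoiceResponse; // 'twilio' is the module required earlier

const app = express();

// Voice webhook: Twilio requests this URL when a call comes in. The TwiML reply
// tells Twilio to stream the call audio to the WebSocket endpoint.
app.post('/twilio-media-stream', (req, res) => {
    const response = new VoiceResponse();
    const connect = response.connect();
    connect.stream({
        url: 'wss://your-ngrok-url/ws'
    });

    res.type('text/xml');
    res.send(response.toString());
});

const httpServer = app.listen(3000, () => {
    console.log('Express server listening on port 3000');
});

// ngrok (next step) exposes only port 3000, so pass WebSocket upgrade requests
// arriving here to the wss server created in the WebSocket step.
httpServer.on('upgrade', (request, socket, head) => {
    wss.handleUpgrade(request, socket, head, (ws) => {
        wss.emit('connection', ws, request);
    });
});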
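
To see what the webhook actually returns, the sketch below builds an equivalent TwiML document by hand. buildStreamTwiml and escapeXmlAttr are hypothetical helpers for illustration only; in the application, keep using VoiceResponse, whose exact serialization may differ slightly in formatting.

```javascript
// Escape characters that are not allowed inside an XML attribute value.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: TwiML that connects the call to a Media Stream.
function buildStreamTwiml(streamUrl) {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    '<Response><Connect>' +
    `<Stream url="${escapeXmlAttr(streamUrl)}"/>` +
    '</Connect></Response>'
  );
}

console.log(buildStreamTwiml('wss://your-ngrok-url/ws'));
```

With <Connect>, Twilio keeps the call on the stream until your server closes the WebSocket, so the caller hears only audio that your server sends back. If you only want to listen while the rest of your TwiML runs, Twilio's <Start><Stream> forks the audio instead.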

 

Use ngrok for Local Testing

 

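  • Use ngrok to expose your local server to the internet so that Twilio can reach both the voice webhook and the WebSocket endpoint.
  • After starting ngrok, copy the https:// forwarding URL it prints. Set your Twilio phone number's Voice webhook for incoming calls to https://<forwarding-host>/twilio-media-stream (HTTP POST), and replace your-ngrok-url in the <Stream> URL with the same host.
  • Start ngrok against the port the Express server listens on (3000):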

 

ngrok http 3000
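
ngrok prints an https:// forwarding URL, and the same host appears twice in the Twilio setup: once in the Voice webhook URL and once, with the wss:// scheme, in the <Stream> URL. A small hypothetical helper, urlsFromForwardingUrl, can derive both from one value, assuming the paths /twilio-media-stream and /ws used in this guide; how you pass the forwarding URL in (for example an environment variable) is up to you.

```javascript
// Hypothetical helper: derive the webhook and Media Stream URLs from an ngrok
// forwarding URL such as "https://abc123.ngrok-free.app".
function urlsFromForwardingUrl(forwardingUrl) {
  // Voice webhook served by the Express route
  const webhook = new URL('/twilio-media-stream', forwardingUrl);

  // Same host, WebSocket-over-TLS scheme, /ws path for the <Stream> URL
  const stream = new URL('/ws', forwardingUrl);
  stream.protocol = 'wss:';

  return { webhookUrl: webhook.toString(), streamUrl: stream.toString() };
}
```

Deriving both URLs from one value avoids the common mistake of updating the webhook after restarting ngrok but leaving a stale host in the <Stream> URL.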

 

Handle Incoming Audio and Transcription

 

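  • Decode the base64-encoded audio in each media message and send it to a transcription service such as Google Cloud Speech-to-Text. Twilio streams 8 kHz mu-law audio, so configure the service with MULAW encoding and an 8000 Hz sample rate. The Google client authenticates with Application Default Credentials, for example a service-account key referenced by the GOOGLE_APPLICATION_CREDENTIALS environment variable.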

 

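const speech = require('@google-cloud/speech');
const speechClient = new speech.SpeechClient();

// Twilio streams 8 kHz mu-law audio (one byte per sample, 8,000 bytes per second).
// Collect about 5 seconds before each synchronous recognize() call.
const BYTES_PER_BATCH = 8000 * 5;

async function transcribeAudio(audioBuffer) {
    const request = {
        audio: {
            content: audioBuffer.toString('base64')
        },
        config: {
            encoding: 'MULAW',
            sampleRateHertz: 8000,
            languageCode: 'en-US'
        }
    };

    const [response] = await speechClient.recognize(request);
    const transcription = response.results
        .map(result => result.alternatives[0].transcript)
        .join('\n');
    if (transcription) {
        console.log(`Transcription: ${transcription}`);
    }
}

wss.on('connection', (ws) => {
    let chunks = [];
    let bufferedBytes = 0;

    // Send the audio collected so far for transcription and start a new batch
    const flush = () => {
        if (bufferedBytes === 0) return;
        const audio = Buffer.concat(chunks);
        chunks = [];
        bufferedBytes = 0;
        transcribeAudio(audio).catch(err => console.error('Transcription failed:', err));
    };

    ws.on('message', (message) => {
        const msg = JSON.parse(message.toString());
        if (msg.event === 'media') {
            const audio = Buffer.from(msg.media.payload, 'base64'); // mu-law bytes
            chunks.push(audio);
            bufferedBytes += audio.length;
            if (bufferedBytes >= BYTES_PER_BATCH) flush();
        } else if (msg.event === 'stop') {
            flush();
        }
    });

    ws.on('close', flush);
});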
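
Google Speech-to-Text accepts MULAW audio directly, so the code above needs no conversion. If you switch to a transcription service that only accepts 16-bit linear PCM, you would have to expand each mu-law byte first. The sketch below is the standard ITU-T G.711 mu-law expansion written out by hand; mulawByteToLinear and mulawToPcm16 are hypothetical helper names, and the output is little-endian 16-bit PCM at the same 8 kHz sample rate.

```javascript
// Expand one G.711 mu-law byte into a signed 16-bit linear sample.
function mulawByteToLinear(byte) {
  const u = ~byte & 0xff; // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  // Rebuild the magnitude from segment (exponent) and step (mantissa), then remove the 0x84 bias.
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Hypothetical helper: convert a Buffer of mu-law bytes to 16-bit little-endian PCM.
function mulawToPcm16(mulaw) {
  const pcm = Buffer.alloc(mulaw.length * 2);
  for (let i = 0; i < mulaw.length; i++) {
    pcm.writeInt16LE(mulawByteToLinear(mulaw[i]), i * 2);
  }
  return pcm;
}
```

The output doubles in size (two bytes per sample), so a 5-second batch of 40,000 mu-law bytes becomes 80,000 bytes of PCM.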

 

Final Testing

 

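  • Call your Twilio number and speak. Your server logs should show the WebSocket connection opening, the messages Twilio sends, and the transcribed text.
  • If nothing is transcribed, work backward: check ngrok's web inspector (http://127.0.0.1:4040 by default) to confirm that Twilio requested the webhook, then confirm that a WebSocket connection was opened and that media messages are arriving before looking at the transcription service.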
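
Making a real call for every change is slow. One option while debugging is to replay synthetic messages shaped like Twilio's. fakeTwilioStream below is a hypothetical test helper: it produces a connected frame, a start frame, a run of media frames filled with mu-law silence (0xFF), and a stop frame. The call and stream identifiers, the 160-byte frame size, and the 20 ms timestamps are illustrative assumptions, not values captured from Twilio.

```javascript
// Hypothetical test helper: build the JSON frames a Twilio Media Stream might
// send for a short call, using silent mu-law audio (0xFF bytes).
function fakeTwilioStream({ streamSid = 'MZ_TEST', frames = 50, frameBytes = 160 } = {}) {
  const messages = [];
  let seq = 1;

  messages.push({ event: 'connected', protocol: 'Call', version: '1.0.0' });
  messages.push({
    event: 'start',
    sequenceNumber: String(seq++),
    streamSid,
    start: {
      streamSid,
      callSid: 'CA_TEST',
      tracks: ['inbound'],
      mediaFormat: { encoding: 'audio/x-mulaw', sampleRate: 8000, channels: 1 },
    },
  });

  for (let chunk = 1; chunk <= frames; chunk++) {
    messages.push({
      event: 'media',
      sequenceNumber: String(seq++),
      streamSid,
      media: {
        track: 'inbound',
        chunk: String(chunk),
        timestamp: String((chunk - 1) * 20), // milliseconds, assuming 20 ms frames
        payload: Buffer.alloc(frameBytes, 0xff).toString('base64'),
      },
    });
  }

  messages.push({ event: 'stop', sequenceNumber: String(seq++), streamSid, stop: { callSid: 'CA_TEST' } });
  return messages.map((m) => JSON.stringify(m));
}
```

You could send each string from a 'ws' client (new WebSocket('ws://localhost:8080/ws') followed by socket.send(message)) or move the 'message' handler into a named function and call it directly in a unit test. Silence will not produce any transcription text; the point is to exercise the message handling.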