How to Easily Implement Text-To-Speech Functionality In Your Next JavaScript Project
No funky installations. Beginner-Friendly. Get it done with less than 5 lines of code.
Hey friends,
It has been a hot minute since we implemented something really cool. The last time was probably February 12 when you built a web scraper with me.
Today, I am going to quickly show you how you can easily implement Text-to-Speech functionality with less than 5 lines of code. I’m currently doing this for my senior project, MataChat, since one of the requirements is to add accessibility features. Feel free to check that out.
MataChat is being built by a team of university seniors and master's students, and none of us knew how easy it was to implement Text-To-Speech functionality. Give me a few minutes to show you how it works.
Let’s get to it.
Web Speech API
To get started, we do not need to install anything funky on our computer. We are simply going to use the Web Speech API, which is built into most modern browsers, so there is no heavy behind-the-scenes process. There is no need to import anything. You just use it!
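Since nothing gets installed, the only thing that can go wrong at this stage is a browser that does not ship the API. Here is a minimal sketch of a check you could run first. The function name supportsTextToSpeech is made up for this example; the two property names it looks for are the ones used throughout this article.

```javascript
// Hypothetical helper: returns true when the given global object exposes
// both pieces of the Web Speech API that this article relies on.
function supportsTextToSpeech(globalObject) {
  return (
    typeof globalObject === "object" &&
    globalObject !== null &&
    "speechSynthesis" in globalObject &&
    "SpeechSynthesisUtterance" in globalObject
  );
}

// Only touch `window` when the code actually runs in a browser.
if (typeof window !== "undefined") {
  console.log("Text-to-speech available:", supportsTextToSpeech(window));
}
```

If the check comes back false, you can hide your "read aloud" button instead of letting the page throw an error.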
Speech Synthesis Interface
We are going to get started with one of the Web Speech API's interfaces, SpeechSynthesisUtterance, which represents a piece of text to be spoken. This is how we create one in our project:
let utterance = new SpeechSynthesisUtterance();
OR
let utterance = new window.SpeechSynthesisUtterance();
Both achieve the same result. Pick one.
Basic Text-to-Speech
For text-to-speech, we want to convert text to speech. Obviously.
Let’s say I want to convert the text “Hello World” into speech.
The first step is to create an utterance. Basically, let the interface know what you want to be vocalized. In our case, we want to say “Hello World”. Here are a few ways we can create an utterance. Pick one.
// Take 1
let utterance = new SpeechSynthesisUtterance("Hello World");
// Take 2
let message = "Hello World";
let utterance = new SpeechSynthesisUtterance(message);
// Take 3
let utterance = new SpeechSynthesisUtterance();
utterance.text = "Hello World";
// Take 4
let utterance = new SpeechSynthesisUtterance();
utterance["text"] = "Hello World";
All of these will produce the same end result. Do whatever makes sense to you. Pick one.
By this point, we have an utterance. We told the interface what we want to be said. Now we have to tell the browser to start vocalizing our utterance. Basically, tell it to start talking. To do this, we will use the speak method of speechSynthesis. It will look like the following:
speechSynthesis.speak(utterance);
Notice that the argument passed to the speak method is our utterance. When this line is executed, “Hello World” will be converted to speech.
Here are some other examples:
let utterance = new SpeechSynthesisUtterance("Hello There");
speechSynthesis.speak(utterance);
let spokenWord = new SpeechSynthesisUtterance("Hello Adam");
speechSynthesis.speak(spokenWord);
let message = new SpeechSynthesisUtterance("Hello Becky");
speechSynthesis.speak(message);
This is provided to show you that the variable that holds your utterance can be named anything you want (in case it was not obvious to our new developers). And despite the fact that it can be named anything you want, you should make sure that the name makes sense and is self-explanatory. If other people were to read your code OR you were to read your code a year later, your code should make sense. It should be readable and maintainable.
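To tie the two steps together (create an utterance, then speak it), here is a small sketch that wraps them in one function. The name say and its parameters are my own invention, not part of the API. Passing the synthesizer and the constructor in as arguments is just a convenience so the function can be tried outside a browser. One assumption worth flagging: some browsers only allow speech after the user interacts with the page, so the browser part below waits for a click.

```javascript
// Hypothetical helper: build an utterance for `text` and hand it to the synthesizer.
// `synth` is expected to look like window.speechSynthesis and
// `Utterance` like window.SpeechSynthesisUtterance.
function say(text, synth, Utterance) {
  const utterance = new Utterance(text); // step 1: what should be said
  synth.speak(utterance);                // step 2: start talking
  return utterance;
}

// Browser-only usage: speak once, on the first click anywhere on the page.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  document.addEventListener(
    "click",
    () => say("Hello World", window.speechSynthesis, window.SpeechSynthesisUtterance),
    { once: true }
  );
}
```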
And there you have it!
CodePen
If you want to see a working version, check out this CodePen. It’s very simple and easy to understand.
When you click the button, you should hear “Hello World.”
Updating The Utterance
let utterance = new SpeechSynthesisUtterance("Hello World");
speechSynthesis.speak(utterance);
This is what we have thus far. Imagine after many lines of code, we want to update our utterance so we can say something else. This is how you can update the utterance:
let utterance = new SpeechSynthesisUtterance("Hello World");
speechSynthesis.speak(utterance);
utterance.text = "Goodbye World."; //updates value
speechSynthesis.speak(utterance);
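Of course, you can also just create a new utterance, and that is often the safer choice. The Web Speech API specification leaves it undefined whether changes made to an utterance after speak is called, and before it has finished speaking, affect what is spoken. So if you reuse the utterance like in the example above, it is best to wait until the first phrase has finished. Utterances are small objects, so the memory you save by reusing one is negligible.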
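For comparison, here is a minimal sketch of the create-a-new-utterance approach for several phrases. The speak method adds each utterance to a queue, so they play one after another. The function name speakAll and its parameters are hypothetical. Each utterance is a small object, so for a handful of phrases the extra memory is unlikely to matter.

```javascript
// Hypothetical helper: queue one utterance per phrase.
// `synth` stands in for window.speechSynthesis and
// `Utterance` for window.SpeechSynthesisUtterance.
function speakAll(phrases, synth, Utterance) {
  return phrases.map((phrase) => {
    const utterance = new Utterance(phrase);
    synth.speak(utterance); // queued behind any utterance that is still waiting or speaking
    return utterance;
  });
}

// Browser-only usage: queue both phrases on the first click.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  document.addEventListener(
    "click",
    () =>
      speakAll(
        ["Hello World", "Goodbye World."],
        window.speechSynthesis,
        window.SpeechSynthesisUtterance
      ),
    { once: true }
  );
}
```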
Customizations
You can make a few customizations to the speech. You can change the pitch, volume, rate of speech, and even the voice. (MDN Documentation) Let’s check that out.
[1] Change the Pitch
If you want to change the pitch of the utterance as it's spoken, you can lower or raise the pitch. By default, the value of the pitch is at 1. The accepted range is between 0 and 2.
Here is an example:
let utterance = new SpeechSynthesisUtterance("Hello World");
utterance.pitch = 1.5; //changes pitch (must be between 0 and 2)
speechSynthesis.speak(utterance);
[2] Change the Rate
If you want to change the rate at which the utterance is spoken, you can lower or raise the rate. (Basically how fast or slow the speaker talks.) By default, the value of the rate is at 1. The accepted range is between 0.1 and 10.
Here is an example:
let utterance = new SpeechSynthesisUtterance("Hello World");
utterance.rate = 4; //changes rate
speechSynthesis.speak(utterance);
[3] Change the Volume
If you want to change the volume of the speech, you can lower or raise it. By default, the value of the volume is at 1. The accepted range is between 0 and 1.
Here is an example:
let utterance = new SpeechSynthesisUtterance("Hello World");
utterance.volume = 0.4; //changes volume
speechSynthesis.speak(utterance);
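Since each of these three settings has its own accepted range, it is easy to pass a value that falls outside of it. Here is a small sketch that keeps values inside the ranges quoted above before assigning them. The names RANGES, clamp, and applySettings are made up for this example. Clamping is my own design choice here, not something the API requires you to do.

```javascript
// Accepted ranges as listed in this article: [minimum, maximum].
const RANGES = { pitch: [0, 2], rate: [0.1, 10], volume: [0, 1] };

// Keep `value` between `min` and `max`.
function clamp(value, [min, max]) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copy pitch, rate, and volume onto the utterance,
// pulling any out-of-range number back to the nearest allowed value.
function applySettings(utterance, settings) {
  for (const key of Object.keys(RANGES)) {
    if (typeof settings[key] === "number") {
      utterance[key] = clamp(settings[key], RANGES[key]);
    }
  }
  return utterance;
}

// Browser-only usage: prepare an utterance with all three settings at once.
if (typeof window !== "undefined" && "SpeechSynthesisUtterance" in window) {
  const utterance = applySettings(new SpeechSynthesisUtterance("Hello World"), {
    pitch: 1.5,
    rate: 0.8,
    volume: 0.4,
  });
  console.log(utterance.pitch, utterance.rate, utterance.volume);
}
```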
[4] Change the Voice
If you want to change the voice of the speaker, you can change it. The interface provides an array of voices. You can retrieve this array of voices using speechSynthesis.getVoices(), where getVoices() is a method of the speechSynthesis interface.
Here is an example:
let utterance = new SpeechSynthesisUtterance("Hello World");
let voicesArray = speechSynthesis.getVoices();
utterance.voice = voicesArray[2];
speechSynthesis.speak(utterance);
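If you are using CodePen to test out these different voices, you might not notice a change in the voices right away. The likely reason is that some browsers, such as Chrome, load the list of voices asynchronously, so getVoices() can return an empty array the first time it is called, and voicesArray[2] is then undefined. The browser fires a voiceschanged event on speechSynthesis once the list is ready. On CodePen, saving the pen (when you change the voice) and giving it maybe 10 or 20 seconds to load the new voice usually does the trick.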
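The list of voices also differs from one browser and operating system to the next, so a fixed index like 2 will not point at the same voice everywhere. Here is a hedged sketch that picks a voice by language instead and looks again once the browser reports that its voices have changed. The function name pickVoice is hypothetical. The lang and name properties are standard on each voice object.

```javascript
// Hypothetical helper: return the first voice whose language tag starts with
// `langPrefix` (for example "en"), or null when no voice matches.
function pickVoice(voices, langPrefix) {
  return voices.find((voice) => voice.lang.startsWith(langPrefix)) || null;
}

// Browser-only usage: try right away, then again when the voice list changes.
if (typeof window !== "undefined" && "speechSynthesis" in window) {
  const synth = window.speechSynthesis;
  const logChoice = () => {
    const voice = pickVoice(synth.getVoices(), "en");
    console.log(voice ? "Picked voice: " + voice.name : "No matching voice loaded yet");
  };
  logChoice();
  synth.onvoiceschanged = logChoice; // runs when the browser updates its voice list
}
```

Once you have a voice, assign it to utterance.voice before calling speak, just like in the example above.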
Here is the CodePen. Change the values. Add some new code. Maybe create an input box that takes in text and converts it to speech. Play around with it. See what happens.
Thank you for reading. It has been fun!
If you want to learn how to use other Web APIs, give me a follow. We’re definitely going to talk about converting Speech-to-Text and getting a user’s location. I used the Geolocation API for my weather application a few months ago. Should I do a walkthrough or maybe an improved second version?
There are lots of fun things to learn.
Have a good day. Keep coding!