Looking for an easy way to transcribe audio from your browser without installing complex software? WhisperWeb lets you record audio and transcribe it with OpenAI’s Whisper model through a simple web interface.
WhisperWeb is a web-based service: a small Python server runs Whisper, and the browser client records audio and sends it to that server. Nothing needs to be installed on the client side.
What You’ll Learn
- How to install and set up the WhisperWeb server
- Configure Whisper models (tiny, base, small, medium)
- Record and transcribe audio from your browser
- Choose the right model for your needs
Why Choose WhisperWeb?
📖 1. No External Software Needed
Record audio and transcribe directly from your browser. The client only needs an HTML file — no installation required on client machines.
📖 2. High Accuracy
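Powered by OpenAI’s Whisper model, a general-purpose speech recognition model trained on a large and diverse audio dataset. Actual accuracy depends on the model size, the language, and the audio quality.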
📖 3. Multiple Model Options
Choose from different Whisper models based on your needs:
| Model | Parameters | Approx. download | Speed | Accuracy | Best For |
|---|---|---|---|---|---|
| tiny | 39 M | ~75 MB | Fast | Lower | Quick transcription |
| base | 74 M | ~140 MB | Medium | Good | General use |
| small | 244 M | ~460 MB | Slow | Better | Detailed work |
| medium | 769 M | ~1.5 GB | Slowest | Best | Professional transcription |
Start with “tiny” or “base” if you have limited disk space. Use “medium” for the most accurate transcriptions.
📖 4. Multi-Language Support
Whisper supports roughly 99 languages and can also translate speech from those languages into English.
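For readers curious what translation looks like at the library level, here is a minimal sketch that calls the openai-whisper Python package directly. It is not taken from WhisperWeb's server.py, and the helper name `transcribe_file` and the file name `interview.mp3` are placeholders of mine. It relies on Whisper's `task` option, where `"translate"` produces English text and `"transcribe"` keeps the spoken language.

```python
def transcribe_file(model, audio_path, translate=False):
    """Run Whisper on one file and return (detected_language, text).

    `model` is any object with Whisper's `transcribe` method, for example
    the result of `whisper.load_model("base")`.
    """
    # "translate" outputs English; "transcribe" keeps the original language.
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return result["language"], result["text"].strip()


# Usage (needs `pip install openai-whisper` and ffmpeg on the PATH):
#   import whisper
#   model = whisper.load_model("base")
#   language, text = transcribe_file(model, "interview.mp3", translate=True)
```

Passing the model in as an argument means it is loaded once and reused for every file, which matters because loading is the slow part.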
My Journey
I used Google Translate for transcription for a long time, but I found it inaccurate and poor at understanding audio. When OpenAI released Whisper, I realized it was accurate enough for both transcription and translation. Since I wanted a web-based solution, I built WhisperWeb: a simple interface for using Whisper from any browser.
Installation
Step 1: Install Python Dependencies
pip install flask flask-cors openai-whisper
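The first time you load a model, WhisperWeb will automatically download the Whisper model files (roughly 75 MB to 1.5 GB, depending on the model), so make sure you have a stable internet connection. The openai-whisper package also relies on the ffmpeg command-line tool to decode audio; install it with your system package manager if it is missing.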
Step 2: Clone the Repository
git clone https://gitlab.com/krafi/whisperweb.git
cd whisperweb
Step 3: Run the Server
python server.py
The server will start and display something like:
Server running on http://0.0.0.0:5000
Public IP: xx.xx.xx.xx
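Note down the IP address the server prints. You’ll need it to connect from other devices; devices on the same local network can usually reach the server through its local (LAN) address on port 5000.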
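Before opening the web interface, you can confirm from another machine that the server is reachable. The sketch below calls the `/ping` endpoint listed under API Endpoints using only the Python standard library. The function name `server_is_up` is mine, and I am assuming `/ping` answers with HTTP 200 when the server is healthy; the response body is not inspected.

```python
import urllib.error
import urllib.request


def server_is_up(base_url, timeout=3.0):
    """Return True if GET {base_url}/ping answers with HTTP 200."""
    url = base_url.rstrip("/") + "/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


# Usage: replace the placeholder with the address your server printed.
#   print(server_is_up("http://YOUR_SERVER_IP:5000"))
```

If this returns False from another device but True on the server itself, a firewall or the wrong IP address is the likely cause.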
How to Use
Step 1: Access the Web Interface
Open the index.html file from the repository in your web browser. You can either:
- Open it directly in your browser:
file:///path/to/whisperweb/index.html
- Or serve it locally from the repository folder, then browse to http://localhost:8000/index.html:
python -m http.server 8000
Step 2: Configure Server Address
In the web interface, enter your server’s IP address and port:
http://YOUR_SERVER_IP:5000
Step 3: Select Whisper Model
Choose a model from the dropdown:
- tiny - Fastest, uses least resources
- base - Good balance
- small - Better accuracy
- medium - Best accuracy (1.5GB download)
Click “Set Model” to load it. On first run, the model will be downloaded.
Step 4: Record and Transcribe
- Click the record button to start recording
- Speak into your microphone
- Click stop when finished
- Click “Submit” to transcribe
- View and copy the transcription
API Endpoints
If you want to integrate WhisperWeb with your own applications:
| Endpoint | Method | Description |
|---|---|---|
| /ping | GET | Check if server is running |
| /upload | POST | Upload audio file for transcription |
| /set_model | POST | Set the Whisper model |
Example API Call
# Set model
curl -X POST http://localhost:5000/set_model -H "Content-Type: application/json" -d '{"model": "base"}'
# Upload audio for transcription
curl -X POST -F "file=@audio.wav" http://localhost:5000/upload
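The same two calls can be made from Python. The sketch below mirrors the curl commands above with the standard library only: a JSON body with a `model` key for `/set_model`, and a multipart form with a `file` field for `/upload`. The helper names are mine, and because the response format is not documented here, both functions simply return the raw response text.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def _post(url, body, content_type, timeout):
    """Send a POST request and return the response body as text."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def set_model(base_url, model_name, timeout=600):
    """POST {"model": model_name} as JSON to /set_model."""
    body = json.dumps({"model": model_name}).encode("utf-8")
    return _post(base_url.rstrip("/") + "/set_model", body,
                 "application/json", timeout)


def build_multipart(field_name, audio_path):
    """Encode one file as multipart/form-data; return (body, content_type)."""
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{path.name}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + path.read_bytes() + tail, f"multipart/form-data; boundary={boundary}"


def upload_audio(base_url, audio_path, timeout=600):
    """POST the file to /upload under the form field "file"."""
    body, content_type = build_multipart("file", audio_path)
    return _post(base_url.rstrip("/") + "/upload", body, content_type, timeout)


# Usage:
#   print(set_model("http://localhost:5000", "base"))
#   print(upload_audio("http://localhost:5000", "audio.wav"))
```

The generous default timeout is deliberate: the first `set_model` call may trigger a model download, and long recordings take a while to transcribe.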
Troubleshooting
📖 Microphone not working?
- Check browser permissions — ensure microphone access is allowed
- Try using a different browser (Chrome or Firefox recommended)
- On Android, try Soul Browser if local HTML files have microphone issues
📖 Server won't start?
- Check if port 5000 is already in use:
lsof -i :5000
- Try running with a different port by modifying server.py
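If `lsof` is not available (on Windows, for example), a few lines of Python can perform the same check and suggest a free port. This is a sketch with helper names of my own choosing. Where the port is set inside server.py is an assumption on my part: Flask apps usually pass it as the `port` argument of `app.run(...)`, so look for that call.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=5000, attempts=20):
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")


# Usage:
#   if port_in_use(5000):
#       print("Port 5000 is busy; try", first_free_port(5001))
```

Remember to enter the new port in the web interface's server address field as well.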
📖 Model download fails?
- Ensure you have a stable internet connection
- Check disk space (medium model needs 1.5GB)
- Try a smaller model first (tiny or base)
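To check disk space before picking a model, something like the following sketch can help. The sizes in the dictionary are rough download sizes as I understand them, not exact figures, and the 20% headroom is an arbitrary safety margin. By default the openai-whisper package caches models under `~/.cache/whisper`, so that is the location to check unless server.py passes a different download directory.

```python
import shutil

# Approximate checkpoint download sizes in bytes (rough planning figures).
APPROX_MODEL_BYTES = {
    "tiny": 75 * 1024**2,
    "base": 140 * 1024**2,
    "small": 460 * 1024**2,
    "medium": int(1.5 * 1024**3),
}


def free_bytes_at(path="."):
    """Free space, in bytes, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def models_that_fit(free_bytes, headroom=1.2):
    """Return model names whose approximate size, times headroom, fits."""
    return [
        name
        for name, size in APPROX_MODEL_BYTES.items()
        if size * headroom <= free_bytes
    ]


# Usage:
#   import os
#   print(models_that_fit(free_bytes_at(os.path.expanduser("~"))))
```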
📖 Transcription is slow?
- Try a smaller model (tiny or base)
- Reduce audio length
- Close other resource-heavy applications
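One way to reduce audio length is to split a long recording into shorter pieces and upload them one at a time. The sketch below does this for uncompressed WAV files with Python's built-in `wave` module; other formats would need a tool such as ffmpeg. The function name and the 60-second default are my own choices, and because the cuts fall at fixed times, a word at a boundary may be split between two chunks.

```python
import wave


def split_wav(src_path, chunk_seconds=60, prefix="chunk"):
    """Split a WAV file into pieces of at most `chunk_seconds` seconds.

    Returns the list of chunk file names, in order.
    """
    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected automatically on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths


# Usage:
#   for part in split_wav("lecture.wav", chunk_seconds=120):
#       print(part)
```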
Use Cases
- Content Creators: Transcribe video and audio content
- Students: Record lectures and transcribe notes
- Journalists: Transcribe interviews quickly
- Developers: Add speech-to-text to applications
- Researchers: Transcribe interviews and focus groups
Source Code
View the full project and contribute: WhisperWeb on GitLab (https://gitlab.com/krafi/whisperweb)
WhisperWeb transforms how you convert spoken words into written text. With its simple setup, powerful Whisper models, and browser-based interface, it’s perfect for anyone needing reliable audio transcription.
Try WhisperWeb today and experience accurate audio transcription!