Gems is automatic transcription software for qualitative researchers. It lets you transcribe interviews conducted in Spanish. AI then helps you extract and label quotes from transcripts, visually group and distill the main themes, and write and title notes for your research reports. It replaces transcription services, note-taking apps, spreadsheets, and whiteboard tools.
How to convert Spanish audio to text?
- Open Gems
- Click on the icon or press A
- Click on "Choose recording language" and pick "🇪🇸 Spanish" from the list
- Drag & drop an mp3, wav, or m4a file in the dashed area
- Alternatively, click in the dashed area to browse for audio files in your local folders
- Click on "Transcribe"
How do you transcribe Spanish audio?
Gems uses Whisper to perform automated transcription. Whisper is an open-source automatic speech recognition (ASR) model developed by OpenAI. ASR technology converts spoken language into written text. Whisper was trained on a large amount of multilingual and multitask supervised data collected from the internet, which lets it recognize and transcribe speech in multiple languages and across a range of domains and accents.
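For readers curious what a Whisper call looks like, here is a minimal sketch using the open-source `openai-whisper` Python package. It is not a description of how Gems invokes the model internally. The function name `transcribe_spanish` and the default `"base"` model size are illustrative assumptions, and the package also needs `ffmpeg` installed on the system.

```python
def transcribe_spanish(audio_path: str, model_name: str = "base") -> str:
    """Transcribe a Spanish recording with the open-source Whisper package.

    Hypothetical helper for illustration only; Gems may call Whisper differently.
    """
    # Imported lazily so this module still loads where openai-whisper is not installed.
    import whisper

    # Loads the requested model size (weights are downloaded on first use).
    model = whisper.load_model(model_name)

    # language="es" skips automatic language detection and decodes as Spanish.
    result = model.transcribe(audio_path, language="es")
    return result["text"].strip()
```

Calling `transcribe_spanish("interview.mp3")` would return the transcript as a single string, where `interview.mp3` is a placeholder file name. Larger model sizes are generally more accurate but slower.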
How accurate is the Spanish transcription?
For Spanish, 97% of the words are correctly transcribed.
Whisper's performance varies widely depending on the language. According to the official repo, the Word Error Rate (WER) for Spanish is 3%, meaning that the ASR output contains about 3 word errors for every 100 words in the reference transcript.
WER is calculated as the sum of substitutions (words that were replaced in the ASR output), deletions (words that were missing in the ASR output), and insertions (extra words that were added in the ASR output), divided by the total number of words in the reference transcript.
WER = (Substitutions + Insertions + Deletions) / Number of Words Spoken
For example, if the reference transcript has 1000 words, and the ASR system produces an output with 16 substitutions, 9 deletions, and 5 insertions, the WER calculation would be:
WER = (16 + 9 + 5) / 1000 = 3%
While WER measures the percentage of errors, Word Accuracy Rate (WAR) measures the percentage of correctly recognized words in the output. The formula for calculating WAR is straightforward:
WAR = 100% – WER
Hence for Spanish, the Word Accuracy Rate is 100% – 3% = 97%.
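The WER and WAR formulas above can be sketched in a few lines of Python. This is an illustrative sketch, not the script used to produce the published Whisper figures: the function names are hypothetical, words are split on whitespace with no text normalization, and the error counts come from a standard word-level edit-distance alignment.

```python
def count_errors(reference: str, hypothesis: str) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for a minimal word alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = (total errors, subs, dels, ins) aligning ref[:i] with hyp[:j]
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word is missing from the output
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every output word is extra
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # words match, no new error
                continue
            c, s, d, n = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, n)
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            dp[i][j] = min(substitution, deletion, insertion)
    _, subs, dels, ins = dp[len(ref)][len(hyp)]
    return subs, dels, ins


def word_error_rate(subs: int, dels: int, ins: int, reference_words: int) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    if reference_words <= 0:
        raise ValueError("reference transcript must contain at least one word")
    return (subs + dels + ins) / reference_words


def word_accuracy_rate(subs: int, dels: int, ins: int, reference_words: int) -> float:
    """WAR = 1 - WER, expressed as a fraction rather than a percentage."""
    return 1.0 - word_error_rate(subs, dels, ins, reference_words)


# The worked example from the text: 16 substitutions, 9 deletions, 5 insertions.
print(f"WER: {word_error_rate(16, 9, 5, 1000):.0%}")
print(f"WAR: {word_accuracy_rate(16, 9, 5, 1000):.0%}")
```

`count_errors` can be used on a real pair of transcripts, for example `count_errors("el gato come pescado fresco", "el gato como pescado")`, and its result passed to `word_error_rate` together with the number of words in the reference.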
How long does it take to transcribe my audio files in Spanish?
Transcription time depends on the length of the audio file. With Gems, it takes around 2 minutes and 40 seconds to transcribe 1 hour of Spanish audio.
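As a rough guide, the figure above can be turned into an estimate for recordings of other lengths. The sketch below assumes transcription time scales linearly with audio length, which the text does not state explicitly. Actual times may vary, and the function names are hypothetical.

```python
# 2 minutes 40 seconds of processing per hour of audio, as quoted above.
SECONDS_PER_AUDIO_HOUR = 2 * 60 + 40


def estimate_transcription_seconds(audio_seconds: float) -> float:
    """Estimate processing time, assuming it grows linearly with audio length."""
    return audio_seconds * SECONDS_PER_AUDIO_HOUR / 3600


def format_duration(seconds: float) -> str:
    """Format a number of seconds as 'M min SS s'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs:02d} s"


# Estimate for a 45-minute interview.
print(format_duration(estimate_transcription_seconds(45 * 60)))
```

Under this assumption, a 45-minute interview takes about 2 minutes to transcribe.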
Can I upload Spanish audio files from other platforms?
We're working on a mobile app to let you upload and transcribe recorded audio from your phone.
Can I edit the transcript?
Yes! Click on the icon to enter transcription editing mode and fix misspelled or missing words.
To style the transcript, select text and choose different formatting options such as color, bold, italics, underline, or strikethrough. These can be used to add emphasis to specific parts of the transcription.
To rename speakers, click a speaker name and then click "Rename". You will be asked whether to rename only that occurrence or all occurrences of the speaker. This is especially useful during the first edit to quickly replace the default speaker names.
In which formats can I export the Spanish transcription?
Click on the icon to export your Spanish transcription in Microsoft Word, Markdown, or PDF format.
How much does it cost?
For more information, please visit our pricing page.
Is there a free app to transcribe audio to text?
There are several Spanish transcription services with minimal functionality available on the market. Gems is the best option if you need a more sophisticated transcription tool that also lets you analyse the recordings further.
Any questions? Contact us