Audio Files That AI Transcription Struggles With (Examples)

June 8, 2025


Mojiokoshi-san is a transcription service powered by AI. It uses speech recognition technology from Google and AmiVoice (a Japanese speech recognition system provider).

AI transcription has strengths and weaknesses depending on the audio file being transcribed.

This article will introduce audio files that AI transcription struggles with, along with examples and reasons why.

By understanding these examples and reasons, you can create audio files that are easier for AI transcription to process. Please use this as a reference.


Audio Files That AI Transcription Struggles With

  1. Audio that is too quiet or recorded with the microphone too far away, making it difficult to hear.
  2. Audio where excessive noise drowns out the speech.
  3. Audio where indoor reverberation blurs the sound.
  4. Audio containing music, such as song lyrics.
  5. Audio with no speech.
  6. Audio with strong dialects.
  7. Audio where multiple languages are mixed.
  8. Audio that is difficult for a human to hear.

We do not recommend using Mojiokoshi-san for such audio files, as accurate transcription cannot be guaranteed.

Related article >>6 Recording Tips for Accurate AI Transcription

If you file an error report for an audio file of a type that AI transcription struggles with, the report will be rejected and your transcription time will not be refunded.

Many error reports turn out to be caused by problems in the file itself. Let's look at some real examples.

No Speech Recorded in the File (Continuous Noise or Silence)

If you upload an audio file without checking its contents, you may discover that the recording failed: for example, the whole recording is silent, or the microphone was disconnected partway through, leaving a stretch of noise with no speech.

The image above shows the transcription result after uploading a file that contained only noise.

Because the AI tries to transcribe as much as it can, it produces text even for sections that contain only noise.

This problem is particularly likely to occur when using PerfectVoice with a file that has more than one minute of noise or silence at the beginning.

If you get strange transcription results, such as "aaaaaaa" or "mmmmmmm," or if the same phrase is repeated many times, please check the contents of your file.

You can prevent this problem by cutting out the initial noise or silence.

Please note that transcription time consumed by such files cannot be refunded.
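Cutting a quiet lead-in before uploading can be done in an audio editor, or with a short script. The following is a minimal sketch, assuming a 16-bit mono WAV file; the function names, the threshold of 500, and the 1,600-sample window are illustrative choices and not part of Mojiokoshi-san. It only detects quiet lead-ins, so loud continuous noise at the start would still need to be cut by hand.

```python
import array
import sys
import wave


def first_loud_sample(samples, threshold=500, window=1600):
    """Return the start index of the first window whose peak exceeds threshold.

    samples: sequence of signed 16-bit mono sample values.
    Returns len(samples) if no window is loud enough.
    """
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        if max(abs(s) for s in chunk) > threshold:
            return start
    return len(samples)


def trim_leading_silence(src_path, dst_path, threshold=500):
    """Copy a 16-bit mono WAV file without its quiet lead-in.

    Returns the number of seconds removed from the start.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    start = first_loud_sample(samples, threshold)
    trimmed = samples[start:]
    if sys.byteorder == "big":
        trimmed.byteswap()  # convert back before writing

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # frame count in the header is fixed on close
        dst.writeframes(trimmed.tobytes())
    return start / params.framerate
```

For example, `trim_leading_silence("meeting.wav", "meeting_trimmed.wav")` (hypothetical file names) would write a copy that starts at the first window containing sound above the threshold.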

1. Audio that is too quiet or recorded with the microphone too far away, making it difficult to hear.


When reviewing audio that resulted in errors, it's most common to find that...

This is an example of such a case.

Even if transcription is possible, the accuracy will be low. Therefore, we do not recommend using AI transcription for audio files where the speaker's voice is quiet or the microphone is too far away to pick up the voice properly.

Example: Recording a lecture with a smartphone from the back of a lecture hall.
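One rough way to check whether a recording is too quiet before uploading is to measure its peak level. The sketch below assumes the audio has already been decoded to signed 16-bit sample values; the function names and the -30 dBFS cutoff are assumptions for illustration, not a threshold published by the service.

```python
import math

FULL_SCALE = 32768  # largest magnitude of a signed 16-bit sample


def peak_dbfs(samples):
    """Peak level of 16-bit samples in dB relative to full scale (0 dBFS).

    Returns -inf for an empty or completely silent input.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak / FULL_SCALE)


def looks_too_quiet(samples, limit_dbfs=-30.0):
    """True if the loudest sample stays below limit_dbfs (an assumed cutoff)."""
    return peak_dbfs(samples) < limit_dbfs
```

A recording whose loudest moment is far below full scale was probably made too far from the speaker; moving the microphone closer is a better fix than amplifying afterwards, since amplification raises the noise along with the voice.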

2. Audio with excessive noise that drowns out the sound

Noise is a formidable enemy for AI transcription!


In noisy audio, the speaker's voice is often drowned out, which makes it "difficult for humans to hear" as well.

Wind noise is another formidable enemy, and it is often overlooked during recording.

If something is "difficult for humans to hear," there's no way AI can transcribe it with high accuracy.

Example: Recording in a crowded environment such as an outdoor cafe, with clattering dishes or background music (BGM).

Example: Breath picked up as wind noise because the microphone is too close to the mouth.

3. Audio blurred by room reverberation

Room reverberation is surprisingly easy to overlook when listening with your own ears.

In a recording, however, reverberation often makes voices sound muffled or distant.

Reverberation is particularly common in square rooms or rooms with minimal furnishings.

Example: Recording a conversation among several people seated in various places around a conference room with a single IC recorder.

Example: Recording a presentation in a conference room where room reverberation blurs the sound.

4. Audio containing music, such as song lyrics


AI transcription cannot transcribe songs.

Some people might think of downloading a song with no published lyrics from YouTube and transcribing it.

However, AI transcription is primarily designed for transcribing conversations.

It cannot transcribe songs.

Example: Transcribing a song downloaded from YouTube.

5. Files with no audio

Silent audio files cannot be transcribed.


Naturally, audio files with no sound cannot be transcribed.

Perhaps you tried to transcribe without realizing the microphone input was set to zero.

Before using AI transcription, please play back the audio file yourself and confirm that sound was actually recorded.

Example: Recording without noticing that the microphone input is not working.
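Listening to the file is the most reliable check, but for long recordings a quick script can flag silent stretches first. This is a minimal sketch, assuming the audio has already been decoded to a sequence of signed 16-bit mono sample values; `silent_fraction` and the RMS threshold of 100 are hypothetical names and values chosen for illustration.

```python
import math


def window_rms(chunk):
    """Root-mean-square level of a non-empty sequence of sample values."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def silent_fraction(samples, framerate, rms_threshold=100.0):
    """Fraction of one-second windows whose RMS is below rms_threshold.

    A result of 1.0 means every window is quiet, i.e. the file is
    effectively silent. An empty input is also reported as 1.0.
    """
    if not samples:
        return 1.0
    windows = [samples[i:i + framerate]
               for i in range(0, len(samples), framerate)]
    quiet = sum(1 for w in windows if window_rms(w) < rms_threshold)
    return quiet / len(windows)
```

A value of 1.0 suggests the microphone recorded nothing at all, while a high value below 1.0 points to long silent gaps that are worth checking by ear before spending transcription time.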

6. Strong Dialects


AI transcription struggles with dialects.

The AI is trained to transcribe the standard form of a language, so it struggles with dialects.

It is not impossible for AI to transcribe dialects, but even with excellent recording conditions and clear, slow speech, the transcription may be incomplete, or only a small portion may be transcribed.

Example: Recording audio for meeting minutes in a rural area → Speaker has a strong dialect

Example: In the case of Japanese dialects

Even with audio recorded by an announcer with clear pronunciation in a good recording environment like television, dialects are difficult to transcribe accurately.

7. Audio that is difficult for humans to hear

What's hard for humans to hear is even harder for AI.


When you hear "AI," it sounds incredibly versatile and capable of performing tasks better than humans.

However, AI still falls short compared to humans.

When adaptability is required, the accuracy of AI's work drops significantly.

In the case of AI transcription, if humans find the audio "difficult to hear," the accuracy of AI transcription will be very low.

When a human transcribes, even if it's somewhat difficult to hear, they can infer what was said from the context of the conversation and the surrounding flow.

However, AI transcription can only transcribe "what it hears." It cannot supplement or infer like a human.

How to achieve highly accurate transcription with AI transcription?

But I used AI for transcription, and it was accurate.
There are key tips for achieving highly accurate AI transcription!

To achieve highly accurate AI transcription, it's crucial to incorporate several techniques during recording.

The following article introduces optimal recording tips for AI transcription.

6 Optimal Recording Tips for AI Transcription

  1. High-quality microphone
  2. Proper microphone placement
  3. Ensure a quiet recording environment
  4. Clear speaker articulation
  5. One person speaks at a time
  6. Conduct a recording test

For more details >>6 Recording Tips for Highly Accurate Transcription

Effectively Utilize AI Transcription and Human Transcription

When comparing AI transcription and human transcription, AI transcription is significantly more cost-effective.

In particular, "Mojiokoshi-san" is extremely affordable even among AI transcription services, and is likely the lowest-priced in the industry.

AI transcription service Mojiokoshi-san is the cheapest in the industry

However, for audio files where AI transcription struggles, it's definitely more reliable to request human transcription.

But isn't human transcription expensive?

If you're wondering about that, please also check out this article.

Related article >>What is the market rate for outsourced audio transcription? [Tips for requesting cheaply explained]

As introduced this time, AI transcription has its strengths and weaknesses.

For audio files that AI transcription excels at, use "AI transcription."

For audio files that it struggles with:

  • Try transcribing with AI transcription.
    → If it doesn't work, then use "human transcription."

This approach is recommended.

Since AI transcription is low-cost, even for audio files that AI transcription might struggle with, you can try it with a "nothing to lose, lucky if it works" mindset. Sometimes, it might transcribe successfully.

Mojiokoshi-san, the AI transcription service, offers the first minute of transcription for free. You can check the transcription accuracy. Please give it a try!

■ AI transcription service "Mr. Transcription"

"Mr. Transcription" is an online transcription tool with no initial cost, available from 1,000 yen per month (* a free version is available).

  • Supports more than 20 file formats such as audio, video, and images
  • Can be used from both PC and smartphone
  • Supports technical terminology in fields such as medical care, IT, and long-term care
  • Supports creation of subtitle files and speaker separation
  • Supports transcription in approximately 100 languages including English, Chinese, Japanese, Korean, German, French, Italian, etc.

To use it, just upload an audio file on the site. The transcribed text is ready within seconds to tens of minutes.
Transcription of up to 10 minutes is free, so please give it a try.

"Mr. Transcription" makes it easy to transcribe audio, video, and images. You can transcribe up to 10 minutes for free. You can copy, download, search, and delete the transcribed text. You can also create subtitle files, which makes it ideal for transcribing interview videos.
HP: mojiokoshi3.com
Email: mojiokoshi3.com@gmail.com