Transcribe Meetings & Interviews Easily with Speaker Diarization
June 7, 2025


When you need to transcribe a meeting or interview with several participants, we recommend using an AI transcription service with a "speaker separation" feature.
Speaker separation (also called speaker diarization) is the process of transcribing audio in which multiple people speak and splitting the text by individual speaker.
The latest speech recognition AI can distinguish the unique characteristics of each speaker's voice, so the transcription results can be output in a way that identifies who is speaking.
However, speaker separation isn't available in all AI transcription services.
Speaker separation is a particularly advanced feature within AI transcription.
Therefore, when you want to use speaker separation for transcribing meetings or interviews, it's important to check if the feature is included.
This article will explain how to transcribe using the speaker separation feature and introduce recommended AI transcription services that offer it!
By using the speaker separation feature, you can reduce the effort of creating meeting minutes by more than half!
Why not speed up your transcription work by referring to this article?
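To make the idea of speaker separation concrete, here is a minimal Python sketch of what speaker-separated output amounts to: each recognized segment carries a speaker label, and consecutive segments from the same speaker are joined into one turn. The Segment structure, the speaker labels, and the sample sentences are assumptions made for illustration; they are not the internal format of "Mojiokoshi-san" or any other service.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # label assigned by the diarization step (hypothetical)
    start: float   # start time in seconds
    text: str      # recognized text for this segment


def to_turns(segments):
    """Join consecutive segments from the same speaker into (speaker, text) turns."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1][0] == seg.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1] = (seg.speaker, turns[-1][1] + " " + seg.text)
        else:
            turns.append((seg.speaker, seg.text))
    return turns


def format_transcript(segments):
    """Render one line per turn, prefixed with the speaker label."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in to_turns(segments))


if __name__ == "__main__":
    sample = [
        Segment("Speaker 1", 0.0, "Let's begin."),
        Segment("Speaker 1", 2.0, "First item."),
        Segment("Speaker 2", 5.0, "Agreed."),
    ]
    print(format_transcript(sample))
```

The hard part, deciding which voice belongs to which label, is done by the speech recognition engine; this sketch only shows how labeled segments turn into a readable transcript.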
AI Transcription Services with Speaker Separation
Notice: Speaker Separation (Speaker Recognition) Feature Temporarily Paused
The speaker separation (speaker recognition) feature of "Mojiokoshi-san" is currently temporarily paused.
We plan to reactivate the feature by mid-2025.
We apologize for any inconvenience this may cause until then, and we appreciate your continued support for "Mojiokoshi-san."
The recommended AI transcription service for transcribing using the speaker separation feature is "Mojiokoshi-san."
"Mojiokoshi-san" is an AI transcription service that uses the latest speech recognition AI to provide fast and highly accurate transcriptions.
You can use two types of AI transcription engines: AmiVoice and PerfectVoice. By choosing "AmiVoice," you can utilize the speaker separation feature.
When you use the speaker separation feature, the transcription is split up by speaker.
Of course, you can also download the transcription results!
The downloaded file is also separated by speaker.
When transcribing meeting minutes, security is a common concern, but since it's a Japanese-developed transcription service, you can feel secure in that regard.
If you're looking to transcribe using the speaker separation feature, "Mojiokoshi-san" is highly recommended!
You can transcribe for up to 1 minute for free without registration or login, so why not try "Mojiokoshi-san" first?
How to Transcribe Using Speaker Separation
So, how exactly do you transcribe using the speaker separation feature?
Let's walk through the transcription workflow, step by step, using the AI transcription service "Mojiokoshi-san."
1. Record the content of your meeting or interview
First, record the content during your meeting or interview.
You can use your smartphone's voice memo app or a dedicated IC recorder app for recording.
As a point of caution, make sure to place your smartphone or IC recorder in a location where it can clearly record the voices of everyone speaking.
AI transcription engines are high-performance and can transcribe even with some noise, but it's always recommended to record with the best possible sound quality.
If available, using specialized equipment like a condenser microphone or lavalier microphone is also recommended.
2. Open the 'Mojiokoshi-san' top page
'Mojiokoshi-san' is a service used through web browsers like Google Chrome or Safari.
Files are uploaded from the top page.
Open the 'Mojiokoshi-san' top page from this link.
*'Mojiokoshi-san' can be used from any environment (PC, smartphone, tablet) as long as you have an internet connection!
3. Select "AmiVoice"
'Mojiokoshi-san' allows you to use two types of AI transcription engines: "AmiVoice" and "PerfectVoice". However, when using the speaker diarization feature, you must use "AmiVoice".
Select "AmiVoice" from the checkbox for choosing the AI transcription engine.
4. Select language and number of speakers
When you select AmiVoice, a dropdown menu for choosing the number of speakers will appear.
Select the number of people who spoke during the meeting or interview.
There is also a language selection menu, but it is set to "Japanese" by default, so if your audio is in Japanese, you can leave it as is.
5. Select and upload file
Upload your file.
You can select the file by clicking or tapping "Select".
When using from a PC, drag and drop is also possible.
After selecting the file, click the "Transcribe" button to start the upload.
*Keep the browser screen open during upload.
6. Start transcription
Once the upload is complete, transcription will automatically begin.
Once "Processing. Please wait." is displayed, you can close the browser screen.
*If you are using it for free without registration/login, you must keep the screen open.
*If you close the screen, a transcription completion notification email will be sent to your registered email address.
7. Transcription complete
Check the transcription results.
If you closed the screen
If you closed the screen, open the link provided in the email.
You can also view the transcription results by clicking "History" in the menu on the 'Mojiokoshi-san' website.
If you kept the top page open
If you kept the top page open, the screen will switch and display the transcription results.
However, even in this case, to check the speaker-separated transcription results, you need to open the detailed transcription results from the "History" page.
Click the "Check History" button to navigate to the history page.
History Page
When you open the history page, you'll see a list of transcription results.
Click on the file name in the right column to open the detailed screen.
When you check the transcription results on the detailed screen, you'll see that they are separated by speaker.
8. Downloading the File
To download the transcribed content, click the "Download" button.
From the menu that opens after clicking the button, click "Speaker Separation".
This will allow you to download the speaker-separated transcription results in text file format.
When you open the downloaded file, you'll see the transcription separated by speaker.
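Once you have the text file, a short script can help with minutes, for example by collecting what each person said. The Python sketch below assumes every line of the file looks like "label: text" (such as "Speaker 1: Hello"). That layout is a guess made for illustration, not the documented format of the download, so adjust the parsing to match your actual file.

```python
from collections import defaultdict


def group_by_speaker(transcript_text):
    """Collect utterances per speaker from lines shaped like 'label: text'.

    The 'label: text' layout is an assumption, not a documented format.
    """
    utterances = defaultdict(list)
    for line in transcript_text.splitlines():
        label, sep, text = line.partition(":")
        if not sep or not text.strip():
            continue  # skip blank lines and lines without a speaker label
        utterances[label.strip()].append(text.strip())
    return dict(utterances)


if __name__ == "__main__":
    sample = "Speaker 1: Hello everyone.\nSpeaker 2: Hi.\nSpeaker 1: Let's start."
    for speaker, lines in group_by_speaker(sample).items():
        print(speaker, len(lines), "utterances")
```

To run it on a real download, read the file into a string first (for instance with pathlib's Path.read_text) and pass that string to group_by_speaker.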
This completes the transcription using the speaker separation feature of the AI transcription service 'Mojiokoshi-san'.
With 'Mojiokoshi-san', anyone can easily and quickly transcribe meeting minutes or interviews.
Why not try the speaker separation feature of 'Mojiokoshi-san' yourself?
4 Transcription Services with Speaker Separation
Speaker separation is a particularly advanced feature among AI-powered transcription services.
Therefore, relatively few AI transcription services offer it.
Let's briefly introduce some of the services available.
1. Mojiokoshi-san
The AI transcription service introduced in this article, 'Mojiokoshi-san,' is the most recommended AI transcription service if you want to use speaker diarization.
'Mojiokoshi-san' utilizes two types of cutting-edge AI:
- AmiVoice: Speaker diarization available, high-speed transcription in about the same time as the audio file length.
- PerfectVoice: Supports 100 languages including Japanese and English, ultra-high-speed transcription in about 10 minutes for long files.
Since it's a service that transcribes by uploading recorded audio files, the transcription accuracy is outstanding!
*Some AI transcription services offer real-time transcription, but real-time transcription tends to be less accurate because the audio must be processed as it arrives. In contrast, services like 'Mojiokoshi-san' that work from uploaded files generally produce far fewer speech recognition errors.
Even when using the speaker diarization feature, processing is very speedy, taking about the same amount of time as the audio file itself!
Furthermore, within the same plan, you can transcribe many languages, including Japanese and English, or use the ultra-high-speed transcription feature (under 10 minutes) if speaker diarization is not needed.
You can transcribe up to 1 minute without registration or login, so why not experience the AI transcription accuracy of 'Mojiokoshi-san' first?
2. User Local Voice Meeting Minutes System
The User Local Voice Meeting Minutes System is a real-time AI transcription service accessible via web browser.
With just one microphone, it can listen to audio in real-time and transcribe it with speaker diarization, identifying each speaker.
As its name suggests, it's a very simple service specialized for meeting minutes, and its appeal lies in the ease of use that comes from focusing on a single function.
3. Sloos
Sloos is also an AI transcription service specialized for meeting minutes.
This is also a very simple AI transcription service used via a web browser. Similar to the "User Local Voice Meeting Minutes System," it can transcribe audio recorded with a single microphone, identifying each speaker.
Its ability to transcribe accurately even with some background noise is another appealing feature, characteristic of AI transcription services.
4. Group Transcribe
Group Transcribe is an iPhone app provided by Microsoft for transcribing meetings and conversations.
When all participants in a meeting install this app and join the same session, the audio is transcribed for each speaker.
This app actually achieves speaker diarization using a different method than the other services introduced.
However, the principle is straightforward.
By leveraging the fact that everyone needs to install the app on their respective smartphones, it transcribes the content on each individual smartphone, thereby separating the transcription results by speaker.
It's an AI transcription service that uses an ingenious "Columbus's Egg" idea, unique to smartphone apps.
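The per-device principle described above can be sketched in a few lines of Python. Each phone produces its own list of timestamped utterances, so the speaker is already known, and the only remaining work is to interleave the lists in time order. This is an illustration of the idea only, not Microsoft's implementation; the function name, the data layout, and the sample names are hypothetical.

```python
import heapq


def merge_device_transcripts(per_device):
    """Merge per-device utterance lists into one timeline.

    per_device maps a participant name to a list of (timestamp_seconds, text)
    tuples, each list already sorted by timestamp.
    """
    streams = (
        [(ts, name, text) for ts, text in utterances]
        for name, utterances in per_device.items()
    )
    # heapq.merge lazily interleaves already-sorted inputs into one sorted stream.
    return [(name, text) for ts, name, text in heapq.merge(*streams)]


if __name__ == "__main__":
    devices = {
        "Alice": [(0.0, "Shall we start?"), (6.5, "Great.")],
        "Bob": [(3.2, "Yes, I'm ready.")],
    }
    for name, text in merge_device_transcripts(devices):
        print(f"{name}: {text}")
```

Because each device only has to transcribe its owner's voice, no voice-based speaker identification is needed, which is what makes this approach different from the other services.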
If you're using a speaker diarization feature, "Mojiokoshi-san" is recommended.
If you had to choose one AI transcription service from those introduced so far, we recommend "Mojiokoshi-san."
"Mojiokoshi-san" is a service that uses uploaded files rather than real-time processing, resulting in exceptional transcription accuracy!
Advanced transcriptions using the speaker diarization feature can also be completed quickly.
Why not try the speaker diarization feature of "Mojiokoshi-san" yourself?
Speed up meeting and interview transcriptions with speaker diarization
Until a few years ago, summarizing meeting minutes or interview content by speaker was even more cumbersome than the transcription itself.
However, with AI transcription services available now, you can eliminate that hassle entirely!
Why not try using the speaker diarization feature of an AI transcription service like "Mojiokoshi-san" for convenient, highly accurate, and fast transcriptions?
■ AI transcription service "Mr. Transcription"
"Mr. Transcription" is an online transcription tool with no initial cost, available from 1,000 yen per month (* free version available).
- Supports more than 20 file formats such as audio, video, and images
- Can be used from both PC and smartphone
- Supports technical terminology in fields such as medicine, IT, and long-term care
- Supports creation of subtitle files and speaker separation
- Supports transcription in approximately 100 languages including English, Chinese, Japanese, Korean, German, French, Italian, etc.
To use it, just upload a file on the site. The transcribed text is ready within seconds to tens of minutes.
You can transcribe up to 10 minutes for free, so please give it a try.
Email: mojiokoshi3.com@gmail.com