How to Transcribe Interviews: A Guide to Speaker Separation
June 7, 2025


When conducting an interview, transcribing the recorded content is crucial.
When you want to publish interview content as a web or print article, or use it as reference material, transcription is the first step.
However, transcribing interview content manually by typing it out can be very time-consuming and laborious.
This article thoroughly explains how to transcribe interview content smoothly and accurately.
We recommend AI transcription services, which have rapidly improved in performance and functionality in recent years.
In particular, we recommend choosing a service with a "speaker separation function."
These services offer many convenient features that let even beginners transcribe interviews easily, so why not try one with the help of this article?
What is Interview Transcription?
Transcribing audio content into text is an essential task after conducting an interview.
What needs to be done when transcribing an interview?
What Interview Transcription Involves
To use interview content for web or print articles, research, or academic papers, it needs to be transcribed into written form.
In an interview, the interviewer (listener) asks the interviewee (speaker) questions, and the interview proceeds as a conversation between the two.
*Interviews may also be conducted with three or more people simultaneously.
Unlike other types of transcription, such as lectures or dictation, interview transcription requires attributing what is said to each of several speakers (speaker separation), which can be tricky for beginners.
Types of Interview Transcription
Interview transcription has three stages, depending on how much the audio content is refined:
- Verbatim transcription
- Clean verbatim
- Edited transcription
Let's look at each.
Verbatim Transcription
Verbatim transcription is the stage where the spoken content is transcribed exactly as it is.
Everything is transcribed exactly as spoken, including filler words such as "um," "uh," and "hmm," and even speech errors.
Interviewer: So, first of all, um, I'd like you to tell me about the product you recently developed.
Speaker: It's an innovative smart home device, I guess. Um, it was born out of customer demand to improve, uh, daily, daily life convenience, that was the trigger.
Interviewer: Oh, wow. So, what about the differences or advantages compared to competitors?
Speaker: When we did competitive analysis, um, there were already existing devices in the market, and, um, we thoroughly investigated them, right?
As shown, the content spoken during the interview is transcribed exactly as it is.
Clean Verbatim
Clean verbatim involves removing unnecessary parts like filler words ("um," "uh," "hmm") and speech errors to make the text easier to read.
This stage is commonly used in interview transcription because it retains subtle nuances while improving readability.
Interviewer: First of all, I'd like you to tell me about the product you recently developed.
Speaker: It's an innovative smart home device. It was born out of customer demand to improve daily life convenience.
Interviewer: What about the differences or advantages compared to competitors?
Speaker: During competitive analysis, there were already existing devices in the market, and we thoroughly investigated them.
As you can see, by removing unnecessary parts, the transcribed content is much more readable than a verbatim transcription.
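As a rough illustration of the clean verbatim step, here is a minimal Python sketch that strips the filler words named above from a single utterance. The function name `clean_verbatim` and the filler list are our own assumptions, not part of any transcription service. The sketch does not fix speech errors or repeated words, so a human editing pass is still needed.

```python
import re

# Filler words taken from the examples above; extend the tuple for your own speakers.
FILLERS = ("um", "uh", "hmm")

# A filler as a whole word (so "umbrella" is untouched), plus an optional comma and spaces after it.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def clean_verbatim(text: str) -> str:
    """First-pass clean verbatim: drop filler words, then tidy spacing and capitalization."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()  # collapse gaps left behind
    # Re-capitalize in case the removed filler started the sentence ("Um, it..." becomes "It...").
    return cleaned[:1].upper() + cleaned[1:]
```

Applied to the speaker's first verbatim line, this removes "Um," and "uh," but keeps "daily, daily", which is exactly the kind of speech error an editor still has to catch by hand.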
Edited Transcription
Edited transcription is the process of further refining the clean verbatim content into its final form. It involves tasks such as:
- Converting spoken language into written language
- Adjusting the writing style to suit the purpose, such as a business article or an informal article
This is often the final step when interview content is prepared for publication as an article.
Interviewer: Could you please tell me about the product you recently developed?
Speaker: It is an innovative smart home device. It was created based on customer feedback to enhance daily convenience.
Interviewer: Could you explain the differences and advantages compared to competitors?
Speaker: In our competitive analysis, we thoroughly researched existing devices in the market.
In this example, the content has been refined from a casual conversational tone during the initial interview to a style suitable for a business-related interview article.
Methods for Interview Transcription
Here are some methods for transcribing interview content:
- Using an AI transcription service
- Manually typing out the transcription yourself
- Hiring a professional transcriptionist
The most recommended method among these is to use an AI transcription service.
AI transcription services can transcribe content much faster than typing it out yourself or hiring a professional transcriptionist.
Moreover, advanced AI services can perform "speaker separation" automatically, so the transcript comes already divided by who said what.
For example, for 1 hour of audio:
Method | Time for 1 hour of audio |
AI Transcription Service | About 10 minutes (with speaker separation) |
Manual Typing | 5-6 hours |
Professional Transcriptionist | Delivered in about 3 days (speaker separation may take additional time) |
As you can see, AI transcription services are the fastest way to transcribe.
Accuracy is also high, since AI speech recognition technology continues to advance rapidly.
Choosing an AI Transcription Service for Interviews
When choosing an AI transcription service, you should check if it has a "speaker separation function."
As mentioned earlier, the speaker separation function is a feature that transcribes content for each speaker.
With an AI transcription service that has this function, you won't need to edit the transcribed content to separate the interviewer's and interviewee's parts.
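If a service returns speaker-labeled segments, turning them into the Interviewer/Speaker layout used in the examples above is mostly a matter of merging consecutive segments from the same person. The sketch below assumes a hypothetical segment format (a speaker label plus text); the `Segment` class, the `to_dialogue` function, and labels such as "A" and "B" are illustrative and do not describe any particular service's output.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str  # hypothetical label assigned by a speaker-separation step, e.g. "A"
    text: str


def to_dialogue(segments: list[Segment], names: dict[str, str]) -> str:
    """Merge consecutive segments from the same speaker and render 'Name: text' lines."""
    turns: list[tuple[str, str]] = []
    for seg in segments:
        if turns and turns[-1][0] == seg.speaker:
            # The same person kept talking, so extend the current turn.
            turns[-1] = (seg.speaker, turns[-1][1] + " " + seg.text)
        else:
            turns.append((seg.speaker, seg.text))
    # Unknown labels fall back to the raw label so nothing is silently dropped.
    return "\n".join(f"{names.get(spk, spk)}: {text}" for spk, text in turns)
```

Keeping the label-to-name mapping in a separate `names` dictionary means you only have to decide once which label is the interviewer.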
Recommended Service with Timecode Display: "Mojiokoshi-san"
"Mojiokoshi-san" is a service perfect for transcribing interview content.
Mojiokoshi-san displays timecodes, so time information is added to the transcription results, allowing you to check when specific statements occurred in the audio.
Even in conversations with multiple speakers, the timecodes help you roughly identify where the speaker changes.
This feature allows for quick responses when you want to "create an interview article" or "summarize interview research content."
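To show how timecodes help you locate a statement, here is a small Python sketch that searches a timecoded transcript for a keyword and reports when it was said. It assumes a hypothetical line format of `[HH:MM:SS] text`; Mojiokoshi-san's actual export format may differ, and `find_statements` is an illustrative name.

```python
import re

# Assumed line format for this sketch: "[HH:MM:SS] text" (the real export format may differ).
_LINE_RE = re.compile(r"^\[(\d{1,2}):(\d{2}):(\d{2})\]\s*(.*)$")


def to_seconds(h: str, m: str, s: str) -> int:
    """Convert an HH:MM:SS timecode into an offset in seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def find_statements(transcript: str, keyword: str) -> list[tuple[int, str]]:
    """Return (offset_in_seconds, text) for each timecoded line containing keyword (case-insensitive)."""
    hits = []
    for line in transcript.splitlines():
        match = _LINE_RE.match(line.strip())
        if match and keyword.lower() in match.group(4).lower():
            hits.append((to_seconds(*match.group(1, 2, 3)), match.group(4)))
    return hits
```

The returned offsets can be used to jump straight to that point in the recording when you need to double-check a quote.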
Furthermore, "Mojiokoshi-san" allows for up to 3 minutes of transcription for free, with no registration or login required.
If you're looking for a way to transcribe interviews, why not experience AI transcription with "Mojiokoshi-san" first?
How to Transcribe Interviews with "Mojiokoshi-san"
Now let's walk through the process of transcribing interview content with "Mojiokoshi-san" step by step.
1. Record Interview Content During the Interview
First, record the interview content when conducting the interview.
You can use a smartphone or a dedicated IC recorder for recording.
However, if possible, using dedicated equipment such as a condenser microphone or lavalier microphone will allow you to record in higher quality.
*AI speech recognition is highly capable, so some background noise is usually not an issue, but better audio quality will result in even higher quality transcription.
2. Open the Top Page of "Mojiokoshi-san"
How to Use Mojiokoshi-san
1. Upload
Drag and drop or select the file you want to transcribe.
For drag and drop, you can place the file anywhere on the screen.
This prepares your file for upload. A preview will appear, allowing you to confirm the file.
The following file types are supported:
- [Image Files]
File size: Less than 10MB
Supported file formats: .jpg .jpeg .png .webp
*If the text is sideways or upside down, transcription may not be possible, so please correct the orientation before uploading.
- [Document Files]
File size: Less than 50MB
Supported file format: .pdf
- [Audio Files]
File size: Less than 1GB
Audio duration:
・Basic Plan: Within 90 minutes
・Value Plan: Within 3 hours
・Premium Plan: Within 5 hours
Supported file formats: .mp3 .wav .wma .m4a .aifc .flac .aac .aiff
- [Video Files]
File size: Less than 1GB
Audio duration:
・Basic Plan: Within 90 minutes
・Value Plan: Within 3 hours
・Premium Plan: Within 5 hours
Supported file formats: .mp4 .mov .avi .flv .mkv .webm .wmv .3gp
*Multiple file uploads are not supported. Please upload one file at a time.
Uploading unsupported file formats or files exceeding the maximum duration will result in an upload error.
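Before uploading, you can check a file against the limits listed above. The following Python sketch encodes those limits; the function `check_upload`, the plan keys, and the assumption that 1MB means 1,024 × 1,024 bytes are ours, since the article does not say how sizes are measured. You supply the duration yourself (for example, from your recorder or media player).

```python
from pathlib import Path
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes; the service may count sizes differently
GB = 1024 * MB

# Extensions and maximum sizes transcribed from the supported-file list.
LIMITS = {
    "image": ({".jpg", ".jpeg", ".png", ".webp"}, 10 * MB),
    "document": ({".pdf"}, 50 * MB),
    "audio": ({".mp3", ".wav", ".wma", ".m4a", ".aifc", ".flac", ".aac", ".aiff"}, 1 * GB),
    "video": ({".mp4", ".mov", ".avi", ".flv", ".mkv", ".webm", ".wmv", ".3gp"}, 1 * GB),
}
# Maximum audio duration per plan, in minutes (hypothetical keys).
PLAN_MAX_MINUTES = {"basic": 90, "value": 180, "premium": 300}


def check_upload(filename: str, size_bytes: int,
                 duration_minutes: Optional[float] = None, plan: str = "basic") -> list[str]:
    """Return reasons the file would likely be rejected; an empty list means it looks OK."""
    ext = Path(filename).suffix.lower()
    kind = next((k for k, (exts, _) in LIMITS.items() if ext in exts), None)
    if kind is None:
        return [f"unsupported format: {ext or '(none)'}"]
    problems = []
    max_size = LIMITS[kind][1]
    if size_bytes >= max_size:  # the list says "less than", so the limit itself is rejected
        problems.append(f"{kind} file must be smaller than {max_size // MB} MB")
    if kind in ("audio", "video") and duration_minutes is not None:
        limit = PLAN_MAX_MINUTES[plan]
        if duration_minutes > limit:  # "within" N minutes, so exactly N is allowed
            problems.append(f"{duration_minutes} min exceeds the {plan} plan limit of {limit} min")
    return problems
```

For instance, a 3-hour MP3 passes on the Value Plan but is flagged on the Basic Plan, which is the case where splitting the file helps.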
You can make an unsupported or oversized file uploadable by reducing its size, converting it, or splitting it.
Example: Convert an MP4 file (video) to an MP3 file (audio only)
Example: Split a 3-hour MP3 file into three 1-hour files.
*When splitting by time, 90-minute segments can lead to longer transcription times or larger file sizes, so we recommend splitting into segments of about 1 hour where possible.
For detailed instructions, see "How to Make Unsupported Files Compatible."
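One common way to do the conversion and splitting described above is the free command-line tool ffmpeg; this is our assumption, not a tool the article prescribes. The Python sketch below builds the ffmpeg commands for converting an MP4 video to audio-only MP3 and for cutting an MP3 into roughly 1-hour pieces, and only executes them if you call `run`, which requires ffmpeg to be installed. The helper names are hypothetical.

```python
import math
import subprocess


def mp4_to_mp3_cmd(src: str, dst: str) -> list[str]:
    # -vn drops the video stream; libmp3lame encodes the audio as MP3 (-q:a 2 is a high-quality VBR setting).
    return ["ffmpeg", "-i", src, "-vn", "-c:a", "libmp3lame", "-q:a", "2", dst]


def split_cmd(src: str, pattern: str, segment_seconds: int = 3600) -> list[str]:
    # The segment muxer cuts the input into pieces of about segment_seconds each.
    # -c copy avoids re-encoding, so piece lengths may differ slightly from segment_seconds.
    return ["ffmpeg", "-i", src, "-f", "segment", "-segment_time", str(segment_seconds),
            "-c", "copy", pattern]


def expected_parts(duration_seconds: float, segment_seconds: int = 3600) -> int:
    """Number of pieces a recording of this length will be split into."""
    return math.ceil(duration_seconds / segment_seconds)


def run(cmd: list[str]) -> None:
    # Requires ffmpeg on PATH; raises subprocess.CalledProcessError if ffmpeg fails.
    subprocess.run(cmd, check=True)
```

For example, `run(split_cmd("interview.mp3", "interview_%03d.mp3"))` would write interview_000.mp3, interview_001.mp3, and so on; a 3-hour file yields three pieces, each of which can then be uploaded one at a time.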
2. Select Language
Select the language you want to transcribe from the dropdown menu and click the "Transcribe" button at the bottom of the screen.
*Audio containing multiple languages cannot be transcribed correctly.
3. Start Transcription
Once the settings and file selection are complete, click "Transcribe."
Clicking this button starts the file upload and the transcription at the same time.
For images: Transcription will be available within a few seconds to a few minutes after starting.
5. Transcription Complete
Transcription is complete.
If you keep the top page open, the transcription results will be displayed on the top page.
If you close the page, you will be notified by email when the transcription is complete.
Alternatively, click the "Check History" button or "History" in the menu to open the history page.
6. Open the Transcription Results Page
The history page has opened.
■ AI transcription service "Mr. Transcription" (Mojiokoshi-san)
"Mr. Transcription" is an online transcription tool with no initial cost and paid plans from 1,000 yen per month (a free version is also available).
- Supports more than 20 file formats such as audio, video, and images
- Can be used from both PC and smartphone
- Handles technical terminology in fields such as medical care, IT, and long-term care
- Supports creation of subtitle files and speaker separation
- Supports transcription in approximately 100 languages including English, Chinese, Japanese, Korean, German, French, Italian, etc.
To use it, simply upload an audio file on the site. The transcript is ready in anywhere from a few seconds to a few tens of minutes.
Transcription of up to 10 minutes is free, so please give it a try.
Email: mojiokoshi3.com@gmail.com
Transcription for audio, video, and images: a service anyone can use for free, with no installation required.