Does Sonicribe work offline?

Yes, Sonicribe works 100% offline. All voice processing happens locally on your computer using the Whisper AI model. Your voice data never leaves your device.

Is there a subscription fee?

No, Sonicribe is a one-time purchase of $79. There are no monthly fees, no API costs, and no hidden charges. You own it forever.

What languages does Sonicribe support?

Sonicribe supports 99+ languages including English, Spanish, French, German, Chinese, Japanese, and many more through the Whisper AI model.

What are the system requirements?

Sonicribe works on macOS 12.0+ (Apple Silicon and Intel Macs) and Windows 10/11. Hardware with GPU acceleration (Apple Silicon, or a dedicated NVIDIA GPU on Windows) offers the best performance.

Real-Time vs Batch Transcription: When to Use Each

Author: Sonicribe

Real-Time Transcription Converts Speech as You Speak, While Batch Transcription Processes Pre-Recorded Audio Files After the Fact

These two transcription methods serve fundamentally different purposes, and choosing the right one depends on your workflow, your content, and how you plan to use the text output. Real-time (live) transcription is ideal for drafting, dictation, and composing text on the fly. Batch transcription is designed for processing existing audio recordings -- interviews, meetings, lectures, podcasts -- into searchable, editable text.

This guide breaks down both approaches in detail so you can build the transcription workflow that actually fits your day.

How Real-Time Transcription Works

Real-time transcription converts your speech into text as you speak, with near-zero delay between your voice and the words appearing on screen. You press a hotkey, start talking, and text flows into whatever application you are working in.

You speak -> Microphone captures audio -> AI model processes in real time
-> Text appears instantly in your active app

The key characteristic of real-time transcription is immediacy. There is no file to manage, no upload step, no waiting period. You speak and the text is there, ready to edit, send, or save.

What Makes Real-Time Transcription Different

Real-time processing requires the AI model to operate in streaming mode. Rather than analyzing an entire audio file at once, the model processes small chunks of audio (typically 1-3 seconds) sequentially, outputting text as each chunk completes.

This introduces a few technical considerations:

Latency: The delay between speaking and seeing text. Modern local models on Apple Silicon achieve 0.1 to 0.5 seconds of latency, which feels essentially instantaneous.
Context window: The model has limited look-ahead. It processes what it has heard so far, which can occasionally result in mid-sentence corrections as more context arrives.
Continuous processing: The model stays loaded in memory for the duration of your dictation session, using CPU/GPU resources continuously.
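To make the streaming idea concrete, here is a minimal Python sketch of chunked processing. It assumes 16 kHz mono samples (the sample rate Whisper models work with) and a fixed 2-second chunk from the 1-3 second range above. `transcribe_chunk` is a hypothetical stand-in for the model call, not Sonicribe's API, and production streaming systems usually carry some overlapping audio forward as context, which this sketch leaves out.

```python
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 16_000  # samples per second, assumed 16 kHz mono input


def chunk_stream(samples: Iterable[float], chunk_seconds: float = 2.0,
                 sample_rate: int = SAMPLE_RATE) -> Iterator[List[float]]:
    """Group an incoming stream of samples into fixed-length chunks."""
    size = int(chunk_seconds * sample_rate)
    buf: List[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:  # flush the final, shorter chunk when the speaker stops
        yield buf


def stream_transcribe(samples: Iterable[float],
                      transcribe_chunk: Callable[[List[float]], str],
                      chunk_seconds: float = 2.0) -> Iterator[str]:
    """Emit text chunk by chunk; the model never sees audio that hasn't arrived yet."""
    for chunk in chunk_stream(samples, chunk_seconds):
        yield transcribe_chunk(chunk)


# Five seconds of silence, with a fake "model" that just reports chunk sizes.
for text in stream_transcribe([0.0] * 5 * SAMPLE_RATE, lambda c: f"<{len(c)} samples>"):
    print(text)
```

Because each chunk is transcribed as soon as it fills, text appears within about one chunk length of speaking, but no chunk can use words that have not been spoken yet. That is the limited look-ahead described above.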

Best Use Cases for Real-Time Transcription

Drafting emails and messages. You think, you speak, the text appears. Real-time dictation turns your natural speech into a first draft faster than typing. With auto-paste enabled, the text lands directly in your email client, Slack, or messaging app.

Writing first drafts of documents. Long-form writing -- articles, reports, proposals -- flows more naturally when you dictate. Many writers find that speaking produces more conversational, engaging prose than typing, which tends to come out stilted because you edit as you go.

Taking notes during live conversations. When you are on a call or in a meeting and want to capture key points as they happen, real-time transcription lets you dictate your observations without pulling your attention away from the conversation.

Composing code comments and documentation. Developers who use voice input for prose (comments, docs, commit messages) save significant time. Speaking a paragraph-long function description takes 15 seconds instead of a minute of typing.

Quick capture throughout the day. Random ideas, tasks, reminders -- real-time dictation lets you capture thoughts the moment they arise. Press a hotkey, speak for 10 seconds, and move on.

Real-Time Transcription Performance

On modern hardware, real-time transcription is remarkably fast:

Hardware	Latency	Sustained Use	Quality
Apple M1/M2/M3 Mac	0.1-0.3 sec	Hours without degradation	96-98% accuracy
Apple M1/M2/M3 (Neural Engine acceleration enabled)	0.1-0.2 sec	Hours without degradation	96-98% accuracy
Intel Mac (2019+)	0.3-0.8 sec	Good for sessions under 1 hour	95-97% accuracy
Windows (RTX 3060+)	0.2-0.5 sec	Hours without degradation	96-98% accuracy

The bottleneck for real-time transcription is rarely accuracy or speed. On modern hardware, the AI model processes faster than you can speak. The real differentiator is workflow integration -- how seamlessly the text lands in the right application.

How Batch Transcription Works

Batch transcription takes an existing audio file and processes it from start to finish, outputting a complete text transcript. You provide a file (MP3, WAV, M4A, or similar), the model processes the entire recording, and you receive the full text when it completes.

Audio file (MP3, WAV, etc.) -> AI model processes entire file
-> Complete transcript generated -> Output as text/document

Batch processing has a fundamentally different rhythm than real-time. You are not sitting in front of the screen watching words appear. Instead, you start the process, and the transcript is ready when it finishes -- anywhere from a few minutes for short recordings to half an hour or more for multi-hour files, and several hours on older Intel Macs.

What Makes Batch Transcription Different

When processing a full audio file, the AI model has significant advantages over real-time mode:

Full context: The model can analyze the entire audio, using future context to improve accuracy on ambiguous words.
Multiple passes: Advanced batch processors can make multiple passes over the audio, refining accuracy each time.
Noise profiling: The model can analyze the overall noise profile of the recording and apply more effective filtering.
Speaker consistency: Over a long recording, the model builds a stronger understanding of each speaker's voice characteristics.

These advantages mean batch transcription can deliver marginally higher accuracy on challenging audio compared to real-time processing, especially in noisy environments or with heavy accents.

Best Use Cases for Batch Transcription

Meeting recordings. You recorded a 90-minute team meeting and need a searchable transcript. Batch processing turns the full recording into text you can search, quote, and distribute.

Interview transcription. Journalists, researchers, and HR professionals routinely record interviews and need complete, accurate transcripts. Batch processing handles hour-long interviews without manual effort.

Podcast and video production. Content creators need transcripts for show notes, SEO, accessibility captions, and repurposing content into blog posts. Batch processing an episode takes a fraction of the recording time.

Lecture and course material. Students and educators process recorded lectures into searchable text for study guides, notes, and accessibility compliance.

Legal and medical dictation. Professionals who record case notes, depositions, or patient encounters process these recordings in batch for documentation purposes.

Archival projects. Organizations with large audio archives -- oral histories, recorded meetings, customer calls -- use batch processing to convert entire libraries into searchable text.

Batch Transcription Performance

Processing speed depends on the model size and your hardware:

Hardware	1-Hour File	4-Hour File	Quality
Apple M3 Pro/Max	8-12 min	32-48 min	97-99% accuracy
Apple M1/M2	15-25 min	60-100 min	96-98% accuracy
Intel Mac (2019+)	30-50 min	2-3.3 hours	95-97% accuracy
Windows (RTX 3060+)	10-18 min	40-72 min	96-98% accuracy

A key metric is the real-time factor (RTF): processing time divided by the length of the audio. An RTF of 0.2 means one hour of audio processes in 12 minutes. Modern Apple Silicon Macs running Whisper AI typically achieve RTFs between 0.15 and 0.4, depending on the model size selected.
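The arithmetic behind RTF fits in a few lines of Python. Here RTF is taken as processing time divided by audio duration; the function names are illustrative only.

```python
def real_time_factor(processing_minutes: float, audio_minutes: float) -> float:
    """RTF = time spent processing / length of the audio. Lower is faster."""
    if audio_minutes <= 0:
        raise ValueError("audio length must be positive")
    return processing_minutes / audio_minutes


def estimated_processing_minutes(audio_minutes: float, rtf: float) -> float:
    """Invert the definition: how long will a file of this length take?"""
    return audio_minutes * rtf


# One hour of audio at an RTF of 0.2 takes 12 minutes.
print(estimated_processing_minutes(60, 0.2))
# A 1-hour file that took 15 minutes (low end of the M1/M2 row) has an RTF of 0.25.
print(real_time_factor(15, 60))
```

Reading the table this way, the M3 Pro/Max row's 8-12 minutes per hour corresponds to an RTF of roughly 0.13-0.2, and multiplying those RTFs by 240 minutes reproduces the 32-48 minute estimate for a 4-hour file.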

Head-to-Head Comparison

Factor	Real-Time	Batch
Input	Live microphone	Pre-recorded audio file
Output timing	Instant (as you speak)	After full processing
Primary use	Composing new text	Transcribing existing recordings
Accuracy (clean audio)	96-98%	97-99%
Accuracy (noisy audio)	90-95%	92-97%
Context awareness	Limited (streaming)	Full (entire file)
Hardware load	Sustained during session	Burst during processing
User attention	Active (you are dictating)	Passive (processing runs unattended)
File management	None (text goes directly to app)	Requires audio file input
Best for	Writing, email, quick capture	Meetings, interviews, archives

When to Use Each: Decision Framework

Ask yourself these three questions:

1. Are you creating new content or processing existing recordings?

If you are creating new content -- writing an email, drafting a document, capturing a thought -- use real-time transcription. There is no pre-existing audio file; you are generating text from scratch.

If you have an audio file that already exists -- a meeting recording, an interview, a podcast episode -- use batch transcription. The content is already captured; you just need it converted to text.

2. Do you need the text immediately or can you wait?

Real-time transcription delivers text the moment you speak. If you need to send an email right now, reply to a message, or add a note to a document, real-time is the only option.

Batch transcription is inherently asynchronous. You start the process and come back when it finishes. If you can wait 10-30 minutes for a transcript, batch processing often delivers higher accuracy because the model has full context.

3. How long is the audio?

For short bursts (under 5 minutes), real-time transcription is almost always better. The overhead of saving a file, loading it into a batch processor, and waiting for output is not worth it for a quick dictation.

For long recordings (30+ minutes), batch processing is the practical choice. You would not want to re-dictate an hour-long meeting in real time -- the recording already exists.

For medium-length content (5-30 minutes), either approach works. If you are generating the content yourself, real-time makes sense. If you are processing someone else's speech, batch is the way to go.
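The three questions can be folded into a small decision function. This Python sketch is one hypothetical encoding of the framework, using the 5- and 30-minute thresholds above; the parameter names are illustrative, and treating long, not-yet-recorded sessions as "record first, batch later" is an interpretation rather than a rule stated here.

```python
def choose_mode(audio_already_recorded: bool,
                need_text_now: bool,
                minutes: float,
                speaker_is_you: bool = True) -> str:
    """Return "real-time" or "batch" by walking the three questions in order."""
    # Q1: the content already exists as a recording -> batch-process the file.
    if audio_already_recorded:
        return "batch"
    # Q2: you are composing and need the text right now -> only real-time delivers.
    if need_text_now:
        return "real-time"
    # Q3: decide by expected length.
    if minutes < 5:
        return "real-time"  # saving and batch-processing a file isn't worth the overhead
    if minutes >= 30:
        return "batch"      # long sessions: record first, process later (interpretation)
    # 5-30 minutes: either works, so it comes down to whose speech it is.
    return "real-time" if speaker_is_you else "batch"


print(choose_mode(audio_already_recorded=True, need_text_now=False, minutes=90))
print(choose_mode(audio_already_recorded=False, need_text_now=True, minutes=1))
```

The first check dominates: once a recording exists, the answer is batch regardless of length, which matches the point that you would not re-dictate an existing recording.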

Combining Both in a Single Workflow

The most productive transcription users do not choose one method exclusively. They use both, each for its natural strength.

Morning Routine

Real-time: Dictate your daily task list and priority notes
Real-time: Draft morning emails and messages by voice
Batch: Process yesterday's recorded meetings into searchable transcripts while you work on other things

During the Workday

Real-time: Dictate notes after each call or meeting while your memory is fresh
Real-time: Draft documents, proposals, and reports by voice
Batch: Queue up interview recordings, client calls, or webinar audio for background processing

End of Day

Batch: Process any remaining audio files from the day
Real-time: Dictate end-of-day summary and tomorrow's priorities

Content Creator Workflow

Real-time: Dictate blog post outlines and first drafts
Batch: Transcribe podcast episodes for show notes and blog repurposing
Real-time: Draft social media posts and email newsletters by voice
Batch: Process interview recordings for article quotes

Technical Deep Dive: Why Accuracy Differs

The accuracy gap between real-time and batch transcription comes down to context and processing strategy.

Context Window Effects

In real-time mode, the model processes audio in small chunks. When it encounters an ambiguous word -- say "their" versus "there" versus "they're" -- it must make a decision based on the few seconds of audio it has processed so far. Sometimes it gets it wrong because the disambiguating context comes later in the sentence.

In batch mode, the model can look at the entire sentence (or paragraph, or document) before committing to a transcription. "They're going to their house over there" is trivial to get right when you can see the whole sentence, but challenging when you are processing word by word.

Noise Handling

Batch processing can analyze the full recording's noise profile before transcribing. If the first 30 seconds have a consistent hum from an air conditioner, the model can build a noise profile and subtract it from the entire recording. Real-time processing does not have this luxury -- it must handle noise adaptively as it encounters it.
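Real spectral noise reduction works per frequency band and is too long to show here. The Python sketch below is a deliberately simplified stand-in, a noise gate rather than spectral subtraction, that only illustrates the batch-only ordering described above: estimate the noise floor from the opening 30 seconds first, then apply it to the whole recording. The 20 ms frame size, the low-percentile floor estimate, and the `margin` factor are all assumptions, not Sonicribe's implementation.

```python
import math
from typing import List, Sequence


def frame_levels(samples: Sequence[float], frame: int) -> List[float]:
    """RMS level of each non-overlapping frame of `frame` samples."""
    levels = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        levels.append(math.sqrt(sum(x * x for x in chunk) / len(chunk)))
    return levels


def gate_with_noise_profile(samples: Sequence[float], sample_rate: int,
                            profile_seconds: float = 30.0,
                            frame_seconds: float = 0.02,
                            margin: float = 2.0) -> List[float]:
    """Silence frames that are not clearly louder than the opening noise floor."""
    frame = max(1, round(frame_seconds * sample_rate))
    levels = frame_levels(samples, frame)

    # Pass 1: profile the opening window. A low percentile rather than the mean
    # keeps any speech in that window from inflating the noise estimate.
    n_profile = max(1, round(profile_seconds * sample_rate / frame))
    profile = sorted(levels[:n_profile])
    noise_floor = profile[len(profile) // 10]

    # Pass 2: apply that single profile to the entire recording.
    out = list(samples)
    for idx, level in enumerate(levels):
        if level <= noise_floor * margin:
            start = idx * frame
            end = min(start + frame, len(out))
            out[start:end] = [0.0] * (end - start)
    return out
```

A streaming transcriber cannot run pass 1 before pass 2, because at any moment it has only heard the audio up to that point.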

Post-Processing

Batch transcription pipelines often include post-processing steps:

1. Initial transcription pass

2. Punctuation and capitalization refinement

3. Speaker attribution (if applicable)

4. Confidence scoring and flagging uncertain segments

5. Final output formatting

Real-time transcription performs these steps inline, which is faster but provides fewer opportunities for refinement.
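The batch post-processing steps listed above can be sketched as a chain of small Python functions over transcript segments. Everything here is hypothetical: the `Segment` fields, the toy punctuation rule, the `speaker_at` lookup (standing in for a real speaker-identification step), and the 0.6 confidence threshold are illustrative choices, and step 1 is represented by hand-written segments rather than a model call.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Segment:
    start: float                  # seconds from the start of the recording
    text: str
    confidence: float             # 0.0-1.0 score from the first pass
    speaker: Optional[str] = None
    flagged: bool = False


def refine_punctuation(segs: List[Segment]) -> List[Segment]:
    """Step 2 (toy version): capitalize and add terminal punctuation."""
    out = []
    for s in segs:
        t = s.text.strip()
        if t:
            t = t[0].upper() + t[1:]
            if t[-1] not in ".?!":
                t += "."
        out.append(replace(s, text=t))
    return out


def attribute_speakers(segs: List[Segment], speaker_at: Callable[[float], str]) -> List[Segment]:
    """Step 3: label each segment using a speaker lookup by timestamp."""
    return [replace(s, speaker=speaker_at(s.start)) for s in segs]


def flag_uncertain(segs: List[Segment], threshold: float = 0.6) -> List[Segment]:
    """Step 4: mark low-confidence segments for human review."""
    return [replace(s, flagged=s.confidence < threshold) for s in segs]


def format_transcript(segs: List[Segment]) -> str:
    """Step 5: render one line per segment."""
    lines = []
    for s in segs:
        who = f"{s.speaker}: " if s.speaker else ""
        mark = " [?]" if s.flagged else ""
        lines.append(f"[{s.start:.1f}s] {who}{s.text}{mark}")
    return "\n".join(lines)


def post_process(raw: List[Segment], speaker_at: Callable[[float], str]) -> str:
    segs = refine_punctuation(raw)
    segs = attribute_speakers(segs, speaker_at)
    segs = flag_uncertain(segs)
    return format_transcript(segs)


raw = [Segment(0.0, "welcome to the call", 0.95),
       Segment(4.2, "review the budget next", 0.41)]
print(post_process(raw, lambda t: "Speaker 1" if t < 3 else "Speaker 2"))
```

Each stage receives the previous stage's complete output, which a batch pipeline can afford and an inline, real-time one cannot.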

Setting Up Both Modes in Sonicribe

Sonicribe supports both real-time dictation and batch file processing, so you get the full spectrum of transcription capability in a single application.

Real-Time Dictation Setup

1. Configure your global hotkey: Set a keyboard shortcut that activates dictation from anywhere on your Mac. Most users prefer a double-tap of a modifier key or a function key combination.

2. Enable auto-paste: When you finish dictating, the transcribed text automatically pastes into your active application. This works with 30+ apps including Mail, Slack, Notion, Google Docs, VS Code, and more.

3. Choose your formatting mode: Select from 8 formatting modes -- Standard, Email, Code Comment, Meeting Notes, and others -- to get properly formatted output without manual editing.

4. Select your vocabulary pack: If you work in a specialized field, activate the relevant vocabulary pack (Medical, Legal, Technical, etc.) for higher accuracy on domain-specific terms.
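For the double-tap style of hotkey mentioned in step 1, the timing logic can be sketched in Python as below. This is a hypothetical illustration, not how Sonicribe implements its hotkey: the 0.3-second window is an assumed value, and the platform-specific code that actually captures key events is left out, so key presses arrive here as plain timestamps.

```python
from typing import Optional


class DoubleTapDetector:
    """Reports a double-tap when two presses land within `window` seconds."""

    def __init__(self, window: float = 0.3):
        self.window = window
        self._last_press: Optional[float] = None

    def press(self, timestamp: float) -> bool:
        """Feed one key-down time (in seconds); return True on a double-tap."""
        if self._last_press is not None and timestamp - self._last_press <= self.window:
            self._last_press = None  # consume the pair so a third tap starts fresh
            return True
        self._last_press = timestamp
        return False


detector = DoubleTapDetector(window=0.3)
dictating = False
for t in [0.0, 0.2, 1.5, 1.7]:
    if detector.press(t):
        dictating = not dictating
        print(f"{t:.1f}s: dictation {'on' if dictating else 'off'}")
```

Requiring two taps close together lets a modifier key keep its normal job: a single press of Shift or Option does nothing extra.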

Batch Transcription Setup

1. Drag and drop your audio file: Sonicribe accepts MP3, WAV, M4A, FLAC, and other common audio formats.

2. Select output format: Choose plain text, Markdown, or other structured formats.

3. Choose your model size: Larger models deliver higher accuracy but take longer to process. The Large v3 Turbo model offers the best balance for most users.

4. Start processing: Click transcribe and let it run. You can continue using your Mac for other tasks while batch processing runs in the background.
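Steps 1 and 4 amount to filtering dropped files by format and then transcribing them off the main thread. The Python sketch below shows that shape under stated assumptions: only the four formats named in step 1 are listed, `transcribe_file` is a hypothetical stand-in for the model call, and a single worker thread processes files one at a time.

```python
import threading
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

# Formats named in step 1; "other common audio formats" would be added here.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}


def accept_dropped(paths: Iterable[str]) -> List[Path]:
    """Keep only dropped files whose extension is a supported audio format."""
    return [Path(p) for p in paths if Path(p).suffix.lower() in SUPPORTED_EXTENSIONS]


def start_background_batch(files: List[Path],
                           transcribe_file: Callable[[Path], str]
                           ) -> Tuple[threading.Thread, Dict[Path, str]]:
    """Transcribe files one by one on a worker thread; the caller stays free."""
    results: Dict[Path, str] = {}
    pending = list(files)  # snapshot so later changes to `files` don't affect this run

    def worker() -> None:
        for f in pending:
            results[f] = transcribe_file(f)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, results


files = accept_dropped(["meeting.MP3", "notes.txt", "call.flac"])
thread, results = start_background_batch(files, lambda p: f"(transcript of {p.name})")
# ... the rest of the program keeps running here ...
thread.join()  # wait only when the transcripts are actually needed
print(results)
```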

Choosing the Right Whisper Model

Model	Real-Time Suitability	Batch Suitability	Speed	Accuracy
Tiny	Excellent (lowest latency)	Good for quick drafts	Fastest	90-93%
Base	Very good	Good	Fast	92-95%
Small	Good	Very good	Moderate	94-96%
Medium	Acceptable	Excellent	Slower	95-97%
Large v3 Turbo	Good (on Apple Silicon)	Excellent	Moderate	97-99%

For real-time dictation, most users find the Small or Large v3 Turbo model strikes the right balance. On Apple Silicon Macs, even the Large v3 Turbo model runs fast enough for real-time use with sub-half-second latency.

For batch processing, always use the largest model your hardware can handle. Speed matters less when the processing runs in the background, so maximizing accuracy is the priority.
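The two rules of thumb above can be written as a small selection function. In this Python sketch the memory figures are placeholders: they are the approximate requirements OpenAI lists for its reference openai-whisper implementation (where Large v3 Turbo corresponds to the `turbo` model), and Sonicribe's runtime may need more or less. The real-time branch encodes the advice above: Large v3 Turbo on Apple Silicon, otherwise nothing larger than Small.

```python
from typing import List, Tuple

# (name, approximate memory in GB), ordered from least to most accurate as in
# the table above. Memory numbers are placeholders from the openai-whisper README.
MODELS: List[Tuple[str, float]] = [
    ("Tiny", 1.0),
    ("Base", 1.0),
    ("Small", 2.0),
    ("Medium", 5.0),
    ("Large v3 Turbo", 6.0),
]
NAMES = [name for name, _ in MODELS]


def largest_that_fits(memory_gb: float) -> str:
    fitting = [name for name, need in MODELS if need <= memory_gb]
    if not fitting:
        raise ValueError("not enough memory for any model")
    return fitting[-1]


def pick_model(mode: str, memory_gb: float, apple_silicon: bool) -> str:
    best = largest_that_fits(memory_gb)
    if mode == "batch":
        # Runs in the background, so maximize accuracy.
        return best
    if mode == "real-time":
        if best == "Large v3 Turbo" and apple_silicon:
            return best  # fast enough for real-time use on Apple Silicon, per the text
        # Otherwise cap at Small to keep latency low.
        return min(best, "Small", key=NAMES.index)
    raise ValueError(f"unknown mode: {mode!r}")
```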

Common Mistakes to Avoid

Using batch processing for quick notes. If you just need to capture a two-sentence thought, do not record an audio file and batch-process it. Use real-time dictation -- it takes 10 seconds instead of 2 minutes.

Using real-time dictation to "re-read" a recording. If you have a recorded meeting, do not play it through your speakers and try to "re-dictate" it in real time. Batch-process the original file directly for much higher accuracy.

Ignoring formatting modes for real-time dictation. Sonicribe's formatting modes automatically add proper punctuation, paragraph breaks, and structure to your dictation. Without them, you get a wall of unformatted text that requires manual editing.

Not leveraging vocabulary packs. Whether you are using real-time or batch, activating the appropriate vocabulary pack for your field dramatically improves accuracy on specialized terminology.

Processing everything in real time. Some users try to dictate everything and never use batch processing. If you have audio files, batch-process them. It is faster, more accurate, and does not require your active attention.

The Bottom Line

Real-time transcription and batch processing are complementary tools, not competitors. Real-time dictation is your tool for creating new text -- emails, documents, notes, messages. Batch transcription is your tool for converting existing audio into text -- meetings, interviews, podcasts, lectures.

The most efficient workflow uses both:

Real-time for everything you compose by voice
Batch for everything that was already recorded

Sonicribe gives you both capabilities in a single offline application, powered by Whisper AI running locally on your Mac or Windows PC. No internet, no subscription, no data leaving your device.

Ready to transcribe both ways? Download Sonicribe and start dictating in real time or processing audio files -- all offline, all private, all for a one-time $79.
