Local AI Models in Sonicribe: Mistral, Llama & Phi on Your Mac
Sonicribe supports local AI models like Mistral 7B, Llama 3 8B, and Phi-3 Mini for text formatting — completely offline, no API keys needed. Free AI-powered transcription.
Sonicribe Team
Product Team

Run AI Models on Your Mac. No Cloud. No API Keys.
If you've heard about local language models—Mistral, Llama, Phi—you probably think they're only for researchers or developers with serious hardware. They're not.
Sonicribe lets you run open-source AI models directly on your Mac to enhance and format your voice transcriptions. No cloud upload. No API keys. No subscription costs. Your data stays on your device, and your AI formatting runs offline.
This is significant for anyone who cares about privacy, speed, or saving on cloud API costs.
What Local AI Models Do in Sonicribe
First, clarity on what local models do and don't do.
What they do NOT do: recognize speech. Sonicribe uses Whisper AI (a separate model from OpenAI) for speech-to-text transcription. Whisper is built in and runs offline; all speech recognition happens locally regardless of which formatting model you use.
What they DO do: format, enhance, and structure your transcribed text. After Whisper converts your voice to text, a language model refines it. Local models are one option for this refinement step.
Here's the workflow:
1. You speak into Sonicribe (any language, any topic)
2. Whisper AI transcribes your voice to text, locally on your Mac
3. Optional: A language model formats your text using your output prompt
4. The formatted text pastes into your app of choice
Steps 1-2 happen always, offline, regardless of settings. Step 3 is where you choose between local models, cloud models, or no additional formatting.
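The flow above can be sketched in a few lines of code. This is a minimal illustration only, not Sonicribe's actual implementation: `transcribe` and `format_text` are hypothetical stand-ins for the Whisper and language-model steps.

```python
from typing import Optional

# Minimal sketch of the pipeline (hypothetical function names, not
# Sonicribe's real API): transcribe locally, then optionally format.

def transcribe(audio: bytes) -> str:
    """Stand-in for Whisper speech-to-text -- always runs locally."""
    return "buy milk tomorrow and email the team about friday"

def format_text(text: str, prompt: str, model: Optional[str]) -> str:
    """Stand-in for the optional formatting step (local or cloud model)."""
    if model is None:
        # Step 3 skipped: raw Whisper transcription is returned as-is.
        return text
    # A real model would apply the output prompt; here we just tidy casing.
    return text.capitalize() + "."

raw = transcribe(b"...")                       # Steps 1-2: always offline
note = format_text(raw, "Format as a todo list", "mistral-7b")
print(note)  # Buy milk tomorrow and email the team about friday.
```

Passing `model=None` models the "no additional formatting" path: you get Whisper's raw transcription untouched.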
Available Local Models
Sonicribe supports several open-source language models. Each has trade-offs between speed, quality, and resource usage.
Mistral 7B
- Size: 7 billion parameters
- Speed: Fast (generates text quickly)
- Quality: Good balance of speed and accuracy
- Memory: ~5GB on disk, ~6-8GB active use
- Best for: General formatting, speed-sensitive workflows
Mistral 7B is the default recommendation for most Mac users. It's fast enough for real-time formatting (you won't wait long) while producing quality output.
Use Mistral 7B if you want formatting to feel instant and your Mac has modest resources (8GB RAM is adequate).
Read more: Local AI Processing on Mac: Apple Silicon Neural Engine Explained
Llama 3 8B
- Size: 8 billion parameters
- Speed: Moderate (slightly slower than Mistral)
- Quality: Higher quality than Mistral, more nuanced
- Memory: ~5GB on disk, ~7-9GB active use
- Best for: Complex writing, high-quality output, when speed isn't critical
Llama 3 8B produces more sophisticated output. It's better at understanding context, handling nuance, and refining complex prose. The trade-off is that it takes slightly longer to format.
Use Llama 3 8B if you need higher-quality text enhancement and have a Mac with solid specs (16GB RAM recommended).
Phi-3 Mini
- Size: 3.8 billion parameters
- Speed: Very fast (quickest option)
- Quality: Good for basic formatting
- Memory: ~2.5GB on disk, ~4-5GB active use
- Best for: Older Macs, lightweight workflows, minimal hardware
Phi-3 Mini is Microsoft's efficient model. It runs on almost any Mac and generates output quickly. The trade-off is that it's less nuanced than larger models.
Use Phi-3 Mini if your Mac has limited resources (4-8GB RAM) or you prioritize speed over output sophistication.
How to Download and Install Local Models
Installing a local model in Sonicribe takes a few minutes.
Step 1: Open Sonicribe preferences
- Open Sonicribe
- Go to Settings > AI Formatting > Local Models
Step 2: Download a model
- Select Mistral 7B, Llama 3 8B, or Phi-3 Mini from the available list
- Click "Download"
- The model downloads to your Mac's disk (~2.5-5GB depending on the model)
- This is a one-time download; once downloaded, it's cached locally and used for all future formatting
Step 3: Enable it in your modes
- Go to your mode settings (Meeting Mode, Email Mode, etc.)
- Under "AI Formatting," select your newly downloaded model
- Test with a short voice memo
The entire process takes 5-15 minutes depending on your internet speed and which model you choose.
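Because a downloaded model is simply cached on disk, "is this model already installed?" reduces to a file-existence check. Here's a rough sketch; the cache directory and file name are made up for illustration and are not Sonicribe's real layout.

```python
from pathlib import Path

# Hypothetical cache location and file name -- Sonicribe's real layout
# may differ; this just illustrates the "cached on disk" idea.
CACHE_DIR = Path.home() / "Library" / "Application Support" / "Sonicribe" / "models"

def is_downloaded(model_file: str, cache_dir: Path = CACHE_DIR) -> bool:
    """True if the model file is already cached, so no re-download is needed."""
    return (cache_dir / model_file).exists()
```

If the check returns False, the app fetches the model once; after that, every formatting run reads the cached copy.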
Storage and System Requirements
Choosing a local model is a trade-off between capability and storage.
| Model | Download Size | Active Memory | Disk Space | Min RAM | Apple Silicon | Intel Macs |
|---|---|---|---|---|---|---|
| Phi-3 Mini | 2.5GB | 4-5GB | 3GB | 4GB | Yes | Yes |
| Mistral 7B | 5GB | 6-8GB | 6GB | 8GB | Yes | Yes |
| Llama 3 8B | 5GB | 7-9GB | 6GB | 12GB | Yes | Yes |
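The minimum-RAM column lends itself to a simple chooser: pick the most capable model whose requirement your machine meets. A sketch using the table's numbers (this helper is illustrative, not part of Sonicribe):

```python
from typing import Optional

# Pick the most capable model that fits the Mac's RAM, using the
# minimum-RAM column from the table above (illustrative helper).
MIN_RAM_GB = [
    ("Llama 3 8B", 12),   # highest quality, most memory
    ("Mistral 7B", 8),    # default recommendation
    ("Phi-3 Mini", 4),    # lightest option
]

def choose_model(ram_gb: int) -> Optional[str]:
    for name, min_ram in MIN_RAM_GB:  # ordered most to least demanding
        if ram_gb >= min_ram:
            return name
    return None  # under 4GB: local models aren't a good fit

print(choose_model(16))  # Llama 3 8B
print(choose_model(8))   # Mistral 7B
```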
Apple Silicon Macs: All models run efficiently on Apple Silicon because these chips include a dedicated Neural Engine. Phi-3 Mini and Mistral 7B are ideal. Llama 3 8B also runs well on M2/M3 and newer.
Intel Macs: All models work on Intel, but they'll be slower. Phi-3 Mini is recommended for older Intel Macs (2015-2018). Mistral 7B works on newer Intel machines. Llama 3 8B requires modern Intel hardware.
Storage note: After download, the model stays on disk. You can delete it anytime to free space. Re-downloading takes the same 5-15 minutes.
Read more: Getting Started with Sonicribe: Your Complete Guide
Cloud Models (Alternative: No Installation Required)
Sonicribe also supports cloud-based language models if you prefer not to download anything locally.
| Model | Provider | Cost | Requires API Key | Speed | Quality |
|---|---|---|---|---|---|
| Mistral 7B | Mistral API | ~$0.27 per million tokens | Yes | Very fast | Good |
| GPT-4o | OpenAI | ~$5 per million tokens | Yes | Very fast | Excellent |
| Claude 3.5 | Anthropic | ~$3 per million tokens | Yes | Fast | Excellent |
| Gemini 2.0 | Google | ~$0.10-0.40 per million tokens | Yes | Very fast | Good |
Cloud models don't require you to download anything. You provide an API key, and Sonicribe sends your transcribed text to the cloud model for formatting.
Trade-offs:
- Cloud models are often faster and higher quality than local models
- They cost money per use (though often minimal)
- They require your transcribed text to be uploaded to a cloud service
- You need valid API keys from the provider
Most Sonicribe users who opt for cloud models use GPT-4o or Claude 3.5 for premium quality on important documents.
Hybrid Approach: When to Use Each
Here's a practical guide for choosing:
Use local models if:
- Privacy is critical (healthcare, law, sensitive data)
- You want zero per-use costs
- Your Mac has adequate storage and RAM
- You work offline frequently
- You want fast offline formatting
Use cloud models if:
- Output quality matters most (important emails, formal writing)
- You don't mind per-use costs
- Your Mac has limited storage
- You want the fastest formatting
- You're willing to upload transcribed text
Use no additional formatting if:
- Your dictation is already well-organized
- You just need raw transcription
- You want maximum speed and zero overhead
- Whisper's transcription is sufficient for your use case
Many users mix approaches. Local model for quick notes and brainstorms. Cloud model for important emails or formal writing. Raw transcription for simple todos.
Performance on Apple Silicon vs. Intel
The experience varies based on your Mac's architecture.
Apple Silicon Macs (M1/M2/M3+):
- Phi-3 Mini: Feels instant, no perceptible delay
- Mistral 7B: 2-5 seconds to format
- Llama 3 8B: 3-8 seconds to format
- These are impressive given the model complexity
Newer Intel Macs:
- Phi-3 Mini: Feels instant
- Mistral 7B: 5-10 seconds to format
- Llama 3 8B: 10-20 seconds to format
- Still workable, but noticeable wait
Older Intel Macs (2015-2018):
- Only Phi-3 Mini is recommended
- Others may be very slow or require lots of RAM
Apple Silicon is genuinely faster for local AI models. If you're running an Intel Mac and speed matters, consider cloud models instead.
Cost Comparison: Local vs. Cloud
Let's calculate real-world costs if you dictate heavily.
Scenario: 5,000 words per week (Sonicribe's free tier limit)
Local model (Phi-3 Mini or Mistral 7B):
- Download: One-time, 5-15 minutes
- Active use: $0
- Monthly cost: $0
- Yearly cost: $0
- Storage commitment: 3-6GB disk space
Cloud model (at ~$3 per million tokens, e.g., Claude 3.5):
- Average formatting works out to ~1,000 tokens per week from 5,000 words (rough ratio)
- Monthly: ~4,000 tokens = ~$0.01
- Yearly: ~52,000 tokens = ~$0.16
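The cloud figure is straightforward arithmetic: tokens per year times the per-million-token price. A quick check, assuming ~1,000 tokens/week and an assumed ~$3 per-million-token rate (the table's Claude 3.5 price):

```python
# Back-of-envelope yearly cost for cloud formatting, using the rough
# figures above: ~1,000 tokens/week at an assumed ~$3 per million tokens.
TOKENS_PER_WEEK = 1_000
PRICE_PER_MILLION_USD = 3.00

yearly_tokens = TOKENS_PER_WEEK * 52                      # 52,000 tokens
yearly_cost = yearly_tokens / 1_000_000 * PRICE_PER_MILLION_USD
print(f"${yearly_cost:.2f} per year")  # $0.16 per year
```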
The choice isn't economic. It's about privacy, reliability, and offline capability.
Common Workflows Using Local Models
Here's how different users leverage local models.
Read more: Best Local AI Tools in 2026: Privacy-First AI on Your Device
Writer Using Meeting Mode
A consultant records meeting notes in Meeting Mode. Sonicribe's Meeting Mode prompt is customized:
Format my voice notes as:
- Attendees
- Key Decisions
- Action Items (with owners)
- Next Steps
With Mistral 7B running locally, formatting takes only a few seconds and works offline. Notes are formatted before the meeting even ends, ready to share immediately.
Developer Using Note Mode
A programmer dictates code review feedback in Note Mode. The custom prompt is:
Format my feedback as clear, constructive code review comments.
- What's good about this code
- Suggested improvements
- Questions/clarification needed
Use professional but friendly tone.
Local Llama 3 8B produces nuanced, thoughtful code review. The entire process—dictation, transcription, formatting—stays on the developer's machine.
Student Using Summarize Mode
A student records lecture notes, and Sonicribe's Summarize Mode (with local model) condenses them:
Extract the 5 key concepts from my lecture notes.
For each concept, provide:
- Definition
- Real-world example
- Why it matters
Keep it concise.
Phi-3 Mini handles this efficiently on a student's MacBook Air. No cloud, no privacy concerns about sharing class content.
Switching Between Models
You can switch local models anytime. In Sonicribe settings:
1. Go to Settings > AI Formatting
2. Select a different model from the dropdown
3. If not already downloaded, click download
4. It becomes active immediately
You might use Phi-3 Mini for quick todos, then switch to Llama 3 8B for important writing. Same app, different settings.
Downloaded models persist. Deleting one frees disk space but requires re-download if you want to use it again.
Troubleshooting Local Models
Model is slow:
- Your Mac is under resource pressure
- Try Phi-3 Mini instead (lighter)
- Close other apps consuming RAM
- On Intel Macs, slower is expected; consider cloud models
Download fails:
- Check your internet connection
- Ensure you have disk space (download size + 50% buffer)
- Try again later if the download server is busy
Output quality is poor:
- Your output prompt might be unclear; refine it
- Try a larger model (Mistral to Llama)
- Cloud models produce better quality
Mac runs hot:
- Local models on old hardware can stress CPUs
- Take breaks between formatting sessions
- Use Phi-3 Mini (lightest option)
- Consider cloud models instead
A Developer's Perspective
If you're a developer, local models are compelling. They're open-source, auditable, and give you deep control over text processing.
You can inspect model behavior, see how your output prompt shapes the result, and ensure your data never leaves your device. For sensitive work or proprietary text, this is invaluable.
Read more: Best AI Voice Cloning Tools in 2026: Create Your Digital Voice
Sonicribe makes this accessible without requiring you to run models in the terminal or manage Python environments. Point-and-click installation, then go.
Privacy and Security
This is the core reason many users prefer local models.
When you use local models:
- Your transcribed text never leaves your Mac
- Sonicribe doesn't see your text (formatting happens locally)
- The model isn't connected to your account or any service
- No log of what you dictated exists anywhere
When you use cloud models:
- Your transcribed text is sent to the cloud provider (OpenAI, Anthropic, Google, etc.)
- The provider's privacy policy applies
- You're using their API, which has standard terms
For personal use, local models are unquestionably more private. For professional use with sensitive data, they're often required.
Free AI Formatting
The combination of Whisper AI (speech-to-text) and free local models means your AI-powered formatting costs nothing.
Sonicribe's free tier (5,000 words/week) includes:
- Whisper AI transcription (built-in, offline)
- Local model formatting (any model you download)
- All output prompt customization
You pay nothing for the speech recognition. You pay nothing per-use for formatting. The only cost is your one-time purchase of the app ($79 for unlimited words, or free forever at 5,000/week).
This is rare in the AI-formatting space. Most tools charge per transcription minute or per API call. Sonicribe's model-inclusive pricing is distinctive.
The Future of Local Models
Larger, better models are being released continuously. Mistral just released a larger model. Llama 3.1 is coming. New efficient models launch monthly.
Sonicribe will support new models as they're released. You're not locked into current options.
Open-source AI models are also improving rapidly. In 12-24 months, expect local models to rival cloud models in quality while remaining faster and more private.
Try Local Models Today
Download Sonicribe free. 5,000 words per week, all modes, all features included.
Download a local model (Mistral 7B is recommended for most users). Record a voice memo, format it with your local model, and see how fast and private it feels.
If local formatting isn't sufficient, you can always switch to cloud models or no additional formatting. The choice is yours, and switching is instant.
Download Sonicribe Now
Ready to transform your workflow?
Join thousands of professionals using Sonicribe for fast, private, offline transcription.

