Setup
Getting Aloud running
Aloud has no speech recognition of its own. It calls Doubao streaming ASR through your own Volcano Engine account. Setup takes three steps: provision the service and get credentials, grant system permissions, and (optionally) set up a term dictionary. Without step 1, the tool does nothing at all.
Step 1 · Required
Provision Doubao, get AppID / Access Token
- Sign in to the Volcano Engine console and search for "豆包语音" (Doubao Voice) or open "智能语音" (Intelligent Speech).
- Create an Application and provision the 语音识别大模型 (Speech Recognition Large Model) service. It must be the large-model / streaming 2.0 service, not the older small-model one — Aloud uses 2.0, and the wrong service returns a 403.
- On the application detail page, grab two values: AppID and Access Token.
- Open the Aloud menu-bar icon → Voice Engine Settings… and enter the two values into App ID and Access Token under the "豆包流式语音识别(必填)" (Doubao streaming speech recognition, required) section, then click Save.
- Tap Fn and say something. If text comes out, you're set.
Getting a 403 / "not provisioned"
The error usually says "service not provisioned," but the real cause is almost always that you provisioned the small-model service instead of large-model streaming 2.0. Go back to the console, confirm the service is "语音识别大模型", wait a few minutes for it to take effect, and retry. Wrong credentials only cause an auth failure, not a 403.
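The two failure modes above can be told apart mechanically: blank fields on Aloud's side versus a 403 from the service. The sketch below is purely illustrative. `DoubaoCredentials`, `preflight`, and `diagnose` are hypothetical names, not part of Aloud, and treating an auth failure as HTTP 401 is an assumption; this page only says it is not a 403.

```python
from dataclasses import dataclass


@dataclass
class DoubaoCredentials:
    # Hypothetical container for the two values copied from the console.
    app_id: str
    access_token: str


def preflight(creds: DoubaoCredentials) -> list[str]:
    """Return the names of required fields that are still blank."""
    fields = (("App ID", creds.app_id), ("Access Token", creds.access_token))
    return [name for name, value in fields if not value.strip()]


def diagnose(status: int) -> str:
    """Map an HTTP status from the ASR service to the likely setup mistake."""
    if status == 403:
        # Per this page: almost always the small-model service instead of 2.0.
        return "wrong service: provision 语音识别大模型 (large-model streaming 2.0)"
    if status == 401:
        # Assumption: how an auth failure from bad credentials would surface.
        return "auth failure: re-check App ID and Access Token"
    return f"status {status}: not covered by this page"


print(preflight(DoubaoCredentials(app_id="", access_token="abc")))  # ['App ID']
print(diagnose(403))
```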
Step 2 · Required
System permissions
Aloud is unsigned, so macOS blocks its first launch. It also needs to monitor the Fn key, inject text into other apps, and record from the microphone. Miss any of the three and it won't work.
- First launch: a double-click gets blocked. Right-click Aloud.app → Open → Open again; or go to System Settings → Privacy & Security and click "Open Anyway" near the bottom.
- Microphone: System Settings → Privacy & Security → Microphone, toggle Aloud on.
- Accessibility: System Settings → Privacy & Security → Accessibility, toggle Aloud on. Both Fn-key monitoring and injecting text into the focused field depend on this; without it, pressing Fn does nothing.
After changing permissions, quit and relaunch Aloud once so it starts with all of them in place.
Step 3 · Optional
Term dictionary
Technical terms, personal names, and product names often get misheard as homophones. The term dictionary passes these to Doubao before recognition. That is more reliable than letting an LLM guess after the fact, and it avoids the extra few seconds of latency.
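Where the two layers sit can be sketched as a single function. This is a minimal illustration, not Aloud's code: `transcribe`, the `recognize` callback, and its signature are hypothetical stand-ins for the real Doubao call, and `fake_recognize` is a toy that only demonstrates the ordering.

```python
from typing import Callable, Optional, Sequence

# Hypothetical signatures; the real Doubao request format is not shown here.
Recognizer = Callable[[bytes, Sequence[str]], str]  # (audio, hot words) -> text
Corrector = Callable[[str], str]                     # text -> corrected text


def transcribe(audio: bytes,
               recognize: Recognizer,
               hotwords: Sequence[str] = (),
               llm_correct: Optional[Corrector] = None) -> str:
    # Layer 1: hot words travel with the recognition request,
    # so the recognizer can favor these spellings while decoding.
    text = recognize(audio, hotwords)
    # Layer 2 (optional): an LLM pass after recognition, as a backstop.
    if llm_correct is not None:
        text = llm_correct(text)
    return text


def fake_recognize(audio: bytes, hotwords: Sequence[str]) -> str:
    # Toy recognizer: "hears" the term correctly only when it is a hot word.
    return "Kubernetes" if "Kubernetes" in hotwords else "cooper netties"


print(transcribe(b"", fake_recognize))                  # cooper netties
print(transcribe(b"", fake_recognize, ["Kubernetes"]))  # Kubernetes
```

The point of the ordering: the dictionary shapes what the recognizer produces, while an LLM can only rewrite text that already exists, which is why it works as a backstop rather than a replacement.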
- Open Voice Engine Settings… → the "热词" (hot words) box under "术语词库" (term dictionary), and enter one term per line, e.g. Kubernetes, Pydantic, idempotent, plus the names and projects you say often.
- The cap is roughly 100 entries; anything beyond that is trimmed. Pick the high-frequency words that are most often misheard, and don't pad the list.
- The list is stored locally and sent inline to Doubao at recognition time. It is not uploaded to a cloud word table or routed through any third party.
- The dictionary and LLM correction are two separate layers: the dictionary works before recognition (more accurate, no added latency), and the LLM works afterward as a backstop (fixing obvious mis-hears). Running both is best; you can also use the dictionary alone with the LLM turned off.
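Turning the one-term-per-line box into a capped list might look like the sketch below. `parse_hotwords` is a hypothetical helper; a hard cap of exactly 100 that keeps the first entries is an assumption based on the roughly-100-entry limit described above, and skipping blank lines and duplicates is an added assumption, not documented Aloud behavior.

```python
MAX_HOTWORDS = 100  # assumption: this page only says "roughly 100"


def parse_hotwords(box_text: str, limit: int = MAX_HOTWORDS) -> list[str]:
    """Split the hot-word box into terms, keeping at most `limit` of them."""
    terms: list[str] = []
    seen: set[str] = set()
    for line in box_text.splitlines():
        term = line.strip()
        # Skip blank lines and repeated terms (assumed behavior).
        if not term or term in seen:
            continue
        seen.add(term)
        terms.append(term)
        if len(terms) == limit:
            break  # anything past the cap is trimmed
    return terms


box = "Kubernetes\nPydantic\n\nidempotent\nKubernetes\n"
print(parse_hotwords(box))  # ['Kubernetes', 'Pydantic', 'idempotent']
```

If trimming does keep the earliest entries, it pays to put the most frequently misheard terms at the top of the box.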
Haven't downloaded it yet? Go back to the Aloud download page. If something breaks, email hello@openedon.com.