Upload 2–20 audio files (MP3 or AAC) and get back a single merged file instantly. Tracks are joined at the stream level — zero re-encoding, zero quality loss, processing in milliseconds regardless of file duration.
How it works
| Step | What happens |
| --- | --- |
| 1. Upload | Send files as multipart form data. Optionally include silence spacing between tracks. |
| 2. Validate | Server verifies all files share the same codec, sample rate, and channel count. |
| 3. Merge | Streams are concatenated losslessly. Silence frames are inserted if spacing was requested. |
| 4. Stream back | By default (responseFormat=file) the API returns the raw audio bytes. Opt into responseFormat=json when you want a JSON wrapper instead. |
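A cheap client-side check before step 1 can catch obvious problems without a round trip. The following is a minimal sketch, not part of the API: `preflight_errors` is a hypothetical helper, and it assumes file extensions are a rough stand-in for the codec. The server remains the authority on codec, sample rate, and channel count.

```python
from pathlib import PurePath

# Assumption: extensions approximate the codec; the server performs the
# authoritative codec, sample-rate, and channel-count validation.
ALLOWED_EXTENSIONS = {".mp3", ".aac"}
MIN_FILES, MAX_FILES = 2, 20  # batch size stated in the docs


def preflight_errors(filenames: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the batch looks OK."""
    errors = []
    if not MIN_FILES <= len(filenames) <= MAX_FILES:
        errors.append(f"expected {MIN_FILES}-{MAX_FILES} files, got {len(filenames)}")
    extensions = {PurePath(name).suffix.lower() for name in filenames}
    unsupported = sorted(extensions - ALLOWED_EXTENSIONS)
    if unsupported:
        errors.append(f"unsupported extensions: {', '.join(unsupported)}")
    if len(extensions) > 1:
        errors.append("files do not share one format")
    return errors
```

An empty result only means the batch is worth sending; two MP3 files with different sample rates would pass this check and still be rejected in step 2.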
Response modes
merge-fast defaults to responseFormat=file, so if you omit the field, the merged audio comes back as the raw response body. If you need a JSON wrapper with base64 data and metadata in meta, send responseFormat=json inside the multipart metadata object.
audio/mpeg or audio/aac → default file-mode success
application/json with success: true → explicit responseFormat=json success
application/json with success: false → error in either mode
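The three cases above can be told apart with a small dispatcher. This is a sketch under stated assumptions: `classify_response` is a hypothetical helper name, and the only payload field it relies on is the `success` flag described above.

```python
import json


def classify_response(content_type: str, body: bytes) -> tuple[str, object]:
    """Map a response to ("audio", bytes), ("json_success", dict) or ("error", dict)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("audio/mpeg", "audio/aac"):
        return "audio", body  # default file-mode success
    if media_type == "application/json":
        payload = json.loads(body)
        if payload.get("success") is True:
            return "json_success", payload  # explicit responseFormat=json
        return "error", payload  # error in either mode
    raise ValueError(f"unexpected Content-Type: {content_type!r}")
```

Branching on the media type first means the body is never parsed as JSON when it actually holds audio bytes.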
Why use it?
Built for production audio pipelines
Bit-perfect output. Every sample is preserved exactly as-is. No generational loss, no artifacts, no re-encoding. Your listeners hear exactly what you produced.
Millisecond merges. A 2-hour audiobook merges as fast as a 10-second jingle — there's no re-encoding step, so duration doesn't affect latency.
Programmable silence gaps. Insert precise silence between tracks — 500 ms pause after an intro, 2 seconds between chapters, custom per-boundary. No need to create silent audio files yourself.
One request, one response. No job queues, no polling, no webhook callbacks, no pre-signed download URLs. Send files, get merged audio back in the same HTTP call.
Common use cases
Podcast production — stitch intro + ad + episode + outro in your CI/CD pipeline
Audiobook assembly — combine chapter recordings with timed silence breaks
Music playlists — create seamless mixes or DJ sets from individual tracks
Voice-over — merge narration segments for e-learning or video production
Automated workflows — batch merge thousands of files from a queue or cron job
Check Content-Type before reading the body. audio/* means default file-mode success, while application/json can be either explicit JSON success or an error.
Normalize your sources first. All files must share the same codec, sample rate, and channel count. Use the X-Audio-* headers from a previous merge to check what format you're working with.
Let auto-detection do the work. Use format: "auto" (or omit the field entirely) — the server reads the codec from the first file. Only set mp3/aac explicitly to enforce strict format validation.
Read metadata from headers, not the file. Duration, file count, format, and output size are all in X-Audio-* response headers — no need to probe the binary to get stats for your UI or database.
Use file mode for streaming pipelines. The default raw body is ideal for piping straight to disk or object storage without buffering the whole file in memory.
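The last two tips can be combined in a few lines. This is a rough sketch: `audio_headers` and `save_stream` are hypothetical helper names, and because the exact header names are not listed here, the code collects anything with the X-Audio- prefix rather than assuming specific keys.

```python
import shutil
from typing import BinaryIO, Mapping


def audio_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Collect every X-Audio-* header, keyed by lower-cased name."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith("x-audio-")
    }


def save_stream(source: BinaryIO, destination_path: str, chunk_size: int = 64 * 1024) -> None:
    """Copy a readable binary stream to disk in fixed-size chunks."""
    with open(destination_path, "wb") as destination:
        # copyfileobj reads chunk_size bytes at a time, so the merged file
        # is never held in memory as a whole.
        shutil.copyfileobj(source, destination, chunk_size)
```

Any file-like response object works as `source`, for example the object returned by `urllib.request.urlopen`.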
Cost & Limits
| Feature | Detail |
| --- | --- |
| Base cost | 2 credits (first 50 MB) |
| Extra cost | +1 credit per additional 50 MB block |
| Max cost | 5 credits per request (200 MB cap on premium tier) |
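The pricing rule is simple enough to estimate up front. The sketch below makes two assumptions that are not stated above: a partially used 50 MB block is billed as a full block, and the size that counts is the request total. `merge_cost_credits` is a hypothetical helper, and tiers other than premium may have a lower cap.

```python
import math

BLOCK_MB = 50      # size of one billing block
BASE_CREDITS = 2   # covers the first block
MAX_MB = 200       # premium-tier cap


def merge_cost_credits(total_mb: float) -> int:
    """Estimate credits for a request of total_mb megabytes."""
    if total_mb <= 0:
        raise ValueError("size must be positive")
    if total_mb > MAX_MB:
        raise ValueError(f"request exceeds the {MAX_MB} MB cap")
    # Assumption: any started block beyond the first counts as a whole block.
    extra_blocks = max(0, math.ceil(total_mb / BLOCK_MB) - 1)
    return BASE_CREDITS + extra_blocks
```

At the 200 MB cap this gives 2 + 3 = 5 credits, which matches the maximum cost in the table.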