Merge Fast

POST /audio/v1/merge-fast

What it does

Upload 2–20 audio files (MP3 or AAC) and get back a single merged file instantly. Tracks are joined at the stream level — zero re-encoding, zero quality loss, processing in milliseconds regardless of file duration.

How it works

Step	What happens
1. Upload	Send files as multipart form data. Optionally include silence spacing between tracks.
2. Validate	Server verifies all files share the same codec, sample rate, and channel count.
3. Merge	Streams are concatenated losslessly. Silence frames inserted if spacing requested.
4. Stream back	By default (responseFormat=file) the API returns the merged audio as raw bytes in the response body. Opt into `responseFormat=json` when you want a JSON wrapper instead.
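From the client side, steps 1 and 4 might look roughly like the sketch below. It uses only the Python standard library; the endpoint URL, the `X-API-Key` header, and the `files` / `metadata` field names are taken from the cURL examples on this page, while the function names and the per-part Content-Type guess are assumptions made for illustration.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

API_URL = "https://api.creatornode.io/audio/v1/merge-fast"


def build_multipart(files, metadata=None):
    """Encode (filename, bytes) pairs and optional metadata as multipart/form-data.

    Returns (body, content_type). Field names follow the cURL examples.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    for filename, data in files:
        # Assumption: guess the part type from the extension, else use a generic type.
        part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            f"Content-Type: {part_type}\r\n\r\n"
        )
        chunks.append(header.encode() + data + b"\r\n")
    if metadata is not None:
        header = (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="metadata"\r\n\r\n'
        )
        chunks.append(header.encode() + json.dumps(metadata).encode() + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def merge_fast(paths, api_key, metadata=None):
    """POST the files and return (response headers, response body bytes).

    Reads everything into memory for brevity. urllib raises
    urllib.error.HTTPError for non-2xx statuses.
    """
    files = []
    for path in paths:
        with open(path, "rb") as fh:
            files.append((os.path.basename(path), fh.read()))
    body, content_type = build_multipart(files, metadata)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return dict(response.headers.items()), response.read()
```

A call such as `merge_fast(["intro.mp3", "outro.mp3"], "YOUR_KEY")` would mirror the basic cURL example; an HTTP client library with built-in multipart support would make `build_multipart` unnecessary.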

Response modes

merge-fast defaults to responseFormat=file: if you don't specify a response format, the merged audio comes back as the raw response body. If you need a JSON wrapper with base64-encoded audio and metadata in meta, send responseFormat=json inside the multipart metadata object.

audio/mpeg or audio/aac → default file-mode success
application/json with success: true → explicit responseFormat=json success
application/json with success: false → error in either mode
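The three outcomes above can be told apart with a small dispatcher. This is a minimal sketch: `classify_response` is a hypothetical helper name, and it relies only on the Content-Type values and the `success` flag described in this section.

```python
import json


def classify_response(content_type, body):
    """Return ("file", bytes), ("json", dict) or ("error", dict)."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith("audio/"):
        return "file", body
    if media_type == "application/json":
        payload = json.loads(body)
        if payload.get("success") is True:
            return "json", payload
        return "error", payload
    raise ValueError(f"unexpected Content-Type: {content_type!r}")
```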

Why use it?

Built for production audio pipelines

Bit-perfect output. Every sample is preserved exactly as-is. No generational loss, no artifacts, no re-encoding. Your listeners hear exactly what you produced.
Millisecond merges. A 2-hour audiobook merges as fast as a 10-second jingle — there's no re-encoding step, so duration doesn't affect latency.
Programmable silence gaps. Insert precise silence between tracks — 500 ms pause after an intro, 2 seconds between chapters, custom per-boundary. No need to create silent audio files yourself.
One request, one response. No job queues, no polling, no webhook callbacks, no pre-signed download URLs. Send files, get merged audio back in the same HTTP call.

Common use cases

Podcast production — stitch intro + ad + episode + outro in your CI/CD pipeline
Audiobook assembly — combine chapter recordings with timed silence breaks
Music playlists — create seamless mixes or DJ sets from individual tracks
Voice-over — merge narration segments for e-learning or video production
Automated workflows — batch merge thousands of files from a queue or cron job

Examples

Basic merge (cURL)

curl -X POST 'https://api.creatornode.io/audio/v1/merge-fast' \
  -H 'X-API-Key: YOUR_KEY' \
  -F 'files=@intro.mp3' \
  -F 'files=@content.mp3' \
  -F 'files=@outro.mp3' \
  --output merged.mp3

Merge with silence spacing

curl -X POST 'https://api.creatornode.io/audio/v1/merge-fast' \
  -H 'X-API-Key: YOUR_KEY' \
  -F 'files=@chapter1.mp3' \
  -F 'files=@chapter2.mp3' \
  -F 'files=@chapter3.mp3' \
  -F 'metadata={"format":"auto","spacing":[500,1000]}' \
  --output audiobook.mp3

# spacing: 500ms gap after chapter 1, 1000ms gap after chapter 2
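When the gaps are computed in code, the metadata field can be generated rather than typed by hand. The sketch below assumes one gap per boundary between consecutive files, as in the example above; whether the API also accepts fewer gaps than boundaries is not stated here, so the helper (a hypothetical name) only rejects lists that are too long.

```python
import json


def spacing_metadata(file_count, gaps_ms, fmt="auto"):
    """Build the JSON string for the multipart 'metadata' field."""
    if file_count < 2:
        raise ValueError("merge-fast needs at least 2 files")
    if len(gaps_ms) > file_count - 1:
        raise ValueError("more gaps than boundaries between files")
    if any(not isinstance(gap, int) or gap < 0 for gap in gaps_ms):
        raise ValueError("gaps must be non-negative integers (milliseconds)")
    # Compact separators reproduce the style used in the cURL example.
    return json.dumps({"format": fmt, "spacing": list(gaps_ms)}, separators=(",", ":"))


print(spacing_metadata(3, [500, 1000]))
# {"format":"auto","spacing":[500,1000]}
```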

Explicit JSON mode

curl -X POST 'https://api.creatornode.io/audio/v1/merge-fast' \
  -H 'X-API-Key: YOUR_KEY' \
  -F 'files=@intro.mp3' \
  -F 'files=@outro.mp3' \
  -F 'metadata={"responseFormat":"json","format":"auto"}'
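Unpacking a JSON-mode response might look like the sketch below. The `success` flag and the `meta` object are described on this page, but the name of the field holding the base64 audio is not, so `data` is purely an assumption here and is exposed as a parameter; check a real response before relying on it.

```python
import base64
import json


def decode_json_success(body, audio_field="data"):
    """Return (audio_bytes, meta) from a responseFormat=json success body.

    audio_field="data" is an assumed field name, not a documented one.
    """
    payload = json.loads(body)
    if payload.get("success") is not True:
        raise RuntimeError(f"merge failed: {payload!r}")
    audio = base64.b64decode(payload[audio_field])
    return audio, payload.get("meta", {})
```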

Response headers

Header	Description
`X-Audio-Format`	Output format (`mp3` or `aac`)
`X-Audio-Codec`	Audio codec name
`X-Audio-Sample-Rate`	Sample rate in Hz (e.g. `44100`)
`X-Audio-Channels`	Channel count (`1` mono, `2` stereo)
`X-Audio-Duration-Ms`	Total output duration in milliseconds
`X-Audio-File-Count`	Number of input files merged
`X-Audio-Spacing-Count`	Number of silence gaps inserted
`X-Output-Size-Bytes`	Output file size in bytes
`X-Processing-Time-Ms`	Server-side processing time
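These headers can be turned into a typed record for a UI or database. The following is a sketch under two assumptions: the numeric headers carry plain decimal values, and the output key names (`sample_rate_hz` and so on) are invented for this example.

```python
STR_HEADERS = {"x-audio-format": "format", "x-audio-codec": "codec"}
INT_HEADERS = {
    "x-audio-sample-rate": "sample_rate_hz",
    "x-audio-channels": "channels",
    "x-audio-duration-ms": "duration_ms",
    "x-audio-file-count": "file_count",
    "x-audio-spacing-count": "spacing_count",
    "x-output-size-bytes": "output_size_bytes",
}


def parse_audio_headers(headers):
    """Map the response headers listed above onto a dict of typed values."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    stats = {}
    for header, key in STR_HEADERS.items():
        if header in lowered:
            stats[key] = lowered[header].strip()
    for header, key in INT_HEADERS.items():
        if header in lowered:
            stats[key] = int(lowered[header])
    if "x-processing-time-ms" in lowered:
        # Parsed as float in case the server reports fractional milliseconds.
        stats["processing_time_ms"] = float(lowered["x-processing-time-ms"])
    return stats
```

Headers that are absent are simply skipped, so the same helper can be used on error responses without raising.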

Tips & tricks

Check Content-Type before reading the body. audio/* means default file-mode success, while application/json can be either explicit JSON success or an error.
Normalize your sources first. All files must share the same codec, sample rate, and channel count. Use the X-Audio-* headers from a previous merge to check what format you're working with.
Let auto-detection do the work. Use format: "auto" (or omit the field entirely) — the server reads the codec from the first file. Only set mp3/aac explicitly to enforce strict format validation.
Read metadata from headers, not the file. Duration, file count, and format are in the X-Audio-* response headers, and output size is in X-Output-Size-Bytes, so there is no need to probe the binary to get stats for your UI or database.
Use file mode for streaming pipelines. The default raw body is ideal for piping straight to disk or object storage without buffering the whole file in memory.
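For the "normalize your sources" tip, a client-side pre-check can catch mismatches before upload. The sketch below assumes you already have each track's properties (for example from a probing tool of your choice); the dict keys and the function name are illustrative, not part of the API.

```python
def find_mismatches(tracks):
    """Compare every track against the first one.

    Each track is a dict with 'codec', 'sample_rate' and 'channels'.
    Returns a list of (track_index, field, expected, actual) tuples.
    """
    keys = ("codec", "sample_rate", "channels")
    reference = tracks[0]
    problems = []
    for index, track in enumerate(tracks[1:], start=1):
        for key in keys:
            if track[key] != reference[key]:
                problems.append((index, key, reference[key], track[key]))
    return problems


tracks = [
    {"codec": "mp3", "sample_rate": 44100, "channels": 2},
    {"codec": "mp3", "sample_rate": 48000, "channels": 2},
]
print(find_mismatches(tracks))
# [(1, 'sample_rate', 44100, 48000)]
```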

Cost & Limits

Feature	Detail
Base cost	2 credits (first 50 MB)
Extra cost	+1 credit per additional 50 MB block
Max cost	5 credits per request (200 MB cap on premium tier)
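The pricing rule in this table reduces to a short calculation. This sketch assumes 1 MB means 2**20 bytes and that billing is based on a single size figure per request; the page does not say whether that figure is the input payload or the output size.

```python
import math

BLOCK_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 2**20 bytes
BASE_CREDITS = 2
MAX_CREDITS = 5


def estimate_credits(total_bytes):
    """2 credits for the first 50 MB block, +1 per additional block, capped at 5."""
    extra_blocks = max(0, math.ceil(total_bytes / BLOCK_BYTES) - 1)
    return min(BASE_CREDITS + extra_blocks, MAX_CREDITS)


print(estimate_credits(120 * 1024 * 1024))
# 4  (three 50 MB blocks: 2 base + 2 extra)
```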

Tier Limits

Limit	Free	Premium
Max files per request	3	20
Max file size	10 MB	50 MB
Max total payload	25 MB	200 MB
Silence spacing	✓	✓
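Checking a batch against these limits before upload avoids a wasted request. The sketch below encodes the table as a dict and again assumes 1 MB = 2**20 bytes; the tier keys and the function name are made up for the example, and the two-file minimum comes from the "2–20 audio files" statement at the top of this page.

```python
MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes

TIER_LIMITS = {
    "free": {"max_files": 3, "max_file_bytes": 10 * MB, "max_total_bytes": 25 * MB},
    "premium": {"max_files": 20, "max_file_bytes": 50 * MB, "max_total_bytes": 200 * MB},
}


def check_limits(file_sizes, tier):
    """Return a list of human-readable problems; an empty list means the batch fits."""
    limits = TIER_LIMITS[tier]
    problems = []
    if len(file_sizes) < 2:
        problems.append("at least 2 files are required")
    if len(file_sizes) > limits["max_files"]:
        problems.append(f"too many files: {len(file_sizes)} > {limits['max_files']}")
    for index, size in enumerate(file_sizes):
        if size > limits["max_file_bytes"]:
            problems.append(f"file {index} exceeds the per-file limit")
    if sum(file_sizes) > limits["max_total_bytes"]:
        problems.append("total payload exceeds the tier limit")
    return problems
```

Note that on the free tier three files at the 10 MB per-file maximum already exceed the 25 MB total, so the total-payload check matters even when each file passes individually.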