Lossless fade-out for an mp4/m4a song
By moltenform / Ben Fisher
Sometimes you'll have a song that goes on for longer than you'd like. It's an mp4 audio file (aka .m4a or "aac"). You could open the song in Audacity, delete the last part of the song, and add a fade-out. This would work, but then you'd have to re-encode the audio, and since m4a is a lossy format, some audio data would be lost.
Let's say the song is 6:00 long, and we want to keep only the first 4:00, with an 8 second fade-out. We'll re-encode these last 8 seconds, but we can leave the rest of the song untouched.
- Losslessly split the input file into 3 audio files with ffmpeg:
  - (0:00 - 4:00), save as "part1.m4a"
  - (4:00 - 4:08), save as "part2.m4a"
  - (4:08 - 6:00), save as "part3.m4a"
- Use ffmpeg's afade to apply an 8-second fade-out to "part2", save to "part2_fade.wav"
- Use qaac to encode part2_fade.wav to part2_fade.m4a
- Use ffmpeg to extract the raw .aac data from part1.m4a to part1raw.aac
- Use ffmpeg to extract the raw .aac data from part2_fade.m4a to part2_faderaw.aac
- Strip priming data at the beginning of part2_faderaw.aac (see Details section below)
- Concatenate the contents of part1raw.aac and part2_faderaw.aac to output.aac
- Use ffmpeg to add an mp4 header, output.aac to output.m4a
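The external-tool steps above could be sketched as command lines roughly like the following. This is a minimal sketch, not the C# implementation: the function name `build_commands` and the input name `input.m4a` are made up for illustration, and the specific options are my assumptions about one reasonable way to do each step (`-vn` to skip any cover-art stream, `-c:a copy` for the lossless parts, the `.aac` extension to get a raw ADTS stream, `-bsf:a aac_adtstoasc` when rewrapping into mp4, and `-o` for qaac's output file). The function only builds the argument lists; it doesn't run anything.

```python
def build_commands(src="input.m4a", cut=240, fade=8):
    """Return one argv list per external-tool step. Nothing is executed here.

    cut  = seconds of the song to keep untouched (4:00 in the example)
    fade = length of the fade-out in seconds
    """
    copy = ["-vn", "-c:a", "copy"]  # assumption: drop cover art, copy audio as-is
    return [
        # Split without re-encoding. With stream copy the cuts land on
        # AAC frame boundaries, so they are only approximately at 4:00 / 4:08.
        ["ffmpeg", "-i", src, "-ss", "0", "-t", str(cut), *copy, "part1.m4a"],
        ["ffmpeg", "-i", src, "-ss", str(cut), "-t", str(fade), *copy, "part2.m4a"],
        ["ffmpeg", "-i", src, "-ss", str(cut + fade), *copy, "part3.m4a"],
        # Decode part2 to wav and fade it out over its whole length.
        ["ffmpeg", "-i", "part2.m4a", "-af", f"afade=t=out:st=0:d={fade}",
         "part2_fade.wav"],
        # Re-encode only the faded part (assumption: default qaac settings).
        ["qaac", "part2_fade.wav", "-o", "part2_fade.m4a"],
        # Pull the raw aac streams out of their mp4 containers.
        ["ffmpeg", "-i", "part1.m4a", "-c:a", "copy", "part1raw.aac"],
        ["ffmpeg", "-i", "part2_fade.m4a", "-c:a", "copy", "part2_faderaw.aac"],
        # (Stripping the priming frames and concatenating happen here, outside ffmpeg.)
        # Put an mp4 header back on the joined stream.
        ["ffmpeg", "-i", "output.aac", "-c:a", "copy", "-bsf:a", "aac_adtstoasc",
         "output.m4a"],
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)` in order, with the strip-and-concatenate step done between the last two commands.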
Code
I've written a C# implementation here.
Details
The raw aac file is built from frames. The first frames contain priming data, as described in https://developer.apple.com/library/mac/technotes/tn2258/_index.html
The encoder delay for Nero aac is ~2600 samples.
The encoder delay for qaac (Apple) is ~2112 samples.
We can't split a frame without re-encoding, but two whole frames (2 × 1024 = 2048 samples) come quite close to 2112.
So, for `qaac`, deleting the first two 1024-sample frames yields good results.
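Here's a rough Python sketch of the frame arithmetic and the strip-and-join step. It assumes the raw .aac files are ADTS streams (which is what ffmpeg normally writes for a .aac output), so each frame starts with a header that carries a 13-bit frame length. The function names are mine, for illustration only, and this isn't taken from the C# implementation.

```python
SAMPLES_PER_FRAME = 1024  # frame size used in the text above
QAAC_DELAY = 2112         # approximate qaac encoder delay, in samples


def frames_to_drop(delay=QAAC_DELAY, frame=SAMPLES_PER_FRAME):
    """Whole frames that fit inside the delay, plus the samples left over."""
    return delay // frame, delay % frame


def strip_leading_adts_frames(data, count):
    """Return `data` with its first `count` ADTS frames removed."""
    pos = 0
    for _ in range(count):
        # Every ADTS frame starts with a 12-bit sync word of all ones.
        if len(data) < pos + 7 or data[pos] != 0xFF or (data[pos + 1] & 0xF0) != 0xF0:
            raise ValueError(f"no ADTS sync word at byte {pos}")
        # 13-bit frame length (header included), spread over header bytes 3..5.
        length = ((data[pos + 3] & 0x03) << 11) | (data[pos + 4] << 3) | (data[pos + 5] >> 5)
        if length < 7:
            raise ValueError(f"implausible frame length {length} at byte {pos}")
        pos += length
    return data[pos:]


def join_parts(part1, part2_fade, drop):
    """Concatenate part1 with part2_fade minus its leading priming frames."""
    return part1 + strip_leading_adts_frames(part2_fade, drop)
```

`frames_to_drop()` gives 2 frames with 64 samples left over, which is about 1.5 ms if the song is at 44.1 kHz (an assumption; the sample rate isn't stated above). To produce output.aac you'd read part1raw.aac and part2_faderaw.aac as bytes, call `join_parts(part1, part2, 2)`, and write the result.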
Alternatives that aren't sufficient:
- If the -af filter for ffmpeg is used, it re-encodes the entire file.
- Joining the two m4a files with ffmpeg -concat leaves a gap of silence.
- Gluing the two pieces of aac with ffmpeg -concat leaves a gap of silence.
- Gluing the two pieces of aac by concat'ing aac files leaves a gap of silence.
The result is all lossless except for the final seconds, the fade-out itself. There will sometimes be a quiet sound artifact heard right at the transition. In my example, if there is a quiet sound at 4:00 in the output, I'll try the process again using 4:01 instead of 4:00.