This is why "audiophile" is a joke term in the audio engineering community.
It is not biologically possible for us (unless one is an alien) to hear above 22kHz, and 44.1kHz sampling already captures everything up to 22.05kHz (the Nyquist limit), so it is enough for perfect-fidelity playback [0]. For strictly listening purposes, 192kHz/24bit is wasteful, just extreme overkill because "bigger numbers = more good".
People can't even reliably detect a difference between 320kbps MP3s and source-quality FLAC/ALAC/WAV/PCM, even on very good headphone and speaker setups. Good-quality MP3/AAC is all one needs for AirPods, and the Bluetooth protocol can easily handle those bitrates.
edit: changed record->playback. Here I'm just discussing playback; for audio production purposes 192kHz/24bit is desirable for several reasons. Once one ships that album/song though, it should be downsampled back to 44.1kHz/16bit.
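If anyone wants to test that claim on themselves, a crude blind ABX harness is only a few lines of Python. This is just a sketch, assuming two versions of a track and a command-line player such as mpv; the file names here are made up:

    import random
    import subprocess

    A, B = "track_320.mp3", "track.flac"   # hypothetical file names
    PLAYER = ["mpv", "--really-quiet"]     # any CLI audio player works

    score = trials = 0
    while input("Another trial? [y/n] ").strip().lower() == "y":
        x_is_a = random.random() < 0.5     # secretly pick which file X is
        for label, path in (("A", A), ("B", B), ("X", A if x_is_a else B)):
            input(f"Press Enter to play {label}...")
            subprocess.run(PLAYER + [path])
        guess = input("Is X the same as A or B? ").strip().upper()
        score += (guess == "A") == x_is_a  # count a correct identification
        trials += 1
        print(f"{score}/{trials} correct so far")

Chance is 50%, so only a score well above that over many trials means you can actually hear a difference.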
It's not necessarily just that we can't hear above 20kHz - recording above 44.1kHz has its merits. For example, if you want to apply digital distortion or do amplitude modulation, then upsampling (or having a high sample rate to begin with) stops the spectrum wrapping around and producing all those nasty artefacts. The end user should never need to worry about this; it should be downsampled at the master.
You forgot the #1 reason for recording at a high sample rate: being able to use a lower-order analog low-pass filter without affecting the audible range. Sharp digital low-pass filters have far fewer tradeoffs and better SNR than analog filters, so doing the anti-aliasing low-pass digitally has big advantages.
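To make the aliasing point concrete, here's a minimal sketch (assuming numpy and scipy; the test tone and oversampling factor are arbitrary) of the usual approach: oversample, apply the nonlinearity, then let a sharp digital low-pass do the anti-aliasing before decimating back down:

    import numpy as np
    from scipy.signal import resample_poly

    fs = 44100
    t = np.arange(fs) / fs
    x = 0.8 * np.sin(2 * np.pi * 15000 * t)   # 15 kHz test tone

    up = 4                                     # process at 176.4 kHz
    x_hi = resample_poly(x, up, 1)             # band-limited upsampling
    y_hi = np.tanh(3.0 * x_hi)                 # distortion adds harmonics at 45 kHz, 75 kHz, ...
    y = resample_poly(y_hi, 1, up)             # sharp FIR low-pass, then decimate to 44.1 kHz

Applying the tanh directly at 44.1kHz instead would fold those harmonics back into the audible band (45kHz aliases down to about 900Hz).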
The interesting part is how the actually valuable improvements are rare, while useless "improvements" are common.
I've got all my music in 44.1kHz or 48kHz FLAC (only on the server, so I can transcode it to Ogg Opus for mobile playback, reducing space usage without lossy-to-lossy artifacts). Similar effects apply to many other such cases.
Audiophiles buy $10,000 golden HDMI cables... which don't even support HDMI 2.0a. They buy gold-plated TOSLINK cables (!)
Transcoding is a fairly common feature, although typically to mp3. I use Navidrome [0] to self-host my own music, and it lets me set up custom transcoding profiles with ffmpeg.
KDE's Amarok (a semi-dead project) can automatically transcode audio files if you copy them to e.g. a phone; Jellyfin (an Emby fork, which is a Plex clone) can automatically transcode files on the fly during playback.
I've simply written a small Python utility to handle everything automatically.
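For anyone curious, a hypothetical minimal version of such a utility is just a directory walk plus ffmpeg (assuming ffmpeg is built with libopus; the paths and bitrate here are made up):

    import subprocess
    from pathlib import Path

    SRC = Path("/srv/music")        # FLAC library (assumed location)
    DST = Path("/srv/music-opus")   # Opus mirror for mobile sync

    for flac in SRC.rglob("*.flac"):
        out = DST / flac.relative_to(SRC).with_suffix(".opus")
        out.parent.mkdir(parents=True, exist_ok=True)
        if not out.exists():
            subprocess.run(
                ["ffmpeg", "-n", "-i", str(flac),
                 "-c:a", "libopus", "-b:a", "128k", str(out)],
                check=True,
            )

Because the source stays lossless, there is only ever one lossy step, which is exactly the point about avoiding lossy-to-lossy artifacts above.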
Once I learned about the Nyquist sampling theorem in my EE classes, I became incredibly suspicious of anybody who threw the word "lossless" around. Glad to know that this suspicion was well-founded.
Aside, "Nyquist criterion" would be a sick band name.
While 192kHz is overkill, there are plenty of instruments that produce non-trivial amounts of energy above 22kHz. While we're unable to hear them, some can be felt (depending on volume and proximity). A pair of headphones won't reproduce those ultrasonic energies, but good speakers can. Even so, this is only germane to a handful of types of music with high-fidelity recording and mastering.
The same goes for higher-resolution samples: an audio pipeline that uses the available dynamic range can do cool stuff.
This is somewhat true. Also, while human beings cannot hear a dog whistle per se, due to the way the human ear works they can hear the beats produced by combining that sound with another, straightforwardly audible-frequency sound. Per Nørgård exploits this in his Symphony No. 5. But examples of this actually affecting enjoyment, or even an A/B test in practice, are so few that they shouldn't really guide one's encoding choices. CD audio is already sufficiently wide in practice, and claims that we need more are audiophile wishful thinking.
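The effect is easy to see numerically. A toy sketch (the parameters are made up, and the quadratic term is only a crude stand-in for the ear's nonlinearity): two ultrasonic tones 1kHz apart yield a genuine 1kHz component once an even-order nonlinearity is applied:

    import numpy as np

    fs = 192000
    t = np.arange(fs) / fs                     # one second of audio
    mix = np.sin(2 * np.pi * 23000 * t) + np.sin(2 * np.pi * 24000 * t)
    heard = mix + 0.1 * mix**2                 # crude even-order nonlinearity

    spectrum = np.abs(np.fft.rfft(heard)) / len(heard)
    freqs = np.fft.rfftfreq(len(heard), 1 / fs)
    band = slice(100, 2000)                    # look below 2 kHz, ignoring DC
    print(freqs[band][np.argmax(spectrum[band])])   # prints 1000.0

Without the nonlinearity there is no spectral energy below 20kHz at all; the beating is only an amplitude envelope.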
I'm definitely not arguing for the stupid "audiophile" frequencies. For the tiny fraction of music taking advantage of ultrasonic frequencies and high dynamic range, it's nice there's an option now.
I hope the availability of high resolution, or more importantly spatial audio, will inspire some artists to really explore the space. I'm hoping for some stuff like the Flaming Lips' Zaireeka or the Yoshimi album's 5.1 mix.
But does sampling above the Nyquist rate guarantee phase reconstruction though? In a real-world (not theoretically ideal) implementation of a reconstruction filter, how well is phase reproduced the nearer you get to the Nyquist rate? I thought the Shannon reconstruction theorem only guarantees frequency reconstruction, not phase, and that there can be advantages to sampling rates higher than Nyquist alone would suggest. Can anyone with more authority on this subject speak to this? It's a bit fuzzy, but I remember my DSP professor making that argument once and it always kind of stuck with me. It does make some intuitive sense if you picture trying to reconstruct a sinusoid close to (but below) the Nyquist rate. It's not hard to see how a reconstruction filter might produce some odd results like incorrect amplitude, oscillating amplitude, or phase shift the closer you get.
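Not an authoritative answer, but this is easy to poke at numerically. A toy sketch (assuming numpy/scipy; over a finite, periodic window the FFT-based resample is close to an ideal reconstruction filter): sample a tone just below Nyquist with a known phase and compare the reconstruction against the continuous-time reference:

    import numpy as np
    from scipy.signal import resample

    fs, f0, phase = 44100, 21000, 0.7          # tone about 1 kHz below Nyquist
    n = fs                                     # exactly one second of samples
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * f0 * t + phase)     # the samples an ADC would take

    up = 8
    y = resample(x, n * up)                    # near-ideal band-limited reconstruction
    t_hi = np.arange(n * up) / (fs * up)
    ref = np.sin(2 * np.pi * f0 * t_hi + phase)
    print(np.max(np.abs(y - ref)))             # combined amplitude/phase error

Swapping the near-ideal resampler for a short practical reconstruction filter and watching that error grow as f0 approaches Nyquist is one way to quantify the effect you're describing.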
> found that a function called memcpy was the culprit, most memory players use memcpy and this is one of the reasons why memory play sounds worse ie digital sounding. Fortunately there is an optimised version of memcpy from http://www.agner.org/optimize/, using this version removes the hard edge produced by memcpy.
The trouble with threads like this is that they straddle a gaping chasm between complete delusion and amazing trolling.
Just imagine if stuff really were like this, though, and computing were more 'analogue' -- each of those 15 layers of JavaScript transpiling just degraded the quality of the end result slightly.
Really weird things happen with half-knowledge and fixed ideas built on it. If you "know" it'll sound different, it will sound different to you. Combine that with actual, unrelated variations and maybe a bug that fits the pattern at some point...
This shit always gets me. I don't care how much you spend on electrostatic speakers, granite slabs, acoustic treatments, and magic speaker wire — your room will never, ever sound as good as a decent pair of headphones.
That said, there absolutely is a case for high-bitrate, losslessly-compressed, DRM-free, watermark-free audio as the standard, and that is sampling and remixing. Slowing down a 320 kbps mp3 by just 50% sounds like shit.
Some of the people concerned about room response are listening to surround-sound classical recordings. In works that involve a spatial element -- for example an orchestra in front and players that move around the hall (or are embedded within the audience), creating a 360° soundstage -- headphones just don't preserve that as well as actual speakers.
Headphones with a proper HRTF preserve that perfectly well, better than any actual speaker setup ever can. The typical demo for this functionality is https://www.youtube.com/watch?v=IUDTlvagjJA
Recordings have to be specially made and processed for that. There are, however, many older recordings that were made without such technology and are expected to be played on speakers in 5.0.
You can do the same thing virtually as well: simulate a ray-traced environment with 5.0 speakers and furniture, simulate an HRTF, and generate the resulting 2.0 audio.
It's possible to do this in a way that's not in any way distinguishable from a real 5.0 setup.
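The playback half of that chain is just convolution. A rough sketch (assuming numpy/scipy, a helper name I made up, and that you already have a binaural room impulse response pair, measured or simulated, for each of the five virtual speaker positions):

    import numpy as np
    from scipy.signal import fftconvolve

    def binaural_downmix(channels, brirs):
        """channels: five equal-length mono arrays; brirs: five (left_ir, right_ir) pairs of equal length."""
        left = sum(fftconvolve(ch, l_ir) for ch, (l_ir, _) in zip(channels, brirs))
        right = sum(fftconvolve(ch, r_ir) for ch, (_, r_ir) in zip(channels, brirs))
        return np.stack([left, right], axis=-1)   # 2.0 output for headphones

All of the room, furniture, and HRTF simulation lives in how those impulse responses were generated (ray tracing, measurement, or a generic room model).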
I don’t think it is realistic to expect an ordinary home listener of some old SACD 5.0 recording to carry out some elaborate simulation involving the specific furniture in his home, just to listen to something on headphones.
Not the user themselves, but if you have some Apple HomePods and AirPods Max, they can build the same model and actually do that (that's part of their spatial audio feature).
Dolby provides a similar system, but with a generic room simulation, as do several others.