Over the holidays, I looked at the data from the watermark listening test. My hypothesis was that the watermark added by Universal Music Group is at least as audible as 128 kbps MP3 compression artifacts. The test results support the hypothesis for certain types of music.
The most important takeaway is that the UMG watermark is an enormous confounding factor for evaluating audio quality of streaming services. The noise of the watermark in Universal content will overwhelm differences in compression quality.
The listening test asks subjects to identify the watermarked sample from each of 16 pairs. It's similar to this McGill MP3 discrimination test, which makes it useful for comparison. It should be noted that the McGill listening test was administered in a quiet, controlled listening environment with a high-end sound system, so one might expect that subjects taking my test in the wild might not perform as well. Nonetheless, the results of the McGill test are included in the chart (white bars) for reference:
The difference among songs shows that the watermark is highly content-sensitive. At one extreme (Engulfed Cathedral), 163 out of 201 subjects identified the watermark, comparable to a 96 kbps MP3 on the McGill test. At the opposite extreme, users were not able to detect the watermark in electronic music with strong transients (Jongebeer).
The watermark seems particularly problematic for classical and acoustic works. The four worst-performing samples are piano/orchestral music.