Matt Montag

Sound, music, perception, and interaction


Music Streaming Snapshot

Monthly Users According to Facebook - February 2012

Service       Launch Date   Monthly Users       % Change
Spotify       Oct 2008      15,800,000          +20%
Pandora       Jan 2000      9,200,000           +3%
SoundCloud    Oct 2008      3,100,000           +24%
Bandcamp      Sep 2008      1,200,000           +21%
Grooveshark   Jan 2006      1,100,000           0%
Slacker       Jun 2007      150,000             0%
MOG           Dec 2009      140,000             -30%
Rdio          Aug 2010      100,000             +43%
Rhapsody      Dec 2001      40,000              -20%

iTunes                      20,200,000 likes    +6%
Amazon MP3                  181,000 likes       +1%

Monthly Users According to Facebook - January 2012

Service        Launch Date   Monthly Users
Spotify        Oct 2008      13,200,000
Pandora        Jan 2000      8,900,000
SoundCloud     Oct 2008      2,500,000
Grooveshark    Jan 2006      1,100,000
Bandcamp       Sep 2008      990,000
MOG            Dec 2009      200,000
Slacker        Jun 2007      150,000
Rdio           Aug 2010      70,000
Rhapsody       Dec 2001      50,000
Google Music   Nov 2011      1 (again, according to Facebook)

iTunes                       19,000,000 likes (typically 5x monthly users)
Amazon MP3                   180,000 likes
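
The % Change column in the February table can be reproduced from the two snapshots. Here is a minimal sketch of that arithmetic; the function name percentChange is my own and is only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Month-over-month change in percent, rounded to the nearest whole number.
// `previous` and `current` are monthly-user counts from the two tables.
inline long percentChange(double previous, double current) {
    assert(previous > 0.0);  // a change relative to zero users is undefined
    return std::lround((current - previous) / previous * 100.0);
}
```

For example, percentChange(13200000, 15800000) gives 20, which matches Spotify's +20%, and percentChange(200000, 140000) gives -30 for MOG.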

-

Universal's Audible Watermark

A while ago I posted my confusion about Weird Spotify Compression Artifacts. It turns out the artifacts are not due to compression, but a result of audio watermarks that Universal Music Group embeds in all of their digitally distributed tracks. This includes tracks resold in lossless formats. The artifacts appear on UMG tracks at Rdio, Spotify, iTunes, Amazon, and anywhere else the tracks are distributed. I have also heard UMG's watermarks over FM radio in my car. Universal Music labels include Interscope, The Island Def Jam, Universal Republic, Verve, GRP, Impulse!, Decca, Deutsche Grammophon, etc.

Just to show that it's not limited to Spotify, here is an example of the watermark artifacts through another distributor.

What the watermark sounds like


Spectrogram of the difference between a watermarked and unwatermarked UMG track. The energy is concentrated in two bands between about 1 kHz and 3.5 kHz, where the human ear is most sensitive.

UMG uses a spread spectrum watermark, a technique explained in detail in this Microsoft research paper. The watermark scheme modulates the total energy in two different bands, 1 kHz to 2.3 kHz and 2.3 kHz to 3.6 kHz. The energy is concentrated in the most perceptually sensitive frequencies because that makes it more difficult to attack or remove without significant audible distortion.

The energy is increased or reduced in 0.04 second blocks. The result can be characterized as a fluttering, tremolo sound. Listen closely to the original vs. watermarked audio samples and try to focus on the 1 kHz to 3.6 kHz range. It helps to wear headphones in a quiet environment.

Audio samples

Here is a short sample (excerpt: Three Doors Down - When You're Young). These are lossless original and watermarked files; what you hear is not a result of compression.

Original:

Watermarked:

If the difference between the two isn't clear, here it is by itself:

Difference:

The character of the watermark may seem subtle during this short sample, but through the duration of an entire song it becomes more familiar and more annoying. Check out my original post on the subject for more examples.

Technical details

The watermark does not start until 1 second into the audio. After this the signal is divided into 0.08 second blocks. Each block is divided in two 0.04 second halves: some amount of energy is added to the first half and the same amount is subtracted from the second half. This coding scheme allows blind detection (without access to the original file). The actual information in the watermark is not easily recovered because it is modulated by a pseudorandom sequence, which is generated from a secret key.
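
To make the block layout concrete, here is a rough sketch of the raw measurement a blind detector could start from: for every 0.08 second block after the first second, the energy of the first half minus the energy of the second half. This is not UMG's detector. It assumes a mono signal that has already been band-limited to one of the watermark bands, and the function names are my own. A real detector would still have to correlate these differences with the keyed pseudorandom sequence.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of squared samples in [begin, end).
inline double energy(const std::vector<float>& x, std::size_t begin, std::size_t end) {
    double e = 0.0;
    for (std::size_t i = begin; i < end; ++i) e += static_cast<double>(x[i]) * x[i];
    return e;
}

// For each 0.08 s block after the first second, return
// (energy of first 0.04 s half) - (energy of second 0.04 s half).
// Assumption: `x` is mono and already band-pass filtered.
inline std::vector<double> halfBlockEnergyDiffs(const std::vector<float>& x, double sampleRate) {
    assert(sampleRate > 0.0);
    const std::size_t start = static_cast<std::size_t>(std::llround(sampleRate * 1.0));
    const std::size_t half  = static_cast<std::size_t>(std::llround(sampleRate * 0.04));
    std::vector<double> diffs;
    if (half == 0) return diffs;
    for (std::size_t b = start; b + 2 * half <= x.size(); b += 2 * half)
        diffs.push_back(energy(x, b, b + half) - energy(x, b + half, b + 2 * half));
    return diffs;
}
```

At 44.1 kHz each half is 1764 samples. On unwatermarked music the differences should hover around whatever the music itself is doing; the watermark adds a small, consistent push in one direction or the other per block.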

I have done a little searching and I think UMG's watermarking technology is provided by MarkAny, a Korean company that developed its own watermarks out of university research and purchased some watermarking patents from Digimarc.

Removing the watermark

Since the watermark creates audible distortion, it's worthwhile to try to reduce it. I wrote a script that analyzes the block energy and applies some smoothing. This is the result so far.

Watermarked:

Restored:
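
For the curious, here is one naive form of block-energy smoothing. It is not the script I used, just a sketch of the idea: within each block, rescale the two halves so they carry equal energy while the total block energy stays the same. It assumes the samples have already been isolated to a watermark band and that the block boundaries (start, half) are known; both the function name and those parameters are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rescale the two halves of each block so that they carry equal energy.
// `start` is the index of the first block, `half` the half-block length in samples.
inline void equalizeHalfBlocks(std::vector<float>& x, std::size_t start, std::size_t half) {
    assert(half > 0);
    for (std::size_t b = start; b + 2 * half <= x.size(); b += 2 * half) {
        double e1 = 0.0, e2 = 0.0;
        for (std::size_t i = 0; i < half; ++i) {
            e1 += static_cast<double>(x[b + i]) * x[b + i];
            e2 += static_cast<double>(x[b + half + i]) * x[b + half + i];
        }
        if (e1 <= 0.0 || e2 <= 0.0) continue;   // a silent half: leave the block alone
        const double target = 0.5 * (e1 + e2);  // keeps the total block energy unchanged
        const float g1 = static_cast<float>(std::sqrt(target / e1));
        const float g2 = static_cast<float>(std::sqrt(target / e2));
        for (std::size_t i = 0; i < half; ++i) {
            x[b + i] *= g1;
            x[b + half + i] *= g2;
        }
    }
}
```

The obvious weakness is that music has its own energy changes from one 0.04 second window to the next, so forcing the halves to match flattens real dynamics along with the watermark. A gentler version would only partially pull the two gains toward each other.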

More discussion

Hydrogenaudio forums on watermarking

UMG Watermarks audiophile files, pisses off paying customers

Why do labels watermark tracks? Watermarking simplifies copyright enforcement by letting a company track music on peer-to-peer networks. "It gives them the ability to put pressure on policy makers and ISPs to do filtering," says Fred von Lohmann, an Electronic Frontier Foundation attorney. That may be about the best explanation you will find. See DRM Is Dead, But Watermarks Rise From Its Ashes.

I don't have anything against watermarking, but I have a problem as a consumer when it is poorly implemented and destroys music I've downloaded legally.

Why isn't UMG's watermark talked about more? Maybe people think the audio quality problems are due to some kind of lossy compression, as I did, and either ignore them completely or blame the streaming service or distributor. The problem here is that the UMG watermark degrades the audio to about the equivalent of a 96 kbit/s MP3. My guess is that if consumers were informed about what is going on, they would care, especially those who pay full retail price for digital downloads advertised as lossless audio.

WFS Paper Presented at AES 131

I presented a short paper derived from my thesis work at the 131st Convention of the Audio Engineering Society in New York City. The paper is titled Wave Field Synthesis in Three Dimensions by Multiple Line Arrays. The paper focuses on subjective assessment of a modification to traditional wave field synthesis. I met Dr. Frank Melchior, CTO of IOSONO, after the presentation. Dr. Melchior also presented an assessment of modified WFS at the convention.

Abstract:

Wave field synthesis (WFS) is a spatial audio rendering technique that produces a physical approximation of wavefronts for virtual sources. Large loudspeaker arrays can simulate a virtual source that exists outside of the listening room. The technique is traditionally limited to the horizontal plane due to the prohibitive cost of planar loudspeaker arrays. Multiple-line-array wave field synthesis is proposed as an extension to linear WFS. This method extends the virtual source space in the vertical direction using a fraction of the number of loudspeakers required for plane arrays. This paper describes a listening test and software environment capable of driving a loudspeaker array according to the proposed extension, as well as the construction of a modular loudspeaker array that can be adapted to multiple-line configurations.

Mac OS X Speech Synthesis Markup

I put a Sound Blaster 16 in my first computer. It came with a CD-ROM full of goodies like Dr. Sbaitso, the talking psychologist. One app was called TextAssist and it had a special syntax that let you string together phonemes, specifying pitch, duration, even vibrato. I spent hours composing weird robot jingles.

Itsy Bitsy Spider by TextAssist

The built-in text-to-speech on Mac OS also has the ability to sing your melodies through phonemic modifiers and TUNE syntax. (The full specification is available at the link.) TUNE embedded speech commands alter intonation by controlling pitch, word emphasis, pause length, etc. This stuff is baked in, so if you're on a Mac, try highlighting the following code block, control-click, and go Speech > Start Speaking:

[[inpt TUNE]]
_
1OW {D 1066; P 109.0:0 119.8:12 153.0:30 164.6:39 154.0:50 131.7:62 120.7:69 111.2:73 97.1:89 103.1:93}
_
h {D 194; P 123.0:0 130.0:12}
1EY {D 652; P 147.0:0 159.0:14 161.0:32 147.3:63 112.0:91}
_
w {D 161; P 109.0:0}
1AW {D 1171; P 116.0:0 153.3:17 148.2:33 151.0:51 133.0:55 112.4:71 99.5:74 79.0:89 65.7:98 119.0:100}
% {D 293}
_
D {D 132; P 127.0:0}
IH {D 318; P 160.0:0 136.5:43 114.6:63 112.0:88}
s {D 145; P 112.0:0 119.0:95}
_
IH {D 189; P 119.0:0 119.0:13 119.0:27 120.0:87 119.0:93}
z {D 180; P 117.0:0 117.0:64 117.0:86 117.0:93}
_
k {D 80; P 117.0:0 113.0:81 164.0:95}
1UW {D 820; P 164.0:0 168.0:9 172.0:18 169.5:46 139.0:74 116.8:86}
l {D 260; P 104.0:0 92.0:64 104.0:100}
. {D 212}
[[inpt TEXT]]
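
If you want to generate TUNE lines from a program instead of typing them, the per-phoneme format is easy to emit. This is a small sketch based on my reading of the format above: D is the duration in milliseconds, and each P entry is a pitch in Hz followed by its position as a percentage of the phoneme's duration. The function name tuneLine is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Format one phoneme entry, e.g.  h {D 194; P 123.0:0 130.0:12}
// Each pitch point is (frequency in Hz, position in percent of the duration).
inline std::string tuneLine(const std::string& phoneme, int durationMs,
                            const std::vector<std::pair<double, int> >& pitch) {
    assert(durationMs >= 0);
    std::ostringstream out;
    out << phoneme << " {D " << durationMs;
    if (!pitch.empty()) {
        out << "; P" << std::fixed << std::setprecision(1);
        for (std::size_t i = 0; i < pitch.size(); ++i)
            out << ' ' << pitch[i].first << ':' << pitch[i].second;
    }
    out << '}';
    return out.str();
}
```

Called with "h", 194, and the points (123.0, 0) and (130.0, 12), this reproduces the second phoneme line of the example above. With an empty pitch list it produces pause-style entries such as % {D 293}.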

There's also a simpler mode for phonemic modifiers. The first line here is spoken with default interpretation, and the second line uses the modifiers.  Try it:


You talkin' to me
[[inpt PHON]] [[slnc 500]] [[rate -30]]
+yUW _1tAOl=kIHn ~AX [[pbas +3]]+mIY?

Xcode also ships with a hidden gem called Repeat After Me that helps you build this funky syntax from your own spoken phrase. It extracts the pitch contour and fits the phonetic onsets of a typed phrase to your spoken phrase.

Gaze-Enhanced HDR Viewing

Someday, Walmart will sell TVs capable of blinding you. But until then, we have to deal with weak displays that cannot reproduce the sun's luminosity. And High Dynamic Range images will have to be viewed by proxy.

 

Tones that fall in the HDR range of the histogram are tones that are impossible to display on your monitor and therefore appear as blown out or completely stopped up spots in an image.

In video games with an HDR graphics pipeline, the exposure is controlled automatically. http://www.youtube.com/watch?v=-spSQYtVfuk The game acts like a digital video camera, truncating the histogram based on what's in the center of the frame.

However, with static HDR photographs or panoramas on a computer screen, there is no such region-of-interest-based automatic exposure. The entire picture has to be tone mapped, flattening out the image's dynamic range. If you ask me, that defeats the purpose of retaining HDR images in the first place.

One way around this is to create an interactive HDR viewer that would tune the exposure for, say, the region near the mouse cursor. But an even better option would be to track the user's gaze. Now we're in academic territory.
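
The cursor-driven version is simple enough to sketch. The idea below is my own rough take, not an existing viewer: average the log luminance in a small window around the point of interest, choose an exposure multiplier that maps that average to a mid-gray, then clip everything else to the displayable range. The 0.18 mid-gray target, the square window, and the function names are all assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exposure multiplier that maps the log-average luminance in a square window
// around (cx, cy) to `targetGray`. `lum` holds linear HDR luminance, row-major.
inline float exposureAt(const std::vector<float>& lum, int width, int height,
                        int cx, int cy, int radius, float targetGray = 0.18f) {
    assert(width > 0 && height > 0 && radius >= 0);
    assert(lum.size() == static_cast<std::size_t>(width) * height);
    const int x0 = std::max(0, cx - radius), x1 = std::min(width - 1, cx + radius);
    const int y0 = std::max(0, cy - radius), y1 = std::min(height - 1, cy + radius);
    double logSum = 0.0;
    int count = 0;
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x) {
            // The small offset avoids log(0) on pure black pixels.
            logSum += std::log(1e-6 + lum[static_cast<std::size_t>(y) * width + x]);
            ++count;
        }
    if (count == 0) return 1.0f;  // window entirely outside the image
    return static_cast<float>(targetGray / std::exp(logSum / count));
}

// Apply the exposure and clip to the displayable range [0, 1].
inline float displayValue(float luminance, float exposure) {
    return std::min(1.0f, std::max(0.0f, luminance * exposure));
}
```

Recomputing exposureAt whenever the cursor moves, and easing toward the new value over a few frames, would mimic the way the video game camera adapts. A gaze tracker would simply replace the cursor coordinates.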

Interactive viewing could be implemented in the browser with these cool HDR JavaScript tools: http://pfstools.sourceforge.net/hdrhtml.html. Actual gaze tracking in the browser, well, that I'm not so sure about.

Another cool site that demonstrates true HDR: http://www.hdrlabs.com/gallery/realhdr/

OpenCV 2.1 and 2.3 with Visual Studio 2010 Quick Start

I am using OpenCV 2.1/2.3 with the newer C++ style OpenCV interface. There are a few tricky parts and changes that aren't mentioned in the cheatsheet. Hopefully the OpenCV documentation will continue to improve as it grows.  Anyway, here is a quick start guide that might help beginners out.

Camera Capture in OpenCV 2.x

#include "opencv/cv.h"
#include "opencv/highgui.h"
using namespace cv;

int main() {
    VideoCapture cap(0); // open the default camera
    if (!cap.isOpened()) // check if we succeeded
        return -1;
    for (;;) {
        Mat frame;
        cap >> frame; // get a new frame from camera
        if (frame.empty()) break; // camera stopped delivering frames
        imshow("Camera Preview", frame);
        if (waitKey(30) >= 0) break; // exit on any key press
    }
    return 0;
}

OpenCV 2.1 Visual Studio Project Starter

  • C/C++ > Include Directories:
    • X:\OpenCV2.1\include
  • Linker > Additional Library Directories:
    • X:\OpenCV2.1\lib
  • Linker > Input > Additional Dependencies: (this will vary from one project to the next, but you probably want the first three)
    • cv210.lib
    • cxcore210.lib
    • highgui210.lib
    • cvaux210.lib
    • cxts210.lib
    • ml210.lib
    • opencv_ffmpeg210.lib
  • Add to PATH environment variable (or copy DLLs into output directory):
    • X:\OpenCV2.1\bin

OpenCV 2.3.1 Visual Studio Project Starter

OpenCV 2.3.1 directory structure is different from other versions. I downloaded from OpenCV-2.3.1-win-superpack.exe. This assumes x86, 32-bit architecture, so change accordingly if you are on 64-bit Windows. Note this also includes libs for static linking (instead of requiring DLLs) in OpenCV2.3\build\[architecture]\[compiler]\staticlib. Project settings (this is kind of a deluxe set that was necessary to build the opencv_stitching example):

  • C/C++ > Include Directories:
    • X:\OpenCV2.3\build\include
    • X:\OpenCV2.3\modules\imgproc\src
  • Linker > Additional Library Directories:
    • X:\OpenCV2.3\build\x86\vc10\lib
  • Linker > Input > Additional Dependencies: (this will vary from one project to the next, but you probably want the first three)
    • opencv_core231.lib
    • opencv_highgui231.lib
    • opencv_imgproc231.lib
    • opencv_features2d231.lib
    • opencv_flann231.lib
    • opencv_gpu231.lib
    • opencv_haartraining_engine.lib
    • opencv_legacy231.lib
    • opencv_ml231.lib
    • opencv_objdetect231.lib
    • opencv_ts231.lib
    • opencv_video231.lib
    • opencv_calib3d231.lib
    • opencv_contrib231.lib
  • Add to PATH environment variable (or copy DLLs into output directory):
    • X:\OpenCV2.3\build\x86\vc10\bin
    • X:\OpenCV2.3\build\common\tbb\ia32\vc10 (optional, for Intel Threading Building Blocks parallel processing support)

And you'll need to include the right headers at the top of your source. You have the option of including the entire OpenCV 1 and OpenCV 2 C++-style interface, or just the C++-style interface. This gives you everything, since cv.h and highgui.h include the C++ interface:

#include "opencv/cv.h"
#include "opencv/highgui.h"
...
using namespace cv;

If you are feeling dangerous, you can use the C++ interface exclusively with .hpp:

#include "opencv2/core/core.hpp"
#include "opencv2/highgui/highgui.hpp"
...
using namespace cv;

If everything compiles and then you get runtime errors at the first instance of an OpenCV function call, something like this:

First-chance exception at 0x7c90e4ff in OpenCVHello.exe: 0xC0000008: An invalid handle was specified.

This might be a bug in the OpenCV build, I don't know. You can disable it by going to Debug > Exceptions, unfolding Win32 exceptions, and unchecking 0xC0000008.

Integrating OpenCV in larger programs with rich interfaces

OpenCV highgui is great for experimenting. But eventually, you might want to develop a useful stand-alone program, and for that, cv::imshow and cv::waitKey are not going to cut it. See the discussion at StackOverflow, Integrating OpenCV with larger programs.

In most cases, there is no magic to handing off image data to other libraries. The standard container is an unsigned char array. If your image is 320x240 RGB, then your array size is 3 x 320 x 240 x sizeof(unsigned char) bytes. Since this is an unlabeled container, the consumer of this data will also need to be supplied with the image height, width, and color mode. You can initialize a cv::Mat with raw data by supplying a pointer. Note that the constructor takes rows (height) before columns (width), and this assumes three 8-bit channels: Mat img(rawDataHeight, rawDataWidth, CV_8UC3, rawData); and later access the raw data with img.data. The only snags you are likely to come across are in dealing with row order and RGB vs. BGR format. OpenCV provides cv::flip() and cv::cvtColor() to deal with this.
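
To show what those two snags amount to at the byte level, here is a plain C++ sketch with no OpenCV dependency. In a real program cv::cvtColor() and cv::flip() do this work; the helper names below are made up, and the buffer is assumed to be tightly packed (no row padding) with 3 channels of 8 bits each.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Number of bytes in a tightly packed 8-bit image (no row padding assumed).
inline std::size_t imageBytes(int width, int height, int channels) {
    assert(width > 0 && height > 0 && channels > 0);
    return static_cast<std::size_t>(width) * height * channels * sizeof(unsigned char);
}

// Swap the first and third channel of every pixel in place (RGB <-> BGR).
inline void swapRedBlue(std::vector<unsigned char>& px) {
    assert(px.size() % 3 == 0);
    for (std::size_t i = 0; i + 2 < px.size(); i += 3) std::swap(px[i], px[i + 2]);
}

// Reverse the row order in place (top-down <-> bottom-up).
inline void flipRows(std::vector<unsigned char>& px, int width, int height) {
    const std::size_t stride = static_cast<std::size_t>(width) * 3;
    assert(px.size() == stride * height);
    for (int top = 0, bottom = height - 1; top < bottom; ++top, --bottom)
        std::swap_ranges(px.begin() + top * stride, px.begin() + (top + 1) * stride,
                         px.begin() + bottom * stride);
}
```

For the 320x240 RGB example, imageBytes(320, 240, 3) is 230400. Whether you need the swap, the flip, both, or neither depends entirely on what the library at the other end expects.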

I have had success using videoInput library to access DirectShow devices. The library enumerates and lists friendly names of connected devices, and generally gives you more information when something goes wrong. It's much better than guessing capture device IDs, and essential if you are going to provide some camera selection to the user.

Qt + OpenCV

Key points:

  • Qt is a good option because Qt is great and has a big community.
  • Open a new thread to run cv::VideoCapture in a loop and emit a signal after each frame capture. Use Qt's msleep mechanism, not OpenCV's waitKey. So, we are still using OpenCV highgui for capture.
  • Convert cv::Mat to QImage:
    QImage qtFrame(cvFrame.data, cvFrame.size().width, cvFrame.size().height, cvFrame.step, QImage::Format_RGB888);
    qtFrame = qtFrame.rgbSwapped();
  • Optional: Render with GLWidget. Convert the QImage to GL format with Qt's built-in method:
    m_GLFrame = QGLWidget::convertToGLFormat(frame);
    this->updateGL();

Juce + OpenCV

I have also had success integrating OpenCV with Juce/OpenGL, rendering a capture frame on a rotating surface in OpenGL. Converting a cv::Mat to an OpenGL texture goes like this:

declarations:

cv::Mat frame; // object to receive a captured frame.
cv::VideoCapture cap;  // video capture object
CvMat *_img;
CvMat *arrMat, *cvimage, stub;

in camera capture loop (usually a separate thread):

cap >> frame;
/* following will convert cv::Mat to CvMat data format, which can be
	used as a texture in OpenGL */
if (_img == 0)
	_img = new CvMat(frame);
CvArr *arr = _img;
arrMat = cvGetMat(arr, &stub);
cvimage = cvCreateMat(arrMat->rows, arrMat->cols, CV_8UC3);
cvConvertImage(arrMat, cvimage, 0);
//sleep function here:
  waitKey(33); //OpenCV
//Sleep(33);   //windows.h
//msleep(33);  //Qt
//wait(33);    //Juce

in the OpenGL render function:

glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, cvimage->cols, cvimage->rows, 0,
	GL_BGR_EXT /* GL_BGR */, GL_UNSIGNED_BYTE, cvimage->data.ptr);

I have a feeling this sample code, which I borrowed from this OpenGL+OpenCV tutorial, could be even simpler; I also don't like that it's using OpenCV 1.0 style structures. More on Juce/OpenGL later.

 

Google Latitude is Cool

Google Latitude is starting to look a lot like my original Stractor location tracking concept work. They have pie charts that illustrate where you spend your time; work, home, out and about.

Google Latitude:

Stractor:

Okay, that's nice. Here are some other Stractor concepts...maybe we'll see some of these soon:

http://www.butterscotch.com/news/168/Google-Continues-To-Improve-And-Streamline-Maps-In-New-Update

Music Smasher: Streaming Music API Mashup

Music Smasher is a smash up of popular music streaming service APIs. I developed it to simplify the task of searching across Spotify, Rdio, Grooveshark, and other catalogs to find out who has the music you love, giving you more information for choosing a streaming service.

Music Smasher is currently hosted at mattmontag.com/smasher.

Note that due to differences in the APIs, the result counts aren't telling the whole truth (especially for Grooveshark :( ). I hope I can fix that.
Please post any problems or suggestions in the comments.
Technical notes coming soon.

Update 9/9/2011:

Music Smasher has been pretty useful during regular Spotify browsing. Today I learned Spotify doesn't have any Antiloop, Mark Ronson's Bike Song, or any of the Fabriclive series.

Update 12/24/2011:

Results now link to the track on each service for streaming. Rdio tracks that are unavailable for streaming or download will no longer show up in results.

Weird Spotify Compression Artifacts

Spotify Artifacts Showcase

I love Spotify. But I've noticed some weird artifacts on a few albums. It sounds like a fluttery warble noise in the midrange. It's most noticeable during big string or choir sections with broad spectral content.

The problem seems limited to certain albums. The worst album I've come across is this Pascal Roge Poulenc album.

Spotify uses Ogg Vorbis encoding. Free streaming accounts get Ogg Vorbis q5, 160kbps. I wonder if this is a normal Ogg Vorbis artifact, or something more complicated. Here are some audio samples. Can you hear it?

Michael Jackson - Ben / on Spotify
Poulenc - Piano Concerto Movement 2 (linked above)
Poulenc - Piano Concerto Movement 3
Poulenc - Piano Sonata Movement 1

And the worst example - yikes, is this intentional?
Three Doors Down - When You're Young / on Spotify

Update 1:

I just upgraded to Spotify Premium and switched to 320kbps high quality streaming. I'm sad to say the nasty artifacts are still audible on these songs.

Update 2:

Thanks to investigative effort by Ulysses at spotifyclassical.com, this is no longer a mystery but indeed just a case of bad compression; it turns out that only about 50 percent of the Spotify library is actually available at 320kbps.

If this is true then I don't like Ogg Vorbis, since its 160kbps rate seems to have far more annoying artifacts than any 160kbps MP3 I've ever heard. Ogg is supposed to be better. I have not been able to reproduce the artifacts with my own Ogg compressor.

Update 3:

I compared the latest Three Doors Down album on Spotify, Rdio, and FLAC original audio. It's pretty clear that sometimes, the labels just supply bad tracks to the streaming services. The Spotify and Rdio tracks were identical.  I heard the same artifacts at the same points in the song.  The FLAC was very different.  This changes some of my earlier conclusions. The artifacts are not Ogg 160kbps artifacts.

To answer my question above, the warbling noise is definitely not part of the original Three Doors Down - When You're Young  track.  Now the question is, exactly what format are these tracks provided in, and why is there such variance even among a single label's catalog? Are these differences intentional?

Update 4:

Finally, the source of all this consternation: Universal Music Group supplies heavily watermarked audio to all services. More.