"At COMPANY _______ we value your privacy a great deal. Almost as much as we value the ability to take the data you give us and slice, dice, julienne, mash, puree and serve it to our business partners, which may include third-party advertising networks, data brokers, networks of affiliate sites, parent companies, subsidiaries, and other entities, none of which we’ll bother to list here because they can change from week to week and, besides, we know you’re not really paying attention.
We’ll also share all of this information with the government. We’re just suckers for guys with crew cuts carrying subpoenas.
Remember, when you visit our Web site, our Web site is also visiting you. And we’ve brought a dozen or more friends with us, depending on how many ad networks and third-party data services we use. We’re not going to tell which ones, though you could probably figure this out by carefully watching the different URLs that flash across the bottom of your browser as each page loads or when you mouse over various bits. It’s not like you’ve got better things to do.
Each of these sites may leave behind a little gift known as a cookie -- a text file filled with inscrutable gibberish that allows various computers around the globe to identify you, including your preferences, browser settings, which parts of the site you visited, which ads you clicked on, and whether you actually purchased something.
Those same cookies may let our advertising and data broker partners track you across every other site you visit, then dump all of your information into a huge database attached to a unique ID number, which they may sell ad infinitum without ever notifying you or asking for permission.
Also: We collect your IP address, which might change every time you log on but probably doesn’t. At the very least, your IP address tells us the name of your ISP and the city where you live; with a legal court order, it can also give us your name and billing address (see guys with crew cuts and subpoenas, above).
Besides your IP, we record some specifics about your operating system and browser. Amazingly, this information (known as your user agent string) can be enough to narrow you down to one of a few hundred people on the Webbernets, all by its lonesome. Isn’t technology wonderful?
We store this information an indefinite amount of time for reasons even we don’t fully understand. And when we do eventually get around to deleting it, you can bet it’s still kicking around on some network backup drives in somebody’s closet. So once we have it, there’s really no getting it back. Hell, we can’t even find our keys half the time -- how do you expect us to keep track of this stuff?
Not to worry, though, because we use the very bestest security measures to protect your data against hackers and identity thieves, though no one has actually ever bothered to verify this. You’ll pretty much just have to take our word for it.
So just to recap: Your information is extremely valuable to us. Our business model would totally collapse without it. No IPO, no stock options; all those 80-hour weeks and bupkis to show for it. So we’ll do our very best to use it in as many potentially profitable ways as we can conjure, over and over, while attempting to convince you there’s nothing to worry about.
(Hey, did somebody hold a gun to your head and force you to visit this site? No, they did not. Did you run into a pay wall on the home page demanding your Visa number? No, you did not. You think we just give all this stuff away because we’re nice guys? Bet you also think every roomful of manure has a pony buried inside.)"
Thanks to Dan Tynan
The WSJ has an article about a couple of companies called [x+1] and Demdex. The companies are essentially web miners. In the case of [x+1], it sounds like they store a cookie when you encounter one of the ads using their service. When you next encounter one of their ad servers, they can display a different ad and, by tracking your cookie history, possibly a more relevant one. Demdex provides something they call a behavioral bank. It sounds like they mine corporate data and apply some sort of quality score, called a TraitWeight, to individual features. Their website is heavy on jargon and low on information. The WSJ article says:
New York-based Demdex Inc., for instance, helps websites build "behavioral data banks" that tap sources including online-browsing records, retail purchases and a database predicting a person's spot in a corporate hierarchy. It crunches the data to help retailers customize their sites to target the person they think is visiting. "If we've identified a visitor as a midlife-crisis male," says Demdex CEO Randy Nicolau, a client, such as an auto retailer, can "give him a different experience than a young mother with a new family." The guy sees a red convertible, the mom a minivan.
As if. I wonder how they define "mid-life crisis" and what TraitWeight that gets. Maybe they can do what the WSJ claims, or maybe the WSJ reporter fell for some corporate hype.
The article has some examples of [x+1]'s analysis of users of a credit card web site. It's a little difficult to tell from the article how much information was knowingly supplied by the users, how much was inferred by [x+1], and what they are reading from your browser history and cookies. Most of the inferences seem to have come from location-based information available from an IP address.
What I find troubling about this sort of thing is not web mining itself. I don't care if sites track me to present more relevant ads. The credit company in the article says they only use click mining to present more relevant ads to users of their web site. But it doesn't take much imagination to realize that there are many others who will want to use it for such things as looking at your browser history to decide if you are a terrorist. The thing that bothers me is the ratio of false positives to true positives for this sort of thing. If a company displays an irrelevant ad to me, too bad for the company, but it's not a big problem for me. If Homeland Security has a high ratio of false positives to true positives, it will lead to many innocents being hassled and the bad guys slipping through.
I worry about this because I do a lot of mining of biological data, simple stuff like DNA sequences. When attempting to find simple patterns, we are often confronted with a high ratio of false hits to true hits, and this is for simple data and well-defined characteristics. Features like "mid-life crisis" and "terrorist" are so ill-defined that even characterizing a true positive is likely to prove problematic.
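To put rough numbers on that worry, here is a minimal Python sketch of the base-rate arithmetic. Every number in it is invented for illustration (a population of a million, ten real targets, a detector that catches 99% of them and wrongly flags 1% of everyone else). None of it comes from the article or from any real system, and `screening_outcomes` is just a hypothetical helper name.

```python
def screening_outcomes(population, real_targets, hit_rate, false_alarm_rate):
    """Expected (true_positives, false_positives) when screening a population.

    hit_rate         -- fraction of real targets the detector flags
    false_alarm_rate -- fraction of everyone else it flags by mistake
    """
    everyone_else = population - real_targets
    true_positives = real_targets * hit_rate
    false_positives = everyone_else * false_alarm_rate
    return true_positives, false_positives


# Made-up numbers, chosen only to show the shape of the problem.
tp, fp = screening_outcomes(
    population=1_000_000,
    real_targets=10,
    hit_rate=0.99,
    false_alarm_rate=0.01,
)

ratio = fp / tp              # false alarms per real hit
precision = tp / (tp + fp)   # chance that a flagged person is a real target

print(f"true positives:  {tp:.1f}")
print(f"false positives: {fp:.1f}")
print(f"false alarms per real hit: {ratio:.0f}")
print(f"chance a flagged person is a real target: {precision:.4%}")
```

Under these assumptions, a detector that sounds very accurate still produces on the order of a thousand false alarms for every real hit, simply because real targets are so rare. With a feature as fuzzy as "terrorist", the assumed 1% false alarm rate is probably generous.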
BTW, [x+1] has to be one of the worst corporate names I have run across. Try googling it. You get a billion hits with only one relevant.