I have been working my way through Toby Segaran's book Programming Collective Intelligence. The book is about mining web data, particularly Web 2.0 (Gawd, I hate that term) data. My motivation is twofold: brush up on my Python programming and learn something about web mining. I do a fair amount of data mining in my day job. In computational biology, a lot of time is spent extracting various types of data from web resources, analyzing it, and trying to produce something meaningful from it. The book does a decent job of introducing the kinds of techniques that we commonly use in genomic data analysis: Bayesian classification, clustering, SVMs, etc. However, it doesn't go into much detail about any of them. For that, you have to go elsewhere. The book gives Python code for extracting data from sites like eBay, Facebook, and Yahoo Finance. So far, I haven't encountered anything particularly astounding, but it's been an interesting series of exercises.
One of the things that has become apparent in my work in computational biology is the power of the average. Web data like that described in the book is often high-dimensional (many variables) and discrete (values are not continuous). In dealing with high-D, discrete data, it's common to use optimization techniques to find a "best" answer. We have seen that for some biological data this approach can be misleading. The probability of the "best" result can be vanishingly small, and in some cases that result is not representative of the data as a whole. (See Carvalho and Lawrence for some examples.) Instead, a centroid or median result may be more useful and more representative of the distribution of the data values.
In dealing with group data, such as in this book, sometimes you want the best result: the highest mileage car, the cheapest flight. However, when you're mining data to determine something like public attitudes, a measure which finds the "center" may be of more use.
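To make the "best" versus "center" distinction concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the toy distribution over four yes/no variables, the function names, and the choice of Hamming distance (the number of positions where two vectors differ) as the notion of distance. It is not code from the book, and it is only one simple way to define a centroid for binary data.

```python
from typing import Dict, Tuple

State = Tuple[int, ...]

# Hypothetical distribution over binary vectors (probabilities sum to 1).
# Think of each position as one yes/no attribute of a respondent.
dist: Dict[State, float] = {
    (0, 0, 0, 0): 0.22,  # the single most probable state
    (1, 1, 1, 0): 0.20,
    (1, 1, 0, 1): 0.20,
    (1, 0, 1, 1): 0.19,
    (0, 1, 1, 1): 0.19,
}


def mode(d: Dict[State, float]) -> State:
    """Return the single most probable state (the 'best' answer)."""
    return max(d, key=d.get)


def centroid(d: Dict[State, float]) -> State:
    """Return the binary vector minimizing expected Hamming distance.

    For binary data this is a per-position majority vote: set a position
    to 1 when its marginal probability of being 1 exceeds 0.5.
    """
    n = len(next(iter(d)))
    marginals = [sum(p for s, p in d.items() if s[i] == 1) for i in range(n)]
    return tuple(1 if m > 0.5 else 0 for m in marginals)


def expected_hamming(candidate: State, d: Dict[State, float]) -> float:
    """Average number of positions where candidate differs from a random state."""
    return sum(
        p * sum(a != b for a, b in zip(candidate, s)) for s, p in d.items()
    )


best = mode(dist)
center = centroid(dist)

print("mode:    ", best, "expected distance:", expected_hamming(best, dist))
print("centroid:", center, "expected distance:", expected_hamming(center, dist))
```

With these made-up numbers, the most probable state is all zeros, yet it has only a 0.22 probability and sits about 2.34 positions away from a typical state on average. The centroid is all ones, a vector that never actually appears in the data, but it is closer to the bulk of the distribution, with an expected distance of about 1.66.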
I do a lot of commuting by car. During my drives, I like to listen to audio books and podcasts. I recently listened to an exceptional collection of podcasts, The Traneumentary. The Traneumentary is a collection of commentaries on John Coltrane's life and music by musicians and writers. Even if you're a die-hard Coltrane fan like me, you'll find new insights into Coltrane and his music. If you are not familiar with Coltrane, it's a wonderful place to start. 'Trane is a towering figure in modern music. Forty years after his death, the music world still hasn't completely come to terms with him.
My affair with Coltrane's music began when I was in high school. During that time, I would occasionally listen to an R&B station out of the Baltimore-Washington area. On one of the early anniversaries of Coltrane's death, the DJ played a couple of Coltrane tunes. Even hearing the music on a scratchy AM station, I knew it was something special. A couple of days later, I checked out the Coltrane bin in a local record store. I found the Expression album. Expression was recorded shortly before Coltrane's death. Knowing nothing about 'Trane or his music, I figured it should be his best. In some ways, I was right.
During those days, some friends and I would play poker at my house with my dad. It was strictly a nickel-dime game; if you won or lost $2.00, it was a big night. My dad said, "Why don't you put on that record that you bought?" so I stuck Expression on the record player. Expression has some pretty "out-there" parts. 'Trane pushes the limits of the sax, but with absolute control. The title piece and the cut Offering have some parts that to some ears might be considered harsh or atonal. Needless to say, the poker crowd reacted negatively ("WTF is that!"), except for me and one other guy, Mike Mayes (I wonder where he is today). I was spellbound. I had never heard or even imagined music like that.
Over the years, I have listened over and over to the Expression album and every other Coltrane piece I could find. There is a deep spiritual quality to the music coupled with the coolest damn jazz you can imagine. More importantly, the man himself comes through the music.
I think the spirit of Coltrane is expressed best in these two quotes:
I start from one point and go as far as possible. But, unfortunately, I never lose my way. I say, unfortunately, because what would interest me greatly is to discover paths that I'm perhaps not aware of.
I would like to bring to people something like happiness. I would like to discover a method so that if I want it to rain, it will start right away to rain. If one of my friends is ill, I'd like to play a certain song and he will be cured; when he'd be broke, I'd bring out a different song and immediately he'd receive all the money he needed.
If I could live like that, maybe I would really be doing something.