Before Completion

Order from chaos...while you wait.

31 December

Please Excuse My Dead Aunt Sally

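Operator precedence: parentheses, exponentiation, multiplication and division, addition and subtraction. Multiplication and division share one level, as do addition and subtraction; within a level, operations are evaluated left to right.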
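Perl follows the same ordering, with a couple of wrinkles. Here's a quick sketch (the numbers are just toy values) showing left-to-right grouping for * and /, and showing that Perl's ** groups right to left and binds tighter than unary minus:

```perl
use strict;
use warnings;

# ** binds tighter than *, which binds tighter than +.
my $plain = 2 + 3 * 4 ** 2;      # 2 + (3 * (4 ** 2)) = 50
my $paren = (2 + 3) * 4 ** 2;    # (2 + 3) * 16 = 80

# * and / share a level and group left to right: (12 / 3) * 2 = 8.
my $left_to_right = 12 / 3 * 2;

# Perl's ** groups right to left and binds tighter than unary minus.
my $neg   = -2 ** 2;             # -(2 ** 2) = -4
my $tower = 2 ** 3 ** 2;         # 2 ** (3 ** 2) = 512

print join(' ', $plain, $paren, $left_to_right, $neg, $tower), "\n";
```

The unary minus case is the one that bites people: if you want (-2) squared, you have to write the parentheses.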
10:50:45 - bbth - No comments 0 TrackBacks

01 August

Perl 6

I just downloaded Rakudo Star, the latest beta of Perl 6. I've been a Perl user for over 10 years, using it for bioinformatics. I tried an early version of Parrot, the Perl 6 bytecode interpreter, and even wrote a simple Lisp in it. But due to the slow pace of Perl 6 development, I lost interest. Finally, after 10 years, Rakudo Star has been released. It claims to be a usable, but incomplete, version of Perl 6. I'll give it a try.

I have been dubious about the Perl 6 project. It's taken a long time, and for quite a while it stalled Perl 5 development. Over the last few years, though, Perl 5 has made great strides, particularly with the Moose object system. I wonder whether the language would have been better off if the effort put into Perl 6 had gone into Perl 5. Would we have Perl 6's feature set working by now?
21:32:06 - bbth - No comments 0 TrackBacks

24 June

Collective Intelligence

I have been working my way through Toby Segaran's book Programming Collective Intelligence. The book is about mining web data, particularly Web 2.0 (Gawd, I hate that term) data. My motivation is twofold: brush up on my Python programming and learn some things about web mining. I do a fair amount of data mining in my day job. In computational biology, a lot of time is spent extracting various types of data from web resources and trying to analyze that data to produce something meaningful. The book does a decent job of introducing the kinds of techniques we commonly use in genomic data analysis: Bayesian classification, clustering, SVMs, etc. However, it doesn't go into much detail about any of them; for that, you have to go elsewhere. The book gives Python code for extracting data from sites like eBay, Facebook, and Yahoo Finance. So far, I haven't encountered anything particularly astounding, but it's been an interesting series of exercises.

One of the things that has become apparent in my work in computational biology is the power of the average. Web data like that described in the book is often high-dimensional (many variables) and discrete (values are not continuous). In dealing with high-D, discrete data, it's common to use optimization techniques to find a single "best" answer. We have seen that, for some biological data, this approach can be misleading. The probability of the "best" result can be vanishingly small, and in some cases it is not representative of the data as a whole. (See Carvalho and Lawrence for some examples.) Instead, a centroid or median result may be more useful and more representative of the distribution of the data values.
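Here's a toy sketch of the difference, using made-up data: nine hypothetical 4-bit solutions, of which 1111 is the single most frequent. The "centroid" here is just a bit-by-bit majority vote, which for binary strings minimizes the total Hamming distance to the samples. It's an illustration of the idea, not the method from Carvalho and Lawrence.

```perl
use strict;
use warnings;

# Hypothetical toy data: nine sampled 4-bit solutions.
my @samples = qw(1111 1111 1111 0000 0000 0001 0010 0100 1000);

# "Best" single answer: the most frequent sample (the mode).
my %count;
$count{$_}++ for @samples;
my ($mode) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

# Centroid: set each bit by majority vote across the samples
# (a bit is 1 only if it is 1 in more than half of them).
my $n_bits   = length $samples[0];
my $centroid = '';
for my $i (0 .. $n_bits - 1) {
    my $ones = grep { substr($_, $i, 1) eq '1' } @samples;
    $centroid .= $ones * 2 > @samples ? '1' : '0';
}

# Total Hamming distance from a candidate answer to every sample.
sub total_distance {
    my ($candidate, @data) = @_;
    my $total = 0;
    for my $s (@data) {
        $total += grep { substr($candidate, $_, 1) ne substr($s, $_, 1) } 0 .. length($s) - 1;
    }
    return $total;
}

printf "mode     %s  total distance %d\n", $mode,     total_distance($mode, @samples);
printf "centroid %s  total distance %d\n", $centroid, total_distance($centroid, @samples);
```

On this data the mode is 1111 with a total distance of 20, while the centroid is 0000 with a total distance of 16: the centroid sits closer to the data as a whole even though it is not the single most frequent answer.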

When dealing with group data, such as the data in this book, sometimes you want the best result: the car with the highest mileage, the cheapest flight. However, when you're mining data to gauge something like public attitudes, a measure that finds the "center" may be more useful.
18:55:03 - bbth - No comments 0 TrackBacks

07 January

Monads in Perl

I've dabbled a little in Haskell, so I'm vaguely familiar with the concept of monads. Among other things, monads are a way to model state and side effects in pure functional languages. Perl is hardly functional, although with some contortions it can be used in a functional manner. So why monads in Perl? It already has many ways to produce side effects. I guess the main reason is that it can be done, and this article shows how.
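For the curious, here's a minimal sketch of a state monad in Perl built from closures. It is not the code from the article; unit, mbind and $fresh are my own hypothetical names. A state action is a sub that takes a state and returns a value plus a new state, and mbind threads the state from one action to the next so the caller never touches it directly. (I avoided the name bind because Perl already has a bind built-in for sockets.)

```perl
use strict;
use warnings;

# unit: wrap a plain value in a state action that leaves the state alone.
sub unit {
    my ($value) = @_;
    return sub { my ($state) = @_; return ($value, $state) };
}

# mbind: run action $m, pass its value to $f, then run the action $f returns
# on the updated state.
sub mbind {
    my ($m, $f) = @_;
    return sub {
        my ($state) = @_;
        my ($value, $new_state) = $m->($state);
        return $f->($value)->($new_state);
    };
}

# A state action that hands out numbered labels from a counter.
my $fresh = sub { my ($n) = @_; return ("label$n", $n + 1) };

# Ask for two labels; the counter is threaded through behind the scenes.
my $two_labels = mbind($fresh, sub {
    my ($first) = @_;
    return mbind($fresh, sub {
        my ($second) = @_;
        return unit("$first,$second");
    });
});

my ($result, $final_state) = $two_labels->(0);
print "$result (next counter: $final_state)\n";
```

Running the composed action from state 0 gives label0,label1 with the counter left at 2, and no mutable variable is involved anywhere, which is the whole point of the exercise.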
21:41:33 - bbth - No comments 0 TrackBacks

03 January

Open Source Search

Wikia Search has been getting a bit of buzz lately. It's the new open source search engine project headed by Wikipedia founder Jimmy Wales. A good open source search engine is really needed; we're all too reliant on Google. The problem with Google is that its algorithms are hidden and thus subject to manipulation by Google, for example, to meet the approval of governments such as China's. The downside to an open source search engine seems to be that it will be easier for web sites to manipulate their page positions than it is with Google's hidden and changing algorithms. Since you can know how the engine indexes and ranks pages, it seems that you could adjust your site to rank highly. Wales has indicated that Wikia Search, a for-profit venture, will allow anyone to build a search engine. That would be interesting: imagine if we each had our own engine, tailored to our own wants and needs. Certainly, that would be great for individual web sites, but I think it may be a while before anyone can seriously challenge Google as a general-purpose web search engine.
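To make the manipulation worry concrete, here's a deliberately naive, hypothetical ranking rule in Perl: score a page by how many times the query term appears in it. This is not how Wikia Search or Google rank pages; it just shows that once a scoring rule is public, a page can be written to game it.

```perl
use strict;
use warnings;

# Hypothetical, deliberately naive ranking rule: a page's score for a
# query term is the number of times that term occurs in the page text.
sub naive_score {
    my ($text, $term) = @_;
    my $count = () = $text =~ /\b\Q$term\E\b/gi;
    return $count;
}

my %pages = (
    honest  => 'A short review of open source search engines.',
    stuffed => 'search search search engines search search',
);

# Rank pages by score, highest first.
my @ranked = sort { naive_score($pages{$b}, 'search') <=> naive_score($pages{$a}, 'search') }
             keys %pages;

print "$_: ", naive_score($pages{$_}, 'search'), "\n" for @ranked;
```

The keyword-stuffed page wins easily. Real engines use far richer signals, but the general point stands: a scoring rule you can read is a scoring rule you can target.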

Here is a list of open source search tools: http://www.searchtools.com/tools/tools-opensource.html
19:23:30 - bbth - No comments 0 TrackBacks

07 February

OLPC security

The One Laptop Per Child (OLPC) project has announced the security plans for the $100 laptop. The plan is built around a system called Bitfrost. One part that I found troubling is this quote from the main developer in the Wired story about Bitfrost:

Still, Krstic admits there's a drawback to his system: It limits interactions between applications. "This kind of model makes it more difficult for glue between applications to be built," Krstic said. "But 99 percent don't need glue."

This seems likely to lead to two problems: monolithic software, because programs do need glue, and a locked-down computer, one that gets used but not programmed.

Maybe I am misunderstanding the security plans. I hope so. Preventing millions of $100 laptops from joining a botnet is an admirable goal, but not if it limits what a kid can do with the computer. One of the project's stated goals reads: "Our commitment to software freedom gives children the opportunity to use their laptop computers on their own terms. While we do not expect every child to become a programmer, we do not want any ceiling imposed on those children who choose to modify their machines." Let's hope Bitfrost provides adequate security without getting in the way of this important goal.
22:12:57 - bbth - No comments 0 TrackBacks

10 September

L# - why something like .NET is the future of programming

Listening to Reason has a nice description of L#, a Lisp dialect that runs on Microsoft's .NET framework. I can't say much about L#, since I haven't used it and I don't have .NET installed on anything I can readily access. However, the article illustrates why something like .NET is important. One of the problems faced by the developers of programming languages is that an entire framework is needed before a language is of much use to a programmer. One of the complaints about various Lisp dialects is that they lack many of the standard libraries needed for modern program development. One of the great things about a language like Perl is CPAN, which contains just about every conceivable module (and a few inconceivable ones) needed for writing apps. The large number of freely available code libraries for languages like Perl, Python and Java keeps people programming in them even when other languages are available. Something like .NET allows a language designer to build on a large set of resources that fill in the gaps for things like regular expressions, web output, etc. Language builders can thus concentrate on designing elegant, simple languages.

I'm not advocating for .NET in any way; I don't know enough about it. I usually don't like Microsoft products. I find them, for the most part, bloated and tending to do things the "Microsoft Way" rather than the way I want them done. Mono might be an open source alternative. Again, I don't know enough about it, but I intend to take a hard look at it. In principle, I think this is where language design is headed: simple, elegant languages built on top of frameworks that provide the resources needed to interact with the world.
17:21:15 - bbth - No comments 0 TrackBacks

09 September

Lisp and Perl Syntax

There's an interesting comment from Mark Jason Dominus here. MJD is a Perl guru of the highest order. If you are a programmer, you owe it to yourself to get his book, Higher Order Perl; it's good for Python and Ruby folks too. The comment is about a year old; it describes the power of Lisp macros and compares them with the weaker abilities of languages like Perl and C++.

I've noted before that my major complaint about Perl is that it is all syntax, as opposed to Lisp, which has minimal syntax. You can learn enough Lisp syntax in one short session to write useful programs for years. I've programmed in Perl for 10 years, and I'm still surprised by odd little Perl syntactic quirks.

This was brought to mind because I have been working on a Lispkit compiler for the Parrot virtual machine. Lispkit Lisp is like a bad habit for me. Periodically, I come back to it and rewrite it, a bit like a jazz musician returning to favorite riffs from an old standard during an improvisation. I have written Lispkit in C, Pascal and Java. The other evening, I wrote the tokenizer and parser in Perl. The whole thing is fewer than 300 lines, including utility routines, and took only about two hours to write. It is much easier to write and understand in Perl than the earlier versions were, partly because of Perl's superior string and regex handling.

I could have used a Perl parsing module like Parse::RecDescent, but that would have been overkill. The beauty of Lisp is that it is trivial to parse. Once the input is tokenized, parsing the Lisp expressions takes about 50 lines of Perl.
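As an illustration of how little it takes, here's a rough sketch of a tokenizer and recursive-descent parser for s-expressions in Perl. It is not my Lispkit code; the sub names are made up, and it handles only parentheses and atoms (no strings, comments, quote syntax or dotted pairs).

```perl
use strict;
use warnings;

# Split source text into tokens: "(", ")", or runs of other non-space characters.
sub tokenize {
    my ($src) = @_;
    return $src =~ /(\(|\)|[^\s()]+)/g;
}

# Parse one s-expression, consuming tokens from the front of the array.
# Lists become array refs; atoms stay plain strings.
sub parse {
    my ($tokens) = @_;
    die "unexpected end of input\n" unless @$tokens;
    my $tok = shift @$tokens;
    if ($tok eq '(') {
        my @list;
        while (1) {
            die "missing ')'\n" unless @$tokens;
            if ($tokens->[0] eq ')') { shift @$tokens; last; }
            push @list, parse($tokens);
        }
        return \@list;
    }
    die "unexpected ')'\n" if $tok eq ')';
    return $tok;
}

# Print a parse tree back out as text, to show its structure.
sub to_string {
    my ($expr) = @_;
    return ref $expr ? '(' . join(' ', map { to_string($_) } @$expr) . ')' : $expr;
}

my @tokens = tokenize('(lambda (x) (mul x x))');
my $tree   = parse(\@tokens);
print to_string($tree), "\n";
```

The tree for this input is a three-element array ref: the atom lambda, the parameter list, and the body. Everything interesting happens in the compiler that walks that tree, which is why parsing stays so small.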

This leads me to a second comment, about coding style. I tend to write Perl in a C style. I don't like Perl's automatic variables, and I don't like to use constructions like the built-in $_ as a loop variable, which can lead to uncomfortable surprises. I like my variables to be explicit. Some Perl mavens sneer at this approach, but I think being pedantic in code is a good idea. One of the rules I try to apply is to be as obvious as possible. Sometimes, however, a construction like $num =~ /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/ for identifying a valid number is better than the explicit code. I challenge any Perl programmer who hasn't seen that regex before to decipher it in less than a couple of minutes. However, it would take many lines of more explicit code to replace it, and it would be easy to get bogged down in the details and get them wrong. That's the power of syntax: in this case, Perl's concise syntax makes the code more readable. The trick is to know when to be obvious and when not to be.
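One way to keep the concise version and still be obvious is Perl's /x modifier, which allows whitespace and comments inside a pattern. Here's the same regex broken out piece by piece, with a few test strings I picked:

```perl
use strict;
use warnings;

# The same number-matching pattern, spread out with /x so each part has a comment.
my $number_re = qr/
    ^
    ([+-]?)            # optional sign
    (?=\d|\.\d)        # lookahead: must start with a digit, or a dot then a digit
    \d*                # integer part, may be empty as in ".5"
    (\.\d*)?           # optional fractional part
    ([Ee]([+-]?\d+))?  # optional exponent such as E23 or e-10
    $
/x;

for my $candidate ('42', '-3.14', '.5', '6.02E23', '1e', 'abc', '.') {
    printf "%-8s %s\n", $candidate, ($candidate =~ $number_re ? 'number' : 'not a number');
}
```

The lookahead is the subtle part: it is what rejects a lone "." while still accepting ".5".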
19:05:11 - bbth - No comments 253 TrackBacks

23 August

Perl, Ruby and the Parrot

Ruby has been described as Perl's younger, better-looking sister. It's definitely better looking. One of Perl's bigger problems is that it is all syntax. Even a well-written Perl program can look like a dog's breakfast. There are so many different syntax forms that it's all but impossible to remember them. It can be concise to the point of madness. I've been programming in Perl for 10 years, and I still run across unreadable snippets of code, code that looks like line noise.

Ruby, on the other hand, has a certain elegance. Ruby is object oriented, and everything is an object in Ruby. It keeps much of Perl's ease of use and its powerful regex features. It really seems to be a better Perl.

So, why not switch? The main reason is BioPerl. My day job is computational biology research. I frequently have to do things like parse BLAST output, align groups of sequences, generate phylogenetic trees, etc. BioPerl makes these tasks easier. Every time I think about switching to another scripting language like Ruby or Python, I think about giving up BioPerl and don't make the switch. There are BioRuby and BioPython projects, as well as BioJava and Biowhatever efforts. None of these are as mature or as useful as BioPerl, so I guess I'll be sticking with Ruby's older, homely sister.

The current version of Perl is 5.8. Perl 6 is somewhere over the horizon. It's a tough call whether Microsoft Vista, Duke Nukem Forever or Perl 6 will arrive first. In the meantime, there's Parrot, a virtual machine designed to support the Perl 6 language. It can support other languages as well, such as Scheme.

I've had an interest in functional languages ever since I read Peter Henderson's book, Functional Programming, years ago. In it, he describes a functional variant of Lisp called Lispkit Lisp. I've implemented this language, with extensions, several times, most recently in Java. In the book, Henderson describes a virtual machine called the SECD machine. It has a handful of instructions and four registers, S, E, C and D (stack, environment, control and dump), and he shows how Lispkit Lisp can be compiled for this machine.
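To give a feel for the machine, here's a minimal sketch of an SECD-style interpreter in Perl. It implements only a handful of instructions (LDC, LD, ADD, SUB, MUL, NIL, CONS, LDF, AP, RTN, STOP) and leaves out conditionals and recursive definitions. The representation (Perl arrays instead of cons cells) and the name run_secd are my own assumptions, not Henderson's code.

```perl
use strict;
use warnings;

# A toy SECD machine. The four registers are:
#   S - operand stack (top is the last array element)
#   E - environment: a list of frames, innermost first
#   C - control: the instructions still to run
#   D - dump: saved [S, E, C] triples for returning from calls
sub run_secd {
    my ($code) = @_;
    my @s;
    my $e = [];
    my @c = @$code;
    my @d;

    while (@c) {
        my ($op, @args) = @{ shift @c };
        if ($op eq 'LDC') {                      # push a constant
            push @s, $args[0];
        }
        elsif ($op eq 'LD') {                    # push variable j of frame i
            my ($i, $j) = @args;
            push @s, $e->[$i][$j];
        }
        elsif ($op eq 'ADD' || $op eq 'SUB' || $op eq 'MUL') {
            my $top  = pop @s;
            my $next = pop @s;
            push @s, $op eq 'ADD' ? $next + $top
                   : $op eq 'SUB' ? $next - $top
                   :                $next * $top;
        }
        elsif ($op eq 'NIL') {                   # push an empty list
            push @s, [];
        }
        elsif ($op eq 'CONS') {                  # prepend the top value to the list below it
            my $head = pop @s;
            my $tail = pop @s;
            push @s, [ $head, @$tail ];
        }
        elsif ($op eq 'LDF') {                   # build a closure: [code, environment]
            push @s, [ $args[0], $e ];
        }
        elsif ($op eq 'AP') {                    # apply a closure to an argument list
            my $closure = pop @s;
            my $argv    = pop @s;
            push @d, [ [@s], $e, [@c] ];         # save the caller's registers
            @s = ();
            $e = [ $argv, @{ $closure->[1] } ];  # new frame in front of the closure's env
            @c = @{ $closure->[0] };
        }
        elsif ($op eq 'RTN') {                   # return: restore the caller's registers
            my $result = pop @s;
            my ($saved_s, $saved_e, $saved_c) = @{ pop @d };
            @s = (@$saved_s, $result);
            $e = $saved_e;
            @c = @$saved_c;
        }
        elsif ($op eq 'STOP') {
            last;
        }
        else {
            die "unknown instruction $op\n";
        }
    }
    return pop @s;
}

# ((lambda (x y) (mul (sub x y) x)) 10 4), compiled by hand.
my $program = [
    ['NIL'], ['LDC', 4], ['CONS'], ['LDC', 10], ['CONS'],
    ['LDF', [ ['LD', 0, 0], ['LD', 0, 1], ['SUB'], ['LD', 0, 0], ['MUL'], ['RTN'] ]],
    ['AP'],
    ['STOP'],
];
print run_secd($program), "\n";
```

The hand-compiled program evaluates (10 - 4) * 10 and prints 60. Note how the argument list is built in reverse with NIL and CONS, so that x ends up at position 0 of the new frame.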

My plan is to implement Lispkit Lisp for Parrot. I'll report here from time to time how the project is going.


18:28:28 - bbth - No comments 258 TrackBacks

16 April

Continuations for Everyone

As I mentioned before, I have been reading Mark Jason Dominus's book, Higher Order Perl. Probably because of that, I've become more aware of the way functional programming techniques are used in common scripting languages like Perl, Python and Ruby. Stumbling around the net this morning, I ran across this article by Sam Ruby called Continuations for Curmudgeons. It's a really nice exposition of continuations and closures for those who grew up writing C instead of Lisp. I would have described the Python 'yield' statement as implementing an iterator (a generator) rather than a continuation, but that's just me.
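For comparison, here's the closure-based iterator idea in Perl. It's a minimal sketch in the style of Higher Order Perl, not code from the book or the article: each call to the returned sub picks up where the last one left off, because the counter lives inside the closure.

```perl
use strict;
use warnings;

# Return an iterator over the integers $m .. $n.
# The closure keeps $m alive between calls, so each call resumes the count.
sub upto {
    my ($m, $n) = @_;
    return sub {
        return $m <= $n ? $m++ : undef;
    };
}

my $it = upto(3, 5);
while (defined(my $value = $it->())) {
    print "$value\n";
}
```

That's roughly what a Python generator gives you for free with yield; in Perl you build the saved state by hand.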

After reading the chapter in Mark Dominus' book on memoization, I went looking for examples of memoization in Python and stumbled onto this snippet of Perl code by Danny Yoo:

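use strict;
use warnings;
use Memoize;

# Build a Fibonacci-style function with starting values $first and $second.
sub fib_maker {
    my ($first, $second) = @_;
    # The helper receives itself as $f so it can recurse without a named sub.
    my $fib_helper = sub {
        my ($f, $n) = @_;
        print "Called on $n\n";
        if ($n == 0) { return $first; }
        if ($n == 1) { return $second; }
        # These recursive calls go straight to the unmemoized helper.
        return $f->($f, $n - 1) + $f->($f, $n - 2);
    };
    ## Some contortions seem necessary to get a nested-scope recursive
    ## function... *grin*
    return sub { $fib_helper->($fib_helper, $_[0]) };
}

# Install the closure as the named sub main::fib so Memoize can wrap it by name.
{
    no strict 'refs';
    *{'main::fib'} = fib_maker(1, 1);
}

memoize('fib');

print "call1", "\n";
print fib(5), "\n";
print "call2", "\n";
print fib(6), "\n";
print "call3", "\n";
print fib(6), "\n";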

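Despite the use of the Memoize module, this doesn't work the way you want. It yields a very shallow form of caching: memoize('fib') wraps only the top-level fib, while the recursive calls go through $f, the unmemoized helper, so within a single call the same values are recomputed over and over. The code is probably a little more convoluted than you would actually write. It's also unintelligible to non-Perl programmers (and to some Perl programmers too).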

It's easy to fix so that it memoizes properly: wrap the helper closure with memoize, like this: $fib_helper = memoize (sub { .... } ); The example is the kind of thing you don't run into often, nested-scope recursive functions, but it shows that even a very clever module like Memoize can yield surprises once in a while.
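Here's a sketch of the fixed version. It keeps the same fib_maker structure, drops the print tracing, and adds a hypothetical $calls counter to show how often the helper body actually runs. Because memoize is given a code reference, it returns the memoized copy instead of installing a named sub, and since that copy is what gets passed around as $f, the recursive calls hit the cache too.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # hypothetical counter: how many times the helper body runs

sub fib_maker {
    my ($first, $second) = @_;
    # memoize() on a code reference returns a memoized copy of that sub.
    my $fib_helper = memoize(sub {
        my ($f, $n) = @_;
        $calls++;
        return $first  if $n == 0;
        return $second if $n == 1;
        # $f is the memoized helper, so these calls are cached as well.
        return $f->($f, $n - 1) + $f->($f, $n - 2);
    });
    return sub { $fib_helper->($fib_helper, $_[0]) };
}

my $fib    = fib_maker(1, 1);
my $result = $fib->(30);
print "fib(30) = $result\n";
print "helper ran $calls times\n";
```

Without the memoize wrapper, the same call would run the helper millions of times; with it, each value of $n is computed about once.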

07:53:14 - bbth - No comments 261 TrackBacks

27 March

Perl don't get no respect

I was having a conversation the other day with a fellow programmer. He's new to bioinformatics programming, and the conversation drifted into the area of programming languages. I mentioned that a lot of the programming I do is in C or Perl. He snorted and said, "Perl? It's impossible to maintain." I have to disagree. Well-written Perl code is no more of a maintenance problem than well-written code in any other language. It just looks harder because of Perl's syntax, which is more a matter of the eye of the beholder than a real problem. To some programmers Lisp looks elegant; to others it looks like a rat's nest of parentheses. To some, Perl looks like a dog's breakfast. To the rest of us...well, it still looks a bit like a dog's breakfast, but we like it.

I also ran up against this issue when I recently gave a talk about bioinformatics to a group of CS grad students. After the talk, I mentioned to a couple of folks who were questioning me that Perl was one of the big languages in bioinformatics programming because it has some useful tools for digging through large files of strings. The general reaction was "Ewwweh." They didn't seem pleased with the notion of grubbing around in DNA data with a tool like Perl, which in their opinion is not elegant.

I bring all this up because I just received Mark Jason Dominus' Higher Order Perl. In it, he demonstrates that Perl can be used in a functional programming style. Functional programming is a powerful abstraction tool, and Mark shows how it can fit into the Perl programmer's toolkit. I recommend this book to anyone who has to code in Perl, anyone who is interested in functional programming but has been scared off by the academic bent of the functional programming crowd, and anyone who just wants to become a better programmer in whatever language they are currently using. The writing is clear and the examples concise. The syntax still looks like a dog's breakfast, though.
14:36:34 - bbth - No comments 264 TrackBacks
