---
date: "2005-02-16T00:40:23Z"
title: Guess Who's Back?
---

The blog post hiatus has ended! Here's what's new in the world o' Pablotron.

First of all, the main hard drive on vault (my file/database/LDAP/email server) bit the dust last Wednesday. Fortunately, the drive just started to fail instead of dying outright. I had ample room to do immediate backups, and I had an unused 160G drive lying around. I spent most of Sunday afternoon and all of Monday evening partitioning the new drive and copying stuff back to it.

As far as I can tell, the only thing I actually lost was the words file for spamprobe. I don't really consider that much of a loss, since I save all my email (even the cursed spam), so I can easily toss the requisite good and bad corpora at spamprobe to get things going again. Even though I'm short a 100G drive now, the experience overall has been a positive one. Here are some thoughts I had; maybe they'll prevent a week of stress for someone else:

On the non-catastrophic hardware failure front, I upgraded halcyon to the latest Xorg, then promptly downgraded to the latest stable release. Here's the approximate order of events:

  1. Spent an hour or two configuring, compiling, and installing the latest Xorg.
  2. Ran X, and found out that the proprietary NVidia driver isn't compatible with the latest CVS snapshot of Xorg.
  3. Discovered just how painful the composite extension is without hardware acceleration by foolishly attempting to run X using the nv driver. Hint: Imagine using Netscape Navigator 3.0 on your old Commodore 64 with Photoshop doing an RLE Gaussian Blur on a 100 meg image in the background.
  4. Promptly downgraded to the stable release, cursing both NVidia for their proprietary silliness, and the bastards at freedesktop.org for having the audacity to make source code changes that inconvenienced me. I spent plenty of time on this step, so go ahead and re-read that last paragraph a couple of times.

Since I spent the majority of a Sunday afternoon recompiling X no fewer than three times, I also took the opportunity to try out the latest Enlightenment DR16 from CVS (yes Kim, I'm one of the few people still using e16). It's got its own built-in, mostly (semi?) working composite manager, so neither the patch nor the xcompmgr hackery I describe in this post is necessary any more. The new default theme looks great, too!

Why use other people's broken software when you can write your own? Here's the latest on the Pablotron coding front:

The big stuff I've been working on lately is the core of the future Raggle. Before I begin, here's a high-level overview of how the components interact with one another (yup, a diagram!):

[Diagram: next-gen Raggle]

I've mentioned Squaggle previously, but for those of you sleeping in the back of the class (you know who you are), here's a brief recap. Squaggle is the SQLite-Ruby-based engine for Raggle. It's cleaner, faster, it uses less memory, and it lets me do all sorts of cool things I can't really do with the current engine (fancy delicious-style tagging, fast cross-feed searching, smart/auto categorization, and more). The version of Squaggle in CVS is functional (it even includes a usable WEBrick-based interface).

So what's this new stuff on ye olde diagram? libptime is a C-based RFC822 datetime and W3C datetime parsing library. It's BSD licensed, so you can download version 0.1.0 (signature), and use it to your heart's content. The other new library on the diagram is libfeed, an Expat-based RSS (0.9x, 1.0, and 2.0)/Atom feed parser. Why bother writing an RSS parser in C? The existing Raggle engine is slow, partly from being DOM-based, and partly from being written in Ruby. Don't get me wrong, REXML is a great XML parser, but RSS aggregators deal in volume, and I want to be sure the volume isn't constrained by parsing. I also noticed there wasn't a nice C-based RSS/Atom parsing library. Now there is (well, almost!). If that doesn't convince you, then maybe this will:


    pabs@halcyon:~/cvs/libfeed/test> du -sh data/big-pdo-wdom.rss
    15M     data/big-pdo-wdom.rss
    pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \
      '$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");'
    real    7m56.892s
    user    4m31.578s
    sys     0m19.939s
    pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \
      '$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");'
    real    5m57.838s
    user    4m28.727s
    sys     0m3.703s
    pabs@halcyon:~/cvs/libfeed/test> time ruby -rrss/2.0 -e \
      'RSS::Parser::parse(File.read("data/big-pdo-wdom.rss"))'
    real    2m30.950s
    user    1m46.904s
    sys     0m8.610s
    pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \
      >/dev/null 2>&1
    real    0m2.195s
    user    0m1.472s
    sys     0m0.104s
    pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \
      >/dev/null 2>&1
    real    0m2.010s
    user    0m1.475s
    sys     0m0.099s

The Perl times were so bad I had to run them twice to be sure. Going by wall-clock time, that's roughly 70 times faster than Ruby and well over 100 times faster than Perl; I'd say that's a pretty good start :).
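Back to libptime for a second: if you're wondering what "RFC822 datetime and W3C datetime parsing" boils down to, here's a rough sketch in plain C. To be clear, this is not libptime's actual code or API. The function names (`parse_rfc822`, `parse_w3c`) and the choice to return seconds since the Unix epoch are made up for illustration. It also assumes numeric or GMT/UT/Z time zones only (no named zones like EST) and doesn't handle every W3C variant.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Days between 1970-01-01 and a proleptic Gregorian date (may be negative). */
static int64_t days_from_civil(int y, int m, int d)
{
    y -= m <= 2;                      /* count Jan/Feb as part of the previous year */
    int64_t era = (y >= 0 ? y : y - 399) / 400;
    int yoe = (int)(y - era * 400);   /* year within the 400-year era */
    int doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Range-check the fields, then convert to seconds since the Unix epoch (UTC). */
static int finish(int y, int mo, int d, int h, int mi, int s, int off_min,
                  int64_t *out)
{
    if (mo < 1 || mo > 12 || d < 1 || d > 31 || h < 0 || h > 23 ||
        mi < 0 || mi > 59 || s < 0 || s > 60)
        return -1;
    *out = days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + s
         - (int64_t)off_min * 60;
    return 0;
}

/* Parse "Z", "GMT", "UT", "+hhmm", or "+hh:mm" into minutes east of UTC. */
static int parse_zone(const char *z, int *off_min)
{
    char sign;
    int h, m;
    if (!strcmp(z, "Z") || !strcmp(z, "GMT") || !strcmp(z, "UT")) {
        *off_min = 0;
        return 0;
    }
    if (sscanf(z, "%c%2d:%2d", &sign, &h, &m) != 3 &&
        sscanf(z, "%c%2d%2d", &sign, &h, &m) != 3)
        return -1;
    if (sign != '+' && sign != '-')
        return -1;
    *off_min = (sign == '-' ? -1 : 1) * (h * 60 + m);
    return 0;
}

/* RFC 822 style: "Wed, 16 Feb 2005 00:40:23 GMT" (day name, seconds optional). */
int parse_rfc822(const char *s, int64_t *out)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char mon[4], zone[8];
    int d, y, h, mi, sec, off;
    const char *comma = strchr(s, ',');
    if (comma)
        s = comma + 1;                /* skip the optional day name */
    if (sscanf(s, " %2d %3s %4d %2d:%2d:%2d %7s",
               &d, mon, &y, &h, &mi, &sec, zone) != 7) {
        sec = 0;                      /* retry without the seconds field */
        if (sscanf(s, " %2d %3s %4d %2d:%2d %7s",
                   &d, mon, &y, &h, &mi, zone) != 6)
            return -1;
    }
    const char *p = strstr(months, mon);
    if (strlen(mon) != 3 || !p || (p - months) % 3)
        return -1;
    if (y < 50) y += 2000;            /* two-digit years, RFC 2822 convention */
    else if (y < 100) y += 1900;
    if (parse_zone(zone, &off))
        return -1;
    return finish(y, (int)(p - months) / 3 + 1, d, h, mi, sec, off, out);
}

/* W3C style: "2005-02-16T00:40:23Z", optional fractional seconds.
 * A bare date ("2005-02-16") is treated as midnight UTC. */
int parse_w3c(const char *s, int64_t *out)
{
    char zone[8] = "Z";
    int y, mo, d, h = 0, mi = 0, sec = 0, off, n = 0;
    int got = sscanf(s, "%4d-%2d-%2dT%2d:%2d:%2d%n",
                     &y, &mo, &d, &h, &mi, &sec, &n);
    if (got == 6) {
        s += n;
        if (*s == '.')                /* drop fractional seconds */
            for (s++; *s >= '0' && *s <= '9'; s++)
                ;
        if (strlen(s) >= sizeof zone)
            return -1;
        strcpy(zone, s);
    } else if (got != 3) {
        return -1;
    }
    if (parse_zone(zone, &off))
        return -1;
    return finish(y, mo, d, h, mi, sec, off, out);
}
```

Both functions return 0 on success and -1 on anything they don't understand. Feeding this post's own timestamp through either one, as `"Wed, 16 Feb 2005 00:40:23 GMT"` or `"2005-02-16T00:40:23Z"`, should produce the same value: 1108514423.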

Unfortunately, I have to be awake in three hours, so I'll have to save the rest of the next-gen Raggle description for another day...