---
date: "2005-02-16T00:40:23Z"
title: "Guess Who's Back?"
---
The blog post hiatus has ended! Here's what's new in the world o' Pablotron. First of all, the main hard drive on
vault (my file/database/LDAP/email server) bit the dust last Wednesday.
Fortunately the drive just started to fail (instead of dying outright),
so I had ample room to do immediate backups, and I had an unused 160G
drive lying around. I spent most of Sunday afternoon and all of Monday
evening partitioning the new drive and copying stuff back to it. As far
as I can tell, the only thing I actually lost was the words file for
spamprobe. I don't really consider that much of a loss, since I save all
my email (even the cursed spam), so I can easily toss the requisite good
and bad corpora at spamprobe to get things going again. Even though I'm
short a 100G drive now, the experience overall has been a positive one.
Here are some thoughts I had; maybe they'll prevent a week of stress for
someone else:
cron. Depending on how large this data set is, I'll be burning DVDs of
the backup directory contents on a weekly or bi-weekly basis.
Aside: Richard (richlowe) has been advocating revision-controlled config
files for quite a while (e.g. cvs -d pabs@cvs:/cvs co etc-files/vault);
maybe I'll give that a spin, too.
sumo is my IRC/Postgres machine, but that hardly qualifies as a
crippling load).
vault started failing at 1:30 on a Wednesday morning. I was able to
start making backups and moving stuff around right then. If I hadn't
had the extra hard drive, I would have been SOL for several
platter-scraping hours.
On the non-catastrophic hardware failure front, I upgraded halcyon to
the latest Xorg, then promptly downgraded to the latest stable release.
Here's the approximate order of events:
nv driver. Hint: Imagine using Netscape Navigator 3.0 on your old
Commodore 64 with Photoshop doing an RLE Gaussian Blur on a 100 meg
image in the background.
Since I spent the majority of a Sunday afternoon recompiling X no fewer
than three times, I also took the opportunity to try out the latest
Enlightenment DR16 from CVS (yes Kim, I'm one of the few people still
using e16). It's got its own built-in, mostly (semi?) working composite
manager, so neither the patch nor the xcompmgr hackery I describe in
this post is necessary any more. The new default theme looks great, too!
Why use other people's broken software when you can write your own? Here's the latest on the Pablotron coding front:
kirchneraggle or something more suitable ;).
wcolor_set support to the built-in Curses interface. Ville suggested it
eons ago, and that was the last thing stopping me from porting Raggle
from Ncurses-Ruby.
The big stuff I've been working on lately is the core of the future Raggle. Before I begin, here's a high-level overview of how the components interact with one another (yup, a diagram!):
I've mentioned Squaggle previously, but for those of you sleeping in the
back of the class (you know who you are), here's a brief recap. Squaggle
is the SQLite-Ruby-based engine for Raggle. It's cleaner and faster, it
uses less memory, and it lets me do all sorts of cool things I can't
really do with the current engine (fancy delicious-style tagging, fast
cross-feed searching, smart/auto categorization, and more). The version
of Squaggle in CVS is functional (it even includes a usable
WEBrick-based interface).
So what's this new stuff on ye olde diagram? libptime is a C-based
RFC822 datetime and W3C datetime parsing library. It's BSD-licensed, so
you can download version 0.1.0 (signature), and use it to your heart's
content. The other new library on the diagram is libfeed, an Expat-based
RSS (0.9x, 1.0, and 2.0)/Atom feed parser. Why bother writing an RSS
parser in C? The existing Raggle engine is slow, partly from being
DOM-based, and partly from being written in Ruby. Don't get me wrong,
REXML is a great XML parser, but RSS aggregators deal in volume, and I
want to be sure the volume isn't constrained by parsing. I also noticed
there wasn't a nice C-based RSS/Atom parsing library. Now there is
(well, almost!). If that doesn't convince you, then maybe this will:
pabs@halcyon:~/cvs/libfeed/test> du -sh data/big-pdo-wdom.rss
15M data/big-pdo-wdom.rss
pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \
'$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");'
real 7m56.892s
user 4m31.578s
sys 0m19.939s
pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \
'$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");'
real 5m57.838s
user 4m28.727s
sys 0m3.703s
pabs@halcyon:~/cvs/libfeed/test> time ruby -rrss/2.0 -e \
'RSS::Parser::parse(File.read("data/big-pdo-wdom.rss"))'
real 2m30.950s
user 1m46.904s
sys 0m8.610s
pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \
>/dev/null 2>&1
real 0m2.195s
user 0m1.472s
sys 0m0.104s
pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \
>/dev/null 2>&1
real 0m2.010s
user 0m1.475s
sys 0m0.099s
The Perl times were so bad I had to run them twice to be sure. 60 times faster than Ruby and over 100 times faster than Perl; I'd say that's a pretty good start :).
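If you'd rather check the math than take my word for it, here's a quick Ruby sketch that recomputes the ratios from the real times in the transcript above. The helper names (seconds, speedup) are made up for this example and aren't part of Raggle or libfeed. To keep things conservative, it compares the fastest run of each interpreter against the slowest testfeed run.

```ruby
# Convert a time(1) "real" value such as "7m56.892s" into seconds.
# (seconds is a hypothetical helper written for this example.)
def seconds(real)
  m = /\A(\d+)m(\d+(?:\.\d+)?)s\z/.match(real)
  raise ArgumentError, "unrecognised time: #{real}" unless m
  m[1].to_i * 60 + m[2].to_f
end

# Conservative ratio: fastest of the slow runs over slowest of the fast runs.
# (speedup is likewise a hypothetical helper.)
def speedup(slow_runs, fast_runs)
  slow_runs.map { |t| seconds(t) }.min / fast_runs.map { |t| seconds(t) }.max
end

# "real" times copied from the transcript above.
PERL_RUNS     = %w[7m56.892s 5m57.838s].freeze
RUBY_RUNS     = %w[2m30.950s].freeze
TESTFEED_RUNS = %w[0m2.195s 0m2.010s].freeze

printf("testfeed vs. Ruby: %.1fx\n", speedup(RUBY_RUNS, TESTFEED_RUNS))
printf("testfeed vs. Perl: %.1fx\n", speedup(PERL_RUNS, TESTFEED_RUNS))
```

Even with the comparison stacked against testfeed like this, the ratios come out at roughly 69x for Ruby and 163x for Perl, so the numbers quoted above are on the modest side.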
Unfortunately, I have to be awake in three hours, so I'll have to save the rest of the next-gen Raggle description for another day...