---
date: "2005-02-16T00:40:23Z"
title: Guess Who's Back?
---

<p>
The blog post hiatus has ended!  Here's what's new in the world o' <a
href='/'>Pablotron</a>.  First of all, the main hard drive on
<code>vault</code> &mdash; my file/database/<acronym 
title='Lightweight Directory Access Protocol'>LDAP</acronym>/email
server &mdash; bit the dust last Wednesday.  Fortunately the drive
just <em>started</em> to fail (instead of dying outright).  I had ample
room to do immediate backups, and I had an unused 160G drive lying
around.  I spent most of Sunday afternoon and all of Monday evening
partitioning the new drive and copying stuff back to it.  As far as I
can tell, the only thing I actually lost was the words file for <a
href='http://spamprobe.sf.net/'><code>spamprobe</code></a>.  I don't really consider that much of a
loss, since I
save all my email (even the cursed spam), so I can easily toss the
requisite good and bad corpora at <a
href='http://spamprobe.sf.net/'><code>spamprobe</code></a> to get things
going again.  Even though I'm short a 100G drive now, the experience
overall has been a positive one.  Here are some thoughts I had; maybe
they'll prevent a week of stress for someone else:
</p>

<ul>
<li>Regular backups are just something you <em>do</em>.  The ad-hoc
backups I've been doing are better than nothing, but they wouldn't have
done me any good if my drive had died outright.  Had the
circumstances been different, I would have lost weeks, possibly even a
month of email.  My solution is (rather, will be, once everything is up
and running again) an 
<acronym title='Network File System'>NFS</acronym>-mounted backup
directory on every machine (obviously not for <a href='http://snowman.net/'>people who don't like <acronym title='Network File System'>NFS</acronym></a>).  Each machine will be responsible for its
own daily and weekly backups, via <a href='http://directory.fsf.org/cron.html'><code>cron</code></a>.  Depending on how large this data set is,
I'll be burning <acronym title='Digital Video Disc'>DVD</acronym>s of
the backup directory contents on a weekly or bi-weekly basis.  
Aside: <a href='http://richlowe.net/'>Richard (richlowe)</a> has been
advocating revision-controlled config files for quite a while (e.g.
<code>cvs -d pabs@cvs:/cvs co etc-files/vault</code>); maybe
I'll give that a spin, too.</li>
<li>Distribute services across machines.  I've got 4 other machines
sitting around twiddling their thumbs at the moment.  Any of them could
easily be an authentication, database, email,
<acronym title='Lightweight Directory Access Protocol'>LDAP</acronym>,
or <acronym title='Concurrent Versioning System'>CVS</acronym> server,
but instead they're all doing next to nothing (to be
fair, <code>sumo</code> is my <acronym title='Internet Relay Chat'>IRC</acronym>/<a
href='http://postgresql.org/'>PostgreSQL</a> machine, but that hardly
qualifies as a crippling load).</li>
<li>Keep extra hardware lying around.  As a true geek you're already
doing this, of course :).  The drive in <code>vault</code> started
failing at 1:30 on a Wednesday morning.  I was able to
start making backups and moving stuff around <em>right then</em>.  If I
didn't have the extra hard drive, I would have been 
<acronym title='Shit Out of Luck'>SOL</acronym> for several
platter-scraping hours.</li>
<li>Losing your spam filter settings means you get to say cool words
like "corpora" on your web page.</li>
</ul>
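
<p>
For the curious, here's a rough sketch of what the per-machine
<code>cron</code> job could look like in Ruby.  None of this is running
yet: the <code>/mnt/backup</code> mount point, the list of source
directories, the "Sunday means weekly" rule, and the helper names are
all assumptions made up for illustration.
</p>

```ruby
require 'date'

# Hypothetical NFS mount point and source directories; adjust per machine.
BACKUP_ROOT = '/mnt/backup'
SOURCES     = %w[/etc /home]

# Assumed schedule: the Sunday run is the weekly backup, every other day is daily.
def backup_kind(date)
  date.sunday? ? 'weekly' : 'daily'
end

# Build the archive path, e.g. /mnt/backup/vault/daily/vault-2005-02-16.tar.gz
def backup_path(host, date)
  File.join(BACKUP_ROOT, host, backup_kind(date),
            "#{host}-#{date.strftime('%Y-%m-%d')}.tar.gz")
end

# Return the tar command as an argument list (suitable for Kernel#system).
def backup_command(host, date)
  ['tar', '-czf', backup_path(host, date), *SOURCES]
end

# Show the command for one machine and one date without running it.
puts backup_command('vault', Date.new(2005, 2, 16)).join(' ')
```

<p>
A crontab entry along the lines of
<code>30 3 * * * ruby /usr/local/bin/backup.rb</code> (path
hypothetical) could then pass that argument list to <code>system</code>.
Keeping each host in its own subdirectory means every machine only ever
writes to its own corner of the shared mount.
</p>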

<p>
On the non-catastrophic hardware failure front, I upgraded
<code>halcyon</code> to the latest <a href='http://x.org/'>Xorg</a>, then
promptly downgraded to the latest stable release.  Here's the
approximate order of events:
</p>

<ol>
<li>Spent an hour or two configuring, compiling, and installing the
latest <a href='http://x.org/'>Xorg</a>.</li>
<li>Ran X, and found out that the proprietary <a href='http://nvidia.com/'>NVidia</a> driver isn't compatible with the latest 
<acronym title='Concurrent Versioning System'>CVS</acronym> snapshot of <a
href='http://x.org/'>Xorg</a>.</li>
<li>Discovered just how painful the <a
href='http://www.freedesktop.org/Software/CompositeExt'>composite
extension</a> is without hardware acceleration by foolishly attempting
to run X using the <code>nv</code> driver.  Hint: Imagine using
Netscape Navigator 3.0 on your old Commodore 64 with Photoshop doing an
RLE Gaussian Blur on a 100 meg image in the background.</li>
<li>Promptly downgraded to the stable release, cursing both <a
href='http://nvidia.com/'>NVidia</a> for their proprietary silliness,
and the bastards at <a
href='http://freedesktop.org/'>freedesktop.org</a> for having the
audacity to make source code changes that inconvenienced me.  I spent
plenty of time on this step, so go ahead and re-read that last paragraph
a couple of times.</li>
</ol>

<p>
Since I spent the majority of a Sunday afternoon recompiling X no fewer
than three times, I also took the opportunity to try out the latest <a
href='http://enlightenment.org/'>Enlightenment DR16</a> from <acronym
title='Concurrent Versioning System'>CVS</acronym> (yes Kim, I'm one of
the <a
href='http://sourceforge.net/mailarchive/message.php?msg_id=10424379'>few
people still using e16</a>).  It's got its own built-in, mostly (semi?)
working composite manager, so neither the patch nor the
<code>xcompmgr</code> hackery I describe in <a
href='http://pablotron.org/?cid=1402'>this post</a> is necessary any
more.  The new default theme looks great, too!
</p>

<p>
Why use other people's broken software when you can write your own?
Here's the latest on the <a href='http://pablotron.org/'>Pablotron</a>
coding front:
</p>

<ul>
<li>I've converted the 
<acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 
feeds on <a href='http://pablotron.org/'>pablotron.org</a>, <a
href='http://paulduncan.org/'>paulduncan.org</a>, and <a
href='http://raggle.org/'>raggle.org</a> from steaming loads of 
standards-noncompliant crap to pedantically-correct
<a href='http://blogs.law.harvard.edu/tech/rss'><acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 2.0</a>.
If your <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>
aggregator couldn't read my pages before, it probably can now (unless
your aggregator is based on the 
<acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> library
built into <a href='http://ruby-lang.org/'>Ruby 1.8</a>, but I'll get
to that part of the story in a few minutes...)</li>
<li>Lots and lots and lots of updates to the next version of <a
href='http://raggle.org/'>Raggle</a>.  Some of the changes are even by me!  <a
href='http://halffull.org/'>Thomas Kirchner (redshift)</a> has been
doing an unbelievable amount of work on the <a
href='http://cvs.pablotron.org/?m=raggle'><acronym
title='Concurrent Versioning System'>CVS</acronym> version of
Raggle</a>.  So much so, in fact, that I feel kind of embarrassed calling
this latest version mine at all.  So I think when it's ready for
release, we'll call it <code>kirchneraggle</code> or something more
suitable ;).</li>
<li><a
href='http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/4393'>This
patch</a> for <a href='http://ruby-lang.org/'>Ruby</a> which 
adds <code>wcolor_set</code> support to the built-in Curses interface.
Ville suggested it eons ago, and that was the last thing stopping me
from porting <a href='http://raggle.org/'>Raggle</a> from Ncurses-Ruby.
</li>
<li>A partially working Curses windowing library for <a
href='http://ruby-lang.org/'>Ruby</a>.  This isn't in <a
href='http://cvs.pablotron.org/'><acronym title='Concurrent Versioning
System'>CVS</acronym></a> just yet, but don't worry, I've got some new
stuff for you to play with.  Keep reading...</li>
</ul>

<p>
The big stuff I've been working on lately is the core of the future <a
href='http://raggle.org/'>Raggle</a>.  Before I begin, here's a
high-level overview of how the components interact with one another
(yup, a diagram!):
</p>

<p>
<img src='http://pablotron.org/gallery/misc/next-gen-raggle-thumb.png'
  width='574' height='602' title='next gen raggle' alt='next gen raggle'
  border='0' />
</p>

<p>
I've mentioned <a
href='http://cvs.pablotron.org/'><code>Squaggle</code></a> previously,
but for those of you sleeping in the back of the class (you know who you
are), here's a brief
recap.  <a href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> is 
the <a href='http://sqlite-ruby.rubyforge.org/'>SQLite-Ruby</a>-based
engine for <a href='http://raggle.org/'>Raggle</a>.  It's <a
href='http://pablotron.org/?cid=1398'>cleaner</a>, faster, it
uses less memory, and it lets me do all sorts of cool things I can't
really do with the current engine (fancy <a
href='http://del.icio.us/'>delicious</a>-style tagging, fast cross-feed
searching, smart/auto categorization, and more).  The version of <a
href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> in <a
href='http://cvs.pablotron.org/'><acronym title='Concurrent Versioning System'>CVS</acronym></a>
is functional (it even includes a usable <a
href='http://webrick.org/'>WEBrick</a>-based interface).
</p>

<p>
So what's this new stuff on ye olde diagram?  <a
href='http://cvs.pablotron.org/?m=libptime'><code>libptime</code></a> is a
C-based 
<a href='http://asg.web.cmu.edu/rfc/rfc822.html#sec-5'><acronym
title='Request For Comments'>RFC</acronym>822 datetime</a> and <a
href='http://www.w3.org/TR/NOTE-datetime'><acronym
title='World Wide Web Consortium'>W3C</acronym> datetime</a> parsing library.  It's
BSD licensed, so you can <a
href='http://pablotron.org/download/libptime-0.1.0.tar.gz'>download
version 0.1.0</a> (<a
href='http://pablotron.org/download/libptime-0.1.0.tar.gz.asc'>signature</a>),
and use it to your heart's content.  The other new library on the
diagram is <a
href='http://cvs.pablotron.org/?m=libfeed'><code>libfeed</code></a>, an
<a href='http://expat.sf.net/'>Expat</a>-based <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 
(0.9x, <a href='http://web.resource.org/rss/1.0/spec'>1.0</a>, and
<a href='http://blogs.law.harvard.edu/tech/rss'>2.0</a>)/<a
href='http://www.atomenabled.org/developers/syndication/atom-format-spec.php'>Atom</a>
feed parser.  Why bother writing an <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 
parser in C?  The existing <a href='http://raggle.org/'>Raggle</a> engine is
slow, partly from being <acronym
title='Document Object Model'>DOM</acronym>-based, and partly from being
written in <a href='http://ruby-lang.org/'>Ruby</a>.  Don't get me wrong, <a
href='http://www.germane-software.com/software/rexml/'>REXML</a> is a
great <acronym title='eXtensible Markup Language'>XML</acronym> parser,
but <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>
aggregators deal in volume, and I want to be sure the volume isn't
constrained by parsing.  I also noticed there wasn't a nice C-based <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>/Atom 
parsing library.  Now there is (well, almost!).  If that doesn't convince you, then maybe this will:
</p>
<pre style='padding: 20px;'><code>
pabs@halcyon:~/cvs/libfeed/test&gt; du -sh data/big-pdo-wdom.rss 
<b>15M</b>     data/big-pdo-wdom.rss
pabs@halcyon:~/cvs/libfeed/test&gt; time perl -mXML::RSS -e \
  '$rss = new XML::RSS; $rss-&gt;parsefile("data/big-pdo-wdom.rss");'
real    <b>7m56.892s</b>
user    4m31.578s
sys     0m19.939s
pabs@halcyon:~/cvs/libfeed/test&gt; time perl -mXML::RSS -e \
  '$rss = new XML::RSS; $rss-&gt;parsefile("data/big-pdo-wdom.rss");'
real    <b>5m57.838s</b>
user    4m28.727s
sys     0m3.703s
pabs@halcyon:~/cvs/libfeed/test&gt; time ruby -rrss/2.0 -e \
  'RSS::Parser::parse(File.read("data/big-pdo-wdom.rss"))'
real    <b>2m30.950s</b>
user    1m46.904s
sys     0m8.610s
pabs@halcyon:~/cvs/libfeed/test&gt; time ./testfeed data/big-pdo-wdom.rss \
  &gt;/dev/null 2&gt;&amp;1
real    <b>0m2.195s</b>
user    0m1.472s
sys     0m0.104s
pabs@halcyon:~/cvs/libfeed/test&gt; time ./testfeed data/big-pdo-wdom.rss \
  &gt;/dev/null 2&gt;&amp;1
real    <b>0m2.010s</b>
user    0m1.475s
sys     0m0.099s
</code></pre>

<p>
The <a href='http://perl.org/'>Perl</a> times were so bad I had to run
them twice to be sure.  60 times faster than <a
href='http://ruby-lang.org/'>Ruby</a> and over 100 times faster than <a
href='http://perl.org/'>Perl</a>; I'd say that's a pretty good start :).
</p>
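
<p>
For anyone who wants to check the math, here's a small Ruby sketch that
works the ratios out from the "real" times in the transcript above,
taking the best run for each parser.  The helper names are made up for
this example, and a handful of runs on one machine is hardly a rigorous
benchmark, so treat the result as a ballpark figure.
</p>

```ruby
# Convert a `time`-style "XmY.YYYs" string into seconds.
def seconds(real)
  m = real.match(/\A(\d+)m([\d.]+)s\z/) or raise ArgumentError, "bad time: #{real}"
  m[1].to_i * 60 + m[2].to_f
end

# Wall-clock ("real") times copied from the runs above.
RUNS = {
  'perl XML::RSS' => %w[7m56.892s 5m57.838s],
  'ruby rss/2.0'  => %w[2m30.950s],
  'libfeed'       => %w[0m2.195s 0m2.010s],
}

# Keep only the best (fastest) run for each parser.
BEST = RUNS.transform_values { |times| times.map { |t| seconds(t) }.min }

# How many times longer a parser's best run took than libfeed's best run.
def slowdown(name)
  BEST.fetch(name) / BEST.fetch('libfeed')
end

['ruby rss/2.0', 'perl XML::RSS'].each do |name|
  printf("%-14s took %6.1f times as long as libfeed\n", name, slowdown(name))
end
```

<p>
By that reckoning the best Ruby run took roughly 75 times as long as
the best <code>libfeed</code> run, and the best Perl run roughly 178
times as long, so the round numbers above are on the conservative side.
</p>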

<p>
Unfortunately, I have to be awake in three hours, so I'll have
to save the rest of the next-gen <a href='http://raggle.org/'>Raggle</a>
description for another day...
</p>