---
date: "2005-02-16T00:40:23Z"
title: Guess Who's Back?
---
<p> The blog post hiatus has ended! Here's what's new in the world o' <a href='/'>Pablotron</a>. First of all, the main hard drive on <code>vault</code> (my file/database/<acronym title='Lightweight Directory Access Protocol'>LDAP</acronym>/email server) bit the dust last Wednesday. Fortunately the drive just <em>started</em> to fail instead of dying outright. I had ample time to make immediate backups, and I had an unused 160G drive lying around. I spent most of Sunday afternoon and all of Monday evening partitioning the new drive and copying stuff back to it. As far as I can tell, the only thing I actually lost was the words file for <a href='http://spamprobe.sf.net/'><code>spamprobe</code></a>. I don't really consider that much of a loss, since I save all my email (even the cursed spam), so I can easily toss the requisite good and bad corpora at <a href='http://spamprobe.sf.net/'><code>spamprobe</code></a> to get things going again. Even though I'm short a 100G drive now, the experience overall has been a positive one. Here are some thoughts I had; maybe they'll save someone else a week of stress: </p> <ul> <li>Regular backups are just something you <em>do</em>. The ad-hoc backups I've been doing are better than nothing, but they wouldn't have done me any good if my drive had died outright. Had the circumstances been different, I would have lost weeks, possibly even a month, of email. My solution is (rather, will be, once everything is up and running again) an <acronym title='Network File System'>NFS</acronym>-mounted backup directory on every machine (obviously not for <a href='http://snowman.net/'>people who don't like <acronym title='Network File System'>NFS</acronym></a>). Each machine will be responsible for its own daily and weekly backups, via <a href='http://directory.fsf.org/cron.html'><code>cron</code></a>. 
Depending on how large this data set is, I'll be burning <acronym title='Digital Versatile Disc'>DVD</acronym>s of the backup directory contents on a weekly or bi-weekly basis. Aside: <a href='http://richlowe.net/'>Richard (richlowe)</a> has been advocating revision-controlled config files for quite a while (e.g. <code>cvs -d pabs@cvs:/cvs co etc-files/vault</code>); maybe I'll give that a spin, too.</li> <li>Distribute services across machines. I've got 4 other machines sitting around twiddling their thumbs at the moment, and any of them could easily be an authentication, database, email, <acronym title='Lightweight Directory Access Protocol'>LDAP</acronym>, or <acronym title='Concurrent Versions System'>CVS</acronym> server (to be fair, <code>sumo</code> is my <acronym title='Internet Relay Chat'>IRC</acronym>/<a href='http://postgresql.org/'>PostgreSQL</a> machine, but that hardly qualifies as a crippling load).</li> <li>Keep extra hardware lying around. As a true geek, you're already doing this, of course :). The drive in <code>vault</code> started failing at 1:30 on a Wednesday morning. I was able to start making backups and moving stuff around <em>right then</em>. If I hadn't had the extra hard drive, I would have been <acronym title='Shit Out of Luck'>SOL</acronym> for several platter-scraping hours.</li> <li>Losing your spam filter settings means you get to say cool words like "corpora" on your web page.</li> </ul> <p> On the less catastrophic front, I upgraded <code>halcyon</code> to the latest development version of <a href='http://x.org/'>Xorg</a>, then promptly downgraded to the latest stable release. 
Here's the approximate order of events: </p> <ol> <li>Spent an hour or two configuring, compiling, and installing the latest <a href='http://x.org/'>Xorg</a>.</li> <li>Ran X and found out that the proprietary <a href='http://nvidia.com/'>NVidia</a> driver isn't compatible with the latest <acronym title='Concurrent Versions System'>CVS</acronym> snapshot of <a href='http://x.org/'>Xorg</a>.</li> <li>Discovered just how painful the <a href='http://www.freedesktop.org/Software/CompositeExt'>composite extension</a> is without hardware acceleration by foolishly attempting to run X using the <code>nv</code> driver. Hint: imagine using Netscape Navigator 3.0 on your old Commodore 64 while Photoshop runs an RLE Gaussian Blur on a 100 meg image in the background.</li> <li>Promptly downgraded to the stable release, cursing both <a href='http://nvidia.com/'>NVidia</a> for their proprietary silliness and the bastards at <a href='http://freedesktop.org/'>freedesktop.org</a> for having the audacity to make source code changes that inconvenienced me. I spent plenty of time on this step, so go ahead and re-read that last item a couple of times.</li> </ol> <p> Since I spent the majority of a Sunday afternoon recompiling X no fewer than 3 times, I also took the opportunity to try out the latest <a href='http://enlightenment.org/'>Enlightenment DR16</a> from <acronym title='Concurrent Versions System'>CVS</acronym> (yes Kim, I'm one of the <a href='http://sourceforge.net/mailarchive/message.php?msg_id=10424379'>few people still using e16</a>). It's got its own built-in, mostly (semi?) working composite manager, so neither the patch nor the <code>xcompmgr</code> hackery I describe in <a href='http://pablotron.org/?cid=1402'>this post</a> is necessary any more. The new default theme looks great, too! </p> <p> Why use other people's broken software when you can write your own? 
Here's the latest on the <a href='http://pablotron.org/'>Pablotron</a> coding front: </p> <ul> <li>I've converted the <acronym title='Really Simple Syndication / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> feeds on <a href='http://pablotron.org/'>pablotron.org</a>, <a href='http://paulduncan.org/'>paulduncan.org</a>, and <a href='http://raggle.org/'>raggle.org</a> from steaming loads of standards-noncompliant crap to pedantically correct <a href='http://blogs.law.harvard.edu/tech/rss'><acronym title='Really Simple Syndication / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 2.0</a>. If your <acronym title='Really Simple Syndication / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> aggregator couldn't read my pages before, it probably can now (unless your aggregator is based on the <acronym title='Really Simple Syndication / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> library built into <a href='http://ruby-lang.org/'>Ruby 1.8</a>, but I'll get to that part of the story in a few minutes...)</li> <li>Lots and lots and lots of updates to the next version of <a href='http://raggle.org/'>Raggle</a>. Some of the changes are even by me! <a href='http://halffull.org/'>Thomas Kirchner (redshift)</a> has been doing an unbelievable amount of work on the <a href='http://cvs.pablotron.org/?m=raggle'><acronym title='Concurrent Versions System'>CVS</acronym> version of Raggle</a>. So much so, in fact, that I feel kind of embarrassed calling this latest version mine at all. So I think when it's ready for release, we'll call it <code>kirchneraggle</code> or something more suitable ;).</li> <li><a href='http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/4393'>A patch</a> for <a href='http://ruby-lang.org/'>Ruby</a> that adds <code>wcolor_set</code> support to the built-in Curses interface. 
Ville suggested it eons ago, and it was the last thing stopping me from porting <a href='http://raggle.org/'>Raggle</a> from Ncurses-Ruby to Ruby's built-in Curses. </li> <li>A partially working Curses windowing library for <a href='http://ruby-lang.org/'>Ruby</a>. This isn't in <a href='http://cvs.pablotron.org/'><acronym title='Concurrent Versions System'>CVS</acronym></a> just yet, but don't worry, I've got some new stuff for you to play with. Keep reading...</li> </ul> <p> The big stuff I've been working on lately is the core of the future <a href='http://raggle.org/'>Raggle</a>. Before I begin, here's a high-level overview of how the components interact with one another (yup, a diagram!): </p> <p> <img src='http://pablotron.org/gallery/misc/next-gen-raggle-thumb.png' width='574' height='602' title='next gen raggle' alt='next gen raggle' border='0' /> </p> <p> I've mentioned <a href='http://cvs.pablotron.org/'><code>Squaggle</code></a> previously, but for those of you sleeping in the back of the class (you know who you are), here's a brief recap. <a href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> is the <a href='http://sqlite-ruby.rubyforge.org/'>SQLite-Ruby</a>-based engine for <a href='http://raggle.org/'>Raggle</a>. It's <a href='http://pablotron.org/?cid=1398'>cleaner</a> and faster, it uses less memory, and it lets me do all sorts of cool things I can't really do with the current engine (fancy <a href='http://del.icio.us/'>del.icio.us</a>-style tagging, fast cross-feed searching, smart/auto categorization, and more). The version of <a href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> in <a href='http://cvs.pablotron.org/'><acronym title='Concurrent Versions System'>CVS</acronym></a> is functional (it even includes a usable <a href='http://webrick.org/'>WEBrick</a>-based interface). </p> <p> So what's this new stuff on ye olde diagram? 
<a href='http://cvs.pablotron.org/?m=libptime'><code>libptime</code></a> is a C-based <a href='http://asg.web.cmu.edu/rfc/rfc822.html#sec-5'><acronym title='Request For Comments'>RFC</acronym> 822 datetime</a> and <a href='http://www.w3.org/TR/NOTE-datetime'><acronym title='World Wide Web Consortium'>W3C</acronym> datetime</a> parsing library. It's BSD-licensed, so you can <a href='http://pablotron.org/download/libptime-0.1.0.tar.gz'>download version 0.1.0</a> (<a href='http://pablotron.org/download/libptime-0.1.0.tar.gz.asc'>signature</a>) and use it to your heart's content. The other new library on the diagram is <a href='http://cvs.pablotron.org/?m=libfeed'><code>libfeed</code></a>, an <a href='http://expat.sf.net/'>Expat</a>-based <acronym title='Really Simple Syndication / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> (0.9x, <a href='http://web.resource.org/rss/1.0/spec'>1.0</a>, and <a href='http://blogs.law.harvard.edu/tech/rss'>2.0</a>)/<a href='http://www.atomenabled.org/developers/syndication/atom-format-spec.php'>Atom</a> feed parser. Why bother writing an <acronym title='Really Simple Syndication / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> parser in C? The existing <a href='http://raggle.org/'>Raggle</a> engine is slow, partly because it's <acronym title='Document Object Model'>DOM</acronym>-based (the whole document gets built into an in-memory tree before anything useful happens) and partly because it's written in <a href='http://ruby-lang.org/'>Ruby</a>. Don't get me wrong, <a href='http://www.germane-software.com/software/rexml/'>REXML</a> is a great <acronym title='eXtensible Markup Language'>XML</acronym> parser, but <acronym title='Really Simple Syndication / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> aggregators deal in volume, and I want to be sure throughput isn't limited by parsing. 
I also noticed there wasn't a nice C-based <acronym title='Really Simple Syndication / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>/Atom parsing library. Now there is (well, almost!). If that doesn't convince you, then maybe this will: </p> <pre style='padding: 20px;'><code>pabs@halcyon:~/cvs/libfeed/test> du -sh data/big-pdo-wdom.rss
<b>15M</b>    data/big-pdo-wdom.rss
pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \
  '$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");'

real    <b>7m56.892s</b>
user    4m31.578s
sys     0m19.939s
pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \
  '$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");'

real    <b>5m57.838s</b>
user    4m28.727s
sys     0m3.703s
pabs@halcyon:~/cvs/libfeed/test> time ruby -rrss/2.0 -e \
  'RSS::Parser::parse(File.read("data/big-pdo-wdom.rss"))'

real    <b>2m30.950s</b>
user    1m46.904s
sys     0m8.610s
pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \
  >/dev/null 2>&amp;1

real    <b>0m2.195s</b>
user    0m1.472s
sys     0m0.104s
pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \
  >/dev/null 2>&amp;1

real    <b>0m2.010s</b>
user    0m1.475s
sys     0m0.099s
</code></pre> <p> The <a href='http://perl.org/'>Perl</a> times were so bad I had to run them twice to be sure. Going by wall-clock time, that's roughly 70 times faster than <a href='http://ruby-lang.org/'>Ruby</a> and more than 160 times faster than <a href='http://perl.org/'>Perl</a>; I'd say that's a pretty good start :). </p> <p> Unfortunately, I have to be awake in three hours, so I'll have to save the rest of the next-gen <a href='http://raggle.org/'>Raggle</a> description for another day... </p>
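<p> For the curious, here's a rough idea of what the W3C datetime half of <code>libptime</code>'s job involves. This is a minimal Ruby sketch, not <code>libptime</code>'s code or API (<code>libptime</code> is C, and <code>parse_w3c_datetime</code> is a hypothetical name). It assumes Ruby 1.9 or later for named regex captures, treats date-only values (which carry no time zone) as UTC midnight, and leaves well-formed but impossible dates to <code>Time.utc</code>, which rejects some (like month 13) but may quietly roll others over. </p>

```ruby
# Sketch of W3C datetime (ISO 8601 profile) parsing.
# parse_w3c_datetime is a hypothetical helper, not libptime's API.

# Accepted forms: YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DDThh:mmTZD,
# YYYY-MM-DDThh:mm:ssTZD, YYYY-MM-DDThh:mm:ss.sTZD
W3C_DATETIME = /\A
  (?<year>\d{4})
  (?:-(?<month>\d{2})
    (?:-(?<day>\d{2})
      (?:T(?<hour>\d{2}):(?<min>\d{2})
         (?::(?<sec>\d{2})(?:\.(?<frac>\d+))?)?
         (?<tzd>Z|[+-]\d{2}:\d{2}))?)?)?
\z/x

# Returns a UTC Time, or nil if the string isn't in the profile.
# Assumption: date-only values are treated as UTC midnight.
def parse_w3c_datetime(str)
  m = W3C_DATETIME.match(str)
  return nil unless m

  # Missing fields default to the start of the period.
  t = Time.utc(m[:year].to_i,
               (m[:month] || 1).to_i,
               (m[:day] || 1).to_i,
               (m[:hour] || 0).to_i,
               (m[:min] || 0).to_i,
               (m[:sec] || 0).to_i)

  # Fractional seconds, kept exact as a Rational (".5" becomes 1/2).
  t += Rational(m[:frac].to_i, 10**m[:frac].length) if m[:frac]

  # The TZD is the local offset from UTC, so UTC = local time - offset.
  tzd = m[:tzd]
  if tzd && tzd != "Z"
    sign = tzd.start_with?("-") ? -1 : 1
    hh, mm = tzd[1..-1].split(":").map(&:to_i)
    t -= sign * (hh * 3600 + mm * 60)
  end

  t
rescue ArgumentError
  # Time.utc raises ArgumentError for out-of-range fields such as month 13.
  nil
end
```

<p> For example, <code>parse_w3c_datetime("2005-02-16T00:40:23+01:00")</code> yields 23:40:23 UTC on February 15th. The RFC 822 half is already covered in Ruby land: after <code>require 'time'</code>, the standard library's <code>Time.rfc2822</code> parses those dates, and <code>Time.xmlschema</code> handles several of the W3C forms, too. </p>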