1 files changed, 254 insertions, 0 deletions
diff --git a/content/posts/2005-02-16-guess-who-s-back.html b/content/posts/2005-02-16-guess-who-s-back.html
new file mode 100644
index 0000000..f98c9c5
--- /dev/null
+++ b/content/posts/2005-02-16-guess-who-s-back.html
@@ -0,0 +1,254 @@
+---
+date: "2005-02-16T00:40:23Z"
+title: Guess Who's Back?
+---
+
+<p>
+The blog post hiatus has ended!  Here's what's new in the world o' <a
+href='/'>Pablotron</a>.  First of all, the main hard drive on
+<code>vault</code> &mdash; my file/database/<acronym 
+title='Lightweight Directory Access Protocol'>LDAP</acronym>/email
+server &mdash; bit the dust last Wednesday.  Fortunately the drive
+just <em>started</em> to fail (instead of dying outright).  I had ample
+room to do immediate backups, and I had an unused 160G drive lying
+around.  I spent most of Sunday afternoon and all of Monday evening
+partitioning the new drive and copying stuff back to it.  As far as I
+can tell, the only thing I actually lost was the words file for <a
+href='http://spamprobe.sf.net/'><code>spamprobe</code></a>.  I don't really consider that much of a
+loss, since I
+save all my email (even the cursed spam), so I can easily toss the
+requisite good and bad corpora at <a
+href='http://spamprobe.sf.net/'><code>spamprobe</code></a> to get things
+going again.  Even though I'm short a 100G drive now, the experience
+overall has been a positive one.  Here are some thoughts I had; maybe
+they'll prevent a week of stress for someone else:
+</p>
+
+<ul>
+<li>Regular backups are just something you <em>do</em>.  The ad-hoc
+backups I've been doing are better than nothing, but they wouldn't have
+done me any good if my drive had died outright.  Had the
+circumstances been different, I would have lost weeks, possibly even a
+month of email.  My solution is (rather, will be, once everything is up
+and running again) an 
+<acronym title='Network File System'>NFS</acronym>-mounted backup
+directory on every machine (obviously not for <a href='http://snowman.net/'>people who don't like <acronym title='Network File System'>NFS</acronym></a>).  Each machine will be responsible for its
+own daily and weekly backups, via <a href='http://directory.fsf.org/cron.html'><code>cron</code></a>.  Depending on how large this data set is,
+I'll be burning <acronym title='Digital Video Disc'>DVD</acronym>s of
+the backup directory contents on a weekly or bi-weekly basis.  
+Aside: <a href='http://richlowe.net/'>Richard (richlowe)</a> has been
+advocating revision-controlled config files for quite a while (e.g.
+<code>cvs -d pabs@cvs:/cvs co etc-files/vault</code>); maybe
+I'll give that a spin, too.</li>
+<li>Distribute services across machines.  I've got 4 other machines
+sitting around twiddling their thumbs at the moment.  Any of them could
+easily be an authentication, database, email,
+<acronym title='Lightweight Directory Access Protocol'>LDAP</acronym>,
+or <acronym title='Concurrent Versions System'>CVS</acronym> server,
+but instead they're all idle (to be
+fair, <code>sumo</code> is my <acronym title='Internet Relay Chat'>IRC</acronym>
+/<a href='http://postgresql.org/'>PostgreSQL</a> machine, but that hardly
+qualifies as a crippling load).</li>
+<li>Keep extra hardware lying around.  As a true geek you're already
+doing this, of course :).  The drive in <code>vault</code> started
+failing at 1:30 on a Wednesday morning.  I was able to
+start making backups and moving stuff around <em>right then</em>.  If I
+didn't have the extra hard drive, I would have been 
+<acronym title='Shit Out of Luck'>SOL</acronym> for several
+platter-scraping hours.</li>
+<li>Losing your spam filter settings means you get to say cool words
+like "corpora" on your web page.</li>
+</ul>
+
+<p>
+On the non-catastrophic hardware failure front, I upgraded
+<code>halcyon</code> to the latest <a href='http://x.org/'>Xorg</a>, then
+promptly downgraded to the latest stable release.  Here's the
+approximate order of events:
+</p>
+
+<ol>
+<li>Spent an hour or two configuring, compiling, and installing the
+latest <a href='http://x.org/'>Xorg</a>.</li>
+<li>Ran X, and found out that the proprietary <a href='http://nvidia.com/'>NVidia</a> driver isn't compatible with the latest 
+<acronym title='Concurrent Versions System'>CVS</acronym> snapshot of <a
+href='http://x.org/'>Xorg</a>.</li>
+<li>Discovered just how painful the <a
+href='http://www.freedesktop.org/Software/CompositeExt'>composite
+extension</a> is without hardware acceleration by foolishly attempting
+to run X using the <code>nv</code> driver.  Hint: Imagine using
+Netscape Navigator 3.0 on your old Commodore 64 with Photoshop doing an
+RLE Gaussian Blur on a 100 meg image in the background.</li>
+<li>Promptly downgraded to the stable release, cursing both <a
+href='http://nvidia.com/'>NVidia</a> for their proprietary silliness,
+and the bastards at <a
+href='http://freedesktop.org/'>freedesktop.org</a> for having the
+audacity to make source code changes that inconvenienced me.  I spent
+plenty of time on this step, so go ahead and re-read that last paragraph
+a couple of times.</li>
+</ol>
+
+<p>
+Since I spent the majority of a Sunday afternoon recompiling X no fewer
+than 3 times, I also took the opportunity to try out the latest <a
+href='http://enlightenment.org/'>Enlightenment DR16</a> from <acronym
+title='Concurrent Versions System'>CVS</acronym> (yes Kim, I'm one of
+the <a
+href='http://sourceforge.net/mailarchive/message.php?msg_id=10424379'>few
+people still using e16</a>).  It's got its own built-in, mostly (semi?)
+working composite manager, so neither the patch nor the
+<code>xcompmgr</code> hackery I describe in <a
+href='http://pablotron.org/?cid=1402'>this post</a> is necessary any
+more.  The new default theme looks great, too!
+</p>
+
+<p>
+Why use other people's broken software when you can write your own?
+Here's the latest on the <a href='http://pablotron.org/'>Pablotron</a>
+coding front:
+</p>
+
+<ul>
+<li>I've converted the 
+<acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 
+feeds on <a href='http://pablotron.org/'>pablotron.org</a>, <a
+href='http://paulduncan.org/'>paulduncan.org</a>, and <a
+href='http://raggle.org/'>raggle.org</a> from steaming loads of 
+standards-noncompliant crap to pedantically-correct
+<a href='http://blogs.law.harvard.edu/tech/rss'><acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 2.0</a>.
+If your <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>
+aggregator couldn't read my pages before, it probably can now (unless
+your aggregator is based on the 
+<acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> library
+built into <a href='http://ruby-lang.org/'>Ruby 1.8</a>, but I'll get
+to that part of the story in a few minutes...)</li>
+<li>Lots and lots and lots of updates to the next version of <a
+href='http://raggle.org/'>Raggle</a>.  Some of the changes are even by me!  <a
+href='http://halffull.org/'>Thomas Kirchner (redshift)</a> has been
+doing an unbelievable amount of work on the <a
+href='http://cvs.pablotron.org/?m=raggle'><acronym
+title='Concurrent Versions System'>CVS</acronym> version of
+Raggle</a>.  So much so, in fact, that I feel kind of embarrassed calling
+this latest version mine at all.  So I think when it's ready for
+release, we'll call it <code>kirchneraggle</code> or something more
+suitable ;).</li>
+<li><a
+href='http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/4393'>This
+patch</a> for <a href='http://ruby-lang.org/'>Ruby</a> which 
+adds <code>wcolor_set</code> support to the built-in Curses interface.
+Ville suggested it eons ago, and that was the last thing stopping me
+from porting <a href='http://raggle.org/'>Raggle</a> from Ncurses-Ruby.
+</li>
+<li>A partially working Curses windowing library for <a
+href='http://ruby-lang.org/'>Ruby</a>.  This isn't in <a
+href='http://cvs.pablotron.org/'><acronym title='Concurrent Versions
+System'>CVS</acronym></a> just yet, but don't worry, I've got some new
+stuff for you to play with.  Keep reading...</li>
+</ul>
+
+<p>
+The big stuff I've been working on lately is the core of the future <a
+href='http://raggle.org/'>Raggle</a>.  Before I begin, here's a
+high-level overview of how the components interact with one another
+(yup, a diagram!):
+</p>
+
+<p>
+<img src='http://pablotron.org/gallery/misc/next-gen-raggle-thumb.png'
+  width='574' height='602' title='next gen raggle' alt='next gen raggle'
+  border='0' />
+</p>
+
+<p>
+I've mentioned <a
+href='http://cvs.pablotron.org/'><code>Squaggle</code></a> previously,
+but for those of you sleeping in the back of the class (you know who you
+are), here's a brief
+recap.  <a href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> is 
+the <a href='http://sqlite-ruby.rubyforge.org/'>SQLite-Ruby</a>-based
+engine for <a href='http://raggle.org/'>Raggle</a>.  It's <a
+href='http://pablotron.org/?cid=1398'>cleaner</a>, faster, it
+uses less memory, and it lets me do all sorts of cool things I can't
+really do with the current engine (fancy <a
+href='http://del.icio.us/'>delicious</a>-style tagging, fast cross-feed
+searching, smart/auto categorization, and more).  The version of <a
+href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> in <a
+href='http://cvs.pablotron.org/'><acronym title='Concurrent Versions System'>CVS</acronym></a>
+is functional (it even includes a usable <a
+href='http://webrick.org/'>WEBrick</a>-based interface).
+</p>
+
+<p>
+So what's this new stuff on ye olde diagram?  <a
+href='http://cvs.pablotron.org/?m=libptime'><code>libptime</code></a> is a
+C-based 
+<a href='http://asg.web.cmu.edu/rfc/rfc822.html#sec-5'><acronym
+title='Request For Comments'>RFC</acronym>822 datetime</a> and <a
+href='http://www.w3.org/TR/NOTE-datetime'><acronym
+title='World Wide Web Consortium'>W3C</acronym> datetime</a> parsing library.  It's
+BSD licensed, so you can <a
+href='http://pablotron.org/download/libptime-0.1.0.tar.gz'>download
+version 0.1.0</a> (<a
+href='http://pablotron.org/download/libptime-0.1.0.tar.gz.asc'>signature</a>),
+and use it to your heart's content.  The other new library on the
+diagram is <a
+href='http://cvs.pablotron.org/?m=libfeed'><code>libfeed</code></a>, an
+<a href='http://expat.sf.net/'>Expat</a>-based <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 
+(0.9x, <a href='http://web.resource.org/rss/1.0/spec'>1.0</a>, and
+<a href='http://blogs.law.harvard.edu/tech/rss'>2.0</a>)/<a
+href='http://www.atomenabled.org/developers/syndication/atom-format-spec.php'>Atom</a>
+feed parser.  Why bother writing an <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 
+parser in C?  The existing <a href='http://raggle.org/'>Raggle</a> engine is
+slow, partly from being <acronym
+title='Document Object Model'>DOM</acronym>-based, and partly from being
+written in <a href='http://ruby-lang.org/'>Ruby</a>.  Don't get me wrong, <a
+href='http://www.germane-software.com/software/rexml/'>REXML</a> is a
+great <acronym title='eXtensible Markup Language'>XML</acronym> parser,
+but <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>
+aggregators deal in volume, and I want to be sure the volume isn't
+constrained by parsing.  I also noticed there wasn't a nice C-based <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>/Atom 
+parsing library.  Now there is (well, almost!).  If that doesn't convince you, then maybe this will:
+</p>
+<pre style='padding: 20px;'><code>
+pabs@halcyon:~/cvs/libfeed/test&gt; du -sh data/big-pdo-wdom.rss 
+<b>15M</b>     data/big-pdo-wdom.rss
+pabs@halcyon:~/cvs/libfeed/test&gt; time perl -mXML::RSS -e \
+  '$rss = new XML::RSS; $rss-&gt;parsefile("data/big-pdo-wdom.rss");'
+real    <b>7m56.892s</b>
+user    4m31.578s
+sys     0m19.939s
+pabs@halcyon:~/cvs/libfeed/test&gt; time perl -mXML::RSS -e \
+  '$rss = new XML::RSS; $rss-&gt;parsefile("data/big-pdo-wdom.rss");'
+real    <b>5m57.838s</b>
+user    4m28.727s
+sys     0m3.703s
+pabs@halcyon:~/cvs/libfeed/test&gt; time ruby -rrss/2.0 -e \
+  'RSS::Parser::parse(File.read("data/big-pdo-wdom.rss"))'
+real    <b>2m30.950s</b>
+user    1m46.904s
+sys     0m8.610s
+pabs@halcyon:~/cvs/libfeed/test&gt; time ./testfeed data/big-pdo-wdom.rss \
+  &gt;/dev/null 2&gt;&amp;1
+real    <b>0m2.195s</b>
+user    0m1.472s
+sys     0m0.104s
+pabs@halcyon:~/cvs/libfeed/test&gt; time ./testfeed data/big-pdo-wdom.rss \
+  &gt;/dev/null 2&gt;&amp;1
+real    <b>0m2.010s</b>
+user    0m1.475s
+sys     0m0.099s
+</code></pre>
+
+<p>
+The <a href='http://perl.org/'>Perl</a> times were so bad I had to run
+them twice to be sure.  Roughly 70 times faster than <a
+href='http://ruby-lang.org/'>Ruby</a> and over 160 times faster than <a
+href='http://perl.org/'>Perl</a>; I'd say that's a pretty good start :).
+</p>
+
+<p>
+Unfortunately, I have to be awake in three hours, so I'll have
+to save the rest of the next-gen <a href='http://raggle.org/'>Raggle</a>
+description for another day...
+</p>
+