Diffstat (limited to 'content/posts/2005-02-16-guess-who-s-back.html')
-rw-r--r-- | content/posts/2005-02-16-guess-who-s-back.html | 254 |
1 files changed, 254 insertions, 0 deletions
diff --git a/content/posts/2005-02-16-guess-who-s-back.html b/content/posts/2005-02-16-guess-who-s-back.html new file mode 100644 index 0000000..f98c9c5 --- /dev/null +++ b/content/posts/2005-02-16-guess-who-s-back.html @@ -0,0 +1,254 @@ +--- +date: "2005-02-16T00:40:23Z" +title: Guess Who's Back? +--- + +<p> +The blog post hiatus has ended! Here's what's new in the world o' <a +href='/'>Pablotron</a>. First of all, the main hard drive on +<code>vault</code> (my file/database/<acronym +title='Lightweight Directory Access Protocol'>LDAP</acronym>/email +server) bit the dust last Wednesday. Fortunately the drive +just <em>started</em> to fail (instead of dying outright). I had ample +room to do immediate backups, and I had an unused 160G drive lying +around. I spent most of Sunday afternoon and all of Monday evening +partitioning the new drive and copying stuff back to it. As far as I +can tell, the only thing I actually lost was the words file for <a +href='http://spamprobe.sf.net/'><code>spamprobe</code></a>. I don't really consider that much of a +loss, since I +save all my email (even the cursed spam), so I can easily toss the +requisite good and bad corpora at <a +href='http://spamprobe.sf.net/'><code>spamprobe</code></a> to get things +going again. Even though I'm short a 100G drive now, the experience +overall has been a positive one. Here are some thoughts I had; maybe +they'll prevent a week of stress for someone else: +</p> + +<ul> +<li>Regular backups are just something you <em>do</em>. The ad-hoc +backups I've been doing are better than nothing, but they wouldn't have +done me any good if my drive had died outright. Had the +circumstances been different, I would have lost weeks, possibly even a +month of email. 
My solution is (rather, will be, once everything is up +and running again) an +<acronym title='Network File System'>NFS</acronym>-mounted backup +directory on every machine (obviously not for <a href='http://snowman.net/'>people who don't like <acronym title='Network File System'>NFS</acronym></a>). Each machine will be responsible for its +own daily and weekly backups, via <a href='http://directory.fsf.org/cron.html'><code>cron</code></a>. Depending on how large this data set is, +I'll be burning <acronym title='Digital Video Disc'>DVD</acronym>s of +the backup directory contents on a weekly or bi-weekly basis. +Aside: <a href='http://richlowe.net/'>Richard (richlowe)</a> has been +advocating revision-controlled config files for quite a while (e.g. +<code>cvs -d pabs@cvs:/cvs co etc-files/vault</code>); maybe +I'll give that a spin, too.</li> +<li>Distribute services across machines. I've got 4 other machines +sitting around twiddling their thumbs at the moment. Any of them could +easily be an authentication, database, email, +<acronym title='Lightweight Directory Access Protocol'>LDAP</acronym>, +or <acronym title='Concurrent Versions System'>CVS</acronym> server, +but instead they all sit idle (to be +fair, <code>sumo</code> is my <acronym title='Internet Relay Chat'>IRC</acronym> +/<a href='http://postgresql.org/'>PostgreSQL</a> machine, but that hardly +qualifies as a crippling load).</li> +<li>Keep extra hardware lying around. As a true geek you're already +doing this, of course :). The drive in <code>vault</code> started +failing at 1:30 on a Wednesday morning. I was able to +start making backups and moving stuff around <em>right then</em>. 
If I +hadn't had the extra hard drive, I would have been +<acronym title='Shit Out of Luck'>SOL</acronym> for several +platter-scraping hours.</li> +<li>Losing your spam filter settings means you get to say cool words +like "corpora" on your web page.</li> +</ul> + +<p> +On the non-catastrophic hardware failure front, I upgraded +<code>halcyon</code> to the latest <a href='http://x.org/'>Xorg</a>, then +promptly downgraded to the latest stable release. Here's the +approximate order of events: +</p> + +<ol> +<li>Spent an hour or two configuring, compiling, and installing the +latest <a href='http://x.org/'>Xorg</a>.</li> +<li>Ran X, and found out that the proprietary <a href='http://nvidia.com/'>NVidia</a> driver isn't compatible with the latest +<acronym title='Concurrent Versions System'>CVS</acronym> snapshot of <a +href='http://x.org/'>Xorg</a>.</li> +<li>Discovered just how painful the <a +href='http://www.freedesktop.org/Software/CompositeExt'>composite +extension</a> is without hardware acceleration by foolishly attempting +to run X using the <code>nv</code> driver. Hint: Imagine using +Netscape Navigator 3.0 on your old Commodore 64 with Photoshop doing an +RLE Gaussian Blur on a 100 meg image in the background.</li> +<li>Promptly downgraded to the stable release, cursing both <a +href='http://nvidia.com/'>NVidia</a> for their proprietary silliness, +and the bastards at <a +href='http://freedesktop.org/'>freedesktop.org</a> for having the +audacity to make source code changes that inconvenienced me. 
I spent +plenty of time on this step, so go ahead and re-read that last paragraph +a couple of times.</li> +</ol> + +<p> +Since I spent the majority of a Sunday afternoon recompiling X no fewer +than 3 times, I also took the opportunity to try out the latest <a +href='http://enlightenment.org/'>Enlightenment DR16</a> from <acronym +title='Concurrent Versions System'>CVS</acronym> (yes Kim, I'm one of +the <a +href='http://sourceforge.net/mailarchive/message.php?msg_id=10424379'>few +people still using e16</a>). It's got its own built-in, mostly (semi?) +working composite manager, so neither the patch nor the +<code>xcompmgr</code> hackery I describe in <a +href='http://pablotron.org/?cid=1402'>this post</a> is necessary any +more. The new default theme looks great, too! +</p> + +<p> +Why use other people's broken software when you can write your own? +Here's the latest on the <a href='http://pablotron.org/'>Pablotron</a> +coding front: +</p> + +<ul> +<li>I've converted the +<acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> +feeds on <a href='http://pablotron.org/'>pablotron.org</a>, <a +href='http://paulduncan.org/'>paulduncan.org</a>, and <a +href='http://raggle.org/'>raggle.org</a> from steaming loads of +standards-noncompliant crap to pedantically correct +<a href='http://blogs.law.harvard.edu/tech/rss'><acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 2.0</a>. 
+If your <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> +aggregator couldn't read my pages before, it probably can now (unless +your aggregator is based on the +<acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> library +built into <a href='http://ruby-lang.org/'>Ruby 1.8</a>, but I'll get +to that part of the story in a few minutes...)</li> +<li>Lots and lots and lots of updates to the next version of <a +href='http://raggle.org/'>Raggle</a>. Some of the changes are even by me! <a +href='http://halffull.org/'>Thomas Kirchner (redshift)</a> has been +doing an unbelievable amount of work on the <a +href='http://cvs.pablotron.org/?m=raggle'><acronym +title='Concurrent Versions System'>CVS</acronym> version of +Raggle</a>. So much so, in fact, that I feel kind of embarrassed calling +this latest version mine at all. So I think when it's ready for +release, we'll call it <code>kirchneraggle</code> or something more +suitable ;).</li> +<li><a +href='http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/4393'>This +patch</a> for <a href='http://ruby-lang.org/'>Ruby</a>, which +adds <code>wcolor_set</code> support to the built-in Curses interface. +Ville suggested it eons ago, and that was the last thing stopping me +from porting <a href='http://raggle.org/'>Raggle</a> from Ncurses-Ruby. +</li> +<li>A partially working Curses windowing library for <a +href='http://ruby-lang.org/'>Ruby</a>. This isn't in <a +href='http://cvs.pablotron.org/'><acronym title='Concurrent Versions +System'>CVS</acronym></a> just yet, but don't worry, I've got some new +stuff for you to play with. Keep reading...</li> +</ul> + +<p> +The big stuff I've been working on lately is the core of the future <a +href='http://raggle.org/'>Raggle</a>. 
Before I begin, here's a +high-level overview of how the components interact with one another +(yup, a diagram!): +</p> + +<p> +<img src='http://pablotron.org/gallery/misc/next-gen-raggle-thumb.png' + width='574' height='602' title='next gen raggle' alt='next gen raggle' + border='0' /> +</p> + +<p> +I've mentioned <a +href='http://cvs.pablotron.org/'><code>Squaggle</code></a> previously, +but for those of you sleeping in the back of the class (you know who you +are), here's a brief +recap. <a href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> is +the <a href='http://sqlite-ruby.rubyforge.org/'>SQLite-Ruby</a>-based +engine for <a href='http://raggle.org/'>Raggle</a>. It's <a +href='http://pablotron.org/?cid=1398'>cleaner</a>, faster, it +uses less memory, and it lets me do all sorts of cool things I can't +really do with the current engine (fancy <a +href='http://del.icio.us/'>delicious</a>-style tagging, fast cross-feed +searching, smart/auto categorization, and more). The version of <a +href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> in <a +href='http://cvs.pablotron.org/'><acronym title='Concurrent Versions System'>CVS</acronym></a> +is functional (it even includes a usable <a +href='http://webrick.org/'>WEBrick</a>-based interface). +</p> + +<p> +So what's this new stuff on ye olde diagram? <a +href='http://cvs.pablotron.org/?m=libptime'><code>libptime</code></a> is a +C-based +<a href='http://asg.web.cmu.edu/rfc/rfc822.html#sec-5'><acronym +title='Request For Comments'>RFC</acronym>822 datetime</a> and <a +href='http://www.w3.org/TR/NOTE-datetime'><acronym +title='World Wide Web Consortium'>W3C</acronym> datetime</a> parsing library. It's +BSD-licensed, so you can <a +href='http://pablotron.org/download/libptime-0.1.0.tar.gz'>download +version 0.1.0</a> (<a +href='http://pablotron.org/download/libptime-0.1.0.tar.gz.asc'>signature</a>), +and use it to your heart's content. 
The other new library on the +diagram is <a +href='http://cvs.pablotron.org/?m=libfeed'><code>libfeed</code></a>, an +<a href='http://expat.sf.net/'>Expat</a>-based <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> +(0.9x, <a href='http://web.resource.org/rss/1.0/spec'>1.0</a>, and +<a href='http://blogs.law.harvard.edu/tech/rss'>2.0</a>)/<a +href='http://www.atomenabled.org/developers/syndication/atom-format-spec.php'>Atom</a> +feed parser. Why bother writing an <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> +parser in C? The existing <a href='http://raggle.org/'>Raggle</a> engine is +slow, partly from being <acronym +title='Document Object Model'>DOM</acronym>-based, and partly from being +written in <a href='http://ruby-lang.org/'>Ruby</a>. Don't get me wrong, <a +href='http://www.germane-software.com/software/rexml/'>REXML</a> is a +great <acronym title='eXtensible Markup Language'>XML</acronym> parser, +but <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> +aggregators deal in volume, and I want to be sure the volume isn't +constrained by parsing. I also noticed there wasn't a nice C-based <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>/Atom +parsing library. Now there is (well, almost!). 
If that doesn't convince you, then maybe this will: +</p> +<pre style='padding: 20px;'><code> +pabs@halcyon:~/cvs/libfeed/test> du -sh data/big-pdo-wdom.rss +<b>15M</b> data/big-pdo-wdom.rss +pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \ + '$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");' +real <b>7m56.892s</b> +user 4m31.578s +sys 0m19.939s +pabs@halcyon:~/cvs/libfeed/test> time perl -mXML::RSS -e \ + '$rss = new XML::RSS; $rss->parsefile("data/big-pdo-wdom.rss");' +real <b>5m57.838s</b> +user 4m28.727s +sys 0m3.703s +pabs@halcyon:~/cvs/libfeed/test> time ruby -rrss/2.0 -e \ + 'RSS::Parser::parse(File.read("data/big-pdo-wdom.rss"))' +real <b>2m30.950s</b> +user 1m46.904s +sys 0m8.610s +pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \ + >/dev/null 2>&1 +real <b>0m2.195s</b> +user 0m1.472s +sys 0m0.104s +pabs@halcyon:~/cvs/libfeed/test> time ./testfeed data/big-pdo-wdom.rss \ + >/dev/null 2>&1 +real <b>0m2.010s</b> +user 0m1.475s +sys 0m0.099s +</code></pre> + +<p> +The <a href='http://perl.org/'>Perl</a> times were so bad I had to run +them twice to be sure. 60 times faster than <a +href='http://ruby-lang.org/'>Ruby</a> and over 100 times faster than <a +href='http://perl.org/'>Perl</a>; I'd say that's a pretty good start :). +</p> + +<p> +Unfortunately, I have to be awake in three hours, so I'll have +to save the rest of the next-gen <a href='http://raggle.org/'>Raggle</a> +description for another day... +</p> + |
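The post contrasts a DOM-based engine with an Expat-based parser, so here is a minimal sketch of what event-driven (streaming) feed parsing looks like. It is not libfeed's API, which the post does not show: it uses Python's standard `xml.parsers.expat` binding, and the function name `extract_item_titles` and the `SAMPLE` feed are made up for illustration. The point is only that callbacks see each element as it is parsed, so no document tree is ever built in memory.

```python
import xml.parsers.expat


def extract_item_titles(xml_text):
    """Collect the text of every <item><title> using Expat callbacks.

    Hypothetical helper for illustration; not part of libfeed.
    """
    titles = []
    stack = []  # names of the currently open elements, outermost first
    buf = []    # text chunks of the <title> currently being read

    def in_item_title():
        return stack[-2:] == ["item", "title"]

    def start(name, attrs):
        stack.append(name)
        if in_item_title():
            buf.clear()

    def end(name):
        if in_item_title():
            titles.append("".join(buf).strip())
        stack.pop()

    def chars(data):
        # Expat may deliver one text node in several chunks.
        if in_item_title():
            buf.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return titles


# Made-up sample feed, just enough RSS 2.0 structure to exercise the code.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>First post</title></item>
<item><title>Second &amp; third</title></item>
</channel></rss>"""

if __name__ == "__main__":
    print(extract_item_titles(SAMPLE))
```

For a feed the size of the 15M test file above, the same handlers could be fed with `parser.ParseFile(open(path, "rb"))` so the input is read incrementally rather than loaded as one string.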