---
date: "2005-02-16T00:40:23Z"
title: Guess Who's Back?
---

<p>
The blog post hiatus has ended!  Here's what's new in the world o' <a
href='/'>Pablotron</a>.  First of all, the main hard drive on
<code>vault</code> &mdash; my file/database/<acronym 
title='Lightweight Directory Access Protocol'>LDAP</acronym>/email
server &mdash; bit the dust last Wednesday.  Fortunately the drive
just <em>started</em> to fail (instead of dying outright).  I had ample
room to do immediate backups, and I had an unused 160G drive lying
around.  I spent most of Sunday afternoon and all of Monday evening
partitioning the new drive and copying stuff back to it.  As far as I
can tell, the only thing I actually lost was the words file for <a
href='http://spamprobe.sf.net/'><code>spamprobe</code></a>.  I don't
really consider that much of a loss, since I save all my email (even the
cursed spam), so I can easily toss the requisite good and bad corpora at
<a href='http://spamprobe.sf.net/'><code>spamprobe</code></a> to get
things going again.  Even though I'm short a 100G drive now, the
experience overall has been a positive one.  Here are some thoughts I
had; maybe they'll save someone else a week of stress:
</p>
</p>

<ul>
<li>Regular backups are just something you <em>do</em>.  The ad-hoc
backups I've been doing are better than nothing, but they wouldn't have
done me any good if my drive had died outright.  Had the
circumstances been different, I would have lost weeks, possibly even a
month, of email.  My solution is (rather, will be, once everything is up
and running again) an 
<acronym title='Network File System'>NFS</acronym>-mounted backup
directory on every machine (obviously not for <a href='http://snowman.net/'>people who don't like <acronym title='Network File System'>NFS</acronym></a>).  Each machine will be responsible for its
own daily and weekly backups, via <a href='http://directory.fsf.org/cron.html'><code>cron</code></a>.  Depending on how large this data set is,
I'll be burning <acronym title='Digital Video Disc'>DVD</acronym>s of
the backup directory contents on a weekly or bi-weekly basis.  
Aside: <a href='http://richlowe.net/'>Richard (richlowe)</a> has been
advocating revision-controlled config files for quite a while (e.g.,
<code>cvs -d pabs@cvs:/cvs co etc-files/vault</code>); maybe
I'll give that a spin, too.</li>
<li>Distribute services across machines.  I've got 4 other machines,
any of which could easily be an authentication, database, email,
<acronym title='Lightweight Directory Access Protocol'>LDAP</acronym>,
or <acronym title='Concurrent Versions System'>CVS</acronym> server,
but instead they're all sitting around twiddling their thumbs (to be
fair, <code>sumo</code> is my <acronym title='Internet Relay Chat'>IRC</acronym>/<a
href='http://postgresql.org/'>PostgreSQL</a> machine, but that hardly
qualifies as a crippling load).</li>
<li>Keep extra hardware lying around.  As a true geek you're already
doing this, of course :).  The drive in <code>vault</code> started
failing at 1:30 on a Wednesday morning.  I was able to
start making backups and moving stuff around <em>right then</em>.  If I
hadn't had the extra hard drive, I would have been 
<acronym title='Shit Out of Luck'>SOL</acronym> for several
platter-scraping hours.</li>
<li>Losing your spam filter settings means you get to say cool words
like "corpora" on your web page.</li>
</ul>

<p>
On the non-catastrophic hardware failure front, I upgraded
<code>halcyon</code> to the latest <a href='http://x.org/'>Xorg</a>, then
promptly downgraded to the latest stable release.  Here's the
approximate order of events:
</p>

<ol>
<li>Spent an hour or two configuring, compiling, and installing the
latest <a href='http://x.org/'>Xorg</a>.</li>
<li>Ran X, and found out that the proprietary <a href='http://nvidia.com/'>NVidia</a> driver isn't compatible with the latest 
<acronym title='Concurrent Versions System'>CVS</acronym> snapshot of <a
href='http://x.org/'>Xorg</a>.</li>
<li>Discovered just how painful the <a
href='http://www.freedesktop.org/Software/CompositeExt'>composite
extension</a> is without hardware acceleration by foolishly attempting
to run X using the <code>nv</code> driver.  Hint: Imagine using
Netscape Navigator 3.0 on your old Commodore 64 with Photoshop doing an
RLE Gaussian Blur on a 100 meg image in the background.</li>
<li>Promptly downgraded to the stable release, cursing both <a
href='http://nvidia.com/'>NVidia</a> for their proprietary silliness,
and the bastards at <a
href='http://freedesktop.org/'>freedesktop.org</a> for having the
audacity to make source code changes that inconvenienced me.  I spent
plenty of time on this step, so go ahead and re-read that last paragraph
a couple of times.</li>
</ol>

<p>
Since I spent the majority of a Sunday afternoon recompiling X no less
than 3 times, I also took the opportunity to try out the latest <a
href='http://enlightenment.org/'>Enlightenment DR16</a> from <acronym
title='Concurrent Versions System'>CVS</acronym> (yes, Kim, I'm one of
the <a
href='http://sourceforge.net/mailarchive/message.php?msg_id=10424379'>few
people still using e16</a>).  It's got its own built-in, mostly (semi?)
working composite manager, so neither the patch nor the
<code>xcompmgr</code> hackery I describe in <a
href='http://pablotron.org/?cid=1402'>this post</a> is necessary any
more.  The new default theme looks great, too!
</p>

<p>
Why use other peoples' broken software when you can write your own?
Here's the latest on the <a href='http://pablotron.org/'>Pablotron</a>
coding front:
</p>

<ul>
<li>I've converted the 
<acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 
feeds on <a href='http://pablotron.org/'>pablotron.org</a>, <a
href='http://paulduncan.org/'>paulduncan.org</a>, and <a
href='http://raggle.org/'>raggle.org</a> from steaming loads of 
standards-incompliant crap to pedantically-correct
<a href='http://blogs.law.harvard.edu/tech/rss'><acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 2.0</a>.
If your <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>
aggregator couldn't read my pages before, it probably can now (unless
your aggregator is based on the 
<acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> library
built into <a href='http://ruby-lang.org/'>Ruby 1.8</a>, but I'll get
to that part of the story in a few minutes...)</li>
<li>Lots and lots and lots of updates to the next version of <a
href='http://raggle.org/'>Raggle</a>.  Some of the changes are even by me!  <a
href='http://halffull.org/'>Thomas Kirchner (redshift)</a> has been
doing an unbelievable amount of work on the <a
href='http://cvs.pablotron.org/?m=raggle'><acronym
title='Concurrent Versions System'>CVS</acronym> version of
Raggle</a>.  So much so, in fact, that I feel kind of embarrassed calling
this latest version mine at all.  So I think when it's ready for
release, we'll call it <code>kirchneraggle</code> or something more
suitable ;).</li>
<li><a
href='http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/4393'>This
patch</a> for <a href='http://ruby-lang.org/'>Ruby</a>, which
adds <code>wcolor_set</code> support to the built-in Curses interface.
Ville suggested it eons ago, and its absence was the last thing stopping
me from porting <a href='http://raggle.org/'>Raggle</a> from
Ncurses-Ruby to the built-in Curses interface.
</li>
<li>A partially working Curses windowing library for <a
href='http://ruby-lang.org/'>Ruby</a>.  This isn't in <a
href='http://cvs.pablotron.org/'><acronym title='Concurrent Versions
System'>CVS</acronym></a> just yet, but don't worry, I've got some new
stuff for you to play with.  Keep reading...</li>
</ul>

<p>
The big stuff I've been working on lately is the core of the future <a
href='http://raggle.org/'>Raggle</a>.  Before I begin, here's a
high-level overview of how the components interact with one another
(yup, a diagram!):
</p>

<p>
<img src='http://pablotron.org/gallery/misc/next-gen-raggle-thumb.png'
  width='574' height='602' title='next gen raggle' alt='next gen raggle'
  border='0' />
</p>

<p>
I've mentioned <a
href='http://cvs.pablotron.org/'><code>Squaggle</code></a> previously,
but for those of you sleeping in the back of the class (you know who you
are), here's a brief
recap.  <a href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> is 
the <a href='http://sqlite-ruby.rubyforge.org/'>SQLite-Ruby</a>-based
engine for <a href='http://raggle.org/'>Raggle</a>.  It's <a
href='http://pablotron.org/?cid=1398'>cleaner</a> and faster, it
uses less memory, and it lets me do all sorts of cool things I can't
really do with the current engine (fancy <a
href='http://del.icio.us/'>del.icio.us</a>-style tagging, fast cross-feed
searching, smart/auto categorization, and more).  The version of <a
href='http://cvs.pablotron.org/?m=squaggle'>Squaggle</a> in <a
href='http://cvs.pablotron.org/'><acronym title='Concurrent Versions System'>CVS</acronym></a>
is functional (it even includes a usable <a
href='http://webrick.org/'>WEBrick</a>-based interface).
</p>

<p>
So what's this new stuff on ye olde diagram?  <a
href='http://cvs.pablotron.org/?m=libptime'><code>libptime</code></a> is a
C-based 
<a href='http://asg.web.cmu.edu/rfc/rfc822.html#sec-5'><acronym
title='Request For Comments'>RFC</acronym>822 datetime</a> and <a
href='http://www.w3.org/TR/NOTE-datetime'><acronym
title='World Wide Web Consortium'>W3C</acronym> datetime</a> parsing library.  It's
BSD licensed, so you can <a
href='http://pablotron.org/download/libptime-0.1.0.tar.gz'>download
version 0.1.0</a> (<a
href='http://pablotron.org/download/libptime-0.1.0.tar.gz.asc'>signature</a>),
and use it to your heart's content.  The other new library on the
diagram is <a
href='http://cvs.pablotron.org/?m=libfeed'><code>libfeed</code></a>, an
<a href='http://expat.sf.net/'>Expat</a>-based <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 
(0.9x, <a href='http://web.resource.org/rss/1.0/spec'>1.0</a>, and
<a href='http://blogs.law.harvard.edu/tech/rss'>2.0</a>)/<a
href='http://www.atomenabled.org/developers/syndication/atom-format-spec.php'>Atom</a>
feed parser.  Why bother writing an <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym> 
parser in C?  The existing <a href='http://raggle.org/'>Raggle</a> engine is
slow, partly from being <acronym
title='Document Object Model'>DOM</acronym>-based, and partly from being
written in <a href='http://ruby-lang.org/'>Ruby</a>.  Don't get me wrong, <a
href='http://www.germane-software.com/software/rexml/'>REXML</a> is a
great <acronym title='eXtensible Markup Language'>XML</acronym> parser,
but <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>
aggregators deal in volume, and I want to be sure the volume isn't
constrained by parsing.  I also noticed there wasn't a nice C-based <acronym title='Really Simple Summary / RDF Site Summary / Who really knows what RSS stands for now, anyway?'>RSS</acronym>/Atom 
parsing library.  Now there is (well, almost!).  If that doesn't convince you, then maybe this will:
</p>
<pre style='padding: 20px;'><code>
pabs@halcyon:~/cvs/libfeed/test&gt; du -sh data/big-pdo-wdom.rss 
<b>15M</b>     data/big-pdo-wdom.rss
pabs@halcyon:~/cvs/libfeed/test&gt; time perl -mXML::RSS -e \
  '$rss = new XML::RSS; $rss-&gt;parsefile("data/big-pdo-wdom.rss");'
real    <b>7m56.892s</b>
user    4m31.578s
sys     0m19.939s
pabs@halcyon:~/cvs/libfeed/test&gt; time perl -mXML::RSS -e \
  '$rss = new XML::RSS; $rss-&gt;parsefile("data/big-pdo-wdom.rss");'
real    <b>5m57.838s</b>
user    4m28.727s
sys     0m3.703s
pabs@halcyon:~/cvs/libfeed/test&gt; time ruby -rrss/2.0 -e \
  'RSS::Parser::parse(File.read("data/big-pdo-wdom.rss"))'
real    <b>2m30.950s</b>
user    1m46.904s
sys     0m8.610s
pabs@halcyon:~/cvs/libfeed/test&gt; time ./testfeed data/big-pdo-wdom.rss \
  &gt;/dev/null 2&gt;&amp;1
real    <b>0m2.195s</b>
user    0m1.472s
sys     0m0.104s
pabs@halcyon:~/cvs/libfeed/test&gt; time ./testfeed data/big-pdo-wdom.rss \
  &gt;/dev/null 2&gt;&amp;1
real    <b>0m2.010s</b>
user    0m1.475s
sys     0m0.099s
</code></pre>

<p>
The <a href='http://perl.org/'>Perl</a> times were so bad I had to run
them twice to be sure.  Roughly 70 times faster than <a
href='http://ruby-lang.org/'>Ruby</a> and over 150 times faster than <a
href='http://perl.org/'>Perl</a>; I'd say that's a pretty good start :).
</p>

<p>
Unfortunately, I have to be awake in three hours, so I'll have
to save the rest of the next-gen <a href='http://raggle.org/'>Raggle</a>
description for another day...
</p>