The Universe of Discourse


Thu, 07 Jul 2011

Pickle slicers

Pseudohashes

Perl objects are usually made from hashes, because then you can access the member data using a locution like $object->{color}. But they can also be made from arrays, which are faster than hashes, because Perl doesn't have to hash the key string and then traverse the internal hash structure looking for it. But you would have to write $object->[17] instead, which is unreadable, so nobody ever does this. You can define a compile-time alias for the 17, but then there is the question of what the scope of the alias should be.
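
The trade-off looks like this in miniature (a sketch; the field names, values, and the COLOR constant are invented for illustration):

```perl
use strict;
use warnings;
use constant COLOR => 0;   # compile-time alias for the array index

# Hash-based object: readable member access.
my $hash_potato = { color => 'russet', mass => 150 };
my $c1 = $hash_potato->{color};

# Array-based object: faster lookup, but the bare index is unreadable.
my $array_potato = [ 'russet', 150 ];
my $c2 = $array_potato->[0];

# The constant restores readability, but every package that touches
# the object must agree on the same set of constants -- the scope
# question mentioned above.
my $c3 = $array_potato->[COLOR];
```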

So about ten years ago, the following idea was proposed: You would declare to Perl that you were defining an object class, say Potato. You would also declare the names of the member data fields of Potato, including a field named color. And you would build your object based on an array instead of a hash. And every time you had a variable that would contain such an object, you would declare the variable as holding a Potato. Then Perl would have enough information that if you wrote $potato->{color} it would be able to figure out that you really meant $potato->[17] and compile that as if you had written that in the first place. Thus you would get the readability of an object based on a hash, but the performance of an object based on an array, in exchange for a bunch of declarations. So far this is not obviously a loser of an idea.
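
The declarations survive today in the fields pragma, so you can still see what they looked like (a sketch; on a modern Perl, fields is implemented with restricted hashes rather than pseudohashes, but the declarations are the same):

```perl
package Potato;
use strict;
use warnings;
use fields qw(color mass);   # declare the member data fields

sub new {
    my ($class, %arg) = @_;
    my Potato $self = fields::new($class);
    $self->{color} = $arg{color};
    $self->{mass}  = $arg{mass};
    return $self;
}

package main;
# Declaring the variable as holding a Potato is what gives the
# compiler enough information to translate field names to indices.
my Potato $potato = Potato->new(color => 'russet', mass => 150);
```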

But there is a problem. What if someone writes $potato->{$key} somewhere? The value of $key can't be known until run time, and indeed might vary, so the compile-time optimization can't be performed. So the solution that was adopted was that the first element of the $potato array must contain a hash that maps member names to array indices. Then, since Perl cannot pretend that you wrote $potato->[17], it will instead pretend that you wrote $potato->[$potato->[0]{$key}]. This will of course be slower than if you had just used a hash to begin with because it has to do two array lookups in addition to doing the same hash lookup it would have had to do in the first place, but the hope was that the speedup in the common case would more than pay for the slowdown in the general case. Such an object, an array that carries a key-to-index hash in its first element, was called a pseudohash.
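
The structure is easy to build by hand, which makes the translation concrete (a sketch; field names and indices invented for illustration):

```perl
use strict;
use warnings;

# A pseudohash: element 0 maps field names to array indices,
# and the rest of the array holds the member data.
my $potato = [ { color => 1, mass => 2 }, 'russet', 150 ];

# Field name known at compile time: the access could be compiled
# straight down to a single array lookup.
my $color = $potato->[1];                     # i.e. $potato->{color}

# Field name known only at run time: Perl had to emit the double
# lookup instead -- a hash lookup plus two array lookups.
my $key  = 'mass';
my $mass = $potato->[ $potato->[0]{$key} ];   # i.e. $potato->{$key}
```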

By this point, it should start to be clear that pseudohashes are not a great idea. The basic concept is sort of dubious, because the slow part of Perl accessors is the function call overhead, not the hash lookup, but whatever, it went into Perl 5.005. Then all sorts of complications ensued: $potato is pretending to be a hash, so what if you try to call keys on it? Well, okay, Perl can pretend you wrote keys %{$potato->[0]} instead. But what if you call delete on it? So delete and exists had to be extended to work on arrays as well as on hashes.

The extension of delete and exists to work on arrays was the moment at which I realized the feature was a bad idea, the moment when I said "Okay, this wasn't obviously a dog when it was first proposed, but at this point it had become clear that it couldn't be made to work." (It was shoehorned into a supposedly frozen beta release at the last minute, under the bizarre theory that "it isn't actually a change".) I wasn't the only one who thought it was crazy. Tom Christiansen argued strongly against it at this point, until Larry Wall told him to shut up. And a couple of months later it went into Perl 5.6.0.

Some time later, the space aliens who had taken control of Larry and Sarathy's brains went back to their home planet, and everyone realized that pseudohashes were a stupid idea, and they were deprecated, and eventually removed. But in between, there was ten years of suffering, and everything in Perl was slowed down by the pseudohash code.

This sad story reminds me of something once said by Erik Naggum, God rest his bitter, cranky soul, about Perl programmers trying to solve the wrong problem:

the perl programmer who veers off the road into the forest will get out of his car and cut down each and every tree that blocks his progress, then drive a few meters and repeat the whole process. whether he gets where he wanted to go or not is immaterial—a perl programmer will happily keep moving forward and look busy.

I think of this often, because it so perfectly nails a problem that a lot of smart people have—not just Perl programmers, but programmers in general. They are smart, and they love to solve problems, so their response, when faced with a problem, is to solve it. And if their solution creates more problems, so much the better, because they can solve those too!

Object access in Perl is slow? Okay, let's use arrays instead of hashes. But wait, you have to replace the memorable string names with unintelligible array indices? Okay, let's hack the compiler so that it will translate names to indices at compile time! But wait, not all the names are known at compile time? Okay, let's include a translation table in the array to handle names at run time! But wait, now you have to extend exists to work on array elements! Okay, we can solve that problem too! But wait, there seem to be a lot of trees in this direction, but that's okay, my car has a trunk full of axes!

Smart match

Last month RJBS and I were discussing the Perl "smart match" feature, embodied in the new ~~ operator, and its accomplices given and when. If you don't know about this yet, good for you! There are really only two things to say. First, to understand the behavior of this language feature requires that you memorize a table with 23 entries. And second, the space aliens have gone home again and people are talking about how to salvage some value from this idiotic feature, which is likely to be ruthlessly overhauled in the next couple of years. This week RJBS mooted a cut-down version that reduces the table from 23 bizarre items to five sane and memorable items. (For example, if you do $a ~~ $b and $b is a code reference, the result is true if and only if $b->($a) is true.) Response to this proposal was entirely positive.
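
The coderef rule from the cut-down table can be sketched as an ordinary function (smart_match is a hypothetical name for illustration; the real proposal builds this behavior into ~~ itself):

```perl
use strict;
use warnings;

# One of the five proposed rules: when the right-hand side is a
# code reference, $left ~~ $right means $right->($left).
sub smart_match {
    my ($left, $right) = @_;
    return $right->($left) ? 1 : 0 if ref $right eq 'CODE';
    die "only the coderef case is sketched here";
}

my $is_even = sub { $_[0] % 2 == 0 };
my $four_matches = smart_match(4, $is_even);   # 1
my $five_matches = smart_match(5, $is_even);   # 0
```

Five rules of that shape fit in your head; twenty-three do not.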

Okay, at least it didn't take as long to get rid of this insane shitpile as it took to get rid of pseudohashes. (It was introduced in late 2005, and ought to be gone by spring of 2013 or so.) But one wonders how it got in at all.

Technology is complicated stuff, and it can be really, really hard to see all the ramifications of a change ahead of time; often you have to go ahead and take risks and see how things turn out. Sometimes. But sometimes it should be clear ahead of time that an idea is bad. This was one of those times. The table for the behavior of ~~ in Perl 5.10 already had 18 items, and it was already a big mess. (The table in the original patch had 15 items, and the author got some of them mixed up.) I want to say that anyone could have seen at the time that it was a bad idea. Clearly, a lot of people didn't, or at least the people who counted didn't; I couldn't find any serious criticism of this feature in the mailing list archives between December 2005, when it was introduced, and December 2007, when I stopped looking. But I thought it was rotten the first time I saw it, as part of the Perl 6 proposal, around ten years ago, and I can't have been the only one.

Pickle slicers

I don't pretend to be good at seeing the future, or at picking winning technologies, which is as much an exercise in seeing the future as in anything else. But it seems that I am a lot better at picking losing technologies. I think I have a pretty good track record at recognizing fairly early that certain technologies are actually pickle slicers.

Every so often someone shows up with a new technology that seems to me to be not just a bad idea, but obviously and entirely a bad idea; I have just given two examples. The mystery, from my point of view, is not why this is so obvious, but how it could fail to be obvious to everyone. It is as if someone showed up on the Perl mailing list with a pickle-slicing machine and said "Hey, everyone! Let's all stick our dicks in this pickle-slicing machine! It'll be AWESOME!" And then instead of saying "are you joking?", everyone lines up to do it! And they keep doing it for years, until one day the space aliens go home and everyone looks down and says "Why did we think that would be a good idea?"

SPF

Here is another example of geeks cutting down trees trying to get through the forest, a pickle slicer if ever there was one.

You probably get a lot of spam with forged sender addresses. Perhaps it is possible to recognize that these messages are forged and to reject them early, before they get into your mailbox. How about this: when your mail server is contacted by the server at IP address 1.2.3.4, and asked to receive a message that appears to be from fred@example.com, it will look up the "SPF records" for example.com and find out whether the address 1.2.3.4 is in fact allowed to send mail from example.com. If not, your server can immediately reject the message, which is forged.
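
For concreteness, an SPF record is just a TXT record published in DNS; a typical one looks something like this (the addresses are invented):

```
example.com.  IN TXT  "v=spf1 ip4:192.0.2.0/24 mx -all"
```

The ip4 and mx terms name the hosts permitted to send mail for example.com, and -all says to fail everything else; the receiving server checks the connecting IP address against this list.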

That seems reasonable enough, unless you know anything at all about email and you think about it for two minutes, in which case you will think of at least half a dozen fundamental problems with it. For example, it breaks email forwarding. I cannot forward email delivered to mjd@plover.com elsewhere, because when mail from fred@example.com is forwarded from plover.com, the receiving server will look up the SPF records for example.com, discover that plover.com is not one of the permitted senders of example.com messages, and reject the forwarded message as a forgery.

A lot of geeks probably had the idea for SPF, thought about it for two minutes, realized it would break forwarding, and decided on that basis that it was a bad idea. But perhaps it was inevitable that one day some geek would decide that that was a problem he could solve. So he opened the trunk and got out the axes. A lot of people pointed out other problems with it, but this just whetted the geeks' appetites. The trunk is full of axes.

I had a conversation with one of the inventors of SPF way back when the idea was first proposed. I immediately pointed out that it wouldn't work. "That's just what Brad Templeton said," he told me sadly. "Why does everyone say this won't work?"

And then they took their pickle slicer around to Yahoo! and AOL and the other big email providers and got all of them to shove their dicks into it. And now we have SPF, which turned out to be just as worthless as Brad Templeton and a hundred other people predicted. And a lot of email forwarding is broken.

My current prediction

I've resolved that I'm going to trust myself a little more when I think some technology is insanely bad. As I said, I may not know a good technology when I see it, but I think I know a bad one. For a success, a lot of things have to go right, and have to fit together. It can be hard to see whether that will happen, and some things look great at first and then turn out to be lousy. But some failures are simple: they look lousy at first because they are lousy.

I know next to nothing about MongoDB, but I read an awesome slide presentation about how to optimize it, and my response was "You have got to be kidding. People put up with this shit?"

Please do not confuse this with an argument. I have no argument to present. I don't even have a well-informed opinion. All I have is a strong, unpleasant feeling in my viscera and a conviction that I am going to stay as far from MongoDB as I can for as long as I can, until the aliens go back to their home planet, because I have seen a lot of pickle slicers, and I hope to keep my dick out of this one.

So don't write to me with your argument that MongoDB is okay, because I think if you can still think that after reading through those slides, we have nothing to talk about. Maybe you're right and I'm wrong. Awesome. Have fun.


[Other articles in category /tech] permanent link