
On Pseudohashes



Raphaël> My feelings, exactly!

Raphaël> The idea of hashes/array polymorphism is appealing, but the current
Raphaël> implementation exposes way too much information to the user. It
Raphaël> reminds me of C++ justifying language design semantics mostly with
Raphaël> implementation reasons.

Me, it reminds me of those Pascal string implementations that "hid" the
length in a byte stored at the 0th element of the array of characters.
(Yes, those were short strings.)

I confess to always having been a bit uncomfortable with pseudohashes.
This malaise stems not from their purpose but from their implementation
and integration.  The "use fields" pragma wasn't very well integrated
back into the main body of Perl documentation, and did, at the
various times I looked at it, feel a bit awkward; one could probably
argue that this all remains the case today.  I am aware that others,
including but not limited to Damian, have used them to good effect,
so perhaps the fault lies in my fuzzy understanding of their mechanics
and how one goes about bending them to one's will.

I propose that we collectively consider the needs that pseudohashes
were designed to meet, then assess the extent to which those needs
have been adequately met.  This might help those of us who aren't
completely fond of them to appreciate them a bit better.  More
importantly, it offers a process we can explore to help decide
whether the experiment should be deemed a success or a failure--or
more likely, some admixture of the two.

The predominant goal appears to have been creating an alternate
underlying implementation for those hashes that conform to one
particular access pattern, that of invariant key names.  This is
exactly what you have in the static records used to implement C
structures, Pascal records, or most pointedly, the named data
attributes of an object.  Guaranteeing that keys would be neither
added to nor deleted from the hash once created holds the promise
of offering us two important benefits.  Let us examine these
two properties separately from one another.

The first benefit is one of storage and access efficiency.  A generic
hash is a remarkably flexible--and consequently powerful--data
structure.  It pays for this flexibility with a profligate consumption
of memory.  In short, Perl greatly overcommits in order to accommodate
comparatively efficient access by arbitrary and unforeseeable key
names.  It also makes this sacrifice in the interests of accommodating
the subsequent addition and deletion of keys still unforeseen.

But with the sorts of hashes we're talking about, those with fixed
keys, we don't need to be able to cope with arbitrary, unforeseen
keys, nor do we have to consider how to handle changing the key
list at some point after set-up.  My recollection is that at one
point we considered the following sort of mechanism as a way of
signalling this condition:

    keys %hash = qw/key1 key2 key3 key4/;

That's not what we eventually semi-settled on, of course.

It was only natural to use an array to supply the underlying
implementation for a hash of static keys.  This meant that the
compiler had to be able to decide at parse time how to map
a given key into a particular slot in the array.
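
By way of illustration (the field names here are invented for the
purpose), a pseudohash is nothing but an array whose zeroth element
maps key names to slot indices; given a literal key and a known field
list, the compiler can reduce the hash-style subscript to a fixed
array index:

    # An array whose 0th element maps field names to the
    # slots that hold the corresponding values.
    my $ph = [ { name => 1, age => 2 }, "Spot", 13 ];

    print $ph->{name}, "\n";    # hash-style access: prints "Spot"
    print $ph->[1],    "\n";    # the slot it compiles down to: also "Spot"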

That decision to use arrays, and the ramifications of compiler
knowledge, led immediately to the second important benefit of these
invariant hashes: simple static analysis to ascertain the correctness
of access.  In the case of literals, this can be done at compile
time, greatly assuaging the not unrealistic concerns of those folks
who prefer what for lack of a better name I'll call a more type-safe
by-name access system.  Sure, we've always been type-safe in the
sense that $obj->attribute() will eventually fail at run-time if
such a method is not permissible on whatever object comes to be
stored in that scalar.  But we never before had the compile-time
error checking of that method's own implementation, which would in
all likelihood be built upon Perl code using $self->{"attribute"}
at its heart.

The peril, of course, is how easy it is to misspell "attribute".
Abigail, Damian, and if I recall correctly, at least a couple of
other well-meaning folks have all devised devious clevernesses
involving mediated access through a tied hash to inspect for the
appropriateness of the requested keys.  Brilliant and subtle though
they may be, all these strategies fail in one critical respect:
they act at run-time, not at compile-time.  Consequently, they can
offer the programmer no assurances whatsoever of the correctness
of codepaths not actually executed.
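
For those who haven't run across one of these schemes, a minimal
sketch of the idea (the class and field names are mine, not anyone's
actual module, and a real module would also implement EXISTS, DELETE,
and the iterator methods) runs something like this; note that the
complaint fires only if the offending statement is actually reached:

    package Tie::FixedKeys;
    use Carp;

    sub TIEHASH {
        my ($class, @keys) = @_;
        my %self;
        @self{@keys} = ();              # pre-seed the permitted keys
        return bless \%self, $class;
    }
    sub FETCH {
        my ($self, $key) = @_;
        croak qq(no such field "$key") unless exists $self->{$key};
        return $self->{$key};
    }
    sub STORE {
        my ($self, $key, $value) = @_;
        croak qq(no such field "$key") unless exists $self->{$key};
        $self->{$key} = $value;
    }

    package main;
    tie my %rec, 'Tie::FixedKeys', qw(name age);
    $rec{name} = "Spot";                # fine
    $rec{nmae} = "oops";                # croaks--but only at run-time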

It was Malcolm who prototyped our pseudohashes as they stand today.
He did this as an outgrowth of his work on the Perl compiler and
native code generation system.  From this came our little-documented
facility for compile-time, lexical type declarations:

    my Dog $spot;		# upper case for user-defined
    my integer @vector;		# lower case for built-in
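
A minimal sketch of how the first of these cooperates with "use
fields" (class and field names invented) to catch a misspelled
subscript before the program ever runs:

    package Dog;
    use fields qw(name tail);

    sub new {
        my Dog $self = shift;
        $self = fields::new($self) unless ref $self;
        $self->{name} = shift;
        return $self;
    }

    package main;
    my Dog $spot = Dog->new("Spot");
    $spot->{tail} = "wag";      # fine: known field, mapped at compile time
    $spot->{tial} = "gaw";      # compile-time error: no such field in Dog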

Let me please stress how beneficial this innovation can be.  By
providing more type information at compile time, the code generator
has the chance to produce dramatically more efficient code.  Without
these hints, it would have to employ the sort of miraculous and subtle
dynamic-type analysis that Andy Koenig discussed in the talk on ML
he gave at Usenix some years back.  This is hard stuff,
and has finite theoretical limits related to halting-problem-style
decidability (although less restrictive ones than I had ever
realized).  So it's just as well that we can tell the compiler
things.

It's possible that one reason that I'm excited about lexical analysis
is that this is something that languages such as Python (and, at
least once upon a time, Tcl) were completely unable to provide.
Perhaps it's just the competitive spirit rearing its raspberry-hurling
head to sardonically nyah-nyah the serpent worshipers, but even so,
it's not something to be lightly discarded.  This buys us a lot.

What does it *not* buy us?

    * Type-conformance testing on assignment.  There's no guarantee
      that $spot above really holds a suitable object.  This is
      hard: at the very least, you must do static @ISA inspection
      to make sure that a my Canine $hound is allowed to be assigned
      to a my Dog $spot.

    * Dealing with fields whose names are run-time determined; that
      is, not just $obj->{"attribute"}, but also $obj->{$attrname}.
      My understanding is that it is the need to convert the latter
      into a valid array index at run-time that resulted in the
      zeroth element of the array holding the hash key names for
      the pseudohash (see the sketch following this list).

    * Compile-time detection of the validity of $obj->attribute()
      in addition to just $obj->{"attribute"}.  In the general case,
      this lies somewhere between hard and impossible.  But in some
      cases, it could be made to work.  I suspect that nearly all
      of us have kicked ourselves for misspelling a subroutine name,
      whether in the case of a method call as above, or in calling
      a Perl built-in improperly, such as using getpwname() or
      lenght().
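
To put the second item above in concrete terms (names invented, and
the real lookup of course happens inside perl rather than in Perl
code), a run-time key has to take the slow path through that
zeroth-element map:

    my $obj = [ { name => 1, age => 2 }, "Spot", 13 ];

    my $attrname = "age";           # not known until run-time
    print $obj->{$attrname}, "\n";  # perl must consult the map in $obj->[0]

    # which is morally equivalent to doing this by hand:
    my $slot = $obj->[0]{$attrname}
        or die qq(no such field "$attrname");
    print $obj->[$slot], "\n";      # prints 13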

Those bullets cover only the simplest of cases.  The non-simple
cases are, well, beyond the scope of this programming language we
call Perl.  The issue is that you must also be able to ascertain
the type of an arbitrary expression.  Can f(g($n)) be safely assigned
to $spot?  What about f($n)+g($m)?  Perhaps I'm wrong, but it
certainly seems to me that attempting to support this leads Perl
into the pains
of C++ that many of us have sought to avoid.  Larry has stated that
Perl was not going to become a statically-typed language, though,
so I don't really see this ever happening.  Those sorts of languages
exist, and you should use them if that's the kind of thing that you
want.

I still hold some hope for a bit of help from the compiler on the
third case.  Being able to run

    perl -MO=Lint,-context,-undefined-subs myperlprogram

really just isn't the same thing.  I, and many of us, would like
there to be some pragma that, for most intents and purposes, forces
one to declare all function calls.  The goal would be to catch typos
like getpwname() or lenght() or $obj->atribute() at compile time,
much as today we can, if we want to, detect things like @ARVG or
even "use File::Basname" at compilation.

Enough of that particular desideratum, which, interesting and even
important though it may be, is nevertheless straying a bit from the
topic I set out to explore.

What, then, about pseudohashes?  They provide an efficient way to
represent a particular subset of hashes, those that don't vary their
keys.  They allow the compiler to detect a certain class of programming
error: a misspelled subscript on such a hash.

What are their real problems?  I think for many of you, their biggest
problem is that they expose their underlying implementation as mere
arrays to the programmer's prying eyes.  They further clutter
themselves by placing details of their internal structure right
there for all to see by keeping their valid field names in their
zeroth element.  In defence of that decision, however, stands the
proven utility of introspective mechanisms that allow the programmer
at run-time to peer into the inner workings of the scaffolding his
program is running on.
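
To see just how leaky that is, consider what an untyped variable
lets one get away with (names invented again):

    my $obj = [ { name => 1 }, "Spot" ];    # a pseudohash, one fixed key

    print $obj->[1], "\n";      # the array view leaks the hash's slot: "Spot"

    $obj->[0]{age} = 2;         # forge a new entry in the field map...
    $obj->{age}    = 13;        # ...and the "fixed" key list just grew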

Pseudohashes are also, I think, somewhat tricky for use in
multiple-inheritance situations, although I seem to recall that
this matter has been addressed at least as much as it is apt to be.
I do not at this point even recall the restrictions on such.

I do not believe that the two desirable features that pseudohashes
afford us--namely, efficient run-time representation and limited
compile-time correctness tests--necessitate any particular
implementation.  The interface is as simple as one can imagine: a
hash that you make certain promises about.  So long as the actual
private implementation remains sealed up behind the public interface
with pragmata such as "use base" and "use fields", one would imagine
that the implementation would be free for transparent amendment at
some later date.
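
The sanctioned idiom, at least, keeps its hands off the array
entirely (class names invented):

    package Dog;
    use fields qw(name tail);

    package Puppy;
    use base 'Dog';             # inherit Dog's field layout...
    use fields qw(chew_toy);    # ...and extend it, never touching the array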

However, this appears to be of dubious feasibility due to the
retrospectively questionable decision to allow programmers to *ever*
access these pseudohashes as arrays, particularly with respect to
the zeroth element.  Had we instead restricted ourselves to thinking
of these as nothing more than hashes of static keys, rather than
pseudohashes enjoying a dual access pattern as both arrays and as
hashes, then I believe a good bit of complication and dalliance--not
the least of which being yesterday's extensive interchange on the
topic--might have been deftly avoided.

Enough.

That should, I hope, suffice to spark constructive discussion about
the real goals of pseudohashes and an analysis of the current state
of success in reaching those goals.  I hope that those of you with
less desultory experiences with pseudohashes than I myself possess
might contribute your accumulated observations on them to this thread.

Thank you.

--tom


