
[PATCH English.pm] Removing the regex nastiness (finally)



Laptops and long commutes mean you actually get to do some of the
things on your TODO list.  Being bored, I sat down and devised a way
to write English.pm such that it's no longer a global performance drag
on regexes.

$MATCH et al. are exported as trick tied variables.  On first use,
each one replaces itself with the real variable it stands for ($&,
$`, $' or $+).  The patch passes all t/lib/english.t tests.
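
For readers who have not seen the trick before, here is a minimal
standalone sketch of a tied scalar that swaps itself out on first
read.  It is not the patch itself: the package name Lazy::Once, the
variable $ANSWER and the initializer callback are made up for
illustration, and it assumes a reasonably modern Perl 5 (replacing a
glob slot from inside FETCH is exactly the kind of thing older
interpreters could choke on).

```perl
use strict;
use warnings;

# Hypothetical package, for illustration only (not part of English.pm).
package Lazy::Once;

our $fetches = 0;    # counts how often FETCH actually runs

sub TIESCALAR {
    my ($class, $name, $init) = @_;
    # $name is the fully qualified variable name, $init builds the value.
    return bless { name => $name, init => $init }, $class;
}

sub FETCH {
    my ($self) = @_;
    $fetches++;
    my $value = $self->{init}->();

    # Replace the scalar slot of the glob with a plain, untied scalar.
    # Later reads of the package variable never reach this code again.
    no strict 'refs';
    *{ $self->{name} } = \$value;

    return $value;
}

package main;

our $ANSWER;         # hypothetical variable standing in for $MATCH
tie $ANSWER, 'Lazy::Once', 'main::ANSWER', sub { 6 * 7 };

my $first  = $ANSWER;    # goes through FETCH, which replaces the variable
my $second = $ANSWER;    # reads the plain scalar installed by FETCH
```

The patch below does the same thing one level up: instead of
installing a fresh scalar, it aliases the whole glob in the caller's
package to the real punctuation variable, so the cost of $& and
friends is only paid by programs that actually touch $MATCH and
friends.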

I also added a $VERSION number (1.00 for lack of a better term), moved
the POD after __END__ for slightly faster loading, and altered the
docs to remove the "this module is evil" warnings.

Unfortunately, t/lib/english.t segfaults on my copy of 5.005_63 (works
fine on 5.005_03).  I suspect I've uncovered a bug in tying.  That'll
be my next post.



--- English.pm	2000/01/21 16:16:57
+++ English.pm	2000/01/21 16:48:35
@@ -1,42 +1,10 @@
 package English;
 
+$VERSION = 1.00;
+
 require Exporter;
 @ISA = (Exporter);
 
-=head1 NAME
-
-English - use nice English (or awk) names for ugly punctuation variables
-
-=head1 SYNOPSIS
-
-    use English;
-    ...
-    if ($ERRNO =~ /denied/) { ... }
-
-=head1 DESCRIPTION
-
-You should I<not> use this module in programs intended to be portable
-among Perl versions, programs that must perform regular expression
-matching operations efficiently, or libraries intended for use with
-such programs.  In a sense, this module is deprecated.  The reasons
-for this have to do with implementation details of the Perl
-interpreter which are too thorny to go into here.  Perhaps someday
-they will be fixed to make "C<use English>" more practical.
-
-This module provides aliases for the built-in variables whose
-names no one seems to like to read.  Variables with side-effects
-which get triggered just by accessing them (like $0) will still 
-be affected.
-
-For those variables that have an B<awk> version, both long
-and short English alternatives are provided.  For example, 
-the C<$/> variable can be referred to either $RS or 
-$INPUT_RECORD_SEPARATOR if you are using the English module.
-
-See L<perlvar> for a complete list of these.
-
-=cut
-
 local $^W = 0;
 
 # Grandfather $NAME import
@@ -106,10 +74,10 @@
 
 # Matching.
 
-	*MATCH					= *&	;
-	*PREMATCH				= *`	;
-	*POSTMATCH				= *'	;
-	*LAST_PAREN_MATCH			= *+	;
+	tie $MATCH, 	'English::Evil', 'MATCH';
+	tie $PREMATCH,	'English::Evil', 'PREMATCH';
+	tie $POSTMATCH,	'English::Evil', 'POSTMATCH';
+	tie $LAST_PAREN_MATCH,	'English::Evil', 'LAST_PAREN_MATCH';
 
 # Input.
 
@@ -184,4 +152,81 @@
 #	*OFMT					= *#	;
 #	*MULTILINE_MATCHING			= **	;
 
+
+# Here we set up suicidal variables which self-destruct on their first
+# use.  This protects against the use of English causing regex
+# inefficiencies.
+package English::Evil;
+
+%Evil_Vars = (
+	MATCH		=> '&',
+	PREMATCH	=> '`',
+	POSTMATCH	=> "'",
+	LAST_PAREN_MATCH	=> '+',
+);
+
+sub TIESCALAR {
+	my($proto) = shift;
+	my($self) = shift;
+	bless \$self;
+}
+
+sub FETCH {
+	my($self) = shift;
+	my($caller) = caller;
+	
+	# Replace myself with the evil in question.
+	*{$caller.'::'.$$self} = *{$Evil_Vars{$$self}};
+	
+	return ${$caller.'::'.$$self};
+}
+
+sub STORE {
+	my($self, $val) = @_;
+	my($caller) = caller;
+	
+	# Replace myself with the evil in question.
+	*{$caller.'::'.$$self} = *{$Evil_Vars{$$self}};
+	
+	${$caller.'::'.$$self} = $val;
+
+	# XXX Is this correct behavior?	
+	return ${$caller.'::'.$$self};
+}
+
+
 1;
+__END__
+=pod
+
+=head1 NAME
+
+English - use nice English (or awk) names for ugly punctuation variables
+
+=head1 SYNOPSIS
+
+    use English;
+    ...
+    if ($ERRNO =~ /denied/) { ... }
+
+=head1 DESCRIPTION
+
+This module provides aliases for the built-in variables whose
+names no one seems to like to read.  Variables with side-effects
+which get triggered just by accessing them (like $0) will still 
+be affected.
+
+For those variables that have an B<awk> version, both long
+and short English alternatives are provided.  For example, 
+the C<$/> variable can be referred to either $RS or 
+$INPUT_RECORD_SEPARATOR if you are using the English module.
+
+See L<perlvar> for a complete list of these.
+
+=head1 CAVEATS
+
+You should I<not> use this module in programs intended to be portable
+among Perl versions.  The problem of English causing regex inefficiencies 
+has been solved.
+
+=cut


-- 

Michael G Schwern                                           schwern@pobox.com
                    http://www.pobox.com/~schwern
     /(?:(?:(1)[.-]?)?\(?(\d{3})\)?[.-]?)?(\d{3})[.-]?(\d{4})(x\d+)?/i

