Oakland.pm

Reviews

Review of "Mastering Regular Expressions"

author: Jeffrey Friedl

reviewer: George Woolley


Longer Review

Note:

  • Since the Second and Third Editions of this book are similar in structure and content, this review of the Third Edition uses material from my review of the Second Edition liberally.

The Title

The Meaning of the Title: The book "Mastering Regular Expressions" doesn't belong to an O'Reilly series. So I don't have that to guide me in interpreting the title. However, the author is quite helpful regarding this.

According to the author, the book provides the information needed to acquire a "full command" (that's roughly synonymous with mastery to me) of regular expressions. It is the author's intent that the book also provide the motivation to use the information to acquire that "full command".

I gather that for the author "information" includes:

  • lists of regex features in particular applications
  • explanations of how regexes work and why
  • hints, tips, pitfalls

but also

  • useful ways of thinking about things
  • useful habits to have

Does the author deliver? I don't know whether the book contains enough information to master regular expressions. If I ever feel like I'm in "full command" of regular expressions, perhaps I'll have an opinion. What I can say is that I know of no other book on regular expressions with comparable scope and detail.

Does the author provide the motivation to use this information? What I can say is that the author sure motivated me. Parts of the book were quite difficult for me, but I've now been through all of it. Some of the things contributing to my motivation have been:

  • the title of the book
  • the broad scope of the book
  • the great detail of the book
  • the author's perspective on regular expressions
  • the author's guidance when things were more difficult
  • the reputation of the author

About the Reviewer

Biases: Some of my biases that could affect this review are

  • I'm big into string manipulation.
  • Perl is my favorite language.
  • I use PHP and PHP regular expressions a lot.
  • I'm a fan of O'Reilly.

If you want more detail, take a look at the "About the Reviewer" section of my review of the article "Five Habits for Successful Regular Expressions" by Tony Stubblebine. One thing, though: I now use PHP a lot.

Oh, I owned the first edition of the book being reviewed here and consider it excellent. I thought it was so good that I bought a second copy so I could have one copy at work and one copy at home. I have an even higher regard for the 2nd and 3rd editions.

Expectations: When I read the 2nd edition, what I was hoping for was

  • to improve my ability to write regular expressions
  • to improve my ability to teach regular expressions

When I read this 3rd Edition, what I was hoping for was mainly to improve my use of PHP regular expressions.

Relation to Second Edition

The second edition came out in 2002, the third in 2006. That's a period of four years. Many things have happened in that time. Two that had a significant effect on the 3rd Edition are:

  • Java regular expressions evolved in Java 1.5/1.6.
  • The importance of PHP and PHP regular expressions became clear.

The following table gives a crude picture of the relationship between the chapters of the second and third editions. I've included the Preface and Index in the table as I think they are important.

3rd Edition Chapter (or ...)       Corresponding 2nd Edition Chapter   Comment on 3rd Edition
---------------------------------  ----------------------------------  ----------------------
Preface                            Preface                             -
1. Intro to Regex                  1                                   -
2. Extended Intro Examples         2                                   -
3. Overview of Features/Flavors    3                                   -
4. Mechanics of Exp Processing     4                                   -
5. Practical Regex Techniques      5                                   -
6. Crafting an Efficient Exp       6                                   -
7. Perl                            7                                   -
8. Java                            8                                   The Java chapter has been updated to account for Java 1.5/1.6.
9. .NET                            9                                   -
10. PHP                            none                                The PHP chapter is new.
Index                              Index                               -

Notes:

  • Words in the chapter names are abbreviated in some cases.
  • At the level of chapters, the structure of the 2nd and 3rd Editions is almost identical.
  • A - in the third column indicates that any changes to the chapter were minor.

What I Learned about Writing Regexes

Note:

  • This section was lifted (with only minor changes) from my review of the 2nd Edition. However, the 2nd and 3rd editions are quite similar in both structure and content; the section still works for the 3rd edition with some minor rewording.

Well, I learned a great deal. Even before my review of the 2nd Edition was complete, I began using the following in some of my Perl code:

  • embedded code in regexes to assist me in understanding how they work.
  • the qr operator to help me more easily build up complex regular expressions from simpler ones

Before reading the 2nd edition of this book, I was unaware that I could embed code in regexes and I was only vaguely aware of the qr operator. Both are explained in a clear way in the book and there are examples.
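
For readers who haven't met these two features, here is a minimal Perl sketch of each. It is not taken from the book: the variable names, the date pattern and the sample strings are all made up for illustration. The first half builds a larger pattern out of small qr pieces; the second half uses an embedded code block, (?{ ... }), to record where the engine is each time it gets past a+, which gives a rough view of backtracking.

```perl
use strict;
use warnings;

# Build a larger pattern from small qr// pieces (all names are illustrative).
my $year  = qr/\d{4}/;
my $month = qr/(?:0[1-9]|1[0-2])/;
my $day   = qr/(?:0[1-9]|[12]\d|3[01])/;
my $date  = qr/\b($year)-($month)-($day)\b/;

my $line = 'Completed: 2006-08-31';
my ($y, $m, $d) = $line =~ $date;
print "year=$y month=$m day=$d\n" if defined $y;

# Embedded code: each time the engine gets past a+, remember the
# current match position so the attempts can be inspected afterwards.
our @trace;
my $matched = 'aaab' =~ /a+(?{ push @trace, pos() })ab/;
print "matched; positions recorded after a+: @trace\n" if $matched;
```

The embedded code sits in a literal pattern here. Interpolating a plain string that contains a code block would additionally need "use re 'eval'", so keeping such patterns literal (or in qr) is the simpler choice for experiments like this.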

My Point: Well, for all I know, you already use embedded code and qr. Still, I suggest taking a look at this book.

Or maybe Perl isn't the world you do regexes in. Well, the book also covers Java, .NET, PHP and much more. And it becomes clear reading the book that, as far as regex features are concerned, other languages are competitive with Perl. Perl has a prominent place in this book, but this is definitely not a Perl book.

Well, I learned a lot, and I am already putting some of what I learned to use.

What I Learned about Teaching Regexes

Note:

  • As with the previous section, this section was lifted (with only minor changes) from my review of the 2nd Edition. However, the 2nd and 3rd editions are quite similar in both structure and content; the section still works for the 3rd edition with some minor rewording.

From time to time, I get an opportunity to help a novice Perl programmer become more proficient at Perl programming in general and at using regexes in particular. Reading "Mastering Regular Expressions" has put me in a better position for doing that. I give a specific example below. But more important, I now have a book to recommend to novices who are serious about regexes. (Of course, I also recommend this book for people who are experienced with regexes.)

The Language Analogy: The language analogy Friedl puts forward in the book has been helpful to me, and I believe it would also be useful to novices.

Here's the picture I have from the language analogy. Perl is a programming language that includes regular expressions, which are a language that includes character classes, which are a mini language.

It's important to recognize that these three languages are (beautifully) integrated together in Perl. It's also often useful to recognize them as three languages. In this context, here's something Friedl says about character classes that I've found useful:

"Consider character classes as their own mini language. The rules regarding which metacharacters are supported (and what they do) are completely different inside and outside of character classes."

And the author gives concrete examples of this.
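
To make the quote concrete, here is a small Perl sketch. These are not the book's examples; the strings and patterns are chosen just for illustration. It contrasts a few metacharacters outside and inside a character class: the dot, star and plus lose their special meaning inside a class, while the caret and hyphen take on class-specific meanings.

```perl
use strict;
use warnings;

# Each row: description, subject string, pattern, and whether a match is expected.
my @cases = (
    [ 'dot outside a class is a wildcard',    'abc', qr/^a.c$/,     1 ],
    [ 'dot inside a class is a literal dot',  'abc', qr/^a[.]c$/,   0 ],
    [ 'dot inside a class is a literal dot',  'a.c', qr/^a[.]c$/,   1 ],
    [ 'star and plus are literal in a class', '*+*', qr/^[*+]+$/,   1 ],
    [ 'caret first in a class negates it',    'abc', qr/^[^0-9]+$/, 1 ],
    [ 'hyphen between characters is a range', 'b',   qr/^[a-c]$/,   1 ],
);

my $failures = 0;
for my $case (@cases) {
    my ($what, $subject, $re, $expected) = @$case;
    my $got = $subject =~ $re ? 1 : 0;
    $failures++ if $got != $expected;
    printf "%-38s %-6s %s\n", $what, "'$subject'", $got ? 'match' : 'no match';
}
print "unexpected results: $failures\n";
```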

A Book for Novices: I have become convinced that this book could be useful for a Perl novice who:

  • has been using regular expressions for a while
  • is serious about gaining a deeper understanding of them

For such a person, I'd suggest the following use of the book:

chapter(s)            suggestion
--------------------  -----------------------------------
preface               read thoroughly
chapter 1             read thoroughly
chapter 2             read thoroughly
chapters 3 & 7        scan and read parts of interest
chapters 4-6, 8-10    just be generally aware of contents

My Point: This book can be useful to you in helping novices. The book certainly has been useful in teaching the regex novices I've encountered who, it happens, are also typically learning Perl. Hey, you may live in a Java-centric world. What I'm saying is that there is valuable material here for a novice and for people who are oriented to helping novices.

I'm thinking this might be missed by some given that parts of this book are quite advanced. The author is super clear when dealing with basics.

What I Got from the PHP Chapter

Note:

  • The PHP chapter was added for this new edition.

PHP actually has 3 regular expression engines. This book focuses on the regular expressions supported by the preg regex engine because the author thought it the best of the three with respect to features and with respect to speed. When I started using PHP, I quickly determined I preferred the preg engine.

preg stands for "Perl Regular Expressions". preg uses the PCRE ("Perl Compatible Regular Expressions") library. The regular expressions supported by the preg engine are quite similar to those in Perl. However, at least from my point of view, the way preg regular expressions are integrated into PHP is quite different from the way regular expressions are integrated into Perl.

I got a number of things from the PHP chapter. Below I'll discuss two of them as examples.

preg Function Descriptions: One of the best things about the PHP chapter is the descriptions of the preg functions. These descriptions are in far more detail than the corresponding descriptions in my favorite PHP books. The descriptions take up 22 pages (to describe 7 functions).

I've already found these descriptions useful. The function of most interest to me is preg_replace. One thing I learned is that I can pass this function an array of patterns and a corresponding array of replacements; it will apply each pattern and, where a pattern matches, perform the corresponding replacement. I wasn't aware you could do that.

The PHP Chapter is the best treatment of preg functions I've seen anywhere. Similarly, the Perl Chapter is the best treatment of Perl regular expressions I've seen anywhere. I expect to use these chapters for reference as time goes on. The author makes it clear that he didn't intend the book as a reference. However, it is much better than anything else I'm aware of for this, so I plan to use it that way.

I'll refrain from commenting on the Java and .NET chapters since I've only used Java a little and I have never used .NET.

Recursive Regular Expressions: preg has a special notation for recursive expressions. For example, the author suggests the following regex for checking that parentheses are balanced:

      ^((?:[^()]++|\((?1)\))*)$

The (?1) says that the whole first parenthesized group is to be applied again, recursively, at the point where (?1) occurs in the pattern. You don't need to understand this notation now (though if you do, cool!). Just understand that there is a simple notation for recursion in preg regular expressions and that it is explained in the PHP chapter.
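
If you'd like to see the pattern in action, here is a short sketch. The review discusses it in the context of PHP's preg. The sketch uses Perl instead, on the assumption that you have Perl 5.10 or later, which accepts the same (?1) and ++ notation. The test strings are made up for illustration.

```perl
use 5.010;   # recursive patterns and possessive quantifiers need Perl 5.10 or later
use strict;
use warnings;

# The pattern quoted above: group 1 is a run of non-parenthesis characters
# and/or parenthesized pieces, where each piece recursively contains group 1.
my $balanced = qr/^((?:[^()]++|\((?1)\))*)$/;

for my $s ('a(b(c)d)e', 'no parens', '(()', ')(') {
    printf "%-12s %s\n", $s, ($s =~ $balanced ? 'balanced' : 'not balanced');
}
```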

My Point: If you use PHP preg functions much, I bet you'll find the preg function descriptions in this chapter useful.

You may not be interested in using recursion in preg, but this is just one of many things explained in this chapter.

Gripes

To keep things in perspective, this is the most impressive technical book I've read to date on any subject. But I can always find something to gripe about.

Testing: IMO, you can't master regexes without becoming effective at testing them. But this is not much addressed in the book. There's not a separate section on testing, nor even an index entry for testing. One thing that Friedl does point out is that it's a good idea to watch out for unwanted matches.

The article I mentioned earlier by Stubblebine does begin to address testing regular expressions.

Not Optimum as a Reference: This book is not intended to be a reference. However, it's such an excellent book and covers so much, that (as things stand) it makes sense to use it that way once you've worked through it.

I can think of two ways the publisher could address this:

  • commission a better index
  • publish an updated Regular Expression Pocket Reference

Perl 6 not Included: I think Perl 6 is mentioned in the book; but Perl 6 rules are not discussed, even though they rock. Perl 6 rules are evolutionary descendants (or perhaps I should say revolutionary) of what we call regular expressions in Perl 5, though I gather regular expressions in Perl and other computer languages long ago ceased to be regular expressions in mathematical terms.

Here we are in 2006 and Perl 6 has still not been released. OK, I'll reframe this as a wish:

If Perl 6 or some other language makes a release including anything like Perl 6 rules, my wish is that the author update his thinking in this area with a new edition of the book, an article or some such.

Who is the Book for?

I recommend this book for anyone who

  • has used regexes for a while
  • is serious about them

You don't have to be a regex whiz to read this book. However, the book is demanding; if you aren't serious, I wouldn't bother getting it.

Depending on your regex background, I have different things to say:

  • If you are a regex novice, I suggest focusing on chapters 1-2. IMO the book is worth the price for just those two chapters.
  • If you are somewhere between a novice and a whiz, absorbing the material in this book should have a major impact on your proficiency.
  • If you are a regex whiz, I'm betting you'll pick up some things anyway, especially if you haven't kept up to date on the advances in various flavors of regex.

Final Thought

This is an impressive book; it's easily the best book I've encountered on regular expressions. If you got all the way through this review, I suggest you have enough interest to get it.

Smiley Rating: :) :) :) :) :) of 5.

Completed: 2006-08-31