Pattern Matching Guide (PHP)


There are many functions for pattern matching. Let’s start with the simplest function.

<?php // Simplified signature: preg_match(string $pattern, string $search): int|false ?>

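As the return type shows, preg_match() returns an integer: 1 if the pattern is found and 0 if it is not. It returns false only when an error occurs, such as an invalid pattern.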

But that’s not true or false!

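Yes, it is, in practice. PHP is a weakly typed language, so in a condition 1 counts as true and 0 counts as false. The values that count as false are: false, 0, 0.0, "", "0", an empty array, and null. Note that the string "false" is not one of them; like any other non-empty string except "0", it counts as true.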
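Here is a minimal sketch of how the return value behaves in a condition. The sample strings are made up for illustration.

```php
<?php
// preg_match() returns 1 on a match, 0 on no match, and false on error.
$found = preg_match('/cat/', 'my cat was here');   // 1
$missing = preg_match('/cat/', 'my dog was here'); // 0

// In an if statement, 1 is treated as true and 0 as false.
if ($found) {
    echo "found a cat\n";
}
if (!$missing) {
    echo "no cat here\n";
}

// A strict comparison tells "no match" (0) apart from an error (false).
$isMatch = ($found === 1);
```

If you need to tell "not found" from "something went wrong", compare with === instead of relying on truthiness.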


Basic Match


If you use /cat/ as the pattern, it matches any string that contains ‘cat’. It matches both ‘my cat was here’ and ‘cats are inside the house’. Notice that the second one matches because ‘cats’ contains ‘cat’.

Try using /.at/ as the pattern. It matches hat, pat, sat, but also #at. The . is a wildcard: it matches any single character except a newline (\n). To allow only a lowercase letter in that position, use /[a-z]at/.
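The difference between the two patterns can be sketched like this. The word list is hypothetical, chosen only to show where the patterns disagree.

```php
<?php
// Hypothetical sample words for comparing the two patterns.
$words = ['hat', 'pat', '#at', 'at'];

foreach ($words as $word) {
    $dot = preg_match('/.at/', $word);       // any one character, then "at"
    $class = preg_match('/[a-z]at/', $word); // one lowercase letter, then "at"
    echo "$word: /.at/ => $dot, /[a-z]at/ => $class\n";
}
```

Both patterns need one character in front of ‘at’, so the bare string ‘at’ matches neither of them.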


Slashes or Delimiters?


You may have noticed the slashes. What are they? They’re delimiters!

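Delimiters mark where the pattern starts and ends. Other delimiters include: !...! and {...}. In general, any character that is not a letter, a digit, a backslash, or whitespace can be used.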
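As a small sketch, here is the same pattern written with each of these delimiters, plus one case where changing the delimiter saves some escaping. The example.com URL is only a placeholder.

```php
<?php
$subject = 'cats are inside the house';

// The same pattern with three different delimiters.
$slash = preg_match('/cat/', $subject);
$bang = preg_match('!cat!', $subject);
$brace = preg_match('{cat}', $subject);

// With / as the delimiter, every slash in the pattern must be escaped.
$escaped = preg_match('/http:\/\//', 'http://example.com');
// With ! as the delimiter, the slashes can be written as they are.
$unescaped = preg_match('!http://!', 'http://example.com');
```

All five calls return 1; the delimiter changes how the pattern is written, not what it matches.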


Character Classes


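You’ve only seen one example of a character class. “Where?” you may ask. Well, it’s the [a-z] in /[a-z]at/: square brackets [] define a character class, which matches any one of the characters listed inside.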

Special characters inside a character class include - which specifies a range, and ^ which means “not” if it’s the first character. The caret symbol can be found at shift + 6 on a US keyboard.

Use /[a-z]/ to match a character from a to z. To match a letter in either lowercase or uppercase, use /[a-zA-Z]/. To match a digit, use /[0-9]/.
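A quick sketch of these classes in use, including a negated one. The test strings are made up.

```php
<?php
$lower = preg_match('/[a-z]/', 'ABC');       // 0: no lowercase letter
$either = preg_match('/[a-zA-Z]/', 'ABC');   // 1: uppercase is allowed too
$digit = preg_match('/[0-9]/', 'room 42');   // 1: the string contains a digit

// A caret at the start negates the class: any character that is NOT a digit.
$nonDigit = preg_match('/[^0-9]/', '12345'); // 0: every character is a digit
```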


First or Last Character


Sometimes, you want to match only at the start or the end of the string.

^ - Match at the start of the string.
$ - Match at the end of the string (or just before a final newline).

Therefore, /^[a-zA-Z]$/ matches a string that consists of exactly one letter, from a to z or A to Z.
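To see what the anchors change, compare an anchored and an unanchored pattern on the same input. This is only a sketch with made-up strings.

```php
<?php
$single = preg_match('/^[a-zA-Z]$/', 'q');    // 1: exactly one letter
$double = preg_match('/^[a-zA-Z]$/', 'qq');   // 0: two letters is too many
$unanchored = preg_match('/[a-zA-Z]/', 'qq'); // 1: a letter appears somewhere

$startsWith = preg_match('/^cat/', 'cats are inside the house'); // 1
$endsWith = preg_match('/cat$/', 'my cat was here');             // 0
```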


Subpatterns


Subpatterns, written (...), group part of a pattern so that it can be treated as a single unit.

/https?[1-2]/ matches http1, http2, https1, and https2.

The question mark means the character before it (here, the s) is optional.

/(https)?[1-2]/ matches https1, https2, 1, and 2.

The question mark after the subpattern means the subpattern is optional. This may come in useful.
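The two patterns can be compared side by side. In this sketch, ^ and $ are added (an assumption, for clarity) so that the whole string has to match.

```php
<?php
$optionalChar = '/^https?[1-2]$/';    // only the "s" is optional
$optionalGroup = '/^(https)?[1-2]$/'; // the whole "https" is optional

$a = preg_match($optionalChar, 'http1');   // 1
$b = preg_match($optionalChar, '2');       // 0: "http" is still required
$c = preg_match($optionalGroup, 'https2'); // 1
$d = preg_match($optionalGroup, '2');      // 1: the group was skipped
$e = preg_match($optionalGroup, 'http1');  // 0: "http" is not the full group
```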


One or More, Zero or More


Use + to match the previous item one or more times. Use * to match it zero or more times.
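The difference only shows up when there is nothing to match, as this small sketch illustrates. The anchors are added so the whole string is checked.

```php
<?php
$plusEmpty = preg_match('/^a+$/', '');   // 0: + needs at least one "a"
$starEmpty = preg_match('/^a*$/', '');   // 1: * accepts zero of them
$plusMany = preg_match('/^a+$/', 'aaa'); // 1
$starMany = preg_match('/^a*$/', 'aaa'); // 1
```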


Escaping Characters


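To escape a special character so that it is matched literally, put a \ (backslash) in front of it; for example, \. matches a real dot. Use \\ in the pattern to match a backslash. Inside a quoted PHP string the backslashes themselves must be doubled, so that pattern is written as '/\\\\/'.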
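A short sketch of escaping in practice. The strings are made up, and single-quoted PHP strings are used throughout.

```php
<?php
// An unescaped dot matches any character; an escaped dot matches only ".".
$loose = preg_match('/^3.14$/', '3x14');    // 1: the dot accepts "x"
$strict = preg_match('/^3\.14$/', '3x14');  // 0: "x" is not a dot
$literal = preg_match('/^3\.14$/', '3.14'); // 1

// The regex \\ matches one backslash. In the PHP source each of those
// backslashes is itself written as \\, which gives four in a row.
$backslash = preg_match('/\\\\/', 'C:\\temp'); // subject is C:\temp
```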


Counted Expressions


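Counted expressions look like this: {min,max}. Write them without a space after the comma, because older PCRE versions treat {2, 4} as literal text. For example, [0-9]{2,4} matches two to four digits, {3} means exactly three, and {2,} means two or more. Be warned, though: if you are also using the {...} delimiters, escaping a brace with a backslash (\{ or \}) turns it into a literal brace rather than part of a counted expression. For patterns that use counted expressions, it is simpler to choose a different delimiter, such as /.../ or !...!.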
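Here is a minimal sketch of counted expressions with the / delimiter. The digit strings are arbitrary.

```php
<?php
$pattern = '/^[0-9]{2,4}$/'; // between two and four digits

$one = preg_match($pattern, '7');       // 0: too short
$three = preg_match($pattern, '123');   // 1
$five = preg_match($pattern, '12345');  // 0: too long

$exact = preg_match('/^[0-9]{3}$/', '123');     // 1: exactly three digits
$open = preg_match('/^[0-9]{2,}$/', '123456');  // 1: two or more digits
```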


POSIX-styled character classes


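Use [[:alpha:]] to match an alphabetic character. And there’s more: [[:digit:]] matches a digit, [[:alnum:]] a letter or digit, [[:upper:]] and [[:lower:]] an uppercase or lowercase letter, [[:space:]] whitespace, and [[:punct:]] punctuation.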
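A few of these classes in action, as a rough sketch with made-up strings:

```php
<?php
$alpha = preg_match('/^[[:alpha:]]+$/', 'Hello');   // 1: letters only
$mixed = preg_match('/^[[:alpha:]]+$/', 'Hello1');  // 0: "1" is not a letter
$digit = preg_match('/^[[:digit:]]+$/', '2024');    // 1
$alnum = preg_match('/^[[:alnum:]]+$/', 'Hello1');  // 1: letters and digits
$space = preg_match('/[[:space:]]/', 'two words');  // 1: contains a space
```

Note the double brackets: the POSIX name goes inside an ordinary character class.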


Capturing


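Each subpattern captures the text it matched. In a replacement string, \1 or $1 refers to the first subpattern, \2 or $2 to the second, and so on.

Notice: The $1 and \1 replacement references are only for the function preg_replace(). With preg_match(), the captured text is placed in the array you pass as the third argument.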
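As a small sketch, the following swaps two words using captured subpatterns. The name is a hypothetical input.

```php
<?php
// Hypothetical input: turn "first last" into "last, first".
$name = 'Ada Lovelace';
$pattern = '/([a-zA-Z]+) ([a-zA-Z]+)/';

// $1 is the first subpattern, $2 the second.
$swapped = preg_replace($pattern, '$2, $1', $name);
echo $swapped . "\n"; // Lovelace, Ada

// preg_match() exposes the same captures through its third argument:
// $matches[0] is the full match, $matches[1] and $matches[2] the subpatterns.
preg_match($pattern, $name, $matches);
```

The replacement is written in single quotes so that PHP does not try to read $2 and $1 as variables.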


Examples


Email Match

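<?php $result = preg_match("/[a-zA-Z0-9\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-\.]+/",  $content); ?>

Notice that we escaped the - and the . characters, because both can have special meanings. Inside a character class, - normally specifies a range; outside one, . is the wildcard. (Inside a character class the . is already literal, so escaping it there is harmless but optional.)

Here is what each part means.

[a-zA-Z0-9\-]+@ matches the characters before the @, followed by the @ itself. The \- part matches a literal dash.

[a-zA-Z0-9\-]+\. matches the first part of the domain, up to and including the first dot.

[a-zA-Z0-9\-\.]+ matches the rest of the domain, with more dots if needed (subdomains). The + at the end matters: without it, only one character after the dot would be matched.

This is a simplified pattern. It does not allow dots or underscores before the @, so for an address like first.last@example.com it only picks up last@example.com.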

URL Match

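<?php
// preg_match_all() collects every match; preg_match() stops after the first.
$result = preg_match_all('{https?://[a-zA-Z0-9/\.%#@?=&_\-]+}i', $content, $array);
// $array[0] holds the full text of each match.
foreach ($array[0] as $val) {
  echo $val."<br>";
}
?>

This example is more complicated. We match a URL made of common URL characters, using {...} as the delimiters so that the slashes in :// need no escaping. The i after the closing delimiter makes the match case-insensitive. preg_match_all() puts every match into $array[0] (preg_match() would stop after the first one), and the foreach loop prints each of them.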


Functions


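preg_match($pattern, $test, $array) - Returns 1 if $pattern matches $test and 0 if it does not. $array is filled with the first match: $array[0] is the text that matched the whole pattern, and $array[1], $array[2], and so on hold the text matched by each subpattern. Use count($array) (sizeof() is an alias) to get the number of entries.

preg_split($pattern, $string) - Splits a string into an array, using the parts that match $pattern as the separators.

preg_match_all($pattern, $test, $array) - Finds every match instead of stopping at the first. Returns how many times the pattern was matched and fills $array with the matches.

preg_replace($pattern, $replacement, $search) - Replaces every part of $search that matches $pattern with $replacement. Returns the new string.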
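The last three functions can be sketched on one made-up string. The variable names here are illustrative only.

```php
<?php
$text = 'one, two,three';

// preg_split(): split on a comma followed by any number of spaces.
$parts = preg_split('/, */', $text); // ['one', 'two', 'three']

// preg_match_all(): count the words starting with "t" and collect them.
$count = preg_match_all('/t[a-z]+/', $text, $found); // 2
// $found[0] is ['two', 'three']

// preg_replace(): replace each of those words with "T".
$replaced = preg_replace('/t[a-z]+/', 'T', $text); // 'one, T,T'
```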

There is more, but I think I’ll stop now. Thanks for reading about regular expressions!

Oh, REGEX guide!

Note that PHP's preg_* functions use Perl-compatible (PCRE) regex syntax, which lots of other languages use as well, so while the regex functions may be different, the regex syntax is almost exactly the same.

Just as a note, to avoid confusing people: the return value effectively means true or false, with 1 being true and 0 being false.

Good point, I just added a note on that

Plus empty values such as NULL.

Yes