Jonathan Hilgeman

Everything complex is made up of simpler things.

Use filter_var to Strip Out Non-Digit Characters

Here’s a fun tidbit – how many times have you used a simple regular expression like this:

$x = preg_replace("/[^0-9]/","", $y);

…to quickly strip out all non-digit characters in a string? I’ve done it probably a hundred times over the years, but I ran across a script today that was packed FULL of code that used regular expressions where they didn’t need to be used, and it was taking about 100 milliseconds to run on each loop (and there were thousands of loops).

I know the character-by-character parsing in C# is extremely fast, so I first tried to do the same thing with a user function in PHP like this:

function stripNonDigits($str)
{
  $new = "";
  $len = strlen($str);
  for($i = 0; $i < $len; $i++)
  {
    $c = $str[$i];
    if(($c >= '0') && ($c <= '9'))
    {
      $new .= $c;
    }
  }
  return $new;
}

I assumed that wasn’t going to work all that well since it wasn’t compiled code, and I was right. The overhead of calling a user function alone took nearly as long as the regex did, and once I added in the character comparison and string concatenation, it was all over. It was way slower than regex (over 2x as slow)!

I then figured that PHP had to have a built-in way to do this, and I remembered that filter_var has a sanitizing filter for integers, FILTER_SANITIZE_NUMBER_INT. It keeps + and - signs along with the digits, though, so I set up another test where the filtering line was just:

$signs = array("-","+");
$x = str_replace($signs,"",filter_var($y, FILTER_SANITIZE_NUMBER_INT));
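Before swapping one approach for the other, it’s worth checking that they agree on the kind of input you actually have. Here is a minimal sketch of that check, assuming plain string input. The helper names digitsViaRegex and digitsViaFilter and the sample strings are made up for this illustration:

```php
<?php
// Hypothetical helper: the original regex approach.
function digitsViaRegex(string $y): string
{
    return preg_replace('/[^0-9]/', '', $y);
}

// Hypothetical helper: FILTER_SANITIZE_NUMBER_INT keeps digits plus '+' and '-',
// so the signs are removed afterwards with str_replace.
function digitsViaFilter(string $y): string
{
    return str_replace(['-', '+'], '', filter_var($y, FILTER_SANITIZE_NUMBER_INT));
}

// Assumed sample inputs, chosen only for illustration.
$samples = ['(555) 123-4567', '+1-800-555-0199', 'abc', '', '3.14 and -42'];
foreach ($samples as $y) {
    // Dumps true when both helpers return the same digit-only string.
    var_dump(digitsViaRegex($y) === digitsViaFilter($y));
}
```

If any sample dumps false, the two methods disagree on that input and the swap needs a closer look.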

The result was blisteringly fast compared to preg_replace. I ran the same code against a variety of data sources – from 64-byte strings to 256 KB strings – and filter_var() consistently outperformed all the other methods. The only way I can think of to get better performance is to build a custom PHP extension that strips out the plus and minus signs in the same pass, but this is about as good as we can get using standard PHP functionality.
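If you want to measure this on your own data, a rough timing harness along these lines should do. This is only a sketch and not the exact test script used above: the timeIt helper, the 64-byte sample string, and the iteration count are all assumptions, and the numbers will vary with PHP version and hardware.

```php
<?php
// Hypothetical timing helper: calls $fn on $input $iterations times
// and returns the elapsed wall-clock time in seconds.
function timeIt(callable $fn, string $input, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

// Assumed test data: a 64-byte string mixing digits, letters, and signs.
$input = str_repeat('ab12-+34', 8);

// The two approaches being compared.
$regex = function (string $y) {
    return preg_replace('/[^0-9]/', '', $y);
};
$filter = function (string $y) {
    return str_replace(['-', '+'], '', filter_var($y, FILTER_SANITIZE_NUMBER_INT));
};

// Print elapsed seconds for each approach over the same number of calls.
printf("preg_replace: %.4f s\n", timeIt($regex, $input, 100000));
printf("filter_var:   %.4f s\n", timeIt($filter, $input, 100000));
```

Swap in longer strings (for example, a larger str_repeat count) to see how the gap changes with input size.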

So next time you reach for regex, check out filter_var() first!
