HOW TO: Search/replace URLs with WP-CLI and Regex

July 1, 2019

Often, when pushing a WordPress site from a local or staging environment to a live site, you'll need to search/replace the domain throughout the database. In most cases, you could use a plugin like Better Search Replace, or if you're familiar with WP-CLI, you would use this command in your terminal:

wp search-replace "developmentdomain.com" "www.livedomain.com" --precise --all-tables --skip-columns=guid

Easy peasy. There's a breakdown of the 3 flags used above (plus one more useful one) at the bottom in case you're interested.

A trickier scenario

There are times when you need to be a bit more specific and use capture groups (with regex). There's a great plugin called WP Offload Media (formerly Offload to S3) that's available from the geniuses over at Delicious Brains. The plugin offloads media files from WordPress to an S3 bucket so your server doesn't get all stuffed up with static data (images/documents/videos/etc.) and automatically rewrites the URLs to match the S3 bucket location.

I ran into a case recently where a site administrator needed to temporarily disable the plugin for a specific task but still serve the content from their own server. They needed to pull that data back from the S3 bucket and rewrite the URLs to match the local path. The premium version of the plugin has a feature built in to do just that.

Without going into detail, the problem was that the URLs had propagated through other database tables that couldn't be caught by a normal search/replace due to this particular site's setup, so a more manual replacement method was needed.

Regex search-replace with CLI

Here's an example of the paths looked like:

ndflme7com.cloudfront.net/files/28/04/ndstrctblptme/picture.jpg

...and we want them to look like:

403page.com/wp-content/uploads/28/04/picture.jpg

Breaking it up

In this example, we're going to break the URL structure into 4 distinct parts:

ndflme7com.cloudfront.net/files/28/04/ndstrctblptme/picture.jpg

We need to search and replace the domain part (ndflme7com.cloudfront.net/files/) leaving the date/month intact (28/04/), then replace the next directory path (ndstrctblptme/) while retaining the filename (picture.jpg).

We'll do this in two stages.

First pass

First, we'll start with doing just the domain and files part:

wp search-replace "dndflme7com.cloudfront.net/files/" "403page.com/wp-content/uploads/" --precise --all-tables --skip-columns=guid

Now our media URLs should look something like:

403page.com/wp-content/uploads/28/04/ndstrctblptme/picture.jpg

So far so good.

Second pass (with regex)

Next we need to drop that ndstrctblptme/ completely - along with any other randomly generated directory names in that position, while retaining the filename after it.

wp search-replace "403page.com/wp-content/uploads/(\d)/(\d)/(\d*)/" "403page.com/wp-content/uploads/\$1/\$2/" --precise --all-tables --skip-columns=guid --regex

I've used (\d)/(\d)/ to match and "capture" WordPress' default directory structure for media so they can be called back in the replacement with /\$1/\$2/. You can see I used (\d*)/ to capture the random directory name, ndstrctblptme/, and don't call it back in the replacement. So now the URLs look like this:

403page.com/wp-content/uploads/28/04/picture.jpg

Optional - Third pass (multisite)

If you're working with a multisite, you need to replace the paths for subsites too, as they all have their own additional directory:

wp search-replace "403page.com/wp-content/uploads/sites/(\d)/(\d)/(\d)/(\d)/" "403page.com/wp-content/uploads/sites/\$1/\$2/\$3/" --precise --all-tables --skip-columns=guid --regex

And you're all done!

Flags Explained

  • --precise = Force the use of PHP (instead of SQL) which is more thorough, but slower.

  • --all-tables = Enable replacement on ALL tables in the database, regardless of the prefix, and even if not registered on $wpdb. Overrides –network and –all-tables-with-prefix.

  • --skip-columns=guid = The term “GUID” stands for “Globally Unique Identifier”. The GUID field is primarily used to create the WordPress feeds. When a feed-reader is reading feeds, it uses the contents of the GUID field to know whether or not it has displayed a particular item before. It does this in one of various ways, but the most common method is simply to store a list of GUID’s that it has already displayed and “marked as read” or similar.  However, the second part of that is that the GUID must never change. Even if you shift domains around, the post is still the same post, even in a new location. Feed readers being shifted to your new feeds when you change URLs should still know that they’ve read some of your posts before, and thus the GUID must remain unchanged. Read more information here.

  • --regex = Runs the search using a regular expression (without delimiters). Warning: search-replace will take about 15-20x longer when using –regex.