Howto mod_rewrite with Apache


Howto mod_rewrite with Apache

About mod_rewrite for Apache

This module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested URLs on the fly. It supports an unlimited number of rules and an unlimited number of attached rule conditions for each rule to provide a really flexible and powerful URL manipulation mechanism. The URL manipulations can depend on various tests, for instance server variables, environment variables, HTTP headers, time stamps and even external database lookups in various formats can be used to achieve a really granular URL matching.

This module operates on the full URLs (including the path-info part) both in per-server context (httpd.conf) and per-directory context (.htaccess) and can even generate query-string parts on result. The rewritten result can lead to internal sub-processing, external request redirection or even to an internal proxy throughput.

But all this functionality and flexibility has its drawback: complexity. So don’t expect to understand this entire module in just one day.

Using Apache’s mod_rewrite

The Apache module mod_rewrite can be used to perform various forms of URI acrobatic manipulation. A prerequisite concept before attempting to understand mod_rewrite are regular expressions.

When a URL is requested by a server, this does not necessarily map directly to the server’s filesystem. This request can be twisted and turned to (hopefully) present more sense to the browsing user.

Why should I care?

A clean URL is part of a good user experience. It works as a breadcrumb trail – allowing the user to see where they are located in the site, it doesn’t break in bookmarks, you can easily send it over email, and allows users to guess where they want to go next. Most importantly a easy to read URL will be indexed by search engines such as Google. Having all your pages indexed creates a huge advantage to getting visitors to your website. Many search engine robots cannot read a URL with symbols such as ? & or commas and therefor not indexing your website.

This is all possible by using human-readable URLs:

“In principle, users should not need to know about URLs which are a machine-level addressing scheme. In practice, users often go to websites or individual pages through mechanisms that involve exposure to raw URLs.” — Jakob Neilsen, Jakob Nielsen’s Alertbox, March 21, 1999: URL as UI

“a URL should contain human-readable directory and file names that reflect the nature of the information space.” — Jakob Nielsen, item #4 Top Ten Mistakes in Web Design

By choosing a well thought-out URL, you won’t have to change it during the next re-organization. URLs that remain the same tend to pick up more links over time.

Getting started:

First, Apache must be compiled with the mod_rewrite module for any of this to take place. Insert these lines into the vhost definition for the domain that you want to work with.

RewriteEngine on
RewriteLog /path/to/logs/server.rewrite.txt
RewriteLogLevel 1

The first line turns the RewriteEngine on. Otherwise, extra code doesn’t get processed by the Apache webserver. Next, we specify where the logfile that records the rewrite activity should be placed. This is mostly for debugging, as your CustomLog should be keeping track of traffic.

A beginning example:

One of the simplest uses of mod_rewrite is to re-direct a web request from one page to another. Many times this will be done if the first has expired, was spelled wrong, or the site has a new naming scheme. It’s nice to forward new users to the correct page in case they have the previous one bookmarked, or if a search engine has cached the old location.

RewriteRule ^/biogarphy.php3 /biography/ [R=301]

This forwards a browser request from one page to the other. because the [R=301] at the end. I’ve taken a file that was spelled wrong, and fixed it at the same time removing a an old filetype suffix. (php4 has replaced that suffix with .php) What if I were to dump php from my system, and go with *.html, *.jsp, or even *.willie? By rewriting my URI to look like a directory, it doesn’t matter what filetype I’m using, nor what my DirectoryIndex options are.

Compound Example:

What if you used the above example, but didn’t decide to create a “biography” directory at your Doc-root? Apache can still be told where the content resides by including another RewriteRule following the first. Rules will continue attempting to match until a “last” case is presented with the [L] modifier at the end. This is much like a switch programming structure, using break to prevent each option from being executed.

RewriteRule ^/biography/ /biogarphy.php3 [L]

This might seem a little redundant, since we just did the opposite. This line will tell requests to “biography” to read the content from /biogarphy.php3 instead of looking for a biography directory. Confusing? Well, I could do this instead:

RewriteRule ^/(.+)/?$ /content/$1.php [L]

I can search on anything that follows the beginning slash, and replace the file request to look for that file in the content directory through the use of the regular expression and the backreference.

I’ve also placed this inside another directory that I don’t necessarily want the browsing user to see, or know about, but it’s easier for the webmaster to keep track of the roles of each file on the site. Since I’ve upgraded from php3 to php4 the suffix has changed.

More Advanced – the Query String:

The query string is passed in separately from the URL. This means that a simple regex doesn’t necessarily do the trick, but a compound statement using RewriteCond (condition) is required.

RewriteCond %{QUERY_STRING} id=([^&;]*)
RewriteRule ^/$ http://%{SERVER_NAME}/%1/? [R]
RewriteRule ^/([^/]*)/?$ /index.php?id=$1 [L]

The RewriteCondition matches only when the following condition is true, and continues until a “last” [L] is stated. The Condition’s backreferences are different, using the % prefix, and their scope lasts beyond the Condition line.

This above example would translate “/?id=home” into “/home/”, and then re-assign the value of “home” to the id HTTP_GET_VAR. One more thing to notice here is that the the second line has a trailing ? – this is used to negate copying of the query string into the new, re-directed URI.

More Reference Links: