Clean URL with .htaccess

This script redirects all URLs of the form
http://www.example.com/projects.html?i=#
or
http://www.example.com/projects.php?i=#
to a clean URL of the form
https://example.com/projects/


If you do not care to understand what you’re doing here, follow the Summary.
Summary: Create a file named .htaccess in the root directory of your site (the htdocs folder). Copy and paste the code at the bottom of this post into that file. If that file already exists in your root directory, paste the code at the bottom of the file.


Note: I do not know if this works for WordPress - I do not use WordPress. Please look up a tutorial specific to your tool. If you find that this script works for your tool, please share so in the comments :yellow_heart:

Why should you want a clean URL?

The Duplicate Content Problem (SEO)

In its default state, your website could be serving as many as 8 versions of a single page, according to Daniel Morell at www.danielmorell.com. In the example he provides, he states that if you have SSL encryption installed, your website could exist under the following valid URLs:

  1. http://example.com/blog
  2. http://example.com/blog/
  3. http://www.example.com/blog
  4. http://www.example.com/blog/
  5. https://example.com/blog
  6. https://example.com/blog/
  7. https://www.example.com/blog
  8. https://www.example.com/blog/

Typically, search engines detect this and choose which version they prefer; but this is not always the case. John Mueller clarified what counts as duplicate content for Google in this tweet.

When a search engine finds duplicate pages, it chooses the one it thinks is best and ignores the rest - but in doing this, it also lowers your PageRank. Your site’s PageRank is a measure of how likely your site is to appear in search results. This matters because a lowered PageRank tells the search engine that your site is less likely to be trustworthy and helpful to users, so it won’t promote your content as much in search results.

Furthermore, the URL an engine chooses is often not consistent with other search engines and is sometimes not the URL you’d prefer.


The Importance of Brevity

A clean URL is much more readable. Jonathan Hochman at hochmanconsultants.com states that when “short and meaningful, they are more pleasing to the eye and more likely to be shared on Twitter, Facebook, or other sites, and via email.” Not only does it make your site more accessible, but it may genuinely increase traffic through word-of-mouth.

When your website appears in a search engine’s results, you want to be as clear as possible as to the purpose and intentions of your site. By promoting clarity, you will draw in as many users as possible who are searching for the content you provide.

Keep in mind, some users of your website may not be very tech-savvy. When navigating a site, they will sometimes use the URL to reach other pages. For example, Ryan Stemcoski at startups.com reports that, “users who do not know the logo is a route to the home page will instead clear the address bar back to the root domain to get back home.”

By providing a clean & readable URL, end users can interact with it much more easily. If you further organize the pages of your site into directories such as example.com/directory/ & example.com/directory/file/, the URL can serve as a constant & useful navigation aid for your clients. When designing websites I tend to ask myself, “How can I be as clear as possible and waste the least amount of time for my users?” and design systems around that.


Given this, it is deeply beneficial to your site for there to exist only one valid, constant, short, and readable URL for each page. By specifying a canonical URL, we can ensure that the only form of URL our end users see is the best one. We can achieve this via redirects from the pages we don’t want to the page we do want.
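As a minimal sketch of a single redirect of this kind (assuming an Apache server with mod_rewrite available - this same rule also appears in the full code at the bottom of this post):

```apache
# Turn on the rewrite engine
RewriteEngine On
# If the request did not arrive over HTTPS...
RewriteCond %{HTTPS} !on
# ...permanently (301) redirect it to the https:// equivalent of the same URL
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```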

How do you make a clean URL?

We can clean our URLs by using a .htaccess file in the root directory of our websites (in the case of InfinityFree, the htdocs folder).

.htaccess File?

This type of file is an Apache server configuration file. It can change the rules of a website’s server on a per-directory basis. This means that if we place a .htaccess file in a subfolder, what’s found in the .htaccess file will only affect the files found in that folder and its subfolders.
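For example (the folder and file names here are hypothetical), a .htaccess placed in a subfolder only governs requests under that folder:

```apache
# /blog/.htaccess - these directives apply only to /blog/ and its subfolders.
RewriteEngine On
# In a per-directory .htaccess, patterns match the path relative to this
# folder, so this rewrites /blog/archive/<anything> to /blog/posts/<anything>
RewriteRule ^archive/(.*)$ posts/$1 [L]
```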

I will note that this is not very efficient if you have access to your server config files; but as InfinityFree is not a VPS (Virtual Private Server), we do not, and this is the only solution I could find. There may be another solution via server-side scripting such as PHP, but I have not discovered a way to do it.


This file has many capabilities; there are a plethora of copy-paste solutions online for the various things it can accomplish. To rewrite URLs, Apache provides the mod_rewrite module. Sadly, there is very little instruction on how to actually read and write mod_rewrite code, but we do have the mod_rewrite documentation available to us.

Coding Rewrite Rules

The directives typically used to rewrite URLs are RewriteCond and RewriteRule. Using “backreferences,” they can construct a proper URL given a trigger condition and a rule to apply.
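As a small, hypothetical illustration of those pieces (the path names are made up): $1 refers back to a group captured by the RewriteRule’s own pattern, while %1 refers to a group captured by the preceding RewriteCond:

```apache
RewriteEngine On
# Condition: only act on example.com, with or without www.;
# the (www\.)? group would be available to the rule below as %1
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com$ [NC]
# Rule: send /old-name/<rest> to /new-name/<rest>;
# $1 is a backreference to (.*) in this rule's own pattern
RewriteRule ^old-name/(.*)$ /new-name/$1 [R=301,L]
```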

You can learn to read and write this code; but really, it’s not worth the time. Other people online have solved the problems for us - I have compiled a collection of those solutions that work with InfinityFree below. If you wish to work through this anyway, the best I can do for you is to refer you to the mod_rewrite introduction, the mod_rewrite documentation, the ap_expr documentation, and the Apache glossary. As you develop your understanding of it, try to read the given code below and edit it to your liking.

If you really do something crazy (such as combining it all into 1 redirect), share it in the comments or in #community-guides; a single-redirect edit would be pretty cool. After sifting through the tool myself, I think it may be possible; I just haven’t found any examples of such a thing online. I imagine it would consist of a long list of RewriteConds to detect each of the problems while gathering all the backreferences necessary for a single URL, followed by a single RewriteRule that constructs the URL from those backreferences; it would probably look something like:

RewriteCond #something to detect
RewriteCond #something to detect
RewriteCond #something to detect
# https://<remove www if it's there>hostname/directorypath/
RewriteRule (.*) https://%{HTTP_HOST}/.../

Best of luck :yellow_heart:


You don’t need to understand this code to edit it.

Editing Rewrite Rules

The order of elements is important to minimize the number of redirects; but if you want to change rules, you can do so just by inserting or removing bits of code. You can find much of this code online.

For example, if you want to force include www. instead, you can remove:

# Remove www. Prefix
RewriteCond %{HTTP_HOST} ^www\.
RewriteCond %{HTTPS}s ^on(s)|off
RewriteCond http%1://%{HTTP_HOST} ^(https?://)(www\.)?(.+)$
RewriteRule ^ %1%3%{REQUEST_URI} [R=301,L]

… and replace it with:

# Force www. Prefix
RewriteCond %{HTTP_HOST} !^$
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTPS}s ^on(s)|
RewriteRule ^ http%1://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

THE CODE IS BELOW:

.htaccess Code

Note: This enforces encryption; if you don’t have SSL encryption on your site, please set it up - if you plan to use your website professionally, you really need it. Alternatively, remove the portion of code that follows # Force https:// Prefix.

# Turn on Rewrite Engine
RewriteEngine On
# Deny URLs Not Composed of “a-zA-Z0-9.+_/-?=&” characters
RewriteCond %{REQUEST_URI} !^/(wp-login.php|wp-admin/|wp-content/plugins/|wp-includes/).* [NC]
RewriteCond %{THE_REQUEST} !^[A-Z]{3,9}\ [a-zA-Z0-9\.\+_/\-\?\=\&]+\ HTTP/ [NC]
RewriteRule .? - [F,NS,L]
# Force / Suffix
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)([^/])$ https://%{HTTP_HOST}/$1$2/ [L,R=301]
# Remove ?i=1
RewriteCond %{QUERY_STRING} ^(.*)i=[^&]+(.*)$ [NC]
RewriteRule ^(.*)$ /$1?%1%2 [R=301,L]
# Forward Documents to .html Internally
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.*?)/?$ $1.html [L]
# Forward Documents to .php Internally
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.*?)/?$ $1.php [L]
# Force https:// Prefix
RewriteCond %{HTTPS} !on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
<IfModule mod_headers.c>
    Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
</IfModule>
# Remove www. Prefix
RewriteCond %{HTTP_HOST} ^www\.
RewriteCond %{HTTPS}s ^on(s)|off
RewriteCond http%1://%{HTTP_HOST} ^(https?://)(www\.)?(.+)$
RewriteRule ^ %1%3%{REQUEST_URI} [R=301,L]
# Remove .html Suffix
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.html [NC]
RewriteRule ^ %1/ [R=301,L]
# Remove .php Suffix
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s([^.]+)\.php [NC]
RewriteRule ^ %1/ [R=301,L]
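As one example of editing these rules to your liking (an untested sketch, so try it before relying on it): if you want to strip the entire query string rather than only the i parameter, the # Remove ?i=1 block above could be replaced with:

```apache
# Remove any query string
RewriteCond %{QUERY_STRING} .
# The trailing "?" in the substitution clears the query string entirely
RewriteRule ^(.*)$ /$1? [R=301,L]
```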

Are all these redirects bad?

When you enter http://www.example.com/projects.php?i=# into the address bar, it doesn’t just redirect once; it redirects 5 times, once for each RewriteRule that’s triggered. You can see this when I use Google Chrome’s Developer Tools (Ctrl+Shift+I) to monitor the network as my website loads.

At first, seeing this is concerning. That sounds like too much, right?

I don't think so.

The arguments I have heard against chained page redirects in this context are as follows.

1.) “Each redirect increases server load and loading times.”

The goal of acquiring a clean URL is to prevent duplicate URLs being found online. This means that, as long as your site hasn’t been shared much before you set up clean URLs, you shouldn’t really encounter any links that require redirects.

That is to say, the real purpose of doing this is not necessarily for a user, it’s more of a way to teach a search engine what you want to show users - redirects aren’t indexed, they’re never seen in the URL bar, and shouldn’t ever really be found or shared online. Therefore, the server should rarely encounter them and users should never encounter them.

2.) “Search engines lower PageRank for each redirect.”

Search engines used to lower PageRank for each redirect; but in 2016 Gary Illyes reported in this tweet that 301, 302, & 307 redirects don’t do that anymore.

3.) “Google doesn’t follow more than five redirects deep.”

As explained at the beginning of this guide, there can exist 8 versions of your website without these redirects (see The Duplicate Content Problem (SEO)). One of these versions is the final redirect destination, the other 7 are all redirected away from.

When a search engine tests all 8 of these, it’s going to recognize the one that’s correct as being the only valid result. Therefore, it reaches your page just fine, probably without any redirects. At the most, I can imagine it might have to follow 2 redirects to remove the use of .php / .html and to append the trailing /.


In conclusion, 301 redirects do not harm SEO performance or reduce the PageRank metrics associated with a page URL. I can’t find much of a reason that needing more than 1 redirect to reach a solid URL with .htaccess is a bad thing.
