Wednesday, April 8, 2009

.htaccess - Part 2

Well, first another boring bit! To prevent people from being able to see the contents of your .htaccess file, you need to place the following code in the file:


order allow,deny
deny from all

Be sure to format that just as it is above, with each line on a new line as shown. There is every likelihood that your existing .htaccess file, if you have one, includes those lines already.

Magic Trick No. 1: Redirect to Files or Directories

You have just finished a major overhaul on your site, which unfortunately meant you have renamed many pages that have already been indexed by search engines, and quite possibly linked to or bookmarked by users. You could use a redirect meta tag in the head of the old pages to bring users to the new ones, but some search engines may not follow the redirect and others frown upon it.

.htaccess leaps to the rescue!

Enter this line in your .htaccess file:

Redirect permanent /oldfile.html http://www.domain.com/filename.html

You can repeat that line for each file you need to redirect. Remember to include the directory name if the file is in a directory other than the root directory:

Redirect permanent /olddirectory/oldfile.html http://www.domain.com/newdirectory/newfile.html

If you have just renamed a directory you can use just the directory name:

Redirect permanent /olddirectory http://www.domain.com/newdirectory

(Note: The above commands should each be on a single line, they may be wrapping here but make sure they are on a single line when you copy them into your file.)

This has the added advantage of preventing the increasing problem on the Internet, as people change their sites, of 'link rot'. Now people who have linked to pages on your site will still have functioning links, even if the pages have changed location.

Magic Trick No. 2: Change the Default Directory Page

In most cases the default directory page is index.htm or index.html. Many servers allow a range of pages called index, with a variety of extensions, to be the default page.

Suppose though (for reasons of your own) you wish a page called honeybee.html or margarine.html to be a directory home page?

No problem. Just put the following line in your .htaccess file for that directory:

DirectoryIndex honeybee.html

You can also use this command to specify alternatives. If the first filename listed does not exist the server will look for the next and so on. So you might have:

DirectoryIndex index.html index.htm honeybee.html margarine.html

(Again, the above should all be on a single line)

Magic Trick No. 3: Allow/Prevent Directory Browsing

Most servers are configured so that directory browsing is not allowed, that is if people enter the URL to a directory that does not contain an index file they will not see the contents of the directory but will instead get an error message. If your site is not configured this way you can prevent directory browsing by adding this simple line to your .htaccess file:

IndexIgnore */*

But there may be times when you want to allow browsing, perhaps to allow access to files for downloading or for whatever reason, on a server configured not to allow it. You can override the servers settings with this line:

Options +Indexes

Easy!

Magic Trick No. 4: Allow SSI in .html files

Most servers will only parse files ending in .shtml for Server Side Includes. You may not wish to use this extension, or you may wish to retain the .htm or .html extension used by files prior to your changing the site and using SSI for the first time.

Add the following to your .htaccess file:

AddType text/html .html
AddHandler server-parsed .html
AddHandler server-parsed .htm

You can add both extensions or just one.

Remember though that files which must be parsed by the server before being displayed will load more slowly that standard pages. If you change things as above, the server will parse all .html and .htm pages, even those that do not contain any includes. This can significantly, and unnecessarily, slow down the loading of pages without includes.

Magic Trick No 5: Keep Unwanted Users Out

You can ban users by IP address or even ban an entire range of IP addresses. This is pretty drastic action, but if you don't want them, it can be done very easily.

Add the following lines:

order allow,deny
deny from 123.456.78.90
deny from 123.456.78
deny from .aol.com
allow from all

The second line bans the IP address 123.456.78.90, the third line bans everyone in the range 123.456.78.1 to 123.456.78.999 and so is much more drastic. The fourth line bans everyone from AOL. A somewhat excessive display of power perhaps!

One thing to bear in mind here it that banned users will get a 403 error - "You do not have permission to access this site", which is fine unless you have configured a custom error for this page which in fact appears to let them in. So bear that in mind and if you are banning users for whatever reason make sure your 403 error message is a dead end.

Magic Trick No. 6: Prevent Linking to Your Images

The greatest and most irritating bandwidth leech is having someone link to images on your site. You can foil such thieves very easily with .htaccess. Copy the following into your .htaccess file:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domain.com/.*$ [NC]
RewriteRule \.(gif|jpg)$ - [F]

You don't need to understand any of that! Just change 'domain.com' to the name of your domain.

(Again each command should be on a single line. There are 4 lines above, each starting with 'Rewrite')

If you want to really let them know they have been rumbled why not make an image like the one below (or take this one if you like)

call it stealing.gif, save it to your images file and add the following line after the code above:

RewriteRule \.(gif|jpg)$ http://www.domainname.com/images/stealing.gif [R,L]

(The above command should be on a single line)

Magic Trick No 7: Stop the Email Collectors

While you positively want to encourage robot visitors from the search engines, there are other less benevolent robots you would prefer stayed away. Chief among these are those nasty 'bots that crawl around the web sucking email addresses from web pages and adding them to spam mail lists.

RewriteCond %{HTTP_USER_AGENT} Wget [OR]
RewriteCond %{HTTP_USER_AGENT} CherryPickerSE [OR]
RewriteCond %{HTTP_USER_AGENT} CherryPickerElite [OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro
RewriteRule ^.*$ X.html [L]

Note that at the end of each line for a named robot there appears an '[OR]' - don't forget to include that if you add any others to this list.

This is by no means foolproof. Many of these sniffers do not identify themselves and it is almost impossible to create an exhaustive list of those that do. It's worth a try though if it even keeps some away. The above as as many as I could find.

....and Finally

There is one very important area of the .htaccess file's use that we have not really mentioned and that is its use for user authentication. It is perfectly possible to configure your .htaccess files by hand to control access to directories on your site, but this is rarely necessary.

In most cases your host will provide a method to allow you to much more easily configure the file from your hosting control panel and there are a myriad of Perl scripts that will allow you to set up full user management systems by harnessing the power of .htaccess.

No comments:

Post a Comment