Wednesday, April 8, 2009

.htaccess - Part 2

Well, first another boring bit! To prevent people from being able to see the contents of your .htaccess file, you need to place the following code in the file:


<Files .htaccess>
order allow,deny
deny from all
</Files>

Be sure to format that just as it is above, with each line on a new line as shown. There is every likelihood that your existing .htaccess file, if you have one, includes those lines already.

Magic Trick No. 1: Redirect to Files or Directories

You have just finished a major overhaul on your site, which unfortunately meant you have renamed many pages that have already been indexed by search engines, and quite possibly linked to or bookmarked by users. You could use a redirect meta tag in the head of the old pages to bring users to the new ones, but some search engines may not follow the redirect and others frown upon it.

.htaccess leaps to the rescue!

Enter this line in your .htaccess file:

Redirect permanent /oldfile.html http://www.domain.com/filename.html

You can repeat that line for each file you need to redirect. Remember to include the directory name if the file is in a directory other than the root directory:

Redirect permanent /olddirectory/oldfile.html http://www.domain.com/newdirectory/newfile.html

If you have just renamed a directory you can use just the directory name:

Redirect permanent /olddirectory http://www.domain.com/newdirectory

(Note: The above commands should each be on a single line, they may be wrapping here but make sure they are on a single line when you copy them into your file.)

This has the added advantage of fighting 'link rot', an increasing problem on the Internet as people change their sites. Now people who have linked to pages on your site will still have functioning links, even if the pages have changed location.

Magic Trick No. 2: Change the Default Directory Page

In most cases the default directory page is index.htm or index.html. Many servers allow a range of pages called index, with a variety of extensions, to be the default page.

Suppose though (for reasons of your own) you wish a page called honeybee.html or margarine.html to be a directory home page?

No problem. Just put the following line in your .htaccess file for that directory:

DirectoryIndex honeybee.html

You can also use this command to specify alternatives. If the first filename listed does not exist the server will look for the next and so on. So you might have:

DirectoryIndex index.html index.htm honeybee.html margarine.html

(Again, the above should all be on a single line)

Magic Trick No. 3: Allow/Prevent Directory Browsing

Most servers are configured so that directory browsing is not allowed: that is, if people enter the URL of a directory that does not contain an index file, they will not see the contents of the directory but will instead get an error message. If your site is not configured this way you can prevent directory browsing by adding this simple line to your .htaccess file:

IndexIgnore */*

But there may be times when you want to allow browsing, perhaps to allow access to files for downloading or for whatever reason, on a server configured not to allow it. You can override the server's settings with this line:

Options +Indexes

Easy!
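If you do switch browsing on, you can still keep particular files out of the listing with IndexIgnore. A small sketch - the file patterns here are only examples:

Options +Indexes
IndexIgnore .htaccess *.php *.inc

Visitors will see the directory listing, but any files matching those patterns will not appear in it.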

Magic Trick No. 4: Allow SSI in .html files

Most servers will only parse files ending in .shtml for Server Side Includes. You may not wish to use this extension, or you may wish to retain the .htm or .html extension used by files prior to your changing the site and using SSI for the first time.

Add the following to your .htaccess file:

AddType text/html .html
AddHandler server-parsed .html
AddHandler server-parsed .htm

You can add both extensions or just one.

Remember though that files which must be parsed by the server before being displayed will load more slowly than standard pages. If you change things as above, the server will parse all .html and .htm pages, even those that do not contain any includes. This can significantly, and unnecessarily, slow down the loading of pages without includes.
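If that worries you, one possible middle ground - assuming your server has mod_include enabled and allows the override - is the XBitHack directive, which tells Apache to parse only those .html files whose user-execute bit you have set:

XBitHack on

You would then mark just the pages that actually contain includes, for example with 'chmod u+x page.html', and everything else is served at full speed.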

Magic Trick No 5: Keep Unwanted Users Out

You can ban users by IP address or even ban an entire range of IP addresses. This is pretty drastic action, but if you don't want them, it can be done very easily.

Add the following lines:

order allow,deny
deny from 123.456.78.90
deny from 123.456.78
deny from .aol.com
allow from all

The second line bans the IP address 123.456.78.90, the third line bans everyone in the range 123.456.78.0 to 123.456.78.255 and so is much more drastic. The fourth line bans everyone from AOL. A somewhat excessive display of power perhaps!

One thing to bear in mind here is that banned users will get a 403 error - "You do not have permission to access this site" - which is fine unless you have configured a custom 403 error page that in effect lets them back in. So bear that in mind and if you are banning users for whatever reason make sure your 403 error page is a dead end.
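If you would rather not rely on the server's default message, you can point banned visitors at a plain dead-end page of your own with the ErrorDocument directive (the filename here is just an example):

ErrorDocument 403 /forbidden.html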

Magic Trick No. 6: Prevent Linking to Your Images

The greatest and most irritating bandwidth leech is having someone link to images on your site. You can foil such thieves very easily with .htaccess. Copy the following into your .htaccess file:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domain.com/.*$ [NC]
RewriteRule \.(gif|jpg)$ - [F]

You don't need to understand any of that! Just change 'domain.com' to the name of your domain.

(Again each command should be on a single line. There are 4 lines above, each starting with 'Rewrite')

If you want to really let them know they have been rumbled, why not make a small 'hands off my images' graphic,

call it stealing.gif, save it to your images directory and use it in place of the final RewriteRule line of the code above:

RewriteRule \.(gif|jpg)$ http://www.domain.com/images/stealing.gif [R,L]

(The above command should be on a single line)
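One word of caution, offered as a sketch rather than gospel: because stealing.gif is itself a .gif, some configurations can end up redirecting the replacement image too. Adding one extra condition that excludes it should keep things safe - the full block would then look something like this:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domain.com/.*$ [NC]
RewriteCond %{REQUEST_URI} !stealing\.gif$
RewriteRule \.(gif|jpg)$ http://www.domain.com/images/stealing.gif [R,L]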

Magic Trick No 7: Stop the Email Collectors

While you positively want to encourage robot visitors from the search engines, there are other, less benevolent, robots you would prefer stayed away. Chief among these are the nasty 'bots that crawl around the web sucking email addresses from web pages and adding them to spam mailing lists. Add the following lines to your .htaccess file to send them packing:

RewriteCond %{HTTP_USER_AGENT} Wget [OR]
RewriteCond %{HTTP_USER_AGENT} CherryPickerSE [OR]
RewriteCond %{HTTP_USER_AGENT} CherryPickerElite [OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro
RewriteRule ^.*$ X.html [L]

Note that each line for a named robot ends with '[OR]' except the last one - don't forget to keep that pattern if you add any others to this list.

This is by no means foolproof. Many of these sniffers do not identify themselves and it is almost impossible to create an exhaustive list of those that do. It's worth a try though if it even keeps some away. The above are as many as I could find.
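As a hedged variation, if you would rather not serve these agents any page at all, you can swap the final line for one that simply refuses the request with a 403. Remember that RewriteEngine on must appear somewhere above these lines, just as in the image-linking trick:

RewriteRule ^.*$ - [F,L]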

....and Finally

There is one very important area of the .htaccess file's use that we have not really mentioned and that is its use for user authentication. It is perfectly possible to configure your .htaccess files by hand to control access to directories on your site, but this is rarely necessary.

In most cases your host will provide a method to allow you to much more easily configure the file from your hosting control panel and there are a myriad of Perl scripts that will allow you to set up full user management systems by harnessing the power of .htaccess.
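For the curious, a hand-rolled protection block looks something like the sketch below. The realm name and the path to the password file are placeholders you would replace with your own, and the .htpasswd file itself has to be created separately (your control panel, or the htpasswd utility on the server, can do that):

AuthType Basic
AuthName "Members Area"
AuthUserFile /full/server/path/to/.htpasswd
Require valid-user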

.htaccess - Part 1

If your site is hosted on a Unix or Linux server which runs Apache, you may already be familiar with your .htaccess file.

But that is far from the whole story! In this article we will look at some of the other things that this powerful little file can do. In part two we have 7 Magic Tricks that you can perform with .htaccess, but first let's have a look at the file itself.

What is the .htaccess file?

The .htaccess file is a text file which resides in your main site directory and/or in any subdirectory of your main directory. There can be just one, there can be a separate one in each directory or you may find or create one just in a specific directory.

Any commands in a .htaccess file will affect both the directory it is in and any subdirectories of that directory. Thus if you have just one, in your main directory, it will affect your whole site. If you place one in a subdirectory it will affect all the contents of that directory.

Some Important Points

Windows does not use the .htaccess system. I believe there are ways of doing the things .htaccess does on Windows servers but that is a story for another day and I am afraid I will not be telling it - it just isn't as simple or as elegant as the way Apache manages things in my humble opinion! So unless you are on a Linux/Unix server, this article is no good to you. Sorry.

A warning you will commonly see is that changing the .htaccess file on a server that has FrontPage extensions installed will at best not work and at worst make a complete mess of your extensions. I have to say that this has not been my experience and I have done a fair bit of messing with .htaccess files on FrontPage sites, including using .htaccess for authentication. However do any of these things at your own risk - I cannot be responsible for any harm they might cause.

Your host may not support alteration of the .htaccess file; either contact them first and ask before you make changes or proceed with caution and be sure you have a backup of the original file in case of problems.

Oh! And none of the 'Magic Tricks' described in this article are either magic or tricks. They just seem that way!

Working With Your .htaccess File

Sometimes the first problem is finding it! When you FTP to your site the .htaccess file is generally the first one displayed in a directory if it exists.


Some servers are configured to hide files whose names begin with a period. Your FTP client allows you to choose to display these files. In WS_FTP you can do this by entering -la or -al and then clicking Refresh. Other clients may use a different method - check the help files in yours.

Editing should be done in a text editor, such as Notepad. You should not edit .htaccess files in editors such as FrontPage. The best thing to do is download a copy of your .htaccess file to your computer, edit it, and upload it again, remembering to save a copy of the original in case of errors.

If you do not already have a .htaccess file you can create one in Notepad; it is just a simple text file. However when saving it to the server you may need to rename it from .htaccess.txt to just .htaccess. The two are NOT the same. In fact .htaccess is an extension - to a file with no name!

It is very important when entering commands in your file that each is entered on a new line and that the lines do not wrap. If you find that when you paste any of the commands in this article into your file that the lines are not breaking or are wrapping you will need to correct this.

You must upload and download your .htaccess file in ASCII mode, not BINARY.

So, What about the Magic Tricks? Read on!

Cascading Style Sheets (CSS) for Beginners - Part 3

Using CSS for Text Formatting

For the purposes of this part of the tutorial I am just going to assume you have absorbed the basic rules and terminology from the previous part - if you haven't, go back and review it now!

While CSS can control just about every aspect of the style and layout of your pages we are going to concentrate for now on the text only as this is the most useful and easy to apply aspect.

We are all accustomed to using different fonts, in various sizes, colors and combinations, when designing our pages. The difficulty in pages where CSS is not used is that making a small change to, say, the size of Heading 3 or the colour of visited links can mean having to go through each page individually. Where linked style sheets are used one page - the style sheet - is all that needs to be changed.

Exercise

1. Create a new web in FP. When it is open click File>New Page. Choose the 'Style Sheets' tab and then choose 'Normal Style Sheet'.

2. Paste the style sheet below into that page and save the file as 'site.css'.

P {font-family: Arial, Helvetica, sans-serif; font-weight: normal; color: blue;}

H1 {font-family: Verdana, Helvetica, sans-serif; font-weight: bold; color: green;}

H2 {font-family: Verdana, Helvetica, sans-serif; font-weight: bold; color: red;}

H3 {font-family: Verdana, Helvetica, sans-serif; font-weight: bold; color: #CC0099;}

A:link {text-decoration: underline; color: #ff3399;}

A:hover {text-decoration: none; color: #990000;}

A:visited {text-decoration: underline; color: #3399FF;}

3. Now insert the following into the head section of a normal page in an FP site.

<link rel="STYLESHEET" type="text/css" href="site.css">

This is the method used to link an HTML file to a style sheet.

4. In normal view in this document type three headings with short paragraphs below them - any old rubbish will do! Highlight the headings in turn and select 'Heading 1' from the drop down menu for the first, 'Heading 2' for the second etc. Now create a link or two in your paragraphs. Save this page.

5. Pretty colourful eh? Now let's have a look at what is happening. This style sheet defines seven rules: one each for the HTML tags P, H1, H2 and H3, and three for the A (link) tag in its link, hover and visited states.

You can have a look at the stylesheet and the sample page and copy them to practice with if you like.

Each of the rules defines - using selectors and declarations - the font family, font weight and colour in which any text surrounded by those tags should be rendered on the page.

Play with this a bit, changing things in the style sheet to see the effect it has on your page.

Inheritance

You may have noticed that no font family or font weight is specified for any of the rules for A. This is because the links appear within a paragraph, and anything that is specified for the P, or paragraph, tag will apply to them automatically. This is referred to as 'inheritance' and the P tag in this case is referred to as the 'parent tag'.

The only properties that need to be defined in this case are those which either do not appear in the parent tag's rule or which differ from it.
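As a small sketch of inheritance at work, using the rules from the exercise style sheet - the A rule only needs to state what differs from the paragraph it sits inside:

P {font-family: Arial, Helvetica, sans-serif; font-weight: normal; color: blue;}
A:link {text-decoration: underline; color: #ff3399;} /* font-family and font-weight are inherited from P */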

Linked or Embedded?

In the exercise above the style sheet is a linked one. You could just as easily paste exactly the same sheet between <style> tags in the head of a page. However then it would only apply to that page and would need to be individually included in each page to which it applied. It would also need to be changed on each page if you wanted to vary the style of any tag. This means you would miss out on the real power of CSS.

It is best practice to use a linked style sheet for your web, saving embedded ones for variations that you only want included in one page or in a group of pages. Remember the cascade: an embedded sheet will override a linked one.
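As a rough sketch of the two working together, assuming a site.css like the one in the exercise, the head of the one 'different' page might contain both - and the embedded rule wins on that page only:

<link rel="STYLESHEET" type="text/css" href="site.css">
<style type="text/css">
H1 {color: navy;} /* overrides the H1 colour from site.css on this page alone */
</style>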

Properties for Text Style

We are going to look in detail at each of the properties commonly used for text, how they can be applied and at the browser compatibility issues that may affect them. During the rest of this tutorial you might like to experiment with some of these things by making alterations to the CSS file you created earlier and trying them out on the page.

A Note About Browser Support...

As a general rule you can assume that versions of NN prior to version 4 and of IE prior to version 4 have no support for CSS. Opera, from version 3.6, fully supports all these properties. Things are further complicated by the fact that there are different versions of each browser for Windows and Mac and these can sometimes vary in their support of CSS. That said you will see that, provided you take care, all these properties can be applied in a browser friendly way.

OK, let's go through the various properties!

1. font-size

This is probably the property that causes most angst so we will take quite a bit of time with it. You will note that I have not actually included it in any of the rules in the style sheet above. This is because I want you to experiment - add in bits to that style sheet as we go through this section and see how they behave.

Declarations: Font size can be defined in a number of different ways, which broadly divide into three methods.

Fixed Measurements - These could be:
- point size (pt)
- pixels (px)
- millimeters, centimeters or inches (mm, cm, in)

eg

h1 {font-size: 15px;}
h2 {font-size: 12pt;}

Using a fixed size has the advantage that you are reasonably sure how your text will appear to the end user. Probably the best choice from among these is pixels. Inches and centimeters etc are not at all reliable and should be avoided.

All suffer the same problems - there is varying cross-browser support and, more importantly, they take away the user's choice in how text is rendered. We will see later how other methods can give the user increased choice.

Exercise: Try adding 'font-size: 12pt' to the p rule in the sample style sheet to see its effect. You can also add various sizes to each of the heading rules.


Words - xx-small, x-small, small, medium, large, x-large, xx-large.

These are very dodgy and best avoided altogether! In most versions of NN anything below small is unreadable. In any case they are a fairly blunt tool.

Relative Sizes - You can use relative sizes as either percentages or ems.

Percentages of what, I hear you ask? Well there are two options here:

1. Define a base size:
So for example you could define a base size using the following rule for the body tag:

body {font-size: 12pt;}

And these rules for heading 1 and paragraphs respectively:

h1 {font-size: 150%;}
p {font-size: 90%;}

In this case heading 1 text would be rendered larger than 12 points, paragraph text smaller.

2. Allow the user's browser setting to be the base size:
In this case you simply use percentages in your rules without defining a measured size anywhere. This is a very elegant solution as it means that if a short-sighted user has their browser set to render large text, you will acknowledge this just as you will the preference of someone who has, by choice, set their browser default to small.
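A minimal sketch of this approach - no fixed measurement appears anywhere, so everything scales from whatever default the visitor has chosen:

body {font-size: 100%;}
h1 {font-size: 150%;}
p {font-size: 90%;}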

About 'ems'
An em is a relative unit based on the font size chosen by the user (traditionally it is the width of a capital 'M'), so again using ems accommodates the user's choice.

You can express the proportional size you wish the font to appear in as a multiple or fraction of an em - eg 0.5em or 1.2em. Ems take some getting used to but can allow very fine gradation of sizes.
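The same idea expressed in ems might look something like this (the exact values are only illustrations):

h1 {font-size: 1.5em;}
h2 {font-size: 1.2em;}
p {font-size: 0.9em;}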

Exercise: Try adding percentage values to some of the rules in the style sheet.

Note the following important points:

  • Size is inherited. So for example if you set 'p' text to 80% and also specify 80% in the 'a' rule, your link text will be smaller than the text in the surrounding paragraph (see the sketch after this list). It also means it is important that tags are closed, which unfortunately FrontPage sometimes neglects to do. If you had a series of paragraphs using p text at 80% and the closing </p> tags are missing, each paragraph may be treated as nested inside the previous one and rendered at 80% of it, until the text pretty quickly becomes unreadable.
  • The default browser settings, which many people never change, are different in NN and IE: IE default is medium, NN default is small. Thus you may be surprised to see that your pages look different when you view them in different browsers. Do not panic about this, you can never have absolute control over the way your pages are seen. The point is that by using percentages of what the viewer is accustomed to seeing all users should be comfortable with the way your page is rendered.
  • If undefined: Any size applied to a parent tag will apply (ie the parent style will be inherited).
    If you do not define size at all in your style sheet then the user's browser default sizes will apply. This may seem desirable and many purists would say it is as the web is intended to be. But the fact is that people do not often choose to configure their browser settings and they are accustomed to looking at pages where text size has been defined in some way. Using percentages at least acknowledges the efforts of those who, for one reason or another, have chosen to use custom settings.
  • Browser Support: Patchy if measurements are used, best for pixels; poor where words are used. In general pretty good in the case of percentages.
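To make the first point above concrete, here is a small sketch of how percentages compound through inheritance:

p {font-size: 80%;} /* 80% of the size inherited from body */
a {font-size: 80%;} /* 80% of the enclosing p, so roughly 64% of the body size */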

2. font-family

Declarations: Any font family name can be used and any number of alternatives suggested. It is good practice to have as a final alternative a generic family, eg serif or sans-serif.

Example: p {font-family: Americana, Verdana, Helvetica, sans-serif;}

If undefined: Any style applied to a parent tag will apply (ie the style will be inherited). If none is available then the users own browser default setting will apply.

Browser Support: Fully supported by NN 4.0+ and IE 4.0+

3. font-style

Declarations: normal, italic. Example:

h3 { font-style:italic}

If undefined: Any style applied to a parent tag will apply (ie the parent style will be inherited). Otherwise 'normal' is assumed.

Browser Support: Fully supported by NN 4.0+ and IE 4.0+

4. font-weight

Declarations: There are two ways in which font weight can be defined:

a) normal, bold, bolder, lighter
bolder and lighter must be applied with reference to a parent with the property 'bold' applied.
b) Using numeric values: 100, 200, 300 ... 800, 900. (Normal text has a value of 400.)
These values can only be used where the font has a range of weight values built in. Relatively few fonts do - Arial and Verdana are among those that do.

Example:

p {font-weight: bold;}
a:link {font-weight: bolder;}

Where supported this would result in link text in a paragraph being bolder than the main paragraph text.

If undefined: Any style applied to a parent tag will apply (ie the parent style will be inherited). Otherwise 'normal' is assumed.

Browser Support: Both 4.0+ browsers fully support 'normal' and 'bold'. After that it gets a little complicated!

  • Netscape Navigator prior to version 4.7 does not support 'bolder' and 'lighter'. 100-500 is all drawn as normal. 600-900 is all drawn as bold.
  • Internet Explorer for Mac prior to 5.0 will render 100-400 as normal, 500-900 as all bold.
  • Internet Explorer for Windows prior to version 5.5 will render 100-500 as normal, 600-900 as increasingly heavy.

5. text-decoration

Declarations: none, underline, overline, line-through, blink

Example:

a:link { text-decoration: underline;}
a:hover { text-decoration: none;}

which would cause the underline on the link to disappear when the link is moused over.

If undefined: In most cases none is assumed; in the case of the selector a (which applies to everything within <a> tags, ie links) underline is assumed.

Note that this value is not inherited.

Browser Support: These features are pretty well supported with the exceptions that IE does not support 'blink' and NN prior to 4.7 does not support 'overline'. Also note that 'a:hover' is not supported by NN, but using it will not cause a problem in that browser.

6. text-transform

Declarations: none, capitalize, uppercase, lowercase

What they do:
1. Capitalize capitalizes first character of each word
2. Uppercase capitalizes all characters of each word
3. Lowercase uses small letters for all characters of each word
4. None is self explanatory!

Example: h1 { text-transform:capitalize }
This Will Cause Each Word In Heading 1 To Be Capitalized

If undefined: Any style applied to a parent tag will apply (ie the parent style will be inherited). Otherwise 'none' is assumed.

Browser Support: Fully supported by NN 4.0+ and IE 4.0+

Cascading Style Sheets (CSS) for Beginners - Part 2

Constructing a Style Sheet

Style Sheets consist of code which browsers use to format the HTML in your document and present it to the viewer. While CSS is not complicated, as with all code there are a few basic terms to learn and rules to follow.

Terminology
New terminology is always confusing - but getting it right is vital. This next section introduces the terminology associated with CSS. If you get a bit confused at first please persist - get this lot into your head and the rest of CSS is a snip! You will be introduced to the following terms, as they apply to CSS:

  • Rule
  • Selector and Declaration
  • Property and Value
  • Class and ID

A simple Style Sheet

Let's look at a very simple style sheet, for this example an embedded Style Sheet - one that would be placed in the head of a document - which controls the appearance of Heading 1, or the H1 tag.

We will use this example for much of this section so have a careful look at it.



<style type="text/css">
H1 {font-family: Verdana, Helvetica, sans-serif; color: red; font-weight: bold;}
</style>

The opening and closing <style> tags simply tell the browser that this is an embedded style sheet.

Exercise
Create a new page in FrontPage now. Paste the style sheet above just before the closing </head> tag of your page in HTML view (you may need to paste it into Notepad first to retain the formatting).

Now type a few words into the page in normal view and highlight them. From the drop down formatting menu choose Heading 1.

You should have a large, bold, red, statement.

A Few Words

Now, on to a closer look at how that was achieved.

Rules

A style sheet consists of a series of rules that will be interpreted by the browser to display the content of your page.

This particular Style Sheet has just a single rule which tells the browser how any text surrounded by the H1 tags should appear.

Selectors and Declarations

Each rule in a style sheet must have two components - a selector and a declaration.

H1, or the tag whose style is defined, is referred to as a selector. Any HTML tag can be a selector.

Everything within the curly braces {} is referred to as the declaration.

Look very carefully at how it is written: the selector, followed by the declaration in curly braces:

H1 {font-family: Verdana, Helvetica, sans-serif; color: red; font-weight: bold;}

Properties and Values

The declaration in turn consists of a series of properties and their associated values.

In our simple style sheet three properties for H1 and their associated values are defined. The properties and their values are as below:

Property        Value
font-family     Verdana, Helvetica, sans-serif
color           red
font-weight     bold


Again look very carefully at how the property and value are written: the property is always followed by a colon and the value or values by a semicolon.

H1 {font-family: Verdana, Helvetica, sans-serif; color: red; font-weight: bold;}

In most cases a property will have only one value - eg color: red - but in the case of font-family it is common practice to have a series of values. In our example the style sheet tells the browser to look first for Verdana, if that is not there look for Helvetica, and if that fails use any sans-serif font. Where several alternative fonts are specified like this a comma separates them.

font-family: Verdana, Helvetica, sans-serif;


Class and ID

As mentioned above any HTML tag can be used as a selector. But suppose you are looking for even more than that - for example suppose instead of defining one rule for H1 you would like to have 2 different H1 styles from which you could pick and choose at will. This is where Class comes in and it is one of the most powerful and important aspects of CSS.

In our example let's say we want to specify two different types of H1 style. We can alter our style sheet to read:

<style type="text/css">
H1 {font-family: Verdana, Helvetica, sans-serif; color: red; font-weight: bold;}
H1.two {font-family: Verdana, Helvetica, sans-serif; color: green; font-weight: normal;}
</style>

In this case 'two' is a class of H1, or an alternative style rule for H1.

Now we can choose from the two Heading 1 styles in our document by making a small change to the H1 tag.

If we want to use the first style we would simply select Heading 1 from the formatting menu to get this code:

<h1>This is a red, bold, headline.</h1>

and this effect:

This is a red, bold, headline

If we want to use the second style we can start by applying Heading 1 from the formatting menu, then switch to HTML view and make a small alteration to the code for that heading.

This is what we would find:

<h1>This is a green, normal, headline.</h1>

Now we simply need to apply the second style to this particular instance of the H1 tag, and we do this by adding class="two" to the tag:

<h1 class="two">This is a green, normal, headline.</h1>

This is a green, normal, headline

If you take a look at the code on this page you will see that style sheet in action.

Naming Classes

You can use any title you like for your class: one, two, first, second, red, green, monkey, giraffe - literally anything. As long as you use the same title in your HTML tag it will be recognized.

Once again note carefully the way it is written.

In the Style Sheet the class is specified immediately after its selector and separated from it by a full stop (period). In the HTML tag you simply insert class="classname" after the name of the tag.

You can use the different classes as often and as much as you like in your documents. If you had a mind to there is no reason why you could not have 20 classes of H1, or of any other tag. This is where class differs from ID.
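A quick sketch of a class being reused - the class name here is just an example:

P.warning {color: red; font-weight: bold;}

<p class="warning">This paragraph uses the class.</p>
<p class="warning">So does this one, and you could use it a hundred more times.</p>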

Using ID

Unlike classes an ID can only be used once. So suppose you are happy with your big, bold, red headline but on just one occasion in a page you want it to be different. In this case you might choose to use an ID. Your style sheet would now read:

<style type="text/css">
H1 {font-family: Verdana, Helvetica, sans-serif; color: red; font-weight: bold;}
H1#green {font-family: Verdana, Helvetica, sans-serif; color: green; font-weight: normal;}
</style>

Now on the one occasion where you wished to use the green headline you would alter the H1 tag to read:

<h1 id="green">This is a green, normal, headline.</h1>

Again note the format: ID is specified by putting '#' between the selector and the name of the chosen ID. Again you can give the ID any title you like.

Which to choose? Class or ID?
The primary difference between class and ID is that the former can be used as often as you like - you can only use ID once in a page.

So if you are creating a variation in style that you are going to use again and again then choose class, if you are just creating a one off variation then choose ID. It might seem that there is little advantage in bothering with ID but it does save a lot of fiddling with tags if you are only going to use the variation once. However classes are much more important, more useful and more widely used.

Important Note
You can only use Class and ID in embedded and linked sheets - not in inline styles.

Cascading Style Sheets (CSS) for Beginners - Part 1

Most of us are familiar with the use of HTML tags as a means of formatting our pages. For example we can use this tag:

<font face="Arial" size="3">Arial 12 point</font>

to create Arial 12 point text, whether we apply it by hand or using FrontPage.

However individually applying styles to each element of your page in this way is laborious, prone to error and creates bulky code. There is another way: welcome to CSS, Cascading Style Sheets!!

What is a style sheet?

Print publications such as newspapers and magazines have long used style sheets - sets of rules governing page formatting, typefaces, nature of headlines etc - to give them a consistent and coherent appearance. Similarly in HTML documents style sheets can control text formatting, colors, image placement and a myriad of other features that determine how our pages look.

Why Bother with them?

1) Consistency: Style sheets are easy to construct and can be readily applied to all the pages in your site, ensuring continuity of style throughout.

2) Convenience: If you have used style sheets to format your pages making changes to the format of your entire site, or to individual pages, can be as simple as making a small alteration to your style sheets or the way in which they apply.

3) Consideration: Style sheet rules can be applied in a manner that allows viewers to see your site in a way that suits them. We will look at this in more detail later.

4) Compulsion!: the use of inline tags to define style – such as the tag example given above – will in the future be deprecated and we will be required to use style sheets to format our pages, so we may as well start now.

What about Browser Compatibility?

Many people are put off using style sheets because they have heard or read about poor cross browser support for them. While it is true that there are differences in the degree and nature of support given to CSS, it is also true that very many features enjoy full cross browser support and in many other cases it is possible to find work rounds.

As with all other aspects of web design the trick is in knowing what browsers will do with your code and then applying it accordingly. Throughout this series of tutorials we will look at the issue of browser compatibility wherever it is a consideration.

CSS Basics

For the moment let's forget about how style sheets are structured and just have a look at how they are applied.

There are three different ways to apply style sheets:

1. Inline Styles
You can apply a style to any individual html tag in a page. This is something you already do to some degree all the time. For example, you can easily change the text colour of an individual paragraph in your page. In so doing you are applying a style to that portion of the page – or an ‘Inline Style’.
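For example, an inline style looks something like this:

<p style="color: green; font-family: Arial;">This paragraph carries its own inline style.</p>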

2. Embedded Style Sheets
An embedded style sheet controls the formatting of an individual page. The Style Sheet is placed between the <head> and </head> tags.

If you have used FrontPage's facility to apply a 'rollover effect' to text links you will have placed an embedded style sheet in the head of your document which will look something like this:

<style fprolloverstyle>A:hover {color: red; font-weight: bold}
</style>

This causes your hyperlinks, when in the hover state, to appear red and bold.

In the case of CSS the effect is the same but the formatting is a little different and the same style sheet would look like this:

<style type="text/css">
a:hover {color: red; font-weight: bold;}
</style>

3. Linked Style Sheets
This is where the real power of style sheets is found. You can create a stand alone set of instructions for how things should appear on your page – a Style Sheet - and then link it to as many pages of your site as you wish.

This single style sheet will control the appearance of all the pages and making a change to it will affect all pages at once.
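The link itself is a single line placed in the head of each page - the filename here is just an example:

<link rel="stylesheet" type="text/css" href="site.css">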

What about the ‘Cascading’ bit?

This is a term that seems to cause an immense amount of confusion but really need not do so. The thing to remember is that you can use each of the three different types of style sheet, as defined above, in the same web site. Which one will be applied in any situation is governed by a set of priorities:

- Inline styles take precedence
- Embedded styles come next
- Where neither an inline nor an embedded style is used, the linked style will apply

This is referred to as the ‘cascade’.

Let's look at it another way. Suppose you have a web site that includes a style sheet to which all pages are linked, giving it a nice professional and consistent appearance. But you have a single page that you want to appear radically different from the others. You can simply place an embedded style sheet in the head of this page and that style sheet will override the linked one on that page.

Now you look at this page and it is fine – except for one small section that, again, you would like to appear in a different format. Simply apply an inline style and this will, in turn, override the embedded one. Hurrah! Your style sheets are cascading!!
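A rough sketch of all three levels at work on a single paragraph, assuming a linked site.css:

/* in site.css (linked) - every page gets blue paragraphs */
p {color: blue;}

<!-- embedded in the head of the one special page - its paragraphs go green -->
<style type="text/css">
p {color: green;}
</style>

<!-- inline, on the one paragraph that must be different -->
<p style="color: red;">This paragraph ends up red.</p>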

Creating and Using a robots.txt File

FrontPage Newsletter Article July 2002

In this article we will take a look at how you can create an effective robots.txt file for your site, why you need one and at some tools that can help with the job.

What on Earth is a robots.txt File?

A robots.txt is a file placed on your server to tell the various search engine spiders not to crawl or index certain sections or pages of your site. You can use it to prevent indexing totally, prevent certain areas of your site from being indexed, or to issue individual indexing instructions to specific search engines.

The file itself is a simple text file, which can be created in Notepad. It needs to be saved to the root directory of your site - that is, the directory where your home page or index page is.

Why Do I Need One?

All search engines, or at least all the important ones, now look for a robots.txt file as soon as their spiders or bots arrive on your site. So, even if you currently do not need to exclude the spiders from any part of your site, having a robots.txt file is still a good idea; it can act as a sort of invitation into your site.

There are a number of situations where you may wish to exclude spiders from some or all of your site.

  1. You are still building the site, or certain pages, and do not want the unfinished work to appear in search engines
  2. You have information that, while not sensitive enough to bother password protecting, is of no interest to anyone but those it is intended for and you would prefer it did not appear in search engines.
  3. Most people will have some directories they would prefer were not crawled - for example do you really need to have your cgi-bin indexed? Or a directory that simply contains thank you or error pages.
  4. If you are using doorway pages (similar pages, each optimized for an individual search engine) you may wish to ensure that individual robots do not have access to all of them. This is important in order to avoid being penalized for spamming a search engine with a series of overly similar pages.
  5. You would like to exclude some bots or spiders altogether, for example those from search engines you do not want to appear in or those whose chief purpose is collecting email addresses.

The very fact that search engines are looking for them is reason enough to put one on your site. Have you looked at your site statistics recently? If your stats include a section on 'files not found', you are sure to see many entries where search engines spiders looked for, and failed to find, a robots.txt file on your site.

Creating the robots.txt file

There is nothing difficult about creating a basic robots.txt file. It can be created using Notepad or whatever your favorite text editor is. Each entry has just two lines:

User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]

The Disallow line can be repeated for each directory or file you want to exclude, and the whole entry can be repeated for each spider or bot you want to address.

A few examples will make it clearer.



1. Exclude a file from an individual Search Engine

You have a file, privatefile.htm, in a directory called 'private' that you do not wish to be indexed by Google. You know that the spider that Google sends out is called 'Googlebot'. You would add these lines to your robots.txt file:

User-Agent: Googlebot
Disallow: /private/privatefile.htm

2. Exclude a section of your site from all spiders and bots

You are building a new section to your site in a directory called 'newsection' and do not wish it to be indexed before you are finished. In this case you do not need to specify each robot that you wish to exclude, you can simply use a wildcard character, '*', to exclude them all.

User-Agent: *
Disallow: /newsection/

Note that there is a forward slash at the beginning and end of the directory name, indicating that you do not want any files in that directory indexed.

3. Allow all spiders to index everything

Once again you can use the wildcard, '*', to let all spiders know they are welcome. The second, Disallow, line you just leave empty - that is, you disallow nothing.

User-agent: *
Disallow:

4. Allow no spiders to index any part of your site

This requires just a tiny change from the command above - be careful!

User-agent: *
Disallow: /

If you use this command while building your site, don't forget to remove it once your site is live!

Getting More Complicated

If you have a more complex set of requirements you are going to need a robots.txt file with a number of different commands. You need to be quite careful creating such a file: you do not want to accidentally shut spiders out of areas you really want indexed.

Let's take quite a complex scenario. You want most spiders to index most of your site, with the following exceptions:

  1. You want none of the files in your cgi-bin indexed at all, nor do you want any of the FP specific folders indexed - eg _private, _themes, _vti_cnf and so on.
  2. You want to exclude your entire site from a single search engine - let's say Alta Vista.
  3. You do not want any of your images to appear in the Google Image Search index.
  4. You want to present a different version of a particular page to Lycos and Google. (Caution here: there are a lot of question marks over the use of 'doorway pages' in this fashion. This is not the place for a discussion of them, but if you are using this technique you should do some research on it first.)

Let's take this one in stages!

1. First you would ban all search engines from the directories you do not want indexed at all:

User-agent: *
Disallow: /cgi-bin/
Disallow: /_borders/
Disallow: /_derived/
Disallow: /_fpclass/
Disallow: /_overlay/
Disallow: /_private/
Disallow: /_themes/
Disallow: /_vti_bin/
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_map/
Disallow: /_vti_pvt/
Disallow: /_vti_txt/

It is not necessary to create a separate entry for each directory; it is quite acceptable to list the Disallow lines one after another as above.

2. The next thing we want to do is to prevent Alta Vista from getting in there at all. The Altavista bot is called Scooter.

User-Agent: Scooter
Disallow: /

This entry can be thought of as an amendment to the first entry, which allowed all bots in everywhere except the defined directories. We are now saying that all bots can index the whole site apart from the directories specified in 1 above, except Scooter, which can index nothing.

3. Now you want to keep Google away from those images. Google grabs these images with a separate bot from the one that indexes pages generally, called Googlebot-Image. You have a couple of choices here:

User-Agent: Googlebot-Image
Disallow: /images/

That will work if you are very organized and keep all your images strictly in the images folder.

User-Agent: Googlebot-Image
Disallow: /

This one will prevent the Google image bot from indexing any of your images, no matter where they are in your site.

4. Finally, you have two pages called content1.html and content2.html, which are optimized for Google and Lycos respectively. So, you want to hide content1.html from Lycos (The Lycos spider is called T-Rex):

User-Agent: T-Rex
Disallow: /content1.html

and content2.html from Google.

User-Agent: Googlebot
Disallow: /content2.html

Summary and Links

Writing a robots.txt file is, as you have seen, a relatively simple matter. However it is important to bear in mind that it is not a security method. It may stop your specified pages from appearing in search engines, but it will not make them unavailable. There are many hundreds of bots and spiders crawling the Internet now, and while most will respect your robots.txt file, some will not and there are even some designed specifically to visit the very pages you are specifying as being out of bounds.

For those who would like to know more here are some resources you may find useful.

robots.txt File Generators

I think it may be easier to write your own file than use these but for those who would like to have their robots file generated automatically there are a couple of free online tools that will do the trick for you.

Tuesday, March 31, 2009

April 1 Conficker Virus

Despite security analysts insisting that April 1 is only a red herring, the Conficker malware hype keeps growing as April Fools' Day approaches. Indeed, the doom and gloom is persisting even as security researchers offer a voice of reason.

The worm first appeared in late November, exploiting a vulnerability in Microsoft Windows to spread unhindered on local area networks. Its goal is to install rogue software on infected computers. Microsoft issued a patch for the vulnerability, but users that haven't installed it are open to infection as the worm spreads through portable USB flash drives.

As the speculation grows around Conficker, also known as the Downadup worm, Symantec and its Conficker Working Group partners continue researching the possibilities of the April 1 fallout from a worm that wreaked havoc on millions of computers earlier this year. So far, Symantec has determined three facts that it is sharing.

Symantec Sets the Record Straight

First, Symantec has determined that on April 1, W32.Downadup.C, the most recent variant of the malware also known as Conficker, will begin to use a new algorithm to determine what domains to contact. No other actions have been identified to take place on April 1.

Second, Symantec said it's possible that systems infected with W32.Downadup.C will be updated with a newer version of the malware on April 1 by contacting domains on the new domain list. However, the security company noted, these systems could be updated on any date before or after April 1, as well by using the peer-to-peer updating method found in W32.Downadup.C.

Third, Symantec said, the public should not be alarmed. However, as always, computer users should exercise caution and implement security best practices into their daily computing routines.

The worm certainly is an issue of concern, but the probability of a major Downadup-related cyber event on April 1 is not likely, according to Vincent Weafer, vice president of Symantec Security Response.

"In reality, the author or authors of Downadup probably didn't intend for this malware to get as much attention as it has. Most malware these days is designed to be used for some type of criminal monetary gain, and conducting such criminal acts typically requires stealth measures to be successful," Weafer said. "As such, this makes the odds that a major event will take place on April 1 even less likely, since there is so much attention being paid to that day."

What Should We Expect?

McAfee said we don't know the intent of the authors of the Conficker worm, but one thing is certain: They have consistently improved the worm by adding new functionality and anti-debugging tricks with every released variant.

"In order to resist the Conficker cabal initiative, which recently blocked domain registrations associated with previous Conficker A and B variants, the worm authors upped the randomly generated domain count from 250 to 50,000," said Vinoo Thomas, a security researcher at McAfee. "The intent behind generating and attempting to contact so many domains is to make it extremely difficult for security researchers to monitor sites that could potentially host a payload for the Conficker worm to download and execute."

Security firms advise home users to make sure their security software is up to date with the latest antivirus signatures and to enable their systems' automatic security updates. On the enterprise front, Symantec recommends that companies continue to deploy all critical security patches, ensure their security software is up to date, clean any systems that are infected with any version of Downadup using the available removal tools and guidance provided, and evaluate additional security best practices in accordance with their organizations' policies and procedures.

Source: newsfactor.com