JustArticles.net Article Directory
The Robots.txt protocol

By: Sachin Garg
Submitted 2009-10-09 04:10:18
The Robots.txt protocol, also called the “robots exclusion standard,” is designed to keep web spiders from accessing part of a website. It is intended as a privacy measure, the equivalent of hanging a “Keep Out” sign on your door: it asks visitors to stay out but does not physically stop them.

This protocol is used by website administrators when there are sections or files that they would rather not have accessed by the rest of the world. This could include employee lists or files that are circulating internally. For example, the White House website has used robots.txt to block crawling of speeches by the Vice President, a photo essay of the First Lady, and profiles of the 9/11 victims.

How does the protocol work? The administrator lists the files and directories that shouldn’t be scanned in a plain text file named robots.txt, and places that file in the top-level directory of the website. The robots.txt protocol was created by consensus in June 1994 by members of the robots mailing list (robots-request@nexor.co.uk). When it was created there was no official standards body or RFC behind the protocol (it was only later published as RFC 9309, in 2022), and it remains difficult to legislate or mandate that the protocol be followed. In fact, the file is treated as strictly advisory, and offers no guarantee that the listed contents won’t be read.
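
As a minimal sketch of what such a file looks like and how a cooperating robot might interpret it, the following uses Python's standard `urllib.robotparser` module. The file contents, the paths, the `ExampleBot` user agent, and the `www.example.com` host are all made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
Disallow: /staff-list.html
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/internal/memo.txt", "/staff-list.html"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Note that the check happens inside the robot's own code. Nothing on the server enforces it, which is why the file is only advisory.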

In effect, robots.txt requires cooperation by the web spider and even the reader, since anything that is uploaded to the internet becomes publicly available. You aren’t locking them out of those pages; you are only asking them to stay away, and it takes very little for them to ignore these instructions. Anyone, including computer hackers, can read the robots.txt file itself and then request the listed files directly. So the rule of thumb is: if it’s that sensitive, it shouldn’t be on your website to begin with.

Care, however, should be taken to ensure that your robots.txt file doesn’t block robots from areas of the website that you do want indexed. Blocking them can dramatically affect your search engine ranking, as search engines rely on their robots to count the keywords, review meta tags, titles and crossheads, and even register the hyperlinks.

One misplaced or missing character can have catastrophic effects. For example, robots.txt patterns are matched by simple prefix comparisons, so care should be taken to make sure that patterns matching directories have the final '/' character appended. Otherwise all files with names starting with that pattern will match, rather than just those in the intended directory.
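
The effect of the missing '/' can be illustrated with a short sketch, again using Python's standard `urllib.robotparser`, which matches `Disallow` patterns as path prefixes. The `blocked_paths` helper, the sample paths, and the `ExampleBot` user agent are hypothetical names chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths on an imaginary site.
PATHS = [
    "/private/report.html",
    "/private-sale.html",
    "/privateer/index.html",
    "/public/index.html",
]


def blocked_paths(disallow_pattern, paths, user_agent="ExampleBot"):
    """Return the paths that a single Disallow rule would block."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_pattern}"])
    return [
        p for p in paths
        if not parser.can_fetch(user_agent, "http://www.example.com" + p)
    ]


if __name__ == "__main__":
    # With the trailing slash, only the directory's contents are blocked.
    print(blocked_paths("/private/", PATHS))
    # Without it, every path that starts with "/private" is blocked.
    print(blocked_paths("/private", PATHS))
```

In this sketch the pattern `/private/` blocks only `/private/report.html`, while `/private` also blocks `/private-sale.html` and `/privateer/index.html`, two pages the administrator probably meant to leave open.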

To avoid these problems, consider submitting your site to a search engine spider simulator, also called a search engine robot simulator. These simulators, which can be bought or downloaded from the internet, use the same processes and strategies as different search engines and give you a “dry run” of how they will read your site. They will tell you which pages are skipped, which links are ignored, and which errors are encountered. Since the simulators also reenact how the bots will follow your hyperlinks, you’ll see whether your robots.txt file is interfering with the search engine’s ability to read through all the necessary pages.
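
A real simulator does far more than this, but the "which pages are skipped" part of a dry run can be roughly sketched with the standard library alone. Everything named here is an assumption for illustration: the `dry_run` helper, the robot names `ExampleBot` and `ExampleImageBot`, the rules, and the URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one robot is shut out entirely, everyone else
# is only kept out of /drafts/.
ROBOTS_TXT = """\
User-agent: ExampleImageBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""

# Hypothetical pages you expect search engines to reach (or not).
URLS = [
    "http://www.example.com/",
    "http://www.example.com/products.html",
    "http://www.example.com/drafts/new.html",
]


def dry_run(robots_txt, urls, user_agents):
    """Map each user agent to the list of URLs it would skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: [u for u in urls if not parser.can_fetch(agent, u)]
        for agent in user_agents
    }


if __name__ == "__main__":
    report = dry_run(ROBOTS_TXT, URLS, ["ExampleBot", "ExampleImageBot"])
    for agent, skipped in report.items():
        print(f"{agent} would skip {len(skipped)} of {len(URLS)} pages:")
        for url in skipped:
            print("  ", url)
```

If a page you want ranked shows up in the skipped list for a robot you care about, that is a sign the rules are too broad.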

It’s also important to review your robots.txt file yourself, which will enable you to spot any problems and correct them before you submit your site to real search engines.

Article From Just Articles Free Articles and Free Content