iExpertsForum Community
May 25, 2013, 11:11:46 PM

Topic: Robots.txt?..... (Read 2082 times)

chang

  • Newbie
  • Posts: 23
Robots.txt?.....
« on: June 04, 2009, 05:08:42 AM »
Hi
Is it necessary to put a robots.txt file in the root directory of a website? I know it is helpful for controlling crawlers, but if I don't put this file on the website, will it cause any problems? Please advise.
Thanks for your support

dave denz

  • Newbie
  • Posts: 19
Re: Robots.txt?.....
« Reply #1 on: June 04, 2009, 05:52:03 AM »

No, a robots.txt file is not needed or required. It can be helpful when used properly, but it can also cause problems if it is not configured properly.
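To make the difference concrete, here is a small Python sketch using the standard library's `urllib.robotparser`. It compares a site with no rules at all (modelled here as an empty file; the module's own `read()` method likewise allows everything when the server answers with a 4xx status other than 401/403) against a misconfigured file that blocks the whole site. The crawler name `ExampleBot` and the paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, path)


no_file = ""                                # no rules: everything is allowed
block_all = "User-agent: *\nDisallow: /\n"  # a common misconfiguration

print(allowed(no_file, "/products/index.html"))    # True
print(allowed(block_all, "/products/index.html"))  # False
```

A single stray `/` after `Disallow:` is enough to turn "no restrictions" into "block everything", which is the kind of problem a badly configured file can cause.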

nickdnerd

  • Newbie
  • Posts: 17
Re: Robots.txt?.....
« Reply #2 on: June 04, 2009, 06:17:40 AM »
Robots.txt is not compulsory.

Good bots (spiders/crawlers) generally follow the robots.txt file, so you can use it to control how much bandwidth you allocate to them.
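As a sketch of the bandwidth angle: some crawlers honor a `Crawl-delay` line, which asks them to wait a number of seconds between requests. It is an extension rather than part of the original robots.txt convention, and not every crawler supports it (Google's crawler, for one, ignores it), so treat it as a request, not a limit. Python's `urllib.robotparser` can read the value back; the bot names below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# "HungryBot" is a hypothetical crawler name used for illustration.
ROBOTS_TXT = """\
User-agent: HungryBot
Crawl-delay: 30
Disallow: /images/

User-agent: *
Crawl-delay: 5
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds a polite crawler should wait between requests (None if unspecified).
print(parser.crawl_delay("HungryBot"))  # 30
print(parser.crawl_delay("OtherBot"))   # 5 (falls back to the "*" group)
print(parser.can_fetch("HungryBot", "/images/logo.png"))  # False
```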

However, there are also bad bots out there (like email harvesters) that will not follow your robots.txt rules.

Not to mention your competitors, who can look at your robots.txt to see which folders you are hiding from crawlers.

So if you are not concerned about your website's bandwidth, you don't need to worry about robots.txt.

On the other hand, if you are a control freak (like me), set up a robots.txt file with specific permissions. Set a honey trap inside the robots.txt that catches malicious bots/users who ignore your robots.txt rules, and automatically add their IP addresses to your .htaccess file.
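The honey-trap idea could be sketched like this in Python. Assumptions, all hypothetical: the trap URL is `/bot-trap/`, it is listed as `Disallow: /bot-trap/` in robots.txt and linked nowhere a human would click; the access log uses Apache's common log format with the client IP as the first field; and the output uses Apache 2.2-style `Deny from` lines (Apache 2.4 uses `Require not ip` instead). Review the list before pasting it into `.htaccess`, since anyone who opens the trap URL by hand will be caught too.

```python
TRAP_PATH = "/bot-trap/"  # hypothetical trap URL, disallowed in robots.txt


def trapped_ips(log_lines):
    """Collect client IPs that requested the trap path, in first-seen order."""
    seen = []
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue  # not a common-log-format line
        request = parts[1].split()  # e.g. ['GET', '/bot-trap/', 'HTTP/1.1']
        if len(request) >= 2 and request[1].startswith(TRAP_PATH):
            ip = line.split()[0]
            if ip not in seen:
                seen.append(ip)
    return seen


def htaccess_rules(ips):
    """Render Apache 2.2-style deny lines for the trapped IPs."""
    return "\n".join(f"Deny from {ip}" for ip in ips)


# Made-up sample log lines using documentation IP ranges.
sample_log = [
    '203.0.113.7 - - [04/Jun/2009:06:17:40 +0000] "GET /bot-trap/ HTTP/1.1" 200 512',
    '198.51.100.2 - - [04/Jun/2009:06:18:02 +0000] "GET /index.html HTTP/1.1" 200 2048',
    '203.0.113.7 - - [04/Jun/2009:06:18:05 +0000] "GET /bot-trap/page2 HTTP/1.1" 200 512',
]
print(htaccess_rules(trapped_ips(sample_log)))  # Deny from 203.0.113.7
```

Running it from a scheduled job and appending only new IPs would make the process automatic, as described above.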

Hope it helps.

madjack

  • Newbie
  • Posts: 22
Re: Robots.txt?.....
« Reply #3 on: June 04, 2009, 06:30:58 AM »
Hi

It is not necessary. If you want to keep certain files and directories away from crawlers, you can create a robots.txt file telling them not to crawl those addresses. Just don't count on it to protect valuable information from malicious users, since they can simply ignore it.
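A minimal example of excluding one directory and one file, checked with Python's standard `urllib.robotparser`. The paths `/private/` and `/drafts/notes.html` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder paths: one whole directory and one single file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/notes.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/index.html", "/private/report.pdf", "/drafts/notes.html", "/drafts/other.html"]:
    print(path, parser.can_fetch("AnyBot", path))
# /index.html True
# /private/report.pdf False
# /drafts/notes.html False
# /drafts/other.html True
```

Rules are prefix matches, so `Disallow: /private` without the trailing slash would also cover a page such as `/private-offers.html`.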

sam

  • Newbie
  • Posts: 45
Re: Robots.txt?.....
« Reply #4 on: June 04, 2009, 06:54:19 AM »
Although robots.txt isn't required, Google encourages it and provides a tool for creating a Google-friendly one in Google Webmaster Tools.

kamal ajmal

  • Newbie
  • Posts: 14
Re: Robots.txt?.....
« Reply #5 on: June 05, 2009, 12:17:35 AM »
One thing to keep in mind is that Google doesn't like error pages. This should be reason enough to create a robots.txt and a favicon.ico file.

emily vance

  • Newbie
  • Posts: 16
Re: Robots.txt?.....
« Reply #6 on: June 05, 2009, 12:25:01 AM »
I agree with you, Kamal.
A basic, minimal robots.txt will stop your logs from being flooded with 404s.
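A basic, minimal robots.txt that permits everything can be just two lines; an empty `Disallow:` value means nothing is disallowed. The Python check below, using the standard `urllib.robotparser`, confirms that such a file blocks nothing while still giving crawlers a file to fetch instead of a 404.

```python
from urllib.robotparser import RobotFileParser

# Minimal allow-everything robots.txt: a catch-all group with an empty Disallow.
MINIMAL_ROBOTS_TXT = "User-agent: *\nDisallow:\n"

parser = RobotFileParser()
parser.parse(MINIMAL_ROBOTS_TXT.splitlines())

# Every path stays crawlable; the file's only job is to exist.
for path in ["/", "/forum/topic-1.html", "/images/logo.png"]:
    assert parser.can_fetch("Googlebot", path)
print("minimal robots.txt allows everything")
```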

jane brooke

  • Newbie
  • Posts: 27
Re: Robots.txt?.....
« Reply #7 on: June 05, 2009, 01:12:26 AM »
You should not rely on the robots.txt file alone.
I have had cases where, while Google was visiting the pages,
the server didn't serve the robots file, or served it blank,
so the spider, seeing no restrictions, indexed everything.
If you want to restrict all of the content, then yes, you should put meta tags in the header of every page.
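The page-level alternative is a robots meta tag in each page's `<head>`, for example `<meta name="robots" content="noindex">`. As a sketch, the Python function below uses the standard `html.parser` module to check whether a page carries `noindex`; the class and function names are made up for illustration. Note that a crawler can only see this tag on pages it is allowed to fetch, so a page blocked in robots.txt never reveals its `noindex` tag.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute names arrive lowercased
        if (attrs.get("name") or "").lower() == "robots":
            for value in (attrs.get("content") or "").split(","):
                self.directives.add(value.strip().lower())


def has_noindex(html: str) -> bool:
    """Hypothetical helper: True if any robots meta tag contains 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Hi</body></html>'
print(has_noindex(page))  # True
```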

BTW, Webmaster Tools also has an option to remove pages from the index.

jeremy

  • Newbie
  • Posts: 42
Re: Robots.txt?.....
« Reply #8 on: June 05, 2009, 03:26:15 AM »
Even if you go into Google Webmaster Tools and request that the pages be removed, it clearly states that unless you take measures to stop the indexing, the pages will get indexed again. So even if you ask Google to remove a page, it will be indexed again unless you block it some other way.

You could use .htaccess and .htpasswd to password-protect the directory. Just do a few Google searches for htpasswd and you should be able to find instructions. I'm using this same method to protect a backup script for my sites.

Once you have the file in place, anyone who tries to view the site, or any directory covered by the .htaccess file, will be prompted for a username and password.
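For reference, a typical `.htaccess` for HTTP Basic authentication uses Apache's `AuthType`, `AuthName`, `AuthUserFile` and `Require` directives, and the password file itself is usually created with Apache's `htpasswd` utility. The Python helper below only renders that directive text; the realm name and password-file path are placeholders, and the password file should live outside the web root. It also shows the `Authorization` header a browser sends after the login prompt: Basic auth only base64-encodes the credentials, so the protected directory should be served over HTTPS.

```python
import base64


def basic_auth_htaccess(realm: str, user_file: str) -> str:
    """Render .htaccess directives that password-protect the current directory."""
    return "\n".join([
        "AuthType Basic",
        f'AuthName "{realm}"',           # text shown in the browser's login prompt
        f"AuthUserFile {user_file}",     # file created with the htpasswd tool
        "Require valid-user",            # any user listed in that file may enter
    ])


def basic_auth_header(user: str, password: str) -> str:
    """The Authorization header value a browser sends after the login prompt."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Placeholder realm and path; adjust to your own server layout.
print(basic_auth_htaccess("Backups", "/home/example/.htpasswd"))
print(basic_auth_header("admin", "secret"))  # Basic YWRtaW46c2VjcmV0
```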

joshua

  • Newbie
  • Posts: 36
Re: Robots.txt?.....
« Reply #9 on: June 05, 2009, 03:57:16 AM »
You don't use robots.txt for SEO. It's used to block parts of your website from being crawled.

Kelly

  • Newbie
  • Posts: 37
Re: Robots.txt?.....
« Reply #10 on: June 05, 2009, 04:23:45 AM »
If you block pages with robots.txt, those pages will be dropped from the index. If those pages are not in the index and they link to any of the 20 pages that are NOT blocked, then those 20 unblocked pages will not get credit for those inbound links and their link text, and the blocked pages cannot pass PR to your 20 indexed pages.

IMO, it is a very bad idea to keep the engines from indexing the majority of your site. You could also pick up long-tail traffic from some of those other pages.

Gibraltar

  • Newbie
  • Posts: 24
Re: Robots.txt?.....
« Reply #11 on: June 05, 2009, 05:13:12 AM »
If you use robots.txt, it stops crawlers from crawling the pages you have blocked in it.

nickdnerd

  • Newbie
  • Posts: 17
Re: Robots.txt?.....
« Reply #12 on: June 05, 2009, 06:25:37 AM »
Working with Google Webmaster Tools is the best way to get a new site indexed.


Powered by SMF 2.0.3 | SMF © 2011, Simple Machines