Surfing Robots.txt Files, eh?

This page is dedicated to my new pastime, surfing robots.txt files. Give yourself a pat on the back if you visited this page via the robots.txt file. Surfing robots.txt files for fun isn't exactly a mainstream pastime, you have to admit, but hey, if you're here then I'm not the only one, right?

Okay then, so what is this page about?

Well, I'm still working on that one. I have settled on two main ideas, though. The first is to share information about cool stuff I've found while surfing these robots.txt files and how I came across it. The second is to include a few links to good resources on robots.txt files. The second point is mainly covered by robotstxt.org, so I won't go into too much detail about it at the moment.

Places to go, things to see.

If a site won't let you see its robots.txt file, why not do a bit of historical searching on the Wayback Machine? For example, an old Yahoo robots.txt can be found here: Old Yahoo robots.txt.
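If you want to try the same trick on another site, here is a rough Python sketch that builds a Wayback Machine address for an archived robots.txt. The function name wayback_url and the default date "2005" are my own inventions for illustration. The address pattern (web.archive.org/web/ followed by a date and the original URL) is the one the archive uses, and it normally redirects to the nearest snapshot it holds, but treat that as an assumption and check the result in your browser.

```python
def wayback_url(site: str, timestamp: str = "2005") -> str:
    """Build a Wayback Machine address for a site's archived robots.txt.

    `timestamp` is a date prefix such as "2005" or "20050101"; the archive
    is expected to redirect to the closest snapshot it actually has.
    """
    return f"https://web.archive.org/web/{timestamp}/http://{site}/robots.txt"


if __name__ == "__main__":
    # Print an address to paste into a browser; nothing is downloaded here.
    print(wayback_url("www.yahoo.com"))
```

This only builds the address; whether a snapshot exists for a given site and year is something you find out by visiting it.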

Inspiration for this page

During discussions in the alt.www.webmaster usenet group I read and replied to a few posts about robots.txt files. While considering some of the replies, I visited a few sites to find out what their robots.txt files looked like. Occasionally while doing this I came across a few interesting pages, so I started looking for more and more pages to visit. Okay, I admit it probably doesn't sound like the most enthralling of hobbies, does it? Just try it, though. You'll be seeing parts of websites many people don't even know are there, and it's a strange feeling. Another thing that started me off on this, I guess, was coming across infiltration.org. It has nothing to do with computers, but just reading some of the stories made me want to do something a little out of the ordinary.

Become a part of it!

Go on, you know you want to! Below are a few of my ideas for sharing discoveries and cool things we have found while preserving the spirit of exploration and discovery.

How to take part

Genesis

Well, first of all, start your own disallowed section to put your discoveries in! I am not going to turn this page into a comprehensive listing of cool sites people send in to me; make your own. You don't even have to put up discoveries. Put up something cool that isn't available anywhere else on your site. We're going for exclusivity here.
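To give an idea of what a disallowed section looks like, here is a small Python sketch using the standard library's urllib.robotparser. The /hidden/ folder, the example.com addresses and the helper name is_allowed are all made up for illustration; use whatever folder name you like on your own site.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt with one disallowed section.
# "/hidden/" is a hypothetical folder name, not a convention.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hidden/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "AnyBot") -> bool:
    """Return True if a well-behaved robot called `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/hidden/cool.html"):
        print(url, is_allowed(ROBOTS_TXT, url))
```

Polite robots will skip anything under the disallowed folder, while a person reading the robots.txt file sees exactly where to look.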

Fundamental concepts

For starters, navigating people's hidden content isn't a job for Google. I don't remember hearing about Tim Berners-Lee inventing the search engine; hypertext is about linking. Found another webpage with hidden content like yours? Why not link to it?

People who've joined in.

This is where I will put the addresses of websites whose owners have deliberately added content for other robots.txt surfers.

What should I NOT put in a robots.txt file?

Do not put folders you are trying to keep secure in a robots.txt file! It is for excluding robots that are trying to index your site, not for implementing security procedures. In many ways, putting something in a robots.txt file is an invitation to visit the page, both for unscrupulous web robots (those used by email harvesters, for example) and for geeks like me who are interested in doing that kind of thing.
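To show how little effort it takes to read that invitation, here is a short Python sketch that pulls every Disallow path out of a robots.txt file's text. The function name disallowed_paths and the sample paths are hypothetical, and the parsing is deliberately simple rather than a full implementation of the robots.txt rules.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect the path from every non-empty Disallow line."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow means "nothing is disallowed"
                paths.append(value)
    return paths


# Made-up sample; the folder names are for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /private-admin/   # listing it here tells everyone it exists
Disallow:
"""

if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE):
        print(path)
```

Anything a webmaster lists this way is readable by anyone who asks for the file, which is exactly why it is the wrong place for secrets.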

Hey, I'm an angry webmaster whose stuff you linked to!

Okay, chill out, anger's bad for your health. Send me an email requesting that your stuff be removed from the page.

By the way, it would be helpful to remember that robots.txt is not a security mechanism; it is a set of instructions for automated indexing bots. If you want to keep parts of your site secure, implement a real security procedure.

[Note: My intent is not to visit CGI programs and the like, just cool stuff that webmasters don't want search engines to index. It is not necessarily wrong for a person to visit this stuff; robots.txt is there to exclude automated bots so they don't do recursive searches etc. (that's why it's called robots.txt and not people.txt).]

Miscellaneous Links