Google is now reported to be using a robots.txt file on its blogs and other Google sites to block non-Google search engines. Google claims a business motto of “Don’t Be Evil.” (*See the robots.txt directives used to block search engines at the bottom.)
Google is automatically evil. To most Arabs and fellow anti-Semites, Google in Hebrew is evil; to the Chinese government, Google’s listings of Falun Gong’s websites are evil; to the North Korean government, everything about Google News is evil; and to Fidel Castro and his fellow Latin despots, all Google searches that show the economic outcomes of communism are evil.
But a corporation is not a moral institution, so my complaint is about Google’s managerial stupidity. Blocking other search engines for Google’s presumed benefit does three things: it invites retaliation, it will destroy Google’s positive reputation in short order once the press understands it, and it will dissuade other companies and good employees from joining Google in its ventures.
*Jason Ducek: “Search engines use 'web robots' or 'spiders' to mirror the web into their own private mini-copy of the web, and that's where they actually perform the searches.

“The robots.txt file is placed in any folder whose contents you do not want any robot to snarf up. This comes from the early days of the web, when spiders were getting into all kinds of trouble. A spider's job is to visit every link it finds on every page on a site and download everything on every page. Sometimes a spider would visit a university website and meander into a section of the site with lots of data, like a part of the site full of astronomy data or something like that. The spider would be stuck there for days, downloading megs and megs of stuff while simultaneously clogging the network for legitimate users.

“The robots.txt file is supposed to be acknowledged by all robots whenever they see it, although there are evil robots that regularly ignore it. The major search engines in general do not ignore the robots.txt file.
“About the code in particular:

User-agent: *
# The star means 'any'. This line reads: "no matter what kind of
# robot you are, this file applies to you!"

Disallow: /base
# Anything on this server under '/base', which is to say any URL
# beginning with 'http://base.google.com/base', is not to be
# read by the robot.”
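A quick way to see what those two lines actually block: Python’s standard-library urllib.robotparser module applies the same rules a well-behaved robot does. The sketch below simply parses the quoted directives; the crawler name "MyCrawler" and the sample URLs are illustrative assumptions, not anything Google publishes.

from urllib import robotparser

# The directives quoted above, as a well-behaved crawler would read them.
rules = """
User-agent: *
Disallow: /base
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Any user agent ("*") is barred from URLs whose path starts with /base.
print(parser.can_fetch("MyCrawler", "http://base.google.com/base/feeds"))  # False
# Paths outside /base are not covered by the rule, so they remain fetchable.
print(parser.can_fetch("MyCrawler", "http://base.google.com/about"))       # True

A robot that honors the file makes a check like can_fetch before requesting each URL and skips anything that comes back False; an "evil" robot simply never asks.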