Singing Potatoes
Thursday, 20 November 2003
Rushing in where robots fear to tread

Take a look at the White House's robots.txt file. This is the file that prevents (well-behaved) search engines and archivers from reading particular files and directories.

I wonder, why don't they want public documents about Iraq to be indexed and archived? A lot of the Iraq-related directories they're blacklisting don't even exist. That seems a bit odd to me.
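For anyone curious how a well-behaved crawler decides what to skip, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `example.gov` URLs and the `ExampleBot` user agent are all made up for illustration; they are not copied from the White House's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real White House file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /news/iraq
Disallow: /search
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler honoring robots_txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.gov/news/iraq/briefing.html",
        "http://example.gov/news/index.html",
    ):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, url))
```

Note that a `Disallow` line is only a prefix match against the URL path, and it is purely advisory: the documents stay readable by anyone who asks for them directly.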

Addendum: When you see files numbered sequentially, but discover that certain numbers aren't linked on the index page, it's interesting what you can find when you go looking for the missing ones.
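Spotting the gaps is easy to do by hand, but as a rough sketch it could look like the following. The function name and the sample numbers are hypothetical, and it assumes you have already collected the numbers that the index page does link to.

```python
def missing_numbers(linked):
    """Return the numbers absent from a sequence between its lowest and highest entry."""
    present = set(linked)
    if not present:
        return []
    # Walk the full range and keep whatever the index page did not link.
    return [n for n in range(min(present), max(present) + 1) if n not in present]


if __name__ == "__main__":
    # Hypothetical file numbers found on an index page.
    print(missing_numbers([1, 2, 4, 7]))
```

This only finds gaps inside the observed range; it cannot tell you about files numbered below the first link or above the last one.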

Posted by godfrey
Comments
You wouldn't want the general public to know anything other than what the White House is officially saying now, would you? My goodness - you're asking people to think for themselves and maybe formulate their own opinion other than what Fox News is spewing out. It's outrageous, I tell you. People could hurt themselves actually reading the "truth" about what the government is doing.
But these are publicly accessible documents. Anyone can read them (well, except for the directories which don't exist).

The question is, why don't they want the documents to be easily located through Google, or stored by sites like the Internet Archive?

Maybe they want the public to actually get away from their collective computer and go to the library. This way the information has to be re-input to be spread around. It's a conspiracy.
It has to do with "the end of combat in Iraq." Yes, that's how they originally wrote it. They then changed it to "end of major combat," and denied that it was written any other way. So people pointed at the Google archives, which had the original.

If you don't have search, you don't have archives and you can rewrite history however you see fit.

Surely you don't believe our government would engage in historical revisionism, like we were the Union of Soviet Socialist Republics?

They did in this case; that's undeniable. They changed a four-month-old press release so it read "major combat" instead of just "combat." And then denied it was changed. This was documented in the Washington Post by Dana Milbank.

However, it does seem as though the robots.txt problem was just an error; see here for more info, and here for the response they gave an employee of the Internet Archive. Essentially, they said "just ignore robots.txt; we want you to archive us." Sorry I didn't have this info earlier. I went looking for it last night but couldn't find it 'til this morning.
Well, as to the first link, why would it matter if Google gives people two links to a document instead of just one?

And as to the second, if they didn't actually want the prohibitions to be paid attention to, why put them in at all? Well, never attribute to malice what can adequately be explained by incompetence, I suppose.

(And yeah, I knew there'd been some revisionism. But that's just so nineteen years ago.)