Untrusted content, nofollow, etc.

This entry was published at least two years ago (originally posted on August 26, 2004). Since that time the information may have become outdated or my beliefs may have changed (in general, assume a more open and liberal current viewpoint). A fuller disclaimer is available.

Phil Ringnalda pointed to an idea that Ian Hickson tossed out while brainstorming ways to fight the ever-increasing problem of comment spam.

I’m thinking that HTML should have an element that basically says “content within this section may contain links from external sources; just because they are here does not mean we are endorsing them” which Google could then use to block Google rank whoring. I know a bunch of people being affected by Web log spam would jump at that chance to use this element if it was put into a spec.

Personally, I’d love to be able to wrap the comments section of my individual entry pages in something like this. It also reminds me of a technique I used back when my website ran on my own web server. At the time, I had a good number of pages that weren’t part of the weblog, so rather than using MovableType’s built-in search engine, I used the Fluid Dynamics Search Engine (May 9 2019 update: This link is now dead and has been removed).

FDSE is a very solid system, and one of the features I liked was an extra FDSE-specific tag that let an author designate sections of a page for the search engine to ignore when scanning it. In addition to respecting the standard robots meta tag values (index, noindex, follow and nofollow) for a full page, FDSE also lets you place those directives inside HTML comments to mark off areas of a page that should be treated differently from the page as a whole.

For instance, on my individual entry archive pages, the only really important content as far as a search engine is concerned is the entry itself. Since the sidebar in my design is repeated on every page on the site, there’s no good reason for a search engine to include that text in its database for every page, so I would wrap the entire sidebar in a noindex, nofollow declaration.
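To make the idea concrete, here is a minimal Python sketch of how an indexer might honor section-level directives. The post doesn't give FDSE's actual comment syntax, so the `<!-- robots: ... -->` and `<!-- /robots -->` markers below are hypothetical, as are the names `scan` and `PageScan`; this illustrates the general technique, not FDSE's implementation. Unmarked parts of the page get the defaults (index, follow), while a marked section such as the sidebar is skipped for indexing, link-following, or both, depending on its directives.

```python
import re
from dataclasses import dataclass, field

# Hypothetical markers (NOT FDSE's real syntax), e.g.
#   <!-- robots: noindex, nofollow --> ... <!-- /robots -->
OPEN_RE = re.compile(r"<!--\s*robots:\s*([a-z, ]+?)\s*-->", re.I)
CLOSE_RE = re.compile(r"<!--\s*/robots\s*-->", re.I)
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.I)
TAG_RE = re.compile(r"<[^>]+>")


@dataclass
class PageScan:
    text: list[str] = field(default_factory=list)   # words to index
    links: list[str] = field(default_factory=list)  # URLs to follow


def _apply(chunk: str, directives: set[str], result: PageScan) -> None:
    """Index and/or follow one region of the page per its directives."""
    if "noindex" not in directives:
        result.text.extend(TAG_RE.sub(" ", chunk).split())
    if "nofollow" not in directives:
        result.links.extend(LINK_RE.findall(chunk))


def scan(html: str) -> PageScan:
    """Scan a page, treating marked sections differently from the rest.

    Sections may not be nested in this sketch; an unclosed section
    runs to the end of the page.
    """
    result = PageScan()
    pos = 0
    while pos < len(html):
        m = OPEN_RE.search(html, pos)
        if m is None:
            # No more marked sections: the rest uses page defaults.
            _apply(html[pos:], {"index", "follow"}, result)
            break
        _apply(html[pos:m.start()], {"index", "follow"}, result)
        directives = {d.strip().lower() for d in m.group(1).split(",")}
        end = CLOSE_RE.search(html, m.end())
        stop = end.start() if end else len(html)
        _apply(html[m.end():stop], directives, result)
        pos = end.end() if end else len(html)
    return result


page = (
    '<p>Entry about <a href="/trackback">TrackBack</a></p>'
    "<!-- robots: noindex, nofollow -->"
    '<div id="sidebar">Archives <a href="/archives">all</a></div>'
    "<!-- /robots -->"
)
r = scan(page)
print(r.text)   # ['Entry', 'about', 'TrackBack']
print(r.links)  # ['/trackback']
```

With the sidebar wrapped this way, its repeated text never reaches the index, so a search for a term like "TrackBack" matches only pages whose actual entry content mentions it.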

I’d also do the same for things like the TrackBack section headers that appear on every page. Because they were repeated on every single archive page, searching for an actual discussion of TrackBack was nearly impossible. Once I hid that section header from FDSE, though, it was very easy for me to find discussions about TrackBack, since FDSE indexed only the actual content of each page rather than every little bit of text the page contained.

I’ve wished for a long time that Google either supported a way to do the same thing or simply adopted FDSE’s method. According to FDSE’s author, he submitted his technique to Google as a suggestion quite a few years ago, but never heard anything more about it.

Maybe Ian’s suggestion will get something moving in this direction again. Here’s hoping, at least.

iTunes: “Never Say Never (Hot Tracks)” by Romeo Void from the album Edge, The Level 1 (1995, 5:47).