Getting in Google's good graces

One of the constant topics that many webmasters and webloggers are concerned with these days is Google: how to increase your site’s standing in Google’s eyes and thereby drive more traffic to your site. I use a number of techniques on my weblog, both in the code and in how I create entries, that help Google get the most useful information out of my pages.

While I’ve mentioned some in the past, the subject recently came up in a thread on the TypePad User Group, and I shared some of my methods in that thread. At the request of both Liza and Richard, who have also been posting about this topic, I’m re-posting my post (post-haste, though not post-mortem, and definitely not postpartum) here…

Still, I’m amazed to read that you had 1,000 per day BEFORE MS made you a web celeb (boo! to them). Do you think those hits came from your blogging subject or from special tactics you engaged in to increase your site traffic?

A little bit of both, probably.

First off, it’s not so much my subject, as my lack of subject. ;) Because I’ve never really focused on any specific topic for my blog, and just randomly babble about whatever crosses my mind, that gives Google a lot of potential keywords to pick up on.

Also, I’ve been at this for about three years now, so I’ve got a fairly large archive section, which also increases the probability of any given keyword turning up in a search.

As far as special tactics, there’s a few techniques I’ve picked up on over the years that seem to help (some of which you covered in your post).

  1. Descriptive headlines as a page title. The title of a webpage scores very highly in Google’s ranking scheme, so I generally try to make sure that my post titles are descriptive of what I’m posting about (“Lord of the Rings Trailer” rather than “This is cool!”), and I make sure that the post title is included in the page title.

    I believe that TypePad is set to include post titles in page titles for individual archives by default, but some weblog tools (including MovableType in its early stages, I believe, though I could be wrong) only include the site name for every page title, so instead of a site containing 1000+ differently named pages, you’d end up with a site containing 1000+ pages all named “My Weblog”, which doesn’t give Google nearly as much to work with.

  2. Setting a consistent structure for the code on each page. As HTML was designed to emulate (though not visually replicate) the structure of a printed document, it includes structural elements such as several levels of heading. As Google pays attention to these when it scans a document, it often helps to use them correctly.

    In the past, rather than using the <h1>, <h2>, etc. elements for headlines, division markers, and so on, many sites would use <font> tags to give their subdivision headings the look they wanted. Now that the <font> tag has been deprecated and we can use CSS to style every element on a page the way we want, it’s good to return to using structurally correct markup. In addition to making a site much easier to code, it also assists Google in determining the structure, topic, and relevance of any given page.

    For each individual archive page on my site, I’ve structured it as follows:

    1. <title>: website name > post title

    2. <h1>: website name

    3. <h2>: website ‘tagline’

    4. <h3>: post title

    5. <p>: post body

    6. <h3>: trackback

    7. <h4>: trackback source

    8. <p>: trackback body

    9. <h3>: comments

    10. <h4>: comment author

    11. <p>: comment body

    12. <h3>: comment posting form

    This gives each page a clearly delineated, easy to read structure that tells both the reader and Google which parts of the page are the most important and the most relevant to the topic of the page.

  3. Link descriptively. Simply, this involves using natural language for your links so that the link text is descriptive of what it points to. For instance, saying “The new Lord of the Rings trailer is out!” instead of “You’ve gotta see this!” gives Google more information about what you’re linking to.

    This carries a double benefit, in that not only does it give Google better information about your own page, it also tells Google more about the page you’re linking to, which helps out whoever is on the target end of your link.

  4. Alt text on all images. This is important for a few reasons. First off, it lets Google know what each image is so that Google can include it more reliably in their image search feature. Secondly, though, and more importantly, it greatly improves the readability of your site for people with disabilities using specialized browsers to read the web.

    Blind users can use a “screen reader” to read websites — this is a specialized browser which translates the text to audio, and reads the page to them. Without alt text, all that screen reader can do is give them the name of the graphic, and might end up telling them something like “Image named funnypicture.jpg”. With alt text, they’ll instead hear something like “Image named Gimli falls off his horse”.

  5. Use the excerpt field to create useable descriptions. While keywords are no longer recognized by Google, another <meta> tag in the <head> section of your document still is (I think), which helps Google determine the topic of the page, and that’s the ‘description’ tag. What I’ve done is put this code into the <head> of each individual archive:

    <meta name="description" content="<$MTEntryExcerpt$>" />

    I then make sure to take a moment to create an excerpt for each entry as I’m making it that relates to the topic of the post, rather than just relying on TypePad’s auto-generated excerpt (which generally just grabs the first n words of each post).
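For anyone who wants to spot-check their own pages against the list above, here is a minimal sketch in Python using only the standard library. It pulls out the page title, the heading outline, the description meta tag, and any images without alt text. The class name `PageAudit` and the sample page are made up for illustration, and what Google actually does with these elements is not something this script can tell you.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collects the title, heading outline, description and alt-less images."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.outline = []             # (tag, text) pairs in document order
        self.images_missing_alt = []  # src values of images with no alt text
        self._current = None          # tag whose text is being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in self.HEADINGS:
            self._current, self._buffer = tag, []
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            # Note: this also flags alt="" on purely decorative images.
            self.images_missing_alt.append(attrs.get("src"))

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        else:
            self.outline.append((tag, text))
        self._current = None


# Hypothetical individual archive page, loosely following the structure above.
SAMPLE = """
<html><head>
<title>My Weblog: Lord of the Rings Trailer</title>
<meta name="description" content="The new trailer is out." />
</head><body>
<h1>My Weblog</h1>
<h2>Random babble</h2>
<h3>Lord of the Rings Trailer</h3>
<p><img src="funnypicture.jpg" />
<img src="gimli.jpg" alt="Gimli falls off his horse" /></p>
<h3>Comments</h3>
<h4>A reader</h4>
</body></html>
"""

audit = PageAudit()
audit.feed(SAMPLE)
print(audit.title)
print(audit.description)
for tag, text in audit.outline:
    print(tag, text)
print("Missing alt text:", audit.images_missing_alt)
```

If the outline that comes back is empty, or every page reports the same title, that is a hint the templates need the kind of attention described above.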

Anyway, there’s a few of the things I do which seem to help my site visibility. Mostly, though, I think a lot of it just boils down to the fact that after three years of babbling, I give Google a lot to work with. ;)

The trickiest zen on the menu

I wanted to take a moment to point out Pops’ domain, 2 Hour Lunch. I discovered his site at some point during the TypePad beta testing process, and he’s become one of my favorite reads. He’s got a wonderful writing voice, and it’s not at all uncommon for his posts to elicit grins or laughter.

Here’s a wonderful bit from this past week, taken from “Creepy? Check! Kooky? Check!”:

Kids?

Kids are a dump truck full o’ work.

Mr. Man is lively, academically gifted, and a first class nerd. He is endlessly curious and self-motivated. He’s a remarkable conversationalist if you’re over 30. Under 30 – you suspect he’s a midget. Testing has shown that Mr. Man has the reasoning and logic skills of someone 10 years older.

His school has called several times to say, “He has no social skills.” and I respond, “Well, no need for a paternity test then, is there?!?”

Silence ensues.

Mr. Man needed surgery when he was 18 months old. He’s been so sick we never thought he would ever get better again. Just by the mere fact that he is a new human being in this world, he has found no end of ways to scare the living shit out of us.

We knew that would happen.

We just didn’t know when or how.

Parenting is amazing. Parenting is torture. Parenting is like any intense relationship you’ve ever had in your life. It drives away your future expectations and makes you live very much in the moment.

It’s the trickiest zen on the menu.

The two of us see all of it as a grand adventure sorta like the Jungle Boat Ride. We take turns making bad jokes and Mr. Man is the foreign tourist who doesn’t understand a word of it.

And every day we get out of bed and go forward from there.

Stop by and say hi.

Three Years

Today marks my three-year anniversary of weblogging. Technically, I’ve actually been at this for a bit longer than that — since sometime in 1998 or 1999 — but at that point I was just updating a static HTML page by hand, and much to my dismay, I lost my archives of those pages some time ago. So, for all practical purposes, I’m just dating back to my first archived post, from Nov. 25 2000.

I’ve been slowly working on moving all of my old archives over into my TypePad account for the past couple months, with a goal of having them all online by today. Thankfully, that happened, and I now have all three years of archives — 1,949 individual posts (an average of 1.78 posts per day) — online and available for perusing.
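For the curious, the posts-per-day figure can be checked with a couple of lines of Python. This is just a sketch of the arithmetic, and it assumes the count runs from the first archived post on Nov. 25, 2000 up to the three-year anniversary on Nov. 25, 2003.

```python
from datetime import date

first_post = date(2000, 11, 25)   # first archived post
anniversary = date(2003, 11, 25)  # assumed end of the counting period
total_posts = 1949

days = (anniversary - first_post).days
posts_per_day = total_posts / days
print(days, round(posts_per_day, 2))  # prints "1095 1.78"
```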

As I’ve worked my way through them all, I’ve highlighted a few at the top of my archives page as “Greatest Hits”. These aren’t necessarily the most-visited posts on the site. Rather, they’re posts that I find notable or especially worth visiting for one reason or another. Here’s a rundown of what I consider the highlights of the past three years:

1/9/2001: Words of Wisdom
One of the few pieces of forwarded e-mail that I’ve ever liked enough not just to keep, but to post. Just a good list of advice and observations worth keeping in mind.
1/17/2001: Things to remember while e-mailing
Another forward that I found worth saving. A good list of things that everyone should keep solidly in mind before passing on the latest virus warning, plea for help, or urban legend that lands in their e-mail inbox.
4/20/2001: About my tattoo
Some fairly bad pictures of me, but decent pictures of my tattoo. I’ve never been much of one for body modification — no piercings, and this is my only tattoo — but after finding the design years ago and giving it roughly five years of consideration, I decided that I’d found something worth permanently adding to my body.
5/24/2001: Mars needs a facelift!
Pure, unadulterated silliness. After finding some new pictures of the famous “face” on Mars, I decided to go all-out and see how well I could do at coming across as a flaming loony conspiracy theory nut. Apparently I did fairly well, as when I originally posted this on another discussion board, a few people commented that until the end of the post when I admitted that I was just fooling around, they actually believed that I was frightfully serious about what I was writing. There’s not much higher praise than that.
2/28/2002: Where were you?
A list of important historical dates, and my recollections of where I was when they happened and how they affected me. Some dates weren’t remembered all that clearly; the ones that really stuck with me range from the Challenger explosion to the Sept. 11 attacks.
3/2/2002: Hippies on Mars!
Another bout of Mars-inspired silliness. A false-color image of the Martian poles that reminded me of tie-dyed clothing patterns inspired this “press release” about Grateful Dead fans traveling across the plains of Mars. As far as I know it’s purely coincidental that I had two Mars-inspired bits of creative writing.
7/20/2002: Best of times, worst of times
Looking back at my experiences with people who went from being friends of mine to being roommates from hell. It’s always an odd time of my past to look back on, as it’s a strong combination of fond memories and things that at times I’d rather be able to forget.
10/28/2002: George
Much as I love cats, my brother’s cat George is the only cat I’ve ever met that I just couldn’t get along with. Completely and utterly psychotic. This is one of George’s more amusing moments in life.
3/2/2003: Sleep — from the painting by Salvadore Dali
A piece I wrote during my junior year of high school, inspired by a Salvador Dalí painting. As can be expected from something written around thirteen years ago, there are definitely things that I would do differently were I writing it now, but I’ve always liked what I came up with enough to leave it unaltered since its original inception.
3/3/2003: Just hang up
I’m not a fan of cell phones at all. I’ll only have one if required and paid for by my job, which has only happened once. One of the things that drives me up the wall is how incredibly rude many people can be when it comes to cell phones, and this rant was born from that frustration.
5/6/2003: Cynicism reigns supreme
5/8/2003: Darwin has left the building
A pair of posts exploring one of my more cynical beliefs — that the human race is essentially throwing evolution out the window and breeding itself into oblivion. Some very interesting discussion arose out of these posts.
5/29/2003: Glitch
So far, my first foray into ‘fanfic’. Initially inspired by a dream I had after watching “Matrix: Reloaded”, it explores what might happen if someone accidentally tapped into a debugging routine in the Matrix without really realizing what was going on.
6/1/2003: Newly Digital (Back in the Day Redux)
My contribution to Adam Kalsey’s ‘Newly Digital’ project, looking back on my early experiences with computers, technology, and the internet, and some of the weird and wonderful things I’ve seen over the years since these glowing screens first caught my attention.
7/9/2003: The Purity Test
I first discovered the Purity Tests on a BBS while I was in High School, and have always found them to be quite entertaining. Download the test (100, 500, 1000, and 2000 question versions) and find out just how morally, ethically, and sexually pure you are.
7/31/2003: Blogstop
Wordgame fun. Construct a post from the letters of the last word in the immediately preceding post. It’s easier just to take a look and figure it out as you go.
10/29/2003: Fifteen Minutes of Fame
I look back on the first day or two of notoriety after news of my brush with Microsoft exploded across the ‘net.

I’m sure there are more goodies buried in my archives that are also worth dredging up from time to time. Some may be of more interest than these, most will be of less. These are just the ones that I find to be most worthy of calling attention to. If you’ve read any of them before, feel free to either just move along or take another look. If any of these are new to you, I just hope you like what you find.

Here’s hoping I’ve got another three years of this — or more — left in me.

The biggest cause of failure is success

Mike is doing some brainstorming on how to predict and cope with bandwidth spikes when a post or page suddenly becomes a popular destination.

When a blogger’s work becomes successful enough to, for a moment, graze the underbelly of commercial publishing, it threatens the very low-cost predicate of the publication itself.

Setting aside for the moment the absurdity of the situation, which is clear, it seems to me that over the past few years we’ve seen this exact phenomenon occur over and over again. I’m guessing, now that media people have integrated the blogosphere into their information gathering practices, we’ll see it with greater frequency and to more devastating effect over time.

(Image: My bandwidth as of 11/23/03)

As I recently discovered, this is a very real worry. I’d joked in the past about the “perfect post”, that one blog entry that suddenly exposes a site to the world and brings in all the traffic that so many people wish that they had — but actually stumbling upon that “perfect post” has made it very clear just how much of a double-edged sword that can really be.

In Mike’s ruminations on how things like this can be coped with, he mentioned something that sounded like a possibility…

…I think there is a proactive business opportunity for the right business to defray these transient bandwidth costs, probably in the form of short term ads on the sites that are experiencing the bolus. […] I will note that it might even be cooler yet if this feature enabled Google keyword ads. Maybe it should be an independent service, or a program that the keyword service provides for bloggers, who are currently more or less specifically discouraged from using it.

I applied for Google AdSense at one point, but they turned me down. While it was a bit of a bummer, it wasn’t much of a surprise, as Google doesn’t seem to want to accept most weblogs into their AdSense program. It seems that if you run a very tightly-focused weblog on a specific topic (such as PVR Blog or Daring Fireball) you’ve got a good chance of being accepted, but less-focused weblogs (such as mine, yours, the one you’re going to read next, or the other 99% of the blogosphere) will be denied. Unfortunately, the exact methodology or reasoning behind the approval/denial process is more than a little unclear.

There’s a far more serious problem with AdSense, though. The approval system is capricious, even arbitrary. It’s understandable that Google wants to make sure sites aren’t just ad farms, and it’s in everyone’s interest that quality be maintained, ideally by human verifiers. Nobody wants to see those sad Red Cross PSAs that take the place of house ads on poorly-indexed sites.

The human verification process at Google, though, is uncharacteristically opaque. I’d assume they factor in the ads which would run on a site before approving or denying an application, and if I take a look at , I see some of value. Ads specifically targeted to weblog software, Manhattan computer repair, New York hotels. These all seem relevant and valuable to me, but I’ve been repeatedly rejected.

It’s not just sour grapes on my part. Take NYC Eats, a great little niche weblog. Aaron’s brilliant little AdSense senser shows , which makes sense since the letters “NYC” by themselves cost two dollars a click. But no AdSense approval there. The problem is the wording in the program policies:

In general, we do not accept personal pages, chat sites, or blogs into the AdSense program. However, if a site contains targeted, text-based content and/or provides a product or service, we may consider it for participation.

In a perfect world (well, my perfect world, that is), of course, Google would open up their AdSense program to the weblogging world at large. While their AdSense ads might be a little random on the main page of a site due to the random nature of the main page posts not giving clear, concise keywords to work with, if a site design includes individual archive pages then each individual post should have enough keywords to target a specific ad category (my Mac-specific posts would get Mac-centric ads, my political posts would get political-centric ads, and so on).

If they don’t want to do that, though, what if Google set up an agreement with TypePad (or other for-pay hosting sites) in which, in order to offset the cost of bandwidth spikes, Google AdSense ads could be (semi-)automatically added to a site when it reached a certain bandwidth point (90% of its available monthly bandwidth per its agreement, for example)? Each auto-generated template could include code something along the lines of <MTAdSense><!-- include "/ads/google/adsense.inc" --></MTAdSense> that would be automatically triggered by the TypePad servers when bandwidth exceeded whatever the cutoff point was. Any revenue generated by clicks on the ads would automatically be siphoned to TypePad and applied to offset the costs of the extra bandwidth usage during the spike.

There could even be a toggle in the TypePad preferences that allowed a site author to insert a “registration key” if they were accepted by the Google AdSense program that would enable the AdSense ads on a full-time basis. In this case, Google would send any revenue to the site author as per their usual setup, instead of sending it to TypePad.

Just an idea. Workable? I haven’t got a clue — barriers include the coding of the feature (while I’m no program-level coder, it doesn’t strike me as being too terribly difficult of a feature to enable), inclusion of the feature into already-existing weblogs (not difficult for TypePad Basic, Plus, or Pro levels using the auto-generated templates, Pro levels using advanced templates would need to add the requisite code themselves), and — most importantly (and possibly most difficult) — Google and TypePad (or, of course, whatever other hosting service that might be interested) negotiating the partnership. Still, if it could be worked out, I think it could be useful and beneficial to the blogging community at large.
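To make the idea a little more concrete, here is a rough Python sketch of the decision the hosting service would have to make for each site. Everything in it is hypothetical: the `Site` fields, the `ad_policy` function, and the 90% default are stand-ins for illustration, not anything TypePad or Google actually offers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Site:
    monthly_allowance_mb: float        # bandwidth included in the hosting plan
    used_mb: float                     # bandwidth consumed so far this month
    adsense_key: Optional[str] = None  # hypothetical "registration key" toggle


def ad_policy(site: Site, threshold: float = 0.90) -> Tuple[bool, Optional[str]]:
    """Return (show_ads, revenue_recipient) for a site right now."""
    # An author already accepted into the ad program runs ads full time
    # and keeps the revenue.
    if site.adsense_key:
        return True, "author"
    # Otherwise ads appear only once usage crosses the cutoff, and the
    # revenue goes to the host to offset the spike.
    if site.monthly_allowance_mb > 0:
        if site.used_mb / site.monthly_allowance_mb >= threshold:
            return True, "host"
    return False, None


quiet = ad_policy(Site(monthly_allowance_mb=5000, used_mb=1200))
spiking = ad_policy(Site(monthly_allowance_mb=5000, used_mb=4750))
approved = ad_policy(Site(monthly_allowance_mb=5000, used_mb=1200, adsense_key="example-key"))
print(quiet, spiking, approved)
```

The only real design choice here is that the author's own key takes priority over the automatic trigger, so an approved site never has its revenue redirected to the host.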

Back from the Meetup

Just got back a bit ago from this month’s Seattle weblogger Meetup. Saw and chatted with quite a few people there (most of whom I have to admit I can’t remember names/sites of), including Anita, Scoble, and dayment, who was kind enough to give me a couple CDs (Tones on Tail’s “Night Music” and The Faint’s “Danse Macabre”)! All in all, a quite pleasant evening.

November Weblog Meetup

For the first (and quite likely only) time, I’m going to be able to attend one of the local Weblogger’s Meetup events, as my training schedule for this Wednesday has me off work at 6pm. Once I start my 1pm-9:30pm schedule I’ll be missing them again, but at least I can make this month’s.

So, for any local Seattle bloggers, looks like I’ll be seeing some of you at Uptown Espresso, Wednesday evening at 7pm!

(via Anita)

Major referrer spam attack

Looks like there’s a major referrer spam attack going on at the moment. The sites in question look like real weblogs but aren’t — instead, many of them have similar (or the same) content, comment and trackback links just link back to the home page, they have a hidden graphic that leads to porn sites, and many (if not all) of them are stealing their designs from other weblogs. Ignore (or block, if you can) any and all referrer links you might see in your logs from:

http://www.a-b-l-o-g.com/
http://www.akksess.com/
http://www.bongohome.com/
http://jennifersblog.com/
http://kwlablog.com/
http://www.malixya.com/
http://mikesplace.com/
http://www.saulem.com/
http://www.teoras.com/
http://www.websearchde.com/
http://www.websearchus.com/
http://www.worldnewslog.com/
http://www.wr18.com/

More information on this at A Preponderance of Evidence, MetaFilter, idly.org, milov.nl, and probably more.
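If your stats package lets you post-process referrers, the list above can be turned into a simple filter. The following is a minimal Python sketch and not a drop-in for any particular log analyzer. The function names are made up, and treating the "www." and bare forms of each host as the same site is my assumption.

```python
from urllib.parse import urlsplit

# Hosts taken from the list above, stored without the leading "www.".
SPAM_HOSTS = {
    "a-b-l-o-g.com", "akksess.com", "bongohome.com", "jennifersblog.com",
    "kwlablog.com", "malixya.com", "mikesplace.com", "saulem.com",
    "teoras.com", "websearchde.com", "websearchus.com",
    "worldnewslog.com", "wr18.com",
}


def is_spam_referrer(referrer: str) -> bool:
    """True if the referrer URL points at one of the listed hosts."""
    host = (urlsplit(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SPAM_HOSTS


def filter_referrers(referrers):
    """Drop spam referrers, keeping the rest in their original order."""
    return [r for r in referrers if not is_spam_referrer(r)]


# Example referrers; the non-spam entries are invented for illustration.
seen = [
    "http://www.wr18.com/",
    "http://example.org/some-post.html",
    "http://kwlablog.com/archives/000123.html",
    "-",
]
print(filter_referrers(seen))
```

An exact host match is deliberately strict, so a legitimate site that merely mentions one of these names in its URL path will not be filtered out.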

Fight Link Rot!

link rot n.

The natural decay of web links as the sites they’re connected to change or die.

Calpundit has an excellent summary posted on how to link to New York Times articles without having the links succumb to link rot. This should be required reading for all bloggers, IMNSHO — citing sources is important, and it’s best if the sources don’t later disappear.

Update: Even better than Calpundit’s method (as good as it is) is the New York Times Link Generator! Just feed it the URL of a NYT story, and it will generate the link rot proof version of the URL for use in your weblog. Thanks to Aaron Swartz for providing this, and to Jason Kottke for pointing it out in Calpundit’s comment thread.

Calpundit also breaks down the most archive-friendly (i.e., least susceptible to link rot) sources:

  1. Tier 1: CNN, the Guardian, and the BBC all have permanent archives that never disappear.
  2. Tier 2: The Washington Post places old articles behind an archive wall, but previously existing links to the articles work forever. The New York Times makes permanent links possible, even if they’re a bit of a pain.
  3. Tier 3: The LA Times places all its content behind an archive wall after a few days and breaks any existing links.
  4. Purgatory: The Wall Street Journal is in a class by itself, since their content is never accessible free of charge on the Web.