'go to referrer' bookmarklet

I just stumbled across this wonderful little trick, thanks to Scott.

Ever opened up a link in a new tab or a new window, then left it alone for a while? Come back to it in a couple hours, figure it’s interesting enough to blog about, then realize that you can’t remember just whose site pointed you to the link? It’s a minor annoyance, at least if you try to link back to your sources.

Here’s a bookmarklet to solve the problem: go to referrer. Drag that into your bookmarks, and choosing it will snap you back to whatever page sent you to the link, even in new tabs or windows. Very handy.
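For the curious, the basic trick behind a bookmarklet like this is a single line of JavaScript along these lines (a rough sketch of the idea, not necessarily exactly what Scott’s version does):

javascript:if(document.referrer){location.href=document.referrer;}else{alert('No referrer recorded for this page.');}

All it does is check the referrer the browser recorded for the current page and, if there is one, send you back to it.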

Blaster

Y’know what?

I never got touched by the Blaster worm that’s taking down machines all over the place (yes, I own a PC as well as a Mac).

Y’know why?

When that little “Windows Update” icon blinks at me on my PC, I pay attention to it, download, and install the updates. It’s a pain in the butt, especially since there seems to be a new “critical update” every week, but sometimes those critical updates really are critical.

I’m sorry that this is hitting so many people. But at the same time — look, you’re dealing with Microsoft software. Bugs aren’t an unfortunate side effect, they’re a guarantee. The patch for this particular exploit has been available for over a month on Microsoft’s site. Rant at Microsoft all you want for writing shitty software (it’s often well deserved), but at least in this instance, the vulnerability was discovered, publicized, and patched in plenty of time to protect your computers well before Blaster was released.

Yeah, so I’m a little snarky this evening. I spent all day barely restraining myself from beating the ever-loving crap out of my work PC for various other bugs and oddities, which has left me in no great frame of mind when it comes to PCs or Microsoft in general. But at least in this one instance, they did what they could.

Three hours later…

You may have noticed that I’ve put a surprising number of posts up for this early in the day. That’s simply because I’ve spent the past three hours watching Windows XP chew through security updates, software patches, and other sundry changes to the OS. Running a web browser was about as intense an activity as I wanted to tax the machine with during that process.

Now, three hours later, I can finally get to work doing what they pay me for — but that’s only because I got sick of watching a stalled progress bar, force-quit the Windows Update program, and told it to sod off. My security updates were done anyway; it was just chewing on some less critical patches, so I’m not too worried.

Frustrated, and quite willing to toss the computer out a window, if only I had one.

But not worried.

Cameras in classrooms

When students in Biloxi, Miss., show up this morning for the first day of the new school year, a virtual army of digital cameras will be recording every minute of every lesson in every classroom.

Hundreds of Internet-wired video cameras will keep rolling all year long, in the hope that they’ll deter crime and general misbehavior among the district’s 6,300 students — and teachers.

You know, I’m honestly not sure what I think of this. On the one hand, the “Big Brother” aspect of constant video surveillance creeps me out, in a big way. On the other hand, when used effectively, I could see there being some really strong advantages to the technology.

The USA Today article about this is actually surprisingly good, too (is USA Today getting better? I’ve always seen them as the ‘lowest common denominator’ of news. Anyway…).

“It helps honest people be more honest,” says district Superintendent Larry Drawdy, who, along with principals and security officers, can use a password to view classrooms from any computer. In an emergency, police also can tune in.

This is one of the quotes that creeps me out, and I think it’s entirely the wrong attitude to take. If you’re planning on using the cameras to supervise the teachers or students, then just admit it — but trying to put a falsely positive spin on it with ridiculous statements like this just raises my hackles. I like to think that I’m a fairly honest person, but a camera isn’t going to help me be more honest. It’s not going to encourage me to be less honest, either. It’s just there, a mild annoyance.

Though Biloxi’s camera system hasn’t captured serious crimes, Drawdy says it has “prevented a lot of things from happening”…

Another ridiculously empty statement. What has it prevented? Well, we don’t know, because we prevented it. But if the cameras weren’t there, it would have been hell! I swear it! Ugh. I don’t suppose Drawdy learned his PR skills from the Bush administration’s WMD search?

Webcams have popped up in a few Defense Department schools on U.S. military bases, allowing soldiers deployed overseas to look in on their children’s classrooms and even chat via two-way setups. Teachers in London are calling for Webcams in every classroom so parents can see children’s behavior from home.

This is another aspect that gives me the willies. Aren’t kids ever allowed to be out of the eyes of their parents? How are children ever supposed to learn how to interact with each other, with other adults, with the world in general, if they’re not allowed to do so on their own? Today’s society seems so absurdly obsessed with constantly micro-managing every last little aspect of its children’s lives (from cameras in classrooms to playgrounds that, while harmless, are also uniformly bland and boring) that kids don’t ever have a chance to be kids anymore. Sure, they’re going to screw up, get a few bruises, butt heads, and be little shits every so often. But they’re kids. That’s the point. They’ve got to learn, and they’ve got to have some freedom in order to do that.

“I’m there to work; I’m there to do my job,” says R. Scott Page, an earth science and photography teacher at Hanford High School in Richland, Wash. “I don’t have a problem with somebody seeing that I’m doing my job.”

Page, a former biology teacher, granted open access to anyone who wanted to view his classroom, no password required. He says families tuned in regularly and loved it. “You could see if the kid was wearing the same thing they left the house in that morning.”

Page often focused the camera on lab experiments so he and students could monitor them over the weekend. Students would log on when they were home sick, sending messages with questions.

“Any way that you can increase communication between home and school, you’re going to help students,” Page says. “That’s what it’s all about.”

Most of what this teacher has to say I like. I’m put off by the suggestion of checking up on kids’ clothing, but the rest of it is exactly what I think could be good about the availability of classroom cameras. Rather than just shoving a camera in a corner to be an ever-present watchful eye, he incorporated it into his teaching. Monitoring experiments over the weekend from home, letting students who are home sick participate virtually via webcam and IM — these are excellent examples of how to use technology in teaching.

All in all, I guess that’s a lot more cons than pros, isn’t it? Maybe I’m not so undecided on how I feel about this, though I’m not quite ready to commit to a solid stance. I guess it would come down to how any particular administration and teacher dealt with the technology. If it’s simply a Big Brother-style surveillance system, I have serious issues with it. But if a teacher can use the technology to the advantage of the class, that I can support.

Unfortunately, that may be an uncomfortably big “if”.

(via /.)

The MovableType/Mac conspiracy…

Another IM conversation, investigating the MovableType/Six Apart/Mac/Apple conspiracy…

Me: i’ve got a blogger account for a side project of mine, but it’ll probably be moving to TypePad pretty soon
Me: i can’t do anything on a free Blogger account, and if I’m going to give someone money, I’d rather have it be the Trotts

Phil: Keep it for testing at any rate, could you? I don’t really know anyone who uses Blogger and has a Mac.
Phil: Other than me.

Me: sure, will do

Phil: The Mac populace seems to prefer MT, interestingly. Except the people at Forwarding Address: OS X.
Phil: Hm…. maybe I could get Cory Doctorow as a beta tester. That’d be amusing.

Me: i’ve noticed that, actually – been pleasantly surprised at how often Macs get mentioned on TP blogs

Phil: Interesting correlation, really, if you think about it.
Phil: People who use Blogger often go on forums and curse about how unreliable and buggy it is.
Phil: People who use Windows often go on forums and curse about how unreliable and buggy it is.
Phil: People who use MT are often like “Look at this cool trick I can do with my blog!”
Phil: People with Macs are often like “Look at this cool trick I can do with my Mac!”
Phil: Do you see a trend?
Phil: I think maybe Movable Type is the Mac of the blogging world.

Me: i think you just get in a mindset…using computer == dealing with bugs (if you’re on the Windows side)

Phil: Same way with Blogger.
Phil: Using Blogger == dealing with bugs.
Phil: Oh!

Me: Is Six Apart the New Apple?

Phil: Yeah, I saw that.
Phil: And (using Blogger/using windows) == no help at all from the parent company.
Phil: Well, except the UNIX geeks and developers.

Me: ‘zactly
Me: and us Mac users are spoiled by the “It Just Works” syndrome

Phil: True.

Me: MT “just works” – and you never have to deal with the underlying code if you don’t want to
Me: OS X “just works” – and you never have to deal with the terminal if you don’t want to
Me: but in both cases, if you do want to, a whole world of new toys and possibilities open up

Phil: Hacks, plugins, new applications you’d never even thought of.
Phil: And I could be talking about either one with that last sentence.

Me: bingo

I think we’ve got something here!

Help wanted: Apache/PHP

I’m planning on sticking with TypePad as my weblog host once everything opens up officially (tomorrow, from the looks of it). However, this poses a bit of a problem. While I’m slowly moving all of my old posts from my old weblog to this new site, there are still lots of links scattered throughout the ‘net that point to the old addresses.

I think I know of a solution; however, I’m not well enough versed in the intricacies of Apache and PHP to pull it off on my own. So, I’m asking for help!

Here’s what I’d like to do…

All of my old posts reside on my personal server at http://www.djwudi.com/longletter/. It’s a Mac OS X computer running Apache, with PHP enabled.

I know that Apache can handle redirects, based on rules set up in the httpd.conf file. I also know that pattern matching and text string munging can be carried out in PHP.

All of my old individual entry pages are stored in my webserver with the following directory structure:

http://www.djwudi.com/longletter/archives/year/month/day/dirified_post_title.php
http://www.djwudi.com/longletter/archives/2003/07/31/help_wanted_apache_php.php

All of the pages on this new site are stored using a similar, but slightly different directory structure:

http://djwudi.typepad.com/eclecticism/year/month/truncated_title.html
http://djwudi.typepad.com/eclecticism/2003/07/help_wanted_apa.html

What I’m envisioning for the final system is this:

  • Anytime my webserver receives a request for a page that resides within the ‘/longletter/archives/’ directory, Apache redirects to a customised PHP script on my server.
  • That script does three things:
    1. Presents a simple page to the user with wording to the effect of “This site has moved, one moment while we redirect you…”.
    2. Looks at the requested URI and converts it to what the new URI should be. As I’ve kept post titles consistent, and the directory structures are similar, this should be fairly easy with the right regular expressions.
      1. Parse the requested URI.
      2. Remove everything before the 4-digit year and replace it with the new base address.
      3. Remove the 2-digit day.
      4. Truncate the post title to fifteen characters.
      5. Remove the .php extension and replace it with .html.
    3. Redirects the user’s browser to the new, correct URI.
  • Hey presto, we’re done — no matter which page was linked to at my old site, the user has been redirected to the corresponding page at my new site.

More brainstorming:

  • The above method works well for links going to individual pages, but what about category archives or the main index page itself?
  • Could the PHP script be made smarter? For instance…
    1. If the requested URI contains the year/month/day/title.php string, then the above transformation and redirect is processed.
    2. If the requested URI contains any other string (in other words, it doesn’t point to a specific post), then a page is presented that says something along the lines of “This site has moved, one moment while we redirect you to the new site…”, and a redirect is passed to the user’s browser that points to the index page of the new weblog.

Anyway, that’s what I’d like to do. It all seems straightforward enough in my brain, and I think that the technology I have available should be able to handle it all without a problem — I just don’t have the faintest idea how to code it.
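To make it a bit more concrete, here’s the kind of thing I’m picturing. Fair warning: this is a rough, untested sketch. The ‘moved.php’ filename is just something I made up, and I’m guessing at the mod_rewrite syntax, so treat it as a starting point rather than a working answer.

First, in httpd.conf, a rule along these lines would hand every request under the old archives directory off to a script:

RewriteEngine On
RewriteRule ^/longletter/archives/ /longletter/moved.php [L]

And the script itself would do the actual translation:

<?php
// moved.php -- untested sketch of the URI translation described above
$old = $_SERVER['REQUEST_URI'];

// Individual entry pages look like /longletter/archives/YYYY/MM/DD/post_title.php
if (preg_match('#/archives/(\d{4})/(\d{2})/\d{2}/([^/]+)\.php$#', $old, $m)) {
    // Keep the year and month, drop the day, truncate the title to fifteen
    // characters, and swap the .php extension for .html
    $title  = substr($m[3], 0, 15);
    $newUri = "http://djwudi.typepad.com/eclecticism/{$m[1]}/{$m[2]}/{$title}.html";
} else {
    // Anything else (category archives, the main index, and so on) just goes
    // to the new weblog's front page
    $newUri = 'http://djwudi.typepad.com/eclecticism/';
}
?>
<html>
<head><meta http-equiv="refresh" content="3;url=<?php echo $newUri; ?>" /></head>
<body>
<p>This site has moved; one moment while we redirect you to
<a href="<?php echo $newUri; ?>"><?php echo $newUri; ?></a>…</p>
</body>
</html>

Something in that general neighborhood, anyway.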

Any and all advice, hints, tips, or straight-up solutions would be greatly appreciated. I’m not rich enough to offer untold wealth or cool prizes or anything, but I can offer much gratitude, public thanks and kudos, and probably pizza and beer (or a PayPal donation to a ‘pizza and beer’ fund, or some such thing).

And you won’t even have to fight me for the beer — I can’t stand the stuff. ;)

Help search engines index your site

We all know that Google is god. Chances are you’ve used Google when doing a search on the ‘net at least once, if not daily, or many times a day. If not, then I’ve heard rumors that there are other search engines out there — though I haven’t used any in so long, I can’t really vouch for the veracity of that rumor. ;)

I wanted to share a few tricks I use here to help Google (and other search engines) index my site, and to try to ensure that searches that hit my site get the most useful results.

All of the following tips and tricks do require access to your source HTML templates (in TypePad, you’ll need to be using an Advanced Template Set). While I’m writing this for an Advanced TypePad installation, the tips will work just as well in any other website or weblog application where you have access to the HTML code.

Specify which pages get indexed, and which don’t

What? One of the most important pages on a weblog from a user’s point of view is the main page. It has all your latest posts, all the links to your archives, your bio, other sites you enjoy reading, webrings, and who knows what else. However, from the perspective of a search engine, the main page of a weblog is most likely the single least important page of the entire site!

This is simply because the main page of a weblog is always changing, but search engines can only give good results when the information that they index is still there the next time around. I’ve run into quite a few situations where I’ve done a search for one term or another, and one of the search results leads to someone’s weblog. Unfortunately, when I go to their page, the entry that Google read and indexed is no longer on the main page. At that point, I could start digging through their archives and trying to track down what I’m looking for — but I’m far more likely to just bounce back to Google and try another page.

Thankfully enough, though, there’s an extremely easy fix for this that keeps everyone happy.

How? One short line of code at the top of some of your templates is all it takes to solve the problem. We’re going to be using the robots meta tag in the head of the HTML document. The tag was designed specifically to give robots (or spiders, or crawlers — the automated programs that search engines use to read websites) instructions on what pages should or shouldn’t be indexed.

For the purposes of a weblog, with one constantly changing index page and many static archive pages, the best possible situation would be to tell the search engine to read and follow all the links on an index page (so that it finds all the other pages of a site), but not to index that page. The rest of the site, it will be free to read and index normally.

That’s very easy to set up, as it turns out. The robots meta tag allows four possible arguments:

  • INDEX: Read and index a page normally
  • NOINDEX: Do not index any of the text of the page
  • FOLLOW: Follow all the links on a page to read linked pages
  • NOFOLLOW: Ignore all links on a page

So, in order to do what we want, we add the following meta tag to our document, in the head section, right next to the meta tags that are already there:

<meta name="robots" content="noindex,follow" />

Now, when a search engine robot visits the index page of the site, it knows that it should not index the page or add it to its database; however, it should follow any links on that page to find other pages within the site. This way, searches that return hits for the site will be sure to find your archive pages for the information that is requested, rather than your front page, which may not have the information anymore.

Update: It turns out that this technique may have some side effects that I hadn’t considered, and might possibly not work at all. For more details, please scroll down to Anode’s comment and my reply in the comment thread for this post. Hopefully I’ll be able to dig up more information on this soon.

Fine tune what sections of a page get indexed

What? There is a proposed extension to the robots meta tag that allows you to not just designate which pages of a site get indexed, but also which sections of a page get indexed. I discovered this when I was setting up a shareware search engine for my old website, and have since gotten in the habit of using it. Now, this is not a formal standard, and I don’t know for sure which search engines support it and which don’t — the creator of this technique has suggested it to the major search sites, but it is not known what the final result was.

Now, why would you want to do this? Simply this: on many weblogs, including TypePad sites, the sidebar information is repeated on every page of the site. There is also certain informational text repeated on every page (for instance, the TrackBack data, the comments form, and so on). This creates a lot of extraneous, mostly useless data — doubly so when that information changes regularly.

By using these proposed tags, any search engine that supports them will only index the sections of a page that we want indexed, and will disregard the rest of the page.

How? Because this is based on the robots meta tag discussed above, it uses the same four arguments (INDEX, NOINDEX, FOLLOW, and NOFOLLOW). Instead of using a meta tag, though, we use HTML comment syntax to designate the different sections of our document.

For instance, every individual archive page on a TypePad weblog that has TrackBack enabled will have the following text (or something very similar):

Trackback
TrackBack URL for this entry:
http://www.typepad.com/t/trackback/(number)

Listed below are links to weblogs that reference (the name of the post)

In order to mark this out as a section that we want the search engine not to index and not to follow (as the only link is to the page that the link is on), we would surround it with the following specialized tags:

<!-- robots content="noindex,nofollow" -->
<!-- /robots -->

For example, I would change the code in the TypePad Individual Entry template to look like this:

<MTEntryIfAllowPings>
<!-- robots content="noindex,nofollow" -->
<h2><a id="trackback"></a>TrackBack</h2>
TrackBack URL for this entry:<br /><$MTEntryTrackbackLink$>
Listed below are links to weblogs that reference <a href="<$MTEntryPermalink$>"><$MTEntryTitle$></a>:
<!-- /robots -->
<MTPings>

The same technique can be used wherever you have areas in your site with content that doesn’t really need to be indexed.

Now, as I stated above, this is only a proposed specification, and it is not known which (if any) search engines support it. It also requires a healthy chunk of mucking around with your template code. Because of these two factors, it may not be an approach that you want to take, instead simply using the “sledgehammer” approach of the page-level robots meta tag discussed above.

However, I do think that the possible benefits of this being used more widely would be worth the extra time and trouble (at least, for those of us obsessive about our code), and I’d also suggest that, should TypePad gain search functionality, these codes be recognized and followed by the (purely theoretical, at this point) TypePad search engine.

Put the entry excerpt to use

What? The entry excerpt is another very handy field to use in fine tuning your site. I believe that the field is turned off on the post editing screen by default, but it can be enabled by clicking on the ‘Customize the display of this page’ link at the bottom of the post editing screen.

By default, the entry excerpt is used for two things in TypePad: when you send a TrackBack ping to another weblog, the excerpt is sent along with the ping as a short summary of your post; and it is used as the post summary in your RSS feed if you have selected the ‘excerpts only’ version of the feed in your weblog configuration. However, it can come in handy in a few other instances too. One that I’ve discussed previously is in your archive pages. However, the excerpt can also be used to help out search engines.

You may have noticed that when you do a search on Google, rather than simply returning the link and page title, Google also returns a short snippet of each page that the search finds. Normally, this snippet is just a bit of text from the page being referenced, intended to provide some context and give you a better idea of how successful your search was. There is a meta tag that lets us determine exactly what text Google displays for that summary, though — which is where the entry excerpt field comes in.

How? We’re adding another meta tag here, so this will go up in the head section of your Individual Archives template. Next to any other meta tags you have, add the following line:

<meta name="description" content="<$MTEntryExcerpt>" />

Then save, and republish your Individual Archives, and you’re done. Now, the next time that Google indexes your site, the excerpt will be saved as the summary for that page, and will display beneath the link when one of your pages comes up in a Google search.

So what happens if you don’t use the entry excerpt field? Well, TypePad is smart enough to do its best to cover for this — if you use the <$MTEntryExcerpt$> tag in a template, and no excerpt has been added to the post, TypePad automatically pulls the first 20 words of your post to be the excerpt. While this works to a certain extent, it doesn’t create a very useful excerpt (unless you’re in the habit of writing extremely short posts). It’s far better to take a moment to create an excerpt by hand, whether it’s a quick cut and paste of relevant text in the post, or whether it’s more detailed (“In which we find out that…yadda yadda yadda.”). In the end, of course, it’s your call!

Use the Keywords

What? Keywords are short, simple terms that are either used in a page, or relate to the page. The original intent was to place a line in the head of an HTML page that listed keywords for that page, which search engines could read in addition to the page content to help in indexing.

Unfortunately, keywords have been heavily abused over the years. ‘Search Engine Optimizers’ started putting everything including the kitchen sink into their HTML pages for keywords in an effort to drive their pages’ rankings higher in the search engines. Because of this, some of the major search engines (Google included) now disregard the ‘keywords’ meta tag — however, not all of them do, and used correctly, they can be a helpful additional resource for categorizing and indexing pages.

How? One of the various fields you can use for data in each TypePad post is the ‘Keywords’ field. I believe that it is turned off by default, however you can enable it by clicking on the ‘Customize the display of this page’ link at the bottom of your TypePad ‘Post an Entry’ screen.

Once you have the ‘Keywords’ field available, you can add specific keywords for each post. You can either use words that actually appear in the post, or words that relate closely to it — for instance, I’ve had posts where I’ve used the acronym WMD in the body of the post, then added the three keywords ‘weapons mass destruction’ to the keywords field. You never know exactly what terms someone will use in their search, so you might as well give them the best shot at success, right?

Okay, so now you have keywords in your posts. What now? By default, TypePad’s templates don’t actually use the data in the Keywords field at all. This is fairly easy to fix, however.

In your Individual Archives template, add the following line of code just after the meta tags that are already there:

<meta name="keywords" content="<$MTEntryKeywords$>" />

Then save your template, republish your site (you can republish everything, but doing just the Individual Archives is fine, too, as that’s all that changed), and you’re done! Now, the next time that a search engine that reads the keywords meta tag reads your site, you’ve got that much more information on every individual post to help index your site correctly.

Conclusion

So there we have it. One extremely long post from me, with four hopefully handy tips for you on how you can help Google, and the rest of the search engines out there, index your site more intelligently. If you find this information of use, wonderful! If not…well, I hope you didn’t waste too much of your day reading it. ;)

Feel free to leave any questions, comments, or words of wisdom in the comments below!

Our friend, the humble 'title' attribute

Earlier this evening, I got an e-mail from Pops asking me how I created the little tooltip-style comment text that appears when you hover over links in my posts. I ended up giving him what was probably far more information than he was expecting, but I also figured that it was information worth posting here, on the off chance it might help someone else out.

It’s actually a really easy trick, though not one built into TypePad. Simply add a title attribute to the link itself. For instance, if I wanted the text “Three martinis and a cloud of dust” to appear when someone hovered over a link to Pops’ site, I’d code it like this:

<a href="http://2hrlunch.typepad.com/" title="Three martinis and a cloud of dust">Two Hour Lunch</a>

The end result looks like this (hover over the link to see the title attribute in action):

Two Hour Lunch

That little title attribute comes in wonderfully handy, too, as it can be applied to just about any HTML tag there is.

For instance, good HTML coding includes alt text for all images, so that if someone has image loading turned off in their browser, or if the image fails to load for any other reason, there will be some descriptive text to tell them what gorgeous vistas they are missing. However, in most browsers the only time that text shows is if the image doesn’t load. By using the title attribute in addition to the alt attribute when adding images, we can create that same style of comment when someone hovers over the image. For example:

<img src="lalala.gif" width="360" height="252" alt="NOTICE: I'm not listening!" title="La la la la la la!" />

That way, when displayed in the browser, if the image didn’t load, the text ‘NOTICE: I’m not listening!’ would show instead. In addition, the text ‘La la la la la la!’ will appear if someone lets their cursor pass over the image. Not a necessary thing, but it can be fun for quick, pithy little comments. Here’s the example:

NOTICE: I'm not listening!

Another place I use the title attribute fairly regularly is when I make changes to a post after it’s first posted. HTML includes two tags (<ins> and <del>, for insert and delete, respectively) for marking up changes to text. When I go back in to edit a post after it first appears on my site, I use those tags with a title attribute to indicate when the change was made.

For example, suppose I posted the following:

Pops is a screaming loony, who shouldn’t be allowed within twenty yards of anyone who isn’t equipped with body armor and a machete.

Later, coming to my senses, I could change that like this:

Pops is a <del title="7/30/03 10pm: I think I was on drugs when I wrote this.">screaming loony, who shouldn't be allowed within twenty yards of anyone who isn't equipped with body armor and a machete</del> <ins title="Here's what I meant to say...">great guy, whose website has pointed me to some fascinating tidbits on a regular basis</ins>.

(I hope Pops doesn’t mind the sample text here.) ;)

On screen, after the update, the deleted text would display as struck through, and the inserted text would display underlined (standard editing notation), with the comments displaying on a cursor hover, like this:

Pops is a screaming loony, who shouldn’t be allowed within twenty yards of anyone who isn’t equipped with body armor and a machete great guy, whose website has pointed me to some fascinating tidbits on a regular basis.

So there ya go — more information on the humble little ‘title’ attribute than you probably ever wanted or needed to know. I hope it helps!

Update: (See? There’s a title attribute right there!) As of this writing, the title attribute is barely supported in Apple’s new web browser, Safari. Titles on links will appear in the status bar at the bottom of the window if the status bar is turned on, but that’s it. No other title text will be visible. I’m hoping that this is fixed in a later update to Safari, but for the moment, that’s what we have to work with.

Pet Peeves

Can we please please please stop using the target="new" attribute in links? I don’t want a new window. If I do want a new window, then I’ll right-click and use the “Open link in new window” command. But I don’t want you deciding that I must want a new window, just because you don’t want me taking the oh-so-horrid step of actually (gasp) leaving your site!

If you’ve got a good site, I’ll use the “back” button and come back. If you don’t have a good site, I’m not likely to come back no matter what the circumstances. But by constantly forcing every link to open in a new window and taking control of how I browse away from me, you’re a lot more likely to piss me off to the point where I won’t come back than if you just let me browse normally.

Thank you for your time.

(And on that note, yes, I know that clicking on someone’s name if they leave a comment here does the exact thing I’m bitching about. I haven’t figured out how to get around that yet. If I can, you can be damn sure that I’m turning that little “feature” off.)

Update: Usability Guru Jakob Nielsen also hates this practice, so I’m not alone. Opening new windows for links breaks items one and two of his Top 10 new mistakes of web design article. So there. Bleah. :P