Weekend Project: Keyword Search

Boy, have I got a project for the weekend!

While mulling over the ideas and vague concepts behind my tag-categorization wishlist from the other day, I figured it was worth poking around in the Movable Type Support Forums to see if I could find anything of use. A search for ‘keywords’ led me to one thread, which then led me to these posts by ishbadiddle — and that looks to be (nearly) exactly what I’ve been looking for!

Here’s his blog entry on his keyword subject indexing work:

My thinking about the Semantic Web was influenced by Paul Ford’s piece on the subject, which imagines the power of Google harnessing the Semantic Web to make even more money. There’s a good article on the Semantic Web on wikipedia. Basically, it’s adding metadata (data about the data) to web pages. In our case, it’s simply adding “subject” data to each blog post, and then harnessing that to create an index of posts that relate to that subject. Think of it this way: the Category system is like the Table of Contents of a book, listing chapter headings. The Keyword system is like the Index of a book, one that is constantly updated.
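To make the book-index comparison concrete, here is a minimal sketch in Python. It is not ishbadiddle’s Movable Type setup (that is built from templates and plugins); the sample posts, the `keywords` field, and the `build_keyword_index` name are all made up for illustration.

```python
from collections import defaultdict


def build_keyword_index(posts):
    """Map each keyword to the sorted titles of the posts tagged with it."""
    index = defaultdict(list)
    for post in posts:
        for keyword in post["keywords"]:
            # Lowercase so "Spam" and "spam" land in the same index entry.
            index[keyword.lower()].append(post["title"])
    # Sort keywords and titles so the result reads like a book index.
    return {keyword: sorted(titles) for keyword, titles in sorted(index.items())}


# Hypothetical sample data: each post carries its own list of subject keywords.
posts = [
    {"title": "Weekend Project: Keyword Search", "keywords": ["Movable Type", "keywords"]},
    {"title": "ecto 2", "keywords": ["ecto", "Amazon"]},
    {"title": "Veronica Moser, Type Key Spammer", "keywords": ["spam", "Movable Type"]},
]

index = build_keyword_index(posts)
for keyword, titles in index.items():
    print(f"{keyword}: {', '.join(titles)}")
```

A post can show up under as many keywords as it likes, which is the difference from a category listing: the index is rebuilt from whatever subjects the posts carry at the time.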

So, plan for the upcoming weekend:

Print out ishbadiddle’s instructions, download and install the required plugins (ifEmpty, Loop, Compare, Collate, and Regex), hack the search functions, and then start pounding away on my templates.

About the one downside I can see to this is that I may have to go back to static rendering of my pages rather than the dynamic rendering I’m using now, but I’m okay with that (it’s all a tradeoff anyway; there are pros and cons to each approach).

It’ll be fun to get into geek mode for a little while as I work on this. I just hope I don’t break anything while I’m working on it…

iTunes: “Steamroller (Steaming Pig)” by Pigface from the album In Dust We Trust (1997, 3:22).

It’s official: Six Apart acquires LiveJournal

Looks like the rumors were true: Six Apart acquires LiveJournal.

Pertinent posts:

I’ve read none of these yet, as it’s after midnight and I need to get to bed. Should make for interesting reading when I wake up, though.

Update: Okay, I stayed up a bit later and read the posts. Good stuff there — there was a lot of FUD running around due to the rumors, and I think that the three posts above do a good job of dispelling that.

Congrats to both Six Apart and LiveJournal — I’m looking forward to seeing where you all go with this.

iTunes: “Get Off My Land” by Operatica from the album O Vol. 1 (2000, 5:05).

De-Lurker Day

Don't be a stranger...

Well, the day’s actually almost over, but I just now found out about this thanks to Carla saying hi — today’s been declared De-Lurker Day!

I know you’re out there. My stats tell me you’re out there. And today you have a once in a lifetime chance to let your presence be known.

Well, okay, technically you have that chance every day, but today is De-Lurking Day! A special day celebrating lurkers, and exhorting you to muster the strength and bravery to click on that comment button and end the deafening silence.

So say hi, or tell me your wish for 2005, or what you’re having for lunch, or your diabolical plan for world domination– whatever.

Sounds good to me — so…who’s out there?

iTunes: “Crying from Outside” by Tear Garden, The from the album To Be an Angel Blind, the Crippled Soul Divide (1996, 7:03).

Veronica Moser, Type Key Spammer

While there’s a fair amount of chatter today about spammers shifting tactics away from comments and towards Trackback (which my linklog got hit with this morning, actually), I just ran into a different approach — my first TypeKey authenticated spammer.

In theory, enabling TypeKey is supposed to be one of the more effective ways of combating comment spam, as it presents a much higher (and supposedly non-scriptable) barrier to the spammer. As the Six Apart Guide to Comment Spam notes:

The worst case scenario…would be if a spammer created a TypeKey account, and used it to send spam to your weblog. However, because the first comment from any TypeKey user must be approved by you before being published, the only way a spammer could sneak spam onto your site would be to first submit a comment that appears to be legitimate. While it’s possible that some spammers might attempt this, it is highly unlikely that they would be able to do this using automated scripts. If they do and are reported to Six Apart, TypeKey’s terms of service allows us to disable their accounts.

Apparently, that’s just what has happened to me. I noticed a comment that fit the profile of a standard spam comment pop up in my comments RSS feed: all it said was “Very interesting,” and included a link to http://veronicamoser.com/. I didn’t have a clue who Veronica was, so I did a quick Google — the results were pretty telling.

Since this was the first time I’ve seen this type of attack, though, I went ahead and left the comment (though I did edit out the active link) and sent a quick note to Six Apart. I’m rather surprised that someone went through this much trouble — barring a new script attack, ‘Veronica’ would have had to sign up for a TypeKey account, visit my page, sign in to the TypeKey system, and then manually post the comment. I’m also fairly amused that they used the name ‘Type Key Spammer’ for their TypeKey profile — essentially thumbing their nose at authority, I suppose.

Of course, the one worry is if this might be a test case, and someone actually is working out a script to continue with the comment spam attacks even in the face of TypeKey authentication. We can always report the offending TypeKey account to Six Apart, of course, but if the spammers keep creating new accounts…well, it’ll just be one more side to the battle against spam.

Whee. :P

What do you write about?

From Samantha:

“What do you write about?”

This gives me pause. Do I give him the simple answer, or the complicated one? I’ve spent most of the last week by myself, pacing, waiting for things to happen. I’m impatient and, admittedly, a little grumpy. “I write about, um, memory.”

“I don’t understand.”

This doesn’t surprise me. “Well. You know how, when you have a memory, it’s really just a series of images that are vague, with a couple of points sticking out for reference? And then, when you try to put your memory in words, to tell it to someone, it comes out a little different than how you thought it looked in your head? What I do is, I try and find a way to make the words fit. I try to bring my life into focus. I bite my fingernails and try to tell people the contours of the jagged edges. You know. Like that.”

Sounds good to me — and, admittedly, it’s something I need to work on.

Maybe that’ll be a resolution for the upcoming year.

iTunes: “People Everyday (Reprise)” by Arrested Development from the album 3 Years, 5 Months and 2 Days in the Life of… (1992, 4:56).

Book of Blogs II

Earlier this month, I linked to a project by Tvindy to collect and anthologize some of the better weblog posts by a number of contributing authors. I really liked the idea, and nominated a few entries for potential inclusion.

Tvindy’s hoping for a little more assistance in culling worthwhile posts, preferably posts chosen by regular readers of the participating weblogs, rather than solely author-nominated work.

As readers, are there any posts that stand out in your mind as particularly noteworthy, for whatever reason? They don’t have to be long, or serious, or anything in particular aside from standing out in one way or another. If so, toss ’em in the comments here — if you don’t want to dig through the archives yourself, just throw up whatever details you can remember, and I’ll track it down.

Not only will this help the project, but I’d be interested to see what — if anything — comes out of this.

iTunes: “Conga Fury” by Juno Reactor from the album Bible of Dreams (1997, 8:06).

Mind Hacks

Just added to my daily reads: Mind Hacks, the companion blog to Tom Stafford and Matt Webb’s book Mind Hacks, recently released by O’Reilly.

Full of fascinating brain play (literally), like this post on how we perceive our sleeping habits:

Our own perception of how much we slept during a night can be startlingly inaccurate. Dr Allison Harvey (now of UC Berkeley) took insomniacs and measured how much they actually slept during the night. Despite the insomniacs reporting that they had only slept for two or three hours, they had in fact been asleep for an average of 7 hours – only 35 minutes less than a control group who didn’t have any problems sleeping.

This shows that insomniacs (and probably the rest of us) are very bad at judging the time it takes us to get to sleep, and the time we actually are asleep. It also suggests that worrying about sleep, and our beliefs about how we’ve slept, have a big role in the negative effects of what (we believe) is a sleepless night.

I’m looking forward to seeing what else pops up on their weblog, and I will definitely need to pick up the book as soon as I get a chance.

(via Boing Boing)

iTunes: “This Hollowed Ground” by Legendary Pink Dots, The from the album From Here You’ll Watch the World Go By (1995, 3:04).

Howdy, Wired readers!

Over a year after the incident, I’m getting another few seconds added to my fifteen minutes of fame: last week I was interviewed by phone by Wired, and their article hit the ‘net today:

What do a flight attendant in Texas, a temporary employee in Washington and a web designer in Utah have in common? They were all fired for posting content on their blogs that their companies disapproved of.

Aside from that leader being a wee bit misleading (I was let go by my previous employer, not the copy company I currently work for), it’s not a bad article.

Update: Wired was kind enough to slightly edit the introductory paragraph to clear up the wording a touch. Thanks much!

If there are any visitors hitting my site for the first time who might be curious about just what happened to me, I can direct you to my fifteen minutes of fame archives, and specifically, the photo, the day I was let go, and my wrapup and responses on the whole shebang.

And, of course, feel free to kick around and poke around the rest of the site. Nice to see you here!

A Book of Blogs

Thanks to Alicia, I just found out about this project of Tvindy’s:

With all the phenomenal writing that has appeared on our various blogs over the past several months, wouldn’t it be cool for us to get together and publish a physical anthology of our greatest posts?

The way I envision it is that several of us agree to participate and have a couple of their entries published in the anthology. Since most people (myself included) find it hard to evaluate their own work, we can make suggestions as to what the best entries of our fellow bloggers are and urge them to choose those. That should make for some interesting debates.

The final product would be a paperback, containing hopefully as many as fifty entries in no particular order. Each entry would identify the name (or pseudonym) of the author and the URL of her/his blog. We’d make a nice cover using combined artwork from various blogs, and there would be an introduction at the beginning explaining what the book was.

He’s got more thoughts on how to approach the project in his next three posts (make that four).

I think this sounds really good, and would love to contribute, if anything I have is deemed worthy of inclusion.

Taking a quick look at my recent Four Years post where I pulled out a lot of highlights, I’m thinking that the following posts would be most likely to work well:

If anyone else has any other nominations, though, I’d be glad to see them. Your views on the “best” posts as readers might be quite different than mine as author.

ecto 2

So ecto 2 is updated, and one of the nifty new features is Amazon integration. It’s pretty slick, with a handy little search window within ecto to find items, one-button posting once you’ve chosen what you want, and a few options for how you want the finished link to appear.

I do have one concern about the link format, though.

I’ve been very careful to make sure that all my Amazon URLs are formatted a specific way, after reading this from kottke early last year:

I’ve noticed lately that when I browse items at Amazon, the URLs now take one of two forms:

http://amazon.com/exec/obidos/ASIN/0684868768/
http://amazon.com/exec/obidos/tg/detail/-/0684868768/

The former URL style has been around for some time, but the latter is relatively new. If you’re an Amazon Associate, the proper way of linking to an individual item (per their linking guide) is to append your Associate code (mine is “0sil8”) to the first URL style, like so:

http://amazon.com/exec/obidos/ASIN/0684868768/0sil8

But if you run across an item at Amazon with the second type of URL, this won’t work:

http://amazon.com/exec/obidos/tg/detail/-/0684868768/0sil8

If you’ve linked to items using that style of URL (something I’ve seen on several sites), check your reports at Amazon…you’ll find that you’re not getting any Associates clickthroughs or credit for those purchases.

Obviously, since I would like to get credit for any clickthroughs I might receive (rare though they may be), I’ve been careful to always use the ‘ASIN’ format of link, and not the ‘tg/detail/-’ format.

While experimenting with ecto’s new Amazon integration, I put together a quick link to Neal Stephenson’s ‘Quicksilver’, and checked the URL — and, unfortunately, it came up with the ‘tg/detail/-’ format (though I’ve fixed it in that link). After poking around in ecto’s settings for the Amazon integration, I couldn’t find any end-user control over the links (that is, the URL format, not the format of the link itself) other than manually fixing them after they’re inserted. While this really isn’t a major dealbreaker — it’s essentially what I’ve been doing for a while anyway, and the ecto integration does make it much easier to find items — it does leave me with a few questions…

  1. Most importantly, does this still matter? Or does Amazon now give credit correctly for both styles of links? On the assumption that it does still matter…
  2. When constructing the link, does ecto receive the entire URL string from Amazon, or just the ASIN?
  3. If ecto only receives (or needs) the ASIN, can the URL string be changed to the ‘ASIN’ style of link in the next update to ecto?
  4. If ecto receives the entire string…
    1. …is the returned string always ‘tg/detail/-’ format, or does it switch between that and ‘ASIN’ format?
    2. If it’s always ‘tg/detail/-’ format, can that be automatically adjusted within ecto to ‘ASIN’ format?
    3. If it switches, can ecto watch for that string and adjust it when necessary?
  5. And lastly, according to the tail end of Jason’s post, adding ‘ref=nosim/’ before the Associate ID forces Amazon to skip the “You may also be interested in…” page and send you straight to the actual product page. Can an option be added to ecto to add the ‘ref=nosim/’ string in the right place for people who might worry/care about such a thing?

If not…well, I’ll live. I can hope, though! :)
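For the curious, the adjustment I’m asking about is simple enough to sketch in Python. This is only my guess at what such a rewrite could look like, not anything ecto actually does: the `to_asin_link` name, the `example-20` Associate ID, and the assumption that an ASIN is always ten letters or digits are mine, and whether Amazon credits either URL style is still the open question from item 1.

```python
import re

# Matches either URL style quoted above and captures the item's ASIN.
# Assumption: an ASIN is exactly 10 letters or digits.
AMAZON_ITEM = re.compile(
    r"^https?://(?:www\.)?amazon\.com/exec/obidos/"
    r"(?:ASIN|tg/detail/-)/([0-9A-Za-z]{10})(?:/.*)?$"
)


def to_asin_link(url, associate_id, nosim=False):
    """Rewrite an Amazon item URL into the 'ASIN' style with an Associate ID.

    URLs that match neither style are returned unchanged. Anything after the
    ASIN in the original URL (such as an old Associate ID) is discarded.
    """
    match = AMAZON_ITEM.match(url)
    if match is None:
        return url
    asin = match.group(1)
    # Per the tail end of Jason's post, 'ref=nosim/' goes before the Associate ID.
    prefix = "ref=nosim/" if nosim else ""
    return f"http://amazon.com/exec/obidos/ASIN/{asin}/{prefix}{associate_id}"


# 'example-20' is a placeholder Associate ID, not a real one.
print(to_asin_link("http://amazon.com/exec/obidos/tg/detail/-/0684868768/", "example-20"))
print(to_asin_link("http://amazon.com/exec/obidos/ASIN/0684868768/", "example-20", nosim=True))
```

Until something like that is built in, it’s back to fixing the links by hand after ecto inserts them.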

iTunes: “Leæther Strip Part II” by Leæther Strip from the album Penetrate the Satanic Citizen (1992, 6:00).