Weekend Project: Keyword Search

Boy, have I got a project for the weekend!

While running through ideas and vague concepts related to my tag-categorization wishlist of the other day, I figured it was worth poking around in the Movable Type Support Forums to see if I could find anything of use. A search for ‘keywords’ led me to one thread, which then led me to these posts by ishbadiddle, and that looks to be (nearly) exactly what I’ve been looking for!

Here’s his blog entry on his keyword subject indexing work:

My thinking about the Semantic Web was influenced by Paul Ford’s piece on the subject, which imagines the power of Google harnessing the Semantic Web to make even more money. There’s a good article on the Semantic Web on Wikipedia. Basically, it’s adding metadata (data about the data) to web pages. In our case, it’s simply adding “subject” data to each blog post, and then harnessing that to create an index of posts that relate to that subject. Think of it this way: the Category system is like the Table of Contents of a book, listing chapter headings. The Keyword system is like the Index of a book, one that is constantly updated.
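To make the Table of Contents versus Index comparison a bit more concrete, here’s a minimal Python sketch of a keyword index. It isn’t ishbadiddle’s method (his instructions work through Movable Type templates and plugins). The post dictionaries, the comma-separated `keywords` field, and the `build_keyword_index` name are all assumptions made up for illustration.

```python
from collections import defaultdict


def build_keyword_index(posts):
    """Map each keyword to the titles of the posts tagged with it."""
    index = defaultdict(list)
    for post in posts:
        # Assumption: keywords arrive as one comma-separated string per post.
        for raw in post["keywords"].split(","):
            keyword = raw.strip().lower()
            # Skip empty entries and avoid listing a post twice per keyword.
            if keyword and post["title"] not in index[keyword]:
                index[keyword].append(post["title"])
    # Sort alphabetically, the way a book index would be.
    return dict(sorted(index.items()))


# Hypothetical sample data, not real entries from the weblog database.
posts = [
    {"title": "Weekend Project: Keyword Search",
     "keywords": "movabletype, keywords, plugins"},
    {"title": "Wishlist: MT tag category plugin",
     "keywords": "movabletype, flickr, keywords"},
]

keyword_index = build_keyword_index(posts)
for keyword, titles in keyword_index.items():
    print(f"{keyword}: {', '.join(titles)}")
```

Each keyword ends up pointing at every post that mentions it, which is the “constantly updated index” part: add a post, rebuild, and the index picks it up.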

So, plan for the upcoming weekend:

Print out ishbadiddle’s instructions, download and install the required plugins (ifEmpty, Loop, Compare, Collate, and Regex), hack the search functions, and then start pounding away on my templates.

About the one downside I can see to this is that I may have to go back to static rendering of my pages rather than the dynamic rendering I’m using now, but I’m okay with that (it’s all a tradeoff anyway, there are pros and cons to each approach).

It’ll be fun to get into geek mode for a little while as I work on this. I just hope I don’t break anything while I’m working on it…

iTunes: “Steamroller (Steaming Pig)” by Pigface from the album In Dust We Trust (1997, 3:22).

It’s official: Six Apart acquires LiveJournal

Looks like the rumors were true: Six Apart acquires LiveJournal.

Pertinent posts:

I’ve read none of these yet, as it’s after midnight and I need to get to bed. Should make for interesting reading when I wake up, though.

Update: Okay, I stayed up a bit later and read the posts. Good stuff there — there was a lot of FUD running around due to the rumors, and I think that the three posts above do a good job of dispelling that.

Congrats to both Six Apart and LiveJournal — I’m looking forward to seeing where you all go with this.

iTunes: “Get Off My Land” by Operatica from the album O Vol. 1 (2000, 5:05).

De-Lurker Day

Don't be a stranger...

Well, the day’s actually almost over, but I just now found out about this thanks to Carla saying hi: today’s been declared De-Lurker Day!

I know you’re out there. My stats tell me you’re out there. And today you have a once in a lifetime chance to let your presence be known.

Well, okay, technically you have that chance everyday, but today is De-Lurking Day! A special day celebrating lurkers, and exhorting you to muster the strength and bravery to click on that comment button and end the deafening silence.

So say hi, or tell me your wish for 2005, or what you’re having for lunch, or your diabolical plan for world domination– whatever.

Sounds good to me — so…who’s out there?

iTunes: “Crying from Outside” by The Tear Garden from the album To Be an Angel Blind, the Crippled Soul Divide (1996, 7:03).

Veronica Moser, Type Key Spammer

While there’s a fair amount of chatter today about spammers shifting tactics away from comments and towards Trackback (which my linklog got hit with this morning, actually), I just ran into a different approach — my first TypeKey authenticated spammer.

In theory, enabling TypeKey is supposed to be one of the more effective ways of combatting comment spam, as it presents a much higher (and supposedly non-scriptable) barrier to the spammer. As the Six Apart Guide to Comment Spam notes:

The worst case scenario…would be if a spammer created a TypeKey account, and used it to send spam to your weblog. However, because the first comment from any TypeKey user must be approved by you before being published, the only way a spammer could sneak spam onto your site would be to first submit a comment that appears to be legitimate. While it’s possible that some spammers might attempt this, it is highly unlikely that they would be able to do this using automated scripts. If they do and are reported to Six Apart, TypeKey’s terms of service allows us to disable their accounts.

Apparently, that’s just what has happened to me. I noticed a comment that fit the profile of a standard spam comment pop up in my comments RSS feed: all it said was “Very interesting,” and included a link to http://veronicamoser.com/. I didn’t have a clue who Veronica was, so I did a quick Google — the results were pretty telling.

Since this was the first time I’ve seen this type of attack, though, I went ahead and left the comment (though I did edit out the active link) and sent a quick note to Six Apart. I’m rather surprised that someone went through this much trouble: barring a new script attack, ‘Veronica’ would have had to sign up for a TypeKey account, visit my page, sign in to the TypeKey system, and then manually post the comment. I’m also fairly amused that they used the name ‘Type Key Spammer’ for their TypeKey profile, essentially thumbing their nose at authority, I suppose.

Of course, the one worry is if this might be a test case, and someone actually is working out a script to continue with the comment spam attacks even in the face of TypeKey authentication. We can always report the offending TypeKey account to Six Apart, of course, but if the spammers keep creating new accounts…well, it’ll just be one more side to the battle against spam.

Whee. :P

Wishlist: MT ‘tag’ category plugin

Thanks to Flickr, I’m becoming more and more of a fan of keywords or ‘tags’ as categorization tools. Rather than having a set number of categories or sub-categories, tags are an amazingly simple way to categorize items (such as photos on Flickr, or links on del.icio.us [which I really need to look more closely at]) just by tossing whatever descriptive terms you want into the tag field.

What I want now is a way to use tags in my Movable Type installation rather than categories. I have no idea if this is even possible with the current plugin scheme, or if it would take a lot of lower-level source code hacking (seems like it might…I’m guessing you’d need to disable MT’s category system, replace it with the tag system, remove the Category drop-down menu from the MT interface and replace it with a field for inputting tags, incorporate a tag search feature, etc.), but I’d love to see it. Even better would be if enabling the tag system in MT would automatically create a dynamically-generated tags page similar to Flickr’s, with the top X (50? 100? 150? User-definable?) tags displayed using variable sizes, and a link to a full tag list.

Okay, I want to rip off Flickr’s entire tag system and use it on my MT blog. Imitation is the sincerest form of flattery, right? ;)

Of course, I can’t code a “hello world” application (well, maybe in BASIC, but not in anything more complex than that), let alone tackle a project like this. But I can dream.

Barring some kind soul figuring out how to shoehorn such a thing into MT, though, do any of the current weblogging tools support tag-based categorization? I’m not entirely sure if that one feature would be enough to tempt me away from MT, but it’s obviously bouncing around my brain enough to make me ask…

Addendum: Just before posting this, I looked at the ‘Keywords’ field in the MT interface. Hmmm. Maybe all we need is a plugin to parse and interact with the keyword field that’s already there? Damn, I wish I knew more about programming…. Ideas, anyone?

Later tonight I may see what resources I can find to toss this idea into the wider MT community and see if some bigger brains than mine feel like poking around with this.

Update: Ben Hammersley is doing something similar, only rather than being an internal categorization system, it uses keywords to link to del.icio.us tags. Not quite what I’m thinking.

A comment there led me to this directory — which might be close to what I’m thinking of, though as the documentation is little more than “put it in your plugins directory”, it’s a little hard to tell what it would actually do.

No solutions yet, but apparently others have at least started looking this direction, so there’s hope…

Update: Another piece of the puzzle, and this from someone who pokes their head in here from time to time: Dan has PHP code for a weighted keyword list. Now, if those could be linked into some sort of category-like listing…
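For anyone wondering what a weighted list involves, here’s a rough Python sketch of the idea described above: count how often each tag is used, keep the top X, and scale a font size between a minimum and a maximum. This is not Dan’s PHP code and not how Flickr does it. The `weighted_tags` function, its parameters, the point sizes, and the sample tags are all hypothetical.

```python
from collections import Counter


def weighted_tags(tag_lists, top_n=50, min_size=10, max_size=28):
    """Return (tag, count, font_size) tuples for the top_n most used tags."""
    counts = Counter(tag for tags in tag_lists for tag in tags)
    top = counts.most_common(top_n)
    if not top:
        return []
    highest = top[0][1]   # most_common() sorts from most to least used
    lowest = top[-1][1]
    spread = highest - lowest
    sized = []
    for tag, count in top:
        if spread == 0:
            # Every tag is used equally often, so use a middle size.
            size = (min_size + max_size) // 2
        else:
            # Linear scaling between min_size and max_size.
            size = min_size + round((count - lowest) * (max_size - min_size) / spread)
        sized.append((tag, count, size))
    # Alphabetical order, like a tag page, with size carrying the weight.
    return sorted(sized)


# Hypothetical sample: one list of tags per post.
sample = [
    ["seattle", "photos"],
    ["seattle", "music"],
    ["seattle", "photos", "movabletype"],
]

for tag, count, size in weighted_tags(sample):
    print(f"{tag} ({count} posts): {size}pt")
```

Linear scaling is the simplest choice. With a few hugely popular tags it would flatten everything else, so a real version might scale on the logarithm of the count instead.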

My Netflix

I’ve just added a new page to the site (and linked it in the header navigation of every page): my Netflix queues.

Thanks to the plugin goodness of the Netflix Suite, it lists the movies I currently have checked out, the last 90 days (?) of movies I’ve watched and returned along with what I’ve rated them, and my entire Netflix queue (sitting pretty at 441 as of this moment).

Book of Blogs II

Earlier this month, I linked to a project by Tvindy to collect and anthologize some of the better weblog posts by a number of contributing authors. I really liked the idea, and nominated a few entries for potential inclusion.

Tvindy’s hoping for a little more assistance in culling worthwhile posts, preferably posts chosen by regular readers of the participating weblogs, rather than solely author-nominated work.

As readers, are there any posts that stand out in your mind as particularly noteworthy, for whatever reason? They don’t have to be long, or serious, or anything in particular aside from standing out in one way or another. If so, toss ’em in the comments here — if you don’t want to dig through the archives yourself, just throw up whatever details you can remember, and I’ll track it down.

Not only will this help the project, but I’d be interested to see what — if anything — comes out of this.

iTunes: “Conga Fury” by Juno Reactor from the album Bible of Dreams (1997, 8:06).

New Styles

I’ve done very little posting or reading over the weekend, and I’m up way too late tonight (tomorrow morning is really going to suck), but it’s all for a good cause — well, okay, depending on how you define that — as there are now two new stylesheets available in the switcher over to the right.

Simple Green style screenshot

The first is “Simple Green”. There’s really not a whole lot to look at, as I was mainly using it to play with a couple of ideas that I wasn’t sure I could get to work correctly. Green monotype text on a black background, very little styling aside from that. In all honesty, while it’s kind of fun for a few moments, I wouldn’t want to read my site this way on a regular basis. Who knows, though, maybe someone will decide that it makes me look more ‘l33t’ and Matrix-y.

Blue Distressed style screenshot

The second is “Blue Distressed” and is the reason I’m up so late. I’m really, really happy with the way this one turned out. Cool blues and greys, distressed edges, and a lot more visually interesting than any design that I’ve come up with so far. Many thanks must go out to Keith Bowman, whose Photoshop brushes and color palettes made this design possible.

Now, neither of these stylesheets has been tested in anything other than Safari yet, so they may very well look like ass in other browsers (especially IE, and even more so with Blue Distressed, as it uses transparent .png images that I don’t believe are supported in IE). Caveat emptor and all that jazz.

For me, though, Blue Distressed is the way I’m viewing my site from now on.

And now — bedtime. I’m so going to hate my alarm in the morning.

Update: After a little tweaking, I’ve deemed the appearance under IE 5 “good enough” to make Blue Distressed the default stylesheet for the site. If you haven’t already used the stylesheet switcher to pick a style or if you’re a brand-new visitor, you should be getting the fancy-shmancy new design now.

I still don’t know how this looks under IE6, though. That’ll have to wait until I actually bother to turn on the PC in my apartment, something that tends to happen about as often as America elects a Democrat to the White House.

Or so.

It’s close.

iTunes: “God is a DJ (Edit)” by Faithless from the album Sunday 8 PM (1999, 3:32).

A Book of Blogs

Thanks to Alicia, I just found out about this project of Tvindy’s:

With all the phenomenal writing that has appeared on our various blogs over the past several months, wouldn’t it be cool for us to get together and publish a physical anthology of our greatest posts?

The way I envision it is that several of us agree to participate and have a couple of their entries published in the anthology. Since most people (myself included) find it hard to evaluate their own work, we can make suggestions as to what the best entries of our fellow bloggers are and urge them to choose those. That should make for some interesting debates.

The final product would be a paperback, containing hopefully as many as fifty entries in no particular order. Each entry would identify the name (or pseudonym) of the author and the URL of her/his blog. We’d make a nice cover using combined artwork from various blogs, and there would be an introduction at the beginning explaining what the book was.

He’s got more thoughts on how to approach the project in his next three posts (make that four).

I think this sounds really good, and would love to contribute, if anything I have is deemed worthy of inclusion.

Taking a quick look at my recent Four Years post where I pulled out a lot of highlights, I’m thinking that the following posts would be most likely to work well:

If anyone else has any other nominations, though, I’d be glad to see them. Your views on the “best” posts as readers might be quite different than mine as author.

ecto 2

So ecto 2 is updated, and one of the nifty new features is Amazon integration. It’s pretty slick, with a handy little search window within ecto to find items, one-button posting once you’ve chosen what you want, and a few options for how you want the finished link to appear.

I do have one concern about the link format, though.

I’ve been very careful to make sure that all my Amazon URLs are formatted a specific way, after reading this from kottke early last year:

I’ve noticed lately that when I browse items at Amazon, the URLs now take one of two forms:

http://amazon.com/exec/obidos/ASIN/0684868768/
http://amazon.com/exec/obidos/tg/detail/-/0684868768/

The former URL style has been around for some time, but the latter is relatively new. If you’re an Amazon Associate, the proper way of linking to an individual item (per their linking guide) is to append your Associate code (mine is “0sil8”) to the first URL style, like so:

http://amazon.com/exec/obidos/ASIN/0684868768/0sil8

But if you run across an item at Amazon with the second type of URL, this won’t work:

http://amazon.com/exec/obidos/tg/detail/-/0684868768/0sil8

If you’ve linked to items using that style of URL (something I’ve seen on several sites), check your reports at Amazon…you’ll find that you’re not getting any Associates clickthroughs or credit for those purchases.

Obviously, since I would like to get credit for any clickthroughs I might receive (rare though they may be), I’ve been very careful to make sure to use the ‘ASIN’ format of link, and not the ‘tg/detail/-’ format.

While experimenting with ecto’s new Amazon integration, I put together a quick link to Neal Stephenson’s ‘Quicksilver’ and checked the URL. Unfortunately, it came up with the ‘tg/detail/-’ format (though I’ve fixed it in that link). After poking around in ecto’s settings for the Amazon integration, it doesn’t appear that there is any end-user control over the links (that is, the URL format, not the format of the link itself) other than manually fixing them after they’re inserted. While this really isn’t a major dealbreaker (it’s essentially what I’ve been doing for a while anyway, and the ecto integration does make it much easier to find items), it does leave me with a few questions…

  1. Most importantly, does this still matter? Or does Amazon now give credit correctly for both styles of links? On the assumption that it does still matter…
  2. When constructing the link, does ecto receive the entire URL string from Amazon, or just the ASIN?
  3. If ecto only receives (or needs) the ASIN, can the URL string be changed to the ‘ASIN’ style of link in the next update to ecto?
  4. If ecto receives the entire string…
    1. …is the returned string always ‘tg/detail/-’ format, or does it switch between that and ‘ASIN’ format?
    2. If it’s always ‘tg/detail/-’ format, can that be automatically adjusted within ecto to ‘ASIN’ format?
    3. If it switches, can ecto watch for that string and adjust it when necessary?
  5. And lastly, according to the tail end of Jason’s post, adding ‘ref=nosim/’ before the Associate ID forces Amazon to skip the “You may also be interested in…” page and send you straight to the actual product page. Can an option be added to ecto to add the ‘ref=nosim/’ string in the right place for people who might worry/care about such a thing?

If not…well, I’ll live. I can hope, though! :)
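In the meantime, the manual fix is mechanical enough to sketch in a few lines of Python. This is only an illustration of the rewrite described above, not anything ecto or Amazon provides. The `to_asin_link` function, the `example-20` Associate ID, and the assumption that the item ID is always ten letters or digits are mine.

```python
import re

# Matches either URL style from the quoted post and captures the item ID.
# Assumption: the ID is always 10 characters of digits and capital letters.
AMAZON_ITEM = re.compile(
    r"^https?://(?:www\.)?amazon\.com/exec/obidos/"
    r"(?:ASIN|tg/detail/-)/([0-9A-Z]{10})(?:/.*)?$"
)


def to_asin_link(url, associate_id, skip_similar_items=False):
    """Rewrite an Amazon item URL into the 'ASIN' style with an Associate ID."""
    match = AMAZON_ITEM.match(url)
    if match is None:
        raise ValueError(f"Not a recognized Amazon item URL: {url}")
    link = f"http://amazon.com/exec/obidos/ASIN/{match.group(1)}/"
    if skip_similar_items:
        # 'ref=nosim/' goes before the Associate ID, per the quoted post.
        link += "ref=nosim/"
    return link + associate_id


# 'example-20' is a placeholder Associate ID, not a real one.
print(to_asin_link(
    "http://amazon.com/exec/obidos/tg/detail/-/0684868768/", "example-20"
))
print(to_asin_link(
    "http://amazon.com/exec/obidos/ASIN/0684868768/", "example-20",
    skip_similar_items=True,
))
```

Anything after the item ID in the original URL is dropped, so a link that already carries someone else’s Associate ID gets replaced rather than doubled up.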

iTunes: “Leæther Strip Part II” by Leæther Strip from the album Penetrate the Satanic Citizen (1992, 6:00).