I’m Training AI Chat Bots (Non-Consensually)

The Washington Post has published an article looking at the websites used to train “Google’s C4 data set, a massive snapshot of the contents of 15 million websites that have been used to instruct some high-profile English-language AIs, called large language models, including Google’s T5 and Facebook’s LLaMA.” If you scroll down far enough, there’s a section titled “Is your website training AI?” that lets you drop in a URL to see if it was scraped and included in the data set.

I checked three strings — “michaelhans” (to cover both this site and its prior address at michaelhanscom.com), “djwudi” (for my DJ’ing blog), and “norwescon” (for the Norwescon site, where I’ve written or tweaked and edited much of the content). All three of them are represented.

  • norwescon.org: 45k tokens, 0.00003% of all tokens, rank 528,147
  • michaelhanscom.com: 37k tokens, 0.00002% of all tokens, rank 635,948
  • djwudi.com: 3.7k tokens, 0.000002% of all tokens, rank 4,002,025

For the record, I’m not terribly excited about this. I’m also under no illusion that anything can be done; this stuff is all out on the open web, and as it’s free for actual people to browse through and read, it’s also free for bots to scrape and ingest into whatever databases they keep. Sometimes this is a good thing, for projects like the Internet Archive. Sometimes it’s unwittingly helping to train our new AI overlords.

Updating My Computing History

Back in 2003, Adam Kalsey started a project he called Newly Digital — a collection of stories about when people first discovered computers, got online, and so on.

At that time, I updated and reposted my “Back in the Day” post from roughly a year before, to contribute to the project.

And now, after looking back at my “Newly Digital” post while once again updating the tail end of it with my current computers, I noticed more and more links succumbing to link rot, so I figured I’d give it another refresh. So here we go!


I was born in 1973 — certainly before home computers were a thing, but at a point where computers were starting to make their way into the school system.

The first computers I can remember playing with were the Apple IIs that my elementary school had. Before long our friends the Burns had one of their own that I got to play with, while my babysitter picked up a Commodore 64 that gave me my first look at the BASIC programming language.

Eventually, my family got our first computer — an Osborne 1. This was a beast of a machine. 64k of RAM, a Z-80 CPU, two 5.25″ floppy drives, and a 5″ monochrome 52×24 greenscreen, all packed into a case the size of a suitcase that weighed about 30 pounds. The keyboard could be snapped up against the face of the computer, allowing it to be carried around — one of the first, if not the very first, “portable” computers! It ran CP/M (a precursor to MS-DOS) — aside from fiddling with the machines at school or at my friends’ houses, my first real command-line experience! There was a 300 baud modem available for the Osborne 1, but my family didn’t get one until years later (when friends who also had Osborne 1 computers were giving them to us as they upgraded, allowing me to cannibalize parts from two machines to keep one running).

I first got online sometime in 1990, with the first computer I bought myself — an Apple Macintosh Classic with no hard drive (the computer booted System 6.0.7 off one 3.5″ floppy, and I kept MS Word 4 on a second floppy, along with all the papers I typed that year), 1 MB of RAM — and a 2400 baud modem. Suddenly an entire new world opened up to me. After a brief but nearly disastrous flirtation with America Online at a time when the only way to dial in to AOL from Anchorage, Alaska was to call long distance, I discovered the more affordable world of local BBS’s (Bulletin Board Systems).

I spent many hours over the next few years exploring the BBS’s around Anchorage, from Ak Mac (where most of my time was devoted) to Forest Through the Trees, Roaring Lion, and many others that I can’t remember the names of at the moment. I found some of my first online friends, many of whom I conversed with for months without ever meeting — and many that I never did meet. Most of the Mac-based boards used the Hermes BBS software, which shared its look and feel with whatever the most popular PC-based software was, so virtually all the boards acted the same, allowing me to quickly move from one to the other. After springing the $300 for an external 100 MB hard drive (how would I ever fill up all that space?!?) I downloaded my first ‘warez’ (bootlegged software), at least one of which had a trojan horse that wiped out about half my hard drive. I discovered the joys — and occasional horrors — of free pornography. I found amazing amounts of shareware and freeware, some useful, some useless. It was all amazing, fun, and so much more than I’d found before. In short — I was hooked.

After I graduated from high school in 1991, I had a short-lived stint attending the University of Alaska, Anchorage. One of the perks of being a student was an e-mail account on the university’s VAX computer system. In order to access your e-mail, you could either use one of the computers in the university’s computer lab, or you could dial into their system via modem. Logging in either way gave you access to your shell account, at which point you could use the pine e-mail program. However, I soon learned that the university’s computer was linked to other computers via the still-growing Internet!

And here I’d thought BBS’s were a new world — this Internet thing was even better! Suddenly I was diving into ftp prompts and pulling files to my computer from computers across the globe. Usenet readers introduced me to BBS-style discussions with people chiming in from all over the world, instead of just all over town. I could jump into Internet Relay Chat (IRC) and have real-time conversations with people in other countries. The gopher protocol was essentially a precursor to the World Wide Web: text pages linked to each other by subject. I was fascinated — more information than I had dreamed of was at my fingertips.

By the time I left UAA and lost my student account, the ‘net had started to show up on the radar of public consciousness, but still at a very low level — it was still fairly limited to the ‘geek set.’ That was enough, however, to have convinced some of the local BBS systems to set up primitive (but state of the art at the time) internet links: once a day, generally at some early hour, they would dial into a special node on the ‘net and download a certain set of information, which the BBS users could then access locally. It was slow, time-delayed, and somewhat kludgy, but it worked, and it allowed us to have working e-mail addresses. It wasn’t what I’d had while at the university, but it was certainly better than nothing.

Within a few years, though, the ‘net suddenly exploded across public consciousness with the advent and popularization of the World Wide Web. Suddenly, you didn’t have to do everything on the ‘net through a command line — first using NCSA Mosaic, and later that upstart Netscape Navigator, you could point and click your way through all that information — and some of the pages even had graphics on them! It was simplistic by today’s standards, but at the time it was revolutionary, and I joined in that revolution sometime in 1995 with my first homepage.

Since then, there’s been no turning back. Over the years, my computers have been upgraded from that little Mac Classic to:

And, of course, this blog has been running for more than 20 years. It started as simple hand-coded update posts on my early personal pages in 1996. In late 2000 I found a script called NewsPro that was essentially a very early content management system (CMS), and then just over a year later I moved to MovableType, which was only about three months old at the time. MovableType started strong but eventually pivoted to focus on the enterprise space rather than home users, and in 2006 I moved over to WordPress.

WordPress has lasted by far the longest, though I’ve been getting less enamored with it for a while. But realistically, after this long, I’m unlikely to put the effort into finding something else — and as far as I know, the blogging CMS I really want hasn’t yet been written.

Some years my blog gets more posts than others — the rise of social media sites like Facebook and Twitter certainly pulled me away for a while — but I’ve never let it fade away completely, and I certainly don’t intend to let it die. I may not always be rambling away here a lot (though the demise of Twitter has certainly spurred me to be more active here once more), but I’m unlikely to ever entirely disappear.

Self-Hosted Image Gallery Recommendations?

A lazyweb question: Is there decently modern web image gallery software anywhere?

I’d like to move away from Flickr in favor of self-hosting my photo galleries. But so far all the packages I’ve found are…well, they tend to look and feel (both on the backend admin side and the frontend public gallery side) like they haven’t been updated in the past decade or more.

Admittedly, sometimes this is because that’s exactly the case…which also doesn’t make me want to download them. But sometimes they’re still apparently under active development, but still look and feel like early-2000s projects.

Software I’ve installed, poked at, thought “mmm…well…maybe…”, and looked on to see what else I could find:

  • Piwigo is under active development (last release three weeks ago) but has rather sparse documentation if you’re not a developer building plugins, and needs config file editing just to display more than the most basic image metadata.
  • Zenphoto is also under active development (last release a month ago), but appears to be gearing for a more major update…which could be good, but there’s no indication of when that will happen, and much of the current installation (like every one of the default themes) has a “this has been deprecated” warning. So it doesn’t seem worth investing time into getting it up and running and populated if the current version is soon to be end-of-lifed, with who knows what sort of compatibility with the next version.

Things I’ve looked at but not downloaded:

  • 4Images may or may not be under active development; the last update was in November of ’21.
  • Coppermine‘s last update was in 2018…but the two before that were in 2013 and 2010, so who knows if it’s still active or not.
  • Gallery at least admits it’s dead; it points to Gallery Revival, which hasn’t been updated since November of ’21.
  • Pixelpost: “tldr: This project is abandoned, and has known security issues, use at your own risk.”
  • TinyWebGallery: I can’t quickly figure out when it was last updated, but the header graphic advertises “Flash uploaders”, and there are too many ads for online casinos on that page for me to bother digging around any further.

I’d like to stop giving Flickr money (I have nothing particularly against them, but at this point, I have nothing particularly for them either; their website doesn’t “give me joy”, and when embedding photos, the alt text is just the image title, not even the image comments, let alone any option to add true alt text), and I simply don’t trust Google enough to drop all my images into their systems. I’ve played with SmugMug as well, but again, I’d like to be able to self-host, not pay.

I’m a little surprised that this is such a sparse field, but I suppose that Flickr and Google Photos are “good enough” for most people these days, so there’s not a big market for people like me: a tech-savvy hobbyist photographer who’s not particularly interested in relentlessly pursuing monetization.

Recommendations would be appreciated if I’ve missed something worth investigating. As it is right now, though, I’m guessing my best bet will be to see what I can manage with either Piwigo or Zenphoto.

Bring Back Blogging

Monique Judge at The Verge, in “Bring Back Personal Blogging“:

In the beginning, there were blogs, and they were the original social web. We built community. We found our people. We wrote personally. We wrote frequently. We self-policed, and we linked to each other so that newbies could discover new and good blogs.

I want to go back there.

Hard agree. This blog got its start in the mid-’90s — the earliest “post” I can still verify was on December 29, 1995, and though it now lives in this blog, was originally a hand-coded entry on a static “Announcements” page — back before “blogging” was even a term. In fact, it wasn’t until February 8, 2001 that I first discovered the word “blog”.

So there’s a lot of what Monique writes about that I remember very clearly. And I miss a lot of it. Which seems kind of funny to say, because in a lot of ways, it really hasn’t ever completely died, but the shift to social media definitely impacted the blogging world.

I’m hopeful (if not optimistic) that just maybe the issues at Twitter, the rise of Mastodon, and the general upheaval in online spaces will actually lead to something of a resurgence of people writing for themselves and in their own spaces.

Buy that domain name. Carve your space out on the web. Tell your stories, build your community, and talk to your people. It doesn’t have to be big. It doesn’t have to be fancy. You don’t have to reinvent the wheel. It doesn’t need to duplicate any space that already exists on the web — in fact, it shouldn’t. This is your creation. It’s your expression. It should reflect you.

Bring back personal blogging in 2023. We, as a web community, will be all that much better for it.

Blogging CMS Wishlist

High on my reasons why I wish I had the knowledge (or the time and energy to gain the knowledge) to code my own software: As far as I can tell, nobody has yet written the CMS I want to use for blogging.

Basically, what I want is early-2000s MovableType, only with some modern updates. I’ve long missed many of the tweaks and customizations that I could manage with MovableType that I can’t do on WordPress.

Pie-in-the-sky featureset:

  • Self-hostable or installable on a hosted server (Dreamhost, etc.)
  • Micropub compatible so I can use MarsEdit or other such third-party editors
  • ActivityPub/IndieWeb compatible for federation (at least outbound, ideally bidirectional so that federated replies could be appended as “comments”)
  • Generates a static website instead of building every page when it’s called
  • Only regenerates necessary pages when updates are published, full-site rebuilds available on demand
  • Some sort of templating “building blocks” system for assembling different pages, posts, or sections thereof
  • Basic templates that are fully standards-compliant and accessible (HTML5, ARIA when/if necessary (since static pages shouldn’t have much dynamic content), etc.)
  • Templates should also be microblogging compatible
    • Example: Titles are optional, and shouldn’t be the only item used for permalinks to any given post, something that bugs me about my current blog template but I haven’t figured out how to fix yet
  • Markdown for writing and storing posts
  • The ability to generate multiple versions of posts/pages on rebuild
    • Example: Output both .html and .md versions of a blog post, so a “view source” link could be included in the post template; readers could then easily click through to view the Markdown version
  • Import posts exported from existing common blogging or microblogging systems (WordPress and Twitter, in my particular case)

Things I don’t want or care about:

  • Fancy drag-and-drop “block” editors like WordPress’s Gutenberg
  • Comments (beyond pingbacks/trackbacks/federated responses)
  • Having to do everything on one machine (edit locally and upload)

I’m sure there are plenty of other things that I could put in the wishlist or the “no thanks” list, but those are the first ones to come to mind. Every time I’ve done a survey of static site generators, they consistently fail one or more of the above.

Honestly, I think I could live without much of the above, if I could find a static site generator that would allow me to blog and manage posts and pages from anywhere (my desktop, my laptop, my iPhone, my iPad, etc.) through the Micropub API; logging into a web interface of some sort should be possible if necessary but not required for general day-to-day post publishing.

Oh, and it needs to be installable and manageable by someone who has a higher-than-average knowledge of computing and tech geekery, but doesn’t do this stuff for a living. Someone who gets annoyed when they call tech support and have to start with the “is it plugged in?” level of questioning, but who also gets annoyed when software assumes that you’ve been immersed in this kind of stuff for decades. There doesn’t seem to be much out there other than WordPress that does a good job of bridging between “it just works” and “I eat, drink, and breathe code in all my waking and sleeping hours” levels of capability. I don’t mind, and even enjoy, poking at the guts of things when I have the time and energy, but I don’t want to be required to do a week of research to figure out what the terms in the “how to install” documentation mean.

So — I don’t suppose that anyone knows of my magical unicorn blogging software actually existing anywhere?

Cross-posting from WordPress to Mastodon

I’ve finally got WordPress to Mastodon cross-posting working the way I want: automatically, whether I’m posting through the WordPress web interface or through a desktop or mobile client like MarsEdit or the WordPress mobile app, and with the format that I want:

Title: Excerpt (#tags)

Full post on Eclecticism: URL

I’d been using the Autopost to Mastodon plugin, which works great, and I can recommend it — as long as you only or primarily post using the WordPress web interface.

However, the plugin is only triggered when publishing a post through the WordPress web interface. Any time I posted through a client, nothing went to Mastodon. So I either had to go into the web interface and manually trigger an update to the post with the “Send to Mastodon” option checked, or just skip out on using anything but the web interface at all, which I’m not a fan of (especially on mobile).

I asked the plugin author, and they said that this is just the way it is.

So I put out a call for help on Mastodon, and got some kind tips from Elephantidae, who pointed out the Share on Mastodon plugin. This looked promising, as its documentation specifically mentions being able to configure it to work with externally created posts. However, looking through the docs made it clear that most of this plugin’s configuration, including changing the format of the text it sends to Mastodon, is done through adding and tweaking PHP functions…and as with most of my coding knowledge, my PHP knowledge is roughly in the “I can usually get a vague idea of what it’s doing when I read the code, but actually creating something is a whole different ballgame” territory. Plus, dumping PHP code into my theme’s files risks losing those changes the next time the theme files are updated.

Retaining the code through theme updates can be managed through creating a site-specific plugin, however — a handy trick which, somewhat amusingly, I’d never had exactly the right combination of “I want to do this” and “how do I do it” in the past to discover until now.

So, after a bit of fumbling around with the Share on Mastodon plugin documentation and figuring out the right PHP and WordPress function calls, here’s what I’ve ended up adding to my site-specific plugin:

/* Tweaks for the Share on Mastodon plugin */

/* Customize sharing text */

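add_filter( 'share_on_mastodon_status', function( $status, $post ) {
  $tags = get_the_tags( $post->ID );

  /* Start with "Title: Excerpt" */
  $status = get_the_title( $post ) . ": " . get_the_excerpt( $post );

  /* get_the_tags() returns false (no tags) or a WP_Error on failure */
  if ( $tags && ! is_wp_error( $tags ) ) {
    $status .= " (";

    /* Turn each tag into a hashtag, removing spaces from multi-word tags */
    foreach ( $tags as $tag ) {
      $status .= '#' . preg_replace( '/\s/', '', $tag->name ) . ' ';
    }

    $status = trim( $status );
    $status .= ")";
  }

  $status .= "\n\nFull post on Eclecticism: " . get_permalink( $post );

  /* Decode HTML entities so they show up as plain characters on Mastodon */
  return html_entity_decode( $status );
}, 10, 2 );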

/* Share if sent through XML-RPC */

add_filter( 'share_on_mastodon_enabled', '__return_true' );

/* End Share on Mastodon tweaks */

And after a few tests to fine-tune everything, it all seems to work just the way I wanted. Success!

(Also, re-reading through this, I’ve realized that since I like to give the background of why and how I stumble my way through things, I end up writing posts that are basically a slightly geekier version of the “stop telling me about your childhood vacations to Europe and just post the damn recipe!” posts that are commonly mocked. And I don’t even have ad blocks all over my site! At least I’m not making you click through several slideshow pages of inane chatter before I get to the good stuff. My inane chatter is easy to scroll through.)

Don’t ever stop talking to each other

This is a long rant by Cat Valente – and it’s really, really good. Though I’m quoting a particularly good bit from the end, it’s worth reading the whole thing.

Don’t ever stop talking to each other. It’s what the internet is really and truly for. Talk to each other and listen to each other. But don’t ever stop connecting. Be a prodigy of the new world. Stand up for the truth no matter how often they take our voices away and try to replace the idea of reality with fucking insane Lovecraftian shit. Don’t give up, don’t let them have this world. Love things. Love people. Love the small and the weird and the new.

Because that’s what fascists can’t do. They don’t love white people or straight people or silent women or binary enforced gender or forced birth or even really money. They want those things to be the only acceptable or even visible choices, but they don’t love them. They don’t even want to think about them. They want them to be automatically considered superior and universally mandated so they don’t have to think about them—or else what do you think the fury over other people wearing masks was ever about? The need to be right without thinking about it, and never have to see anything that wakens a spark of doubt in their own choices.

Obey, do not imagine, do not differ.

That’s nothing to do with love. Love is gentle, love is kind, remember? They need the attention being terrible brings them, but they don’t love it any more than a car loves gas. Sometimes I don’t even think they love themselves. Sometimes I’m pretty sure of it. They certainly never seem happy, even when they win. Musk doesn’t seem happy at all.

Geeks, though. Us weird geeks making communities in the ether? We love. We love so stupidly hard. We try to be happy. We get enthusiastic and devote ourselves to saving whales and trees and cancelled science fiction shows and each other. The energy we make in these spaces, the energy we make when we support and uplift and encourage and excite each other is something people like Musk can never understand or experience, which is why they keep smashing the windows in to try and get it, only to find the light they hungered for is already gone. Moved on, always a little beyond their reach.

Mastodon RSS Tips

  1. Get an RSS feed for any user by appending .rss to the end of their profile URL. For example, my profile is tenforward.social/@djwudi, so the RSS feed of my posts is tenforward.social/@djwudi.rss.

  2. This also works for hashtag searches; handy for keeping an eye on hashtags (without worrying you’ll miss them in your feed). In my case, as our social media manager, I watch for mentions of Norwescon. Since that hashtag search URL is <your server>/tags/norwescon, the RSS feed is <your server>/tags/norwescon.rss. (I’ve also subscribed to feeds for norwescon45, nwc45, and philipkdickaward.)

The feeds-for-users tip I’ve seen going around, but I’d not seen this applied to hashtag searches, so I gave it a shot, and was happy to see it worked. Figured I’d put both in one post for those who might not have known either.
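
For anyone who'd rather script this than edit URLs by hand, here's a minimal sketch in Python of both tips. The function names are my own invention for illustration, and the sketch assumes the server is reached over https when no scheme is given; all it does is apply the "append .rss" rule described above.

```python
from urllib.parse import urlsplit, urlunsplit


def mastodon_feed_url(page_url: str) -> str:
    """Build the RSS feed URL for a Mastodon profile or hashtag page.

    Hypothetical helper: it only appends ".rss" to the page's path.
    Assumption: URLs given without a scheme are served over https.
    """
    if "://" not in page_url:
        page_url = "https://" + page_url
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".rss"):
        path += ".rss"
    # Drop any query string or fragment; the feed lives at the bare path.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def hashtag_feed_url(server: str, tag: str) -> str:
    """Build the RSS feed URL for a hashtag search on a given server."""
    return mastodon_feed_url(f"{server}/tags/{tag.lstrip('#')}")


print(mastodon_feed_url("tenforward.social/@djwudi"))
print(hashtag_feed_url("tenforward.social", "#norwescon"))
```

The first call prints https://tenforward.social/@djwudi.rss and the second prints https://tenforward.social/tags/norwescon.rss; either can be dropped straight into a feed reader.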

In Search of a MarsEdit Equivalent for iOS

A question for macOS WordPress bloggers who use Red Sweater Software’s excellent MarsEdit: What’s your go-to mobile iOS blogging tool?

MarsEdit is a great example of a “do one thing and do it really well” piece of software, and I’ve yet to find anything equivalent for mobile blogging. I just want exactly what MarsEdit gives me: A list of my most recent posts and pages, a solid plain-text Markdown editor, and access to all the standard WordPress fields and features.

Every other editor I’ve tried either doesn’t do one or more of those things or is otherwise not quite right in some way. Ulysses was the closest, and I tried it for a while. It’s a great editor, but it doesn’t pull a list of posts and pages from the blog; it just works with whatever’s local or in its own cloud sync or Dropbox or whatever. The last time I used it, it also had a bug where alt text wasn’t getting applied to images correctly.

(The WordPress native app drives me up the wall. I don’t want block editing. I want text and Markdown.)

Really, what I want is an iOS version of MarsEdit. But failing that: any recommendations?

AI Art, Ethics, and Where I Stand

While nobody specifically asked, since I have some friends who are all about the AI art and some who believe it’s something that should be avoided because of all the ethical issues, and since I’m obviously having fun playing with it with my “AImoji” project, I figured I’d at least make a nod to the elephant in the room.

An AI generated image of an African elephant standing in what appears to be a Victorian sitting room.

There are absolutely some quite serious ethical questions around AI generated artwork. To my mind the three most serious are (not in any particular order):

  1. Much of the material used to train the AI engines was scraped off the internet, often without any consideration of copyright, and certainly without any attempt to get permission from the original creators/artists/photographers/subjects/etc. Some people have even found medical images that were only approved for private use by their doctor, but somehow ended up in the training sets. That situations like this are likely (hopefully) in the minority doesn’t absolve the companies who acquired and used the images to create their AI engines of responsibility for using these images.

  2. As the AI engines continue to improve, it is getting more and more difficult to distinguish an AI generated image from one created by an artist. There are also a number of people and organizations who have flat-out stated that they are looking at AI generated imagery as a way to save money, because it means they now don’t have to pay actual artists to create work. Obviously, this is not a particularly good approach to take.

  3. Because some of the engines are able to create images in the style of a particular artist, and the output quality continues to improve, there have already been instances where a living artist is being credited for creating work that was generated by an AI bot. And, of course, if you can create an image that looks like your favorite artist’s work for low or no cost…well, for a lot of people, they’ll happily settle for an AI generated “close enough” rather than an actual commissioned piece. Obviously, this is also not a particularly good approach to take.

I’m enjoying playing with the AI art generation tools. I’m also watching the discussions of the ethical questions about how they can and should be used.

The issues above are all very real and very serious. It’s also true that AI art can be just another tool in an artist’s toolbox. I’ve seen artists who use AI art generators to play with ideas until they find inspiration, or who use parts of the generated output in their own work. I’ve seen reports of people who want to commission art using the generator to get a rough idea of what they’re looking for, which they can give to an artist as a rough example or proof of concept. So there are ways to use AI art generators in, well, more-ethical ways (it’s hard to argue they’d be entirely ethical when the generators have unethical underpinnings).

So, where I stand in my use at this point:

  1. I don’t use living artists’ names to influence the style one way or another, and have only occasionally used dead artists’ names as keywords (I’ll admit, H.R. Giger has been a favorite to play with).

  2. I don’t feed images in, try to generate images of actual people, or use images of actual people (including myself) as source material.

    One caveat: if a tool does all of its processing locally on my device, I may use my own images, including some of myself. But nothing that feeds images into the systems.

  3. And, of course, anything I do is just for fun, and to make me, and maybe a few other people, laugh (or occasionally recoil in horror).

For a few months this past year, I used an AI-generated image of a dragon flying over a city skyline for the Norwescon website and social media banner image. This was always intended as a temporary measure to fill the gap between last year’s convention and getting art from this year’s Artist Guest of Honor, and as soon as we had confirmed art from our GOH, the AI-generated art came down. It was also chosen much earlier in the “isn’t AI art neat” period, before I’d read as much about the issues involved. As such, I won’t be using AI art for Norwescon again, and will go back to sourcing copyright-free images from NASA or other such avenues when we are in the interregnum period.

So: I understand those who see AI art as something that should be avoided. I also understand those who see it as another tool. And, honestly, I also understand those who just see a shiny new toy that they want to play with. I’m somewhere in the midst of all those points of view, and while I don’t personally see the need to avoid AI art bots entirely, I am consciously considering how I use them and what I use them for.