Good vs Bad AI (or ML vs AI)

The following is a (lightly edited) response I gave to a recent question from Jane Jarrow on an accessibility mailing list, prompted by concerns about the use of various AI or AI-like tools for accessibility in higher education:

Folks responded by noting that they didn’t consider things like spell check, screen readers, voice-to-text, text-to-voice, or grammar checkers to be AI – at least, not the AI that is raising eyebrows on campus. That may be true… but do we have a clean way of sorting that out? Here is my “identity crisis”:

What is the difference between “assistive technology” and “artificial intelligence” (AI)?

This is me speaking personally, not officially, and also as a long-time geek, but not an AI specialist.

I think a big issue here is the genericization of the term “AI” and how it’s now being applied to all sorts of technologies that may share some similarities, but also have some distinct differences.

Broadly, I see two very different technologies at play: “traditional”/“iterative” AI (in the past, and more accurately, termed “machine learning” or “ML”), and “generative” AI (what we’re seeing now with ChatGPT, Claude, etc.).

Spell check, grammar check, text-to-speech, and even speech-to-text (including automated captioning systems) are all great examples of traditional, iterative ML systems: they use sophisticated pattern matching to identify common patterns and translate them into another form. For simpler things like spelling and grammar, I’d question whether that’s really even ML (though modern systems may well use it). Text-to-speech is something of an in-between case, where the computer is simply converting text strings into audio, though these days the use of generative AI to produce more natural-sounding voices (even to the point of mimicking real people) is blurring the line a little bit.
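
(A small aside for the technically curious: here is a minimal sketch, in Python, of the kind of dictionary-plus-similarity matching a very basic spell checker can get by with. The tiny word list and misspellings are my own invented examples, not how any particular product works, but they show how little “intelligence” the simple end of this spectrum requires.)

```python
# Minimal spell-check sketch: pure pattern matching against a word list,
# no machine learning involved. The tiny dictionary here is illustrative only.
from difflib import get_close_matches

DICTIONARY = {"accessibility", "caption", "screen", "reader", "grammar",
              "spelling", "technology", "assistive"}

def suggest(word, dictionary=DICTIONARY):
    """Return the word itself if known, otherwise the closest dictionary matches."""
    if word.lower() in dictionary:
        return [word]
    # difflib ranks candidates by a simple string-similarity ratio: no model, no training.
    return get_close_matches(word.lower(), list(dictionary), n=3, cutoff=0.6)

for w in ["captoin", "asistive", "screen"]:
    print(w, "->", suggest(w))
```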

Speech-to-text (and automated captioning) is more advanced and is certainly benefiting from the use of large language models (LLMs) on the back end, but it still falls more on the side of iterative ML, in much the same way that scientific systems use these technologies to scan through things like medical or deep-space imagery to identify cancers and exoplanets far faster than human review can manage. They use the models to analyze data, identify patterns that match existing patterns in their data set, and then produce output. In the sciences, that output is then reviewed by researchers to verify it; for speech-to-text systems, the output is the text or captions, which are presented without human review (hence the errors that creep in). Manually reviewing and correcting auto-generated captions before posting a video to a sharing site is the equivalent of scientists reviewing their systems’ output before making decisions based on it.
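
(For the same technically curious readers: a toy sketch of that “match new data against known patterns, then have a human review the output” workflow might look like the following. The reference patterns, feature numbers, and labels are entirely invented for illustration; real systems learn their patterns from large training sets rather than hard-coding them.)

```python
# Toy pattern matcher: label a new sample by its nearest match in a small
# labeled reference set, and flag every automated result for human review.
import math

# Invented reference "patterns" (feature vector -> label), purely illustrative.
reference = [((0.9, 0.1), "signal"), ((0.8, 0.2), "signal"),
             ((0.1, 0.9), "noise"), ((0.2, 0.8), "noise")]

def classify(sample):
    """Return the label of the closest known pattern and how close it was."""
    label, dist = min(((lab, math.dist(sample, feat)) for feat, lab in reference),
                      key=lambda pair: pair[1])
    return label, dist

for sample in [(0.85, 0.15), (0.5, 0.5)]:
    label, dist = classify(sample)
    # The review step matters: a distant match is exactly the kind of output
    # a human should double-check before acting on it.
    print(sample, "->", label, f"(distance {dist:.2f}; flag for review)")
```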

Where we’re struggling (both within education and far more broadly) is with the newer, generative “AI”. These systems are essentially souped-up, very fancy statistical models: there’s no actual “intelligence” behind them at all, just (though I’ll admit the word “just” is doing a lot of heavy lifting here) a very complex set of algorithms deciding that, given this input, these words are more likely to go together in the output. Because there’s no real intelligence behind it, there’s no way for these systems to know, judge, or understand when the statistically generated output is nonsensical (or, worse, makes sense but is simply wrong). Unfortunately, they’re so good at producing output that sounds right, especially when it’s phrased as very professional/academic-sounding writing (easy to do, since so many of the LLMs have been unethically and, arguably (I think rightly), illegally trained on professional and academic writing), that they immediately satisfy our need for “truthiness”. If it sounds true, and I got it from a computer, well then, it must be true, right?
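
(One last geeky aside: the “these words are more likely to go together” idea can be illustrated with a deliberately tiny toy, nothing remotely like a real LLM in scale or architecture. It counts which word follows which in some made-up sample text, then chains together statistically probable continuations, with no notion of whether anything it emits is true.)

```python
# Toy "next most likely word" generator: counts word pairs in sample text and
# chains together statistically probable continuations. It has no idea what
# any of the words mean, which is the point.
import random
from collections import defaultdict, Counter

# Made-up sample text, used only to build the word-pair counts below.
sample = ("the captions were accurate . the captions were wrong . "
          "the model sounded confident . the model sounded right .").split()

follows = defaultdict(Counter)
for current, nxt in zip(sample, sample[1:]):
    follows[current][nxt] += 1

def generate(start, length=8):
    word, out = start, [start]
    for _ in range(length):
        options = follows.get(word)
        if not options:
            break
        words, counts = zip(*options.items())
        word = random.choices(words, weights=counts)[0]  # picked by frequency, not by truth
        out.append(word)
    return " ".join(out)

print(generate("the"))
```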

(The best and most amusing summary I’ve seen of modern “AI” systems is from Christine Lemmer-Webber by way of Andrew Feeney, who described it as “Mansplaining as a Service: A service that instantly generates vaguely plausible sounding yet totally fabricated and baseless lectures in an instant with unflagging confidence in its own correctness on any topic, without concern, regard or even awareness of the level of expertise of its audience.”)

Getting students (and, really, everyone, including faculty, staff, the public at large, etc.) to understand the distinctions between these types of “AI”, when they work well, and when they prove problematic is, of course, proving to be incredibly difficult.

For myself, I’m fine with using traditional/iterative ML systems. I’m generally pretty good with my spelling and grammar, but don’t mind the hints (though I do sometimes ignore them when I find it appropriate to do so), and I find auto-captioning to be incredibly useful, both in situations like Zoom sessions and to quickly create a first pass at captioning a video (though I always do manual corrections before finalizing the captions on a video to be shared). But I draw the line at generative AI systems and steadfastly refuse to use ChatGPT, AI image generators, or other such tools. I have decades of experience in creating artisanally hand-crafted typos and errors and have no interest in statistically generating my mistakes!

I’m afraid I don’t have good suggestions on how to solve these issues. But there you have one (rather long-winded) response to the question you posed about the difference between assistive technology and “artificial intelligence”.