[BRLTTY] Is there a feature-compatible text-based browser
kperry at blinksoft.com
Mon Sep 29 12:00:43 UTC 2025
I think some people are misunderstanding what it means to use AI as a
foundation for screen readers. This doesn't mean letting the AI "make things
up" or hallucinate. Instead, it means using AI the way a sighted person uses
their eyes and brain: looking directly at the screen, identifying important
regions, and making decisions based on what's there.
Today's screen readers rely heavily on operating system APIs. These APIs
give us a structured view of buttons, menus, and text, but when the OS
doesn't expose something correctly, the user is stuck. That's why websites
or apps sometimes feel broken or half-usable. It also means that the Mac, X
Windows, and Windows feel different to a blind person, while to a sighted
person they look somewhat the same and work almost the same.
Now imagine redesigning a screen reader with AI built in from the start:
- AI looks at the screen visually, the way a human does, using techniques
  like object recognition and layout analysis.
- OCR (Optical Character Recognition) reads text that isn't exposed to the
  accessibility layer, such as captions in a video or text drawn inside
  images.
- The OS APIs are still used as a fallback, to confirm results or to
  provide extra detail.
This combination would be more powerful than today's systems because it
doesn't depend on one source of truth. Instead, it blends multiple
perspectives, the way our own brains do.
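To make that concrete, here is a rough sketch in Python of what I mean by
blending sources. This is only an illustration, not code from my project: it
assumes the Pillow and pytesseract libraries for the screenshot and OCR, and
get_api_text() is a made-up stand-in for whatever the platform accessibility
API (UIA, AT-SPI, NSAccessibility) would return for a region.

    # Sketch only: OCR the pixels first, then use the OS accessibility API
    # to confirm the result or fill in what the pixels alone could not.
    from difflib import SequenceMatcher

    from PIL import ImageGrab      # screenshot support from Pillow
    import pytesseract             # wrapper around the Tesseract OCR engine

    def get_api_text(region):
        """Hypothetical stand-in for an OS accessibility query.

        Returns whatever text the platform exposes for the region,
        or None when the application exposes nothing useful."""
        return None

    def read_region(region):
        """Return the best available text for a (left, top, right, bottom) box."""
        image = ImageGrab.grab(bbox=region)                # look at the pixels
        ocr_text = pytesseract.image_to_string(image).strip()
        api_text = get_api_text(region)                    # structured source

        if not ocr_text:                   # nothing visible? trust the API
            return api_text or ""
        if api_text and SequenceMatcher(None, ocr_text, api_text).ratio() > 0.8:
            return api_text                # API confirms the OCR; use its exact form
        return ocr_text                    # otherwise report what is on screen

    if __name__ == "__main__":
        print(read_region((0, 0, 400, 120)))

The point is only that neither source is trusted blindly; each one is
checked against the other, and the structured API answer wins when it
confirms what the pixels show.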
We've already seen early versions of this idea. For example, in the early
2000s there was an OCR + object-detection screen reader. It was slow,
because computers at that time weren't fast enough, but it could do
impressive things. I once saw a demo where it read captions and described
scenes from a Jaws movie in real time, without modern AI.
Fast-forward to today: OCR has improved dramatically. Modern OCR can even
handle handwriting when combined with AI. AI-guided tools like "AI Guide"
and "ViewPoint" show how models can figure out what's important on the
screen, not just what's printed there.
The key idea is that AI wouldn't just be "reading the screen" in an
uncontrolled way like current GPT chat models. Instead, it would:
- Use multiple AI models checking each other's work for accuracy, like how
  our brains cross-check what we see and hear (a rough sketch follows this
  list).
- Identify active regions (buttons, links, form fields) and present them
  using the same types of controls screen reader users already know.
- Fall back on OS APIs when needed, providing a safety net.
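Here is another rough sketch of that cross-checking idea, again in Python
and again only an illustration: two independent detectors each propose
labeled regions, only the regions they both agree on get announced, and
anything they cannot corroborate falls back to the OS API. The model_a,
model_b, and api_fallback callables are placeholders, not real libraries.

    # Sketch only: cross-check two detectors and fall back to the OS API.
    from dataclasses import dataclass

    @dataclass
    class Region:
        label: str    # e.g. "button", "link", "edit field"
        box: tuple    # (left, top, right, bottom) in screen pixels

    def overlap(a, b):
        """Intersection-over-union of two boxes; 0.0 when they do not touch."""
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        ix1, iy1 = max(ax1, bx1), max(ay1, by1)
        ix2, iy2 = min(ax2, bx2), min(ay2, by2)
        if ix2 <= ix1 or iy2 <= iy1:
            return 0.0
        inter = (ix2 - ix1) * (iy2 - iy1)
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        return inter / union

    def cross_check(model_a, model_b, screenshot, api_fallback, threshold=0.5):
        """Announce only regions both models agree on; otherwise ask the OS."""
        regions_a = model_a(screenshot)    # each model returns a list of Region
        regions_b = model_b(screenshot)
        agreed = [ra for ra in regions_a
                  if any(ra.label == rb.label and
                         overlap(ra.box, rb.box) > threshold
                         for rb in regions_b)]
        # If the models cannot corroborate each other, do not guess: use the API.
        return agreed if agreed else api_fallback(screenshot)

The user never hears raw model output that a second source hasn't
confirmed, which is the opposite of letting a chat model free-associate
about the screen.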
Think about it like having a friend read your screen aloud. That friend
might miss things or make mistakes. An AI system designed the right way
could actually make fewer errors, because it can systematically analyze and
cross-check what it sees.
This isn't science fiction. We've already watched OCR go from clunky and
error-prone to nearly perfect over the past 30 years. With AI added at the
foundation of screen readers, we can take the same kind of leap forward,
building tools that don't just survive on exposed accessibility data
but thrive on understanding the whole screen.
I am not saying this is ready today. In fact, a few more levels of speedup
still need to happen in some current frameworks. But we could be working
toward it by starting a new screen reader right now. I already have a
start, but I have not open sourced it yet. I will as soon as I have
something I think is ready for people to add their efforts to.
Ken
-----Original Message-----
From: BRLTTY <brltty-bounces at brltty.app> On Behalf Of Kyle
Sent: Monday, September 29, 2025 7:12 AM
To: Informal discussion between users and developers of BRLTTY.
<brltty at brltty.app>
Subject: Re: [BRLTTY] Is there a feature-compatible text-based browser
I do find all this AI stuff to be very interesting indeed, although as has
been said, I can't see it replacing the screen reader entirely; it may work
better as a supplemental tool, much like the way BeMyAI complements and
supplements the BeMyEyes volunteer. As it stands now, I can actually ask
several open source AI models to describe a picture taken with my phone's
camera and get a halfway decent response back in about the same time as it
takes to upload the same picture to BeMyEyes and wait for its AI to come
back with a response.
The main problem with the AI replacing the screen reader though is not
speed, but hallucination, which is still a huge problem with every model
I've ever used for any purpose. I mean BeMyAI described one of the 2000 gold
dollar coins as a giant penny, complete with Lincoln's face and all. But I
knew it was hallucinating, because I knew exactly what coin I was holding.
Many times when the AI hallucinates, we don't know that is what is
happening.
And the bigger problem is that I don't want my computer to try to think for
me. My workflow is pretty straightforward. I want to do something, I either
look in the menu or type in a command to find it, the computer does it. Or
I'm on a website, I want to see what is on the page, and if I'm lucky, the
headers are marked so that I at least get a nice idea.
And if I want a page summary before I get started, my screen reader can do
that at the press of a button. I don't see any benefit of AI here, with the
obvious exception of text recognition or helping to map out an otherwise
inaccessible window so that its characteristics can be sent to the screen
reader, which could then read what was sent to it by the AI as I interact
with it normally. I don't want a detailed description of the whole window,
only the control I'm focusing on at the time and any text that may need my
attention when the window pops up. AI descriptions now are still a bit too
wordy, sometimes leading to additional confusion rather than a
straightforward workflow.
Yes, I for one enjoy the graphical desktop and the consistency that it
provides, i.e. one key combination has its function everywhere instead of
all these little programs having different key sequences that all end up
doing the same thing; e.g. if I press a q here, it closes the application,
but if I press it in another application, nothing happens, and I was
supposed to use control+x, which incidentally is the cut command
*everywhere* on the MATE desktop and GNOME as well.

I notice nothing slow about writing this message in Thunderbird, which used
to be pretty darn slow just 10 or so years ago, but is smooth now, and this
is on a laptop that is about 8 years old and a desktop that is about 12. So
I can see how in the future, AI may become useful enough locally that it
can be just as fast as interacting with a graphical desktop is now. But
then most AI models still rely too heavily on the GPU, so this may come at
some point down the road, not I fear in the next year or so, but maybe in
the next 10.

Still, the above problems of hallucination and its attempts to think for me
are still a bit off-putting to me, even if they can fix the lag problems it
would introduce now.
~Kyle
_______________________________________________
This message was sent via the BRLTTY mailing list.
To post a message, send an e-mail to: BRLTTY at brltty.app
For general information, go to: http://brltty.app/mailman/listinfo/brltty