[BRLTTY] Linux console hacking (was: Re: Footsteps towards better accessibility in Linux)
Nicolas Pitre
nico at fluxnic.net
Wed Apr 15 15:22:34 UTC 2026
On Wed, 15 Apr 2026, Aura Kelloniemi wrote:
> Hi,
>
> On 2026-04-15 at 01:47 +0200, Samuel Thibault <samuel.thibault at ens-lyon.org> wrote:
> > Nicolas Pitre, le jeu. 09 avril 2026 23:12:06 -0400, a ecrit:
> > > Done — the protocol now handles grapheme clusters with a three-layer
> > > approach (single codepoint, continuation cells, overflow area) and the
> > > cell width field covers the whole cluster. See the updated design
> > > document attached.
>
> > I'm a bit afraid of the complexity of the dynamic allocation of the
> > overflow entry. I agree that adding a limitation to 8 combining
> > codepoints brings static limitation, but conversely there have been
> > vulnerabilities found due to unbound combining codepoints management,
> > and for easier adoption, the protocol should be quite simple.
>
> IMHO, if there is a static limitation, it should be futureproof – i.e. insanely
> big – like 16 or 32. But of course bumping the protocol version sometimes
> should be acceptable.
There are no static limitations, unless you consider 4294967296 a
practical limitation, that is.
1. Common case: The cell's codepoint field holds a single UCS-4 value.
   No combining marks.

2. Double-width characters: The adjacent cell (width=0) can hold one
   additional codepoint for a combining mark. The reader collects
   codepoints from the primary cell and its adjacent cell to form the
   full cluster. This covers most emojis.

3. Overflow: When more codepoints are needed than fit in available cells
   (e.g., single-width base with combiners, or more combiners than the
   adjacent cell in the double-width case), the cell's codepoint field
   is set to a special value: 0xFF000000 | offset. The offset is a byte
   offset from the start of the shm segment to the overflow entry. The
   overflow entry contains the full codepoint sequence as:

       uint32_t count;         (number of codepoints, uint32 for alignment;
                                longest known natural-language cluster is
                                about 9 codepoints)
       uint32_t codepoints[];  (the codepoint sequence)
So the actual limit is the 24-bit offset: the last overflow codepoint
sequence must start within the first 16777216 bytes (16 MiB) of the shm
segment. And of course, multiple cells may refer to the same sequence
in the overflow area.
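
To make the overflow encoding concrete, here is a rough reader-side
sketch in C. Only the 0xFF000000 marker, the 24-bit byte offset from the
start of the shm segment, and the count/codepoints[] layout come from
the description above. The names OVERFLOW_MARKER, OVERFLOW_OFFSET_MASK,
is_overflow_ref() and read_cluster() are made up for illustration, and
the sketch ignores the adjacent-cell case (layer 2) and any
synchronization with the writer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; only the marker value and the entry layout
 * (uint32_t count, then uint32_t codepoints[]) are from the message. */
#define OVERFLOW_MARKER      0xFF000000u
#define OVERFLOW_OFFSET_MASK 0x00FFFFFFu

/* True if the cell's codepoint field refers to an overflow entry. */
static int is_overflow_ref(uint32_t field)
{
    return (field & ~OVERFLOW_OFFSET_MASK) == OVERFLOW_MARKER;
}

/* Copy the codepoints for one cell into out[] (at most max of them).
 * Returns the number of codepoints written, or 0 if the overflow
 * reference points outside the segment or the entry does not fit. */
static size_t read_cluster(const uint8_t *shm, size_t shm_size,
                           uint32_t field, uint32_t *out, size_t max)
{
    uint32_t count;
    size_t offset, avail, n;

    if (!is_overflow_ref(field)) {
        /* Layer 1: the field is the codepoint itself. */
        if (max == 0)
            return 0;
        out[0] = field;
        return 1;
    }

    /* Layer 3: low 24 bits are a byte offset from the segment start. */
    offset = field & OVERFLOW_OFFSET_MASK;
    if (offset > shm_size || shm_size - offset < sizeof count)
        return 0;
    memcpy(&count, shm + offset, sizeof count);

    /* Never trust count beyond what the segment can actually hold. */
    avail = (shm_size - offset - sizeof count) / sizeof(uint32_t);
    if (count > avail)
        return 0;

    n = count < max ? count : max;
    memcpy(out, shm + offset + sizeof count, n * sizeof(uint32_t));
    return n;
}
```

The bounds checks are there because a reader should not assume the
segment contents are well-formed. memcpy() is used instead of pointer
casts so the sketch does not depend on the entry being 4-byte aligned.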
Note to self: Make the offset relative to the overflow area, and make
the index 32-bit based. That will push the limit above 256MB.
Nicolas