[BRLTTY] Linux console hacking (was: Re: Footsteps towards better accessibility in Linux)

Wed Apr 15 15:22:34 UTC 2026

On Wed, 15 Apr 2026, Aura Kelloniemi wrote:

> Hi,
> 
> On 2026-04-15 at 01:47 +0200, Samuel Thibault <samuel.thibault at ens-lyon.org> wrote:
>  > Nicolas Pitre, le jeu. 09 avril 2026 23:12:06 -0400, a ecrit:
>  > > Done — the protocol now handles grapheme clusters with a three-layer 
>  > > approach (single codepoint, continuation cells, overflow area) and the 
>  > > cell width field covers the whole cluster. See the updated design 
>  > > document attached.
> 
>  > I'm a bit afraid of the complexity of the dynamic allocation of the
>  > overflow entry. I agree that adding a limitation to 8 combining
>  > codepoints brings static limitation, but conversely there have been
>  > vulnerabilities found due to unbound combining codepoints management,
>  > and for easier adoption, the protocol should be quite simple.
> 
> IMHO, if there is a static limitation, it should be futureproof – i.e. insanely
> big – like 16 or 32. But of course bumping the protocol version sometimes
> should be acceptable.

There are no static limitations, unless you consider 4294967296 a 
practical limitation, that is. The three layers work as follows:

1. Common case: The cell's codepoint field holds a single UCS-4 value. 
   No combining marks.

2. Double-width characters: The adjacent cell (width=0) can hold one 
   additional codepoint for a combining mark. The reader collects 
   codepoints from the primary cell and its adjacent cell to form the 
   full cluster. This covers most emojis.

3. Overflow: When more codepoints are needed than fit in available cells 
   (e.g., single-width base with combiners, or more combiners than the 
   adjacent cell in the double-width case), the cell's codepoint field 
   is set to a special value: 0xFF000000 | offset. The offset is a byte 
   offset from the start of the shm segment to the overflow entry.  The 
   overflow entry contains the full codepoint sequence as:

     uint32_t count;         (number of codepoints, uint32 for alignment;
                              longest known natural-language cluster is
                              about 9 codepoints)
     uint32_t codepoints[];  (the codepoint sequence)

So the actual limit might be the 24-bit offset, meaning that the last 
overflow codepoint sequence must start within the first 16777216 bytes 
of the shm segment. And of course, multiple cells may refer to the same 
sequence in the overflow area.
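
To make the three layers concrete, here is a rough sketch of how a 
reader might collect the cluster for one cell. This is illustrative 
only and not the actual protocol code: struct cell, read_cluster() and 
the macro names are made up for this example, and it assumes that a 
zero codepoint in the width=0 cell means "no extra codepoint". The 
bounds checks are there so that a bogus offset or count cannot make the 
reader run off the end of the segment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OVERFLOW_MARKER 0xFF000000u  /* top byte set: overflow reference */
#define OVERFLOW_MASK   0x00FFFFFFu  /* low 24 bits: byte offset in shm  */

/* Hypothetical cell layout: only the two fields discussed above. */
struct cell {
    uint32_t codepoint;  /* UCS-4 value, or OVERFLOW_MARKER | offset */
    uint8_t  width;      /* 0 = continuation, 1 = single, 2 = double */
};

/*
 * Collect the codepoints of the cluster starting at cells[i] into out[].
 * Returns the number of codepoints stored, or 0 if the overflow
 * reference is malformed or out[] is too small.
 */
static size_t read_cluster(const uint8_t *shm, size_t shm_size,
                           const struct cell *cells, size_t ncells,
                           size_t i, uint32_t *out, size_t out_max)
{
    uint32_t cp = cells[i].codepoint;
    size_t n = 0;

    /* Layer 3: the field is a reference into the overflow area. */
    if ((cp & OVERFLOW_MARKER) == OVERFLOW_MARKER) {
        size_t off = cp & OVERFLOW_MASK;
        uint32_t count;

        if (off > shm_size || shm_size - off < sizeof(count))
            return 0;
        memcpy(&count, shm + off, sizeof(count));
        if (count > out_max)
            return 0;
        if ((shm_size - off - sizeof(count)) / sizeof(uint32_t) < count)
            return 0;
        memcpy(out, shm + off + sizeof(count), count * sizeof(uint32_t));
        return count;
    }

    /* Layer 1: a single codepoint in the cell itself. */
    if (out_max < 1)
        return 0;
    out[n++] = cp;

    /* Layer 2: a double-width cell may borrow its width=0 neighbour
     * (assumption: codepoint 0 there means nothing was stored). */
    if (cells[i].width == 2 && i + 1 < ncells &&
        cells[i + 1].width == 0 && cells[i + 1].codepoint != 0 &&
        n < out_max)
        out[n++] = cells[i + 1].codepoint;

    return n;
}
```

The overflow test comes first because a double-width base with more 
combiners than the adjacent cell can hold also ends up in the overflow 
area. No valid codepoint exceeds 0x10FFFF, so the 0xFF top byte cannot 
be confused with a real character.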

Note to self: Make the offset relative to the overflow area, and make 
the index 32-bit based. With a 24-bit index that will push the limit to 
64MB of overflow area.

Nicolas