[BRLTTY] Contracted emojis?
Mario Lang
mlang at blind.guru
Fri Sep 28 08:52:52 EDT 2018
Mario Lang <mlang at blind.guru> writes:
> Samuel Thibault <samuel.thibault at ens-lyon.org> writes:
>
>> It seems CLDR does include an Emoji section that could perhaps be used?
>>
>> http://cldr.unicode.org/translation/short-names-and-keywords
>
> We seem to have found a parseable list of emojis and their translations
> (thanks to Simon).
>
> https://unicode.org/repos/cldr/tags/latest/common/annotations/de.xml
>
> would be the list for german.
>
> I guess we just need to parse this, and generate our own translation
> list.
For demonstration purposes, the script below generates a
de-kurzschrift-emoji.cti.
---<snip>---
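#!/usr/bin/python3
# Generate a BRLTTY contraction table (de-kurzschrift-emoji.cti) that
# expands each emoji to its contracted CLDR text-to-speech name.
from subprocess import Popen, PIPE
from lxml.etree import parse

def contract(text):
    """Contract text with brltty-ctb and return it in .cti dot notation."""
    proc = Popen(["brltty-ctb", "-c", "de-kurzschrift"],
                 stdin=PIPE, stdout=PIPE)
    out, _ = proc.communicate(text.encode('UTF-8'))
    # This assumes brltty-ctb emits Unicode braille patterns (U+2800..U+28FF).
    out = out.decode('UTF-8').strip()

    def dotify(ch):
        # Map one braille pattern to its dot numbers, e.g. U+2807 -> "123".
        c = ord(ch) - 0x2800
        if c == 0:
            return "0"  # blank cell
        # Bit i of the offset means dot i+1 is raised.
        return "".join(str(i + 1) for i in range(8) if c & (1 << i))

    # Cells are separated by "-" in the .cti dot notation.
    return '-'.join(map(dotify, out))

de = parse("de.xml")
# The type="tts" entries hold the short spoken name of each emoji.
for item in de.xpath("//annotation[@type='tts']"):
    print("word %s %s" % (item.get('cp'), contract(item.text)))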
---<snip>---
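The dot conversion inside contract() can be tried on its own, without brltty-ctb installed. Below is a minimal standalone sketch of the same mapping; the name braille_to_dots is hypothetical, and the range check is an addition that the script above does not perform.

```python
def braille_to_dots(cells):
    """Convert Unicode braille patterns (U+2800..U+28FF) into the
    dash-separated dot notation used in the generated .cti lines."""
    def cell_to_dots(ch):
        bits = ord(ch) - 0x2800
        if not 0 <= bits <= 0xFF:
            raise ValueError("not a braille pattern: %r" % ch)
        if bits == 0:
            return "0"  # blank cell
        # Bit i (0-based) of the offset means dot i+1 is raised.
        return "".join(str(i + 1) for i in range(8) if bits & (1 << i))
    return "-".join(cell_to_dots(ch) for ch in cells)

print(braille_to_dots("⡍⠁⠗⠊⠕"))  # prints 1347-1-1235-24-135
```

The signature at the end of this mail comes out as "M" with dot 7, then "a", "r", "i", "o", which is a quick sanity check of the bit order.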
To run it, fetch de.xml with wget and run this script with python3.
The result is a 1336-line table.
I put it here:
https://blind.guru/de-kurzschrift-emoji.cti
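The selection step (only the type="tts" annotations) can also be checked without lxml. Here is a sketch using the standard library's xml.etree.ElementTree on a small hand-written sample. The sample only imitates the layout of a CLDR annotations file, and its German descriptions are illustrative, not copied from de.xml; tts_names is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample imitating a CLDR annotations file (illustrative text).
SAMPLE = """<ldml>
  <annotations>
    <annotation cp="😀">Gesicht | grinsendes Gesicht</annotation>
    <annotation cp="😀" type="tts">grinsendes Gesicht</annotation>
    <annotation cp="👍">Daumen hoch | gut</annotation>
    <annotation cp="👍" type="tts">Daumen hoch</annotation>
  </annotations>
</ldml>"""

def tts_names(xml_text):
    """Return {emoji: spoken name} for every type="tts" annotation."""
    root = ET.fromstring(xml_text)
    # The untyped entries are "|"-separated keyword lists; skip them.
    return {a.get("cp"): a.text.strip()
            for a in root.findall(".//annotation[@type='tts']")}

for cp, name in tts_names(SAMPLE).items():
    print(cp, name)
```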
So we can either generate these tables at build time or have the core
do the work inline. Autogeneration has the disadvantage that later
changes to the contraction tables will not propagate to the emoji
expansions. However, I guess it is cheaper (performance-wise) to do the
work only once. In any case, this is just a demo and not meant for
inclusion.
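One middle ground between the two options is to contract each description inline, but only the first time it is needed, and cache the result. The BRLTTY core is written in C, so the Python sketch below only illustrates the idea. make_expander and fake_contract are hypothetical names, and fake_contract merely stands in for a real contraction call such as the brltty-ctb pipe in the script above.

```python
from functools import lru_cache

def make_expander(descriptions, contract):
    """descriptions: emoji -> spoken name; contract: text -> braille."""
    @lru_cache(maxsize=None)
    def expand(cp):
        name = descriptions.get(cp)
        # Characters without a description are passed through unchanged.
        return contract(name) if name is not None else cp
    return expand

calls = []

def fake_contract(text):
    calls.append(text)  # record how often the "contractor" really runs
    return text.upper()

expand = make_expander({"😀": "grinsendes Gesicht"}, fake_contract)
print(expand("😀"), expand("😀"), expand("x"))
print(len(calls))  # prints 1: the second lookup is served from the cache
```

Because the cache lives only as long as the process, a changed contraction table would take effect on the next start, while each description is still contracted at most once per run.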
Note that this method of generating the list does not rely on any emoji
property. It simply describes every character we have a description for.
--
CYa,
⡍⠁⠗⠊⠕