primary |
libgrapheme is an extremely simple freestanding C99 library providing
utilities for properly handling strings according to the latest Unicode
standard 15.0.0. It offers fully Unicode compliant
* grapheme cluster (i.e. user-perceived character) segmentation
* word segmentation
* sentence segmentation
* detection of permissible line break opportunities
* case detection (lower-, upper- and title-case)
* case conversion (to lower-, upper- and title-case)
on UTF-8 strings and codepoint arrays, which both can also be
null-terminated.
The necessary lookup-tables are automatically generated from the Unicode
standard data and heavily compressed. Over 10,000 automatically generated
conformance tests and over 150 unit tests ensure conformance and
correctness.
The resulting library is freestanding and thus not even dependent on a
standard library to be present at runtime, making it a suitable choice for
bare metal applications. It is also way smaller and much faster than the
other established Unicode string libraries (ICU, GNU's libunistring,
libutf8proc).
|