https://railgunlabs.com/unicorn/

Railgun Labs
Unicorn

Embeddable Unicode® algorithms.

Essential Algorithms

Unicorn implements the most essential Unicode algorithms:

  * Normalization (NFC, NFD)
  * Case conversion
  * Case folding
  * Collation (via the DUCET)
  * Grapheme, word, and sentence segmentation
  * BOCU-1 short string compression
  * UTF-8, UTF-16, and UTF-32 decoders, encoders, and validators

Fully Customizable

Unicorn is fully customizable. You can choose which Unicode
algorithms and character properties to include. You can even choose
which Unicode character blocks to include!

Ultra Portable

Unicorn does not require an FPU or 64-bit integers. It is written in
C99 and requires only a few features from libc, which are listed in
the following table.

Header     Types                        Macros             Functions
stdint.h   int8_t, int16_t, int32_t,
           uint8_t, uint16_t, uint32_t
string.h                                                   memcpy, memset, memcmp
stddef.h   size_t                       NULL
stdbool.h                               bool, true, false
assert.h                                assert
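
To give a feel for how little a text routine needs from libc, here is a
minimal sketch of a UTF-8 encoder that uses only headers from the table
above and nothing wider than 32-bit integers. The function name
sketch_utf8_encode is hypothetical and is not part of Unicorn's API; it
only illustrates the constraint, not how Unicorn is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Unicorn's API).
 * Encodes one Unicode scalar value as UTF-8 into out, which holds cap bytes.
 * Returns the number of bytes written, or 0 if cp is not a scalar value
 * or the buffer is too small. Only 32-bit integer arithmetic is used. */
size_t sketch_utf8_encode(uint32_t cp, uint8_t *out, size_t cap)
{
    assert(out != NULL);

    /* Surrogates and values above U+10FFFF are not scalar values. */
    const bool is_surrogate = (cp >= 0xD800u) && (cp <= 0xDFFFu);
    if (is_surrogate || (cp > 0x10FFFFu)) {
        return 0;
    }

    /* Pick the shortest encoding for this value. */
    size_t need;
    if (cp < 0x80u) {
        need = 1;
    } else if (cp < 0x800u) {
        need = 2;
    } else if (cp < 0x10000u) {
        need = 3;
    } else {
        need = 4;
    }
    if (cap < need) {
        return 0;
    }

    switch (need) {
    case 1:
        out[0] = (uint8_t)cp;
        break;
    case 2:
        out[0] = (uint8_t)(0xC0u | (cp >> 6));
        out[1] = (uint8_t)(0x80u | (cp & 0x3Fu));
        break;
    case 3:
        out[0] = (uint8_t)(0xE0u | (cp >> 12));
        out[1] = (uint8_t)(0x80u | ((cp >> 6) & 0x3Fu));
        out[2] = (uint8_t)(0x80u | (cp & 0x3Fu));
        break;
    default:
        out[0] = (uint8_t)(0xF0u | (cp >> 18));
        out[1] = (uint8_t)(0x80u | ((cp >> 12) & 0x3Fu));
        out[2] = (uint8_t)(0x80u | ((cp >> 6) & 0x3Fu));
        out[3] = (uint8_t)(0x80u | (cp & 0x3Fu));
        break;
    }
    return need;
}
```

Because the routine avoids floating point, 64-bit types, and any libc
function, it should compile for targets that offer only a freestanding
C99 toolchain plus assert.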

MISRA C:2012 Compliant

Unicorn honors all Required, Mandatory, and most Advisory rules
defined by MISRA C:2012. Deviations are documented here. You are
encouraged to audit Unicorn and verify that its level of conformance
is acceptable.

Thread Safe

Unicorn is thread-safe except for the following caveats:

 1. Functions that allocate memory are only as thread-safe as the
    allocator itself.
 2. The configuration API is not thread-safe. In typical usage,
    however, it is invoked only at application startup, and only if
    the default configuration is unsatisfactory.

Atomic Operations

All operations in Unicorn are atomic: an operation either completes
in full or has no effect at all. This guarantees that errors, such as
out-of-memory errors, never corrupt internal state. It also means
that when a recoverable error occurs, you can address the cause (for
example, by freeing memory) and try the operation again.
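
The all-or-nothing behavior described above can be illustrated with a
small sketch. Everything in it (sketch_buffer, sketch_append,
sketch_alloc_fn, sketch_failing_alloc) is hypothetical and is not taken
from Unicorn; it shows one common way to get this guarantee, which is to
acquire every resource first and modify state only after nothing can
fail. The allocator is assumed to return memory that free() can release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook; must be compatible with free(). */
typedef void *(*sketch_alloc_fn)(size_t size);

/* Hypothetical growable byte buffer. */
typedef struct {
    uint8_t *data;
    size_t length;
    size_t capacity;
} sketch_buffer;

/* An allocator that always fails, used to simulate out-of-memory. */
void *sketch_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Appends n bytes to buf. On failure, returns false and leaves buf
 * exactly as it was, so the caller can free memory and retry. */
bool sketch_append(sketch_buffer *buf, const uint8_t *src, size_t n,
                   sketch_alloc_fn alloc)
{
    assert(buf != NULL);
    assert(alloc != NULL);
    assert((src != NULL) || (n == 0));

    if (n > (SIZE_MAX - buf->length)) {
        return false; /* length would overflow; nothing modified */
    }
    const size_t needed = buf->length + n;

    if (needed > buf->capacity) {
        /* Step 1: acquire the new storage before touching buf. */
        uint8_t *grown = (uint8_t *)alloc(needed);
        if (grown == NULL) {
            return false; /* out of memory; buf is untouched */
        }
        /* Step 2: nothing below can fail, so it is safe to commit. */
        if (buf->length > 0) {
            memcpy(grown, buf->data, buf->length);
        }
        free(buf->data);
        buf->data = grown;
        buf->capacity = needed;
    }

    if (n > 0) {
        memcpy(buf->data + buf->length, src, n);
    }
    buf->length = needed;
    return true;
}
```

A caller that sees false from sketch_append can release memory elsewhere
and call it again with the same arguments, since the buffer still holds
its previous contents.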

Extensively Tested

  * Official Unicode conformance tests
  * Manually written tests
  * Out-of-memory tests
  * Fuzz tests
  * Static analysis
  * Valgrind analysis
  * Code sanitizers (UBSAN, ASAN, and MSAN)
  * Extensive use of assert() and run-time checks

Encoding Compatible

All functions that operate on text can accept UTF-8, UTF-16, UTF-32,
or Unicode scalar values. UTF-16 and UTF-32 are supported in big
endian, little endian, and native byte orders.
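
As a rough illustration of what byte-order support involves, the sketch
below decodes UTF-16 from raw bytes in big endian, little endian, or
native order. The names sketch_byte_order, sketch_native_order, and
sketch_utf16_next are hypothetical and do not come from Unicorn's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-order selector (not Unicorn's API). */
typedef enum { SKETCH_BIG_ENDIAN, SKETCH_LITTLE_ENDIAN } sketch_byte_order;

/* Detects the byte order of the machine running this code. */
sketch_byte_order sketch_native_order(void)
{
    const uint16_t probe = 1u;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return (first == 1u) ? SKETCH_LITTLE_ENDIAN : SKETCH_BIG_ENDIAN;
}

/* Assembles one 16-bit code unit from two bytes in the given order. */
static uint32_t sketch_read_u16(const uint8_t *p, sketch_byte_order order)
{
    if (order == SKETCH_BIG_ENDIAN) {
        return ((uint32_t)p[0] << 8) | (uint32_t)p[1];
    }
    return ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Decodes one scalar value starting at byte offset *index.
 * On success stores it in *cp, advances *index, and returns true.
 * At end of input, or on truncated input or an unpaired surrogate,
 * returns false and leaves *index and *cp unchanged. */
bool sketch_utf16_next(const uint8_t *bytes, size_t size,
                       sketch_byte_order order, size_t *index, uint32_t *cp)
{
    assert((bytes != NULL) && (index != NULL) && (cp != NULL));

    size_t i = *index;
    if ((i > size) || ((size - i) < 2)) {
        return false;
    }
    const uint32_t hi = sketch_read_u16(bytes + i, order);
    i += 2;

    if ((hi >= 0xDC00u) && (hi <= 0xDFFFu)) {
        return false; /* low surrogate with no preceding high surrogate */
    }
    if ((hi >= 0xD800u) && (hi <= 0xDBFFu)) {
        if ((size - i) < 2) {
            return false; /* high surrogate at end of input */
        }
        const uint32_t lo = sketch_read_u16(bytes + i, order);
        if ((lo < 0xDC00u) || (lo > 0xDFFFu)) {
            return false; /* high surrogate not followed by a low one */
        }
        i += 2;
        *cp = 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);
    } else {
        *cp = hi;
    }
    *index = i;
    return true;
}
```

Reading the input byte by byte, rather than casting it to a uint16_t
pointer, keeps the decoder independent of both the host byte order and
the alignment of the input buffer.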

By default, the implementation performs run-time safety checks to
guard against malformed or maliciously encoded text. If you know the
text is well-formed, you can skip these checks to reduce processing
time.
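
The trade-off between checked and unchecked processing can be sketched
as follows. The functions sketch_utf8_is_valid and sketch_count_scalars,
and the trusted flag, are hypothetical; they are not Unicorn's API and
only show the kind of checks that validation performs (overlong forms,
surrogates, out-of-range values, truncation) and what skipping them
looks like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator (not Unicorn's API).
 * Returns true if s[0..n) is well-formed UTF-8. */
bool sketch_utf8_is_valid(const uint8_t *s, size_t n)
{
    assert((s != NULL) || (n == 0));

    size_t i = 0;
    while (i < n) {
        const uint8_t b0 = s[i];
        size_t len;
        uint32_t min_cp; /* smallest value allowed for this length */
        uint32_t cp;

        if (b0 < 0x80u) {
            i += 1;
            continue;
        } else if ((b0 & 0xE0u) == 0xC0u) {
            len = 2; min_cp = 0x80u; cp = b0 & 0x1Fu;
        } else if ((b0 & 0xF0u) == 0xE0u) {
            len = 3; min_cp = 0x800u; cp = b0 & 0x0Fu;
        } else if ((b0 & 0xF8u) == 0xF0u) {
            len = 4; min_cp = 0x10000u; cp = b0 & 0x07u;
        } else {
            return false; /* stray continuation byte or invalid lead byte */
        }

        if ((n - i) < len) {
            return false; /* sequence truncated by end of input */
        }
        for (size_t k = 1; k < len; k++) {
            const uint8_t b = s[i + k];
            if ((b & 0xC0u) != 0x80u) {
                return false; /* expected a continuation byte */
            }
            cp = (cp << 6) | (uint32_t)(b & 0x3Fu);
        }
        if ((cp < min_cp) || (cp > 0x10FFFFu) ||
            ((cp >= 0xD800u) && (cp <= 0xDFFFu))) {
            return false; /* overlong, out of range, or surrogate */
        }
        i += len;
    }
    return true;
}

/* Counts the scalar values in s[0..n). When trusted is false the text is
 * validated first and false is returned if it is malformed, leaving
 * *count unchanged. When trusted is true the validation pass is skipped,
 * which is only safe for text already known to be well-formed. */
bool sketch_count_scalars(const uint8_t *s, size_t n, bool trusted,
                          size_t *count)
{
    assert(count != NULL);

    if (!trusted && !sketch_utf8_is_valid(s, n)) {
        return false;
    }
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if ((s[i] & 0xC0u) != 0x80u) {
            total++; /* every non-continuation byte starts a scalar */
        }
    }
    *count = total;
    return true;
}
```

In this sketch the trusted path does strictly less work, which mirrors
the idea of opting out of checks for text that has already been
validated elsewhere.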

Copyright (c) 2024, Railgun Labs. All Rights Reserved.