CRH Transliteration Pattern Matching Fixes
Refactor to match exceptions as patterns, not words:
- split exception list into C2L (Cyrillic-to-Latin) and L2C
(Latin-to-Cyrillic) pattern sets
- change main loop to break only on Roman numerals and transliterate
everything else, rather than tokenizing on single-script words
(this fixes the km² problem, too)
- update word anchors from ^ and $ to \b
- only process Roman numerals for L2C transliteration
- add exception for single "Roman" character followed by a period
which looks like an initial
- consolidate multi-step transliteration into regsConverter()
- remove regex support from main exception list to support strtr()
- reorganize some prefix/suffix/whole-word patterns into the right places
- add tests for recently fixed use cases
- add support for many-to-one mappings in both directions
- update character classes, exception lists, and regexes based on
speaker feedback and example texts
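The main-loop and anchoring changes above can be sketched roughly as
follows. This is an illustrative Python sketch, not the PHP code in this
change: the tiny L2C_MAP table, the ROMAN pattern, and the helper names
strtr_like() and latin_to_cyrillic() are all assumptions made up for the
example, and the Roman numeral check does not validate that a numeral is
well formed.

```python
import re

# Hypothetical toy mapping, for illustration only. The real L2C tables
# are much larger and are not reproduced here.
L2C_MAP = {"a": "а", "s": "с", "ı": "ы", "r": "р", "t": "т", "ts": "ц", "I": "И"}

# A run of Roman numeral letters bounded by \b, except a single letter
# followed by a period, which looks like an initial ("I. ...").
ROMAN = re.compile(r"(\b(?![IVXLCDM]\.)[IVXLCDM]+\b)")


def strtr_like(text, pairs):
    """Single-pass replacement, longest keys first, in the spirit of PHP's
    array form of strtr(): replaced text is never rescanned."""
    keys = sorted((k for k in pairs if k), key=len, reverse=True)
    if not keys:
        return text
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: pairs[m.group(0)], text)


def latin_to_cyrillic(text):
    """Break only on Roman numerals and transliterate everything else."""
    parts = ROMAN.split(text)
    # With one capturing group, odd-indexed parts are the matched numerals.
    return "".join(
        part if i % 2 else strtr_like(part, L2C_MAP)
        for i, part in enumerate(parts)
    )


if __name__ == "__main__":
    print(latin_to_cyrillic("XIV asır"))  # numeral kept, word transliterated
    print(latin_to_cyrillic("I. a"))      # "I." treated as an initial
```

Because whole stretches of text are now processed at once, ^ and $ only
match at the ends of the stretch: in this sketch, re.search(r"^asır$",
"XIV asır") finds nothing, while re.search(r"\basır\b", "XIV asır")
matches, which is why whole-word patterns need \b.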
Misc other fixes:
- fix some character class errors
- remove unneeded character classes
- add tests for Roman numerals and quotes
- add tests for affixes and regexes
Bug: T188321
Bug: T189512
Change-Id: I056d36ff2b8f63b3998a5d3a442d8d539c15488d