Rule Sets
---------

Sets of pattern rules exist for:

* each alphabetic letter
* symbols (including numbers)

Letter rule sets have at least two rules:

1. Solitary letter
2. Context-less

Rule Structure
--------------

Rules describe the following expression:

    A [B] C = D

where:

    A - left context
    B - current grapheme
    C - right context
    D - resulting phonemes

Each rule consists of a sequence of rule bytes:

1. Left context (optional)
2. Context separator (optional)
3. Right context (optional)
4. Phonemes

At least one of the first three items must be present.

Rule Byte Structure
-------------------

    b7 = 1, context
    b7 = 0, phoneme
    b6 = 1, keep context
    b6 = 0, consume context
    b5 = 1, special match rule

    Context separator, $BF, %10111111

Rule Processing
---------------

The current grapheme is used to locate a specific rule set. Each
alphabetic grapheme has its own rule set; numbers and symbols share
one.

The first rule where all context matches is taken. Rule sets end with
a context-less rule, and alphabetic rule sets start with a rule that
matches just that letter. Alphabetic rule sets implicitly consume the
current grapheme.

Left context matches always keep context; right context matches may
consume the matching grapheme from the input word. If the lead right
context match does not consume context then it is preceded by a
context separator.

Special Match Rules
-------------------

Only with b7 = 1, b6 = 1

+-------+---+-----------------------------+
| B4-0  |   | MATCH                       |
+-------+---+-----------------------------+
| 00000 | # | one or more vowels?         |
+-------+---+-----------------------------+
| 00001 | + | front vowel?                |
+-------+---+-----------------------------+
| 0xxxx | = | numeric?                    |
+-------+---+-----------------------------+
| 100xx | & | sibilant?                   |
+-------+---+-----------------------------+
| 101xx | % | suffix?                     |
+-------+---+-----------------------------+
| 1101x | : | zero or more consonants?    |
+-------+---+-----------------------------+
| 11001 | . | voiced consonant?           |
+-------+---+-----------------------------+
| 11000 | @ | consonant following long U? |
+-------+---+-----------------------------+
| 11101 | * | one or more consonants?     |
+-------+---+-----------------------------+
| 11100 | ^ | consonant?                  |
+-------+---+-----------------------------+
| 11111 | ! | suffix or non-letter?       |
+-------+---+-----------------------------+
| 11110 | ~ | final S or non-letter?      |
+-------+---+-----------------------------+

Grapheme Values
---------------

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|  |x0|x1|x2|x3|x4|x5|x6|x7|x8|x9|xA|xB|xC|xD|xE|xF|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|0x| Y| E| I| O| U| A| F| K| P| Q| H| T| S| C| X| G|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|1x| J| Z| R| D| L| N| B| V| M| W|  |[{|\||]}|^~|_ |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|2x|  | !| "| #| $| %| &| '| (| )| *| +| ,| -| .| /|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|3x| 0| 1| 2| 3| 4| 5| 6| 7| 8| 9| :| ;| <| =| >| ?|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The value $1A is used to match either:

* a non-letter grapheme
* the beginning of the word
* the end of the word
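
The Grapheme Values table can be turned into a lookup in a few lines. The
sketch below is illustrative only: the names LETTERS, BOUNDARY,
build_grapheme_codes and grapheme_code are hypothetical, and two points are
assumptions rather than facts from these notes: that both characters shown
in a two-character cell ($1B-$1E) share that cell's value, and that
lower-case letters are folded to upper case before lookup. Rows 2x and 3x
coincide with ASCII $20-$3F, so they are generated rather than typed out.

```python
# Letters in table order; the index in this string is the grapheme value
# ($00-$19), read row by row from the Grapheme Values table.
LETTERS = "YEIOUAFKPQHTSCXGJZRDLNBVMW"

# $1A matches a non-letter grapheme or either end of the word.
BOUNDARY = 0x1A


def build_grapheme_codes():
    """Build a character -> grapheme value mapping from the table."""
    codes = {letter: code for code, letter in enumerate(LETTERS)}
    # $1B-$1F: assumption, every character printed in a cell maps to it.
    for code, chars in zip(range(0x1B, 0x20), ("[{", "\\|", "]}", "^~", "_")):
        for ch in chars:
            codes[ch] = code
    # $20-$3F: these rows of the table are identical to ASCII.
    for code in range(0x20, 0x40):
        codes[chr(code)] = code
    return codes


GRAPHEME_CODES = build_grapheme_codes()


def grapheme_code(ch):
    """Return the grapheme value for one character (case folding assumed)."""
    return GRAPHEME_CODES[ch.upper()]
```

For example, grapheme_code("a") gives $05 and grapheme_code("0") gives $30,
matching the table cells for A and 0.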
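
The flag bits listed under Rule Byte Structure can be pulled apart with
simple masks. This is a minimal sketch, not the original implementation:
decode_rule_byte and the dictionary keys are hypothetical names, and the
choice to report b6, b5 and the low five bits only for context bytes is an
assumption, since these notes do not describe the layout of phoneme bytes.

```python
CONTEXT_SEPARATOR = 0xBF  # %10111111, as given under Rule Byte Structure


def decode_rule_byte(value):
    """Split one rule byte into the documented flag bits."""
    if not 0 <= value <= 0xFF:
        raise ValueError("rule byte must be in the range $00-$FF")
    if not value & 0x80:
        # b7 = 0: phoneme byte; its internal layout is not described here,
        # so only the raw value is returned.
        return {"is_context": False, "is_separator": False, "raw": value}
    return {
        "is_context": True,                          # b7 = 1
        "is_separator": value == CONTEXT_SEPARATOR,  # exact match on $BF
        "keep_context": bool(value & 0x40),          # b6
        "special_match": bool(value & 0x20),         # b5
        "low_bits": value & 0x1F,                    # b4-0
    }
```

Decoding the context separator $BF this way reports a context byte with b6
clear, b5 set and all five low bits set.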