> besides finite state dictionaries, ..., String mappings based on perfect hash automata
I'm implementing a similar library in Rust. One of the key things I've found missing in Daciuk's great work on the topic is the handling of the registry. In your code[1], it looks like you're using a straight a hash table. I thought you'd like to know that I've had great success sacrificing complete DFA minimization in exchange for a small constant memory overhead by using an LRU cache of only some classes of states.
My code still needs a good deal of polish, and I intend to write a blog post, but I bet you can make sense of the general strategy from the code and reuse it.[2]
N.B. Lucene's implementation permits sacrificing complete minimization, but they don't use an LRU cache and I think still permit the registry to grow without bound.
I'm implementing a similar library in Rust. One of the key things I've found missing in Daciuk's great work on the topic is the handling of the registry. In your code[1], it looks like you're using a straight a hash table. I thought you'd like to know that I've had great success sacrificing complete DFA minimization in exchange for a small constant memory overhead by using an LRU cache of only some classes of states.
My code still needs a good deal of polish, and I intend to write a blog post, but I bet you can make sense of the general strategy from the code and reuse it.[2]
N.B. Lucene's implementation permits sacrificing complete minimization, but they don't use an LRU cache and I think still permit the registry to grow without bound.
[1] - https://github.com/danieldk/dictomaton/blob/master/src/main/...
[2] - https://github.com/BurntSushi/fst/blob/master/src/fst/regist... (The data type is a map from node hash to LRU cache. That is, there is a small LRU cache for each bin in the map.)