trie

trie

class hanlp_trie.trie.Node(value=None)[source]

A node in a trie tree.

Parameters

value – The value associated with this node.

transit(key)[source]

Transit the state of a Deterministic Finite Automata (DFA) with key.

Parameters

key – A sequence of criterion (tokens or characters) used to transit to a new state.

Returns

A new state if the transition succeeded, otherwise None.

class hanlp_trie.trie.Trie(tokens: Optional[Union[Dict[str, Any], Iterable[str]]] = None)[source]

A referential implementation of the trie (De 1959) structure. It stores a dict by assigning each key/value pair a Node in a trie tree. It provides get/set/del/items methods just like a dict does. Additionally, it also provides longest-prefix-matching and keywords lookup against a piece of text, which are very helpful in rule-based Natural Language Processing.

Parameters

tokens – A set of keys or a dict mapping.

parse(text: Sequence[str]) List[Tuple[int, int, Any]][source]

Keywords lookup which takes a piece of text as input, and lookup all occurrences of keywords in it. These occurrences can overlap with each other.

Parameters

text – A piece of text. In HanLP’s design, it doesn’t really matter whether this is a str or a list of str. The trie will transit on either types properly, which means a list of str simply defines a list of transition criteria while a str defines each criterion as a character.

Returns

A tuple of (begin, end, value).

parse_longest(text: Sequence[str]) List[Tuple[int, int, Any]][source]

Longest-prefix-matching which tries to match the longest keyword sequentially from the head of the text till its tail. By definition, the matches won’t overlap with each other.

Parameters

text – A piece of text. In HanLP’s design, it doesn’t really matter whether this is a str or a list of str. The trie will transit on either types properly, which means a list of str simply defines a list of transition criteria while a str defines each criterion as a character.

Returns

A tuple of (begin, end, value).