trie
trie¶
- class hanlp_trie.trie.Node(value=None)[source]¶
A node in a trie tree.
- Parameters
value – The value associated with this node.
- class hanlp_trie.trie.Trie(tokens: Optional[Union[Dict[str, Any], Iterable[str]]] = None)[source]¶
A referential implementation of the trie (De 1959) structure. It stores a dict by assigning each key/value pair a
Node
in a trie tree. It provides get/set/del/items methods just like adict
does. Additionally, it also provides longest-prefix-matching and keywords lookup against a piece of text, which are very helpful in rule-based Natural Language Processing.- Parameters
tokens – A set of keys or a dict mapping.
- parse(text: Sequence[str]) List[Tuple[int, int, Any]] [source]¶
Keywords lookup which takes a piece of text as input, and lookup all occurrences of keywords in it. These occurrences can overlap with each other.
- Parameters
text – A piece of text. In HanLP’s design, it doesn’t really matter whether this is a str or a list of str. The trie will transit on either types properly, which means a list of str simply defines a list of transition criteria while a str defines each criterion as a character.
- Returns
A tuple of
(begin, end, value)
.
- parse_longest(text: Sequence[str]) List[Tuple[int, int, Any]] [source]¶
Longest-prefix-matching which tries to match the longest keyword sequentially from the head of the text till its tail. By definition, the matches won’t overlap with each other.
- Parameters
text – A piece of text. In HanLP’s design, it doesn’t really matter whether this is a str or a list of str. The trie will transit on either types properly, which means a list of str simply defines a list of transition criteria while a str defines each criterion as a character.
- Returns
A tuple of
(begin, end, value)
.