pos

The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging.

To tag a tokenized sentence:

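import hanlp

# Load a pretrained tagger by its identifier.
pos = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)
# Tag a pre-tokenized sentence; one tag is returned per token.
pos(['我', '的', '希望', '是', '希望', '世界', '和平'])
# Output: ['PN', 'DEG', 'NN', 'VC', 'VV', 'NN', 'VA']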
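
Because the tagger returns one tag per input token, the two lists can be aligned position by position. The following is a minimal sketch of that alignment; it reuses the tokens and tags from the example above rather than loading a model, and `pair_tokens_with_tags` is a hypothetical helper, not part of HanLP.

```python
# Tokens and tags copied from the example above; no model is loaded here.
tokens = ['我', '的', '希望', '是', '希望', '世界', '和平']
tags = ['PN', 'DEG', 'NN', 'VC', 'VV', 'NN', 'VA']


def pair_tokens_with_tags(tokens, tags):
    """Hypothetical helper: align tokens and tags into (token, tag) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return list(zip(tokens, tags))


pairs = pair_tokens_with_tags(tokens, tags)
for token, tag in pairs:
    print(f"{token}/{tag}")
```

Note that 希望 occurs twice with different tags: NN (noun) at the third position and VV (verb) at the fifth, which shows that the tag depends on context and not only on the word form.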

All the pre-trained taggers and their details are listed below.

hanlp.pretrained.pos.C863_POS_ELECTRA_SMALL = 'https://file.hankcs.com/hanlp/pos/pos_863_electra_small_20220217_101958.zip'

ELECTRA small model (Clark et al. 2020) trained on the Chinese 863 corpus. Accuracy = 95.19%.

hanlp.pretrained.pos.CTB5_POS_RNN = 'https://file.hankcs.com/hanlp/pos/ctb5_pos_rnn_20200113_235925.zip'

An old-school BiLSTM tagging model trained on CTB5.

hanlp.pretrained.pos.CTB5_POS_RNN_FASTTEXT_ZH = 'https://file.hankcs.com/hanlp/pos/ctb5_pos_rnn_fasttext_20191230_202639.zip'

An old-school BiLSTM tagging model with FastText (Bojanowski et al. 2017) embeddings trained on CTB5.

hanlp.pretrained.pos.CTB9_POS_ALBERT_BASE = 'https://file.hankcs.com/hanlp/pos/ctb9_albert_base_20211228_163935.zip'

ALBERT base model (Lan et al. 2020) trained on CTB9 (Xue et al. 2016). This is a TF (TensorFlow) component.

hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL = 'https://file.hankcs.com/hanlp/pos/pos_ctb_electra_small_20220215_111944.zip'

ELECTRA small model (Clark et al. 2020) trained on CTB9 (Xue et al. 2016). Accuracy = 96.26%.

hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL_TF = 'https://file.hankcs.com/hanlp/pos/pos_ctb_electra_small_20211227_121341.zip'

ELECTRA small model (Clark et al. 2020) trained on CTB9 (Xue et al. 2016). Accuracy = 96.75%. This is a TF (TensorFlow) component.

hanlp.pretrained.pos.CTB9_POS_RADICAL_ELECTRA_SMALL = 'https://file.hankcs.com/hanlp/pos/pos_ctb_radical_electra_small_20220215_111932.zip'

ELECTRA small model (Clark et al. 2020) with radical embeddings (He et al. 2018a) trained on CTB9 (Xue et al. 2016). Accuracy = 96.14%.

hanlp.pretrained.pos.PKU_POS_ELECTRA_SMALL = 'https://file.hankcs.com/hanlp/pos/pos_pku_electra_small_20220217_142436.zip'

ELECTRA small model (Clark et al. 2020) trained on the Chinese PKU corpus. Accuracy = 97.55%.

hanlp.pretrained.pos.PTB_POS_RNN_FASTTEXT_EN = 'https://file.hankcs.com/hanlp/pos/ptb_pos_rnn_fasttext_20220418_101708.zip'

An old-school BiLSTM tagging model with FastText (Bojanowski et al. 2017) embeddings trained on PTB.
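
Several of the CTB9 taggers above report an accuracy, so choosing between them can be written as a small lookup. The sketch below copies those figures from the listing. The `CTB9_ACCURACY` table and the `best_identifier` helper are illustrative names and not part of HanLP, and `CTB9_POS_ALBERT_BASE` is left out because no accuracy is reported for it.

```python
# Reported accuracies copied from the listing above (CTB9 models only).
# This table and the helper below are illustrative, not part of HanLP.
CTB9_ACCURACY = {
    'CTB9_POS_ELECTRA_SMALL': 96.26,
    'CTB9_POS_ELECTRA_SMALL_TF': 96.75,
    'CTB9_POS_RADICAL_ELECTRA_SMALL': 96.14,
}


def best_identifier(scores, exclude_tf=False):
    """Return the identifier with the highest reported accuracy.

    With exclude_tf=True, identifiers ending in '_TF' are skipped.
    """
    candidates = {
        name: accuracy
        for name, accuracy in scores.items()
        if not (exclude_tf and name.endswith('_TF'))
    }
    if not candidates:
        raise ValueError("no candidate identifiers left")
    return max(candidates, key=candidates.get)


print(best_identifier(CTB9_ACCURACY))
print(best_identifier(CTB9_ACCURACY, exclude_tf=True))
```

The chosen name can then be resolved to its URL with `getattr(hanlp.pretrained.pos, name)` and passed to `hanlp.load`, as in the tagging example at the top of this page. Accuracies reported on different corpora (863, CTB9, PKU) are not directly comparable, which is why the table is limited to a single corpus.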