Tutorial¶
Natural Language Processing is an exciting field consisting of many closely related tasks such as lexical analysis and parsing. Each task involves many datasets and models, all requiring a high degree of expertise. Things become even more complex when dealing with multilingual text, as there are simply no datasets for some low-resource languages. However, with HanLP 2.1, core NLP tasks have been made easy to access and efficient in production environments. In this tutorial, we’ll walk through the APIs in HanLP step by step.
HanLP offers an out-of-the-box RESTful API and a native Python API. The two share very similar interfaces but are designed for different scenarios.
RESTful API¶
The RESTful API is an endpoint to which you send your documents and from which you get the parsed annotations back. We host a non-commercial API service, and you are welcome to apply for an auth key. An auth key is a password that gives you access to our API and protects our server from abuse. Once you have obtained an auth key, you can parse your documents with our RESTful client, which can be installed via:
pip install hanlp_restful
Then instantiate a HanLPClient with your auth key and send a document to have it parsed.
from hanlp_restful import HanLPClient
# Fill in your auth, set language='zh' to use Chinese models
HanLP = HanLPClient('https://hanlp.hankcs.com/api', auth=None, language='mul')
doc = HanLP('In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environments. ' \
'2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。' \
'2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。')
print(doc)
{
"tok": [
["In", "2021", ",", "HanLPv2.1", "delivers", "state-of-the-art", "multilingual", "NLP", "techniques", "to", "production", "environments", "."],
["2021", "年", "、", "HanLPv2.1", "は", "次", "世代", "の", "最", "先端", "多", "言語", "NLP", "技術", "を", "本番", "環境", "に", "導入", "します", "。"],
["2021", "年", "HanLPv2.1", "为", "生产", "环境", "带来", "次世代", "最", "先进的", "多", "语种", "NLP", "技术", "。"]
],
"ner": [
[["2021", "DATE", 1, 2], ["HanLPv2.1", "WORK_OF_ART", 3, 4]],
[["2021 年", "DATE", 0, 2]],
[["2021 年", "DATE", 0, 2], ["HanLPv2.1", "PERSON", 2, 3]]
],
"srl": [
[[["In 2021", "ARGM-TMP", 0, 2], ["HanLPv2.1", "ARG0", 3, 4], ["delivers", "PRED", 4, 5], ["to production environments", "ARG2", 9, 12]]],
[],
[[["2021 年", "ARGM-TMP", 0, 2], ["带来", "PRED", 6, 7]]]
],
"sdp/dm": [
[[], [[1, "ARG2"]], [[1, "orphan"]], [[1, "orphan"], [5, "ARG1"]], [[1, "orphan"]], [[1, "orphan"]], [[1, "orphan"]], [[1, "orphan"]], [[5, "ARG2"], [6, "ARG1"], [7, "ARG1"], [8, "compound"]], [[1, "orphan"]], [[1, "orphan"]], [[5, "ARG3"], [11, "compound"]], [[1, "orphan"]]],
[[], [[1, "ARG2"]], [], [], [], [], [[6, "compound"]], [], [], [], [], [], [], [[12, "compound"], [13, "compound"], [15, "ARG1"], [19, "ARG2"]], [], [], [[16, "compound"], [18, "ARG2"]], [], [], [], []],
[[], [[1, "ARG1"]], [[7, "ARG1"]], [], [], [[4, "ARG2"], [5, "compound"]], [[2, "orphan"]], [], [], [[9, "ARG1"]], [], [[11, "ARG1"]], [], [[7, "ARG2"], [10, "ARG1"], [12, "compound"], [13, "compound"]], []]
],
"sdp/pas": [
[[], [[1, "prep_ARG2"]], [[1, "orphan"]], [[5, "verb_ARG1"]], [[1, "orphan"]], [[1, "orphan"]], [[1, "orphan"]], [[1, "orphan"]], [[5, "verb_ARG2"], [6, "adj_ARG1"], [7, "adj_ARG1"], [8, "noun_ARG1"]], [[1, "orphan"]], [[1, "orphan"]], [[10, "prep_ARG2"], [11, "noun_ARG1"]], [[1, "orphan"]]],
[[], [[1, "adj_ARG1"]], [], [], [], [], [[6, "noun_ARG1"], [8, "prep_ARG2"]], [], [], [], [], [], [], [[9, "noun_ARG1"], [10, "noun_ARG1"], [11, "adj_ARG1"], [12, "noun_ARG1"], [13, "noun_ARG1"], [15, "prep_ARG1"], [19, "verb_ARG2"]], [], [], [[16, "noun_ARG1"], [18, "prep_ARG2"]], [], [], [], []],
[[], [[1, "adj_ARG1"]], [[7, "verb_ARG1"]], [], [], [[4, "prep_ARG2"], [5, "noun_ARG1"]], [], [], [], [[9, "adj_ARG1"]], [], [], [], [[7, "verb_ARG2"], [10, "adj_ARG1"], [11, "adj_ARG1"], [12, "noun_ARG1"], [13, "noun_ARG1"]], []]
],
"sdp/psd": [
[[], [[5, "TWHEN"]], [[1, "orphan"]], [[5, "ACT-arg"]], [[1, "orphan"]], [[9, "RSTR"]], [[9, "RSTR"]], [[9, "RSTR"]], [[5, "PAT-arg"]], [[1, "orphan"]], [[12, "RSTR"]], [[5, "ADDR-arg"]], [[1, "orphan"]]],
[[[2, "RSTR"]], [[19, "TWHEN"]], [], [], [], [[7, "RSTR"]], [[14, "APP"]], [[3, "orphan"]], [], [[14, "RSTR"]], [], [[14, "RSTR"]], [[14, "ID"]], [[19, "PAT-arg"]], [], [[17, "RSTR"]], [[19, "LOC"]], [], [[3, "orphan"]], [[3, "orphan"]], [[3, "orphan"]]],
[[[2, "RSTR"]], [[7, "TWHEN"]], [], [], [[6, "RSTR"]], [[7, "ADDR-arg"]], [], [[14, "RSTR"]], [], [[14, "RSTR"]], [], [[14, "RSTR"]], [[14, "ID"]], [[7, "PAT-arg"]], []]
],
"con": [
["TOP", [["S", [["PP", [["ADP", ["In"]], ["NP", [["NUM", ["2021"]]]]]], ["PUNCT", [","]], ["NP", [["PROPN", ["HanLPv2.1"]]]], ["VP", [["VERB", ["delivers"]], ["NP", [["ADJ", ["state-of-the-art"]], ["ADJ", ["multilingual"]], ["PROPN", ["NLP"]], ["NOUN", ["techniques"]]]], ["PP", [["ADP", ["to"]], ["NP", [["NOUN", ["production"]], ["NOUN", ["environments"]]]]]]]], ["PUNCT", ["."]]]]]],
["TOP", [["IP", [["NUM", ["2021"]], ["NOUN", ["年"]], ["PUNCT", ["、"]], ["NOUN", ["HanLPv2.1"]], ["IP", [["VP", [["VP", [["ADP", ["は"]], ["NOUN", ["次"]], ["NOUN", ["世代"]], ["ADP", ["の"]], ["ADJP", [["ADJP", [["ADJP", [["NOUN", ["最"]]]], ["ADJP", [["NOUN", ["先端"]]]]]], ["ADJP", [["NOUN", ["多"]]]]]]]]]]]], ["NP", [["NP", [["NP", [["NP", [["NP", [["NOUN", ["言語"]], ["NOUN", ["NLP"]], ["NOUN", ["技術"]]]], ["ADP", ["を"]]]], ["NOUN", ["本番"]], ["NOUN", ["環境"]]]], ["PP", [["ADP", ["に"]]]]]], ["VP", [["VERB", ["導入"]], ["AUX", ["します"]]]]]], ["PUNCT", ["。"]]]]]],
["TOP", [["IP", [["NP", [["NUM", ["2021"]], ["NOUN", ["年"]]]], ["NP", [["X", ["HanLPv2.1"]]]], ["VP", [["PP", [["ADP", ["为"]], ["NP", [["NOUN", ["生产"]], ["NOUN", ["环境"]]]]]], ["VP", [["VERB", ["带来"]], ["NP", [["ADJP", [["NOUN", ["次世代"]]]], ["ADJP", [["ADVP", [["ADV", ["最"]]]], ["ADJP", [["ADJ", ["先进的"]]]]]], ["NP", [["QP", [["NUM", ["多"]]]], ["NP", [["NOUN", ["语种"]]]]]], ["NP", [["X", ["NLP"]], ["NOUN", ["技术"]]]]]]]]]], ["PUNCT", ["。"]]]]]]
],
"lem": [
["in", "2021", ",", "HANlpv2.1", "deliver", "state-of-the-art", "multilingual", "NLP", "technique", "to", "production", "environment", "."],
["2021", "年", "、", "HANLPV2.1", "は", "次", "世代", "の", "最", "先端", "多", "言語", "NLP", "技術", "を", "本番", "環境", "に", "導入", "します", "。"],
["2021", "年", "HANlpv2.1", "为", "生产", "环境", "带来", "次世代", "最", "先进的", "多", "语种", "NLP", "技术", "。"]
],
"pos": [
["ADP", "NUM", "PUNCT", "PROPN", "VERB", "ADJ", "ADJ", "PROPN", "NOUN", "ADP", "NOUN", "NOUN", "PUNCT"],
["NUM", "NOUN", "PUNCT", "NOUN", "ADP", "NOUN", "NOUN", "ADP", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "ADP", "NOUN", "NOUN", "ADP", "VERB", "AUX", "PUNCT"],
["NUM", "NOUN", "X", "ADP", "NOUN", "NOUN", "VERB", "NOUN", "ADV", "ADJ", "NUM", "NOUN", "X", "NOUN", "PUNCT"]
],
"fea": [
["_", "NumType=Card", "_", "Number=Sing", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "Degree=Pos", "Degree=Pos", "Number=Sing", "Number=Plur", "_", "Number=Sing", "Number=Plur", "_"],
["_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_"],
["NumType=Card", "_", "_", "_", "_", "_", "_", "_", "_", "_", "NumType=Card", "_", "_", "_", "_"]
],
"dep": [
[[2, "case"], [5, "obl"], [2, "punct"], [5, "nsubj"], [0, "root"], [9, "amod"], [9, "amod"], [9, "compound"], [5, "obj"], [12, "case"], [12, "compound"], [5, "obl"], [5, "punct"]],
[[2, "nummod"], [19, "obl"], [2, "punct"], [19, "nsubj"], [4, "case"], [7, "compound"], [14, "nmod"], [7, "case"], [14, "compound"], [14, "compound"], [14, "compound"], [14, "compound"], [14, "compound"], [19, "obj"], [14, "case"], [17, "compound"], [19, "obl"], [17, "case"], [0, "root"], [19, "aux"], [19, "punct"]],
[[2, "nummod"], [7, "nmod:tmod"], [7, "nsubj"], [6, "case"], [6, "nmod"], [7, "obl"], [0, "root"], [14, "nmod"], [10, "advmod"], [14, "amod"], [12, "nummod"], [14, "nmod"], [14, "nmod"], [7, "obj"], [7, "punct"]]
]
}
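Each entry under "ner" in the output above appears to be a list of the form [text, label, start, end], where start and end are token offsets into the matching "tok" sentence and end is exclusive. The following is a minimal sketch of reading those spans back, assuming that layout; it works on a plain dict copied from the first sentence of the output rather than on a live response, and the helper name entity_tokens is hypothetical.

```python
# Annotations copied from the first sentence of the output above.
doc = {
    "tok": [["In", "2021", ",", "HanLPv2.1", "delivers", "state-of-the-art",
             "multilingual", "NLP", "techniques", "to", "production",
             "environments", "."]],
    "ner": [[["2021", "DATE", 1, 2], ["HanLPv2.1", "WORK_OF_ART", 3, 4]]],
}


def entity_tokens(doc):
    """Yield (label, tokens) for every entity, sliced from the token list."""
    for tokens, entities in zip(doc["tok"], doc["ner"]):
        for _text, label, start, end in entities:
            # Assumption: start is inclusive and end is exclusive,
            # as the sample output suggests.
            yield label, tokens[start:end]


entities = list(entity_tokens(doc))
for label, toks in entities:
    print(label, toks)
```

Slicing the token list instead of trusting the entity text keeps the span aligned with the other token-level annotations such as "pos" and "lem".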
Visualization¶
The returned Document has a handy method, pretty_print(), which offers visualization in any monospaced text environment.
doc.pretty_print()
Dep Tree Token Relation Lemma PoS Token NER Type Token SRL PA1 Token PoS 3 4 5 6
────────── ──────────────── ──────── ──────────────── ───── ──────────────── ─────────────── ──────────────── ──────────── ──────────────── ──────────────────────────────────
┌─► In case in ADP In In ◄─┐ In ADP ───────────┐
┌─►├── 2021 obl 2021 NUM 2021 ───►DATE 2021 ◄─┴►ARGM-TMP 2021 NUM ────►NP ───┴────────►PP ───┐
│ └─► , punct , PUNCT , , , PUNCT──────────────────────────┤
│ ┌─► HanLPv2.1 nsubj HANlpv2.1 PROPN HanLPv2.1 ───►WORK_OF_ART HanLPv2.1 ───►ARG0 HanLPv2.1 PROPN───────────────────►NP────┤
┌┬┬─┴──┴── delivers root deliver VERB delivers delivers ╟──►PRED delivers VERB ──────────────────┐ │
│││ ┌───► state-of-the-art amod state-of-the-art ADJ state-of-the-art state-of-the-art state-of-the-art ADJ ───┐ │ │
│││ │┌──► multilingual amod multilingual ADJ multilingual multilingual multilingual ADJ │ │ │
│││ ││┌─► NLP compound NLP PROPN NLP NLP NLP PROPN ├────────►NP────┼►VP────┼►S
││└─►└┴┴── techniques obj technique NOUN techniques techniques techniques NOUN ──┘ │ │
││ ┌──► to case to ADP to to ◄─┐ to ADP ───────────┐ │ │
││ │┌─► production compound production NOUN production production ├►ARG2 production NOUN ──┐ ├►PP ───┘ │
│└───►└┴── environments obl environment NOUN environments environments ◄─┘ environments NOUN ──┴►NP ───┘ │
└────────► . punct . PUNCT . . . PUNCT──────────────────────────┘
Dep Tree Token Relation Lemma PoS Token NER Type Token PoS 3 4 5 6 7 8 9
───────────── ───────── ──────── ───────── ───── ───────── ──────── ───────── ───────────────────────────────────────────────────────────
┌─► 2021 nummod 2021 NUM 2021 ◄─┐ 2021 NUM ───────────────────────────────────────────────────┐
┌────────►├── 年 obl 年 NOUN 年 ◄─┴►DATE 年 NOUN ──────────────────────────────────────────────────┤
│ └─► 、 punct 、 PUNCT 、 、 PUNCT──────────────────────────────────────────────────┤
│┌───────►┌── HanLPv2.1 nsubj HANLPV2.1 NOUN HanLPv2.1 HanLPv2.1 NOUN ──────────────────────────────────────────────────┤
││ └─► は case は ADP は は ADP ───────────────────────────┐ │
││ ┌─► 次 compound 次 NOUN 次 次 NOUN ──────────────────────────┤ │
││ ┌───►├── 世代 nmod 世代 NOUN 世代 世代 NOUN ──────────────────────────┤ │
││ │ └─► の case の ADP の の ADP ───────────────────────────┼►VP ────►VP ────►IP────┤
││ │┌─────► 最 compound 最 NOUN 最 最 NOUN ───►ADJP──┐ │ │
││ ││┌────► 先端 compound 先端 NOUN 先端 先端 NOUN ───►ADJP──┴►ADJP──┐ │ │
││ │││┌───► 多 compound 多 NOUN 多 多 NOUN ───────────►ADJP──┴►ADJP──┘ ├►IP
││ ││││┌──► 言語 compound 言語 NOUN 言語 言語 NOUN ──┐ │
││ │││││┌─► NLP compound NLP NOUN NLP NLP NOUN ├►NP ───┐ │
││┌─►└┴┴┴┴┼── 技術 obj 技術 NOUN 技術 技術 NOUN ──┘ ├►NP ───┐ │
│││ └─► を case を ADP を を ADP ───────────┘ │ │
│││ ┌─► 本番 compound 本番 NOUN 本番 本番 NOUN ──────────────────┼►NP ───┐ │
│││ ┌─►├── 環境 obl 環境 NOUN 環境 環境 NOUN ──────────────────┘ ├►NP ───┐ │
│││ │ └─► に case に ADP に に ADP ────────────────────►PP ───┘ │ │
└┴┴────┴─┬┬── 導入 root 導入 VERB 導入 導入 VERB ──┐ ├────────►NP────┤
│└─► します aux します AUX します します AUX ───┴────────────────────────►VP ───┘ │
└──► 。 punct 。 PUNCT 。 。 PUNCT──────────────────────────────────────────────────┘
Dep Tree Token Relation Lemma PoS Token NER Type Token SRL PA1 Token PoS 3 4 5 6 7 8
──────────── ───────── ───────── ───────── ───── ───────── ────────── ───────── ──────────── ───────── ───────────────────────────────────────────────────
┌─► 2021 nummod 2021 NUM 2021 ◄─┐ 2021 ◄─┐ 2021 NUM ───┐
┌────►└── 年 nmod:tmod 年 NOUN 年 ◄─┴►DATE 年 ◄─┴►ARGM-TMP 年 NOUN ──┴────────────────────────────────►NP ───┐
│┌──────► HanLPv2.1 nsubj HANlpv2.1 X HanLPv2.1 ───►PERSON HanLPv2.1 HanLPv2.1 X ──────────────────────────────────────►NP────┤
││ ┌──► 为 case 为 ADP 为 为 为 ADP ───────────┐ │
││ │┌─► 生产 nmod 生产 NOUN 生产 生产 生产 NOUN ──┐ ├────────────────►PP ───┐ │
││┌─►└┴── 环境 obl 环境 NOUN 环境 环境 环境 NOUN ──┴►NP ───┘ │ │
┌┬─┴┴┴────── 带来 root 带来 VERB 带来 带来 ╟──►PRED 带来 VERB ──────────────────────────┐ ├►VP────┤
││ ┌──────► 次世代 nmod 次世代 NOUN 次世代 次世代 次世代 NOUN ───────────►ADJP──┐ │ │ │
││ │ ┌─► 最 advmod 最 ADV 最 最 最 ADV ────►ADVP──┐ │ ├►VP ───┘ ├►IP
││ │┌──►└── 先进的 amod 先进的 ADJ 先进的 先进的 先进的 ADJ ────►ADJP──┴►ADJP──┤ │ │
││ ││ ┌─► 多 nummod 多 NUM 多 多 多 NUM ────►QP ───┐ ├►NP ───┘ │
││ ││┌─►└── 语种 nmod 语种 NOUN 语种 语种 语种 NOUN ───►NP ───┴►NP────┤ │
││ │││ ┌─► NLP nmod NLP X NLP NLP NLP X ─────┐ │ │
│└─►└┴┴──┴── 技术 obj 技术 NOUN 技术 技术 技术 NOUN ──┴────────►NP ───┘ │
└──────────► 。 punct 。 PUNCT 。 。 。 PUNCT──────────────────────────────────────────┘
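The columns in this visualization line up because the token-level annotations ("tok", "lem", "pos" and so on) are parallel lists of equal length. The sketch below illustrates that idea with a hypothetical helper, format_columns, which pads each column to its widest cell; it is not how pretty_print() is implemented, and the data is a shortened copy of the first sentence above.

```python
# First five tokens of the English sentence, copied from the output above.
tok = ["In", "2021", ",", "HanLPv2.1", "delivers"]
lem = ["in", "2021", ",", "HANlpv2.1", "deliver"]
pos = ["ADP", "NUM", "PUNCT", "PROPN", "VERB"]


def format_columns(*columns, headers):
    """Render parallel lists as a left-aligned, space-padded text table."""
    rows = [list(headers)] + [list(row) for row in zip(*columns)]
    # Width of each column is the length of its longest cell.
    widths = [max(len(row[i]) for row in rows) for i in range(len(headers))]
    lines = []
    for row in rows:
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


table = format_columns(tok, lem, pos, headers=["Token", "Lemma", "PoS"])
print(table)
```

Note that len() counts characters, not display cells, so this simple padding would misalign full-width Chinese or Japanese text in most terminals.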
Native API¶
Multi-Task Learning¶
If you want to run our models locally or implement your own RESTful server, you can install the native API and call it just like the RESTful one.
import hanlp
HanLP = hanlp.load(hanlp.pretrained.mtl.UD_ONTONOTES_TOK_POS_LEM_FEA_NER_SRL_DEP_SDP_CON_XLMR_BASE)
print(HanLP(['In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environments.',
'2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。',
'2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。']))
{
"tok": [
["In", "2021", ",", "HanLPv2.1", "delivers", "state-of-the-art", "multilingual", "NLP", "techniques", "to", "production", "environments", "."],
["2021", "年", "、", "HanLPv2.1", "は", "次", "世代", "の", "最", "先端", "多", "言語", "NLP", "技術", "を", "本番", "環境", "に", "導入", "します", "。"],
["2021", "年", "HanLPv2.1", "为", "生产", "环境", "带来", "次世代", "最", "先进的", "多", "语种", "NLP", "技术", "。"]
],
"ner": [
[["2021", "DATE", 1, 2], ["HanLPv2.1", "WORK_OF_ART", 3, 4]],
[["2021 年", "DATE", 0, 2]],
[["2021 年", "DATE", 0, 2], ["HanLPv2.1", "PERSON", 2, 3]]
],
"srl": [
[[["In 2021", "ARGM-TMP", 0, 2], ["HanLPv2.1", "ARG0", 3, 4], ["delivers", "PRED", 4, 5], ["to production environments", "ARG2", 9, 12]]],
[],
[[["2021 年", "ARGM-TMP", 0, 2], ["带来", "PRED", 6, 7]]]
],
"sdp/dm": [
[[], [[1, "ARG2"]], [[1, "orphan"]], [[1, "orphan"], [5, "ARG1"]], [[1, "orphan"]], [[1, "orphan"]], [[1, "orphan"]], [[1, "orphan"]], [[5, "ARG2"], [6, "ARG1"], [7, "ARG1"], [8, "compound"]], [[1, "orphan"]], [[1, "orphan"]], [[5, "ARG3"], [11, "compound"]], [[1, "orphan"]]],
[[], [[1, "ARG2"]], [], [], [], [], [[6, "compound"]], [], [], [], [], [], [], [[12, "compound"], [13, "compound"], [15, "ARG1"], [19, "ARG2"]], [], [], [[16, "compound"], [18, "ARG2"]], [], [], [], []],
[[], [[1, "ARG1"]], [[7, "ARG1"]], [], [], [[4, "ARG2"], [5, "compound"]], [[2, "orphan"]], [], [], [[9, "ARG1"]], [], [[11, "ARG1"]], [], [[7, "ARG2"], [10, "ARG1"], [12, "compound"], [13, "compound"]], []]
],
"sdp/pas": [
[[], [[1, "prep_ARG2"]], [[1, "orphan"]], [[5, "verb_ARG1"]], [[1, "orphan"]], [[1, "orphan"]], [[1, "orphan"]], [[1, "orphan"]], [[5, "verb_ARG2"], [6, "adj_ARG1"], [7, "adj_ARG1"], [8, "noun_ARG1"]], [[1, "orphan"]], [[1, "orphan"]], [[10, "prep_ARG2"], [11, "noun_ARG1"]], [[1, "orphan"]]],
[[], [[1, "adj_ARG1"]], [], [], [], [], [[6, "noun_ARG1"], [8, "prep_ARG2"]], [], [], [], [], [], [], [[9, "noun_ARG1"], [10, "noun_ARG1"], [11, "adj_ARG1"], [12, "noun_ARG1"], [13, "noun_ARG1"], [15, "prep_ARG1"], [19, "verb_ARG2"]], [], [], [[16, "noun_ARG1"], [18, "prep_ARG2"]], [], [], [], []],
[[], [[1, "adj_ARG1"]], [[7, "verb_ARG1"]], [], [], [[4, "prep_ARG2"], [5, "noun_ARG1"]], [], [], [], [[9, "adj_ARG1"]], [], [], [], [[7, "verb_ARG2"], [10, "adj_ARG1"], [11, "adj_ARG1"], [12, "noun_ARG1"], [13, "noun_ARG1"]], []]
],
"sdp/psd": [
[[], [[5, "TWHEN"]], [[1, "orphan"]], [[5, "ACT-arg"]], [[1, "orphan"]], [[9, "RSTR"]], [[9, "RSTR"]], [[9, "RSTR"]], [[5, "PAT-arg"]], [[1, "orphan"]], [[12, "RSTR"]], [[5, "ADDR-arg"]], [[1, "orphan"]]],
[[[2, "RSTR"]], [[19, "TWHEN"]], [], [], [], [[7, "RSTR"]], [[14, "APP"]], [[3, "orphan"]], [], [[14, "RSTR"]], [], [[14, "RSTR"]], [[14, "ID"]], [[19, "PAT-arg"]], [], [[17, "RSTR"]], [[19, "LOC"]], [], [[3, "orphan"]], [[3, "orphan"]], [[3, "orphan"]]],
[[[2, "RSTR"]], [[7, "TWHEN"]], [], [], [[6, "RSTR"]], [[7, "ADDR-arg"]], [], [[14, "RSTR"]], [], [[14, "RSTR"]], [], [[14, "RSTR"]], [[14, "ID"]], [[7, "PAT-arg"]], []]
],
"con": [
["TOP", [["S", [["PP", [["ADP", ["In"]], ["NP", [["NUM", ["2021"]]]]]], ["PUNCT", [","]], ["NP", [["PROPN", ["HanLPv2.1"]]]], ["VP", [["VERB", ["delivers"]], ["NP", [["ADJ", ["state-of-the-art"]], ["ADJ", ["multilingual"]], ["PROPN", ["NLP"]], ["NOUN", ["techniques"]]]], ["PP", [["ADP", ["to"]], ["NP", [["NOUN", ["production"]], ["NOUN", ["environments"]]]]]]]], ["PUNCT", ["."]]]]]],
["TOP", [["IP", [["NUM", ["2021"]], ["NOUN", ["年"]], ["PUNCT", ["、"]], ["NOUN", ["HanLPv2.1"]], ["IP", [["VP", [["VP", [["ADP", ["は"]], ["NOUN", ["次"]], ["NOUN", ["世代"]], ["ADP", ["の"]], ["ADJP", [["ADJP", [["ADJP", [["NOUN", ["最"]]]], ["ADJP", [["NOUN", ["先端"]]]]]], ["ADJP", [["NOUN", ["多"]]]]]]]]]]]], ["NP", [["NP", [["NP", [["NP", [["NP", [["NOUN", ["言語"]], ["NOUN", ["NLP"]], ["NOUN", ["技術"]]]], ["ADP", ["を"]]]], ["NOUN", ["本番"]], ["NOUN", ["環境"]]]], ["PP", [["ADP", ["に"]]]]]], ["VP", [["VERB", ["導入"]], ["AUX", ["します"]]]]]], ["PUNCT", ["。"]]]]]],
["TOP", [["IP", [["NP", [["NUM", ["2021"]], ["NOUN", ["年"]]]], ["NP", [["X", ["HanLPv2.1"]]]], ["VP", [["PP", [["ADP", ["为"]], ["NP", [["NOUN", ["生产"]], ["NOUN", ["环境"]]]]]], ["VP", [["VERB", ["带来"]], ["NP", [["ADJP", [["NOUN", ["次世代"]]]], ["ADJP", [["ADVP", [["ADV", ["最"]]]], ["ADJP", [["ADJ", ["先进的"]]]]]], ["NP", [["QP", [["NUM", ["多"]]]], ["NP", [["NOUN", ["语种"]]]]]], ["NP", [["X", ["NLP"]], ["NOUN", ["技术"]]]]]]]]]], ["PUNCT", ["。"]]]]]]
],
"lem": [
["in", "2021", ",", "HANlpv2.1", "deliver", "state-of-the-art", "multilingual", "NLP", "technique", "to", "production", "environment", "."],
["2021", "年", "、", "HANLPV2.1", "は", "次", "世代", "の", "最", "先端", "多", "言語", "NLP", "技術", "を", "本番", "環境", "に", "導入", "します", "。"],
["2021", "年", "HANlpv2.1", "为", "生产", "环境", "带来", "次世代", "最", "先进的", "多", "语种", "NLP", "技术", "。"]
],
"pos": [
["ADP", "NUM", "PUNCT", "PROPN", "VERB", "ADJ", "ADJ", "PROPN", "NOUN", "ADP", "NOUN", "NOUN", "PUNCT"],
["NUM", "NOUN", "PUNCT", "NOUN", "ADP", "NOUN", "NOUN", "ADP", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "NOUN", "ADP", "NOUN", "NOUN", "ADP", "VERB", "AUX", "PUNCT"],
["NUM", "NOUN", "X", "ADP", "NOUN", "NOUN", "VERB", "NOUN", "ADV", "ADJ", "NUM", "NOUN", "X", "NOUN", "PUNCT"]
],
"fea": [
["_", "NumType=Card", "_", "Number=Sing", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "Degree=Pos", "Degree=Pos", "Number=Sing", "Number=Plur", "_", "Number=Sing", "Number=Plur", "_"],
["_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_", "_"],
["NumType=Card", "_", "_", "_", "_", "_", "_", "_", "_", "_", "NumType=Card", "_", "_", "_", "_"]
],
"dep": [
[[2, "case"], [5, "obl"], [2, "punct"], [5, "nsubj"], [0, "root"], [9, "amod"], [9, "amod"], [9, "compound"], [5, "obj"], [12, "case"], [12, "compound"], [5, "obl"], [5, "punct"]],
[[2, "nummod"], [19, "obl"], [2, "punct"], [19, "nsubj"], [4, "case"], [7, "compound"], [14, "nmod"], [7, "case"], [14, "compound"], [14, "compound"], [14, "compound"], [14, "compound"], [14, "compound"], [19, "obj"], [14, "case"], [17, "compound"], [19, "obl"], [17, "case"], [0, "root"], [19, "aux"], [19, "punct"]],
[[2, "nummod"], [7, "nmod:tmod"], [7, "nsubj"], [6, "case"], [6, "nmod"], [7, "obl"], [0, "root"], [14, "nmod"], [10, "advmod"], [14, "amod"], [12, "nummod"], [14, "nmod"], [14, "nmod"], [7, "obj"], [7, "punct"]]
]
}
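In the "dep" annotation above, each token appears to carry a [head, relation] pair in which the head is a 1-based token index and 0 marks the root. A minimal sketch of turning those pairs into readable triples follows, assuming that convention; the data is copied from the first sentence of the output, and dependency_triples is a hypothetical helper name.

```python
# Tokens and dependency arcs copied from the first sentence of the output.
tok = ["In", "2021", ",", "HanLPv2.1", "delivers", "state-of-the-art",
       "multilingual", "NLP", "techniques", "to", "production",
       "environments", "."]
dep = [[2, "case"], [5, "obl"], [2, "punct"], [5, "nsubj"], [0, "root"],
       [9, "amod"], [9, "amod"], [9, "compound"], [5, "obj"], [12, "case"],
       [12, "compound"], [5, "obl"], [5, "punct"]]


def dependency_triples(tokens, arcs):
    """Return (dependent, relation, head) triples for one sentence."""
    triples = []
    for dependent, (head, relation) in zip(tokens, arcs):
        # Assumption: heads are 1-based and 0 denotes the artificial root.
        head_token = "ROOT" if head == 0 else tokens[head - 1]
        triples.append((dependent, relation, head_token))
    return triples


triples = dependency_triples(tok, dep)
for dependent, relation, head in triples:
    print(f"{dependent} --{relation}--> {head}")
```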
Because the service provider is likely running a different model or using different settings, the RESTful and native results may differ slightly.
To process Chinese or Japanese, HanLP provides monolingual models for each language, which significantly outperform the multilingual model. See the docs for the list of models.
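To see where two results differ, the annotations can be compared task by task. The sketch below assumes both results behave like plain dicts keyed by task name, as the printed output suggests; differing_tasks is a hypothetical helper and the two small dicts are made-up toy data, not real service output.

```python
def differing_tasks(restful_doc, native_doc):
    """Return the sorted task keys whose annotations are not identical."""
    keys = set(restful_doc) | set(native_doc)
    return sorted(k for k in keys if restful_doc.get(k) != native_doc.get(k))


# Toy data for illustration only; real outputs contain many more tasks.
restful = {"tok": [["HanLPv2.1", "delivers"]], "pos": [["PROPN", "VERB"]]}
native = {"tok": [["HanLPv2.1", "delivers"]], "pos": [["NOUN", "VERB"]]}

changed = differing_tasks(restful, native)
print(changed)
```

A task that is present in only one of the two results is also reported, since a missing key compares unequal to any annotation.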
Single-Task Learning¶
HanLP also provides a full spectrum of single-task learning models for core NLP tasks, including tagging and parsing. Please refer to the documentation of the pretrained models for details.