HanLP: Han Language Processing


Live Demo

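Enter any sentence of up to 300 characters, in any supported language. The output below was produced with the multilingual model and the SDP standard option, for the example sentence 二氧化碳不应超过0.5%; ("Carbon dioxide should not exceed 0.5%;").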

Linguistics

Dep Tree    Token     Relation  Lemma     PoS
──────────  ────────  ────────  ────────  ─────
      ┌──►  二氧化碳  nsubj     二氧化碳  NOUN
      │┌─►  不        mark      不        ADV
┌┬────┴┴──  应        root      应        VERB
│└─►┌─────  超过      xcomp     超过      VERB
│   │  ┌─►  0.5       nummod    0.5       NUM
│   └─►└──  %         obj       %         NOUN
└────────►  ;         punct     ;         PUNCT

Token     NER Type      Token     SRL PA1
────────  ────────────  ────────  ────────────
二氧化碳                二氧化碳  ───►ARG0
不                      不        ───►ARGM-ADV
应                      应
超过                    超过      ╟──►PRED
0.5       ───►CARDINAL  0.5       ◄─┐
%                       %         ◄─┴►ARG1
;                       ;

Token     PoS      3       4       5       6       7       8       9
────────  ───────────────────────────────────────────────────────────
二氧化碳  NOUN ───────────────────────────────────►NP ───┐
不        ADV ────────────────────────────►ADVP──┐       ├►IP ───┐
应        VERB ──────────────────────────┐       ├►VP ───┘       │
超过      VERB ──────────────────┐       ├►VP ───┘               │
0.5       NUM ───────────┐       ├►VP ───┘                       ├►IP
%         NOUN ───►CLP ──┴►QP ───┘                               │
;         PUNCT──────────────────────────────────────────────────┘
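
The constituency panel can be read back into a bracketed tree. The sketch below transcribes our reading of the diagram into nested Python tuples and serializes it in Penn Treebank style. The tuple encoding and the names `TREE`, `is_preterminal`, `to_brackets` and `preterminals` are illustrative assumptions, not HanLP's own tree API; the recovered PoS sequence doubles as a check against the PoS column.

```python
# Constituency tree transcribed from the ASCII panel (our reading of it).
# Each node is (label, child, ...); a preterminal is (PoS, token).
TREE = (
    "IP",
    (
        "IP",
        ("NP", ("NOUN", "二氧化碳")),
        (
            "VP",
            ("ADVP", ("ADV", "不")),
            (
                "VP",
                ("VERB", "应"),
                (
                    "VP",
                    ("VERB", "超过"),
                    ("QP", ("NUM", "0.5"), ("CLP", ("NOUN", "%"))),
                ),
            ),
        ),
    ),
    ("PUNCT", ";"),
)


def is_preterminal(node):
    """True for (PoS, token) pairs, whose only child is a plain string."""
    return len(node) == 2 and isinstance(node[1], str)


def to_brackets(node):
    """Serialize a tree to Penn-Treebank-style bracket notation."""
    if is_preterminal(node):
        return f"({node[0]} {node[1]})"
    label, *children = node
    return f"({label} " + " ".join(to_brackets(c) for c in children) + ")"


def preterminals(node):
    """Return the (PoS, token) pairs from left to right."""
    if is_preterminal(node):
        return [node]
    return [p for child in node[1:] for p in preterminals(child)]


print(to_brackets(TREE))
print([tok for _, tok in preterminals(TREE)])
```

In bracket form the attachments are easy to read: 不 is an ADVP modifying the VP headed by 应, and 0.5 % forms a QP (NUM plus CLP) inside the VP headed by 超过.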

Lexical

二氧化碳 不 应 超过 0.5 % ;
T1 NOUN 0 4 二氧化碳
#1	AnnotatorNotes	T1	noun
T2 ADV 5 6 不
#2	AnnotatorNotes	T2	adverb
T3 VERB 7 8 应
#3	AnnotatorNotes	T3	verb
T4 VERB 9 11 超过
#4	AnnotatorNotes	T4	verb
T5 NUM 12 15 0.5
#5	AnnotatorNotes	T5	numeral
T6 NOUN 16 17 %
#6	AnnotatorNotes	T6	noun
T7 PUNCT 18 19 ;
#7	AnnotatorNotes	T7	punctuation
T8 CARDINAL 12 15 0.5
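
The `T` lines above follow brat's standoff layout: an ID, an annotation type with start and end character offsets into the text line, and the covered string; the `#` lines attach AnnotatorNotes to a `T` ID. T8 reuses T5's offsets, so the CARDINAL entity overlaps the NUM token. The sketch below parses these lines and checks every offset against the text. `parse_standoff` is a hypothetical helper, and splitting on any whitespace is an assumption that accepts both the space-separated `T` lines shown here and brat's usual tab-separated form.

```python
# Sketch: parse the brat-style standoff lines above and verify their offsets.
TEXT = "二氧化碳 不 应 超过 0.5 % ;"

ANN = [
    "T1 NOUN 0 4 二氧化碳",
    "#1\tAnnotatorNotes\tT1\tnoun",
    "T2 ADV 5 6 不",
    "#2\tAnnotatorNotes\tT2\tadverb",
    "T3 VERB 7 8 应",
    "#3\tAnnotatorNotes\tT3\tverb",
    "T4 VERB 9 11 超过",
    "#4\tAnnotatorNotes\tT4\tverb",
    "T5 NUM 12 15 0.5",
    "#5\tAnnotatorNotes\tT5\tnumeral",
    "T6 NOUN 16 17 %",
    "#6\tAnnotatorNotes\tT6\tnoun",
    "T7 PUNCT 18 19 ;",
    "#7\tAnnotatorNotes\tT7\tpunctuation",
    "T8 CARDINAL 12 15 0.5",
]


def parse_standoff(lines):
    """Return ({T-id: (type, start, end, text)}, {T-id: note})."""
    spans, notes = {}, {}
    for line in lines:
        if line.startswith("T"):
            # maxsplit keeps a multi-word covered string in one piece
            ident, label, start, end, covered = line.split(maxsplit=4)
            spans[ident] = (label, int(start), int(end), covered)
        elif line.startswith("#"):
            _, _, target, note = line.split(maxsplit=3)
            notes[target] = note
    return spans, notes


spans, notes = parse_standoff(ANN)
for ident, (label, start, end, covered) in spans.items():
    # str slicing counts code points, so 二氧化碳 occupies offsets 0 to 4 (end exclusive)
    assert TEXT[start:end] == covered, f"{ident}: offsets do not match"
    print(ident, label, covered, notes.get(ident, "-"))
```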

1	二氧化碳	二氧化碳	NOUN	NOUN	_	3	nsubj	_	_
2	不	不	ADV	ADV	_	3	mark	_	_
3	应	应	VERB	VERB	_	0	root	_	_
4	超过	超过	VERB	VERB	_	3	xcomp	_	_
5	0.5	0.5	NUM	NUM	_	6	nummod	_	_
6	%	%	NOUN	NOUN	_	4	obj	_	_
7	;	;	PUNCT	PUNCT	_	3	punct	_	_
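
The rows above use the 10-column CoNLL-U layout (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), where HEAD holds the governing token's ID and 0 marks the root. A minimal Python sketch, with hypothetical helper names, that reads the rows, prints each arc, and checks that the HEAD column forms a tree: exactly one root, and every token reaches it without a cycle.

```python
ROWS = [
    "1\t二氧化碳\t二氧化碳\tNOUN\tNOUN\t_\t3\tnsubj\t_\t_",
    "2\t不\t不\tADV\tADV\t_\t3\tmark\t_\t_",
    "3\t应\t应\tVERB\tVERB\t_\t0\troot\t_\t_",
    "4\t超过\t超过\tVERB\tVERB\t_\t3\txcomp\t_\t_",
    "5\t0.5\t0.5\tNUM\tNUM\t_\t6\tnummod\t_\t_",
    "6\t%\t%\tNOUN\tNOUN\t_\t4\tobj\t_\t_",
    "7\t;\t;\tPUNCT\tPUNCT\t_\t3\tpunct\t_\t_",
]


def parse_conllu(lines):
    """Return {id: (form, upos, head, deprel)} for ordinary word rows."""
    tokens = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank lines separate sentences; '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        tokens[int(cols[0])] = (cols[1], cols[3], int(cols[6]), cols[7])
    return tokens


def is_tree(tokens):
    """Exactly one root, and every token reaches ID 0 without a cycle."""
    heads = {i: tok[2] for i, tok in tokens.items()}
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False
    for start in heads:
        node, steps = start, 0
        while node != 0:
            node = heads.get(node)
            steps += 1
            if node is None or steps > len(heads):
                return False  # dangling HEAD or cycle
    return True


tokens = parse_conllu(ROWS)
for i, (form, upos, head, deprel) in tokens.items():
    governor = "ROOT" if head == 0 else tokens[head][0]
    print(f"{governor} --{deprel}--> {form}")
print("well-formed tree:", is_tree(tokens))
```

Editing a HEAD value so that two tokens point at each other, or so that a second token points at 0, makes `is_tree` return False.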

[Figure: semantic dependency graph (SDP) over tokens 1 to 7. Edges: 3 → 1 ARG1 (应 → 二氧化碳), 4 → 1 ARG1 (超过 → 二氧化碳), 5 → 6 ARG1 (0.5 → %).]
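
Unlike the dependency tree, the semantic dependency graph lets a token take more than one head and lets tokens stay unattached: 二氧化碳 is ARG1 of both 应 and 超过, while 不 and ; have no edges in the figure. A minimal Python sketch that stores the graph as (head, dependent, label) triples and surfaces both properties; it assumes the figure's node numbers are the token IDs from the CoNLL-U rows, and the names are illustrative.

```python
from collections import defaultdict

# Assumption: node numbers in the SDP figure are the CoNLL-U token IDs.
TOKENS = {1: "二氧化碳", 2: "不", 3: "应", 4: "超过", 5: "0.5", 6: "%", 7: ";"}
EDGES = [(3, 1, "ARG1"), (4, 1, "ARG1"), (5, 6, "ARG1")]  # (head, dependent, label)


def heads_by_dependent(edges):
    """Map each dependent to its list of (head, label) pairs."""
    incoming = defaultdict(list)
    for head, dep, label in edges:
        incoming[dep].append((head, label))
    return dict(incoming)


incoming = heads_by_dependent(EDGES)
touched = {h for h, _, _ in EDGES} | {d for _, d, _ in EDGES}

for dep, pairs in sorted(incoming.items()):
    heads = ", ".join(f"{TOKENS[h]}/{label}" for h, label in pairs)
    print(f"{TOKENS[dep]} <- {heads}")

multi_head = [TOKENS[d] for d, pairs in incoming.items() if len(pairs) > 1]
isolated = [TOKENS[i] for i in sorted(TOKENS) if i not in touched]
print("multiple heads:", multi_head)  # a tree would allow exactly one head each
print("no edges:", isolated)
```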

Introduction

HanLP is a multilingual NLP library for researchers and companies, built on PyTorch and TensorFlow 2.x to advance state-of-the-art deep learning techniques in both academia and industry. It was designed from day one to be efficient, user-friendly, and extendable.

Thanks to open-access corpora like Universal Dependencies and OntoNotes, HanLP 2.1 now offers 10 joint tasks on 104 languages: tokenization, lemmatization, part-of-speech tagging, token feature extraction, named entity recognition, dependency parsing, constituency parsing, semantic role labeling, semantic dependency parsing, and abstract meaning representation (AMR) parsing. See also the HanLP repository on GitHub.

HanLP versions