HanLP: Han Language Processing

Live Demo


Linguistics

Dep Tree  Token     Relation  Lemma     PoS    NER Type      SRL PA1
────────  ────────  ────────  ────────  ─────  ────────────  ────────
   ┌──►   二氧化碳  nsubj     二氧化碳  NOUN                 ───►ARG0
   │┌─►   不能      aux       不能      AUX
┌┬─┴┴──   超过      root      超过      VERB                 ╟──►PRED
││  ┌─►   0.5       nummod    0.5       NUM    ───►CARDINAL  ◄─┐
│└─►└──   %         obj       %         NOUN                 ◄─┴►ARG1
└─────►   ;         punct     ;         PUNCT

PoS      3       4       5       6       7       8
───────────────────────────────────────────────────
NOUN ───────────────────────────►NP ───┐
AUX ───────────────────────────┐       ├►IP ───┐
VERB ──────────────────┐       ├►VP ───┘       │
NUM ───────────┐       ├►VP ───┘               ├►IP
NOUN ───►CLP ──┴►QP ───┘                       │
PUNCT──────────────────────────────────────────┘

Lexical

二氧化碳 不能 超过 0.5 % ;
T1 NOUN 0 4 二氧化碳
#1	AnnotatorNotes	T1	noun
T2 AUX 5 7 不能
#2	AnnotatorNotes	T2	auxiliary
T3 VERB 8 10 超过
#3	AnnotatorNotes	T3	verb
T4 NUM 11 14 0.5
#4	AnnotatorNotes	T4	numeral
T5 NOUN 15 16 %
#5	AnnotatorNotes	T5	noun
T6 PUNCT 17 18 ;
#6	AnnotatorNotes	T6	punctuation
T7 CARDINAL 11 14 0.5
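
The annotation lines above follow a standoff layout: an ID, a label, start and end character offsets, and the surface text. As a minimal sketch of how such lines can be read back and checked against the sentence, the following uses only the Python standard library. The function name `parse_text_bound` and the dictionary keys are illustrative assumptions, not part of HanLP.

```python
# Sentence and annotation lines copied from the demo output above.
TEXT = "二氧化碳 不能 超过 0.5 % ;"
STANDOFF = """\
T1 NOUN 0 4 二氧化碳
#1\tAnnotatorNotes\tT1\tnoun
T2 AUX 5 7 不能
T3 VERB 8 10 超过
T4 NUM 11 14 0.5
T5 NOUN 15 16 %
T6 PUNCT 17 18 ;
T7 CARDINAL 11 14 0.5
"""


def parse_text_bound(standoff):
    """Parse 'ID LABEL START END SURFACE' lines (hypothetical helper)."""
    spans = []
    for line in standoff.splitlines():
        if not line.startswith("T"):
            continue  # skip annotator notes (#...) and blank lines
        ann_id, label, start, end, surface = line.split(None, 4)
        spans.append({
            "id": ann_id,
            "label": label,
            "start": int(start),
            "end": int(end),
            "surface": surface,
        })
    return spans


spans = parse_text_bound(STANDOFF)

# Offsets are character offsets into TEXT, end-exclusive, so slicing
# the sentence should reproduce each annotation's surface text.
mismatches = [s["id"] for s in spans if TEXT[s["start"]:s["end"]] != s["surface"]]
print(mismatches)
```

An empty `mismatches` list indicates that every span lines up with the sentence. Note that T4 and T7 cover the same characters: one carries the part-of-speech label and the other the entity type.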

1	二氧化碳	二氧化碳	NOUN	NOUN	_	3	nsubj	_	_
2	不能	不能	AUX	AUX	_	3	aux	_	_
3	超过	超过	VERB	VERB	_	0	root	_	_
4	0.5	0.5	NUM	NUM	_	5	nummod	_	_
5	%	%	NOUN	NOUN	_	3	obj	_	_
6	;	;	PUNCT	PUNCT	_	3	punct	_	_
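
The ten tab-separated columns above follow the CoNLL-U layout, where column 7 holds the index of each token's head and 0 marks the root. A minimal sketch of reading those rows and recovering the root and its direct dependents, using only the standard library, might look like this. The helper name `parse_conllu` is an assumption for illustration, and the sketch assumes plain integer token IDs (no multiword-token ranges).

```python
# Rows copied from the demo output above; "\t" stands for the tab separators.
CONLLU = "\n".join([
    "1\t二氧化碳\t二氧化碳\tNOUN\tNOUN\t_\t3\tnsubj\t_\t_",
    "2\t不能\t不能\tAUX\tAUX\t_\t3\taux\t_\t_",
    "3\t超过\t超过\tVERB\tVERB\t_\t0\troot\t_\t_",
    "4\t0.5\t0.5\tNUM\tNUM\t_\t5\tnummod\t_\t_",
    "5\t%\t%\tNOUN\tNOUN\t_\t3\tobj\t_\t_",
    "6\t;\t;\tPUNCT\tPUNCT\t_\t3\tpunct\t_\t_",
])


def parse_conllu(block):
    """Parse CoNLL-U token rows into dictionaries (hypothetical helper)."""
    tokens = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        tokens.append({
            "id": int(cols[0]),    # assumes simple integer IDs
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),  # 0 means the token is the root
            "deprel": cols[7],
        })
    return tokens


tokens = parse_conllu(CONLLU)
root = next(t for t in tokens if t["head"] == 0)
dependents = [(t["form"], t["deprel"]) for t in tokens if t["head"] == root["id"]]

print(root["form"])
print(dependents)
```

Here the root is the verb 超过, and its direct dependents carry the `nsubj`, `aux`, `obj`, and `punct` relations, which matches the Relation column in the tree view.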

Semantic dependency graph edges: 3 (超过) -> 1 (二氧化碳): ARG1; 2 (不能) -> 3 (超过): ARG1; 4 (0.5) -> 5 (%): ARG1.

Introduction

HanLP is a multilingual NLP library for researchers and companies, built on PyTorch and TensorFlow 2.x, that aims to advance state-of-the-art deep learning techniques in both academia and industry. It was designed from day one to be efficient, user-friendly, and extensible.

Thanks to open-access corpora such as Universal Dependencies and OntoNotes, HanLP 2.1 offers 10 joint tasks across 104 languages, including tokenization, lemmatization, part-of-speech tagging, token feature extraction, dependency parsing, constituency parsing, semantic role labeling, semantic dependency parsing, and abstract meaning representation (AMR) parsing. See also GitHub.

HanLP versions