Tokenization
Introduction
HanLP supports tokenization of 130 languages with a multilingual model trained on UD 2.10. This model is a good choice for low-resource languages. For high-resource languages such as Chinese and Japanese, however, a monolingual model is highly recommended.
How to Use
Apply for Auth
We are hosting a non-commercial API service and you are welcome to apply for an auth key. An auth key is a password which gives you access to our API and protects our server from being abused.
Create RESTful Client
Create a HanLPClient:
from hanlp_restful import HanLPClient
# Fill in your auth
HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='mul')
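To avoid pasting the auth key into source files, one option is to read it from an environment variable. The following is a minimal sketch: the variable name HANLP_AUTH and the helper load_auth are hypothetical names chosen for illustration and are not part of hanlp_restful. When no key is set, the helper returns None, the same value the client example above passes.

```python
import os
from typing import Optional


def load_auth(env_var: str = "HANLP_AUTH") -> Optional[str]:
    """Return the key stored in env_var, or None when it is unset or blank.

    HANLP_AUTH is a hypothetical variable name used only for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    return key or None


auth = load_auth()
print("auth key found" if auth else "no auth key set, using auth=None")
```

The returned value can then be passed to the client as auth=auth in place of auth=None.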
Tokenize
Set tasks='tok' to perform tokenization:
HanLP('''In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environments. 2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。 2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。''', tasks='tok')
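The sketch below shows one way to consume a tokenization result. It assumes the response behaves like a dictionary whose 'tok' entry is a list of sentences, each a list of token strings. That shape, the helper name format_sentences, and the sample data are assumptions made for illustration; the sample is hand-written and is not output captured from the service.

```python
from typing import Iterable, List, Mapping


def format_sentences(doc: Mapping[str, Iterable[Iterable[str]]], sep: str = " | ") -> List[str]:
    """Join the tokens of each sentence under doc['tok'] with a visible separator."""
    return [sep.join(tokens) for tokens in doc["tok"]]


# Hand-written stand-in for a response (assumed shape); replace it with the
# value returned by HanLP(text, tasks='tok') when the client is available.
sample = {"tok": [["Hello", "world", "!"], ["Good", "morning", "."]]}

for line in format_sentences(sample):
    print(line)
```

Printing one sentence per line with an explicit separator makes token boundaries easy to inspect, which is useful for languages that do not separate words with spaces.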
Native APIs
Please refer to the HanLP docs.