HanLP supports tokenization for 130 languages with a multilingual model trained on UD 2.10. This works well for low-resource languages. For high-resource languages such as Chinese and Japanese, however, a monolingual model is highly recommended.
How to Use
Apply for Auth
We are hosting a non-commercial API service and you are welcome to apply for an auth key. An auth key is a password which gives you access to our API and protects our server from being abused.
Create RESTful Client
from hanlp_restful import HanLPClient
# Fill in your auth
HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='mul')
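If you would rather not hard-code the auth key, one option is to read it from an environment variable. The following is a minimal sketch, not part of HanLP itself: the helper `hanlp_client_kwargs` and the variable name `HANLP_AUTH` are hypothetical names chosen for illustration, and the keyword names are assumed to match the `HanLPClient` constructor.

```python
import os


def hanlp_client_kwargs(env_var="HANLP_AUTH"):
    """Build keyword arguments for HanLPClient.

    The auth key is read from the environment variable `env_var`
    (a hypothetical name); it falls back to None (anonymous access)
    when the variable is unset or empty.
    """
    return {
        "url": "https://www.hanlp.com/api",
        "auth": os.environ.get(env_var) or None,
        "language": "mul",
    }


# Usage (requires `pip install hanlp_restful` and network access):
# from hanlp_restful import HanLPClient
# HanLP = HanLPClient(**hanlp_client_kwargs())
```

This keeps the key out of source control while leaving the rest of the client setup unchanged.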
Specify tasks='tok' to perform tokenization:
HanLP('''In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environments. 2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。 2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。''', tasks='tok')
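To inspect the result, it can help to walk through the tokens sentence by sentence. The sketch below assumes the returned object is dict-like, with a `'tok'` key holding a list of sentences, each a list of token strings. The helper `tokens_per_sentence` is a hypothetical name, and `sample` is a hand-written stand-in, not real API output.

```python
def tokens_per_sentence(doc, key="tok"):
    """Return (sentence_index, tokens) pairs from a tokenization result.

    Assumes `doc[key]` is a list of sentences, each a list of token strings.
    """
    return list(enumerate(doc[key]))


# Hand-written stand-in for a tokenization result (not real API output).
sample = {"tok": [["In", "2021", ","], ["HanLPv2.1", "delivers"]]}

for index, tokens in tokens_per_sentence(sample):
    # Print each sentence's tokens separated by spaces.
    print(index, " ".join(tokens))
```

With a live client, the same helper could be applied to the object returned by the `HanLP(..., tasks='tok')` call, provided the assumed structure holds.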
For details, please refer to the documentation.