Tokenization



Introduction

HanLP supports tokenization for 130 languages with a multilingual model trained on Universal Dependencies (UD) 2.10. This coverage is adequate for low-resource languages. For high-resource languages such as Chinese and Japanese, however, a dedicated monolingual model is highly recommended.

How to Use

Apply for Auth

We host a non-commercial API service, and you are welcome to apply for an auth key. An auth key works like a password: it grants you access to our API and protects our server from abuse.
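Because the auth key acts as a password, it is safer not to hard-code it in scripts. Below is a minimal sketch that reads the key from an environment variable. The variable name `HANLP_AUTH` and the helper `load_auth_key` are hypothetical choices for this example; HanLP does not read this variable on its own. When no key is set, the helper returns `None`, the same value passed as `auth` in the client example on this page.

```python
import os
from typing import Optional

# Hypothetical variable name chosen for this sketch; HanLP does not read it automatically.
AUTH_ENV_VAR = "HANLP_AUTH"


def load_auth_key(env_var: str = AUTH_ENV_VAR) -> Optional[str]:
    """Return the auth key stored in `env_var`, or None if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    return key or None


if __name__ == "__main__":
    auth = load_auth_key()
    print("auth key found" if auth else "no auth key set; falling back to auth=None")
```

The returned value can then be passed to the client, e.g. `HanLPClient('https://www.hanlp.com/api', auth=load_auth_key(), language='mul')`.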

Create RESTful Client

Create a HanLPClient:

    from hanlp_restful import HanLPClient

    # Fill in your auth key; auth=None means anonymous access.
    # language='mul' selects the multilingual model.
    HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='mul')

    

Tokenize

Set tasks='tok' to perform tokenization:

      
    # Roughly the same sentence in English, Japanese, and Chinese
    HanLP('''In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environments.
    2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。
    2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。''', tasks='tok')
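The call returns a document object holding the tokenization result. Below is a minimal sketch of consuming that result. It assumes the result behaves like a dict whose `'tok'` entry holds one list of tokens per sentence; check the keys on your own output, since they may differ between models and languages. The helper `join_tokens` and the `sample` data are hypothetical: `sample` is a hand-written stand-in, not real HanLP output, so the sketch runs without network access.

```python
from typing import Dict, List

Sentence = List[str]


def join_tokens(doc: Dict[str, List[Sentence]], key: str = "tok", sep: str = " | ") -> List[str]:
    """Render each tokenized sentence as one string, with tokens separated by `sep`.

    Assumes doc[key] is a list of sentences, each a list of token strings.
    """
    return [sep.join(tokens) for tokens in doc[key]]


if __name__ == "__main__":
    # Hand-written stand-in for a tokenization result; not actual HanLP output.
    sample = {"tok": [["In", "2021", ",", "HanLPv2.1", "delivers", "NLP"], ["Hello", "world", "."]]}
    for line in join_tokens(sample):
        print(line)
```

If your output stores tokens under a different key, pass that key via the `key` parameter.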

    

Native APIs

For the native APIs, please refer to the docs.

Last update: 7/3/2022, 9:41:47 AM
Contributors: hankcs