Tokenization

Introduction

HanLP supports tokenization for 130 languages through a multilingual model trained on UD 2.10. This is beneficial for low-resource languages, but for high-resource languages such as Chinese and Japanese, a monolingual model is recommended for better performance.

How to Use

Apply for Auth

We host a non-commercial API service, and you are welcome to apply for an auth key. An auth key is a password that gives you access to our API and protects our server from abuse.

Create RESTful Client

Create a HanLPClient:

from hanlp_restful import HanLPClient
# Supported languages: en (English), zh (Chinese), ja (Japanese), mul (multilingual)
# Replace auth=None with your auth key once you have one
HanLP = HanLPClient('https://www.hanlp.com/api', auth=None, language='mul')
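
To keep the auth key out of source code, the client settings can be read from the environment. The following is a minimal sketch, not part of HanLP itself: the variable names HANLP_AUTH and HANLP_LANGUAGE and the helpers client_settings and make_client are hypothetical, chosen only for illustration. The URL and the auth and language parameters are the ones used in the call above.

```python
import os

API_URL = 'https://www.hanlp.com/api'  # same URL as in the client call above


def client_settings(env=None):
    """Collect the auth key and language from environment variables.

    HANLP_AUTH and HANLP_LANGUAGE are hypothetical names, not something
    HanLP reads on its own.
    """
    env = os.environ if env is None else env
    return {
        # An empty or missing key falls back to None, as in the call above.
        'auth': env.get('HANLP_AUTH') or None,
        # Default to the multilingual model when no language is set.
        'language': env.get('HANLP_LANGUAGE') or 'mul',
    }


def make_client(env=None):
    """Build a HanLPClient from the collected settings."""
    # Imported lazily so client_settings() can be used without the package.
    from hanlp_restful import HanLPClient
    return HanLPClient(API_URL, **client_settings(env))
```

Calling make_client() with HANLP_LANGUAGE set to zh or ja would select a monolingual model, which follows the recommendation for high-resource languages.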

Tokenize

Set tasks='tok' to perform tokenization:

HanLP('''In 2021, HanLPv2.1 delivers state-of-the-art multilingual NLP techniques to production environments.
2021年、HanLPv2.1は次世代の最先端多言語NLP技術を本番環境に導入します。
2021年 HanLPv2.1为生产环境带来次世代最先进的多语种NLP技术。''', tasks='tok')
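
The page does not show what the call returns. The sketch below assumes the response behaves like a dictionary whose tokenization output sits under a key named tok (or one beginning with tok/), holding one list of token strings per sentence. That shape is an assumption to check against a real response, and extract_tokens is a hypothetical helper. The sample dictionary is hand-written for illustration and is not actual service output.

```python
def extract_tokens(doc):
    """Return the tokenized sentences from a dict-like response.

    Assumes the tokens are stored under 'tok' or a key starting with
    'tok/', as a list of sentences, each a list of token strings.
    """
    for key, value in doc.items():
        if key == 'tok' or key.startswith('tok/'):
            return value
    raise KeyError('no tokenization output found in response')


# Hand-written stand-in for a response, NOT real service output.
sample = {'tok': [['In', '2021', ','], ['HanLPv2.1', 'delivers']]}

# Print one sentence per line, tokens separated by spaces.
for index, sentence in enumerate(extract_tokens(sample)):
    print(index, ' '.join(sentence))
```

With a live client, the same helper would be applied to the value returned by HanLP(..., tasks='tok').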

Native APIs

Please refer to the docs.
