sts

The sts package holds pre-trained Semantic Textual Similarity (STS) models. We surveyed both supervised and unsupervised models and believe that unsupervised models are still immature at the moment. Unsupervised STS works well for information retrieval (IR) but not for NLP, especially on sentences with little lexical overlap.

hanlp.pretrained.sts.STS_ELECTRA_BASE_ZH = 'https://file.hankcs.com/hanlp/sts/sts_electra_base_zh_20210530_200109.zip'

A naive regression model trained on concatenated STS corpora.

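import hanlp

# Load the pre-trained Chinese STS model (downloaded and cached on first use).
sim = hanlp.load(hanlp.pretrained.sts.STS_ELECTRA_BASE_ZH)
# Score a batch of sentence pairs; one similarity score is returned per pair.
sim([
    # "guess a movie title from the picture" vs. "guess the movie from the picture"
    ['看图猜一电影名', '看图猜电影'],
    # "how to get online wirelessly with a wireless router" vs. "how to use a wireless network card and a wireless router"
    ['无线路由器怎么无线上网', '无线上网卡和无线路由器怎么用'],
    # "train tickets from Beijing to Shanghai" vs. "train tickets from Shanghai to Beijing"
    ['北京到上海的动车票', '上海到北京的动车票'],
])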
[1.0, 0.09334613382816315, 0.05256062000989914]
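
To read the scores above alongside the remark about lexical overlap, the following minimal sketch lines up each sentence pair with its model score and a character-level Jaccard overlap. The helpers `char_jaccard` and `compare` are hypothetical names introduced for illustration and are not part of HanLP; the Jaccard measure is only an assumed stand-in for a purely lexical baseline. The scores are copied from the output above, so the sketch runs without downloading the model.

```python
# Sentence pairs and scores copied from the example above (no model download needed).
pairs = [
    ('看图猜一电影名', '看图猜电影'),
    ('无线路由器怎么无线上网', '无线上网卡和无线路由器怎么用'),
    ('北京到上海的动车票', '上海到北京的动车票'),
]
model_scores = [1.0, 0.09334613382816315, 0.05256062000989914]


def char_jaccard(a: str, b: str) -> float:
    """Character-level Jaccard overlap (hypothetical baseline, not a HanLP API)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Two empty strings share nothing to compare; report 0.0 instead of dividing by zero.
    return len(set_a & set_b) / len(union) if union else 0.0


def compare(pairs, scores):
    """Return (sentence_a, sentence_b, model_score, lexical_overlap) rows."""
    return [(a, b, s, char_jaccard(a, b)) for (a, b), s in zip(pairs, scores)]


rows = compare(pairs, model_scores)
for a, b, score, overlap in rows:
    print(f'{a} | {b} | model={score:.3f} | overlap={overlap:.3f}')
```

In this sketch the third pair shares every character, so its overlap is 1.0, yet it receives the lowest model score because the direction of travel is reversed. Character overlap alone would therefore rank these pairs differently from the supervised model.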