resources

CoNLL 2003

hanlp.datasets.ner.conll03.CONLL03_EN_DEV = 'https://file.hankcs.com/corpus/conll03_en_iobes.zip#eng.dev.tsv'

Dev set of CoNLL03 (Tjong Kim Sang & De Meulder 2003).

hanlp.datasets.ner.conll03.CONLL03_EN_TEST = 'https://file.hankcs.com/corpus/conll03_en_iobes.zip#eng.test.tsv'

Test set of CoNLL03 (Tjong Kim Sang & De Meulder 2003).

hanlp.datasets.ner.conll03.CONLL03_EN_TRAIN = 'https://file.hankcs.com/corpus/conll03_en_iobes.zip#eng.train.tsv'

Training set of CoNLL03 (Tjong Kim Sang & De Meulder 2003).
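Each constant on this page is a plain string that appears to follow an `archive-url#member-path` pattern: the part before `#` points at an archive, and the part after it names a file inside that archive. The following is a minimal sketch of splitting such a string with the standard library only. The helper name `split_resource` is hypothetical and is not part of HanLP; how HanLP itself resolves these strings is not described here.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_resource(identifier: str) -> Tuple[str, str]:
    """Split 'archive-url#member-path' into (archive_url, member_path).

    Hypothetical helper for illustration; it only parses the string and
    does not download or extract anything.
    """
    url, fragment = urldefrag(identifier)
    return url, fragment


# Value copied from the CONLL03_EN_DEV entry above.
CONLL03_EN_DEV = 'https://file.hankcs.com/corpus/conll03_en_iobes.zip#eng.dev.tsv'

archive_url, member = split_resource(CONLL03_EN_DEV)
print(archive_url)  # the archive to download
print(member)       # the file expected inside the archive
```

The same split applies to the other entries on this page, including those whose member path contains directory components.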

MSRA

hanlp.datasets.ner.msra.MSRA_NER_CHAR_LEVEL_DEV = 'http://file.hankcs.com/corpus/msra_ner.zip#dev.tsv'

Dev set of MSRA (Levow 2006) at the character level.

hanlp.datasets.ner.msra.MSRA_NER_CHAR_LEVEL_TEST = 'http://file.hankcs.com/corpus/msra_ner.zip#test.tsv'

Test set of MSRA (Levow 2006) at the character level.

hanlp.datasets.ner.msra.MSRA_NER_CHAR_LEVEL_TRAIN = 'http://file.hankcs.com/corpus/msra_ner.zip#train.tsv'

Training set of MSRA (Levow 2006) at the character level.

hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_IOBES_DEV = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.dev.tsv'

Dev set of MSRA (Levow 2006) at the token level.

hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_IOBES_TEST = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.test.tsv'

Test set of MSRA (Levow 2006) at the token level.

hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_IOBES_TRAIN = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.train.tsv'

Training set of MSRA (Levow 2006) at the token level.

hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_IOBES_DEV = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.dev.short.tsv'

Dev set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level.

hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_IOBES_TEST = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.test.short.tsv'

Test set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level.

hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_IOBES_TRAIN = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.train.short.tsv'

Training set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level.

hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_JSON_DEV = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.dev.short.jsonlines'

Dev set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level, in jsonlines format.

hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_JSON_TEST = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.test.short.jsonlines'

Test set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level, in jsonlines format.

hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_JSON_TRAIN = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.train.short.jsonlines'

Training set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level, in jsonlines format.
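The "short" variants above are limited to sentences of at most 128 tokens. As a rough illustration of that constraint, the sketch below reads column-formatted text and keeps only sentences within a length limit. It assumes a layout of one `token<TAB>tag` pair per line with blank lines between sentences, which is not confirmed by this page, and it simply drops longer sentences, whereas the actual files may have been produced differently (for example by splitting). The function names and the sample data are made up for illustration.

```python
from typing import Iterator, List, Tuple

Sentence = List[Tuple[str, str]]


def read_column_format(text: str, sep: str = "\t") -> Iterator[Sentence]:
    """Yield sentences from text with one 'token<sep>tag' pair per line.

    Assumed layout (not confirmed by the source): blank lines separate
    sentences; the first column is the token and the last is the tag.
    """
    sentence: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        parts = line.split(sep)
        sentence.append((parts[0], parts[-1]))
    if sentence:
        yield sentence


def keep_short(sentences: List[Sentence], max_tokens: int = 128) -> List[Sentence]:
    """Keep only sentences with at most max_tokens tokens."""
    return [s for s in sentences if len(s) <= max_tokens]


# Made-up sample, not taken from the MSRA corpus.
sample = "北京\tS-NS\n欢迎\tO\n你\tO\n\n上海\tS-NS\n"
sentences = list(read_column_format(sample))
short_only = keep_short(sentences, max_tokens=2)
print(len(sentences), len(short_only))
```

With the default limit of 128 every sentence in the sample is kept; the smaller limit in the usage lines is only there to make the filtering visible.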

OntoNotes5

hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_CHINESE_TRAIN = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/train.chinese.conll12.jsonlines'

Training set of OntoNotes5 used in CoNLL12 (Pradhan et al. 2012).

hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_CHINESE_DEV = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/development.chinese.conll12.jsonlines'

Dev set of OntoNotes5 used in CoNLL12 (Pradhan et al. 2012).

hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_CHINESE_TEST = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/test.chinese.conll12.jsonlines'

Test set of OntoNotes5 used in CoNLL12 (Pradhan et al. 2012).

hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_NER_CHINESE_TRAIN = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/train.chinese.conll12.ner.tsv'

Training set of the OntoNotes5 NER annotations used in CoNLL12 (Pradhan et al. 2012), in TSV format.

hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_NER_CHINESE_DEV = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/development.chinese.conll12.ner.tsv'

Dev set of the OntoNotes5 NER annotations used in CoNLL12 (Pradhan et al. 2012), in TSV format.

hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_NER_CHINESE_TEST = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/test.chinese.conll12.ner.tsv'

Test set of the OntoNotes5 NER annotations used in CoNLL12 (Pradhan et al. 2012), in TSV format.

Resume

hanlp.datasets.ner.resume.RESUME_NER_DEV = 'https://github.com/jiesutd/LatticeLSTM/archive/master.zip#ResumeNER/dev.char.bmes'

Dev set of Resume at the character level.

hanlp.datasets.ner.resume.RESUME_NER_TEST = 'https://github.com/jiesutd/LatticeLSTM/archive/master.zip#ResumeNER/test.char.bmes'

Test set of Resume at the character level.

hanlp.datasets.ner.resume.RESUME_NER_TRAIN = 'https://github.com/jiesutd/LatticeLSTM/archive/master.zip#ResumeNER/train.char.bmes'

Training set of Resume at the character level.
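The Resume files carry a `.bmes` extension, which suggests the BMES tagging scheme (Begin, Middle, End, Single, plus `O` for tokens outside any entity). The sketch below shows one way to turn a BMES tag sequence into entity spans. It assumes tags of the form `B-LABEL`, `M-LABEL`, `E-LABEL`, `S-LABEL` and `O`; the function name and the labels in the sample are illustrative and not taken from the dataset.

```python
from typing import List, Optional, Tuple


def bmes_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, label) spans from a BMES tag sequence.

    Malformed sequences (e.g. an E- tag without a matching B- tag) are
    skipped rather than repaired.
    """
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "S":
            # Single-token entity.
            spans.append((i, i + 1, name))
            start, label = None, None
        elif prefix == "B":
            # Open a new multi-token entity.
            start, label = i, name
        elif prefix == "E" and start is not None and name == label:
            # Close the currently open entity.
            spans.append((start, i + 1, name))
            start, label = None, None
        elif prefix == "M" and start is not None and name == label:
            # Still inside the currently open entity.
            continue
        else:
            # 'O' or an inconsistent tag: discard any open entity.
            start, label = None, None
    return spans


# Illustrative tags, not copied from the corpus.
tags = ["B-NAME", "E-NAME", "O", "S-ORG", "B-TITLE", "M-TITLE", "E-TITLE"]
print(bmes_to_spans(tags))
```

Spans use an exclusive end index so that `chars[start:end]` recovers the entity text from a character list.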

Weibo

hanlp.datasets.ner.weibo.WEIBO_NER_DEV = 'https://github.com/hltcoe/golden-horse/archive/master.zip#data/weiboNER_2nd_conll.dev'

Dev set of Weibo at the character level.

hanlp.datasets.ner.weibo.WEIBO_NER_TEST = 'https://github.com/hltcoe/golden-horse/archive/master.zip#data/weiboNER_2nd_conll.test'

Test set of Weibo at the character level.

hanlp.datasets.ner.weibo.WEIBO_NER_TRAIN = 'https://github.com/hltcoe/golden-horse/archive/master.zip#data/weiboNER_2nd_conll.train'

Training set of Weibo at the character level.