resources
CoNLL 2003
- hanlp.datasets.ner.conll03.CONLL03_EN_DEV = 'https://file.hankcs.com/corpus/conll03_en_iobes.zip#eng.dev.tsv'
  Dev set of CoNLL03 (Tjong Kim Sang & De Meulder 2003).
- hanlp.datasets.ner.conll03.CONLL03_EN_TEST = 'https://file.hankcs.com/corpus/conll03_en_iobes.zip#eng.test.tsv'
  Test set of CoNLL03 (Tjong Kim Sang & De Meulder 2003).
- hanlp.datasets.ner.conll03.CONLL03_EN_TRAIN = 'https://file.hankcs.com/corpus/conll03_en_iobes.zip#eng.train.tsv'
  Training set of CoNLL03 (Tjong Kim Sang & De Meulder 2003).
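Each constant above is a plain string made of an archive URL, a `#`, and what appears to be the path of a file inside that archive. The sketch below separates the two parts with the standard library. The helper name `split_resource` is hypothetical, and reading the fragment as an archive member is an assumption drawn from the shape of the URLs, not a description of how HanLP itself resolves them.

```python
from urllib.parse import urldefrag

# Value copied from the CONLL03_EN_TRAIN entry above.
CONLL03_EN_TRAIN = 'https://file.hankcs.com/corpus/conll03_en_iobes.zip#eng.train.tsv'


def split_resource(identifier):
    """Split 'archive_url#member' into (archive_url, member).

    Hypothetical helper: the fragment is assumed to name a file
    inside the downloaded archive.
    """
    archive_url, member = urldefrag(identifier)
    return archive_url, member


archive_url, member = split_resource(CONLL03_EN_TRAIN)
print(archive_url)
print(member)
```

Nothing is downloaded here; the sketch only inspects the string.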
MSRA
- hanlp.datasets.ner.msra.MSRA_NER_CHAR_LEVEL_DEV = 'http://file.hankcs.com/corpus/msra_ner.zip#dev.tsv'
  Dev set of MSRA (Levow 2006) at the character level.
- hanlp.datasets.ner.msra.MSRA_NER_CHAR_LEVEL_TEST = 'http://file.hankcs.com/corpus/msra_ner.zip#test.tsv'
  Test set of MSRA (Levow 2006) at the character level.
- hanlp.datasets.ner.msra.MSRA_NER_CHAR_LEVEL_TRAIN = 'http://file.hankcs.com/corpus/msra_ner.zip#train.tsv'
  Training set of MSRA (Levow 2006) at the character level.
- hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_IOBES_DEV = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.dev.tsv'
  Dev set of MSRA (Levow 2006) at the token level.
- hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_IOBES_TEST = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.test.tsv'
  Test set of MSRA (Levow 2006) at the token level.
- hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_IOBES_TRAIN = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.train.tsv'
  Training set of MSRA (Levow 2006) at the token level.
- hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_IOBES_DEV = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.dev.short.tsv'
  Dev set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level.
- hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_IOBES_TEST = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.test.short.tsv'
  Test set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level.
- hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_IOBES_TRAIN = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.train.short.tsv'
  Training set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level.
- hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_JSON_DEV = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.dev.short.jsonlines'
  Dev set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level, in jsonlines format.
- hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_JSON_TEST = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.test.short.jsonlines'
  Test set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level, in jsonlines format.
- hanlp.datasets.ner.msra.MSRA_NER_TOKEN_LEVEL_SHORT_JSON_TRAIN = 'http://file.hankcs.com/corpus/msra_ner_token_level.zip#word_level.train.short.jsonlines'
  Training set of shortened (<= 128 tokens) MSRA (Levow 2006) at the token level, in jsonlines format.
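Several of the constant names above carry an `IOBES` suffix, which presumably refers to the IOBES tagging scheme (Inside, Outside, Begin, End, Single). As a minimal sketch of what that scheme encodes, the function below turns a tag sequence into entity spans. The function name `iobes_to_spans` and the labels `PER`, `LOC`, and `ORG` are illustrative assumptions; the actual label set of the MSRA files is not stated here.

```python
def iobes_to_spans(tags):
    """Decode IOBES tags into (label, start, end) triples, end exclusive."""
    spans = []
    start = None  # index where the currently open entity began
    label = None  # label of the currently open entity
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition('-')
        if prefix == 'S':
            # Single-token entity.
            spans.append((name, i, i + 1))
            start, label = None, None
        elif prefix == 'B':
            # Open a new multi-token entity.
            start, label = i, name
        elif prefix == 'E' and start is not None and name == label:
            # Close the open entity.
            spans.append((label, start, i + 1))
            start, label = None, None
        elif prefix == 'I' and start is not None and name == label:
            # Still inside the open entity.
            continue
        else:
            # 'O' or an inconsistent tag drops any open entity.
            start, label = None, None
    return spans


# Hypothetical tag sequence, not taken from the dataset.
tags = ['B-PER', 'E-PER', 'O', 'S-LOC', 'B-ORG', 'I-ORG', 'E-ORG']
spans = iobes_to_spans(tags)
print(spans)
```

Malformed sequences are silently dropped in this sketch; a stricter decoder might raise an error instead.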
OntoNotes5
- hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_CHINESE_TRAIN = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/train.chinese.conll12.jsonlines'
  Training set of OntoNotes5 used in CoNLL12 (Pradhan et al. 2012).
- hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_CHINESE_DEV = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/development.chinese.conll12.jsonlines'
  Dev set of OntoNotes5 used in CoNLL12 (Pradhan et al. 2012).
- hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_CHINESE_TEST = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/test.chinese.conll12.jsonlines'
  Test set of OntoNotes5 used in CoNLL12 (Pradhan et al. 2012).
- hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_NER_CHINESE_TRAIN = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/train.chinese.conll12.ner.tsv'
  Training set of OntoNotes5 used in CoNLL12 (Pradhan et al. 2012).
- hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_NER_CHINESE_DEV = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/development.chinese.conll12.ner.tsv'
  Dev set of OntoNotes5 used in CoNLL12 (Pradhan et al. 2012).
- hanlp.datasets.srl.ontonotes5.chinese.ONTONOTES5_CONLL12_NER_CHINESE_TEST = 'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz#/ontonotes-release-5.0/data/../conll-2012/chinese/test.chinese.conll12.ner.tsv'
  Test set of OntoNotes5 used in CoNLL12 (Pradhan et al. 2012).
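The OntoNotes5 paths contain a `data/..` segment, so the file they name sits in a `conll-2012` directory next to `data` rather than inside it. The sketch below makes that explicit by collapsing the `..` with `posixpath.normpath`. Whether HanLP normalizes the path in this way is an assumption; the snippet only shows what the string resolves to.

```python
import posixpath
from urllib.parse import urldefrag

# Value copied from the ONTONOTES5_CONLL12_CHINESE_TRAIN entry above.
ONTONOTES5_CONLL12_CHINESE_TRAIN = (
    'https://catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz'
    '#/ontonotes-release-5.0/data/../conll-2012/chinese/train.chinese.conll12.jsonlines'
)

# Separate the archive URL from the path after '#'.
archive_url, member = urldefrag(ONTONOTES5_CONLL12_CHINESE_TRAIN)

# Collapse 'data/..' so the effective location is easier to read.
normalized_member = posixpath.normpath(member)

print(archive_url)
print(normalized_member)
```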
Resume
- hanlp.datasets.ner.resume.RESUME_NER_DEV = 'https://github.com/jiesutd/LatticeLSTM/archive/master.zip#ResumeNER/dev.char.bmes'
  Dev set of Resume at the character level.
- hanlp.datasets.ner.resume.RESUME_NER_TEST = 'https://github.com/jiesutd/LatticeLSTM/archive/master.zip#ResumeNER/test.char.bmes'
  Test set of Resume at the character level.
- hanlp.datasets.ner.resume.RESUME_NER_TRAIN = 'https://github.com/jiesutd/LatticeLSTM/archive/master.zip#ResumeNER/train.char.bmes'
  Training set of Resume at the character level.
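The `.char.bmes` suffix suggests a column file with one character and one tag per line. The sketch below parses such text into sentences. The layout (whitespace-separated columns, blank line between sentences), the helper name `read_columns`, and the sample labels are all assumptions for illustration; check them against the actual files before relying on this.

```python
def read_columns(text):
    """Parse column-format text into a list of sentences.

    Each sentence is a list of (token, tag) pairs. The first column is
    taken as the token and the last column as the tag (assumed layout).
    """
    sentences, current = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((fields[0], fields[-1]))
    if current:
        # Flush the last sentence if the text has no trailing blank line.
        sentences.append(current)
    return sentences


# Hypothetical sample, not taken from the dataset.
sample = "张 B-NAME\n三 E-NAME\n\n北 B-LOC\n京 E-LOC\n"
sentences = read_columns(sample)
print(sentences)
```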
Weibo
- hanlp.datasets.ner.weibo.WEIBO_NER_DEV = 'https://github.com/hltcoe/golden-horse/archive/master.zip#data/weiboNER_2nd_conll.dev'
  Dev set of Weibo at the character level.
- hanlp.datasets.ner.weibo.WEIBO_NER_TEST = 'https://github.com/hltcoe/golden-horse/archive/master.zip#data/weiboNER_2nd_conll.test'
  Test set of Weibo at the character level.
- hanlp.datasets.ner.weibo.WEIBO_NER_TRAIN = 'https://github.com/hltcoe/golden-horse/archive/master.zip#data/weiboNER_2nd_conll.train'
  Training set of Weibo at the character level.