conll2012_dataset
- class hanlp.datasets.srl.loaders.conll2012.CoNLL2012SRLDataset(data: Union[str, List], transform: Optional[Union[Callable, List]] = None, cache=None, doc_level_offset=True, generate_idx=None)
- A Dataset which can be applied with a list of transform functions.
- Parameters
- data – The local or remote path to a dataset, or a list of samples where each sample is a dict. 
- transform – Predefined transform(s). 
- cache – True to enable caching, so that transforms won't be called twice.
- generate_idx – Create an IDX field for each sample to store its order in the dataset. Useful for prediction when samples are re-ordered by a sampler.
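
The interplay of transform, cache and generate_idx can be illustrated with a small standalone sketch. This is not HanLP's implementation: the class TinyTransformableDataset, the helper lowercase_tokens and the 'token' key are hypothetical names chosen for illustration, and only the idea of an IDX field comes from the parameter description above.

```python
from typing import Callable, Dict, List, Optional


class TinyTransformableDataset:
    """Hypothetical stand-in used only for illustration, NOT the HanLP class."""

    def __init__(self, data: List[Dict], transform: Optional[List[Callable]] = None,
                 cache: bool = False, generate_idx: bool = False):
        # Copy each sample so the caller's dicts are left untouched.
        self.data = [dict(sample) for sample in data]
        if generate_idx:
            # Record each sample's original position in the dataset.
            for i, sample in enumerate(self.data):
                sample['IDX'] = i
        self.transform = transform or []
        # A dict acts as the cache when caching is enabled.
        self.cache = {} if cache else None

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> Dict:
        if self.cache is not None and index in self.cache:
            return self.cache[index]  # transforms are skipped on a cache hit
        sample = dict(self.data[index])
        for fn in self.transform:
            sample = fn(sample)
        if self.cache is not None:
            self.cache[index] = sample
        return sample


calls = []  # records which samples the transform was actually run on


def lowercase_tokens(sample: Dict) -> Dict:
    calls.append(sample['IDX'])
    sample['token'] = [t.lower() for t in sample['token']]
    return sample


dataset = TinyTransformableDataset(
    [{'token': ['Hello', 'World']}, {'token': ['CoNLL', '2012']}],
    transform=[lowercase_tokens], cache=True, generate_idx=True)

first = dataset[0]
again = dataset[0]                   # served from the cache
shuffled = [dataset[1], dataset[0]]  # e.g. the order produced by a sampler
restored = sorted(shuffled, key=lambda s: s['IDX'])
```

With caching enabled, the transform runs once per sample even though the first sample is requested three times, and sorting by IDX recovers the original order after re-ordering.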
 
- load_file(filepath: str)
- Load a .jsonlines CoNLL12-style corpus. Samples of this corpus can be found using the following script.

      import json
      from hanlp_common.document import Document
      from hanlp.datasets.srl.ontonotes5.chinese import ONTONOTES5_CONLL12_CHINESE_DEV
      from hanlp.utils.io_util import get_resource

      with open(get_resource(ONTONOTES5_CONLL12_CHINESE_DEV)) as src:
          for line in src:
              doc = json.loads(line)
              print(Document(doc))
              break

- Parameters
- filepath – Path to a .jsonlines CoNLL12 corpus.
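
For readers unfamiliar with the .jsonlines layout, the following minimal sketch shows the general pattern of one JSON document per line, using only the standard library. It does not reproduce what load_file does internally: the helper read_jsonlines is hypothetical, and the record fields are made-up placeholders rather than the actual corpus schema.

```python
import json
import os
import tempfile


def read_jsonlines(filepath):
    """Yield one decoded JSON object per non-empty line (hypothetical helper)."""
    with open(filepath, encoding='utf-8') as src:
        for line in src:
            line = line.strip()
            if line:
                yield json.loads(line)


# Placeholder records; the real corpus fields may differ.
records = [
    {'doc_key': 'demo_0', 'sentences': [['Hello', 'world']]},
    {'doc_key': 'demo_1', 'sentences': [['Second', 'document']]},
]

# Write the records to a temporary .jsonlines file, one JSON object per line.
with tempfile.NamedTemporaryFile('w', suffix='.jsonlines', delete=False,
                                 encoding='utf-8') as tmp:
    for record in records:
        tmp.write(json.dumps(record) + '\n')
    path = tmp.name

loaded = list(read_jsonlines(path))
os.remove(path)
```

Reading line by line keeps memory use proportional to a single document, which is the usual reason for choosing this layout for large corpora.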
 
 
