class hanlp.datasets.parsing.conll_dataset.CoNLLParsingDataset(data: Union[str, List], transform: Union[Callable, List] = None, cache=None, generate_idx=None, prune: Callable[[Dict[str, List[str]]], bool] = None)

General class for CoNLL style dependency parsing datasets.

  • data – The local or remote path to a dataset, or a list of samples where each sample is a dict.

  • transform – Predefined transform(s).

  • cache – True to enable caching, so that transforms won’t be called twice.

  • generate_idx – Create an IDX field for each sample to store its order in the dataset. Useful for prediction when samples are re-ordered by a sampler.

  • prune – A filter to prune unwanted samples.
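
The effect of prune and generate_idx can be illustrated with a minimal, library-free sketch. It does not call HanLP. The helper apply_prune_and_idx, the predicate too_long, and the sample keys FORM, HEAD and DEPREL are hypothetical names chosen for illustration. The sketch also assumes that a True return from prune means "drop this sample" and that indices are assigned after pruning; the library may behave differently on both points.

```python
from typing import Any, Callable, Dict, List, Optional

Sample = Dict[str, List[str]]


def too_long(sample: Sample, max_len: int = 3) -> bool:
    """Hypothetical prune predicate: True means the sample should be dropped."""
    return len(sample["FORM"]) > max_len


def apply_prune_and_idx(
    samples: List[Sample],
    prune: Optional[Callable[[Sample], bool]] = None,
    generate_idx: bool = False,
) -> List[Dict[str, Any]]:
    """Illustrative stand-in for what the dataset options are described as doing."""
    # Keep only samples that the predicate does not flag (assumption: True = drop).
    kept: List[Dict[str, Any]] = [
        dict(s) for s in samples if prune is None or not prune(s)
    ]
    if generate_idx:
        # Record each sample's position so predictions can be restored to this
        # order after a sampler has shuffled or bucketed the samples.
        for i, sample in enumerate(kept):
            sample["IDX"] = i
    return kept


samples = [
    {"FORM": ["I", "sleep"], "HEAD": ["2", "0"], "DEPREL": ["nsubj", "root"]},
    {
        "FORM": ["She", "reads", "many", "books"],
        "HEAD": ["2", "0", "4", "2"],
        "DEPREL": ["nsubj", "root", "amod", "obj"],
    },
    {"FORM": ["Go"], "HEAD": ["0"], "DEPREL": ["root"]},
]

kept = apply_prune_and_idx(samples, prune=too_long, generate_idx=True)
```

Here the four-token sentence is filtered out, and the two remaining samples carry their positions in an IDX entry.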


Both .conllx and .conllu are supported. Their descriptions can be found in hanlp_common.conll.CoNLLWord and hanlp_common.conll.CoNLLUWord respectively.
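
For orientation, the sketch below shows how a tab-separated CoNLL file maps onto a per-sentence dict of columns, which matches the Dict[str, List[str]] shape that prune receives. It is a simplified stand-in, not HanLP's reader. The column names follow the published CoNLL-X and CoNLL-U formats and may differ from the field names HanLP uses internally. The function read_sentences is hypothetical, and CoNLL-U multiword token ranges and empty nodes are not handled.

```python
from typing import Dict, Iterator, List

# Column names as given in the CoNLL-X and CoNLL-U format descriptions.
CONLLX_FIELDS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
                 "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]
CONLLU_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def _to_sample(rows: List[List[str]], fields: List[str]) -> Dict[str, List[str]]:
    """Transpose token rows into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(fields)}


def read_sentences(text: str, fields: List[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one column dict per blank-line-separated sentence."""
    rows: List[List[str]] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip comment lines such as "# text = ..."
        if not line:
            if rows:
                yield _to_sample(rows, fields)
                rows = []
            continue
        rows.append(line.split("\t"))
    if rows:  # the last sentence may lack a trailing blank line
        yield _to_sample(rows, fields)


# Build the sample text with explicit tabs so the columns are unambiguous.
token_rows = [
    ["1", "I", "I", "PRON", "PRP", "_", "2", "nsubj", "_", "_"],
    ["2", "sleep", "sleep", "VERB", "VBP", "_", "0", "root", "_", "_"],
]
text = "\n".join("\t".join(row) for row in token_rows) + "\n"

sentences = list(read_sentences(text, CONLLX_FIELDS))
```

Passing CONLLU_FIELDS instead would label the same ten columns with the CoNLL-U names.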


filepath – .conllx or .conllu file path.