conll
- class hanlp.datasets.parsing.loaders.conll_dataset.CoNLLParsingDataset(data: Union[str, List], transform: Optional[Union[Callable, List]] = None, cache=None, generate_idx=None, prune: Optional[Callable[[Dict[str, List[str]]], bool]] = None)
General class for CoNLL style dependency parsing datasets.
- Parameters
data – The local or remote path to a dataset, or a list of samples where each sample is a dict.
transform – Predefined transform(s).
cache – True to enable caching, so that transforms won’t be called twice.
generate_idx – Create an IDX field for each sample to store its order in the dataset. Useful for prediction when samples are re-ordered by a sampler.
prune – A filter to prune unwanted samples.
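The prune parameter is typed as a callable that takes a sample (a dict mapping field names to lists of strings) and returns a bool. A minimal sketch of such a predicate follows. It makes two assumptions that the description above does not confirm: that returning True means the sample is dropped, and that token forms are stored under a "FORM" key. MAX_TOKENS and too_long are hypothetical names used only for illustration.

```python
from typing import Dict, List

# Hypothetical length threshold, chosen only for this illustration.
MAX_TOKENS = 100


def too_long(sample: Dict[str, List[str]]) -> bool:
    """Return True when a sample should be pruned.

    Assumption: True means "drop this sample", and the word forms
    live under the "FORM" key of the sample dict.
    """
    return len(sample.get("FORM", [])) > MAX_TOKENS


# Stand-in samples shaped like Dict[str, List[str]].
samples = [
    {"FORM": ["Hello", "world"], "HEAD": ["2", "0"]},
    {"FORM": ["tok"] * 150, "HEAD": ["0"] * 150},
]

# Emulate the filtering step outside the library: keep what is not pruned.
kept = [s for s in samples if not too_long(s)]
print(len(kept))
```

Under the same assumptions, a predicate like this would be passed as prune=too_long when constructing CoNLLParsingDataset.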
- load_file(filepath)
Both .conllx and .conllu are supported. Their descriptions can be found in hanlp_common.conll.CoNLLWord and hanlp_common.conll.CoNLLUWord respectively.
- Parameters
filepath – .conllx or .conllu file path.
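To make the file format concrete, here is a small standalone sketch of reading CoNLL-U text into per-sentence dicts of columns. It is not the implementation of load_file. The column names follow the Universal Dependencies CoNLL-U specification. The helper names to_columns and read_sentences are hypothetical, and the column-keyed dict layout is an assumption based on the Dict[str, List[str]] type in the prune signature.

```python
import io
from typing import Dict, Iterator, List, TextIO

# The ten CoNLL-U columns, as named in the Universal Dependencies spec.
CONLLU_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def to_columns(rows: List[List[str]]) -> Dict[str, List[str]]:
    """Turn a sentence's token rows into a dict of column lists."""
    return {name: [row[i] for row in rows]
            for i, name in enumerate(CONLLU_FIELDS)}


def read_sentences(stream: TextIO) -> Iterator[Dict[str, List[str]]]:
    """Yield one column dict per sentence; blank lines separate sentences."""
    rows: List[List[str]] = []
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            if rows:
                yield to_columns(rows)
                rows = []
        elif not line.startswith("#"):  # skip sentence-level comments
            rows.append(line.split("\t"))
    if rows:  # the last sentence may lack a trailing blank line
        yield to_columns(rows)


# A two-token sentence built with explicit tab separators.
SAMPLE = "\n".join([
    "# text = Hello world",
    "\t".join(["1", "Hello", "hello", "INTJ", "UH", "_", "0", "root", "_", "_"]),
    "\t".join(["2", "world", "world", "NOUN", "NN", "_", "1", "vocative", "_", "_"]),
    "",
])

sentences = list(read_sentences(io.StringIO(SAMPLE)))
print(sentences[0]["FORM"])
```

A .conllx file has the same ten-column, tab-separated layout with different column names (CoNLL-X uses CPOSTAG, POSTAG, PHEAD and PDEPREL where CoNLL-U has UPOS, XPOS, DEPS and MISC), so the sketch would only need a different field list.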