eos
- class hanlp.datasets.eos.eos.SentenceBoundaryDetectionDataset(data: Union[str, List], transform: Optional[Union[Callable, List]] = None, cache=None, append_after_sentence=None, eos_chars=None, eos_char_min_freq=200, eos_char_is_punct=True, window_size=5, **kwargs)
Dataset for sentence boundary detection (eos).
- Parameters
data – The local or remote path to a dataset, or a list of samples where each sample is a dict.
transform – Predefined transform(s).
cache – True to enable caching, so that transforms won’t be called twice.
append_after_sentence – A str to insert at the tail of each sentence. For example, English always has a space between sentences.
eos_chars – Punctuation characters at the tail of sentences. If None, they will be built from training samples.
eos_char_min_freq – Minimal frequency to keep an eos char.
eos_char_is_punct – Limit eos chars to punctuation characters.
window_size – Window size to extract ngram features.
kwargs – Not used.
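
The interaction of eos_chars, eos_char_min_freq, eos_char_is_punct and window_size can be illustrated with a small standalone sketch. It does not use HanLP and is not the library's implementation. The helper names build_eos_chars and extract_windows are hypothetical, and the inclusive frequency threshold and the symmetric character window are assumptions made for illustration only.

```python
import unicodedata
from collections import Counter
from typing import List, Set, Tuple


def build_eos_chars(sentences: List[str], min_freq: int = 200,
                    must_be_punct: bool = True) -> Set[str]:
    """Collect characters that end at least `min_freq` sentences.

    Hypothetical helper: mirrors the idea behind eos_chars=None,
    eos_char_min_freq and eos_char_is_punct, not HanLP's actual code.
    """
    # Count the final character of every non-empty sentence.
    counts = Counter(s[-1] for s in sentences if s)
    return {
        ch for ch, freq in counts.items()
        # Assumption: the threshold is inclusive.
        if freq >= min_freq
        # Unicode general categories starting with "P" are punctuation.
        and (not must_be_punct or unicodedata.category(ch).startswith("P"))
    }


def extract_windows(text: str, eos_chars: Set[str],
                    window_size: int = 5) -> List[Tuple[int, str]]:
    """Return (offset, context) for every candidate boundary in `text`.

    Assumption: the context spans `window_size` characters on each side
    of the candidate, clipped at the ends of the text.
    """
    windows = []
    for i, ch in enumerate(text):
        if ch in eos_chars:
            left = max(0, i - window_size)
            windows.append((i, text[left:i + window_size + 1]))
    return windows


# Toy training sentences; a low min_freq is used because the sample is tiny.
sentences = ["It works.", "Does it?", "Yes.", "Score: 3"]
eos_chars = build_eos_chars(sentences, min_freq=1)
candidates = extract_windows("It works. Does it? Yes.", eos_chars, window_size=3)
```

In this sketch the digit that ends "Score: 3" is dropped because it is not punctuation, which is the effect the eos_char_is_punct description suggests. Each candidate window would then serve as the context from which ngram features are extracted.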