rnn_ner

Tagging-based named entity recognition.

class hanlp.components.ner.rnn_ner.RNNNamedEntityRecognizer(**kwargs)

An old-school RNN tagger using word2vec or fasttext embeddings.

Parameters: **kwargs – Predefined config.

build_metric(**kwargs)

Implement this to build metric(s).

Parameters: **kwargs – The subclass decides the method signature.

evaluate_dataloader(data, criterion, logger=None, ratio_width=None, **kwargs)

Evaluate on a dataloader.

Parameters

data – A dataloader, which can be built from any data source.
criterion – Loss function.
metric – Metric(s).
output – Whether to save outputs into some file.
**kwargs – Not used.

fit(trn_data, dev_data, save_dir, batch_size=50, epochs=100, embed=100, rnn_input=None, rnn_hidden=256, drop=0.5, lr=0.001, patience=10, crf=True, optimizer='adam', token_key='token', tagging_scheme=None, anneal_factor: float = 0.5, delimiter=None, anneal_patience=2, devices=None, token_delimiter=None, logger=None, verbose=True, **kwargs)

Fit to data, which triggers the training procedure. The training set and dev set shall be local or remote files.

Parameters

trn_data – Training set.
dev_data – Development set.
save_dir – The directory in which to save the trained component.
batch_size – The number of samples in a batch.
epochs – Number of epochs.
devices – Devices this component will live on.
logger – Any logging.Logger instance.
seed – Random seed to reproduce this training.
finetune – True to load from save_dir instead of creating a randomly initialized component. str to specify a different save_dir to load from.
eval_trn – Evaluate training set after each update. This can slow down the training but provides a quick diagnostic for debugging.
_device_placeholder – True to create a placeholder tensor which triggers PyTorch to occupy devices so other components won’t take these devices as first choices.
**kwargs – Hyperparameters used by sub-classes.

Returns

Any results sub-classes would like to return. Usually the best metrics on the training set.
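The fit signature exposes patience, anneal_factor and anneal_patience, but this page does not document them. The sketch below shows one common reading of such parameters: the learning rate is multiplied by anneal_factor after anneal_patience epochs without dev improvement, and training stops after patience epochs without improvement. This is an assumption for illustration, not confirmed HanLP behavior; simulate_schedule is a hypothetical helper.

```python
def simulate_schedule(dev_scores, lr=0.001, anneal_factor=0.5,
                      anneal_patience=2, patience=10):
    """Hypothetical plateau schedule; NOT confirmed to match HanLP's logic."""
    best = float("-inf")
    since_best = 0    # epochs since the best dev score so far
    since_anneal = 0  # non-improving epochs since the last lr change
    history = []      # (epoch, learning rate in effect after that epoch)
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best:
            best = score
            since_best = 0
            since_anneal = 0
        else:
            since_best += 1
            since_anneal += 1
            if since_anneal >= anneal_patience:
                lr *= anneal_factor  # shrink the learning rate on a plateau
                since_anneal = 0
        history.append((epoch, lr))
        if since_best >= patience:
            break  # early stopping
    return best, history
```

Under this reading, the defaults (anneal_factor=0.5, anneal_patience=2, patience=10) would halve the learning rate up to five times before training stops on a long plateau.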

predict(tokens: Any, batch_size: Optional[int] = None, **kwargs)

Predict on data fed by the user. Avoid calling this method directly: it is not guarded by torch.no_grad and will introduce unnecessary gradient computation. Use __call__ instead.

Parameters

*args – Sentences or tokens.
**kwargs – Used in sub-classes.

save_config(save_dir, filename='config.json')

Save config into a directory.

Parameters

save_dir – The directory to save config.
filename – A file name for config.
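As a rough illustration of what saving a config into a directory involves, the sketch below writes a plain dictionary to save_dir/filename as JSON. It assumes the config is JSON-serializable; save_config_sketch is a hypothetical helper and not the library's implementation.

```python
import json
import os

def save_config_sketch(config, save_dir, filename="config.json"):
    """Hypothetical helper: dump a JSON-serializable config into save_dir."""
    os.makedirs(save_dir, exist_ok=True)  # create the directory if missing
    path = os.path.join(save_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False, indent=2)
    return path
```

The example values below are taken from the fit defaults listed above.

```python
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = save_config_sketch({"batch_size": 50, "rnn_hidden": 256, "crf": True}, tmp)
    with open(path, encoding="utf-8") as f:
        print(json.load(f))
```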
