rnn_ner

Tagging-based Named Entity Recognition.

class hanlp.components.ner.rnn_ner.RNNNamedEntityRecognizer(**kwargs)

An old-school RNN tagger using word2vec or fasttext embeddings.

Parameters

**kwargs – Predefined config.

build_metric(**kwargs)

Implement this to build metric(s).

Parameters

**kwargs – The subclass decides the method signature.

evaluate_dataloader(data, criterion, logger=None, ratio_width=None, **kwargs)

Evaluate on a dataloader.

Parameters
  • data – Dataloader which can build from any data source.

  • criterion – Loss function.

  • metric – Metric(s).

  • output – Whether to save outputs into some file.

  • **kwargs – Not used.

fit(trn_data, dev_data, save_dir, batch_size=50, epochs=100, embed=100, rnn_input=None, rnn_hidden=256, drop=0.5, lr=0.001, patience=10, crf=True, optimizer='adam', token_key='token', tagging_scheme=None, anneal_factor: float = 0.5, delimiter=None, anneal_patience=2, devices=None, token_delimiter=None, logger=None, verbose=True, **kwargs)

Fit to data, triggering the training procedure. The training set and the development set must be local or remote files.

Parameters
  • trn_data – Training set.

  • dev_data – Development set.

  • save_dir – The directory to save trained component.

  • batch_size – The number of samples in a batch.

  • epochs – Number of epochs.

  • devices – Devices this component will live on.

  • logger – Any logging.Logger instance.

  • seed – Random seed to reproduce this training.

  • finetune – True to load from save_dir instead of creating a randomly initialized component. str to specify a different save_dir to load from.

  • eval_trn – Evaluate training set after each update. This can slow down the training but provides a quick diagnostic for debugging.

  • _device_placeholder – True to create a placeholder tensor which triggers PyTorch to occupy devices so other components won’t take these devices as first choices.

  • **kwargs – Hyperparameters used by sub-classes.

Returns

Any results sub-classes would like to return. Usually the best metrics on the training set.
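A call using the documented signature might look like the sketch below. The three file paths are hypothetical placeholders, and `train` is an illustrative helper rather than part of HanLP. Only keyword names and default values that appear in the signature above are used.

```python
# Keyword arguments taken from the documented fit() signature.
# The three paths are hypothetical placeholders, not files shipped with HanLP.
fit_kwargs = dict(
    trn_data="data/ner/train.tsv",   # hypothetical path
    dev_data="data/ner/dev.tsv",     # hypothetical path
    save_dir="data/model/rnn_ner",   # hypothetical path
    batch_size=50,
    epochs=100,
    rnn_hidden=256,
    drop=0.5,
    lr=0.001,
    patience=10,
    crf=True,
    optimizer="adam",
)


def train(recognizer, **overrides):
    """Illustrative helper: call fit() on any object exposing the documented method."""
    # Later keys win, so overrides replace the defaults collected above.
    return recognizer.fit(**{**fit_kwargs, **overrides})
```

With HanLP installed, `recognizer` would presumably be an `RNNNamedEntityRecognizer()` instance imported from `hanlp.components.ner.rnn_ner`, for example `train(RNNNamedEntityRecognizer(), epochs=10)` for a shorter run.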

predict(tokens: Any, batch_size: Optional[int] = None, **kwargs)

Predict on data fed by the user. Avoid calling this method directly, since it is not guarded with torch.no_grad and will introduce unnecessary gradient computation. Use __call__ instead.

Parameters
  • *args – Sentences or tokens.

  • **kwargs – Used in sub-classes.
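The guard pattern described for predict can be sketched without PyTorch. In this toy, `no_grad` is a hypothetical stand-in for `torch.no_grad` that merely flips a module-level flag, and `ToyComponent` is illustrative, not HanLP's class. It only shows why calling the component is preferred over calling `predict` directly.

```python
from contextlib import contextmanager

_grad_enabled = True  # stand-in for PyTorch's global gradient mode


@contextmanager
def no_grad():
    """Hypothetical stand-in for torch.no_grad: disable 'gradients' inside the block."""
    global _grad_enabled
    previous = _grad_enabled
    _grad_enabled = False
    try:
        yield
    finally:
        # Restore the earlier mode even if the body raises.
        _grad_enabled = previous


class ToyComponent:
    def predict(self, tokens, **kwargs):
        # Unguarded: runs in whatever gradient mode the caller is in.
        return {"tokens": tokens, "grad_tracked": _grad_enabled}

    def __call__(self, tokens, **kwargs):
        # Guarded entry point: wraps predict in the no-grad context.
        with no_grad():
            return self.predict(tokens, **kwargs)
```

Here `ToyComponent()(tokens)` runs `predict` with the flag switched off, while `ToyComponent().predict(tokens)` leaves it on, which mirrors the extra gradient bookkeeping the warning refers to.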

save_config(save_dir, filename='config.json')

Save config into a directory.

Parameters
  • save_dir – The directory to save config.

  • filename – A file name for config.
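As a rough illustration of what saving a config into a directory involves, the sketch below writes a dict to `config.json` using only the standard library. `save_config_sketch` and the sample keys are hypothetical, and HanLP's actual serialization may differ.

```python
import json
import os
import tempfile


def save_config_sketch(config, save_dir, filename="config.json"):
    """Write `config` as JSON to save_dir/filename and return the file path."""
    os.makedirs(save_dir, exist_ok=True)  # create the directory if it is missing
    path = os.path.join(save_dir, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, ensure_ascii=False, indent=2)
    return path


# Usage: write a small config into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = save_config_sketch({"rnn_hidden": 256, "crf": True}, tmp)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
```

Keeping the config as plain JSON next to the trained weights makes it possible to inspect or diff hyperparameters without loading the model.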