Task

class hanlp.components.mtl.tasks.Task(trn: Optional[str] = None, dev: Optional[str] = None, tst: Optional[str] = None, sampler_builder: Optional[hanlp.common.dataset.SamplerBuilder] = None, dependencies: Optional[str] = None, scalar_mix: Optional[hanlp.layers.scalar_mix.ScalarMixWithDropoutBuilder] = None, use_raw_hidden_states=False, lr=None, separate_optimizer=False, cls_is_bos=False, sep_is_eos=False, **kwargs)[source]

A task in the multi-task learning framework.

Parameters
  • trn – Path to training set.

  • dev – Path to dev set.

  • tst – Path to test set.

  • sampler_builder – A builder which builds a sampler.

  • dependencies – Its dependencies on other tasks.

  • scalar_mix – A builder which builds a ScalarMixWithDropout object.

  • use_raw_hidden_states – Whether to use raw hidden states from transformer without any pooling.

  • lr – Learning rate for this task.

  • separate_optimizer – Use customized separate optimizer for this task.

  • cls_is_bos – True to treat the first token as BOS.

  • sep_is_eos – True to treat the last token as EOS.

  • **kwargs – Additional config.
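
Task itself is abstract; a concrete task subclasses it and implements at least the three abstract methods documented below. A minimal sketch, assuming only what is documented on this page: MyTaggingTask, the data paths and the batch size are hypothetical, and SortingSamplerBuilder is assumed to be importable from hanlp.common.dataset.

    import torch
    from torch.utils.data import DataLoader
    from hanlp.common.dataset import SortingSamplerBuilder
    from hanlp.components.mtl.tasks import Task

    class MyTaggingTask(Task):
        # Hypothetical skeleton; only the overridden signatures come from this page.
        def build_dataloader(self, data, transform=None, training=False, device=None,
                             logger=None, cache=False, gradient_accumulation=1,
                             **kwargs) -> DataLoader:
            raise NotImplementedError  # see the sketch under build_dataloader

        def build_metric(self, **kwargs):
            raise NotImplementedError  # see the sketch under build_metric

        def build_model(self, encoder_size, training=True, **kwargs) -> torch.nn.Module:
            raise NotImplementedError  # see the sketch under build_model

    task = MyTaggingTask(trn='data/train.tsv', dev='data/dev.tsv', tst='data/test.tsv',
                         sampler_builder=SortingSamplerBuilder(batch_size=32),
                         lr=1e-3, cls_is_bos=True)

Each abstract method is illustrated with its own sketch further down this page.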

abstract build_dataloader(data, transform: Optional[Callable] = None, training=False, device=None, logger: Optional[logging.Logger] = None, cache=False, gradient_accumulation=1, **kwargs) → torch.utils.data.dataloader.DataLoader[source]

Build a dataloader for training or evaluation.

Parameters
  • data – Either a path or a list of samples.

  • transform – The transform from MTL, which is usually [TransformerSequenceTokenizer, FieldLength('token')].

  • training – Whether this method is called on the training set.

  • device – The device the dataloader is intended to work with.

  • logger – Logger for printing messages indicating progress.

  • cache – Whether the dataloader should be cached.

  • gradient_accumulation – Gradient accumulation to be passed to sampler builder.

  • **kwargs – Additional experimental arguments.
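
A hedged sketch of an override, assuming the MTL transform has produced a token_input_ids field per sample; the fixed batch size, the zero padding value and the read_tsv helper are placeholders, and a real task would instead build a length-aware sampler through the documented sampler_builder.

    from typing import Callable, List, Optional
    import torch
    from torch.utils.data import DataLoader

    def read_tsv(path: str) -> List[dict]:
        # Hypothetical whitespace-tokenized reader, purely for the sketch.
        with open(path) as f:
            return [{'token': line.split()} for line in f if line.strip()]

    def build_dataloader(self, data, transform: Optional[Callable] = None, training=False,
                         device=None, logger=None, cache=False, gradient_accumulation=1,
                         **kwargs) -> DataLoader:
        # `data` is a path at training time or a list of samples at prediction time.
        samples = data if isinstance(data, list) else read_tsv(data)
        if transform:
            samples = [transform(dict(s)) for s in samples]

        def collate(batch: List[dict]) -> dict:
            # Pad 'token_input_ids' to the longest sequence in the batch; other fields
            # are passed through as Python lists. Padding with 0 is an assumption.
            out = {key: [s[key] for s in batch] for key in batch[0]}
            ids = [torch.tensor(x) for x in out['token_input_ids']]
            out['token_input_ids'] = torch.nn.utils.rnn.pad_sequence(ids, batch_first=True)
            return out

        return DataLoader(samples, batch_size=32, shuffle=training, collate_fn=collate)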

abstract build_metric(**kwargs)[source]

Implement this to build metric(s).

Parameters

**kwargs – The subclass decides the method signature.
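
A sketch of one possible implementation; SimpleAccuracy is a placeholder defined inline, whereas a real task returns a metric object suited to it (accuracy, F1, attachment scores and so on).

    class SimpleAccuracy:
        # Placeholder metric for the sketch: counts exact matches between
        # predicted and gold labels.
        def __init__(self):
            self.correct = self.total = 0

        def __call__(self, pred, gold):
            self.correct += sum(int(p == g) for p, g in zip(pred, gold))
            self.total += len(gold)

        @property
        def score(self) -> float:
            return self.correct / max(self.total, 1)

    def build_metric(self, **kwargs):
        return SimpleAccuracy()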

abstract build_model(encoder_size, training=True, **kwargs) → torch.nn.modules.module.Module[source]

Build model.

Parameters
  • training – True if called during training.

  • **kwargs – **self.config, i.e. the task's config expanded as keyword arguments.
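
A minimal sketch: a linear decoder on top of the shared encoder. The label count is a placeholder that would normally come from the task's vocabulary.

    import torch

    def build_model(self, encoder_size: int, training=True, **kwargs) -> torch.nn.Module:
        # A linear decoder mapping encoder hidden states to task labels.
        num_labels = 10  # placeholder; normally derived from the task's vocabulary
        return torch.nn.Linear(encoder_size, num_labels)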

build_optimizer(decoder: torch.nn.modules.module.Module, **kwargs)[source]

Implement this method to build an optimizer.

Parameters

**kwargs – The subclass decides the method signature.
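
A sketch that reads the learning rate from kwargs, which is an assumption; only the decoder argument is fixed by this page, and the choice of Adam is illustrative.

    import torch

    def build_optimizer(self, decoder: torch.nn.Module, **kwargs):
        # Give this task's decoder its own optimizer, e.g. when separate_optimizer
        # is enabled. The 1e-3 default and the choice of Adam are assumptions.
        lr = kwargs.get('lr', 1e-3)
        return torch.optim.Adam(decoder.parameters(), lr=lr)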

build_samples(inputs, cls_is_bos=False, sep_is_eos=False)[source]

Build samples for this task. Called when this task is the first task. The default behavior is to treat the inputs as lists of tokens and put each token list into a dict per sample (see the illustration after the parameter list).

Parameters
  • inputs – Inputs from users, usually a list of lists of tokens.

  • cls_is_bos – Insert a BOS token at the head of each sentence.

  • sep_is_eos – Append an EOS token to the tail of each sentence.

Returns

List of samples.
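
An illustration of the default behavior; the 'token' key and the BOS/EOS literals are assumptions made for the sketch.

    inputs = [['商品', '和', '服务'], ['HanLP', 'is', 'great']]
    samples = [{'token': list(tokens)} for tokens in inputs]
    # -> [{'token': ['商品', '和', '服务']}, {'token': ['HanLP', 'is', 'great']}]
    # With cls_is_bos=True and sep_is_eos=True, markers would be added per sentence:
    # {'token': ['[BOS]', '商品', '和', '服务', '[EOS]']}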

build_tokenizer(tokenizer: hanlp.transform.transformer_tokenizer.TransformerSequenceTokenizer)[source]

Build a transformer tokenizer for this task.

Parameters

tokenizer – A tokenizer which is shared but can be adjusted to provide per-task settings.

Returns

A TransformerSequenceTokenizer.

compute_lens(data: Union[List[Dict[str, Any]], str], dataset: hanlp.common.dataset.TransformableDataset, input_ids='token_input_ids')[source]

Compute the length of each sample in the data.

Parameters
  • data – Samples to be measured, or a path to the dataset at training time.

  • dataset – At training time, this dataset is used to measure the length of each sample inside it.

  • input_ids – The field name that corresponds to the input ids.

Returns

A list of lengths, one per sample.
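
A sketch of what the measurement amounts to; the branch on the type of data and the 'token' fallback field are illustrative rather than HanLP's exact implementation.

    from typing import Any, Dict, List, Union

    def compute_lens(data: Union[List[Dict[str, Any]], str], dataset,
                     input_ids: str = 'token_input_ids') -> List[int]:
        if isinstance(data, str):
            # Training time: `data` is a path, so measure the transformed dataset itself.
            return [len(sample[input_ids]) for sample in dataset]
        # Prediction time: `data` is already a list of samples; measure it directly.
        return [len(sample.get(input_ids, sample['token'])) for sample in data]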

evaluate_dataloader(data: torch.utils.data.dataloader.DataLoader, criterion: Callable, output=False, **kwargs)[source]

Evaluate on a dataloader.

Parameters
  • data – A dataloader which can be built from any data source.

  • criterion – Loss function.

  • metric – Metric(s).

  • output – Whether to save outputs to a file.

  • **kwargs – Not used.
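
A generic, standalone evaluation loop in the spirit of this method; the batch field names and the (loss, metric) return value are assumptions, and a real task reuses its own forward and decoding logic.

    import torch

    def evaluate_dataloader(model: torch.nn.Module, data, criterion, metric,
                            output=False, **kwargs):
        # Run the model over every batch, accumulate the loss and update the metric.
        model.eval()
        total_loss, num_batches = 0.0, 0
        with torch.no_grad():
            for batch in data:
                logits = model(batch['token_input_ids'])   # field name is an assumption
                loss = criterion(logits, batch['label'])   # 'label' field is an assumption
                metric(logits.argmax(-1), batch['label'])
                total_loss += loss.item()
                num_batches += 1
        return total_loss / max(num_batches, 1), metric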

input_is_flat(data) → bool[source]

Check whether the data is flat (meaning that it’s only a single sample, not even batched).

Returns

True to indicate the input data is flat.

Return type

bool
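
For a token-based task the check usually boils down to whether the input is a single list of strings; a standalone sketch:

    def input_is_flat(data) -> bool:
        # A flat input is one un-batched sentence: a list of strings rather than
        # a list of token lists.
        return len(data) > 0 and isinstance(data[0], str)

    assert input_is_flat(['HanLP', 'is', 'great'])                   # one sentence
    assert not input_is_flat([['HanLP', 'is', 'great'], ['Hello']])  # already batched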

transform_batch(batch: Dict[str, Any], results: Optional[Dict[str, Any]] = None, cls_is_bos=False, sep_is_eos=False) → Dict[str, Any][source]

Let the task transform the batch before feeding it into its decoder. The default behavior is to adjust the head and tail of the tokens according to the cls_is_bos and sep_is_eos arguments passed in and the corresponding settings of the task itself (see the sketch below).

Parameters
  • batch – A batch of samples.

  • results – Predicted results from other tasks which might be useful for this task to utilize. For example, if a dep task uses both tokens and part-of-speech tags as features, it will need the results of both the tok and pos tasks to make a batch.

  • cls_is_bos – Whether the first token in this batch is BOS.

  • sep_is_eos – Whether the last token in this batch is EOS.

Returns

The transformed batch.
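
A sketch of the head/tail adjustment only, assuming the tokens live under a 'token' key and that the task records its own cls_is_bos/sep_is_eos settings as attributes; a full implementation would also keep any derived fields consistent.

    from typing import Any, Dict, Optional

    def transform_batch(self, batch: Dict[str, Any], results: Optional[Dict[str, Any]] = None,
                        cls_is_bos=False, sep_is_eos=False) -> Dict[str, Any]:
        batch = dict(batch)
        # If the incoming batch carries a BOS token but this task was built with
        # cls_is_bos=False, strip it; mirror the logic for EOS at the tail.
        if cls_is_bos and not self.cls_is_bos:
            batch['token'] = [tokens[1:] for tokens in batch['token']]
        if sep_is_eos and not self.sep_is_eos:
            batch['token'] = [tokens[:-1] for tokens in batch['token']]
        return batch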