Task
-
class hanlp.components.mtl.tasks.Task(trn: str = None, dev: str = None, tst: str = None, sampler_builder: hanlp.common.dataset.SamplerBuilder = None, dependencies: str = None, scalar_mix: hanlp.layers.scalar_mix.ScalarMixWithDropoutBuilder = None, use_raw_hidden_states=False, lr=None, separate_optimizer=False, cls_is_bos=False, sep_is_eos=False, **kwargs)
A task in the multi-task learning framework.
- Parameters
trn – Path to training set.
dev – Path to dev set.
tst – Path to test set.
sampler_builder – A builder which builds a sampler.
dependencies – Its dependencies on other tasks.
scalar_mix – A builder which builds a ScalarMixWithDropout object.
use_raw_hidden_states – Whether to use raw hidden states from transformer without any pooling.
lr – Learning rate for this task.
separate_optimizer – Use customized separate optimizer for this task.
cls_is_bos – True to treat the first token as BOS.
sep_is_eos – True to treat the last token as EOS.
**kwargs – Additional config.
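To illustrate the dependencies parameter: a task that depends on other tasks has to run after them. The following is a minimal sketch of one way to derive such an order, using the standard library's graphlib.TopologicalSorter. The task names tok, pos and dep and the mapping layout are illustrative assumptions, and this is not HanLP's own scheduling code.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: task name -> names of the tasks it depends on.
dependencies = {
    "tok": [],
    "pos": ["tok"],
    "dep": ["tok", "pos"],
}


def resolve_order(deps):
    """Return task names so that every task comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = resolve_order(dependencies)
```

With this mapping the only valid order is tok, then pos, then dep.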
-
abstract build_dataloader(data, transform: Callable = None, training=False, device=None, logger: logging.Logger = None, cache=False, gradient_accumulation=1, **kwargs) → torch.utils.data.dataloader.DataLoader
Build a dataloader for training or evaluation.
- Parameters
data – Either a path or a list of samples.
transform – The transform from MTL, which is usually [TransformerSequenceTokenizer, FieldLength('token')].
training – Whether this method is called on training set.
device – The device the dataloader is intended to work with.
logger – Logger for printing progress messages.
cache – Whether the dataloader should be cached.
gradient_accumulation – Gradient accumulation to be passed to sampler builder.
**kwargs – Additional experimental arguments.
-
abstract build_metric(**kwargs)
Implement this to build metric(s).
- Parameters
**kwargs – The subclass decides the method signature.
-
abstract build_model(encoder_size, training=True, **kwargs) → torch.nn.modules.module.Module
Build model.
- Parameters
training – True if called during training.
**kwargs – **self.config.
-
build_optimizer(decoder: torch.nn.modules.module.Module, **kwargs)
Implement this method to build an optimizer.
- Parameters
**kwargs – The subclass decides the method signature.
-
build_samples(inputs, cls_is_bos=False, sep_is_eos=False)
Build samples for this task. Called when this task is the first task. The default behaviour is to take the inputs as lists of tokens and put the tokens of each sentence into one dict per sample.
- Parameters
inputs – Inputs from users, usually a list of lists of tokens.
cls_is_bos – Insert BOS to the head of each sentence.
sep_is_eos – Append EOS to the tail of each sentence.
- Returns
List of samples.
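The default behaviour described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not HanLP's implementation: the function name build_samples_sketch is hypothetical, the BOS and EOS marker strings are placeholders, and the field name 'token' is taken from the length_field default documented for compute_lens.

```python
# Placeholder marker strings; the constants HanLP actually uses are an assumption here.
BOS = "<bos>"
EOS = "<eos>"


def build_samples_sketch(inputs, cls_is_bos=False, sep_is_eos=False):
    """Wrap each token list in a dict, optionally adding BOS/EOS markers."""
    samples = []
    for tokens in inputs:
        tokens = list(tokens)  # copy so the caller's list is not modified
        if cls_is_bos:
            tokens = [BOS] + tokens  # insert BOS at the head of the sentence
        if sep_is_eos:
            tokens = tokens + [EOS]  # append EOS to the tail of the sentence
        samples.append({"token": tokens})
    return samples


samples = build_samples_sketch([["Hello", "world"]], cls_is_bos=True)
```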
-
build_tokenizer(tokenizer: hanlp.transform.transformer_tokenizer.TransformerSequenceTokenizer)
Build a transformer tokenizer for this task.
- Parameters
tokenizer – A tokenizer which is shared but can be adjusted to provide per-task settings.
- Returns
A TransformerSequenceTokenizer.
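A shared tokenizer that can still be adjusted per task suggests a copy-on-change pattern: reuse the shared object when its settings already match, and copy it otherwise so other tasks are unaffected. The sketch below shows that pattern with a hypothetical SharedTokenizer stand-in. Treating cls_is_bos and sep_is_eos as the per-task settings is an assumption, and this is not the library's code.

```python
from copy import copy
from dataclasses import dataclass


@dataclass
class SharedTokenizer:
    """Hypothetical stand-in for TransformerSequenceTokenizer."""
    cls_is_bos: bool = False
    sep_is_eos: bool = False


def build_tokenizer_sketch(tokenizer, cls_is_bos, sep_is_eos):
    """Return the shared tokenizer, or an adjusted shallow copy if settings differ."""
    if tokenizer.cls_is_bos != cls_is_bos or tokenizer.sep_is_eos != sep_is_eos:
        tokenizer = copy(tokenizer)  # leave the shared instance untouched
        tokenizer.cls_is_bos = cls_is_bos
        tokenizer.sep_is_eos = sep_is_eos
    return tokenizer


shared = SharedTokenizer()
per_task = build_tokenizer_sketch(shared, cls_is_bos=True, sep_is_eos=False)
```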
-
compute_lens(data: Union[List[Dict[str, Any]], str], dataset: hanlp.common.dataset.TransformableDataset, input_ids='token_input_ids', length_field='token')
- Parameters
data – Samples to be measured or path to dataset during training time.
dataset – During training time, use this dataset to measure the length of each sample inside.
input_ids – Field name corresponds to input ids.
length_field – Fall back to this field during prediction as input_ids may not be generated yet.
- Returns
A list of the lengths of these samples.
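The fallback between input_ids and length_field can be illustrated with a small sketch. It covers only the case where data is a list of samples (not a path), the function name compute_lens_sketch is hypothetical, and it is not HanLP's implementation.

```python
def compute_lens_sketch(samples, input_ids="token_input_ids", length_field="token"):
    """Measure each sample by its input ids, falling back to length_field."""
    lens = []
    for sample in samples:
        if input_ids in sample:
            # Input ids have already been generated, so measure them directly.
            lens.append(len(sample[input_ids]))
        else:
            # During prediction the input ids may not exist yet.
            lens.append(len(sample[length_field]))
    return lens


lens = compute_lens_sketch([
    {"token": ["a", "b"], "token_input_ids": [101, 7, 8, 102]},
    {"token": ["c"]},
])
```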
-
evaluate_dataloader(data: torch.utils.data.dataloader.DataLoader, criterion: Callable, output=False, **kwargs)
Evaluate on a dataloader.
- Parameters
data – A dataloader, which can be built from any data source.
criterion – Loss function.
metric – Metric(s).
output – Whether to save outputs into some file.
**kwargs – Not used.
-
input_is_flat(data) → bool
Check whether the data is flat (meaning that it’s only a single sample, not even batched).
- Returns
True to indicate the input data is flat.
- Return type
bool
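As an illustration of what "flat" can mean for a token-level task, the sketch below treats a list of strings as a single sample and a list of lists as a batch. Both the function name and the assumption that a sample is a list of token strings are hypothetical; each subclass decides its own check.

```python
def input_is_flat_sketch(data):
    """Return True if data is one sentence (a list of str), not a batch of sentences."""
    return isinstance(data, list) and all(isinstance(item, str) for item in data)


single = input_is_flat_sketch(["Hello", "world"])
batched = input_is_flat_sketch([["Hello", "world"], ["Bye"]])
```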
-
transform_batch(batch: Dict[str, Any], results: Dict[str, Any] = None, cls_is_bos=False, sep_is_eos=False) → Dict[str, Any]
Let the task transform the batch before feeding the batch into its decoder. The default behavior is to adjust the head and tail of tokens, according to the cls_is_bos and sep_is_eos passed in and the two settings of the task itself.
- Parameters
batch – A batch of samples.
results – Predicted results from other tasks which might be useful for this task. For example, if a dep task uses both token and pos as features, it needs both the tok and pos results to make a batch.
cls_is_bos – First token in this batch is BOS.
sep_is_eos – Last token in this batch is EOS.
- Returns
A batch.
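The head and tail adjustment can be sketched as follows: when the flags describing the incoming batch differ from the task's own settings, markers are stripped or added so the tokens match what the task expects. The function names, the task_cls_is_bos and task_sep_is_eos parameters, the placeholder marker strings, and the 'token' field are assumptions for illustration. The real method may also update other fields, such as lengths.

```python
from copy import copy

BOS, EOS = "<bos>", "<eos>"  # placeholder marker strings (assumption)


def adjust_tokens(sentence, has_bos, has_eos, want_bos, want_eos):
    """Strip or add BOS/EOS so the sentence matches the wanted layout."""
    tokens = list(sentence)
    if has_bos and not want_bos:
        tokens = tokens[1:]
    elif want_bos and not has_bos:
        tokens = [BOS] + tokens
    if has_eos and not want_eos:
        tokens = tokens[:-1]
    elif want_eos and not has_eos:
        tokens = tokens + [EOS]
    return tokens


def transform_batch_sketch(batch, cls_is_bos=False, sep_is_eos=False,
                           task_cls_is_bos=False, task_sep_is_eos=False):
    """Return the batch unchanged if layouts agree, else an adjusted shallow copy."""
    if cls_is_bos == task_cls_is_bos and sep_is_eos == task_sep_is_eos:
        return batch
    batch = copy(batch)  # shallow copy so the caller's batch keeps its tokens
    batch["token"] = [
        adjust_tokens(s, cls_is_bos, sep_is_eos, task_cls_is_bos, task_sep_is_eos)
        for s in batch["token"]
    ]
    return batch


original = {"token": [["<bos>", "a", "b"]]}
adjusted = transform_batch_sketch(original, cls_is_bos=True, task_cls_is_bos=False)
```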