sdp
Semantic Dependency Parsing.
- class hanlp.components.mtl.tasks.sdp.BiaffineSemanticDependencyParsing(trn: Optional[str] = None, dev: Optional[str] = None, tst: Optional[str] = None, sampler_builder: Optional[hanlp.common.dataset.SamplerBuilder] = None, dependencies: Optional[str] = None, scalar_mix: Optional[hanlp.layers.scalar_mix.ScalarMixWithDropoutBuilder] = None, use_raw_hidden_states=False, lr=0.002, separate_optimizer=False, punct=False, tree=True, pad_rel=None, apply_constraint=False, single_root=True, no_zero_head=None, n_mlp_arc=500, n_mlp_rel=100, mlp_dropout=0.33, mu=0.9, nu=0.9, epsilon=1e-12, decay=0.75, decay_steps=5000, cls_is_bos=True, use_pos=False, **kwargs)
Implementation of “Stanford’s Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task” (Dozat et al. 2017) and “Establishing Strong Baselines for the New Decade” (He & Choi 2020).
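Both papers rely on deep biaffine attention. As a sketch of the standard Dozat-style scorer (not code lifted from this class), two MLPs of size n_mlp_arc project each encoder state x into dependent and head views, and every candidate arc (i, j) receives an independent score:

.. math::

    \mathbf{h}^{(dep)}_i = \mathrm{MLP}^{(dep)}(\mathbf{x}_i), \qquad
    \mathbf{h}^{(head)}_j = \mathrm{MLP}^{(head)}(\mathbf{x}_j)

.. math::

    s^{arc}_{ij} = \mathbf{h}^{(dep)\top}_i \mathbf{U}\, \mathbf{h}^{(head)}_j
                 + \mathbf{u}^{\top} \mathbf{h}^{(head)}_j + b

Because semantic dependencies form a graph rather than a tree, each arc is decided independently (a sigmoid threshold) instead of choosing a single head per token; relation labels are scored by an analogous biaffine over n_mlp_rel-dimensional views.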
- Parameters
trn – Path to training set.
dev – Path to dev set.
tst – Path to test set.
sampler_builder – A builder which builds a sampler.
dependencies – Its dependencies on other tasks.
scalar_mix – A builder which builds a ScalarMixWithDropout object.
use_raw_hidden_states – Whether to use raw hidden states from transformer without any pooling.
lr – Learning rate for this task.
separate_optimizer – Use a customized, separate optimizer for this task.
punct – True to include punctuation in evaluation.
pad_rel – Padding token for relations.
apply_constraint – Enforce constraints (see following parameters).
single_root – Force single root.
no_zero_head – Require every token to have at least one head.
n_mlp_arc – Number of features for arc representation.
n_mlp_rel – Number of features for rel representation.
mlp_dropout – Dropout applied to MLPs.
mu – First coefficient used for computing running averages of gradient and its square in Adam.
nu – Second coefficient used for computing running averages of gradient and its square in Adam.
epsilon – Term added to the denominator to improve numerical stability.
decay – Decay rate for the exponential lr scheduler.
decay_steps – Decay every decay_steps steps.
cls_is_bos – True to treat the first token as BOS.
use_pos – Use the POS feature.
**kwargs – Not used.
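For reference, a minimal construction sketch; the file paths are hypothetical placeholders, and the task still has to be wired into an MTL component with an encoder before training::

    from hanlp.components.mtl.tasks.sdp import BiaffineSemanticDependencyParsing

    # A sketch under assumed paths: construct the SDP task with the
    # defaults documented above, enabling the decoding constraints.
    sdp = BiaffineSemanticDependencyParsing(
        trn='data/semeval15/dm.train.conllu',  # hypothetical path
        dev='data/semeval15/dm.dev.conllu',    # hypothetical path
        tst='data/semeval15/dm.test.conllu',   # hypothetical path
        lr=2e-3,                # task-specific learning rate (default 0.002)
        apply_constraint=True,  # enforce single_root / no_zero_head when decoding
    )

At prediction time, a published multi-task model that includes this task exposes it by name, e.g. calling a model loaded via hanlp.load(...) with tasks='sdp'.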
- build_dataloader(data, transform: Optional[hanlp.common.transform.TransformList] = None, training=False, device=None, logger: Optional[logging.Logger] = None, gradient_accumulation=1, **kwargs) → torch.utils.data.dataloader.DataLoader
Build a dataloader for training or evaluation.
- Parameters
data – Either a path or a list of samples.
transform – The transform from MTL, which is usually [TransformerSequenceTokenizer, FieldLength(‘token’)]
training – Whether this method is called on training set.
device – The device dataloader is intended to work with.
logger – Logger for printing message indicating progress.
cache – Whether the dataloader should be cached.
gradient_accumulation – Gradient accumulation to be passed to sampler builder.
**kwargs – Additional experimental arguments.
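In practice this method is driven by the MTL trainer, but a direct-call sketch may help clarify the two data forms; sdp is the task instance from the construction sketch above, and the path is a hypothetical placeholder::

    # A sketch, not the exact internals: data may be a corpus path,
    # or a list of samples (dicts of fields) built by build_samples.
    loader = sdp.build_dataloader(
        data='data/semeval15/dm.train.conllu',  # hypothetical path
        training=True,
        device=0,                 # move batches to cuda:0
        gradient_accumulation=2,  # forwarded to the sampler builder
    )
    for batch in loader:
        # each batch is a dict mapping field names to padded tensors
        break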
- build_metric(**kwargs)
Implement this to build metric(s).
- Parameters
**kwargs – The subclass decides the method signature.
- build_model(encoder_size, training=True, **kwargs) → torch.nn.modules.module.Module
Build model.
- Parameters
training – True if called during training.
**kwargs – **self.config.
- build_optimizer(decoder: torch.nn.modules.module.Module, **kwargs)
Implement this method to build an optimizer.
- Parameters
**kwargs – The subclass decides the method signature.
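The optimizer hyperparameters documented in the constructor (mu, nu, epsilon, decay, decay_steps) follow Dozat's recipe: Adam with betas=(mu, nu) and the learning rate multiplied by decay every decay_steps steps. A standalone sketch of that recipe, not the actual method body::

    import torch

    def build_biaffine_optimizer(decoder: torch.nn.Module,
                                 lr=2e-3, mu=0.9, nu=0.9, epsilon=1e-12,
                                 decay=0.75, decay_steps=5000):
        # Adam with Dozat-style coefficients: betas=(mu, nu), eps=epsilon.
        optimizer = torch.optim.Adam(decoder.parameters(), lr=lr,
                                     betas=(mu, nu), eps=epsilon)
        # Stepped once per training step, this multiplies the lr by
        # decay**(1/decay_steps) each step, i.e. lr * decay**(step/decay_steps).
        scheduler = torch.optim.lr_scheduler.ExponentialLR(
            optimizer, gamma=decay ** (1 / decay_steps))
        return optimizer, scheduler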
- build_samples(inputs, cls_is_bos=False, sep_is_eos=False)
Build samples for this task. Called when this task is the first task. The default behaviour is to take inputs as lists of tokens and put each token list into a dict per sample (see the sketch below).
- Parameters
inputs – Inputs from users, usually a list of lists of tokens.
cls_is_bos – Insert BOS at the head of each sentence.
sep_is_eos – Append EOS to the tail of each sentence.
- Returns
List of samples.
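A sketch of the default behaviour described above; the 'token' field name follows the convention used elsewhere in these docs (e.g. FieldLength('token')), and the BOS/EOS strings are placeholder symbols::

    def build_samples_sketch(inputs, cls_is_bos=False, sep_is_eos=False):
        # inputs: a list of lists of tokens, e.g. [['I', 'like', 'tea']]
        samples = []
        for tokens in inputs:
            if cls_is_bos:
                tokens = ['[BOS]'] + tokens  # placeholder BOS symbol
            if sep_is_eos:
                tokens = tokens + ['[EOS]']  # placeholder EOS symbol
            samples.append({'token': tokens})  # one dict per sample
        return samples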