mlm
A Masked Language Model (MLM) predicts words that have been intentionally masked out in a sentence.
To perform such a prediction, first load a pre-trained MLM (e.g., bert-base-chinese):
from hanlp.components.lm.mlm import MaskedLanguageModel

mlm = MaskedLanguageModel()
mlm.load('bert-base-chinese')  # a pre-trained Chinese BERT checkpoint hosted on Hugging Face
Represent blanks (masked tokens) with [MASK] and let the MLM fill them:
mlm('生活的真谛是[MASK]。')
[{'美': 0.3407564163208008,
'爱': 0.2292439043521881,
'乐': 0.032554809004068375,
'人': 0.022961532697081566,
':': 0.01942446455359459,
'笑': 0.017893701791763306,
'-': 0.016441352665424347,
'玩': 0.016314353793859482,
'活': 0.014588544145226479,
'好': 0.013642454519867897}]
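Each [MASK] yields one dict mapping candidate tokens to their probabilities, in the order the masks appear in the sentence. As a minimal sketch (the fill_mask helper below is illustrative, not part of HanLP), the top candidate can be used to fill in the blank:

def fill_mask(sentence, predictions):
    # predictions: one dict per [MASK], mapping candidate token -> probability
    for candidates in predictions:
        best = max(candidates, key=candidates.get)  # most probable token
        sentence = sentence.replace('[MASK]', best, 1)
    return sentence

fill_mask('生活的真谛是[MASK]。', mlm('生活的真谛是[MASK]。'))
'生活的真谛是美。'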
Batching multiple sentences is usually faster:
mlm(['生活的真谛是[MASK]。', '巴黎是[MASK][MASK]的首都。'])
[[{'美': 0.3407564163208008,
'爱': 0.2292439043521881,
'乐': 0.032554809004068375,
'人': 0.022961532697081566,
':': 0.01942446455359459,
'笑': 0.017893701791763306,
'-': 0.016441352665424347,
'玩': 0.016314353793859482,
'活': 0.014588544145226479,
'好': 0.013642454519867897}],
[{'法': 0.5057310461997986,
'德': 0.08851869404315948,
'歐': 0.06904969364404678,
'巴': 0.04269423708319664,
'瑞': 0.039870887994766235,
'英': 0.03201477229595184,
'美': 0.02721557952463627,
'荷': 0.02194151096045971,
'中': 0.018307117745280266,
'欧': 0.011474725790321827},
{'國': 0.6981891989707947,
'国': 0.10869748890399933,
'洲': 0.03609883040189743,
'蘭': 0.013893415220081806,
'臘': 0.010245074518024921,
'士': 0.009544524364173412,
'盟': 0.00916974525898695,
'西': 0.005254795774817467,
'典': 0.004525361582636833,
'邦': 0.0044594407081604}]]
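For batched input, the outer list holds one entry per sentence, and each entry again holds one dict per [MASK], so the same illustrative helper from above applies sentence by sentence:

sentences = ['生活的真谛是[MASK]。', '巴黎是[MASK][MASK]的首都。']
for sentence, predictions in zip(sentences, mlm(sentences)):
    print(fill_mask(sentence, predictions))
生活的真谛是美。
巴黎是法國的首都。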
All the pre-trained MLM models and their details are listed in the Hugging Face 🤗 Transformers documentation.