mlm

A Masked Language Model (MLM) predicts words that have been intentionally hidden (masked) in a sentence. To perform such a prediction, first load a pre-trained MLM (e.g., bert-base-chinese):

from hanlp.components.lm.mlm import MaskedLanguageModel
mlm = MaskedLanguageModel()
mlm.load('bert-base-chinese')  # weights are downloaded automatically on first use

Represent blanks (masked tokens) with [MASK] and let the MLM fill them in:

mlm('生活的真谛是[MASK]。')
[{'美': 0.3407564163208008,
  '爱': 0.2292439043521881,
  '乐': 0.032554809004068375,
  '人': 0.022961532697081566,
  ':': 0.01942446455359459,
  '笑': 0.017893701791763306,
  '-': 0.016441352665424347,
  '玩': 0.016314353793859482,
  '活': 0.014588544145226479,
  '好': 0.013642454519867897}]
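
Each [MASK] yields a dict mapping candidate tokens to their probabilities. A minimal sketch of picking the top candidate (variable names here are illustrative):

predictions = mlm('生活的真谛是[MASK]。')
for candidates in predictions:  # one dict per [MASK] in the input
    best = max(candidates, key=candidates.get)
    print(best, candidates[best])  # 美 0.3407564163208008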

Batching multiple sentences into a single call is usually faster than predicting them one at a time:

mlm(['生活的真谛是[MASK]。', '巴黎是[MASK][MASK]的首都。'])
[[{'美': 0.3407564163208008,
   '爱': 0.2292439043521881,
   '乐': 0.032554809004068375,
   '人': 0.022961532697081566,
   ':': 0.01942446455359459,
   '笑': 0.017893701791763306,
   '-': 0.016441352665424347,
   '玩': 0.016314353793859482,
   '活': 0.014588544145226479,
   '好': 0.013642454519867897}],
 [{'法': 0.5057310461997986,
   '德': 0.08851869404315948,
   '歐': 0.06904969364404678,
   '巴': 0.04269423708319664,
   '瑞': 0.039870887994766235,
   '英': 0.03201477229595184,
   '美': 0.02721557952463627,
   '荷': 0.02194151096045971,
   '中': 0.018307117745280266,
   '欧': 0.011474725790321827},
  {'國': 0.6981891989707947,
   '国': 0.10869748890399933,
   '洲': 0.03609883040189743,
   '蘭': 0.013893415220081806,
   '臘': 0.010245074518024921,
   '士': 0.009544524364173412,
   '盟': 0.00916974525898695,
   '西': 0.005254795774817467,
   '典': 0.004525361582636833,
   '邦': 0.0044594407081604}]]
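
To fill the blanks back into a sentence, each [MASK] can be replaced, left to right, with its highest-probability candidate. A minimal sketch (the fill_masks helper is illustrative, not part of HanLP):

def fill_masks(sentence, mask_predictions):
    # Replace each [MASK] with its top-scoring candidate, in order.
    for candidates in mask_predictions:
        best = max(candidates, key=candidates.get)
        sentence = sentence.replace('[MASK]', best, 1)
    return sentence

sentences = ['生活的真谛是[MASK]。', '巴黎是[MASK][MASK]的首都。']
for sent, preds in zip(sentences, mlm(sentences)):
    print(fill_masks(sent, preds))
# 生活的真谛是美。
# 巴黎是法國的首都。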

All the pre-trained MLM models and their details are listed in the Hugging Face 🤗 Transformers documentation.
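
In principle, any compatible masked-LM checkpoint name from the hub can be passed to load in the same way; for example (assuming the checkpoint uses [MASK] as its mask token):

mlm.load('bert-base-multilingual-cased')
mlm('Paris is the capital of [MASK].')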