conll
- class hanlp_common.conll.CoNLLWord(id, form, lemma=None, cpos=None, pos=None, feats=None, head=None, deprel=None, phead=None, pdeprel=None)
CoNLL (Buchholz & Marsi 2006) format template, see http://anthology.aclweb.org/W/W06/W06-2920.pdf
- Parameters
id (int) – Token counter, starting at 1 for each new sentence.
form (str) – Word form or punctuation symbol.
lemma (str) – Lemma or stem (depending on the particular treebank) of word form, or an underscore if not available.
cpos (str) – Coarse-grained part-of-speech tag, where the tagset depends on the treebank.
pos (str) – Fine-grained part-of-speech tag, where the tagset depends on the treebank.
feats (str) – Unordered set of syntactic and/or morphological features (depending on the particular treebank), or an underscore if not available.
head (Union[int, List[int]]) – Head of the current token, which is either a value of ID, or zero (’0’) if the token links to the virtual root node of the sentence.
deprel (Union[str, List[str]]) – Dependency relation to the HEAD.
phead (int) – Projective head of current token, which is either a value of ID or zero (’0’), or an underscore if not available.
pdeprel (str) – Dependency relation to the PHEAD, or an underscore if not available.
- property nonempty_fields
Get the values of nonempty fields as a list.
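The ten parameters above mirror the ten tab-separated columns of a CoNLL-X row. As a rough illustration of that layout, the stand-alone sketch below splits one row into named fields and collects the values that are present. It does not use or reproduce hanlp_common: the helper names parse_conllx_line and nonempty_values are hypothetical, and mapping an underscore to None is an assumption made for this example.

```python
# Illustrative only: a stand-alone sketch, not hanlp_common's implementation.
CONLLX_FIELDS = ["id", "form", "lemma", "cpos", "pos",
                 "feats", "head", "deprel", "phead", "pdeprel"]


def parse_conllx_line(line):
    """Split one tab-separated CoNLL-X row into a dict keyed by field name.

    An underscore marks an unavailable value and is mapped to None here
    (an assumption of this sketch).
    """
    cells = line.rstrip("\n").split("\t")
    if len(cells) != len(CONLLX_FIELDS):
        raise ValueError(f"expected 10 columns, got {len(cells)}")
    row = {name: (None if cell == "_" else cell)
           for name, cell in zip(CONLLX_FIELDS, cells)}
    # ID, HEAD and PHEAD are integers when present.
    for name in ("id", "head", "phead"):
        if row[name] is not None:
            row[name] = int(row[name])
    return row


def nonempty_values(row):
    """Return the values of fields that are not None, in column order."""
    return [row[name] for name in CONLLX_FIELDS if row[name] is not None]


row = parse_conllx_line("1\tDogs\tdog\tN\tNNS\t_\t2\tnsubj\t_\t_")
print(nonempty_values(row))
```

Keeping the field names in a single ordered list means the column order is defined in one place, both for parsing and for listing the non-empty values.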
- class hanlp_common.conll.CoNLLUWord(id: Union[int, str], form, lemma=None, upos=None, xpos=None, feats=None, head=None, deprel=None, deps=None, misc=None)
CoNLL-U format template, see https://universaldependencies.org/format.html
- Parameters
id (Union[int, str]) – Word index, an integer starting at 1 for each new sentence; may be a range for multiword tokens or a decimal number for empty nodes.
form (Union[str, None]) – Word form or punctuation symbol.
lemma (str) – Lemma or stem (depending on the particular treebank) of word form, or an underscore if not available.
upos (str) – Universal part-of-speech tag.
xpos (str) – Language-specific part-of-speech tag; underscore if not available.
feats (str) – List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.
head (int) – Head of the current token, which is either a value of ID, or zero (’0’) if the token links to the virtual root node of the sentence.
deprel (str) – Dependency relation to the HEAD.
deps (Union[List[Tuple[int, str]], str]) – Enhanced dependency graph in the form of a list of head-deprel pairs, or an underscore if not available.
misc (str) – Any other annotation, or an underscore if not available.
- property nonempty_fields
Get the values of nonempty fields as a list.
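Two CoNLL-U columns explain the union types in the signature above. ID can be a plain integer, a range such as 1-2 for a multiword token, or a decimal such as 1.1 for an empty node. DEPS is a list of head:deprel pairs separated by vertical bars. The sketch below shows one way to turn the raw cells into Python values. The function names parse_id and parse_deps are hypothetical, and hanlp_common may handle these cells differently.

```python
# Illustrative only: a stand-alone sketch, not hanlp_common's implementation.
def parse_id(raw):
    """Return an int for a plain word index, else the original string.

    Ranges such as "1-2" (multiword tokens) and decimals such as "1.1"
    (empty nodes) are kept as strings.
    """
    return int(raw) if raw.isdigit() else raw


def parse_deps(raw):
    """Parse a DEPS cell such as "2:nsubj|4:conj" into (head, deprel) pairs.

    An underscore means the column is empty and yields None.
    """
    if raw == "_":
        return None
    pairs = []
    for item in raw.split("|"):
        # Split on the first colon only: relations may contain colons
        # themselves, e.g. "nsubj:pass".
        head, deprel = item.split(":", 1)
        # Heads that point at empty nodes (e.g. "5.1") stay as strings.
        pairs.append((int(head) if head.isdigit() else head, deprel))
    return pairs


print(parse_id("3"), parse_id("1-2"), parse_deps("2:nsubj|4:nsubj:pass"))
```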
- class hanlp_common.conll.CoNLLSentence(words=None)
A list of CoNLLWord or CoNLLUWord. It is a subclass of list, and its words can be accessed in the same way as list elements.
- Parameters
words (list[Union[CoNLLWord, CoNLLUWord]]) – A list of words.
- static from_dict(d: dict, conllu=False)
Build a CoNLLSentence from a dict.
- Parameters
d – A dict storing a list for each field, where each index corresponds to a token.
conllu – True to build CoNLLUWord for each token.
- Returns
A CoNLLSentence.
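The input to from_dict is column-oriented: one list per field, with index i of every list describing token i. The stand-alone sketch below shows that layout and how it transposes into one record per token. It does not call hanlp_common. The function rows_from_field_lists is hypothetical, and the keys form, head and deprel are assumed for illustration because the text above does not list the exact keys the library expects.

```python
# Illustrative only: a stand-alone sketch, not hanlp_common's implementation.
def rows_from_field_lists(d):
    """Turn {"field": [v0, v1, ...]} into a list with one dict per token."""
    lengths = {len(values) for values in d.values()}
    if len(lengths) > 1:
        raise ValueError("all field lists must have the same length")
    n = lengths.pop() if lengths else 0
    return [{name: values[i] for name, values in d.items()} for i in range(n)]


# Assumed field names, chosen to match the parameter names documented above.
d = {"form": ["Dogs", "bark"], "head": [2, 0], "deprel": ["nsubj", "root"]}
rows = rows_from_field_lists(d)
for token_id, row in enumerate(rows, start=1):
    # Token IDs start at 1 for each sentence.
    print(token_id, row)
```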
- static from_file(path: str, conllu=False)
Build a CoNLLSentence from a .conllx or .conllu file.
- Parameters
path – Path to the file.
conllu – True to build CoNLLUWord for each token.
- Returns
A CoNLLSentence.
- static from_str(conll: str, conllu=False)
Build a CoNLLSentence from a CoNLL-X or CoNLL-U format string.
- Parameters
conll (str) – CoNLL-X or CoNLL-U format string.
conllu – True to build CoNLLUWord for each token.
- Returns
A CoNLLSentence.
- property projective
True if this tree is projective.
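A dependency tree is usually called projective when none of its arcs cross once the tokens are laid out in sentence order, with the virtual root at position 0. The sketch below checks that condition from a list of head indices. It shows the general definition only: the function is_projective is hypothetical and is not taken from hanlp_common, whose property may be computed differently.

```python
# Illustrative only: a stand-alone sketch, not hanlp_common's implementation.
def is_projective(heads):
    """Return True if no two dependency arcs cross.

    ``heads[i]`` is the head of token ``i + 1``; 0 denotes the virtual root.
    """
    # Represent every arc by its (left, right) endpoints in sentence order.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a_left, a_right in arcs:
        for b_left, b_right in arcs:
            # Arcs cross when one starts strictly inside the other
            # and ends strictly outside it.
            if a_left < b_left < a_right < b_right:
                return False
    return True


print(is_projective([2, 0, 2]))     # arcs nest or share endpoints
print(is_projective([3, 4, 0, 3]))  # arcs 1-3 and 2-4 cross
```

The pairwise comparison is quadratic in sentence length, which keeps the sketch short and is normally acceptable for single sentences.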