conll

class hanlp_common.conll.CoNLLWord(id, form, lemma=None, cpos=None, pos=None, feats=None, head=None, deprel=None, phead=None, pdeprel=None)[source]

CoNLL (Buchholz & Marsi 2006) format template, see http://anthology.aclweb.org/W/W06/W06-2920.pdf

Parameters
  • id (int) – Token counter, starting at 1 for each new sentence.

  • form (str) – Word form or punctuation symbol.

  • lemma (str) – Lemma or stem (depending on the particular treebank) of word form, or an underscore if not available.

  • cpos (str) – Coarse-grained part-of-speech tag, where the tagset depends on the treebank.

  • pos (str) – Fine-grained part-of-speech tag, where the tagset depends on the treebank.

  • feats (str) – Unordered set of syntactic and/or morphological features (depending on the particular treebank), or an underscore if not available.

  • head (Union[int, List[int]]) – Head of the current token, which is either a value of ID, or zero (’0’) if the token links to the virtual root node of the sentence.

  • deprel (Union[str, List[str]]) – Dependency relation to the HEAD.

  • phead (int) – Projective head of current token, which is either a value of ID or zero (’0’), or an underscore if not available.

  • pdeprel (str) – Dependency relation to the PHEAD, or an underscore if not available.

get_pos()[source]

Get the most precise part-of-speech tag for this word.

Returns: self.pos or self.cpos.

property nonempty_fields

Get the values of nonempty fields as a list.

class hanlp_common.conll.CoNLLUWord(id: Union[int, str], form, lemma=None, upos=None, xpos=None, feats=None, head=None, deprel=None, deps=None, misc=None)[source]

CoNLL-U format template, see https://universaldependencies.org/format.html

Parameters
  • id (Union[int, str]) – Token counter, starting at 1 for each new sentence.

  • form (Union[str, None]) – Word form or punctuation symbol.

  • lemma (str) – Lemma or stem (depending on the particular treebank) of word form, or an underscore if not available.

  • upos (str) – Universal part-of-speech tag.

  • xpos (str) – Language-specific part-of-speech tag; underscore if not available.

  • feats (str) – List of morphological features from the universal feature inventory or from a defined language-specific extension; underscore if not available.

  • head (int) – Head of the current token, which is either a value of ID, or zero (’0’) if the token links to the virtual root node of the sentence.

  • deprel (str) – Dependency relation to the HEAD.

  • deps (Union[List[Tuple[int, str]], str]) – Enhanced dependency graph in the form of a list of head-deprel pairs, or an underscore if not available.

  • misc (str) – Any other annotation, or an underscore if not available.
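
To illustrate how the ten columns of a CoNLL-U token line map onto these parameters, here is a minimal sketch in plain Python. It does not use hanlp_common: parse_conllu_line and FIELDS are hypothetical names introduced for illustration, and the treatment of DEPS as head:deprel pairs separated by "|" follows the CoNLL-U format page linked above.

```python
# The ten CoNLL-U columns, lower-cased to match the parameter names above.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats",
          "head", "deprel", "deps", "misc"]


def parse_conllu_line(line: str) -> dict:
    """Split one tab-separated CoNLL-U token line into a dict keyed by field name.

    Hypothetical helper for illustration only; not part of hanlp_common.
    """
    cells = line.rstrip("\n").split("\t")
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(cells)}")
    token = dict(zip(FIELDS, cells))

    # An underscore marks an unavailable value. Simplification: a lemma that
    # is literally "_" is also mapped to None here; id and form are kept as-is.
    for name in FIELDS[2:]:
        if token[name] == "_":
            token[name] = None

    # Plain word ids are integers; multiword ranges such as "1-2" and
    # empty-node ids such as "1.1" are left as strings.
    if token["id"].isdigit():
        token["id"] = int(token["id"])
    if token["head"] is not None:
        token["head"] = int(token["head"])

    # DEPS is a "|"-separated list of head:deprel pairs.
    if token["deps"] is not None:
        pairs = []
        for item in token["deps"].split("|"):
            head, rel = item.split(":", 1)
            pairs.append((int(head) if head.isdigit() else head, rel))
        token["deps"] = pairs
    return token


example = "2\tdogs\tdog\tNOUN\tNNS\tNumber=Plur\t3\tnsubj\t3:nsubj|5:nsubj\t_"
token = parse_conllu_line(example)
```

The resulting dict uses the same keys as the constructor parameters, so under that assumption it could be unpacked into a word object with keyword arguments.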

get_pos()[source]

Get the most precise part-of-speech tag for this word.

Returns: self.xpos or self.upos

property nonempty_fields

Get the values of nonempty fields as a list.

class hanlp_common.conll.CoNLLSentence(words=None)[source]

A list of CoNLLWord or CoNLLUWord objects. It is a subclass of list, so its words can be accessed in the same way as ordinary list elements.

Parameters

words (list[Union[CoNLLWord, CoNLLUWord]]) – A list of words.

static from_dict(d: dict, conllu=False)[source]

Build a CoNLLSentence from a dict.

Parameters
  • d – A dict storing a list for each field, where each index corresponds to a token.

  • conllu – True to build CoNLLUWord for each token.

Returns

A CoNLLSentence.
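
The "one list per field, one index per token" layout described above can be pictured with a small plain-Python sketch. It does not call from_dict; the helper rows_from_field_dict is hypothetical, and the keys form, head and deprel are assumed for illustration, since the exact keys the method expects are not listed here.

```python
def rows_from_field_dict(d: dict) -> list:
    """Transpose {'field': [v0, v1, ...]} into one dict per token.

    Hypothetical helper for illustration only; not part of hanlp_common.
    """
    lengths = {len(values) for values in d.values()}
    if len(lengths) > 1:
        raise ValueError("every field must hold exactly one value per token")
    n = lengths.pop() if lengths else 0
    # Index i across all field lists describes token i.
    return [{field: values[i] for field, values in d.items()} for i in range(n)]


# Assumed field names; each list is aligned by token index.
sentence = {
    "form": ["I", "run"],
    "head": [2, 0],
    "deprel": ["nsubj", "root"],
}
rows = rows_from_field_dict(sentence)
```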

static from_file(path: str, conllu=False)[source]

Build a CoNLLSentence from a .conllx or .conllu file.

Parameters
  • path – Path to the file.

  • conllu – True to build CoNLLUWord for each token.

Returns

A CoNLLSentence.

static from_str(conll: str, conllu=False)[source]

Build a CoNLLSentence from a CoNLL-X or CoNLL-U format string.

Parameters
  • conll (str) – CoNLL-X or CoNLL-U format string

  • conllu – True to build CoNLLUWord for each token.

Returns

A CoNLLSentence.

property projective

True if this tree is projective.
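
For readers unfamiliar with the term: a dependency tree is commonly called projective when no two arcs cross if they are drawn above the sentence, with the virtual root placed at position 0. The sketch below implements that textbook no-crossing check in plain Python over a list of head indices. It is an assumption that the property behaves the same way; is_projective is a hypothetical standalone function, not the library's implementation.

```python
from typing import List


def is_projective(heads: List[int]) -> bool:
    """Return True if no two dependency arcs cross.

    heads[i] is the head of token i + 1; 0 denotes the virtual root.
    Hypothetical helper for illustration only; not part of hanlp_common.
    """
    # Represent each arc by its (left, right) endpoints, ignoring direction.
    arcs = [tuple(sorted((head, dep))) for dep, head in enumerate(heads, start=1)]
    for left1, right1 in arcs:
        for left2, right2 in arcs:
            # Two arcs cross when exactly one endpoint of the second lies
            # strictly inside the first.
            if left1 < left2 < right1 < right2:
                return False
    return True


projective_heads = [2, 0, 2]         # 1 <- 2 -> 3, token 2 attached to root
crossing_heads = [3, 4, 0, 3]        # arcs 1-3 and 2-4 cross
```

The pairwise comparison is quadratic in sentence length, which is usually acceptable for single sentences.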

to_markdown(headings: Union[str, List[str]] = 'auto') → str[source]

Convert into markdown string.

Parameters

headings – 'auto' to automatically detect the word type. When passed a list of strings, they are treated as the headings for each field.

Returns

A markdown representation of this sentence.
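
As a rough picture of what "headings for each field" means, the following plain-Python sketch renders one table row per token under a caller-supplied list of headings. The exact layout that to_markdown emits is not shown in this reference, so the output format here is an assumption, and markdown_table is a hypothetical helper rather than a library function.

```python
from typing import List, Sequence


def markdown_table(headings: List[str], rows: Sequence[Sequence[object]]) -> str:
    """Render rows as a pipe-delimited markdown table with one heading per field.

    Hypothetical helper for illustration only; not part of hanlp_common.
    """
    lines = [
        "| " + " | ".join(headings) + " |",
        "| " + " | ".join("---" for _ in headings) + " |",
    ]
    for row in rows:
        if len(row) != len(headings):
            raise ValueError("each row needs one value per heading")
        # Show missing values as an underscore, as CoNLL files do.
        cells = ("_" if value is None else str(value) for value in row)
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


table = markdown_table(["ID", "FORM", "HEAD"], [[1, "I", 2], [2, "run", 0]])
```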

to_tree(extras: Optional[List[str]] = None) → str[source]

Convert into a pretty tree string which can be printed to show the tree structure.

Parameters

extras – Extra table to be aligned to this tree.

Returns

A pretty tree string, along with the extra table if one was passed.