References

Bojanowski et al. 2017

Bojanowski P., Grave E., Joulin A., & Mikolov T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (2017).

Buchholz & Marsi 2006

Buchholz S. & Marsi E. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), 149–164. New York City, June 2006. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/W06-2920.

Chang et al. 2009

Chang P., Tseng H., Jurafsky D., & Manning C. Discriminative reordering with Chinese grammatical relations features. In Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009, 51–59. Boulder, Colorado, June 2009. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/W09-2307.

Clark et al. 2019

Clark K., Luong M., Khandelwal U., Manning C., & Le Q. BAM! born-again multi-task networks for natural language understanding. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 5931–5937. Florence, Italy, July 2019. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/P19-1595, doi:10.18653/v1/P19-1595.

Clark et al. 2020

Clark K., Luong M., Le Q., & Manning C. ELECTRA: pre-training text encoders as discriminators rather than generators. In ICLR. 2020. URL: https://openreview.net/pdf?id=r1xMH1BtvB.

Collins & Koo 2005

Collins M. & Koo T. Discriminative reranking for natural language parsing. Computational Linguistics 31, 25–70 (2005).

Conneau et al. 2020

Conneau A., Khandelwal K., Goyal N., Chaudhary V., Wenzek G., et al. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 8440–8451. Online, July 2020. Association for Computational Linguistics. URL: https://aclanthology.org/2020.acl-main.747, doi:10.18653/v1/2020.acl-main.747.

De 1959

De La Briandais R. File searching using variable length keys. In Papers Presented at the March 3-5, 1959, Western Joint Computer Conference, IRE-AIEE-ACM ’59 (Western), 295–298. New York, NY, USA, 1959. Association for Computing Machinery. URL: https://doi.org/10.1145/1457838.1457895, doi:10.1145/1457838.1457895.

Devlin et al. 2019

Devlin J., Chang M., Lee K., & Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186. Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/N19-1423, doi:10.18653/v1/N19-1423.

Dozat & Manning 2017

Dozat T. & Manning C. Deep Biaffine Attention for Neural Dependency Parsing. In Proceedings of the 5th International Conference on Learning Representations, ICLR’17. 2017. URL: https://openreview.net/pdf?id=Hk95PK9le.

Dozat et al. 2017

Dozat T., Qi P., & Manning C. Stanford’s graph-based neural dependency parser at the CoNLL 2017 shared task. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 20–30. 2017.

He & Choi 2020

He H. & Choi J. Establishing strong baselines for the new decade: sequence tagging, syntactic and semantic parsing with BERT. In The Thirty-Third International FLAIRS Conference. 2020. URL: https://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS20/paper/view/18438.

He et al. 2019

He H., Wu L., Yan H., Gao Z., Feng Y., et al. Effective neural solution for multi-criteria word segmentation. In Smart Intelligent Computing and Applications, 133–142. Springer, 2019.

He et al. 2018a

He H., Wu L., Yang X., Yan H., Gao Z., et al. Dual long short-term memory networks for sub-character representation learning. In Information Technology-New Generations, 421–426. Springer, 2018a.

He et al. 2018b

He L., Lee K., Levy O., & Zettlemoyer L. Jointly predicting predicates and arguments in neural semantic role labeling. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 364–369. Melbourne, Australia, July 2018b. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/P18-2058, doi:10.18653/v1/P18-2058.

Koehn 2005

Koehn P. Europarl: a parallel corpus for statistical machine translation. In MT Summit, volume 5, 79–86. Citeseer, 2005.

Kondratyuk & Straka 2019

Kondratyuk D. & Straka M. 75 languages, 1 model: parsing universal dependencies universally. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2779–2795. Hong Kong, China, 2019. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/D19-1279.

Lafferty et al. 2001

Lafferty J., McCallum A., & Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. Departmental Papers (CIS), (2001).

Lan et al. 2020

Lan Z., Chen M., Goodman S., Gimpel K., Sharma P., et al. ALBERT: a lite BERT for self-supervised learning of language representations. In International Conference on Learning Representations. 2020. URL: https://openreview.net/forum?id=H1eA7AEtvS.

Levow 2006

Levow G. The third international Chinese language processing bakeoff: word segmentation and named entity recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, 108–117. Sydney, Australia, July 2006. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/W06-0115.

Pennington et al. 2014

Pennington J., Socher R., & Manning C. GloVe: global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. Doha, Qatar, October 2014. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/D14-1162, doi:10.3115/v1/D14-1162.

Pradhan et al. 2012

Pradhan S., Moschitti A., Xue N., Uryupina O., & Zhang Y. CoNLL-2012 shared task: modeling multilingual unrestricted coreference in OntoNotes. In Joint Conference on EMNLP and CoNLL - Shared Task, 1–40. Jeju Island, Korea, July 2012. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/W12-4501.

Schweter & Ahmed 2019

Schweter S. & Ahmed S. Deep-EOS: General-Purpose Neural Networks for Sentence Boundary Detection. In Proceedings of the 15th Conference on Natural Language Processing (KONVENS). 2019.

Smith & Smith 2007

Smith D. & Smith N. Probabilistic models of nonprojective dependency trees. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 132–140. Prague, Czech Republic, June 2007. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/D07-1014.

Tjong & De 2003

Tjong Kim Sang E. & De Meulder F. Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, 142–147. 2003. URL: https://www.aclweb.org/anthology/W03-0419.

Wang & Xu 2017

Wang C. & Xu B. Convolutional neural network with word embeddings for Chinese word segmentation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 163–172. Taipei, Taiwan, November 2017. Asian Federation of Natural Language Processing. URL: https://www.aclweb.org/anthology/I17-1017.

Xiao et al. 2021

Xiao D., Li Y., Zhang H., Sun Y., Tian H., et al. ERNIE-gram: pre-training with explicitly n-gram masked language modeling for natural language understanding. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1702–1715. Online, June 2021. Association for Computational Linguistics. URL: https://aclanthology.org/2021.naacl-main.136, doi:10.18653/v1/2021.naacl-main.136.

Xue et al. 2021

Xue L., Constant N., Roberts A., Kale M., Al-Rfou R., et al. mT5: a massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 483–498. Online, June 2021. Association for Computational Linguistics. URL: https://aclanthology.org/2021.naacl-main.41, doi:10.18653/v1/2021.naacl-main.41.

Xue et al. 2016

Xue N., Zhang X., Jiang Z., Palmer M., Xia F., et al. Chinese treebank 9.0. 2016. URL: https://catalog.ldc.upenn.edu/LDC2016T13, doi:10.35111/GVD0-XK91.

Yu et al. 2020

Yu J., Bohnet B., & Poesio M. Named entity recognition as dependency parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6470–6476. Online, July 2020. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/2020.acl-main.577, doi:10.18653/v1/2020.acl-main.577.

Zhang et al. 2020

Zhang Y., Zhou H., & Li Z. Fast and accurate neural CRF constituency parsing. In Bessiere C., editor, Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, 4046–4053. International Joint Conferences on Artificial Intelligence Organization, July 2020. Main track. URL: https://doi.org/10.24963/ijcai.2020/560, doi:10.24963/ijcai.2020/560.

Zhang & Clark 2008

Zhang Y. & Clark S. A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 562–571. Honolulu, Hawaii, October 2008. Association for Computational Linguistics. URL: https://www.aclweb.org/anthology/D08-1059.