Rich and Dynamic Treebank for HPSG (Head-driven Phrase Structure Grammar)

From The Theme

What if we could advance the capabilities of Natural Language Processing (NLP) by developing a parsed text database, or treebank, that was rich, flexible and dynamic?


We set out to design and develop a dynamic parsed text database (or treebank) for collecting and analyzing natural language. Our goal was to allow linguistic data to be retrieved from the treebank at varying levels of granularity, and to enable regular updates so that the treebank could evolve over time.

The project, named LinGO Redwoods, built the foundations for this new type of treebank, including a set of software tools for annotation and dynamic updates, a database of hand-disambiguated analyses for 10,000 utterances, and baseline results for a variety of parse selection models. The results of this seeding activity serve as a proof of concept for the proposed methodology, enabling researchers to share the approach with a wider academic and industrial audience and to expand the treebank.
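The two design goals above, retrieval at varying granularity and regular updates, can be illustrated with a minimal sketch. This is not the LinGO Redwoods software or its data format: the `Node`, `view`, and `Treebank` names, the bracketed output, and the idea of keeping each revision as a numbered version are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a parse tree (hypothetical representation)."""
    label: str
    children: list["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaf nodes


def leaves(node):
    """Return the words under a node, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


def view(node, depth=None):
    """Render a tree as a bracketed string.

    depth=None shows the full tree; depth=k keeps k levels of structure
    below the root and flattens everything deeper into plain words.
    """
    if node.word is not None:
        return f"({node.label} {node.word})"
    if depth == 0:
        return f"({node.label} {' '.join(leaves(node))})"
    next_depth = None if depth is None else depth - 1
    inner = " ".join(view(child, next_depth) for child in node.children)
    return f"({node.label} {inner})"


class Treebank:
    """Toy store that keeps every revision of each utterance's analysis."""

    def __init__(self):
        self._versions = {}  # utterance id -> list of Node, oldest first

    def update(self, uid, tree):
        """Record a new analysis and return its 1-based version number."""
        self._versions.setdefault(uid, []).append(tree)
        return len(self._versions[uid])

    def retrieve(self, uid, depth=None, version=None):
        """Return the latest (or a given) version at the requested depth."""
        history = self._versions[uid]
        tree = history[-1] if version is None else history[version - 1]
        return view(tree, depth)


bank = Treebank()
bank.update("u1", Node("S", [Node("NP", [Node("N", word="Kim")]),
                             Node("VP", [Node("V", word="sleeps")])]))
coarse = bank.retrieve("u1", depth=1)  # phrase-level view
full = bank.retrieve("u1")             # complete analysis
```

Here the same stored analysis is read back either as a coarse phrase-level bracketing or as the full tree, and a later call to `update` adds a revision without discarding earlier ones.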

Project results at LinGO Redwoods

Stephan Oepen, Dan Flickinger, Kristina Toutanova, and Christopher D. Manning. 2002. LinGO Redwoods: A Rich and Dynamic Treebank for HPSG. In Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT2002), Sozopol, Bulgaria.

Kristina Toutanova, Christopher D. Manning, and Stephan Oepen. 2002. Parse Ranking for a Rich HPSG Grammar. In Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT2002), Sozopol, Bulgaria.

Stephan Oepen, Kristina Toutanova, Stuart Shieber, Christopher Manning, Dan Flickinger, and Thorsten Brants. 2002. The LinGO Redwoods Treebank: Motivation and Preliminary Applications. In Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan, pp. 1253-1257.

Kristina Toutanova and Christopher D. Manning. 2002. Feature Selection for a Rich HPSG Grammar Using Decision Trees. In Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002), Taipei, Taiwan.

Stephan Oepen, Ezra Callahan, Dan Flickinger, Christopher D. Manning, and Kristina Toutanova. 2002. LinGO Redwoods: A Rich and Dynamic Treebank for HPSG. Beyond PARSEVAL workshop at the Third International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas, Spain.

Christopher Manning is a Professor of Computer Science and Linguistics at Stanford University. His research goal is to build computers that can intelligently process, understand, and generate human language material. Manning is a leader in applying Deep Learning to Natural Language Processing, with well-known research on Tree Recursive Neural Networks, sentiment analysis, neural network dependency parsing, the GloVe model of word vectors, neural machine translation, and deep language understanding.

Stephan Oepen is Professor in Computational Linguistics at the University of Oslo, where he heads the Division for Language Technology. Since 2000, he has also been a Senior Researcher at the Center for the Study of Language and Information, Stanford University. He studied Linguistics, German and Russian Philology, Computer Science, and Computational Linguistics at Berlin, Volgograd, and Saarbrücken.

Carl Vogel is Associate Professor in Computational Linguistics at Trinity College, University of Dublin, where he also serves as the Director of the Trinity Centre for Computing and Language Studies. His work in computational linguistics as a cognitive science frequently draws upon evidence abstracted from Internet-accessed data, and accordingly he attends closely to the accompanying issues of research methodology. Dr. Vogel contributes in the areas of practical computation, representation of grammatical structures, and general HPSG theory (feature logic background, syntax-semantics interface).