CLEANEVAL home page
CLEANEVAL is a shared task and competitive evaluation on the topic of cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus in linguistic and language technology research and development.
The first Cleaneval, covering Chinese and English, took place over the summer of 2007, with a workshop in Belgium in September (the 3rd Web as Corpus workshop, WAC3; proceedings here).
- Report on Cleaneval-1, including results, here.
- Annotation guidelines here.
- Scoring software here.
- Development dataset here.
- Mailing list: discussion and announcements will take place on the SIGWAC list.
- Co-ordinators
CLEANEVAL is an activity of ACL-SIGWAC, the Association for Computational Linguistics (ACL) Special Interest Group on Web as Corpus.
Last modified: Fri Nov 16 12:40:02 GMT Standard Time 2007