Wapiti - A simple and fast discriminative sequence labelling toolkit

Introduction

Wapiti is a very fast toolkit for segmenting and labeling sequences with discriminative models. It is based on maxent models, maximum entropy Markov models, and linear-chain CRFs, and offers various optimization and regularization methods that improve both the computational complexity and the prediction performance of standard models. Wapiti has been ranked first on the sequence tagging task for more than a year on the MLcomp web site.

Wapiti is developed by LIMSI-CNRS and was partially funded by ANR projects CroTaL (ANR-07-MDCO-003) and MGA (ANR-07-BLAN-0311-02).

For suggestions, comments, or patches, you can contact me at lavergne@limsi.fr

If you use Wapiti for research purposes, please use the following citation:

@inproceedings{lavergne2010practical,
	author    = {Lavergne, Thomas and Capp\'{e}, Olivier and Yvon, Fran\c{c}ois},
	title     = {Practical Very Large Scale {CRFs}},
	booktitle = {Proceedings of the 48th Annual Meeting of the Association for
	             Computational Linguistics ({ACL})},
	month     = {July},
	year      = {2010},
	location  = {Uppsala, Sweden},
	publisher = {Association for Computational Linguistics},
	pages     = {504--513},
	url       = {http://www.aclweb.org/anthology/P10-1052}
}

News

18/12/2013

Release v1.5.0: Update mode and bug fixes

09/03/2013

GIT repository moved to GitHub

23/04/2012

Release v1.4.0: Forced decoding, optimizer state, and bug fixes

Features

Download

Models

The following models can be downloaded and used with Wapiti. We provide models for POS-tagging of English, German, and Arabic data, as well as a model to jointly perform POS-tagging and segmentation of Arabic.

Regularization

Regularization improves the numerical stability of training, reduces the risk of overfitting, and allows relevant features to be selected automatically. Wapiti implements the most widely used regularization methods.
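
To make the feature-selection effect concrete, here is a minimal Python sketch of an elastic-net style penalty (an L1 term plus a squared L2 term) and of the soft-thresholding step that an L1 term typically induces. It is an illustration under stated assumptions, not Wapiti's implementation: the function names, the parameter names rho1 and rho2, and the toy weights are all chosen for this example.

```python
def elastic_net_penalty(weights, rho1, rho2):
    """Return rho1 * ||w||_1 + (rho2 / 2) * ||w||_2^2 (illustrative names)."""
    l1 = sum(abs(w) for w in weights)
    l2 = sum(w * w for w in weights)
    return rho1 * l1 + 0.5 * rho2 * l2


def soft_threshold(weights, threshold):
    """Shrink every weight toward zero by `threshold`.

    Weights whose magnitude does not exceed the threshold become exactly 0,
    which is how an L1 term discards weak features.
    """
    shrunk = []
    for w in weights:
        if w > threshold:
            shrunk.append(w - threshold)
        elif w < -threshold:
            shrunk.append(w + threshold)
        else:
            shrunk.append(0.0)
    return shrunk


# Toy weight vector, made up for this example.
weights = [2.0, -0.05, 0.3, -1.5]
penalty = elastic_net_penalty(weights, rho1=0.5, rho2=0.1)
sparse = soft_threshold(weights, threshold=0.1)
print(penalty)
print(sparse)
```

In this sketch the second weight falls inside the threshold and is set to exactly zero, while the others are only shrunk. The L2 term, by contrast, penalizes large weights without removing any feature.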

Optimization algorithms

Learning a CRF model amounts to numerically minimizing the empirical risk. Wapiti provides several optimization algorithms.
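
As an illustration of what numerically minimizing the empirical risk means, the following Python sketch fits a tiny binary maxent (logistic) model by plain gradient descent on the mean negative log-likelihood with an L2 penalty. Plain gradient descent is used here only as a simple stand-in and is not claimed to be one of Wapiti's optimizers; the data, function names, and hyperparameter values are assumptions made for the example.

```python
import math

# Toy binary data: (feature vector, label in {0, 1}). Purely illustrative.
DATA = [([1.0, 0.0], 1), ([1.0, 1.0], 1), ([0.0, 1.0], 0), ([0.0, 0.0], 0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def risk_and_gradient(w, rho2=0.1):
    """Mean negative log-likelihood plus (rho2 / 2) * ||w||^2, and its gradient."""
    risk = 0.5 * rho2 * sum(wi * wi for wi in w)
    grad = [rho2 * wi for wi in w]
    for x, y in DATA:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        risk -= (math.log(p) if y == 1 else math.log(1.0 - p)) / len(DATA)
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi / len(DATA)
    return risk, grad


def gradient_descent(steps=200, learning_rate=0.5):
    """Run fixed-step gradient descent from w = 0 and record the risk at each step."""
    w = [0.0, 0.0]
    history = []
    for _ in range(steps):
        risk, grad = risk_and_gradient(w)
        history.append(risk)
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
    return w, history


w, history = gradient_descent()
print(history[0], history[-1])
print(w)
```

The recorded risk starts at log 2 (all weights zero) and decreases as the weights adapt. A CRF replaces the per-example logistic likelihood with a sequence-level one, but the optimization problem has the same shape.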

Licence

Wapiti is licensed under the terms of the two-clause BSD Licence:

Copyright (c) 2009-2013  CNRS
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.