Tag Archives: NLP

Java FST framework API Review

Foreword This article summarizes and updates several previous articles [1] on the implementation of a Java weighted finite-state transducer (WFST) framework that can load existing OpenFst [2] models or export Java FST objects to the OpenFst format. The framework is available in the CMUSphinx SVN repository at [3]. The following sections include brief descriptions of…

Automating the creation of joint multigram language models as WFST: Part 2

(originally posted at http://cmusphinx.sourceforge.net/2012/06/automating-the-creation-of-joint-multigram-language-models-as-wfst-part-2/) Foreword This article presents an updated version of the model training application originally discussed in [1], taking into account the compatibility issues with the phonetisaurus decoder presented in [2]. The updated code introduces routines that regenerate a binary FST model compatible with phonetisaurus’ decoder, as suggested in [2], which will be…

Compatibility issues using binary fst models generated by OpenGrm NGram Library with phonetisaurus decoder

(originally posted at http://cmusphinx.sourceforge.net/2012/06/compatibility-issues-using-binary-fst-models-generated-by-opengrm-ngram-library-with-phonetisaurus-decoder/) Foreword Previous articles have shown how to use the OpenGrm NGram Library to encode joint multigram language models as WFST [1] and provided code that simplifies and automates the FST model training [2]. As described in [1], the binary FST models generated with the procedures described in those articles…

Automating the creation of joint multigram language models as WFST

Notice: This article is outdated. The application described here is now part of the SphinxTrain application. Please refer to recent articles in the CMUSphinx category for the latest info. (originally posted at http://cmusphinx.sourceforge.net/2012/06/automating-the-creation-of-joint-multigram-language-models-as-wfst/) Foreword Previous articles have introduced the C++ code to align a pronunciation dictionary [1] and how this aligned dictionary can be used in…

Using OpenGrm NGram Library for the encoding of joint multigram language models as WFST

(originally posted at http://cmusphinx.sourceforge.net/2012/06/using-opengrm-ngram-library-for-the-encoding-of-joint-multigram-language-models-as-wfst/) Foreword This article reviews the OpenGrm NGram Library [1] and its use for language modeling in ASR. OpenGrm builds on the OpenFst library [2] to create, access and manipulate n-gram language models, and it can be used as the language model training toolkit for integrating phonetisaurus’ model…

Porting phonetisaurus many-to-many alignment python script to C++

Notice: This article is outdated. The application described here is now part of the SphinxTrain application. Please refer to recent articles in the CMUSphinx category for the latest info. (originally posted at http://cmusphinx.sourceforge.net/2012/05/porting-phonetisaurus-many-to-many-alignment-python-script-to-c/) Foreword Following our previous article on phonetisaurus [1] and the decision to use this framework as the g2p conversion method for my GSoC…