Tag Archives: OpenGrm

Using the grapheme-to-phoneme feature in CMU Sphinx-4

Foreword: This article summarizes and updates the previous articles [1] on the new grapheme-to-phoneme (g2p) feature in the CMU Sphinx-4 speech recognizer [2]. To support automatic g2p transcription in Sphinx-4, a new Java implementation of weighted finite-state transducers (WFST) was created [3], whose current API will be presented in a future…

Automating the creation of joint multigram language models as WFST: Part 2

(originally posted at http://cmusphinx.sourceforge.net/2012/06/automating-the-creation-of-joint-multigram-language-models-as-wfst-part-2/) Foreword: This article presents an updated version of the model training application originally discussed in [1], taking into account the compatibility issues with the phonetisaurus decoder presented in [2]. The updated code introduces routines to regenerate a new binary fst model compatible with phonetisaurus’ decoder, as suggested in [2], which will be…

Compatibility issues using binary fst models generated by OpenGrm NGram Library with phonetisaurus decoder

(originally posted at http://cmusphinx.sourceforge.net/2012/06/compatibility-issues-using-binary-fst-models-generated-by-opengrm-ngram-library-with-phonetisaurus-decoder/) Foreword: Previous articles have shown how to use the OpenGrm NGram Library to encode joint multigram language models as WFST [1] and provided code that simplifies and automates the fst model training [2]. As described in [1], the binary fst models generated with the procedures described in those articles…

Automating the creation of joint multigram language models as WFST

Notice: This article is outdated. The application described here is now part of the SphinxTrain application. Please refer to recent articles in the CMUSphinx category for the latest info. (originally posted at http://cmusphinx.sourceforge.net/2012/06/automating-the-creation-of-joint-multigram-language-models-as-wfst/) Foreword: Previous articles have introduced the C++ code to align a pronunciation dictionary [1] and how this aligned dictionary can be used in…

Using OpenGrm NGram Library for the encoding of joint multigram language models as WFST

(originally posted at http://cmusphinx.sourceforge.net/2012/06/using-opengrm-ngram-library-for-the-encoding-of-joint-multigram-language-models-as-wfst/) Foreword: This article reviews the OpenGrm NGram Library [1] and its use for language modeling in ASR. OpenGrm builds on functionality in the OpenFst library [2] to create, access, and manipulate n-gram language models, and it can be used as the language model training toolkit for integrating phonetisaurus’ model…