
Java FST framework API Review

Foreword

This article summarizes and updates various previous articles [1] related to the implementation of a Java weighted finite-state transducer framework that can use existing OpenFst [2] models or export Java FST objects to the OpenFst format, and which is available in the CMUSphinx SVN repository at [3].

The following sections include brief descriptions of the main parts and functionality of the framework. In addition to these descriptions, the full javadocs are available at [4].

1. Semirings

As described in [5], the weights of an FST's states and arcs may represent any set, as long as that set forms a semiring. The semiring-related classes are located in the edu.cmu.sphinx.fst.semiring package.

There are three different semiring implementations, TropicalSemiring, LogSemiring and ProbabilitySemiring, all inheriting from the abstract Semiring class and all accepting float values.
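As a quick illustration, the sketch below combines two tropical weights. It assumes the Semiring API exposes plus, times, zero and one methods (names as in OpenFst); treat the exact names and signatures as assumptions and check them against the javadocs [4].

import edu.cmu.sphinx.fst.semiring.Semiring;
import edu.cmu.sphinx.fst.semiring.TropicalSemiring;

public class SemiringDemo {
    public static void main(String[] args) {
        // Assumed API: plus/times/zero/one, as in OpenFst.
        // In the tropical semiring, plus is min and times is ordinary float addition.
        Semiring ring = new TropicalSemiring();
        float a = 1.5f;
        float b = 0.7f;
        System.out.println("a (+) b = " + ring.plus(a, b));  // 0.7
        System.out.println("a (x) b = " + ring.times(a, b)); // 2.2
        System.out.println("zero = " + ring.zero());         // identity element of plus
        System.out.println("one  = " + ring.one());          // identity element of times
    }
}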

2. Basic fst classes

The basic fst classes are located under the edu.cmu.sphinx.fst package.

There are a mutable and an immutable FST implementation, in the Fst and ImmutableFst classes respectively. The mutable Fst holds an ArrayList of State objects, allowing additions and deletions. On the other hand, the immutable ImmutableFst holds a fixed-size array of ImmutableState objects, which does not allow additions or deletions.

Similarly, a mutable State object holds its outgoing Arc objects in an ArrayList, allowing additions and deletions, in contrast with an ImmutableState, which holds its outgoing Arc objects in a fixed-size array that does not allow additions or deletions.

Finally, the Arc class implements the FST's arc functionality and consists essentially of properties with their getter and setter methods.
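To illustrate how these classes fit together, here is a minimal sketch that builds a two-state transducer. The constructors and methods used (Fst(Semiring), State(float), addState, setStart, addArc, Arc(int, int, float, State)) are assumptions based on the descriptions above; the javadocs [4] have the authoritative signatures.

import edu.cmu.sphinx.fst.Arc;
import edu.cmu.sphinx.fst.Fst;
import edu.cmu.sphinx.fst.State;
import edu.cmu.sphinx.fst.semiring.TropicalSemiring;

public class BuildFstDemo {
    public static void main(String[] args) {
        // Assumed constructors/methods; check the javadocs [4].
        TropicalSemiring ring = new TropicalSemiring();
        Fst fst = new Fst(ring);            // mutable FST backed by an ArrayList of states

        State s0 = new State(ring.zero());  // non-final: final weight set to the semiring's zero
        State s1 = new State(0.0f);         // final state with weight 0.0 (the semiring's one)
        fst.addState(s0);
        fst.addState(s1);
        fst.setStart(s0);

        // Arc(inputLabel, outputLabel, weight, nextState), with integer label ids
        s0.addArc(new Arc(1, 2, 0.5f, s1));
    }
}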

3. Fst operations

The supported FST operations are located under the edu.cmu.sphinx.fst.operations package and include the following classes (a combined usage sketch follows the list):

ArcSort Sorts the arcs in an FST per state. Sorting can be applied on either the input or the output labels, based on the provided comparator.

Compose Computes the composition of two Fsts. The two Fsts are augmented in order to avoid multiple epsilon paths in the resulting Fst. [6]

Connect Trims an Fst, removing states and arcs that are not on successful paths.

Determinize Determinizes an fst, providing an equivalent fst with the property that no state has two transitions with the same input label. For this algorithm, epsilon transitions are treated as regular symbols. [7]

ExtendFinal Adds a new final state with a 0.0 (Semiring’s 1) final weight and connects the current final states to it using epsilon transitions with weight equal to the original final state’s weight.

NShortestPaths Computes the n shortest paths in an fst, based on the shortest distances from each state to the final state. [8]

Project Projects an fst onto its domain or range by either copying each arc’s input label to its output label or vice versa.

Reverse Reverses an fst.

RmEpsilon Removes epsilon transitions from an fst. It returns a new epsilon-free fst and does not modify the original fst.
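As promised above, the following sketch chains a few of these operations together. The static entry points used (ArcSort.apply, Connect.apply, RmEpsilon.get, Determinize.get) and the ILabelCompare comparator are assumptions based on the class descriptions; consult the javadocs [4] for the exact API.

import edu.cmu.sphinx.fst.Fst;
import edu.cmu.sphinx.fst.operations.ArcSort;
import edu.cmu.sphinx.fst.operations.Connect;
import edu.cmu.sphinx.fst.operations.Determinize;
import edu.cmu.sphinx.fst.operations.ILabelCompare;
import edu.cmu.sphinx.fst.operations.RmEpsilon;

public class OperationsDemo {
    // Takes an fst built or imported as shown elsewhere and returns a trimmed,
    // epsilon-free, determinized equivalent.
    // Note: the static entry points below are assumed; see the javadocs [4].
    public static Fst normalize(Fst fst) {
        ArcSort.apply(fst, new ILabelCompare()); // sort each state's arcs on input labels
        Connect.apply(fst);                      // trim states/arcs not on successful paths
        Fst noEps = RmEpsilon.get(fst);          // new epsilon-free fst; original is untouched
        return Determinize.get(noEps);           // no state keeps two arcs with the same input label
    }
}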

4. Working with openFst models

The Convert class in the edu.cmu.sphinx.fst.openfst package provides the required methods to read (importFst) or write (exportFst) an openFst model in text format. The same package also contains two additional classes, Import and Export, which expose the import/export functionality through main methods so it can be used from a shell command.
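For example, importing an OpenFst text-format model and writing it back out under a new basename might look like the sketch below. It assumes importFst takes the model basename plus a semiring and exportFst takes an Fst plus a basename; verify the exact signatures and expected file names in the javadocs [4].

import edu.cmu.sphinx.fst.Fst;
import edu.cmu.sphinx.fst.openfst.Convert;
import edu.cmu.sphinx.fst.semiring.TropicalSemiring;

public class ConvertDemo {
    public static void main(String[] args) {
        // Assumed signatures; see the javadocs [4].
        // Presumably reads the text model plus its input/output symbol files for basename "model"
        Fst fst = Convert.importFst("model", new TropicalSemiring());
        // Writes the model back out in OpenFst text format under the basename "model-copy"
        Convert.exportFst(fst, "model-copy");
    }
}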

Conclusion

The Java FST framework described in this article, and its implemented functionality, were created to support the new grapheme-to-phoneme (g2p) feature in the CMU Sphinx-4 speech recognizer [9].

Its usage and extensive testing in the Sphinx-4 g2p decoder suggest that the Java FST framework and its implemented functionality are usable in general, although it may lack functionality required by other applications (e.g. additional operations).

References

[1] Java FST Framework

[2] OpenFst Library Home Page

[3] Java FST Framework SVN Repository

[4] FST Framework javadocs

[5] J. Salatas, “Porting openFST to java: Part 1”, ICT Research Blog, May 2012.

[6] M. Mohri, “Weighted automata algorithms”, Handbook of Weighted Automata. Springer, pp. 213-250, 2009.

[7] M. Mohri, “Finite-State Transducers in Language and Speech Processing”, Computational Linguistics, 23:2, 1997.

[8] M. Mohri, “Semiring Framework and Algorithms for Shortest-Distance Problems”, Journal of Automata, Languages and Combinatorics, 7(3), pp. 321-350, 2002.

[9] J. Salatas, “Using the grapheme-to-phoneme feature in CMU Sphinx-4”, ICT Research Blog, May 2012.

Using the grapheme-to-phoneme feature in CMU Sphinx-4

Foreword

This article summarizes and updates the previous articles [1] related to the new grapheme-to-phoneme (g2p) feature in the CMU Sphinx-4 speech recognizer [2].

In order to support automatic g2p transcription in Sphinx-4, a new weighted finite-state transducer (wfst) framework was created in Java [3]; its current API will be presented in a future article. Various new applications were also created, and their installation procedure and usage are presented in the following sections.

The procedures presented here were verified using openSuSE 12.1 x64 under a VirtualBox machine, but should apply to all recent linux distributions (either 32 or 64 bit). They assume that you are logged in as user test and that all required software is saved under the /home/test/cmusphinx directory. As a final note, the output of the various commands is omitted in this article, but it should be inspected for errors or other useful information, especially when troubleshooting.

1. Installation

1.1. Required 3rd party libraries and applications

The following 3rd-party libraries should be installed on your system before installing and running the main applications. Note that they are only required in order to train new g2p models; they are not required if you just want to use a g2p model in Sphinx-4.

1.1.1. OpenFst

OpenFst [4] is a library written in C++ for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). You can download the latest version available at [4]. This article uses version 1.3.2.

test@linux:~/cmusphinx> wget http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.3.2.tar.gz
...
test@linux:~/cmusphinx> tar -xzf openfst-1.3.2.tar.gz
test@linux:~/cmusphinx> cd openfst-1.3.2/
test@linux:~/cmusphinx/openfst-1.3.2> ./configure --enable-compact-fsts --enable-const-fsts --enable-far --enable-lookahead-fsts --enable-pdt
...
test@linux:~/cmusphinx/openfst-1.3.2> make
...
test@linux:~/cmusphinx/openfst-1.3.2> sudo make install
...
test@linux:~/cmusphinx/openfst-1.3.2> cd ..

1.1.2. OpenGrm NGram

The OpenGrm NGram library [5] is used for making and modifying n-gram language models encoded as weighted finite-state transducers (FSTs). It makes use of functionality in the OpenFst library to create, access and manipulate n-gram models. You can download the latest version available at [5]. This article uses version 1.0.3.

test@linux:~/cmusphinx> wget http://www.openfst.org/twiki/pub/GRM/NGramDownload/opengrm-ngram-1.0.3.tar.gz
...
test@linux:~/cmusphinx> tar -xzf opengrm-ngram-1.0.3.tar.gz
test@linux:~/cmusphinx> cd opengrm-ngram-1.0.3/
test@linux:~/cmusphinx/opengrm-ngram-1.0.3> ./configure
...
test@linux:~/cmusphinx/opengrm-ngram-1.0.3> make
...

In case the make command fails to complete on 64-bit operating systems, re-run the configure command as shown below and then run make again:

test@linux:~/cmusphinx/opengrm-ngram-1.0.3> ./configure LDFLAGS=-L/usr/local/lib64/fst
...
test@linux:~/cmusphinx/opengrm-ngram-1.0.3> make
...
test@linux:~/cmusphinx/opengrm-ngram-1.0.3> sudo make install
...
test@linux:~/cmusphinx/opengrm-ngram-1.0.3> cd ..

1.2. Main applications

1.2.1. SphinxTrain

With the OpenFst and OpenGrm libraries installed, a new g2p model can be trained by SphinxTrain as part of training a new acoustic model [6].

test@linux:~/cmusphinx> svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinxbase
...
test@linux:~/cmusphinx> svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/pocketsphinx
...
test@linux:~/cmusphinx> svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/SphinxTrain
...
test@linux:~/cmusphinx> cd sphinxbase/
test@linux:~/cmusphinx/sphinxbase> ./autogen.sh
...
test@linux:~/cmusphinx/sphinxbase> make
...
test@linux:~/cmusphinx/sphinxbase> sudo make install
...
test@linux:~/cmusphinx/sphinxbase> cd ../pocketsphinx/
test@linux:~/cmusphinx/pocketsphinx> ./autogen.sh
...
test@linux:~/cmusphinx/pocketsphinx> make
...
test@linux:~/cmusphinx/pocketsphinx> sudo make install
...
test@linux:~/cmusphinx/pocketsphinx> cd ../SphinxTrain/
test@linux:~/cmusphinx/SphinxTrain> ./autogen.sh --enable-g2p-decoder
...
test@linux:~/cmusphinx/SphinxTrain> make
...
test@linux:~/cmusphinx/SphinxTrain> sudo make install
...
test@linux:~/cmusphinx/SphinxTrain>

1.2.2. Sphinx-4

The g2p decoding functionality was introduced in SVN revision 11556. In addition to sphinx4, you also need to check out the latest revision of the Java FST framework:

test@linux:~/cmusphinx> svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/branches/g2p/fst
...
test@linux:~/cmusphinx> svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/sphinx4
...
test@linux:~/cmusphinx> cd fst
test@linux:~/cmusphinx/fst> ant jar
...
test@linux:~/cmusphinx/fst> cp dist/fst.jar ../sphinx4/lib/
test@linux:~/cmusphinx/fst> cd ../sphinx4/lib/
test@linux:~/cmusphinx/sphinx4/lib> ./jsapi.sh
...
test@linux:~/cmusphinx/sphinx4/lib> cd ..
test@linux:~/cmusphinx/sphinx4> ant
...
test@linux:~/cmusphinx/sphinx4>

2. Training a g2p model

2.1. Training through SphinxTrain

Training an acoustic model following the instructions found at [6] can also train a g2p model.

As an addition to [6], for the current revision of SphinxTrain (11554), after running the sphinxtrain -t an4 setup command you need to enable the g2p functionality by setting the $CFG_G2P_MODEL variable in the generated configuration file (etc/sphinx_train.cfg) to

$CFG_G2P_MODEL= 'yes';

Running sphinxtrain run according to [6] will then produce output related to the g2p model training, similar to the following:
...
MODULE: 0000 train grapheme-to-phoneme model
Phase 1: Cleaning up directories: logs...
Phase 2: Training g2p model...
Phase 3: Evaluating g2p model...
INFO: cmd_ln.c(691): Parsing command line:
/usr/local/lib/sphinxtrain/phonetisaurus-g2p \
-model /home/test/cmusphinx/an4/g2p/an4.fst \
-input /home/test/cmusphinx/an4/g2p/an4.words \
-beam 1500 \
-words yes \
-isfile yes \
-output_cost yes \
-output /home/test/cmusphinx/an4/g2p/an4.hyp
Current configuration:
[NAME]  [DEFLT] [VALUE]
-beam  500 1500
-help  no  no
-input     /home/test/cmusphinx/an4/g2p/an4.words
-isfile    no yes
-model     /home/test/cmusphinx/an4/g2p/an4.fst
-nbest  1  1
-output     /home/test/cmusphinx/an4/g2p/an4.hyp
-output_cost  no yes
-sep
-words  no yes
Words: 13  Hyps: 13 Refs: 13
(T)otal tokens in reference: 50
(M)atches: 46  (S)ubstitutions: 1  (I)nsertions: 4  (D)eletions: 3
% Correct (M/T)           -- %92.00
% Token ER ((S+I+D)/T)    -- %16.00
% Accuracy 1.0-ER         -- %84.00
(S)equences: 13  (C)orrect sequences: 8  (E)rror sequences: 5
% Sequence ER (E/S)       -- %38.46
% Sequence Acc (1.0-E/S)  -- %61.54
Phase 4: Creating pronunciations for OOV words...
Phase 5: Merging primary and OOV dictionaries...
...

The training process generates an additional dictionary for the words in the training transcription that are not found in the dictionary, creating pronunciations for them using the trained g2p model. After training has completed, the model can be found under the g2p directory. The OpenFst binary model is the an4.fst file. If you plan to convert it to a Java binary model and use it in Sphinx-4, you also need the OpenFst text format, which consists of the main model file (an4.fst.txt) and the two additional symbol files (an4.input.syms and an4.output.syms).

2.2. Training using the standalone application

Although the default training parameters provide a relatively low Word Error Rate (WER), it may be possible to further fine-tune the various parameters and produce a model with an even lower WER. Assuming that the directory /home/test/cmusphinx/dict contains the cmudict.0.6d dictionary, a model can be built directly from the command line as follows:
test@linux:~/cmusphinx/dict> export PATH=/usr/local/lib/sphinxtrain/:$PATH
test@linux:~/cmusphinx/dict> g2p_train -ifile cmudict.0.6d
INFO: cmd_ln.c(691): Parsing command line:
g2p_train \
-ifile cmudict.0.6d
Current configuration:
[NAME]          [DEFLT]         [VALUE]
-eps            <eps>           <eps>
-gen_testset    yes             yes
-help           no              no
-ifile                          cmudict.0.6d
-iter           10              10
-noalign        no              no
-order          6               6
-pattern
-prefix         model           model
-prune          no              no
-s1s2_delim
-s1s2_sep       }               }
-seq1in_sep
-seq1_del       no              no
-seq1_max       2               2
-seq2in_sep
-seq2_del       no              no
-seq2_max       2               2
-seq_sep        |               |
-skip           _               _
-smooth         kneser_ney      kneser_ney
-theta          0               0.000000e+00
Splitting dictionary: cmudict.0.6d into training and test set
Splitting...
Using dictionary: model.train
Loading...
Starting EM...
Iteration 1: 1.23585
Iteration 2: 0.176181
Iteration 3: 0.0564651
Iteration 4: 0.0156775
Iteration 5: 0.00727272
Iteration 6: 0.00368118
Iteration 7: 0.00259113
Iteration 8: 0.00118828
Iteration 9: 0.000779152
Iteration 10: 0.000844955
Iteration 11: 0.000470161
Generating best alignments...
Generating symbols...
Compiling symbols into FAR archive...
Counting n-grams...
Smoothing model...
Minimizing model...
Correcting final model...
Writing text model to disk...
Writing binary model to disk...
test@linux:~/cmusphinx/dict>

The model can then be evaluated from the command line as follows:

test@linux:~/cmusphinx/dict> /usr/local/lib64/sphinxtrain/scripts/0000.g2p_train/evaluate.py /usr/local/lib/sphinxtrain/ model.fst model.test eval
INFO: cmd_ln.c(691): Parsing command line:
/usr/local/lib/sphinxtrain/phonetisaurus-g2p \
-model /home/test/cmusphinx/an4/g2p/an4.fst \
-input /home/test/cmusphinx/an4/g2p/an4.words \
-beam 1500 \
-words yes \
-isfile yes \
-output_cost yes \
-output /home/test/cmusphinx/an4/g2p/an4.hyp
Current configuration:
[NAME]  [DEFLT] [VALUE]
-beam  500 1500
-help  no  no
-input     /home/test/cmusphinx/an4/g2p/an4.words
-isfile    no yes
-model     /home/test/cmusphinx/an4/g2p/an4.fst
-nbest  1  1
-output     /home/test/cmusphinx/an4/g2p/an4.hyp
-output_cost  no yes
-sep
-words  no yes
Words: 12946  Hyps: 12946 Refs: 12946
(T)otal tokens in reference: 82416
(M)atches: 74816  (S)ubstitutions: 6906  (I)nsertions: 1076  (D)eletions: 694
% Correct (M/T)           -- %90.78
% Token ER ((S+I+D)/T)    -- %10.53
% Accuracy 1.0-ER         -- %89.47
(S)equences: 12946  (C)orrect sequences: 7859  (E)rror sequences: 5087
% Sequence ER (E/S)       -- %39.29
% Sequence Acc (1.0-E/S)  -- %60.71
test@linux:~/cmusphinx/dict>

3. Using a g2p model in Sphinx-4

To use the trained model from the previous section in Sphinx-4, the OpenFst text-format model (which should be located in the /home/test/cmusphinx/dict directory) first needs to be converted to the Java FST binary format, as follows:

test@linux:~/cmusphinx/dict> cd ../fst/
test@linux:~/cmusphinx/fst> ./openfst2java.sh ../dict/model ../sphinx4/models/model.fst.ser
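For reference, the same conversion can in principle be expressed directly against the Java FST API from the framework article, as in the sketch below. It assumes Convert.importFst and an Fst.saveModel serialization method exist with roughly these signatures, so treat it as an illustration rather than a drop-in replacement for openfst2java.sh.

import java.io.IOException;

import edu.cmu.sphinx.fst.Fst;
import edu.cmu.sphinx.fst.openfst.Convert;
import edu.cmu.sphinx.fst.semiring.TropicalSemiring;

public class OpenFstToJava {
    public static void main(String[] args) throws IOException {
        // Assumed API (Convert.importFst, Fst.saveModel); see the framework's javadocs.
        // Import the OpenFst text model (basename ../dict/model, i.e. model.fst.txt plus symbol files)
        Fst fst = Convert.importFst("../dict/model", new TropicalSemiring());
        // Serialize it to the Java FST binary format expected by Sphinx-4
        fst.saveModel("../sphinx4/models/model.fst.ser");
    }
}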

To use it in an application, add the following lines to the dictionary component in the configuration file:

        <property name="allowMissingWords" value="true"/>
        <property name="createMissingWords" value="true"/>
        <property name="g2pModelPath" value="file:///home/test/cmusphinx/sphinx4/models/model.fst.ser"/>
        <property name="g2pMaxPron" value="2"/>

Notice that the "wordReplacement" property should not be present in the dictionary component. The "g2pModelPath" property should contain a URI pointing to the g2p model in Java FST format. The "g2pMaxPron" property holds the number of different pronunciations generated by the g2p decoder for each word. More information about Sphinx-4 configuration can be found at [7].
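Before wiring the model into the configuration, it may be worth verifying that the serialized file loads at all. A minimal sketch, assuming the framework's Fst class exposes a static loadModel method for its serialized format (check the javadocs for the actual name):

import edu.cmu.sphinx.fst.Fst;

public class CheckG2PModel {
    public static void main(String[] args) {
        // Assumes a static Fst.loadModel method exists for the serialized Java FST format.
        // Filesystem path corresponding to the file:// URI used in g2pModelPath
        Fst model = Fst.loadModel("/home/test/cmusphinx/sphinx4/models/model.fst.ser");
        System.out.println("g2p model loaded successfully: " + (model != null));
    }
}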

Conclusion
This article tried to summarize the recent changes related to the new grapheme-to-phoneme (g2p) feature in the CMU Sphinx-4 speech recognizer, from a user’s perspective. Other articles will present the API of the new Java FST framework created for the g2p feature, followed by a detailed performance review of the g2p decoder and the Java FST framework in general.
As a future work suggestion, it would be interesting to evaluate the g2p decoder in an automatic speech recognition context, as the number of pronunciation variants that are correctly produced, and the number of incorrect variants generated, might not be directly related to the quality of the generated pronunciation variants when used in automatic speech recognition [8].

References
[1] “GSoC 2012: Letter to Phoneme Conversion in CMU Sphinx-4”
[2] CMUSphinx Home Page
[3] Java Fst Framework
[4] OpenFst Library Home Page
[5] OpenGrm NGram Library
[6] Training Acoustic Model For CMUSphinx
[7] Sphinx-4 Application Programmer’s Guide
[8] D. Jouvet, D. Fohr, I. Illina, “Evaluating Grapheme-to-Phoneme Converters in Automatic Speech Recognition Context”, IEEE International Conference on Acoustics, Speech, and Signal Processing, March 2012, Japan