Porting openFST to java: Part 3

July 1, 2012 by

Notice: Parts of this article may be outdated. There are many changes to its API and performance improvements recently in the java fst framework. Please refer to recent articles in Java FST Framework category for the latest info.

(originally posted at http://cmusphinx.sourceforge.net/2012/07/porting-openfst-to-java-part-3/)

Foreword

This article, the third in a series regarding, porting openFST to java, introduces the latest update to the java code, which resolve the previously raised issues regarding the java fst architecture in general and its compatibility with the original openFST format for saving models. [1]

1. Code Changes

1.1. Simplified java generics usage

As suggested in [1], the latest java fst code revision (11456), available in the cmusphinx SVN Repository [2],  assumes only the base Weight class and modifies the State, Arc and Fst classes definition to simply use a type parameter.

The above modifications provide an easier to use api. As an example the construction of a basic FST in the class edu.cmu.sphinx.fst.demos.basic.FstTest is simplified as follows

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
...
Fst<Double> fst = new Fst<Double>();
 
// State 0
State<Double> s = new State<Double>(); 
s.AddArc(new Arc<Double>(new Weight<Double>(0.5), 1, 1, 1));
s.AddArc(new Arc<Double>(new Weight<Double>(1.5), 2, 2, 1));
fst.AddState(s);
 
// State 1
s = new State<Double>();
s.AddArc(new Arc<Double>(new Weight<Double>(2.5), 3, 3, 2));
fst.AddState(s);
 
// State 2 (final)
s = new State<Double>(new Weight<Double>(3.5));
fst.AddState(s);
...

1.2. openFST models compatibilty

Besides the simplified java generics usage above, the most important change is the code to load an openFST model in text format and convert it to a java fst serialized model. This is achieved also in the latest java fst code revision (11456) [2].

2. Converting openFST models to java

2.1. Installation

The procedure below is tested on an Intel CPU running openSuSE 12.1 x64 with gcc 4.6.2, Oracle Java Platform (JDK) 7u5, and ant 1.8.2.

In order to convert an openFST model in text format to java fst model, the first step is to checkout from the cmusphinx SVN repository the latest java fst code revision:

# svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/branches/g2p/fst

Next step is to build the java fst code
# cd fst
# ant jar
Buildfile: <path-to>/fst/build.xml
jar:
build-subprojects:
init:
[mkdir] Created dir: <path-to>/fst/bin
build-project:
[echo] fst: <path-to>/fst/build.xml
[javac] <path-to>/fst/build.xml:38: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
[javac] Compiling 10 source files to <path-to>/fst/bin
[javac] <path-to>/fst/build.xml:42: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
build:
[jar] Building jar: <path-to>/fst/fst.jar
BUILD SUCCESSFUL
Total time: 2 seconds
#

2.2. Usage

Having completed the installation as described above, and trained an openfst model named binary.fst as described in [3], with the latest model training code revision (11455) [4] the model is also saved in the openFST text format in a file named binary.fst.txt. The conversion to a java fst model is performed using the openfst2java.sh which can be found in the root directory of the java fst code. The openfst2java.sh accepts two parameters being the openfs input text model and the java fst output model as follows:

# ./openfst2java.sh binary.fst.txt binary.fst.ser
Parsing input model...
Saving as binary java fst model...
Import completed.
Total States Imported: 1091667
Total Arcs Imported: 2652251
#

The newly generated binary.fst.ser model can then be loaded in java, as follows:

1
2
3
4
5
6
7
8
9
try {
	Fst<Double> fst = (Fst<Double>) Fst.loadModel("binary.fst.ser");
} catch (ClassNotFoundException e) {
	// TODO Auto-generated catch block
	e.printStackTrace();
} catch (IOException e) {
	// TODO Auto-generated catch block
	e.printStackTrace();
}

3. Performance: Memory Usage

Testing the conversion and loading of the cmudict.fst model generated in [3], reveal that the conversion task requires about 1.0GB and the loading of the converted model requires about 900MB of RAM.

4. Conclusion – Future Works

Having the ability to convert and load an openFST model in java, takes the “Letter to Phoneme Conversion in CMU Sphinx-4” project to the next step, which is the port of phonetisaurus decoder to java which will eventually lead to its integration with cmusphinx 4.

A major concern at this point is the high memory utilization while loading large models. Although it is expected for java applications to consume more memory compared to a similar C++ application, this could be a problem especially when running in low end machines and needs further investigation and optimization (if possible).

References

[1] J. Salatas, “Porting openFST to java: Part 2”, ICT Research Blog, May 2012.

[2] Java fst SVN (Revision 11456)

[3] J. Salatas, “Automating the creation of joint multigram language models as WFST: Part 2”ICT Research Blog, June 2012.

[4] openFST model training SVN (Revision 11455)

Leave a Reply

Your email address will not be published. Required fields are marked *