## Java FST framework API Review

**Foreword**

This article summarizes and updates various previous articles [1] related to the implementation of a java weighted finite states transducers framework that can use existing openFst [2] models or export java fst object to openFst format and which is available at the CMUSphinx SVN reopsitory at [3].

The following sections include brief descriptions of the main parts and functionality of the framework. In addition to these descriptions, the full java docs are available at [4]

**1. Semirings**

As described in [5] the fst’s states and arcs weights may represent any set so long as they form a semiring. The semirings related classes are located in `edu.cmu.sphinx.fst.semiring`

package.

There are 3 different semiring implementations `TropicalSemiring`

, `LogSemiring`

and `ProbabilitySemiring`

all inheriting the abstract `Semiring`

class and all of them accept float values.

**2. Basic fst classes**

The basic fst classes are located under the `edu.cmu.sphinx.fst`

package.

There exist a mutable and an immutable fst implementations in `Fst`

and `ImmutableFst`

classes respectively. The mutable fst holds an ArrayList of `State`

objects allowing additions/deletions. On the other hand the immutable fst holds a fixed size array of `ImmutableState`

objects not allowing additions/deletions.

Similar to the mutable and immutable fst implementations above, a mutable `State`

object holds its outgoing `Arc`

objects in an ArrayList allowing additions/deletions, in contrast with an `ImmutableState`

which holds its outgoing `Arc`

objects in a fixed size array not allowing additions/deletions.

Finally the `Arc`

class implement the fst’s arc functionality, containing basically properties and their getters and setters methods.

**3. Fst operations**

The supported fst operations are located under the `edu.cmu.sphinx.fst.operations`

package and include the following classes

`ArcSort`

Sorts the arcs in an FST per state. Sorting can be applied either on input or output label based on the provided comparator.

`Compose`

Computes the composition of two Fsts. The two Fsts are augmented in order to avoid multiple epsilon paths in the resulting Fst. [6]

`Connect`

Trims an Fst, removing states and arcs that are not on successful paths.

`Determinize`

Determinizes an fst providing an equivalent fst that has the property that no state has two transitions with the same input label. For this algorithm, epsilon transitions are treated as regular symbols. [7]

`ExtendFinal`

Adds a new final state with a 0.0 (Semiring’s 1) final wight and connects the current final states to it using epsilon transitions with weight equal to the original final state’s weight.

`NShortestPaths`

Calculates the shortest distances from each state to the final. [8]

`Project`

Projects an fst onto its domain or range by either copying each arc’s input label to its output label or vice versa.

`Reverse`

Reverses an fst.

`RmEpsilon`

Removes epsilon transitions from an fst. It return a new epsilon-free fst and does not modify the original fst

**4. Working with openFst models**

The class `Convert`

in `edu.cmu.sphinx.fst.openfst`

package provides the required methods to read (`importFst`

) or write (`exportFst`

) an openFst model in text format. In the same package there are also two additional classes named `Import`

and `Export`

for exposing the import/export functionality through main functions to a shell command.

**Conclusion**

The java fst framework described in this article and its implemented functionality, were created for the needs of the to the new grapheme-to-phoneme (g2p) feature in CMU Sphinx-4 speech recognizer [9].

It’s usage and extensive testing in the sphinx-4 g2p decoder suggest that the java fst framework and its implemented functionality are usable in general, although it may luck functionality required in different applications (eg. additional operations).

**References**

[3] Java FST Framework SVN Repository

[5] J. Salatas, “Porting openFST to java: Part 1”, ICT Research Blog, May 2012.

[6] M. Mohri, “Weighted automata algorithms”, Handbook of Weighted Automata. Springer, pp. 213-250, 2009.

[7] M. Mohri, “Finite-State Transducers in Language and Speech Processing”, Computational Linguistics, 23:2, 1997.

[8] M. Mohri, “Semiring Framework and Algorithms for Shortest-Distance Problems”, Journal of Automata, Languages and Combinatorics, 7(3), pp. 321-350, 2002.

[9] J. Salatas, “Using the grapheme-to-phoneme feature in CMU Sphinx-4”, ICT Research Blog, May 2012.

fst, java, java fst, language modeling, Natural Language Processing, NLP, wfst