Monthly Archives: May 2012

Porting phonetisaurus many-to-many alignment python script to C++

Notice: This article is outdated. The application described here is now part of the SphinxTrain application. Please refer to recent articles in CMUSphinx category for the latest info.

(originally posted at http://cmusphinx.sourceforge.net/2012/05/porting-phonetisaurus-many-to-many-alignment-python-script-to-c/)

Foreword
Following our previous article on phonetisaurus [1] and the decision to use this framework as the g2p conversion method for my GSoC project, this article will describe the port of the dictionary alignment script to C++.

1. Installation
The procedure below is tested on an Intel CPU running openSuSE 12.1 x64 with gcc 4.6.2. Further testing is required for other systems (MacOSX, Windows).

The alignment script requires the openFST library [2] to be installed on your system. Having downloaded, compiled and installed openFST, the first step is to checkout the alignment code from the cmusphinx SVN repository:

$ svn co https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/branches/g2p/train

and compile it

$ cd train $ make g++ -c -g -o src/align.o src/align.cpp g++ -c -g -o src/phonetisaurus/M2MFstAligner.o src/phonetisaurus/M2MFstAligner.cpp g++ -c -g -o src/phonetisaurus/FstPathFinder.o src/phonetisaurus/FstPathFinder.cpp g++ -g -l fst -l dl -o src/align src/align.o src/phonetisaurus/M2MFstAligner.o src/phonetisaurus/FstPathFinder.o $

2. Usage
Having compiled the script, running it without any command line arguments will print out it’s usage, which is similar to that of the original phonetisaurus m2m-aligner python script:
$ cd src $ ./align Usage: ./align [--seq1_del] [--seq2_del] [--seq1_max SEQ1_MAX] [--seq2_max SEQ2_MAX] [--seq1_sep SEQ1_SEP] [--seq2_sep SEQ2_SEP] [--s1s2_sep S1S2_SEP] [--eps EPS] [--skip SKIP] [--seq1in_sep SEQ1IN_SEP] [--seq2in_sep SEQ2IN_SEP] [--s1s2_delim S1S2_DELIM] [--iter ITER] --ifile IFILE --ofile OFILE --seq1_del, Allow deletions in sequence 1. Defaults to false. --seq2_del, Allow deletions in sequence 2. Defaults to false. --seq1_max SEQ1_MAX, Maximum subsequence length for sequence 1. Defaults to 2. --seq2_max SEQ2_MAX, Maximum subsequence length for sequence 2. Defaults to 2. --seq1_sep SEQ1_SEP, Separator token for sequence 1. Defaults to '|'. --seq2_sep SEQ2_SEP, Separator token for sequence 2. Defaults to '|'. --s1s2_sep S1S2_SEP, Separator token for seq1 and seq2 alignments. Defaults to '}'. --eps EPS, Epsilon symbol. Defaults to ''. --skip SKIP, Skip/null symbol. Defaults to '_'. --seq1in_sep SEQ1IN_SEP, Separator for seq1 in the input training file. Defaults to ''. --seq2in_sep SEQ2IN_SEP, Separator for seq2 in the input training file. Defaults to ' '. --s1s2_delim S1S2_DELIM, Separator for seq1/seq2 in the input training file. Defaults to ' '. --iter ITER, Maximum number of iterations for EM. Defaults to 10. --ifile IFILE, File containing sequences to be aligned. --ofile OFILE, Write the alignments to file. $

The two required options are the pronunciation dictionary to align (IFILE) and the file in which the aligned corpus will be saved (OFILE). The script provide default values for all other options and cmudict (v. 0.7a) can be aligned simply by the following command
$ ./align --seq1_del --seq2_del --ifile <path to cmudict> --ofile <path to aligned corpus>

allowing for deletions in both graphemes and phonemes, and

$ ./align --ifile <path to cmudict> --ofile <path to aligned corpus>

not allowing for deletions.

3. Performance

In order to test the new alignment script’s performance in both its results and its requirements for cpu and memory usage, I have performed two tests for aligning of the full cmudict (v. 0.7a) allowing deletions in both sequenses:

$ ./align --seq1_del --seq2_del --ifile ../data/cmudict.dict --ofile ../data/cmudict.corpus.gsoc

and compared with the original phonetisuarus script

$ ./m2m-aligner.py --align ../train/data/cmudict.dict -s2 -s1 --write_align ../train/data/cmudict.corpus

3.1. Alignment

Comparing of the two outputs using the linux diff util, didn’t result in major differences. Minor differences were noticed in case of the alignment double vowels and consonants with a single phoneme, as in the two following examples:

--- cmudict.corpus +++ cmudict.corpus.gsoc .... -B}B O}AO1 R}_ R}R I}IH0 S}S +B}B O}AO1 R}R R}_ I}IH0 S}S .... -B}B O}AO1 S|C}SH H}_ E}_ E}IY0 +B}B O}AO1 S|C}SH H}_ E}IY0 E}_ ....

3.2. CPU memory usage

In the system described above, the average (of two runs) running time for new aligned command was 1h:14m in comparison to an average of 1h:28m of the original phonetisaurus script. Both scripts consumed the same RAM amount (~ 1.7GB).

Conclusion – Future works

This article presented the new g2p align script which seems to produce the same results as the original one and is a little bit faster than that.

Although it should compile and run as expected to any modern linux system, further testing is reeuired for other systems (like MacOSX, windows). We need also to investigated the alignment differnces (compared to the original script) in the vowels and consonants as described above. Although it doesn’t seem critical, it may cause problems later.

References

[1] J. Salatas, “Phonetisaurus: A WFST-driven Phoneticizer – Framework Review”, ICT Research Blog, May 2012.

[2] OpenFst Library

Porting openFST to java: Part 2

Notice: Parts of this article may be outdated. There are many changes to its API and performance improvements recently in the java fst framework. Please refer to recent articles in Java FST Framework category for the latest info.

(originally posted at http://cmusphinx.sourceforge.net/2012/05/porting-openfst-to-java-part-2/)

Foreword

This article, the second in a series regarding, porting openFST to java, briefly presents some additional base classes and raise some issues regarding the java fst architecture in general and its compatibility with the original openFST binary format for saving models.

1. FST java library base architecture
The first article in the series [1] introduced the Weight<T> class, the semiring operations related classes and interfaces and finally a solid class implementation for the tropical semiring based on float weights. In the last svn commit (revision 11363) [2] includes the edu.cmu.sphinx.fst.weight.LogSemiring class which implements a logarithmic semiring based on Double weight values.

Furthermore revision 11363 includes the edu.cmu.sphinx.fst.state.State<W extends Weight<?>>, edu.cmu.sphinx.fst.arc.Arc<W extends Weight<?>> and edu.cmu.sphinx.fst.fst.Fst<W extends Weight<?>> classes which implement the state, arc and fst functionality, respectively.

2. Architecture design issues

2.1. Java generics support

As described in the first part [1], the edu.cmu.sphinx.fst.weight.Weight<T> acts basically as a wrapper for the weight’s value. The current implementation of State, Arc and Fst classes take as a type parameter any class that extends the Weight base class. Although this approach provides a great flexibility on building custom types of FSTs, the implementations can be greatly simplified if we assume only the base Weight class and modify the State, Arc and Fst classes definition to simply use a type parameter.

As an example the Arc class definition would be simplified to

public class Arc<T> implements Serializable{
 
	private static final long serialVersionUID = -7996802366816336109L;
 
	// Arc's weight
	protected Weight<T> weight;
	// Rest of the code.....
}

instead of its current definition

public class Arc<W extends Weight<?>> implements Serializable{
 
	private static final long serialVersionUID = -7996802366816336109L;
 
	// Arc's weight
	protected W weight;
	// Rest of the code.....
}

The proposed modification can be applied also to State and Fst classes and provide an easier to use api. In that case the construction of a basic FST in the class edu.cmu.sphinx.fst.demos.basic.FstTest would be simplified as follows

// ...
Fst<Double> fst = new Fst<Double>();
 
// State 0
State<Double> s = new State<Double>(); 
s.AddArc(new Arc<Double>(new Weight<Double>(0.5), 1, 1, 1));
s.AddArc(new Arc<Double>(new Weight<Double>(1.5), 2, 2, 1));
fst.AddState(s);
 
// State 1
s = new State<Double>();
s.AddArc(new Arc<Double>(new Weight<Double>(2.5), 3, 3, 2));
fst.AddState(s);
 
// State 2 (final)
s = new State<Double>(new Weight<Double>(3.5));
fst.AddState(s);
// ...

The code could be further simplified by completely dropping generics support in State, Arc and Fst classes by just providing solid implementations based on Weight weights.

2.2. Compatibility with the original openFST binary format

A second issue is the compatibility of the serialized binary format with the original openFST format. A compatible java library that is able to load/save openFST models, would provide us the ability to share trained models between various applications. As an example, in the case of ASR applications, trained models could be easily shared between between sphinx4 and kaldi [3] which is written in C++ and already uses the openFST library.

References

[1] J. Salatas, “Porting openFST to java: Part 1”, ICT Research Blog, May 2012.

[2] CMUSphinx g2p SVN repository

[3] Kaldi Speech recognition research toolkit , last accessed: 18/05/2012.

[4] C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, M. Mohri, “OpenFst: a general and efficient weighted finite-state transducer library”, Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA 2007), pp. 11–23, Prague, Czech Republic, July 2007.

Porting openFST to java: Part 1

Foreword

This article is the first part of a series of articles on porting openFST[1] in java. OpenFST is an open-source C++ library for weighted finite-state transducers (WFSTs) [1] and having a similar java implementation is a crucial first step for the integration of phonetisaurus g2p into sphinx 4 [2]

This article will briefly review some mathematical background of weighted finite-state transducers describe the current implementation of openFST and then start describing the java implementation which will be completed in articles that will follow.

1. Weighted finite-state transducers

Weighted finite-state transducers have been used in speech recognition and synthesis, machine translation, optical character recognition, pattern matching,string processing, machine learning, information extraction and retrieval among others. Having a comprehensive software library of weighted transducer representations and core algorithms is key for using weighted transducers in these applications and for the development of new algorithms and applications. [1]

A weighted finite-state transducer (WFST) is a finite automaton for which each transition has an input label, an output label, and a weight. Figure 1 depicts a weighted finite state transducer: [1]

Figure 1: Example weighted finite-state transducer

The initial state is labeled 0. The final state is 2 with final weight of 3.5. Any state with non-infinite final weight is a final state. There is a transition from state 0 to 1 with input label a, output label x, and weight 0.5. This machine transduces, for instance, the string ac to xz with weight 6.5 (the sum of the arc and final weights). [1]

The weights may represent any set so long as they form a semiring. A semiring $(\mathbb{K}, \oplus, \otimes, \bar{0}, \bar{1})$ is specified by a set of values $\mathbb{K}$ , two binary operations $\oplus$ and $\otimes$ , and two designated values $\bar{0}$ and $\bar{1}$ . The operation $\oplus$ is associative, commutative, and has $\bar{0}$ as identity. The operation $\otimes$ is associative, has identity $ and $\bar{1}$ , distributes with respect to $\oplus$ , and has $\bar{0}$ as annihilator: for all $a \in \mathbb{K} , a \otimes \bar{0} = \bar{0} \otimes a = \bar{0}$ . If $\otimes$ is also commutative, we say that the semiring is commutative. [1]

Table 1 below lists some common semirings. All but the last are defined over subsets of the real numbers (extended with positive and negative infinity). In addition to the familiar Boolean semiring, and the probability semiring used to combine probabilities, two semirings often used in applications are the log semiring which is isomorphic to the probability semiring via the negative-log mapping, and the tropical semiring which is similar to the log semiring except the operation is min. The left (right) string semiring, which is defined over strings, has longest common prefix (suffix) and concatenation as its operations, and has the (extended element) infinite string and the empty string for its identity elements. It only distributes on the left (right). [1]

Table 1: Semiring examples.

2. The openFST C++ library: Representation and Construction

The motivation for OpenFst was to create a library as comprehensive and efficient as the AT&T FSM [3] Library, but that was an open-source project. We also sought to make this library as flexible and customizable as possible given the wide range of applications WFSTs have enjoyed in recent years. It is a C++ template library, allowing it to be both very customizable and efficient. [1]

In the OpenFst Library, a transducer can be constructed from either the C++ level using class constructors and mutators or from a shell-level program using a textual file representation. [1]

In order to create a transducer using openFST we need first to construct an empty VectorFst: [1]

1 2	// A vector FST is a general mutable FST VectorFst<StdArc> fst;

The VectorFst, like all transducer representations and algorithms in this library, is templated on the transition type. This permits customization of the labels, state IDs and weights in a transducer. StdArc defines the library-standard transition representation:

template <class W>
class ArcTpl {
 public:
  typedef W Weight;
  typedef int Label;
  typedef int StateId;
 
  ArcTpl(Label i, Label o, const Weight& w, StateId s)
      : ilabel(i), olabel(o), weight(w), nextstate(s) {}
 
  ArcTpl() {}
 
  static const string &Type(void) {
    static const string type =
        (Weight::Type() == "tropical") ? "standard" : Weight::Type();
    return type;
  }
 
  Label ilabel;
  Label olabel;
  Weight weight;
  StateId nextstate;
};

A Weight class holds the set element and provides the semiring operations. Currently openFST provides many different C++ Template-based implementations like TropicalWeightTpl, LogWeightTpl and MinMaxWeightTpl which extend a base FloatWeightTpl (see float-weight.h for implementation details) and others. Having these Template-based implementations opeFST we need just have a typedef to define a particular Weight such as TropicalWeight:

1 2	// Single precision tropical weight typedef TropicalWeightTpl<float> TropicalWeight;

3. The proposed FST java library

Based on the above description and on technical implementation differences between C++ and Java, and more specific mostly on a) difference between C++ Templates and Java generics [4] and b) the lack of operation overloads in Java, the initial implementation includes the edu.cmu.sphinx.fst.weight.Weight<T> class, acting basically as a wrapper for the weight’s value, which can be on any type. The Weight<T> class can be extended in order to create more advanced implementations, if needed.

There is also a generics based interface edu.cmu.sphinx.fst.weight.Semiring<W extends Weight<?>> which declares the Plus, Times and Divide semiring operations for a Weight<?> class. In addition, it declares the zero and one elements of the semiring and the boolean isMember(Weight<?> w) method which should be implemented in a way that returns true if w is a member of the semiring set of values ( $\mathbb{K}$ ) or false if it is not. The edu.cmu.sphinx.fst.weight.TropicalSemiring class implements this interface in order to create a solid class of a Tropical Semiring based on single-precision Float type for storing the weight’s values.

Finaly the edu.cmu.sphinx.fst.demos.basic package contains a main class for testing the above functionality by instatiating a TropicalSemiring and performing some operations on various Weight values.

4. Conclusion – Future work

This article tried to describe some basic theoritical background on weighted finite-state transducers, provide a brief description on the openFST architect and foundation classes and finally presented an initial design for the FST java library implementation. Following the general open-source philosophy “perform small commits often” the library is available in CMUShinx’ repository created for the integration of phonetisaurus g2p into sphinx 4. [5]

The next steps is to provide the Arc and Fst classes which over time will be extended to provide the required functionality for the various FST operations needed for my GSoC 2012 project. Hopefully, over time, more functionality will be provided by the community.

References

[1] C. Allauzen, M. Riley, J. Schalkwyk, W. Skut, M. Mohri, “OpenFst: a general and efficient weighted finite-state transducer library”, Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA 2007), pp. 11–23, Prague, Czech Republic, July 2007.

[2] J. Salatas, “Phonetisaurus: A WFST-driven Phoneticizer – Framework Review”, ICT Research Blog, May 2012.

[3] M. Mohri, F. Pereira, M. Riley, “The Design Principles of a Weighted Finite-State Transducer Library”, Theoretical Computer Science, pp. 15-32, 2000.

[4] H. M. Qusay, “Using and Programming Generics in J2SE 5.0”, Oracle Technology Network, 2004, last accessed: 08/05/2012.

[5] CMUSphinx g2p SVN repository