DESCRIPTION
   ACCEPTORS
       The usual notion  of  finite-state  acceptor  is  generalized  here  to
       include  costs -- each arc has a cost associated with making that tran-
       sition and each state has a final cost  associated  with  accepting  at
       that state.

       The following is the text representation of a (weighted) acceptor:

          (1)  Each state in the acceptor is specified by a non-negative inte-
          ger. Each distinct arc symbol is specified by a  non-negative  inte-
          ger.  0  is reserved for the epsilon string (epsilon). Each arc cost
          is represented as a floating point number (in the  default  compila-
          tion).

          (2) For each arc in the acceptor, there is a line of the form:

             S1    S2     A     C

          where  S1  is  the  source state number, S2 is the destination state
          number, A is the arc symbol number and C is the arc cost.  The  last
          field is optional; if it is omitted, the arc is assigned cost SMROne
          (see fsmcost(3)).

          (3) For each final state, there is a line of the form:

             S      C

          where S is the state number and C is the cost of accepting  at  that
          state.  The second field is optional; if it is omitted, the state is
          accepted with cost SMROne.

          (4) The complete printed representation of the acceptor consists  of
          lines  of the form in (2) and (3).  The initial state is, by conven-
          tion, the source state on the first line.

       For example, the acceptor represented by a file containing:

             0     0     1     .5
             0     1     2     .3
             1     2     3     .6
             1     2     4     .6
             2

       accepts strings of the (grep) form x*y[zw], where x, y, z,  w  are  the
       symbols numbered  1, 2, 3, 4, respectively.  If the accepted string has
       N occurrences of x, then its cost is .5 N +  .9,  using  the  (default)
       tropical cost semiring (see fsmcost(3)).

   TRANSDUCERS
       The  text  representation of a (weighted) transducer is similar to that
          where S1 is the source state number, S2  is  the  destination  state
          number,  IA is the input symbol number, OA is the output symbol num-
          ber, and C is the arc cost. The last field is  optional;  if  it  is
          omitted, the arc is assigned cost SMROne.

   SYMBOL NAMES
       In  the  above formats, any text string that excludes whitespace may be
       substituted for the numbers in the state, input symbol and output  sym-
       bol  fields,  provided  that  appropriate mappings from text strings to
       numbers are supplied in symbols files.   A  symbols  file  consists  of
       lines  with  two  fields separated by whitespace.  The first field con-
       tains a text string and the second field its corresponding number.  For
       a  state symbols file, there must be a text string for every state num-
       ber encountered. Similarly, for input and output symbol symbols  files.

       For example, the acceptor represented by a file containing:

             BEG   BEG   RED   .5
             BEG   MID   GREEN .3
             MID   END   BLUE  .6
             MID   END   BLACK .6
             END

       with the state symbols file:

             BEG   0
             MID   1
             END   2

       and (input) arc symbols file:

             RED   1
             GREEN 2
             BLUE  3
             BLACK 4

       is identical to the previous example above.

CAVEATS
       Since  some  FSM  commands  (like  FSMConcat  and FSMClosure) introduce
       epsilon symbols into their outputs, it is recommended that a label  for
       0, the epsilon ID, be included in all symbols files.

       Some FSM operations allocate internal arrays based on the maximum inte-
       ger used as an input or output arc symbol.  This design choice,  chosen
       for  efficiency,  requires the user to avoid huge integer labels (e.g.,
       INT_MAX) since memory may otherwise be exhausted.

       When fsmprint is given the -l option, additional information is  output
       that  when  read  by fsmcompile preserves the state potentials, the FSM
       class, the cost semiring, and whether it is an acceptor  or  transducer
       (overriding any conflicting command line options).
       Cyril Allauzen (allauzen@research.att.com)
       Mehryar Mohri (mohri@research.att.com)
       Fernando Pereira (pereira@cis.upenn.edu)
       Michael Riley (riley@research.att.com)
       Copyright (C) 1998-2003 AT&T Corp. All rights reserved.



Version 4.0                                                             FSM(5)