DESCRIPTIONACCEPTORSThe usual notion of finite-state acceptor is generalized here to include costs -- each arc has a cost associated with making that tran- sition and each state has a final cost associated with accepting at that state. The following is the text representation of a (weighted) acceptor:(1)Each state in the acceptor is specified by a non-negative inte- ger. Each distinct arc symbol is specified by a non-negative inte- ger. 0 is reserved for the epsilon string (epsilon). Each arc cost is represented as a floating point number (in the default compila- tion).(2)For each arc in the acceptor, there is a line of the form: S1 S2 A C whereS1is the source state number,S2is the destination state number,Ais the arc symbol number andCis the arc cost. The last field is optional; if it is omitted, the arc is assigned cost SMROne (seefsmcost(3)).(3)For each final state, there is a line of the form: S C whereSis the state number andCis the cost of accepting at that state. The second field is optional; if it is omitted, the state is accepted with cost SMROne.(4)The complete printed representation of the acceptor consists of lines of the form in(2)and(3).The initial state is, by conven- tion, the source state on the first line. For example, the acceptor represented by a file containing: 0 0 1 .5 0 1 2 .3 1 2 3 .6 1 2 4 .6 2 accepts strings of the (grep) formx*y[zw], wherex,y,z,ware the symbols numbered 1, 2, 3, 4, respectively. If the accepted string hasNoccurrences ofx,then its cost is.5N+.9,using the (default) tropical cost semiring (seefsmcost(3)).TRANSDUCERSThe text representation of a (weighted) transducer is similar to that whereS1is the source state number,S2is the destination state number,IAis the input symbol number,OAis the output symbol num- ber, andCis the arc cost. The last field is optional; if it is omitted, the arc is assigned cost SMROne.SYMBOLNAMESIn the above formats, any text string that excludes whitespace may be substituted for the numbers in the state, input symbol and output sym- bol fields, provided that appropriate mappings from text strings to numbers are supplied insymbolsfiles. Asymbolsfile consists of lines with two fields separated by whitespace. The first field con- tains a text string and the second field its corresponding number. For a statesymbolsfile, there must be a text string for every state num- ber encountered. Similarly, for input and output symbolsymbolsfiles. For example, the acceptor represented by a file containing: BEG BEG RED .5 BEG MID GREEN .3 MID END BLUE .6 MID END BLACK .6 END with the state symbols file: BEG 0 MID 1 END 2 and (input) arc symbols file: RED 1 GREEN 2 BLUE 3 BLACK 4 is identical to the previous example above.CAVEATSSince some FSM commands (likeFSMConcatandFSMClosure) introduce epsilon symbols into their outputs, it is recommended that a label for 0, the epsilon ID, be included in all symbols files. Some FSM operations allocate internal arrays based on the maximum inte- ger used as an input or output arc symbol. This design choice, chosen for efficiency, requires the user to avoid huge integer labels (e.g., INT_MAX) since memory may otherwise be exhausted. Whenfsmprintis given the-loption, additional information is output that when read byfsmcompilepreserves the state potentials, the FSM class, the cost semiring, and whether it is an acceptor or transducer (overriding any conflicting command line options). Cyril Allauzen (allauzen@research.att.com) Mehryar Mohri (mohri@research.att.com) Fernando Pereira (pereira@cis.upenn.edu) Michael Riley (riley@research.att.com)Copyright(C)1998-2003AT&TCorp.Allrightsreserved.Version 4.0FSM(5)