CIF 3

Text formatting details

For text formatting, CIF features the fmt standard library function. This page describes all the details of using that function. For more introductory information and examples of the use of the fmt function for print output, see the Text formatting tutorial.

Format patterns

The fmt function requires at least one argument, the format pattern. The remaining arguments of the function are the values that can be inserted into the format pattern, to allow variable output. For instance, consider:

fmt("%s %s", x, y)

Here, "%s %s" is the format pattern, x is the first value, and y is the second value.

The first argument, the format pattern, must be a string literal. That is, it must be text between double quotes (").

The real usefulness of format patterns comes from the inclusion of the values into the format pattern. The values can be included by inserting format specifiers (e.g. %s) into the format pattern. Multiple values may be used by including multiple format specifiers. The first format specifier includes the first value, the second format specifier includes the second value, etc.

The result of fmt applied to it arguments is the text of the format pattern, with the format specifiers replaced by their corresponding values.

Format specifiers can be customized to influence how their corresponding values are to be inserted. For instance, consider:

fmt("%s %.2e", x, y)

Then, assuming variable x has value 3 and variable y has value 5.6, the result of the fmt function is 3 5.60e+00.

Specifiers

Format patterns support various types of format specifiers, each performing a different kind of conversion on its corresponding value. The following table lists the available conversions, the types of values that they support, and a short description of the output of the conversion:

Conversion Supported types Output
%b bool Either true or false.
%B bool Same as %b, but in upper case.
%d int Decimal integer notation.
%x int Hexadecimal integer notation, using 0-9 and a-f.
%X int Hexadecimal integer notation, using 0-9 and A-F.
%e real Computerized scientific notation, using e.
%E real Computerized scientific notation, using E.
%f real Decimal number notation.
%g real General scientific notation, using e.
%G real General scientific notation, using E.
%s any type General textual representation.
%S any type General textual representation, in upper case.

For the %d, %x, and %X specifiers, int typed values are supported, as are integer types with ranges (e.g. int[0..5]). The %s and %S specifiers support values of any type.

The output of the %B specifier is identical to the output of the %b specifier, where all letters in the output are converted to upper case letters. This duality (lower case specifier conversion versus upper case specifier conversion) is present in all conversions that can have letters in their output.

Specifier syntax

Specifiers can be customized to influence the conversion of their corresponding values to text. The general syntax of specifiers is:

%[value_index$][flags][width][.precision]conversion

All specifiers start with a percentage character (%).

They are optionally followed by a value index. If the value index is specified, it must be followed by a dollar sign ($). The value index is a positive decimal integer indicating the position of the value in the value list. The first value is referenced by 1$, the second by 2$, etc. Index zero and indices starting with zero are not allowed.

The index is followed by optional flags. Flags are characters that modify the output, and may be specified in any order. Each flag may only be specified once per specifier. The set of valid flags depends on the conversion.

After the flags, an optional width may be specified. The width is a non-negative decimal integer indicating the minimum number of characters that should be included in the result, for that specific specifier.

After the width, a precision may optionally be specified. A precision is always preceded by a dot (.). The precision is a non-negative decimal integer used to restrict the number of characters. The specific behavior depends on the conversion.

Specifiers always end with the character indicating the conversion to perform.

Implicit and explicit indexing

By default, format specifiers are processed in order. The first specifier then implicitly uses the first value, the second specifier implicitly uses the second value, etc. However, if an explicit value index is given, that explicit index indicates the value to use, and the specifier with the explicit index does not influence subsequent implicit indexing. That is, consider the following:

fmt("%s %1$s %3$f %d %f %1$s", 1, 2, 3.0);

The first specifier (%s) does not specify an explicit index, and thus implicitly uses the first value. The second specifier (%1$s) explicitly specifies index 1 and thus uses the first value. The third specifier (%3$f) explicitly specifies index 3 and thus uses the third value. The fourth specifier (%d) is the second specifier to not explicitly specify an index, and thus implicitly uses the second value. The fifth specifier (%f) is the third specifier to not explicitly specify an index, and thus implicitly uses the third value. Finally, the sixth specifier (%1$s) explicitly specifies index 1, and thus uses the first value. The result of the formatting is:

``1 1 3.000000 2 3.000000 1``

Flags

The following flags are available:

Flag b/B d x/X e/E f g/G s/S Effect
- yes yes yes yes yes yes yes The result will be left-justified.
+ no yes no yes yes yes no The result will always include a sign.
(space) no yes no yes yes yes no The result will include a leading space for non-negative values.
0 no yes yes yes yes yes no The result will be zero-padded.
, no yes no no yes yes no The result will include commas (,) as grouping separator.

The first column shows the available flags, the minus (-), the plus (+), the space, the zero (0), and the comma (,). The middle columns indicate for each of the different conversions, whether the flags are supported. The last column gives short descriptions of the effects of the flags.

The - flag can be used to left-justify the text in case a width is used, as the default is to right-justify. It is supported by all format specifiers. The - flag requires the use of a width, and can not be combined with the 0 flag.

The + flag can be used to always include a sign. It only applies to certain numeric format specifiers. That is, by default non-negative numbers don’t have a sign, and negative numbers start with a - character. By including the + flag, non-negative numbers start with a + character, and negative numbers still start with a - character. The + and space flags can not be combined.

The space flag can be used force a leading space to be included for non-negative values. It only applies to certain numeric format specifiers. That is, by default non-negative numbers don’t have a sign, and negative numbers start with a - character. By including the space flag, non-negative numbers start with a space character, and negative numbers still start with a - character. The + and space flags can not be combined.

The 0 flag can be used to zero pad numbers, in case a width is used, the text is right-justified, and the text is shorter than the width. It only applies to certain numeric format specifiers. The 0 flag requires the use of a width, and can not be combined with left-justification (the - flag).

The , flag can be used to include commas (,) as grouping separators. It only applies to certain numeric format specifiers. That is, by default numbers are just a sequence of digits. By using the , flag, longer numbers are placed in groups of thousands and the ‘thousands separator’ (the , character) occurs every three digits. For instance, 12345678 would then become 12,345,678.

Further details on the effects of flags are given in the sections describing the individual conversions.

Width

The width is a non-negative decimal integer indicating the minimum number of characters that should be included in the result, for that specific specifier. If no width is given, there is no minimum number of characters.

If the textual representation of the value is shorter than the width, the text is right-justified. That is, the text will be padded by spaces to the left of the text until the total number of characters equals the width. To left-justify the text, use the - flag in combination with a width. This results in padding with spaces to the right of the text.

If the 0 flag is specified, and the textual representation of the value is shorter than the width, 0 padding is used (to the left, as the - flag may not be used when the 0 flag is used). See the sections describing the individual number related specifiers for further details.

If the text is longer than the width, the whole text is included in the result (it is not truncated), and the width essentially has no effect.

%b and %B specifiers

The %b and %B specifiers convert boolean values to text. The specifiers only support boolean values. The specifiers support explicit indices, widths, and the - flag. They don’t supports any of the other flags, and precisions.

The resulting text is constructed as follows:

  • Value true results in the text true, and value false results in the text false.
  • For the %B specifier, the text is converted to upper case, i.e. TRUE and FALSE respectively.
  • If a width is specified, and the resulting text so far is shorter than that width, spaces are added until the given width is reached. If the - flag is given, spaces are added to the right, otherwise spaces are added to the left.

Here are some examples, assuming b has value true:

fmt("%b", b);       // true
fmt("%B", b);       // TRUE
fmt("_%10b_", b);   // _      true_
fmt("_%-10b_", b);  // _true      _

%d specifier

The %d specifier converts integer numbers to text, using decimal representations of integers. The specifier only support integer values (int typed values, and values of integer types with ranges, such as int[0..5]). The specifier supports explicit indices, widths, and all flags. It doesn’t support precisions.

The resulting text is constructed as follows:

  • The resulting text is initialized to the decimal representation of the absolute value of the integer number. It thus consists of only the digits 0 through 9.
  • If the , flag is given, thousand separators (, characters) are inserted as needed, for longer numbers.
  • If the number is negative, it is prefixed with a - character.
  • If the number is non-negative, and the space flag is given, a space is added before the text.
  • If the number is non-negative, and the + flag is given, a + character is added before the text.
  • If a width is given, the text so far is shorter than the width, and the 0 flag is given, then 0 characters are added before the other digits, and after any non-digit characters, until the desired width is reached.
  • If a width is given, the text so far is shorter than the width, and the - flag is given, then space characters are added before the result, until the desired width is reached.
  • If a width is given, the text so far is shorter than the width, and neither the 0 flag nor the - flag is given, then space characters are added after the result, until the desired width is reached.

Here are some examples, assuming x has value 12345 and y has value -2345:

fmt("%d", x);           // 12345
fmt("%,d", x);          // 12,345
fmt("_%10d_", x);       // _     12345_
fmt("_%-10d_", x);      // _12345     _
fmt("_%0,10d_", x);     // _000012,345_
fmt("_%- 10d_", x);     // _ 12345    _
fmt("_%- 10d_", y);     // _-2345     _
fmt("_%-+10d_", x);     // _+12345    _
fmt("_%-+10d_", y);     // _-2345     _
fmt("_%3d_", x);        // _12345_

%x and %X specifiers

The %x and %X specifiers convert integer numbers to text, using hexadecimal representations of integers. The specifiers only support integer values (int typed values, and values of integer types with ranges, such as int[0..5]). The specifiers supports explicit indices, widths, and the - and 0 flags. They don’t supports any of the other flags, and precisions.

The resulting text is constructed as follows:

  • The signed integer number in range [-2,147,483,648 .. 2,147,483,647] is first converted to an unsigned integer number in range [0 .. 4,294,967,295]. That is, for negative numbers, 232 is added.
  • The resulting text is initialized to the hexadecimal representation of the unsigned integer number. It thus consists the digits 0 through 9 and letters a through f.
  • For the %X specifier, the text is converted to upper case, i.e. letters a through f are converted to letters A through F.
  • If a width is given, the text so far is shorter than the width, and the 0 flag is given, then 0 characters are added before the result, until the desired width is reached.
  • If a width is given, the text so far is shorter than the width, and the - flag is given, then space characters are added before the result, until the desired width is reached.
  • If a width is given, the text so far is shorter than the width, and neither the 0 flag nor the - flag is given, then space characters are added after the result, until the desired width is reached.

Here are some examples, assuming a has value 5543 and b has value -1:

fmt("%x", a);           // 15a7
fmt("0x%x", a);         // 0x15a7
fmt("0x%X", a);         // 0x15A7
fmt("0x%X", b);         // 0xFFFFFFFF
fmt("_%10x_", a);       // _      15a7_
fmt("_%-10x_", a);      // _15a7      _
fmt("_%3x_", a);        // _15a7_

%e and %E specifiers

The %e and %E specifiers convert real numbers to text, using computerized scientific notation. The specifiers only support real values. The specifiers supports explicit indices, widths, the -, +, space, and 0 flags, and precisions. They don’t supports any of the other flags.

Real numbers include a decimal mark, a symbol used to separate the integer part from the fractional part of number, when written in decimal form. This decimal mark is denoted by a dot (.).

The resulting text is constructed as follows:

  • The decimal mark of the real number is shifted to ensure that at most one non-zero digit occurs to the left of it. That is, for real numbers 0.012, 0.12, 1.2, 12.0, and 120.0, the decimal mark is shifted -2, -1, 0, 1, and 2 digits to the left, respectively. This results in the following real numbers: 1.2, 1.2, 1.2, 1.2, and 1.2. For zero, the decimal mark is not shifted.
  • The single decimal digit before the decimal mark is included in the result. If there is no digit before the decimal mark (in case the real number is zero), a single 0 digit is included in the result.
  • If the precision is specified and is not zero, or if the default precision is used, a dot (.) is added after the single digit.
  • The digits after the decimal mark are added after the dot. Exactly ‘precision’ digits will be added. If no precision is specified, it defaults to 6 digits after the dot. If not enough digits are available after the dot, additional 0 characters are added after them, to reach the desired precision. If too many digits are available after the dot, digits are removed from the right until the desired precision is reached. Rounding using the ‘half up’ algorithm is used to ensure correct results in case digits are removed.
  • If the %e specifier is used, an e character is added to the end. If the %E specifier is used, an E character is added to the end.
  • A sign character is added after that. That is, if the decimal mark was shifted a negative number of digits to the left, a - character is added, otherwise a + character is added.
  • The number of digits that was shifted is added as a decimal number, at the end. At least two decimal digits are added, so if the number of digits that was shifted can be represented using a single decimal digit, a 0 is added between the e or E character and the number of digits that was shifted.
  • If the real number is negative, the text is prefixed with a - character.
  • If the real number is non-negative, and the space flag is given, a space is added before the text.
  • If the real number is non-negative, and the + flag is given, a + character is added before the text.
  • If a width is given, the text so far is shorter than the width, and the 0 flag is given, then 0 characters are added before the other digits, and after any non-digit characters, until the desired width is reached.
  • If a width is given, the text so far is shorter than the width, and the - flag is given, then space characters are added before the result, until the desired width is reached.
  • If a width is given, the text so far is shorter than the width, and neither the 0 flag nor the - flag is given, then space characters are added after the result, until the desired width is reached.

Here are some examples, assuming a has value 12345.6789 and b has value -0.00002345678:

fmt("%e", a);           // 1.234568e+04
fmt("%E", a);           // 1.234568E+04
fmt("%.3e", a);         // 1.235e+04
fmt("%.3e", b);         // -2.346e-05
fmt("_%20e_", a);       // _        1.234568e+04_
fmt("_%-20e_", a);      // _1.234568e+04        _
fmt("_%5e_", a);        // _1.234568e+04_
fmt("_%020e_", a);      // _000000001.234568e+04_
fmt("_%-+20e_", a);     // _+1.234568e+04       _
fmt("_%-+20e_", b);     // _-2.345678e-05       _
fmt("_%- 20e_", a);     // _ 1.234568e+04       _
fmt("_%- 20e_", b);     // _-2.345678e-05       _

%f specifier

The %f specifier converts real numbers to text, using a decimal number notation. The specifier only supports real values. The specifier supports explicit indices, widths, all flags, and precisions. That is, all features of format specifiers are supported.

Real numbers include a decimal mark, a symbol used to separate the integer part from the fractional part of number, when written in decimal form. This decimal mark is denoted by a dot (.).

The resulting text is constructed as follows:

  • The decimal digits before the decimal mark are included in the result. If there are no digits before the decimal mark, a single 0 digit is included in the result.
  • If the , flag is given, thousand separators (, characters) are inserted as needed, for longer numbers.
  • If the precision is specified and is not zero, or if the default precision is used, a dot (.) is added after the digits.
  • The digits after the decimal mark are added after the dot. Exactly ‘precision’ digits will be added. If no precision is specified, it defaults to 6 digits after the dot. If not enough digits are available after the dot, additional 0 characters are added after them, to reach the desired precision. If too many digits are available after the dot, digits are removed from the right until the desired precision is reached. Rounding using the ‘half up’ algorithm is used to ensure correct results in case digits are removed.
  • If the real number is negative, the text is prefixed with a - character.
  • If the real number is non-negative, and the space flag is given, a space is added before the text.
  • If the real number is non-negative, and the + flag is given, a + character is added before the text.
  • If a width is given, the text so far is shorter than the width, and the 0 flag is given, then 0 characters are added before the other digits, and after any non-digit characters, until the desired width is reached.
  • If a width is given, the text so far is shorter than the width, and the - flag is given, then space characters are added before the result, until the desired width is reached.
  • If a width is given, the text so far is shorter than the width, and neither the 0 flag nor the - flag is given, then space characters are added after the result, until the desired width is reached.

Here are some examples, assuming a has value 12345.6789 and b has value -0.00002345678:

fmt("%f", a);           // 12345.678900
fmt("%f", b);           // -0.000023
fmt("%.3f", a);         // 12345.679
fmt("_%20f_", a);       // _        12345.678900_
fmt("_%-20f_", a);      // _12345.678900        _
fmt("_%-,20f_", a);     // _12,345.678900       _
fmt("_%5f_", a);        // _12345.678900_
fmt("_%020f_", a);      // _0000000012345.678900_
fmt("_%-+20f_", a);     // _+12345.678900       _
fmt("_%-+20f_", b);     // _-0.000023           _
fmt("_%- 20f_", a);     // _ 12345.678900       _
fmt("_%- 20f_", b);     // _-0.000023           _

%g and %G specifiers

The %g and %G specifiers convert real numbers to text, using general scientific notation. The specifiers only support real values. The specifiers supports explicit indices, widths, all flags, and precisions. That is, all features of format specifiers are supported.

Real numbers include a decimal mark, a symbol used to separate the integer part from the fractional part of number, when written in decimal form. This decimal mark is denoted by a dot (.).

If the real number is is greater than or equal to 10-4 but less than 10precision, the result is as %f. Otherwise, the result is as %e and %E, for %g and %G respectively. However, the total number of significant digits of the result is equal to the precision of the %g or %G specifier, and defaults to 6 if not specified. If the specified precision is 0, it is taken to be 1 instead.

Here are some examples, assuming a has value 12345.6789 and b has value -0.00002345678:

fmt("%g", a);           // 12345.7
fmt("%g", b);           // -2.34568e-05
fmt("%G", b);           // -2.34568E-05
fmt("%.3g", a);         // 1.23e+04
fmt("_%20g_", a);       // _             12345.7_
fmt("_%-20g_", a);      // _12345.7             _
fmt("_%-,20g_", a);     // _12,345.7            _
fmt("_%5g_", a);        // _12345.7_
fmt("_%020g_", a);      // _000000000000012345.7_
fmt("_%-+20g_", a);     // _+12345.7            _
fmt("_%-+20g_", b);     // _-2.34568e-05        _
fmt("_%- 20g_", a);     // _ 12345.7            _
fmt("_%- 20g_", b);     // _-2.34568e-05        _

%s and %S specifiers

The %s and %S specifiers convert any value to text. The specifiers support values of any type. The specifiers supports explicit indices, widths, and the - flag. They don’t supports any of the other flags, and precisions.

The resulting text is constructed as follows:

  • If the value has a string type, the value of that string is used as is, without escaping and double quoting. Otherwise, the value is converted to a textual representation closely resembling the CIF textual syntax (ASCII syntax), with string values surrounded by double quotes, and with special characters (tabs, new lines, double quotes, and backslashes) escaped.
  • For the %S specifier, the text is converted to upper case, i.e. letters a through z are converted to letters A through Z.
  • If a width is specified, and the resulting text so far is shorter than that width, spaces are added until the given width is reached. If the - flag is given, spaces are added to the right, otherwise spaces are added to the left.

Here are some examples, assuming x has value "aBcD":

fmt("%s", x);       // aBcD
fmt("%S", x);       // ABCD
fmt("_%10s_", x);   // _      aBcD_
fmt("_%-10s_", x);  // _aBcD      _

As indicated above, string typed values are of special interest. For instance, consider the following examples:

fmt("a %s b", ("some \\ text", 6));     // a ("some \\ text", 6) b
fmt("a %s b", "some \\ text");          // c some \ text d

The first example has a tuple as value, with a string and an integer number. The output of this example is a ("some \\ text", 6) b. Note how the string in the tuple is included in the output with double quotes around it, and how the escaped backslash is also escaped in the resulting text.

The second example has a value that is a string directly (e.g. not contained in a tuple or some other container). This mapping results in c some \ text d. Note how the string is included in the output ‘as is’, i.e. it is not surrounded by double quotes, and the backslash is not escaped.

In general, when using a %s specifier, string values are double quoted and escaped. If however the entire value is a string, then that string is used ‘as is’ (without quoting, and without escaping).

Also of special interest are real values. The number of digits after the decimal point can vary. For instance, consider the following examples:

fmt("a %s b", 1 / 3);       // a 0.3333333333333333 b
fmt("a %s b", 1 / 2);       // c 0.5 d

For more control over the number of digits after the decimal point, use one of the floating point number format specifiers: %e, %E, %f, %g, or %G.

Escaping and quoting

Double quote terminate string literals, and thus also a format pattern, as format patterns are string literals. It is therefore not possible to include a double quote in the text of a format pattern without escaping it. That is, use \" inside a format pattern to include a single double quote. There are other escape sequences as well. Use \\ to include a single backslash (\), \n to include a single new line (to continue on a new line), and \t to include a tab character (for alignment).

Format specifiers start with a percentage character (%). If a percentage character is to be included as a percentage character instead of being interpreted as a format specifier, it needs to be escaped as well. That is, to include a percentage character as a percentage character, use %% in the format pattern.

For instance, the format pattern "a\"b\nc\td\\e%%f" results in the following text:

a"b
c       d\e%f

Unused values

The format pattern is automatically checked for unused values. That is, if a value is not used in the format pattern, there is no use in specifying it, and a warning will indicate this.