+ 4.1 String Literals

Felix provides two kinds of strings: string and ustring. The most commonly used is string, an 8 bit clean string implemented by the C++ type {::std::basic_string<char>}.

The ustring type is based on the ISO 10646/Unicode character set, with code points represented by type uint32. As this type currently has limited functionality, we will focus on the string type.

+ 4.1.1 String literals

Felix provides string literals modelled on Python's system.

+ 4.1.1.1 Basic quotation

Strings can use either single {'} quotes or double {"} quotes. Such strings must not span lines (i.e. they may not contain literal newline characters). Single quoted strings can contain double quotes, and double quoted strings can contain single quotes:

  println "Hello";
  println 'Hello';
  println "'ello world";
  println 'Say "Hello World"';

+ 4.1.1.2 Extended quotation: Preformatted text

To span many lines you can use triple quotes:

  println 
  """
    This is a preformatted text,
    It allows another line to be next,
    Leading spaces are kept,
    "Confusing", he wept
    Programming languages must be hex'd.
  """;
  
  println '''Tripled single quotes are 
  allowed too''';

With preformatted text the initial newline is kept too.

+ 4.1.1.3 Character Escapes

Strings using basic or extended quotation can have certain special characters embedded in them using slosh (backslash) escapes. The standard escapes are shown below. The encoding comes from the old ASCII character set, whose so-called "control characters" were used in the old days to control an electric typewriter.

  Escape            Replacement   Old meaning
  --------------------------------------------------
  \a                char 7        bell
  \b                char 8        BS: backspace 
  \t                char 9        HT: horizontal tab
  \n                char 10       LF: linefeed (newline)
  \v                char 11       VT: vertical tab
  \f                char 12       FF: form feed
  \r                char 13       CR: carriage return

The following escapes are used to control quoting:

  Escape            Replacement   
  -----------------------------
  \'                '            
  \"                "           
  \\                \          
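
As a small illustration of the two tables above, here is a sketch using only println and literals. The strings themselves are made up for the example, and the comments describe what the tables imply should be printed.

  println "Name:\tFelix";          // a tab separates the two words
  println "line one\nline two";    // the output spans two lines
  println "She said \"Hello\"";    // She said "Hello"
  println 'It\'s a string';        // It's a string
  println "C:\\temp";              // C:\temp

The quoting escapes are only needed when the quote character matches the delimiter: inside single quotes a double quote needs no slosh, and vice versa.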

In addition, slosh space normally emits a space. However, a slosh followed by zero or more spaces and then a newline causes the slosh, the spaces and the newline to be elided, effectively concatenating two lines of text. For example:

  println """\
  Hello \ 
  World\
  """;

prints the hello world message on a single line. This is similar to the usual C and Unix end of line processing, except that extra spaces are allowed before the newline. The reason is that it is hard, if not impossible, to see whether the character following the slosh is a newline or some spaces followed by a newline. The Unix and C rules are too fragile for safe use.

There's another way to get nicely formatted text, namely string folding:

  println$
    "This is a preformatted text,\n"
    "It allows another line to be next,\n"
    "Leading spaces are kept,\n"
    '"Confusing", he wept\n'
    "Programming languages must be hex'd.\n"
  ;

Here we manually inserted the line ends. Incidentally, such folding is not just a pre-processing step. When one string expression is applied to another, the result is the concatenation of the strings. However, if literals are used, the concatenation is done by the parser instead of at run time.
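
The run time case might look like the sketch below. The variable names are invented for illustration, and the last line assumes the standard library's + operator on strings, which this section does not otherwise describe.

  var greeting = "Hello, ";
  var name = "World";
  println$ greeting name;      // one string applied to another: Hello, World
  println$ greeting + name;    // assumed to be equivalent, using +

Since neither operand is a literal here, the concatenation cannot be done by the parser and happens when the program runs.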

+ 4.1.1.4 Numeric Escapes

There is another form of escaping where one or more characters are inserted into a string based on a numeric code. These are:

  Escape            Scan    Chars    Name
  --------------------------------------------------
  \oOOO             0-3     Octal    Octal escape
  \dDDD             0-3     Decimal  Decimal escape
  \xHH              0-2     Hex      Hex escape
  \uHHHH            0-4     Hex      Short unicode escape
  \UHHHHHHHH        0-8     Hex      Long unicode escape

The rules are a bit tricky. All these escapes scan for between 0 and N characters, where N is the maximum shown in the scan column. The scan never exceeds the maximum number of characters. It also stops at the first character that is not in the indicated character set; for example, {\xx} will emit a code 0 or nul character followed by an x.
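
To make the scan rule concrete, here is a sketch with a few literals chosen for illustration. The comments give the text that the rule as stated implies, using the ASCII codes 65, 66 and 67 for A, B and C.

  println "\x41\d066\o103";    // ABC: hex 41, decimal 066, octal 103
  println "\d0659";            // A9: the scan stops after three digits
  println "\d65Z";             // AZ: the scan stops at Z, which is not a digit

Padding with leading zeros, as in {\d065}, is the safe way to make sure that a following digit is not swallowed by the scan.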

The unicode escapes emit a stream of characters, namely the decoded code point encoded as {utf-8}. Thus, if you consider your 8 bit clean strings as containing {utf-8} encoded unicode, you can use these escapes. Note that {\xFF} is quite distinct from {\uFF}: the former emits a single character with code 255, whereas the latter emits two characters, being the {utf-8} representation of code point 255.
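
The effect of the {utf-8} encoding can be seen by counting chars. This sketch assumes the standard library function len, which is taken here to return the number of 8 bit chars in a string, and the == comparison on strings; neither is introduced in this section. The euro sign, code point 20AC, is used because its {utf-8} form is the three chars E2 82 AC.

  println$ len "\x41";                    // 1: a hex escape emits one char
  println$ len "\u20AC";                  // 3: one code point, three chars
  println$ "\u20AC" == "\xE2\x82\xAC";    // true: the same chars written by hand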

+ 4.1.1.5 Other escapes

All other escapes are left intact, including the slosh. For example, {\c} translates to {\c} and not just c.
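
A short sketch of this rule, assuming the == comparison on strings from the standard library:

  println "\c";               // prints \c, slosh included
  println$ "\c" == "\\c";     // true: both are a slosh followed by c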

+ 4.1.2 Raw strings: escaping escapes

It's sometimes annoying to have to quote or escape escapes. This is particularly true with regular expression strings which already contain a lot of sloshes.

So, following Python, you can write raw strings by using an r or R prefix:

    var r = r"\(.*\)"; // easier than '\\(.*\\)'

Note that only double quoted strings and triple quoted strings (using either double or single quotes) can be raw. A raw string in plain single quotes is not possible because of a lexical conflict with the identifier {r'}.
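
A few more raw strings, as a sketch. The literals are invented for illustration, and the second line assumes the == comparison on strings from the standard library.

  println r"C:\new\table";      // C:\new\table: no newline or tab is emitted
  println$ r"\n" == "\\n";      // true: a slosh followed by n
  println r"""\d+\.\d+""";      // a triple quoted raw string: \d+\.\d+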