4.1 String Literals
Felix provides two kinds of strings: {string} and {ustring}.
The most commonly used is {string}, an 8-bit clean string implemented by the C++ type {::std::basic_string<char>}. The {ustring} type is based on the ISO 10646/Unicode character set, with code points represented by type {uint32}. As this type currently has limited functionality, we will focus on the {string} type.
4.1.1 String literals
Felix provides string literals modelled on Python's system.
4.1.1.1 Basic quotation
Strings can be delimited by either single {'} quotes or double {"} quotes. Such strings must not span lines (i.e. they may not contain literal newline characters). A single-quoted string can contain double quotes, and a double-quoted string can contain single quotes:
println "Hello";
println 'Hello';
println "'ello world";
println 'Say "Hello World"';
4.1.1.2 Extended quotation: Preformatted text
To span many lines you can use triple quotes:
println """
This is a preformatted text,
It allows another line to be next,
Leading spaces are kept,
"Confusing", he wept
Programming languages must be hex'd.
""";
println '''Tripled single quotes are allowed too''';
With preformatted text the initial newline is kept too.
4.1.1.3 Character Escapes
Strings using basic or extended quotation can embed certain special characters using slosh (backslash) escapes. The standard escapes are shown below. The encoding comes from the old ASCII character set, whose so-called "control characters" were used in the old days to control an electric typewriter.
Escape  Replacement  Old meaning
--------------------------------------------------
\a      char 7       bell
\b      char 8       BS: backspace
\t      char 9       HT: horizontal tab
\n      char 10      LF: linefeed (newline)
\v      char 11      VT: vertical tab
\f      char 12      FF: form feed
\r      char 13      CR: carriage return
The following escapes are used to control quoting:
Escape  Replacement
-----------------------------
\'      '
\"      "
\\      \
In addition, slosh space normally emits a space. However, a slosh followed by zero or more spaces and then a newline causes the slosh, all the spaces, and the newline to be elided, effectively concatenating two lines of text. For example:
println """\
Hello \
World\
""";
prints the hello world message on a single line. This is similar to the usual C and Unix end-of-line processing, except that extra spaces are allowed before the newline. The reason is that it is hard, if not impossible, to see whether the character following the slosh is a newline or some spaces followed by a newline. The Unix and C rules are too fragile for safe use.
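The elision rule can be modelled outside Felix. The following is a minimal sketch in Python (the language Felix's literal syntax is modelled on), not Felix's actual lexer. The helper name {elide_continuations} is hypothetical, and the sketch assumes the text contains no {\\} escape immediately before a line end.

```python
import re

def elide_continuations(text: str) -> str:
    """Remove each slosh, any spaces that follow it, and the newline after them."""
    # Pattern: one backslash, zero or more spaces, one newline.
    return re.sub(r"\\ *\n", "", text)

# Body of the hello world literal: every line ends in a slosh.
body = "\\\nHello \\\nWorld\\\n"
print(repr(elide_continuations(body)))         # 'Hello World'

# Invisible trailing spaces after the slosh make no difference.
print(repr(elide_continuations("a\\   \nb")))  # 'ab'
```

The space before the slosh on the Hello line is ordinary text and survives. Only the spaces between the slosh and the newline are dropped.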
There is another way to get nicely formatted text, called string folding:
println$
  "This is a preformatted text,\n"
  "It allows another line to be next,\n"
  "Leading spaces are kept,\n"
  '"Confusing", he wept\n'
  "Programming languages must be hex'd.\n"
;
Here we manually inserted the line ends. Incidentally, such folding is not just a pre-processing step. When one string expression is applied to another, the result is the concatenation of the strings. However, if literals are used, the concatenation is done by the parser instead of at run time.
4.1.1.4 Numeric Escapes
There is another form of escaping where one or more characters are inserted into a string based on a numeric code. These are:
Escape      Scan  Chars    Name
--------------------------------------------------
\oOOO       0-3   Octal    Octal escape
\dDDD       0-3   Decimal  Decimal escape
\xHH        0-2   Hex      Hex escape
\uHHHH      0-4   Hex      Short unicode escape
\UHHHHHHHH  0-8   Hex      Long unicode escape
The rules are a bit tricky. All these escapes scan for between 0 and N characters, as indicated in the Scan column. The scan never exceeds the maximum number of characters. It also stops at the first character that is not in the indicated character set; for example, {\xx} will emit a code 0 (nul) character followed by an {x}.
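The scanning rule can be sketched as follows. This is a Python model of the rule as described above, not Felix's lexer. The names {ESCAPES} and {decode_numeric_escape} are hypothetical, accepting hex digits in either case is an assumption, and the sketch covers only the {\o}, {\d} and {\x} forms. What happens when an octal or decimal value exceeds 255 is not stated here, so the sketch simply returns the number.

```python
# Escape letter -> (allowed digit characters, numeric base, maximum digits).
# Accepting upper and lower case hex digits is an assumption of this sketch.
ESCAPES = {
    "o": ("01234567", 8, 3),
    "d": ("0123456789", 10, 3),
    "x": ("0123456789abcdefABCDEF", 16, 2),
}

def decode_numeric_escape(text, pos):
    """Decode the escape whose slosh is at text[pos].

    Returns (code, next_pos), where next_pos indexes the first character
    after the escape.
    """
    digits, base, limit = ESCAPES[text[pos + 1]]
    start = end = pos + 2
    # Stop at the digit limit or at the first character outside the set.
    while end < len(text) and end - start < limit and text[end] in digits:
        end += 1
    scanned = text[start:end]
    # Scanning zero digits yields code 0.
    code = int(scanned, base) if scanned else 0
    return code, end

print(decode_numeric_escape(r"\xx", 0))     # (0, 2): nul, then a plain x
print(decode_numeric_escape(r"\x41B", 0))   # (65, 4): stops after 2 hex digits
print(decode_numeric_escape(r"\d65!", 0))   # (65, 4): stops at the !
```

In the second call the {B} is a valid hex digit but is not consumed, because the two-digit limit for {\x} has already been reached.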
The unicode escapes emit a stream of characters, namely the decoded code point encoded as {utf-8}. Thus, if you consider your 8-bit clean strings as containing {utf-8} encoded Unicode, you can use these escapes. Note that {\xFF} is quite distinct from {\uFF}: the former emits a single character with code 255, whereas the latter emits two characters, being the {utf-8} representation of code point 255.
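The byte sequences involved can be checked outside Felix. A short Python sketch, assuming only that {\xFF} denotes the single byte 255 and that {\uFF} denotes code point 255 encoded as {utf-8}; the variable names are illustrative:

```python
# One byte with value 255, as a hex escape would emit.
raw_byte = bytes([0xFF])

# Code point 255 (U+00FF) encoded as UTF-8, as a unicode escape would emit.
utf8_bytes = chr(0xFF).encode("utf-8")

print(len(raw_byte), raw_byte.hex())       # 1 ff
print(len(utf8_bytes), utf8_bytes.hex())   # 2 c3bf
```

A lone byte 255 is never valid {utf-8}, so a string built with {\xFF} should not be treated as {utf-8} text.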
4.1.1.5 Other escapes
All other escapes are left intact, including the slosh; for example, {\c} translates to {\c} and not just {c}.
4.1.2 Raw strings: escaping escapes
It's sometimes annoying to have to quote or escape escapes. This is particularly true with regular expression strings which already contain a lot of sloshes.
So, following Python, you can have raw strings by using an {r} or {R} prefix:
var r = r"\(.*\)"; // easier than '\\(.*\\)'
Note that only double-quoted strings and triple-quoted strings (using either double or single quotes) can be raw, due to a lexical conflict with the identifier {r'}.