Status: Implemented
New string interpolation syntax.
The problems with string.format are many.
- %d is the idiomatic default for most people, but this loses precision.
  - %d casts the number into long long, which has a lower max value than double and does not support decimals.
  - %f by default formats to six decimal places, e.g. 5.5 is 5.500000.
  - %g by default formats to six significant digits, e.g. 5.5 is 5.5 and 5.5312389 is 5.53124. It also switches to scientific notation when it encounters a number equal to or greater than 10^6.
  - That leaves %s, but even so the type checker assumes you actually wanted strings.
- boolean is not supported. You must use %s and call tostring.
- Values that implement the __tostring metamethod are not supported either.
- % is in itself a dangerous operation within string.format.
  - "Your health is %d% so you need to heal up." causes a runtime error because % so is actually parsed as (%s)o and now requires a corresponding string.

To fix all of those issues, we need to do a few things.
- Add a new string interpolation expression.
- Extend string.format to accept values of arbitrary types (fixes #1, #2, #3, #4).

Because we care about backward compatibility, we need some new syntax in order to not change the meaning of existing strings. There are a few components of this new expression:
- A string interpolation token (`...{, }...{, and }...`), where ... is a range of 0 to many characters.
  - \ escapes `, {, and itself \.
  - An opening and closing pair must be on the same line (unless a \ escapes the newline), but expressions needn’t be on the same line.
- An expression between the braces. This is the value that will be interpolated into the string.
  - {{ is rejected, as it is considered an attempt to escape and get a single { character at runtime.
- For each interpolated value, the %* token will be used.

To put the above into formal EBNF grammar:
stringinterp ::= <INTERP_BEGIN> exp {<INTERP_MID> exp} <INTERP_END>
Which, in actual Luau code, will look like the following:
local world = "world"
print(`Hello {world}!`)
--> Hello world!
local combo = {5, 2, 8, 9}
print(`The lock combinations are: {table.concat(combo, ", ")}`)
--> The lock combinations are: 5, 2, 8, 9
local set1 = Set.new({0, 1, 3})
local set2 = Set.new({0, 5, 4})
print(`{set1} ∪ {set2} = {Set.union(set1, set2)}`)
--> {0, 1, 3} ∪ {0, 5, 4} = {0, 1, 3, 4, 5}
print(`Some example escaping the braces \{like so}`)
print(`backslash \ that escapes the space is not a part of the string...`)
print(`backslash \\ will escape the second backslash...`)
print(`Some text that also includes \`...`)
--> Some example escaping the braces {like so}
--> backslash that escapes the space is not a part of the string...
--> backslash \ will escape the second backslash...
--> Some text that also includes `...
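The Set examples above rely on a type whose values know how to print themselves. Set is not a Luau built-in, so the following is only a minimal sketch of one hypothetical implementation, assuming members are numbers and that printing them in ascending order is acceptable. It shows that interpolation picks up the __tostring metamethod.

```luau
-- Hypothetical Set type for illustration only; it is not part of Luau.
local Set = {}
Set.__index = Set

-- Render the members in ascending order, e.g. {0, 1, 3}.
Set.__tostring = function(self)
	local members = {}
	for value in pairs(self.items) do
		table.insert(members, value)
	end
	table.sort(members)
	return "{" .. table.concat(members, ", ") .. "}"
end

-- Build a set from an array of values; duplicates collapse into one key.
function Set.new(values)
	local items = {}
	for _, value in ipairs(values) do
		items[value] = true
	end
	return setmetatable({ items = items }, Set)
end

-- Return a new set holding every member of a and b.
function Set.union(a, b)
	local values = {}
	for value in pairs(a.items) do
		table.insert(values, value)
	end
	for value in pairs(b.items) do
		table.insert(values, value)
	end
	return Set.new(values)
end

local set1 = Set.new({0, 1, 3})
local set2 = Set.new({0, 5, 4})
print(`{set1} ∪ {set2} = {Set.union(set1, set2)}`)
--> {0, 1, 3} ∪ {0, 4, 5} = {0, 1, 3, 4, 5}
```

Because this sketch sorts members before printing, set2 prints as {0, 4, 5} rather than in insertion order.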
Newlines are handled the same way as in other string literals. Any text between the {} delimiters is not considered part of the string, hence newlines are OK there. The main thing is that one opening pair will scan until either a closing pair or an unescaped newline is encountered.
local name = "Luau"
print(`Welcome to {
name
}!`)
--> Welcome to Luau!
print(`Welcome to \
{name}!`)
--> Welcome to
-- Luau!
We currently prohibit using interpolated strings in function calls without parentheses, so the following is illegal:
local name = "world"
print`Hello {name}`
Note: This restriction is likely temporary while we work through string interpolation DSLs, an ability to pass individual components of interpolated strings to a function.
The restriction on {{ exists solely for people coming from languages such as C#, Rust, or Python, which use {{ to escape and get the character { at runtime. We reject this at parse time, since the proper way to escape it is \{, so:
print(`{{1, 2, 3}} = {myCoolSet}`) -- parse error
If we did not treat this as a parse error, the above would wind up printing the following, which is a gotcha we can and should avoid.
--> table: 0xSOMEADDRESS = {1, 2, 3}
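For contrast, the intended output can be obtained by escaping the opening brace with a backslash, as the escaping rules describe. In this small sketch, myCoolSet is a hypothetical stand-in (a plain string) so that the example is self-contained.

```luau
-- myCoolSet is a placeholder value for illustration only.
local myCoolSet = "{1, 2, 3}"

-- \{ yields a literal {, and the unescaped } that follows is ordinary text.
print(`\{1, 2, 3} = {myCoolSet}`)
--> {1, 2, 3} = {1, 2, 3}
```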
Since the string interpolation expression is going to be lowered into a string.format call, we’ll also need to extend string.format. The bare minimum to support the lowering is to add a new token whose definition is to perform a tostring call. %* is currently an invalid token, so this is a backward compatible extension. This RFC shall define %* to have the same behavior as if tostring was called.
print(string.format("%* %*", 1, 2))
--> 1 2
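To illustrate the lowering described above, the sketch below compares an interpolated string with a hand-written string.format call that uses %* for each expression. It assumes the lowering is exactly as described; an implementation is free to compile the expression differently as long as the result is the same.

```luau
local name = "world"
local count = 3

-- The interpolated form...
local interpolated = `Hello {name}, you have {count} new messages`
-- ...and the string.format call it is described as lowering to.
local lowered = string.format("Hello %*, you have %* new messages", name, count)

assert(interpolated == lowered)
print(interpolated)
--> Hello world, you have 3 new messages
```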
The offset must always be within the bounds of the number of values passed to string.format.
local function return_one_thing() return "hi" end
local function return_two_nils() return nil, nil end
print(string.format("%*", return_one_thing()))
--> hi
print(string.format("%*", Set.new({1, 2, 3})))
--> {1, 2, 3}
print(string.format("%* %*", return_two_nils()))
--> nil nil
print(string.format("%* %* %*", return_two_nils()))
--> error: value #3 is missing, got 2
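For comparison with the existing tokens, the following sketch formats the same kinds of values with %d, %f, %g, and %*. The sample values are chosen for illustration only, and the %d result assumes the cast to long long described in the motivation.

```luau
local value = 5.5

-- Existing numeric tokens each lose or pad information.
print(string.format("%d", value)) --> 5 (the integer cast drops the fraction)
print(string.format("%f", value)) --> 5.500000
print(string.format("%g", 0.1234567)) --> 0.123457

-- %* behaves as if tostring was called on the value.
print(string.format("%*", value)) --> 5.5

-- Booleans need an explicit tostring with %s, but not with %*.
print(string.format("%s", tostring(true))) --> true
print(string.format("%*", true)) --> true
```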
Note that we are not allowing this style of string literal in type annotations at this time, regardless of whether it contains zero or many interpolated expressions, so the following two type annotations are illegal syntax:
local foo: `foo`
local bar: `bar{baz}`
String interpolation syntax will also support escape sequences. Except \u{...}, there is no ambiguity with other escape sequences. If \u{...} occurs within a string interpolation literal, it takes priority.
local foo = `foo\tbar` -- "foo" and "bar" separated by a tab character
local bar = `\u{0041} \u{42}` -- "A B"
If we want to use backticks for other purposes, it may introduce some potential ambiguity. One option to solve that is to only ever produce string interpolation tokens from the context of an expression. This is messy but doable because the parser and the lexer are already implemented to work in tandem. The other option is to pick a different delimiter syntax to keep backticks available for use in the future.
If we were to naively compile the expression into a string.format call, then implementation details would be observable if you write `Your health is {hp}% so you need to heal up.`. When lowering the expression, we would need to implicitly insert an extra % character to escape each % that shows up in a string interpolation token. Otherwise, attempting to run this will produce a runtime error where the %s token is missing its corresponding string value.
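The escaping step can be sketched as follows. The helper name buildFormatString and the representation of the literal as a list of text chunks are assumptions made for illustration; they are not the compiler's actual data structures.

```luau
-- Hypothetical helper: builds a string.format template from the literal
-- text chunks of an interpolated string.
local function buildFormatString(chunks)
	local escaped = {}
	for i, chunk in ipairs(chunks) do
		-- Double every % so string.format treats it as a literal character.
		-- The parentheses discard gsub's second return value (the match count).
		escaped[i] = (string.gsub(chunk, "%%", "%%%%"))
	end
	-- Each boundary between two chunks corresponds to one interpolated expression.
	return table.concat(escaped, "%*")
end

local hp = 50
local template = buildFormatString({ "Your health is ", "% so you need to heal up." })
print(template)
--> Your health is %*%% so you need to heal up.
print(string.format(template, hp))
--> Your health is 50% so you need to heal up.
```

Without the doubling, the template would contain % so, which string.format reads as a %s token with a space flag.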
Rather than coming up with a new syntax (which doesn’t help issue #5 and #6) and extending string.format to accept an extra token, we could just make %s call tostring and be done. However, doing so would cause programs to be more lenient and the type checker would have no way to infer strings from a string.format call. To preserve that, we would need a different token anyway.
Language | Syntax | Conclusion |
---|---|---|
Python | f'Hello {name}' | Rejected because it’s ambiguous with function call syntax. |
Swift | "Hello \(name)" | Rejected because it changes the meaning of existing strings. |
Ruby | "Hello #{name}" | Rejected because it changes the meaning of existing strings. |
JavaScript | `Hello ${name}` | Viable option as long as we don’t intend to use backticks for other purposes. |
C# | $"Hello {name}" | Viable option and guarantees no ambiguities with future syntax. |
This leaves us with only two syntaxes that already exist in other programming languages. The current proposal is for backticks, so the only backward compatible alternative is $"" literals. We don’t necessarily need to use the $ symbol here, but if we were to choose a different symbol, # cannot be used. I picked backticks because they don’t require us to add a stack of closing delimiters in the lexer to make sure each nested string interpolation literal is correctly closed with its opening pair. You only have to count them.