Glint's Quirky Call Syntax
In case you don't know, I'm the creator of and main contributor to the Lensor Compiler Collection (abbreviated LCC). I am also the designer and creator of a language LCC compiles, Glint. Herein lie the quirks of Glint's syntax, and why it took so long to put together a formal grammar (or even a comprehensive tree-sitter one).
Glint is, well, Glint. You know it when you see it, and there are a couple of reasons for that. One is that it is quite opinionated, but also not at all. You'll see what I mean.
Let's take a look at what is often the most noticeable difference between Glint and other popular programming languages, like C and LISP: call expressions.
Implicit Invocation
Let's assume we have some function foo which takes no parameters, and we want to call it.
In C:
foo();
In LISP:
(foo)
Now, in Glint:
foo;
Uhhh… Where are the parentheses?! Well, you see, this is one place where Glint keeps it interesting: implicit invocation. That is, the semantics of Glint state that an expression with a function type may be implicitly converted to an invocation of that function, and the type of the expression updated to the return type of the function. That's just a fancy way of saying, for a function with no parameters, you don't need to use parentheses to call it. Cool! I like typing less.
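To make that conversion concrete, here is a minimal Python sketch of what such a semantic-analysis step might look like. It is not LCC's actual implementation: the names FunctionType, NameRef, Call, and implicit_invoke are hypothetical, and the sketch assumes the conversion only applies to functions that take no parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionType:
    """Hypothetical stand-in for a function type: parameter types and a return type."""
    params: tuple
    returns: object


@dataclass
class NameRef:
    """An expression that names something, e.g. the identifier `foo`."""
    name: str
    type: object


@dataclass
class Call:
    """An invocation of `callee`; its type is the callee's return type."""
    callee: object
    args: list
    type: object


def implicit_invoke(expr):
    """Wrap a function-typed expression that takes no parameters in a Call.

    Any other expression is returned unchanged.
    """
    t = expr.type
    if isinstance(t, FunctionType) and not t.params:
        return Call(callee=expr, args=[], type=t.returns)
    return expr


foo = NameRef("foo", FunctionType(params=(), returns="int"))
result = implicit_invoke(foo)
print(type(result).__name__, result.type)  # Call int
```

The expression `foo` starts out with a function type; after the conversion it is a call whose type is the function's return type.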
NOTE: the idea for implicit invocation is inspired by Algol 68; LCC's test suite was originally written in Algol just for the fun of it.
Calls With Arguments
Now, let's assume we have some function bar which takes two integer parameters, and we want to call it.
In C:
bar(27, 42);
In LISP:
(bar 27 42)
Now, in Glint:
bar 27, 42;
Wait, where are the parentheses? That's right, Glint doesn't associate parentheses with call expressions in any scenario. You never need them, unless you do (more on that later).
The reason behind this design choice was simple: I like LISP, but I hate (yet also understand) when people complain about the noise of the parentheses. So, Glint attempts to be as LISP-like as possible while not requiring any parentheses in the most common cases. Like how English changes a lot of words' hard-to-make sounds to the schwa sound (IPA ə) to make them easier and quicker to say, Glint alters the syntax of expressions to not require parentheses (or noise, in general) in as many places as possible.
When Things Get Complicated
Now, you may be wondering something: if a call is just (at least) two expressions next to each other, how do we parse binary operators in relation to calls?
That is, how should the following be interpreted?
blah 27 + 42;
Is this a call expression, with callee blah and one binary expression argument 27 + 42?
Or, is this a binary expression, with a call expression blah 27 on the left hand side, and 42 on the right hand side?
Call
|-- identifier: blah
`-- Binary: +
    |-- number: 27
    `-- number: 42

Binary: +
|-- Call
|   |-- identifier: blah
|   `-- number: 27
`-- number: 42
If you are familiar with writing parsers, then you are probably thinking, "Normally, we'd handle this sort of thing with precedence."
Unfortunately, there is one more quirk of Glint we have to consider: some unary prefix expressions share the same token with binary infix expressions. Of those, & is the most common. As a unary prefix operator, & gets the address of its operand. As a binary infix operator, & performs a bitwise AND operation on its two operands and returns the result.
Now, how should the following be interpreted?
blah &x;
To most people, if blah is a function accepting a pointer argument, it would make the most sense for this to be a call to that function with a single argument.
Notice, however, how that interpretation depends on the type of the object that the identifier blah refers to. Information like that isn't available until after semantic analysis, which means a formal grammar can't encode that information into it. So, if we want the grammar to not be (ULTRA) ambiguous, we have to decide on a rule to parse this.
You may be thinking, "If we already parsed the declaration of blah, doesn't the parser already know its type?", and you would be correct. If the declaration of blah happens to be before this expression, then yes, we could know that. The problem is that Glint's function declarations are not in any order; you can use a function before defining it, since call expressions don't need to be resolved until semantic analysis (AFTER everything has been parsed). This means that, at the time of parsing, you cannot guarantee that you will know the type of anything (and that tracks, since parsing shouldn't really do anything based on types other than parse them). Because of this, we can't conditionally parse a call or a binary expression based on the type of the lhs, and we can't just collect these parsed expressions and figure it out in semantic analysis, because each path would further change the parsed tree (i.e. precedence wouldn't apply properly).
Okay, so, our rule (mentioned above) is this: binary expressions are parsed before call arguments. So, if a token that could be either a unary prefix operator or a binary infix operator directly follows a parsed expression, it is treated as the binary infix operator. Notice that this is different from precedence: the token could either be a binary operator that takes the expression we just parsed as its left hand side (and so should be parsed now), or a unary prefix operator that begins a further expression (and so should not be parsed until we parse that next expression). The rule always picks the former.
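As a token-level approximation of that rule, the following Python sketch labels each dual-role token by looking only at the token before it. The helper names (tokenize, ends_expression, classify) are made up for this example, and a real parser decides this structurally rather than by peeking at the previous token; the sketch just shows that no type information is needed.

```python
import re

# Assumption for this sketch: only these two tokens play both roles.
DUAL_ROLE = {"&", "-"}


def tokenize(source):
    """Split a source string into identifiers, numbers, and single-character symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def ends_expression(token):
    """True if `token` can be the last token of an already-parsed expression."""
    return token[0].isalnum() or token[0] == "_" or token in (")", "]")


def classify(source):
    """Label each dual-role token as 'binary' or 'unary' using only the previous token."""
    labels = []
    previous = None
    for token in tokenize(source):
        if token in DUAL_ROLE:
            after_expr = previous is not None and ends_expression(previous)
            labels.append((token, "binary" if after_expr else "unary"))
        previous = token
    return labels


print(classify("blah &x;"))    # [('&', 'binary')]
print(classify("blah (&x);"))  # [('&', 'unary')]
```

In the first input the & directly follows the expression blah, so it is taken as bitwise AND; in the second it directly follows an opening parenthesis, so it can only be the address-of operator.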
Sadly, this means the price of an unambiguous grammar is that this specific case needs parentheses if we want it to be a call: the argument has to be written as (&x). Do note that the parentheses have nothing to do with it being parsed as a call; they are just a way to group expressions explicitly so that precedence doesn't come into effect.
A similar issue applies to subscript not knowing how tight to bind with respect to calls. Is foo 42[0]; a subscript of the call foo 42, or is it a call of foo with a 42[0] subscript argument?
As you can see, there would be an ambiguity here if not for the following rule: a binary expression may not directly contain a call expression. If you want to apply a binary expression to a call, group the call's expressions with parentheses and subscript the paren expression.
So, binary expressions actually only accept a subset of Glint's expressions (ones that aren't calls). In practice, this falls out of the simple ordering of gathering binary operators before call arguments during parsing: because the binary expression binds more tightly and is parsed first, a call has not yet been parsed at that point, so it can't directly appear as a child of a binary expression.
lhs = ParsePrimary();
lhs = ParseBinary();
if (at separator) return lhs;
lhs = GatherCallArguments();
This means, for our subscript example, foo 42[0], foo is parsed as a primary expression (identifier), there is no valid binary operator and no expression separator, so we treat foo as a callee and begin gathering call arguments. We parse 42 as our primary, and, now, there is a valid binary operator ([ for the subscript operation), so we parse that with the lhs being 42. As you can see, a binary operator can never have a call as its left hand side, because a call hasn't been parsed yet. This effectively means that the lhs of a binary expression is not allowed to be a call expression, which reduces the ambiguity in the grammar.
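The walkthrough above can be sketched as a toy parser in Python. This is an illustration of the ordering only, not LCC's parser. The Parser class and its method names are hypothetical. The operator set is a tiny subset, every binary operator is left-associative with the same precedence, and arguments are assumed to be separated by commas, as in the bar 27, 42 example.

```python
import re

BINARY_OPS = {"+", "-", "&"}       # assumption: tiny operator subset, one precedence level
UNARY_OPS = {"-", "&"}             # tokens that are also prefix operators
SEPARATORS = {";", ",", ")", "]"}  # tokens that end the current expression


class Parser:
    def __init__(self, source):
        self.tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
        self.pos = 0

    def peek(self):
        # Running out of tokens is treated like reaching a `;`.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ";"

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_primary(self):
        token = self.advance()
        if token == "(":
            inner = self.parse_expression()  # parentheses "escape" to the full grammar
            assert self.advance() == ")", "expected `)`"
            return ("paren", inner)
        if token in UNARY_OPS:
            return ("unary", token, self.parse_primary())
        return ("atom", token)

    def parse_binary(self, lhs):
        # Subscript is handled alongside the binary operators, as in the article.
        while self.peek() in BINARY_OPS or self.peek() == "[":
            op = self.advance()
            if op == "[":
                index = self.parse_expression()
                assert self.advance() == "]", "expected `]`"
                lhs = ("subscript", lhs, index)
            else:
                lhs = ("binary", op, lhs, self.parse_primary())
        return lhs

    def parse_expression(self):
        lhs = self.parse_binary(self.parse_primary())
        if self.peek() in SEPARATORS:
            return lhs
        # Anything left over is a call argument. Each argument may be a binary
        # expression, but never a call: calls are only built here, afterwards.
        args = [self.parse_binary(self.parse_primary())]
        while self.peek() == ",":
            self.advance()
            args.append(self.parse_binary(self.parse_primary()))
        return ("call", lhs, args)


def parse(source):
    parser = Parser(source)
    tree = parser.parse_expression()
    assert parser.advance() == ";", "sketch only handles one `;`-terminated expression"
    return tree


print(parse("foo 42[0];"))     # call of foo with the subscript 42[0] as its argument
print(parse("blah 27 + 42;"))  # call of blah with the binary expression 27 + 42
print(parse("blah &x;"))       # binary & with blah and x as operands, not a call
print(parse("blah (&x);"))     # call of blah with the parenthesized address-of x
```

Because parse_binary always runs before any call arguments are gathered, the left hand side handed to it is never a call, which is the restriction described above.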
So - and &, which are both unary prefix and binary infix operators, "can't" appear as unary operators as the direct first argument to a function call. They can, however, appear as unary operators as the direct child of a paren expression, so we "escape" to the full Glint grammar using parentheses. This allows Glint to be as parentheses-less as possible, unless they are absolutely needed to disambiguate the expression.
I am currently considering changing the bitwise AND operator to be bit& instead of just &, so that parentheses wouldn't be needed in the address-of case (since that tends to be somewhat common).
If you are interested in how this sort of grammar would be parsed, there is the LCC implementation in C++, and a tree-sitter grammar declared in JavaScript with a very understandable structure.