No account? Create an account

Mixfix parsing / chain-associative operators

« previous entry | next entry » 16th May 2013 | 00:25

Earlier this evening I was reading about parsing user-defined mixfix operators in Agda.

Mixfix operators are actually quite common though they are usually called something else. The Danielsson and Norell paper describes operators using a notation in which underscores are placeholders for the operands, so (using examples from C) we have

• postfix operators
• _ ++
• _ .member
• prefix operators
• ! _
• & _
• infix operators
• _ + _
• _ == _

There are also circumfix operators, of which the most common example is (_) which boringly does nothing. Mathematical notation has some more fun examples, such as ceiling ⎡_⎤ and floor ⎣_⎦.

Mixfix operators have a combination of fixities. C has a few examples:

• _ [ _ ]
• _ ? _ : _
You can also regard function call syntax as a variable-arity mixfix operator :-)

The clever part of the paper is how it handles precedence in a way that is reasonably easy for a programmer to understand when defining operators, and which allows for composing libraries which might have overlapping sets of operator definitions.

One thing that their mixfix parser does not get funky about is associativity: it supports the usual left-, right-, and non-associative operators. One of my favourite syntactic features is chained relational operators, as found in BCPL and Python. (For fun I added the feature to Lua - see this and the following two patches.) You can write an expression like

```    a OP b OP c OP d
```
which is equivalent to
```    a OP b && b OP c && c OP d
```
except that each operand is evaluated at most once. (Though unfortunately BCPL may evaluate inner operands twice.) This is not just short-cutting variable-arity comparison because the operators can differ.

So I wonder, are there other examples of chain-associative operators? They might have a different underlying reduction operator instead of &&, perhaps, which would imply different short-cut behaviour.

Perhaps an answer would come to mind if I understood more of the category-theory algebraic functional programming stuff like bananas and lenses and what have you...

from:simontdate: 16th May 2013 08:26 (UTC)Permalink

The most fiddly circumfix operator in mathematics is the modulus/norm sign |_|, of course, because of the enormous parsing headache arising from the opening and closing parts of it being the same!

It occurred to me in a whimsical moment recently that if "a++" increments a from 3 to 4 and returns 3, and "++a" increments a from 3 to 4 and returns 4, then by interpolation "+a+" ought to increment a from 3 to 4 and return 3.5 :-) (Which also suggests a model of array index and pointer variables in which they can point to either an element or a gap between elements, with the latter represented as integers and the former as integer-and-a-half, so that *+p+ increments p from one gap to the next and returns the element it passed over!)

from:fanfdate: 17th May 2013 09:20 (UTC)Permalink

That headache is I think why I dislike the current popularity of Smalltalk-style |_,_| argument brackets - e.g. http://static.rust-lang.org/doc/rust.html#lambda-expressions and http://www.robertsosinski.com/2008/12/21/understanding-ruby-blocks-procs-and-lambdas/ and it frequently comes up as a suggested feature for Lua http://lua-users.org/wiki/ShortAnonymousFunctions

from:simontdate: 16th May 2013 08:36 (UTC)Permalink

Chain-associative operators: if what you mean by that is simply "something where `a OP b OP c` is something distinct from both of `(a OP b) OP c` and `a OP (b OP c)`", then Python's comma operator sort of counts, in that you can write `a,b` to combine two things and get back a 2-tuple, but writing `a,b,c` gives you a 3-tuple rather than a 2-tuple one of whose elements is another 2-tuple.

Other associativity oddities: when I've found myself defining expression syntaxes in the last few years I've tended to rule that `&&` and `||` occupy two distinct but non-nested precedence levels and hence do not associate at all with each other – so you can write `a && b && c` and `a || b || c`, but writing `a && b || c` is a parse error and you must add parentheses to disambiguate. I decided this because in my experience gcc's warning "suggest parentheses around `&&` within `||`" is only right about half the time and the other half is a patronising way of saying "you need parentheses around the `||`"...

from:cartesiandaemondate: 16th May 2013 11:22 (UTC)Permalink

"My compromise proposal for 0.5-indexed counting was rejected without, I thought, due consideration" :)

from:fanfdate: 17th May 2013 09:27 (UTC)Permalink

Hmm, I think tuple constructors are better viewed as variable arity operators, since there aren't multiple different operators in play, as in min <= index < max.

The paper is quite keen on non-associative operators. I wonder if Agda uses them a lot, and how much they help in that kind of code.

I have been reminding myself about top-down operator precedence parsing (aka Pratt parsers) which seem to make it hard to support non-associative operators - though they can do mixfix rather nicely.

from:simontdate: 17th May 2013 09:35 (UTC)Permalink

I'm also reminded that an acquaintance of mine at Trinity designed a variant of BASIC (no, I don't know why), and wrote an interpreter for it, in which "min <= index < max" was a valid operand to the for statement – you'd write things like "for 0 <= i < 10". :-)

from:fanfdate: 17th May 2013 09:41 (UTC)Permalink

Oh yes, thanks for reminding me of that! Fun :-)

from:bellinghmandate: 16th May 2013 11:03 (UTC)Permalink

The C++ streaming idiom:

os << "Hello " << name << ", how are you today?";

Also function chaining such as:

obj.foo().bar().baz();

where each method returns a new reference on which a new method is called. (And in fact the streaming idiom is syntactic sugar for this.)

In C, of course: start->next->next->next;

from:deborah_cdate: 16th May 2013 12:07 (UTC)Permalink

I think that's different: in Python, the chained relational operators are something that has to be understood specially by the parser -- consider (-3 < -2) < -1 vs -3 < -2 < -1 -- whereas the C++ stream operators are simply ordinary left-associative operators.

from:ewxdate: 16th May 2013 12:31 (UTC)Permalink

While that's true, there's a guarantee that the return value of the base operator is the same as the LHS, which makes it a bit different from e.g. an ordinary addition operator.
Specifically, the transformation is A op B op C -> A op B rop A op C, rather than A op B op C -> A op B rop B op C. (Where rop='reduction operator', to use Tony's terminology.)

from:fanfdate: 17th May 2013 09:32 (UTC)Permalink

Those are all normal left-associativity: a OP b OP c is evaluated as (a OP b) OP c. There's no implication that a OP b and b OP c are evaluated separately and subsequently combined in some way.