Sharedvar unboxing is a little difficult, since there are two kinds of mutated local variables: those shared with closures, and those that are just parameters mutated locally:
(let kept ()
  (set keeper
    (fn (x)
      (set kept (cons x kept)))))
; vs.
(set fooer
  (fn (x)
    (set x (rev x))
    (do-something x)))
So basically we need two types of local-variable-set primitives: one for closure-variable-set (the first case above) and another for parameter-variable-set (the second case).
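To make the distinction concrete, here's a minimal C sketch, assuming a made-up runtime (obj, closure_t, CLOSURE_SET, LOCAL_SET are hypothetical names, not arc2c's): a variable shared with a closure has to be written through the closure record, while a plain parameter is just a local slot.

#include <stdio.h>

typedef long obj;

typedef struct {
    int pc;        /* which function this closure dispatches to */
    obj vars[4];   /* captured ("shared") variables live here   */
} closure_t;

/* closure-variable-set: write through the closure pointer */
#define CLOSURE_SET(c, i, v) ((c)->vars[(i)] = (v))

/* parameter-variable-set: a plain assignment, no indirection */
#define LOCAL_SET(slot, v)   ((slot) = (v))

int main(void) {
    closure_t keeper = { 1, { 0 } };
    obj x = 42;

    CLOSURE_SET(&keeper, 0, x);   /* like (set kept ...) above */
    LOCAL_SET(x, 99);             /* like (set x ...) above    */

    printf("%ld %ld\n", keeper.vars[0], x);
    return 0;
}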
Re: Stalin - is it that slow? Meaning, would an order of magnitude improvement in compile time be needed to make it comfortable?
Edit:
Type inference: well, I can't think of a good way of doing type inference generically, but it's certainly possible for e.g. '+. '+ requires that the parameters are all numbers, all strings, or all lists, so if we can determine that one parameter is of a specific type, we can insert checks that the other parameters are of that type and bind the + directly to the type-specific primitive.
For example, if we have %n+ for numeric addition, %s-join for string concatenation, and %l-join for list concatenation:
(+ x y z)
=>
(+ x y z) ; can't determine type
(+ 1 x)
=>
(%n+ 1 (let check x
         (if (is (type check) 'num)
             check
             (err "+: type mismatch"))))
(+ (list 1 2 3) x)
=>
(%l-join (list 1 2 3)
         (let check x
           (if (is (type check) 'cons)
               check
               (err "+: type mismatch"))))
Stalin might be the most optimizing but slowest functional language compiler ever written.
Sharedvar unboxing is not an important issue because it doesn't make much difference in efficiency. Most Scheme programs don't update local variables very often. The most useful optimizations are (in my opinion): special treatment of let and letrec, inlining, and known-function detection.
General ML-style type inference for Scheme is impossible. What we can do is infer as many types as possible.
1) Transforming let and letrec to ((fn (...) ...) ...) is not efficient. First, it allocates a closure. Second, it performs a function call. Instead, the variables bound by let and letrec should be allocated on the stack, and no function call is needed.
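A rough C sketch of the cost difference (a hypothetical runtime, not arc2c output): compiling (let x 42 ...) as ((fn (x) ...) 42) heap-allocates a closure and makes an indirect call, whereas treating x as a stack slot does neither.

#include <stdio.h>
#include <stdlib.h>

typedef long obj;
typedef obj (*fnptr)(obj *env, obj arg);

typedef struct { fnptr code; obj env[1]; } closure;

/* the let body, e.g. (+ x 1) */
static obj body(obj *env, obj x) { (void)env; return x + 1; }

/* Naive: build a closure on the heap, then call it with 42. */
static obj let_as_call(void) {
    closure *c = malloc(sizeof *c);   /* heap allocation */
    c->code = body;
    obj r = c->code(c->env, 42);      /* indirect call   */
    free(c);
    return r;
}

/* Better: the let-bound variable is just a stack slot. */
static obj let_on_stack(void) {
    obj x = 42;                       /* stack allocation, no call */
    return x + 1;                     /* body compiled in place    */
}

int main(void) {
    printf("%ld %ld\n", let_as_call(), let_on_stack());
    return 0;
}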
2) For example, in:
(f x)
If f is statically known, and f's environment is null or is the same as the environment of the calling site, then the function call should be a direct jump. That eliminates the cost of (1) the global fetch of 'f, (2) extracting the address and environment, (3) switching the environment, and (4) an indirect jump.
With known-function detection, tail-recursive functions can be compiled to exactly the same code as loops in imperative languages.
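For illustration, here is roughly what that means in C (a hand-written sketch, not actual arc2c output): the Arc-like definition in the comment becomes a loop, with the tail call turned into a goto to the function's own entry label.

#include <stdio.h>

/* (def sum-to (i acc) (if (is i 0) acc (sum-to (- i 1) (+ acc i)))) */
static long sum_to(long i, long acc) {
entry:                        /* the known function's entry label */
    if (i == 0) return acc;
    acc = acc + i;            /* rebind the arguments ...          */
    i = i - 1;
    goto entry;               /* ... and jump: no fetch of 'sum-to,
                                 no closure, no indirect call      */
}

int main(void) {
    printf("%ld\n", sum_to(10, 0));   /* prints 55 */
    return 0;
}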
1) Given that everything is transformed to CPS, pretty much everything - including sequences I think - ends up being a function call. In fact, only jumps exist at all.
Not sure how the closure-conversion works. This may be workable; would you be willing to work on it?
2) I get this now, although I'm not sure how to translate this into the current Arc2c output. I'll discuss the current arc2c calling convention and you tell me how workable your proposal is.
------
Currently, arc2c output works out like this:
1) Each function is simply a case in a large switch statement:
jump: switch(pc){
    case 0:
        ...code for function 0...
    case 1:
        ...code for function 1...
}
2) There exists a "stack" which is not the C-stack.
3) At the start of each function (with the exception of function 0, which is the top-level), the stack contains [0] the closure for the current function, [1] the continuation, and [2+] the Arc parameters. This is ensured by the calling function.
4) Functions are passed around as closure structures. The first element of the closure structure is a non-encoded number representing the case for that function, while the rest is simply an array of closure variables.
5) Functions simply use the stack for temporary scratch space. For example, this is how (+ 1 2) would compile:
PUSH(FIX2OBJ(1));
PUSH(FIX2OBJ(2));
ADD();
6) Just prior to calling a function, the calling function pushes the parameters in order: [0] the closure (the function to call), [1] the continuation, [2+] the arguments. The number of elements N for the call is computed by the compiler.
7) Then at the function call, the calling function copies the top N elements of the stack into the bottommost N elements, and assures that sp = &stack[N]. Then it sets the C-variable pc to the closure's function field, and does a C goto jump;
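To make the convention concrete, here's a compressed, hand-written model of it (hypothetical code in the spirit of the description above, not the real arc2c output, and with values left unencoded): the CALL macro copies the top N stack elements to the bottom, resets sp, sets pc from the callee closure, and jumps back to the dispatch switch.

#include <stdio.h>
#include <stdint.h>

typedef intptr_t obj;                              /* unencoded, for brevity  */
typedef struct { int fn; obj vars[2]; } closure;   /* [0] is the case number  */

static obj stack[64];
static obj *sp = stack;

#define PUSH(x) (*sp++ = (obj)(x))

/* copy the top N elements to the bottom, reset sp, dispatch on the callee */
#define CALL(n)                                         \
    do {                                                \
        for (int i = 0; i < (n); i++)                   \
            stack[i] = sp[i - (n)];                     \
        sp = &stack[(n)];                               \
        pc = ((closure *)stack[0])->fn;                 \
        goto jump;                                      \
    } while (0)

int main(void) {
    closure adder = { 1, { 0 } };   /* "function 1"                 */
    closure haltk = { 2, { 0 } };   /* halting continuation, case 2 */
    int pc = 0;

jump:
    switch (pc) {
    case 0:                         /* top level: call adder on 1 and 2     */
        PUSH(&adder);               /* [0] closure to call                  */
        PUSH(&haltk);               /* [1] continuation                     */
        PUSH(1); PUSH(2);           /* [2+] arguments                       */
        CALL(4);
    case 1:                         /* entry: [0] self [1] cont [2] a [3] b */
        PUSH(stack[1]);             /* call the continuation ...            */
        PUSH(stack[1]);             /* (its continuation slot, unused here) */
        PUSH(stack[2] + stack[3]);  /* ... with the sum as its argument     */
        CALL(3);
    case 2:                         /* halt: print [2] and stop             */
        printf("%ld\n", (long)stack[2]);   /* prints 3 */
        return 0;
    }
    return 1;
}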
Well, I only have experience of writing direct-style compilers, not CPS-style ones, so my advice needs to be adapted.
But from the mechanism of the current arc2c output you showed above, I see many places for improvement:
1) In a function:
(fn (x y z ...) (g A B C D ...)),
if B doesn't rely on x, C doesn't rely on x and y, D doesn't rely on x, y, and z, etc., the calling function could avoid copying elements to the bottom. Instead, it moves the stack pointer to the bottom first and then pushes the arguments.
2) For functions having no environment, we don't have to push a full closure; we just have to push the pc.
3) For known functions, we just do a C goto, not to the jump label, but to the (case n), because C cases are in fact labels.
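Regarding 3), one portable way to express this in C is to put an ordinary label beside the case, since goto can only target named labels; a tiny sketch (not arc2c output):

#include <stdio.h>

int main(void) {
    int pc = 2, x = 0;

jump:
    switch (pc) {
    case 1: fn_1:               /* ordinary label placed on the case     */
        printf("%d\n", x);      /* "function 1" prints its argument      */
        return 0;
    case 2:                     /* known call to function 1:             */
        x = 42;
        goto fn_1;              /* skip setting pc and re-running switch */
    }
    goto jump;                  /* unknown calls still dispatch here     */
}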
Finally, in my opinion, a CPS-style compiler is no longer a better choice nowadays. It complicates the source, the debugging information and the (human) analysis of the program structure. Since we are already using a separate stack that is different to C's, continuations can be implemented in direct-style compilers as easily as in CPS-style ones. And codegen for direct-style compilers is just slightly more difficult, which isn't an issue. In addition, a naive direct-style compiler performs much better than a naive CPS-style one. The latter needs a source simplifying step to eliminate unnecessary closures and function calls produced by CPS conversion.
1) personally I think this is a rare case, but I could be wrong
2) arc2c closures are very lightweight: it's just a simple array of obj(s), with the first obj being the pc. So in effect for functions having no environment, we are pushing a pointer to the pc.
That said, closures are also used to represent functions that can be passed around. Unfortunately closures are currently untyped, so we expect the current closure style to be changed.
Also we need to support the possibility that a "function" being called isn't really a function: after all table syntax is just (tb key). And this is perfectly valid Arc:
(let sometable (table)
  (each k lst
    (= sometable.k (generate-something k)))
  (map sometable ; yes, we're passing a table as if it were a function!
       foolst))
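Here's a rough C sketch of what that implies for the call site (all names here - tag, apply1, the structs - are made up for illustration, not arc2c's actual runtime): the generic apply has to check the callee's type and treat a table as a lookup instead of a jump.

#include <stdio.h>

typedef enum { T_CLOSURE, T_TABLE } tag;

typedef struct { tag t; long (*code)(long); } closure;
typedef struct { tag t; int n; long keys[4]; long vals[4]; } table;

/* generic one-argument apply: dispatch on what we were actually given */
static long apply1(void *callee, long arg) {
    if (*(tag *)callee == T_CLOSURE)
        return ((closure *)callee)->code(arg);   /* a real call       */
    table *tb = callee;                          /* (tb arg): lookup  */
    for (int i = 0; i < tb->n; i++)
        if (tb->keys[i] == arg)
            return tb->vals[i];
    return -1;                                   /* stands in for nil */
}

static long double_it(long x) { return 2 * x; }

int main(void) {
    closure f = { T_CLOSURE, double_it };
    table   t = { T_TABLE, 2, { 5, 7 }, { 50, 70 } };

    /* (map f ...) and (map t ...) both go through the same apply */
    printf("%ld %ld\n", apply1(&f, 21), apply1(&t, 7));
    return 0;
}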
3) I was actually thinking of this too, although I haven't gotten around to it.
re: CPS: I wouldn't really know. Me, I'm just hacking around at the transformations before the CPS and Closure conversions. Because of the somewhat modular construction of arc2c, in theory you could write a drop-in replacement for the CPS and Closure conversions, as well as the code generator, and we could then offer either CPS or the direct style as an option, maybe.
1) It's not a rare case. It's important for the speed of most useful programs. For example, map and foreach, which are used quite often, can be optimized by not copying data on the stack.