Arc Forum | Well, you just have to study type inference. ^^ Haskell is a good example ^^.re...

Arc Forum

4 points by almkglor 6433 days ago | link | parent

Well, you just have to study type inference. ^^ Haskell is a good example ^^.

re: "n could have been changed by a function call inside the optimized function". This normally can't happen on a function call. This can only happen in the following cases:

1) n is a global variable, in which case you can't safely optimize on it anyway

2) n is enclosed both by this function and the function it calls - e.g. if the function it calls is a subfunction that can see this local variable - and the function it calls does 'set on that variable.

3) if the function itself does 'set on that variable.

Otherwise, 'n cannot possibly change (i.e. purely functional programming)

Note also that you simply need to create a really big data structure to handle each part of the program.

Say you have the following canonical definition:

  (def fib (n)
    (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

This defines an infinite set of functions, where (type n) is all the possible types (including types that haven't been defined yet, which is why it's an infinite set). Most of those functions will really be (err "<: expects <number>, got ..."), but hey ^^.

(of course, infinite sets can be modelled using lazy eval.)

Now let's also consider the "+ could have been modified". In most cases this happens only if someone has a new type. Otherwise, any modification of '+ which doesn't involve someone defining a new type is probably someone screwing your system, in which case we assume that it's screwed anyway so you might as well ignore that possibility (and consider how to stop someone screwing your system in the first place).

But if the modification of + is itself based on having a new type, then we should also include '+ itself in the optimizing type system. So now '+ itself represents an infinite set of functions, based on (type n). Adding a definition for '+ based on type must thus extend the lazy set of '+ functions.

Thus at compilation, we only have to detect what gets called at top-level (that is, the repl). We can trivially determine the types of the arguments at top-level (just do 'type on each of them!).

Now any variables defined in the called function will be either the arguments themselves or will be other local variables, based on computations. Those computations will simply be a call to another function, based on the arguments (and globals).

For the functions called within other functions, we simply analyze them downwards and determine: if a function is given type a, what is the output type? Then we can determine the type of local variables.

We can determine the output type of the called functions by simply recursively determining what the possible outputs are. The output of the function is always the result of the last expression in the function and we can then determine the type from there. If the last expression is some sort of conditional, then it could be the output of the types from any of them.

Then we eventually reach one of a set of basic functions. These basic functions would have defined exactly what type it returns, based on its inputs. For example '+ would be applied to a list of stuff, [a], and return a type of a. (obviously such basic functions can also be defined by the user - all you have to do is support some sort of a -> a syntax, like in Haskell). Basically, '+ promises that if given a list of objects of type a, it will return an object of type a, or die trying (i.e. raise an exception).

Then we can select which '+ to use, based on the actual type we've determined.

So the flow on your 'fib function would be:

1) 'fib receives an object of type a. If it's received an object of this type before, it will use the function created for that type.

2) otherwise, the analyzer goes through the 'fib code, looking for locals and temporaries (i.e. return values of functions, which will be temporarily placed in a stack-like location) and determining their types.

3) the analyzer reaches (if ...) and enters into each expression in it

4) it sees (< n 2) and determines that < is a (a,b) -> t/nil basic function. It then knows that the 'if it has to use just has to handle t and nil, and doesn't have to handle random objects (could be useful nfo, who knows? maybe some computer hardware somewhere can branch based on that faster than normal)

5) since the last expression in 'fib is an 'if form, it knows that 'fib itself returns several possible types. The first type is (type n) itself.

6) The other type is the type returned on '+. It knows that the type returned by '+ is the same type as all its inputs.

7) The type input into '+ is the type of 'fib. Since we're analyzing 'fib itself, we use a separate type, b.

8) The input into 'fib is the output of '-, which is also a basic function whose type is the same as its inputs. Since the inputs are of (type n), we know that the input to 'fib is therefore (type n).

9) The hard part: how do we know that 'fib therefore is a b -> b function (i.e. returns a b, given a b)?

1 point by sacado 6433 days ago | link

Hmm... I think I'll have to read that more deeply later on... There are obviously interesting ideas in it, but hard to implement... Not sure I understood everything yet...

Currently, I'm working on a naïve approach : it supposes native numerical functions (only numerics as for now) are not redefined, or, at least, that they are redefined in a conservative way (that is, as you mention it, e.g. + was redefined to be able to add apple, a user type, but still works the regular way with numbers), and it knows that, if a numerical operation is called only with numbers :

- it can use its inlined version,

- the result will be a number too, so nested numerical operations can be taken into account too.

For example, when we call (fib 30), the compiler knows the n arg and literal numbers are numbers, so (- n 1) and (- n 2) and (< n 2) are numbers too and this gets translated into (n- n 1), (n- n2) and (n< n 2). However, it cannot know (yet) (fib (n- n 1)) is a number, so the final sum cannot rely on the inlined + :

  (gen-ad-hoc (listtab '((n int))) '(fn (n)
    (if (< n 2)
      n
      (+ (fib (- n 1) (fib (- n 2)))))))

  -> (fn (n) (if (n< n 2) n (+ ((fib (n- n 1) (fib (n- n 2)))))))

The gen-ad-hoc function generates the ad hoc code, based on the fact that n is an int. I still have a few bugs (too many parens somewhere), but that's a good start :)

-----