Arc Forumnew | comments | leaders | submitlogin
1 point by almkglor 6122 days ago | link | parent

>> Try this: ...

Doesn't help. I inserted (map idfn ...) on the value given to enformat, which is really the currying of enformat-base (the very short (fn ) at the end):

  (fn (p)
    (carry-out (parse (many formatting) (map idfn p)))))
Hmm. I tried refactoring the (apply alt (map str-seq ...)) thing to an alt-str-seq, basically I created a trie and then did alt on each key, with each sub-trie another 'alt and stored sequences converted to 'seq, and stored end-of-strings (nil keys) as 'nothing, but it was buggy and didn't improve performance T.T.

I've been doing quite a few attempts at improving performance but often I ended up getting more bugs, waa!



2 points by raymyers 6122 days ago | link

Hmm. I just tried replacing the scanner. Just as you also found, it didn't help performance.

  (def scanner-string2 (s (o start 0) (o end (len s)))
    (map idfn (scanner-string s start end)))
I'll see if I can find time soon to digest the wiki-arc grammar. Possibly it could be optimized, but I suspect treeparse could be handling it better. There are probably lessons to be learned from how Parsec gets good performance.

-----

1 point by almkglor 6121 days ago | link

Found the parsec paper:

http://legacy.cs.uu.nl/daan/download/papers/parsec-paper.pdf

http://legacy.cs.uu.nl/daan/pubs.html#parsec

It seems that part of Parsec's optimization is that it is actually limited to an LL(1) grammar (whatever that means) and it's <|> ('alt) will fail immediately if the first parser ever consumed anything at all. Not sure how that translates to treeparse.

-----

1 point by raymyers 6121 days ago | link

LL(1) grammars only require one token of look-ahead to parse.

Parsec does not strictly require this, it can handle infinite look-ahead grammars. However, for good performance, it is best to use LL(1) grammars -- so there will be no backtracking required.

When using Parsec, I have often been surprised by the quick-failing behavior of <|> that you mentioned. Thus, I did not duplicate it in treeparse.

-----

1 point by almkglor 6120 days ago | link

Hmm. Apparently it's the fast way of doing it, although I'm not sure how to implement it in the first place.

-----