Arc Forum
1 point by rkts 5874 days ago

Here's mine. It seems to be faster than the versions presented so far, assuming I haven't made any dumb mistakes. You'll have to put

  (xdef 'sub substring)
in your ac.scm first.

  (= wordfile "big.txt")

  (= *nwords* (table))

  (w/infile f wordfile (whilet l readline.f (counts (tokens downcase.l whitec) *nwords*)))

  (= str [coerce _ 'string])

  (= alphabet (accum a (each c "abcdefghijklmnopqrstuvwxyz" (a str.c))))

  (def edits1 (w)
    (let n len.w
      (accum a
        (for i 0 (- n 1) (a:+ (sub w 0 i) (sub w (+ i 1))))
        (for i 0 (- n 2) (a:+ (sub w 0 i) (str:w (+ i 1)) (str:w i) (sub w (+ i 2))))
        (for i 0 (- n 1) (each c alphabet (a:+ (sub w 0 i) c (sub w (+ i 1)))))
        (for i 0 n       (each c alphabet (a:+ (sub w 0 i) c (sub w i)))))))

  (def known (ws) (keep [*nwords* _] ws))

  (def known-edits2 (w)
    (accum a (each e1 edits1.w (each e2 edits1.e1 (if *nwords*.e2 a.e2)))))

  (def correct (w)
    (best (compare > [*nwords* _])
          (or (known:list w) (known:edits1 w) (known-edits2 w))))
The main optimizations I made were replacing cut with sub, replacing string with +, and dropping the dedup.
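For comparison, the four loops in edits1 generate the same four candidate sets as Norvig's Python corrector: deletions, transpositions, replacements, and insertions. A rough Python rendering of that logic (the names are mine, and like the Arc version it skips deduplication) might look like:

```python
# Rough Python equivalent of the Arc edits1 above (names are illustrative).
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def edits1(w):
    n = len(w)
    deletes    = [w[:i] + w[i+1:]                 for i in range(n)]
    transposes = [w[:i] + w[i+1] + w[i] + w[i+2:] for i in range(n - 1)]
    replaces   = [w[:i] + c + w[i+1:]             for i in range(n) for c in ALPHABET]
    inserts    = [w[:i] + c + w[i:]               for i in range(n + 1) for c in ALPHABET]
    # No dedup, matching the optimization described above.
    return deletes + transposes + replaces + inserts
```

Without dedup, a word of length n always yields n + (n-1) + 26n + 26(n+1) = 54n + 25 candidates.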


1 point by rkts 5874 days ago

Also, here's a version which uses iterators to reduce consing. It's ~2x faster on long words (again, assuming no blunders on my part).

  (= wordfile "big.txt")

  (= *nwords* (table))

  (w/infile f wordfile (whilet l readline.f (counts (tokens downcase.l whitec) *nwords*)))

  (= str [coerce _ 'string])

  (= alphabet (accum a (each c "abcdefghijklmnopqrstuvwxyz" (a str.c))))

  (def edits1 (w)
    (fn (f)
      (let n len.w
        (for i 0 (- n 1) (f:+ (sub w 0 i) (sub w (+ i 1))))
        (for i 0 (- n 2) (f:+ (sub w 0 i) (str:w (+ i 1)) (str:w i) (sub w (+ i 2))))
        (for i 0 (- n 1) (each c alphabet (f:+ (sub w 0 i) c (sub w (+ i 1)))))
        (for i 0 n       (each c alphabet (f:+ (sub w 0 i) c (sub w i)))))))

  (def editsn (n w)
    (let ws [_ w]
      (for i 1 n (= ws imappend.edits1.ws))
      (iornil:ikeep [*nwords* _] ws)))

  (def correct (w)
    (aif (or (editsn 0 w) (editsn 1 w) (editsn 2 w))
      (ibest (compare > [*nwords* _]) it)))

  ; iterator utils

  (def iempty (i) (ccc (fn (c) (i [c nil]) t)))

  (def iornil (i) (and (~iempty i) i))

  (def imappend (r i) (fn (f) (i [r._ f])))

  (def ikeep (p i) (fn (f) (i [if p._ f._])))

  (def ifoldl (r q i) (i [= q r.q._]) q)

  (def ibest (gt i) (ifoldl (fn (y x) (if (or no.y gt.x.y) x y)) nil i))
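These utilities implement internal (callback-style) iterators: an "iterator" here is just a function that takes a consumer f and calls it once per element, so no intermediate lists are ever built. In Python the same allocation-avoiding idea is usually expressed with generators; a rough sketch of the analogous utilities (my names, mirroring the Arc ones, not the author's code) could be:

```python
# Generator-based analogues of the Arc iterator utilities above (a sketch).

def imappend(r, it):
    # Like the Arc imappend: map r over the iterator and splice each
    # result's elements in lazily, with no intermediate list.
    for x in it:
        yield from r(x)

def ikeep(p, it):
    # Lazy filter, like the Arc ikeep.
    return (x for x in it if p(x))

def ibest(gt, it):
    # Fold to the best element under comparison gt, like the Arc ibest.
    best = None
    for x in it:
        if best is None or gt(x, best):
            best = x
    return best
```

The design point is the same in both languages: composing these lazily means edits2 candidates are produced and tested one at a time rather than materialized all at once.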

-----

4 points by rkts 5847 days ago

Well, not that anyone cares, but it turns out that the changes I made here do close most of the performance gap between Arc and Python. Python is still faster, but it's nothing to lose sleep over. The most important change was replacing Arc's cut with MzScheme's substring, which works the same way but is implemented in C. The other changes were important too, though, especially replacing lists with iterators. So I guess the lessons are

1. Any Arc library function that has an MzScheme equivalent should be rewritten to use the MzScheme version.

2. Using iterators to reduce consing is a win. So, maybe we should make a library of common iterator utilities (map, filter, etc.)?
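As a footnote on lesson 2: on the Python side, the lazy built-ins map and filter plus the itertools module already form such a library, and chaining them avoids intermediate lists in the same way the Arc iterators above do. A tiny illustrative pipeline (all names here are hypothetical, not from the corrector):

```python
# Lazy pipeline sketch: no intermediate lists are materialized until islice.
from itertools import chain, islice

def candidates(words):
    # Lazily expand each word into a couple of variants (illustrative only).
    return chain.from_iterable((w.upper(), w + "s") for w in words)

pipeline = filter(lambda w: len(w) > 2, candidates(["an", "cat"]))
first_two = list(islice(pipeline, 2))
```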

-----