Arc Forumnew | comments | leaders | submitlogin
2 points by tokipin 6100 days ago | link | parent

i don't know the technical terms, but probably one of the things that gives Lua its speed is that if you have multiple strings in the program that are the same, the VM assigns them the same pointer. string comparisons are therefore trivial and i imagine this mechanism would make table lookup very direct


5 points by absz 6100 days ago | link

That's what Arc's symbols are for. Generally speaking, you should be keying your tables with symbols for exactly that reason: every 'a is at the same place in memory, but every "a" is not (which allows mutable strings).

-----

3 points by kens1 6098 days ago | link

You had me worried, but I'm pretty sure there's absolutely no problem with using strings as the keys for tables.

The MzScheme documentation says: make-hash-table ... 'equal -- creates a hash table that compares keys using equal? instead of eq? (needed, for example, when using strings as keys).

Checking ac.scm, sure enough:

  (xdef 'table (lambda () (make-hash-table 'equal))
Likewise, the "is" operation in Arc uses MzScheme's string=? . (That's a statement, not a question :-) So string comparison works, although in O(n) time.

Net net: strings are okay for comparison and table keys in Arc.

-----

2 points by absz 6098 days ago | link

I wasn't saying that they weren't usable, just that they were, in fact, slower; that's what you observed. Symbol comparison is O(1) time, whereas string comparison is O(n) time. That's all.

-----

3 points by are 6099 days ago | link

I would rather have immutable strings + unification of symbols and strings.

- Any string could have an associated value, like symbols today.

- "foo", 'foo and (quote foo) would be the same object (you would allow Lisp-style prepend-quoting of non-whitespace strings for convenience).

- foo, (unquote "foo") or (unquote 'foo) would then be used to evaluate, so even non-whitespace strings like "bar baz" could act as symbols (but with less convenience, of course, since you would have to use the unquote form to get them evaluated).

- Since such a unified string/symbol would also act as a perfectly good key/value pair, a simple list of such strings will in effect act as a string-keyed hashtable (since duplicate strings in the list would be the same immutable key), and can be used wherever you need symbol tables (e.g. for lexical environments). In fact, string-keyed hash tables would be a subtype of any-sort-of-key hashtables, and probably used much more.

-----

2 points by absz 6099 days ago | link

Right now, you can do (= |x y| 3) to get at oddly-named symbols, or

  arc> (eval `(= ,(coerce "x y" 'sym) 42))
  42
  arc> |x y|
  42
. And by (unquote "foo"), do you mean (eval "foo")? Or do you mean `,"foo"? The latter makes more sense here.

At any rate, I'm not convinced that this is actually a benefit. Strings and symbols are logically distinct things. Strings are used when you want to know what they say, symbols when you want to know what they are. Unifying them doesn't seem to add anything, and you lose mutability (which, though unfunctional, can be quite useful).

-----

3 points by are 6099 days ago | link

Good feedback.

> Strings and symbols are logically distinct things. Strings are used when you want to know what they say, symbols when you want to know what they are.

Fine. But this breaks down anyway when you force people to use (immutable) symbols instead of strings for efficient allocation. When using symbols as keys in hashtables, you do not "want to know what they are", you "want to know what they say".

And unification would possibly have good consequences for simplifying macros and code-as-data (especially if characters are also just strings of length 1). Most code fragments would then literally be strings (most notable exceptions would be numeric literals, list literals and the like).

-----

2 points by absz 6099 days ago | link

Actually, in a hash table, I usually don't care what the key says, any more than I care about the name of the variable used to store an integer. I care about it for code readability, but I'm usually not concerned about getting a rogue key (where I do care what it says). In that case, I would either use string keys or (coerce input 'sym).

I'm not convinced that characters being strings of length one is a good idea... it seems like the "character" is actually a useful concept. But I don't have a huge opinion about this.

Code fragments would still be lists, actually: most code involves at least one function application, and that's a list structure. Only the degenerate case of 'var would be a string.

-----

1 point by are 6098 days ago | link

> Actually, in a hash table, I usually don't care what the key says, any more than I care about the name of the variable used to store an integer.

That's fine again, but my point is just that by using symbols as keys in hashtables, you never care about the value part of that symbol (you just need an immutable key); you're not using the symbol "as intended", for value storage.

> most code involves at least one function application, and that's a list structure.

Yep. But in the case where that function application does not contain another function application (or list literal) in any of its argument positions, we would, with my proposal, be talking about a list of strings, which could then again be seen as a string-keyed hash table...

-----

1 point by absz 6098 days ago | link

Symbols are not "intended" for value storage, symbols happen to be used for value storage. Symbols have exactly the same properties as, say, named constants in C, and thus fit in the same niche. They also have the properties of variable names, and so fit in that niche too. Symbols are a generally useful datatype, and they are no more intended for just "value storage" than cons cells are intended for just list construction.

A list of strings is still a list, which is not what you said; right now, it's a list of symbols, and I don't see the benefit of a list of strings over a list of symbols.

-----

3 points by almkglor 6098 days ago | link

Really, I think most people are confused by the boundary between interface and implementation.

It's possible to have a mutable string interface built on an immutable string implementation. Just add indirection! We can get the best of both worlds: immutability of actual strings optimizes comparison of strings, while the pseudo-mutable strings allow you to pass data across functions by a method other than returning a value (i.e. mutating the string).

-----

1 point by absz 6098 days ago | link

That's a good point. However, it leaves open the question of what "a new string" creates. One can build either one on top of something else (e.g. immutable strings on top of symbols [though one could argue that that's what symbols are, I suppose]), so the real question (it seems to me) is what the default should be.

-----

3 points by almkglor 6098 days ago | link

This is where "code is spec" breaks down, again ^^; \/

I suppose if the user uses symbol syntax, it's an immutable string, while if the user uses "string syntax", it's a mutable string. Interface, anyone? ^^

edit: typical lisps implement symbols as references to mutable strings; some newer ones implement mutable strings as references to immutable strings, with symbols referring also to immutable strings.

-----

3 points by absz 6098 days ago | link

This isn't so much code is spec, though: Arc only has mutable strings and symbols. You could consider symbols immutable strings, but they exist to fill a different niche.[1] If mutable and immutable strings were created, then the code-spec would have to deal with this; I think it would be capable of doing so.

I'm not so concerned with how Lisps represent symbols and (mutable) strings as long as (1) my strings can be modified, and (2) comparing symbols takes constant time. If it's the Lisp interpreter protecting the string-representing-the-symbol, so be it; that doesn't affect me as a Lisp programmer.

[1]: Although if I recall, Ruby 2.0 will make its Symbols "frozen" (immutabilified) Strings, thus allowing things like regex matching on Symbols. This might be an interesting approach...

-----