Arc Forum
Are characters really useful as a separate type?
6 points by randallsquared 6130 days ago | 7 comments
The Python way of having characters just be strings of length one seems simpler and more consistent than having both a character type and a string type. So, with the proposed subseq and indexing additions:

    > ("foobar" 3)
    "b"
    > ("foobar" 2 -2)
    "oba"
Other than offending people who'd be startled that strings are a sequence of strings, is there some reason not to do this?
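
For comparison, here is how the same two lookups behave in Python itself. This is only a sketch of Python's semantics, not of the proposed Arc additions. One difference to be aware of: Python slices exclude the end index, so the second lookup gives "ob" there rather than the "oba" shown above, which suggests the proposal is assuming an inclusive end.

```python
s = "foobar"

# Indexing a str yields another str of length one; there is no char type.
c = s[3]
print(repr(c), type(c).__name__, len(c))   # 'b' str 1

# A length-one string indexes to itself, so the "strings are sequences
# of strings" recursion bottoms out harmlessly.
print(c[0] == c)                           # True

# Slicing is half-open: from index 2 up to, but not including, index -2.
print(repr(s[2:-2]))                       # 'ob'
```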


1 point by are 6129 days ago | link

Would have been interesting to hear PG's thoughts on unifying chars, strings AND symbols...

[Edit: Much updated below this] Here are my 2 cents:

- A string would be either the empty string, or a non-empty sequence of strings of length 1.

- A string would also have an _associated value_ (nil by default), that can be of any type (including functions, macros, special forms or other strings). A string would then also be a key-value pair, and a keyed hashtable could be treated conceptually as a collection of strings with no duplicates.

- A non-whitespace, no-paren (alphanumeric?) string will return its associated value (= be evaluated) in an Arc expression if you _don't quote it_, making such unquoted strings convenient for using as variable or function names. In addition, if such an unquoted string immediately follows a left-paren in an Arc expression, its value will be treated as a function and applied. (All this is different for special forms and macros, of course.)

- To quote a string, you would put double-quotes around it. This keeps it from being evaluated, so it returns itself rather than its associated value. Quoted strings are then string literals, and in this usage, the associated value will often be nil. String literals of length 1 are used as character literals.

- A non-whitespace, no-paren (alphanumeric?) string can alternatively be quoted by prepending a single-quote, in the same manner that you quote a composite Arc expression by prepending a single-quote before the leading left-paren. If you intend to use the associated value of the string, this is the preferred way of quoting it (for clarity).

- In a double-quoted string, you will have to escape parens, double-quotes and single-quotes by prepending a single-quote.

- To return the associated value for a string that includes whitespace or parens (or non-alphanumeric parts?), you would have to do explicit evaluation by using eval with the double-quoted string. This would discourage (but not disallow) using such strings for variables and definitions.
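
To make the idea a bit more concrete, here is a rough Python sketch of a string that carries an associated value, with a hashtable acting as the duplicate-free collection of such strings. The names `Sym`, `table`, `bind` and `lookup` are made up for illustration, and the sketch ignores the reader and quoting rules entirely; it only models "quoted returns itself, unquoted returns the associated value".

```python
class Sym(str):
    """Hypothetical string type that also carries an associated value."""

    def __new__(cls, text, value=None):
        obj = super().__new__(cls, text)
        obj.value = value          # nil (None) by default, as in the proposal
        return obj


# A hashtable viewed as a duplicate-free collection of such strings.
table = {}

def bind(text, value):
    """Store a string together with its associated value."""
    table[text] = Sym(text, value)

def lookup(text):
    """Unquoted use: return the associated value, or None if unbound."""
    sym = table.get(text)
    return None if sym is None else sym.value


bind("double", lambda n: n * 2)
bind("greeting", "hello")

quoted = Sym("double")             # quoted use: the string stands for itself
print(quoted == "double")          # True
print(quoted.value)                # None
print(lookup("greeting"))          # hello
print(lookup("double")(21))        # 42
```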

I'm guessing this proposal is inconsistent and unworkable, but I'm throwing it out there anyway...

-----

1 point by are 6129 days ago | link

This comment has now been posted as a submission. Please comment there.

-----

2 points by mdemare 6130 days ago | link

Yes, there is a reason. Characters contain a lot of information: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character....

-----

4 points by ryantmulligan 6129 days ago | link

Correct me if I'm wrong, please, but isn't this information required of a string of length 1 anyway?
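
As an illustration, Python has no character type, yet this kind of per-character information is still reachable from a string of length one. A small sketch using only the standard library:

```python
import unicodedata

ch = "A"                                  # a str of length one

# Classification predicates are ordinary str methods.
print(ch.isalpha(), ch.isupper(), ch.isdigit())   # True True False

# Code point, Unicode general category, and official character name.
print(ord(ch))                            # 65
print(unicodedata.category(ch))           # Lu
print(unicodedata.name(ch))               # LATIN CAPITAL LETTER A

# Case mapping returns another string.
print(ch.lower())                         # a
```

The predicates also accept longer strings, where they test every character, so the length-one case needs no special handling.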

-----

4 points by emmett 6130 days ago | link

Ruby, in 1.8, has arc-style strings: "foo"[0] == 102

In 1.9, it's moving towards Python: "foo"[0] == "f"

This is extremely sensible. I have never wanted the character code of an arbitrary index in a string. I always want a single character substring.
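
In Python terms, the two behaviours look roughly like this: indexing gives the single-character substring, and the character code is something you ask for explicitly on the rare occasion you want it.

```python
s = "foo"

first = s[0]                 # single-character substring, as in Ruby 1.9
print(repr(first))           # 'f'

code = ord(first)            # explicit request for the code Ruby 1.8 returned
print(code)                  # 102

print(chr(code) == first)    # True: chr() converts the code back
```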

-----

2 points by scav 6130 days ago | link

And in 3.0, Python is adding a bytes type, so b"foo"[0]==102

Obviously both ways are useful. If I had to guess what Arc ends up doing, I'd guess: whatever leads to the programmer having to type fewer tokens, or whatever facilitates clever macro definitions, leading to the same.
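
A short sketch of how the two types sit side by side in Python 3, for anyone who hasn't tried it. The sample strings are arbitrary.

```python
text = "foo"                 # str: a sequence of characters
data = b"foo"                # bytes: a sequence of small integers

print(text[0])               # f
print(data[0])               # 102
print(data[0:1])             # b'f'  (slicing bytes still gives bytes)

# Converting between the two always goes through an explicit encoding.
print(text.encode("ascii") == data)    # True
print(data.decode("ascii") == text)    # True

# One character is not always one byte.
e_acute = "\u00e9"
print(len(e_acute), len(e_acute.encode("utf-8")))   # 1 2
```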

-----

5 points by randallsquared 6130 days ago | link

I would suggest that marking a literal vector of bytes is not the most useful role double quotes could play. Python has a long history of doing that, and exactly this kind of confusion between strings and vectors of bytes is what led to b"", u"", etc.

-----