I think the phrasing of the announcement may be a large part of the problem. Not doing Unicode is about the only concrete thing it says about Arc. People read it expecting to find out why Arc is going to be great, and it doesn't really say, except for talking about the principles of conciseness and power. Even the tutorial doesn't say "To allow the writing of concise, powerful programs, Arc introduces features X, Y and Z." It just says "Arc (and lots of other Lisps) has features A, B, C, D, E, F..."
People don't grasp abstract principles well, and even when they do, they don't trust you to mean what they think you mean until they see some concrete evidence. Your problem is that the "Unicode admission" and the stuff about the HTML library are the only solid statements in the announcement about what they can expect to find in Arc. That's what they latch on to, instead of a vague promise that Arc will let them write shorter programs.
I think the point is that, in the presence of combining diacritics, even 32 bits isn't enough. A character is (roughly) one "base" 32-bit code point plus zero or more "combining" 32-bit code points. And equality between two characters isn't purely structural - the same character can have its combining code points in a different order, or be written with a precomposed code point. (Not all combinations have precomposed code points.)
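To make that concrete, here's a quick sketch in Python (just an illustration of the general point; nothing Arc-specific is assumed):

    # One user-perceived character, two different code point sequences.
    precomposed = "\u00e9"   # é as a single precomposed code point
    decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT

    print(len(precomposed))           # 1 code point
    print(len(decomposed))            # 2 code points
    print(precomposed == decomposed)  # False: structurally different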
I will point out that I know very little about Unicode, so I might be a bit off. I can't say that I'm even very interested in the whole Unicode debate, so long as it all gets sorted out at some point in the future.
The only reason Unicode contains precomposed forms is compatibility with existing standards: no new code points will be added for novel combinations of base and combining characters. The Unicode normalization forms deal with these issues.
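For example, Python's standard unicodedata module exposes the normalization forms: NFC composes where possible, NFD decomposes, and both put combining marks into canonical order (a sketch, not a full treatment):

    import unicodedata

    precomposed = "\u00e9"   # é, precomposed
    decomposed = "e\u0301"   # e + combining acute accent

    # Equality holds once both sides are in the same normalization form.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
    print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

    # Canonical ordering: dot below (combining class 220) sorts before
    # acute (class 230), so both orderings normalize to the same sequence.
    a = "e\u0301\u0323"  # acute, then dot below
    b = "e\u0323\u0301"  # dot below, then acute
    print(unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b))  # True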
Unicode support is a complex issue: at the bottom there is the question of low-level character representation (how strings are stored internally), and on top of that, library support for normalization and higher-level text processing operations.
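A small Python sketch of the lower layer: the same code point has several possible byte-level representations, and picking (or hiding) one of them is exactly the kind of internal decision a language has to make:

    s = "\u00e9"  # é, one code point (U+00E9)

    # The same code point under three encodings.
    print(s.encode("utf-8"))      # b'\xc3\xa9'          (2 bytes)
    print(s.encode("utf-16-le"))  # b'\xe9\x00'          (2 bytes)
    print(s.encode("utf-32-le"))  # b'\xe9\x00\x00\x00'  (4 bytes)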
True, I should have said Unicode code points rather than characters. I believe the fundamental point is that strings should always be sequences of Unicode code points, and shouldn't be conflated with byte arrays. The thorny issues of normalization, comparison, sorting, rendering of combined characters and so on could be handled by libraries at a later stage.
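As a sketch of what that separation buys you, in Python (where str is a sequence of code points and bytes is a byte array):

    s = "naïve"            # 5 code points
    b = s.encode("utf-8")  # 6 bytes: 'ï' takes two bytes in UTF-8

    print(len(s))   # 5, counted in code points
    print(len(b))   # 6, counted in bytes
    print(s[2])     # 'ï': indexing by code point does the right thing
    print(b[2:4])   # b'\xc3\xaf': the raw bytes of 'ï'

    # Slicing the byte array naively can cut a character in half:
    print(b[:3].decode("utf-8", errors="replace"))  # 'na\ufffd'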