Arc Forum
4 points by digitalis_ 3054 days ago | link | parent

If (when!) Arc becomes the 100 Year Language that we all want it to become, its various libraries will become rich and numerous.

On the one hand, I don't want my .arc files filled with boilerplate code to load the libraries I want; on the other, I don't want a huge number of functions to be loaded that I won't actually use.

I feel that autoloading is the perfect compromise: granted, the 'autoload pointers' will have to be loaded (by default and automatically -- otherwise that's just more boilerplate) -- but this yields far less overhead than loading the actual functions.
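As a rough illustration of the 'autoload pointer' idea, here is a minimal Python sketch. It is Python rather than Arc, and the `functions` table, `autoload` and `call` are hypothetical names invented for this example. Only a small stub is registered at startup. The module holding the real definition is imported the first time the name is called, and the stub then replaces itself.

```python
import importlib
from typing import Any, Callable

# Hypothetical global function table standing in for Arc's top level.
functions: dict[str, Callable[..., Any]] = {}

def autoload(name: str, module_name: str) -> None:
    """Register a cheap stub; the real module is imported on first call."""
    def stub(*args: Any, **kwargs: Any) -> Any:
        module = importlib.import_module(module_name)  # deferred, expensive part
        real = getattr(module, name)
        functions[name] = real  # later calls bypass the stub entirely
        return real(*args, **kwargs)
    functions[name] = stub

def call(name: str, *args: Any) -> Any:
    """Look a function up by name at call time, as a global reference would."""
    return functions[name](*args)

# At startup only the pointer is registered; the stub imports nothing until called.
autoload("sqrt", "math")
```

After the first call, later calls go straight to the real function, so the lasting cost is just the table of stubs built at startup.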

For big programs, pg's dead right: far more of the speed will come from optimising the program -- however, for small, sysadmin-type programs, where your program is so small that your algorithm is already optimal, but the actual problem is quite large (e.g. a gargantuan directory structure), the speed comes from a tight language. (And you'll write far more of these microprograms in your lifetime than big, optimisable programs.)

Please don't get me wrong: I agree with you wholeheartedly that there should be a profiler in Arc, and that it should be used to optimise Arc.

And I don't think it needs to be an either-or.



5 points by zck 3054 days ago | link

>...I don't want a huge number of functions to be loaded that I won't actually use.

What's the concern here? I don't really care how many functions are loaded, even if I don't use them, because it doesn't affect me.

Now, startup time? That affects me. But I don't know how much of the startup time is because of loading more functions.

Alternatively, name collisions? I don't see that as a giant issue, and it isn't solved by autoloading anyway.

> And I don't think it needs to be an either-or.

I agree. I'm just not all that sure what autoloading gets us. I'm not against it, per se, I just think it's something moderately complicated that might not help us get what we want.

Edit: Also, welcome! Glad to see someone new here. Feel free to contact me off-forum if you like; my email's in my profile.

-----

4 points by digitalis_ 3054 days ago | link

I think I've been misunderstanding something here, and your posts have cleared this up for me. (So, thank you!)

For some reason (and believe me, it seems stunningly obvious to me now), I was under the impression that loading a load of functions into memory would somehow not only affect startup, but also the speed of programs after startup. (I know, I know.) I think I've sidetracked myself from my original goal of discussing ways of getting rid of typical '#include'-esque boilerplate.

Though my argument about already-optimal small programs (which have a large workload) might apply to compiler optimisations, it has little to do with autoloading.

I think autoloading would become useful if startup time were pushed up to anything more than a second -- after that, I think most people (that is, I) get quite angry!

Again: thanks for clearing this up for me.

Moving on from autoloading: how are name collisions dealt with in Arc? The only thing I can think of is the old elisp "prefix all yer functions with the package name" ideology. (So the functions of the string library would be called `string-foo`, `string-bar-baz-qux`, etc.) This'd be important, given that (one way or another) ALL THE FUNKSHUNZ are loaded.

-----

4 points by zck 3054 days ago | link

> For some reason (and believe me, it seems stunningly obvious to me now), I was under the impression that loading a load of functions into memory would somehow not only affect startup, but also the speed of programs after startup.

It might! I wouldn't heavily bet on it, but I don't actually know. We certainly could^1 test it! We probably should! But if it doesn't change it much, I think the autoloading isn't worth the complexity cost.

> I think I've sidetracked myself from my original goal of discussing ways of getting rid of typical '#include'-esque boilerplate.

Yeah, that happens. They're certainly both worthwhile conversations.

> I think autoloading would become useful if startup time were pushed up to anything more than a second -- after that, I think most people (that is, I) get quite angry!

Yeah. Certainly right now I wouldn't write something small that I want to use from the command line in Arc (e.g., a Unix utility), because of the startup time. It would be nice for that to be faster.

> Moving on from autoloading: how are name collisions dealt with in Arc? The only thing I can think of is the old elisp "prefix all yer functions with the package name" ideology. (So the functions of the string library would be called `string-foo`, `string-bar-baz-qux`, etc.) This'd be important, given that (one way or another) ALL THE FUNKSHUNZ are loaded.

Defining a function with the same name as an old one overrides the old one's behavior. So yeah, you either hope nothing collides or you prefix it somehow^2. This is less of an issue in Arc than it is in Emacs both because fewer functions come in the box, and because fewer libraries are pulled in. But certainly collisions can/will happen.

[1] The obvious way I can think of is to remove a bunch of unused things from the various files that make up Arc, and then see whether either the startup time or the runtime of a sample program changes.

[2] In my Emacs code lately, I've been preferring to use a slash, like zck/do-some-stuff. At one point, 5% of the characters in my minesweeper implementation were these prefixes. (https://bitbucket.org/zck/minesweeper.el)
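The experiment in footnote [1] mostly comes down to timing the same command before and after trimming. Here is a minimal Python harness, assuming you substitute your own Arc launch command for the placeholder below. The helper name `median_startup_seconds` is made up for this sketch.

```python
import statistics
import subprocess
import sys
import time

def median_startup_seconds(cmd: list[str], runs: int = 5) -> float:
    """Run `cmd` `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

if __name__ == "__main__":
    # Placeholder command: replace it with however you launch Arc, once for a
    # stock checkout and once for a trimmed one, and compare the medians.
    baseline = median_startup_seconds([sys.executable, "-c", "pass"])
    print(f"median startup: {baseline:.3f}s")
```

Taking the median of several runs smooths over disk-cache and scheduling noise, which matters when the differences being measured may be small.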

-----

4 points by akkartik 3053 days ago | link

"Defining a function with the same name as an old one overrides the old one's behavior. So yeah, you either hope nothing collides or you prefix it somehow. This is less of an issue in Arc than it is in Emacs both because fewer functions come in the box, and because fewer libraries are pulled in. But certainly collisions can/will happen."

Just to add to this, and at the risk of boring the veterans here to death by repeating myself :)

Reading between the lines of the implementation, I think the Arc way of dealing with the possibility of collisions is to not worry about them until they happen. When they happen Arc will show you a warning:

  *** redefining foo

And it's up to the programmer to manually decide how to fix the collision.

This makes a lot more sense than adding prefixes because the names chosen are in the context of the application being built rather than some one-size-fits-all name chosen by a library. That makes them likely to be better names in the context of the application.

Avoiding prefixes also keeps most names from becoming unnecessarily long.

It would be tedious to have to do this if you have a lot of collisions. But that's a good source of pressure to keep the codebase small and minimal and not pull in a lot of libraries, as people are wont to do in Ruby or Node. Any sort of namespacing mechanism encourages programmers to take less responsibility for their code.
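The 'warn, then let the new definition win' behaviour described above is easy to mimic. Here is a minimal Python sketch: the `toplevel` table and `define` helper are hypothetical stand-ins for Arc's global namespace and `def`, and the warning text copies the message quoted above (printed to stdout here for simplicity).

```python
from typing import Any

# Hypothetical stand-in for Arc's global namespace.
toplevel: dict[str, Any] = {}

def define(name: str, value: Any) -> None:
    """Bind `name` globally; warn, but don't refuse, if it is already bound."""
    if name in toplevel:
        print(f"*** redefining {name}")
    toplevel[name] = value

define("foo", lambda x: x + 1)
define("foo", lambda x: x * 2)   # prints "*** redefining foo"; the new foo wins
```

Nothing is renamed or hidden. Deciding what to do about the collision is left entirely to the programmer.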

-----

4 points by digitalis_ 3053 days ago | link

[There doesn't seem to be a reply button under your last reply -- we've possibly hit a maximum depth -- so I'm replying here. Sorry!]

I'm talking about "a namespace mechanism" here, without actually knowing how one in Arc would work.

Just thinking about it for a moment, I think you'd set the prefix

  (= namespace-prefix 'awesome-package)

and then there'd be some symbol (in the textual sense) that you'd put in front of all the library functions; maybe a slash.

  (def /awesome-function (...)
    (...))

Or, instead of setting the namespace with a variable, you'd have

  (w/namespace awesome-package
    ...
    (def /awesome-function (...)
      (...))
    ...)

[I think the latter's probably better -- it's clearer where the namespace ends.]

In the end, you have to have some way of avoiding conflicts -- and all of these boil down to tacking something on the front (math.sqrt in Python, for instance); I (personally) would rather have a better way of doing this than manually typing out `awesome-package` at the start of every function/macro/variable/whatever.
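To make the `w/namespace` idea a bit more concrete, here is a minimal Python sketch of the prefix expansion. It is Python rather than Arc, `w_namespace`, `define` and the `toplevel` table are hypothetical names, and the '/' convention is the one proposed above: inside the block, a name with a leading slash is expanded with the innermost namespace prefix.

```python
from contextlib import contextmanager
from typing import Any, Iterator

toplevel: dict[str, Any] = {}
_prefixes: list[str] = []  # stack of active namespaces, innermost last

@contextmanager
def w_namespace(prefix: str) -> Iterator[None]:
    """Within the block, names starting with '/' get `prefix` prepended."""
    _prefixes.append(prefix)
    try:
        yield
    finally:
        _prefixes.pop()

def define(name: str, value: Any) -> str:
    """Bind a global, expanding a leading '/' using the innermost namespace."""
    if name.startswith("/") and _prefixes:
        name = _prefixes[-1] + name
    toplevel[name] = value
    return name

with w_namespace("awesome-package"):
    define("/awesome-function", lambda: "awesome")
```

Using a `with` block mirrors the preference stated above: the extent of the namespace is visible in the code itself.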

(Though we may eventually have to "agree to disagree" on this, I think the discussion's worth it!)

-----

4 points by akkartik 3053 days ago | link

Yup!

The trick for replying deep in discussions: click on the 'link' of the comment you want to reply to. news.arc hides links to replies below some depth for some period of time to discourage flame wars. You can see the code for this at https://github.com/arclanguage/anarki/blob/15481f843d/lib/ne...

-----

2 points by digitalis_ 3053 days ago | link

Thank you for the enlightenment! (I promise not to abuse this power.)

-----

4 points by digitalis_ 3053 days ago | link

Interesting...

I think, given that Arc is supposed to be "a language for good programmers" [1], that it's silly to impose restrictions like withholding a namespacing mechanism to encourage a certain type of programming.

Though I like keeping codebases minimal, I also like freedom.

[1] http://paulgraham.com/design.html

-----

3 points by akkartik 3053 days ago | link

To my mind it is namespaces that add restrictions. Arc's warning about redefining names can be ignored, but a namespace mechanism is usually more insistent about violations.

I try not to use words like 'freedom' in these discussions because navigating by them is a subtle business. While having a feature can sometimes provide freedom, in general I tend to assume that features cost freedom. Like the saying that your possessions own you. A namespace mechanism is more code to write, more places for bugs to be hiding in, more errors for the user to run into, more places for one programmer to hide things where they can trip another programmer up. All these issues are places for degrees of freedom to be used up rather than created.

So yes, we are absolutely in agreement that Arc is for good/experienced programmers. But there are multiple visions out there for how you go about helping experienced programmers. My vision is admittedly unconventional, a minority view. It's possible that Arc just doesn't have namespaces because the authors haven't gotten around to implementing them yet. But somehow I doubt that.

-----

2 points by Pauan 3036 days ago | link

Actually, because Node.js has a proper module/namespace system, people are instead encouraged to create a lot of micro libraries (e.g. fewer than 200 lines of code each).

So rather than importing a single big library in your application, you would instead import a lot of micro libraries (each of which is versioned separately, and might have its own dependencies).

There are pros and cons to big libraries, and pros and cons to little libraries. I think languages should have good support for both.

-----

1 point by akkartik 3036 days ago | link

Yes, I love the trend towards micro libraries because they encourage people to pull in precisely the functionality they need. That aspect of Node is not part of the problem.

-----

2 points by Oscar-Belletti 3053 days ago | link

I agree that prefixes are not a good solution. And namespaces mean more code to write, both in the application's code and in the Arc implementation.

However, if you use two libraries whose names collide, what do you do?

-----

3 points by akkartik 3053 days ago | link

In principle I don't have a fully general solution yet :/ It's something I'm working on.

What I would like to happen is that my application only contains dependencies it really needs, and that each dependency includes no superfluous/dead interfaces or code. Under these circumstances I would like to live in a world where I can go in and modify the libraries to have different names, with the difference in names making sense in the context of the application. Then I would bundle the application with all its libraries included.

Of course this doesn't scale to large libraries, because managing a fork today involves an amount of work that ranges from non-trivial to intractable. But this would be my ideal.

Past writings on this subject: http://akkartik.name/post/libraries; http://akkartik.name/post/libraries2. My current project which tries to make fork-management tractable: https://github.com/akkartik/mu

Bear in mind that it's only a hard problem for collisions in the interface of the two libraries. Functions that are used only internally can be wrapped inside closures so they're only accessible to the library that cares about them.
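Here is a small illustration of that last point, in Python (both 'libraries' are hypothetical). Each library keeps its helper inside a closure and returns only its public interface, so two private helpers with the same name never meet.

```python
def make_greeting_lib():
    # Private helper: only reachable from inside this closure.
    def normalize(name: str) -> str:
        return name.strip().title()

    def greet(name: str) -> str:
        return f"Hello, {normalize(name)}!"

    return {"greet": greet}          # the exported interface

def make_shouting_lib():
    # Same private name as above, but no conflict: it lives in a different scope.
    def normalize(text: str) -> str:
        return text.upper()

    def shout(text: str) -> str:
        return normalize(text) + "!"

    return {"shout": shout}

greeting = make_greeting_lib()
shouting = make_shouting_lib()
```

Only `greet` and `shout` can ever collide with anything, which is the "interface" case described above.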

-----

3 points by rocketnia 3053 days ago | link

I've been noticing continuities between social code distribution, modularity, and variable scope. A guiding example is code verification:

  Unrecorded reasoning, existing mainly in our minds.
  -->
  Codebases dedicated to proofs or tests.
  -->
  Proofs or tests located in the codebase they apply to.
  -->
  A type/contract declaring a module interface.
  -->
  A type/contract annotation for a function definition.
  -->
  A type/contract annotation for an individual expression.
  -->
  A type/contract annotation for an individual built-in operator, but at
  this point it becomes implicit in the operator itself, and we just
  have structured programming, enjoying properties by construction.

Verification is a simplified version of a build process; it's just a build with a yes-or-no answer. So the design of a build system has a similar continuity:

  Unrecorded how-to knowledge, existing mainly in our minds.
  -->
  Codebases or how-to guides dedicated to curated builds (e.g. distros).
  -->
  Build scripts and docs located in the codebase they apply to.
  -->
  Macroexpansion-time glue code, importing compiler extensions by name.
  -->
  Load-time glue code, importing runtime extensions by name.
  -->
  Service-startup-time glue code, obtaining dependency-injected fields
  by name.
  -->
  An expression, taking free variables from its lexical scope by name.
  (This is a build at "evaluation of this particular expression" time.)

There might be some rough parts in here. I might be taking things for granted that I don't want to, like taking for granted that we want unambiguous named references from one module to another. My point with this continuity is to note that if I don't want named imports, then maybe I don't want named local variables either; maybe tweaks to one design should apply to the other.

And this means that even local syntactic concerns extrapolate to social decisions about how we expect to deal with our unrecorded knowledge. Every design decision has a lot to go by. :)

---

Another exciting part is that I think nested quasiquotation shows us a more general theory of lexical locality. If we're dealing with syntax as text, then locations in that text have an order, and we can isolate code snippets at intervals along that order (and mark them with parentheses). Intervals are partially ordered by containment, so we can isolate code snippets at meta-intervals between an outer interval and multiple nonoverlapping inner intervals (and mark them with parentheses with nonoverlapping parentheses-shaped holes: quasiquotations).

That "nonoverlapping" part seems awkward, but I think there's a simple concept somewhere in here.
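Here is one way to make the 'parentheses with parentheses-shaped holes' picture concrete, assuming code is plain text and intervals are half-open ranges of character positions. A quasiquotation is then an outer interval plus nonoverlapping inner intervals, and what actually belongs to it is whatever is left over. A minimal Python sketch (`quasi_segments` is a made-up name):

```python
Interval = tuple[int, int]   # half-open [start, end) over character positions

def quasi_segments(outer: Interval, holes: list[Interval]) -> list[Interval]:
    """Return the parts of `outer` not covered by `holes`.

    Holes must lie inside `outer` and must not overlap one another,
    like the unquoted holes inside a quasiquotation.
    """
    start, end = outer
    segments: list[Interval] = []
    cursor = start
    for h_start, h_end in sorted(holes):
        if not (start <= h_start < h_end <= end):
            raise ValueError(f"hole {(h_start, h_end)} is not inside {outer}")
        if h_start < cursor:
            raise ValueError(f"hole {(h_start, h_end)} overlaps an earlier hole")
        segments.append((cursor, h_start))
        cursor = h_end
    segments.append((cursor, end))
    return segments
```

For example, an outer interval (0, 10) with holes (2, 4) and (6, 8) leaves (0, 2), (4, 6) and (8, 10). Overlapping holes are rejected, which is exactly the 'nonoverlapping' constraint that feels awkward.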

With this concept of intervals, I'm considering higher degrees of lexical structure past quasiquotation, and I'm considering what kind of parentheses or quasiquotations would exist for non-textual syntaxes.

A module system deals with a non-textual syntax: The syntax of a bundle of modules. If the modules have no order to them, then we don't even have parentheses to work with, let alone quasiquotation. But they can have an order to them. We can impose one from outside:

  Module A precedes module B.

And anything we can impose from outside, we might want to add as a module:

  Module A says, "..."
  Module B says, "..."
  Module C says, "Module A precedes module B."

This is prone to contradictions and ambiguities. If we can say how to resolve these ambiguities from the outside, we should be able to do so as a module:

  Module A says, "..."
  Module B says, "..."
  Module C says, "Module A precedes module B."
  Module D says, "Module B precedes module A."
  Module E says, "If module C and module D disagree, listen to module C."
  Module F says, "If module C and module D disagree, listen to module D."
  Module G says, "If module E and module F disagree, listen to module E."

This should lead to a very complete system of closed-system extensibility: For any given set of modules, if the set's self-proclaimed ordering between A and B is currently unambiguous, then we might as well listen to it! If we don't like it, we can add more contradictions and disambiguations until we do, right up to and including "Ignore all those other modules and do it like this." :)
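The A-through-G example above can be resolved mechanically. Here is a minimal Python sketch, assuming each disambiguating module speaks about exactly one pair of modules. The encoding and the `resolve` helper are hypothetical. When the modules judging a conflict disagree among themselves, the sketch recurses and asks which module judges that conflict.

```python
class Ambiguous(Exception):
    """Raised when nothing settles which of two modules to listen to."""

# (first, second, winner): "If first and second disagree, listen to winner."
Preference = tuple[str, str, str]

def resolve(a: str, b: str, prefs: dict[str, Preference],
            _seen: frozenset = frozenset()) -> str:
    """Return which of two disagreeing modules to listen to."""
    pair = frozenset((a, b))
    if pair in _seen:
        raise Ambiguous(f"circular disambiguation between {a} and {b}")
    judges = [m for m, (x, y, _) in prefs.items() if frozenset((x, y)) == pair]
    if not judges:
        raise Ambiguous(f"nothing says whether to listen to {a} or {b}")
    current = judges[0]
    for other in judges[1:]:
        if prefs[other][2] != prefs[current][2]:
            # The judges disagree with each other: resolve that conflict first.
            current = resolve(current, other, prefs, _seen | {pair})
    return prefs[current][2]

# Modules C and D make contradictory ordering claims...
ordering = {"C": ("A", "B"), "D": ("B", "A")}
# ...and modules E, F and G disambiguate.
prefs = {
    "E": ("C", "D", "C"),
    "F": ("C", "D", "D"),
    "G": ("E", "F", "E"),
}
winner = resolve("C", "D", prefs)   # G prefers E over F, so E's verdict (C) stands
first, second = ordering[winner]    # module A precedes module B
```

Dropping G from `prefs` makes the same call raise `Ambiguous`, which is the point at which someone would add another disambiguating module.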

With this ability to disambiguate when things go wrong, we can model lexical scope:

  Module A says, "Export foo = (import bar from system {B, C})."
  Module B says, "Export foo = 2."
  Module C says, "Export bar = foo + foo."
  
  Result: foo = 4.

While both A and B have an export named "foo," this conflict is disambiguated by the fact that module A is treating {B, C} as a local scope. I intend this to mean that bar isn't at the top level either.
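Here is one way to model that example in Python. It assumes that "import bar from system {B, C}" means "evaluate B and C as a system of their own and take bar from it". This is only a sketch: `evaluate_system` and the dict encoding of modules are made up for illustration. Each export is a function that receives a lookup function for its own system, which is what makes {B, C} behave like a local scope.

```python
from typing import Callable

Getter = Callable[[str], object]
Module = dict[str, Callable[[Getter], object]]   # export name -> definition

def evaluate_system(modules: dict[str, Module]) -> Getter:
    """Return a lookup function for the names exported by a system of modules.

    Each definition receives `get`, so it can refer to any name exported by
    the same system. A name exported by two modules is treated as an error.
    """
    cache: dict[str, object] = {}

    def get(name: str) -> object:
        owners = [m for m, exports in modules.items() if name in exports]
        if not owners:
            raise NameError(f"{name} is not exported by this system")
        if len(owners) > 1:
            raise NameError(f"{name} is exported by {owners}; ambiguous")
        if name not in cache:
            cache[name] = modules[owners[0]][name](get)
        return cache[name]

    return get

# {B, C} is A's local scope, so B and C are not part of the top-level system.
inner = {
    "B": {"foo": lambda get: 2},
    "C": {"bar": lambda get: get("foo") + get("foo")},
}
top = {
    "A": {"foo": lambda get: evaluate_system(inner)("bar")},
}
top_get = evaluate_system(top)
```

Here `top_get("foo")` gives 4, and asking the top level for bar raises `NameError`, matching the intent that bar isn't at the top level either.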

If we really want access to bar at the top level, we can refer to it again, and we can even be sloppy about it and make up for our sloppiness with disambiguations:

  Module A says, "Export foo = (import bar from system {B, C})."
  Module B says, "Export foo = 2."
  Module C says, "Export bar = foo + foo."
  
  Module D says, "Export all imports from system {B, C}."
  Module E says, "If A and D export the same variable, listen to A."
  
  Result: foo = 4; bar = 4.

If we want, we can have the top-level bar see the version of foo exported by A, even though the version of bar used by A still uses the foo from B:

  Module A says, "Export foo = (import bar from system {B, C})."
  Module B says, "Export foo = 2."
  Module C says, "Export bar = foo + foo."
  
  Module D says, "Export all imports from system {A, B, C}."
  Module E says, "Export all imports from system {C, F}."
  Module F says, "Export foo = (import foo from system {A, B, C})."
  Module G says, "If D and E export the same variable, listen to D."
  
  Result: foo = 4; bar = 8.

Not easy enough to extend? Define some structure. Write modules that assign folksonomic tags to other modules or themselves, and then refer to the system of all modules with a given tag. Write modules that act as parentheses, and write modules that determine enough of an order to decide which modules those parentheses contain. Here's an example of the latter:

  Module A says, "Export foo = (import bar from range R1)."
  Module B says, "Export interval R1, and begin it here."
  Module C says, "Export foo = 2."
  Module D says, "Export bar = foo + foo."
  Module E says, "End interval."
  Module F says, "These modules are in order: B, C, D, E."

The flexibility is obviously really open-ended here, and it's going to be a challenge to make this a well-defined idea. :-p

-----

2 points by Oscar-Belletti 3053 days ago | link

>What I would like to happen is that my application only contains dependencies it really needs, and that each dependency includes no superfluous/dead interfaces or code.

Do you want to avoid the situation, which happens in C, where when you need the sqrt you have to include the whole math file? I totally agree.

>Under these circumstances I would like to live in a world where I can go in and modify the libraries to have different names, with the difference in names making sense in the context of the application

This looks right to me. Perhaps it could be something like python's

    from library import function as good_name_for_your_project

>Then I would bundle the application with all its libraries included.

I'm not sure making a copy of a library (even a partial one) is a good idea, because it would lead the user to have many copies of the same libraries. On my Windows machine I ended up with 4 versions of Python! I think that common parts should be shared.

-----

3 points by akkartik 3053 days ago | link

> Do you want to avoid the situation.. where when you need the sqrt you have to include the whole math file? I totally agree.

Yes, definitely. In the Javascript world it's called tree-shaking: https://medium.com/@Rich_Harris/tree-shaking-versus-dead-cod...
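The core of tree-shaking is a reachability walk over which definitions refer to which. Here is a minimal Python sketch, assuming the references can be found statically (names built at runtime or passed to `eval` would defeat it). The toy library and its names are hypothetical.

```python
def tree_shake(calls: dict[str, set[str]], entry_points: set[str]) -> set[str]:
    """Return the definitions reachable from the entry points.

    `calls` maps each definition to the names it refers to; anything not
    reachable is dead code that a bundler could drop.
    """
    live: set[str] = set()
    stack = list(entry_points)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        stack.extend(calls.get(name, set()) - live)
    return live

# A toy "math library": the application only needs sqrt.
math_lib = {
    "sqrt": {"newton-step"},
    "newton-step": set(),
    "sin": {"taylor-series"},
    "taylor-series": set(),
}
app = {"main": {"sqrt"}}
kept = tree_shake({**math_lib, **app}, {"main"})
```

Here `kept` is `{"main", "sqrt", "newton-step"}`. `sin` and its helper are never reached, so they never need to be included.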

> from library import function as good_name_for_your_project

What's happening here is that you're a) adding a feature in Python to support 'from..as', b) including an external library and c) continuing to keep around an old name that you don't really care about. You're essentially preserving the old name just because other people who your application doesn't care about use it.

Imagine a world where maintaining forks was tractable. Would this still be a good idea? Why not just do a search and replace and maintain a private fork, eliminating all this complexity in your private stack? Just delete 'from..as' from your private Python! :o)
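For small libraries, the 'search and replace in a private fork' step can be as simple as a symbol-aware substitution. Here is a rough Python sketch. `rename_symbol` is hypothetical, and the delimiter set is a simplification of a Lisp reader. It only replaces whole symbols, so `sqrt-helper` is left alone, but it doesn't understand strings or comments, so the diff still needs a review.

```python
import re

# Characters treated as ending a symbol: a simplification of a Lisp reader.
_DELIMS = r"\s()'`,\"\[\]"

def rename_symbol(source: str, old: str, new: str) -> str:
    """Replace whole-symbol occurrences of `old` with `new` in Lisp-like source."""
    pattern = rf"(?<![^{_DELIMS}]){re.escape(old)}(?![^{_DELIMS}])"
    # A function replacement keeps backslashes in `new` from being interpreted.
    return re.sub(pattern, lambda _match: new, source)

print(rename_symbol("(sqrt x) (sqrt-helper y) 'sqrt", "sqrt", "square-root"))
# prints: (square-root x) (sqrt-helper y) 'square-root
```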

> I'm not sure making a copy of a library is a good idea because it would lead the user to have many copies of the same libraries.

Yes, this is a fundamental difference in outlook/ideology. I think that copying isn't always bad. We culturally tend to emphasize the issues with copying a lot more than the costs of avoiding duplication.

A degenerate example is to observe that there are tons of 'e's in the novel I'm reading and try to deduplicate them. That is of course obviously farcical, but it at least serves to illustrate that there's a trade-off, and that always DRY'ing your code isn't obviously a good idea. Another example is to observe that the internet has many copies of the same libraries running at any given time. You can argue that they're on different machines, but then imagine a 'machine' consisting of multiple cores and private caches and non-uniform memory access and RAID-partitioned disks. Changing latency costs can make it reasonable to maintain multiple copies of some immutable data in a single 'machine'. Now consider that development is yet another cost that is open to variation. If (automatically) creating copies of something eases development, it's at least worth considering. For example, optimizing compilers can sometimes specialize a function differently for different callsites. That's duplication often inside a single binary, and it makes sense in some contexts.

The npm ecosystem promiscuously duplicates dependencies inside the node_modules/ directory, so that is at least some evidence that the approach I'm suggesting isn't too insane :)

-----

2 points by Oscar-Belletti 3052 days ago | link

OK, this could be the way to go. Adapting little libraries isn't a problem, and it probably makes your program better. This avoids collisions and useless code, and it works with autoloading. But this approach will only work if our libraries stay small enough. For now, that's fine.

Duplicating libraries isn't a problem: disk space for ease of development is an exchange which is getting more and more convenient.

For autoloading: the interpreter/compiler could load all .arc files in the current directory (or current-directory/lib), or scan them for function definitions (without loading them) and automatically set up an elisp-style autoload for every function. I prefer the first option.
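The second option above, scanning for definitions without loading them, could start from something like the Python sketch below. This is only a rough sketch built on assumptions: `build_autoload_index` is a made-up name, and the regex only recognises `(def name` and `(mac name` forms starting at column 0, so it misses definitions produced by other macros. A name that maps to more than one file is exactly the collision case discussed earlier in the thread.

```python
import re
from pathlib import Path

# Matches the name in forms like "(def name" or "(mac name" at the start of a line.
_DEF = re.compile(r"^\((?:def|mac)\s+([^\s()]+)", re.MULTILINE)

def build_autoload_index(directory: str) -> dict[str, list[Path]]:
    """Map each defined name to the .arc files defining it, without loading them."""
    index: dict[str, list[Path]] = {}
    for path in sorted(Path(directory).glob("*.arc")):
        for name in _DEF.findall(path.read_text(encoding="utf-8")):
            index.setdefault(name, []).append(path)
    return index
```

An index like this could then drive autoload stubs: names with exactly one file are unambiguous, and the rest need a human decision.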

-----

2 points by digitalis_ 3052 days ago | link

One possibility for this bundling is that Arc looks first where it would expect a bundled library to be (in an equivalent of node_modules), then looks for it in the usual place (/usr/lib or wherever).

Or, if it all needs to be bundled, you could have symlinks for the libraries you don't change.
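The lookup order described above amounts to a search path. Here is a minimal Python sketch. The directory names and the `find_library` helper are hypothetical. `Path.is_file()` follows symlinks, so the symlink variant works with the same code.

```python
from pathlib import Path
from typing import Optional, Sequence

def find_library(name: str, search_dirs: Sequence[str]) -> Optional[Path]:
    """Return the first `<name>.arc` found, checking directories in order."""
    for directory in search_dirs:
        candidate = Path(directory) / f"{name}.arc"
        if candidate.is_file():
            return candidate
    return None

# Hypothetical search order: the application's bundled copies first,
# then a system-wide location.
SEARCH_DIRS = ["./lib", "/usr/lib/arc"]
```

Modified libraries live in the application's own directory and shadow the shared copies. Unmodified ones can simply be absent (or symlinked) and fall through to the system location.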

What do you think?

-----

2 points by digitalis_ 3053 days ago | link

Thing is...this is about as verbose as you can get!

If a name's already good, you're not going to change it; if it's bad, you should push that change upstream! (If the name's bad, it's likely that the original author didn't put much time into choosing the name, so I think it would be fairly straightforward to get that merged.)

[As much as I love this idea of implicit importing, I'm sure the explicit side -- which'll let you change whatever names you like -- will need to be there as well. So we can all chill.]

-----

3 points by rocketnia 3053 days ago | link

Quality of a name is relative to a purpose. The more public we go, the more meanings compete for a single name, making us resort to jargon. If a language really only uses homogeneous intensional equality, being able to call it = is a relief. If someone wants to build a side-by-side comparison of several versions of an extension, they might prefer for some of the names to be different in every version while others stay the same.

But it's not just names per se. In that side-by-side comparison, they might also want to merge and branch parts of the code whose assumed invariants have now changed; invariants can act as Schelling points, like invisible names. Modifying code is something we do sometimes, and I think akkartik wants to see how much simplicity we'll get if everyone who wants a simpler system has the tooling support to modify the code and make it simpler themselves.

Personally, I find it fascinating to think about how to design a language that lets multiple people edit the code at the same time, a use case that can singlehandedly justify information hiding, modules, and versioning. But I think existing module systems enforce information hiding even more than they have to, so that in the cases where people do need to invade that hidden information, they face unnecessary difficulties. I think a good module system will support akkartik's way of pursuing simplicity.

But... my module system ideas aren't finished. At a high level:

- You can invade implementation details you already know. You can prove this by having their entire code as a first-class value with the expected hash.

- You can invade implementation details if you can authorize yourself as their author.
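The first bullet can be read as a capability check. If you can produce code whose hash matches the one a module publishes, you already know its internals, so hiding them from you buys nothing. Here is a minimal Python sketch of that check only. The choice of SHA-256 and both function names are assumptions, and the design described above is explicitly unfinished.

```python
import hashlib

def code_hash(source: str) -> str:
    """Content hash identifying one exact version of a module's code."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

def may_invade(claimed_source: str, published_hash: str) -> bool:
    """Grant access to internals only to someone who already holds the exact code."""
    return code_hash(claimed_source) == published_hash
```

Any change to the code, even whitespace, changes the hash, so this only unlocks the precise version you hold.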

-----

2 points by akkartik 3053 days ago | link

"If a name's already good, you're not going to change it; if it's bad, you should push that change upstream! (If the name's bad, it's likely that the original author didn't put much time into choosing the name, so I think it would be fairly straightforward to get that merged.)"

Not necessarily. 'Good' and 'bad' are not absolute, they are extremely contextual. A name that is good for a general-use library might be sub-optimal for your application, or vice versa. Subjective taste is also a thing. So while you should certainly send out a pull request for the change, our model of the world shouldn't rely on the change actually getting pushed.

In general it is amazing to me how often a blindingly obvious pull request gets rejected or just sits in the queue, untouched. There are lots of different kinds of people out there, which is why I tend to think more like a barbarian[1] about collaboration: think of other people as islands with whom you might collaborate if the stars align. But don't rely on the collaboration. Be self-sufficient.

[1] http://www.ribbonfarm.com/2011/03/10/the-return-of-the-barba...

---

"As much as I love this idea of implicit importing, I'm sure the explicit side -- which'll let you change whatever names you like -- will need to be there as well."

I actually interpreted your original post that kicked off this thread as implicit loading since Arc has no notion of modules or import. So the question of changing names did not arise. That seemed like a tangent to the original question.

These seem like separate questions:

1. Should Arc know how to react to implicit symbols?

2. Should Arc provide namespaces?

On the one hand, you can have implicit loading without needing a module/namespace system. On the other hand, I don't see how you can have implicit loading in the presence of namespaces. Without the "from..as" construct, how would your system know which library to load a symbol from if there's a collision?
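Here is a tiny Python sketch of the problem being described. The index contents and `library_for` are hypothetical. An implicit loader can only act when exactly one library defines the unbound symbol.

```python
class AmbiguousSymbol(Exception):
    pass

def library_for(symbol: str, index: dict[str, list[str]]) -> str:
    """Pick the library to load implicitly for an unbound symbol."""
    libraries = index.get(symbol, [])
    if not libraries:
        raise NameError(f"no library defines {symbol}")
    if len(libraries) > 1:
        # Nothing here says which definition the programmer meant; some
        # explicit construct (like Python's from..as) would be needed.
        raise AmbiguousSymbol(f"{symbol} is defined by {', '.join(libraries)}")
    return libraries[0]

index = {"parse": ["json.arc", "csv.arc"], "to-json": ["json.arc"]}
```

`library_for("to-json", index)` returns `"json.arc"`, while `library_for("parse", index)` raises `AmbiguousSymbol`. That is the point where the programmer has to step in, namespaces or not.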

Summary: even if you have namespaces, you're still going to be doing your own collision-detection if you want implicit loading. What's the point of a module system then?

-----

3 points by digitalis_ 3053 days ago | link

Is there a naming scheme for Arc? From reading the stuff that's already out there, I found `w/uniq`, which I suppose is shorthand for `with-unique` -- which means that, for consistency, all other "with"-type functions/macros should be "w/".

For example:

string-to-number, or string->number, or string->num, or str->num, or ston, or ...

1. Whatever's chosen needs to be consistent (i.e. it should always be "str" -- or "string" -- but not a mix).

2. When choosing, readability and terseness need to be balanced.

(I have a feeling this is really a new discussion, and if this does start to kick off, I'll start a new thread.)

[BTW, I do prefix my stuff (that I don't share -- so init code) with "daio/"; but if I was going to put it in a package, I'd use the package name and a hyphen (so "minesweeper-sweep", or whatever).]

-----