where I describe an a-la-carte language feature compiler toolchain
Semantics in general
When I was first learning programming, I thought that the syntax of a language (keywords, spacing, how to write functions, &c) was the only thing that I needed to learn. Then I started learning about something called programming language semantics. Semantics in general refers to the meaning behind words and symbols, both in linguistics and in programming language theory.
For an extended example, I took CS50, the Introduction to Computer Science course that Harvard offers via edx.org. In the 2014 version of the course, they taught C as the main language. So, I learned about memory management with alloc and friends. Then, I taught myself Python after the course, and I learned about the “garbage collector,” and that was my first lesson in semantics. Even though the languages didn’t make a specific syntactic declaration about how memory is managed, the semantics of the two runtimes are quite different.
Semantic Differences Matter
In the many discussions I’ve had about programming languages at meetups and other places, I’ve noticed that although syntax is something that many programmers care deeply about, what really sets languages apart is their semantics: specifically, the language features that are built behind the scenes, like garbage collection, or a module system, and so on. This is also a source of language envy and innovation. Object orientation is a language feature that adds specific semantics to a computer language; ditto for functional semantics.
In theory, you could have the same syntax but many, many different semantic components, and you’d describe a whole slew of languages and their associated run-times.
What if we made that a feature? What if there was a language built that allowed for multiple semantic components that could be chosen for each program written? It would be similar to Racket’s #lang syntax, but maybe in a build description rather than each file separately.
Imagine that there was a programming language system where all major language features were multiple choice.
Choice 1: Type Checker
What if we could choose which type checker we wanted to use within the same toolchain? Turn it off completely for simple scripts, turn it all the way up for complex applications.
- Unitype Checker: only report runtime errors, be completely dynamic
- Gradual Type Checker with type inference (with compilation errors?)
- Full Powered Type System with all the compilation errors
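To make the three levels a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the `check` function, the mode names, and the idea that a "program" is just a list of (name, annotation, value) bindings. It also skips type inference entirely, so treating the full mode as "every binding must be annotated" is a simplification that a real checker would not need.

```python
def check(bindings, mode):
    """Return compile-time errors for (name, annotation, value) bindings.

    Toy model: an annotation is a Python type, or None when omitted.
    """
    errors = []
    for name, annotation, value in bindings:
        if mode == "unitype":
            continue  # nothing is checked before the program runs
        if annotation is None:
            if mode == "full":
                errors.append(f"{name}: missing type annotation")
            continue  # gradual mode leaves unannotated bindings dynamic
        if not isinstance(value, annotation):
            errors.append(f"{name}: expected {annotation.__name__}")
    return errors


# One annotated binding with a wrong value, one unannotated binding.
program = [("count", int, "three"), ("label", None, 42)]

print(check(program, "unitype"))  # []
print(check(program, "gradual"))  # ['count: expected int']
print(check(program, "full"))     # ['count: expected int', 'label: missing type annotation']
```

The same source produces zero, one, or two compile-time complaints depending only on the mode, which is the "turn it off, turn it all the way up" dial described above.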
Choice 2: Memory Allocation
I’ve heard more than one programmer say that they miss C because of the simplicity of the system. I don’t know if that includes memory management, but I do know that in some cases, garbage collection is not appropriate. So, what if the garbage collector was optional?
- Manual Memory Management, like C
- Garbage Collection, like Python and Haskell
In fact, there could be multiple garbage collectors built into a toolchain, each with different properties. So, you could declare at the start of a project what kind of memory semantics you want, without changing the syntax, and with the type system also being a-la-carte.
Choice 3: Concurrency System
One thing about Python that hurts performance but makes the semantics easier for a beginner to understand is the GIL, which lets only one thread execute Python bytecode at a time. Well, in CPython at least. But what if Python was made in this a-la-carte fashion, and projects could decide if they want the GIL or not? Or go even further, to a system like in Erlang?
- Single-threaded only
- Simple Message Passing System
- Something like OTP/BEAM in Erlang
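The three choices could live together in one per-project build description. The sketch below is one guess at what that might look like, written in Python; the class names, field names, and option strings are all hypothetical and do not come from any real toolchain.

```python
from dataclasses import dataclass
from enum import Enum


class TypeChecker(Enum):
    UNITYPE = "unitype"    # fully dynamic, runtime errors only
    GRADUAL = "gradual"    # optional annotations plus inference
    FULL = "full"          # everything checked at compile time


class Memory(Enum):
    MANUAL = "manual"      # explicit allocate/free, like C
    GC = "gc"              # garbage collected, like Python and Haskell


class Concurrency(Enum):
    SINGLE_THREADED = "single-threaded"
    MESSAGE_PASSING = "message-passing"
    ACTORS = "actors"      # something like OTP/BEAM in Erlang


@dataclass(frozen=True)
class BuildDescription:
    """Hypothetical per-project settings; these names are made up."""
    type_checker: TypeChecker
    memory: Memory
    concurrency: Concurrency

    @classmethod
    def from_mapping(cls, raw):
        # Looking an Enum up by value raises ValueError on an unknown choice.
        return cls(
            type_checker=TypeChecker(raw["type_checker"]),
            memory=Memory(raw["memory"]),
            concurrency=Concurrency(raw["concurrency"]),
        )


# A quick script: no type checking, garbage collected, single-threaded.
script = BuildDescription.from_mapping(
    {"type_checker": "unitype", "memory": "gc", "concurrency": "single-threaded"}
)
```

Because each field is an enum, a misspelled option fails when the description is loaded rather than somewhere deep in the compiler, and the source files themselves never mention any of these settings.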
How many “languages” is this?
What I’ve described here could be counted as 18 different languages (3 type checkers × 2 memory models × 3 concurrency systems). Or, if built right, it could be one language with 18 different operating modes.
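As a sanity check on the arithmetic, here is a small Python sketch that multiplies out the options exactly as listed in the three choices above. The option strings are just labels I made up for the bullets.

```python
from itertools import product

# One entry per choice, one label per bullet listed under it.
choices = {
    "type_checker": ["unitype", "gradual", "full"],
    "memory": ["manual", "gc"],
    "concurrency": ["single-threaded", "message-passing", "actors"],
}

# Every operating mode picks exactly one option from each choice.
modes = list(product(*choices.values()))

# Every option needs its own toolchain component.
components = sum(len(options) for options in choices.values())

print(len(modes))   # 18
print(components)   # 8
```

Modes multiply (3 × 2 × 3) while components only add (3 + 2 + 3), which is a big part of the appeal: each new component buys a whole family of new modes.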
This system would require something like 8 different components in the compilation toolchain (3 + 2 + 3, one per option), in addition to the usual ones, like lexer/parser, assembler/linker, and so on. So, there would be a lot of work. But the upside is that more people could use the language to fit their own needs. What about a language where a component could be written as a deep research topic, and users could decide whether they want to use that component or not?
Specifically, look at GHC, the de facto compiler for Haskell. It has so many different language extensions, and often they come in groups. What if, instead of having to specify a collection of extensions for type-level programming, there was a single “Type-Level Programming” option? So, when you wanted to level up your Haskell, you could turn that feature on. That component could have its own focused documentation on how to use it properly.
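A grouped option like that could be little more than a named list that expands into the usual pragma. In the Python sketch below, the extension names are real GHC extensions, but the group name, the choice of which extensions belong in it, and the `language_pragma` helper are my own assumptions.

```python
# Hypothetical group name mapped to real GHC extension names.
EXTENSION_GROUPS = {
    "type-level-programming": [
        "DataKinds",
        "GADTs",
        "KindSignatures",
        "TypeFamilies",
        "TypeOperators",
    ],
}


def language_pragma(group):
    """Render the LANGUAGE pragma that the named group would stand for."""
    extensions = EXTENSION_GROUPS[group]
    return "{-# LANGUAGE " + ", ".join(extensions) + " #-}"


print(language_pragma("type-level-programming"))
# {-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeFamilies, TypeOperators #-}
```

The user would only ever write the group name, and the documentation for that group could explain the extensions together instead of one at a time.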
Possible Problem: too much work
Would the additional work be worth the benefit? I have no idea.
Is this too weird of an idea?
Is there any merit to this proposed approach? I’ll openly admit it’s a thought experiment.
Comment below to let me know!