Idea, Concept and Philosophy of the Lapyst Language
So you wanna read about why I wrote that monstrosity? Really? Then prepare yourself for... the plain reason: I want to cram features into it!!!
A quick rant
You might have thought of this as some kind of joke, but the reality is more or less exactly that: I'm fed up with most of today's languages. They give you freedom in one area, but either lack complete support in others, are just painful to use, or are clunky beyond recognition. But the 2 worst things of all: firstly, plain stupid maintainers who try to compensate for their ego by plainly refusing to better their language / libraries, because they simply wanna feel like they are in charge for once and wanna have power over people. *cough* golang and curly braces *cough* golang and its "if err" boilerplate hell *cough*.
The other reason is toxic communities; I consider a community toxic when it either wants to tell me how to write my code *cough* PEP standards *cough*, *cough* rust unsafe *cough*, or plainly runs after everything just "because it's written in language X". This makes nothing better; actually the opposite! Trusting without any hesitation is the first step to becoming engulfed in someone's net of lies and misdirections, so you end up as nothing more than a mere puppet.
...and then there was hope
So the only solution would be to write my own language then! I jumped in rather quickly, made A TON of mistakes along the way, but also learned a lot about languages, low-level representation of programs and much more!
The basic idea behind lapyst is to create an easy-to-understand language: no overly long or complex keywords, no clunky standard library or tons of objects to implement just to get basic things done. This includes parallelism / asynchronous code, iterators, printing and much more.
I also don't wanna leave some paradigm or way of doing things behind just because. While the saying in python goes "there's one way of doing it", and perl's mantra is more like "hey, you wanna have a bunch of symbols for breakfast that ALSO can do a shitton of magic things" (not to mention ruby with "we are the fucking magic. don't look at it funny. seriously. do you wanna have Proc magic all over again????"), lapyst's motto should be more like "holy shit, the maintainer simply didn't stop adding new things!".
That's why lapyst will not only have good ol' OOP, it will have OOP with multiple inheritance, UFCS, external implementation à la rust, roles like golang's interfaces, and value types; all with generics of course! And when I'm bored one day, even explicitly implemented interfaces/roles like Java does! Nothing will be thrown under the bus, even if it takes its time.
Also: macros! Codegen at compile-time and comptime function evaluation (that's why there will be a full-blown interpreter!), to make it a joy to write syntactically pleasing code while hiding the tedious work behind quick-n-easy to understand concepts, structures and names. There will even be a plugin system to integrate further into the compiler / workflow of the system, to maybe even process files from DSLs or other languages inside your project's build step! All ofc while maintaining a rather modular system, so frontends, middle-parts (like lexer, parser, static analyser, passes), and backends (llvm, interpreter, whatever) can be fully pluggable and anyone can simply start working with lapyst as a library. This will also be part of a future lapyst feature: being able to compile lapyst code on-the-fly at runtime, to support things like highly optimized regexes that are only known at runtime.
Just to give you another example of how far I wanna push this: while the basic syntax of blocks / scopes is directly inspired by ruby, I also have plans to allow C-style syntax (aka curly braces) in the future, as well as a python/nim-like syntax (whitespace dependent). I really hate being told what to write, so in my opinion a language should be adaptable in that regard. I mean, it's just text; we've managed to write languages that are literally unwritable because no-one sane could keep malbolge's permutation tables in their mind while writing the next big app. Heck, we are even on the verge of the era of AI; we tricked a rock into creating music, voices, novels and even art! A bit of text-parsing should be easy-peasy-lemon-squeezy for us!
But will it be ever enough?
The biggest problem is not the amount of ideas; it's the time it takes. I'm certain I will someday get most of the things to work. Somehow, anyway. But the one thing it will take like nothing else is time. Even if it stays in beta for millennia: at least I kept trying. So will it be a language for you? Probably not. It's a highly subjective language. I would not be surprised if no-one ever uses it; but that's okay. Maybe it will at least inspire others to create their own language, and that's really reward enough for me, to be honest.
But what about the name?!?
Since the basic syntax is inspired a lot by ruby (at least until I add alternatives, like mentioned above!), it was clear I had to pick something else gem-like. I also like sapphires, like a lot. Idk, there's something mystical in that beautiful deep blue that I simply can't describe. So after a bit of thinking, I also came across the word "Lapislazuli", primarily from minecraft, but also when searching for other blue gems.
So it was decided: "Lapyst"; created from the short form of Lapislazuli, "Lapis", with a few letters quickly changed, and voilà: a new name!
But you are also called Mai Lapyst, that's so confusing!!!! - Yeaaaah. About that: somewhere in the middle of all that, I needed a new name for my online persona. The old one had a lot of problems: old memories, being far too edgy, and other things like me discovering being trans. Sooooo I needed a new one! "Mai" was very cute, but a bit too short. So I decided to make a fake "real name", aka a name that looks real, but isn't, yk? So for the first name I picked "Mai", but what about the last name? Because I was probably tired at that point (like in the literal sense, sleepy and all that), I simply used "Lapyst", and I've stuck with it until today. So yeah; not all that surprising after all x3.
The boring part:
History
Lapyst first sprang into life around May 2019 and in the beginning was just called "ownlang". After a frustrating time with C++ (under windows!) I rewrote the complete parser in ruby in like a day, somewhere in September 2019, driven by pure anger about the previous failed attempt. The main factor for the rewrite was that I thought I could simply get away without any lexer and just use regexes all over the place. Oh boy, was I wrong!!
It went well for a year of very sporadic development until around May 2020 (last commit ruby-version) - Dec 2020 (first commit lapyst v3), where I finally had enough of ruby and rewrote it again in C++, but this time with a proper lexer/parser. The majority of the structure of how the lexer/parser/ast works was stable at this point and will remain largely so going forward. But eventually this iteration once again met its demise around December 2021, mainly because I had the very stupid idea to save memory by not storing the token's value in memory, only its offset and length. This led to a bunch of calls to char* Token::getValue(), which allocated memory, read the token from disk and gave ownership to the caller. Which in turn led to a bunch of getValue() calls combined with free() calls, sprinkled all over the codebase every time I needed to actually look at the value of a token (which is a bit more often than I realized).
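To make that pain a bit more concrete, here is a rough sketch of the two approaches side by side. Everything except the getValue() signature is made up for illustration (OffsetToken, OwningToken, the isKeyword helpers), and the "disk read" is replaced by a plain in-memory buffer, so take it as an approximation and not the real lapyst code:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the old design: the token only remembers where
// its text lives, and every lookup allocates a fresh copy the caller must free.
struct OffsetToken {
    const char* source;   // assumption: an in-memory buffer instead of the file
    std::size_t offset;
    std::size_t length;

    char* getValue() const {
        char* out = static_cast<char*>(std::malloc(length + 1));
        if (out == nullptr) return nullptr;
        std::memcpy(out, source + offset, length);
        out[length] = '\0';
        return out;       // ownership passes to the caller
    }
};

// Hypothetical stand-in for the later design: the token owns its text.
struct OwningToken {
    std::string value;
    const std::string& getValue() const { return value; }
};

// Every call site of the old design needs this allocate/compare/free dance.
bool isKeywordOld(const OffsetToken& tok, const char* kw) {
    char* v = tok.getValue();
    if (v == nullptr) return false;
    bool same = std::strcmp(v, kw) == 0;
    std::free(v);         // forget this once and it leaks
    return same;
}

// With an owning token the same check is a one-liner with nothing to clean up.
bool isKeywordNew(const OwningToken& tok, const char* kw) {
    return tok.getValue() == kw;
}
```

The memory saved per token gets paid back with an allocation and a free at every single place that wants to read a token.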
One extra funny sidenote for all these attempts: I rarely committed and only uploaded huge "Update" commits with a ton of stuff, which led to these repos only having a couple of commits (v1:3, v2:13, v3:4).
Then came 11 December 2021, when I started the current run of implementing lapyst, and nearly 2 years later (October 2023) it's still going strong with over 900 commits to date. As before, the structure of streams / lexer / parser / ast was largely kept from the last iteration. The first change was of course to store the token's value directly inside the token, as well as using cxxspec as a testing framework to test the basic structures, lexer and parser right from the beginning. In retrospect, this is honestly the largest lesson learned in all these failed attempts: write your goddamn tests! Seriously. Especially if you're thinking, like me, that it's just unnecessary bloat! It will not only help uncover bugs but also remind you to check ALL of your callsites when a method / structure / class changes.
But alas, even tho the iteration began strong and is still going strong, it wasn't without hiccups. From around July 2022 to Mar 2023 I had several bugs in my compiler's architecture, again... Mostly AST related this time: I thought it would be really good to just make a basic Node and extend literally EVERY OTHER GODDAMN ASTNODE from it. Let me tell you: biggest. fucking. mistake. Seriously. It was so bad I went on a month-long hiatus, where the only way out seemed to be to once again throw everything away and rewrite it from scratch. But not this time! This time I decided after much thinking to just push through, and I'm glad that I did! After sensibly restructuring the AST, now with proper base types for declarations, statements and expressions, the type system took over the work of always ensuring I had the correct nodes; I could just use the correct base type and all was good!
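A tiny sketch of what that kind of layering can look like. The concrete node names here (IntLiteral, ReturnStatement, FunctionDeclaration) are invented for the example and are not lapyst's actual classes; the point is only that fields are typed with the category base instead of a catch-all Node:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Common root plus one base type per syntactic category.
struct Node { virtual ~Node() = default; };
struct Expression : Node {};
struct Statement : Node {};
struct Declaration : Node {};

struct IntLiteral : Expression {
    long value;
    explicit IntLiteral(long v) : value(v) {}
};

struct ReturnStatement : Statement {
    // Only expressions fit here; handing in a Statement is a compile error.
    std::unique_ptr<Expression> result;
    explicit ReturnStatement(std::unique_ptr<Expression> r)
        : result(std::move(r)) {}
};

struct FunctionDeclaration : Declaration {
    std::string name;
    // A body is a list of statements, never an arbitrary Node.
    std::vector<std::unique_ptr<Statement>> body;
    explicit FunctionDeclaration(std::string n) : name(std::move(n)) {}
};
```

With a single Node base, every one of those fields would be a Node pointer and each consumer would have to check at runtime what it actually got. With category bases, the compiler rejects the mix-ups before the code ever runs.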
The next issue I encountered was ownership of the value of tokens.... - ah yes, tokens my old friend, so we meet again!!! - while I now always had the token in memory, the problem was rather who owns the token, and by extension the token's value? In the beginning I really didn't pay much attention, so a Token* wound up in quite a few places of the AST. This was a problem, since originally the token was owned by the TokenStream, where the lexer would store all created tokens until they were consumed by the parser. After the parse step I would have liked to free the TokenStream to reclaim unneeded memory, since the majority of tokens (i.e. keywords, whitespace etc.) were of no use anymore. But the problem was the few remaining ones, primarily literals and identifiers. The solution was simple: just another quick hiatus where I questioned if I just wanna trash the iteration and start over... but it was no use: I restructured the AST once again to store the values we need directly. This means literals copied their data (or directly parsed it into their value, like numerals), and identifiers got their own type to store the token's value inside them. The optimization I added back here was that I allowed an Identifier to move the value out of a token. This was possible since I only created Identifiers in the parser once I was sure that the complete declaration or whatever was parsed, and no backtracking was necessary anymore.
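The move-out trick, boiled down to a few lines. This is only a sketch: it assumes the token keeps its text in a std::string member called value, which is my simplification and not necessarily how the real Token looks:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified token: just the text (assumed to be a std::string here).
struct Token {
    std::string value;
};

// The identifier owns its name and no longer points back into the token stream.
struct Identifier {
    std::string name;

    // Steals the token's buffer instead of copying it. Only safe once the
    // parser is certain it will never backtrack over this token again.
    explicit Identifier(Token&& tok) : name(std::move(tok.value)) {}
};
```

After the move the token is still a valid object and can be destroyed together with the rest of the stream; its text just lives on in the Identifier.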
But sadly this was not the last time; keeping all your data, its encapsulation and so on in your mind can be hard. For a more recent example: until about 2 weeks ago (Sep 2023), I had the following field on some of the AST nodes: void* extra;. It was used to store "extra" or "additional" data that the LLVM backend needed. While this seems like a good idea on paper, in reality it was not, since freeing them was a nightmare that I always put off with the argument "can do it when I release it". TL;DR: I restructured again and moved all that extra data into vectors and maps inside the LLVM compiler (or more specifically the LLVMContainer, which I already had for llvm related caching and helpers), so it can all be managed with type information and is easier to free, the AST is decoupled from random data of some backend, and the code is cleaner overall!
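Roughly, the idea is a side table keyed by the AST node instead of a pointer stuffed into the node itself. The sketch below is not the real LLVMContainer; BackendContainer, FunctionInfo and the method names are placeholders I made up to show the shape:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// AST node without any backend-specific field (no more void* extra).
struct FunctionDeclaration {
    std::string name;
};

// Hypothetical backend-side record; a real backend would hold its own handles.
struct FunctionInfo {
    std::string mangledName;
};

// Hypothetical container owned by the backend: typed data, keyed by node.
class BackendContainer {
public:
    void remember(const FunctionDeclaration* decl, FunctionInfo info) {
        functions_[decl] = std::move(info);
    }

    // Returns nullptr when the backend has nothing stored for this node.
    const FunctionInfo* lookup(const FunctionDeclaration* decl) const {
        auto it = functions_.find(decl);
        return it == functions_.end() ? nullptr : &it->second;
    }

private:
    // Everything is released automatically when the container goes away.
    std::unordered_map<const FunctionDeclaration*, FunctionInfo> functions_;
};
```

Since the map knows the concrete type of what it stores, cleanup is just the container's destructor running, and the AST never needs to know a backend exists.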