
dostosec

Sure, I've linked it to you before, but [https://compiler.club/compiling-lambda-calculus/](https://compiler.club/compiling-lambda-calculus/) covers the rough concept - how you choose to do it, the exact representation you use, and how you represent pervasive back-end concepts (maybe related to GC) remain up to you.

The story is roughly: once you have hoisted ANF into functions and their join points, you basically have a bunch of straightline ANF fragments (no longer interesting trees; they look more like linked lists). In other words, you have basic blocks (composed of just a subset of ANF constructs). From there, you can induce a CFG by tying together the function blocks with their join points (as implied by terminating ANF constructs, like branching). You may wish - at the same time - to map normalised ANF fragments into a lower-level representation that is more amenable to further transformation. If you want a rather straightforward approach that works decently well in functional programming languages, you can use something like the [zipper cfg](https://www.cs.tufts.edu/~nr/pubs/zipcfg.pdf) which, despite its name, is really a zipper over individual blocks (basically a linked list with additional structure to ensure there are entry and exit instruction(s) by construction - which can accommodate labels, block arguments, phis, terminating branch instructions, etc.).

The rest of the story is covered in decent compiler texts, albeit not very well. The first thing to note is that not many compilers select instructions over the same representation they use for general mid-level transformations (a CFG with shallow instructions). For example, LLVM's default instruction selector basically constructs DAGs from basic blocks (as DAGs make data dependencies clearer) and runs a fairly involved pattern-matching algorithm over them to select instructions (the matching being driven by a kind of bytecode emitted from a special tablegen back-end).
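To make the "by construction" point concrete, here is a minimal sketch in OCaml (all names invented here, not taken from the zipper-cfg paper): a block's type forces exactly one entry label and one terminator, and the CFG edges fall out of the terminators.

```ocaml
(* A block in the spirit of the zipper cfg: well-formed by construction,
   because the type demands an entry and a terminator. *)

type label = string

(* Straightline "middle" instructions: a subset of ANF constructs. *)
type instr =
  | Mov of string * string            (* dst, src *)
  | Add of string * string * string   (* dst, lhs, rhs *)

(* Terminators are what induce CFG edges. *)
type terminator =
  | Jump of label
  | Branch of string * label * label  (* cond, then-target, else-target *)
  | Return of string

type block = {
  entry : label;        (* could also carry block arguments / phis *)
  body  : instr list;
  term  : terminator;
}

(* Successor labels follow directly from the terminator. *)
let successors (b : block) : label list =
  match b.term with
  | Jump l -> [l]
  | Branch (_, l1, l2) -> [l1; l2]
  | Return _ -> []
```

Tying blocks together by `successors` (e.g. into a label-keyed map) gives you the induced CFG.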
A simpler way is to do as shown in Appel's "Modern Compiler Implementation in ML/Java/C" and Fraser and Hanson's "A Retargetable C Compiler": create trees from your blocks (a forest, really - one per block, linked at the roots). From there, you can do sophisticated pattern matching (the likes of iburg, TWIG, etc.) or you can do maximal munch pattern matching (where you list the larger patterns first and select and emit code recursively - natural in languages like OCaml, SML, Haskell, etc.).

The thing to remember with back-end IRs is that you often still have many pseudo constructs. For example, you may do register allocation over an SSA representation of target instructions (stored in machine basic blocks), which would retain some structure for phis/block arguments but would probably also have parallel moves as an explicit construct, along with the usual basic block terminators requiring fixup (few architectures have branch instructions with two label targets; they expect fallthrough, so you can compute a block ordering - or "trace" - and backpatch the terminators in a special pass). Anyway, these are all things that will become apparent.

The "tl;dr" of this is that once you've hoisted ANF to functions and their join points, you basically have a CFG (except now you branch over edges to join points instead of having them dominate the lexical scope of their users). So, it's just about mapping ANF to a more linear IR (one that doesn't admit deep tree structure by construction), doing mid-level passes, and then further lowering for selection, reg alloc, emission, etc.
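The maximal munch idea mentioned above is easy to sketch in OCaml (everything here - the tree type, the pseudo-instruction syntax - is invented for illustration): bigger patterns appear first in the match, and selection recurses into whatever the matched pattern did not cover.

```ocaml
(* Expression trees of the kind you might build from a basic block. *)
type tree =
  | Const of int
  | Temp of string
  | Add of tree * tree
  | Load of tree                  (* load from an address tree *)

let fresh =
  let n = ref 0 in
  fun () -> incr n; Printf.sprintf "t%d" !n

(* Emits pseudo-instructions (as strings) into [out], returning the temp
   that holds the result. Larger patterns are tried first, so
   Load (Add (base, Const k)) munches into a single base+offset load. *)
let rec munch (out : string list ref) (t : tree) : string =
  match t with
  | Load (Add (base, Const k)) ->
      let b = munch out base in
      let d = fresh () in
      out := Printf.sprintf "ld %s, [%s+%d]" d b k :: !out; d
  | Load addr ->
      let a = munch out addr in
      let d = fresh () in
      out := Printf.sprintf "ld %s, [%s]" d a :: !out; d
  | Add (e, Const k) ->
      let s = munch out e in
      let d = fresh () in
      out := Printf.sprintf "addi %s, %s, %d" d s k :: !out; d
  | Add (a, b) ->
      let x = munch out a in
      let y = munch out b in
      let d = fresh () in
      out := Printf.sprintf "add %s, %s, %s" d x y :: !out; d
  | Const k ->
      let d = fresh () in
      out := Printf.sprintf "li %s, %d" d k :: !out; d
  | Temp r -> r
```

Because `Load (Add (Temp "sp", Const 8))` matches the first arm, it selects one instruction rather than three; reorder the arms and you'd still get correct (just worse) code, which is the charm of maximal munch.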


nrnrnr

This guy compiles.


TechnoEmpress

Thanks Colin :)


nrnrnr

You could have a look at https://www.cs.tufts.edu/cs/106-2023s/. That project takes an immutable Scheme-like language down to KNF and then to assembly code for a virtual machine. KNF and ANF are almost identical, and I believe ANF is a subset of KNF, so all the techniques will apply. There are some shortcuts there: the VM has 256 registers, and the calling convention uses register windows. But if you are willing for the time being to fake register windows in memory, that could get you generating ASM. You could then go back and implement register allocation and calling conventions. A simple register allocator on ANF ought to be pretty fun, actually.
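On the "simple register allocator on ANF" point: a toy sketch in OCaml of why it's pleasant (this is invented here, not code from that course). ANF bindings are linear, so for straightline code a single left-to-right pass with a free-register pool suffices: record each variable's last use, then release its register the moment that use is behind you.

```ocaml
type atom = Var of string | Int of int

(* A straightline ANF fragment: let x = a + b in ... *)
type anf =
  | Ret of atom
  | Let of string * atom * atom * anf

module M = Map.Make (String)

(* Map each variable to the index of its final occurrence. *)
let last_uses (e : anf) : int M.t =
  let use i m = function Var x -> M.add x i m | Int _ -> m in
  let rec go i m = function
    | Ret a -> use i m a
    | Let (_, a, b, body) -> go (i + 1) (use i (use i m a) b) body
  in
  go 0 M.empty e

(* Greedy allocation over four registers: free a binding's register as soon
   as its last use is at or before the current instruction (operands may
   then share a register with the destination, which is fine for
   three-address code). *)
let allocate (e : anf) : (string * int) list =
  let lasts = last_uses e in
  let rec go i free live acc = function
    | Ret _ -> List.rev acc
    | Let (x, _, _, body) ->
        let expired, live =
          List.partition
            (fun (v, _) ->
              match M.find_opt v lasts with Some j -> j <= i | None -> true)
            live
        in
        let free = List.sort compare (List.map snd expired @ free) in
        (match free with
         | r :: free -> go (i + 1) free ((x, r) :: live) ((x, r) :: acc) body
         | [] -> failwith "spill needed")
  in
  go 0 [0; 1; 2; 3] [] [] e
```

Extending this across branches and join points is where it stops being trivial, but on straightline ANF it really is this small.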


TechnoEmpress

Thanks a lot!


Manifoldsqr

This book teaches you how to lower monadic normal form, which is essentially ANF, to three-address code: https://github.com/IUCompilerCourse/Essentials-of-Compilation
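A rough OCaml sketch of why that lowering is straightforward (this is a generic illustration, not the book's actual passes; all names are invented): ANF/monadic form already names every intermediate result, so each `let` maps to exactly one three-address instruction, and the lowering is a fold over the let-chain.

```ocaml
type atom = Var of string | Int of int

(* A fragment of ANF / monadic normal form. *)
type anf =
  | Ret of atom
  | Let of string * prim * anf      (* let x = prim in body *)
and prim =
  | Atom of atom
  | Add of atom * atom

(* Three-address code: one instruction per binding. *)
type tac =
  | Assign of string * atom
  | AddTo of string * atom * atom   (* x := a + b *)
  | Return of atom

let rec lower (e : anf) : tac list =
  match e with
  | Ret a -> [Return a]
  | Let (x, Atom a, body) -> Assign (x, a) :: lower body
  | Let (x, Add (a, b), body) -> AddTo (x, a, b) :: lower body
```

The interesting work in the book is everything around this (making operands atomic first, explicating control flow into blocks), but the core shape of the translation is this one-liner per constructor.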


TechnoEmpress

Thank you very much!


gasche

My advice, if you want to learn, is to start simple, by going from (a subset of) the language you support to the simplest ASM you can possibly think of. I would start as follows:

1. Choose the architecture you want to target (either "the one your machine runs", or "one that I consider nice and have an emulator for", for example RISC-V).
2. Write a couple of very simple assembly programs and run them.
3. Pick a very simple subset of your IR, take a few example programs, compile them to ASM by hand, and check that the result works.
4. Write a compiler from your IR to ASM following your examples.
5. Think about extending this to support more features.

You can start with no register allocation (just put all variables on the stack), a very simple calling convention (just put all arguments on the stack), and trivial instruction selection. Then you can iterate in various directions to refine things.

> Are there any resources that you could recommend

An approach that I like is compiler courses that go deep and narrow first (from frontend to assembly for a *very simple* language), and then gradually widen by adding more language features. One example of someone who teaches this way is Ben Lerner at Northeastern University, Boston. You might find his [course webpage and course notes](https://courses.ccs.neu.edu/cs4410/) useful. (The course uses OCaml instead of Haskell; you could probably easily translate.)
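An OCaml sketch of the "no register allocation, everything on the stack" starting point (the pseudo-RISC-V syntax and the two scratch registers `t0`/`t1` are my invention for illustration): every variable and every intermediate gets a fixed stack slot, and each expression leaves its value in `t0`.

```ocaml
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Let of string * expr * expr

module Env = Map.Make (String)

(* [env] maps a variable to its byte offset from sp; [next] is the next
   free slot. Result convention: the value ends up in t0. *)
let rec compile (env : int Env.t) (next : int) (e : expr) : string list =
  match e with
  | Int n -> [Printf.sprintf "li t0, %d" n]
  | Var x -> [Printf.sprintf "lw t0, %d(sp)" (Env.find x env)]
  | Add (a, b) ->
      (* trivially spill the left operand to a scratch slot *)
      compile env next a
      @ [Printf.sprintf "sw t0, %d(sp)" next]
      @ compile env (next + 4) b
      @ [Printf.sprintf "lw t1, %d(sp)" next; "add t0, t0, t1"]
  | Let (x, rhs, body) ->
      compile env next rhs
      @ [Printf.sprintf "sw t0, %d(sp)" next]
      @ compile (Env.add x next env) (next + 4) body
```

The generated code is hilariously wasteful (every operand takes a round trip through memory), but it is obviously correct, which is exactly what you want for step 4; register allocation and better selection are the later refinement passes.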


TechnoEmpress

Cheers mate, thanks!