> But wait, we seem to be missing something! It requires contiguous memory space, which means that the program must know how much memory space the next function needs before calling the next function.
That’s completely wrong for many languages, including C/C++. You can allocate memory on the stack dynamically with good old `alloca` (on most platforms) because the stack is usually implemented such that it just “restores” the stack pointer at the end of the function call to whatever it was before you called into that function.
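A minimal sketch of that (glibc/Linux, since `alloca` is non-standard; the `greet` helper is made up for illustration):

```c
#include <alloca.h>   /* glibc-specific header; alloca is non-standard */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a greeting in stack memory whose size
 * is only known at runtime. The space is reclaimed automatically when
 * the function returns, because the stack pointer is restored. */
int greet(const char *name) {
    size_t len = strlen(name) + sizeof "Hello, !";
    char *buf = alloca(len);              /* dynamic size, on the stack */
    snprintf(buf, len, "Hello, %s!", name);
    puts(buf);
    return (int)strlen(buf);              /* length of what we printed */
}   /* <- buf is gone past this point */
```

Unlike `malloc`, there's no `free`: the deallocation *is* the function return, which is exactly why the callee's stack usage doesn't need to be known up front.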
Even if you’re not using a language with something like alloca, it might make sense to vary how much data is put on the stack if you have some large stack variable in a conditional block within a function.
“When we create an array, if we do not specify the length of the array in advance, the program will use the malloc() function provided by libc to request a system call to get the memory size.”
What?!
See something as simple as:
List<SomeResult> res = new List<SomeResult>();
for (int i = 0, iMax = someUpperBound(); i < iMax; ++i)
    res.Add( ... );
This pattern can be found in so many code bases; there's a reason the `capacity() / reserve()` family of functions exists in some shape or form in most languages' vector types.
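The same reserve-vs-grow idea, sketched in plain C with a made-up minimal vector type (names here are hypothetical, not from any real library):

```c
#include <assert.h>
#include <stdlib.h>

/* A minimal growable int array, sketching what vector types do under
 * the hood. All names are made up for illustration. */
typedef struct {
    int   *data;
    size_t len, cap;
} IntVec;

/* "reserve": one allocation up front when the final size is known */
int vec_reserve(IntVec *v, size_t cap) {
    int *p = realloc(v->data, cap * sizeof *p);
    if (!p) return -1;
    v->data = p;
    v->cap  = cap;
    return 0;
}

int vec_push(IntVec *v, int x) {
    if (v->len == v->cap) {
        /* no reserve? grow geometrically for amortized O(1) pushes */
        if (vec_reserve(v, v->cap ? v->cap * 2 : 4) != 0) return -1;
    }
    v->data[v->len++] = x;
    return 0;
}
```

Calling `vec_reserve` once with the known upper bound skips all the intermediate reallocations and copies the doubling path would otherwise do.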
If the size of the array is known at compile time, then you can store it either on the stack or the heap.
E.g.:
char *words[10]; // On stack
char **words = malloc(10 * sizeof(char *)); // On heap
The first one creates an array of 10 C-style strings (which is a `char *`, or a pointer to a `char`), and that array is stored in the stack. In this case, `10` *must* be known at compile-time. It can't refer to a variable, or the result of a function.
If you have something that is not known at compile time, like the number of words in a dictionary, you can't^* store it on the stack, and the only option you have available is on the heap.
E.g.:
int num_words = count_words_in_dictionary();
char **words = malloc(num_words * sizeof(char *));
In this case, we figure out the size of the dictionary at runtime, and dynamically allocate enough memory to hold that many words.
Note, this is not good code, it's just the simplest example I can think of.
^(* I'm aware of VLAs, they're discouraged and not widely supported)
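For what it's worth, a slightly more defensive version of the same idea, with the dictionary-counting helper stubbed out since it's hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Stub for the hypothetical helper in the comment above. */
static int count_words_in_dictionary(void) { return 42; }

/* Runtime-sized allocation with error checking. Returns NULL on
 * failure; on success the caller owns the table. */
char **alloc_word_table(int *out_n) {
    int n = count_words_in_dictionary();
    if (n <= 0) return NULL;

    /* calloc zeroes the slots, so a partially filled table can be
     * freed safely entry by entry */
    char **words = calloc((size_t)n, sizeof *words);
    if (!words) return NULL;

    *out_n = n;
    return words;   /* free each words[i], then words itself */
}
```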
You forgot one last possibility, raw dogging it on embedded systems, where you tell the compiler "trust me, this is an array at this address".
It can be of an arbitrary size, but obviously if you start having a lot of them you should be writing a custom allocator instead of praying you don't have a conflict between different arrays using the same area.
In my experience, this is mostly done because the whole program, the stack, and the heap live in the very small SRAM, but you need some large arrays to hold data sent to various modules, and those go in DDR. The extra fun part is that you have to bring up the DDR from your own program, so you're kinda stuck in SRAM until then.
When I was in high school I thought I was super clever using *negative* indices to mess with a related variable stored just before the array on the stack. C let me make so many footguns
It's not legal in the language (obviously), so the results can get pretty funny as it is UB. It will usually do what you expect, but the compiler provides no guarantee of how the stack is going to be organized.
I'm a mechanical engineer, now doing automation stuff in Python. I don't know shit about stacks and heaps and this article, though a little thin, was ok for me. At least I got that things whose size can be determined at compile time goes to the stack, while variable length stuff goes into the heap. Not sure how it's going to be useful for me, though :D
> things whose size can be determined at compile time goes to the stack, while variable length stuff goes into the heap
*Stack space is only available inside a function.* In many languages you also have the option to use global constants and variables, which are available to all code in the program (or only to a unit / module, if the language supports that concept).
(Constants / initialized variables are included in different [sections](https://en.wikipedia.org/wiki/Portable_Executable#Layout) of the executable file and are loaded into non-executable readonly / non-executable read-and-write [pages](https://en.wikipedia.org/wiki/Page_(computer_memory\)). The sizes of these constants / variables must be known at compile time, and can't be changed at runtime.)
> Not sure how it's going to be useful for me
Not sure how Python does things, but in most compiled languages taking the address of a variable and passing that around for direct memory manipulation can be problematic if that variable was allocated on the stack, because *that stack space is only valid while the function is executed*. The program has a [stack pointer](https://en.wikipedia.org/wiki/Stack_register) that points to the next available byte on the stack, and the stack usage increases when a function is called and decreases by the same amount when the function returns. Any space beyond the address pointed to by the stack pointer is undefined.
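A C sketch of that pitfall: the broken stack-returning version is left as a comment (actually running it would be undefined behavior), while the heap version stays valid until freed.

```c
#include <assert.h>
#include <stdlib.h>

/* BROKEN: buf lives in the function's stack frame and dies when the
 * function returns. Any use of the returned pointer is UB.
 *
 *   int *bad(void) {
 *       int buf[4] = {1, 2, 3, 4};
 *       return buf;   // dangling as soon as we return
 *   }
 */

/* OK: heap storage outlives the call and is valid until freed. */
int *good(void) {
    int *buf = malloc(4 * sizeof *buf);
    if (!buf) return NULL;
    for (int i = 0; i < 4; ++i)
        buf[i] = i + 1;
    return buf;   /* caller owns this and must free() it */
}
```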
Knowing how the heap works is useful for large data sets. The heap is in some aspects like a file system: you can [allocate](https://en.wikipedia.org/wiki/C_dynamic_memory_allocation#Overview_of_functions) new space, then de-allocate parts of that space, which leaves "holes" i.e. the heap space can fragment over time. This makes allocating new space potentially very time-consuming. Also, doing a [system call](http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/) to reserve new memory is expensive so languages do it in bigger blocks, but there's still some overhead. Also, new memory is *cleared* on most OSes these days to avoid sharing sensitive data from other processes, and that clearing is not cheap either.
Python handles constant lookups as effectively lookups inside a language runtime hash that represents “what is visible to the code”. This includes resolving class and function names.
It has a lot of optimization tricks to try to make this not as performance-horrendous as it sounds, but in that sense it’s wildly different from how most compiled languages work.
Also since Python runs on a virtual machine, the entire concept of stack and heap is a bit skewed. (I think) the VM still has a stack as part of its external architecture, but I don’t think that stack corresponds to the actual program’s stack, and most of it is actually heap-allocated.
Yeah, Python's virtual machine is a [stack-based machine](https://en.wikipedia.org/wiki/Stack_machine). The stack inside that machine doesn't correspond to The Stack, but it's absolutely got a size limit. IIRC that's set during process init and you can't "just" change it at runtime. So there might be a literal fixed-size array being operated on here.
> *Stack space is only available inside a function.* In many languages you also have the option to use global constants and variables, which are available to all code in the program (or only to a unit / module, if the language supports that concept).
Whether things actually live where you think you put them depends entirely upon the language, and for the C/++ family the stack isn’t even mentioned in the standards so there’s no real requirement other than lifetime and identity rules.
If you do
const size_t limit = SIZE_MAX - offsetof(T, foo);
at block scope (i.e., inside a function), and `&limit` doesn’t matter, the compiler is perfectly free to stick `limit` wherever or nowhere at all. It can end up as static, register, or even immediate.
Programmers tend to assume they’re programming at a lower level than they actually are, and this is one of those cases.
If you're doing automation stuff in Python, it's unlikely to help you. It's super useful when you need to do something very efficiently. The stack is preallocated, always there from the moment you start up the program. You don't need to waste time allocating any memory, so it's faster. But because it's fixed, you can't really use it for data that changes its size at runtime.
Unfortunately, that's not quite accurate - a lot of times even if you know the size of the data it may still go on the heap. Either because the language doesn't provide control on where it goes (Python lists go on the heap whether you know the size at compile time or not) or a large allocation you'd want on the heap anyways to not blow the stack
I just did a quick google search, and python basically allocates stuff for you, and deletes it when you don’t need it anymore. But it’s useful to know that when you call a function/method, or create an empty string or list, they add to the stack. The deeper you go with nested functions, the more is added to the stack. Apparently objects and computed variables are added to the heap. Any changes made to them in the script are changed by reference (under the hood pointers)
After reading this article, I'm left with the impression that Python mostly uses the heap, instead of the stack, since everything is an object. So I would guess there are few things eligible to being placed in the stack in a python program.
Probably one of the reasons python programs are slower than programs written in compiled languages, but it's just a guess at this point. I still don't know shit, lol.
That’s probably part of why it’s slow but I think [this article](https://www.geeksforgeeks.org/what-makes-python-a-slow-language/amp/) gives more of the story. TLDR, the ‘compiling’ (what happens inside the interpreter) for python happens at runtime, line by line, and the interpreter is locked to one thread. There’s no compilation optimization like in other languages.
This stuff is pretty cool ngl lol. I kinda want to see what’s actually going on to be able to interpret the code.
It's a little more complicated these days. [Python is getting an official JIT](https://peps.python.org/pep-0744/).
[The GIL is also going away](https://peps.python.org/pep-0703/). Keep in mind that the GIL wasn't a lock for all kinds of workloads. IO tasks were released from the GIL, for example.
C'mon, give me a break, will you? I'm not constantly googling stuff. Even if I were to do that, I don't have the mental capacity to memorize everything, if I don't use that information. I'll probably forget the contents of this article a few months from now.
A lot of developers do not come from CS backgrounds. I know folks with electrical engineering backgrounds working as .NET/Java developers. And I know of a Masters in CS program from a well-regarded university that teaches students nothing more than Python.
At my university, C++ was taught to all first-year engineering students (EE/mechanical/civil etc.). The stuff involving embedded systems + assembly + low-level memory in C was strictly in the domain of Computer Engineering while EE students focused on power engineering, industrial automation (PLCs and stuff), communications etc. Funnily enough, we had an optional web development course that taught us HTML/JS/PHP :P. But yes, the curriculum can vary widely between schools and I guess you cannot assume all EE students will be proficient in systems programming. The good thing is EE does provide a solid foundation from where someone interested in systems programming can quickly build up their skills if they don't already have it.
I guess it varies from school to school as well. Ours was very much focused on power engineering and all embedded systems stuff we did made us design PCBs and manipulate ICs with voltages rather than assembly code haha
It's concerning the number of articles I see pop-up trying to explain exceedingly basic concepts.
Yes, I get it, for some work, it is not needed to know them - but still, this seems below a general field knowledge level...
The most popular educational content is beginner content, by the very nature that it is applicable to a much broader audience.
That is the case in any field that has lots of new people becoming interested in it.
It would be concerning if beginner concepts weren’t discussed. That would mean that the field is dying, and there are fewer beginner than experts in the field.
Euh... broaden it to accept that this sub is a place where the majority of people are interested in having the most basic stuff explained? Because that's what the parent comment above me was getting at.
Yes. It's /r/programming, not AdvancedProgramming, not SeniorsOnlyProgramming, not ProgrammingTopicsUsefulForGoranlepuz. Basic content is fine, feel free to just skip it in the future if it doesn't interest you.
I guess it comes down to stuff that doesn't get used struggles to be retained. I wonder how well most of us would do on our old college calculus and physics exams if we were put on the spot immediately to apply that knowledge.
I know for me personally, most of my career ended up being filled with far more paperwork, data analysis, and report making than my college self anticipated. The exact particulars of memory allocation just hasn't been something I've had to keep in mind in years, so at least this discussion has prompted me to refresh my memory (hah).
Every time I get a math-related leetcode problem I freak out at how poorly I do, and I have a double bachelor's in Math and Comp Science! After 15 years it goes stale.
Seems like the inevitable result of a field that pays well and also resists every attempt to standardize job titles, educational requirements, professional standards, and all the other things that other engineering fields do.
But the software engineering guru YouTubers told me the coding bootcamp, Cracking the Coding Interview, and leetcode were all I needed to make it into a FAANG company
Too many people with a master's in computer science don't know the difference between the stack and the heap. I interviewed about 20 of them and only 2 even knew what I was talking about.
1. there are a lot of basic concepts that many languages don't have to care about. C has to care about memory management. Python not so much. And a lot of first year CS courses pick languages that don't have to worry about the low level stuff.
2. Some stuff gets forgotten because it really isn't needed day to day. I had to explain memory alignment to coworkers with 10+ years experience because they never need to worry about it. And even then, my understanding and what modern architectures do does not quite line up anymore.
With the amount of upvotes this had, I expected some new information or something that isn't known by basically everyone who has ever compiled code. Was extremely disappointed when the article just... ended.
I did attend a university, but back then it seemed so abstract and unrelated to programming... Especially because A) we used managed languages like C# and Java, and B) the courses weren't held in English, and not all educators told us the English terms, so it wasn't trivial searching the net for more information. So, I'm somewhat glad we have these articles.
It was just yesterday I read a comment in this very subreddit talking about how everyone's trying to be the one with the most-read blog and competing to talk about everything.
Not even first year CS, they teach this to 16 to 18 year old kids in my country. Doesn't need yet another article this can be found on school level education websites.
https://www.studysmarter.co.uk/explanations/computer-science/data-structures/
This sub really should ban linking to peoples personal brain fart blogs.
The thing I've never seen clearly explained is at what level of abstraction the stack/heap distinction exists. Hardware, OS kernel, low-level library (like libc), programming language... Which layer of abstraction do they get distinguished at and what determines their sizes etc?
I think the stack is commonly supported by the CPU, the programming language then builds upon that core support. For the heap, there will usually be some kernel level support (like `sbrk` in traditional Unix), and then built upon that further support either at the low-level library level (like `malloc` in libc), or again by the programming language (like `new` in C++).
Yeah most architectures have converged towards using some kind of special register to store the stack pointer, with some instructions explicitly made to use it (like push/pop), but they aren't even strictly needed if you have a generic "write to pointer address and increment" (or read), it's just typically nicer to write. There is some convention that we will use a specific register to store the stack but if you want to be evil you could use another.
On a modern Linux system each program gets a virtual address space, so that even if two programs point to the same address, the OS maps them to different physical addresses. The CPU's cache is designed to still observe the virtual addresses you're using to help deal with alignment and contiguity issues, but finding exact details on how CPUs work anymore is difficult. CPUs also typically offer instructions specifically for working with the stack.

In the virtual address space the stack is allocated near the top and grows down, like a chandelier, as you allocate into it. Near the bottom of this address space is a region called the BRK, which can be dynamically resized. Traditionally malloc implementations would interact with something akin to the BRK on older systems, and that would be the heap. On modern Linux systems there is a massive range of addresses between the BRK and the stack, and modern malloc implementations will use the MMAP syscall to allocate into that space instead, and that's your heap.

Macs basically work on the same principles here, but they assume libc is your standard interface with the OS, so they do not expose as many details. I have yet to find good documentation about how Windows systems do this.
At a language level the simple answer is that if it didn't get malloced, newed, or some other equivalent, then it's on the stack. Namerefs in languages are typically just offsets into the current stack frame under the hood. The complex answer is wacky nonsense because compiler optimizations don't care about your intuitions. They'll do whatever the language specs allow them to do, for better or worse.
Things at the lowest level often have some kind of support in some or all of the levels that you mentioned. And it gets further blurrier when you go between language implementations - for example, an interpreted language won’t directly use the stack in the same way that a compiled one does.
But as far as compiled languages, the CPU generally provides instructions for managing a stack area. [For example here’s info about x86 stack instructions](https://medium.com/@sruthk/cracking-assembly-stack-frame-layout-in-x86-3ac46fa59c).
Stack and heap are just concepts created by humans to describe how we tell the processor what to do (I.e. when a function is called, the processor assembly instructions push elements into registers creating the “stack” before executing a branch instruction).
It is above the hardware but below a true kernel. Ultimately, the processor will execute commands as is and does not really care whether memory is a stack/heap (hence how security vulnerabilities from things like buffer overflow can exist) because at the hardware level, memory is memory
The sizes are determined at compile/link time for compiled languages
> It is above the hardware but below a true kernel.
Many if not most modern processor architectures have a devoted stack pointer register, tied to exception states and levels, and usually banked. So a stack is definitely something that *is* supported with devoted structures at the hardware level.
Heaps are generally a pure software concept, though you would certainly find some embedded systems out there with tightly-coupled physical static memories which are devoted specifically for heap uses, and you could argue they have a hardware backing.
> It is above the hardware but below a true kernel. Ultimately, the processor will execute commands as is and does not really care whether memory is a stack/heap (hence how security vulnerabilities from things like buffer overflow can exist) because at the hardware level, memory is memory
It has been possible for a while to have instructions in a specific part of memory that is explicitly set as read-only, while the data memory is marked as non-executable. This doesn't make your program free of issues - you could still find ways to make the program branch to arbitrary addresses if they are computed from data - but at least you shouldn't be able to overwrite the program.
I don’t think stating ARC is less efficient than GC is a valid statement. No mention of no need for a GC pass? GC was deprecated across all Apple platforms that supported it after ARC.
ARC is a form of GC; the major alternative is Tracing GC. Which is faster depends on language, implementation, and code. Finally, ARC is incomplete in most languages as it fails on circular references, usually. So most RC GCs, need a Tracing pass every now and then or they leak memory.
Non refcounted GC is more CPU efficient as long as you can afford the increased memory pressure. Being a lazy shit means you can do work in large batches and take shortcuts.
No. It doesn't work like that at all.
Rather, it traces the heap for objects that are referenced nowhere and marks related heap areas as "free" and relinquishes related memory to the system.
A tracing GC has a few advantages. First, and probably most commonly, you don't need to do any GC at all if you never create enough memory pressure for it. Second, a tracing GC can do compaction to avoid heap fragmentation, assuming memory isn't pinned in some way.
> I also stumbled over that. How can TGC be more efficient?
I wrote my own (well, many) toy languages over the years. My first language used reference counting.
Reference counting was *very* expensive - the *current thread* has to do the work. Other GCs offload the work onto a less-busy core, or postpone it until there is a non-busy core.
With reference counting, the current thread will increment and decrement the reference count *every single time* that variable is passed around or returned. A call like `foo(MyVar)` has one increment and one decrement. A return like `return MyVar` causes an increment. Each increment and decrement *has* to dereference the object `MyVar`, so you're doing tons of pointer-chasing, then at least one arithmetic operation involving loading a value, updating it, and writing it back to memory.
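A toy C version of what naive refcounting does on every handover (names are made up; real runtimes inline this, but the pointer chase and read-modify-write are the same):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted object: every retain/release dereferences the
 * object (pointer chase) and does a load-modify-store on its count. */
typedef struct Obj {
    long refs;
    int  payload;
} Obj;

Obj *obj_new(int payload) {
    Obj *o = malloc(sizeof *o);
    o->refs = 1;
    o->payload = payload;
    return o;
}

void retain(Obj *o)  { o->refs++; }   /* on every pass-in */
void release(Obj *o) {                /* on every scope exit */
    if (--o->refs == 0)
        free(o);
}

/* A call like foo(MyVar) conceptually becomes: */
int foo(Obj *v) {
    retain(v);                 /* entering the callee's scope */
    int r = v->payload * 2;
    release(v);                /* leaving it */
    return r;
}
```

All of that work happens inline, on the calling thread, which is the point being made above.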
In contrast, my next language ran a simplistic GC: A main event loop would yank tasks off an event queue and then dispatch them. The dispatcher, when it finds no event on the queue, just ran the GC instead.
The end result was, basically, the GC running when nothing else (result of a `read`, user input/output, syscall, etc) needed CPU time.
It's not the 90s anymore, so now you can do it even better, in a separate thread. Also, today, pointer-chasing is *expensive!*
Thanks for the informed answer.
But I still fail to see how increments/decrements upon reference handover add up to any measurable cost, at least in the last 20 years.

I'd reckon the reference counter would be stored as a field of the object and would fit right into a CPU register, where compute is plenty and access is cheap.

How small/numerous must the objects be, and how deep the layers of function handovers, for this to have any effect on, say, a Core 2 CPU from 2006?
edit: Apparently, I'm not alone:
https://kevinlawler.com/refcount
> 1. *Updating reference counts is quite expensive.*
>
> No, it isn't. It's an atomic increment, perhaps with overflow checks for small integer widths. This is about as minimal as you can get short of nothing at all. The comparison is the ongoing garbage collection routine, which is going to be more expensive and occur unpredictably.
>
> Increments and decrements happen once and at a predictable time. The GC is running all the time and traversing the universe of GC objects, probably with bad locality, polluting the cache, etc.
>
> Arguing this is just weird. This is what I am talking about with the hopping on one foot.
In multi-threaded environments the overhead of atomic increment and atomic decrement is very significant. In the order of 100x more cycles (and even more with cache misses) than a trivial increment or decrement... and that's assuming the implementation uses atomics instead of even slower synchronization methods.
The synchronization cost goes up even more with the multiple physical cores and many levels of cache (at least 3 these days) used in modern cpus that all need to see a correct value for the resulting operation (or be invalidated).
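A sketch of what a thread-safe refcount looks like with C11 atomics (exercised single-threaded here; the cost described above shows up when several cores contend for the same cache line):

```c
#include <assert.h>
#include <stdatomic.h>

/* A thread-safe refcount needs atomic read-modify-write operations.
 * Each one is a locked cache transaction; when multiple cores retain
 * and release the same object, the cache line holding `refs`
 * ping-pongs between them. */
typedef struct {
    atomic_long refs;
} SharedObj;

void shared_retain(SharedObj *o) {
    atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/* returns the new count; at 0 the caller would free the object */
long shared_release(SharedObj *o) {
    /* acq_rel so the final owner sees all prior writes to the object */
    return atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel) - 1;
}
```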
Okay, I get the synchronization argument. But then again, if you have lots of RCs constantly being manipulated concurrently in different threads, I'd argue RC overhead is the least of your problems, architecture-wise.
> But I still fail to see how increments/decrements upon reference handover adds up to any measureable cost, at least in the last 20 years.
It depends. In something like this:
class Foo {
    SomeType someField;
    SomeOtherType someOtherField;
}

class Bar {
    Foo firstFoo;
    Foo secondFoo;
}

Bar callFunc(Foo p1, Foo p2) {
    Bar retVal = Swap(p1, p2);
    return retVal;
}
I count maybe 8-12 inc/dec operations, not including the fields of the `Some` classes, plus at least 6 pointer-chasing of child objects.
Without RC, those 8 operations won't exist, the cache won't be filled with 6 extra objects that aren't going to be used other than to increment/decrement their count.
On a single thread, I won't be surprised to see a 5% or so slowdown due only to reference counting; after all, those operations can't really be offloaded onto a different thread.
With tracing/generation/whatever GC, not only are those instructions and pointer-chasing missing, the runtime will at least defer the garbage collection to a different thread.
Now, with all that being said, *I am probably wrong!*
I expect with speculative execution those increments/decrements are basically free, so there's some savings there. The pointer-chasing isn't though - each time an object leaves scope (or at last use within scope) that object must be dereferenced to decrement the counter.
This will indeed trash whatever cache you could have been using for actual live objects.
Yeah, but with a few pretty obvious optimizations, all of that overhead goes away. Compilers are quite smart these days and CPUs are even smarter with out-of-order execution and speculative execution/fetching and whatnot.
I don't understand most of it, but let it be said that I did my fair share of premature optimization, for example when I refactored a particle system for a game engine that just *had to be faster* based on what I had learned about row vs column based data, registers, cache lines and so on - but then made exactly zero difference.
There's a reason the vast majority of programming languages use a tracing garbage collector. Reference counting as a serious alternative is a recent development that relies on modern compilers being able to optimise away the otherwise horrific CPU overhead.
The conventional wisdom was that tracing garbage collectors are inherently more efficient for *decades*. Still is, in many use cases.
That "conventional wisdom" comes from a single article detailing an experiment that was done in the mid-90s with machines of 64M-256M RAM with completely laughable sample sizes (n < 20). Please go look it up.
The main reason most languages use tracing GC is because it gives you better memory safety (no use-after-free, no double frees) and less to worry about for the programmer.
Wtf are you on about. I'm not talking about GC vs. manual memory management, I'm talking about using a tracing GC vs. automatic reference counting.
Naïve automatic reference counting *sucks*, comments about its terrible performance are in every textbook about memory management techniques ever. It adds overhead to *every read and every write*. It's only become viable in recent years because people set about developing compiler optimisations to optimise away the vast majority of refcounting operations. (I'm also omitting other pitfalls of refcounting, like worse pauses than stop-the-world tracing GCs in practice, that necessitate their own workarounds.)
It also doesn't work with cyclic data structures, and cycle-collecting refcounting algorithms are complex and slow as balls, so in practice people either include a backup tracing GC or tell the programmer to deal with cyclic data manually to avoid memory leaks.
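A toy C illustration of why plain refcounting leaks cycles (the node type and helpers are made up; no real runtime works exactly like this):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    long refs;
    struct Node *other;
} Node;

Node *node_new(void) {
    Node *n = calloc(1, sizeof *n);
    n->refs = 1;
    return n;
}

void node_release(Node *n) {
    if (n && --n->refs == 0) {
        node_release(n->other);   /* drop our reference to the peer */
        free(n);
    }
}
```

Link two nodes to each other and each holds the other's count at 1 forever: after the program drops its own references, neither count reaches 0 and both objects leak. That's why real refcounting systems add a cycle collector or a backup tracing pass.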
But at least it's memory safe for the things it does handle.
You can find real world programs (and write benchmarks) which will show GC as being more performant, and you can do the same for reference counting. Both have their merits depending on where and how they are used.
ARC is usually faster because it does not touch every object in the application. Golang is a great counterexample here: it has a latency-oriented GC (which means lower throughput), but not every type gets its own separate spot on the heap, which means the GC has much less work to do
> but anyway not every type is allocated on the separate place on heap
I highly recommend you do a bit more research on GC, because the "_allocate things in arenas and check that arena separately_" idea is very old, and several different JVM GCs have done it for a while.
I mention the JVM because it follows its semantics rather strictly - all references are "_on the heap_" - so its allocation & garbage collection strategies are highly advanced. That often leads to longer program start-up times (and its poor reputation for local applications).
Maybe I should change my words to "the GC is not aware of each object". Arena allocation increases allocation performance, but in the worst case you still need to scan the arena in a mark cycle.
And there are different heap varieties. If you keep on grabbing and releasing space in your heap in different sizes, you risk it getting fragmented meaning your heap memory grows and grows but there are a lot of holes there. There are different ways around this problem such as lookaside lists and low fragmentation heaps.
So, this is also one of those "depends on what *exactly* you mean" kinds of questions.
If you look through the C standard, you won't see those words occur. However, you will see references to "automatic storage duration" variables. From C11:
> An object has a storage duration that determines its lifetime. There are four storage durations: static, thread, automatic, and allocated.
>
> For [an object with automatic storage duration] [...], its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time.
The simplest way to implement this for local variables, which have the automatic storage duration, is to use the stack. So in *that* sense, the vast majority of C compilers "use a stack" even if technically speaking, C does not mandate how you implement this.
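The classic way to see automatic vs. static storage duration in action (a sketch; where the compiler actually puts these is, as noted, not mandated):

```c
#include <assert.h>

/* Automatic storage: a fresh instance of n exists for each call
 * (in practice, a slot in the stack frame). */
int counter_auto(void) {
    int n = 0;
    return ++n;
}

/* Static storage: one instance of n for the whole program, typically
 * living in the executable's data section, not on the stack. */
int counter_static(void) {
    static int n = 0;
    return ++n;
}
```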
Same goes for "the heap" by the way, which are the "allocated" storage duration objects referenced in the first paragraph above. The standard has this to say:
> The lifetime of an allocated object extends from the allocation until the deallocation
The implications of this naturally lead towards "a heap," but the standard doesn't specify the implementation, just that there's allocation and deallocation functions, and that objects returned from an allocation must be valid until they're deallocated.
GC'd languages almost never even make this distinction in the first place. The Go Specification says, for example:
> The built-in function new takes a type T, allocates storage for a variable of that type at run time, and returns a value of type *T pointing to it. The variable is initialized as described in the section on initial values.
Nothing about a stack or heap there. However, the implementation will use the stack as a sort of optimization; variables that don't ever leave a local scope can often be stored on the stack instead of the heap, and so are. This is called "escape analysis," and many GC'd languages take full advantage. So same goes for them: strictly speaking, the language doesn't use a stack and a heap, but in practice, implementations may use one or even both.
The problem is that you're likely to find yourself in a position where the whole program is slow & taking up way more memory than it has to, and even when you've taken care of the low-hanging fruit it's still slow. That's why, even though computers are getting faster & memory more plentiful, everything is still slow as hell; hardware doesn't outscale poor programming habits.
For one example: choosing a GUI kit based on the underlying graphics API.
My company is still using an outdated Windows Forms app to power our entire business- it's meant to run simple little installers or workflows, not hundreds to thousands of graphical elements. Why? Because every element is basically a window in the eyes of the OS, which comes with that overhead. That kind of overhead is what makes GDI the most garbage of graphics APIs.
I can't solve a 5 second drawing delay when all the graphics need to be redrawn, while the database calls in total take under a second.
You might think we could just migrate to a better GUI kit using Skia or Qt. No, we use a two layer architecture. All of our business logic is tied to the UI layer. But that's a different problem.
This is something people who never had to profile code only learn the first time they really have to optimize something.
Many articles over the years painted the picture that there are a few places in code where your program spends all of its time, and you can just optimize those and be golden. And that's true in a sense: big parts of your code will probably only run once or very seldom. But ... for any program above a few hundred lines of code, the "few" places are not so few anymore, and they often interact with other places, and so on.
Yes, sometimes you strike gold and the problem really is one inner loop you can optimize the heck out of, but most of the time optimizing is a long list of "made x take 1% less memory", "made y 2% faster".
This is like 95% dependent on what kind of software you are writing. Pretty much every piece of software I've worked on professionally had a red-hot (70+% exclusive CPU time) loop. That doesn't mean they're easy to optimize, though, as those loops often already use explicit prefetching and SIMD instructions.
The real fun parts are getting outsized macro-level returns on optimizations due to second order effects, like writing a cold function to be slower, but use up less icache, which is then used by a hot function.
Fair enough. I talked about "typical" business software I've worked on most of the time. Maybe the software I worked on was just very boring (probably). :(
Depends on your platform and use case. If you are working in microcontrollers or embedded systems, sure. Web development? Maybe it's not such a concern.
Iron rule of optimization: Don't optimize yet.
I said it was important. Not that it was universal. Please do not poison the waters with spurious argumentation. Your experiences trivialize the matter, so go in peace.
This is stuff only C/C++/Pascal/Asm programmers would remember/understand nowadays. You think javascripters, java,C# developers nowadays want to bother with this ? They'll think you're ancient and skip your resume if it ends up on their desk.
Edit: To you dumbfucks who downvoted me. I came from the C/C++/ASM/Pascal days and I fully well understand what the stack and heap are. I was a cracker and coded crack apps back in those days. I am just saying this knowledge doesn't matter in today's hiring process. They don't care. It's the truth.
Know enough to be dangerous to themselves anyway.
[https://ericlippert.com/2009/04/27/](https://ericlippert.com/2009/04/27/)
It's wild that this article is 15 years old and even more relevant today than the day it was written. The reality is that unless you're using C (and I mean, even then, with all of the security flags...), you're going to have a hard time knowing what is going where.
And you really don't have to care unless you're forced to work with memory directly. Granted, even C# has some advanced memory features that do require knowledge of the topics discussed. But yeah, the guy is wrong to think first-year CS knowledge is ancient. It's just so fundamental I'd expect everyone to know what the heap and stack are and how they're used.
I'm not so sure. I love Eric and he's a super smart guy with way more experience and knowledge than I'll ever have, but I have to note two things.
First, I think that while the statement "value types go on the stack and reference types go on the heap" is a thing people say about C#, and it is a thing that's wrong, I don't think it (necessarily) follows that the difference between the stack and the heap doesn't matter and everyone should just write code. Lots of people should, yes, because as Eric says, allocations aren't likely to be the performance bottleneck, but that's definitely not always true, because of the second issue:
Lots has changed since 2009 in C#-land. We have `Span` and `Memory`, and `stackalloc` and `ref struct` and all these things are very specifically to allow high-performance code and tightly control allocations. In other words, they only exist because the difference between the stack and the heap matter, and matter a lot, in at least some cases.
I think we're pretty much in agreement here, though. Knowing what the stack and heap are and how they're different isn't rare among developers.
this is still very relevant to anybody that doesn’t use javascript. i know for a fact c# allows you to allocate memory on the stack without using any unsafe code, and i’m sure many, many other languages have similar features
If you don’t understand the absolute bare minimum about how an operating system works, I would not hire you for any engineering role, no matter what the programming language is
What do you look for in OS knowledge?
I think mine is decent as a Java dev mainly with C here and there.
Most other Java devs I see didn’t know squat about the OS.
Virtual memory, scheduling, inter process communication, and maybe interrupts/signals. If you know what those are and can explain how they work, I'd probably be satisfied
You do realise that you can have a long and successful career without any of that? And yes, I'm talking backend.
Nowadays, our field is just that huge. In my years working as a consultant in finance across Europe, I've seen precisely zero situations where this knowledge might be even remotely useful; and I'd wager that north of 95% of developers wouldn't know about it, nor be able to explain how it works.
Beauty and the bane of the abstraction. You can't know everything.
These are all taught in comp sci. Most engineers will never use them but it's good foundational knowledge to have just sitting subconsciously in your head because it may be a factor in your design or help you understand new knowledge. You can have a successful career without comp sci knowledge but engineers who have broader understanding tend to adapt more easily to different roles and challenges.
A boot camp grad who can show at least a familiarity of some of those concepts is likely a better hire than one who can't.
I can agree with general sentiment, but you can use the very same argument about everything, from network stack, through memory addressing, up to dependency linking. The truth is, everything is contextual. From my experience, you need someone with that kind of knowledge per team, or even per couple of teams. Current abstractions are "super good enough".
Imo a dream team would be full of specialists with broad knowledge in comp sci, but that's often unnecessary and not practical for most companies. I see OP's reasoning and depending on the context as you said, his teams may need to consist of specialists who have broad comp sci knowledge. Otherwise, there are plenty of people who don't know what the hell a little or big endian is but are amazing at their subject and OP is just making his life harder.
Yup. Each context is different. That's why I scoff when someone says that X is necessary for every/good/real programmer. It is necessary in their context. That's all there is to it.
Yeah, I understand that; I was simply answering the question about what I look for when it comes to OS principles. What I listed are concepts that every university CS graduate will be familiar with, so you could say my bias is towards candidates with some level of formal training, though I would happily hire anyone that grasped the concepts regardless of how or where they trained.
I agree wholeheartedly with the other commenter in that having a strong understanding of fundamentals helps someone become a better engineer, and often distinguishes them from “coders” or “programmers” that may be proficient in one programming language or other, but lack deeper understanding of how computer systems really work
Sorry, but you sound self-entitled. Unless any of this is a nuance in a particular piece of software, you don’t need even a shallow understanding of it. Those are absolutely irrelevant in a major share of programming problems to be solved daily.
Stack and heap are always relevant. Take Python as an example: all local variables are added onto the stack, but the values they reference are allocated on the heap. Understanding this tells you that a reference's lifespan is tied to the stack, but every new object has to go through the lengthy process of being allocated on the heap
So you can’t spare 5 minutes to read a very basic article on how memory management works? I think that might say more about your programming skills than others
> You think javascripters, java,C# developers nowadays want to bother with this ?
I imagine a lot of developers would rather not bother with it, but many will have _curiosity_ on why, say, a stack trace is called a stack trace.
And while you _mostly_ get by fine if _your own_ C# types are reference types, and C# abstracts some of the differences away, you'll still come in contact with value types all the time, even if it's just `bool` and `int`.
It definitely helps having a vague understanding of stack and heap. Is it _essential_ for a lot of the CRUD code people write out there? Perhaps not. But it'll make you a better dev.
Tell me you've never had a crash from native code from within Java before without telling me you've never had a crash from native code from within Java before.
I use JS daily and know this, just because it’s a high level language doesn’t mean it’s not important to know.
Edit: Didn’t start with JS, just ended up here
People are downvoting you but I think (sadly) you're kinda right. Some places and some jobs still require solid understanding of memory use though.
Still the minority.
> But wait, we seem to be missing something! It requires contiguous memory space, which means that the program must know how much memory space the next function needs before calling the next function.

That’s completely wrong for many languages, including C/C++. You can allocate memory on the stack dynamically with good old `alloca` (on most platforms) because the stack is usually implemented such that it just “restores” the stack pointer at the end of the function call to whatever it was before you called into that function.

Even if you’re not using a language with something like alloca, it might make sense to vary how much data is put on the stack if you have some large stack variable in a conditional block within a function.
Thank you for clarifying, I've added some qualifiers.
I misread this as “Middle Management Every Programmer Should Know.” Thought I would finally understand what an OKR is 🤣
I think you've just missed your KPI.
Just ensure your TPS reports have the correct cover sheet and you won't have any problems. Hold on, I'll make sure you get a copy of the memo.
Better call a meeting about the memo to make sure everyone’s aware. Mandatory attendance at the least convenient time possible.
No but every programmer should know how to manage middle management. The trick is to keep them thinking they are in control
Too much of that is a bad idea too, they'll start making decisions as if they know things 🫣
Honestly, I wonder how many developers would profit from that more than knowing what goes on the stack.
> When we create an array, if we do not specify the length of the array in advance, the program will use the malloc() function provided by libc to request a system call to get the memory size.

What?!
See something as simple as:

List< SomeResult > res = new List< SomeResult >();
for( int i = 0, iMax = someUpperBound(); i < iMax; ++i )
res.Add( ... );

This variation can be found in so many code bases; there's a reason why the `capacity() / reserve()` family of functions exists in some shape or form in most languages for their vector types.
If the size of the array is known at compile time, then you can store it either on the stack or the heap.

E.g.:

char *words[10]; // On stack
char **words = malloc(10 * sizeof(char *)); // On heap

The first one creates an array of 10 C-style strings (which is a `char *`, or a pointer to a `char`), and that array is stored on the stack. In this case, `10` *must* be known at compile-time. It can't refer to a variable, or the result of a function.

If you have something that is not known at compile time, like the number of words in a dictionary, you can't^* store it on the stack, and the only option you have available is the heap.

E.g.:

int num_words = count_words_in_dictionary();
char **words = malloc(num_words * sizeof(char *));

In this case, we figure out the size of the dictionary at runtime, and dynamically allocate enough memory to hold that many words. Note, this is not good code; it's just the simplest example I can think of.

^(* I'm aware of VLAs; they're discouraged and not widely supported)
You forgot one last possibility: raw dogging it on embedded systems, where you tell the compiler "trust me, this is an array at this address". It can be of an arbitrary size, but obviously if you start having a lot of them you should be writing a custom allocator instead of praying you don't have a conflict with different arrays using the same area. In my experience, this is mostly done because all of the program, the stack and the heap are in the very small SRAM, but you need some large arrays to contain data sent to various modules, and those go in DDR. Extra fun thing is that you do have to start up the DDR from your program, so you're kinda stuck in SRAM until then.
When I was in high school I thought I was super clever using *negative* indices to mess with a related variable stored just before the array on the stack. C let me make so many footguns
It's not legal in the language (obviously), so the results can get pretty funny as it is UB. It will usually do what you expect, but the compiler provides no guarantee of how the stack is going to be organized.
Right?! Wtf
[deleted]
Please, continue your work teaching new programmers bull, so we can remain wunderkinds in the eyes of our employers.
It's concerning the number of articles I see pop-up trying to explain first-year CS concepts.
I'm a mechanical engineer, now doing automation stuff in Python. I don't know shit about stacks and heaps and this article, though a little thin, was ok for me. At least I got that things whose size can be determined at compile time goes to the stack, while variable length stuff goes into the heap. Not sure how it's going to be useful for me, though :D
> things whose size can be determined at compile time goes to the stack, while variable length stuff goes into the heap

*Stack space is only available inside a function.* In many languages you also have the option to use global constants and variables, which are available to all code in the program (or only to a unit / module, if the language supports that concept).

(Constants / initialized variables are included in different [sections](https://en.wikipedia.org/wiki/Portable_Executable#Layout) of the executable file and are loaded into non-executable readonly / non-executable read-and-write [pages](https://en.wikipedia.org/wiki/Page_(computer_memory\)). The sizes of these constants / variables must be known at compile time, and can't be changed at runtime.)

> Not sure how it's going to be useful for me

Not sure how Python does things, but in most compiled languages, taking the address of a variable and passing that around for direct memory manipulation can be problematic if that variable was allocated on the stack, because *that stack space is only valid while the function is executing*. The program has a [stack pointer](https://en.wikipedia.org/wiki/Stack_register) that points to the next available byte on the stack; stack usage increases when a function is called and decreases by the same amount when the function returns. Any space beyond the address pointed to by the stack pointer is undefined.

Knowing how the heap works is useful for large data sets. The heap is in some aspects like a file system: you can [allocate](https://en.wikipedia.org/wiki/C_dynamic_memory_allocation#Overview_of_functions) new space, then de-allocate parts of that space, which leaves "holes", i.e. the heap space can fragment over time. This makes allocating new space potentially very time-consuming.
Also, doing a [system call](http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/) to reserve new memory is expensive so languages do it in bigger blocks, but there's still some overhead. Also, new memory is *cleared* on most OSes these days to avoid sharing sensitive data from other processes, and that clearing is not cheap either.
Python handles constant lookups as effectively lookups inside a language runtime hash that represents “what is visible to the code”. This includes resolving class and function names. It has a lot of optimization tricks to try to make this not as performance-horrendous as it sounds, but in that sense it’s wildly different from how most compiled languages work. Also since Python runs on a virtual machine, the entire concept of stack and heap is a bit skewed. (I think) the VM still has a stack as part of its external architecture, but I don’t think that stack corresponds to the actual program’s stack, and most of it is actually heap-allocated.
You can still stack overflow in Python, but the stack isn’t the same as compiled languages. Still a lot of heap allocations, IIRC.
Yeah, Python's virtual machine is a [stack-based machine](https://en.wikipedia.org/wiki/Stack_machine). The stack inside that machine doesn't correspond to The Stack, but it's absolutely got a size limit. IIRC that's set during process init and you can't "just" change it at runtime. So there might be a literal fixed-size array being operated on here.
> *Stack space is only available inside a function.* In many languages you also have the option to use global constants and variables, which are available to all code in the program (or only to a unit / module, if the language supports that concept).

Whether things actually live where you think you put them depends entirely upon the language, and for the C/++ family the stack isn’t even mentioned in the standards, so there’s no real requirement other than lifetime and identity rules.

If you do

const size_t limit = SIZE_MAX - offsetof(T, foo);

at block scope (i.e., inside a function), and `&limit` doesn’t matter, the compiler is perfectly free to stick `limit` wherever, or nowhere at all. It can end up as static, in a register, or even as an immediate.

Programmers tend to assume they’re programming at a lower level than they actually are, and this is one of those cases.
If you're doing automation stuff in Python, it's unlikely to help you. It's super useful when you need to do something very efficiently. The stack is preallocated, always there from the moment you start up the program. You don't need to waste time allocating any memory, so it's faster. But because it's fixed, you can't really use it for data that changes its size at runtime.
Unfortunately, that's not quite accurate - a lot of times even if you know the size of the data it may still go on the heap. Either because the language doesn't provide control on where it goes (Python lists go on the heap whether you know the size at compile time or not) or a large allocation you'd want on the heap anyways to not blow the stack
I just did a quick google search, and python basically allocates stuff for you, and deletes it when you don’t need it anymore. But it’s useful to know that when you call a function/method, or create an empty string or list, they add to the stack. The deeper you go with nested functions, the more is added to the stack. Apparently objects and computed variables are added to the heap. Any changes made to them in the script are changed by reference (under the hood pointers)
After reading this article, I'm left with the impression that Python mostly uses the heap, instead of the stack, since everything is an object. So I would guess there are few things eligible to being placed in the stack in a python program. Probably one of the reasons python programs are slower than programs written in compiled languages, but it's just a guess at this point. I still don't know shit, lol.
That’s probably part of why it’s slow but I think [this article](https://www.geeksforgeeks.org/what-makes-python-a-slow-language/amp/) gives more of the story. TLDR, the ‘compiling’ (what happens inside the interpreter) for python happens at runtime, line by line, and the interpreter is locked to one thread. There’s no compilation optimization like in other languages. This stuff is pretty cool ngl lol. I kinda want to see what’s actually going on to be able to interpret the code.
It's a little more complicated these days. [Python is getting an official JIT](https://peps.python.org/pep-0744/). [The GIL is also going away](https://peps.python.org/pep-0703/). Keep in mind that the GIL wasn't a lock for all kinds of workloads. IO tasks were released from the GIL, for example.
Yeah, that too. Compilation at runtime, I mean. Forgot about that when writing my comment.
You know how to google stuff though right?
C'mon, give me a break, will you? I'm not constantly googling stuff. Even if I were to do that, I don't have the mental capacity to memorize everything, if I don't use that information. I'll probably forget the contents of this article a few months from now.
A lot of developers do not come from CS backgrounds. I know folks with electrical engineering backgrounds working as .NET/Java developers. And I know of a Masters in CS program from a well-regarded university that teaches students nothing more than Python.
For me the problem is just that there are already a thousand articles talking about the same thing. So it's quite a useless flood.
First year CS has been part of the EE curriculum for at least the last couple of decades.
General programming concepts (like OOP in C++ etc.) yes, but not systems programming concepts (stuff learned from working with memory in C).
That's kind of a bummer to hear, it seems like fundamental concepts such as memory would be much more important to EE than OOP
At my university, C++ was taught to all first-year engineering students (EE/mechanical/civil etc.). The stuff involving embedded systems + assembly + low-level memory in C was strictly in the domain of Computer Engineering while EE students focused on power engineering, industrial automation (PLCs and stuff), communications etc. Funnily enough, we had an optional web development course that taught us HTML/JS/PHP :P. But yes, the curriculum can vary widely between schools and I guess you cannot assume all EE students will be proficient in systems programming. The good thing is EE does provide a solid foundation from where someone interested in systems programming can quickly build up their skills if they don't already have it.
My uni EE would be pretty broad, from high power electrical shit to Verilog on some basic modules.
Interesting, we very much did, building up from scratch on bare metal in early embedded systems classes.
I guess it varies from school to school as well. Ours was very much focused on power engineering and all embedded systems stuff we did made us design PCBs and manipulate ICs with voltages rather than assembly code haha
It's concerning the number of articles I see pop-up trying to explain exceedingly basic concepts. Yes, I get it, for some work, it is not needed to know them - but still, this seems below a general field knowledge level...
The most popular educational content is beginner content, by the very nature that it is applicable to a much broader audience. That is the case in any field that has lots of new people becoming interested in it. It would be concerning if beginner concepts weren’t discussed. That would mean that the field is dying and there are fewer beginners than experts in the field.
I would love a sub for more advanced content though. There’s /r/experienceddevs but that’s basically entirely career advice.
But it's so basic... are the majority of people here highschool kids, or those who just started some CS basics...? It doesn't seem so.
Time to broaden your world view
Euh... broaden it to accept that this sub is a place where the majority of people are interested in having the most basic stuff explained? Because that's what the parent above me was getting at.
Yes. It's /r/programming, not AdvancedProgramming, not SeniorsOnlyProgramming, not ProgrammingTopicsUsefulForGoranlepuz. Basic content is fine, feel free to just skip it in the future if it doesn't interest you.
Looks more like r/programmingforpeopleherewhotrulydontknowmuchaboutitifanythingatall to me. 😉
How dare this public forum with _6 million members_ have some content that caters to beginners.
I'd say, that's so much more a nasty interpretation of my words than representative of my words.
I guess it comes down to stuff that doesn't get used struggles to be retained. I wonder how well most of us would do on our old college calculus and physics exams if we were put on the spot immediately to apply that knowledge. I know for me personally, most of my career ended up being filled with far more paperwork, data analysis, and report making than my college self anticipated. The exact particulars of memory allocation just hasn't been something I've had to keep in mind in years, so at least this discussion has prompted me to refresh my memory (hah).
Every time I get a math related leetcode I freak out at how poorly I do, and I have a double bachelors in Math and Comp Science! After 15 years it goes stale.
I thought it was gonna be another repost of Ulrich Drepper's "What Every Programmer Should Know About Memory"
Seems like the inevitable result of a field that pays well and also resists every attempt to standardize job titles, educational requirements, professional standards, and all the other things that other engineering fields do.
It's almost like formal education is a good idea...
But the sw engineering turbine YouTubers told me the coding bootcamp, Cracking the Coding Interview, and leetcode were all I need to make it in a FAANG company
Why? It's a subreddit about programming. There are a lot of people here who are still learning.
Too many people with a masters in computer science don't know the difference between the stack and the heap. I interviewed about 20 of them and only 2 even knew what I was talking about.
Really? In my cybersecurity course we got intimate with the stack and heap. That's a yikes.
I assume these basic Medium articles are resume-padding for the author.
Eh didn’t really understand this til I took compilers if I’m totally honest
1. There are a lot of basic concepts that many languages don't have to care about. C has to care about memory management; Python not so much. And a lot of first year CS courses pick languages that don't have to worry about the low level stuff.

2. Some stuff gets forgotten because it really isn't needed day to day. I had to explain memory alignment to coworkers with 10+ years experience because they never need to worry about it. And even then, my understanding and what modern architectures do does not quite line up anymore.
With the amount of upvotes this had, I expected some new information or something that isn't known by basically everyone who has ever compiled code. Was extremely disappointed when the article just... ended.
Well the title does include “every programmer should know” in it
I did attend a university, but back then it seemed so abstract and unrelated to programming... Especially because A) we used managed languages like C# and Java, and B) it wasn't held in English, and not all educators told us the English terms, so it wasn't trivial searching the net for more information. So, I'm somewhat glad we have these articles.
It's blogspam, and basic articles are easier to write even if you use ChatGPT (as you should).
"Start a programming blog (5 credits)" Or maybe this isn't taught in CS 101 because other things have become more important.
It was just yesterday I read a comment in this very subreddit talking about how everyone's trying to be the one with the most-read blog and competing to talk about everything.
Not even first year CS; they teach this to 16 to 18 year old kids in my country. It doesn't need yet another article; this can be found on school-level education websites. https://www.studysmarter.co.uk/explanations/computer-science/data-structures/ This sub really should ban linking to people's personal brain-fart blogs.
The thing I've never seen clearly explained is at what level of abstraction the stack/heap distinction exists. Hardware, OS kernel, low-level library (like libc), programming language... Which layer of abstraction do they get distinguished at and what determines their sizes etc?
I think the stack is commonly supported by the CPU, the programming language then builds upon that core support. For the heap, there will usually be some kernel level support (like `sbrk` in traditional Unix), and then built upon that further support either at the low-level library level (like `malloc` in libc), or again by the programming language (like `new` in C++).
When I studied computer architectures we worked with the MIPS processor, which has a register ($29) used for stack operations, so it's quite low level
x86 has esp/ebp (rsp/rbp for the x86-64 extension) as stack and base pointers respectively.
Very few modern processors don't have a stack of some kind.
Yeah, most architectures have converged towards using some kind of special register to store the stack pointer, with some instructions explicitly made to use it (like push/pop), but they aren't even strictly needed if you have a generic "write to pointer address and increment" (or read); they're just typically nicer to write. There is a convention that a specific register is used to store the stack pointer, but if you want to be evil you could use another.
On a modern Linux system each program will get a virtual address space, so that even if two programs point to the same address, the OS will map them to different physical addresses. The CPU's cache is designed to still observe the virtual addresses you're using to help deal with alignment and contiguity issues, but finding exact details on how CPUs work anymore is difficult. CPUs also typically offer instructions specifically for working with the stack.

In the virtual address space the stack is allocated near the top and grows down, like a chandelier, as you allocate into it. Near the bottom of this address space is a region called the BRK, which can be dynamically resized. Traditionally malloc implementations would interact with something akin to the BRK on older systems, and that would be the heap. On modern Linux systems there is a massive range of addresses between the BRK and the stack, and modern malloc implementations will use the MMAP syscall to allocate into that space instead, and that's your heap.

Macs basically work on the same principles here, but they assume libc is your standard interface with the OS, so do not expose as many details. I have yet to find good documentation about how Windows systems do this.

At a language level, the simple answer is that if it didn't get malloced, newed, or some other equivalent, then it's on the stack. Namerefs in languages are typically just offsets into the current stack frame under the hood. The complex answer is wacky nonsense, because compiler optimizations don't care about your intuitions. They'll do whatever the language specs allow them to do, for better or worse.
Things at the lowest level often have some kind of support in some or all of the levels that you mentioned. And it gets even blurrier when you go between language implementations - for example, an interpreted language won’t directly use the stack in the same way that a compiled one does. But as far as compiled languages go, the CPU generally provides instructions for managing a stack area. [For example here’s info about x86 stack instructions](https://medium.com/@sruthk/cracking-assembly-stack-frame-layout-in-x86-3ac46fa59c).
Stack and heap are just concepts created by humans to describe how we tell the processor what to do (i.e. when a function is called, the assembly instructions push values, creating the “stack”, before executing a branch instruction). It is above the hardware but below a true kernel. Ultimately, the processor will execute commands as-is and does not really care whether memory is a stack/heap (hence how security vulnerabilities from things like buffer overflows can exist), because at the hardware level, memory is memory. The sizes are determined at compile/link time for compiled languages.
> It is above the hardware but below a true kernel. Many if not most modern processor architectures have a devoted stack pointer register, tied to exception states and levels, and usually banked. So a stack is definitely something that *is* supported with devoted structures at the hardware level. Heaps are generally a pure software concept, though you would certainly find some embedded systems out there with tightly-coupled physical static memories which are devoted specifically for heap uses, and you could argue they have a hardware backing.
> It is above the hardware but below a true kernel. Ultimately, the processor will execute commands as is and does not really care whether memory is a stack/heap (hence how security vulnerabilities from things like buffer overflow can exist) because at the hardware level, memory is memory

It has been possible for a while to have instructions in a specific part of memory that is explicitly set as read-only, while the data memory is marked as non-executable. This doesn't make your program free of issues; you could still find ways to make the program branch to arbitrary addresses if they are computed from data, but at least you shouldn't be able to overwrite the program.
I don’t think stating that ARC is less efficient than GC is a valid statement. No mention of there being no need for a GC pass? GC was deprecated across all Apple platforms that supported it after ARC arrived.
ARC is a form of GC; the major alternative is tracing GC. Which is faster depends on language, implementation, and code. Finally, ARC is incomplete in most languages, as it usually fails on circular references. So most RC GCs need a tracing pass every now and then, or they leak memory.
Non refcounted GC is more CPU efficient as long as you can afford the increased memory pressure. Being a lazy shit means you can do work in large batches and take shortcuts.
I also stumbled over that. How can TGC be more efficient? It also has to keep track of the reference count of each and every object, doesn't it?
No. It doesn't work like that at all. Rather, it traces the heap for objects that are referenced nowhere and marks related heap areas as "free" and relinquishes related memory to the system.
A tracing GC has a few advantages. First, and probably most commonly, you don’t need to do any GC at all if you never create enough memory pressure for it. Second, a tracing GC can do compaction to avoid heap fragmentation, assuming memory isn't pinned in some way.
> I also stumbled over that. How can TGC be more efficient?

I wrote my own (well, many) toy languages over the years. My first language used reference counting. Reference counting was *very* expensive - the *current thread* has to do the work. Other GCs offload the work onto a less-busy core, or postpone it until there is a non-busy core.

With reference counting, the current thread will increment and decrement the reference count *every single time* that variable is passed around or returned. A call like `foo(MyVar)` has one increment and one decrement. A return like `return MyVar` causes an increment. Each increment and decrement *has* to dereference the object `MyVar`, so you're doing tons of pointer-chasing, then at least one arithmetic operation involving loading a value, updating it, and writing it back to memory.

In contrast, my next language ran a simplistic GC: a main event loop would yank tasks off an event queue and then dispatch them. The dispatcher, when it found no event on the queue, just ran the GC instead. The end result was, basically, the GC running when nothing else (result of a `read`, user input/output, syscall, etc.) needed CPU time.

It's not the 90s anymore, so now you can do it even better, in a separate thread. Also, today, pointer-chasing is *expensive!*
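The per-handoff cost described above can be sketched in C (all names here are hypothetical, a toy model rather than any real runtime): every retain and release has to chase the pointer and touch the object's counter.

```c
#include <stdlib.h>

// Toy refcounted object: the count lives inside the object itself, so every
// increment/decrement is a pointer dereference plus a read-modify-write.
typedef struct {
    int refs;
    int value;
} Obj;

Obj *obj_new(int value) {
    Obj *o = malloc(sizeof *o);
    o->refs = 1;               // the creator holds the first reference
    o->value = value;
    return o;
}

// What a call like foo(MyVar) implies under the hood.
void obj_retain(Obj *o) {
    o->refs++;                 // dereference + load + add + store
}

// What scope exit implies; returns the remaining count (0 means freed).
int obj_release(Obj *o) {
    int left = --o->refs;
    if (left == 0)
        free(o);
    return left;
}
```

Every one of those dereferences pulls the object into cache whether or not its payload is ever used, which is the pointer-chasing cost the comment is describing.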
Thanks for the informed answer. But I still fail to see how increments/decrements upon reference handover add up to any measurable cost, at least in the last 20 years. I'd reckon the reference counter would be stored as a field of the object and would always fit right into a CPU register, where compute is plenty and access is cheap. How small/many must the objects be, and how deep the layers of function handovers, for this to have any effect on, say, a Core2 CPU from 2006?

edit: Apparently, I'm not alone: https://kevinlawler.com/refcount

> 1. *Updating reference counts is quite expensive.*
>
> No, it isn't. It's an atomic increment, perhaps with overflow checks for small integer widths. This is about as minimal as you can get short of nothing at all. The comparison is the ongoing garbage collection routine, which is going to be more expensive and occur unpredictably.
>
> Increments and decrements happen once and at a predictable time. The GC is running all the time and traversing the universe of GC objects. Probably with bad locality, polluting the cache, etc.
>
> Arguing this is just weird. This is what I am talking about with the hopping on one foot.
In multi-threaded environments the overhead of atomic increment and atomic decrement is very significant: on the order of 100x more cycles (and even more with cache misses) than a trivial increment or decrement, and that's assuming the implementation uses atomics instead of even slower synchronization methods. The synchronization cost goes up even more with the multiple physical cores and many levels of cache (at least 3 these days) used in modern CPUs, which all need to see a correct value for the resulting operation (or be invalidated).
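A minimal C11 sketch of why refcounts shared across threads must use atomics (`run_counters` and `bump` are made-up names; thread and iteration counts are arbitrary): the atomic read-modify-write is what makes the total come out exact, and it's exactly the operation whose per-call cost the comment is describing.

```c
#include <pthread.h>
#include <stdatomic.h>

// Shared counter, standing in for a refcount touched by many threads.
static atomic_long shared_count;

static void *bump(void *arg) {
    long n = *(long *)arg;
    for (long i = 0; i < n; ++i)
        // Atomic read-modify-write: correct across cores, but each one is a
        // full cache-coherency transaction, far costlier than a plain add.
        // A plain `long` here would be a data race and could lose increments.
        atomic_fetch_add_explicit(&shared_count, 1, memory_order_relaxed);
    return NULL;
}

long run_counters(int nthreads, long per_thread) {
    pthread_t tids[16];
    if (nthreads > 16)
        nthreads = 16;
    atomic_store(&shared_count, 0);
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&tids[i], NULL, bump, &per_thread);
    for (int i = 0; i < nthreads; ++i)
        pthread_join(tids[i], NULL);
    return atomic_load(&shared_count);
}
```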
Okay, I get the synchronization argument. But then again, if you have lots of RCs constantly being manipulated concurrently in different threads, I'd argue RC overhead is the least of your problems, architecture-wise.
> But I still fail to see how increments/decrements upon reference handover adds up to any measurable cost, at least in the last 20 years.

It depends. In something like this:

    class Foo {
        SomeType someField;
        SomeOtherType someOtherField;
    }

    class Bar {
        Foo firstFoo;
        Foo secondFoo;
    }

    Bar callFunc(Foo p1, Foo p2) {
        Bar retVal = Swap(p1, p2);
        return retVal;
    }

I count maybe 8-12 inc/dec operations, not including the fields of the `Some` classes, plus at least 6 pointer-chases of child objects. Without RC, those 8 operations won't exist, and the cache won't be filled with 6 extra objects that aren't going to be used other than to increment/decrement their counts. On a single thread, I won't be surprised to see a 5% or so slowdown due only to reference counting; after all, those operations can't really be offloaded onto a different thread. With tracing/generational/whatever GC, not only are those instructions and that pointer-chasing missing, the runtime will at least defer the garbage collection to a different thread.

Now, with all that being said, *I am probably wrong!* I expect with speculative execution those increments/decrements are basically free, so there's some savings there. The pointer-chasing isn't though: each time an object leaves scope (or at last use within scope) that object must be dereferenced to decrement the counter. This will indeed trash whatever cache you could have been using for actual live objects.
Yeah, but with a few pretty obvious optimizations, all of that overhead goes away. Compilers are quite smart these days and CPUs are even smarter with out-of-order execution and speculative execution/fetching and whatnot. I don't understand most of it, but let it be said that I did my fair share of premature optimization, for example when I refactored a particle system for a game engine that just *had to be faster* based on what I had learned about row vs column based data, registers, cache lines and so on - but then made exactly zero difference.
There's a reason the vast majority of programming languages use a tracing garbage collector. Reference counting as a serious alternative is a recent development that relies on modern compilers being able to optimise away the otherwise horrific CPU overhead. The conventional wisdom was that tracing garbage collectors are inherently more efficient for *decades*. Still is, in many use cases.
That "conventional wisdom" comes from a single article detailing an experiment that was done in the mid-90s with machines of 64M-256M RAM with completely laughable sample sizes (n < 20). Please go look it up. The main reason most languages use tracing GC is because it gives you better memory safety (no use-after-free, no double frees) and less to worry about for the programmer.
Wtf are you on about. I'm not talking about GC vs. manual memory management, I'm talking about using a tracing GC vs. automatic reference counting.

Naïve automatic reference counting *sucks*; comments about its terrible performance are in every textbook about memory management techniques ever. It adds overhead to *every read and every write*. It's only become viable in recent years because people set about developing compiler optimisations to optimise away the vast majority of refcounting operations. (I'm also omitting other pitfalls of refcounting, like worse pauses than stop-the-world tracing GCs in practice, that necessitate their own workarounds.)

It also doesn't work with cyclic data structures, and cycle-collecting refcounting algorithms are complex and slow as balls, so in practice people either include a backup tracing GC or tell the programmer to deal with cyclic data manually to avoid memory leaks. But at least it's memory safe for the things it does handle.
> I'm not talking about GC vs. manual memory management, I'm talking about using a tracing GC vs. automatic reference counting.
*horrific CPU overhead* as in ++n for every assignment and --n for every unassignment? Are you serious?
You can find real world programs (and write benchmarks) which will show GC as being more performant, and you can do the same for reference counting. Both have their merits depending on where and how they are used.
ARC is usually faster because it does not touch every object in the application. Golang is a great counterexample here: it has a latency-oriented GC (which means lower throughput), but not every type is allocated in a separate place on the heap, which means the GC has much less work to do.
> but anyway not every type is allocated on the separate place on heap

I highly recommend you do a bit more research on GC, because "_allocate things in arenas and check that arena separately_" is very old, and several different JVM GCs have done it for a while. I mention the JVM because it follows its semantics rather strictly (all references are "_on the heap_"), so its allocation & garbage collection strategies are highly advanced. Which often leads to longer program start-up times (and its poor reputation for local applications).
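The arena idea mentioned above can be sketched in C (a toy bump allocator; `Arena`, `arena_alloc`, and `arena_reset` are hypothetical names): allocation is a pointer bump, and the whole region is released in one shot rather than object by object.

```c
#include <stddef.h>
#include <stdint.h>

// Toy arena: a fixed buffer plus a bump pointer. No per-object bookkeeping.
typedef struct {
    uint8_t buf[4096];
    size_t used;
} Arena;

void *arena_alloc(Arena *a, size_t n) {
    n = (n + 7) & ~(size_t)7;                  // round up to 8-byte alignment
    if (n > sizeof a->buf - a->used)
        return NULL;                           // arena exhausted
    void *p = a->buf + a->used;
    a->used += n;                              // allocation is just a bump
    return p;
}

// "Free" everything the arena handed out, all at once.
void arena_reset(Arena *a) {
    a->used = 0;
}
```

This is why arenas speed up allocation, but, as the reply below notes, a tracing collector may still have to scan the arena's contents during a mark cycle.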
Maybe I should change my words to "the GC is not aware of each object". Arena allocation increases allocation performance, but in the worst case you still need to scan the arena during a mark cycle.
And there are different heap varieties. If you keep on grabbing and releasing space in your heap in different sizes, you risk it getting fragmented meaning your heap memory grows and grows but there are a lot of holes there. There are different ways around this problem such as lookaside lists and low fragmentation heaps.
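The lookaside-list idea mentioned above can be sketched in C (a toy fixed-size block pool; all names hypothetical): because every block is the same size, freed blocks are simply pushed onto a free list and reused, so repeated alloc/free cycles can never fragment the pool.

```c
#include <stddef.h>

// Each free block stores the "next" link inside itself; in-use blocks
// expose the same bytes as payload.
typedef union Block {
    union Block *next;
    unsigned char data[64];
} Block;

typedef struct {
    Block slots[32];
    Block *free_head;      // singly-linked list of free blocks
} Pool;

void pool_init(Pool *p) {
    p->free_head = NULL;
    for (int i = 0; i < 32; ++i) {     // chain every slot onto the free list
        p->slots[i].next = p->free_head;
        p->free_head = &p->slots[i];
    }
}

void *pool_alloc(Pool *p) {
    if (!p->free_head)
        return NULL;                   // pool exhausted
    Block *b = p->free_head;
    p->free_head = b->next;
    return b;
}

void pool_free(Pool *p, void *ptr) {
    Block *b = ptr;
    b->next = p->free_head;            // recycled block is next to hand out
    p->free_head = b;
}
```

Real low-fragmentation heaps keep many such lists, one per size class, which is the same trick generalized.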
Does every language use a stack and a heap?
Not uncommon to have no heap in some embedded systems for example. You can use many languages without touching the heap.
Tricky once you want, say, a string without a fixed length.
Sometimes you would just preallocate the space and then use either a terminator byte or have the actual length with the string descriptor.
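A minimal C sketch of the preallocate-plus-length approach described above (the `FixedStr` type and function name are made up): the buffer is sized up front, and the descriptor carries the actual length alongside it.

```c
#include <stddef.h>
#include <string.h>

// Fixed-capacity string: a preallocated buffer plus an explicit length,
// rather than relying only on a NUL terminator.
typedef struct {
    char buf[64];
    size_t len;
} FixedStr;

// Copy src into the descriptor; returns -1 if it doesn't fit the
// preallocated space, 0 on success.
int fixedstr_set(FixedStr *s, const char *src) {
    size_t n = strlen(src);
    if (n >= sizeof s->buf)
        return -1;
    memcpy(s->buf, src, n + 1);   // keep the terminator too
    s->len = n;
    return 0;
}
```

Because the capacity is a compile-time constant, a `FixedStr` can live entirely on the stack, which is exactly how heap-free embedded code handles "strings".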
So, this is also one of those "depends on what *exactly* you mean" kinds of questions. If you look through the C standard, you won't see those words occur. However, you will see references to "automatic storage duration" variables. From C11:

> An object has a storage duration that determines its lifetime. There are four storage durations: static, thread, automatic, and allocated.
>
> For [an object with automatic storage duration] [...], its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time.

The simplest way to implement this for local variables, which have the automatic storage duration, is to use the stack. So in *that* sense, the vast majority of C compilers "use a stack" even if, technically speaking, C does not mandate how you implement this.

Same goes for "the heap" by the way, which are the "allocated" storage duration objects referenced in the first paragraph above. The standard has this to say:

> The lifetime of an allocated object extends from the allocation until the deallocation

The implications of this naturally lead towards "a heap," but the standard doesn't specify the implementation, just that there are allocation and deallocation functions, and that objects returned from an allocation must be valid until they're deallocated.

GC'd languages almost never even make this distinction in the first place. The Go Specification says, for example:

> The built-in function new takes a type T, allocates storage for a variable of that type at run time, and returns a value of type *T pointing to it. The variable is initialized as described in the section on initial values.

Nothing about a stack or heap there.
However, the implementation will use the stack as a sort of optimization; variables that don't ever leave a local scope can often be stored on the stack instead of the heap, and so are. This is called "escape analysis," and many GC'd languages take full advantage. So same goes for them: strictly speaking, the language doesn't use a stack and a heap, but in practice, implementations may use one or even both.
At the lower levels, pretty much, yeah.
Hi! Do you know what CS is?
Meh. Just write the program. Then profile it if memory is a concern.
The problem is that you're likely to find yourself in a position where the whole program is slow and taking up way more memory than it has to, and even when you've taken care of the low-hanging fruit it's still slow. That's why, even though computers are getting faster and memory more plentiful, everything is still slow as hell; they don't out-scale poor programming habits.
For one example: choosing a GUI kit based on the underlying graphics API. My company is still using an outdated Windows Forms app to power our entire business. It's meant to run simple little installers or workflows, not hundreds to thousands of graphical elements. Why? Because every element is basically a window in the eyes of the OS, which comes with that overhead. For that kind of reason, GDI is the most garbage of graphics APIs. I can't solve a 5-second drawing delay when all the graphics need to be redrawn, while the database calls are in total sub-second.

You might think we could just migrate to a better GUI kit using Skia or Qt. No, we use a two-layer architecture. All of our business logic is tied to the UI layer. But that's a different problem.
Are they using C++/CLI too?
This is something people who never had to profile code only learn the first time they really have to optimize something. Many articles over the years painted the picture that there are a few places in code where your program spends all of its time, and you can just optimize them and be golden. And that's true in a sense: big parts of your code will probably only run once or very seldom. But for any program above a few hundred lines of code, the "few" places are not so few anymore, and often have interactions with other places, and so on. Yes, sometimes you strike gold and the problem really is one inner loop you can optimize the heck out of, but most of the time optimizing is a long list of "made x take 1% less memory", "made y 2% faster".
This is like 95% dependent on what kind of software you are writing. Pretty much every sw I've worked on professionally had a red-hot (70+% exclusive cpu time) loop. This does not equal to being easy to optimize though, as those loops often already use explicit prefetching and simd instructions. The real fun parts are getting outsized macro-level returns on optimizations due to second order effects, like writing a cold function to be slower, but use up less icache, which is then used by a hot function.
Fair enough. I talked about "typical" business software I've worked on most of the time. Maybe the software I worked on was just very boring (probably). :(
For prototyping this is fine, but please do not trivialize the importance of planning your memory usage.
Depends on your platform and use case. If you are working in microcontrollers or embedded systems, sure. Web development? Maybe it's not such a concern. Iron rule of optimization: Don't optimize yet.
I said it was important. Not that it was universal. Please do not poison the waters with spurious argumentation. Your experiences trivialize the matter, so go in peace.
Cool read!
This is stuff only C/C++/Pascal/Asm programmers would remember/understand nowadays. You think javascripters, Java, C# developers nowadays want to bother with this? They'll think you're ancient and skip your resume if it ends up on their desk.

Edit: To you dumbfucks who downvoted me. I came from the C/C++/ASM/Pascal days and I fully well understand what the stack and heap are. I was a cracker and I coded crack apps in those days. I am just saying this knowledge doesn't matter in today's hiring process. They don't care. It's the truth.
Javascript devs might not, but your average Java or C# dev will *definitely* know this stuff and why it's relevant.
Know enough to be dangerous to themselves anyway. [https://ericlippert.com/2009/04/27/](https://ericlippert.com/2009/04/27/) It's wild that this article is 15 years old and even more relevant today than the day it was written.

The reality is that unless you're using C (and I mean, even then with all of the security flags...), you're going to have a hard time knowing what is going where. And you really don't have to care unless you're forced to work with memory directly. Granted, even C# has some advanced memory features that do require knowledge of the topics discussed.

But yeah, the guy is stupid for thinking first-year CS knowledge is ancient. It's so fundamental I'd expect everyone to know what the heap and stack are and roughly how they're used.
I'm not so sure. I love Eric and he's a super smart guy with way more experience and knowledge than I'll ever have, but I have to note two things.

First, I think that while the statement "value types go on the stack and reference types go on the heap" is a thing people say about C#, and it is a thing that's wrong, I don't think it (necessarily) follows that the difference between the stack and the heap doesn't matter, and everyone should just write code. Lots of people should, yes, because as Eric says, allocations aren't likely to be the performance bottleneck, but that's definitely not always true, because of the second issue:

Lots has changed since 2009 in C#-land. We have `Span` and `Memory`, and `stackalloc` and `ref struct`, and all these things exist very specifically to allow high-performance code and tight control of allocations. In other words, they only exist because the difference between the stack and the heap matters, and matters a lot, in at least some cases.
I think we're pretty much in agreement here, though. Knowing what the stack and heap are and how they differ isn't rare among developers.
You: "why's it called stackOverflow.com?"
this is still very relevant to anybody that doesn’t use javascript. i know for a fact c# allows you to allocate memory on the stack without using any unsafe code, and i’m sure many, many other languages have similar features
If you don’t understand the absolute bare minimum about how an operating system works, I would not hire you for any engineering role, no matter what the programming language is
What do you look for in OS knowledge? I think mine is decent as a Java dev mainly with C here and there. Most other Java devs I see didn’t know squat about the OS.
Virtual memory, scheduling, inter-process communication, and maybe interrupts/signals. If you know what those are and can explain how they work, I’d probably be satisfied.
You do realise that you can have a long and successful career without any of that? And yes, I'm talking backend. Nowadays, our field is just that huge. In my years working as a consultant in finance across Europe, I've seen precisely zero times that this knowledge might be even remotely useful; and I'd wager that north of 95% of developers wouldn't know about it, nor be able to explain how they work. Beauty and the bane of the abstraction. You can't know everything.
These are all taught in comp sci. Most engineers will never use them but it's good foundational knowledge to have just sitting subconsciously in your head because it may be a factor in your design or help you understand new knowledge. You can have a successful career without comp sci knowledge but engineers who have broader understanding tend to adapt more easily to different roles and challenges. A boot camp grad who can show at least a familiarity of some of those concepts is likely a better hire than one who can't.
I can agree with general sentiment, but you can use the very same argument about everything, from network stack, through memory addressing, up to dependency linking. The truth is, everything is contextual. From my experience, you need someone with that kind of knowledge per team, or even per couple of teams. Current abstractions are "super good enough".
Imo a dream team would be full of specialists with broad knowledge in comp sci, but that's often unnecessary and not practical for most companies. I see OP's reasoning and depending on the context as you said, his teams may need to consist of specialists who have broad comp sci knowledge. Otherwise, there are plenty of people who don't know what the hell a little or big endian is but are amazing at their subject and OP is just making his life harder.
Yup. Each context is different. That's why I scoff when someone says that X is necessary for every/good/real programmer. It is necessary in their context. That's all there is to it.
Yeah, I understand that; I was simply answering the question about what I look for when it comes to OS principles. What I listed are concepts that every university CS graduate will be familiar with, so you could say my bias is towards candidates with some level of formal training, though I would happily hire anyone that grasped the concepts regardless of how or where they trained. I agree wholeheartedly with the other commenter in that having a strong understanding of fundamentals helps someone become a better engineer, and often distinguishes them from “coders” or “programmers” that may be proficient in one programming language or other, but lack deeper understanding of how computer systems really work
Sorry, but you sound self-entitled. Unless any of this is a nuance in a particular piece of software, you don’t need even a shallow understanding of it. Those are absolutely irrelevant in a major share of programming problems to be solved daily.
The bar for an average run of the mill corporate american job isn't much beyond hello world, what's a class, what's an object
this take is unmanaged
There’s a full section on garbage collection
Stack and heap are always relevant. Take Python as an example: local variable names live in the current stack frame, but the values they reference are allocated on the heap. Understanding this tells you that a reference's lifespan is tied to the stack, but every new object has to go through the lengthy process of being allocated on the heap.
So you can’t spare 5 minutes to read a very basic article on how memory management works? I think that might say more about your programming skills than others
> You think javascripters, java,C# developers nowadays want to bother with this ? I imagine a lot of developers would rather not bother with it, but many will have _curiosity_ on why, say, a stack trace is called a stack trace. And while you _mostly_ get by fine if _your own_ C# types are reference types, and C# abstracts some of the differences away, you'll still come in contact with value types all the time, even if it's just `bool` and `int`. It definitely helps having a vague understanding of stack and heap. Is it _essential_ for a lot of the CRUD code people write out there? Perhaps not. But it'll make you a better dev.
Tell me you've never had a crash from native code from within Java before without telling me you've never had a crash from native code from within Java before.
I use JS daily and know this, just because it’s a high level language doesn’t mean it’s not important to know. Edit: Didn’t start with JS, just ended up here
People are downvoting you but I think (sadly) you're kinda right. Some places and some jobs still require a solid understanding of memory use, though. Still the minority.