Learning C – Part 3: Stacks, Heaps, and Pointers

Stacks, Heaps, and Pointers are some of the most difficult topics to master in the C programming language. Many students, and even long-time professionals, struggle with this content, so I advise anyone studying it to take great care to master these topics as well as they can. With that out of the way, today we’re going to cover “Memory”. Memory is the backbone that lets modern applications work, from the lowest machine level up to the programmer typing at their computer. We will cover three major memory topics, Stacks, Heaps, and Pointers, as well as how they are used in actual C programs.

What is Memory?

We’ve said how important memory is, but we still haven’t defined it. Starting at the hardware, memory mostly (but not entirely) consists of RAM, or Random Access Memory. These sticks of circuitry let the computer hold data while it is powered on. RAM is volatile: if the machine is turned off, its contents are lost. Because RAM is fast to read and write, it is highly useful to programmers, who can use it to store numbers, text, addresses, and much more while a program runs. That data can be accessed at any point during the program’s execution, as long as it hasn’t been freed or overwritten.

Memory is obtained through a process called “Allocation”, where a program asks for memory of a certain size and receives an address (location) of the new memory. In C this request is usually made with malloc, and the C library in turn gets its memory from the Operating System. When the programmer is done with the memory and wants to give it back, the memory is “Freed”. Using memory after it has been freed can lead to a number of consequences that we will explain in a future “Attacks Explained – Use-After-Free (UAF)” article. The address the program receives is not the location of the data on the physical RAM stick. There are two kinds of addresses: a Physical Address is the location on the RAM stick itself that contains the data, while a Virtual Address is the location of the data as the program sees it. The processor’s memory management unit (MMU), using tables maintained by the Operating System, translates each Virtual Address into a Physical Address.

The split between Physical and Virtual Addresses may seem strange, but there is a good reason for it! Every program that starts on a computer is given a “Virtual Address Space” by the Operating System. This makes the application think it has the entire memory space (and more) all to itself. It also lets the program see data laid out contiguously, next to each other, even if the pieces sit at completely different physical locations on a RAM stick. Finally, it prevents one application from directly altering another application’s memory, maliciously or by accident (there are ways around this, but they are outside the scope of this article). All allocations in this article happen inside the program’s Virtual Address Space.

What is the Stack?

An important concept to remember is that all memory is temporary, but some memory is more temporary than others. The call stack (usually just called “the stack”) is a good example: it is designed to be temporary, and its data is overwritten once it is no longer needed. When a function is called, as we did in the last part, the values we pass in have to go somewhere, or else the new function wouldn’t be able to use them!

This is where the stack comes in. In the classic 32-bit x86 calling convention, the arguments are “pushed” onto the stack starting with the last argument and ending with the first. (On 64-bit systems, the first several arguments are usually passed in CPU registers instead, and only the rest go on the stack.) The stack, just as its name implies, can be thought of as a stack of data. A programmer “pushes” something onto the stack just as someone puts a dinner plate on top of another, and “pops” something off the stack just as someone takes the top plate off a stack of plates. Along with the arguments, the return address (where execution should continue once the function finishes) is also pushed onto the stack, so the program knows where to go after the function returns.

Every time data is pushed, the “stack pointer” is decreased by the size of one machine word (4 bytes for a 32-bit application, 8 bytes for a 64-bit application) and the data is written into this new slot, so the stack grows toward lower addresses. Whatever was in that slot before is leftover “garbage” and is simply overwritten. After the return address is pushed, the called function typically pushes the caller’s “base pointer” (also called the frame pointer) and then sets the base pointer to the current stack pointer, creating a new base. This process is called “creating the stack frame”. The base marks the start of this frame’s data, and the function’s local variables and arguments are found at fixed offsets from it.

Much of what we have just discussed is background information, but it is always useful to understand how the stack works under the hood. What we primarily want to discuss is how the stack handles local variables. When you declare a local variable inside a function in C (rather than allocating memory with something like malloc or calloc), that variable lives on the stack. Global and static variables are stored elsewhere. In the picture below, our function has a variable z that holds the result of adding x and y. Its value can be returned to the caller when the function exits, but while the function runs, z is stored on the stack as a temporary variable.

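Because the variable is temporary, once the function returns its slot on the stack is considered free, and the next function call may overwrite it at any time. Unless the value was copied somewhere else (for example, returned to the caller), it is effectively gone, and holding on to a pointer to it is a bug. This is a feature of the stack, not a bug or a disadvantage. It also means that any data that needs to stay alive across many function calls is best put onto the heap instead.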

What is the Heap?

While the stack works in much the same way across Operating Systems, the heap is less standardized: each Operating System and C library manages it differently. For this reason we will not dive into the mechanics of the heap as deeply as we did for the stack. The primary idea is that the heap is a “heap” of data. Just as with a heap of anything, adding more to it means piling more things on top. For this reason, in the traditional memory layout the heap is said to “grow up” (toward higher addresses) while the stack “grows down”. The purpose of the heap is to store data more permanently than the stack can. Data allocated on the heap is not freed automatically, and forgetting to free it is the cause of most memory leaks. Unlike stack data, heap data must be explicitly freed after it is allocated; in C, this is done with the free() function.

What is a Pointer?

To tie everything together, we can now cover “pointers”. A pointer is a fairly simple concept at its core: it is a value that “points” to a place in memory. A pointer is just a number, an address, and it is as wide as an address on the system (typically 4 bytes on 32-bit systems and 8 bytes on 64-bit systems). Because these numbers are so large, they are usually written in hexadecimal, as shown in the picture below. This makes it easy to see the lowest (0x00000000) and the largest (0xFFFFFFFF) 4-byte numbers. Note that the pointer values shown below are arbitrary and do not mean that a process would use those exact addresses for those sections of memory. As we mentioned earlier, the pointer values a process works with are Virtual Addresses. It is possible to obtain a Physical Address, but that is outside the scope of this lesson.

As we can see from the picture above, the various pointers point to different locations in memory. Some point to the areas that make up the program itself (the instructions, or “code”, section), some point to the stack, and some point to the heap.

Using the stack and the heap

Now we can go over a good example demonstrating stack and heap in C.

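#include <stdio.h>
#include <stdlib.h> /* malloc, free */

int main(void)
{
    /* stack_array1 itself is on the stack; the literal it points to is in read-only data. */
    const char* stack_array1 = "Hello World";
    /* A true array: all four ints are stored directly on the stack. */
    int number_array[4];
    number_array[0] = 1;
    number_array[1] = 2;
    number_array[2] = 3;
    number_array[3] = 4;
    /* heap_array is on the stack; the 128-byte block it points to is on the heap. */
    char* heap_array = malloc(128);
    if (heap_array == NULL)
        return 1; /* allocation failed */
    snprintf(heap_array, 128, "Hello World");
    printf("%s %d %s\n", stack_array1, number_array[3], heap_array);
    free(heap_array); /* heap memory must be released explicitly */
    return 0;
}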

In the given example, number_array and stack_array1 are both on the stack. The fixed string “Hello World” is baked into the program itself (typically in a read-only data section), and stack_array1, a pointer variable on the stack, holds its address once main starts running. The four numbers of number_array are stored directly on the stack as 4, 3, 2, 1 from highest address to lowest address, so number_array[0] sits at the lowest address. The call to malloc then reserves a 128-byte area on the heap, and the text “Hello World” is written into it. Once main returns, stack_array1 and number_array go away along with main’s stack frame. The heap block that heap_array points to, however, stays allocated until it is passed to free() or the process exits.

Finally, on the topic of pointers: stack_array1 and heap_array are pointers, and number_array, while technically an array, is converted (“decays”) to a pointer to its first element in most expressions. The pointer variables stack_array1 and heap_array themselves live on the stack, but they point elsewhere: stack_array1 points to the string baked into the program, and heap_array points to a location on the heap. number_array keeps its four ints directly on the stack. A natural question now is “But heap_array and stack_array1 are both strings, not pointers.” Both views are reasonable. In C, a string is just an array of char(acter)s ending with a '\0' (null) character, an array is consecutive data, and in practice code reaches both through a pointer to their first element. The major difference is in how the data is treated, not in how it is reached.

Wrapping Up

Now we’ve gone over the basics of the Stack, the Heap, and Pointers in the C programming language. These concepts are difficult to master, but they are among the most useful in the language. Understanding them is key to writing good C code, which is why most of this article was spent explaining concepts rather than code. I hope this proved useful to those reading, and I’ll see you next time.
