In student projects I often see malloc/new littered throughout the C/C++ code. Any time memory needs to be allocated for a non-native type or an array, the student allocates it from the heap. These are just course projects, but I feel that it shows a lack of understanding of memory management in C/C++. Especially for students who learn these as their first languages, I’m certain that the details get cast aside in favor of whatever works at the time. Further, since there is nothing semantically wrong with using malloc at every possible opportunity, this behavior is never corrected by faculty. In this post I try to explain what malloc is, what it does, and what good memory management looks like with and without it.

Memory Allocator

malloc is short for memory allocator. It is a function that is typically included as part of the standard C library on most systems. Specifically, malloc allocates memory in the process heap. There are other memory allocators for different use cases and regions of memory.

The role of a memory allocator is to allocate dynamic memory for a process. In this context, dynamic means during runtime. This is arguably the most important point about malloc - it should be used for quantities of memory whose size we cannot determine before runtime. The second key point is that it should only be used for memory that actually needs to live on the heap. Let’s break down these two points.

The Stack at Compile Time

The stack is the ideal place to perform most memory transactions. The process stack is memory that is allocated to the process at process creation time. This means that we can already use it without having to allocate memory. Further, stack management is handled for us by the compiler. At compile time, the compiler associates some information with a function about that function’s “stack frame.” The stack frame is primarily the amount of memory needed for one invocation of that function. The compiler generates code for that function to manage its stack frame. As long as the compiler can infer the amount of memory required by a stack frame, it can do all of this basically for free.

#define ARRAY_SIZE 400

void foo(void)
{
    int array[ARRAY_SIZE]; // 400 ints; size known at compile time
    //do stuff
}

With just those lines of code, we have placed 400 integers on the stack with no concern over memory management - the space is already allocated and the compiler handles the setup and cleanup. To clarify, setup and cleanup mostly involve adjusting the stack pointer, ensuring that if we call functions from within this function, we do not overwrite data on the stack that we still need.
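
To make the automatic cleanup concrete, here is a small illustrative sketch (my own example, not from the original post’s code): the array only exists while foo’s stack frame does, so handing its address back to the caller leaves a dangling pointer, while copying a value out is perfectly fine.

#define ARRAY_SIZE 400

int* bad_foo(void)
{
    int array[ARRAY_SIZE];
    array[0] = 42;
    return array;     // BAD: this frame is reclaimed on return, so the
                      // caller receives a dangling pointer
}

int good_foo(void)
{
    int array[ARRAY_SIZE];
    array[0] = 42;
    return array[0];  // fine: the value is copied out before the frame dies
}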

The Stack at Runtime

So, what if the compiler cannot infer the size of the stack frame at compile time? We need dynamic memory now - do we use the heap?

Nope. Since C99, C has supported variable-length arrays. See this code example:

#include <stdlib.h>
#include <time.h>
void foo(size_t length)
{
    int array[length];
    //do stuff
}

int main(void)
{
    srand(time(0));
    // send foo an integer between 1 and 1000 (a VLA length must be greater than zero)
    foo(rand() % 1000 + 1);
    return 0;
}

Prior to C99, this code would likely get you a compile-time error - the compiler cannot figure out how big to make the stack frame. How is C99 able to figure out the size of the stack frame? It cannot. We need dynamic memory here. However, we do not need heap memory yet. For this problem, C99’s variable-length arrays are effectively syntactic sugar over something like a call to alloca. alloca is another memory allocator - but this one allocates memory on the stack! alloca cannot grow the stack itself - the stack’s maximum size is fixed - it simply extends the current stack frame within that space, and the memory is released automatically when the function returns. C99 hides the magic, but you can use alloca yourself, if you are careful not to overflow the stack. This is ideal because we are still using memory already allocated to us, and it is still automatically freed with the stack frame.
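
If you want to see roughly what that sugar corresponds to, here is a sketch using alloca directly. Note that alloca is not part of standard C - on glibc and the BSDs it is declared in <alloca.h>, and on Windows the equivalent is _alloca in <malloc.h> - and it gives no indication of failure: ask for too much and you simply overflow the stack.

#include <alloca.h>   /* non-standard: glibc/BSD */
#include <stddef.h>

void foo(size_t length)
{
    /* Extend the current stack frame by enough space for `length` ints.
       No free() is needed - the space disappears when foo returns. */
    int* array = alloca(length * sizeof *array);
    //do stuff
}

One difference worth knowing: a VLA’s storage is released at the end of its enclosing block, while alloca’d memory lives until the function returns, which matters if you call alloca inside a loop.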

A Heap of Trouble

So far we have seen that we can get a lot of flexibility out of the stack. There are cases where we need the heap, though. Let’s look at this code:

#include <stdlib.h>
#include <time.h>
void foo(size_t length)
{
    int array[length];
    //do stuff
}

int main(void)
{
    srand(time(0));
    // send foo an integer between 0 and RAND_MAX
    foo(rand());
    return 0;
}

The above code is dangerous because of how large RAND_MAX is allowed to be. On my machine, RAND_MAX is defined to be the same as INT_MAX \(= 2^{31} - 1\). With 4-byte ints, that gives a worst-case array of roughly 8GB, far more than a typical stack of a few megabytes. Now we finally need memory from the heap. A proper implementation of foo is below:

#include <stdio.h>
#include <stdlib.h>

void foo(size_t length)
{
    int* array = malloc(length * sizeof *array);
    if (!array)
    {
        fprintf(stderr, "Couldn't allocate memory!\n");
        exit(EXIT_FAILURE);
    }
    //do stuff
    free(array);
}

Whoa, shouldn’t this be a one-liner? Student code almost never contains the check for a bad malloc return, but if for some reason malloc is unable to allocate the requested memory, it fails and returns a null pointer. At that point the behavior of any code that dereferences the pointer is undefined, and we should exit. If the first element accessed is at the null pointer itself, we may be saved by the operating system’s page guard on address 0x0, if it has one. However, we asked for several gigabytes of memory. The code could access array[INT_MAX - 1] first - who knows where that virtual address lands and what damage we could cause. Finally, we have to free the memory, because there is no way for the memory allocator to know when we are done with it. It is true that most modern operating systems will reclaim memory that the process did not explicitly free once it exits. Relying on that is bad practice, however, and can leave the program consuming unnecessary amounts of memory for a long time. If you forget to free within a daemon that runs 24/7, who knows when that memory will finally be freed?
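
Since the null check is the part that usually gets forgotten, one common pattern is to write it once in a small wrapper and call that everywhere. The sketch below uses the conventional (but non-standard) name xmalloc; adjust the failure behavior to whatever suits your program.

#include <stdio.h>
#include <stdlib.h>

/* Allocate-or-die wrapper: not a standard function, just a common idiom
   for programs where exiting on out-of-memory is acceptable. */
void* xmalloc(size_t nbytes)
{
    void* p = malloc(nbytes);
    if (!p)
    {
        fprintf(stderr, "Couldn't allocate %zu bytes!\n", nbytes);
        exit(EXIT_FAILURE);
    }
    return p;
}

void foo(size_t length)
{
    int* array = xmalloc(length * sizeof *array);
    //do stuff
    free(array);
}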

Why Does It Matter?

So we have looked at the extra burden malloc places on the programmer. Perhaps you do not care about the free and the error checking (though you should). Why not just use malloc anyway?

malloc is not free. I emphasized in the first part that the stack is already allocated to the process when it is created. Heap space is different. malloc hides a lot of the complexity of managing a heap from the caller. Heap space must first be allocated to the calling process. This is a kernel-level procedure that involves mapping physical memory pages into the process’s virtual address space and tracking those pages for the process. Since this has to take place through a syscall, it can be quite expensive, and it is especially expensive to do multiple times. Therefore, heap memory allocators like malloc may request a large region of memory the first time, and then return to the process a pointer to a chunk of that region. With subsequent calls, the allocator returns more chunks of the original region. This requires the memory allocator to keep track of the chunks, so it usually writes a header into the memory just before the pointer it returns and links those headers together in a list (a circular singly linked free list in the allocator below; many modern allocators use doubly linked lists). That sounds a little complicated just to get some memory - and it is. Here is the memory allocator presented by Kernighan and Ritchie in The C Programming Language, 2nd edition, Section 8.7:

#include <unistd.h>         /* sbrk */

typedef unsigned int uint;
typedef long Align;         /* for alignment to long boundary */

union header {              /* block header */
  struct {
    union header *ptr;      /* next block if on free list */
    uint size;              /* size of this block */
  } s;
  Align x;                  /* force alignment of blocks */
};

typedef union header Header;

static Header base; /* empty list to get started */
static Header *freep; /* start of free list */

/* free: put block ap in free list */
void
free(void *ap)
{
  Header *bp, *p;

  bp = (Header*)ap - 1; /* point to block header */
  for(p = freep; !(bp > p && bp < p->s.ptr); p = p->s.ptr)
    if(p >= p->s.ptr && (bp > p || bp < p->s.ptr))
      break;
  if(bp + bp->s.size == p->s.ptr){ /* join to upper nbr */
    bp->s.size += p->s.ptr->s.size;
    bp->s.ptr = p->s.ptr->s.ptr;
  } else
    bp->s.ptr = p->s.ptr;
  if(p + p->s.size == bp){ /* join to lower nbr */
    p->s.size += bp->s.size;
    p->s.ptr = bp->s.ptr;
  } else
    p->s.ptr = bp;
  freep = p;
}

#define NALLOC 1024
static Header*
morecore(uint nu)
{
  char *p;
  Header *hp;

  if(nu < NALLOC)
    nu = NALLOC;
  p = sbrk(nu * sizeof(Header));
  if(p == (char*)-1) /* no space at all */
    return 0;
  hp = (Header*)p;
  hp->s.size = nu;
  free((void*)(hp + 1));
  return freep;
}

void*
malloc(uint nbytes)
{
  Header *p, *prevp; 
  uint nunits;

  nunits = (nbytes + sizeof(Header) - 1)/sizeof(Header) + 1;
  if((prevp = freep) == 0){ /* no free list yet */
    base.s.ptr = freep = prevp = &base;
    base.s.size = 0;
  }
  for(p = prevp->s.ptr; ; prevp = p, p = p->s.ptr){
    if(p->s.size >= nunits){ /* big enough */
      if(p->s.size == nunits) /* exactly */
        prevp->s.ptr = p->s.ptr;
      else { /* allocate tail end */
        p->s.size -= nunits;
        p += p->s.size;
        p->s.size = nunits;
      }
      freep = prevp;
      return (void*)(p + 1);
    }
    if(p == freep) /* wrapped around free list */
      if((p = morecore(nunits)) == 0)
        return 0; /* none left */
  }
}

Suffice it to say that the modern stdlib malloc is far more complex than the version listed here. The first call to malloc invokes a syscall (sbrk here; many modern allocators also use mmap) to obtain pages, and sets up the infrastructure for tracking allocated memory - you use more memory than you request, because the allocator must track what it hands out. System calls must trap into the kernel and shift the CPU into kernel mode, which can be costly. That is a lot of overhead for something that would fit on the stack. Making multiple malloc calls is even more egregious, potentially forcing the memory allocator to make repeated system calls to the kernel. If you are going to use malloc, try to limit it to as few calls as possible. Repeated syscalls and reallocation of memory can be incredibly slow, and we can often do better by allocating everything at once and splitting it up into chunks ourselves (see the arena sketch below). Repeated calls may also hand us non-contiguous pages, creating poor locality. We also live in a Meltdown/Spectre world, where mitigations such as kernel page-table isolation make every syscall more expensive and can flush TLB entries, further diminishing our ability to leverage locality. The performance impact of malloc will not cripple modern hardware, but it can matter on smaller, older, or embedded systems. Overusing it is also simply bad practice, and leads to overcomplicated code along with the other issues we have discussed.
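
As an illustration of “allocate once and split it up ourselves,” here is a minimal bump-pointer arena sketch (my own example, not from K&R): one malloc up front, cheap sub-allocations from that block, and a single free at the end. It never grows, returns NULL when the block is exhausted, and uses a crude 16-byte alignment, so treat it as a starting point rather than a drop-in allocator.

#include <stdlib.h>
#include <stddef.h>

typedef struct {
    char*  buf;    /* the single block obtained from malloc */
    size_t used;   /* bytes handed out so far */
    size_t cap;    /* total size of buf */
} Arena;

/* One malloc for the whole arena; returns 0 on failure. */
int arena_init(Arena* a, size_t cap)
{
    a->buf = malloc(cap);
    a->used = 0;
    a->cap = cap;
    return a->buf != NULL;
}

/* Hand out the next chunk; no per-chunk headers, no per-chunk syscalls. */
void* arena_alloc(Arena* a, size_t nbytes)
{
    size_t offset = (a->used + 15) & ~(size_t)15;  /* keep chunks 16-byte aligned */
    if (offset > a->cap || nbytes > a->cap - offset)
        return NULL;                               /* arena exhausted */
    a->used = offset + nbytes;
    return a->buf + offset;
}

/* One free releases every chunk at once. */
void arena_free(Arena* a)
{
    free(a->buf);
    a->buf = NULL;
}

Typical usage would be arena_init(&a, some_upper_bound), a series of arena_alloc calls in place of individual mallocs, and a single arena_free(&a) when the whole batch of allocations is no longer needed.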

I should mention that Valgrind is a wonderful tool for checking memory issues. It is able to track and identify unfreed memory and other memory leaks, give you a count of your memory references, and flag memory errors in your code. It is definitely worth learning to use, and it is fairly simple to use as well. It also ships other useful tools, such as Helgrind for debugging race conditions. Try it out!

To sum everything up, malloc is usually overkill. It gives us a higher memory footprint, makes us do more work to complete the same job, and forces us to take extra measures to ensure memory safety. There are times when we need malloc, primarily when data will not fit on the stack, but we should at least try to minimize the number of times we use it. Finally, malloc can lead us to undefined behavior and memory leaks if we are not careful with how we use it. Use malloc responsibly!