The performance of dynamic strings implementations

Let’s say you want to create a nice string abstract data type to use in your C code. You can end with a structure like this:

typedef struct mstring mstring;

struct mstring {
    int buflen;
    int len;
    char *buf;

mstring *mstring_new() {
    mstring *result;

    // insert memory allocation errors check here
    result = (mstring*)malloc(sizeof(struct mstring));
    result->len = 0;
    result->buflen = 128;
    result->buf = (char *)malloc(result->buflen);

    return result;

void mstring_append(mstring *dest, const char *src) {
  • the buflen field keeps track of the length of the allocated buffer;
  • the len field keeps track of the length of the string, so that you can know the string length in constant time;
  • the buf field keeps track of the memory address where the string is allocated. This field will be changed when you reallocate the string when the buffer is full.

It sound nice and you can also have manipulation functions that reallocate the string when necessary. They will be really useful! Let’s try:

mstring *m = mstring_new();

mstring_append(m, "Hi this is me! I'm ");
mstring_append(m, myNameHere);
printf("%s", m->buf);

mstring_delete(m); m=NULL;

This is nice indeed! But is really not that efficient from a memory point of view.
In your heap you will have:

  • The string descriptor (12 byte on a 32-bit architecture)
  • The string contents (variable length)

Your implementation of the holy tuple malloc and free must keep track of the string descriptor and of the strings contents, which can be (and will be) non-contiguous in memory.

You can do really better allocating the string descriptor directly before the string contents and having mstring_new return the address of the string contents instead of the string descriptor. Your strings will be C-compatible and more efficient.
Let’s try again:

typedef char *fstring;

struct fstring_header {
    int buflen;
    int len;

fstring fstring_new() {
    struct fstring_header *header;
    fstring result;

    header = (struct fstring_header*) malloc(128);
    header->buflen=128-sizeof(struct fstring_header);

    result = ((char*)header)+sizeof(struct fstring_header);

    return result;

But now your fstring_append function can’t simply change the buf field when a buffer reallocation is needed. You must to reallocate the entire string, descriptor included.

fstring* fstring_append(fstring *dest, const char *src) {

So your code will change like this:

fstring *m = fstring_new();

m = fstring_append(m, "Hi this is me! I'm ");
m = fstring_append(m, myNameHere);
printf("%s", m);

fstring_delete(m); m=NULL;

You may say that this is premature optimization but wait! You saved:

  • one malloc
  • one free

and now the string descriptor is also attached to the string, and your cache will be happy. Really more efficient.
In my tests this string implementation is two times more efficient that the previous.

Want to try a really good dynamic string implementation? Try SDS. This is used in Redis.
You can also try my C all-purpose library here: CommonLib.

The performance of dynamic strings implementations