Excellent and very well-researched question. You've correctly identified a subtle but important discrepancy between documentation/comments and the actual implementation in modern glibc.
Here is a detailed breakdown of when and how trimming happens, addressing your specific points.
In modern glibc, the public function malloc_trim() is not called automatically from free().
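However, a more limited internal trimming function, systrim(), is called automatically from free() under specific conditions: the freed chunk, after coalescing with its neighbours, must be large (at least FASTBIN_CONSOLIDATION_THRESHOLD, 64 KiB in current sources), and the free space at the top of the main heap must have reached the trim threshold (M_TRIM_THRESHOLD). Thread arenas go through a sibling routine, heap_trim(), instead.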
This automatic behavior only trims the top of a single arena's heap and does not perform the more extensive cleanup (like using madvise) that the public malloc_trim() function does.
glibc (ptmalloc2) Behavior
You are correct. A grep for __malloc_trim (the public function) shows it is not called by _int_free (the internal implementation of free). The comments you found are historical artifacts from an earlier design.
The automatic trimming mechanism works as follows:
1. When you call free(p), glibc's internal _int_free() is invoked.
2. _int_free attempts to coalesce the newly freed chunk with any adjacent free chunks.
3. If the freed chunk borders the top chunk, it is merged into it. For a sufficiently large freed chunk, _int_free then checks whether the top chunk has reached M_TRIM_THRESHOLD (tunable via mallopt(); the default is 128 KiB).
4. If it has, _int_free calls the internal function systrim().
Here is the relevant logic from _int_free in malloc/malloc.c:
// Paraphrased from _int_free() in glibc malloc/malloc.c; exact text varies by version.
// Runs after the freed chunk has been coalesced with its neighbours (possibly into top).
if ((unsigned long) size >= FASTBIN_CONSOLIDATION_THRESHOLD) {
    // Merge any pending fastbin chunks first so top is as large as possible.
    malloc_consolidate(av);

    if (av == &main_arena) {
        // Main arena: shrink the brk heap if top has reached the threshold.
        if ((unsigned long) chunksize(av->top) >= (unsigned long) mp_.trim_threshold)
            systrim(mp_.top_pad, av); // <-- THE AUTOMATIC CALL
    } else {
        // Thread arenas: try to shrink or unmap the mmap'd heap instead.
        heap_info *heap = heap_for_ptr(top(av));
        heap_trim(heap, mp_.top_pad);
    }
}
This systrim call is the "automatic trimming" mentioned in the comments.
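If you want the automatic path to fire sooner, the two knobs it consults can be set with mallopt(). The sketch below is only an illustration: configure_auto_trim() is a hypothetical helper name, and the values you pass are workload-dependent assumptions, not recommendations. Note that, per the mallopt(3) man page, setting either parameter also disables glibc's dynamic adjustment of the mmap threshold.

```c
#include <assert.h>
#include <malloc.h>   /* mallopt, M_TRIM_THRESHOLD, M_TOP_PAD (glibc) */

/* Hypothetical helper: adjust the thresholds that free() consults before
   it trims the top of the main heap.
   Returns 1 if both settings were accepted, 0 otherwise. */
int configure_auto_trim(int trim_threshold_bytes, int top_pad_bytes)
{
    assert(trim_threshold_bytes >= 0 && top_pad_bytes >= 0);

    /* mallopt() returns 1 on success and 0 on error. */
    int ok = mallopt(M_TRIM_THRESHOLD, trim_threshold_bytes);
    ok &= mallopt(M_TOP_PAD, top_pad_bytes);
    return ok;
}
```

Calling configure_auto_trim(64 * 1024, 16 * 1024) early in main() would ask free() to trim once 64 KiB is free at the top and to leave 16 KiB of slack behind. This only affects memory at the top of the heap; it does nothing for holes in the middle.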
systrim and malloc_trim
This is the most crucial part of the answer. They are very different functions.
systrim() (Internal function)
- It works only on the top chunk of the arena that free() is currently operating on.
- For the main arena, it shrinks the heap by calling sbrk() with a negative value. Thread arenas (which are mmap'd) are handled by the sibling routine heap_trim(), which can shrink the mapping or munmap an entire unused heap.
- It does not use madvise(..., MADV_DONTNEED) to release pages from the middle of the heap (i.e., from large free chunks that are not at the top). It is a simple, opportunistic trim.
__malloc_trim() (Public function, exported as malloc_trim())
- It visits every arena, taking each arena's lock in turn.
- For the main arena, it calls systrim() to try to release memory from the top of the heap.
- It walks the free lists (bins) of each arena. If it finds free chunks that span whole pages, it uses madvise(addr, len, MADV_DONTNEED) to tell the kernel that these pages can be released. This is the "hole-punching" feature you mentioned, and it's critical for reducing RSS in long-running applications with fragmentation.
- It runs only when you ask for it, for example by calling malloc_trim(0); in your code.
In summary: The automatic call in free() is a light trim of the heap's boundary. The manual malloc_trim() is a comprehensive cleanup of all heaps, including releasing unused pages from the middle.
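To see the difference on your own machine, a small experiment like the following can help. It is a sketch under several assumptions: Linux with /proc mounted, glibc's malloc in use, and block sizes below the mmap threshold so the memory comes from the heap. The names rss_bytes() and trim_demo() are made up for this example. It frees every other block so the free space sits in holes between live blocks, where the automatic trim cannot reach it, then calls malloc_trim(0) and prints RSS before and after.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_trim (glibc extension) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Resident set size in bytes, read from /proc/self/statm (Linux only).
   Returns -1 if the file cannot be read or parsed. */
long rss_bytes(void)
{
    long total_pages, resident_pages;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    int n = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    return n == 2 ? resident_pages * sysconf(_SC_PAGESIZE) : -1;
}

/* Allocate `count` blocks of `size` bytes, touch them, free every other
   block, then call malloc_trim(0) and print RSS before and after.
   Returns malloc_trim's result, or -1 if an allocation failed. */
int trim_demo(size_t count, size_t size)
{
    assert(count > 0 && size > 0);
    char **blocks = calloc(count, sizeof *blocks);
    if (!blocks)
        return -1;

    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(size);
        if (!blocks[i]) {                 /* undo partial work on failure */
            while (i > 0)
                free(blocks[--i]);
            free(blocks);
            return -1;
        }
        memset(blocks[i], 0xAB, size);    /* touch pages so they count in RSS */
    }

    /* Free the even-indexed blocks: each hole is fenced in by live
       neighbours, so it cannot merge into the top chunk. */
    for (size_t i = 0; i < count; i += 2) {
        free(blocks[i]);
        blocks[i] = NULL;
    }

    long before = rss_bytes();
    int released = malloc_trim(0);        /* explicit trim, including holes */
    long after = rss_bytes();
    printf("RSS before: %ld KiB, after: %ld KiB, malloc_trim returned %d\n",
           before / 1024, after / 1024, released);

    for (size_t i = 1; i < count; i += 2) /* release the surviving blocks */
        free(blocks[i]);
    free(blocks);
    return released;
}
```

Something like trim_demo(2048, 32 * 1024) allocates about 64 MiB and frees half of it. The exact numbers depend on the glibc version and kernel settings, so treat the printed values as an observation rather than a guarantee.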
dlmalloc and early ptmalloc
Your suspicion is correct. This behavior was different in the past.
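In Doug Lea's original dlmalloc (upon which ptmalloc is based), free() called the same trimming routine that malloc_trim() used. In the dlmalloc 2.7.0 sources (circa 2001), the free routine (spelled fREe in the source) ends with a check along these lines (paraphrased; identifiers differ between dlmalloc releases):
// dlmalloc 2.7.x, end of free(), paraphrased
if ((unsigned long) chunksize(av->top) >= (unsigned long) av->trim_threshold)
    sYSTRIm(av->top_pad, av); // <-- This is the call
Here, sYSTRIm() was also the routine behind malloc_trim(), so the automatic trim and the public function released memory in the same way: from the top of the single heap.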
Why did it change in glibc's ptmalloc2?
The most plausible reason is the combination of multiple arenas, used to improve performance in multi-threaded applications, and the later growth of malloc_trim() itself. According to the malloc_trim(3) man page, since glibc 2.8 the function frees memory in all arenas and in all chunks with whole free pages; before that it only freed memory at the top of the heap in the main arena.
- In the dlmalloc model, there was typically only one heap (the "main arena"). Therefore, having free() trim this single heap made sense.
- In ptmalloc2, free() operates on the arena the chunk came from, which is often a per-thread arena. Calling a full malloc_trim() from within free() would be very inefficient and cause contention, because it would have to take the lock of every other arena and walk their free lists just because one thread freed one large chunk.
The resulting design is a better fit for multi-threading:
- free() performs a fast, local trim of the top chunk while holding only its own arena's lock.
- Developers get the public malloc_trim() function to call when they know it's a good time to do a global, more expensive cleanup (e.g., during a period of low activity).
The comments in malloc.c were likely not fully updated to reflect this change, and they remain a source of confusion.
- When is malloc_trim() called? Never automatically. Its internal counterpart, systrim(), is called from free() when a large chunk at the very top of the main heap is freed.
- Was it different in the past? In dlmalloc, free() ran the same trim routine that backed malloc_trim().
- systrim() vs. malloc_trim()? systrim() is internal, local to one arena, and only trims the top. malloc_trim() is public, covers all arenas, and also uses madvise() to release unused pages from the middle of heaps.
For your long-running C++ daemon that sees RSS growth despite stable memory usage, you are on exactly the right track. The limited automatic systrim() is not sufficient. This pattern is a classic symptom of memory fragmentation, where large free blocks are "trapped" between allocated blocks in the middle of the heap.
Your best solution is to periodically call malloc_trim(0);. This will perform the comprehensive cleanup you need, using madvise to return those trapped, unused pages to the OS and reduce your daemon's RSS.
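One way to do that without trimming too often is a small rate limiter around the call. This is a minimal sketch, not a drop-in component: struct trim_timer and maybe_trim() are hypothetical names, the caller supplies the current time, and the right interval is an assumption you would tune for your daemon. The code is plain C and can be called from C++ unchanged.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_trim (glibc extension) */
#include <time.h>

/* Hypothetical rate-limiter state for periodic trimming. */
struct trim_timer {
    time_t interval_sec;   /* minimum seconds between trims */
    time_t last_trim;      /* time of the previous trim, 0 = never */
};

/* Call from an idle path or housekeeping loop. Runs malloc_trim(0) when
   at least interval_sec seconds have passed since the previous trim.
   Returns 1 if a trim was attempted, 0 if it was skipped. */
int maybe_trim(struct trim_timer *t, time_t now)
{
    assert(t != NULL && t->interval_sec > 0);

    if (t->last_trim != 0 && now - t->last_trim < t->interval_sec)
        return 0;          /* too soon since the last trim */

    t->last_trim = now;
    malloc_trim(0);        /* pad = 0: keep no extra slack at the top */
    return 1;
}
```

A daemon would keep one struct trim_timer (for example with interval_sec set to 60 and last_trim set to 0) and call maybe_trim(&timer, time(NULL)) from its housekeeping loop. Because malloc_trim() takes each arena's lock in turn, running it from a quiet moment rather than a hot path is the safer choice.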