Gemini 2.5 Pro Answer

Excellent and very well-researched question. You've correctly identified a subtle but important discrepancy between documentation/comments and the actual implementation in modern glibc.

Here is a detailed breakdown of when and how trimming happens, addressing your specific points.

Short Answer

In modern glibc, the public function malloc_trim() is not called automatically from free().

However, an internal and more limited trimming function, systrim(), is called automatically from free() under very specific conditions: the freed chunk, after coalescing with its neighbours, is at least 64 KiB, and the free space at the top of the main arena's heap has reached a threshold (M_TRIM_THRESHOLD). Non-main (thread) arenas go through a sibling function, heap_trim(), instead.

This automatic behavior only trims the top of a single arena's heap and does not perform the more extensive cleanup (like using madvise) that the public malloc_trim() function does.

Detailed Analysis

1. The Modern `glibc` (ptmalloc2) Behavior

You are correct. A grep for __malloc_trim (the internal name behind the public malloc_trim()) shows it is not called by _int_free (the internal implementation of free). The comments you found are historical artifacts from an earlier design.

The automatic trimming mechanism works as follows:

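When you call free(p), glibc's internal _int_free() is invoked. Small chunks are just parked in the tcache or a fastbin and never reach the steps below.
_int_free will attempt to coalesce the newly freed chunk with any adjacent free chunks. If the chunk borders the "top chunk" (the wilderness chunk at the end of a heap arena), it is merged into top.
The Trigger Condition: If the coalesced chunk is at least FASTBIN_CONSOLIDATION_THRESHOLD (64 KiB) in size, _int_free goes on to consider trimming. Note that the chunk does not have to touch top for this check to run.
For the main arena, _int_free checks if the size of the top chunk has reached M_TRIM_THRESHOLD (tunable via mallopt, defaults to 128 KiB). If it has, _int_free calls the internal function systrim().
For any other arena, _int_free calls heap_trim() on the sub-heap that holds top, regardless of the threshold, because the whole sub-heap might be releasable.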
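The threshold in the steps above can be changed at run time. The following is a minimal sketch, assuming glibc's <malloc.h>; the helper name configure_trimming is hypothetical, and the values in the usage note are illustrative rather than recommendations.

```c
#include <malloc.h>   /* glibc-specific: mallopt, M_TRIM_THRESHOLD, M_TOP_PAD */

/* Hypothetical helper: set the top-of-heap trim threshold and the padding
 * that is left in place after a trim. Returns 1 if both mallopt() calls
 * succeeded, 0 otherwise (mallopt itself returns 1 on success, 0 on error). */
int configure_trimming(int trim_threshold_bytes, int top_pad_bytes)
{
    int ok = mallopt(M_TRIM_THRESHOLD, trim_threshold_bytes);
    ok &= mallopt(M_TOP_PAD, top_pad_bytes);
    return ok;
}
```

For example, configure_trimming(64 * 1024, 0) asks for a trim as soon as top reaches 64 KiB. According to mallopt(3), setting either parameter also disables glibc's dynamic adjustment of the mmap threshold, so it is worth measuring before and after.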

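Here is the relevant part of _int_free in malloc/malloc.c, abridged from a recent glibc release (details differ between versions, and in the newest releases the same logic sits in a helper called from _int_free):

// from glibc malloc/malloc.c, abridged
// Inside _int_free, after the freed chunk has been coalesced with its neighbours...

if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
    // ... consolidate the fastbins first if there are any ...

    if (av == &main_arena) {
#ifndef MORECORE_CANNOT_TRIM
        // Check if we should trim the heap
        if ((unsigned long)(chunksize(av->top)) >= (unsigned long)(mp_.trim_threshold))
            systrim(mp_.top_pad, av); // <-- THE AUTOMATIC CALL
#endif
    } else {
        // Non-main arenas always try heap_trim(), even if top is not
        // large, because the corresponding sub-heap might go away.
        heap_info *heap = heap_for_ptr(top(av));
        heap_trim(heap, mp_.top_pad);
    }
}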

This systrim call is the "automatic trimming" mentioned in the comments.
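To make the two conditions easier to experiment with, here is a simplified model of the main-arena decision as a pure function. It is a sketch, not glibc code: the function name and the CONSOLIDATION_GATE macro are made up for illustration, and the model ignores tcache, fastbins and non-main arenas.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same value as FASTBIN_CONSOLIDATION_THRESHOLD in glibc's malloc.c. */
#define CONSOLIDATION_GATE (64UL * 1024UL)

/* Hypothetical model of the main-arena branch of _int_free.
 * merged_chunk_size: size of the freed chunk after coalescing
 *                    (this is the new top size if it merged into top).
 * top_size:          size of the top chunk after the free.
 * trim_threshold:    current M_TRIM_THRESHOLD value. */
bool main_arena_would_trim(size_t merged_chunk_size,
                           size_t top_size,
                           size_t trim_threshold)
{
    if (merged_chunk_size < CONSOLIDATION_GATE)
        return false;               /* small frees never reach the trim check */
    return top_size >= trim_threshold;
}
```

With the default 128 KiB threshold, freeing a 4 KiB chunk never triggers a trim no matter how large top is, while a 70 KiB chunk freed in the middle of the heap does trigger one if top already holds 300 KiB.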

2. The Difference Between `systrim` and `malloc_trim`

This is the most crucial part of the answer. They are very different functions.

systrim() (Internal function)

Scope: Operates only on the main arena's heap. Non-main (thread) arenas use a sibling function, heap_trim().
Action: Only attempts to release memory from the top of the heap, by calling sbrk() with a negative value. heap_trim() does the equivalent for a thread arena's mmap'd sub-heaps: it unmaps a sub-heap that has become entirely free and otherwise shrinks the one that holds top.
Limitations: Neither function iterates over all arenas, and neither releases pages from the middle of the heap (i.e., from large free chunks that are not at the top). It is a simple, opportunistic trim.

malloc_trim() (Public function, implemented as __malloc_trim())

Scope: Is a global, heavyweight operation. It iterates over all active arenas (the main arena and all thread arenas), taking each arena's lock in turn.
Action: For each arena, it performs the following actions (since glibc 2.8; according to the malloc_trim(3) man page, older versions only trimmed the top of the main arena's heap):
1. It consolidates the fastbins so that adjacent free chunks are merged.
2. It traverses the free lists (bins) of the arena. For every free chunk that contains at least one whole page, it uses madvise(addr, len, MADV_DONTNEED) on the page-aligned interior to tell the kernel that these memory pages can be released. This is the "hole-punching" feature you mentioned, and it's critical for reducing RSS in long-running applications with fragmentation.
3. For the main arena only, it calls systrim() to try and release memory from the top of the heap.
Calling: It is only called when you explicitly invoke malloc_trim(0); in your code.

In summary: The automatic call in free() is a light trim of the heap's boundary. The manual malloc_trim() is a comprehensive cleanup of all heaps, including releasing unused pages from the middle.
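The difference is easy to provoke. The sketch below assumes glibc with default settings (block sizes below the mmap threshold, so everything lives on the heap); the function name and the constants are illustrative. It builds a heap full of holes that the automatic trim cannot reach, then reports what malloc_trim(0) says.

```c
#include <malloc.h>   /* glibc-specific: malloc_trim */
#include <stdlib.h>
#include <string.h>

enum { BLOCKS = 256, BLOCK_SIZE = 64 * 1024 };  /* below the default mmap threshold */

/* Builds a fragmented heap, then asks malloc_trim() to clean it up.
 * Returns malloc_trim()'s result (1 if memory was released, 0 if not),
 * or -1 if the setup allocations failed. */
int trim_fragmented_heap(void)
{
    static char *blocks[BLOCKS];

    for (int i = 0; i < BLOCKS; i++) {
        blocks[i] = malloc(BLOCK_SIZE);
        if (blocks[i] == NULL) {
            while (i-- > 0)
                free(blocks[i]);
            return -1;
        }
        memset(blocks[i], 0xAB, BLOCK_SIZE);  /* touch the pages so they count toward RSS */
    }

    /* Free every other block. Each hole is followed by a live block, so
     * none of them can merge into top and the automatic trim cannot help. */
    for (int i = 0; i < BLOCKS; i += 2) {
        free(blocks[i]);
        blocks[i] = NULL;
    }

    int released = malloc_trim(0);  /* walks the bins and madvise()s whole free pages */

    for (int i = 1; i < BLOCKS; i += 2)
        free(blocks[i]);
    return released;
}
```

Comparing the VmRSS line of /proc/self/status just before and just after the malloc_trim(0) call is a simple way to see the effect on a real system.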

3. Historical Context: `dlmalloc` and early `ptmalloc`

Your suspicion is correct. This behavior was different in the past.

In Doug Lea's dlmalloc (upon which ptmalloc is based), free() used to call malloc_trim() directly. In the 2.6.x sources, the branch that merges a freed chunk into top ends roughly like this (paraphrased; the exact text varies by release):

// dlmalloc 2.6.x, end of the merge-with-top branch in free()
if ((unsigned long)(sz) >= (unsigned long)trim_threshold)
    malloc_trim(top_pad); // <-- This is the call

At that time malloc_trim() did nothing more than shrink the top of the single heap with a negative sbrk(), so calling it from free() was cheap. The comment you found describes this arrangement.

What changed in glibc's ptmalloc2?

Two things happened, and neither of them removed functionality from free():

The call was renamed, not weakened. The top-trimming logic moved into an internal helper (sYSTRIm in dlmalloc 2.7.x, systrim in current glibc), and free() now calls the helper. The comment kept the old name.
malloc_trim() grew. According to the malloc_trim(3) man page, before glibc 2.8 it only freed memory at the top of the heap in the main arena; since glibc 2.8 it frees memory in all arenas and in all chunks with whole free pages.

Calling the modern malloc_trim() from within free would be extremely inefficient in a multi-arena allocator: it has to acquire the lock of every arena in turn and walk their bins, just because one thread freed one large chunk. That would cause exactly the contention that multiple arenas were introduced to avoid.

The resulting split is a sensible model for multi-threading:

Automatic: free() performs a fast, local trim of the top of its own arena, under the arena lock it already holds.
Manual: The programmer is given the powerful malloc_trim() function to call when they know it's a good time to do a global, more expensive cleanup (e.g., during a period of low activity).

The comments in malloc.c were not updated to follow the rename, and they remain a source of confusion.

Conclusion and Practical Advice for Your Daemon

When is malloc_trim called? Never automatically. Its internal counterpart, systrim (or heap_trim for thread arenas), is called from free when a chunk of at least 64 KiB is freed and, for the main arena, the top chunk has reached M_TRIM_THRESHOLD.
Was it different? Yes, in name. Early dlmalloc called malloc_trim from free, but back then malloc_trim only trimmed the top of the heap.
Why was it changed? The top-trimming logic moved into an internal helper, and since glibc 2.8 malloc_trim does much more work across all arenas, which would be too expensive and too contention-prone to run from every large free.
systrim vs malloc_trim? systrim is internal, limited to the main arena, and only trims the top. malloc_trim is public, global across all arenas, and also uses madvise to release unused whole pages from the middle of heaps.

For your long-running C++ daemon that sees RSS growth despite stable memory usage, you are on exactly the right track. The limited automatic systrim is not sufficient. This pattern is a classic symptom of memory fragmentation where large free blocks are "trapped" between allocated blocks in the middle of the heap.

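Your best solution is to periodically call malloc_trim(0);. This will perform the comprehensive cleanup you need, using madvise to return the whole pages inside those trapped free chunks to the OS and reduce your daemon's RSS. Because the call locks each arena while it works, schedule it for quiet moments rather than hot paths.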
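One way to wire this into a daemon is a small rate limiter called from an existing housekeeping loop. This is a sketch under assumptions: the helper name trim_if_due and the interval are hypothetical, and the right interval depends on your allocation pattern.

```c
#include <malloc.h>   /* glibc-specific: malloc_trim */
#include <time.h>

/* Hypothetical helper for a daemon's housekeeping loop. Calls malloc_trim(0)
 * if at least interval_sec seconds have passed since *last_trim and records
 * the new time. Returns 1 if a trim was attempted, 0 if it was skipped. */
int trim_if_due(time_t now, time_t *last_trim, time_t interval_sec)
{
    if (now - *last_trim < interval_sec)
        return 0;
    malloc_trim(0);     /* its return value only says whether anything was released */
    *last_trim = now;
    return 1;
}
```

A caller would keep a time_t for the last trim and pass in the current time from an idle point in its loop, for example trim_if_due(time(NULL), &last_trim, 60). A monotonic clock is a safer time source if the wall clock can jump.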
