
Commit bca67592 authored by Mel Gorman, committed by Linus Torvalds

mm, vmstat: remove zone and node double accounting by approximating retries

The number of LRU pages, dirty pages and writeback pages must be
accounted for on both zones and nodes because the reclaim retry logic,
the compaction retry logic and the highmem calculations all depend on
per-zone stats.

Many lowmem allocations are immune from OOM kill due to a check in
__alloc_pages_may_oom for (ac->high_zoneidx < ZONE_NORMAL) since commit
03668b3c ("oom: avoid oom killer for lowmem allocations").  The
exceptions are costly high-order allocations and allocations that cannot
fail.  If __alloc_pages_may_oom avoids OOM-kill for a low-order lowmem
allocation, the allocation falls through to __alloc_pages_direct_compact.
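
For reference, the shape of that check in __alloc_pages_may_oom is
roughly the following (a simplified sketch of the mm/page_alloc.c logic
of this era, not an exact quote):

	if (!(gfp_mask & __GFP_NOFAIL)) {
		/* The OOM killer will not help higher order allocs */
		if (order > PAGE_ALLOC_COSTLY_ORDER)
			goto out;
		/* The OOM killer does not needlessly kill tasks for lowmem */
		if (ac->high_zoneidx < ZONE_NORMAL)
			goto out;
	}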

This patch blindly retries reclaim for zone-constrained allocations
in should_reclaim_retry up to MAX_RECLAIM_RETRIES.  This is not ideal
but without per-zone stats there are few alternatives.  The impact is
that zone-constrained allocations may be delayed before the OOM killer
is considered.
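
The bail-out in should_reclaim_retry amounts to something like the
following (a sketch only; the mm/page_alloc.c hunk is not among those
shown below):

	/*
	 * Blindly retry lowmem allocations up to MAX_RECLAIM_RETRIES as
	 * there is no reliable and cheap way to calculate the reclaimable,
	 * dirty and writeback pages in the eligible zones.
	 */
	if (ac->high_zoneidx < ZONE_NORMAL)
		goto out;	/* keep retrying until the retry cap is hit */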

As there is no guarantee enough memory can ever be freed to satisfy
compaction, this patch avoids retrying compaction for zone-constrained
allocations.

In combination, that means that the per-node stats can be used when
deciding whether to continue reclaim using a rough approximation.  While
it is possible this will make the wrong decision on occasion, it will
not loop infinitely as the number of reclaim attempts is capped by
MAX_RECLAIM_RETRIES.
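
To see why the approximation cannot cause an endless loop, here is a
minimal userspace model of the capped retry (illustrative only;
MAX_RECLAIM_RETRIES is 16 in mm/page_alloc.c, and reclaim_progress() is
a hypothetical stand-in for the per-node estimate):

	#include <stdbool.h>
	#include <stdio.h>

	#define MAX_RECLAIM_RETRIES 16

	/* Worst case: the estimate always says "no progress". */
	static bool reclaim_progress(void)
	{
		return false;
	}

	int main(void)
	{
		int no_progress_loops = 0;

		while (no_progress_loops < MAX_RECLAIM_RETRIES) {
			if (reclaim_progress())
				no_progress_loops = 0;	/* progress resets the count */
			else
				no_progress_loops++;
		}

		/* Even a permanently wrong estimate reaches this point. */
		printf("OOM killer considered after %d retries\n",
		       no_progress_loops);
		return 0;
	}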

The final step is calculating the number of dirtyable highmem pages.  As
those calculations only care about the global count of file pages in
highmem, this patch uses a global counter instead of per-zone stats,
which is sufficient.
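
On the consumer side, that reduces the highmem dirtyable calculation to
a single atomic read plus the existing per-zone free counts.  A sketch
of how highmem_dirtyable_memory() in mm/page-writeback.c can use the
counter (names and structure approximate; the hunk is not shown below):

	#ifdef CONFIG_HIGHMEM
	atomic_t highmem_file_pages;	/* updated via acct_highmem_file_pages() */

	static unsigned long highmem_dirtyable_memory(unsigned long total)
	{
		unsigned long x = atomic_read(&highmem_file_pages);
		int node;
		int i;

		/* Free highmem still comes from per-zone NR_FREE_PAGES;
		 * only the file-page total is global now. */
		for_each_node_state(node, N_HIGH_MEMORY) {
			for (i = ZONE_NORMAL + 1; i < MAX_NR_ZONES; i++) {
				struct zone *z = &NODE_DATA(node)->node_zones[i];

				if (is_highmem_idx(i))
					x += zone_page_state(z, NR_FREE_PAGES);
			}
		}

		/* Never report more dirtyable highmem than total memory */
		return min(x, total);
	}
	#endif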

In combination, this allows the per-zone LRU and dirty state counters to
be removed.

[mgorman@techsingularity.net: fix acct_highmem_file_pages()]
  Link: http://lkml.kernel.org/r/1468853426-12858-4-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-35-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent e2ecc8a7
include/linux/mm_inline.h (+17 −3)
@@ -4,6 +4,22 @@
 #include <linux/huge_mm.h>
 #include <linux/swap.h>
 
+#ifdef CONFIG_HIGHMEM
+extern atomic_t highmem_file_pages;
+
+static inline void acct_highmem_file_pages(int zid, enum lru_list lru,
+							int nr_pages)
+{
+	if (is_highmem_idx(zid) && is_file_lru(lru))
+		atomic_add(nr_pages, &highmem_file_pages);
+}
+#else
+static inline void acct_highmem_file_pages(int zid, enum lru_list lru,
+							int nr_pages)
+{
+}
+#endif
+
 /**
  * page_is_file_cache - should the page be on a file LRU or anon LRU?
  * @page: the page to test
@@ -29,9 +45,7 @@ static __always_inline void __update_lru_size(struct lruvec *lruvec,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
 	__mod_node_page_state(pgdat, NR_LRU_BASE + lru, nr_pages);
-	__mod_zone_page_state(&pgdat->node_zones[zid],
-		NR_ZONE_LRU_BASE + !!is_file_lru(lru),
-		nr_pages);
+	acct_highmem_file_pages(zid, lru, nr_pages);
 }
 
 static __always_inline void update_lru_size(struct lruvec *lruvec,
include/linux/mmzone.h (+0 −4)
@@ -110,10 +110,6 @@ struct zone_padding {
 enum zone_stat_item {
 	/* First 128 byte cacheline (assuming 64 bit words) */
 	NR_FREE_PAGES,
-	NR_ZONE_LRU_BASE, /* Used only for compaction and reclaim retry */
-	NR_ZONE_LRU_ANON = NR_ZONE_LRU_BASE,
-	NR_ZONE_LRU_FILE,
-	NR_ZONE_WRITE_PENDING,	/* Count of dirty, writeback and unstable pages */
 	NR_MLOCK,		/* mlock()ed pages found and moved off LRU */
 	NR_SLAB_RECLAIMABLE,
 	NR_SLAB_UNRECLAIMABLE,
include/linux/swap.h (+0 −1)
@@ -307,7 +307,6 @@ extern void lru_cache_add_active_or_unevictable(struct page *page,
 						struct vm_area_struct *vma);
 
 /* linux/mm/vmscan.c */
-extern unsigned long zone_reclaimable_pages(struct zone *zone);
 extern unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 					gfp_t gfp_mask, nodemask_t *mask);
mm/compaction.c (+19 −1)
@@ -1438,6 +1438,11 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 {
 	struct zone *zone;
 	struct zoneref *z;
+	pg_data_t *last_pgdat = NULL;
+
+	/* Do not retry compaction for zone-constrained allocations */
+	if (ac->high_zoneidx < ZONE_NORMAL)
+		return false;
 
 	/*
 	 * Make sure at least one zone would pass __compaction_suitable if we continue
@@ -1448,14 +1453,27 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		unsigned long available;
 		enum compact_result compact_result;
 
+		if (last_pgdat == zone->zone_pgdat)
+			continue;
+
+		/*
+		 * This over-estimates the number of pages available for
+		 * reclaim/compaction but walking the LRU would take too
+		 * long. The consequences are that compaction may retry
+		 * longer than it should for a zone-constrained allocation
+		 * request.
+		 */
+		last_pgdat = zone->zone_pgdat;
+		available = pgdat_reclaimable_pages(zone->zone_pgdat) / order;
+
 		/*
 		 * Do not consider all the reclaimable memory because we do not
 		 * want to trash just for a single high order allocation which
 		 * is even not guaranteed to appear even if __compaction_suitable
 		 * is happy about the watermark check.
 		 */
-		available = zone_reclaimable_pages(zone) / order;
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 		available = min(zone->managed_pages, available);
 		compact_result = __compaction_suitable(zone, order, alloc_flags,
 				ac_classzone_idx(ac), available);
 		if (compact_result != COMPACT_SKIPPED &&
mm/migrate.c (+0 −2)
@@ -513,9 +513,7 @@ int migrate_page_move_mapping(struct address_space *mapping,
 		}
 		if (dirty && mapping_cap_account_dirty(mapping)) {
 			__dec_node_state(oldzone->zone_pgdat, NR_FILE_DIRTY);
-			__dec_zone_state(oldzone, NR_ZONE_WRITE_PENDING);
 			__inc_node_state(newzone->zone_pgdat, NR_FILE_DIRTY);
-			__inc_zone_state(newzone, NR_ZONE_WRITE_PENDING);
 		}
 	}
 	local_irq_enable();