Commit 9d9420f1 authored by Linus Torvalds

Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf updates from Ingo Molnar:
 "Kernel side updates:

   - Fix and enhance poll support (Jiri Olsa)

   - Re-enable inheritance optimization (Jiri Olsa)

   - Enhance Intel memory events support (Stephane Eranian)

   - Refactor the Intel uncore driver to be more maintainable (Zheng
     Yan)

   - Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra,
     Andi Kleen)

   - [ plus various smaller fixes/cleanups ]

  User visible tooling updates:

   - Add +field argument support for the --fields option, so that one can
     add fields to the default list of fields to show, i.e. now one can
     just do:

         perf report --fields +pid

     And the pid will appear in addition to the default fields (Jiri
     Olsa)

   - Add +field argument support for --sort option (Jiri Olsa)

   - Honour -w in the report tools (report, top), allowing to specify
     the widths for the histogram entries columns (Namhyung Kim)

   - Properly show submicrosecond times in 'perf kvm stat' (Christian
     Borntraeger)

   - Add beautifier for mremap flags param in 'trace' (Alex Snast)

   - perf script: Allow callchains if any event samples them

   - Don't truncate Intel style addresses in 'annotate' (Alex Converse)

   - Allow profiling when kptr_restrict == 1 for non root users, kernel
     samples will just remain unresolved (Andi Kleen)

   - Allow configuring default options for callchains in config file
     (Namhyung Kim)

   - Support operations for shared futexes.  (Davidlohr Bueso)

   - "perf kvm stat report" improvements by Alexander Yarygin:
       -  Save pid string in opts.target.pid
       -  Enable the target.system_wide flag
       -  Unify the title bar output

   - [ plus lots of other fixes and small improvements.  ]

  Tooling infrastructure changes:

   - Refactor unit and scale function parameters for PMU parsing
     routines (Matt Fleming)

   - Improve DSO long names lookup with rbtree, resulting in great
     speedup for workloads with lots of DSOs (Waiman Long)

   - We were not handling POLLHUP notifications for event file
     descriptors

     Fix it by filtering entries in the events file descriptor array
     after poll() returns, refcounting mmaps so that when the last fd
     pointing to a perf mmap goes away we do the unmap (Arnaldo Carvalho
     de Melo)

   - Intel PT prep work, from Adrian Hunter, including:
       - Let a user specify a PMU event without any config terms
       - Add perf-with-kcore script
       - Let default config be defined for a PMU
       - Add perf_pmu__scan_file()
       - Add a 'perf test' for tracking with sched_switch
       - Add 'flush' callback to scripting API

   - Use ring buffer consume method to look like other tools (Arnaldo
     Carvalho de Melo)

   - hists browser (used in top and report) refactorings, getting rid of
     unused variables and reducing source code size by handling similar
     cases in fewer functions (Namhyung Kim).

   - Replace thread-unsafe strerror() with strerror_r() across the
     whole tools/perf/ tree (Masami Hiramatsu)

   - Rename ordered_samples to ordered_events and allow setting a queue
     size for ordering events (Jiri Olsa)

   - [ plus lots of fixes, cleanups and other improvements ]"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits)
  perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
  perf/x86/intel/uncore: Fix minor race in box set up
  perf record: Fix error message for --filter option not coming after tracepoint
  perf tools: Fix build breakage on arm64 targets
  perf symbols: Improve DSO long names lookup speed with rbtree
  perf symbols: Encapsulate dsos list head into struct dsos
  perf bench futex: Sanitize -q option in requeue
  perf bench futex: Support operations for shared futexes
  perf trace: Fix mmap return address truncation to 32-bit
  perf tools: Refactor unit and scale function parameters
  perf tools: Fix line number in the config file error message
  perf tools: Convert {record,top}.call-graph option to call-graph.record-mode
  perf tools: Introduce perf_callchain_config()
  perf callchain: Move some parser functions to callchain.c
  perf tools: Move callchain config from record_opts to callchain_param
  perf hists browser: Fix callchain print bug on TUI
  perf tools: Use ACCESS_ONCE() instead of volatile cast
  perf tools: Modify error code for when perf_session__new() fails
  perf tools: Fix perf record as non root with kptr_restrict == 1
  perf stat: Fix --per-core on multi socket systems
  ...
parents 6d5f0ebf cc6cd47e
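
The pull message above describes the POLLHUP fix as filtering the event file descriptor array after poll() returns. A minimal sketch of that filtering step follows, in plain POSIX C. The name demo_filter_pollfd is hypothetical and is not the function used in tools/perf; the mmap refcounting mentioned in the message is reduced to a comment.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/*
 * Compact fds[] in place, dropping every entry whose revents has any
 * bit of revents_mask set. Returns the number of entries kept.
 * Hypothetical helper for illustration only.
 */
static size_t demo_filter_pollfd(struct pollfd *fds, size_t nr,
				 short revents_mask)
{
	size_t kept = 0;

	assert(fds != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (fds[i].revents & revents_mask) {
			/* A real tool would drop its mmap reference here. */
			continue;
		}
		if (i != kept)
			fds[kept] = fds[i];
		kept++;
	}
	return kept;
}
```

A caller would run this after each poll() with a mask such as POLLHUP | POLLERR and stop polling once the returned count reaches zero.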
+8 −0
@@ -51,6 +51,14 @@
	 ARCH_PERFMON_EVENTSEL_EDGE  |	\
	 ARCH_PERFMON_EVENTSEL_INV   |	\
	 ARCH_PERFMON_EVENTSEL_CMASK)
#define X86_ALL_EVENT_FLAGS  			\
	(ARCH_PERFMON_EVENTSEL_EDGE |  		\
	 ARCH_PERFMON_EVENTSEL_INV | 		\
	 ARCH_PERFMON_EVENTSEL_CMASK | 		\
	 ARCH_PERFMON_EVENTSEL_ANY | 		\
	 ARCH_PERFMON_EVENTSEL_PIN_CONTROL | 	\
	 HSW_IN_TX | 				\
	 HSW_IN_TX_CHECKPOINTED)
#define AMD64_RAW_EVENT_MASK		\
	(X86_RAW_EVENT_MASK          |  \
	 AMD64_EVENTSEL_EVENT)
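
To illustrate what adding X86_ALL_EVENT_FLAGS to a constraint mask changes, here is a small sketch. It assumes a match rule of the form (config & cmask) == code and uses the architectural bit positions of the event-select fields. All DEMO_ and demo_ names are hypothetical stand-ins, not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Event-select field layout (stand-ins for the kernel macros). */
#define DEMO_EVENT        0x000000FFULL
#define DEMO_UMASK        0x0000FF00ULL
#define DEMO_EDGE         (1ULL << 18)
#define DEMO_PIN_CONTROL  (1ULL << 19)
#define DEMO_ANY          (1ULL << 21)
#define DEMO_INV          (1ULL << 23)
#define DEMO_CMASK        0xFF000000ULL
#define DEMO_IN_TX        (1ULL << 32)
#define DEMO_IN_TX_CP     (1ULL << 33)

#define DEMO_ARCH_EVENT_MASK (DEMO_EVENT | DEMO_UMASK)
#define DEMO_ALL_EVENT_FLAGS					\
	(DEMO_EDGE | DEMO_INV | DEMO_CMASK | DEMO_ANY |	\
	 DEMO_PIN_CONTROL | DEMO_IN_TX | DEMO_IN_TX_CP)

struct demo_constraint {
	uint64_t code;	/* value the masked config must equal */
	uint64_t cmask;	/* which config bits take part in the match */
};

/* Assumed match rule: compare only the bits selected by cmask. */
static bool demo_constraint_match(const struct demo_constraint *c,
				  uint64_t config)
{
	return (config & c->cmask) == c->code;
}
```

With a mask of event and umask only, a config that also sets the INV bit still matches. Once the flag bits are part of the mask, the same config no longer matches, so the constraint applies only to the unmodified event.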
+3 −1
@@ -39,7 +39,9 @@ obj-$(CONFIG_CPU_SUP_AMD) += perf_event_amd_iommu.o
endif
obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_knc.o perf_event_p4.o
obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_uncore.o perf_event_intel_rapl.o
obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_uncore.o perf_event_intel_uncore_snb.o
obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_uncore_snbep.o perf_event_intel_uncore_nhmex.o
obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_rapl.o
endif


+12 −2
@@ -243,7 +243,8 @@ static bool check_hw_exists(void)

msr_fail:
	printk(KERN_CONT "Broken PMU hardware detected, using software events only.\n");
	printk(KERN_ERR "Failed to access perfctr msr (MSR %x is %Lx)\n", reg, val_new);
	printk(boot_cpu_has(X86_FEATURE_HYPERVISOR) ? KERN_INFO : KERN_ERR
	       "Failed to access perfctr msr (MSR %x is %Lx)\n", reg, val_new);

	return false;
}
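
A note on how the two-line printk() call in the hunk above parses. Adjacent string literals are concatenated before the conditional operator is evaluated, so the message text attaches to the KERN_ERR operand only. The sketch below reproduces the shape of that expression with stand-in prefixes. DEMO_INFO, DEMO_ERR and the demo_ functions are hypothetical, and the per-branch variant is just one possible way to keep the message in both cases.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-ins for kernel log-level prefix strings. */
#define DEMO_INFO "<6>"
#define DEMO_ERR  "<3>"

/* Same shape as the hunk: the literal binds to DEMO_ERR only. */
static const char *demo_fmt_as_written(bool hypervisor)
{
	return hypervisor ? DEMO_INFO : DEMO_ERR "Failed to access perfctr msr\n";
}

/* Variant that repeats the message text in each branch. */
static const char *demo_fmt_per_branch(bool hypervisor)
{
	return hypervisor ? DEMO_INFO "Failed to access perfctr msr\n"
			  : DEMO_ERR  "Failed to access perfctr msr\n";
}

/* True if anything follows the three-character prefix. */
static bool demo_has_message(const char *fmt)
{
	return strlen(fmt) > 3;
}
```

In the as-written form, the hypervisor case yields only the prefix, with no message text after it.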
@@ -387,7 +388,7 @@ int x86_pmu_hw_config(struct perf_event *event)
			precise++;

			/* Support for IP fixup */
			if (x86_pmu.lbr_nr)
			if (x86_pmu.lbr_nr || x86_pmu.intel_cap.pebs_format >= 2)
				precise++;
		}

@@ -443,6 +444,12 @@ int x86_pmu_hw_config(struct perf_event *event)
	if (event->attr.type == PERF_TYPE_RAW)
		event->hw.config |= event->attr.config & X86_RAW_EVENT_MASK;

	if (event->attr.sample_period && x86_pmu.limit_period) {
		if (x86_pmu.limit_period(event, event->attr.sample_period) >
				event->attr.sample_period)
			return -EINVAL;
	}

	return x86_setup_perfctr(event);
}

@@ -980,6 +987,9 @@ int x86_perf_event_set_period(struct perf_event *event)
	if (left > x86_pmu.max_period)
		left = x86_pmu.max_period;

	if (x86_pmu.limit_period)
		left = x86_pmu.limit_period(event, left);

	per_cpu(pmc_prev_left[idx], smp_processor_id()) = left;

	/*
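
The hunks above use the limit_period callback in two ways: x86_pmu_hw_config() rejects a requested sample period that the callback would raise, and x86_perf_event_set_period() passes the remaining count through the callback before programming it. The sketch below models both call sites. It uses a hypothetical limiter with the same arithmetic as the Broadwell callback added later in this commit (minimum 128, low six bits cleared), but leaves out the event check, so it applies to every period. All demo_ names are assumptions for illustration.

```c
#include <stddef.h>

typedef unsigned (*demo_limit_fn)(unsigned period);

/* Hypothetical limiter: at least 128, low six bits cleared. */
static unsigned demo_limit_period(unsigned left)
{
	if (left < 128)
		left = 128;
	left &= ~0x3fu;
	return left;
}

/* Event-creation check: fail if the limiter would raise the period. */
static int demo_validate_period(demo_limit_fn limit, unsigned period)
{
	if (period && limit && limit(period) > period)
		return -1;	/* the kernel hunk returns -EINVAL here */
	return 0;
}

/* Reprogramming path: apply the limiter to the remaining count. */
static unsigned demo_program_period(demo_limit_fn limit, unsigned left)
{
	return limit ? limit(left) : left;
}
```

Under these assumptions a requested period of 100 is rejected, because the limiter would raise it to 128. A period of 200 is accepted and later programmed as 192, since rounding down never exceeds the request.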
+43 −6
@@ -67,8 +67,10 @@ struct event_constraint {
 */
#define PERF_X86_EVENT_PEBS_LDLAT	0x1 /* ld+ldlat data address sampling */
#define PERF_X86_EVENT_PEBS_ST		0x2 /* st data address sampling */
#define PERF_X86_EVENT_PEBS_ST_HSW	0x4 /* haswell style st data sampling */
#define PERF_X86_EVENT_PEBS_ST_HSW	0x4 /* haswell style datala, store */
#define PERF_X86_EVENT_COMMITTED	0x8 /* event passed commit_txn */
#define PERF_X86_EVENT_PEBS_LD_HSW	0x10 /* haswell style datala, load */
#define PERF_X86_EVENT_PEBS_NA_HSW	0x20 /* haswell style datala, unknown */

struct amd_nb {
	int nb_id;  /* NorthBridge id */
@@ -252,18 +254,52 @@ struct cpu_hw_events {
	EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK)

#define INTEL_PLD_CONSTRAINT(c, n)	\
	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
			   HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LDLAT)

#define INTEL_PST_CONSTRAINT(c, n)	\
	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST)

/* DataLA version of store sampling without extra enable bit. */
#define INTEL_PST_HSW_CONSTRAINT(c, n)	\
	__EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK, \
/* Event constraint, but match on all event flags too. */
#define INTEL_FLAGS_EVENT_CONSTRAINT(c, n) \
	EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS)

/* Check only flags, but allow all event/umask */
#define INTEL_ALL_EVENT_CONSTRAINT(code, n)	\
	EVENT_CONSTRAINT(code, n, X86_ALL_EVENT_FLAGS)

/* Check flags and event code, and set the HSW store flag */
#define INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_ST(code, n) \
	__EVENT_CONSTRAINT(code, n, 			\
			  ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS, \
			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST_HSW)

/* Check flags and event code, and set the HSW load flag */
#define INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD(code, n) \
	__EVENT_CONSTRAINT(code, n, 			\
			  ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS, \
			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LD_HSW)

/* Check flags and event code/umask, and set the HSW store flag */
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(code, n) \
	__EVENT_CONSTRAINT(code, n, 			\
			  INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_ST_HSW)

/* Check flags and event code/umask, and set the HSW load flag */
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(code, n) \
	__EVENT_CONSTRAINT(code, n, 			\
			  INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS, \
			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LD_HSW)

/* Check flags and event code/umask, and set the HSW N/A flag */
#define INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_NA(code, n) \
	__EVENT_CONSTRAINT(code, n, 			\
			  INTEL_ARCH_EVENT_MASK|INTEL_ARCH_EVENT_MASK, \
			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_NA_HSW)


/*
 * We define the end marker as having a weight of -1
 * to enable blacklisting of events using a counter bitmask
@@ -409,6 +445,7 @@ struct x86_pmu {
	struct x86_pmu_quirk *quirks;
	int		perfctr_second_write;
	bool		late_ack;
	unsigned	(*limit_period)(struct perf_event *event, unsigned l);

	/*
	 * sysfs attrs
+199 −30
@@ -220,6 +220,15 @@ static struct event_constraint intel_hsw_event_constraints[] = {
	EVENT_CONSTRAINT_END
};

static struct event_constraint intel_bdw_event_constraints[] = {
	FIXED_EVENT_CONSTRAINT(0x00c0, 0),	/* INST_RETIRED.ANY */
	FIXED_EVENT_CONSTRAINT(0x003c, 1),	/* CPU_CLK_UNHALTED.CORE */
	FIXED_EVENT_CONSTRAINT(0x0300, 2),	/* CPU_CLK_UNHALTED.REF */
	INTEL_UEVENT_CONSTRAINT(0x148, 0x4),	/* L1D_PEND_MISS.PENDING */
	INTEL_EVENT_CONSTRAINT(0xa3, 0x4),	/* CYCLE_ACTIVITY.* */
	EVENT_CONSTRAINT_END
};

static u64 intel_pmu_event_map(int hw_event)
{
	return intel_perfmon_event_map[hw_event];
@@ -415,6 +424,126 @@ static __initconst const u64 snb_hw_cache_event_ids

};

static __initconst const u64 hsw_hw_cache_event_ids
				[PERF_COUNT_HW_CACHE_MAX]
				[PERF_COUNT_HW_CACHE_OP_MAX]
				[PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
 [ C(L1D ) ] = {
	[ C(OP_READ) ] = {
		[ C(RESULT_ACCESS) ] = 0x81d0, 	/* MEM_UOPS_RETIRED.ALL_LOADS */
		[ C(RESULT_MISS)   ] = 0x151, 	/* L1D.REPLACEMENT */
	},
	[ C(OP_WRITE) ] = {
		[ C(RESULT_ACCESS) ] = 0x82d0, 	/* MEM_UOPS_RETIRED.ALL_STORES */
		[ C(RESULT_MISS)   ] = 0x0,
	},
	[ C(OP_PREFETCH) ] = {
		[ C(RESULT_ACCESS) ] = 0x0,
		[ C(RESULT_MISS)   ] = 0x0,
	},
 },
 [ C(L1I ) ] = {
	[ C(OP_READ) ] = {
		[ C(RESULT_ACCESS) ] = 0x0,
		[ C(RESULT_MISS)   ] = 0x280, 	/* ICACHE.MISSES */
	},
	[ C(OP_WRITE) ] = {
		[ C(RESULT_ACCESS) ] = -1,
		[ C(RESULT_MISS)   ] = -1,
	},
	[ C(OP_PREFETCH) ] = {
		[ C(RESULT_ACCESS) ] = 0x0,
		[ C(RESULT_MISS)   ] = 0x0,
	},
 },
 [ C(LL  ) ] = {
	[ C(OP_READ) ] = {
		/* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */
		[ C(RESULT_ACCESS) ] = 0x1b7,
		/* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE|
                   L3_MISS|ANY_SNOOP */
		[ C(RESULT_MISS)   ] = 0x1b7,
	},
	[ C(OP_WRITE) ] = {
		[ C(RESULT_ACCESS) ] = 0x1b7, 	/* OFFCORE_RESPONSE:ALL_RFO */
		/* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */
		[ C(RESULT_MISS)   ] = 0x1b7,
	},
	[ C(OP_PREFETCH) ] = {
		[ C(RESULT_ACCESS) ] = 0x0,
		[ C(RESULT_MISS)   ] = 0x0,
	},
 },
 [ C(DTLB) ] = {
	[ C(OP_READ) ] = {
		[ C(RESULT_ACCESS) ] = 0x81d0, 	/* MEM_UOPS_RETIRED.ALL_LOADS */
		[ C(RESULT_MISS)   ] = 0x108, 	/* DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK */
	},
	[ C(OP_WRITE) ] = {
		[ C(RESULT_ACCESS) ] = 0x82d0, 	/* MEM_UOPS_RETIRED.ALL_STORES */
		[ C(RESULT_MISS)   ] = 0x149, 	/* DTLB_STORE_MISSES.MISS_CAUSES_A_WALK */
	},
	[ C(OP_PREFETCH) ] = {
		[ C(RESULT_ACCESS) ] = 0x0,
		[ C(RESULT_MISS)   ] = 0x0,
	},
 },
 [ C(ITLB) ] = {
	[ C(OP_READ) ] = {
		[ C(RESULT_ACCESS) ] = 0x6085, 	/* ITLB_MISSES.STLB_HIT */
		[ C(RESULT_MISS)   ] = 0x185, 	/* ITLB_MISSES.MISS_CAUSES_A_WALK */
	},
	[ C(OP_WRITE) ] = {
		[ C(RESULT_ACCESS) ] = -1,
		[ C(RESULT_MISS)   ] = -1,
	},
	[ C(OP_PREFETCH) ] = {
		[ C(RESULT_ACCESS) ] = -1,
		[ C(RESULT_MISS)   ] = -1,
	},
 },
 [ C(BPU ) ] = {
	[ C(OP_READ) ] = {
		[ C(RESULT_ACCESS) ] = 0xc4, 	/* BR_INST_RETIRED.ALL_BRANCHES */
		[ C(RESULT_MISS)   ] = 0xc5, 	/* BR_MISP_RETIRED.ALL_BRANCHES */
	},
	[ C(OP_WRITE) ] = {
		[ C(RESULT_ACCESS) ] = -1,
		[ C(RESULT_MISS)   ] = -1,
	},
	[ C(OP_PREFETCH) ] = {
		[ C(RESULT_ACCESS) ] = -1,
		[ C(RESULT_MISS)   ] = -1,
	},
 },
};

static __initconst const u64 hsw_hw_cache_extra_regs
				[PERF_COUNT_HW_CACHE_MAX]
				[PERF_COUNT_HW_CACHE_OP_MAX]
				[PERF_COUNT_HW_CACHE_RESULT_MAX] =
{
 [ C(LL  ) ] = {
	[ C(OP_READ) ] = {
		/* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */
		[ C(RESULT_ACCESS) ] = 0x2d5,
		/* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE|
                   L3_MISS|ANY_SNOOP */
		[ C(RESULT_MISS)   ] = 0x3fbc0202d5ull,
	},
	[ C(OP_WRITE) ] = {
		[ C(RESULT_ACCESS) ] = 0x122, 	/* OFFCORE_RESPONSE:ALL_RFO */
		/* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */
		[ C(RESULT_MISS)   ] = 0x3fbc020122ull,
	},
	[ C(OP_PREFETCH) ] = {
		[ C(RESULT_ACCESS) ] = 0x0,
		[ C(RESULT_MISS)   ] = 0x0,
	},
 },
};

static __initconst const u64 westmere_hw_cache_event_ids
				[PERF_COUNT_HW_CACHE_MAX]
				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -1905,6 +2034,24 @@ hsw_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event)
	return c;
}

/*
 * Broadwell:
 * The INST_RETIRED.ALL period always needs to have lowest
 * 6bits cleared (BDM57). It shall not use a period smaller
 * than 100 (BDM11). We combine the two to enforce
 * a min-period of 128.
 */
static unsigned bdw_limit_period(struct perf_event *event, unsigned left)
{
	if ((event->hw.config & INTEL_ARCH_EVENT_MASK) ==
			X86_CONFIG(.event=0xc0, .umask=0x01)) {
		if (left < 128)
			left = 128;
		left &= ~0x3fu;
	}
	return left;
}

PMU_FORMAT_ATTR(event,	"config:0-7"	);
PMU_FORMAT_ATTR(umask,	"config:8-15"	);
PMU_FORMAT_ATTR(edge,	"config:18"	);
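
The format attributes above place the event code in config bits 0-7, the unit mask in bits 8-15 and the edge flag in bit 18. The raw values in the hsw_hw_cache_event_ids table earlier in this file follow the same packing. A small decoder makes that concrete. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct demo_raw_event {
	uint8_t event;	/* config bits 0-7  */
	uint8_t umask;	/* config bits 8-15 */
	int     edge;	/* config bit 18    */
};

/* Split a raw config value into the fields named by the format attrs. */
static struct demo_raw_event demo_decode_config(uint64_t config)
{
	struct demo_raw_event ev;

	ev.event = (uint8_t)(config & 0xff);
	ev.umask = (uint8_t)((config >> 8) & 0xff);
	ev.edge  = (int)((config >> 18) & 1);
	return ev;
}
```

For example, the table entry 0x81d0 decodes to event 0xd0 with umask 0x81, and 0x151 decodes to event 0x51 with umask 0x01.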
@@ -2367,15 +2514,15 @@ __init int intel_pmu_init(void)
	 * Install the hw-cache-events table:
	 */
	switch (boot_cpu_data.x86_model) {
	case 14: /* 65 nm core solo/duo, "Yonah" */
	case 14: /* 65nm Core "Yonah" */
		pr_cont("Core events, ");
		break;

	case 15: /* original 65 nm celeron/pentium/core2/xeon, "Merom"/"Conroe" */
	case 15: /* 65nm Core2 "Merom"          */
		x86_add_quirk(intel_clovertown_quirk);
	case 22: /* single-core 65 nm celeron/core2solo "Merom-L"/"Conroe-L" */
	case 23: /* current 45 nm celeron/core2/xeon "Penryn"/"Wolfdale" */
	case 29: /* six-core 45 nm xeon "Dunnington" */
	case 22: /* 65nm Core2 "Merom-L"        */
	case 23: /* 45nm Core2 "Penryn"         */
	case 29: /* 45nm Core2 "Dunnington (MP) */
		memcpy(hw_cache_event_ids, core2_hw_cache_event_ids,
		       sizeof(hw_cache_event_ids));

@@ -2386,9 +2533,9 @@ __init int intel_pmu_init(void)
		pr_cont("Core2 events, ");
		break;

	case 26: /* 45 nm nehalem, "Bloomfield" */
	case 30: /* 45 nm nehalem, "Lynnfield" */
	case 46: /* 45 nm nehalem-ex, "Beckton" */
	case 30: /* 45nm Nehalem    */
	case 26: /* 45nm Nehalem-EP */
	case 46: /* 45nm Nehalem-EX */
		memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
		       sizeof(hw_cache_event_ids));
		memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
@@ -2415,11 +2562,11 @@ __init int intel_pmu_init(void)
		pr_cont("Nehalem events, ");
		break;

	case 28: /* Atom */
	case 38: /* Lincroft */
	case 39: /* Penwell */
	case 53: /* Cloverview */
	case 54: /* Cedarview */
	case 28: /* 45nm Atom "Pineview"   */
	case 38: /* 45nm Atom "Lincroft"   */
	case 39: /* 32nm Atom "Penwell"    */
	case 53: /* 32nm Atom "Cloverview" */
	case 54: /* 32nm Atom "Cedarview"  */
		memcpy(hw_cache_event_ids, atom_hw_cache_event_ids,
		       sizeof(hw_cache_event_ids));

@@ -2430,8 +2577,8 @@ __init int intel_pmu_init(void)
		pr_cont("Atom events, ");
		break;

	case 55: /* Atom 22nm "Silvermont" */
	case 77: /* Avoton "Silvermont" */
	case 55: /* 22nm Atom "Silvermont"                */
	case 77: /* 22nm Atom "Silvermont Avoton/Rangely" */
		memcpy(hw_cache_event_ids, slm_hw_cache_event_ids,
			sizeof(hw_cache_event_ids));
		memcpy(hw_cache_extra_regs, slm_hw_cache_extra_regs,
@@ -2446,9 +2593,9 @@ __init int intel_pmu_init(void)
		pr_cont("Silvermont events, ");
		break;

	case 37: /* 32 nm nehalem, "Clarkdale" */
	case 44: /* 32 nm nehalem, "Gulftown" */
	case 47: /* 32 nm Xeon E7 */
	case 37: /* 32nm Westmere    */
	case 44: /* 32nm Westmere-EP */
	case 47: /* 32nm Westmere-EX */
		memcpy(hw_cache_event_ids, westmere_hw_cache_event_ids,
		       sizeof(hw_cache_event_ids));
		memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
@@ -2474,8 +2621,8 @@ __init int intel_pmu_init(void)
		pr_cont("Westmere events, ");
		break;

	case 42: /* SandyBridge */
	case 45: /* SandyBridge, "Romely-EP" */
	case 42: /* 32nm SandyBridge         */
	case 45: /* 32nm SandyBridge-E/EN/EP */
		x86_add_quirk(intel_sandybridge_quirk);
		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
		       sizeof(hw_cache_event_ids));
@@ -2506,8 +2653,9 @@ __init int intel_pmu_init(void)

		pr_cont("SandyBridge events, ");
		break;
	case 58: /* IvyBridge */
	case 62: /* IvyBridge EP */

	case 58: /* 22nm IvyBridge       */
	case 62: /* 22nm IvyBridge-EP/EX */
		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids,
		       sizeof(hw_cache_event_ids));
		/* dTLB-load-misses on IVB is different than SNB */
@@ -2539,20 +2687,19 @@ __init int intel_pmu_init(void)
		break;


	case 60: /* Haswell Client */
	case 70:
	case 71:
	case 63:
	case 69:
	case 60: /* 22nm Haswell Core */
	case 63: /* 22nm Haswell Server */
	case 69: /* 22nm Haswell ULT */
	case 70: /* 22nm Haswell + GT3e (Intel Iris Pro graphics) */
		x86_pmu.late_ack = true;
		memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, sizeof(hw_cache_event_ids));
		memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
		memcpy(hw_cache_event_ids, hsw_hw_cache_event_ids, sizeof(hw_cache_event_ids));
		memcpy(hw_cache_extra_regs, hsw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));

		intel_pmu_lbr_init_snb();

		x86_pmu.event_constraints = intel_hsw_event_constraints;
		x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
		x86_pmu.extra_regs = intel_snb_extra_regs;
		x86_pmu.extra_regs = intel_snbep_extra_regs;
		x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
		/* all extra regs are per-cpu when HT is on */
		x86_pmu.er_flags |= ERF_HAS_RSP_1;
@@ -2565,6 +2712,28 @@ __init int intel_pmu_init(void)
		pr_cont("Haswell events, ");
		break;

	case 61: /* 14nm Broadwell Core-M */
		x86_pmu.late_ack = true;
		memcpy(hw_cache_event_ids, hsw_hw_cache_event_ids, sizeof(hw_cache_event_ids));
		memcpy(hw_cache_extra_regs, hsw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));

		intel_pmu_lbr_init_snb();

		x86_pmu.event_constraints = intel_bdw_event_constraints;
		x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints;
		x86_pmu.extra_regs = intel_snbep_extra_regs;
		x86_pmu.pebs_aliases = intel_pebs_aliases_snb;
		/* all extra regs are per-cpu when HT is on */
		x86_pmu.er_flags |= ERF_HAS_RSP_1;
		x86_pmu.er_flags |= ERF_NO_HT_SHARING;

		x86_pmu.hw_config = hsw_hw_config;
		x86_pmu.get_event_constraints = hsw_get_event_constraints;
		x86_pmu.cpu_events = hsw_events_attrs;
		x86_pmu.limit_period = bdw_limit_period;
		pr_cont("Broadwell events, ");
		break;

	default:
		switch (x86_pmu.version) {
		case 1: