BTB size for Haswell, Sandy Bridge, Ivy Bridge and Skylake?

Question

Is there any way to determine, or any resource where I can find, the size of the branch target buffer (BTB) for Intel Haswell, Sandy Bridge, Ivy Bridge and Skylake processors?

branch-prediction x86 cpu intel cpu-architecture

samira Jul 21 '16 at 19:33

1 answer

osgx · Accepted Answer · 2016-07-21T19:47:04+0000

Check out the software optimization resources from Agner Fog, http://www.agner.org/optimize/

BTB details should be in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf

3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge
BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem, according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels, as in Nehalem. It can handle a maximum of four call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than three branch instructions per 16 bytes of code.
3.8 Branch prediction in Intel Haswell, Broadwell and Skylake
BTB organization. The organization of the branch target buffer is unknown. It appears to be rather big.
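
The Sandy Bridge limits quoted above are expressed per 16 bytes of code, so one quick way to look for trouble spots is to count branches per aligned 16-byte block. The following is only a sketch: the function names are hypothetical, and it assumes you already have the branch instruction addresses (for example from a disassembly) and that a branch is attributed to the block containing its first byte, which the guide excerpt does not specify.

```python
from collections import Counter

BLOCK_SIZE = 16  # bytes; the granularity used in the quoted guide text


def branches_per_block(branch_addresses, block_size=BLOCK_SIZE):
    """Count branch instructions in each aligned block of block_size bytes.

    Assumption: each branch is attributed to the block holding its first byte.
    """
    return Counter(addr // block_size for addr in branch_addresses)


def dense_blocks(branch_addresses, limit=3, block_size=BLOCK_SIZE):
    """Return start addresses of aligned blocks with more than `limit` branches."""
    counts = branches_per_block(branch_addresses, block_size)
    return sorted(block * block_size for block, n in counts.items() if n > limit)


if __name__ == "__main__":
    # Hypothetical addresses: four branches packed into one 16-byte block.
    addrs = [0x1000, 0x1002, 0x1004, 0x1006, 0x1010]
    print([hex(a) for a in dense_blocks(addrs)])
```

Blocks reported by `dense_blocks` are candidates for padding or reordering; whether that actually helps has to be confirmed by measurement.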

Intel may describe some of this in the Intel 64 and IA-32 Architectures Optimization Reference Manual, http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html, around section "3.4.1 Branch Prediction Optimization", but it still gives no sizes.

It may seem strange, but there was no BTB information in cpuid in 1998-2000: http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (Gerald Heim, University of Tübingen, Germany). It is still not listed in http://www.felixcloutier.com/x86/CPUID.html or in public materials from Intel employees. The comment in tstcpuid.c says:

 * This table describes the possible cache and TLB configurations
 * as documented by Intel. For now AMD doesn't use this but gives
 * exact cache layout data on CPUID 0x8000000x.
 *
 * MAX_CACHE_FEATURES_ITERATIONS limits the possible cache information
 * to 80 bytes (of which 16 bytes are used in generic Pentii2).
 * With 80 possible caches we are on the safe side for one or two years.
 *
 * Strange enough no BHT, BTB or return stack data is given this way...

There should be some performance monitoring counters (PMU events) for the BTB, and there are experiments that derive the BTB size by running special test programs. Check out http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt:

Conclusions
From these results, it seems Ivy Bridge (and thus probably Sandy Bridge) uses mostly the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split into 1024 sets of 4 ways.
For Haswell, it seems a new approach for determining the sets has been taken, along with a new approach to evicting entries.
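
To illustrate why such experiments can reveal the table size, here is a small simulation of a 1024-set, 4-way buffer like the one reported for Ivy Bridge. It is a sketch under loud assumptions, not a model of the real hardware: the class and function names are hypothetical, the set index is assumed to come from address bits 4 and up, and replacement is assumed to be LRU. The real indexing and eviction policies are exactly what the linked posts try to infer.

```python
from collections import OrderedDict


class SetAssociativeBTB:
    """Toy set-associative BTB with LRU replacement (assumed, not Intel's policy)."""

    def __init__(self, num_sets=1024, ways=4, index_shift=4):
        self.num_sets = num_sets
        self.ways = ways
        self.index_shift = index_shift  # assumption: index uses address bits 4 and up
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def set_index(self, address):
        return (address >> self.index_shift) % self.num_sets

    def lookup(self, address):
        """Return True on a hit; on a miss, insert the entry and evict the LRU one."""
        entries = self.sets[self.set_index(address)]
        if address in entries:
            entries.move_to_end(address)  # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict least recently used
        entries[address] = True
        return False


def miss_rate(btb, num_branches, spacing, passes=4):
    """Walk a chain of branches repeatedly; report the miss rate of the last pass."""
    addresses = [i * spacing for i in range(num_branches)]
    for _ in range(passes - 1):           # warm-up passes
        for addr in addresses:
            btb.lookup(addr)
    misses = sum(not btb.lookup(addr) for addr in addresses)
    return misses / num_branches


if __name__ == "__main__":
    for n in (1024, 4096, 4097, 8192):
        print(n, miss_rate(SetAssociativeBTB(), n, spacing=16))
```

In this toy model the miss rate stays at zero up to 4096 branches spaced 16 bytes apart and climbs once the chain exceeds the capacity; a real test looks for the same kind of knee in measured performance-counter values while varying the number and spacing of branches.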

See also more of his posts on branch prediction and its performance events:

http://xania.org/201602/bpu-part-one Static branch prediction on newer Intel processors
http://xania.org/201602/bpu-part-two Branch prediction, part two
http://xania.org/201602/bpu-part-three The BTB in contemporary Intel chips
http://xania.org/201602/bpu-part-four Branch Target Buffer, part 2

His code, based on Agner Fog's tests, is publicly available: https://github.com/mattgodbolt/agner (see https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py and https://github.com/mattgodbolt/agner/blob/master/tests/branch.py)
