
The GTX 970 has an effective 224-bit bus and can only use 3.5 GB efficiently

 

This morning we reported on a memory problem affecting every NVIDIA GTX 970: when the card needs more than 3.5 GB of VRAM, it suffers a substantial performance penalty.

NVIDIA's initial explanation was that the GM204 chip used in these cards has three fewer SMMs than the GTX 980 (the full GM204), and that the memory therefore had to be partitioned into two segments of 3.5 GB and 512 MB, with efficient access only to the first 3.5 GB. At the time, our speculation pointed to the removal of the L1 cache blocks belonging to the three disabled SMMs, which left us with 7 blocks. However, we were wrong: the problem is both much simpler and more serious.

As Hardware.fr reports, the specifications NVIDIA published for the GTX 970 are incorrect: despite the announcement that this GPU had 64 ROPs (Render Output Units), 8 of them are actually disabled, which leaves only 56.

Because the ROPs are tied to the L2 cache, disabling 8 of these raster units, each with 32 KB of associated L2 cache, also removes 256 KB of the 2,048 KB of L2 cache that NVIDIA claimed for the GTX 970. The actual amount of L2 cache on the GTX 970 is therefore 1,792 KB.
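The ROP and L2 cache arithmetic above can be checked with a short Python sketch. The figures are the ones quoted in this article; the constant and function names are purely illustrative and do not correspond to any NVIDIA tool or API.

```python
# Figures as stated in the article; all names here are illustrative.
ADVERTISED_ROPS = 64   # ROP count originally announced
DISABLED_ROPS = 8      # ROPs reported as disabled
L2_PER_ROP_KB = 32     # L2 cache associated with each ROP


def active_rops(advertised: int = ADVERTISED_ROPS,
                disabled: int = DISABLED_ROPS) -> int:
    """Number of ROPs that are actually enabled."""
    return advertised - disabled


def l2_cache_kb(rops: int, per_rop_kb: int = L2_PER_ROP_KB) -> int:
    """Total L2 cache implied by a given ROP count."""
    return rops * per_rop_kb


if __name__ == "__main__":
    rops = active_rops()
    print(f"Active ROPs:   {rops}")
    print(f"Advertised L2: {l2_cache_kb(ADVERTISED_ROPS)} KB")
    print(f"Actual L2:     {l2_cache_kb(rops)} KB")
    print(f"Lost L2:       {l2_cache_kb(DISABLED_ROPS)} KB")
```

Running it should reproduce the 56 ROPs, the 2,048 KB advertised and the 1,792 KB actually available.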

In the Maxwell architecture, the L2 cache and ROPs are not tied to the general memory controller and memory interface, so the GTX 970 still has a 256-bit physical data bus to access its 4 GB of GDDR5 memory. However, it cannot make use of that entire bus: in practice the card uses 224 bits, not 256. There are 4 bits of bus width for each ROP, and with 56 ROPs that gives 224 bits. Imagine the memory bus as a highway with 256 lanes: vehicles travel on only 224 of them, because nothing is feeding cars into the remaining lanes.
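As a rough check of the bus-width reasoning, the sketch below recomputes the effective width from the article's figures (4 bits per ROP, 56 ROPs, 256-bit physical bus). Applying the same ratio to the 4 GB of memory is an assumption of this sketch, not something NVIDIA has stated, although it does match the 3.5 GB / 512 MB split discussed in this article. All names are hypothetical.

```python
from fractions import Fraction

# Figures quoted in the article; names are illustrative only.
PHYSICAL_BUS_BITS = 256
BITS_PER_ROP = 4
ACTIVE_ROPS = 56
TOTAL_VRAM_MB = 4 * 1024


def effective_bus_bits(rops: int = ACTIVE_ROPS,
                       bits_per_rop: int = BITS_PER_ROP) -> int:
    """Bus width that the enabled ROPs can actually drive."""
    return rops * bits_per_rop


def fast_segment_mb(total_mb: int = TOTAL_VRAM_MB,
                    physical_bits: int = PHYSICAL_BUS_BITS) -> int:
    """Memory reachable at full speed, ASSUMING capacity scales
    with the usable share of the bus (an assumption of this sketch)."""
    share = Fraction(effective_bus_bits(), physical_bits)
    return int(total_mb * share)


if __name__ == "__main__":
    fast = fast_segment_mb()
    print(f"Effective bus: {effective_bus_bits()} bits")
    print(f"Fast segment:  {fast} MB")
    print(f"Slow segment:  {TOTAL_VRAM_MB - fast} MB")
```

The usable share works out to 224/256 = 7/8 of the bus, and 7/8 of 4,096 MB is 3,584 MB, that is, 3.5 GB.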

As a result, 3.5 GB of the GTX 970's memory is connected directly to the L2 cache and its corresponding ROPs, so performance does not suffer while the card stays within that amount. The problem arises when it needs to access the last 512 MB of its 4 GB of VRAM. Those 512 MB cannot be addressed directly from their own L2 cache, so the card has to borrow a section of the existing L2 cache to reach that memory space.

Performance is therefore penalized by having to push more data through that L2 cache than it can optimally handle, and that is when micro-stuttering occurs. NVIDIA appears to prioritize the types of data that most affect performance so that the penalty is smaller, but in the end a performance loss is inevitable whenever more than 3.5 GB of VRAM is required.


We might ask why NVIDIA chose to put 4 GB of memory on a card with these limitations instead of fitting only 3.5 GB and avoiding the problems that come from accessing the remaining 512 MB.

The answer is that when a game requires more than 3.5 GB, we get better performance from those slower 512 MB than from falling back to the system's main memory, which has much lower bandwidth and is further held back by the bandwidth limits of the PCI Express port.
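To give a sense of scale, the sketch below compares theoretical peak bandwidths. None of these inputs appear in the article, so treat them as assumptions: a GDDR5 effective data rate of 7 Gbps per pin, a slow segment served by the remaining 32 bits of the bus (256 minus 224), and a PCI Express 3.0 x16 slot (8 GT/s per lane with 128b/130b encoding). Real-world throughput will be lower in every case.

```python
# All inputs are assumptions for illustration, not figures from the article.
GDDR5_DATA_RATE_GBPS = 7.0   # assumed effective data rate per pin
PCIE3_GT_PER_LANE = 8.0      # PCIe 3.0 signalling rate per lane


def gddr5_bandwidth_gbs(bus_bits: int,
                        data_rate_gbps: float = GDDR5_DATA_RATE_GBPS) -> float:
    """Theoretical peak bandwidth in GB/s for a given bus width."""
    return bus_bits * data_rate_gbps / 8


def pcie3_bandwidth_gbs(lanes: int = 16) -> float:
    """Theoretical peak PCIe 3.0 bandwidth in GB/s, one direction,
    after 128b/130b encoding overhead."""
    raw_gbps = PCIE3_GT_PER_LANE * lanes
    return raw_gbps * 128 / 130 / 8


if __name__ == "__main__":
    print(f"Fast segment (224-bit): {gddr5_bandwidth_gbs(224):.1f} GB/s")
    print(f"Slow segment (32-bit):  {gddr5_bandwidth_gbs(32):.1f} GB/s")
    print(f"PCIe 3.0 x16:           {pcie3_bandwidth_gbs():.2f} GB/s")
```

Under these assumptions, even the slow segment has more peak bandwidth than the PCI Express link that any access to system memory would have to cross, which is consistent with the argument above.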

However, although the GTX 970 will not perform any worse than it has since its launch, there is a gap between the specifications that were advertised and the product customers actually bought, so we will have to wait and see how users react and how NVIDIA responds.
