That’s it: the Wii U has finally been released, and while plenty of players are enjoying new game experiences and interesting Gamepad applications, a smaller but rather vocal contingent of gamers on the forums remains divided over the technical viability of the console for productions other than Nintendo’s. These concerns stem mainly from several ports that were laborious to say the least, from lukewarm comments on the CPU by 4A Games and DICE (aggravated by its disclosed frequency and apparent architectural roots), and from the system’s first teardowns, whose results revealed the type of main memory (RAM) used and disappointed many techies. Let’s dig further into this affair, with exclusive words from developers stating that it’s not an issue.
Not Enough Bandwidth?
To introduce these matters simply, the RAM of the Wii U is data storage that can be accessed faster and more flexibly (without the access-order constraints imposed by mechanical designs) than a disc or a hard drive. It’s a crucial link in the chain between your game’s media and your screen, supplying the data that constitutes the software, such as sounds and graphics, quickly enough to the processors, which then render it on your TV and the Gamepad. This memory matters to both the CPU and the GPU, as its pool is unified.
This RAM has several important parameters, among them the capacity, the latency, and the bandwidth: the amount of data per second read from or written to the memory by other components, calculated by multiplying the width of the interface connecting those parts (the X-bit bus you may have encountered on spec lists) by the frequency at which the data is transferred (see here for more information on the topic). On the first point, the Wii U got 2GB, of which 1GB is set aside for games: twice the 512MB of the Xbox 360 and PS3. Until its next-gen rivals arrive, the Wii U is the console with the most memory, so much so that developers like Ubisoft’s Michel Ancel praised this volume. It’s the same flattering portrait for latency, the main target of Shin’en’s Manfred Linzner’s compliments on the system in our exclusive interview.
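As a quick illustration of that width-times-frequency formula, here is a minimal Python sketch. The `peak_bandwidth_gbps` helper is ours, and the figures are the commonly cited ones for each configuration, not official specifications:

```python
def peak_bandwidth_gbps(bus_width_bits, transfer_rate_mts):
    """Peak bandwidth in GB/s: bus width in bytes times transfers per second.

    Uses marketing gigabytes (1 GB = 10^9 bytes), as spec sheets do.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1_000_000 / 1_000_000_000

# DDR3-1600 on a 64-bit bus (the Wii U configuration revealed by the teardowns):
print(peak_bandwidth_gbps(64, 1600))    # -> 12.8 GB/s

# GDDR3 on a 128-bit bus at 1400 MT/s (Xbox 360 main memory, for comparison):
print(peak_bandwidth_gbps(128, 1400))   # -> 22.4 GB/s
```

The same formula shows why a wider bus or a faster clock scales the peak figure linearly.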
That left the details of the bandwidth, a key specification: if not enough data can be transported between the CPU/GPU and the RAM, the latter’s size would be wasted, and the former would be starved of information to process. It’s the famous water-and-pipes allegory.
Specialized sites such as Anandtech and iFixit studied the guts of North American Wii Us, and the easiest component to analyze was the 2GB pool of RAM, by looking up the chips’ part numbers.
Here are their discoveries:
There are four 4Gb (512MB) Hynix DDR3-1600 devices surrounding the Wii U’s MCM (Multi Chip Module). Memory is shared between the CPU and GPU, and if I’m decoding the DRAM part numbers correctly it looks like these are 16-bit devices giving the Wii U a total of 12.8GB/s of peak memory bandwidth.
Let’s compare this finding to the RAM of other systems (we are excluding “exotic” memory such as eDram here):
This is clearly a nice leap from the Wii’s 5.6GB/s bandwidth, but roughly 43% slower than the Xbox 360 and PS3.
This underperformance on paper against systems released 7 years earlier worries “spec-aware” gamers about the Wii U’s viability: its capacity to receive games built for the other manufacturers’ future consoles. Those will likely, for reasons of chip density and cost, use the same type of memory (DDR3 or the upcoming DDR4) but with a higher frequency and/or a wider interface (the Wii U uses a 64-bit bus, as there are four 16-bit interfaces; the Xbox720 and PS4 may be 128-bit or more), hence at least doubling the bandwidth, which would safely exceed 20GB/s. Specifically, there are apprehensions about the Wii U’s ability to quickly manage CPU and GPU accesses to large amounts of data stored in the RAM, such as detailed textures, essential for technically ambitious games. Some of those involve free roaming in huge spaces, which entails hefty data streaming and transfers. As a result, could this bandwidth turn into an obstacle to the Wii U receiving the next Elder Scrolls or GTAVI without an excessive downscale that would distort the artists’ vision for their project?
Fifty shades of “slow” RAM
Multiple factors, established or hypothetical, nuance the hasty judgment of a sluggish Wii U main memory that one could draw from the teardown’s info:
1 – Generally speaking, although it is a non-negligible parameter, RAM bandwidth is less vital than GPU power or memory amount, especially as the Wii U mostly targets 720p resolution for its content, thus requiring less fillrate and bandwidth than 1080p. The anonymous source involved in this article himself put this criterion into perspective, declaring:
In general all those DRAM numbers are not of much importance. Much more for example are the size and speed of caches for each core in the CPU. Because DRAM is always slow compared to caches speed.
Caches are much faster but far smaller pools of memory than the RAM, where repeatedly accessed data is kept, and our anonymous developer lauded the Wii U CPU’s caches in our little chat. So those caches should ease the slow-RAM issue to a certain degree for the CPU. Then what about the GPU?
2 – The Wii U supposedly includes 32MB of “embedded DRAM”, a costly memory integrated on the same die as the GPU, as on the Wii or Xbox 360 (for the latter, it was on a daughter die, but in the same package as the GPU). The gains of this kind of memory over traditional stand-alone RAM chips are huge in pretty much all areas, such as latency and bandwidth (feasibly reaching XXXGB/s rates). You might think of this eDram as another cache, but unlike the CPU’s, it can be accessed by the GPU. It’s an efficient way to spare the RAM from the heavy bandwidth usage demanded by the image processing the GPU handles, as that traffic will hit this dedicated memory instead.
Here are the motives from Xbox 360 architects behind its adoption:
HD, alpha blending, z-buffering, antialiasing, and HDR pixels take a heavy toll on memory bandwidth. Although more effects are being achieved in the shaders, postprocessing effects still require a large pixel-depth complexity. Also as texture filtering improves, texel fetches can consume large amounts of memory bandwidth, even with complex shaders. One approach to solving this problem is to use a wide external memory interface. This limits the ability to use higher-density memory technology as it becomes available, as well as requiring compression. Unfortunately, any compression technique must be lossless, which means unpredictable—generally no good for game optimization. In addition, the required bandwidth would most likely require using a second memory controller (a circuit that manages the flow of data with the RAM) in the CPU itself, rather than having a unified memory architecture, further reducing system flexibility.
EDRAM was the logical alternative. It has the advantage of completely removing the render target and the z-buffer bandwidth from the main-memory bandwidth equation. In addition, alpha blending and z-buffering are read-modify write processes, which further reduce the efficiency of memory bandwidth consumption. Keeping these processes on-chip means that the remaining high-bandwidth consumers—namely, geometry and texture—are now primarily read processes. Changing the majority of main-memory bandwidth to read requests increases main memory efficiency by reducing wasted memory bus cycles caused by turning around the bidirectional memory buses. (source)
And this eDram plays a central role in the Wii U architecture, greater than on previous platforms: the leaked technical documentation for licensed developers designates it as “MEM1”, while the 2GB of RAM is “MEM2”. Concretely, the eDram can act as an ultra-fast “bridge” between the RAM and the CPU/GPU, indirectly mitigating the hypothetical slowness of the RAM and boosting the overall performance of the memory chain.
However, despite its critical place, its quantity is limited to 32MB, and much of it should already be used for the frame buffer and image processing such as z-buffering or anti-aliasing. Consequently, the space left to serve as the previously mentioned bridge will be more constrained, reinforcing our apprehensions about the sustainability of the console compared to its rivals, which may directly feature faster RAM without having to rely on this approach. They could even conceivably include their fair share of eDram like the Wii U, acting as a similarly quick and versatile memory pool to counter the growing latencies associated with RAM evolution, as demonstrated by this graph.
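To get a sense of how much of those 32MB the frame buffer alone can claim, here is a back-of-envelope calculation. The buffer formats are assumptions for illustration; actual usage depends on the renderer:

```python
# A 720p render target set with assumed 32-bit color and 32-bit depth/stencil.
WIDTH, HEIGHT = 1280, 720
BYTES_PER_PIXEL = 4

color_buffer = WIDTH * HEIGHT * BYTES_PER_PIXEL   # e.g. an RGBA8 color target
depth_buffer = WIDTH * HEIGHT * BYTES_PER_PIXEL   # e.g. 24-bit depth + 8-bit stencil

used_mb = (color_buffer + depth_buffer) / (1024 * 1024)
print(f"{used_mb:.2f} MB")   # ~7.03 MB, before anti-aliasing or extra targets
```

With multisampling or multiple render targets that figure grows quickly, which is why the eDram space left over for the “bridge” role is uncertain.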
In that event, the advantage of the Wii U’s memory hierarchy would be neutralized or even surpassed if, thanks in part to smaller fabrication processes, the Xbox720 and PS4 feature more embedded RAM.
3 – All the numbers we’ve mentioned are theoretical maximum bandwidths. In real conditions, the observed performance will be inferior to what’s advertised, on a few occasions with huge variances, as shown by this table:
This possibly means that the on-paper disparity could be lessened in practical situations. If the real sustained bandwidth is, let’s say, roughly 20% below the marketed peak, then since the Wii U’s peak figure is smaller than the Xbox 360’s to begin with, the gap in absolute numbers will shrink, although the proportional difference (-43%) between the two will stay the same, which is what matters most.
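The arithmetic behind that point can be made explicit. The 20% derating is purely hypothetical, used only to show that a uniform efficiency factor shrinks the absolute gap but leaves the ratio untouched:

```python
wiiu_peak, x360_peak = 12.8, 22.4   # theoretical peaks, GB/s
efficiency = 0.80                   # assumed sustained/peak ratio, same for both

wiiu_real = wiiu_peak * efficiency
x360_real = x360_peak * efficiency

absolute_gap_peak = x360_peak - wiiu_peak    # 9.6 GB/s on paper
absolute_gap_real = x360_real - wiiu_real    # shrinks to ~7.7 GB/s sustained
relative_gap = 1 - wiiu_real / x360_real     # stays ~0.43, i.e. -43%
```

If the efficiency factors differed between the two machines, as the next point suggests they might, the proportional gap itself would move.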
4 – The Wii U’s real RAM bandwidth might be closer to its theoretical peak than the Xbox 360’s and PS3’s are to theirs. This could be explained by a more modern memory controller that better handles the data flow in and out of the RAM.
5 – We must also take into account several intricate concepts regarding buses and the direction of RAM traffic. For starters, the 22.4GB/s bandwidth of the current-gen systems’ buses is often an aggregate rate, shared between reads and writes to the RAM. In the case of the Xbox 360, the CPU doesn’t reach this speed, as its access to the RAM is bound by the FSB, the interface connecting it to the GPU, where the memory controller resides. And this FSB bandwidth is 10.8GB/s for reads and 10.8GB/s for writes.
For that reason, and speaking strictly of CPU transfers with the main memory, it’s possible that the Wii U’s full RAM write or read bandwidth (12.8GB/s) isn’t as disadvantaged as the marketed numbers might suggest, provided the interface between the U-CPU and U-GPU allows this peak rate. As they sit on the same substrate (the Multi Chip Module), that’s expected, although the U-CPU’s bus to the GPU might be split like Xenon’s. Still, since the CPU and GPU share the same bandwidth to the RAM, we’re talking about very specific circumstances here, like processing sequences where the GPU uses little RAM by extensively leveraging the eDram, for example, allowing the CPU to get the most out of the transfer capabilities with the main memory.
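The bottleneck logic here is simply that the narrowest link on the path wins; a tiny sketch with the Xbox 360 figures from above (the function name is ours, and the Wii U case is the hypothetical one just described):

```python
def effective_cpu_read_bw(ram_peak_gbps, fsb_read_gbps):
    """The CPU can never pull data faster than the slowest hop on the path."""
    return min(ram_peak_gbps, fsb_read_gbps)

# Xbox 360: a 22.4 GB/s RAM bus, but a 10.8 GB/s per-direction FSB to the CPU.
print(effective_cpu_read_bw(22.4, 10.8))   # -> 10.8 GB/s

# Hypothetical Wii U case: if the U-CPU's link allows the full 12.8 GB/s,
# its effective ceiling matches the RAM peak.
print(effective_cpu_read_bw(12.8, 12.8))   # -> 12.8 GB/s
```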
Likewise, this RAM bandwidth isn’t only shared by all the components consuming it, it’s also bidirectional, meaning it can be used either for reads or for writes. This characteristic could be combined with the eDram too, which would work as a “scratchpad” to which the heavy write operations are directed, reducing the number of writes to the RAM, which could then devote its full bandwidth potential to data reads. However, this asymmetrical memory organization undoubtedly requires optimization, especially for ports that may have leaned on the greater bandwidth of the Xbox 360’s and PS3’s GDDR3 RAM for concurrent reads and writes.
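A toy model of that scratchpad idea; all the traffic numbers are invented for illustration, and only the 12.8 GB/s total comes from the teardown figures:

```python
RAM_BW = 12.8   # total bidirectional main-memory bandwidth, GB/s

# Hypothetical write traffic hitting main RAM, with and without the eDram
# absorbing render targets and other heavy writes:
writes_without_edram = 4.0
writes_with_edram = 0.5

reads_available_without = RAM_BW - writes_without_edram   # 8.8 GB/s for reads
reads_available_with = RAM_BW - writes_with_edram         # ~12.3 GB/s for reads
```

The point is simply that every write diverted to the eDram frees the same amount of the shared bus for reads.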
6 – The chips are DDR3, a newer standard compared to the GDDR3 of Microsoft’s and Sony’s current systems, which has the same technological foundation as DDR2. This could manifest in practical differences in favor of the Wii U’s RAM depending on the type of game code involved, for example whether it requires many short read/write operations or long transfers with the main memory.
7 – The last assumption is that Nintendo and AMD could have developed a more forward-thinking texture compression method than those of existing platforms, thereby reducing bandwidth needs. This tweet from Two Tribes Games may support this premise.
We could finally add to this list of theories that no negative comments pertaining to memory have surfaced from developers. Quite the contrary: as an illustration, Manfred Linzner insisted on the absence of bottlenecks in the Wii U. A GPU starved of data from the RAM due to limited bandwidth would be a noticed, and surely mentioned, hindrance. Likewise, Nintendo has been known to build balanced systems for a couple of generations now, so it would be strange to select a RAM that, by its nature and the way it’s implemented in the hardware, would constitute a bottleneck for performance.
But these last points are nonetheless just hypotheses, and they won’t reverse the unsatisfactory RAM specifications that have dismayed tech enthusiasts.
Not worth the geeky soap? (discussion with Wii U developers)
Even if this “bandwidth drama” is only debated within confined circles, it shouldn’t be dismissed, especially when gauging the Wii U’s longevity with regard to technically demanding third-party titles. To better comprehend this situation, we had a little chat with a developer (wishing to remain anonymous) who has released a graphically solid retail game on Wii U.
We also talked with Joel Kinnunen, vice-president of Frozenbyte, and while remaining general to respect the NDAs, he stated:
We haven’t really done any proper measurements, our interest is always on the actual game performance which has been good. We didn’t have any major issues otherwise either, obviously some optimizations for the CPU were needed but we did that and now it runs better than the other consoles. The one thing we did not notice until release was the gamma/washed-out look we have/had in the release version – we had the hardware perform an extra and unnecessary gamma correction, we’ve fixed that in the update that’s coming out soonish. But on the topic, we had no issues at all with memory bandwidth on Trine 2: Director’s Cut.
How can we reconcile these reports with the recent teardowns indicating a lackluster RAM bandwidth? Let’s look at the several explanations that come to mind:
1) A thought (or hope) seen several times on boards is that we may lack the full set of RAM specifications, that the teardowns have not revealed everything, and that the theoretical bandwidth could be higher than 12.8GB/s. But that is hard to entertain: the people behind these studies are professionals, after all, and the chips, supplied by at least three companies, conform to a known standard. What could, strangely enough, falsify those first analyses seen across different websites? It’s extremely improbable, so we can safely drop this scenario.
2) The anonymous developer isn’t measuring the same parameter, or is measuring the memory speed at a different spot, as he himself presumed. Could he be talking about the speed of data processed after the intervention of the eDram or the CPU’s caches, which would positively affect his results? Still, it illustrates how the RAM bandwidth isn’t a perceived deterrent at all for the system in this context.
3) The types of game these developers have worked on may involve code that doesn’t require extensive use of the RAM, unlike, for example, the loading of huge textures and detailed environments in an openworld/sandbox title. In this scenario, even if the RAM does have a theoretical bandwidth of 12.8GB/s, there would be no negative impact, because the main bulk of data that the CPU and GPU need quickly would sit in the caches and the fast eDram. For that matter, this could tie into the suppositions brought up earlier, like the bus and RAM direction and the more advanced memory controller. But in that case, to what extent can the eDram stand as the “magical savior” compensating for the relatively slow 1GB of RAM? Should we fear that games that push further visually and in scale, and that require more data to be transferred to the CPU and GPU than this embedded RAM can store, will not be able to rely on this method and will thus meet performance drawbacks on Wii U?
All in all, our anonymous feedback, combined with other developers’ praise, tends to confirm that the Wii U memory hierarchy is cleverly thought out when appropriately used, and even if the RAM bandwidth may be displeasing on a cold spec sheet, it doesn’t translate into a tangible weakness, at least for exclusives and properly done ports. We tried to contact Hynix, one of the manufacturers of the RAM, and their answer, albeit limited by non-disclosure agreements, indeed leans toward a more complex situation than the theoretical rates might imply:
We are truly sorry that we are not able to give the answer. It is so complicated and so many various factors can be made. And under the NDA with our customer, we are not to answer to performance wise inquiries
We can also say that if the eDram’s huge role is established, this organization would be somewhat of an extreme continuation of Nintendo’s habits since the GameCube: multiple pools, one acting as the “performer” with very efficient characteristics (the eDram), and another as the “backup” pool, here bigger but much slower (the RAM). In fact, it’s perhaps one of the causes behind the poor execution of a few ports: some studios might not have had enough time and resources to adapt titles planned around a big RAM space with large bandwidth to the Wii U’s asymmetrical memory organization. The real question then, as tackled in the third scenario, is whether this configuration, and its 32MB performer, will be adequate and sufficient to let third-party next-gen games run acceptably on Wii U.
Nevertheless, one could wonder whether adopting at least a wider bus (128-bit) or a moderately increased frequency would have been such an immense effort for the Kyoto company. It could have ensured more bandwidth without resorting to an approach that perhaps prevents the eDram from showing the true potential a more ambitious main memory might have unlocked. From a technical standpoint, the Wii U increasingly appears to be a system with a particular architecture, designed primarily for Nintendo’s needs, for which it’s imperative to optimize game code to make the most of it. On this point, it regrettably contradicts the first statements around E3 2011, which described a very accessible platform allowing fast and easy porting of software built for the Xbox360 and PS3. Let’s hope further improvements in the development tools (SDK), the middleware employed, and other factors, such as studios adapting to a new/different memory paradigm involving slow main RAM but faster “intermediary memory” like the eDram, will change this situation for the better.
What do techies around the world think of this whole memory affair? We await your comments!
Thanks to NeoGAF, Beyond3D and NES members for their insight.
Contact the author: inquiringmind3[@]gmail.com