AMD’s Manju Hegde is one of the rare folks I get to interact with who has an extensive background working at both AMD and NVIDIA. He was one of the co-founders and CEO of Ageia, a company that originally tried to bring higher quality physics simulation to desktop PCs in the mid-2000s. In 2008, NVIDIA acquired Ageia and Manju went along, becoming NVIDIA’s VP of CUDA Technical Marketing. The CUDA fit was a natural one for Manju, as he had spent the previous three years working on non-graphics workloads for highly parallel processors. Two years later, Manju made his way to AMD to continue his vision for heterogeneous compute work on GPUs. He is currently the Corporate VP of Heterogeneous Applications and Developer Solutions at AMD.

Given what we know about the new AMD and its goal of building a Heterogeneous Systems Architecture (HSA), Manju’s position is quite important. For those of you who don’t remember back to AMD’s 2012 Financial Analyst Day, the formalized AMD strategy is to exploit its GPU advantages on the APU front in as many markets as possible. AMD has a significant GPU performance advantage compared to Intel, but in order to capitalize on that it needs developer support for heterogeneous compute. A major struggle everyone in the GPGPU space faced was enabling applications that took advantage of the incredible horsepower these processors offered. With AMD’s strategy closely married to doing more (but not all, hence the heterogeneous prefix) compute on the GPU, it needs to succeed where others have failed.

The hardware strategy is clear: don’t just build discrete CPUs and GPUs, but instead transition to APUs. This is nothing new as both AMD and Intel were headed in this direction for years. Where AMD sets itself apart is that it is willing to dedicate more transistors to the GPU than Intel. The CPU and GPU are treated almost as equal class citizens on AMD APUs, at least when it comes to die area.

The software strategy is what AMD is working on now. AMD’s Fusion Developer Summit (AFDS), now in its second year, is where developers can go to learn more about AMD’s heterogeneous compute platform and strategy. Why would a developer attend? AMD argues that the speedups offered by heterogeneous compute can be substantial enough to enable new features, usage models or experiences that wouldn’t otherwise be possible. In other words, taking advantage of heterogeneous compute can enable differentiation for a developer.

In advance of this year’s AFDS, Manju agreed to directly answer your questions about heterogeneous compute, where the industry is headed and anything else AMD will be covering at AFDS. Today we have those answers. 

How Fusion and HSA Differ

First and foremost, AMD has been down the GPGPU path before with Fusion. Can you explain how HSA is different?

Existing APIs for GPGPU are not the easiest to use and have not seen widespread adoption by mainstream programmers. In HSA we have taken a look at all the issues in programming GPUs that have hindered mainstream adoption of heterogeneous compute and changed the hardware architecture to address them. In fact, the goal of HSA is to make the GPU in the APU a first-class programmable processor, as easy to program as today's CPUs. In particular, HSA incorporates critical hardware features which accomplish the following:

1. GPU Compute C++ support: This gives heterogeneous compute access to many of the programming constructs that only CPU programmers can use today

2. HSA Memory Management Unit: This allows all system memory to be accessed by either the CPU or the GPU, depending on need. In today's world, only a subset of system memory can be used by the GPU.

3. Unified Address Space for CPU and GPU: The unified address space makes it easier for developers to write applications. Because separate memory pointers for CPU and GPU are no longer required, libraries can simplify their interfaces.

4. GPU uses pageable system memory via CPU pointers: This is the first time the GPU can take advantage of the CPU virtual address space. With pageable system memory, the GPU can reference the data directly in the CPU domain. In all prior generations, data had to be copied between the two spaces or page-locked prior to use.

5. Fully coherent memory between CPU & GPU: This allows data to be cached in the CPU or the GPU and referenced by either. In all previous generations, GPU caches had to be flushed at command buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU share a high speed coherent bus.

6. GPU compute context switch and GPU graphics pre-emption: GPU tasks can be context switched, making the GPU in the APU a multi-tasker. Context switching means faster application, graphics and compute interoperation. Users get a snappier, more interactive experience. As UIs become increasingly touch-focused, it is critical for applications responding to touch input to get access to the GPU with the lowest latency possible, to give users immediate feedback on their interactions. With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Direct access to the hardware for multiple users or multiple applications is either prioritized or equalized.

As a result, HSA is a purpose-designed architecture that enables the software ecosystem to combine and exploit the complementary capabilities of CPUs (sequential programming) and GPUs (parallel processing) to deliver new capabilities to users that go beyond the traditional usage scenarios. It may be the first time a processor company has made such a significant investment primarily to improve ease of programming!

In addition, on an HSA architecture the application codes to the hardware, which enables user-mode queueing, hardware scheduling, much lower dispatch times and reduced memory operations. We eliminate memory copies, reduce dispatch overhead, eliminate unnecessary driver code, eliminate cache flushes, and enable the GPU to be applied to new workloads. We have done extensive analysis on several workloads and have obtained significant performance-per-joule savings for workloads such as face detection, image stabilization, gesture recognition, etc.

Finally, AMD has stated from the beginning that our intention is to make HSA an open standard, and we have been working with several industry partners who share our vision for the industry and share our commitment to making this easy form of heterogeneous computing become prevalent in the industry. While I can't get into specifics at this time, expect to hear more about this in a few weeks at the AMD Fusion Developer Summit (AFDS).

So you see why HSA is different and why we are excited :)

Comments

  • ltcommanderdata - Monday, May 21, 2012 - link

    I just wanted to thank Manju Hegde for taking the time to respond to all those questions. Thank you Anand as well for organizing this. I'm definitely looking forward to the roll-out and adoption of HSA.
  • Lucian Armasu - Monday, May 21, 2012 - link

    That sounds kind of like what ARM wants to do with big.LITTLE and Mali GPUs.
  • CeriseCogburn - Wednesday, May 23, 2012 - link

    The first page sounds to me like AMD is going for Homeland Security dollars.
    " We have done extensive analysis on several workloads and have obtained significant performance per joule savings for workloads such as face detection, image stabilization, gesture recognition etc"

    I see.
    Don't be evil amd.
  • CeriseCogburn - Wednesday, May 23, 2012 - link

    Then I thought maybe they are going for gaming console capabilities - but it appears on page2 they are going for logging on to Windows:

    " OSes are moving towards providing some base functionality in terms of security, voice recognition, face detection, biometrics, gesture recognition, authentication, some core database functionality. All these benefit significantly from the optimizations in HSA described above. With the industry support we are building this should happen in the next few years."

    So, it's probably all three.
  • Loki726 - Monday, May 21, 2012 - link

    I'll need some time to read over all of the responses, but I want to thank Manju Hegde for taking the time to respond in such detail.
  • SanLouBlues - Monday, May 21, 2012 - link

    I'm not sure he understood the question of the guy with the Intel CPU and AMD GPU who wanted GPU accelerated winzip.
  • c0d1f1ed - Monday, May 21, 2012 - link

    None of the tough questions were answered, like how they're hoping to compete against AVX2 and AVX-1024.

    I'm sure HSA is an improvement, but it's not developed in a vacuum. AMD has to be willing to compare it against homogeneous high throughput computing. They won't gain the respect of developers if they dodge questions. It doesn't show much confidence in their own technology.
  • B3an - Monday, May 21, 2012 - link

    Thank you Manju for answering my questions! :) And an interesting article, still reading it...
  • oldguybt - Tuesday, May 22, 2012 - link

    Sounds like this was a surprise for AMD...? As if they only just found out that they have 2x, 3x or 4x more stream processors than Nvidia has CUDA cores.
    Probably some local "hacker" told them that when you try to crack a WPA/WPA2 password in BackTrack, an AMD GPU does it 4 or 5 times faster.
    Well, I think I found that out faster than AMD did :D

    ok, sorry for the stupidity...
    Anyway, AMD is saying their GPUs made for gaming are actually like Tesla, or am I wrong?? :)

    Just buy Nvidia,
