Apple iPad 2 Preview
by Anand Lal Shimpi, Brian Klug & Vivek Gowri on March 12, 2011 6:01 AM EST
I remember the speculation that led up to Apple's iPad launch. The list of things everyone expected the device to do was absurd, and the theories on the architecture behind Apple's first branded SoC were just as fantastic. The simplest answer is sometimes the right one and, as Ars Technica's Jon Stokes pointed out, the A4 was nothing more than a hardened ARM Cortex A8 core running at 1GHz in the iPad (and 800MHz in the iPhone 4).
The Cortex A8 is something we've covered extensively here so I won't go into great detail right now. It's a dual-issue, in-order architecture with a 13 stage integer pipeline and a non-pipelined FPU.
When Apple announced the iPad 2, it also briefly announced the A5 SoC. The only detail given? The A5 is a dual-core processor with a GPU that's 9x faster than what's in the A4.
There are only two recent ARM architectures that have multicore support: the ARM11 and the ARM Cortex A9. The A8 doesn't come in a multicore variant. Given how many other SoC vendors are shipping dual-core Cortex A9 SoCs, the A5 was likely no different than NVIDIA's Tegra 2, TI's OMAP 4 or Samsung's Exynos in that regard: armed with a pair of Cortex A9s running at 1GHz. Update: Geekbench reports clock speed at 900MHz. Update 2: Apple confirms 1GHz clock speed on the iPad 2 specs page.
Architecture Comparison
Specification | ARM11 | ARM Cortex A8 | ARM Cortex A9 | Qualcomm Scorpion
Issue Width | single-issue | dual-issue | dual-issue | dual-issue
Pipeline Depth | 8 stages | 13 stages | 9 stages | 13 stages
Out of Order Execution | N | N | Y | Partial
FPU | Optional VFPv2 (not-pipelined) | VFPv3 (not-pipelined) | Optional VFPv3-D16 (pipelined) | VFPv3 (pipelined)
NEON | N/A | Y (64-bit wide) | Optional MPE (64-bit wide) | Y (128-bit wide)
Process Technology | 90nm | 65nm/45nm | 40nm | 40nm
Typical Clock Speeds | 412MHz | 600MHz/1GHz | 1GHz | 1GHz
The Cortex A9 is similar to the A8 but with an out-of-order execution engine and a shallower pipeline (9 stages). The result is better-than-A8 performance at the same clock speed. The A9 also adds a fully pipelined FPU.
Now it's unclear what the rest of the A5 SoC looks like, but from the CPU standpoint I think it's safe to say that there are a pair of ARM Cortex A9s in there. We can look at the increase in Geekbench Floating Point scores for some proof:
Geekbench 2 - Floating Point Performance
Test | Apple iPad | Apple iPad 2
Overall FP Score | 456 | 915
Mandelbrot (single-threaded) | 79.5 Mflops | 279.1 Mflops
Mandelbrot (multi-threaded) | 79.4 Mflops | 554.7 Mflops
Dot Product (single-threaded) | 245.7 Mflops | 221.7 Mflops
Dot Product (multi-threaded) | 247.2 Mflops | 436.8 Mflops
LU Decomposition (single-threaded) | 54.5 Mflops | 205.4 Mflops
LU Decomposition (multi-threaded) | 54.8 Mflops | 421.6 Mflops
Primality Test (single-threaded) | 71.2 Mflops | 177.8 Mflops
Primality Test (multi-threaded) | 69.3 Mflops | 318.1 Mflops
Sharpen Image (single-threaded) | 1.51 Mpixels/s | 1.68 Mpixels/s
Sharpen Image (multi-threaded) | 1.51 Mpixels/s | 3.34 Mpixels/s
Blur Image (single-threaded) | 760.2 Kpixels/s | 665.5 Kpixels/s
Blur Image (multi-threaded) | 753.2 Kpixels/s | 1.32 Mpixels/s
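As a quick way to read the table above, the single-threaded results can be turned into iPad 2 vs. iPad ratios. This is only a sketch: the numbers are copied from the table, while the `speedups` helper and the dictionary layout are illustrative names, not anything Geekbench provides.

```python
# Single-threaded Geekbench 2 FP results copied from the table above:
# test name -> (original iPad, iPad 2). Units match within each row.
single_threaded_fp = {
    "Mandelbrot": (79.5, 279.1),        # Mflops
    "Dot Product": (245.7, 221.7),      # Mflops
    "LU Decomposition": (54.5, 205.4),  # Mflops
    "Primality Test": (71.2, 177.8),    # Mflops
    "Sharpen Image": (1.51, 1.68),      # Mpixels/s
    "Blur Image": (760.2, 665.5),       # Kpixels/s
}


def speedups(results):
    """Return {test: iPad 2 result / original iPad result} (hypothetical helper)."""
    return {name: new / old for name, (old, new) in results.items()}


fp_ratios = speedups(single_threaded_fp)

# Print the tests from largest gain to largest regression.
for name, ratio in sorted(fp_ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {ratio:.2f}x")
```

A ratio above 1.0 means the iPad 2 is faster in that test. The gains are concentrated in Mandelbrot, LU Decomposition and Primality Test, while Dot Product and Blur Image come out slightly below 1.0.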
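Single-threaded FPU performance in several tests is multiples of what we saw with the original iPad: Mandelbrot, LU Decomposition and Primality Test run roughly 2.5x to 3.8x faster, while Dot Product and the two image filters stay within roughly 10 to 13% of the original iPad's results in either direction. This sort of improvement in single-core performance is likely due to the pipelined Cortex A9 FPU. Linpack shows the same sort of huge improvement.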
Whether this performance advantage matters is another matter entirely. Although there aren't many FP intensive iPad apps available today, moving to the A5 is all about enabling developers - not playing catch up to software.
Memory size, bandwidth and operating frequencies are all unknowns that I was hoping to find out more about once I put hands on the iPad 2. Geekbench reports the iPad 2 at 512MB of memory, double the original iPad's 256MB. Remember that Apple has to deal with lower profit margins than it'd like with the iPad, but it refuses to cut corners on screen quality so something else has to give.
L2 cache size has also apparently increased from 512KB to 1MB. The L2 cache is shared among both cores and 1MB seems to be the sweet spot this generation.
Geekbench 2 - Memory Performance
Test | Apple iPad | Apple iPad 2
Overall Memory Score | 644 | 787
Read Sequential (single-threaded scalar) | 340.6 MB/s | 334.2 MB/s
Write Sequential (single-threaded scalar) | 842.4 MB/s | 1.07 GB/s
Stdlib Allocate (single-threaded scalar) | 1.74 Mallocs/s | 1.86 Mallocs/s
Stdlib Write (single-threaded scalar) | 1.20 GB/s | 2.30 GB/s
Stdlib Copy (single-threaded scalar) | 740.6 MB/s | 522.0 MB/s
Geekbench's memory tests show an improvement in effective bandwidth as well. The biggest improvement is in the stdlib write test which shows a near doubling of bandwidth from 1.2GB/s to 2.3GB/s. Unfortunately this isn't enough data to draw conclusions about bus width or DRAM operating frequency. Given the increases in CPU and GPU performance, an increase in memory bandwidth to go along with the two isn't surprising.
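Because the memory table mixes MB/s and GB/s, comparing rows takes a unit conversion first. The sketch below normalizes the bandwidth rows to MB/s and computes the iPad 2 vs. iPad ratio. It assumes 1 GB/s = 1000 MB/s, which is a guess about how Geekbench 2 reports units, and the helper name `to_mb_per_s` is hypothetical.

```python
# Assumption: Geekbench 2 uses decimal units (1 GB/s = 1000 MB/s).
UNIT_TO_MB_PER_S = {"MB/s": 1.0, "GB/s": 1000.0}


def to_mb_per_s(text):
    """Parse a string like '1.07 GB/s' into a float in MB/s (hypothetical helper)."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MB_PER_S[unit]


# Bandwidth rows copied from the table above: test -> (original iPad, iPad 2).
memory_results = {
    "Read Sequential": ("340.6 MB/s", "334.2 MB/s"),
    "Write Sequential": ("842.4 MB/s", "1.07 GB/s"),
    "Stdlib Write": ("1.20 GB/s", "2.30 GB/s"),
    "Stdlib Copy": ("740.6 MB/s", "522.0 MB/s"),
}

memory_ratios = {
    name: to_mb_per_s(new) / to_mb_per_s(old)
    for name, (old, new) in memory_results.items()
}

for name, ratio in memory_ratios.items():
    print(f"{name:17s} {ratio:.2f}x")
```

Under that unit assumption, Stdlib Write is the only row that approaches a doubling, Read Sequential is essentially unchanged, and Stdlib Copy is lower on the iPad 2.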
Geekbench shows a healthy increase in integer performance, both in single and multithreaded scenarios. The multithreaded advantage makes sense (two are better than one), but the lead in single threaded tests shows the benefit the A9 can deliver thanks to its shorter pipeline and ability to reorder instructions around stalls.
Geekbench 2 - Integer Performance
Test | Apple iPad | Apple iPad 2
Overall Integer Score | 365 | 688
Blowfish (single-threaded) | 13.9 MB/s | 13.2 MB/s
Blowfish (multi-threaded) | 14.3 MB/s | 26.1 MB/s
Text Compression (single-threaded) | 1.23 MB/s | 1.50 MB/s
Text Compression (multi-threaded) | 1.20 MB/s | 2.82 MB/s
Text Decompression (single-threaded) | 1.11 MB/s | 2.09 MB/s
Text Decompression (multi-threaded) | 1.08 MB/s | 3.28 MB/s
Image Compress (single-threaded) | 3.36 Mpixels/s | 3.79 Mpixels/s
Image Compress (multi-threaded) | 3.41 Mpixels/s | 7.51 Mpixels/s
Image Decompress (single-threaded) | 6.02 Mpixels/s | 6.68 Mpixels/s
Image Decompress (multi-threaded) | 5.98 Mpixels/s | 13.1 Mpixels/s
Lua (single-threaded) | 172.1 Knodes/s | 273.4 Knodes/s
Lua (multi-threaded) | 171.9 Knodes/s | 542.9 Knodes/s
On average Geekbench shows a 31% increase in single threaded integer performance over the A4 in the original iPad. NVIDIA told me they saw a 20% increase in instructions executed per clock for the A9 vs. A8 and if we remove the one outlier (text decompression) that's about what we see here as well.
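The 31% and roughly 20% figures can be reproduced from the single-threaded rows of the integer table. The sketch below assumes the average is a plain arithmetic mean of the per-test iPad 2 / iPad ratios, which the text doesn't state explicitly; `mean_gain` is an illustrative helper name.

```python
from statistics import mean

# Single-threaded Geekbench 2 integer results copied from the table above:
# test name -> (original iPad, iPad 2). Units match within each row.
single_threaded_int = {
    "Blowfish": (13.9, 13.2),             # MB/s
    "Text Compression": (1.23, 1.50),     # MB/s
    "Text Decompression": (1.11, 2.09),   # MB/s
    "Image Compress": (3.36, 3.79),       # Mpixels/s
    "Image Decompress": (6.02, 6.68),     # Mpixels/s
    "Lua": (172.1, 273.4),                # Knodes/s
}


def mean_gain(results, exclude=()):
    """Arithmetic mean of per-test speedups, minus 1 (assumed averaging method)."""
    ratios = [
        new / old
        for name, (old, new) in results.items()
        if name not in exclude
    ]
    return mean(ratios) - 1.0


gain_all = mean_gain(single_threaded_int)
gain_no_outlier = mean_gain(single_threaded_int, exclude=("Text Decompression",))

print(f"all tests:            {gain_all:.0%}")
print(f"without text decomp.: {gain_no_outlier:.0%}")
```

With that averaging method, the full set works out to about 31%, and dropping Text Decompression brings it to about 20%, in line with the IPC figure NVIDIA quoted.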
Geekbench 2
Device | Overall | Integer | FP | Memory | Stream
Apple iPad | 448 | 365 | 456 | 644 | 325
Apple iPad 2 | 750 | 688 | 915 | 787 | 324
The increases in integer performance and memory bandwidth are likely what will have the largest impact on your experience. The fact that we're seeing big gains in single as well as multi-threaded workloads means the performance improvement should be universal across all CPU-bound apps.
What does all of this mean for performance in the real world? The iPad 2 is much faster than its predecessor. Let's start with our trusty JavaScript benchmarks: SunSpider and BrowserMark.
Apple improved the Safari JavaScript engine in iOS 4.3, which right off the bat helped the original iPad become more competitive in this test. Even with both pads running iOS 4.3, the iPad 2 is 80% faster than the original iPad here.
The Motorola Xoom we recently reviewed scored a few percent slower than the iPad 2 in SunSpider as well. Running different OSes and browsers, it's difficult to conclude much when comparing the A5 to Tegra 2.
A bug in BrowserMark kept us from running it for the Xoom review but it's since been fixed. Again we're looking at mostly JavaScript performance here. Rightware modeled its benchmark after the JavaScript frameworks and functions used by websites like Facebook, Amazon and Gmail among others. The results are simply one aspect of web browsing performance, but an important one:
The move from the A4 in the iPad 1 to the A5 in the iPad 2 boosts scores by 47%. More impressive however is just how much faster the Xoom is here. I suspect this has more to do with Google's software optimizations in the Honeycomb browser than hardware, but let's see how these tablets fare in our web page loading tests.
We debuted an early version of our 2011 web page loading tests in the Xoom review. Two things have changed since then: 1) iOS 4.3 came out, and 2) we changed our timing methods to produce more accurate results. It turns out that Honeycomb's browser was stopping our page load timer sooner than iOS', which resulted in some funny numbers when we got to the 4.3/Honeycomb comparison. To ensure accuracy we went back to timing by hand (each test was repeated at least 5 times and we present an average of the results). We also added two more pages to the test suite (Digg and Facebook).
The iPad 2 generally loads web pages faster than the Xoom. On average it's a ~20% increase in performance. I wouldn't say that the improvement is necessarily noticeable when surfing most sites, but it's definitely measurable.
The move to iOS 4.3 really narrowed the gap between the original iPad and the Xoom. In some cases the two actually render pages in the same amount of time; however, that's typically for lighter pages that are easy to render. Up the complexity and the Xoom easily distances itself from the original iPad.
We'll touch on this more in the full review but it's not all about performance when talking about web browsing between the iPad 2 and the Xoom. Although the iPad 2 may have faster render times on average, the Xoom still supports tabbed browsing which definitely has its advantages.
82 Comments
Brian Klug - Monday, March 14, 2011 - link
That's weird that the apple rep would say those things. The current MacBook is entirely hard plastic, and lacks the matte soft-touch material that used to stain. The current iPad is actually indeed covered in glass the same as the black model, so it will absolutely not stain.
-Brian
podpi - Sunday, March 13, 2011 - link
Hey Anand
Let me preface this by saying I'm a huge Anandtech fan, and really trust and appreciate your word and the deep, thorough reviews you do. That's what sets you apart from the other tabloidy, first out, sort of content. They do anything to have content out first. I believe some write the reviews before they test the equipment.
Now, whenever any big product comes out, I may glance at the Engadget/Wired reviews, but I keep checking Anandtech everyday, waiting eagerly in anticipation. The other sites try and up the number of views, so they launch bit by bit and try and draw it out, but also try and be out first with a brief headline-like review. Quality suffers. So I keep checking AT, and waiting for a thorough, unbiased, great reading review. And when it comes out, sometimes up to a week later than the rest. By then the hype had died, down, and it has a chance to come in clean, and see what someone who put the effort in actually thought of the product. The fact that your reviews tend to be a little later show to me how much effort you put in, and how much you care about quality. And this is what I appreciate.
Now, with the iPad 2 review, it seems to be coming in drips and drabs, to try and what appears to be to keep up with the competition? I think it's natural to get into a race with those sites. But how can you win a race against tech-tabloids? If you blend in with the crowd, then there is little differentiation. I don't think you can have best of both. In my opinion, taking your time and giving a full long in-depth review, like a short story, provides a unique, quality characteristic to your reviews, site and brand.
Thanks for the great work.
Terence
Brian Klug - Monday, March 14, 2011 - link
Terence,
I actually agree with you that we really excel at the longer reviews, and it's good to hear that even though we're sometimes much later than everyone else, that it's still useful. I know I enjoy spending a lot of time being exacting about things and sometimes going behind schedule just to uncover absolutely everything.
We were able to get a preview out on day one this time around just because we have about four people with iPads this time around, where with most things we'll only have one (or in extreme cases, two) units. Each of us working in parallel really sped things up and we thought we'd just share some quick impressions.
The big review is still coming (today actually - 3/14) and will be the usual length/depth for certain ;)
-Brian
Belard - Sunday, March 13, 2011 - link
As an owner of a Galaxy S series, with poor support from Samsung/at&t and my experience of using an Android device phone for almost 6 months.
Using the iPad2 was a breeze, its was fast and fun to use. Yeah, the missing USB & SD slots still sucks... but that won't effect sales much.
We then went across the isle to the Samsung Galaxy Tab which costs the exact same $500. It was easily heavier, thicker... the GUI looks and functions mostly like my own phone so I had no problems "using it".
So after that experience, we both agreed... why spend $500 on an Android, when we could get a much snappier, larger screen and much easier to use device. BestBuy was sold out.
If the Galaxy Tab was $300~350 - it would be a more fair deal.
podpi - Monday, March 14, 2011 - link
Thanks Brian - looking forward to it :)
ssvb - Monday, March 14, 2011 - link
VFPv2 unit in ARM11 is actually pipelined: http://infocenter.arm.com/help/topic/com.arm.doc.d...
And yes, naturally it is much faster than Cortex-A8 on floating point code.
NCM - Monday, March 14, 2011 - link
AT: "Laying in bed and reading is probably where the difference becomes most apparent."
You should never read in bed while laying—your partner will be offended.
It's perfectly all right while lying in bed, however.
eanazag - Monday, March 14, 2011 - link
I really appreciated the mouse over images changing comparison. It really gave a clear picture of the difference in the images.
retnuh - Wednesday, March 16, 2011 - link
So.... being the 16th should we assume its getting the super mega awesome thorough testing treatment?
UNLK A6 - Thursday, March 17, 2011 - link
I'd like some clarification about LINPACK and Geekbench. Are these benchmarks created by compiling some portable code for each platform as a measure of floating point performance? Or, is this supposed to be some measure of how fast one can do linear algebra or DSP on the platform? On Mac OS and iOS, one wouldn't compile say LINPACK for this but use the hand-tuned LAPACK/BLAS and DSP routines built into Apple's Accelerate Framework. The difference between the two can be huge. Which do these benchmarks purport to supply--generic floating point performance or available linear algebra and DSP performance?