[cluster-l] Single- vs. Dual- vs. Quad-core CPUs

Jim Phillips jim at ks.uiuc.edu
Wed Mar 28 14:30:40 CDT 2007


Hi,

Everything you're seeing makes sense for a memory-limited code.

Each Opteron core has its own cache, and each chip has its own memory 
interface.  Each pair of Xeon cores shares a single cache, and all chips 
share a single memory interface.  Thus the Opteron system scales more 
linearly to higher numbers of cores.  Of course, the flip side is that the 
Opteron system also slows down linearly as you use fewer cores, as IBM 
pointed out a few years ago.  It's a trade-off, and all that really 
matters is the maximum performance you can get when using all of the cores 
on a node.
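
To put rough numbers on this, the speedup S(n) = T(1)/T(n) and the parallel 
efficiency E(n) = S(n)/n can be computed directly from the first set of 
timings quoted below.  A minimal Python sketch; the BENCHMARKS table and the 
scaling() helper are illustrative names, not part of any existing tool:

```python
# Wall-clock times in seconds from the first benchmark set quoted below,
# keyed by system, then by number of cores used.
BENCHMARKS = {
    "2x2 Opteron 2218": {1: 697, 2: 323, 4: 211},
    "4x2 Opteron 875": {1: 531, 2: 333, 4: 181, 6: 143, 8: 139},
    "1x4 Xeon 5355": {1: 510, 2: 343, 4: 251},
    "2x4 Xeon 5355": {1: 516, 2: 314, 4: 228, 6: 195, 8: 167},
}


def scaling(times):
    """Map cores -> (speedup, efficiency), both relative to the 1-core run."""
    t1 = times[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in sorted(times.items())}


for name, times in BENCHMARKS.items():
    for n, (speedup, eff) in scaling(times).items():
        print(f"{name:17s} {n} cores: speedup {speedup:4.2f}, efficiency {eff:4.0%}")
```

With these timings the 2x2 Opteron 2218 reaches a speedup of about 3.30 on 
four cores, against about 2.03 for the 1x4 Xeon, even though the Xeon is 
faster on one core.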

Superlinear scaling is usually seen when your memory usage per core drops 
enough that more of the working set fits in cache, resulting in fewer cache 
misses.  It may continue as you add nodes, so benchmarking a smaller problem 
on a single node (one whose memory use per core matches your production 
runs) may be more realistic.  If your code uses shared memory (OpenMP or 
pthreads) within a node, then the shared cache on the Xeon may help 
performance when both cores access the same data.
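
A quick check of the second set of numbers quoted below: the speedups are 
consistent with the timings, and the superlinear points are simply those 
where the speedup exceeds the number of cores.  A minimal Python sketch; the 
dictionary names and the speedups()/superlinear() helpers are illustrative 
only:

```python
# Times in seconds from the second benchmark set quoted below (cores -> time).
XEON_2X4 = {1: 7590, 2: 4523, 4: 2060, 8: 916}
OPTERON_2X2 = {1: 8497, 2: 4360, 4: 1883}


def speedups(times):
    """Speedup T(1)/T(n) for every core count n."""
    t1 = times[1]
    return {n: t1 / t for n, t in sorted(times.items())}


def superlinear(times):
    """Core counts where the speedup exceeds the core count itself."""
    return [n for n, s in speedups(times).items() if n > 1 and s > n]


for name, times in [("2x4 Xeon 5355", XEON_2X4), ("2x2 Opteron 2218", OPTERON_2X2)]:
    rounded = {n: round(s, 2) for n, s in speedups(times).items()}
    print(name, rounded, "superlinear at:", superlinear(times))
```

This reproduces the reported speedups (1.68, 3.68, 8.29 and 1.95, 4.51) and 
flags 8 cores on the Xeon and 4 cores on the Opteron as superlinear.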

In the end, make sure you're using the Intel compiler with all the bells 
and whistles turned on, and buy whatever gives the best bang per buck.
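
One rough way to compare "bang per buck" is model runs per hour per $1,000 
of node price.  The sketch below pairs each quoted configuration with the 
closest demo timing (all cores busy); that pairing is an assumption, as is 
using the 4 GB Opteron 2218 demo for the quoted 8 GB Opteron node.  The 
OPTIONS list and runs_per_hour_per_kusd() are illustrative names:

```python
# Rough "bang per buck": model runs per hour, per $1,000 of node price.
# ASSUMPTIONS (not measured as such): each quoted node configuration is
# paired with the closest demo timing with all cores busy; the Opteron
# timing comes from the 2x2 Opteron 2218 demo, which had 4 GB rather than
# the quoted 8 GB; interconnect, power and rack costs are ignored.
OPTIONS = [
    # (quoted configuration, full-node run time in seconds, node price in USD)
    ("1x4 Xeon, 4 GB", 251, 2300),
    ("2x4 Xeon, 8 GB", 167, 4600),
    ("2x2 Opteron, 8 GB", 211, 3200),
]


def runs_per_hour_per_kusd(seconds, price_usd):
    """Runs per hour on one node, divided by the node price in $1,000s."""
    return (3600.0 / seconds) / (price_usd / 1000.0)


ranked = sorted(OPTIONS, key=lambda o: runs_per_hour_per_kusd(o[1], o[2]), reverse=True)
for label, seconds, price in ranked:
    print(f"{label:18s} {runs_per_hour_per_kusd(seconds, price):5.2f} runs/hour per $1,000")
```

Normalizing by price keeps nodes of different sizes comparable; a real 
comparison for multi-node jobs would also add per-node switch ports, power 
and the communication overhead between nodes, none of which is modeled here.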

-Jim


On Wed, 28 Mar 2007, Nils Oberg wrote:

> Thanks for your help Jim.
>
> I performed some benchmarks on demo equipment from AMD and Intel and there 
> are some interesting differences between the two platforms for our code.  All 
> times are in seconds.
>
> Here are some results:
>
> 2x2 Opteron 2218 2.6 GHz with 4 GB RAM:
> 1 core:  697
> 2 cores: 323
> 4 cores: 211
>
> 4x2 Opteron 875 2.2 GHz with 8 GB RAM:
> 1 core:  531
> 2 cores: 333
> 4 cores: 181
> 6 cores: 143
> 8 cores: 139
>
> 1x4 Xeon 5355 2.6 GHz with 4 GB RAM:
> 1 core:  510
> 2 cores: 343
> 4 cores: 251
>
> 2x4 Xeon 5355 2.6 GHz with 8 GB RAM:
> 1 core:  516
> 2 cores: 314
> 4 cores: 228
> 6 cores: 195
> 8 cores: 167
>
>
> I don't understand why the Xeon performs better than the Opteron on one core, 
> but worse than the Opteron on 4 cores.  I tried a different CFD code and the 
> same pattern emerged.  Why might this be happening?
>
>
> I was getting better-than-linear speedup for one of our programs.  Is 
> this possible?  Here are some results:
>
> 2x4 Xeon 5355 2.6 GHz with 8 GB RAM:
> cores: 1  7590
> cores: 2  4523   speedup: 1.68
> cores: 4  2060   speedup: 3.68
> cores: 8   916   speedup: 8.29
>
> 2x2 Opteron 2218 2.6 GHz with 4 GB RAM:
> cores: 1  8497
> cores: 2  4360   speedup: 1.95
> cores: 4  1883   speedup: 4.51
>
>
> Does this make sense?
>
> Thanks for any help.
>
> Nils
>
>
>
> At 15:11 2/22/2007, Jim Phillips wrote:
>
>> You really need to run some benchmarks.  Failing that, look at the SPEC FP 
>> Rate results (http://www.spec.org/cpu2006/results/rfp2006.html).  There 
>> are three different CFD codes in the benchmark suite.
>> 
>> 1x4 2.7 GHz Xeon     leslie3d = 15.0   total = 33.6
>> 2x4 2.7 GHz Xeon     leslie3d = 21.9   total = 54.1
>> 2x2 3.0 GHz Xeon     leslie3d = 25.8   total = 43.0
>> 2x2 2.6 GHz Opteron  leslie3d = 28.3   total = 38.1
>> 2x2 2.8 GHz Opteron  leslie3d = 36.3   total = 48.3  (PathScale compilers)
>> 
>> So, the dual-socket, dual-core Opteron *may* be your best bet, if your 
>> workload is similar to leslie3d.  Run some benchmarks.
>> 
>> -Jim
>> 
>> 
>> 
>> On Thu, 22 Feb 2007, Nils Oberg wrote:
>> 
>>> Hi Jim,
>>> 
>>> Thanks for your response.  I should probably describe the problem.  Our 
>>> application is a computational fluid dynamics (CFD) code.  My 
>>> understanding of CFD codes is that they are primarily memory bound.  
>>> Since the domain to be modeled is broken up into chunks, a large number 
>>> of messages (not necessarily large amounts of data) are passed between 
>>> processors during each time step of the simulation.
>>> 
>>> We're trying to decide between the following:
>>> 
>>> uni-processor quad-core Xeon 4 GB RAM ($2,300 / node)
>>> dual-processor quad-core Xeon 16 GB RAM ($5,800 / node)
>>> dual-processor quad-core Xeon 8 GB RAM ($4,600 / node)
>>> dual-processor dual-core Xeon 8 GB RAM ($3,800 / node)
>>> dual-processor dual-core Opteron 8 GB RAM ($3,200 / node)
>> 
>> --
>> Nils Oberg, Research Programmer
>> Civil & Environmental Engineering, University of Illinois at U-C
>> phone: 217-333-8365, web: http://vtchl.uiuc.edu
>

