From noberg at uiuc.edu Thu Feb 22 08:44:46 2007
From: noberg at uiuc.edu (Nils Oberg)
Date: Thu, 22 Feb 2007 08:44:46 -0600
Subject: [cluster-l] Single- vs. Dual- vs. Quad-core CPUs
In-Reply-To:
References: <7.0.1.0.2.20070119172450.00ec8dd8@uiuc.edu>
Message-ID: <7.0.1.0.2.20070124095646.00ec0ec8@uiuc.edu>

Hi Jim,

Thanks for your response. I should probably describe the
problem. Our application is a computational fluid dynamics (CFD)
code. My understanding of CFD codes is that they are primarily
memory bound. Since the domain to be modeled is broken up into
chunks, during the course of a time-step in the simulation a large
number of messages (not necessarily large amounts of data) are passed
between processors.

We're trying to decide between the following:

uni-processor quad-core Xeon 4 GB RAM ($2,300 / node)
dual-processor quad-core Xeon 16 GB RAM ($5,800 / node)
dual-processor quad-core Xeon 8 GB RAM ($4,600 / node)
dual-processor dual-core Xeon 8 GB RAM ($3,800 / node)
dual-processor dual-core Opteron 8 GB RAM ($3,200 / node)

At 16:12 1/22/2007, Jim Phillips wrote:
>Are you limited by memory bandwidth or clock speed? All of those
>cores share the same memory bandwidth, but the clock speed is almost the same.

I really don't know which is the limiting factor. I'm guessing it is
memory latency (is that clock speed?) more than anything.

>Is memory an issue? How much memory do you need per node and per
>core? If you can use shared-memory within a node then you can add
>cores without adding extra memory. Otherwise you may need to use
>larger memory chips to fit enough memory into the node. Quad-core
>is more flexible if you only need more memory on occasion since you
>can drop down to one core per node.

The programs currently don't use shared memory. I think it would be
fairly difficult to recode the software for shared memory, as it uses
external libraries (PETSc, ParMETIS) that rely on MPI.

The newer Intel CPUs (Xeon 5000 series) require fully buffered RAM.
I've read that FB RAM has both higher latency and lower
bandwidth. Does this mean that, respectively, requests from the CPU
take longer, and the amount of data transferred is smaller?

A second question: The memory clock speed should be 50% of the CPU
front-side bus speed, correct? In other words, I shouldn't get 533 MHz
memory with a 1333 MHz FSB?

Thanks for your help!

Nils

>On Fri, 19 Jan 2007, Nils Oberg wrote:
>
>>Hello,
>>
>>Our group is going to purchase a small cluster. I'm trying to decide
>>if each node in the cluster should have dual- or quad-core
>>CPUs. Does anyone have any advice for how to benchmark? Or other
>>resources that might help me get started?
>>
>>As an FYI, I noticed that NCSA is building a new cluster
>>(http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/Intel64Cluster/)
>>that has dual-socket quad-core compute nodes for a total of 8 cores.
>>
>>Nils
>>
>>--
>>Nils Oberg, Research Programmer
>>Civil & Environmental Engineering, University of Illinois at U-C
>>phone: 217-333-8365, web: http://vtchl.uiuc.edu
>>
>>_______________________________________________
>>cluster-l mailing list
>>cluster-l at ks.uiuc.edu
>>http://www.ks.uiuc.edu/mailman/listinfo/cluster-l
>
>--
>Nils Oberg, Research Programmer
>Civil & Environmental Engineering, University of Illinois at U-C
>phone: 217-333-8365, web: http://vtchl.uiuc.edu

From rdrobert at uiuc.edu Thu Feb 22 10:10:52 2007
From: rdrobert at uiuc.edu (Ricky Robertson)
Date: Thu, 22 Feb 2007 10:10:52 -0600 (CST)
Subject: [cluster-l] Single- vs. Dual- vs. Quad-core CPUs
Message-ID: <20070222101052.AKV69602@expms5.cites.uiuc.edu>

hi nils. this is ricky from hydro/ag-econ.

my experience (on ncsa's tungsten and a scavenged mini-cluster we put
together in ag-econ) has been that the memory bottleneck is the most
important. my (non-expert) recommendation would be to go with the dual
processor/dual core nodes and get twice as many of them (since they
cost roughly half as much).
that way you are maximizing the bandwidth between the same amount of
memory and the same amount of feasible processing capability.

there is always the possibility that the network traffic will slow
things down, but you'll have to figure out which is the limiting
factor. naturally, that probably can't be done until you build at
least part of your system, meaning the decisions have already been
made. :)

peace,
ricky

---- Original message ----
>Date: Thu, 22 Feb 2007 08:44:46 -0600
>From: Nils Oberg
>Subject: Re: [cluster-l] Single- vs. Dual- vs. Quad-core CPUs
>To: Jim Phillips
>Cc: cluster-l at ks.uiuc.edu
>
>Hi Jim,
>
>Thanks for your response. I should probably describe the
>problem. Our application is a computational fluid dynamics (CFD)
>code. My understanding of CFD codes is that they are primarily
>memory bound. Since the domain to be modeled is broken up into
>chunks, during the course of a time-step in the simulation a large
>number of messages (not necessarily large amounts of data) are passed
>between processors.
>
>We're trying to decide between the following:
>
>uni-processor quad-core Xeon 4 GB RAM ($2,300 / node)
>dual-processor quad-core Xeon 16 GB RAM ($5,800 / node)
>dual-processor quad-core Xeon 8 GB RAM ($4,600 / node)
>dual-processor dual-core Xeon 8 GB RAM ($3,800 / node)
>dual-processor dual-core Opteron 8 GB RAM ($3,200 / node)

ricky r -- rdrobert at uiuc.edu
Carpe faenum: Seize the hay!
rickyr at andrews.edu
Little evidence supports the claim that refined sugar intake
significantly influences behavior or cognitive performance in children.
--J Wade White & Mark Wolraich in Am J Clin Nutr 1995; 62(suppl):242S

From jim at ks.uiuc.edu Thu Feb 22 14:11:14 2007
From: jim at ks.uiuc.edu (Jim Phillips)
Date: Thu, 22 Feb 2007 14:11:14 -0600 (CST)
Subject: [cluster-l] Single- vs. Dual- vs.
 Quad-core CPUs
In-Reply-To: <7.0.1.0.2.20070124095646.00ec0ec8@uiuc.edu>
References: <7.0.1.0.2.20070119172450.00ec8dd8@uiuc.edu>
 <7.0.1.0.2.20070124095646.00ec0ec8@uiuc.edu>
Message-ID:

You really need to run some benchmarks. Failing that, look at the SPEC FP
Rate results at http://www.spec.org/cpu2006/results/rfp2006.html

There are three different CFD codes in the benchmark suite.

1x4 2.7 GHz Xeon   leslie3d = 15.0   total = 33.6
2x4 2.7 GHz Xeon   leslie3d = 21.9   total = 54.1
2x2 3.0 GHz Xeon   leslie3d = 25.8   total = 43.0
2x2 2.6 GHz Optn   leslie3d = 28.3   total = 38.1
2x2 2.8 GHz Optn   leslie3d = 36.3   total = 48.3  (PathScale compilers)

So, the dual-socket, dual-core Opteron *may* be your best bet, if your
workload is similar to leslie3d. Run some benchmarks.

-Jim

On Thu, 22 Feb 2007, Nils Oberg wrote:

> Hi Jim,
>
> Thanks for your response. I should probably describe the problem. Our
> application is a computational fluid dynamics (CFD) code. My understanding of
> CFD codes is that they are primarily memory bound. Since the domain to be
> modeled is broken up into chunks, during the course of a time-step in the
> simulation a large number of messages (not necessarily large amounts of data)
> are passed between processors.
>
> We're trying to decide between the following:
>
> uni-processor quad-core Xeon 4 GB RAM ($2,300 / node)
> dual-processor quad-core Xeon 16 GB RAM ($5,800 / node)
> dual-processor quad-core Xeon 8 GB RAM ($4,600 / node)
> dual-processor dual-core Xeon 8 GB RAM ($3,800 / node)
> dual-processor dual-core Opteron 8 GB RAM ($3,200 / node)
>
>
> At 16:12 1/22/2007, Jim Phillips wrote:
>> Are you limited by memory bandwidth or clock speed? All of those cores
>> share the same memory bandwidth, but the clock speed is almost the same.
>
> I really don't know which is the limiting factor. I'm guessing it is memory
> latency (is that clock speed?) more than anything.
>
>> Is memory an issue? How much memory do you need per node and per core?
>> If you can use shared-memory within a node then you can add cores without
>> adding extra memory. Otherwise you may need to use larger memory chips to
>> fit enough memory into the node. Quad-core is more flexible if you only
>> need more memory on occasion since you can drop down to one core per node.
>
> The programs currently don't use shared memory. I think it would be fairly
> difficult to recode the software for shared memory, as it uses external
> libraries (PETSc, ParMETIS) that rely on MPI.
>
> The newer Intel CPUs (Xeon 5000 series) require fully buffered RAM. I've
> read that FB RAM has both higher latency and lower bandwidth. Does this mean
> that, respectively, requests from the CPU take longer, and the amount of data
> transferred is smaller?
>
> A second question: The memory clock speed should be 50% of the CPU front-side
> bus speed, correct? In other words, I shouldn't get 533 MHz memory with a
> 1333 MHz FSB?
>
> Thanks for your help!
>
> Nils
>
>
>> On Fri, 19 Jan 2007, Nils Oberg wrote:
>>
>>> Hello,
>>>
>>> Our group is going to purchase a small cluster. I'm trying to decide
>>> if each node in the cluster should have dual- or quad-core
>>> CPUs. Does anyone have any advice for how to benchmark? Or other
>>> resources that might help me get started?
>>>
>>> As an FYI, I noticed that NCSA is building a new cluster
>>> (http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/Intel64Cluster/)
>>> that has dual-socket quad-core compute nodes for a total of 8 cores.
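
One rough way to combine the node prices in Nils's list with the SPEC
numbers Jim quotes is to look at benchmark rate per dollar. The sketch
below is only illustrative: the thread never says which SPEC row
corresponds to which quoted node, so the pairing of prices to rows (the
8 GB option for the 2x4 Xeon, the 2.6 GHz part for the Opteron) is an
assumption, and the names NODES, rate_per_kilodollar and ranked are
hypothetical.

```python
# Hypothetical pairing of Jim's SPEC rows with Nils's priced node options.
# The thread does not say which row matches which quote, so the prices
# attached here are an assumption made only for illustration.
# Fields: (label, price_usd, leslie3d_rate, total_fp_rate)
NODES = [
    ("1x4 2.7 GHz Xeon", 2300, 15.0, 33.6),
    ("2x4 2.7 GHz Xeon", 4600, 21.9, 54.1),  # assumes the 8 GB option
    ("2x2 3.0 GHz Xeon", 3800, 25.8, 43.0),
    ("2x2 2.6 GHz Optn", 3200, 28.3, 38.1),  # assumes the 2.6 GHz part
]


def rate_per_kilodollar(rate, price_usd):
    """Benchmark rate obtained per 1000 USD spent on nodes."""
    return 1000.0 * rate / price_usd


def ranked(nodes, column):
    """Sort nodes by rate per 1000 USD, best first.

    column is 2 for leslie3d or 3 for the overall FP rate.
    """
    return sorted(
        nodes,
        key=lambda n: rate_per_kilodollar(n[column], n[1]),
        reverse=True,
    )


if __name__ == "__main__":
    for name, column in (("leslie3d", 2), ("total", 3)):
        print(f"{name} rate per $1000:")
        for node in ranked(NODES, column):
            value = rate_per_kilodollar(node[column], node[1])
            print(f"  {node[0]:<18} {value:5.2f}")
```

Under these assumed pairings the dual-socket dual-core Opteron comes out
ahead on leslie3d per dollar, while the single-socket quad-core Xeon
comes out ahead on the overall rate. The SPEC rate numbers are
single-node results, so this says nothing about the network traffic
between nodes that Ricky mentions; it is not a substitute for running
the actual CFD code.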