AN APPROACH FOR ROBUST DESIGN OF TURBULENT CONVECTIVE SYSTEMS

Nathan Rolander, Jeffrey Rambo, Yogendra Joshi, Janet K. Allen, Farrokh Mistree¹

G. W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, GA - 30332-0405, USA

ABSTRACT

The complex turbulent flow regimes encountered in many thermal-fluid engineering applications have

proven resistant to the effective application of systematic design because of the computational expense of

model evaluation and the inherent variability of turbulent systems. In this paper the integration of a novel

reduced order turbulent convection modeling approach based upon the Proper Orthogonal Decomposition

technique with the application of robust design principles implemented using the compromise Decision

Support problem is investigated as an effective design approach for this domain. In the illustrative example

application considered, thermally efficient computer server cabinet configurations that are insensitive to

variations in operating conditions are determined. The computer servers are cooled by turbulent convection

and have unsteady heat generation and cooling air flows, yielding substantial variability, yet have some of

the most stringent operational requirements of any engineering system. Results of the application of this

approach to an enclosed cabinet example show that the resulting robust thermally efficient configurations

are capable of dissipating up to a 50% greater heat load, with a 60% decrease in the temperature variability,

using the same cooling infrastructure.

NOMENCLATURE

Symbols
$a_i$              weighting factor
$d_i^+, d_i^-$     deviation variables
$g_i(x)$           inequality constraint function
$h_i(x)$           equality constraint function
$\dot{m}$          mass flow rate
m                  number of observations/number of goals
n                  degrees of freedom/number of design variables
p                  number of inequality constraints
q                  number of equality constraints
s                  number of servers
$u(x)$             observed phenomena
x                  design variables
$x_{i,L}, x_{i,U}$ lower/upper bound of design variable $x_i$
$A(x)$             achievement function
C                  coefficient matrix
$F(u, \beta)$      flux function
G                  flux goal vector
$G_i$              design goal target
Q                  heat generation rate
R, R'              covariance matrix
T                  temperature
U                  observation ensemble
$V_o$              observation set
$W_i$              goal weighting factor
Z                  Archimedean objective function
$\phi$             basis function
$\Gamma$           control surface
$\Omega, \partial\Omega$   system domain and boundary

Subscripts
o                  ensemble average
r                  reconstruction

¹ Corresponding Author

Phone: (404) 385-2810; Fax: (404) 894-8496; E-mail: [email protected]


1 DESIGNING ROBUST COMPLEX TURBULENT FLUID SYSTEMS - CHALLENGES

The complex turbulent flow regimes encountered in many thermal-fluid engineering applications have

proven resistant to the effective application of systematic design. This is because the Computational Fluid

Dynamics (CFD) models required for analysis are computationally expensive, particularly for the latter

stages of design where more accurate solutions are required, making the application of iterative

optimization algorithms extremely time consuming. Furthermore, turbulent flow regimes are inherently

complex, requiring significant modeling simplifications and assumptions to be made in their simulation [1],

resulting in approximate solutions only. The Reynolds averaged Navier-Stokes based CFD approach

employed in simulation of engineering systems is based upon the mean flow field, with the turbulent

perturbations modeled as Reynolds stresses [1, 2]. Finally, in any complex system design, multiple

objectives must be considered in a mathematically rigorous fashion that also accurately reflects the

designer’s preferences. In many thermal-fluid applications the tradeoffs between energy efficiency, system

size, cost, and potential performance variability must be considered.

A representative example of a complex turbulent convective system in need of effective design is the

configuration of data centers. Data centers are computing infrastructures housing large quantities of data

processing equipment. This equipment is currently air cooled, and the resulting turbulent flow distribution

is both highly complex and variable. Furthermore, the reliability requirements of data centers are

exceedingly high, as discussed further in Section 3. Previous application of simulation based design for

data centers is limited to ad-hoc analyses based on experience and simple correlations [3, 4], simple data

center level CFD modeling with some comparison of configurations [5-10], and some limited geometric

optimization using design of experiments to create coarse response surface models with very few variables

[11-13]. All previous work utilizes the single objective of temperature minimization.

The development of an effective design approach for complex turbulent thermal-fluid systems, such as the

data center example, is thus hindered by three specific challenges:

1. Flow complexity – The CFD models required to analyze the systems are impractical to use in

iterative optimization algorithms, particularly in the presence of geometrical complexity and

multiple length scales.

2. Inherent variability – In complex three-dimensional turbulent flows, modeling uncertainties and

choice of turbulence closure models lead to variability in predictions.

3. Multiple objectives – The multiple design objectives in a complex system should represent the

designer’s preferences accurately.

These challenges are addressed in this paper through the application of three constructs: (1) the Flux-

Matching Procedure (FMP) augmenting the Proper Orthogonal Decomposition technique (POD), (2) robust

design principles, and (3) the compromise Decision Support Problem (cDSP). The POD is a highly

computationally efficient meta-modeling approach, providing the foundation for the development of

reduced order turbulent convective simulations [14], including the FMP. The principle of robust design is

used to find solutions that are insensitive to changes in both internal and external operating conditions.

This yields solutions that maintain their desired performance accounting for variability in both the system

and inaccuracies in the model of the system [15]. The cDSP, a hybrid formulation of mathematical

programming and goal programming, enables multi-objective solution finding through the specification of

multiple goals, and thus is well suited to engineering applications [16].

The challenge in the application of robust design is the computation of the non-linear numerical

derivatives, required for determination of the system variance, that require many functional evaluations of

computationally expensive CFD models. Simple response surface models are inadequate, as the non-

linearity of the systems is not well represented by linear or quadratic approximations, as shown by the

analyses in [8, 12, 13]. Kriging, multivariate adaptive regression splines, and other more advanced

interpolation approaches offer superior approximations [17]; however, these methods also require a large

number of data points for interpolation, a number which increases exponentially with the number of design

variables [17].

In Figure 1, the requirements and constructs for an approach for the robust design of turbulent convective

systems are presented. The problem presents three requirements: reduced order modeling, the need to account

for variability, and multi-objective trade-offs. These are instantiated in the approach by adopting three

constructs: FMP augmented POD, robust design, and the cDSP.

[Figure 1: the requirements (flow complexity, inherent variability, multiple objectives) and the constructs (FMP augmented POD, robust design, the compromise DSP) are integrated into the approach for robust design of turbulent convective systems]

Figure 1 - Requirements, constructs, and integration for a robust server cabinet design approach

The approach illustrated in Figure 1 is demonstrated through application to the robust design of data center

server cabinets; and the outline of this paper is as follows. In Section 2 the conceptual description and

explanation of the three constructs used are presented. In Section 3 the background information and

description of the example problem are given. In Sections 4 & 5 the formulation of the design problem using

the developed approach is described. In Section 6 a presentation and discussion of the results of the

example problem is given. Lastly, in Section 7 the discussion and review of the overall effectiveness of the

approach is presented.

2 THEORETICAL CONSTRUCTS AND INTEGRATION

The design approach proposed is a merging of robust design principles with the cDSP utilizing the FMP

augmented POD meta-modeling technique. Each of these constructs is described conceptually in turn

below, with details given in their application.

2.1 The Proper Orthogonal Decomposition

An emerging reduced order model development approach for turbulent flow is the Proper Orthogonal

Decomposition (POD), also known as the Karhunen-Loève Decomposition [18]. The POD has previously

been used successfully to create low dimensional steady state flow models within a prescribed range of

parameters [19]. The POD is similar to any modal decomposition, such as the Fourier series, where a

system is decomposed into a series of fundamental modes and a linear approximation is obtained using the

expansion theorem:

$$ u(x) = \sum_{i=1}^{\infty} a_i \phi_i(x) \qquad (1) $$

Solution methods based on Eq. (1) are generally classified as Galerkin or spectral methods, where $u(x)$ is

the function to be approximated, such as the flow field, $\phi_i$ are the basis functions and $a_i$ are the weighting

factors. The utility of the POD is that it is a stochastic tool, which uses principal component analysis to

find the optimal linear basis for the modal decomposition presented in Eq. (1). The POD is well-suited for

CFD modeling as the complete flow field reconstruction is obtained; the solution is not a black box single

response value. Therefore, direct analysis of the solution can be made to ascertain the reasons behind a

response to the change in input parameters.

The concept of the POD computation is best explained graphically. Given a set of multi-dimensional data,

the aim of the POD is to accurately represent the complete data set in the most efficient manner possible by

using the minimum number of basis functions. This is accomplished through finding the principal axes of

the data set, representing the directions of maximum scatter. The orientation of these principal axes is

found through orthogonal distance regression, which is represented graphically versus traditional vertical

distance regression in Figure 2. This orthogonal fit produces a smaller sum of the squares of the residuals

than any other linear fitting approach [20].

[Figure 2: two scatter plots of y versus x, each showing the raw data, the least squares fit, and the orthogonal fit; the left panel marks the y residuals and the right panel marks the orthogonal residuals]

Figure 2 - y distance regression vs. orthogonal distance regression visualization
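The contrast between the two fits can be sketched for a two dimensional data set, where the first principal axis has a closed form. This is a minimal illustration only: the sample points are made up, the function names are hypothetical, and the residuals of both fits are compared in the orthogonal (perpendicular) sense.

```python
import math

def centered_moments(xs, ys):
    """Centroid and second moments of a 2-D point set."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return mx, my, sxx, syy, sxy

def orthogonal_fit(xs, ys):
    """Line along the first principal axis (orthogonal distance regression)."""
    mx, my, sxx, syy, sxy = centered_moments(xs, ys)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of the major axis
    return (mx, my), (math.cos(theta), math.sin(theta))

def vertical_fit(xs, ys):
    """Ordinary least squares line (vertical residuals), as point + unit direction."""
    mx, my, sxx, _, sxy = centered_moments(xs, ys)
    slope = sxy / sxx
    norm = math.hypot(1.0, slope)
    return (mx, my), (1.0 / norm, slope / norm)

def orthogonal_sse(xs, ys, point, direction):
    """Sum of squared perpendicular distances from the points to the line."""
    (px, py), (dx, dy) = point, direction
    return sum(((x - px) * dy - (y - py) * dx) ** 2 for x, y in zip(xs, ys))

# Made-up sample data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [1.2, 0.7, 2.9, 2.4, 4.8, 4.1, 6.6, 6.0, 8.9, 8.1, 9.6]

sse_orthogonal = orthogonal_sse(xs, ys, *orthogonal_fit(xs, ys))
sse_vertical = orthogonal_sse(xs, ys, *vertical_fit(xs, ys))
print(sse_orthogonal, sse_vertical)
```

Both lines pass through the centroid of the data, so the comparison isolates the effect of the line orientation on the perpendicular residuals.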

The basis functions are the projection of the data set onto each of these principal axes, which are then

normalized. As the first few principal axes account for the majority of the scatter or variability of the data

set, the small perturbations corresponding to movement along the last few principal directions can be

truncated while maintaining an accurate representation of the complete system. This representation enables

complex systems, such as turbulent flow fields, to be expressed using a relatively small set of weight

coefficients associated with the POD modes of the system. The geometry for previous POD based flow

modeling has been either prototypical (such as flow around a cylinder) [21-24], or simple geometry where

inhomogeneous boundary conditions are easily homogenized by the inclusion of a source function in the

decomposition [25-27]. None of these previous applications has direct relevance to engineering design

applications. A summary of a POD based flow model suitable for engineering design, the FMP, is provided

in Section 4.

2.2 Robust Design Principles

The underlying principle of robust design is to determine superior solutions to design problems by

minimizing the effects of variation on system performance, without eliminating their causes. There are two

broad categories of robust design. Both simultaneously bring the mean system performance to a target and

minimize performance variation; however, the sources of the variation are different [15].

Type I – minimizing variations in performance caused by variations in noise factors

(uncontrollable parameters)

Type II – minimizing variations in performance caused by variations in control factors (design

variables)

Traditional optimization techniques only bring the mean response to a target and do not consider the effects

of the variation in the system parameters or control factors in the performance evaluation. By accounting

for variation, robust design techniques produce results that are effective regardless of changing operating

conditions, system parameters, assumptions and/or small inaccuracies made during the system modeling

process. This investigation focuses upon Type II, as the dominant system variables are considered as

design variables, and sources of noise are insignificant, as discussed later in Section 4.4.

[Figure 3: (a) response Y versus design variable X, showing the objective function with the deviation at the optimal solution and the deviation at the robust solution; (b) design variables X1 and X2, showing the constraint boundary, the feasible design space, the infeasible solution region, and the bounds around the optimal solution and the robust solution]

Figure 3 - Type II Robust Design (a) goals & (b) constraints representation

A more in depth explanation of robust design is presented in [15]. The application of Type II robust design

is shown in Figure 3 (a). To reduce the variation of system response, y, through changes in the design

variable, x, the designer is interested in finding a flat region of the curve near the performance target. The

shallow slope of the response curve at the robust solution translates to a solution that still performs as

expected, despite variation in the design variables. The tradeoff between finding the robust or optimizing

solution is based upon the level of variation of each design variable and the designer’s preferences.
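The tradeoff can be sketched numerically by comparing the nominal response and the spread of the response over the variation band of a design variable. The response curve below is entirely hypothetical (a narrow deep well near x = 2 and a broad shallow basin near x = 6), as are the function names and the variation of ±0.5.

```python
import math

def response(x):
    # Hypothetical response curve: narrow deep well near x = 2,
    # broad shallow basin near x = 6 (illustration only).
    return 0.05 * (x - 6.0) ** 2 + 2.0 - 1.5 * math.exp(-((x - 2.0) / 0.3) ** 2)

def response_spread(f, x_nominal, delta_x, samples=201):
    """Spread (max - min) of f over [x_nominal - delta_x, x_nominal + delta_x]."""
    values = [
        f(x_nominal - delta_x + 2.0 * delta_x * k / (samples - 1))
        for k in range(samples)
    ]
    return max(values) - min(values)

delta_x = 0.5       # assumed variation of the design variable
x_optimal = 2.0     # lowest nominal response, steep surroundings
x_robust = 6.0      # higher nominal response, flat surroundings

for label, x in (("optimal", x_optimal), ("robust", x_robust)):
    print(label, response(x), response_spread(response, x, delta_x))
```

In this made-up case the optimizing solution has the lower nominal response but a much larger spread, whereas the robust solution gives up some nominal performance in exchange for a nearly constant response.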

Constraints incur an added layer of complexity because the variation of system response must be

considered on top of the nominal response value. This variance consideration is represented in Figure 3

(b). At the optimal solution point the solution violates the constraint, since part of the area created by the

variability in the control variables lies outside of the feasible region, despite having a feasible average

value. The entire area surrounding the robust solution point is fully inside the feasible region and hence is

viable even in the worst case variability scenario. This consideration of variability through robust design is

important, as the RANS CFD calculations do not capture the inherent modeling variability.
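A worst case feasibility check of this kind can be sketched as follows. The linear constraint, the two candidate points, and the variation of ±0.5 in each design variable are hypothetical; for a linear constraint the worst case over a box of variation occurs at one of its corners, so checking the corners is sufficient here, which would not hold for a general nonlinear constraint.

```python
from itertools import product

def constraint(x1, x2):
    # Hypothetical linear inequality constraint, feasible when g(x) <= 0.
    return x1 + x2 - 10.0

def worst_case_feasible(g, nominal, deltas):
    """True if g <= 0 at every corner of the box nominal +/- deltas."""
    corners = product(*[(x - d, x + d) for x, d in zip(nominal, deltas)])
    return all(g(*corner) <= 0.0 for corner in corners)

deltas = (0.5, 0.5)          # assumed variation of each design variable
optimal_point = (5.8, 4.0)   # nominally feasible, close to the boundary
robust_point = (5.0, 3.5)    # backed off from the boundary

for label, point in (("optimal", optimal_point), ("robust", robust_point)):
    print(label, constraint(*point) <= 0.0, worst_case_feasible(constraint, point, deltas))
```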

2.3 The Compromise DSP

The objectives of bringing the mean to target and minimizing the variation of the response are required to

be achieved simultaneously; therefore, a mathematical construct capable of modeling and solving for

multiple objectives and constraints is required. The method used in this approach is the cDSP [16]. The

structure of the cDSP in the Archimedean, or weighted sum formulation is presented below in Table 1. The

conceptual basis of the cDSP is to minimize the difference between what is desired (the target Gi) and what

can be achieved ($A_i(x)$). The difference between these values is captured by the deviation variables $d_i^+$ and $d_i^-$,

representing the overachievement and underachievement of each goal respectively. These deviations are

constrained to non-negative values, and no simultaneous over and under achievement is allowed.

Table 1 - Mathematical formulation of the compromise DSP

Given
    An alternative to be improved through modification
    Assumptions used to model the domain of interest
    The system parameters:
        n    number of system variables
        p    number of inequality constraints
        q    number of equality constraints
        m    number of system goals

Find
    Design variables         $x_i$                                                  i = 1,…,n
    Deviation variables      $d_i^+, d_i^-$                                         i = 1,…,m

Satisfy
    Inequality constraints   $g_i(x) \le 0$                                         i = 1,…,p
    Equality constraints     $h_i(x) = 0$                                           i = 1,…,q
    Goals                    $A_i(x) - d_i^+ + d_i^- = G_i$                         i = 1,…,m
    Bounds                   $x_{i,L} \le x_i \le x_{i,U}$                          i = 1,…,n
                             $d_i^+ \ge 0;\ d_i^- \ge 0;\ d_i^+ \cdot d_i^- = 0$    i = 1,…,m

Minimize
    Deviation function: Archimedean formulation

    $$ Z = \sum_{i=1}^{m} W_i \left( d_i^+ + d_i^- \right) $$

This cDSP template formulation shown in Table 1 constitutes the interface of the approach, yielding an

augmented cDSP construct for the robust design of turbulent convective systems. Further detail on the

formulation and solution of the cDSP is given in the application to the server cabinet configuration example

in Section 5.
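The goal and deviation relations of Table 1 can be sketched in a few lines. This is an illustration under stated assumptions rather than the solution procedure used in this work: the achievement values, targets, and weights are made-up numbers, the function names are hypothetical, and the deviation variables are simply evaluated for a given design instead of being found by a solver.

```python
def deviation_variables(achieved, target):
    """Over- and underachievement (d_plus, d_minus) for one goal.

    Satisfies achieved - d_plus + d_minus = target, with both deviations
    non-negative and at most one of them nonzero.
    """
    d_plus = max(0.0, achieved - target)
    d_minus = max(0.0, target - achieved)
    return d_plus, d_minus

def archimedean_deviation(achievements, targets, weights):
    """Weighted sum Z of the deviation variables over all goals."""
    z = 0.0
    for achieved, target, weight in zip(achievements, targets, weights):
        d_plus, d_minus = deviation_variables(achieved, target)
        z += weight * (d_plus + d_minus)
    return z

# Made-up example with m = 2 goals.
achievements = [0.8, 1.2]
targets = [1.0, 1.0]
weights = [0.5, 0.5]

z = archimedean_deviation(achievements, targets, weights)
print(z)
```

Taking the positive and negative parts of the difference between achievement and target enforces the requirement that overachievement and underachievement never occur simultaneously for the same goal.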

3 ROBUST DESIGN OF DATA CENTER SERVER CABINETS

The approach for robust design of complex turbulent convective systems presented in this paper (see Figure

1) is demonstrated through application to data center server cabinets. The analysis model formulation and

robust design application is presented below.

3.1 What is a Data Center?

Data centers are computing infrastructures that house large quantities of data processing equipment. These

facilities have grown greatly in both size and power dissipation over the past decade, to as large as 5000 m²

dissipating several MW of power.² The data processing equipment is stored in 2 m high enclosures known

as cabinets. The demand for increased computational performance has led to very high power density

cabinet design, with a single cabinet dissipating up to 30 kW.² Thermal management is provided by

computer room air conditioning (CRAC) units that deliver cold air to the cabinets through perforated tiles

placed over an under-floor plenum. The cooling costs of data centers represent up to 40% of the energy

consumption of center operation.³

Thermal management difficulties in data centers, caused by the rapidly increasing power densities of

modern computational equipment, have led to very high flow rates of cooling air, resulting in turbulent flow

regimes with large variability in velocity magnitude. In data center server cabinets this variability is caused

by variable speed fans in the servers, CRAC units, and unsteady heat generation by the processors, yielding

a highly variable problem. However, these computers are required to operate with near 99.9999%

reliability. Furthermore, the high thermal gradients lead to hot spots and thermal inefficiency as hot

exhaust air is drawn into the cooling air stream, resulting in overheating. A desired objective in data center

design is uniformity in the temperature distribution, as there are few effective modeling approaches to cope

with variability or temperature gradients. This uniformity approach is not only thermally and economically

inefficient, but also often impractical to implement [8-10, 12, 13, 28].

The approach taken in this investigation is to create energy efficient and reliable solutions through effective

application of robust design to create server configurations that allow the designer to trade off between

ultimate thermal efficiency and operational stability. The thermal efficiency measures apply primarily to

the cooling air supplied by the CRAC units, as this is directly proportional to the continual operating cost of

the facility. Addressing these thermal management and reliability challenges will contribute significantly

towards increasing the data center’s thermal and economic efficiency.

3.2 Partitioning the Data Center Problem

In the example considered only a single cabinet is investigated. This partitioning is possible because the

cabinet is partially isolated from the data center, interacting only through the supply of cool air from the

raised floor plenum, and the exhausted hot air through the top of the cabinet. This allows the cabinet system

to be decoupled from the overall data center system. In this manner, the configuration of a complete data

center can be broken down into individual server cabinet configuration sub-problems, shown in Figure 4.

² The Uptime Institute, 2004, "Heat Density Trends in Data Processing, Computer Systems and Telecommunications Equipment", http://www.upsite.com/TUIpages/tuiwhite.html, accessed on 2/16 2004.

³ Lawrence Berkeley National Laboratory and Rumsey Engineers, 2003, "Data Center Energy Benchmarking Case Study", http://datacenters.lbl.gov/, accessed on 11/20 2003.

The following design reconfiguration possibilities are considered. (1) Equipment of differing power

density can be distributed within the cabinets for more efficient cooling. This can be implemented through

physical relocation of the hardware, and/or by distributing the processing tasks to reduce the load on critical

equipment [29-31]. (2) The volume of cooling air supplied to the cabinet can be increased, accomplished

via a CRAC unit output increase. A combination of these reconfiguration options is explored through the

following problem geometry.

3.3 Server Cabinet Problem Geometry

A fully enclosed vertical cabinet containing ten individual rack mounted servers has been selected as the

example system for investigation. A two dimensional model of a cross-section of a typical cabinet is

constructed as described in this section. This two dimensional model is a representative, although simple,

model of the system dynamics because of the orientation and symmetry of the servers. This cabinet

geometry is shown in Figure 4. It is noted that the formulation described in this paper has also been applied

to a higher fidelity three dimensional model at added computational expense [20].

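[Figure 4: (a) cabinet of height H and width W containing Servers 1 to 10 grouped into Sections a, b, and c, with cold supply air entering at velocity Vin through the bottom cutout of length Lc and hot exhaust air leaving at the top; (b) individual server of length Ls and height Hs with a fan model and isoflux blocks Qa,b,c; axes x and z]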

Figure 4 - Cabinet configuration & variables

The cabinet dimensions are height H = 1.93 m and width W = 0.87 m. Air enters the server cabinet

enclosure from the bottom cutout, Lc = 0.39 m at velocity Vin with temperature Tin, supplied through the

under floor plenum from the CRAC unit. The flow output of the CRAC units can be controlled resulting in

increased or decreased Vin; however, the complex flow patterns in the under floor plenum result in

significant variation. This variation is not accurately predicted by the RANS CFD codes used to model

plenum flow distributions [5, 32, 33], and thus this data must be estimated or empirically gathered. This

can be accomplished using a flow hood as used in [20], or other flow transducers such as a Pitot tube or hot

wire anemometer.

The cooling air is distributed within the cabinet and drawn through the various servers, as shown by the

flow arrows in Figure 4. Although internal flow patterns are complex, a mass balance exists under steady

state conditions between the air entering the cabinet and leaving through the top exhaust vent. The shaded

areas in Figure 4 (a) represent unfilled server racks where no air can flow. All solid surfaces are considered

no-slip, impermeable, and adiabatic. The system is analyzed at steady state, as transients are not of concern

in continually operational data center environments.

The individual server geometry is shown above in Figure 4 (b), where Ls = 0.61 m and Hs = 0.09 m. This

model has two isoflux blocks that act as flow obstructions, each representing a chip in a dual processor

server. Both blocks have a constant heat generation rate Q, which is dissipated through convective heat

transfer to the air flowing through the server. Note that these heated blocks are referred to as “chips” for

this illustrative design problem, although the two dimensional nature of the simulation means the heated

blocks are the same unit depth as the entire server. This simulated power dissipation requires lower heat

generation levels to maintain realistic chip temperatures, as enhanced chip level thermal management is not

being considered. The flow through the server is provided by a 130 CFM fan (0.0613 m³/s), modeled

by a cubic pressure–velocity relationship.
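The quoted fan rating can be checked with a short unit conversion. Only the conversion is shown; the coefficients of the cubic fan curve are not given in the text and are not assumed here. The function name is hypothetical.

```python
FT_TO_M = 0.3048  # metres per foot, exact by definition

def cfm_to_m3_per_s(cfm):
    """Convert a volumetric flow rate from cubic feet per minute to m^3/s."""
    return cfm * FT_TO_M ** 3 / 60.0

fan_flow = cfm_to_m3_per_s(130.0)
print(round(fan_flow, 4))
```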

The cabinet is divided into three sections: a, b and c, corresponding to the lower two, middle three, and

upper five servers as shown in Figure 4. Qa, Qb, and Qc denote the heat generation of each processor in the

respective cabinet section. This sectioning of the cabinet was performed in order to reduce the number of

design variables to simplify the illustrative example considered but is not a limitation of the approach.

4 SERVER CABINET ANALYSIS MODEL DEVELOPMENT

In this investigation, analytical models are developed for the fluid flow and heat transfer, which are

combined to create a cabinet system model. Before the meta-model can be developed and validated, CFD

analysis of the cabinet is required. The turbulent flow modeling RANS equations are solved using the

standard k-ε model in the CFD software FLUENT v. 6.1.22 [34]. Details on the simulation mesh and

convergence criterion can be found in [35]. The flow profile is shown in Figure 5 (a) for Vin = 0.95 m/s.

[Figure 5: (a) cabinet velocity field; (b) chip temperature (°C) versus inlet velocity (m/s) for Section a: servers 1-2, Section b: servers 3-5, and Section c: servers 6-10]

Figure 5 - Cabinet (a) velocity field (b) chip temperature profile

The cabinet temperature profile was found to be essentially isothermal, except for the thin thermal

boundary layers surrounding the chips. The resulting server chip temperatures for a parameter sweep of Vin

with all chip powers set to 60 W/m are shown in Figure 5 (b). The server temperature profile shows that the three

sections have unique responses. These clusters of server responses were used to

establish the cabinet sections a, b, and c shown in Figure 4 to arrive at a more manageable design problem.

4.1 Computing the POD Basis

The FMP flow model is built upon the basis functions created using the POD approach [14] described in

Section 2. A series of system observations, which can be either numerically determined or experimentally

gathered, is first collected into an ensemble and then mean centered. Mean centering the observation data

changes the problem to the reconstruction of a perturbation from an average condition, allowing the POD

modes to capture the less dominant system dynamics. Furthermore, for flow applications this mean

centering helps homogenize the boundary conditions. This mean centering adds a source function to the

expansion theorem in Eq. (1), where $u_o(x)$ is the ensemble average computed as the row-based average of

$u(x)$.

$$ u(x) = u_o(x) + \sum_{i=1}^{\infty} a_i \phi_i(x) \qquad (2) $$

The empirical basis $\phi_i$ is found by maximizing the projection of the observations $u(x)$ onto the basis

functions, solving the following constrained variational problem through extremizing the functional:

$$ \langle |(u, \phi)|^2 \rangle - \lambda \left( \|\phi\|^2 - 1 \right) \qquad (3) $$

where $\langle \cdot \rangle$ denotes ensemble averaging, $(\cdot,\cdot)$ is the L2 inner product, and $\|\cdot\|$ is the standard L2 norm,

assuming the observation data is sufficiently smooth. The constraint term $(\|\phi\|^2 - 1)$ is included to produce

a normalized basis. Variational calculus can be applied to express the functional in Eq. (3) as the integral

equation:

$$ \int_{\Omega} R(x, x') \phi(x') \, dx' = \lambda \phi(x) \qquad (4) $$

where $R(x, x') \equiv \langle u(x) \otimes u^*(x') \rangle$ is the cross-correlation function. To compute $R(x, x')$, an ensemble of m

system observations containing n DOF each is assembled as a matrix U. For the server cabinet example

these observations are the FLUENT CFD velocity and turbulent viscosity fields, for the set of inlet

velocities, $V_o = \{0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0\}$ m/s, creating the ensemble of observations:

$$ U = \{u_1, u_2, \ldots, u_m\} \in \mathbb{R}^{n \times m} \qquad (5) $$

Then the cross-correlation tensor of the observations is taken:

$$ R(x, x') = \frac{1}{m} U U^T \in \mathbb{R}^{n \times n} \qquad (6) $$

The eigenvectors of $R(x, x')$ are the basis functions $\phi_i$, called POD modes, and the eigenvalues determine

in decreasing magnitude the order of the modes. The eigenvalue spectrum is typically used as an ‘energy

criteria’ where the magnitude of each eigenvalue determines what portion of the total variation of the

system the corresponding eigenvector captures.
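The steps of this subsection (mean centering, forming the correlation of the observations, extracting eigenvectors, and ranking them by eigenvalue) can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the implementation used in this work: the ensemble is a synthetic stand-in for the CFD fields, the function names and the 99% energy threshold are hypothetical, and the eigenvectors of the n × n correlation matrix are obtained from a singular value decomposition of the mean-centered ensemble rather than by forming that matrix explicitly.

```python
import numpy as np

def pod_basis(U):
    """POD of an (n, m) ensemble holding one observation per column."""
    m = U.shape[1]
    u_mean = U.mean(axis=1, keepdims=True)   # ensemble (row-based) average
    U_c = U - u_mean                         # mean-centered observations
    # Left singular vectors of U_c are the eigenvectors of R = U_c U_c^T / m.
    phi, s, _ = np.linalg.svd(U_c, full_matrices=False)
    eigvals = s ** 2 / m                     # eigenvalues, in decreasing order
    energy = eigvals / eigvals.sum()         # fraction of variation per mode
    return u_mean, phi, eigvals, energy

def modes_for_energy(energy, threshold=0.99):
    """Smallest number of leading modes whose cumulative energy reaches threshold."""
    return int(np.searchsorted(np.cumsum(energy), threshold) + 1)

# Synthetic stand-in for CFD fields: n = 200 points, m = 9 inlet velocities.
x = np.linspace(0.0, 1.0, 200)
V = np.arange(0.0, 2.25, 0.25)               # 0, 0.25, ..., 2.0 m/s
U = np.stack(
    [v * np.sin(np.pi * x) + 0.1 * v**2 * np.sin(3 * np.pi * x) for v in V],
    axis=1,
)

u_mean, phi, eigvals, energy = pod_basis(U)
p = modes_for_energy(energy)
print(p, energy[:3])
```

Because the synthetic fields are built from only two spatial shapes, the eigenvalue spectrum collapses after the second mode, which mimics the sharp decay that allows a POD basis to be truncated.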

The basis produced by the POD can be proven to be the optimal linear decomposition, in the sense that more

energy is captured for a given number of modes than by any other linear decomposition [14]. Therefore in

general the first $p \le m$ POD modes will better represent a system than the first p modes of any other linear

decomposition. The POD is able to create such a large reduction in the number of DOF in a system

because the eigenvalue spectrum exhibits a sharp decay, implying that only a few modes are needed to

create an accurate system representation. Further accuracy enhancements and computational discussions

are presented in [19].

4.2 The Flux-Matching Procedure

With the POD modes computed, a method is required to enable the reconstruction of an arbitrary solution

within the bounds of the original observations. Thus the Flux-Matching Procedure (FMP) is developed, the

concept of which is to reconstruct a solution using the POD modes such that the sum of the weighted

modes satisfies the specified boundary conditions. This mass or energy flux across a control surface

$\Gamma_i \subset \partial\Omega$ can be mathematically represented as a flux function:

$$ F(u, \beta) = \int_{\Gamma_i} \rho \beta \, u \cdot \hat{n} \, ds \qquad (7) $$

Depending upon the transport phenomena being modeled, the parameter $\beta$ can be changed to describe the

flow of mass ($\beta = 1$), momentum ($\beta = u$), or energy ($\beta = E$). The mass flux case is used for the

reconstruction of the velocity field, and thus the application of Eq. (7) to a control surface $\Gamma_i$ yields the

mass flow rate $\dot{m}$. To reconstruct an approximate solution the fluxes are expressed as a vector of goals

$G \in \mathbb{R}^q$, for which a specific mass flux goal is desired through each of the set of q corresponding control

surfaces $\Gamma = \{\Gamma_1, \Gamma_2, \ldots, \Gamma_q\}$. This flux function defines the desired reconstructed flow field $u_r$ such that

$G = F(u_r)$, thus achieving the desired mass flow rates across the surfaces $\Gamma$. The solution procedure

is thus to find the set of weight coefficients that minimize the error on the set $\Gamma$:

$$ \min \left\| G' - \sum_{i=1}^{p} a_i F(\phi_i) \right\| \quad \text{where} \quad G' = G - F(u_o) \qquad (8) $$

The corrected mass flux goal vector $G'$ is required because the POD modes are mean centered; as such, the goals must also be defined as deviations from the mean. The modal summation is carried to $p$ modes, with $q \le p \le m$, because the optimal reconstruction may require fewer than the full spectrum of modes, but always at least as many modes as there are goals to match. This is the case if the summation in Eq. (8) is not convergent, and it is thus truncated at the point giving the lowest error with respect to the mass flow rate goals. The weight coefficients $a_i$ are found by assembling a coefficient matrix, C, by applying Eq. (8) to the q surfaces of the p POD modes:

$$C = F(\varphi) \in \mathbb{R}^{m \times q} \qquad (9)$$

Eq. (10) can then be applied, where $(\cdot)^{+}$ is the Moore-Penrose pseudo-inverse, yielding the least squares approximation.

$$a_i = C^{+} G' \qquad (10)$$

The strength of the FMP is that only enough POD modes need to be generated in order to accurately

represent the system dynamics, as no interpolative procedures are employed as have been used in previous

POD based reconstruction approaches [36-39]. Furthermore, this approach avoids the computationally

expensive Galerkin projection procedure, which is less efficient and can produce erroneous reconstructions

[19]. Because the POD modes satisfy the governing equations [19], their superposition creates a solution

that most closely matches the desired goals, yet still constrained by the system physics. Thus an accurate

boundary profile for the flux specified is retained in the reconstruction, despite using an integral

formulation.
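The weight computation of Eqs. (8)-(10) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all numbers are invented, the function names are hypothetical, and C is arranged here with one row per control surface and one column per POD mode so that the product in Eq. (10) conforms. The pseudo-inverse is formed as $C^T (C C^T)^{-1}$, which coincides with the Moore-Penrose pseudo-inverse only when C has full row rank.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def flux_matching_weights(C, G, F_mean):
    """Minimum-norm weights a = C^T (C C^T)^{-1} G', with G' = G - F(u_o).

    C[k][i] is the flux of POD mode i through control surface k
    (q rows, p columns; this layout is an assumption of the sketch).
    """
    q, p = len(C), len(C[0])
    G_prime = [G[k] - F_mean[k] for k in range(q)]
    CCt = [[sum(C[r][i] * C[t][i] for i in range(p)) for t in range(q)]
           for r in range(q)]
    y = solve_linear(CCt, G_prime)
    return [sum(C[k][i] * y[k] for k in range(q)) for i in range(p)]


# Hypothetical data: q = 2 control surfaces, p = 3 POD modes.
C = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 2.0]]
G = [1.2, 0.9]        # desired mass flow rates through each surface
F_mean = [1.0, 1.0]   # fluxes of the mean field u_o through each surface
a = flux_matching_weights(C, G, F_mean)

# Flux mismatch of the weighted modes against the mean-centered goals G'.
residual = [sum(C[k][i] * a[i] for i in range(3)) - (G[k] - F_mean[k])
            for k in range(2)]
```

With more modes than goals the system is underdetermined, so the minimum-norm solution is one reasonable choice among many; the truncation search over p described in the text is not reproduced here.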

The resulting FMP based cabinet flow model has only 9 DOF, representing a 5 order of magnitude decrease

from the CFD model. Computation of this reduced DOF model takes under 1 second, compared to ~½

hour for the CFD model, measured on a high end desktop PC (single Intel P4 2.4 GHz processor with 2 GB of RAM). Comparing the flow vector fields from the

FMP solution to a CFD generated case not part of the original observations reveals the FMP solution to

have less than ~5-10% difference over the entire domain [19]. In this section only the fundamentals of the

POD and FMP methods are presented. Further accuracy investigations and validation of the FMP can be

found in [19].

4.3 Heat Transfer Modeling Approach

With the flow field determined by the FMP based flow model, solving the energy equation is

straightforward. A finite volume approach is used to model the heat transfer. This is done using the power

law to approximate the steady state heat flux between adjacent control volumes in two dimensions, as given

in [40].

$$\underbrace{\frac{d}{dt}(T)}_{=\,0} + \frac{d}{dx}\left(uT - \frac{k_{eff}}{\rho c_p}\frac{dT}{dx}\right) + \frac{d}{dy}\left(vT - \frac{k_{eff}}{\rho c_p}\frac{dT}{dy}\right) = \frac{S}{\rho c_p} \qquad (11)$$

In this equation, $c_p$ is the specific heat, S the volumetric heat generation, and $\rho$ the fluid density. The effective thermal conductivity, $k_{eff}$, is computed using Eq. (12),

$$k_{eff} = k + \frac{c_p \mu_t}{Pr_t} \qquad (12)$$

where the turbulent effective viscosity, $\mu_t$, is computed using the FMP and the turbulent Prandtl number $Pr_t$ = 0.85 [34]. After validating the thermal model against analytical and accepted numerical solutions, the model is implemented for the cabinet geometry for a heat generation of 60 W/m per chip. The average difference in maximum chip temperatures between the finite volume model and FLUENT CFD model is found to be less than 2.5%, and thus adequate for this application.

4.4 Server Cabinet System Model

In any design problem the first step is to define the objectives and specifications, forming the problem goals and constraints. In this problem, the cabinet is to be configured such that it operates effectively and efficiently with minimum performance variation while using the minimum cooling air flow rate. This yields the following design objectives and specifications:

System Design Objectives:

Minimize flow rate of cooling air supplied to cabinet

Minimize server chip temperatures

Minimize sensitivity of configuration to changes in cabinet operating conditions

System Design Constraints:

All server chips must operate at under 85 °C

The cabinet must dissipate the required total heat load

These goals are explained and derived in detail in their application in the cDSP formulation. The next step

is to classify the control variables, noise factors, constants, and identify the appropriate system responses.

These variables and the system model schematic are shown below in Figure 6.

Figure 6 - Server Cabinet system model diagram

[Diagram of the Server Cabinet Model with an "iterate" loop; its labeled blocks are listed below.]

Control Variables (x): Inlet air velocity, Vin [0, 1] m/s; Section a chip power, Qa [0, 200] W; Section b chip power, Qb [0, 200] W; Section c chip power, Qc [0, 200] W

Constants (c): Total Cabinet Power, Qtotal [1.8, 2.4] kW

Response Parameters (y): Chip Temperatures, Ti (°C)

Goals: Minimize Inlet air velocity; Minimize Chip Temperatures; Minimize Chip Temperature Variation

Constraints: Total Cabinet Power Qtotal = Gpower; All Chip Temperatures < 85 °C

The control variables, x , represent the major controllable design parameters of the inlet air velocity and the

chip heat generation from the servers in sections a, b, and c as described in Section 3.3. These parameters

have the largest impact on system performance and need to be varied in order to achieve the different

design goals. The value of the response, y, is used to evaluate the objectives as well as the constraints in

the cDSP. Sources of noise, z, in this system come from variation in the cabinet geometry due to manufacturing tolerances, which has a negligible effect on the temperature and flow fields and hence no effect on the system response. The other source of noise is the inlet air temperature. The system response to variations in this parameter is linear and uncoupled from the rest of the control factors. Thus accounting for this variation is a trivial problem and is not considered in this investigation. For each solution, the total cabinet power, Qtotal, is held constant in order to find the most efficient and robust server configuration. The control variables and problem constants are input into the server cabinet model, and the response of the chip temperatures is monitored. The solution of this design problem using the cDSP is described next.

5 THE COMPROMISE DSP FOR ROBUST SERVER CABINET DESIGN

Following the mathematical formulation given in Table 1, the cDSP for the robust design of the server cabinet problem is given in Table 2.

Table 2 - The cDSP for server cabinet configuration using robust design

Given
    Response model of Total Cabinet Power, Inlet Air Velocity, and Server Temperature as functions of $x_1, x_2, x_3, x_4 = V_{in}, Q_a, Q_b, Q_c$
    $\Delta V_{in} = 0.1$ m/s
    $\Delta Q_a, \Delta Q_b, \Delta Q_c = f(x_i) = -0.1 x_i + 22$ W/m, $i = 2, 3, 4$    (13)
    Collected vector of variability bounds, $\Delta_j = \{\Delta V_{in}, \Delta Q_a, \Delta Q_b, \Delta Q_c\}$    (14)
    Target for total cabinet power, $G_{power}$ = 1800-2400 W/m
    Target for inlet velocity, $G_{vin}$ = 0.1 m/s
    Target for total chip temperature sum and their total maximum possible variation, $G_{temp}$ = 300 °C, $\delta T_{max}$ = 7657 °C
    Number of design variables, n = 4
    Number of inequality constraints, p = 1
    Number of equality constraints, q = 1
    Number of system goals, m = 3
    Number of servers, s = 10

Find
    The values of control factors:
        $x_1$, Inlet velocity, $V_{in}$
        $x_2$, Chip power for Section a, $Q_a$
        $x_3$, Chip power for Section b, $Q_b$
        $x_4$, Chip power for Section c, $Q_c$
    The values of deviation variables $d_i^+, d_i^-$, $i = 1, \ldots, n$

Satisfy
    The constraints:
        The individual server chip temperatures cannot exceed 85 °C
            $T_j + \sum_{i=1}^{n} \left| \frac{\delta T_j}{\delta x_i} \right| \cdot var_i \le 85$, $j = 1, \ldots, s$    (15)
        The mean total cabinet power must equal value $G_{power}$
            $4x_2 + 6x_3 + 10x_4 = G_{power}$    (16)
    The goals:
        Minimize inlet air velocity
            $\frac{G_{vin}}{x_1} + d_1^- - d_1^+ = 1$    (17)
        Bring chip temperatures to target
            $\frac{G_{temp}}{\sum_{i=1}^{s} T_i} + d_2^- - d_2^+ = 1$    (18)
        Minimize variation of chip temperatures
            $\frac{1}{\delta T_{max}} \sum_{j=1}^{n} \sum_{i=1}^{s} \left( \frac{\delta T_i}{\delta x_j} \right)^{2} var_j^{2} + d_3^- - d_3^+ = 0$    (19)
    The bounds:
        $0.2 \le x_1 \le 1$ (m/s)    (20)
        $20 \le x_i \le 200$, $i = 2, 3, 4$ (W/m)    (21)
        $d_i^+ \cdot d_i^- = 0$, with $d_i^+, d_i^- \ge 0$, $i = 1, \ldots, m$    (22)

Minimize
    The Archimedean objective function:
        $f = \sum_{i=1}^{m} W_i (d_i^+ + d_i^-)$, with $\sum_{i=1}^{m} W_i = 1$, $W_i \ge 0$, $i = 1, \ldots, m$    (23)

The derivation of Table 2 is discussed below, broken down by section.

Given

Using the system model identified in Figure 6 and the computational models developed, a response model of the server cabinet is developed of the form:

$$y = f(x) \qquad (24)$$

where y is a system response as a function of the control variables (see footnote 5). This model uses the FMP based flow model with input x1, the inlet air velocity. The flow field generated is passed to the finite volume heat transfer model with inputs x2, x3, x4, the chip heat generation rates for each cabinet section.

The variation of the control variables is determined through literature review and experience.

Manufacturers’ or experimental statistical data can also be used if available for more accurate

representation. For this investigation, a value of ∆Vin = 0.1 m/s corresponds to a ±5% velocity at the upper

bound of 1 m/s. The variation of ∆Qa, ∆Qb, and ∆Qc is given by Eq. (13) to determine the heat generation

variation in the different cabinet sections. Processors that are running continually will have a fairly

constant heat generation rate. To reduce the workload and hence heat generation on a processor, its

computational load is staggered creating a cyclic heat generation when the processor is computing or

waiting, and this cyclic process increases the variation of the heat generation rate. Equation (13) represents

this increased variation with a simple linear function. With the interval bounds representing the maximum

variation of each design variable defined, they are collected into a vector ∆j. Target values for the

responses are determined for the minimization goals by using the lower bound of the response; as such this

goal cannot be exceeded. This is 15 oC for the chip temperatures and 0.2 m/s for the inlet velocity. The

chip temperature goal, Gtemp is computed using the sum of the minimum server chip temperatures and

rounding down. For goals with a target of 0, such as the chip temperature variation goal, the maximum

total chip temperature variation of the system with respect to all design variables is computed using Eq.

(25).

$$\delta T(x) = \sum_{j=1}^{n} \sum_{i=1}^{s} \left( \frac{\delta T_i}{\delta x_j} \right)^{2} var_j^{2} \qquad (25)$$

In this equation, $\delta T_{max}$ from the Given section of the cDSP is found by applying Eq. (25) using the upper bound of x2, x3, and x4 and the lower bound of x1.
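The Given quantities of Eqs. (13), (14), and (25) can be sketched in a few lines. This is an illustrative sketch only: the design point and the temperature slopes are invented numbers, the function names are hypothetical, and the squared-term form of Eq. (25) follows the reconstruction used in this section.

```python
def chip_power_variation(q_chip):
    """Eq. (13): variability bound (W/m) for a section chip power q_chip (W/m)."""
    return -0.1 * q_chip + 22.0


def variability_vector(x, delta_vin=0.1):
    """Eq. (14): collect [dVin, dQa, dQb, dQc] for x = [Vin, Qa, Qb, Qc]."""
    return [delta_vin] + [chip_power_variation(q) for q in x[1:]]


def total_variation(dT_dx, delta):
    """Eq. (25): double sum over servers i and design variables j.

    dT_dx[i][j] is the slope of chip temperature i with respect to
    design variable j (hypothetical values in this sketch).
    """
    return sum((slope * delta[j]) ** 2
               for row in dT_dx
               for j, slope in enumerate(row))


# Hypothetical design point and slopes for two servers.
x = [0.5, 70.0, 85.0, 155.0]
delta = variability_vector(x)
dT_dx = [[-10.0, 0.02, 0.01, 0.0],
         [-20.0, 0.05, 0.10, 0.2]]
dT_total = total_variation(dT_dx, delta)
```

Note that Eq. (13) assigns the largest variability to lightly loaded sections, so the bound shrinks from 20 W/m at the 20 W/m lower bound to 2 W/m at the 200 W/m upper bound.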

Find

The design variables, and the associated deviation from the goal value associated with each design variable,

as discussed in Section 4.4, are the parameters to be found.

5 In the literature this equation is often of the form $y = f(x, z)$; however, in this application there are no noise variables ($z$).

Satisfy

For Type II robust design the mean and variability of the response are obtained using Taylor expansions of

the system response given in Eq. (24), yielding:

Mean of the Response: $\mu_y = f(x, z)$    (26)

Variance of the Response: $\sigma_y^2 = \sum_{i=1}^{n} \left( \frac{\partial f}{\partial x_i} \right)^{2} \Delta x_i^{2}$    (27)

Because the response model is deterministic, the mean in Eq. (26) is simply the value of the response. This

form of the variance in Eq. (27) is known as the Mean Value First Order Second Moment (MVFOSM)

method [41], and the combination of Eqs. (26)-(27) and the cDSP goal formulation given in Table 1 is used

to derive all of the goals and constraints, Eqs. (15)-(19). The derivatives are computed using the central difference technique, as no closed form solution exists. The rationale behind this mean

and variance approach for goals is given in Figure 3 and the accompanying text.
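A minimal sketch of the MVFOSM estimate of Eqs. (26)-(27) with central difference derivatives is given below. The response function `toy_temperature` is a made-up smooth stand-in for the cabinet model, and the step size `h` and all function names are assumptions of this sketch rather than values from the study.

```python
def central_difference(f, x, i, h=1e-4):
    """Approximate the partial derivative of f with respect to x[i]."""
    x_plus = list(x)
    x_minus = list(x)
    x_plus[i] += h
    x_minus[i] -= h
    return (f(x_plus) - f(x_minus)) / (2.0 * h)


def mvfosm(f, x, delta):
    """Return (mean, variance) of the response per Eqs. (26)-(27).

    The mean is the deterministic response at x; the variance sums the
    squared products of each slope and its variability bound delta[i].
    """
    mean = f(x)
    variance = sum((central_difference(f, x, i) * delta[i]) ** 2
                   for i in range(len(x)))
    return mean, variance


def toy_temperature(x):
    """Hypothetical chip temperature: rises with power, falls with airflow."""
    vin, qa, qb, qc = x
    return 20.0 + (0.10 * qa + 0.20 * qb + 0.15 * qc) / vin


x = [0.5, 70.0, 85.0, 155.0]      # [Vin, Qa, Qb, Qc], invented design point
delta = [0.1, 15.0, 13.5, 6.5]    # variability bounds from Eqs. (13)-(14)
mean_T, var_T = mvfosm(toy_temperature, x, delta)
```

Each call to `mvfosm` costs 2n + 1 model evaluations, which is affordable here only because the reduced order model evaluates in under a second.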

All goal equations in Table 2, Eqs. (17)-(19), are formulated using the approach described in [16]. For a

data center server cabinet reliability and operational stability are of utmost concern. Therefore, the server

configuration should minimize the potential impact of one server’s thermal load on the rest of the system.

Through the consideration of the minimization of the chip temperature variation with respect to all system

parameters, the consequences of one server overheating are greatly reduced. This goal is reflected by Eq.

(19). The temperature variation is to be minimized for all servers, accounting for variation in all design

variables. Therefore the summation of the variation of the response for each server is computed, and

repeated for all design variables, resulting in the double summation in Eq. (19). Following the formulation

of absolute minimization goals for the cDSP, this value is divided by the maximum possible variation, as

computed in the Given section of the cDSP in Table 2.

It has been shown that processors are more reliable when kept cool; thus, the goal of achieving chip temperatures of Gtemp is given in Eq. (18). Note that the response is computed using the sum of the server

chip temperatures, as the minimization of this summation is equivalent to the minimization of each server

individually with equal emphasis, ensuring the most energy efficient solution is found. Lastly, as the costs

associated with cooling a data center can represent up to 40% of the operating cost, the goal of minimizing

the flow rate of air used to cool the processors, proportional to the inlet air velocity, should be pursued.

This conservation goal is embodied in Eq. (17).

As discussed in Section 2, the worst case scenario handling of the constraints is modeled as:

$$g_j(x) + \Delta g_j \le 0, \quad j = 1, \ldots, p \qquad (28)$$

Here the function $g_j(x)$ yields the value of the constraint function, in this application the chip temperatures of the servers. This mean value is added to the maximum response variation attainable through the variability of the control variables, given by $\Delta g_j$:

$$\Delta g_j = \sum_{i=1}^{n} \left| \frac{\partial g_j}{\partial x_i} \right| \Delta x_i, \quad j = 1, \ldots, p \qquad (29)$$

This worst case treatment of the constraints is appropriate in this application as violation of a constraint is

serious, resulting in a potentially disastrous overheating of the servers. Equations (28) and (29) are applied

directly to the server chip temperatures forming Eq. (15). Here the absolute value of the variation of the

server temperature response is computed for each of the design variables and added together, yielding the

maximum possible temperature. This is computed for all servers to ensure this constraint is met for the

entire cabinet.
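The worst case constraint check of Eqs. (15), (28), and (29) can be illustrated as follows. This is a sketch with invented temperatures and slopes; the function names are hypothetical, and in practice the slopes would come from central differences on the response model.

```python
def worst_case_response(nominal, gradient, delta):
    """Nominal response plus the sum of |slope| * variability (Eq. (29))."""
    return nominal + sum(abs(g) * d for g, d in zip(gradient, delta))


def chips_within_limit(temps, gradients, delta, limit=85.0):
    """Eq. (15): every server's worst case chip temperature must not exceed limit."""
    return all(worst_case_response(t, g, delta) <= limit
               for t, g in zip(temps, gradients))


# Hypothetical nominal temperatures (deg C) and slopes for two servers.
delta = [0.1, 15.0, 13.5, 6.5]             # [dVin, dQa, dQb, dQc]
temps = [70.0, 78.0]
gradients = [[-10.0, 0.02, 0.01, 0.0],
             [-20.0, 0.05, 0.10, 0.2]]

upper_bounds = [worst_case_response(t, g, delta)
                for t, g in zip(temps, gradients)]
feasible = chips_within_limit(temps, gradients, delta)
```

Taking absolute values means every variation is assumed to push the temperature upward at once, which is conservative by design.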

The equality constraint, the total cabinet power level, is computed using only the nominal response values

of the constraint function. This is because of the nature of an equality constraint, where the inclusion of

variability in a worst case scenario does not make sense as there is no way to ensure the constraint is

always met, only that it will be met by the average conditions, and hence its form in Eq. (16). The bounds

on the control factors keep the problem from diverging during the search, as well as providing simple

constraints. These bounds were established as shown in Eqs. (20)-(21) by evaluating sensible limits based

on the FMP flow model requirements and system response.

Minimize

The solution to the cDSP is the combination of control factors that minimizes the total deviation function, Eq. (23), representing the objectives of thermal efficiency and reliability. The priority of the multiple goals is implemented through weighting each deviation variable. Variation of these weights can be performed to

change designer preferences of one goal over another, yielding different solutions.
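How the goal equations feed the deviation function can be sketched as below, using the goal forms of Eqs. (17)-(19) and the targets listed in Table 2 as defaults. The response values passed in are invented, and the helper names are hypothetical; this is not the authors' solver, only the objective evaluation for one candidate design.

```python
def deviation_pair(achievement, target):
    """Return (d_minus, d_plus) with achievement + d_minus - d_plus = target.

    At most one of the two is nonzero, which satisfies Eq. (22).
    """
    gap = target - achievement
    return (gap, 0.0) if gap >= 0.0 else (0.0, -gap)


def archimedean_deviation(x1, temp_sum, variation, weights,
                          g_vin=0.1, g_temp=300.0, dT_max=7657.0):
    """Weighted sum of deviation variables, Eq. (23), for the three goals."""
    goals = [
        (g_vin / x1, 1.0),          # Eq. (17): minimize inlet air velocity
        (g_temp / temp_sum, 1.0),   # Eq. (18): bring chip temperatures to target
        (variation / dT_max, 0.0),  # Eq. (19): minimize temperature variation
    ]
    total = 0.0
    for w, (achievement, target) in zip(weights, goals):
        d_minus, d_plus = deviation_pair(achievement, target)
        total += w * (d_minus + d_plus)
    return total


# Hypothetical responses for one candidate configuration.
total = archimedean_deviation(x1=0.5, temp_sum=600.0, variation=765.7,
                              weights=(0.5, 0.25, 0.25))
```

A search algorithm would call such a function repeatedly while enforcing the constraints and bounds of Table 2.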

6 RESULTS AND DISCUSSION

With the server cabinet design problem specified, it is solved in two different scenarios. Each scenario has

different design objectives to highlight the flexibility of the robust design approach to achieve the desired

results. Before these cases can be run, a baseline evaluation is performed for comparison with the more

efficient configurations. The algorithm used to find the minimum is Sequential Quadratic Programming

(SQP) [42] implemented using the MATLAB Optimization Toolbox.

6.1 Baseline Evaluation

For the baseline case the design variables x2, x3, and x4, the server chip heat generation rates, are lumped

into a single variable. This system represents a traditional server cabinet configuration, where

dynamic distribution of the workload is not considered, as discussed in [29-31]. The maximum cabinet

total power was found to be just over 1600 W/m with an inlet air velocity of 0.54 m/s, constrained by the

85 oC temperature constraint for server 1. Note that the maximum allowable cabinet power was found

before the design variable Vin reached its upper bound, indicating that because of the flow distribution


within the cabinet, simply supplying more cold air from the CRAC units is not an effective cooling

solution.

6.2 Increasing Thermal Efficiency

In this scenario, an existing data center facility receives a batch of new high power density servers to be

integrated into the existing facility. This problem translates to how to distribute the high power servers in

the cabinet, and what volume of cooling air to supply the cabinet with in order to reliably meet the increase

in total cabinet power requirements.

To investigate this problem, the total cabinet heat generation was incremented from 1800 to 2400 W/m,

beyond which the problem constraints could not be met. This heat load range represents the lower bound

where the minimum flow rate of cooling air is required, to the maximum total cabinet power that can be

sustained. For each of these incremental heat loads the most energy efficient configuration is found that

simultaneously minimizes the volume of cooling air, the chip temperatures, and the variation of the chip

temperatures, as established by objective Eqs. (17)-(19). The weighting of the goals was established as:

$$W = \{0.5,\ 0.25,\ 0.25\} \qquad (30)$$

This weighting puts equal emphasis on the cooling energy conservation objective and server reliability

objectives. The resulting values of inlet air velocity and chip power for each cabinet section for increasing

total cabinet power levels are presented in Figure 7(a).

Figure 7 – (a) Inlet air velocity and power distribution (b) maximum chip temperature and bounds vs. total cabinet power

[Plot (a): inlet air velocity (m/s) and section chip power (W) for Sections a, b, and c vs. total cabinet power (W). Plot (b): maximum chip temperature (°C), shown as mean, upper bound, and lower bound, vs. total cabinet power (W).]

From Figure 7(a), the volume of cooling air required to maintain reliable server operation increases in an exponential fashion. This increase is to be expected, and from this curve a general estimate of cooling costs for various heat loads can be extrapolated based on CRAC unit operating costs for the facility. Also in Figure 7 it is evident that as the total power level increases, the server power distribution also must change, adapting to the new flow conditions and resulting temperature fields for maximum efficiency. At

the inlet velocity of 0.54 m/s as used in the most efficient baseline case, the cabinet is dissipating nearly

2250 W/m when using a more thermally efficient power distribution. This shows that through efficiently

utilizing the airflow distribution within the server cabinet, much more power can be reliably dissipated

using the same volume of cooling air over a uniform power distribution.
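As a rough check on these figures, the relative gains can be worked out from the rounded values quoted in the text. The rounding is an assumption, since the baseline is given only as "just over" 1600 W/m and the reconfigured cabinet as "nearly" 2250 W/m; 2400 W/m is the upper limit of the heat load range considered.

```latex
\[
\frac{2250}{1600} \approx 1.41 \quad (\text{about } 41\% \text{ more at the same } 0.54\ \text{m/s}),
\qquad
\frac{2400}{1600} = 1.50 \quad (50\% \text{ more at the upper limit}).
\]
```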

In order to check that the optimization algorithm has correctly converged, the maximum temperature bounds are presented in Figure 7(b). In this figure the maximum chip temperature from all the servers is plotted versus total cabinet power level. It is evident that the maximum chip temperature constraint, as set by the worst case scenario constraint in Eq. (15), is never violated. In this manner it is the temperature upper bound, not the mean value, that is continually at 85 °C. It is also evident in this figure how the temperature mean and variability respond to increasing cabinet heat loads and the resulting changes in power distribution and inlet air velocity.

To validate the solutions of the cDSP, converged cases for 1800, 2100, and 2400 W/m power levels were

simulated using the CFD model, testing the full range of solutions produced. It was found that the CFD

results yielded chip temperatures within an average of 5% of the FMP computed solution. On a higher

level of validation, the power distribution of the servers found to be most efficient yields an approximate

hyperbolic tangent, demonstrated to be a highly efficient configuration by [43]. This result is encouraging,

as the investigation was computed using a very high fidelity three dimensional CFD analysis of a cabinet

with close to 2 million nodes.

6.3 Robust vs. Optimal Cabinet Configuration

The linear weighting system used in the cDSP gives only a rough mathematical translation of the designer’s

emphasis upon the goals sought in its formulation. The a priori selection of numerical values that

accurately represent the designer’s preferences for a complex, non-linear system such as the server cabinet

example is very difficult. This is of particular interest for the tradeoff between the goals of optimal energy

efficiency (the goal of minimizing the supply air rate) and the robust solution (defined as the minimization

of variance in the temperature response). In order to investigate the tradeoffs between the robust and

optimal solutions, a Pareto frontier is developed between the two solution points.

The Pareto frontier is traced out through changing the weights in the Archimedean objective function in the

cDSP. This approach of plotting a Pareto curve between the optimal and robust solution points is

investigated in [44] for simple design problems, however the focus is upon the development of this frontier

for problems where a linear weighting may not identify all points along the frontier. In this application the

linear weighting approach was found to provide an adequate mapping of the frontier.

A Pareto frontier for a constant total cabinet power, Qtotal = 2300 W/m, is constructed showing the feasible limit of each design variable as the goal changes from an optimal to a robust solution. To generate this frontier the weighting of the inlet air velocity minimization goal and that of the minimization of the variation of chip temperatures goal are varied from 1 to 0 and 0 to 1 respectively, while the minimization of chip temperatures goal is weighted with a 0, defining W as:

$$W(i) = \{1 - i,\ 0,\ i\}, \quad i = 0, 0.1, \ldots, 1 \qquad (31)$$

The resulting Pareto frontier is plotted in Figure 8 for the response and all variable combinations.

Figure 8 - Pareto frontiers with changing weighting

[Subplot (y): average chip temperature (°C) vs. inlet air velocity (m/s). Subplots (a)-(c): section chip power (W) vs. inlet air velocity (m/s) for cabinet sections a-c, each with the feasible space marked.]
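The weight sweep of Eq. (31) that traces the frontier can be sketched as follows. The function name is hypothetical, and the ordering of the weights (inlet velocity goal, chip temperature goal, temperature variation goal) is assumed to follow the order of Eqs. (17)-(19); each weight vector would be passed to a separate solution of the cDSP.

```python
def pareto_weights(steps=10):
    """Generate W(i) = (1 - i, 0, i) for i = 0, 1/steps, ..., 1 per Eq. (31)."""
    weights = []
    for k in range(steps + 1):
        i = k / steps
        weights.append((1.0 - i, 0.0, i))
    return weights


sweep = pareto_weights()
# sweep[0] emphasizes only the inlet velocity goal (the optimal end);
# sweep[-1] emphasizes only the temperature variation goal (the robust end).
```

Dividing an integer counter by `steps` avoids the accumulated rounding error of repeatedly adding 0.1.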

The limits of the feasible design space are shown in Figure 8 subplots (a-c). The variation in the response

is shown in subplot (y). The leftmost point corresponds to the optimal solution parameters, the rightmost to

the robust solution parameters. The line connecting the two endpoints represents design parameters for a

combination of both goals, where the minimum inlet air velocity is plotted against the maximum heat

generation for each server section, shown in subplots (a-c) corresponding to the cabinet section a-c. Any

region to the right of this curve is feasible, but only points on the frontier represent most efficient

configurations.

This plot demonstrates the differences in design parameters that would occur for a data center that is highly efficient and has little variability, lending itself to a more optimal solution, versus a data center that is more loosely controlled or needs a high level of reliability, requiring a more robust solution. The concept of the Pareto frontier is to investigate the requirements of obtaining this more robust solution. Viewing Figure 8, as the priority changes from optimal to robust, the point spacing increases slightly, showing more cooling air flow is required for only a slightly more robust solution. Subplot (y) further shows that the chip temperatures do not decrease linearly either. This means that a point towards the middle

of the curve represents the best balance of minimization of cooling air flow rate and temperature variation

minimization. The designer, accounting for the amount of variability in the system under consideration,

specifies the location of this point, yielding the final design parameters.

More important than analysis of the server chip temperatures is the amount of variability in the temperature

response. In order to create a measure for this value for the entire cabinet the sum of the absolute value of

the slope of the temperature response with respect to the design variables is computed:

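$$S_{Vin} = \sum_{i=1}^{s} \left| \frac{\delta T_i}{\delta x_1} \right| \qquad (32)$$

$$S_{Q} = \sum_{j=2}^{n} \sum_{i=1}^{s} \left| \frac{\delta T_i}{\delta x_j} \right| \qquad (33)$$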

where n is the number of design variables and s is the number of servers. This is divided into two functions

as the units of the slopes are different. Equation (32) computes the slope of the temperature response with

respect to Vin, and Eq. (33) with respect to the sectional chip powers Qa,b,c, assuming a worst case scenario.

Plotting these responses as a function of the weighting value W as it is changed from optimal to robust

yields the following plots:

[Plots: $S_{Vin}$ and $S_Q$ vs. weighting value, i, from 0 to 1, with the optimal solution and robust solution marked.]

Figure 9 - Cabinet chip temperature variability for optimal to robust design objectives
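The variability measures of Eqs. (32)-(33) reduce to two sums over a table of temperature slopes. The sketch below uses invented slopes for two servers and a hypothetical function name; column 0 is assumed to hold the slope with respect to Vin and the remaining columns the slopes with respect to the sectional chip powers.

```python
def sensitivity_sums(dT_dx):
    """Return (S_Vin, S_Q) from slopes dT_dx[i][j] of chip i w.r.t. variable j.

    S_Vin sums |slope| over servers for the inlet velocity (Eq. (32));
    S_Q sums |slope| over servers and all chip power variables (Eq. (33)).
    """
    s_vin = sum(abs(row[0]) for row in dT_dx)
    s_q = sum(abs(value) for row in dT_dx for value in row[1:])
    return s_vin, s_q


# Hypothetical slopes: deg C per m/s in column 0, deg C per W/m elsewhere.
dT_dx = [[-20.0, 0.2, 0.1, 0.0],
         [-25.0, 0.1, 0.3, 0.2]]
s_vin, s_q = sensitivity_sums(dT_dx)
```

The two sums are kept separate because their units differ, as noted in the text.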

Viewing Figure 9, computing the rough average temperature variability per W/m increase in power generation for each server is possible by dividing S by 10. The more robust solution point reduces the

potential variation in chip temperatures by an average of 7 oC per m/s change in Vin and 0.4 oC per W/m

change in Q. This means using the fairly conservative bounds of variability used in this investigation, the

average variability is reduced by close to 5 oC, or 20%. Although this may seem insignificant, it is

important to remember that the CRAC units can accurately control the room temperature to a single degree,

and are operating continuously, 24 hours a day, 7 days a week, 365 days a year, and thus this reduction

constitutes significant savings. Note that this curve was generated for a cabinet power close to the upper

limit of the system, and by using a lower total cabinet power of 2000 W/m the average variability is

reduced by close to 15 oC, or 60%.

This increased operational stability is obtained not through changing the source of the variability, but only

by re-configuring the cabinet. The cost of this increased stability is a redistribution of the power load,

which has no negative connotations, and an increase in the output of the CRAC units to provide the server

cabinet with an increase of 0.2 m/s flow rate of supply air. Further benefit of this configuration is the

reduction of chip temperatures by 3 oC. Therefore the final tradeoffs between a robust solution, optimal

solution, or anywhere in between are known to the designer. The final decision will be based upon the

amount of variability in the data center, and the cost of increasing the flow rate of the CRAC units versus

the cost of lowering the supply air temperature; there is no universal “degradation” of the solution moving

along the Pareto frontier. Overall, this Pareto approach gives the designer a much greater deal of

information and freedom in configuring the data center cabinets for their desired goals over a single

application of the weighted sum approach.

7 CLOSURE

The results of using the proposed approach to design a robust server cabinet configuration are promising. The key results are:

• 50% more power than a uniform distribution can be reliably dissipated while maintaining equal emphasis on energy efficiency and stability.

• A 20-60% reduction in the average potential variability of the processors can be achieved through emphasizing design robustness.

• Any solution between the optimal and robust can be selected from the family of solutions along the Pareto frontier generated by the cDSP.

• The small degree of analysis error incurred through assumptions and approximate models is nullified through the robustness of the solutions obtained, verified through CFD analysis.

In our opinion, the proposed approach represents a step towards addressing the challenge of reliable data center thermal management. Further, we assert that the proposed approach can be used to increase the thermal efficiency, considerably reducing the energy costs and environmental impact of operating a data center, while simultaneously increasing the operational stability of the center, reducing the cost associated with downtime and backup system maintenance.

The approach presented is founded upon the integration of three constructs: the FMP augmented POD, robust design principles, and the cDSP, to solve the challenges of flow complexity, system variability, and multiple objective tradeoffs, as shown in Figure 1 and described in Section 2. The viability of the approach is demonstrated through the application to the data center server cabinet example in Section 5. Analysis of the results obtained shows that the approach enables the computation of superior solutions, both in ultimate power dissipation and reduction in variability, over the traditionally implemented method, described in Section 6. Although the robust design implementation is simple, the results are still effective, and the meta-model can be further integrated with any more complex robust design implementation. In this paper only a single, albeit complex, example is presented. However, the FMP meta-modeling approach has been applied to many problems of varying scale and complexity [20], as have the cDSP and robust design methods. Hence there is no fundamental reason this proposed approach cannot be extended to the more general domain of the robust design of thermal-fluid systems with equally successful results.

8 ACKNOWLEDGEMENTS

The authors acknowledge the support of the Consortium for Energy Efficient Thermal Management

(CEETHERM), a joint initiative between Georgia Institute of Technology and the University of Maryland.

9 REFERENCES

[1] Pope, S.B., Turbulent Flows. 2000, New York: Cambridge University Press. [2] Launder, B.E. and Spalding, D.B., Lectures in Mathematical Models of Turbulence. 1972,

London, England: Academic Press. [3] Schmidt, R. and Iyengar, M. "Effect of Data Center Layout on Rack Inlet Air Temperatures".

ASME InterPACK. 2005. San Francisco, California, USA: ASME, IPACK2005-73385. [4] Schmidt, R., Karki, K.C., Kelkar, K.M., Radmehr, A., and Patankar, S.V. "Measurements and

Predictions of the Flow Distribution Through Perforated Tiles in Raised Floor Data Centers". The Pacific Rim / ASME International Electronics Packaging Technical Conference and Exhibition. 2001. Kauai, Hawaii, IPACK2001-15728.

[5] Patel, C., Bash, C., Belady, C., Stahl, L., and Sullivan, D. "Computational Fluid Dynamics Modeling of High Compute Density Data Centers to Assure System Inlet Air Specifications". IPACK'01 - The Pacific Rin/ASME International Electronics Packaging Technical Conference and Exhibition. 2001. Kauai, Hawaii: ASME, IPACK2001-15622.

[6] Iwasaki, H. and Ishizuka, M. "Natural Convection Air Cooling Characteristics of Plate Fins in a Ventilated Electronic Cabinet". ITHERM 1998 - Sixth Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems. 1998. Seattle, Washington, p. 124-129.

[7] Patel, C.D., Sharma, R., Bash, C., and Beitelmal, M. "Thermal Considerations in Cooling of Large Scale High Compute Density Data Centers". ITHERM 2002 - Eighth Intersociety Conference on Thermal and Thermomechanical Phenomena in Electronic Systems. 2002. San Diego, California, p. 767-776.

[8] Shrivastava, S., Sammakia, B., Schmidt, R., and Iyengar, M. "Comparative Analysis of Different Data Center Airflow Management Configurations". ASME InterPACK. 2005. San Francisco, California, USA: ASME, IPACK2005-73234.

[9] Rambo, J. and Joshi, Y. "Multi-Scale Modeling of High Power Density Data Centers". InterPACK03 - The Pacific Rim / ASME International Electronics Packaging Technical Conference and Exhibition. 2003. Kauai, Hawaii, InterPack2003-35297.

[10] Rambo, J. and Joshi, Y., Thermal Modeling of Technology Infrastructure Facilities: A Case Study of Data Centers, in The Handbook of Numerical Heat Transfer, p. 821-849, W.J. Minkowycz, E.M. Sparrow, and J.Y. Murthy, Editors. New York: Taylor and Francis, 2006.

[11] Shah, A., Carey, V., Bash, C., and Patel, C. "Exergy-Based Optimization Strategies for Multi-Component Data Center Thermal Management: Part I, Analysis". ASME InterPACK. 2005. San Francisco, California, USA: ASME, IPACK2005-73137.

[12] Iyengar, M., Schmidt, R., Sharma, A., McVicker, G., Shrivastava, S., Sri-Jayantha, S., Amemiya, Y., Dang, H., Chainer, T., and Sammakia, B. "Thermal Characterization of Non-Raised Floor Air Cooled Data Centers Using Numerical Modeling". ASME InterPACK. 2005. San Francisco, California, USA: ASME, IPACK2005-73387.

[13] Bhopte, S., Agonafer, D., Schmidt, R., and Sammakia, B. "Optimization of Data Center Room Layout to Minimize Rack Inlet Air Temperature". ASME InterPACK. 2005. San Francisco, California, USA: ASME, IPACK2005-73027.

[14] Holmes, P., Lumley, J.L., and Berkooz, G., Turbulence, Coherent Structures, Dynamical Systems and Symmetry. 1996, Great Britain: Cambridge University Press.

[15] Chen, W., Allen, J.K., Tsui, K., and Mistree, F., 1996, "A Procedure for Robust Design: Minimizing Variations Caused by Noise Factors and Control Factors". ASME Journal of Mechanical Design. 118: p. 478-485.

[16] Mistree, F., Hughes, O.F., and Bras, B., The Compromise Decision Support Problem and the Adaptive Linear Programming Algorithm, in AIAA Structural Optimization: Status and Promise, p. 247-286, M.P. Kamat, Editor. Washington, D.C.: AIAA, 1993.

[17] Simpson, T., Peplinski, J., Koch, P., and Allen, J., 2001, "Metamodels for Computer-Based Engineering Design: Survey and Recommendations". Engineering With Computers. 17: p. 129-150.

[18] Loeve, M., Probability Theory. 1955, Princeton, NJ: Van Nostrand.

[19] Rambo, J. and Joshi, Y. "Reduced Order Modeling of Steady Turbulent Flows Using the POD". ASME Summer Heat Transfer Conference. 2005. San Francisco, California, USA: ASME, HT2005-72143.

[20] Rolander, N., 2005, "An Approach for the Design of Data Center Server Cabinets for Thermal Efficiency," MS Thesis, George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA.

[21] Lumley, J., The Structure of Inhomogeneous Turbulent Flows, in Atmospheric Turbulence and Radio Wave Propagation, p. 166-178, A.M. Yaglom and V.I. Tatarsky, Editors. Nauka, Moscow, 1967.

[22] Aubry, N., Holmes, P., Lumley, J., and Stone, E., 1988, "The Dynamics of Coherent Structures in the Wall Region of a Turbulent Boundary Layer". Journal of Fluid Mechanics. 192: p. 155-173.

[23] Sirovich, L., 1987, "Turbulence and the Dynamics of Coherent Structures, Part II: Symmetries and Transformations". Quart. Appl. Math. XLV(N3): p. 573-582.

[24] Berkooz, G., Holmes, P., Lumley, J., and Mattingly, J., 1997, "Low-Dimensional Models of Coherent Structures in Turbulence". Physics Reports - Review Section of Physics Letters. 287(N4): p. 338-384.

[25] Webber, G., Handler, R., and Sirovich, L., 1997, "The Karhunen-Loeve Decomposition of Minimal Channel Flow". Physics of Fluids. 9(4): p. 1054-1066.

[26] Moin, P. and Moser, R., 1989, "Characteristic-Eddy Decomposition of Turbulence in a Channel". Journal of Fluid Mechanics. 200: p. 417-509.

[27] Ball, K., Sirovich, L., and Keefe, L., 1991, "Dynamical Eigenfunction Decomposition of Turbulent Channel Flow". International Journal for Numerical Methods in Fluids. 12: p. 585-604.

[28] Rambo, J. and Joshi, Y. "Physical Models in Data Center Airflow Simulations". IMECE-03 - ASME International Mechanical Engineering Congress and R&D Exposition. 2003. Washington D.C., IMECE03-41381.

[29] Boucher, T.D., Auslander, D.M., Bash, C.E., Federspiel, C.C., and Patel, C.D. "Viability of Dynamic Cooling Control in a Data Center Environment". Inter Society Conference on Thermal Phenomena. 2004: IEEE, p. 593-600.

[30] Sharma, R.K., Bash, C., Patel, C.D., Friedrich, R.J., and Chase, J.S., "Balance of Power: Dynamic Thermal Management for Internet Data Centers". 2003, Whitepaper issued by Hewlett-Packard Laboratories, Palo Alto, Technical Report: HPL-2003-5.

[31] Patel, C., Sharma, R., Bash, C., and Graupner, S. "Energy Aware Grid: Global Workload Placement based on Energy Efficiency". International Mechanical Engineering Congress and Exposition. 2003. Washington, D.C., IMECE 2003-41443.

[32] VanGilder, J.W. and Schmidt, R. "Airflow Uniformity Through Perforated Tiles in a Raised-Floor Data Center". ASME InterPACK. 2005. San Francisco, California, USA: ASME, IPACK2005-73375.

[33] Radmehr, A., Schmidt, R., Karki, K.C., and Patankar, S.V. "Distributed Leakage Flow in Raised-Floor Data Centers". ASME InterPACK. 2005. San Francisco, California, USA: ASME, IPACK2005-73273.

[34] Fluent Incorporated, Fluent v. 6.1 Users Manual. 2001, Lebanon, New Hampshire: Fluent Incorporated.

[35] Rolander, N., Rambo, J., Joshi, Y., and Mistree, F. "Robust Design of Air-Cooled Server Cabinets for Thermal Efficiency". ASME InterPACK. 2005. San Francisco, California, USA: ASME, IPACK2005-73171.

[36] Deane, A.E., Kevrekidis, I.G., Karniadakis, G.E., and Orszag, S.A., 1991, "Low-Dimensional Models for Complex Geometry Flows: Application to Grooved Channels and Circular Cylinders". Physics of Fluids A. 3(10): p. 2337-2354.

[37] Ma, X. and Karniadakis, G.E., 2002, "A Low-Dimensional Model for Simulating Three-Dimensional Cylinder Flows". Journal of Fluid Mechanics. 458: p. 181-190.

[38] Park, H.M. and Cho, D.H., 1996, "Low Dimensional Modeling of Flow Reactors". International Journal of Heat and Mass Transfer. 36: p. 359-368.

[39] Sirovich, L. and Tarman, I.H., 1998, "Extensions to the Karhunen-Loeve based Approximations of Complicated Phenomena". Computer Methods in Applied Mechanics and Engineering. 155: p. 359-368.

[40] Patankar, S.V., Numerical Heat Transfer and Fluid Flow. 1980, New York: McGraw Hill.

[41] Phadke, M.S., Quality Engineering using Robust Design. 1989, Englewood Cliffs, New Jersey: Prentice Hall.

[42] Gill, P., Murray, E.W., and Wright, M.H., Practical Optimization. 1981, London: Academic Press.

[43] Rambo, J. and Joshi, Y., 2005, "Arranging Servers in a Data Processing Cabinet to Optimize Thermal Performance". ASME Journal of Electronics Packaging. (Publication appearing in Dec 2005).

[44] Mourelatos, Z.P. and Liang, J. "An Efficient Unified Approach for Reliability and Robustness in Engineering Design". NSF Workshop on Reliable Engineering Computing. 2004. Savannah, Georgia, p. 127-138.
