ATCA and MicroTCA Guide



  • www.eecatalog.com/atca

Engineers' Guide to AdvancedTCA & MicroTCA

    Annual Industry Guide AdvancedTCA, MicroTCA and AdvancedMC solutions

    Techniques for Measuring ACLR Performance in LTE Transmitters

    Plus: PICMG retrospective; Then and...Now 40G

    EECatalog

    Featured Products

    From Emerson: ATCA-8310 DSP/Media Processing Blade

    Adax PacketRunner

    Gold Sponsors

    Scan this QR code to subscribe

    40Gb Migration Drives ATCA Growth

Joe Pavlat: PICMG

From ADLINK Technology Inc.:

    AdvancedTCA Blade with Dual Intel Xeon Processors E5-2658/2648L

  • If it's embedded, it's Kontron.

When you want the benefits of cloud-based, virtualized computing plus the reliability, predictability and quality of service of telecommunication networks, you need the Carrier Cloud - a service that works how you need it and when you need it.

If you are building Carrier Cloud solutions, you need Kontron. Telecom OEMs have been relying on carrier-grade boards, blades and platforms from Kontron for over 25 years. Always keeping its customers current with the latest technologies, Kontron has two new products based on the latest Intel Xeon processor E5-2600 series. Designed to address key requirements of virtualized cloud computing, the 8-core Xeon E5-2600 family extends performance by up to 80% and, with integrated I/O, delivers up to twice the bandwidth.

    The new ATCA Processor Blade AT8060 and the Carrier Grade Server CG2200 from Kontron are just what you need for your Carrier Cloud solutions.

Copyright 2012 Kontron AG. All rights reserved. Kontron and the Kontron logo and all other trademarks or registered trademarks are the property of their respective owners and are recognized.

Call, Email or Visit today. NA: +1-888-294-4558 | EMEA: +49 (0)8165 77 777

NA: [email protected] | EMEA: [email protected] | Web: kontron.com/domorewith16cores

    CONTACT US

    When you need it to work every time. Go Carrier Cloud. Go Kontron.

  • Engineers' Guide to ATCA & MicroTCA Technologies 2012

Welcome to the 2012 Engineers' Guide to ATCA & MicroTCA Technologies

What a difference a year makes! Last year in this space my colleague Cheryl Coupé described an ATCA market that was continuing to grow. This year, analyst firm Markinetics predicts a whopping 12 percent leap to $831.2 million by the end of 2012, and up to $1.43 billion by 2016 - a 14 percent CAGR over the next five years. What's driving this massive growth in an otherwise lackluster world economy? It's mostly cellular buildout: in developing nations like China, the conversion to true 4G LTE in North America, and the demand for more data bandwidth and better cellular coverage as every handset user and their dog has a smartphone that plays YouTube videos or streams Pandora to their car (or dog house).

In the technology arena, ATCA has hit its stride as 10Gb Ethernet pipes migrate to 40Gb. 40Gb increases the bandwidth between blades and the rest of the system, and ATCA's benefactor, PICMG, has invested many cycles working with the IEEE 802.3 subcommittees to make 40Gb an interoperable, deployable reality. Our interview with PICMG's Joe Pavlat produced his then-and-now retrospective Viewpoint explaining the challenges of interoperable 40G Ethernet.

Elsewhere in this issue ADLINK provides an exclusive and exhaustive benchmark article quantifying Intel's claim that a Xeon added to the NPU-like Cave Creek data engine, supplemented with Intel's DPDK software, can speed up IPv4 packet forwarding by 10x over native Linux. GE makes a similar case for packet processing engines, and Agilent offers recommendations on measuring ACLR on adjacent LTE channels.

GE, ADLINK and Pixus all make quantitative arguments that ATCA blade and chassis consolidation is very real, pushing ATCA size, weight and power (SWaP) numbers down even further. This concurs with other market leaders like Emerson, Radisys and Kontron (and many more), who are seeing new opportunities for ATCA in security, high-rel/military and even mission-critical medical applications. As always, our Roundtable questions (with Emerson and Adax) plus our Special Feature by VITA on N-dimensional Embedded Supercomputers will give you an appreciation that ATCA, MicroTCA and their related markets and applications are pushing the boundaries of technology and growth.

We hope you enjoy this issue, and don't forget to check our routinely updated ATCA Technical Channel at: www.eecatalog.com/atca/ .

Chris A. Ciufo
Senior Editor, EECatalog.com

P.S. To subscribe to our series of Engineers' Guides for embedded developers and engineers, visit:

    www.eecatalog.com/subscribe


VP/Associate Publisher: Clair Bright, [email protected], (415) 255-0390 ext. 15

Editorial
Editorial Director: John Blyler, [email protected], (503) 614-1082
Managing Editor: Cheryl Berglund Coupé, [email protected]
Senior Editor: Chris A. Ciufo, [email protected]

Creative/Production
Production Manager: Spryte Heithecker
Graphic Designers: Keith Kelly (Senior), Nicky Jacobson
Production Assistant: Jenn Burkhardt
Senior Web Developer: Mariam Moattari

Advertising/Reprint Sales
VP/Associate Publisher, Embedded Electronics Media Group: Clair Bright, [email protected], (415) 255-0390 ext. 15
Sales Manager: Michael [email protected], (415) 255-0390 ext. 17
Marketing/Circulation: Jenna Johnson

To Subscribe: www.eecatalog.com/subscribe

Extension Media, LLC Corporate Office
President and Publisher: Vince [email protected]
Vice President, Sales, Embedded Electronics Media Group: Clair Bright, [email protected]
Vice President, Business Development: Melissa [email protected]

    Special Thanks to Our Sponsors

The Engineers' Guide to AdvancedTCA & MicroTCA Technologies 2012 is published by Extension Media LLC. Extension Media makes no warranty for the use of its products and assumes no responsibility for any errors which may appear in this Catalog, nor does it make a commitment to update the information contained herein. Engineers' Guide to AdvancedTCA & MicroTCA Technologies is Copyright 2012 Extension Media LLC. No information in this Catalog may be reproduced without express written permission from Extension Media at 1786 18th Street, San Francisco, CA 94107-2343.

    All registered trademarks and trademarks included in this Catalog are held by their respective companies. Every attempt was made to include all trademarks and registered trademarks where indicated by their companies.

Intel's DPDK software can speed up IPv4 packet forwarding by 10x, says ADLINK.


  • ATCA IPM Controller Core Based on SmartFusion FPGA

• Delivered as schematic with firmware and FPGA design for integration into customer board or module

• Runs firmware on an ARM Cortex-M3 built into the FPGA; FPGA logic implements some IPMC functions, plus customer functions

• Easily customized at the schematic, firmware and/or FPGA design levels

• Corresponding solutions for all xTCA board/module controller types

    Celebrating 10 Years of Delivering xTCA Management Solutions

    PIGEON POINT SYSTEMS

    Over the decade, these solutions have been intensively tested in PICMG plugfests and by leading TEMs and their suppliers, then incorporated in tens of thousands of shelves, plus hundreds of thousands of boards and modules, worldwide. They are supported by xTCA management experts who helped lead the development of the corresponding PICMG specifications.

    New ShMM-700R-Based ATCA Shelf Manager

• 30% less expensive, 20% smaller, fully compatible with the market-leading ShMM-500R

• Installed on a customer-designed shelf-specific carrier board

FOCUSED • DEPENDABLE • PROVEN

WORLD-CLASS MANAGEMENT COMPONENTS

    [email protected] www.pigeonpoint.com

  • Engineers Guide to ATCA & MicroTCA Technologies 20124

Contents

40Gb Migration Drives ATCA Growth
By Cheryl Coupé, Editor ............ 6

A Common IA Platform for Workload Consolidation on ATCA
By Paul Stevens, Advantech Europe BV ............ 10

AdvancedTCA CO14N-AC
By COMTEL ELECTRONICS ............ 12

Moving To N-Dimensional Embedded Supercomputers... But first, let's look at where computers started
By Ray Alderman, Executive Director, VITA ............ 14

Consolidating Packet Forwarding Services with Data-Plane Development Software
By Jack Lin, Yunxia Guo, and Xiang Li, ADLINK ............ 16

Performance Grows When Multicore Partners with ATCA
By Gene Juknevicius, GE Intelligent Platforms ............ 22

Techniques for Measuring ACLR Performance in LTE Transmitters
By Jung-ik Suh, Agilent Technologies ............ 26

The Case for Optimal Mid-Sized Shelves for ATCA Applications
By Justin Moll, Pixus Technologies ............ 30

PICMG Then and...Now Solves 40G Ethernet Challenges
By Joe Pavlat, President and Chairman of the PCI Industrial Computer Manufacturers Group (PICMG) ............ 56

Products and Services

Hardware

Backplanes
Elma Electronic Inc.: 40 Gigabit AdvancedTCA Backplanes ... 33; MicroTCA Backplanes ... 33

Blades
Adax Inc.: Adax PacketRunner Intelligent ATCA Carrier Blades ... 34
Advantech: ATCA-7310 Dual Cavium Octeon II CN6880 Node Blade with 40G switch ... 35; MIC-5332 AdvancedTCA 10GbE Dual Socket CPU Blade with Intel Xeon E5-2600 Processors ... 36; MIC-8901 ATCA DSP Blade with 20 TMS320TCI6608 DSPs ... 36
Emerson Network Power: ATCA-7370 Dual Intel Xeon Processing Blade ... 37; ATCA-8310 DSP/Media Processing Blade ... 37; Centellis Series ATCA Systems ... 38
Pinnacle Data Systems, Inc., An Avnet Company: ATCA-F1 Dual AMD Socket F AdvancedTCA Blade ... 39; ATCA-RT01 AdvancedTCA RTM with Video and Storage ... 40; Dual Intel Xeon E5 ATCA Blade (ATCA-N1) ... 41
Scan Engineering Telecom GmbH: SAMC-404 High-performance DSP board ... 42; SAMC-514 Quad-core Processor AMC based on Core i7 ... 43

Boards / Board Accessories
Adax Inc.: ATM4-AMC / ATM5-PCIe Signaling and ATM to IP Interworking for Femtocells, Home NodeB Gateways, and Access Concentration ... 44; HDC3 8 Trunk SS7 Signaling & I-TDM Controller ... 45; Pkt2-PCIe / PacketAMC Secure User & Control Plane Application and Packet Processing for LTE and all IP Networks ... 46

Boards / Small Form Factor
LeCroy Corporation: LeCroy's PCI Express Protocol Analysis and Test Tools ... 47

Enclosures
Elma Electronic Inc.: AdvancedTCA 19" rackmount 5U System Platform, AC or DC versions ... 48

Front Panel Hardware
Elma Electronic Inc.: AdvancedTCA Handles & Panels ... 49

Integrated Platforms
Adax Inc.: Application Ready Platform, a Highly Integrated Platform Ready for Your Value-Add Application ... 50
Elma Electronic Inc.: AdvancedTCA System Platforms ... 51
Scan Engineering Telecom GmbH: SAMC-713 High Performance Virtex-6 AMC with FMC expansion site ... 52

Market Applications / Military Communications
Elma Electronic Inc.: AdvancedTCA SystemPak 40G Application Ready Platforms ... 53; ATCA-7365 Rugged Communications Platform ... 53

Routers / Switches
Advantech: ATCA-9112 Switch blade with 10/40GbE switching for 16-slot systems ... 54

Shelves
Elma Electronic Inc.: MicroTCA System Platforms ... 55

  •

    EECatalog SPECIAL FEATURE

    According to Heavy Reading Components Insider, ATCA has become a mature market with a stable ecosystem. And the recently released ATCA, AMCs & MicroTCA: 2012 User Survey indicates that 40 Gb platforms are helping support this growth. Our roundtable participants reinforce this trend, and provide details around development challenges and strategies to address them. Thanks to Drew Sproul, director of marketing at Adax, Inc. and Rob Pettigrew, marketing director, Embedded Computing for Emerson Network Power for their insights.

    EECatalog: How are designers addressing the challenges of building systems to meet new 40 Gigabit demands?

Drew Sproul, Adax: The electronics of 40Gb design have worked out much better than expected. Our chassis partners are all coming out with backplanes that are 40G-capable. As the switch and SBC manufacturers bring out their 40G products, interoperability testing can begin right away. This approach allows today's 10/40G systems to migrate swiftly to full 40G support with the switch and carrier blade upgrade.

Rob Pettigrew, Emerson Network Power: ATCA equipment providers are facing demand for higher bandwidth products, even though the ATCA 40G standard is not yet ratified by PICMG. In fact, companies like Emerson Network Power have been shipping chassis that we are confident are 40G-ready for the past three years. This is possible because the ATCA 40G fabric channel, although not yet standardized by PICMG, is standardized by the IEEE as 10GBase-KR in 802.3ap-2007, which defines a 10Gbps Ethernet signal over a copper backplane connection. Four pairs of KR connections are available in each ATCA fabric channel, which can be used independently as four 10GBase-KR connections, or aggregated together in a single 40Gbps 40GBase-KR4 connection.
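Pettigrew's lane arithmetic can be sketched numerically. The following is an illustrative calculation only: the constants come from IEEE 802.3ap (each KR lane signals at 10.3125 GBd with 64b/66b line coding), while the function and variable names are ours, not from any vendor SDK.

```python
# Illustrative only: usable bandwidth of one ATCA fabric channel under
# the two usage models described above. Constants per IEEE 802.3ap.

KR_LANES_PER_CHANNEL = 4      # four pairs of KR connections per fabric channel
KR_LINE_RATE_GBD = 10.3125    # 802.3ap signaling rate per lane (GBd)
CODING_EFFICIENCY = 64 / 66   # 64b/66b line-coding overhead

def payload_rate_gbps(lanes: int) -> float:
    """Usable payload bandwidth for `lanes` bonded KR lanes."""
    return lanes * KR_LINE_RATE_GBD * CODING_EFFICIENCY

# Model A: four independent 10GBase-KR links.
independent = [payload_rate_gbps(1) for _ in range(KR_LANES_PER_CHANNEL)]
# Model B: one aggregated 40GBase-KR4 link.
aggregated = payload_rate_gbps(KR_LANES_PER_CHANNEL)

print(independent)           # four links of ~10 Gb/s each
print(f"{aggregated:.0f}")   # ~40 Gb/s aggregated
```

The 10.3125 GBd line rate times the 64/66 coding efficiency recovers exactly the 10 Gb/s payload rate per lane, which is why four bonded lanes land at 40 Gb/s.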

40Gb Migration Drives ATCA Growth

ATCA equipment providers are facing demand for higher bandwidth products, even though the ATCA 40Gb standard hasn't yet been ratified by PICMG. Migration strategies, interoperability and spec extensions all impact growth opportunities.

By Cheryl Coupé, Editor

    The recently released ATCA, AMCs & MicroTCA: 2012 User Survey analyzed current and projected use of these technologies by telecom equipment manufacturers, and reports a mature market.

  • Industry Leader in AdvancedTCA and Custom Chassis

All Chassis are AC/DC Carrier-Grade
350W Cooling per Slot
40Gbps Backplanes: Dual Star, Dual-Dual Star, Replicated Mesh

    Contact us for further information [email protected] www.asis-pro.com

COMING SOON: NEW 6-SLOT FRONT-TO-BACK

    300W COOLING PER SLOT

    2 SLOT AC-DC

    6 SLOT AC-DC

    14 SLOT AC-DC

Possible Applications: DPI, 3G/4G LTE, Firewall Gateways, eNodeB & EPC, Military

    8 Week Delivery

  •


The ATCA 40G standard, when ratified by PICMG (expected July 2012), will map this 40G connection onto ATCA, and assign maximum contributions of crosstalk and insertion loss to each of the three elements in a 40G connection: the payload blade, the backplane and the hub blade.

    In the absence of this standard, ATCA manufacturers have typically made very conservative assumptions about how these signal integrity parameters are mapped to each of the system components. Companies that supply all three types of components will be able to guarantee end-to-end signal integrity. There will inevitably be interoperability issues for systems that are integrated from components provided by different companies.
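The budget-allocation idea can be illustrated with a toy model. All dB figures below are hypothetical placeholders, not numbers from the (then-unratified) PICMG 40G standard, and the function and dictionary names are ours:

```python
# Toy model of the budget split described above: an end-to-end 40G
# channel allocation divided across payload blade, backplane and hub
# blade. All dB numbers are hypothetical placeholders.

BUDGET_DB = {
    "payload_blade": 8.0,   # hypothetical insertion-loss allocation
    "backplane": 12.0,
    "hub_blade": 8.0,
}
END_TO_END_LIMIT_DB = 30.0  # hypothetical total channel limit

def channel_ok(measured_db: dict) -> bool:
    """A link closes only if every element stays within its own
    allocation and the total stays within the channel limit."""
    per_element = all(measured_db[k] <= BUDGET_DB[k] for k in BUDGET_DB)
    total_ok = sum(measured_db.values()) <= END_TO_END_LIMIT_DB
    return per_element and total_ok

# Each element from a different supplier, each within its allocation:
# end-to-end integrity can be guaranteed without co-testing.
print(channel_ok({"payload_blade": 7.5, "backplane": 11.0, "hub_blade": 6.0}))
```

The per-element check is the point of the standard: once each vendor proves its own component against its allocation, mixed-vendor systems interoperate without the conservative end-to-end assumptions the article describes.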

    EECatalog: What migration strategies are most successful in that evolution?

    Pettigrew, Emerson Network Power: There are three steps to a smooth migration to 40G: introduction of a chassis, then a switch blade and finally payload blades.

40G-ready chassis have been available on the market for a number of years. Deploying these chassis early has enabled carriers to put 40G infrastructure in place ahead of time, providing an opportunity for field migration to 40G without the need for an expensive fork-lift upgrade.

Introducing 40G switches into these chassis is the next logical step. Emerson Network Power has a fully released 40G switch product, the ATCA-F140, which can be used in one of two 40G chassis: the six-slot AXP640 or the fourteen-slot AXP1440. These switches are fully backward-compatible, meaning that they will work with current 10G payload blades. Deploying these switches early will mean that the complete platform core will be ready for 40G payload.

The last step to 40G heaven is to deploy 40G payload. These products are available now in early access, and will be fully released before the end of the year. Technologies such as the OCTEON II processor family from Cavium provide an unprecedented amount of packet processing and bandwidth for applications such as policy and access control, lawful intercept and various classes of mobile data optimization applications.

    EECatalog: With the explosion in data traffic due to VoIP and multimedia/video, how will offload engines for TCP-UDP/IP, TOE, CODEC transcoders and other packet-optimization algorithms play a role?

Sproul, Adax: Packet processing done on specialized NPUs is key in identifying and prioritizing data traffic, especially upgrades to higher quality video as a real-time revenue stream. Premium Skype as an over-the-top (OTT) voice application is ideal as a revenue-generating managed service. Both of these applications, as well as low-priority Internet traffic offload and policy-based parental controls, require line-speed packet processing.

Pettigrew, Emerson Network Power: These offload engines are critical to provide the performance boost that general-purpose processor cores require to meet the needs of next-generation network elements. These engines are either integrated with specialized multicore devices, like the OCTEON II from Cavium, or available as physically separate PCIe-connected devices, like the Cavium Nitrox or the recently disclosed Intel Crystal Forest technology.

EECatalog: How effectively is the industry addressing interoperability standards across ATCA blades, shelves and backplanes?

Sproul, Adax: ATCA has a very strong foundation in PICMG standards. Implementation of these standards has also been augmented by equally strong interoperability forums. The real challenge for ATCA customers is support for the integrated system, sub-systems and middleware like DPI, security and traffic management. In this regard, successful suppliers will move the hardware to the back and bring application development support to the forefront.

Pettigrew, Emerson Network Power: Historically, the industry has collectively worked to improve ATCA interoperability in the context of a trade association called the Communications Platform Trade Association (CP-TA). Within this association, companies that were otherwise competitors worked together to ensure that their products worked well together. This level of co-opetition was critical to the success of the ATCA standard, because if products from competing companies did not work well together, then the standard would not have been truly open.

"The electronics of 40Gb design have worked out much better than expected."

  •


PICMG has since acquired the assets of CP-TA, which is now the vehicle for this interoperability work. The technical specifications and test procedures written by the CP-TA are now managed by PICMG.

    EECatalog: How will new MicroTCA and ATCA spec extensions enhance growth opportunities?

Sproul, Adax: In my opinion, not much. ATCA and uTCA manufacturers are already fudging the specs, especially as they relate to power and cooling. 300-400W per-slot chassis with effective cooling are on the market today. I just don't see new 800+W ATCA blades and 80W AMC cards competing against proprietary blade servers from HP, IBM and Oracle that support AMC and PCIe cards.

Pettigrew, Emerson Network Power: The ATCA Extensions specification is being drafted to allow for larger payload and higher density ATCA systems. This is necessary to allow ATCA systems to better compete from a price/performance perspective with traditional IT computing systems. This in turn will allow for deeper market penetration of ATCA into adjacent markets outside of the traditional telecom network core. Look for features like double-wide boards, which can accommodate more memory and larger heat sinks, and back-to-back systems, which can more effectively use the deeper system space available in the traditional data center environment. We expect the ATCA Extensions specification to be released by PICMG this year.

Cheryl Berglund Coupé is editor of EECatalog.com. Her articles have appeared in EE Times, Electronic Business, Microsoft Embedded Review and Windows Developer's Journal, and she has developed presentations for the Embedded Systems Conference and ICSPAT. She has held a variety of production, technical marketing and writing positions within technology companies and agencies in the Northwest.

"The ATCA Extensions specification is being drafted to allow for larger payload and higher density ATCA systems."

Make our expertise your solution - talk to us... we care.
N.A.T. - Gesellschaft für Netzwerk- und Automatisierungs-Technologie mbH
Konrad-Zuse-Platz 9 | 53227 Bonn | Germany | Phone: +49 228 965 864 0 | Fax: +49 228 965 864 [email protected] | www.nateurope.com | innovation in communication

Key features:
• 780W (390W optional)
• High-efficiency power conversion
• Power management for power channels
• Backup power for other Power Modules (SMP)
• Support for N+1 redundancy
• Load sharing
• Input power protection
• Input isolation
• Inrush control
• Input ORing
• EMI input filtering
• Hold-up circuit
• Optical load indicator

The NEW DC Power Module NAT-PM-DC780 - the heart of your MTCA system

  •

After more than a decade of acquisitions and shake-ups, many OEMs are enduring the costs of maintaining multiple hardware platforms and face challenges to drive core product development across dissimilar technology bases. Enter the Common Platform strategists, who are often confronted with making inter-divisional peace whilst preparing for the next-generation platform rollout. This can be tough when the competition is ahead of the game and a leap in technology is needed to catch up and deploy next-generation services. One way to play leapfrog is to adopt commercial off-the-shelf systems and blades. ATCA is a solid choice for any common platform strategy where products are required to scale and span several price/performance tiers. A healthy ecosystem of vendors exists, providing a broad product choice and ensuring competitive pricing. Moreover, 40G ATCA allows OEMs to address their current bandwidth dilemma and provides the headroom needed to scale their products over time to meet increasing packet processing needs. It certainly helps to mitigate risk more than any other current architecture.

Benefits of a Common Platform

For a Network Equipment Provider (NEP), establishing a common platform provides a more cost-efficient method for shared product management and a combined strategy across product lines with shared upgrade paths. Product groups can focus on the added value of their individual business without being distracted by base platform support. By sharing engineering resources, an efficient and effective development process can be planned whilst leveraging, wherever possible, the lowest cost. Moreover, establishing a best practice and authority on a single platform creates central expertise which can be shared across the organization. Strong interworking control processes help ensure that the platform remains stable and operational. By adopting a common platform strategy with scalability as a key consideration, a product line can be built out with longer-term growth in mind.

Balancing Differentiation with Commonality

It's important to find the balance between commonality and differentiation. Zero commonality usually means customization for a market where cost targets are almost impossible to reach any other way. This is often where the debate begins for MicroTCA, which scales down from ATCA in all respects but frequently doesn't meet higher-volume, lower-cost needs unless seriously cost-optimized. It does, however, frequently serve for rapid prototyping due to the diversity of available AMCs. Chip manufacturers often choose AMCs to build reference designs for new silicon. This makes them compelling for a leapfrog technology-insertion strategy with controlled cost-down transitions depending on market acceptance. However, as die sizes shrink and performance increases, so does the system platform shrink: solutions which may have been deployed across several blades are rapidly being consolidated onto just one.

For example, there's more packet processing power on Advantech's latest-generation ATCA blade based on the Intel Xeon E5-2600 than in a fully loaded six-slot system of five years ago. This increase in miniaturization needs to be accompanied by a similar trend at the mezzanine level in order to bring more I/O and acceleration closer to the processing core, allowing a single ATCA blade to become, in itself, the new entry-level system. Flexible fabric connectivity is required in order to match processing performance with I/O needs for system scale-up.

Fabric Mezzanine Modules (FMM) as Common Denominator

The FMM concept addresses the above needs and is one of the key elements in Advantech's Customized COTS (C2OTS) strategy. FMMs are a new denominator for personalizing a common platform at the blade level. They scale extremely well for both I/O and acceleration functions. The MIC-5333 ATCA blade, based on the Intel platform for communications infrastructure formerly known as Crystal Forest, houses three FMM sites on the front blade and between one and four FMM sites on the rear transition module, enabling a wide variety of solutions. FMMs also facilitate fabric interface flexibility, allowing equipment providers to deploy the MIC-5333 into 40G or 10G topologies. A double-sized FMM carrying four i82599s provides two fabric interfaces with four 10GBaseKR ports each. For designers requiring 40GBaseKR4 interfaces, a Mellanox CX-3 FMM supports two 40G ports, enabling dual dual-star backplane architectures with two FMM modules for four times 40Gbps in and out of the blade. Finally, a single i82599 FMM makes it possible to adapt the MIC-5333 to 10GbE in order to upgrade legacy systems in the field.
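As a rough sketch of how these FMM options trade port count against per-port rate, the snippet below encodes the configurations named in the article. The dictionary keys and helper function are illustrative names, not Advantech part numbers; port counts follow the text (the i82599 is a dual-port 10GbE controller):

```python
# Sketch of blade fabric bandwidth for the FMM options named above.
# Names are illustrative; port counts follow the article.

FMM_OPTIONS = {
    "quad_i82599": {"ports": 8, "gbps_per_port": 10},    # 2 FIs x 4 x 10GBaseKR
    "mellanox_cx3": {"ports": 2, "gbps_per_port": 40},   # 2 x 40GBaseKR4
    "single_i82599": {"ports": 2, "gbps_per_port": 10},  # dual-port 10GbE legacy
}

def blade_fabric_gbps(option: str, modules: int = 1) -> int:
    """Aggregate fabric bandwidth for `modules` FMMs of one type."""
    o = FMM_OPTIONS[option]
    return modules * o["ports"] * o["gbps_per_port"]

# Two CX-3 FMMs in a dual dual-star backplane: four 40G ports, i.e.
# 160 Gb/s in and out of the blade, matching the article's
# "four times 40Gbps" figure.
print(blade_fabric_gbps("mellanox_cx3", modules=2))
```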

A Common IA Platform for Workload Consolidation on ATCA
By Paul Stevens, Advantech Europe BV

Figure 1: MIC-5333 ATCA blade from Advantech - a common platform for workload consolidation based on the Intel Xeon E5-2600 Series with FMM sites for 40G fabrics

CONTACT INFORMATION
[email protected]/nc

The FMM specification defines the high-speed interfaces and associated FRU management. In addition, the specification supports a connector interface for custom fabric connectivity like SRIO. Signal integrity to the fabric is ensured via a re-driver between the Zone 2 connector and the FMM. A FRU EEPROM on the FMM describes its thermal and power requirements and Zone 2 interface information. All other aspects are managed by a BMC on the ATCA blade. FMMs are compact, just 7 × 7.5 cm, and use FMC-compliant connectors for high-speed differential I/O. There is adequate space to fit 40mm BGA ASICs and FPGAs and associated components within a thermal budget of < 20W. The I/O area provides overhang for connector support on front panels or rear transition modules (RTMs), making FMMs a good fit for specialized processing close to the application I/O.
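FRU EEPROMs in xTCA designs typically follow the IPMI FRU Information Storage convention, in which each header or area carries a trailing "zero checksum" byte chosen so that all bytes of the area sum to 0 mod 256. A minimal sketch of that convention (the header bytes below are hypothetical, not a real MIC-5333 record):

```python
# Zero-checksum convention used by IPMI-style FRU storage areas: the
# final byte makes the whole area sum to 0 mod 256. Header contents
# here are hypothetical.

def fru_checksum(data: bytes) -> int:
    """Checksum byte that makes sum(data + [checksum]) == 0 (mod 256)."""
    return (-sum(data)) & 0xFF

def fru_area_valid(area: bytes) -> bool:
    """True if the area (payload plus trailing checksum) sums to zero."""
    return sum(area) & 0xFF == 0

# Hypothetical 8-byte common header: format version, area offsets, pad.
header = bytes([0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00])
header += bytes([fru_checksum(header)])

print(fru_area_valid(header))  # a well-formed area validates
```

A shelf manager or BMC reading the EEPROM can use the same check to reject a corrupted record before trusting its thermal and power figures.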

    With a common platform for workload consolidation like the MIC-5333, multiple FMM sites provide a wide choice of PCIe I/O and acceleration:

• MIC: FMMs for fabric and front panel
• RTM: FMM to rear panel
• RTMn: RTMs with more FMMs

    In fact there are sufficient FMMs to turn the MIC-5333 common platform into a 100G line card with crypto acceleration.

    By adopting an FMM approach for standard and custom designs, OEMs can effectively redeploy them across form factors scaling from appliances to ATCA systems for functions such as:

• Proprietary acceleration hardware
• Specialized coding and transcoding algorithms
• Signal/image processing
• Military/commercial cryptography
• Flow processing and packet filtering

Make and Buy: the Best of Both Worlds

Before going down a Make path, OEMs should consider the benefits of ATCA, Customized COTS and FMMs as a potent Make-and-Buy compromise for the best of both worlds. As workload consolidation becomes a reality, so does a common platform based on ATCA, and for individual blade personalization FMMs offer the broadest flexibility for mass customization in the integration and build-to-order process of final products.

Figure 2: Advantech's RTM-5104 provides one further FMM site with PCIe x16 to the front blade for expansion

Figure 3: Examples of FMMs
• FMM-5001B: Intel 82599ES with 2 x 10GBaseKR FI
• FMM-5001F: Intel 82599ES for 2 x 10GbE with dual SFP+
• FMM-5001Q: Quad Intel 82599ES with 8 x 10GBaseKR FI
• FMM-5002: Server graphics controller with VGA connector
• FMM-5004M: Mellanox CX3 with 2 x 40GBaseKR4 FI
• FMM-5006: Intel QuickAssist Accelerator

• Engineers' Guide to ATCA & MicroTCA Technologies 2012

    OVERVIEW

Introducing the new Comtel CO14N-AC: a 14U, 14-slot, 19″ shelf based on the original CO14N-DC shelf, with an integrated 1U power supply extension.

    FEATURES

• 19″ rack-mount 14U system

• 14 slots for 8U front boards and RTMs

• Full Mesh, Dual Star and Dual-Dual Star topologies available in new Enhanced designs for wider margins

• 10GBASE-KR and 40GBASE-KR4 fabric rates

• Dual redundant Pigeon Point-based Shelf Managers

• Redundant N+1 power supplies

• Pull cooling with four hot-swap redundant blowers

• High-reliability bussed IPMI to PSUs with PMBus

• Fully PICMG 3.0 compatible

• Designed for compliance to NEBS and EN levels

• Air inlet filter with optimized air impedance

    BENEFITS

• High total power capacity, fully redundant

• Power distribution of more than 300W per slot

• Highly efficient packaging with up to 300W per slot cooling in an abbreviated 14U form factor

• RTM cooling on every slot

• High-performance backplane exceeds the AdvancedTCA specification

• CE and UL safety certifications

• Bezel for front air inlet

    ACCESSORIES

• Front and rear cable trays

• 19″ mounting brackets

• Low-cost and lightweight EMC filters and airflow-blocking modules

• Custom Zone 3 backplanes available

    DIMENSIONS

AdvancedTCA CO14N-AC: 14U, 14-slot shelf with AC power

    By COMTEL ELECTRONICS

    Height 620.0mm (14U)

    Width 445.0mm (with ears 485.6mm)

    Depth 507.0mm

    Weight 47 kg (with 5 PSU)

Color (standard): black powder painted

  • www.eecatalog.com/atca 13

    BLOWERS

• Cooling direction: bottom front to upper rear

• Blower speed set by temperature sensors via IPMI

• Fuses for each blower unit

• Temperature sensors for air outlet and separate ambient sensing

• Cooling capability up to 300 Watts/slot

• Communication by IPMC

    PEM

• Built-in PEM for protection and power distribution

• Shared current split into power backplane segments over circuit breakers (4x A-channel and 4x B-channel)

• No fuse servicing needed thanks to circuit breakers

• Optional PEM monitoring FRU (continuously monitors circuit breaker state)

    SHELF MANAGEMENT CONTROLLER

• Pigeon Point Systems IPM Sentry ShMM

• Fully hot-swappable

• PMBus support over private I2C bus

• PMBus interface available on the backplane connector and front panel

• Remote upgrade capability

• RMCP interface and SNMP interface

    FRU DATA BOARD/TELCO ALARM BOARD

• FRU data board to carry the shelf FRU data information

• Assembly option for TELCO alarm function (relay contacts, alarm indication LEDs)

• Telco indication at the front of the shelf via an additional board

• Communication by IPMC

    POWER SUPPLY BAY

• Provisions for 5 PS modules in the 19″ shelf

• No external wiring; only AC power cords

• Individual AC inlet for each module

• Multi-kilowatt aggregate output

    POWER SUPPLIES

• High-efficiency modules (1250W at low line)

• Internal ORing MOSFET current share

• PMBus capable

• Presence, AC failure and Power OK are monitored

• Built-in locking mechanism and service handle

    CONTACT INFORMATION

COMTEL ELECTRONICS
www.comtel-online.com
nasales@comtel-online.com
619-573-9770


    EECatalog TRENDS

From the late 1940s up through 1990, all computers were CPU-bound: the I/O interconnections could provide more data than the CPU could process. After 1990, clock speeds for microprocessors were doubling every 18 months (Moore's Law), and CPU vendors started putting multiple cores on the same die. From 1990 through today, computers are I/O-bound: the CPUs can process more data than the interconnects can provide. Before too long, new embedded architectures will be needed, such as the 4-dimension hypercube shown in the Figure.

While the increases in CPU performance were revolutionary in the past 15 years, the increases in I/O bandwidth have been merely incremental. When we were using parallel buses such as VME or PCI as the primary architecture, we increased the performance of the machine by widening the data buses...from 8 bits, to 16, 32, 64, and in some instances to 128 and 256 bits wide. VME, for instance, went from 16 bits wide to 32 bits wide, and then to 64 bits wide in only 10 years. And we also clocked up the buses from time to time. But the rule of thumb is that every time you double your bus clock speed, the distance you can run the bus is cut in half, due to reflections and other signal integrity problems associated with single-ended signals.

During this period when buses ruled the computer landscape, we started connecting multiple processors on the already-slow bus connections to create multi-processing systems. Since the bus was a shared resource, CPUs had to arbitrate for use of the bus, or share data with a cache coherency scheme (snooping and snarfing). That's when we discovered the law of diminishing returns. According to many computer science studies, after four processors we hit the knee of the processing curve: each added processor did less and less work. A four-processor system could outperform an 8-processor system; not good value for the money.

In the 2000s, we switched from parallel I/O buses like PCI to multi-gigahertz high-speed serial buses using differential signaling. That helped a little, but we still remained seriously I/O-bound. PCI Express (PCIe) was slightly helpful in relieving some of the bandwidth problems, but the stupid tree structure (a carry-over from the old parallel PCI bus architecture) and the high latency associated with the transfers just exacerbated the existing problem. PCIe was never designed as an interprocessor communications (IPC) mechanism. Desktop and laptop PCs were considered single-processor systems, so there was no need for an efficient and powerful IPC technique.

Companies outside the nefarious PC morass (such as those of us in the embedded industry) recognized the need for faster interprocessor data bandwidths in multiprocessor systems. They designed Serial RapidIO (SRIO) and InfiniBand (IB), and even the Ethernet crowd started efforts to increase IPC

Moving to N-Dimensional Embedded Supercomputers... But First, Let's Look at Where Computers Started
From CPU-bound to today's I/O-bound architectures. We now have enough CPU horsepower to worry about I/O bottlenecks. But how did we get here? And where will system designs go next?
By Ray Alderman, Executive Director, VITA

    Figure: A 4-dimension hypercube where every node connects to four others. This can scale to n nodes to realize embedded supercomputing architectures. (Courtesy: Wikipedia.)


bandwidth by eliminating the huge, heavy protocol stacks infamous in traditional Ethernet connections. Now we can hook lots of processors together to build some potentially powerful computing systems. But Gene Amdahl's law showed us yet another instance of diminishing returns to consider.

All microprocessors use Von Neumann or Harvard architectures that execute one instruction on one data element at a time in serial fashion (SISD: single instruction, single data). This convention matched how programmers think: manipulating data one element at a time. That morphed into architectures that execute multiple instructions on multiple data elements (MIMD), where different parts of the CPU operate on multiple data elements with multiple instructions in parallel. Amdahl's law says that only a very small segment of a serially-conceived program can be parallelized and executed on multiple processors to enhance performance, and that very few programs can gain any significant performance improvement through parallelization.

But when certain applications are parsed by the programmer into specific segments that can run concurrently on different processors, we begin to defeat the law of diminishing returns and Amdahl's law. Algorithms used in radar, sonar, signal intelligence (SIGINT) and electronic warfare are some interesting examples: fast Fourier transforms (FFTs) and swarm algorithms (a collection of autonomous craft operating collectively) for UAVs (unmanned aerial vehicles) and UUVs (unmanned underwater vehicles). Other applications that can be effectively parsed for parallel processing are simulations in finite element analysis (FEA) and computational fluid dynamics (CFD). For the past 80 years, all computer architectures have been stuck in terribly infantile 2-dimensional implementation domains. But as the high-speed serial connections on both copper and optical links begin to eliminate the I/O-bound limitations of present-day computer architectures, we must move to n-dimensional architectures to realize supercomputing performance levels. That can be done with 4-dimensional and 6-dimensional hypercubes (see Figure). VITA is now setting the standards for these advanced computing architectures in embedded applications.

After all, there are only three possible hardware and protocol architectures for the I/O and IPC links in a computer system. But that's a topic for another paper, since it takes a lot more space to describe than my evil editorial masters have allotted me here.

Ray Alderman is the Executive Director of VITA, an ANSI-accredited standards developer for high-performance computer systems and architectures used in critical embedded applications. He was previously Technical Director of VITA, CEO of PEP Modular Computers, and a partner and founder at Matrix Corporation. Ray worked in mainframe computers at Burroughs Corporation, and was a microprocessor applications engineer for both Texas Instruments (TMS9900) and Motorola (6809 and 68000) after serving in the US Army Military Intelligence group during the Vietnam war.


    EECatalog SPECIAL FEATURE

In recent years, there has been a market and technology trend toward the convergence of network infrastructure onto a common platform or modular components that support multiple network elements and functions, such as application processing, control processing, packet processing and signal processing. In addition to cost savings and reduced time-to-market, this approach provides the flexibility of modularity and the ability to independently upgrade system components where and when needed, using a common platform or modular components in shelf systems and networks of varying sizes. In traditional networks, switching modules would be used to route traffic between in-band system modules and out-of-band systems; processor modules for applications and control-plane functions; packet processing modules for data-plane functions; and DSP modules for specialized signal-plane functions: four different types of modules.

Enhancements to processor architecture and the availability of new software development tools are enabling developers to use a single blade architecture to consolidate all of their application, control and packet-processing workloads. Huge performance boosts achieved by this hardware/software combination are making the processor blade architecture increasingly viable as a packet-processing solution. To illustrate this evolution, we developed a series of tests to verify that an AdvancedTCA processor blade combined with a data-plane development kit (DPDK) supplied by the CPU manufacturer can provide the required performance and consolidate IP forwarding services on a single platform. In summary, we compared the Layer 3 forwarding performance of an ATCA blade using native Linux IP forwarding, without any additional software optimization, with that obtained using the DPDK. We then analyzed the reasons behind the gains in IP forwarding performance achieved using the DPDK. [Editor's note: DPDK is an Intel product.]

AdvancedTCA Processor Blade
The ATCA blade used in this study is a highly integrated processor blade with dual x86 processors, each with 8 cores (16 threads) and supporting eight channels of DDR3-1600 VLP RDIMM for a maximum system memory capacity of 64GB per processor. Network I/O features include two 10 Gigabit Ethernet ports (XAUI, 10GBase-KX4) compliant with PICMG 3.1 option 1/9, and up to six Gigabit Ethernet 10/100/1000BASE-T ports to the front panel. The detailed architecture of the ATCA blade is illustrated in the functional block diagram in Figure 1.

Data-Plane Development Kit
The data-plane development kit provides a lightweight run-time environment for x86 architecture processors, offering low overhead and a run-to-completion mode to maximize packet-processing performance. The environment provides a rich selection of optimized and efficient libraries, also known as the environment abstraction layer (EAL), which are responsible for initializing and allocating low-level resources, hiding the environment specifics from the applications and libraries, and gaining access to low-level resources such as memory space, PCI devices, timers and consoles.

The EAL provides an optimized poll mode driver (PMD); memory and buffer management; and timer, debug and packet-handling APIs, some of which may also be provided by the Linux OS. To facilitate interaction with application layers, the EAL, together with the standard GNU C Library (GLIBC), provides full APIs for integration with higher-level applications. The software hierarchy is shown in Figure 2.

Test Topology
In order to measure the speed at which the ATCA processor blade can process and forward IP packets at Layer 3, we used the test environment shown in Figure 3.

Consolidating Packet Forwarding Services with Data-Plane Development Software
Consolidating all three planes to a single ATCA blade is now possible.

    By Jack Lin, Yunxia Guo, and Xiang Li, ADLINK

Running the DPDK provides almost 6x the IP forwarding performance compared to native Linux.


Two ATCA switch blades with networking software provided non-blocking interconnection switches for the 10GbE Fabric and 1GbE Base Interface channels of all three processor blades in the ATCA shelf, which supports a full-mesh topology. Each switch blade can therefore provide at least one Fabric and one Base Interface connection to each processor blade. A test system, compliant with RFC 2544 for throughput benchmarking, was used as a packet simulator to send IP packets with different frame sizes and collect the final statistical data, such as frames per second and throughput.

As shown in the topology of the test environment in Figure 3, the ATCA processor blade (the device under test, or DUT) has four Gigabit Ethernet interfaces: two directly from the front panel (Flow1 and Flow2), and another two from the Base Interfaces (Flow3 and Flow4) via the DUT's Base switches. In addition to these four 1GbE interfaces, the DUT has two 10GbE interfaces connected to the test system via the switch blades.

Figure 1: ADLINK aTCA-6200 functional block diagram used for the performance study. The diagram shows two 8-core Intel Xeon E5-2648L processors linked by QPI at 8.0 GT/s, each with eight DDR3-1600 RDIMM sockets; an Intel C604 PCH (x4 DMI 2.0); an Intel 82599EB 10GbE controller on a Fabric Riser Card, with Intel 82576EB and 82580 Gigabit Ethernet controllers for the Base Interface and front panel; a Silicon Motion SM750 VGA controller; an IPMC; an optional Cave Creek accelerator; a mid-size AMC bay; and Zone 1/2/3 connectivity to the backplane and RTM.

Figure 2: EAL and GLIBC in the Linux application environment. The application sits on Libc (GLIBC) and the EAL above the Linux kernel: standard open/read/write calls go through GLIBC, while the EAL handles hardware init, contiguous/DMA memory allocation, and PCI configuration, scan and I/O through specific UIO drivers.


In our test environment, the DUT was responsible for receiving IPv4 packets from the test system, processing the packets at Layer 3 (e.g., packet decapsulation, IPv4 header checksum validation, route table look-up and packet encapsulation), then finally sending the packets back to the test system according to the routing table look-up result. All six flows are bidirectional: for example, the test system sends frames from Interfaces 1/2/3/4/5/6 to the DUT and receives frames via Interfaces 2/1/4/3/6/5, respectively.

Test Methodology
To evaluate how the DPDK consolidates packet-forwarding services on the processor blade, an IP forwarding application based on the DPDK was used in the following two test cases:

    Performance with native Linux

In this test, Ubuntu Server 11.10 64-bit was installed on the ATCA processor blade.

Performance with DPDK
The DPDK can be run in different modes, such as Bare Metal, Linux with Bare Metal Run-Time and Linux User Space. Linux User Space mode is the easiest to use in the initial development stages. Details of how the DPDK functions in Linux User Space mode are shown in Figure 4.

    After compiling the DPDK target environment, an IP forwarding application can be run as a Linux User Space application.

Figure 3: IP forwarding test environment used for benchmarking. An Ixia XM12 test system and test monitor connect to an ADLINK aTCA-8505 shelf holding the aTCA-6200 (DUT) and aTCA-3400 switch blades. Flow1 and Flow2 are 1GbE to the front panel; Flow3 and Flow4 are 1GbE to the Base Interface; Flow5 and Flow6 are 10GbE to the Fabric Interface.

    Figure 4: Intel DPDK running in Linux User Space Mode


Results
After testing the ATCA processor blade under native Linux and with the data-plane development kit provided by the CPU manufacturer, we compared the IP forwarding performance in the two configurations across the four 1GbE interfaces (two from the front panel and two from the Base Interfaces) and the two 10GbE Fabric Interfaces. In addition, we benchmarked the combined IPv4 forwarding performance of the processor blade using all six interfaces simultaneously (four 1GbE interfaces and two 10GbE interfaces).

Performance comparison using four 1GbE interfaces
When running IPv4 forwarding on the four 1GbE interfaces of the processor blade with native Linux IP forwarding enabled, a rate of 1 million frames per second can be sustained with a frame size of 64 bytes. As the frame size is increased to 1024 bytes, native Linux IP forwarding can approach 100% of the line rate. But in the real world, frame sizes are usually smaller than 1024 bytes, so 100% line-rate forwarding is not achievable. However, with the DPDK running on only two CPU threads under the same Linux OS, the processor blade can forward frames at 100% line speed without any frame loss, regardless of the frame size setting, as shown in Figure 5.

    The ATCA processor blade running the DPDK provides almost 6 times the IP forwarding performance compared to native Linux IP forwarding.

Performance comparison using two 10GbE interfaces
Running the IP forwarding test on the two 10GbE Fabric Interfaces shows an even greater performance gap between native Linux and DPDK-based IP forwarding than that using four 1GbE interfaces. As shown in Figure 6, the processor blade with the DPDK running on only two threads provides a gain of more than 10 times the IP forwarding performance of native Linux using all available CPU threads.

Total IPv4 forwarding performance of the processor blade
Testing the combined IP forwarding performance of the processor blade using all available interfaces (two 10GbE Fabric Interfaces, two 1GbE front panel interfaces and two 1GbE Base Interfaces), the processor blade with the DPDK can forward up to 27 million frames per second when the frame size is set to 64 bytes. In other words, up to 18Gbps of the theoretical 24Gbps throughput can be forwarded (i.e., 75.3% of the line rate). The throughput rises to 92.3% of line rate at a frame size of 128 bytes, and up to 99% at 256 bytes.

Analysis
The DPDK's advantage over native Linux IP forwarding comes mainly from the design features described below.

Polling mode instead of interrupts
Generally, when packets come in, native Linux receives interrupts from the network interface controller (NIC), schedules the softIRQ, performs context switching, and invokes system calls such as read() and write().

Figure 5: IP forwarding performance comparison using 4x 1GbE interfaces (IPv4 L3 forwarding on the ADLINK aTCA-6200; frames per second and % of line rate vs. packet size). At 64-byte frames, native Linux sustains about 1.06 Mfps (17.7% of line rate) while the DPDK sustains 5.95 Mfps (100% of line rate); the DPDK holds 100% of line rate at every frame size from 64 to 1518 bytes.


In contrast, the DPDK uses an optimized poll mode driver (PMD) instead of the default Ethernet driver to retrieve incoming packets continuously, avoiding software interrupts, context switches and system calls. This saves significant CPU resources and reduces latency.

Huge pages instead of traditional pages
Compared to the 4kB pages of native Linux, using larger pages means time saved on page look-ups and a reduced possibility of a translation lookaside buffer (TLB) cache miss.

The DPDK runs as a user-space application, allocating huge pages in its own memory zone to store frame buffers,

Figure 6: IP forwarding performance comparison using 2x 10GbE interfaces (IPv4 L3 forwarding on the ADLINK aTCA-6200). At 64-byte frames, native Linux forwards about 1.93 Mfps (6.5% of line rate) while the DPDK on two threads reaches about 20.95 Mfps (70.4%); from 256-byte frames upward the DPDK sustains 99.9-100% of line rate, whereas native Linux only approaches line rate at 1518 bytes.

Figure 7: IP forwarding performance comparison using 2x 10GbE + 4x 1GbE interfaces (ADLINK aTCA-6200 with the DPDK). At 64-byte frames the blade forwards 26,904,562 fps of a theoretical 35,714,286 fps (75.3% of line rate), rising to 92.3% at 128 bytes and 99.9-100% at 256 bytes and above.


rings and other related buffers, which are outside the control of other applications and even the Linux kernel. In the test described in this white paper, a total of 1024 2MB huge pages was reserved for running the IP forwarding application.

Zero-copy buffers
In traditional packet processing, native Linux decapsulates the packet header and then copies the data to the user-space buffer according to the socket ID. Once the user-space application finishes processing the data, a write system call is invoked to send the data back to the kernel, which takes charge of copying data from the user-space buffer to the kernel buffer, encapsulates the packet header and finally sends it out via the relevant physical port. The native Linux process clearly sacrifices time and resources on buffer copies between kernel and user space.

In comparison, the DPDK receives packets into its reserved memory zone, which is located in user space, and then classifies the packets into flows according to configured rules, without copying them to the kernel buffer. After processing the decapsulated packets, it encapsulates the packets with the correct headers in the same user-space buffer, and finally sends them out of the relevant physical ports.

Run-to-completion and core affinity
Prior to running applications, the DPDK initializes and allocates all low-level resources, such as memory space, PCI devices, timers and consoles, which are reserved for DPDK-based applications only. After initialization, each core is launched to take over an execution unit, running the same or different workloads depending on the actual application requirements.

Moreover, the DPDK provides a way to pin each execution unit to a specific core, maintaining core affinity and thus avoiding cache misses. In the tests described, the physical ports of the processor blade were bound to two different CPU threads according to affinity.

Lockless implementation and cache alignment
The libraries and APIs provided by the DPDK are optimized to be lockless, preventing deadlocks in multi-threaded applications. The DPDK also optimizes buffers, rings and other data structures to be cache-aligned, maximizing cache-line efficiency and minimizing cache-line contention.

Conclusion
By analyzing the results of our tests using the ATCA processor blade's four 1GbE interfaces and two 10GbE Fabric Interfaces with and without the data-plane development kit provided by the CPU manufacturer (Figures 5 and 6), we can conclude that running Linux with the DPDK and using only two CPU threads for IP forwarding achieves approximately 10 times the IP forwarding performance of native Linux with all CPU threads running on the same hardware platform.

As is evident in Figure 7, the IPv4 forwarding performance achieved by the processor blade with the DPDK makes it cost- and performance-effective for customers to migrate their packet processing applications from network-processor-based hardware to x86-based platforms, and to use a uniform platform to deploy different services, such as application processing, control processing and packet processing.

Jack Lin is the team manager of Platform Integration and Validation, Embedded Computing Product Segment, which focuses on validating ADLINK building blocks and integrating application-ready platforms for end customers. He holds a B.S. and M.S. in information and communication engineering from Beijing JiaoTong University. Prior to joining ADLINK, he worked for Intel and Kasenna.

Yunxia Guo is a PIV software system engineer in ADLINK's Embedded Computing Product Segment and holds a B.S. in communication engineering from Hubei University of Technology and an M.S. in information and communication engineering from Wuhan University of Technology.

Xiang Li is a member of the platform integration and validation team in ADLINK's Embedded Computing Product Segment. He holds a B.S. in electronic and information engineering from Shanghai Tongji University.


Multicore processor technology combined with the AdvancedTCA form factor results in multi-faceted performance scaling options: performance can be scaled by using processor silicon with more cores as well as by adding more ATCA blades to the chassis. Moreover, ATCA systems are easy to configure for a specific workload by combining standard multicore x86 processors with specialized packet processors. Having multiple cores within a processor is potentially highly advantageous, of course, but they are useless unless the software infrastructure has a means of utilizing them. Virtualization is one technique that allows multiple cores to run multiple applications and their operating systems in parallel. New application development, or porting an existing application to a multicore environment, is eased by the development tools that are available. Packet processors in particular have a powerful set of tools for designing applications that run in parallel on multiple cores.

Multicore on the Rise
Just a few years ago, each new processor silicon release brought a worthwhile clock frequency improvement. Today, however, clock frequency is not the main news in a new-generation processor release; it's the number of processor cores within the device that takes center stage. As usual, small startups such as Cavium Networks and NetLogic (now Broadcom) were first to market with multicore general-purpose processors. Then followed the giants: Intel, AMD and Freescale. Today, 4-8 cores within a processor is the norm, and there are architectures available that feature as many as 64 cores within one processor.

The motivation for multicore processors is fairly simple: when running a typical application, the processor spends most of its time waiting for data to process. Historically, memory latency improved at a much slower pace than the speed of the processor. Today, the mismatch between processor and memory is such that adding a few extra clocks to the processor doesn't improve performance to any worthwhile degree. As if this were not a big enough problem, there is the issue of power consumption: adding a few extra hertz to the clock translates into a significant increase in power consumption.

From the multicore architecture perspective, having multiple cores, each perhaps running at a slightly slower speed, results in a higher overall performance solution. Considering that the processor spends roughly three quarters of its time waiting for memory, this approach works well for applications that can benefit from parallel processing. Obviously, the memory subsystem implementation has to support multiple data accesses in parallel, which is typically the case today.

From Enclosure to an AdvancedTCA System
Let's move the focus from the silicon to the system. When a single server with two or four multicore processors is required, the 19-inch rack-mountable enclosure (the "pizza box") works very well. When the application requires more than that, or when redundancy and higher reliability are required, AdvancedTCA becomes a good choice for system implementation (Figure 1). The AdvancedTCA chassis can support up to 14 dual-processor blades interconnected via two high-performance Ethernet switches in a redundant fashion (Figure 2). All blades within the chassis share power supplies and cooling fans, which are also implemented to support redundancy and higher reliability.

Performance Grows When Multicore Partners with ATCA

ATCA is the ideal platform for compute-intensive multicore applications. Even when legacy applications can't use multicore performance, virtualization evens the score in a tidy hardware system.

    By Gene Juknevicius, GE Intelligent Platforms


  • www.eecatalog.com/atca 23

    EECatalog SPECIAL FEATURE

A key requirement when building a multi-blade system is a high speed, reliable interconnect between the blades. To that end, an ATCA system interconnects the blades via a Fabric Interface and a Base Interface. The Fabric Interface, the data path interface, is predominantly 10Gbit Ethernet today, with some applications already switching to 40Gbit Ethernet. The Base Interface is the control path and is implemented using 1Gbit Ethernet. Both the Fabric and Base Interfaces are implemented in a redundant fashion, such that each ATCA blade connects to both ATCA hubs, which provide the required Ethernet switching resources. All connectivity is provided via the ATCA backplane, reducing external cabling and thereby making the overall system more reliable and more serviceable.

The separation of the control plane and data plane not only enables high performance blade management and control services, but also isolates control traffic from revenue-generating data plane traffic. Such isolation becomes critical when overall system security is considered: it ensures that data plane traffic, which is typically customer-facing, cannot intentionally or unintentionally start managing the Ethernet switches and disrupt the operation of the complete system.

Compute Application Requirements

Depending on the application type, the high performance interconnect brings a different value proposition. In a compute-type application, it's essential that large numbers of processors communicate with high throughput and very low latency. To that end, 10Gbit and 40Gbit Ethernet can provide the required data throughput via the Fabric Interface. Some Ethernet switches also support a cut-through switching mode, in which packet transmission starts before the packet is fully received; in such cases, packet switching latency can be lower than 500ns. Although configuring two hubs (Ethernet switches) in an ATCA chassis is primarily for redundancy, it is also possible to use both hubs in parallel, effectively doubling the available bandwidth.

From the compute power density perspective, it is interesting to note that 14 ATCA blades (Figure 3), each featuring dual Intel 8-core Sandy Bridge processors, yield no fewer than 224 x86 cores within a single ATCA chassis, all interconnected via the in-chassis high speed interconnect.

Compute applications also tend to require significant storage capacity, bandwidth and reliability. There are three main ways to address storage requirements. At the lowest level, each ATCA blade can have local hard disks, located on the blade itself or on an associated rear transition module (RTM); these could be two redundant serial-attached SCSI (SAS) drives. At the next level, one or more storage ATCA blades could be used within the system. Such storage blades would be accessed via Ethernet using either the FCoE (Fibre Channel over Ethernet) or iSCSI protocols, and can be shared among multiple processor blades. Finally, an external storage array can be connected via Fibre Channel, FCoE or iSCSI.

    Figure 1: A fully populated ATCA integrated platform from NEI, an integration partner with GE Intelligent Platforms.

    Figure 2: The internal interconnect diagram for a 16-slot ATCA chassis containing 14 blades and two Ethernet switch blades.


Communication, Parallel Processing and Multicore

A key feature of communication applications is their requirement for high data throughput and packet processing. They also typically lend themselves well to parallel processing, which is where multicore technology finds its optimal advantage. Although processors from both AMD and Intel are excellent computing devices - especially when multiple cores are considered - both lack the ability to efficiently get data in and out at very high data rates.

Packet processors, another type of multicore processor architecture, are specifically optimized to address the problem of efficiently moving packetized data in and out. Such devices are readily available in the ATCA blade form factor, allowing system designers to take advantage of both x86 compute resources and packet processor packet-manipulation resources within the same system. The interoperability inherent in the ATCA specification enables designers to plug in multiple x86 processor blades as well as multiple packet processor blades and interconnect them via high performance Ethernet interfaces.

From this perspective, Ethernet switches within the hubs provide additional value in load distribution. Ethernet switches today employ sophisticated Access Control List (ACL) features that allow packets, based on their Layer-2 to Layer-4 information, to be steered to a specific ATCA blade. Such policy-based routing allows packet streams to be distributed at very high data rates (10Gbit/sec to 100Gbit/sec) among multiple ATCA blades while ensuring that packets belonging to the same flow are always directed to the same blade. An example of a high performance communication system is shown in Figure 4.

From the data processing perspective, data enters the system via the Ethernet hubs, where packets are distributed - based on policies - among the packet processing blades. Within each packet processor blade, packets are further distributed between two OCTEON devices and finally, within each OCTEON device, between the cores. The packet processors perform the majority of the high throughput packet processing, and specific packets requiring more extensive processing power are forwarded to the x86-based blades. The key principle here is that although the majority of packets require little processing, a small subset requires significantly more processing power.
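The flow-affinity behavior described above - every packet of a given flow steered to the same blade - can be illustrated with a hash over the packet's Layer-3/Layer-4 fields. The sketch below is illustrative only; the function name and tuple layout are hypothetical, not any switch vendor's ACL API:

```python
import hashlib

def steer_to_blade(src_ip, dst_ip, src_port, dst_port, proto, num_blades):
    """Map a flow's 5-tuple to one of num_blades packet-processing blades.

    Hashing the full 5-tuple guarantees that all packets of a flow land
    on the same blade, which preserves per-flow state and ordering.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_blades

# Every packet of this TCP flow is steered to the same blade:
blade = steer_to_blade("10.0.0.1", "192.168.1.9", 49152, 80, "tcp", 12)
```

A real hub performs the equivalent steering in switch silicon at line rate; the hash is shown only to make the same-flow/same-blade property concrete.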

Software Development Optimizes All Those Cores

It is clear that any ATCA system is useless without software. Having hundreds of processor cores offers huge potential; however, unless used efficiently, they are a waste of silicon. Historically, most applications were written without any parallel computing concepts in mind. Consequently, although modern compilers attempt to recognize areas in the code that lend themselves to parallel processing and try to harness the power of multiple cores, performance improvements are very limited when running legacy applications on multicore hardware.

Virtualization is often used today to better utilize multiple processor cores. In a virtualized environment, multiple instances of the operating system - or even multiple dissimilar operating systems - run on the same multicore processor. Since each operating system has no relationship with the others, the operating systems can happily execute in parallel on multiple cores. Hardware, with the help of a hypervisor, ensures that each operating system can safely access its own memory and I/O devices without disturbing its neighbors. Virtualization allows the consolidation of multiple physical servers into one server with multicore processors.

Figure 3: A GE Intelligent Platforms ATCA single board computer. When populated with an 8-core CPU, a full ATCA chassis could contain as many as 224 cores.

Figure 4: An ATCA system with two x86 blades and ten Cavium OCTEON blades illustrates the way packet processors can be efficiently used with general purpose CPUs. ATCA is ideal for mixing multiple multicore architectures.

ATCA allows further consolidation of multiple blades with multiple multicore processors: racks of legacy servers can be reduced to a single ATCA chassis. Virtualization within the ATCA environment provides another benefit - redundancy and high availability. Using a high availability virtualized operating system, an application can be migrated from one physical server to another if a hardware failure occurs.

Since packet processors were designed for parallel processing from the start, their software environment and development tools are fully geared toward application development in a multicore environment. Although Cavium's OCTEON and similar devices are often called packet processors, internally they are based on standard processor architectures such as MIPS64, and can run standard operating systems such as Linux. Their performance advantages, however, are best exposed when running simplified proprietary operating systems, such as Cavium's Simple Executive. It is important not to confuse these devices and their operating systems with the network processors of the past, such as Intel's XScale. Modern packet processors are programmed using standard C and C++ even when their proprietary operating system is being used; in fact, they allow existing C code to be simply ported.

Simple applications, such as a packet filter or an L2/L3 switch, can be developed as sequential run-to-completion code executing in an endless loop. The same code can run on all cores, with the parallel nature of the processing provided by the hardware itself, which schedules each packet processing event onto the next available processor core, enforcing packet ordering and atomicity rules if desired. The hardware also takes care of memory management and cache coherency, allowing developers to focus on the application itself. Inter-core communication can be implemented by setting aside a shared memory region or by using a shared variable.
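A rough software analogy for this run-to-completion model, assuming a shared work queue standing in for the hardware scheduler (all names here are illustrative, not Simple Executive API calls):

```python
import queue
import threading

def run_to_completion(work_q, results, lock):
    # The same loop runs on every "core": take the next packet event,
    # process it fully, then go back for more.
    while True:
        try:
            pkt = work_q.get_nowait()
        except queue.Empty:
            return                      # no more work for this core
        processed = pkt.upper()         # stand-in for real packet processing
        with lock:
            results.append(processed)

work_q = queue.Queue()
for p in ["syn", "ack", "fin"]:
    work_q.put(p)
results, lock = [], threading.Lock()
cores = [threading.Thread(target=run_to_completion, args=(work_q, results, lock))
         for _ in range(4)]
for c in cores:
    c.start()
for c in cores:
    c.join()
# All three packets are processed exactly once, by whichever core was free.
```

In the OCTEON-class devices the article describes, this dispatch loop is implemented in hardware, so there is no software queue or lock on the fast path.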

Depending on the application and development requirements, a number of software packages can help developers get a head start. One notable example is 6WINDGate software, which allows the seamless marriage of x86 processors with packet processors, offloading time-critical tasks to run under the packet processor's Simple Executive and providing a large number of frequently needed protocols. 6WINDGate can be used standalone or as a base platform for a specific application, and can abstract inter-processor and inter-core communications, significantly simplifying the software development effort.

ATCA and Multicore: Well Matched

Today, multicore processors are an integral element of electronics design and are well supported by the AdvancedTCA infrastructure. AdvancedTCA enables very high compute density without sacrificing reliability and redundancy. Redundant high-speed chassis-wide interconnect options support high performance computing clusters as well as high performance communication applications. Load balancing and policy routing techniques enable packet distribution among the blades, avoiding bottlenecks and fully utilizing multicore devices. Although most legacy applications can't take advantage of multicore performance, software techniques such as virtualization let multiple legacy applications run on the same processor, taking full advantage of the available multiple cores. Finally, software tools and hardware offload elements ease new application development or the porting of existing applications to multicore environments.

Gene Juknevicius is a Technologist and Architect at GE Intelligent Platforms. He has participated in the PICMG, AMC and MicroTCA committees, is currently an active member of the SCOPE Alliance and is responsible for new product definition and architecture at GE Intelligent Platforms. He received his M.S. degree in Electrical Engineering from Stanford University. Gene can be contacted at [email protected].



Modern wireless service providers are continually pushing for more bandwidth to deliver Internet Protocol (IP) services to more users. Long-Term Evolution (LTE) is a next-generation cellular technology promising to answer this demand by enhancing current deployments of 3GPP networks and enabling significant new service opportunities. LTE's complex, evolved architecture introduces new challenges in designing and testing network and user equipment. One of the key challenges at the air interface is power management during signal transmission.

In a digital communication system such as LTE, the power that leaks from a transmitted signal into adjacent channels can interfere with transmissions in neighboring channels and impair system performance. The adjacent channel leakage-power ratio (ACLR) test verifies that system transmitters are performing within specified limits. Performing this critical test quickly and accurately can be challenging given LTE's complexity (see sidebar, "Complexity in LTE Transmitter Design"). Meeting this challenge requires a signal generator with LTE-specific signal creation software, a modern signal analyzer with LTE-specific measurement software, and the use of optimization techniques for the analyzer.

Understanding ACLR Test Requirements

ACLR is a key transmitter characteristic included in the LTE RF transmitter conformance tests. These tests verify that minimum requirements are being met in the base station (eNB) and user equipment (UE). Most of the LTE conformance tests for out-of-band emissions are similar in scope and purpose to those for W-CDMA. However, while W-CDMA specifies a root-raised cosine (RRC) filter for making transmitter measurements, the LTE standard defines no equivalent filter. Thus, different filter implementations can be used for LTE transmitter testing to optimize either in-channel performance, resulting in improved error vector magnitude, or out-of-channel performance, resulting in better adjacent channel power characteristics.

Given the extensive number of complex transmitter configurations that can be used to test transmitter performance, LTE specifies a series of downlink signal configurations known as E-UTRA test models (E-TM) for testing the eNB. The models are grouped into three classes: E-TM1, E-TM2 and E-TM3. The first and third classes are further subdivided into E-TM1.1, E-TM1.2, E-TM3.1, E-TM3.2 and E-TM3.3. Note that the E in E-UTRA stands for evolved and designates LTE UMTS terrestrial radio access, whereas UTRA without the E refers to W-CDMA.

ACLR test requirements differ depending on whether the transmitter tests are being conducted on UE or eNB. For UE testing, the ACLR requirement is not as stringent as for the eNB. Transmitter tests are carried out using the reference measurement channels (RMC) specified for eNB receiver testing.

The 3GPP specifications for LTE define ACLR as the ratio of the filtered mean power centered on the assigned channel frequency to the filtered mean power centered on an adjacent channel frequency. Minimum ACLR conformance requirements for the eNB are given for two scenarios: adjacent E-UTRA channel carriers of the same bandwidth (E-UTRA ACLR1), and UTRA adjacent and alternate channel carriers (UTRA ACLR1 and UTRA ACLR2, respectively).
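Numerically, the definition reduces to a ratio of two mean powers expressed in dB. A minimal sketch of that arithmetic (plain Python, with powers in watts):

```python
import math

def aclr_db(assigned_power_w, adjacent_power_w):
    """ACLR per the definition above: filtered mean power in the assigned
    channel over filtered mean power in the adjacent channel, in dB."""
    return 10.0 * math.log10(assigned_power_w / adjacent_power_w)

# 1 W in the assigned channel with 10 uW leaking into the adjacent
# channel is an ACLR of 50 dB:
print(round(aclr_db(1.0, 10e-6)))   # 50
```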

Techniques for Measuring ACLR Performance in LTE Transmitters

By Jung-ik Suh, Agilent Technologies

Complexity in LTE Transmitter Design

With performance targets set exceptionally high for LTE, engineers have to make careful design tradeoffs to cover each critical part of the radio transmitter chain. One important aspect of LTE transmitter design involves minimizing unwanted emissions - in particular, spurious emissions, which can occur at any frequency. While LTE is similar to other radio systems, challenges arise at the band edges, where the transmitted signal must comply with rigorous power leakage requirements. With LTE supporting channel bandwidths up to 20 MHz and many bands being too narrow to support more than a few channels, a large proportion of LTE channels will sit at the edge of the band.

Controlling transmitter performance at the edge of the band requires a design with filtering that attenuates out-of-band emissions without affecting in-channel performance. Factors such as cost, power efficiency, physical size and location in the transmitter block diagram are also important considerations. Ultimately, the LTE transmitter must meet all specified limits for unwanted emissions, including limits on the amount of power that leaks into adjacent channels (ACLR).


    Different limits and measurement filters are specified for E-UTRA and UTRA adjacent channels, and are provided for both paired spectrum (FDD) and unpaired spectrum (TDD) operation. The E-UTRA channels are measured using a square measurement filter, while UTRA channels are measured using an RRC filter with a roll-off factor of 0.22 and a bandwidth equal to the chip rate.
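The UTRA measurement filter above is fully determined by those two parameters. As a sketch, the RRC amplitude response with roll-off 0.22 and bandwidth equal to the 3.84-Mcps W-CDMA chip rate can be written out from the textbook raised-cosine formulas (this is an illustration, not Agilent's implementation):

```python
import math

def rrc_response(f_hz, alpha=0.22, chip_rate=3.84e6):
    """Amplitude response of a root-raised-cosine filter: flat out to
    (1 - alpha)/(2T), a cosine roll-off to (1 + alpha)/(2T), zero beyond."""
    T = 1.0 / chip_rate
    f = abs(f_hz)
    f1 = (1 - alpha) / (2 * T)      # edge of the flat passband
    f2 = (1 + alpha) / (2 * T)      # outer edge of the roll-off region
    if f <= f1:
        return 1.0
    if f >= f2:
        return 0.0
    raised_cosine = 0.5 * (1 + math.cos(math.pi * T / alpha * (f - f1)))
    return math.sqrt(raised_cosine)

print(rrc_response(0.0))            # 1.0 in the passband
# At half the chip rate (1.92 MHz) the response is 3 dB down (~0.707).
```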

Addressing the ACLR Measurement Challenge

Given LTE's complexity and the complexity of the transmitter configurations that can be used to test transmitter performance, standards-compliant spectrum measurements like ACLR can be quite daunting. Fortunately, sophisticated signal evaluation tools are now available that enable engineers to make these LTE measurements quickly and accurately. Power measurements, including ACLR, are generally made using a spectrum or signal analyzer. The required test signals are built using a signal generator.

To illustrate how these instruments can be used, consider the case where, according to the specifications, the carrier frequency must be set within a frequency band supported by the base station under test, and ACLR must be measured for frequency offsets on both sides of the channel frequency, as specified for paired spectrum (FDD) or unpaired spectrum (TDD) operation. The test is first performed using a transmitted E-TM1.1 signal, in which all of the PDSCH resource blocks have the same power. It is then performed using an E-TM1.2 signal employing power boosting and deboosting. The E-TM1.2 configuration is useful because it simulates multiple users whose devices are operating at different power levels. This scenario results in a higher crest factor, which makes it more difficult to amplify the signal without creating additional, unwanted spectral content (i.e., adjacent channel leakage).
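The crest factor (peak-to-average power ratio) that E-TM1.2's boosting and deboosting raises can be computed directly from waveform samples; a minimal sketch:

```python
import math

def papr_db(samples):
    """Peak-to-average power ratio of a sampled waveform, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    peak = max(powers)
    mean = sum(powers) / len(powers)
    return 10.0 * math.log10(peak / mean)

# A flat, equal-power waveform has 0 dB PAPR; boosting one sample
# relative to the rest raises the PAPR, which is what makes the
# boosted E-TM1.2 composite harder to amplify without clipping.
print(papr_db([1.0, 1.0, 1.0, 1.0]))        # 0.0
print(papr_db([2.0, 1.0, 1.0, 1.0]) > 0)    # True
```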

In this example, Agilent's Signal Studio for LTE is connected to an Agilent MXG signal generator to generate a standards-compliant E-TM1.2 test signal with the frequency set to 2.11 GHz. The output signal amplitude - an important consideration in determining ACLR performance - is set to -10 dBm. A 5-MHz channel bandwidth is selected from the range that extends from 1.4 to 20 MHz.

Figure 1: The resource allocation blocks (at bottom) for the E-TM1.2 test signal are shown here. The Y-axis indicates frequency or resource blocks, the X-axis indicates slots or time, the white area represents Channel 1, and the pink area represents Channel 2. The other colors shown represent the synchronization channels, reference signals, etc.

    Figure 1 shows the eNB setup with Transport Channel selected. A graph of the resource allocation blocks for the test signal appears at the bottom. Channels 1 and 2 are the downlink shared channels-of-interest in the measurement.

Channel 1 has an output power level of -4.3 dB; consequently, its channel power has been deboosted. The output power of Channel 2 has been boosted and is set at 3 dB. A complex array of power boosting and deboosting options can be set for the different resource blocks from the resource block allocation graph. The resulting composite signal has a higher peak-to-average ratio than a single channel in which all blocks are at the same power level. Amplifying a boosted signal such as this can be difficult: without sufficient back-off in the power amplifier, clipping may result.

The test signal can then be generated using Signal Studio software running on an Agilent X-Series signal analyzer. Once created, the waveform is downloaded to the signal generator via LAN or GPIB. The RF output of the signal generator is connected to the RF input of the signal analyzer, where ACLR performance is measured using swept spectrum analysis. In this example, the signal analyzer is in LTE mode with a center frequency of 2.11 GHz and the ACP measurement selected. A quick, one-button ACLR measurement can then be made according to the LTE standard by recalling the appropriate parameters and test limits from a list of available choices (e.g., options for paired or unpaired spectrum and the type of carrier in the adjacent and alternate channels) in the LTE application.

For FDD operation, LTE defines two methods of making ACLR measurements: the case in which E-UTRA (LTE) is used at the center and offset frequencies, and the case where LTE is at the center frequency and UTRA (W-CDMA) at the adjacent and alternate offsets. Figure 2 depicts the ACLR measurement result for E-UTRA adjacent and alternate offset channels. For this measurement, a 5-MHz carrier was selected; however, the measurement noise bandwidth is 4.515 MHz because the downlink contains 301 subcarriers.

Figure 2: Shown here is an ACLR measurement result using Agilent's X-Series analyzer. The first offset (A) is at 5 MHz, with an integration bandwidth of 4.515 MHz. The second offset (B) is at 10 MHz with the same integration bandwidth.
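The 4.515-MHz figure follows directly from the subcarrier count quoted above - 301 subcarriers at LTE's 15-kHz subcarrier spacing:

```python
# Measurement noise bandwidth for the 5-MHz LTE downlink carrier:
subcarriers = 301       # downlink subcarrier count quoted in the text
spacing_hz = 15e3       # LTE subcarrier spacing
noise_bw_hz = subcarriers * spacing_hz
print(noise_bw_hz)      # 4515000.0, i.e. 4.515 MHz
```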

Optimizing Analyzer Settings

While the one-button measurement previously detailed provides a very quick, usable ACLR measurement according to the LTE standard, signal analyzer settings can be optimized to achieve even better performance. Four ways to optimize the analyzer and further improve the measurement results are:

• Optimize the signal level at the mixer - Optimizing the signal level at the input mixer requires the attenuator to be adjusted for minimal clipping. Some analyzers automatically select an attenuation value based on the current measured signal value. This provides a good starting point for achieving optimal measurement range. Other analyzers, like the X-Series signal analyzers, have both electronic and mechanical attenuators, and use the two in combination to optimize performance. In such cases, the mechanical attenuator can be adjusted slightly to get even better results, by about 1 or 2 dB.

• Change the resolution bandwidth filter - Resolution bandwidth can be lowered by pressing the analyzer's bandwidth filter key. Note that sweep time increases as the resolution bandwidth is lowered. The slower sweep time reduces variance in the measurement, at the cost of measurement speed.

• Turn on noise correction - Once noise correction is turned on, the analyzer takes one sweep to measure its internal noise floor at the current center frequency, and in subsequent sweeps subtracts that internal noise floor from the measurement result. This technique substantially improves ACLR, in some cases by up to 5 dB.

• Employ a different measurement methodology - Instead of using the default measurement method (integration bandwidth, or IBW), the filtered IBW method, which uses a sharp, steep cutoff filter, can be employed. While this technique does degrade the absolute accuracy of the power measurement result, it does not degrade the ACLR result.
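Note that the noise-floor subtraction in the third technique above happens in linear power units, not in dB. A sketch of the arithmetic (illustrative only, not the analyzer's exact algorithm):

```python
import math

def noise_corrected_dbm(measured_dbm, noise_floor_dbm):
    """Subtract the analyzer's internal noise floor from a measured power.

    Both values are converted from dBm to milliwatts, subtracted, and
    converted back; subtracting the dB values directly would be wrong.
    """
    measured_mw = 10 ** (measured_dbm / 10)
    noise_mw = 10 ** (noise_floor_dbm / 10)
    return 10 * math.log10(measured_mw - noise_mw)

# A -90 dBm reading sitting only 3 dB above a -93 dBm internal noise
# floor corrects down to roughly -93 dBm of true signal power:
print(round(noise_corrected_dbm(-90.0, -93.0), 1))   # -93.0
```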

Using these techniques in combination, a signal analyzer can automatically optimize the ACLR measurement for performance versus speed via the analyzer's embedded LTE application. For a typical ACLR measurement, the results may be improved by up to 10 dB or more (Figure 3). For measurement scenarios requiring maximum performance, the analyzer settings can be further adjusted.

Conclusion

Standards-compliant spectrum measurements such as ACLR are invaluable for RF