
AMD64 Technology

AMD64 Architecture Programmer’s Manual

Volume 4: 128-Bit Media Instructions

Publication No.   Revision   Date
26568             3.07       December 2005

Advanced Micro Devices


Trademarks

AMD, the AMD arrow logo, AMD Athlon, AMD Duron, and combinations thereof, and 3DNow! are trademarks, and AMD-K6 is a registered trademark of Advanced Micro Devices, Inc.

MMX is a trademark and Pentium is a registered trademark of Intel Corporation. Windows NT is a registered trademark of Microsoft Corporation.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2002, 2003, 2004, 2005 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD’s products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.


26568—Rev. 3.07—December 2005 AMD 64-Bit Technology

Contents

Figures
Tables
Revision History
Preface
    About This Book
    Audience
    Contact Information
    Organization
    Definitions
    Related Documents
1 128-Bit Media Instruction Reference
    ADDPD
    ADDPS
    ADDSD
    ADDSS
    ADDSUBPD
    ADDSUBPS
    ANDNPD
    ANDNPS
    ANDPD
    ANDPS
    CMPPD
    CMPPS
    CMPSD
    CMPSS
    COMISD
    COMISS
    CVTDQ2PD
    CVTDQ2PS
    CVTPD2DQ
    CVTPD2PI
    CVTPD2PS
    CVTPI2PD
    CVTPI2PS
    CVTPS2DQ
    CVTPS2PD
    CVTPS2PI
    CVTSD2SI
    CVTSD2SS
    CVTSI2SD


    CVTSI2SS
    CVTSS2SD
    CVTSS2SI
    CVTTPD2DQ
    CVTTPD2PI
    CVTTPS2DQ
    CVTTPS2PI
    CVTTSD2SI
    CVTTSS2SI
    DIVPD
    DIVPS
    DIVSD
    DIVSS
    FXRSTOR
    FXSAVE
    HADDPD
    HADDPS
    HSUBPD
    HSUBPS
    LDDQU
    LDMXCSR
    MASKMOVDQU
    MAXPD
    MAXPS
    MAXSD
    MAXSS
    MINPD
    MINPS
    MINSD
    MINSS
    MOVAPD
    MOVAPS
    MOVD
    MOVDDUP
    MOVDQ2Q
    MOVDQA
    MOVDQU
    MOVHLPS
    MOVHPD
    MOVHPS
    MOVLHPS
    MOVLPD
    MOVLPS
    MOVMSKPD
    MOVMSKPS
    MOVNTDQ
    MOVNTPD


    MOVNTPS
    MOVQ
    MOVQ2DQ
    MOVSD
    MOVSHDUP
    MOVSLDUP
    MOVSS
    MOVUPD
    MOVUPS
    MULPD
    MULPS
    MULSD
    MULSS
    ORPD
    ORPS
    PACKSSDW
    PACKSSWB
    PACKUSWB
    PADDB
    PADDD
    PADDQ
    PADDSB
    PADDSW
    PADDUSB
    PADDUSW
    PADDW
    PAND
    PANDN
    PAVGB
    PAVGW
    PCMPEQB
    PCMPEQD
    PCMPEQW
    PCMPGTB
    PCMPGTD
    PCMPGTW
    PEXTRW
    PINSRW
    PMADDWD
    PMAXSW
    PMAXUB
    PMINSW
    PMINUB
    PMOVMSKB
    PMULHUW
    PMULHW
    PMULLW


    PMULUDQ
    POR
    PSADBW
    PSHUFD
    PSHUFHW
    PSHUFLW
    PSLLD
    PSLLDQ
    PSLLQ
    PSLLW
    PSRAD
    PSRAW
    PSRLD
    PSRLDQ
    PSRLQ
    PSRLW
    PSUBB
    PSUBD
    PSUBQ
    PSUBSB
    PSUBSW
    PSUBUSB
    PSUBUSW
    PSUBW
    PUNPCKHBW
    PUNPCKHDQ
    PUNPCKHQDQ
    PUNPCKHWD
    PUNPCKLBW
    PUNPCKLDQ
    PUNPCKLQDQ
    PUNPCKLWD
    PXOR
    RCPPS
    RCPSS
    RSQRTPS
    RSQRTSS
    SHUFPD
    SHUFPS
    SQRTPD
    SQRTPS
    SQRTSD
    SQRTSS
    STMXCSR
    SUBPD
    SUBPS
    SUBSD


    SUBSS
    UCOMISD
    UCOMISS
    UNPCKHPD
    UNPCKHPS
    UNPCKLPD
    UNPCKLPS
    XORPD
    XORPS
Index


Figures

Figure 1-1. Diagram Conventions for 128-Bit Media Instructions


Tables

Table 1-1. Immediate Operand Values for Compare Operations
Table 1-2. Immediate-Byte Operand Encoding for 128-Bit PEXTRW
Table 1-3. Immediate-Byte Operand Encoding for 128-Bit PINSRW
Table 1-4. Immediate-Byte Operand Encoding for PSHUFD
Table 1-5. Immediate-Byte Operand Encoding for PSHUFHW
Table 1-6. Immediate-Byte Operand Encoding for PSHUFLW
Table 1-7. Immediate-Byte Operand Encoding for SHUFPD
Table 1-8. Immediate-Byte Operand Encoding for SHUFPS


Revision History

Date             Revision   Description
December 2005    3.07       Made minor editorial and formatting changes.
January 2005     3.06       Added documentation on SSE3 instructions. Corrected numerous minor factual errors and typos.
September 2003   3.05       Made numerous small factual corrections.
April 2003       3.04       Made minor corrections.


Preface

About This Book

This book is part of a multivolume work entitled the AMD64 Architecture Programmer’s Manual. The following table lists each volume and its order number.

Title                                                         Order No.
Volume 1, Application Programming                             24592
Volume 2, System Programming                                  24593
Volume 3, General-Purpose and System Instructions             24594
Volume 4, 128-Bit Media Instructions                          26568
Volume 5, 64-Bit Media and x87 Floating-Point Instructions    26569

Audience

This volume (Volume 4) is intended for all programmers writing application or system software for processors that implement the AMD64 architecture.

Contact Information

To submit questions or comments concerning this document, contact our technical documentation staff at [email protected].

Organization

Volumes 3, 4, and 5 describe the AMD64 architecture’s instruction set in detail. Together, they cover each instruction’s mnemonic syntax, opcodes, functions, affected flags, and possible exceptions.

The AMD64 instruction set is divided into five subsets:

    General-purpose instructions
    System instructions
    128-bit media instructions
    64-bit media instructions
    x87 floating-point instructions

Several instructions belong to multiple instruction subsets and are described identically in each.

This volume describes the 128-bit media instructions. The index at the end cross-references topics within this volume. For other topics relating to the AMD64 architecture, and for information on instructions in other subsets, see the tables of contents and indexes of the other volumes.

Definitions

Many of the following definitions assume an in-depth knowledge of the legacy x86 architecture. See “Related Documents” on page xxviii for descriptions of the legacy x86 architecture.

Terms and Notation

In addition to the notation described below, “Opcode-Syntax Notation” in Volume 3 describes notation relating specifically to opcodes.

1011b
    A binary value (in this example, a 4-bit value).

F0EAh
    A hexadecimal value (in this example, a 2-byte value).

[1,2)
    A range that includes the left-most value (in this case, 1) but excludes the right-most value (in this case, 2).

7–4
    A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first.
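The notation above can be exercised with a short Python sketch. It is an illustration only and not part of the manual; the helper names `parse_literal` and `bits` are hypothetical. It parses values written with the `b` and `h` suffixes, extracts the bit range 7–4, and shows that Python's `range` follows the same half-open convention as [1,2).

```python
def parse_literal(text: str) -> int:
    """Parse a value written with the manual's 'b' (binary) or 'h' (hex) suffix."""
    if text.endswith("b"):
        return int(text[:-1], 2)
    if text.endswith("h"):
        return int(text[:-1], 16)
    return int(text)  # no suffix: treat as decimal


def bits(value: int, high: int, low: int) -> int:
    """Return bits high..low of value, inclusive, high-order bit first."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


print(parse_literal("1011b"))                    # 11
print(hex(parse_literal("F0EAh")))               # 0xf0ea
print(hex(bits(parse_literal("F0EAh"), 7, 4)))   # 0xe
print(list(range(1, 2)))                         # [1], i.e. the integers in [1,2)
```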

128-bit media instructions
    Instructions that use the 128-bit XMM registers. These are a combination of the SSE, SSE2 and SSE3 instruction sets.

64-bit media instructions
    Instructions that use the 64-bit MMX registers. These are primarily a combination of MMX™ and 3DNow!™ instruction sets, with some additional instructions from the SSE and SSE2 instruction sets.

16-bit mode
    Legacy mode or compatibility mode in which a 16-bit address size is active. See legacy mode and compatibility mode.

32-bit mode
    Legacy mode or compatibility mode in which a 32-bit address size is active. See legacy mode and compatibility mode.

64-bit mode
    A submode of long mode. In 64-bit mode, the default address size is 64 bits and new features, such as register extensions, are supported for system and application software.

#GP(0)
    Notation indicating a general-protection exception (#GP) with error code of 0.

absolute
    Said of a displacement that references the base of a code segment rather than an instruction pointer. Contrast with relative.

ASID
    Address space identifier.

biased exponent
    The sum of a floating-point value’s exponent and a constant bias for a particular floating-point data type. The bias makes the range of the biased exponent always positive, which allows reciprocation without overflow.
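As a concrete illustration of a biased exponent, the following Python sketch reads the exponent field of an IEEE 754 single-precision value. The bias of 127 and the field position (bits 30–23) come from the IEEE 754 single-precision format, not from this glossary, and the function name is hypothetical.

```python
import struct

SINGLE_BIAS = 127  # exponent bias of the IEEE 754 single-precision format


def biased_exponent(x: float) -> int:
    """Return the 8-bit biased exponent field of x encoded as single precision."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    return (raw >> 23) & 0xFF  # bits 30-23 hold the biased exponent


for value in (1.0, 0.5, 8.0):
    biased = biased_exponent(value)
    # Subtracting the bias recovers the true (signed) exponent.
    print(value, biased, biased - SINGLE_BIAS)
```

For 1.0 the stored field is 127 (true exponent 0), and for 0.5 it is 126 (true exponent -1): the stored field stays non-negative even when the true exponent is negative.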

byte
    Eight bits.

clear
    To write a bit value of 0. Compare set.

compatibility mode
    A submode of long mode. In compatibility mode, the default address size is 32 bits, and legacy 16-bit and 32-bit applications run without modification.

commit
    To irreversibly write, in program order, an instruction’s result to software-visible storage, such as a register (including flags), the data cache, an internal write buffer, or memory.

CPL
    Current privilege level.

CR0–CR4
    A register range, from register CR0 through CR4, inclusive, with the low-order register first.

CR0.PE = 1
    Notation indicating that the PE bit of the CR0 register has a value of 1.
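The register notation can be read as a register-range expansion plus a single-bit test. The Python sketch below is illustrative only: the helper names are hypothetical, the sample CR0 value is made up, and the position of PE (bit 0 of CR0) is taken from the architecture's system-programming documentation, not from this glossary.

```python
CR0_PE_BIT = 0  # assumed position of the PE (protection enable) bit in CR0


def register_range(prefix: str, first: int, last: int) -> list[str]:
    """Expand a range such as CR0-CR4, inclusive, low-order register first."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


def field(register_value: int, bit: int) -> int:
    """Return the value (0 or 1) of a single-bit register field."""
    return (register_value >> bit) & 1


cr0 = 0x80000011  # hypothetical CR0 contents, for illustration only
print(register_range("CR", 0, 4))   # ['CR0', 'CR1', 'CR2', 'CR3', 'CR4']
print(field(cr0, CR0_PE_BIT) == 1)  # True: this value satisfies "CR0.PE = 1"
```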

direct
    Referencing a memory location whose address is included in the instruction’s syntax as an immediate operand. The address may be an absolute or relative address. Compare indirect.

dirty data
    Data held in the processor’s caches or internal buffers that is more recent than the copy held in main memory.

displacement
    A signed value that is added to the base of a segment (absolute addressing) or an instruction pointer (relative addressing). Same as offset.

doubleword
    Two words, or four bytes, or 32 bits.

double quadword
    Eight words, or 16 bytes, or 128 bits. Also called octword.

DS:rSI
    The contents of a memory location whose segment address is in the DS register and whose offset relative to that segment is in the rSI register.


EFER.LME = 0
    Notation indicating that the LME bit of the EFER register has a value of 0.

effective address size
    The address size for the current instruction after accounting for the default address size and any address-size override prefix.

effective operand size
    The operand size for the current instruction after accounting for the default operand size and any operand-size override prefix.
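How a default size and an override prefix combine can be sketched as follows. This is a simplified model under stated assumptions, not the manual's definition: it covers only the common case (a 66h operand-size prefix toggles between 16 and 32 bits, and in 64-bit mode REX.W selects 64 bits and takes precedence over 66h) and ignores instructions whose operand size defaults to 64 bits. The function and parameter names are hypothetical.

```python
def effective_operand_size(mode: str, default_bits: int = 32,
                           prefix_66: bool = False, rex_w: bool = False) -> int:
    """Return the effective operand size in bits for the common case.

    mode         -- "64-bit" or "legacy" (legacy also stands in for compatibility mode)
    default_bits -- default operand size of the current code segment (16 or 32)
    prefix_66    -- True if an operand-size override prefix (66h) is present
    rex_w        -- True if a REX prefix with W = 1 is present (64-bit mode only)
    """
    if mode == "64-bit":
        if rex_w:
            return 64                      # REX.W wins over 66h
        return 16 if prefix_66 else 32     # default is 32 bits in 64-bit mode
    if default_bits not in (16, 32):
        raise ValueError("default operand size must be 16 or 32 bits")
    if prefix_66:
        return 32 if default_bits == 16 else 16  # prefix selects the non-default size
    return default_bits


print(effective_operand_size("64-bit"))                               # 32
print(effective_operand_size("64-bit", prefix_66=True, rex_w=True))   # 64
print(effective_operand_size("legacy", default_bits=16, prefix_66=True))  # 32
```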

element
    See vector.

exception
    An abnormal condition that occurs as the result of executing an instruction. The processor’s response to an exception depends on the type of the exception. For all exceptions except 128-bit media SIMD floating-point exceptions and x87 floating-point exceptions, control is transferred to the handler (or service routine) for that exception, as defined by the exception’s vector. For floating-point exceptions defined by the IEEE 754 standard, there are both masked and unmasked responses. When unmasked, the exception handler is called, and when masked, a default response is provided instead of calling the handler.
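The masked and unmasked responses can be modelled with a small Python sketch. It assumes the MXCSR layout used for SIMD floating-point exceptions (mask bits IM, DM, ZM, OM, UM, PM in bits 7 through 12, reset value 1F80h), which is documented elsewhere in the architecture manuals and not in this glossary. The function names and the returned strings are hypothetical.

```python
# Exception names in MXCSR flag order: invalid, denormal, zero-divide,
# overflow, underflow, precision. The matching mask bit is assumed to be
# 7 positions above each flag bit.
EXCEPTIONS = ("IE", "DE", "ZE", "OE", "UE", "PE")
MXCSR_RESET = 0x1F80  # assumed reset value: all six exceptions masked


def is_masked(mxcsr: int, exception: str) -> bool:
    """Return True if the mask bit for the named exception is set."""
    mask_bit = 7 + EXCEPTIONS.index(exception)
    return bool((mxcsr >> mask_bit) & 1)


def response(mxcsr: int, exception: str) -> str:
    """Describe the processor's response as the glossary entry states it."""
    return "default response" if is_masked(mxcsr, exception) else "call handler"


unmask_zero_divide = MXCSR_RESET & ~(1 << 9)  # clear ZM (bit 9)
print(response(MXCSR_RESET, "ZE"))         # default response
print(response(unmask_zero_divide, "ZE"))  # call handler
```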

FF /0
    Notation indicating that FF is the first byte of an opcode, and a subopcode in the ModR/M byte has a value of 0.
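A decoder might test the "/digit" part of this notation as in the Python sketch below. It assumes the standard ModR/M layout (mod in bits 7–6, reg in bits 5–3, r/m in bits 2–0), with the subopcode carried in the reg field; that layout is described in Volume 3, not in this glossary, and the function names are hypothetical.

```python
def modrm_fields(modrm: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111  # carries the subopcode for "/digit" encodings
    rm = modrm & 0b111
    return mod, reg, rm


def matches(opcode: int, modrm: int, want_opcode: int, want_digit: int) -> bool:
    """Return True if the byte pair matches 'want_opcode /want_digit'."""
    _, reg, _ = modrm_fields(modrm)
    return opcode == want_opcode and reg == want_digit


print(modrm_fields(0xC0))            # (3, 0, 0)
print(matches(0xFF, 0xC0, 0xFF, 0))  # True: FF C0 is an FF /0 encoding
print(matches(0xFF, 0x30, 0xFF, 0))  # False: reg field is 6, so this is FF /6
```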

flush
    An often ambiguous term meaning (1) writeback, if modified, and invalidate, as in “flush the cache line,” or (2) invalidate, as in “flush the pipeline,” or (3) change a value, as in “flush to zero.”

GDT
    Global descriptor table.

GIF
    Global interrupt flag.

IDT
Interrupt descriptor table.

IGN
Ignore. Field is ignored.

indirect
Referencing a memory location whose address is in a register or other memory location. The address may be an absolute or relative address. Compare direct.

IRB
The virtual-8086 mode interrupt-redirection bitmap.

IST
The long-mode interrupt-stack table.

IVT
The real-address mode interrupt-vector table.

LDT
Local descriptor table.

legacy x86
The legacy x86 architecture. See “Related Documents” for descriptions of the legacy x86 architecture.

legacy mode
An operating mode of the AMD64 architecture in which existing 16-bit and 32-bit applications and operating systems run without modification. A processor implementation of the AMD64 architecture can run in either long mode or legacy mode. Legacy mode has three submodes, real mode, protected mode, and virtual-8086 mode.

long mode
An operating mode unique to the AMD64 architecture. A processor implementation of the AMD64 architecture can run in either long mode or legacy mode. Long mode has two submodes, 64-bit mode and compatibility mode.

lsb
Least-significant bit.

LSB
Least-significant byte.

main memory
Physical memory, such as RAM and ROM (but not cache memory) that is installed in a particular computer system.

mask
(1) A control bit that prevents the occurrence of a floating-point exception from invoking an exception-handling routine. (2) A field of bits used for a control purpose.

MBZ
Must be zero. If software attempts to set an MBZ bit to 1, a general-protection exception (#GP) occurs.

memory
Unless otherwise specified, main memory.

ModRM
A byte following an instruction opcode that specifies address calculation based on mode (Mod), register (R), and memory (M) variables.

moffset
A 16, 32, or 64-bit offset that specifies a memory operand directly, without using a ModRM or SIB byte.

msb
Most-significant bit.

MSB
Most-significant byte.

multimedia instructions
A combination of 128-bit media instructions and 64-bit media instructions.

octword
Same as double quadword.

offset
Same as displacement.

overflow
The condition in which a floating-point number is larger in magnitude than the largest, finite, positive or negative number that can be represented in the data-type format being used.

packed
See vector.

PAE
Physical-address extensions.

physical memory
Actual memory, consisting of main memory and cache.

probe
A check for an address in a processor’s caches or internal buffers. External probes originate outside the processor, and internal probes originate within the processor.

protected mode
A submode of legacy mode.

quadword
Four words, or eight bytes, or 64 bits.

RAZ
Read as zero (0), regardless of what is written.

real-address mode
See real mode.

real mode
A short name for real-address mode, a submode of legacy mode.

relative
Referencing with a displacement (also called offset) from an instruction pointer rather than the base of a code segment. Contrast with absolute.

reserved
Fields marked as reserved may be used at some future time.

To preserve compatibility with future processors, reserved fields require special handling when read or written by software.

Reserved fields may be further qualified as MBZ, RAZ, SBZ or IGN (see definitions).

Software must not depend on the state of a reserved field, nor upon the ability of such fields to return to a previously written state.

If a reserved field is not marked with one of the above qualifiers, software must not change the state of that field; it must reload that field with the same values returned from a prior read.

REX
An instruction prefix that specifies a 64-bit operand size and provides access to additional registers.

RIP-relative addressing
Addressing relative to the 64-bit RIP instruction pointer.

set
To write a bit value of 1. Compare clear.

SIB
A byte following an instruction opcode that specifies address calculation based on scale (S), index (I), and base (B).

SIMD
Single instruction, multiple data. See vector.

SSE
Streaming SIMD extensions instruction set. See 128-bit media instructions and 64-bit media instructions.

SSE2
Extensions to the SSE instruction set. See 128-bit media instructions and 64-bit media instructions.

SSE3
Further extensions to the SSE instruction set. See 128-bit media instructions.

sticky bit
A bit that is set or cleared by hardware and that remains in that state until explicitly changed by software.

TOP
The x87 top-of-stack pointer.

TSS
Task-state segment.

underflow
The condition in which a floating-point number is smaller in magnitude than the smallest nonzero, positive or negative number that can be represented in the data-type format being used.

vector
(1) A set of integer or floating-point values, called elements, that are packed into a single operand. Most of the 128-bit and 64-bit media instructions use vectors as operands. Vectors are also called packed or SIMD (single-instruction multiple-data) operands.
(2) An index into an interrupt descriptor table (IDT), used to access exception handlers. Compare exception.

virtual-8086 mode
A submode of legacy mode.

VMCB
Virtual machine control block.

VMM
Virtual machine monitor.

word
Two bytes, or 16 bits.

x86
See legacy x86.
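
The rule given under "reserved" in the definitions above (an unqualified reserved field must be written back with the values returned by a prior read) can be sketched in C as a read-modify-write helper. This is only an illustration under stated assumptions: the register layout (a writable field in bits 3:0, every other bit reserved) and the names FIELD_MASK and update_field_preserving_reserved are hypothetical and do not come from this manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: bits 3:0 are a writable
   field; all other bits are treated as unqualified reserved bits. */
#define FIELD_MASK UINT32_C(0x0000000F)

/* Build the value to write back: the new field value merged with the
   reserved bits exactly as they were returned by the prior read. */
static uint32_t update_field_preserving_reserved(uint32_t value_read,
                                                 uint32_t new_field)
{
    /* The caller may only supply bits that belong to the writable field. */
    assert((new_field & ~FIELD_MASK) == 0);
    return (value_read & ~FIELD_MASK) | new_field;
}
```

A caller would read the register, pass the value through this helper, and write the result back, so that the reserved bits are never altered by software.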

Registers

In the following list of registers, the names are used to refer either to a given register or to the contents of that register:

AH–DH
The high 8-bit AH, BH, CH, and DH registers. Compare AL–DL.

AL–DL
The low 8-bit AL, BL, CL, and DL registers. Compare AH–DH.

AL–r15B
The low 8-bit AL, BL, CL, DL, SIL, DIL, BPL, SPL, and R8B–R15B registers, available in 64-bit mode.

BP
Base pointer register.

CRn
Control register number n.

CS
Code segment register.

eAX–eSP
The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers or the 32-bit EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP registers. Compare rAX–rSP.

EFER
Extended features enable register.

eFLAGS
16-bit or 32-bit flags register. Compare rFLAGS.

EFLAGS
32-bit (extended) flags register.

eIP
16-bit or 32-bit instruction-pointer register. Compare rIP.

EIP
32-bit (extended) instruction-pointer register.

FLAGS
16-bit flags register.

GDTR
Global descriptor table register.

GPRs
General-purpose registers. For the 16-bit data size, these are AX, BX, CX, DX, DI, SI, BP, and SP. For the 32-bit data size, these are EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP. For the 64-bit data size, these include RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, and R8–R15.

IDTR
Interrupt descriptor table register.

IP
16-bit instruction-pointer register.

LDTR
Local descriptor table register.

MSR
Model-specific register.

r8–r15
The 8-bit R8B–R15B registers, or the 16-bit R8W–R15W registers, or the 32-bit R8D–R15D registers, or the 64-bit R8–R15 registers.

rAX–rSP
The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers, or the 32-bit EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP registers, or the 64-bit RAX, RBX, RCX, RDX, RDI, RSI, RBP, and RSP registers. Replace the placeholder r with nothing for 16-bit size, “E” for 32-bit size, or “R” for 64-bit size.

RAX
64-bit version of the EAX register.

RBP
64-bit version of the EBP register.

RBX
64-bit version of the EBX register.

RCX
64-bit version of the ECX register.

RDI
64-bit version of the EDI register.

RDX
64-bit version of the EDX register.

rFLAGS
16-bit, 32-bit, or 64-bit flags register. Compare RFLAGS.

RFLAGS
64-bit flags register. Compare rFLAGS.

rIP
16-bit, 32-bit, or 64-bit instruction-pointer register. Compare RIP.

RIP
64-bit instruction-pointer register.

RSI
64-bit version of the ESI register.

RSP
64-bit version of the ESP register.

SP
Stack pointer register.

SS
Stack segment register.

TPR
Task priority register (CR8), a new register introduced in the AMD64 architecture to speed interrupt management.

TR
Task register.

Endian Order

The x86 and AMD64 architectures address memory using little-endian byte-ordering. Multibyte values are stored with their least-significant byte at the lowest byte address, and they are illustrated with their least significant byte at the right side. Strings are illustrated in reverse order, because the addresses of their bytes increase from right to left.
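
The byte ordering described above can be made concrete with a small C sketch that writes and reads a 32-bit value one byte at a time, so the result does not depend on the machine running it. The helper names store_le32 and load_le32 are illustrative choices, not names used by the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value the way little-endian memory holds it:
   least-significant byte at the lowest byte address. */
static void store_le32(uint8_t *mem, uint32_t value)
{
    assert(mem != NULL);
    for (size_t i = 0; i < 4; i++)
        mem[i] = (uint8_t)(value >> (8 * i));
}

/* Reassemble the value from four consecutive little-endian bytes. */
static uint32_t load_le32(const uint8_t *mem)
{
    assert(mem != NULL);
    uint32_t value = 0;
    for (size_t i = 0; i < 4; i++)
        value |= (uint32_t)mem[i] << (8 * i);
    return value;
}
```

Storing 12345678h this way places 78h at the lowest address and 12h at the highest, which is why the manual's illustrations put the least-significant byte on the right.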

Related Documents

Peter Abel, IBM PC Assembly Language and Programming, Prentice-Hall, Englewood Cliffs, NJ, 1995.

Rakesh Agarwal, 80x86 Architecture & Programming: Volume II, Prentice-Hall, Englewood Cliffs, NJ, 1991.

AMD, AMD-K6™ MMX™ Enhanced Processor Multimedia Technology, Sunnyvale, CA, 2000.

AMD, 3DNow!™ Technology Manual, Sunnyvale, CA, 2000.

AMD, AMD Extensions to the 3DNow!™ and MMX™ Instruction Sets, Sunnyvale, CA, 2000.

Don Anderson and Tom Shanley, Pentium Processor System Architecture, Addison-Wesley, New York, 1995.

Nabajyoti Barkakati and Randall Hyde, Microsoft Macro Assembler Bible, Sams, Carmel, Indiana, 1992.

Barry B. Brey, 8086/8088, 80286, 80386, and 80486 Assembly Language Programming, Macmillan Publishing Co., New York, 1994.

Barry B. Brey, Programming the 80286, 80386, 80486, and Pentium Based Personal Computer, Prentice-Hall, Englewood Cliffs, NJ, 1995.

Ralf Brown and Jim Kyle, PC Interrupts, Addison-Wesley, New York, 1994.

Penn Brumm and Don Brumm, 80386/80486 Assembly Language Programming, Windcrest McGraw-Hill, 1993.

Geoff Chappell, DOS Internals, Addison-Wesley, New York, 1994.

Chips and Technologies, Inc. Super386 DX Programmer’s Reference Manual, Chips and Technologies, Inc., San Jose, 1992.

John Crawford and Patrick Gelsinger, Programming the 80386, Sybex, San Francisco, 1987.

Cyrix Corporation, 5x86 Processor BIOS Writer's Guide, Cyrix Corporation, Richardson, TX, 1995.

Cyrix Corporation, M1 Processor Data Book, Cyrix Corporation, Richardson, TX, 1996.

Cyrix Corporation, MX Processor MMX Extension Opcode Table, Cyrix Corporation, Richardson, TX, 1996.

Cyrix Corporation, MX Processor Data Book, Cyrix Corporation, Richardson, TX, 1997.

Ray Duncan, Extending DOS: A Programmer's Guide to Protected-Mode DOS, Addison Wesley, NY, 1991.

William B. Giles, Assembly Language Programming for the Intel 80xxx Family, Macmillan, New York, 1991.

Frank van Gilluwe, The Undocumented PC, Addison-Wesley, New York, 1994.

John L. Hennessy and David A. Patterson, Computer Architecture, Morgan Kaufmann Publishers, San Mateo, CA, 1996.

Thom Hogan, The Programmer’s PC Sourcebook, Microsoft Press, Redmond, WA, 1991.

Hal Katircioglu, Inside the 486, Pentium, and Pentium Pro, Peer-to-Peer Communications, Menlo Park, CA, 1997.

IBM Corporation, 486SLC Microprocessor Data Sheet, IBM Corporation, Essex Junction, VT, 1993.

IBM Corporation, 486SLC2 Microprocessor Data Sheet, IBM Corporation, Essex Junction, VT, 1993.

IBM Corporation, 80486DX2 Processor Floating Point Instructions, IBM Corporation, Essex Junction, VT, 1995.

IBM Corporation, 80486DX2 Processor BIOS Writer's Guide, IBM Corporation, Essex Junction, VT, 1995.

IBM Corporation, Blue Lightening 486DX2 Data Book, IBM Corporation, Essex Junction, VT, 1994.

Institute of Electrical and Electronics Engineers, IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985.

Institute of Electrical and Electronics Engineers, IEEE Standard for Radix-Independent Floating-Point Arithmetic, ANSI/IEEE Std 854-1987.

Muhammad Ali Mazidi and Janice Gillispie Mazidi, 80X86 IBM PC and Compatible Computers, Prentice-Hall, Englewood Cliffs, NJ, 1997.

Hans-Peter Messmer, The Indispensable Pentium Book, Addison-Wesley, New York, 1995.

Karen Miller, An Assembly Language Introduction to Computer Architecture: Using the Intel Pentium, Oxford University Press, New York, 1999.

Stephen Morse, Eric Isaacson, and Douglas Albert, The 80386/387 Architecture, John Wiley & Sons, New York, 1987.

NexGen Inc., Nx586 Processor Data Book, NexGen Inc., Milpitas, CA, 1993.

NexGen Inc., Nx686 Processor Data Book, NexGen Inc., Milpitas, CA, 1994.

Bipin Patwardhan, Introduction to the Streaming SIMD Extensions in the Pentium III, www.x86.org/articles/sse_pt1/simd1.htm, June, 2000.

Peter Norton, Peter Aitken, and Richard Wilton, PC Programmer’s Bible, Microsoft Press, Redmond, WA, 1993.

PharLap 386|ASM Reference Manual, Pharlap, Cambridge MA, 1993.

PharLap TNT DOS-Extender Reference Manual, Pharlap, Cambridge MA, 1995.

Sen-Cuo Ro and Sheau-Chuen Her, i386/i486 Advanced Programming, Van Nostrand Reinhold, New York, 1993.

Jeffrey P. Royer, Introduction to Protected Mode Programming, course materials for an onsite class, 1992.

Tom Shanley, Protected Mode System Architecture, Addison Wesley, NY, 1996.

SGS-Thomson Corporation, 80486DX Processor SMM Programming Manual, SGS-Thomson Corporation, 1995.

Walter A. Triebel, The 80386DX Microprocessor, Prentice-Hall, Englewood Cliffs, NJ, 1992.

John Wharton, The Complete x86, MicroDesign Resources, Sebastopol, California, 1994.

Web sites and newsgroups:

- www.amd.com

- news.comp.arch

- news.comp.lang.asm.x86

- news.intel.microprocessors

- news.microsoft

1 128-Bit Media Instruction Reference

This chapter describes the function, mnemonic syntax, opcodes, and affected flags of the 128-bit media instructions, along with the exceptions they can generate. These instructions load, store, or operate on data located in 128-bit XMM registers. Most of the instructions operate in parallel on sets of packed elements called vectors, although a few operate on scalars. These instructions define both integer and floating-point operations. They include the SSE, SSE2, and SSE3 instructions.

Each instruction that performs a vector (packed) operation is illustrated with a diagram. Figure 1-1 shows the conventions used in these diagrams. The particular diagram shows the PSLLW (packed shift left logical words) instruction.

[Figure 513-323.eps: the PSLLW diagram, annotated to explain the diagram conventions. Each operand is drawn as a 128-bit register (bits 127–0) divided into eight 16-bit elements, with a "shift left" operation between the operands.]

- xmm1 is the first source operand (and destination operand); xmm2/mem128 is the second source operand.
- Gray areas in diagrams indicate unmodified operand bits.
- Ellipses indicate that the operation is repeated for each element of the source vectors. In this case, there are 8 elements in each source vector, so the operation is performed 8 times, in parallel.
- Arrowheads coming from a source operand indicate that the source operand provides a control function. In this case, the second source operand specifies the number of bits to shift, and the first source operand specifies the data to be shifted.
- Arrowheads going to a source operand indicate the writing of the result. In this case, the result is written to the first source operand, which is also the destination operand.
- The label between the operands names the operation. In this case, a bitwise shift-left.
- The file name of the figure (here, 513-323.eps) is shown for documentation control.

Figure 1-1. Diagram Conventions for 128-Bit Media Instructions
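
The element-by-element behavior that Figure 1-1 illustrates can be sketched as a plain C model of PSLLW. This is a software illustration, not the hardware definition: the function name psllw_model and the array representation (element 0 holding bits 15:0) are assumptions made here, and the rule that a shift count above 15 produces all zeros is taken from the general PSLLW definition rather than from the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of PSLLW: shift each of the eight 16-bit elements of the
   first source/destination operand left by the same bit count.
   dest[0] models bits 15:0, dest[7] models bits 127:112. */
static void psllw_model(uint16_t dest[8], uint64_t count)
{
    assert(dest != NULL);
    for (size_t i = 0; i < 8; i++) {
        /* A count larger than 15 clears every element. */
        dest[i] = (count > 15) ? 0 : (uint16_t)((uint32_t)dest[i] << count);
    }
}
```

The loop runs eight times in sequence here; the instruction performs the same eight shifts in parallel, which is what the ellipses in the diagram stand for.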

The 128-bit media instructions are useful in high-performance applications that operate on blocks of data. Because each instruction can independently and simultaneously perform a single operation on multiple elements of a vector, the instructions are classified as single-instruction, multiple-data (SIMD) instructions. A few 128-bit media instructions convert operands in XMM registers to operands in GPR, MMX™, or x87 registers (or vice versa), or save or restore XMM state.

Hardware support for a specific 128-bit media instruction depends on the presence of at least one of the following CPUID functions:

- FXSAVE and FXRSTOR, indicated by EDX bit 24 returned by CPUID standard function 1 and extended function 8000_0001h.
- SSE, indicated by EDX bit 25 returned by CPUID standard function 1.
- SSE2, indicated by EDX bit 26 returned by CPUID standard function 1.
- SSE3, indicated by ECX bit 0 returned by CPUID standard function 1.

The 128-bit media instructions can be used in legacy mode or long mode. Their use in long mode is available if the following CPUID function is set:

- Long Mode, indicated by EDX bit 29 returned by CPUID extended function 8000_0001h.
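
The feature indications listed above can be decoded with a few bit tests. The sketch below deliberately does not execute CPUID itself; it assumes the caller has already obtained the EDX and ECX values returned by standard function 1 and the EDX value returned by extended function 8000_0001h. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One flag per feature indication named in the list above. */
struct media128_features {
    bool fxsr;      /* FXSAVE/FXRSTOR: EDX bit 24, standard function 1 */
    bool sse;       /* SSE:  EDX bit 25, standard function 1 */
    bool sse2;      /* SSE2: EDX bit 26, standard function 1 */
    bool sse3;      /* SSE3: ECX bit 0,  standard function 1 */
    bool long_mode; /* Long Mode: EDX bit 29, extended function 8000_0001h */
};

static bool test_bit(uint32_t reg, unsigned bit)
{
    assert(bit < 32);
    return ((reg >> bit) & 1u) != 0;
}

/* Decode register values previously returned by CPUID.
   Only the standard-function copy of the FXSAVE/FXRSTOR bit is examined. */
static struct media128_features decode_media128_features(uint32_t fn1_edx,
                                                         uint32_t fn1_ecx,
                                                         uint32_t ext1_edx)
{
    struct media128_features f;
    f.fxsr      = test_bit(fn1_edx, 24);
    f.sse       = test_bit(fn1_edx, 25);
    f.sse2      = test_bit(fn1_edx, 26);
    f.sse3      = test_bit(fn1_ecx, 0);
    f.long_mode = test_bit(ext1_edx, 29);
    return f;
}
```

Keeping the decoding separate from the CPUID call makes the logic easy to check against known register values.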

Compilation of 128-bit media programs for execution in 64-bit mode offers four primary advantages: access to the eight extended XMM registers (for a register set consisting of XMM0–XMM15), access to the eight extended, 64-bit general-purpose registers (for a register set consisting of GPR0–GPR15), access to the 64-bit virtual address space, and access to the RIP-relative addressing mode.

For further information, see:

- “128-Bit Media and Scientific Programming” in Volume 1.
- “Summary of Registers and Data Types” in Volume 3.
- “Notation” in Volume 3.
- “Instruction Prefixes” in Volume 3.

ADDPD Add Packed Double-Precision Floating-Point

Adds each packed double-precision floating-point value in the first source operand to the corresponding packed double-precision floating-point value in the second source operand and writes the result of each addition in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The ADDPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
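
As a rough model of the data flow described above, the following C sketch adds the two quadword elements pairwise and writes the sums back to the first operand. It is an illustration only: the type xmm_pd and the function addpd_model are hypothetical names, and the model ignores MXCSR rounding control, denormal handling, and exception reporting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an XMM register holding two doubles.
   q[0] models bits 63:0 and q[1] models bits 127:64. */
typedef struct { double q[2]; } xmm_pd;

/* dest = dest + src, element by element (first source is also destination). */
static void addpd_model(xmm_pd *dest, const xmm_pd *src)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < 2; i++)
        dest->q[i] = dest->q[i] + src->q[i];
}
```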

Related Instructions

ADDPS, ADDSD, ADDSS

rFLAGS Affected

None

Mnemonic                  Opcode        Description
ADDPD xmm1, xmm2/mem128   66 0F 58 /r   Adds two packed double-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.
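
To show how the opcode column reads, here is a hedged C sketch that assembles the register-to-register form of ADDPD. It relies on general x86 encoding conventions described in Volume 3 rather than on anything stated in this table: "/r" denotes a ModRM byte, whose bits 7:6 are the mod field (11b for a register operand), bits 5:3 the reg field (here the destination, xmm1), and bits 2:0 the r/m field (here xmm2). The function name is hypothetical, and only XMM0–XMM7 are handled because the upper eight registers need a REX prefix that this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emit "ADDPD xmm<dest>, xmm<src>" as 66 0F 58 /r with mod = 11b.
   Returns the number of bytes written to out. */
static size_t encode_addpd_reg_reg(unsigned dest, unsigned src, uint8_t out[4])
{
    /* XMM8-XMM15 would require a REX prefix, which is not modeled here. */
    assert(dest < 8 && src < 8);
    out[0] = 0x66;                                /* operand-size prefix selecting ADDPD */
    out[1] = 0x0F;                                /* two-byte opcode escape */
    out[2] = 0x58;                                /* ADD opcode byte */
    out[3] = (uint8_t)(0xC0 | (dest << 3) | src); /* ModRM: mod=11, reg=dest, r/m=src */
    return 4;
}
```

With dest = 1 and src = 2, the ModRM byte works out to C0h | 08h | 02h = CAh, giving the byte sequence 66 0F 58 CA.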

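[Figure addpd.eps: xmm1 and xmm2/mem128 each hold two quadwords (bits 127–64 and 63–0). Corresponding quadwords are added, and each result is written to the same quadword of xmm1.]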

MXCSR Flags Affected

FZ  RC      PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                         M   M   M       M   M
15  14  13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                            Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                     X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                     X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
SIMD Floating-Point Exception, #XF   X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)     X     X             X          A source operand was an SNaN value.
                                     X     X             X          +infinity was added to –infinity.
Denormalized-operand exception (DE)  X     X             X          A source operand was a denormal value.
Overflow exception (OE)              X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)             X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)             X     X             X          A result could not be represented exactly in the destination format.
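
The relationship between the MXCSR bits and the #XF and #UD rows above can be sketched in C. The bit positions come from the MXCSR Flags Affected table (exception flags IE through PE in bits 0 through 5, the matching masks IM through PM in bits 7 through 12), and the choice between #XF and #UD follows the CR4.OSXMMEXCPT conditions in the exception table. The names are hypothetical, and the model is a simplification: it ignores exception priority and every other fault source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Exception flag bits, as laid out in the MXCSR table. */
enum {
    MXCSR_IE = 1u << 0, MXCSR_DE = 1u << 1, MXCSR_ZE = 1u << 2,
    MXCSR_OE = 1u << 3, MXCSR_UE = 1u << 4, MXCSR_PE = 1u << 5
};

enum simd_fp_outcome { SIMD_FP_NO_FAULT, SIMD_FP_XF, SIMD_FP_UD };

/* mxcsr:      current MXCSR value (only the mask bits 12:7 are used)
   raised:     exception conditions detected by the instruction (bits 5:0)
   osxmmexcpt: value of CR4.OSXMMEXCPT */
static enum simd_fp_outcome classify_simd_fp(uint32_t mxcsr, uint32_t raised,
                                             bool osxmmexcpt)
{
    assert((raised & ~0x3Fu) == 0);
    uint32_t masks = (mxcsr >> 7) & 0x3Fu;  /* IM..PM aligned with IE..PE */
    uint32_t unmasked = raised & ~masks;
    if (unmasked == 0)
        return SIMD_FP_NO_FAULT;            /* masked: default response */
    return osxmmexcpt ? SIMD_FP_XF : SIMD_FP_UD;
}
```

With all six mask bits set, every condition receives the masked (default) response and no fault is reported; clearing a mask bit makes the matching condition deliver #XF or #UD depending on CR4.OSXMMEXCPT.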

ADDPS Add Packed Single-Precision Floating-Point

Adds each packed single-precision floating-point value in the first source operand to the corresponding packed single-precision floating-point value in the second source operand and writes the result of each addition in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The ADDPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
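
The four-element form of the operation can be modeled the same way in C. Again this is only a sketch: xmm_ps and addps_model are hypothetical names, d[0] is assumed to model bits 31:0, and MXCSR rounding control and exception reporting are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an XMM register holding four floats.
   d[0] models bits 31:0 and d[3] models bits 127:96. */
typedef struct { float d[4]; } xmm_ps;

/* dest = dest + src for each of the four single-precision elements. */
static void addps_model(xmm_ps *dest, const xmm_ps *src)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < 4; i++)
        dest->d[i] = dest->d[i] + src->d[i];
}
```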

Related Instructions

ADDPD, ADDSD, ADDSS

rFLAGS Affected

None

Mnemonic                  Opcode     Description
ADDPS xmm1, xmm2/mem128   0F 58 /r   Adds four packed single-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

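[Figure addps.eps: xmm1 and xmm2/mem128 each hold four doublewords (bits 127–96, 95–64, 63–32, and 31–0). Corresponding doublewords are added, and each result is written to the same doubleword of xmm1.]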

MXCSR Flags Affected

FZ  RC      PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                         M   M   M       M   M
15  14  13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                            Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                     X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                     X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
SIMD Floating-Point Exception, #XF   X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)     X     X             X          A source operand was an SNaN value.
                                     X     X             X          +infinity was added to –infinity.
Denormalized-operand exception (DE)  X     X             X          A source operand was a denormal value.
Overflow exception (OE)              X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)             X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)             X     X             X          A result could not be represented exactly in the destination format.
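
The #GP row for a misaligned 128-bit memory operand comes down to a test on the low four address bits. The following C sketch shows that test together with a helper that rounds an address up to the next 16-byte boundary. Both function names are hypothetical, and the sketch works on plain integer addresses rather than real pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a 128-bit memory operand at this address satisfies the
   16-byte alignment that the packed instructions above require. */
static bool mem128_is_aligned(uint64_t address)
{
    return (address & 0xFu) == 0;
}

/* Round an address up to the next 16-byte boundary (unchanged if it is
   already aligned). */
static uint64_t align_up_16(uint64_t address)
{
    assert(address <= UINT64_MAX - 15);   /* avoid wraparound */
    return (address + 15) & ~UINT64_C(0xF);
}
```

Buffers intended as memory operands of the packed forms can be placed at addresses produced by a helper like align_up_16 so that this cause of #GP does not arise.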

ADDSD Add Scalar Double-Precision Floating-Point

Adds the double-precision floating-point value in the low-order quadword of the first source operand to the double-precision floating-point value in the low-order quadword of the second source operand and writes the result in the low-order quadword of the destination (first source). The high-order quadword of the destination is not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location.

The ADDSD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
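
The scalar form differs from the packed form in that only the low-order element changes. A minimal C model, with the hypothetical names xmm_sd and addsd_model and with rounding control and exception reporting omitted, might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an XMM register viewed as two doubles.
   q[0] models the low-order quadword (bits 63:0). */
typedef struct { double q[2]; } xmm_sd;

/* Add the low-order double of the second source to the low-order double
   of dest. The high-order quadword of dest is left untouched. */
static void addsd_model(xmm_sd *dest, double src_low)
{
    assert(dest != NULL);
    dest->q[0] = dest->q[0] + src_low;
}
```

Passing the second source as a single double reflects that it may be a 64-bit memory location rather than a full register.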

Related Instructions

ADDPD, ADDPS, ADDSS

rFLAGS Affected

None

Mnemonic                 Opcode        Description
ADDSD xmm1, xmm2/mem64   F2 0F 58 /r   Adds low-order double-precision floating-point values in an XMM register and another XMM register or 64-bit memory location and writes the result in the destination XMM register.

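[Figure addsd.eps: the low-order quadwords (bits 63–0) of xmm1 and xmm2/mem64 are added, and the result is written to bits 63–0 of xmm1. Bits 127–64 of xmm1 are not modified.]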

MXCSR Flags Affected

FZ  RC      PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
                                         M   M   M       M   M
15  14  13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                            Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                     X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                       X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF   X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)     X     X             X          A source operand was an SNaN value.
                                     X     X             X          +infinity was added to –infinity.
Denormalized-operand exception (DE)  X     X             X          A source operand was a denormal value.
Overflow exception (OE)              X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)             X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)             X     X             X          A result could not be represented exactly in the destination format.
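
The two invalid-operation (IE) causes listed in the table, an SNaN source operand and the addition of +infinity to –infinity, can be recognized from the bit patterns of double-precision operands. The C sketch below assumes the usual x86 binary64 encoding, in which a NaN whose most-significant fraction bit (bit 51) is clear is a signaling NaN. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit encoding. */
static uint64_t double_to_bits(double value)
{
    uint64_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* SNaN: exponent all ones, nonzero fraction, quiet bit (bit 51) clear. */
static bool is_snan64(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7FFu;
    uint64_t fraction = bits & UINT64_C(0x000FFFFFFFFFFFFF);
    bool quiet = ((bits >> 51) & 1u) != 0;
    return exponent == 0x7FFu && fraction != 0 && !quiet;
}

/* Infinity of either sign: exponent all ones, fraction zero. */
static bool is_infinity64(uint64_t bits)
{
    return (bits & UINT64_C(0x7FFFFFFFFFFFFFFF)) == UINT64_C(0x7FF0000000000000);
}

/* True if adding the two operands meets one of the IE causes listed above. */
static bool add_raises_invalid(uint64_t a, uint64_t b)
{
    if (is_snan64(a) || is_snan64(b))
        return true;
    bool opposite_signs = ((a ^ b) >> 63) != 0;
    return is_infinity64(a) && is_infinity64(b) && opposite_signs;
}
```

Working on encodings rather than on double values keeps the check independent of how the host compiler treats signaling NaNs.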

ADDSS Add Scalar Single-Precision Floating-Point

Adds the single-precision floating-point value in the low-order doubleword of the first source operand to the single-precision floating-point value in the low-order doubleword of the second source operand and writes the result in the low-order doubleword of the destination (first source). The three high-order doublewords of the destination are not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location.

The ADDSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
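
A matching C model for the single-precision scalar form is sketched below. The names xmm_ss and addss_model are hypothetical, d[0] is assumed to model bits 31:0, and rounding control and exception reporting are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an XMM register viewed as four floats.
   d[0] models the low-order doubleword (bits 31:0). */
typedef struct { float d[4]; } xmm_ss;

/* Add the low-order float of the second source to the low-order float of
   dest. The three high-order doublewords of dest are left untouched. */
static void addss_model(xmm_ss *dest, float src_low)
{
    assert(dest != NULL);
    dest->d[0] = dest->d[0] + src_low;
}
```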

Related Instructions

ADDPD, ADDPS, ADDSD

rFLAGS Affected

None

Mnemonic                 Opcode        Description
ADDSS xmm1, xmm2/mem32   F3 0F 58 /r   Adds low-order single-precision floating-point values in an XMM register and another XMM register or 32-bit memory location and writes the result in the destination XMM register.

[Figure addss.eps: the low-order doublewords (bits 31–0) of xmm1 and xmm2/mem32 are added; the result is written to bits 31–0 of xmm1.]


MXCSR Flags Affected

Bit   15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                           M   M   M       M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X +infinity was added to –infinity.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.


Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


ADDSUBPD Add and Subtract Packed Double-Precision

Adds the packed double-precision floating-point value in the high 64 bits of the source operand to the double-precision floating-point value in the high 64 bits of the destination operand and stores the sum in the high 64 bits of the destination operand; subtracts the packed double-precision floating-point value in the low 64 bits of the source operand from the low 64 bits of the destination operand and stores the difference in the low 64 bits of the destination operand.

The ADDSUBPD instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
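The add-high, subtract-low behavior described above can be sketched as a scalar C model. The names xmm_pd and addsubpd_model are hypothetical and exist only for this sketch. Rounding control and MXCSR exception flags are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an XMM register as two double-precision lanes;
   d[0] is bits 63-0 and d[1] is bits 127-64. */
typedef struct { double d[2]; } xmm_pd;

/* Models ADDSUBPD xmm1, xmm2/mem128. */
static void addsubpd_model(xmm_pd *dst, const xmm_pd *src)
{
    assert(dst != NULL && src != NULL);
    dst->d[0] = dst->d[0] - src->d[0];  /* low 64 bits: difference */
    dst->d[1] = dst->d[1] + src->d[1];  /* high 64 bits: sum */
}
```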

Related Instructions

ADDSUBPS

rFLAGS Affected

None

Mnemonic Opcode Description

ADDSUBPD xmm1, xmm2/mem128 66 0F D0 /r Adds the value in the upper 64 bits of the source operand to the value in the upper 64 bits of the destination operand and stores the result in the upper 64 bits of the destination operand; subtracts the value in the lower 64 bits of the source operand from the value in the lower 64 bits of the destination operand and stores the result in the lower 64 bits of the destination operand.

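[Figure: xmm1 and xmm2/mem128; the values in bits 127–64 are added and the values in bits 63–0 are subtracted, with both results written to xmm1.]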


MXCSR Flags Affected

Bit   15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                           M   M   M       M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.


SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X +infinity was added to –infinity.

X X X +infinity was subtracted from +infinity.

X X X –infinity was subtracted from –infinity.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


ADDSUBPS Add and Subtract Packed Single-Precision

Subtracts the first and third single-precision floating-point values in the source operand from the first and third single-precision floating-point values of the destination operand and stores the result in the first and third values of the destination operand. Simultaneously, the instruction adds the second and fourth single-precision floating-point values in the source operand to the second and fourth single-precision floating-point values in the destination operand and stores the result in the second and fourth values of the destination operand.

The ADDSUBPS instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
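The alternating pattern described above can be sketched as a scalar C model. The names xmm_ps and addsubps_model are hypothetical. The sketch assumes that the "first" value is the one in bits 31-0 and the "fourth" is the one in bits 127-96, and it does not model rounding control or MXCSR exception flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an XMM register as four single-precision lanes;
   f[0] is the first value (bits 31-0), f[3] the fourth (bits 127-96). */
typedef struct { float f[4]; } xmm_ps;

/* Models ADDSUBPS xmm1, xmm2/mem128. */
static void addsubps_model(xmm_ps *dst, const xmm_ps *src)
{
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < 4; i++) {
        if (i % 2 == 0)
            dst->f[i] -= src->f[i];  /* first and third values: subtract */
        else
            dst->f[i] += src->f[i];  /* second and fourth values: add */
    }
}
```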

Related Instructions

ADDSUBPD

rFLAGS Affected

None

Mnemonic Opcode Description

ADDSUBPS xmm1, xmm2/mem128 F2 0F D0 /r Subtracts the first and third packed single-precision values in the source XMM register or 128-bit memory operand from the corresponding values in the destination XMM register and stores the resulting values in the corresponding positions in the destination register; simultaneously, adds the second and fourth packed single-precision values in the source XMM register or 128-bit memory operand to the corresponding values in the destination register and stores the result in the corresponding positions in the destination register.

[Figure: xmm1 and xmm2/mem128; the values in bits 31–0 and 95–64 are subtracted and the values in bits 63–32 and 127–96 are added, with all results written to xmm1.]


MXCSR Flags Affected

Bit   15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                           M   M   M       M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.


SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X +infinity was added to –infinity.

X X X +infinity was subtracted from +infinity.

X X X –infinity was subtracted from –infinity.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


ANDNPD Logical Bitwise AND NOT Packed Double-Precision Floating-Point

Performs a bitwise logical AND of the two packed double-precision floating-point values in the second source operand and the one’s-complement of the corresponding two packed double-precision floating-point values in the first source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The ANDNPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
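Because the operation works on bit patterns rather than numeric values, it can be sketched in C on raw 64-bit lanes. The names xmm_bits, double_to_bits, bits_to_double, and andnpd_model are hypothetical, and the sketch assumes that double uses the IEEE 754 64-bit format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an XMM register as two raw 64-bit lanes;
   q[0] is bits 63-0 and q[1] is bits 127-64. */
typedef struct { uint64_t q[2]; } xmm_bits;

/* Reinterprets a double as its 64-bit pattern (assumes IEEE 754 binary64). */
static uint64_t double_to_bits(double x)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Reinterprets a 64-bit pattern as a double. */
static double bits_to_double(uint64_t b)
{
    double x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Models ANDNPD xmm1, xmm2/mem128: each destination lane becomes
   (one's-complement of the destination lane) AND (source lane). */
static void andnpd_model(xmm_bits *dst, const xmm_bits *src)
{
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < 2; i++)
        dst->q[i] = ~dst->q[i] & src->q[i];
}
```

For example, if each destination lane initially holds only the sign bit (8000_0000_0000_0000h), the inverted pattern clears the sign of the corresponding source value, so the destination receives the absolute values of the source lanes.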

Related Instructions

ANDNPS, ANDPD, ANDPS, ORPD, ORPS, XORPD, XORPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

ANDNPD xmm1, xmm2/mem128 66 0F 55 /r Performs bitwise logical AND NOT of two packed double-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure andnpd.eps: each 64-bit half of xmm1 (bits 127–64 and 63–0) is inverted and ANDed with the corresponding half of xmm2/mem128; the results are written to xmm1.]


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


ANDNPS Logical Bitwise AND NOT Packed Single-Precision Floating-Point

Performs a bitwise logical AND of the four packed single-precision floating-point values in the second source operand and the one’s-complement of the corresponding four packed single-precision floating-point values in the first source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The ANDNPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

ANDNPD, ANDPD, ANDPS, ORPD, ORPS, XORPD, XORPS

rFLAGS Affected

None

Mnemonic Opcode Description

ANDNPS xmm1, xmm2/mem128 0F 55 /r Performs bitwise logical AND NOT of four packed single-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure andnps.eps: each 32-bit value of xmm1 (bits 127–96, 95–64, 63–32, and 31–0) is inverted and ANDed with the corresponding value of xmm2/mem128; the results are written to xmm1.]


MXCSR Flags Affected

None

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


ANDPD Logical Bitwise AND Packed Double-Precision Floating-Point

Performs a bitwise logical AND of the two packed double-precision floating-point values in the first source operand and the corresponding two packed double-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The ANDPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
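A minimal C sketch of the operation on raw 64-bit lanes follows. The names xmm_bits and andpd_model are hypothetical. The usage shown, in which a source lane of all 1s keeps a destination lane and a source lane of all 0s clears it, is one possible application and is not prescribed by this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an XMM register as two raw 64-bit lanes;
   q[0] is bits 63-0 and q[1] is bits 127-64. */
typedef struct { uint64_t q[2]; } xmm_bits;

/* Models ANDPD xmm1, xmm2/mem128: each destination lane is ANDed
   bit by bit with the corresponding source lane. */
static void andpd_model(xmm_bits *dst, const xmm_bits *src)
{
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < 2; i++)
        dst->q[i] &= src->q[i];
}
```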

Related Instructions

ANDNPD, ANDNPS, ANDPS, ORPD, ORPS, XORPD, XORPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

ANDPD xmm1, xmm2/mem128 66 0F 54 /r Performs bitwise logical AND of two packed double-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure andpd.eps: each 64-bit half of xmm1 (bits 127–64 and 63–0) is ANDed with the corresponding half of xmm2/mem128; the results are written to xmm1.]


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


ANDPS Logical Bitwise AND Packed Single-Precision Floating-Point

Performs a bitwise logical AND of the four packed single-precision floating-point values in the first source operand and the corresponding four packed single-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The ANDPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

ANDNPD, ANDNPS, ANDPD, ORPD, ORPS, XORPD, XORPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

ANDPS xmm1, xmm2/mem128 0F 54 /r Performs bitwise logical AND of four packed single-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure andps.eps: each 32-bit value of xmm1 (bits 127–96, 95–64, 63–32, and 31–0) is ANDed with the corresponding value of xmm2/mem128; the results are written to xmm1.]


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


CMPPD Compare Packed Double-Precision Floating-Point

Compares each of the two packed double-precision floating-point values in the first source operand with the corresponding packed double-precision floating-point value in the second source operand and writes the result of each comparison in the corresponding 64 bits of the destination (first source). The type of comparison is specified by the three low-order bits of the immediate-byte operand, as shown in Table 1-1. The result of each compare is a 64-bit value of all 1s (TRUE) or all 0s (FALSE). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

Signed compares return TRUE only if both operands are valid numbers, and the numbers have the relation specified by the type of compare. "Ordered" compare returns TRUE if both operands are valid numbers, or FALSE if either operand is a NaN. "Unordered" compare returns TRUE only if one or both operands are NaN, and FALSE otherwise.

QNaN operands generate an Invalid Operation Exception (IE) only if the compare type is not "Equal", "Not equal", "Ordered", or "Unordered". SNaN operands always generate an Invalid Operation Exception (IE).

Some compare operations that are not directly supported by the immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and then executing the appropriate compare instruction using the swapped values. These additional compare operations are shown, together with the directly supported compare operations, in Table 1-1. When swapping operands, the first source XMM register is overwritten by the result.

The CMPPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
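The predicate selection and the operand-swapping technique described above can be sketched as a scalar C model that follows the results listed in Table 1-1. The names xmm_pd, xmm_mask, cmp_lane, and cmppd_model are hypothetical. Unlike the instruction, the model returns the masks separately instead of overwriting the first source, and it does not model the IE or DE flags.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { double d[2]; } xmm_pd;      /* hypothetical: two double lanes */
typedef struct { uint64_t q[2]; } xmm_mask;  /* hypothetical: two 64-bit results */

/* Evaluates one lane for the predicate in bits 2-0 of imm8 and
   returns all 1s (TRUE) or all 0s (FALSE). */
static uint64_t cmp_lane(double a, double b, unsigned imm8)
{
    int unordered = isnan(a) || isnan(b);
    int r;
    switch (imm8 & 7u) {
    case 0:  r = !unordered && a == b;    break;  /* equal */
    case 1:  r = !unordered && a < b;     break;  /* less than */
    case 2:  r = !unordered && a <= b;    break;  /* less than or equal */
    case 3:  r = unordered;               break;  /* unordered */
    case 4:  r = unordered || a != b;     break;  /* not equal */
    case 5:  r = unordered || !(a < b);   break;  /* not less than */
    case 6:  r = unordered || !(a <= b);  break;  /* not less than or equal */
    default: r = !unordered;              break;  /* 7: ordered */
    }
    return r ? UINT64_MAX : 0;
}

/* Models the two comparisons performed by CMPPD. */
static xmm_mask cmppd_model(const xmm_pd *first, const xmm_pd *second,
                            unsigned imm8)
{
    xmm_mask m;
    assert(first != NULL && second != NULL);
    for (int i = 0; i < 2; i++)
        m.q[i] = cmp_lane(first->d[i], second->d[i], imm8);
    return m;
}
```

A "greater than" test of a against b, which has no encoding of its own, can be obtained with the swapped call cmppd_model(&b, &a, 1), that is, "b less than a".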

Mnemonic Opcode Description

CMPPD xmm1, xmm2/mem128, imm8 66 0F C2 /r ib Compares two pairs of packed double-precision floating-point values in an XMM register and an XMM register or 128-bit memory location.


Table 1-1. Immediate Operand Values for Compare Operations

Immediate-Byte Value (bits 2–0) | Compare Operation | Result If NaN Operand | QNaN Operand Causes Invalid Operation Exception
000 | Equal | FALSE | No
001 | Less than | FALSE | Yes
001 | Greater than (uses swapped operands) | FALSE | Yes
010 | Less than or equal | FALSE | Yes
010 | Greater than or equal (uses swapped operands) | FALSE | Yes
011 | Unordered | TRUE | No
100 | Not equal | TRUE | No
101 | Not less than | TRUE | Yes
101 | Not greater than (uses swapped operands) | TRUE | Yes
110 | Not less than or equal | TRUE | Yes
110 | Not greater than or equal (uses swapped operands) | TRUE | Yes
111 | Ordered | FALSE | No

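[Figure cmppd.eps: each 64-bit half of xmm1 (bits 127–64 and 63–0) is compared with the corresponding half of xmm2/mem128 using the compare type in imm8 (bits 7–0); each 64-bit result written to xmm1 is all 1s or all 0s.]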


Related Instructions

CMPPS, CMPSD, CMPSS, COMISD, COMISS, UCOMISD, UCOMISS

rFLAGS Affected

None

MXCSR Flags Affected

Bit   15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                                           M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions


Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X A source operand was a QNaN value, and the comparison does not allow QNaN values (refer to Table 1-1 on page 30).

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.


CMPPS Compare Packed Single-Precision Floating-Point

Compares each of the four packed single-precision floating-point values in the first source operand with the corresponding packed single-precision floating-point value in the second source operand and writes the result of each comparison in the corresponding 32 bits of the destination (first source). The type of comparison is specified by the three low-order bits of the immediate-byte operand, as shown in Table 1-1 on page 30. The result of each compare is a 32-bit value of all 1s (TRUE) or all 0s (FALSE). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

Signed compares return TRUE only if both operands are valid numbers, and the numbers have the relation specified by the type of compare. "Ordered" compare returns TRUE if both operands are valid numbers, or FALSE if either operand is a NaN. "Unordered" compare returns TRUE only if one or both operands are NaN, and FALSE otherwise.

QNaN operands generate an Invalid Operation Exception (IE) only if the compare type is not "Equal", "Not equal", "Ordered", or "Unordered". SNaN operands always generate an Invalid Operation Exception (IE).

Some compare operations that are not directly supported by the immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and then executing the appropriate compare instruction using the swapped values. These additional compare operations are shown in Table 1-1 on page 30. When swapping operands, the first source XMM register is overwritten by the result.

The CMPPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

CMPPS xmm1, xmm2/mem128, imm8 0F C2 /r ib Compares four pairs of packed single-precision floating-point values in an XMM register and an XMM register or 128-bit memory location.


Related Instructions

CMPPD, CMPSD, CMPSS, COMISD, COMISS, UCOMISD, UCOMISS

rFLAGS Affected

None

MXCSR Flags Affected

Bit   15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Mod                                                           M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

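[Figure cmpps.eps: each 32-bit value of xmm1 (bits 127–96, 95–64, 63–32, and 31–0) is compared with the corresponding value of xmm2/mem128 using the compare type in imm8 (bits 7–0); each 32-bit result written to xmm1 is all 1s or all 0s.]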


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X A source operand was a QNaN value, and the comparison does not allow QNaN values (refer to Table 1-1 on page 30).

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.


CMPSD Compare Scalar Double-PrecisionFloating-Point

Compares the double-precision floating-point value in the low-order 64 bits of the first source operand with the double-precision floating-point value in the low-order 64 bits of the second source operand and writes the result in the low-order 64 bits of the destination (first source). The type of comparison is specified by the three low-order bits of the immediate-byte operand, as shown in Table 1-1 on page 30. The result of the compare is a 64-bit value of all 1s (TRUE) or all 0s (FALSE). The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location. The high-order 64 bits of the destination XMM register are not modified.

Signed compares return TRUE only if both operands are valid numbers, and the numbers have the relation specified by the type of compare. "Ordered" compare returns TRUE if both operands are valid numbers, or FALSE if either operand is a NaN. "Unordered" compare returns TRUE only if one or both operands are NaN, and FALSE otherwise.

QNaN operands generate an Invalid Operation Exception (IE) only if the compare type isn't "Equal", "Unequal", "Ordered", or "Unordered". SNaN operands always generate an Invalid Operation Exception (IE).
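The comparison semantics described above can be sketched in portable C. This is only an illustrative software model, not the hardware definition: the function name cmpsd_model and the enum names are hypothetical, and the predicate numbering (0 = equal through 7 = ordered) is assumed to match Table 1-1, which is not reproduced in this section. Exception signaling is not modeled.

```c
#include <math.h>
#include <stdint.h>

/* Predicate encodings in imm8[2:0]; assumed to match Table 1-1. */
enum cmp_pred {
    CMP_EQ = 0, CMP_LT = 1, CMP_LE = 2, CMP_UNORD = 3,
    CMP_NEQ = 4, CMP_NLT = 5, CMP_NLE = 6, CMP_ORD = 7
};

/* Hypothetical model of the low-quadword result written by CMPSD. */
uint64_t cmpsd_model(double a, double b, unsigned imm8)
{
    int unordered = isnan(a) || isnan(b);
    int result;

    switch (imm8 & 7u) {             /* only the three low-order bits are used */
    case CMP_EQ:    result = !unordered && a == b;    break;
    case CMP_LT:    result = !unordered && a <  b;    break;
    case CMP_LE:    result = !unordered && a <= b;    break;
    case CMP_UNORD: result = unordered;               break;
    case CMP_NEQ:   result = unordered || a != b;     break;
    case CMP_NLT:   result = unordered || !(a < b);   break;
    case CMP_NLE:   result = unordered || !(a <= b);  break;
    default:        result = !unordered;              break; /* CMP_ORD */
    }
    return result ? UINT64_MAX : 0;  /* all 1s (TRUE) or all 0s (FALSE) */
}
```

The all-1s/all-0s result is convenient as a bit mask: ANDing it with a value selects that value when the comparison is TRUE and zero otherwise.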

Some compare operations that are not directly supported by the immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and then executing the appropriate compare instruction using the swapped values. These additional compare operations are shown in Table 1-1 on page 30. When swapping operands, the first source XMM register is overwritten by the result.

This CMPSD instruction should not be confused with the same-mnemonic CMPSD (compare strings by doubleword) instruction in the general-purpose instruction set. Assemblers can distinguish the instructions by the number and type of operands.

The CMPSD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
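As a minimal sketch of the feature test, the helpers below check the EDX bits that the exception tables in this section cite for CPUID standard function 1 (bit 25 for SSE, bit 26 for SSE2). The function and macro names are hypothetical, and the caller is assumed to have already obtained the EDX value by executing CPUID; how that is done depends on the compiler and is not shown.

```c
#include <stdint.h>

/* EDX bit positions for CPUID standard function 1, as cited in the
   exception tables of this section. */
#define CPUID_FN1_EDX_SSE   (1u << 25)
#define CPUID_FN1_EDX_SSE2  (1u << 26)

/* Hypothetical helpers: edx is a value previously returned by CPUID function 1. */
int cpu_has_sse(uint32_t edx)  { return (edx & CPUID_FN1_EDX_SSE)  != 0; }
int cpu_has_sse2(uint32_t edx) { return (edx & CPUID_FN1_EDX_SSE2) != 0; }
```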

Mnemonic Opcode Description

CMPSD xmm1, xmm2/mem64, imm8 F2 0F C2 /r ib Compares double-precision floating-point values in an XMM register and an XMM register or 64-bit memory location.

Related Instructions

CMPPD, CMPPS, CMPSS, COMISD, COMISS, UCOMISD, UCOMISS

rFLAGS Affected

None

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                                          M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Figure (cmpsd.eps): bits 63–0 of xmm1 and bits 63–0 of xmm2/mem64 are compared as selected by imm8 (bits 7–0); the result, all 1s or all 0s, is written to bits 63–0 of xmm1.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X A source operand was a QNaN value, and the comparison does not allow QNaN values (refer to Table 1-1 on page 30).

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.
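The interaction between the MXCSR mask bits and CR4.OSXMMEXCPT in the table above can be sketched as follows. This is a simplified model under stated assumptions: the names simd_fp_fault and detected are hypothetical, detected holds the conditions one instruction found (in the same bit order as the MXCSR flags IE through PE, bits 0 to 5), and the mask bits IM through PM are taken from MXCSR bits 7 to 12 as in the "MXCSR Flags Affected" diagrams. Other fault sources (#NM, #GP, and so on) are not modeled.

```c
/* MXCSR layout from the "MXCSR Flags Affected" diagrams:
   exception flags IE..PE in bits 0-5, their masks IM..PM in bits 7-12. */
#define MXCSR_FLAG_BITS   0x3Fu
#define MXCSR_MASK_SHIFT  7

enum simd_fault { FAULT_NONE, FAULT_UD, FAULT_XF };

/* Hypothetical model: which fault, if any, results from the detected conditions. */
enum simd_fault simd_fp_fault(unsigned detected, unsigned mxcsr,
                              int cr4_osxmmexcpt)
{
    unsigned masks    = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_FLAG_BITS;
    unsigned unmasked = detected & MXCSR_FLAG_BITS & ~masks;

    if (unmasked == 0)
        return FAULT_NONE;                        /* every detected condition is masked */
    return cr4_osxmmexcpt ? FAULT_XF : FAULT_UD;  /* OSXMMEXCPT = 1 or 0 */
}
```

For example, with all six masks set (MXCSR = 1F80h) no fault is reported, while clearing IM and detecting an invalid operation yields #XF or #UD depending on CR4.OSXMMEXCPT.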

CMPSS Compare Scalar Single-Precision Floating-Point

Compares the single-precision floating-point value in the low-order 32 bits of the first source operand with the single-precision floating-point value in the low-order 32 bits of the second source operand and writes the result in the low-order 32 bits of the destination (first source). The type of comparison is specified by the three low-order bits of the immediate-byte operand, as shown in Table 1-1 on page 30. The result of the compare is a 32-bit value of all 1s (TRUE) or all 0s (FALSE). The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location. The three high-order doublewords of the destination XMM register are not modified.

Signed compares return TRUE only if both operands are valid numbers, and the numbers have the relation specified by the type of compare. "Ordered" compare returns TRUE if both operands are valid numbers, or FALSE if either operand is a NaN. "Unordered" compare returns TRUE only if one or both operands are NaN, and FALSE otherwise.

QNaN operands generate an Invalid Operation Exception (IE) only if the compare type isn't "Equal", "Unequal", "Ordered", or "Unordered". SNaN operands always generate an Invalid Operation Exception (IE).
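The NaN rule in the preceding paragraph can be written as a small decision function. This is a sketch only: compare_signals_ie and the enum names are hypothetical, and the numbering of the compare types is assumed to follow Table 1-1.

```c
/* Predicate encodings in imm8[2:0]; assumed to match Table 1-1. */
enum { CMP_EQ, CMP_LT, CMP_LE, CMP_UNORD, CMP_NEQ, CMP_NLT, CMP_NLE, CMP_ORD };

/* Hypothetical helper: nonzero if the compare raises the invalid-operation
   condition (IE) for the given kinds of NaN operand. */
int compare_signals_ie(unsigned imm8, int has_snan, int has_qnan)
{
    if (has_snan)
        return 1;                  /* SNaN operands always signal IE */
    if (!has_qnan)
        return 0;                  /* no NaN operand at all */
    switch (imm8 & 7u) {
    case CMP_EQ: case CMP_NEQ: case CMP_ORD: case CMP_UNORD:
        return 0;                  /* these compare types accept QNaN quietly */
    default:
        return 1;                  /* remaining compare types signal on QNaN */
    }
}
```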

Some compare operations that are not directly supported by the immediate-byte encodings can be implemented by swapping the contents of the source and destination operands and then executing the appropriate compare instruction using the swapped values. These additional compare operations are shown in Table 1-1 on page 30. When swapping operands, the first source XMM register is overwritten by the result.
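The operand-swapping technique can be illustrated with "greater than", which has no encoding of its own. The sketch below builds it from a model of "less than"; both function names are hypothetical, and "less than" is assumed to be one of the directly encoded compare types in Table 1-1.

```c
#include <math.h>
#include <stdint.h>

/* Model of a directly encoded "less than" compare:
   FALSE whenever either operand is a NaN. */
static uint32_t cmpss_lt_model(float a, float b)
{
    return (!isnan(a) && !isnan(b) && a < b) ? UINT32_MAX : 0;
}

/* Hypothetical "greater than": a > b is evaluated as b < a, so the two
   operands trade places before the compare is executed. */
uint32_t cmpss_gt_model(float a, float b)
{
    return cmpss_lt_model(b, a);
}
```

In register terms, the swap means the result is written to the register that held the original second operand, so code that still needs that value has to copy it first.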

The CMPSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

CMPSS xmm1, xmm2/mem32, imm8 F3 0F C2 /r ib Compares single-precision floating-point values in an XMM register and an XMM register or 32-bit memory location.

Related Instructions

CMPPD, CMPPS, CMPSD, COMISD, COMISS, UCOMISD, UCOMISS

rFLAGS Affected

None

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                                          M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Figure (cmpss.eps): bits 31–0 of xmm1 and bits 31–0 of xmm2/mem32 are compared as selected by imm8 (bits 7–0); the result is written to bits 31–0 of xmm1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X A source operand was a QNaN value, and the comparison does not allow QNaN values (refer to Table 1-1 on page 30).

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

COMISD Compare Ordered Scalar Double-Precision Floating-Point

Compares the double-precision floating-point value in the low-order 64 bits of an XMM register with the double-precision floating-point value in the low-order 64 bits of another XMM register or a 64-bit memory location and sets the ZF, PF, and CF bits in the rFLAGS register to reflect the result of the comparison. The OF, AF, and SF bits in rFLAGS are set to zero. The result is unordered if one or both of the operand values is a NaN.

If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

The COMISD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

COMISD xmm1, xmm2/mem64 66 0F 2F /r Compares double-precision floating-point values in an XMM register and an XMM register or 64-bit memory location and sets rFLAGS.

Result of Compare ZF PF CF

Unordered 1 1 1

Greater Than 0 0 0

Less Than 0 0 1

Equal 1 0 0
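The result table above translates directly into code. The following is a minimal sketch with hypothetical names (cmp_flags, comisd_model); it models only the ZF, PF, and CF outcome for the first operand a compared with the second operand b, not the exception behavior.

```c
#include <math.h>

struct cmp_flags { int zf, pf, cf; };

/* Hypothetical model of the ZF/PF/CF result of COMISD. */
struct cmp_flags comisd_model(double a, double b)
{
    struct cmp_flags f = {0, 0, 0};    /* "Greater Than" row: all clear */

    if (isnan(a) || isnan(b)) {
        f.zf = f.pf = f.cf = 1;        /* Unordered */
    } else if (a < b) {
        f.cf = 1;                      /* Less Than */
    } else if (a == b) {
        f.zf = 1;                      /* Equal */
    }
    return f;
}
```

Because the unordered case also sets ZF and CF, code that must tell "equal" or "less than" apart from "unordered" needs to look at PF as well.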

Figure (comisd.eps): bits 63–0 of xmm1 and bits 63–0 of xmm2/mem64 are compared, and the result is reflected in rFLAGS.

Related Instructions

CMPPD, CMPPS, CMPSD, CMPSS, COMISS, UCOMISD, UCOMISS

rFLAGS Affected

Bit    21  20   19   18  17  16  14  13–12  11  10  9   8   7   6   4   2   0
Flag   ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
State                                       0               0   M   0   M   M

Note: Bits 31–22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                                          M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

COMISS Compare Ordered Scalar Single-Precision Floating-Point

Performs an ordered comparison of the single-precision floating-point value in the low-order 32 bits of an XMM register with the single-precision floating-point value in the low-order 32 bits of another XMM register or a 32-bit memory location and sets the ZF, PF, and CF bits in the rFLAGS register to reflect the result of the comparison. The OF, AF, and SF bits in rFLAGS are set to zero. The result is unordered if one or both of the operand values is a NaN.

If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.
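The two paragraphs above can be combined into one illustrative model. It is a sketch under assumptions, not a statement of the hardware implementation: the names rflags_bits and comiss_model are hypothetical, and the fault parameter simply stands for "an unmasked SIMD floating-point exception was raised", without modeling how that is decided.

```c
#include <math.h>

struct rflags_bits { int of, sf, zf, af, pf, cf; };

/* Hypothetical model of the rFLAGS update performed by COMISS. */
struct rflags_bits comiss_model(struct rflags_bits old, float a, float b,
                                int fault)
{
    struct rflags_bits f = {0, 0, 0, 0, 0, 0};  /* OF, SF, AF are cleared */

    if (fault)
        return old;                    /* rFLAGS bits are not updated */
    if (isnan(a) || isnan(b))
        f.zf = f.pf = f.cf = 1;        /* unordered */
    else if (a < b)
        f.cf = 1;                      /* less than */
    else if (a == b)
        f.zf = 1;                      /* equal */
    return f;                          /* greater than leaves ZF, PF, CF clear */
}
```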

The COMISS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

COMISS xmm1, xmm2/mem32 0F 2F /r Compares single-precision floating-point values in an XMM register and an XMM register or 32-bit memory location. Sets rFLAGS.

Figure (comiss.eps): bits 31–0 of xmm1 and bits 31–0 of xmm2/mem32 are compared, and the result is reflected in rFLAGS.

Related Instructions

CMPPD, CMPPS, CMPSD, CMPSS, COMISD, UCOMISD, UCOMISS

Result of Compare ZF PF CF

Unordered 1 1 1

Greater Than 0 0 0

Less Than 0 0 1

Equal 1 0 0

rFLAGS Affected

Bit    21  20   19   18  17  16  14  13–12  11  10  9   8   7   6   4   2   0
Flag   ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
State                                       0               0   M   0   M   M

Note: Bits 31–22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                                          M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

CVTDQ2PD Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point

Converts two packed 32-bit signed integer values in the low-order 64 bits of an XMM register or a 64-bit memory location to two packed double-precision floating-point values and writes the converted values in another XMM register.
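A portable sketch of this conversion is shown below. The type and function names are hypothetical, and the lane order (low doubleword to low quadword) is an assumption based on the usual packed-operand layout. The second helper illustrates why the conversion never needs rounding: a double has a 53-bit significand, which holds any 32-bit integer exactly.

```c
#include <stdint.h>

struct packed_double2 { double lo, hi; };

/* Hypothetical model: lo and hi are the doublewords in bits 31-0 and
   63-32 of the source operand. */
struct packed_double2 cvtdq2pd_model(int32_t lo, int32_t hi)
{
    struct packed_double2 r;
    r.lo = (double)lo;   /* written to bits 63-0 of the destination   */
    r.hi = (double)hi;   /* written to bits 127-64 of the destination */
    return r;
}

/* Nonzero for every int32_t, because the conversion to double is exact. */
int int32_roundtrips_via_double(int32_t x)
{
    return (int32_t)(double)x == x;
}
```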

The CVTDQ2PD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

CVTDQ2PD xmm1, xmm2/mem64 F3 0F E6 /r Converts packed doubleword signed integers in an XMM register or 64-bit memory location to double-precision floating-point values in the destination XMM register.

Figure (cvtdq2pd.eps): the two doublewords in bits 63–0 of xmm2/mem64 are each converted, producing the two quadwords (bits 63–0 and 127–64) of xmm1.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

CVTDQ2PS Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point

Converts four packed 32-bit signed integer values in an XMM register or a 128-bit memory location to four packed single-precision floating-point values and writes the converted values in another XMM register. If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.
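Whether a given integer converts exactly can be checked without any floating-point arithmetic. The helper below is a sketch with a hypothetical name; it assumes the IEEE 754 single-precision format with its 24-bit significand, so an integer is exact when its magnitude, after trailing zero bits are stripped, is below 2^24.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if x converts to single precision
   without rounding (no inexact result). */
int int32_exact_in_float(int32_t x)
{
    uint32_t mag = (x < 0) ? 0u - (uint32_t)x : (uint32_t)x;

    while (mag != 0 && (mag & 1u) == 0)
        mag >>= 1;                 /* trailing zeros only change the exponent */
    return mag < (1u << 24);       /* remaining bits must fit the significand */
}
```

For instance, 16777216 (2^24) is exact, while 16777217 is not and would be rounded according to RC.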

The CVTDQ2PS instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

rFLAGS Affected

None

Mnemonic Opcode Description

CVTDQ2PS xmm1, xmm2/mem128 0F 5B /r Converts packed doubleword integer values in an XMM register or 128-bit memory location to packed single-precision floating-point values in the destination XMM register.

Figure (cvtdq2ps.eps): each of the four doublewords of xmm2/mem128 (bits 31–0, 63–32, 95–64, and 127–96) is converted and written to the corresponding doubleword of xmm1.

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Precision exception (PE) X X X A result could not be represented exactly in the destination format.

CVTPD2DQ Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed 32-bit signed integers and writes the converted values in the low-order 64 bits of another XMM register. The high-order 64 bits in the destination XMM register are cleared to all 0s.

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the signed doubleword range ($-2^{31}$ to $+2^{31} - 1$), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.
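The rounding and out-of-range behavior of one element can be sketched as follows. This is a simplified model with hypothetical names; it assumes round-to-nearest-even (the rounding mode selected when RC = 00b) and a masked invalid-operation exception, and it does not model the other rounding modes or the MXCSR flag updates.

```c
#include <math.h>
#include <stdint.h>

#define INT32_INDEFINITE INT32_MIN   /* 8000_0000h */

/* Hypothetical model of one converted element, assuming
   round-to-nearest-even and IE masked. */
int32_t cvt_double_to_int32_model(double d)
{
    long long t;
    double frac;

    /* NaN, infinities, and values far outside the doubleword range. */
    if (isnan(d) || d <= -2147483649.0 || d >= 2147483648.0)
        return INT32_INDEFINITE;

    t = (long long)d;              /* truncate toward zero */
    frac = d - (double)t;          /* exact in this range, |frac| < 1 */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t += 1;                    /* round up, ties go to the even integer */
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t -= 1;

    if (t < INT32_MIN || t > INT32_MAX)
        return INT32_INDEFINITE;   /* rounded result does not fit */
    return (int32_t)t;
}
```

Note that the indefinite value has the same bit pattern as the valid result $-2^{31}$, so software that must tell the two apart has to examine the IE flag.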

The CVTPD2DQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTDQ2PD, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

rFLAGS Affected

None

Mnemonic Opcode Description

CVTPD2DQ xmm1, xmm2/mem128 F2 0F E6 /r Converts packed double-precision floating-point values in an XMM register or 128-bit memory location to packed doubleword integers in the destination XMM register.

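Figure (cvtpd2dq.eps): the two quadwords of xmm2/mem128 (bits 63–0 and 127–64) are each converted and written to bits 31–0 and 63–32 of xmm1; bits 127–64 of xmm1 are cleared to 0.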

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M                   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value, a QNaN value, or ±infinity.

X X X A source operand was too large to fit in the destination format.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.

CVTPD2PI Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed 32-bit signed integer values and writes the converted values in an MMX register.
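The destination layout can be pictured as two doublewords packed into one 64-bit value. The helper below is a sketch with a hypothetical name; it takes two already-converted integers and assumes the result of the low-order source element occupies bits 31–0 and the result of the high-order element occupies bits 63–32.

```c
#include <stdint.h>

/* Hypothetical helper: packs two converted doublewords into the
   64-bit image of an MMX register. */
uint64_t pack_mmx_doublewords(int32_t lo, int32_t hi)
{
    return ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
}
```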

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the signed doubleword range ($-2^{31}$ to $+2^{31} - 1$), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

The CVTPD2PI instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTDQ2PD, CVTPD2DQ, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

rFLAGS Affected

None

Mnemonic Opcode Description

CVTPD2PI mmx, xmm2/mem128 66 0F 2D /r Converts packed double-precision floating-point values in an XMM register or 128-bit memory location to packed doubleword integer values in the destination MMX register.

Figure (cvtpd2pi.eps): the two quadwords of xmm/mem128 (bits 63–0 and 127–64) are each converted and written to bits 31–0 and 63–32 of the MMX register.

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M                   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

x87 floating-point exception pending, #MF

X X X An exception is pending due to an x87 floating-point instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value, a QNaN value, or ±infinity.

X X X A source operand was too large to fit in the destination format.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.

CVTPD2PS Convert Packed Double-Precision Floating-Point to Packed Single-Precision Floating-Point

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed single-precision floating-point values and writes the converted values in the low-order 64 bits of another XMM register. The high-order 64 bits in the destination XMM register are cleared to all 0s.

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.
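A simple way to see when narrowing is inexact is to convert to single precision and back. The helper below is a sketch with a hypothetical name; it assumes IEEE 754 float and double types and a finite, in-range, non-NaN input, and it does not model overflow, underflow, or denormal handling.

```c
/* Hypothetical helper: nonzero if narrowing d to single precision
   changes its value, i.e. the result had to be rounded. */
int narrowing_is_inexact(double d)
{
    float f = (float)d;        /* rounded to the nearest representable float
                                  under the default rounding mode */
    return (double)f != d;     /* widening back to double is always exact */
}
```

For example, 0.5 narrows exactly, whereas 0.1 and 16777217.0 do not.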

The CVTPD2PS instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTPS2PD, CVTSD2SS, CVTSS2SD

rFLAGS Affected

None

Mnemonic Opcode Description

CVTPD2PS xmm1, xmm2/mem128 66 0F 5A /r Converts packed double-precision floating-point values in an XMM register or 128-bit memory location to packed single-precision floating-point values in the destination XMM register.

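Figure (cvtpd2ps.eps): the two quadwords of xmm2/mem128 (bits 63–0 and 127–64) are each converted and written to bits 31–0 and 63–32 of xmm1; bits 127–64 of xmm1 are cleared to 0.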

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M   M   M       M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.


Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


CVTPI2PD Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point

Converts two packed 32-bit signed integer values in an MMX register or a 64-bit memory location to two double-precision floating-point values and writes the converted values in an XMM register.

The CVTPI2PD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

CVTPI2PD xmm, mmx/mem64 66 0F 2A /r Converts two packed doubleword integer values in an MMX register or 64-bit memory location to two packed double-precision floating-point values in the destination XMM register.

[Figure: cvtpi2pd.eps. The two doubleword integers in bits 63-32 and 31-0 of mmx/mem64 are converted to double-precision values in bits 127-64 and 63-0 of xmm.]


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

x87 floating-point exception pending, #MF

X X X An exception was pending due to an x87 floating-point instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


CVTPI2PS Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point

Converts two packed 32-bit signed integer values in an MMX register or a 64-bit memory location to two single-precision floating-point values and writes the converted values in the low-order 64 bits of an XMM register. The high-order 64 bits of the XMM register are not modified.

The CVTPI2PS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTDQ2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

rFLAGS Affected

None

Mnemonic Opcode Description

CVTPI2PS xmm, mmx/mem64 0F 2A /r Converts packed doubleword integer values in an MMX register or 64-bit memory location to single-precision floating-point values in the destination XMM register.

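[Figure: cvtpi2ps.eps. The two doubleword integers in bits 63-32 and 31-0 of mmx/mem64 are converted to single-precision values in bits 63-32 and 31-0 of xmm; bits 127-64 of xmm are unchanged.]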


MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Modified                                          M
Bit       15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

x87 floating-point exception pending, #MF

X X X An exception was pending due to an x87 floating-point instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


CVTPS2DQ Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory location to four packed 32-bit signed integer values and writes the converted values in another XMM register.

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or an infinity, or if the result of the conversion falls outside the range of a signed doubleword (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

The CVTPS2DQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
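The rounding and indefinite-value rules above can be sketched in C with the _mm_cvtps_epi32 intrinsic, which compilers normally lower to CVTPS2DQ. The helper name floats_to_int32 is hypothetical, and the sketch assumes the default MXCSR state (round to nearest, all exceptions masked).

```c
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_cvtps_epi32 */
#include <stdint.h>

/* Hypothetical helper: convert four floats to four signed doublewords
   using the current MXCSR rounding mode. */
void floats_to_int32(const float in[4], int32_t out[4])
{
    volatile float v[4];   /* volatile copy keeps the conversion at run time */
    float tmp[4];
    int i;

    for (i = 0; i < 4; i++) v[i] = in[i];
    for (i = 0; i < 4; i++) tmp[i] = v[i];

    __m128i r = _mm_cvtps_epi32(_mm_loadu_ps(tmp));
    _mm_storeu_si128((__m128i *)out, r);
}
```

Under those default settings, inputs 1.5 and 2.5 both convert to 2 (ties round to the even integer), and an out-of-range input such as 3.0e9 produces the indefinite value 8000_0000h.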

Related Instructions

CVTDQ2PS, CVTPI2PS, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

Mnemonic Opcode Description

CVTPS2DQ xmm1, xmm2/mem128 66 0F 5B /r Converts four packed single-precision floating-point values in an XMM register or 128-bit memory location to four packed doubleword integers in the destination XMM register.

[Figure: cvtps2dq.eps. Each of the four single-precision values in xmm2/mem128 (bits 127-96, 95-64, 63-32, and 31-0) is converted to a doubleword integer in the corresponding position of xmm1.]


rFLAGS Affected

None

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Modified                                          M                   M
Bit       15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions


Invalid-operation exception (IE)

X X X A source operand was an SNaN value, a QNaN value, or ±infinity.

X X X A source operand was too large to fit in the destination format.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


CVTPS2PD Convert Packed Single-Precision Floating-Point to Packed Double-Precision Floating-Point

Converts two packed single-precision floating-point values in the low-order 64 bits of an XMM register or a 64-bit memory location to two packed double-precision floating-point values and writes the converted values in another XMM register.

The CVTPS2PD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
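A minimal C sketch of this widening conversion, using the _mm_cvtps_pd intrinsic that compilers normally lower to CVTPS2PD. The helper name widen_low_two is hypothetical. Only the two low-order single-precision elements take part; the two high-order elements of the source are ignored.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_cvtps_pd */

/* Hypothetical helper: widen the two low floats of `in` to doubles. */
void widen_low_two(const float in[4], double out[2])
{
    __m128  src = _mm_loadu_ps(in);
    __m128d dst = _mm_cvtps_pd(src);   /* reads elements 0 and 1 only */
    _mm_storeu_pd(out, dst);
}
```

Every single-precision value is exactly representable in double precision, so the widened results compare equal to the original floats.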

Related Instructions

CVTPD2PS, CVTSD2SS, CVTSS2SD

rFLAGS Affected

None

Mnemonic Opcode Description

CVTPS2PD xmm1, xmm2/mem64 0F 5A /r Converts packed single-precision floating-point values in an XMM register or 64-bit memory location to packed double-precision floating-point values in the destination XMM register.

[Figure: cvtps2pd.eps. The two single-precision values in bits 63-32 and 31-0 of xmm2/mem64 are converted to double-precision values in bits 127-64 and 63-0 of xmm1.]


MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Modified                                                          M   M
Bit       15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.


CVTPS2PI Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers

Converts two packed single-precision floating-point values in the low-order 64 bits of an XMM register or a 64-bit memory location to two packed 32-bit signed integers and writes the converted values in an MMX register.

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or an infinity, or if the result of the conversion falls outside the range of a signed doubleword (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

The CVTPS2PI instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

rFLAGS Affected

None

Mnemonic Opcode Description

CVTPS2PI mmx, xmm/mem64 0F 2D /r Converts packed single-precision floating-point values in an XMM register or 64-bit memory location to packed doubleword integers in the destination MMX register.

[Figure: cvtps2pi.eps. The two single-precision values in bits 63-32 and 31-0 of xmm/mem64 are converted to doubleword integers in bits 63-32 and 31-0 of mmx.]


MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Modified                                          M                   M
Bit       15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

x87 floating-point exception pending, #MF

X X X An exception was pending due to an x87 floating-point instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions


Invalid-operation exception (IE)

X X X A source operand was an SNaN value, a QNaN value, or ±infinity.

X X X A source operand was too large to fit in the destination format.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


CVTSD2SI Convert Scalar Double-Precision Floating-Point to Signed Doubleword or Quadword Integer

Converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 32-bit or 64-bit signed integer and writes the converted value in a general-purpose register.

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or an infinity, or if the result of the conversion falls outside the range of a signed doubleword (-2^31 to +2^31 - 1) or quadword (-2^63 to +2^63 - 1), the instruction returns the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.

The CVTSD2SI instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
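The difference between the 32-bit and 64-bit forms can be sketched in C with the _mm_cvtsd_si32 and _mm_cvtsd_si64 intrinsics, which compilers normally lower to CVTSD2SI. The helper names double_to_i32 and double_to_i64 are hypothetical. The sketch assumes the default MXCSR state and a 64-bit build, since the quadword form is only available in 64-bit mode.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtsd_si32, _mm_cvtsd_si64, _mm_set_sd */
#include <stdint.h>

/* Hypothetical helper: double to signed doubleword, MXCSR rounding. */
int32_t double_to_i32(double x)
{
    volatile double v = x;              /* force run-time conversion */
    return _mm_cvtsd_si32(_mm_set_sd(v));
}

/* Hypothetical helper: double to signed quadword (64-bit mode only). */
int64_t double_to_i64(double x)
{
    volatile double v = x;
    return (int64_t)_mm_cvtsd_si64(_mm_set_sd(v));
}
```

A value such as 1.0e10 does not fit in a doubleword, so the 32-bit form returns the indefinite value 8000_0000h, while the 64-bit form converts it exactly.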

Mnemonic Opcode Description

CVTSD2SI reg32, xmm/mem64 F2 0F 2D /r Converts a scalar double-precision floating-point value in an XMM register or 64-bit memory location to a doubleword integer in a general-purpose register.

CVTSD2SI reg64, xmm/mem64 F2 0F 2D /r Converts a scalar double-precision floating-point value in an XMM register or 64-bit memory location to a quadword integer in a general-purpose register.


Related Instructions

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

rFLAGS Affected

None

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Modified                                          M                   M
Bit       15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.
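The bit positions in the table above can be turned into a small decoder. The following C sketch is illustrative only: the struct mxcsr_fields and the function decode_mxcsr are hypothetical names, and the grouping of the six mask bits and six flag bits into single fields is a convenience chosen for this example.

```c
/* Hypothetical view of the MXCSR fields named in the table above. */
struct mxcsr_fields {
    unsigned fz;     /* bit 15: flush-to-zero */
    unsigned rc;     /* bits 14-13: rounding control */
    unsigned masks;  /* bits 12-7: PM UM OM ZM DM IM */
    unsigned daz;    /* bit 6: denormals-are-zeros */
    unsigned flags;  /* bits 5-0: PE UE OE ZE DE IE */
};

/* Split a raw MXCSR value into its fields. */
struct mxcsr_fields decode_mxcsr(unsigned int csr)
{
    struct mxcsr_fields f;
    f.fz    = (csr >> 15) & 1u;
    f.rc    = (csr >> 13) & 3u;
    f.masks = (csr >> 7) & 0x3Fu;
    f.daz   = (csr >> 6) & 1u;
    f.flags = csr & 0x3Fu;
    return f;
}
```

With GCC, Clang, or MSVC, the raw value can be obtained from _mm_getcsr() in <xmmintrin.h>. For the value 1F80h (all six exceptions masked, no flags set, round to nearest), the decoder yields masks = 3Fh and zero in every other field.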

[Figure: cvtsd2si.eps. The double-precision value in bits 63-0 of xmm2/mem64 is converted to an integer in bits 31-0 of reg32 or, with a REX prefix, in bits 63-0 of reg64.]


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value, a QNaN value, or ±infinity.

X X X A source operand was too large to fit in the destination format.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


CVTSD2SS Convert Scalar Double-Precision Floating-Point to Scalar Single-Precision Floating-Point

Converts a scalar double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a single-precision floating-point value and writes the converted value in the low-order 32 bits of another XMM register. The three high-order doublewords in the destination XMM register are not modified. If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.

The CVTSD2SS instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
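The merge behavior of the destination register can be sketched in C with the _mm_cvtsd_ss intrinsic, which compilers normally lower to CVTSD2SS. The helper name narrow_scalar_into is hypothetical. Its first argument stands in for the destination XMM register, whose three high-order doublewords are expected to pass through unchanged.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, including _mm_cvtsd_ss */

/* Hypothetical helper: replace element 0 of `dst` with `x` narrowed to
   single precision; elements 1 to 3 of `dst` are left as they were. */
void narrow_scalar_into(float dst[4], double x)
{
    volatile double v = x;              /* force run-time conversion */
    __m128  a = _mm_loadu_ps(dst);      /* plays the role of xmm1 */
    __m128d b = _mm_set_sd(v);          /* plays the role of xmm2/mem64 */

    a = _mm_cvtsd_ss(a, b);
    _mm_storeu_ps(dst, a);
}
```

Starting from {1, 2, 3, 4} and converting 0.5, only the first element changes, giving {0.5, 2, 3, 4}.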

Related Instructions

CVTPD2PS, CVTPS2PD, CVTSS2SD

rFLAGS Affected

None

Mnemonic Opcode Description

CVTSD2SS xmm1, xmm2/mem64 F2 0F 5A /r Converts a scalar double-precision floating-point value in an XMM register or 64-bit memory location to a scalar single-precision floating-point value in the destination XMM register.

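[Figure: cvtsd2ss.eps. The double-precision value in bits 63-0 of xmm2/mem64 is converted to a single-precision value in bits 31-0 of xmm1; bits 127-32 of xmm1 are unchanged.]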


MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Modified                                          M   M   M       M   M
Bit       15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.


Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


CVTSI2SD Convert Signed Doubleword or Quadword Integer to Scalar Double-Precision Floating-Point

Converts a 32-bit or 64-bit signed integer value in a general-purpose register or memory location to a double-precision floating-point value and writes the converted value in the low-order 64 bits of an XMM register. The high-order 64 bits in the destination XMM register are not modified.

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.

The CVTSI2SD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
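An inexact result can only arise for the quadword form, because every 32-bit integer is exactly representable in double precision. The following C sketch, built on the _mm_cvtsi64_sd intrinsic (normally lowered to CVTSI2SD with a REX prefix), reports whether the precision flag (PE) was raised. The helper name i64_to_double is hypothetical. The sketch assumes a 64-bit build and default rounding, and it clears the caller's sticky MXCSR exception flags as a side effect.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtsi64_sd, _mm_cvtsd_f64 */
#include <xmmintrin.h>  /* _MM_SET_EXCEPTION_STATE, _MM_EXCEPT_INEXACT */
#include <stdint.h>

/* Hypothetical helper: convert a signed quadword to double and report
   through *inexact whether MXCSR.PE was set by the conversion. */
double i64_to_double(int64_t x, int *inexact)
{
    volatile int64_t v = x;      /* read after the flags are cleared */
    volatile double result;      /* written before the flags are read */

    _MM_SET_EXCEPTION_STATE(0);  /* clears all six sticky flags */
    result = _mm_cvtsd_f64(_mm_cvtsi64_sd(_mm_setzero_pd(), v));
    *inexact = (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_INEXACT) != 0;
    return result;
}
```

For example, 2^53 + 1 needs 54 significant bits, so it is rounded to 2^53 and PE is set, whereas a small value such as 12345 converts exactly and leaves PE clear.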

Mnemonic Opcode Description

CVTSI2SD xmm, reg/mem32 F2 0F 2A /r Converts a doubleword integer in a general-purpose register or 32-bit memory location to a double-precision floating-point value in the destination XMM register.

CVTSI2SD xmm, reg/mem64 F2 0F 2A /r Converts a quadword integer in a general-purpose register or 64-bit memory location to a double-precision floating-point value in the destination XMM register.

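[Figure: cvtsi2sd.eps. The integer in bits 31-0 of reg/mem32 or, with a REX prefix, in bits 63-0 of reg/mem64 is converted to a double-precision value in bits 63-0 of xmm; bits 127-64 of xmm are unchanged.]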


Related Instructions

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTTPD2DQ, CVTTPD2PI, CVTTSD2SI

rFLAGS Affected

None

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Modified                                          M
Bit       15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


CVTSI2SS Convert Signed Doubleword or Quadword Integer to Scalar Single-Precision Floating-Point

Converts a 32-bit or 64-bit signed integer value in a general-purpose register or memory location to a single-precision floating-point value and writes the converted value in the low-order 32 bits of an XMM register. The three high-order doublewords in the destination XMM register are not modified.

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register.

The CVTSI2SS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
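Single precision carries only 24 significant bits, so even a 32-bit integer may be rounded. A minimal C sketch using the _mm_cvtsi32_ss intrinsic (normally lowered to CVTSI2SS) follows. The helper name int32_into_low_float is hypothetical, and default round-to-nearest is assumed.

```c
#include <xmmintrin.h>  /* SSE intrinsics, including _mm_cvtsi32_ss */
#include <stdint.h>

/* Hypothetical helper: replace element 0 of `dst` with `x` converted to
   single precision; elements 1 to 3 of `dst` are left as they were. */
void int32_into_low_float(float dst[4], int32_t x)
{
    volatile int32_t v = x;             /* force run-time conversion */
    __m128 a = _mm_loadu_ps(dst);       /* plays the role of xmm */

    a = _mm_cvtsi32_ss(a, v);
    _mm_storeu_ps(dst, a);
}
```

Converting 16777217 (2^24 + 1) gives 16777216.0, because the input lies halfway between two representable values and the tie is resolved toward the even significand.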

Mnemonic Opcode Description

CVTSI2SS xmm, reg/mem32 F3 0F 2A /r Converts a doubleword integer in a general-purpose register or 32-bit memory location to a single-precision floating-point value in the destination XMM register.

CVTSI2SS xmm, reg/mem64 F3 0F 2A /r Converts a quadword integer in a general-purpose register or 64-bit memory location to a single-precision floating-point value in the destination XMM register.

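[Figure: cvtsi2ss.eps. The integer in bits 31-0 of reg/mem32 or, with a REX prefix, in bits 63-0 of reg/mem64 is converted to a single-precision value in bits 31-0 of xmm; bits 127-32 of xmm are unchanged.]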


Related Instructions

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

rFLAGS Affected

None

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Modified                                          M
Bit       15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Precision exception (PE) X X X A result could not be represented exactly in the destination format.

Exception RealVirtual8086 Protected Cause of Exception


CVTSS2SD Convert Scalar Single-Precision Floating-Point to Scalar Double-Precision Floating-Point

Converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a double-precision floating-point value and writes the converted value in the low-order 64 bits of another XMM register. The high-order 64 bits in the destination XMM register are not modified.

The CVTSS2SD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
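The register-level effect described above can be sketched in Python as a rough software model. The function name `cvtss2sd` and the representation of a 128-bit register as a Python integer are illustrative assumptions, not part of the architecture definition. Exception flags and NaN handling are not modeled.

```python
import struct

MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def cvtss2sd(dest: int, src: int) -> int:
    """Hypothetical model of CVTSS2SD on 128-bit register images held as ints."""
    # The low-order 32 bits of the source hold the single-precision value.
    single_bits = src & 0xFFFF_FFFF
    (value,) = struct.unpack("<f", struct.pack("<I", single_bits))
    # Re-encode the same value as a double-precision bit pattern.
    (double_bits,) = struct.unpack("<Q", struct.pack("<d", value))
    # Replace the low-order 64 bits; the high-order 64 bits are not modified.
    return (dest & (MASK64 << 64)) | double_bits

dest = (0x1122_3344_5566_7788 << 64) | 0xDEAD_BEEF_DEAD_BEEF
src = 0x3FC0_0000  # bit pattern of 1.5 in single precision
print(hex(cvtss2sd(dest, src)))  # 0x11223344556677883ff8000000000000
```

The widening conversion is exact for finite values, which is consistent with the MXCSR table for this instruction not marking the precision flag.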

Related Instructions

CVTPD2PS, CVTPS2PD, CVTSD2SS

rFLAGS Affected

None

Mnemonic Opcode Description

CVTSS2SD xmm1, xmm2/mem32 F3 0F 5A /r Converts scalar single-precision floating-point value in an XMM register or 32-bit memory location to double-precision floating-point value in the destination XMM register.

[Figure cvtss2sd.eps: bits 31–0 of xmm2/mem32 are converted and written to bits 63–0 of xmm1.]


MXCSR Flags Affected

FZ   RC        PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                      M    M
15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD
X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
X X X The emulate bit (EM) of CR0 was set to 1.
X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM
X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS
X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP
X X X A memory address exceeded a data segment limit or was non-canonical.
- - X A null data segment was used to reference memory.

Page fault, #PF
- X X A page fault resulted from the execution of the instruction.

Alignment check, #AC
- X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)
X X X A source operand was an SNaN value.

Denormalized-operand exception (DE)
X X X A source operand was a denormal value.


CVTSS2SI Convert Scalar Single-Precision Floating-Point to Signed Doubleword or Quadword Integer

The CVTSS2SI instruction converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 32-bit or 64-bit signed integer value and writes the converted value in a general-purpose register.

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword (–2³¹ to +2³¹ – 1) or quadword (–2⁶³ to +2⁶³ – 1), the instruction returns the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.

The CVTSS2SI instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
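As an illustration of the rounding and indefinite-value rules above, the following Python sketch models the masked-exception result. The names `cvtss2si` and `round_by_rc` are hypothetical. The RC encoding used here (0 = nearest even, 1 = down, 2 = up, 3 = toward zero) is the customary MXCSR definition and is not spelled out in this section, so treat it as an assumption. The input is taken as a Python float that is already exactly representable in single precision.

```python
import math

def round_by_rc(x: float, rc: int) -> int:
    """Round x to an integer under an assumed MXCSR.RC encoding."""
    if rc == 0:
        return round(x)       # Python's round() rounds halves to even
    if rc == 1:
        return math.floor(x)  # toward negative infinity
    if rc == 2:
        return math.ceil(x)   # toward positive infinity
    return math.trunc(x)      # toward zero

def cvtss2si(x: float, bits: int = 32, rc: int = 0) -> int:
    """Return the destination register image (unsigned bit pattern)."""
    indefinite = 1 << (bits - 1)  # 8000_0000h or 8000_0000_0000_0000h
    if math.isnan(x) or math.isinf(x):
        return indefinite
    r = round_by_rc(x, rc)
    if not -(1 << (bits - 1)) <= r <= (1 << (bits - 1)) - 1:
        return indefinite         # result does not fit the destination
    return r & ((1 << bits) - 1)  # two's-complement image

print(cvtss2si(2.5), cvtss2si(3.5))         # 2 4
print(hex(cvtss2si(-2.5, rc=1)))            # 0xfffffffd
print(hex(cvtss2si(2147483648.0)))          # 0x80000000
print(hex(cvtss2si(4294967296.0, bits=64))) # 0x100000000
```

A value such as 2147483648.0 overflows the 32-bit destination and yields the indefinite value, while the 64-bit form of the instruction can represent it.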

Mnemonic Opcode Description

CVTSS2SI reg32, xmm2/mem32 F3 0F 2D /r Converts a single-precision floating-point value in an XMM register or 32-bit memory location to a doubleword integer value in a general-purpose register.

CVTSS2SI reg64, xmm2/mem32 F3 0F 2D /r Converts a single-precision floating-point value in an XMM register or 32-bit memory location to a quadword integer value in a general-purpose register.


Related Instructions

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTTPS2DQ, CVTTPS2PI, CVTTSS2SI

rFLAGS Affected

None

MXCSR Flags Affected

FZ   RC        PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                  M                        M
15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.
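The bit positions in the table above can be turned into a small decoder. This is a minimal sketch: the name `decode_mxcsr` is hypothetical, and the sample value 1F80h (commonly cited as the MXCSR reset value, with all exceptions masked) is an assumption that does not come from this section.

```python
# (field name, lowest bit, width), following the flag and bit rows of the table.
MXCSR_FIELDS = [
    ("FZ", 15, 1), ("RC", 13, 2), ("PM", 12, 1), ("UM", 11, 1),
    ("OM", 10, 1), ("ZM", 9, 1), ("DM", 8, 1), ("IM", 7, 1),
    ("DAZ", 6, 1), ("PE", 5, 1), ("UE", 4, 1), ("OE", 3, 1),
    ("ZE", 2, 1), ("DE", 1, 1), ("IE", 0, 1),
]

def decode_mxcsr(value: int) -> dict:
    """Split a 16-bit MXCSR image into its named fields."""
    return {name: (value >> low) & ((1 << width) - 1)
            for name, low, width in MXCSR_FIELDS}

# Assumed sample: all exceptions masked, with PE and IE recorded by a conversion.
fields = decode_mxcsr(0x1F80 | 0x21)
print([name for name in ("PE", "UE", "OE", "ZE", "DE", "IE") if fields[name]])
# ['PE', 'IE']
```

PE and IE are the two flags marked M for CVTSS2SI, so this is the flag state software might observe after an inexact or invalid conversion with both exceptions masked.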

[Figure cvtss2si.eps: bits 31–0 of xmm2/mem32 are converted and written to reg32 (bits 31–0). With a REX prefix, bits 31–0 of xmm2/mem32 are converted and written to reg64 (bits 63–0).]


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD
X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
X X X The emulate bit (EM) of CR0 was set to 1.
X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM
X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS
X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP
X X X A memory address exceeded a data segment limit or was non-canonical.
- - X A null data segment was used to reference memory.

Page fault, #PF
- X X A page fault resulted from the execution of the instruction.

Alignment check, #AC
- X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)
X X X A source operand was an SNaN value, a QNaN value, or ±infinity.
X X X A source operand was too large to fit in the destination format.

Precision exception (PE)
X X X A result could not be represented exactly in the destination format.


CVTTPD2DQ Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers, Truncated

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed 32-bit signed integer values and writes the converted values in the low-order 64 bits of another XMM register. The high-order 64 bits of the destination XMM register are cleared to all 0s.

If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword (–2³¹ to +2³¹ – 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

The CVTTPD2DQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
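The packed layout described above, two truncated doublewords in the low half and zeros in the high half, can be sketched as follows. The helper names are hypothetical, and the 128-bit operands are modeled as 16-byte little-endian buffers, which is an illustrative choice. Only the masked-exception result is modeled.

```python
import math
import struct

INDEFINITE32 = 0x8000_0000

def trunc_to_int32(x: float) -> int:
    """Return the 32-bit result image for one double-precision element."""
    if math.isnan(x) or math.isinf(x):
        return INDEFINITE32
    t = math.trunc(x)  # round toward zero
    if not -(2**31) <= t <= 2**31 - 1:
        return INDEFINITE32
    return t & 0xFFFF_FFFF

def cvttpd2dq(src: bytes) -> bytes:
    """src holds two little-endian doubles; the result is a 16-byte register image."""
    low, high = struct.unpack("<2d", src)
    # Two doublewords go to the low-order 64 bits; the high-order 64 bits are zero.
    return struct.pack("<2I", trunc_to_int32(low), trunc_to_int32(high)) + bytes(8)

out = cvttpd2dq(struct.pack("<2d", -7.9, 1e10))
print(struct.unpack("<4i", out))  # (-7, -2147483648, 0, 0)
```

The second element is too large for a doubleword, so it reads back as 8000_0000h, which is –2³¹ when interpreted as a signed integer.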

Related Instructions

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2PI, CVTTSD2SI

rFLAGS Affected

None

Mnemonic Opcode Description

CVTTPD2DQ xmm1, xmm2/mem128 66 0F E6 /r Converts packed double-precision floating-point values in an XMM register or 128-bit memory location to packed doubleword integer values in the destination XMM register. Inexact results are truncated.

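[Figure cvttpd2dq.eps: the two quadwords of xmm2/mem128 (bits 127–64 and 63–0) are each converted and written to bits 63–32 and 31–0 of xmm1; bits 127–64 of xmm1 are cleared to 0.]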


MXCSR Flags Affected

FZ   RC        PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                  M                        M
15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD
X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
X X X The emulate bit (EM) of CR0 was set to 1.
X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM
X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS
X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP
X X X A memory address exceeded a data segment limit or was non-canonical.
- - X A null data segment was used to reference memory.
X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF
- X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)
X X X A source operand was an SNaN value, a QNaN value, or ±infinity.
X X X A source operand was too large to fit in the destination format.

Precision exception (PE)
X X X A result could not be represented exactly in the destination format.


CVTTPD2PI Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers, Truncated

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory location to two packed 32-bit signed integer values and writes the converted values in an MMX register.

If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword (–2³¹ to +2³¹ – 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

The CVTTPD2PI instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
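The exception tables in this section identify the feature bits as EDX bit 25 (SSE) and EDX bit 26 (SSE2) of CPUID standard function 1. The sketch below only decodes an EDX value that has already been obtained by some other means, since Python cannot execute CPUID directly. The function name `simd_support` and the sample EDX values are made up for illustration.

```python
SSE_BIT = 25   # EDX bit 25 of CPUID standard function 1
SSE2_BIT = 26  # EDX bit 26 of CPUID standard function 1

def simd_support(edx: int) -> dict:
    """Decode SSE and SSE2 support from a CPUID function 1 EDX value."""
    return {
        "sse": bool((edx >> SSE_BIT) & 1),
        "sse2": bool((edx >> SSE2_BIT) & 1),
    }

# Hypothetical EDX images, not values read from real hardware.
print(simd_support(0x0200_0000))  # {'sse': True, 'sse2': False}
print(simd_support(0x0600_0000))  # {'sse': True, 'sse2': True}
```

Software that intends to use an SSE2 instruction such as CVTTPD2PI would check the `sse2` result first, because executing the instruction without support raises #UD.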

Related Instructions

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTSD2SI

rFLAGS Affected

None

Mnemonic Opcode Description

CVTTPD2PI mmx, xmm/mem128 66 0F 2C /r Converts packed double-precision floating-point values in an XMM register or 128-bit memory location to packed doubleword integer values in the destination MMX register. Inexact results are truncated.

[Figure cvttpd2pi.eps: the two quadwords of xmm/mem128 (bits 127–64 and 63–0) are each converted and written to bits 63–32 and 31–0 of mmx.]


MXCSR Flags Affected

FZ   RC        PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                  M                        M
15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD
X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
X X X The emulate bit (EM) of CR0 was set to 1.
X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM
X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS
X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP
X X X A memory address exceeded a data segment limit or was non-canonical.
- - X A null data segment was used to reference memory.
X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF
- X X A page fault resulted from the execution of the instruction.

x87 floating-point exception pending, #MF
X X X An exception was pending due to an x87 floating-point instruction.

SIMD Floating-Point Exception, #XF
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)
X X X A source operand was an SNaN value, a QNaN value, or ±infinity.
X X X A source operand was too large to fit in the destination format.

Precision exception (PE)
X X X A result could not be represented exactly in the destination format.


CVTTPS2DQ Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers, Truncated

Converts four packed single-precision floating-point values in an XMM register or a 128-bit memory location to four packed 32-bit signed integers and writes the converted values in another XMM register.

If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword (–2³¹ to +2³¹ – 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

The CVTTPS2DQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2PI, CVTTSS2SI

Mnemonic Opcode Description

CVTTPS2DQ xmm1, xmm2/mem128 F3 0F 5B /r Converts packed single-precision floating-point values in an XMM register or 128-bit memory location to packed doubleword integer values in the destination XMM register. Inexact results are truncated.

[Figure cvttps2dq.eps: each of the four doublewords of xmm2/mem128 (bits 127–96, 95–64, 63–32, and 31–0) is converted and written to the corresponding doubleword of xmm1.]


rFLAGS Affected

None

MXCSR Flags Affected

FZ   RC        PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                  M                        M
15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD
X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
X X X The emulate bit (EM) of CR0 was set to 1.
X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM
X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS
X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP
X X X A memory address exceeded a data segment limit or was non-canonical.
- - X A null data segment was used to reference memory.
X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF
- X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)
X X X A source operand was an SNaN value, a QNaN value, or ±infinity.
X X X A source operand was too large to fit in the destination format.

Precision exception (PE)
X X X A result could not be represented exactly in the destination format.


CVTTPS2PI Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers, Truncated

Converts two packed single-precision floating-point values in the low-order 64 bits of an XMM register or a 64-bit memory location to two packed 32-bit signed integer values and writes the converted values in an MMX register.

If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword (–2³¹ to +2³¹ – 1), the instruction returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is masked.

The CVTTPS2PI instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTSS2SI

rFLAGS Affected

None

Mnemonic Opcode Description

CVTTPS2PI mmx, xmm/mem64 0F 2C /r Converts packed single-precision floating-point values in an XMM register or 64-bit memory location to doubleword integer values in the destination MMX register. Inexact results are truncated.

[Figure cvttps2pi.eps: bits 63–32 and 31–0 of xmm/mem64 are each converted and written to bits 63–32 and 31–0 of mmx.]


MXCSR Flags Affected

FZ   RC        PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                  M                        M
15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD
X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
X X X The emulate bit (EM) of CR0 was set to 1.
X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM
X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS
X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP
X X X A memory address exceeded a data segment limit or was non-canonical.
- - X A null data segment was used to reference memory.

Page fault, #PF
- X X A page fault resulted from the execution of the instruction.

x87 floating-point exception pending, #MF
X X X An exception was pending due to an x87 floating-point instruction.

Alignment check, #AC
- X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)
X X X A source operand was an SNaN value, a QNaN value, or ±infinity.
X X X A source operand was too large to fit in the destination format.

Precision exception (PE)
X X X A result could not be represented exactly in the destination format.


CVTTSD2SI Convert Scalar Double-Precision Floating-Point to Signed Doubleword or Quadword Integer, Truncated

Converts a double-precision floating-point value in the low-order 64 bits of an XMM register or a 64-bit memory location to a 32-bit or 64-bit signed integer value and writes the converted value in a general-purpose register.

If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword (–2³¹ to +2³¹ – 1) or quadword (–2⁶³ to +2⁶³ – 1), the instruction returns the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.

The CVTTSD2SI instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
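The sketch below models the truncating conversion together with the two MXCSR flags that this instruction's exception table associates with it: IE for NaN, infinity, or out-of-range sources, and PE for inexact results. The function name `cvttsd2si` and the convention of returning a set of flag names are illustrative assumptions. The model covers only the case in which both exceptions are masked.

```python
import math

def cvttsd2si(x: float, bits: int = 32):
    """Return (destination image, set of MXCSR flags that would be set)."""
    indefinite = 1 << (bits - 1)  # 8000_0000h or 8000_0000_0000_0000h
    if math.isnan(x) or math.isinf(x):
        return indefinite, {"IE"}
    t = math.trunc(x)             # round toward zero
    if not -(1 << (bits - 1)) <= t < (1 << (bits - 1)):
        return indefinite, {"IE"} # too large for the destination format
    flags = {"PE"} if t != x else set()
    return t & ((1 << bits) - 1), flags

print(cvttsd2si(-1.75))        # (4294967295, {'PE'})
print(cvttsd2si(3e9))          # (2147483648, {'IE'})
print(cvttsd2si(3e9, bits=64)) # (3000000000, set())
```

The value 3e9 shows why the destination size matters: it is invalid for a doubleword destination but converts exactly when the quadword form is used.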

Mnemonic Opcode Description

CVTTSD2SI reg32, xmm/mem64 F2 0F 2C /r Converts scalar double-precision floating-point value in an XMM register or 64-bit memory location to a doubleword signed integer value in a general-purpose register. Inexact results are truncated.

CVTTSD2SI reg64, xmm/mem64 F2 0F 2C /r Converts scalar double-precision floating-point value in an XMM register or 64-bit memory location to a quadword signed integer value in a general-purpose register. Inexact results are truncated.


Related Instructions

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI

rFLAGS Affected

None

MXCSR Flags Affected

FZ   RC        PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                  M                        M
15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

[Figure cvttsd2si.eps: bits 63–0 of xmm2/mem64 are converted and written to reg32 (bits 31–0). With a REX prefix, bits 63–0 of xmm2/mem64 are converted and written to reg64 (bits 63–0).]


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD
X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
X X X The emulate bit (EM) of CR0 was set to 1.
X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM
X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS
X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP
X X X A memory address exceeded a data segment limit or was non-canonical.
- - X A null data segment was used to reference memory.

Page fault, #PF
- X X A page fault resulted from the execution of the instruction.

Alignment check, #AC
- X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)
X X X A source operand was an SNaN value, a QNaN value, or ±infinity.
X X X A source operand was too large to fit in the destination format.

Precision exception (PE)
X X X A result could not be represented exactly in the destination format.


CVTTSS2SI Convert Scalar Single-Precision Floating-Point to Signed Doubleword or Quadword Integer, Truncated

Converts a single-precision floating-point value in the low-order 32 bits of an XMM register or a 32-bit memory location to a 32-bit or 64-bit signed integer value and writes the converted value in a general-purpose register.

If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the floating-point value is a NaN or infinity, or if the result of the conversion falls outside the range of a signed doubleword (–2³¹ to +2³¹ – 1) or quadword (–2⁶³ to +2⁶³ – 1), the instruction returns the indefinite integer value (8000_0000h for 32-bit integers, 8000_0000_0000_0000h for 64-bit integers) when the invalid-operation exception (IE) is masked.

The CVTTSS2SI instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

CVTTSS2SI reg32, xmm/mem32 F3 0F 2C /r Converts scalar single-precision floating-point value in an XMM register or 32-bit memory location to a signed doubleword integer value in a general-purpose register. Inexact results are truncated.

CVTTSS2SI reg64, xmm/mem32 F3 0F 2C /r Converts scalar single-precision floating-point value in an XMM register or 32-bit memory location to a signed quadword integer value in a general-purpose register. Inexact results are truncated.


Related Instructions

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI

rFLAGS Affected

None

MXCSR Flags Affected

FZ   RC        PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                  M                        M
15   14   13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

[Figure cvttss2si.eps: bits 31–0 of xmm2/mem32 are converted and written to reg32 (bits 31–0). With a REX prefix, bits 31–0 of xmm2/mem32 are converted and written to reg64 (bits 63–0).]


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD
X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
X X X The emulate bit (EM) of CR0 was set to 1.
X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM
X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS
X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP
X X X A memory address exceeded a data segment limit or was non-canonical.
- - X A null data segment was used to reference memory.

Page fault, #PF
- X X A page fault resulted from the execution of the instruction.

Alignment check, #AC
- X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF
X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)
X X X A source operand was an SNaN value, a QNaN value, or ±infinity.
X X X A source operand was too large to fit in the destination format.

Precision exception (PE)
X X X A result could not be represented exactly in the destination format.


DIVPD Divide Packed Double-Precision Floating-Point

Divides each of the two packed double-precision floating-point values in the first source operand by the corresponding packed double-precision floating-point value in the second source operand and writes the result of each division in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The DIVPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
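The element-wise operation described above can be sketched in Python as follows. The names `div_lane` and `divpd` are hypothetical, and registers are modeled as 16-byte little-endian buffers. Python raises an error on float division by zero, so the sketch substitutes the IEEE 754 default results (signed infinity for a nonzero dividend, NaN for 0/0). Those defaults are an assumption about the masked-exception behavior and are not stated in this section.

```python
import math
import struct

def div_lane(a: float, b: float) -> float:
    """Divide one element, using assumed IEEE 754 default results."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if b == 0.0:
        if a == 0.0:
            return math.nan  # 0/0 has no meaningful quotient
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)  # signed infinity
    return a / b

def divpd(xmm1: bytes, xmm2: bytes) -> bytes:
    """Divide each quadword of xmm1 by the corresponding quadword of xmm2."""
    dividends = struct.unpack("<2d", xmm1)
    divisors = struct.unpack("<2d", xmm2)
    quotients = [div_lane(a, b) for a, b in zip(dividends, divisors)]
    return struct.pack("<2d", *quotients)

result = divpd(struct.pack("<2d", 6.0, 1.0), struct.pack("<2d", 4.0, -0.0))
print(struct.unpack("<2d", result))  # (1.5, -inf)
```

Each lane is computed independently, so an exceptional divisor in one quadword does not affect the quotient written to the other.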

Related Instructions

DIVPS, DIVSD, DIVSS

rFLAGS Affected

None

Mnemonic Opcode Description

DIVPD xmm1, xmm2/mem128 66 0F 5E /r Divides packed double-precision floating-point values in an XMM register by the packed double-precision floating-point values in another XMM register or 128-bit memory location.

[Figure divpd.eps: DIVPD operation. Operands xmm1 and xmm2/mem128, each shown as quadwords 127–64 and 63–0, with one divide per quadword.]


MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M   M   M   M   M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X ±Zero was divided by ±zero.

X X X ±infinity was divided by ±infinity.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.


Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Zero-divide exception (ZE) X X X A non-zero number was divided by zero.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.
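The operand-dependent rows of the table above (zero divided by zero, infinity divided by infinity, and a non-zero number divided by zero) can be expressed as a small classifier. The sketch below is illustrative only: `div_operand_flags` is a hypothetical helper, the flag masks use the bit positions from the MXCSR flag table, and NaN operands, denormal operands, and the result-dependent OE, UE, and PE flags are deliberately left out. It also assumes that "non-zero number" means a finite non-zero value.

```c
#include <math.h>

/* MXCSR status-flag masks for the two flags modeled here. */
enum {
    MXCSR_IE = 1 << 0,  /* invalid operation */
    MXCSR_ZE = 1 << 2   /* zero divide */
};

/* Hypothetical helper: returns the IE or ZE flag that dividing a by b
   would raise for the special-operand cases listed in the table. */
int div_operand_flags(double a, double b)
{
    if (isnan(a) || isnan(b))
        return 0;                      /* NaN handling is not modeled */
    if (a == 0.0 && b == 0.0)
        return MXCSR_IE;               /* +/-zero divided by +/-zero */
    if (isinf(a) && isinf(b))
        return MXCSR_IE;               /* +/-infinity divided by +/-infinity */
    if (b == 0.0 && isfinite(a))
        return MXCSR_ZE;               /* a is finite and non-zero here */
    return 0;
}
```

Because `-0.0 == 0.0` compares equal in C, the signed-zero cases are covered without extra checks.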


DIVPS Divide Packed Single-Precision Floating-Point

Divides each of the four packed single-precision floating-point values in the first source operand by the corresponding packed single-precision floating-point value in the second source operand and writes the result of each division in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The DIVPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
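As a rough illustration of the four-lane form, the following C sketch divides each doubleword independently. The `xmm_ps` type and `divps_model` function are hypothetical names, and rounding control and exception flags are not modeled.

```c
/* Hypothetical stand-in for an XMM register holding four floats:
   lane[0] is bits 31-0, lane[3] is bits 127-96. */
typedef struct { float lane[4]; } xmm_ps;

/* Models DIVPS dst, src: four independent single-precision divides. */
xmm_ps divps_model(xmm_ps dst, xmm_ps src)
{
    for (int i = 0; i < 4; i++)
        dst.lane[i] = dst.lane[i] / src.lane[i];
    return dst;
}
```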

Related Instructions

DIVPD, DIVSD, DIVSS

rFLAGS Affected

None

Mnemonic Opcode Description

DIVPS xmm1, xmm2/mem128 0F 5E /r Divides packed single-precision floating-point values in an XMM register by the packed single-precision floating-point values in another XMM register or 128-bit memory location.

[Figure divps.eps: DIVPS operation. Operands xmm1 and xmm2/mem128, each shown as doublewords 127–96, 95–64, 63–32, and 31–0, with one divide per doubleword.]


MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M   M   M   M   M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X ±Zero was divided by ±zero.

X X X ±infinity was divided by ±infinity.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.


Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Zero-divide exception (ZE) X X X A non-zero number was divided by zero.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


DIVSD Divide Scalar Double-Precision Floating-Point

Divides the double-precision floating-point value in the low-order quadword of the first source operand by the double-precision floating-point value in the low-order quadword of the second source operand and writes the result in the low-order quadword of the destination (first source). The high-order quadword of the destination is not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location.

The DIVSD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
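The scalar form differs from the packed form in that only the low quadword is written. A minimal C sketch, using the hypothetical names `xmm_pd` and `divsd_model` and ignoring rounding control and exception flags, makes the preserved upper half explicit:

```c
/* Hypothetical stand-in for an XMM register holding two doubles:
   lane[0] is bits 63-0, lane[1] is bits 127-64. */
typedef struct { double lane[2]; } xmm_pd;

/* Models DIVSD dst, src: only the low quadword is divided.
   src_low is the low quadword of the source register or the
   64-bit memory operand. */
xmm_pd divsd_model(xmm_pd dst, double src_low)
{
    dst.lane[0] = dst.lane[0] / src_low;
    /* dst.lane[1] is intentionally left unchanged. */
    return dst;
}
```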

Related Instructions

DIVPD, DIVPS, DIVSS

rFLAGS Affected

None

Mnemonic Opcode Description

DIVSD xmm1, xmm2/mem64 F2 0F 5E /r Divides low-order double-precision floating-point value in an XMM register by the low-order double-precision floating-point value in another XMM register or in a 64-bit memory location.

[Figure divsd.eps: DIVSD operation. Operands xmm1 and xmm2/mem64, with a single divide on the low-order quadword (bits 63–0); bits 127–64 are not used.]


MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M   M   M   M   M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X ±Zero was divided by ±zero.

X X X ±infinity was divided by ±infinity.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.


Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Zero-divide exception (ZE) X X X A non-zero number was divided by zero.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


DIVSS Divide Scalar Single-Precision Floating-Point

Divides the single-precision floating-point value in the low-order doubleword of the first source operand by the single-precision floating-point value in the low-order doubleword of the second source operand and writes the result in the low-order doubleword of the destination (first source). The three high-order doublewords of the destination are not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location.

The DIVSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
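A comparable sketch for the single-precision scalar form shows that the three upper doublewords pass through untouched. The names `xmm_ps` and `divss_model` are hypothetical, and rounding control and exception flags are not modeled.

```c
/* Hypothetical stand-in for an XMM register holding four floats:
   lane[0] is bits 31-0, lane[3] is bits 127-96. */
typedef struct { float lane[4]; } xmm_ps;

/* Models DIVSS dst, src: only the low doubleword is divided.
   src_low is the low doubleword of the source register or the
   32-bit memory operand. */
xmm_ps divss_model(xmm_ps dst, float src_low)
{
    dst.lane[0] = dst.lane[0] / src_low;
    /* dst.lane[1] through dst.lane[3] are intentionally unchanged. */
    return dst;
}
```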

Related Instructions

DIVPD, DIVPS, DIVSD

rFLAGS Affected

None

Mnemonic Opcode Description

DIVSS xmm1, xmm2/mem32 F3 0F 5E /r Divides low-order single-precision floating-point value in an XMM register by the low-order single-precision floating-point value in another XMM register or in a 32-bit memory location.

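[Figure divss.eps: DIVSS operation. Operands xmm1 and xmm2/mem32, with a single divide on the low-order doubleword (bits 31–0); bits 127–32 are not used.]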


MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M   M   M   M   M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X ±Zero was divided by ±zero.

X X X ±infinity was divided by ±infinity.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.


Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Zero-divide exception (ZE) X X X A non-zero number was divided by zero.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


FXRSTOR Restore XMM, MMX, and x87 State

Restores the XMM, MMX, and x87 state. The data loaded from memory is the state information previously saved using the FXSAVE instruction. Restoring data with FXRSTOR that had been previously saved with an FSAVE (rather than FXSAVE) instruction results in an incorrect restoration.

If FXRSTOR results in set exception flags in the loaded x87 status word register, and these exceptions are unmasked in the x87 control word register, a floating-point exception occurs when the next floating-point instruction is executed (except for the no-wait floating-point instructions).

If the restored MXCSR register contains a set bit in an exception status flag, and the corresponding exception mask bit is cleared (indicating an unmasked exception), loading the MXCSR register from memory does not cause a SIMD floating-point exception (#XF).

FXRSTOR does not restore the x87 error pointers (last instruction pointer, last data pointer, and last opcode), except in the relatively rare cases in which the exception-summary (ES) bit in the x87 status word is set to 1, indicating that an unmasked x87 exception has occurred.

The architecture supports two 512-byte memory formats for FXRSTOR, a 64-bit format that loads XMM0–XMM15, and a 32-bit legacy format that loads only XMM0–XMM7. If FXRSTOR is executed in 64-bit mode, the 64-bit format is used, otherwise the 32-bit format is used. When the 64-bit format is used, if the operand-size is 64-bit, FXRSTOR loads the x87 pointer registers as offset64, otherwise it loads them as sel:offset32. For details about the memory format used by FXRSTOR, see “Saving Media and x87 Processor State” in Volume 2.
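The format-selection rule in the preceding paragraph reduces to two conditions. The following C sketch restates it; the `fx_format` type and `fxrstor_format` function are hypothetical names used only to illustrate the rule, not part of any real API.

```c
/* Hypothetical summary of which image format FXRSTOR uses. */
typedef struct {
    int xmm_count;          /* XMM registers loaded: 16 or 8 */
    int pointers_offset64;  /* 1: x87 pointers as offset64,
                               0: x87 pointers as sel:offset32 */
} fx_format;

/* in_64bit_mode: non-zero when the instruction executes in 64-bit mode.
   operand_size_bits: effective operand size of the instruction. */
fx_format fxrstor_format(int in_64bit_mode, int operand_size_bits)
{
    fx_format f;
    f.xmm_count = in_64bit_mode ? 16 : 8;
    f.pointers_offset64 = in_64bit_mode && operand_size_bits == 64;
    return f;
}
```

The same two conditions govern the format written by FXSAVE.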

If the fast-FXSAVE/FXRSTOR (FFXSR) feature is enabled in EFER, FXRSTOR does not restore the XMM registers (XMM0–XMM15) when executed in 64-bit mode at CPL 0. MXCSR is restored whether fast-FXSAVE/FXRSTOR is enabled or not. Software can use CPUID to determine whether the fast-FXSAVE/FXRSTOR feature is available. (See “CPUID” in Volume 3.)

If the operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0, the saved image of XMM0–XMM15 and MXCSR is not loaded into the processor. A general-protection exception occurs if there is an attempt to load a non-zero value to the bits in MXCSR that are defined as reserved (bits 31–16).
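Software that edits a saved image before restoring it may want to check the reserved MXCSR bits first. The sketch below shows one way to do that. It assumes the MXCSR field sits at byte offset 24 of the image, stored little-endian; that offset comes from the image layout described in Volume 2, not from this section, and the function and macro names are hypothetical.

```c
#include <stdint.h>

#define FX_IMAGE_SIZE    512          /* size of the save image in bytes */
#define FX_MXCSR_OFFSET  24           /* assumed offset of the MXCSR field */
#define MXCSR_RESERVED   0xFFFF0000u  /* bits 31-16 are reserved */

/* Reads the 32-bit little-endian MXCSR field from a saved image. */
uint32_t fx_image_mxcsr(const unsigned char image[FX_IMAGE_SIZE])
{
    const unsigned char *p = image + FX_MXCSR_OFFSET;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns non-zero when no reserved MXCSR bit is set in the image,
   that is, when restoring it would not fault for this reason. */
int fx_image_mxcsr_ok(const unsigned char image[FX_IMAGE_SIZE])
{
    return (fx_image_mxcsr(image) & MXCSR_RESERVED) == 0;
}
```

Assembling the field byte by byte keeps the check independent of the host's byte order.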


Related Instructions

FWAIT, FXSAVE

rFLAGS Affected

None

Mnemonic Opcode Description

FXRSTOR mem512env 0F AE /1 Restores XMM, MMX, and x87 state from 512-byte memory location.

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State  M   M      M   M   M   M   M   M   M    M   M   M   M   M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The FXSAVE/FXRSTOR instructions are not supported, as indicated by EDX bit 24 of CPUID standard function 1 or extended function 8000_0001h.

X X X The emulate bit (EM) of CR0 was set to 1.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

X X X Ones were written to the reserved bits in MXCSR.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


FXSAVE Save XMM, MMX, and x87 State

Saves the XMM, MMX, and x87 state. A memory location that is not aligned on a 16-byte boundary causes a general-protection exception.
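One way to satisfy the 16-byte alignment requirement in C is to declare the save area with an explicit alignment specifier. The sketch below assumes a C11 compiler; `fx_area` and `fx_pointer_ok` are hypothetical names, and the code only prepares and checks a buffer without executing FXSAVE itself.

```c
#include <stdint.h>

#define FX_IMAGE_SIZE 512  /* size of the save image in bytes */

/* The C11 alignment specifier places the save area on a 16-byte
   boundary wherever an fx_area object is defined. */
typedef struct {
    _Alignas(16) unsigned char bytes[FX_IMAGE_SIZE];
} fx_area;

/* Returns non-zero when p is aligned on a 16-byte boundary, which is
   the condition the instruction checks before raising #GP. */
int fx_pointer_ok(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

Heap allocations need the same care; `aligned_alloc(16, 512)` from C11 `<stdlib.h>` is one option.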

Unlike FSAVE and FNSAVE, FXSAVE does not alter the x87 tag bits. The contents of the saved MMX/x87 data registers are retained, thus indicating that the registers may be valid (or whatever other value the x87 tag bits indicated prior to the save). To invalidate the contents of the MMX/x87 data registers after FXSAVE, software must execute an FINIT instruction. Also, FXSAVE (like FNSAVE) does not check for pending unmasked x87 floating-point exceptions. An FWAIT instruction can be used for this purpose.

FXSAVE does not save the x87 pointer registers (last instruction pointer, last data pointer, and last opcode), except in the relatively rare cases in which the exception-summary (ES) bit in the x87 status word is set to 1, indicating that an unmasked x87 exception has occurred.

The architecture supports two 512-byte memory formats for FXSAVE, a 64-bit format that saves XMM0–XMM15, and a 32-bit legacy format that saves only XMM0–XMM7. If FXSAVE is executed in 64-bit mode, the 64-bit format is used, otherwise the 32-bit format is used. When the 64-bit format is used, if the operand-size is 64-bit, FXSAVE saves the x87 pointer registers as offset64, otherwise it saves them as sel:offset32. For more details about the memory format used by FXSAVE, see “Saving Media and x87 Processor State” in Volume 2.

If the fast-FXSAVE/FXRSTOR (FFXSR) feature is enabled in EFER, FXSAVE does not save the XMM registers (XMM0–XMM15) when executed in 64-bit mode at CPL 0. MXCSR is saved whether fast-FXSAVE/FXRSTOR is enabled or not. Software can use CPUID to determine whether the fast-FXSAVE/FXRSTOR feature is available. (See “CPUID” in Volume 3.)

If the operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0, FXSAVE does not save the image of XMM0–XMM15 or MXCSR. For details about the CR4.OSFXSR bit, see “FXSAVE/FXRSTOR Support (OSFXSR) Bit” in Volume 2.
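The interaction of CR4.OSFXSR and the fast-FXSAVE/FXRSTOR feature can be summarized as a small decision function. This is a sketch of the rules as stated in the two preceding paragraphs; the `fxsave_scope` type and `fxsave_scope_for` function are hypothetical, and the x87/MMX state, which is always saved, is not represented.

```c
/* Hypothetical summary of which media state FXSAVE writes. */
typedef struct {
    int xmm;    /* non-zero when the XMM registers are saved */
    int mxcsr;  /* non-zero when MXCSR is saved */
} fxsave_scope;

/* osfxsr: CR4.OSFXSR; ffxsr: the EFER fast-FXSAVE/FXRSTOR enable;
   in_64bit_mode and cpl describe the executing context. */
fxsave_scope fxsave_scope_for(int osfxsr, int ffxsr,
                              int in_64bit_mode, int cpl)
{
    fxsave_scope s = { 1, 1 };
    if (!osfxsr) {
        s.xmm = 0;       /* OSFXSR = 0: neither XMM nor MXCSR is saved */
        s.mxcsr = 0;
    } else if (ffxsr && in_64bit_mode && cpl == 0) {
        s.xmm = 0;       /* fast path skips XMM but still saves MXCSR */
    }
    return s;
}
```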

Related Instructions

FINIT, FNSAVE, FRSTOR, FSAVE, FXRSTOR, LDMXCSR, STMXCSR

Mnemonic Opcode Description

FXSAVE mem512env 0F AE /0 Saves XMM, MMX, and x87 state to 512-byte memory location.


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The FXSAVE/FXRSTOR instructions are not supported, as indicated by EDX bit 24 of CPUID standard function 1 or extended function 8000_0001h.

X X X The emulate bit (EM) of CR0 was set to 1.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.
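The #UD rows in this chapter's exception tables cite specific feature bits of CPUID standard function 1: EDX bit 24 (FXSAVE/FXRSTOR), EDX bit 25 (SSE), EDX bit 26 (SSE2), and ECX bit 0 (SSE3). The sketch below decodes those bits from register values the caller has already obtained. How CPUID is executed is compiler-specific and is left out, and all names here are hypothetical.

```c
#include <stdint.h>

/* Feature bits of CPUID standard function 1, as cited in the tables. */
#define CPUID1_EDX_FXSR  (1u << 24)
#define CPUID1_EDX_SSE   (1u << 25)
#define CPUID1_EDX_SSE2  (1u << 26)
#define CPUID1_ECX_SSE3  (1u << 0)

typedef struct { int fxsr, sse, sse2, sse3; } media_features;

/* Decodes the ECX and EDX values returned by CPUID function 1. */
media_features decode_cpuid1(uint32_t ecx, uint32_t edx)
{
    media_features f;
    f.fxsr = (edx & CPUID1_EDX_FXSR) != 0;
    f.sse  = (edx & CPUID1_EDX_SSE)  != 0;
    f.sse2 = (edx & CPUID1_EDX_SSE2) != 0;
    f.sse3 = (ecx & CPUID1_ECX_SSE3) != 0;
    return f;
}
```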


HADDPD Horizontal Add Packed Double

Adds the double-precision floating-point values in the high and low quadwords of the destination operand and stores the result in the low quadword of the destination operand. Simultaneously, the instruction adds the double-precision floating-point values in the high and low quadwords of the source operand and stores the result in the high quadword of the destination operand.

The HADDPD instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
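Unlike a vertical add, the horizontal form sums the two lanes within each operand. A minimal C reference model, with the hypothetical names `xmm_pd` and `haddpd_model` and with rounding control and exception flags not modeled, could look like this:

```c
/* Hypothetical stand-in for an XMM register holding two doubles:
   lane[0] is bits 63-0, lane[1] is bits 127-64. */
typedef struct { double lane[2]; } xmm_pd;

/* Models HADDPD dst, src. Both sums are computed from the original
   operand values before the destination is replaced. */
xmm_pd haddpd_model(xmm_pd dst, xmm_pd src)
{
    xmm_pd r;
    r.lane[0] = dst.lane[0] + dst.lane[1];  /* low result from dst */
    r.lane[1] = src.lane[0] + src.lane[1];  /* high result from src */
    return r;
}
```

Writing into a separate result keeps the second sum from reading a lane the first sum has already overwritten.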

Related Instructions

HADDPS, HSUBPD, HSUBPS

rFLAGS Affected

None

Mnemonic Opcode Description

HADDPD xmm1, xmm2/mem128 66 0F 7C /r Adds two packed double-precision values in xmm1 and stores the result in the lower 64 bits of xmm1; adds two packed double-precision values in xmm2 or a 128-bit memory operand and stores the result in the upper 64 bits of xmm1.

[Figure: HADDPD operation. Operands xmm1 and xmm2/mem128, each shown as quadwords 127–64 and 63–0, with one add per operand.]

MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M   M   M       M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X +infinity was added to –infinity.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


HADDPS Horizontal Add Packed Single

Adds pairs of packed single-precision floating-point values simultaneously. The sum of the values in the first and second doublewords of the destination operand is stored in the first doubleword of the destination operand; the sum of the values in the third and fourth doublewords of the destination operand is stored in the second doubleword of the destination operand; the sum of the values in the first and second doublewords of the source operand is stored in the third doubleword of the destination operand; and the sum of the values in the third and fourth doublewords of the source operand is stored in the fourth doubleword of the destination operand.

The HADDPS instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
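The pairing of doublewords is easier to follow in code. The sketch below is a portable C reference model under the same caveats as before: `xmm_ps` and `haddps_model` are hypothetical names, lane 0 is taken to be the first doubleword (bits 31–0), and rounding control and exception flags are not modeled.

```c
/* Hypothetical stand-in for an XMM register holding four floats:
   lane[0] is bits 31-0, lane[3] is bits 127-96. */
typedef struct { float lane[4]; } xmm_ps;

/* Models HADDPS dst, src: adjacent pairs are summed, with the
   destination pairs filling the low half of the result and the
   source pairs filling the high half. */
xmm_ps haddps_model(xmm_ps dst, xmm_ps src)
{
    xmm_ps r;
    r.lane[0] = dst.lane[0] + dst.lane[1];
    r.lane[1] = dst.lane[2] + dst.lane[3];
    r.lane[2] = src.lane[0] + src.lane[1];
    r.lane[3] = src.lane[2] + src.lane[3];
    return r;
}
```

Applying the model twice with the same register as both operands leaves the sum of all four original lanes in every lane, which is a common reduction idiom.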

Related Instructions

HADDPD, HSUBPD, HSUBPS

rFLAGS Affected

None

Mnemonic Opcode Description

HADDPS xmm1, xmm2/mem128 F2 0F 7C /r Adds the first and second packed single-precision values in xmm1 and stores the sum in xmm1[0–31]; adds the third and fourth packed single-precision values in xmm1 and stores the sum in xmm1[32–63]; adds the first and second packed single-precision values in xmm2 or a 128-bit memory operand and stores the sum in xmm1[64–95]; adds the third and fourth packed single-precision values in xmm2 or a 128-bit memory operand and stores the sum in xmm1[96–127].

[Figure: HADDPS operation. Operands xmm1 and xmm2/mem128, each shown as doublewords 127–96, 95–64, 63–32, and 31–0, with two adds per operand.]


MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M   M   M       M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.


SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X +infinity was added to –infinity.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


HSUBPD Horizontal Subtract Packed Double

Subtracts the packed double-precision floating-point value in the upper quadword of the destination XMM register operand from the value in the lower quadword of the destination operand and stores the result in the lower quadword of the destination operand; subtracts the value in the upper quadword of the source XMM register or 128-bit memory operand from the value in the lower quadword of the source operand and stores the result in the upper quadword of the destination XMM register.

The HSUBPD instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
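The operand order matters for subtraction: in each operand the upper quadword is subtracted from the lower one. A minimal C reference model, again with the hypothetical names `xmm_pd` and `hsubpd_model` and without rounding control or exception flags, is:

```c
/* Hypothetical stand-in for an XMM register holding two doubles:
   lane[0] is bits 63-0, lane[1] is bits 127-64. */
typedef struct { double lane[2]; } xmm_pd;

/* Models HSUBPD dst, src: lower minus upper within each operand. */
xmm_pd hsubpd_model(xmm_pd dst, xmm_pd src)
{
    xmm_pd r;
    r.lane[0] = dst.lane[0] - dst.lane[1];  /* low result from dst */
    r.lane[1] = src.lane[0] - src.lane[1];  /* high result from src */
    return r;
}
```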

Related Instructions

HSUBPS, HADDPD, HADDPS

rFLAGS Affected

None

Mnemonic Opcode Description

HSUBPD xmm1, xmm2/mem128 66 0F 7D /r Subtracts the packed double-precision value in the upper 64 bits of the source XMM register or 128-bit memory operand from the value in the lower 64 bits of the source and stores the difference in the upper 64 bits of the destination XMM register; subtracts the value in the upper 64 bits of the destination register from the value in the lower 64 bits of the destination register and stores the difference in the lower 64 bits of the destination XMM register.

[Figure: HSUBPD operation. Operands xmm1 and xmm2/mem128, each shown as quadwords 127–64 and 63–0, with one subtract per operand.]


MXCSR Flags Affected

Bit    15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                          M   M   M       M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.See SIMD Floating-Point Exceptions below for details.

Page 159: AMD64 Architecture Programmer’s Manual, Volume 4, …culikzde/amd/26568.pdf · AMD64 Technology AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit Media Instructions Publication

HSUBPD 129

26568—Rev. 3.07—December 2005 AMD 64-Bit Technology

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X +infinity was subtracted from +infinity.

X X X –infinity was subtracted from –infinity.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.

Exception RealVirtual8086 Protected Cause of Exception

Page 160: AMD64 Architecture Programmer’s Manual, Volume 4, …culikzde/amd/26568.pdf · AMD64 Technology AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit Media Instructions Publication

130 HSUBPS

AMD 64-Bit Technology 26568—Rev. 3.07—December 2005

HSUBPS Horizontal Subtract Packed Single

Subtracts the packed single-precision floating-point value in the second doubleword of the destination XMM register from that in the first doubleword of the destination register and stores the difference in the first doubleword of the destination register; subtracts the value in the fourth doubleword of the destination register from that in the third doubleword of the destination register and stores the result in the second doubleword of the destination register; subtracts the value in the second doubleword of the source XMM register or 128-bit memory operand from the first doubleword of the source operand and stores the result in the third doubleword of the destination XMM register; subtracts the single-precision floating-point value in the fourth doubleword of the source operand from the third doubleword of the source operand and stores the result in the fourth doubleword of the destination XMM register.
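As a compact restatement of the four subtractions, here is a hedged Python sketch. It assumes each operand is a tuple of four floats ordered from the first doubleword (taken here to be bits 31–0) to the fourth (bits 127–96), and that the inputs are already exactly representable in single precision. The helper f32 and the name hsubps are illustrative; exceptions and MXCSR rounding control are not modeled.

```python
import struct


def f32(x):
    """Round a Python float to the nearest single-precision value."""
    # struct.pack raises OverflowError if x is too large for the format.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def hsubps(dest, src):
    """Illustrative model of the HSUBPS lane arithmetic (not an API)."""
    d0, d1, d2, d3 = dest
    s0, s1, s2, s3 = src
    return (
        f32(d0 - d1),  # first result:  dest first  minus dest second
        f32(d2 - d3),  # second result: dest third  minus dest fourth
        f32(s0 - s1),  # third result:  src first   minus src second
        f32(s2 - s3),  # fourth result: src third   minus src fourth
    )


print(hsubps((4.0, 1.0, 8.0, 3.0), (1.5, 0.5, 2.0, 6.0)))
# (3.0, 5.0, 1.0, -4.0)
```

The lower half of the result comes entirely from the destination operand and the upper half entirely from the source operand.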

The HSUBPS instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

HSUBPS xmm1, xmm2/mem128 F2 0F 7D /r Subtracts the second 32 bits of the destination operand from the first 32 bits of the destination operand and stores the difference in the first doubleword of the destination operand; subtracts the fourth 32 bits of the destination operand from the third 32 bits of the destination operand and stores the difference in the second doubleword of the destination operand; subtracts the second 32 bits of the source operand from the first 32 bits of the source operand and stores the difference in the third doubleword of the destination operand; subtracts the fourth 32 bits of the source operand from the third 32 bits of the source operand and stores the difference in the fourth doubleword of the destination operand.

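[Figure: HSUBPS operation. The four doublewords (bits 127–96, 95–64, 63–32, and 31–0) of xmm1 and of xmm2/mem128 feed four sub operations; the four differences are written to xmm1.]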


Related Instructions

HSUBPD, HADDPD, HADDPS

rFLAGS Affected

None

MXCSR Flags Affected

FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                               M    M    M         M    M
15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X +infinity was subtracted from +infinity.

X X X –infinity was subtracted from –infinity.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Overflow exception (OE) X X X A rounded result was too large to fit into the format of the destination operand.

Underflow exception (UE) X X X A rounded result was too small to fit into the format of the destination operand.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.

LDDQU Load Unaligned Double Quadword

Moves an unaligned 128-bit (double quadword) value from a 128-bit memory location to a destination XMM register.

Like the MOVUPD instruction, the LDDQU instruction loads a 128-bit operand from an unaligned memory location. However, to improve performance when the memory operand is actually misaligned, LDDQU may read an aligned 16 bytes to get the first part of the operand, and an aligned 16 bytes to get the second part of the operand. This behavior is implementation-specific; an implementation may instead read only the exact 16 bytes needed for the memory operand. If the memory operand is in a memory range where reading extra bytes can cause performance or functional issues, use the MOVUPD instruction instead of LDDQU.
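To make the "two aligned reads" strategy concrete, the sketch below models it in Python. It is only an illustration of one behavior the text permits, not a description of any particular processor: memory is assumed to be a bytes-like object standing in for linear memory, and the function name is hypothetical. The second return value reports which bytes the model touched.

```python
def lddqu_via_aligned_reads(memory, addr):
    """Model one possible LDDQU strategy: two aligned 16-byte reads.

    Returns (value, touched), where value is the 16-byte operand and
    touched is the half-open byte range the model read from.
    """
    base = addr & ~0xF                   # round down to a 16-byte boundary
    if base == addr:                     # aligned operand: one read suffices
        return bytes(memory[addr:addr + 16]), (addr, addr + 16)
    window = memory[base:base + 32]      # two adjacent aligned 16-byte blocks
    offset = addr - base
    return bytes(window[offset:offset + 16]), (base, base + 32)


memory = bytes(range(64))
value, touched = lddqu_via_aligned_reads(memory, 5)
print(value == memory[5:21], touched)  # True (0, 32)
```

In the misaligned case the touched range (32 bytes) is wider than the operand (16 bytes), which is the situation the text has in mind when it recommends MOVUPD for memory ranges where extra reads can cause problems.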

Memory operands that are not aligned on a 16-byte boundary do not cause a general-protection exception.

The LDDQU instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MOVDQU

rFLAGS Affected

None

Mnemonic Opcode Description

LDDQU xmm1, mem128 F2 0F F0 /r Moves a 128-bit value from an unaligned 128-bit memory location to the destination XMM register.

[Figure: LDDQU operation. Bits 127–0 of mem128 are copied to bits 127–0 of xmm1.]


MXCSR Flags Affected

None

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned-memory reference was performed while alignment checking was enabled.

LDMXCSR Load MXCSR Control/Status Register

Loads the MXCSR register with a 32-bit value from memory. The least-significant bit of the memory location is loaded in bit 0 of MXCSR. Bits 31–16 of the MXCSR are reserved and must be zero. A general-protection exception occurs if the LDMXCSR instruction attempts to load non-zero values into MXCSR bits 31–16.
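The reserved-bit rule can be expressed as a simple mask test. The Python sketch below is a software model under stated assumptions, not processor behavior in full: GeneralProtectionFault is a hypothetical stand-in for #GP, and decode_mxcsr uses the bit positions shown in the MXCSR flag table for this instruction (IE at bit 0 through FZ at bit 15, with RC occupying bits 14–13).

```python
MXCSR_RESERVED_MASK = 0xFFFF0000  # bits 31-16 must be zero


class GeneralProtectionFault(Exception):
    """Stand-in for #GP in this model (hypothetical name)."""


def ldmxcsr(value):
    """Return the new MXCSR contents, or raise if reserved bits are set."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("operand must be a 32-bit value")
    if value & MXCSR_RESERVED_MASK:
        raise GeneralProtectionFault("MXCSR bits 31-16 are non-zero")
    return value


def decode_mxcsr(mxcsr):
    """Split MXCSR bits 15-0 into named fields."""
    names = ["IE", "DE", "ZE", "OE", "UE", "PE", "DAZ",
             "IM", "DM", "ZM", "OM", "UM", "PM"]          # bits 0..12
    fields = {name: (mxcsr >> bit) & 1 for bit, name in enumerate(names)}
    fields["RC"] = (mxcsr >> 13) & 0b11                    # bits 14-13
    fields["FZ"] = (mxcsr >> 15) & 1                       # bit 15
    return fields


# 0x1F80 has bits 12-7 set, i.e. all six exception mask bits.
fields = decode_mxcsr(ldmxcsr(0x1F80))
print(fields["IM"], fields["PM"], fields["RC"], fields["FZ"])  # 1 1 0 0
```

Checking the value against the mask before writing it mirrors the text: any one bit in positions 31–16 is enough to fault, regardless of the low 16 bits.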

The MXCSR register is described in “Registers” in Volume 1.

The LDMXCSR instruction is an SSE instruction; check the status of EDX bit 25 returned by CPUID standard function 1 to verify that the processor supports this function. (See “CPUID” in Volume 3.)
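A feature check of this kind reduces to testing single bits. The sketch below assumes the caller has already obtained the ECX and EDX values returned by CPUID standard function 1 (executing CPUID is platform-specific and not shown). The SSE bit is the one named above; the SSE2 (EDX bit 26) and SSE3 (ECX bit 0) positions are those cited in the exception tables of this volume. The function name is illustrative.

```python
def simd_feature_flags(ecx, edx):
    """Decode SSE-family feature bits from CPUID standard function 1."""
    return {
        "SSE": bool((edx >> 25) & 1),   # EDX bit 25
        "SSE2": bool((edx >> 26) & 1),  # EDX bit 26
        "SSE3": bool(ecx & 1),          # ECX bit 0
    }


# Example register values chosen for illustration only.
print(simd_feature_flags(ecx=0x00000001, edx=(1 << 25) | (1 << 26)))
# {'SSE': True, 'SSE2': True, 'SSE3': True}
```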

Related Instructions

STMXCSR

rFLAGS Affected

None

Mnemonic Opcode Description

LDMXCSR mem32 0F AE /2 Loads MXCSR register with 32-bit value in memory.

MXCSR Flags Affected

FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
M    M  M   M    M    M    M    M    M    M    M    M    M    M    M    M
15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X Ones were written to the reserved bits in MXCSR.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

MASKMOVDQU Masked Move Double Quadword Unaligned

Stores bytes from the first source operand, as selected by the sign bit (most-significant bit) of each corresponding byte in the second source operand (sign bit 0 = no write, sign bit 1 = write), to a memory location specified in the DS:rDI registers. The first source operand is an XMM register, and the second source operand is another XMM register. The store address may be unaligned.

A mask value of all 0s results in the following behavior:

No data is written to memory.

Code and data breakpoints are not guaranteed to be signaled in all implementations.

Exceptions associated with memory addressing and page faults are not guaranteed to be signaled in all implementations.
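The byte-selection rule can be sketched as follows. This Python model covers only which bytes are written; it assumes memory is a bytearray standing in for the segment addressed through DS:rDI, that byte i of the register maps to address addr + i, and it does not model write-combining, ordering, breakpoints, or faults. The function name mirrors the mnemonic for readability and is not an API.

```python
def maskmovdqu(memory, addr, data, mask):
    """Write data[i] to memory[addr + i] wherever mask[i] has bit 7 set."""
    if len(data) != 16 or len(mask) != 16:
        raise ValueError("data and mask must each be 16 bytes")
    for i in range(16):
        if mask[i] & 0x80:          # sign bit of the mask byte selects the write
            memory[addr + i] = data[i]


memory = bytearray(24)
data = bytes(range(0x10, 0x20))
mask = bytes([0x80, 0x00] * 8)      # select even-numbered bytes only
maskmovdqu(memory, 3, data, mask)   # unaligned store address
print(memory[3:7].hex())            # 10001200
```

With an all-zero mask the loop writes nothing, which matches the first item in the list above.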

MASKMOVDQU implicitly uses weakly-ordered, write-combining buffering for the data, as described in “Buffering and Combining Memory Writes” in Volume 2. For data that is shared by multiple processors, this instruction should be used together with a fence instruction in order to ensure data coherency (refer to “Cache and TLB Management” in Volume 2).

The MASKMOVDQU instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

MASKMOVDQU xmm1, xmm2 66 0F F7 /r Store bytes from an XMM register selected by a mask value in another XMM register to DS:rDI.

[Figure: MASKMOVDQU operation. Bytes of xmm1 (bits 127–0) are selected by the corresponding bytes of the mask operand (bits 127–0, labeled xmm2/mem128 in the figure) and stored to memory at the store address DS:rDI.]

Related Instructions

MASKMOVQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

MAXPD Maximum Packed Double-Precision Floating-Point

Compares each of the two packed double-precision floating-point values in the first source operand with the corresponding packed double-precision floating-point value in the second source operand and writes the numerically greater of the two values for each comparison in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.
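One way to model the zero and NaN rules is a single strict greater-than comparison per lane, which is consistent with both special cases stated above. The Python sketch below assumes invalid-operation exceptions are masked, models operands as (low, high) tuples of floats, and uses illustrative names; MXCSR flag updates are not modeled.

```python
import math


def max_lane(first, second):
    """Return first only when it compares strictly greater than second."""
    # The comparison is False when either operand is a NaN and when both
    # operands are zeros of either sign, so those cases yield second.
    return first if first > second else second


def maxpd(dest, src):
    """Apply max_lane to each quadword pair; returns the new destination."""
    return tuple(max_lane(a, b) for a, b in zip(dest, src))


print(maxpd((1.0, 0.0), (2.0, -0.0)))            # (2.0, -0.0)
print(maxpd((math.nan, 3.0), (4.0, math.nan)))   # (4.0, nan)
```

The result is therefore not symmetric in its operands: swapping the sources can change the outcome when a NaN or a pair of zeros is involved.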

The MAXPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MAXPS, MAXSD, MAXSS, MINPD, MINPS, MINSD, MINSS

rFLAGS Affected

None

Mnemonic Opcode Description

MAXPD xmm1, xmm2/mem128 66 0F 5F /r Compares two pairs of packed double-precision values in an XMM register and another XMM register or 128-bit memory location and writes the greater value of each comparison in the destination XMM register.

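[Figure: MAXPD operation. Each quadword of xmm1 (bits 127–64 and 63–0) is compared with the corresponding quadword of xmm2/mem128; the maximum of each pair is written to xmm1.]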

MXCSR Flags Affected

FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                   M    M
15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

MAXPS Maximum Packed Single-Precision Floating-Point

Compares each of the four packed single-precision floating-point values in the first source operand with the corresponding packed single-precision floating-point value in the second source operand and writes the numerically greater of the two values for each comparison in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

The MAXPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MAXPD, MAXSD, MAXSS, MINPD, MINPS, MINSD, MINSS

Mnemonic Opcode Description

MAXPS xmm1, xmm2/mem128 0F 5F /r Compares four pairs of packed single-precision values in an XMM register and another XMM register or 128-bit memory location and writes the maximum value of each comparison in the destination XMM register.

[Figure: MAXPS operation. Each doubleword of xmm1 (bits 127–96, 95–64, 63–32, and 31–0) is compared with the corresponding doubleword of xmm2/mem128; the maximum of each pair is written to xmm1.]


rFLAGS Affected

None

MXCSR Flags Affected

FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                   M    M
15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

MAXSD Maximum Scalar Double-Precision Floating-Point

Compares the double-precision floating-point value in the low-order 64 bits of the first source operand with the double-precision floating-point value in the low-order 64 bits of the second source operand and writes the numerically greater of the two values in the low-order quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 64-bit memory location. The high-order quadword of the destination XMM register is not modified.

If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.
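The scalar form differs from the packed form mainly in what it leaves untouched. The short Python sketch below illustrates that point under the same modeling assumptions as a per-lane strict comparison: the destination is a (low, high) tuple of floats, the second source contributes only its low quadword, and invalid-operation exceptions are assumed masked. The name maxsd is illustrative.

```python
def maxsd(dest, src_low):
    """Model MAXSD: update the low quadword, pass the high one through."""
    dest_low, dest_high = dest
    # Strict comparison: ties, zeros, and NaNs select the second source.
    new_low = dest_low if dest_low > src_low else src_low
    return (new_low, dest_high)


print(maxsd((1.0, 7.0), 3.0))  # (3.0, 7.0)
print(maxsd((4.0, 7.0), 3.0))  # (4.0, 7.0)
```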

The MAXSD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MAXPD, MAXPS, MAXSS, MINPD, MINPS, MINSD, MINSS

rFLAGS Affected

None

Mnemonic Opcode Description

MAXSD xmm1, xmm2/mem64 F2 0F 5F /r Compares scalar double-precision values in an XMM register and another XMM register or 64-bit memory location and writes the greater of the two values in the destination XMM register.

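[Figure: MAXSD operation. The low quadword (bits 63–0) of xmm1 is compared with the low quadword of xmm2/mem64; the maximum is written to bits 63–0 of xmm1.]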

MXCSR Flags Affected

FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                   M    M
15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

MAXSS Maximum Scalar Single-Precision Floating-Point

Compares the single-precision floating-point value in the low-order 32 bits of the first source operand with the single-precision floating-point value in the low-order 32 bits of the second source operand and writes the numerically greater of the two values in the low-order 32 bits of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 32-bit memory location. The three high-order doublewords of the destination XMM register are not modified.

If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

The MAXSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MAXPD, MAXPS, MAXSD, MINPD, MINPS, MINSD, MINSS, PFMAX

rFLAGS Affected

None

Mnemonic Opcode Description

MAXSS xmm1, xmm2/mem32 F3 0F 5F /r Compares scalar single-precision floating-point values in an XMM register and another XMM register or 32-bit memory location and writes the greater of the two values in the destination XMM register.

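[Figure: MAXSS operation. The low doubleword (bits 31–0) of xmm1 is compared with the low doubleword of xmm2/mem32; the maximum is written to bits 31–0 of xmm1.]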

MXCSR Flags Affected

FZ   RC     PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
                                                                   M    M
15   14 13  12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

MINPD Minimum Packed Double-Precision Floating-Point

Compares each of the two packed double-precision floating-point values in the first source operand with the corresponding packed double-precision floating-point value in the second source operand and writes the numerically lesser of the two values for each comparison in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 128-bit memory location.

If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.
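These rules mean the instruction does not behave like a typical library minimum. The Python sketch below models each lane with a strict less-than comparison, which is consistent with the zero and NaN cases stated above, and contrasts it with Python's built-in min. It assumes invalid-operation exceptions are masked and operands are (low, high) tuples of floats; the names are illustrative.

```python
import math


def min_lane(first, second):
    """Return first only when it compares strictly less than second."""
    # False for NaN operands and for zeros of either sign, so those
    # cases yield the second source operand.
    return first if first < second else second


def minpd(dest, src):
    """Apply min_lane to each quadword pair; returns the new destination."""
    return tuple(min_lane(a, b) for a, b in zip(dest, src))


print(minpd((1.0, -0.0), (2.0, 0.0)))   # (1.0, 0.0)
print(min_lane(1.0, math.nan))          # nan  (second source wins)
print(min(1.0, math.nan))               # 1.0  (built-in keeps the first)
```

Code that needs NaN-ignoring or sign-aware zero handling has to add that logic around the instruction, or choose the operand order so that the preferred fallback value is the second source.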

The MINPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MAXPD, MAXPS, MAXSD, MAXSS, MINPS, MINSD, MINSS

rFLAGS Affected

None

Mnemonic Opcode Description

MINPD xmm1, xmm2/mem128 66 0F 5D /r Compares two pairs of packed double-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and writes the lesser value of each comparison in the destination XMM register.

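[Figure: MINPD operation. Each quadword of xmm1 (bits 127–64 and 63–0) is compared with the corresponding quadword of xmm2/mem128; the minimum of each pair is written to xmm1.]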

Page 179: AMD64 Architecture Programmer’s Manual, Volume 4, …culikzde/amd/26568.pdf · AMD64 Technology AMD64 Architecture Programmer’s Manual Volume 4: 128-Bit Media Instructions Publication

MINPD 149

26568—Rev. 3.07—December 2005 AMD 64-Bit Technology

MXCSR Flags Affected

Exceptions

FZ RC PM UM OM ZM DM IM DAZ PE UE OE ZE DE IE

M M

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exception RealVirtual8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0.See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.See SIMD Floating-Point Exceptions for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.
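The two status flags marked M in the MXCSR table can be modeled in a few lines. The following is an illustrative Python sketch, not AMD reference code. The helper names `is_denormal` and `minpd_status_flags` are hypothetical, and the model assumes DAZ = 0 and ignores any priority between exceptions raised by the same comparison. The bit positions (IE = bit 0, DE = bit 1) are taken from the MXCSR table.

```python
import math
import struct

MXCSR_IE = 1 << 0  # invalid-operation flag, bit 0 in the MXCSR table
MXCSR_DE = 1 << 1  # denormalized-operand flag, bit 1 in the MXCSR table


def is_denormal(x: float) -> bool:
    """True if x is a double-precision denormal (zero exponent, nonzero fraction)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0 and fraction != 0


def minpd_status_flags(src1, src2) -> int:
    """Return the MXCSR status bits that this model sets for a MINPD.

    src1 and src2 are sequences of two Python floats (the two quadwords).
    Assumes DAZ = 0 (a simplification made for this sketch).
    """
    flags = 0
    for x in list(src1) + list(src2):
        if math.isnan(x):
            flags |= MXCSR_IE  # a source operand was an SNaN or QNaN
        elif is_denormal(x):
            flags |= MXCSR_DE  # a source operand was a denormal
    return flags
```

Whether a set flag turns into a fault depends on the corresponding mask bit (IM or DM) and on CR4.OSXMMEXCPT, as listed in the exception table.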

MINPS Minimum Packed Single-Precision Floating-Point

The MINPS instruction compares each of the four packed single-precision floating-point values in the first source operand with the corresponding packed single-precision floating-point value in the second source operand and writes the numerically lesser of the two values for each comparison in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 128-bit memory location.

If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

The MINPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
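The selection rule described above, including the zero and NaN cases, can be sketched as follows. This is an illustrative Python model with hypothetical function names (`min_lane`, `minps`). It assumes invalid-operation exceptions are masked and uses Python floats (doubles) to stand in for single-precision values, which does not change the ordering of values that are representable in single precision.

```python
import math


def min_lane(a: float, b: float) -> float:
    """One MINPS comparison: a comes from the first source, b from the second."""
    if math.isnan(a) or math.isnan(b):
        return b  # a NaN is involved (exceptions masked): second source
    if a == 0.0 and b == 0.0:
        return b  # both operands are zero, of either sign: second source
    return a if a < b else b


def minps(src1, src2):
    """Model MINPS on two sequences of four floats; returns the new destination."""
    return [min_lane(a, b) for a, b in zip(src1, src2)]
```

One consequence of the rule is that the operation is not commutative: `min_lane(0.0, -0.0)` returns `-0.0`, while `min_lane(-0.0, 0.0)` returns `0.0`.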

Related Instructions

MAXPD, MAXPS, MAXSD, MAXSS, MINPD, MINSD, MINSS, PFMIN

Mnemonic Opcode Description

MINPS xmm1, xmm2/mem128 0F 5D /r Compares four pairs of packed single-precision values in an XMM register and another XMM register or 128-bit memory location and writes the numerically lesser value of each comparison in the destination XMM register.

Figure (minps.eps): each doubleword of xmm1 (bits 127-96, 95-64, 63-32, and 31-0) is compared with the corresponding doubleword of xmm2/mem128, and the minimum of each pair is written to xmm1.

rFLAGS Affected

None

MXCSR Flags Affected

Bit    15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                                          M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

MINSD Minimum Scalar Double-Precision Floating-Point

Compares the double-precision floating-point value in the low-order 64 bits of the first source operand with the double-precision floating-point value in the low-order 64 bits of the second source operand and writes the numerically lesser of the two values in the low-order 64 bits of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 64-bit memory location. The high-order quadword of the destination XMM register is not modified.

If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

The MINSD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MAXPD, MAXPS, MAXSD, MAXSS, MINPD, MINPS, MINSS

rFLAGS Affected

None

Mnemonic Opcode Description

MINSD xmm1, xmm2/mem64 F2 0F 5D /r Compares scalar double-precision floating-point values in an XMM register and another XMM register or 64-bit memory location and writes the lesser of the two values in the destination XMM register.

Figure (minsd.eps): the low-order quadwords (bits 63-0) of xmm1 and xmm2/mem64 are compared, and the minimum is written to bits 63-0 of xmm1.

MXCSR Flags Affected

Bit    15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                                          M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

MINSS Minimum Scalar Single-Precision Floating-Point

Compares the single-precision floating-point value in the low-order 32 bits of the first source operand with the single-precision floating-point value in the low-order 32 bits of the second source operand and writes the numerically lesser of the two values in the low-order 32 bits of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or a 32-bit memory location. The three high-order doublewords of the destination XMM register are not modified.

If both source operands are equal to zero, the value in the second source operand is returned. If either operand is a NaN (SNaN or QNaN), and invalid-operation exceptions are masked, the second source operand is written to the destination.

The MINSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
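The scalar behavior, in which only the low doubleword can change, can be sketched at the bit level. The Python model below is illustrative only: register images are held as 128-bit Python integers, the names `bits_to_float` and `minss` are hypothetical, and invalid-operation exceptions are assumed to be masked.

```python
import math
import struct

MASK32 = (1 << 32) - 1


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def minss(dest: int, src: int) -> int:
    """Model MINSS on 128-bit register images held as Python ints.

    Only bits 31-0 of dest can change; bits 127-32 pass through untouched.
    """
    a_bits, b_bits = dest & MASK32, src & MASK32
    a, b = bits_to_float(a_bits), bits_to_float(b_bits)
    if math.isnan(a) or math.isnan(b) or (a == 0.0 and b == 0.0):
        low = b_bits  # NaN or two zeros: the second source is returned
    else:
        low = a_bits if a < b else b_bits
    return (dest & ~MASK32) | low
```

Working on bit patterns rather than floats makes it explicit that the three high-order doublewords of the destination are copied through unchanged.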

Related Instructions

MAXPD, MAXPS, MAXSD, MAXSS, MINPD, MINPS, MINSD

rFLAGS Affected

None

Mnemonic Opcode Description

MINSS xmm1, xmm2/mem32 F3 0F 5D /r Compares scalar single-precision floating-point values in an XMM register and another XMM register or 32-bit memory location and writes the lesser of the two values in the destination XMM register.

Figure (minss.eps): the low-order doublewords (bits 31-0) of xmm1 and xmm2/mem32 are compared, and the minimum is written to bits 31-0 of xmm1.

MXCSR Flags Affected

Bit    15  14-13  12  11  10  9   8   7   6    5   4   3   2   1   0
Flag   FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                                          M   M

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN or QNaN value.

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.
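The exception table distinguishes two outcomes for an unmasked SIMD floating-point exception: #UD when CR4.OSXMMEXCPT = 0 and #XF when CR4.OSXMMEXCPT = 1. The following Python sketch illustrates that decision under stated assumptions. The function names are hypothetical, `raised` holds the exception flags in bits 5-0, and the relationship between flag and mask bits (each mask sits seven positions above its flag) is read off the MXCSR layout shown above.

```python
def unmasked_exceptions(mxcsr: int, raised: int) -> int:
    """Return the raised exception bits (bits 5-0) whose mask bits are clear.

    In the MXCSR layout, the mask bits IM..PM occupy bits 7-12 and the
    flag bits IE..PE occupy bits 0-5, in the same order.
    """
    masks = (mxcsr >> 7) & 0x3F
    return raised & ~masks & 0x3F


def simd_fp_fault(mxcsr: int, raised: int, osxmmexcpt: bool):
    """Name the fault taken for a SIMD floating-point exception, or None."""
    if unmasked_exceptions(mxcsr, raised) == 0:
        return None  # every raised exception is masked: no fault
    return "#XF" if osxmmexcpt else "#UD"
```

For example, with all six mask bits set (`mxcsr = 0x1F80`) no fault is reported for any combination of raised flags.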

MOVAPD Move Aligned Packed Double-Precision Floating-Point

Moves two packed double-precision floating-point values:

from an XMM register or 128-bit memory location to another XMM register, or

from an XMM register to another XMM register or 128-bit memory location.

A memory operand that is not aligned on a 16-byte boundary causes a general-protection exception.

The MOVAPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
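The alignment requirement can be illustrated with a small model of an aligned load. This is a Python sketch under assumptions: memory is a flat `bytes` object, the `GeneralProtection` class is a hypothetical stand-in for the #GP fault, and segment-limit and paging checks are left out.

```python
class GeneralProtection(Exception):
    """Stands in for the #GP fault in this model."""


def movapd_load(memory: bytes, address: int) -> bytes:
    """Return the 16 bytes at address, as MOVAPD xmm1, mem128 would read them."""
    if address % 16 != 0:
        # The memory operand is not aligned on a 16-byte boundary.
        raise GeneralProtection(f"address {address:#x} is not 16-byte aligned")
    return memory[address:address + 16]
```

An address is 16-byte aligned exactly when its four low-order bits are zero, so `address & 0xF == 0` is an equivalent test.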

Mnemonic Opcode Description

MOVAPD xmm1, xmm2/mem128 66 0F 28 /r Moves packed double-precision floating-point value from an XMM register or 128-bit memory location to an XMM register.

MOVAPD xmm1/mem128, xmm2 66 0F 29 /r Moves packed double-precision floating-point value from an XMM register to an XMM register or 128-bit memory location.

Figure (movapd.eps): both quadwords (bits 127-64 and 63-0) are copied from xmm2/mem128 to xmm1 in the first form, and from xmm2 to xmm1/mem128 in the second form.

Related Instructions

MOVHPD, MOVLPD, MOVMSKPD, MOVSD, MOVUPD

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

MOVAPS Move Aligned Packed Single-Precision Floating-Point

Moves four packed single-precision floating-point values:

from an XMM register or 128-bit memory location to another XMM register, or

from an XMM register to another XMM register or 128-bit memory location.

The MOVAPS instruction is an SSE instruction; check the status of EDX bit 25 returned by CPUID standard function 1 to verify that the processor supports this function. (See “CPUID” in Volume 3.)
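The feature-bit check can be sketched as a decoder for the register values returned by CPUID standard function 1. This Python fragment is illustrative: `media_features` is a hypothetical name, the EDX and ECX values are assumed to have been obtained elsewhere, and the bit positions are the ones cited in the exception tables of this volume (MMX = EDX bit 23, SSE = EDX bit 25, SSE2 = EDX bit 26, SSE3 = ECX bit 0).

```python
def media_features(edx: int, ecx: int) -> dict:
    """Decode the media feature bits from CPUID standard function 1."""
    return {
        "MMX": bool((edx >> 23) & 1),
        "SSE": bool((edx >> 25) & 1),
        "SSE2": bool((edx >> 26) & 1),
        "SSE3": bool((ecx >> 0) & 1),
    }
```

Software would consult the "SSE" entry before executing MOVAPS, since the exception table lists a clear feature bit as a cause of #UD.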

A memory operand that is not aligned on a 16-byte boundary causes a general-protection exception.

Mnemonic Opcode Description

MOVAPS xmm1, xmm2/mem128 0F 28 /r Moves aligned packed single-precision floating-point value from an XMM register or 128-bit memory location to the destination XMM register.

MOVAPS xmm1/mem128, xmm2 0F 29 /r Moves aligned packed single-precision floating-point value from an XMM register to the destination XMM register or 128-bit memory location.

Related Instructions

MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVSS, MOVUPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Figure (movaps.eps): all four doublewords (bits 127-96, 95-64, 63-32, and 31-0) are copied from xmm2/mem128 to xmm1 in the first form, and from xmm2 to xmm1/mem128 in the second form.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

MOVD Move Doubleword or Quadword

Moves a 32-bit or 64-bit value in one of the following ways:

from a 32-bit or 64-bit general-purpose register or memory location to the low-order 32 or 64 bits of an XMM register, with zero-extension to 128 bits

from the low-order 32 or 64 bits of an XMM register to a 32-bit or 64-bit general-purpose register or memory location

from a 32-bit or 64-bit general-purpose register or memory location to the low-order 32 bits (with zero-extension to 64 bits) or the full 64 bits of an MMX register

from the low-order 32 or the full 64 bits of an MMX register to a 32-bit or 64-bit general-purpose register or memory location

The MOVD forms that use an XMM register are SSE2 instructions, and the forms that use an MMX register are MMX instructions. The presence of each instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
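The XMM forms listed above can be sketched as follows. This is an illustrative Python model in which a register image is a 128-bit integer; the function names and the `width` parameter are hypothetical conveniences, not part of the instruction definition.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def movd_to_xmm(value: int, width: int = 32) -> int:
    """General-purpose register or memory to XMM.

    The return value is the complete new 128-bit register image: the low
    32 or 64 bits hold the value and all higher bits are zero.
    """
    mask = MASK64 if width == 64 else MASK32
    return value & mask


def movd_from_xmm(xmm: int, width: int = 32) -> int:
    """XMM to general-purpose register or memory: copy the low 32 or 64 bits."""
    mask = MASK64 if width == 64 else MASK32
    return xmm & mask
```

Note that `movd_to_xmm` does not take the previous register contents as an argument, which reflects the zero-extension to 128 bits: nothing of the old value survives.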

Mnemonic Opcode Description

MOVD xmm, reg/mem32 66 0F 6E /r Move 32-bit value from a general-purpose register or 32-bit memory location to an XMM register.

MOVD xmm, reg/mem64 66 0F 6E /r Move 64-bit value from a general-purpose register or 64-bit memory location to an XMM register.

MOVD reg/mem32, xmm 66 0F 7E /r Move 32-bit value from an XMM register to a 32-bit general-purpose register or memory location.

MOVD reg/mem64, xmm 66 0F 7E /r Move 64-bit value from an XMM register to a 64-bit general-purpose register or memory location.

Figure (movd.eps): all operations are copies. The forms shown are reg/mem64 to xmm and xmm to reg/mem64 (with REX prefix), reg/mem32 to xmm and xmm to reg/mem32, reg/mem64 to mmx and mmx to reg/mem64 (with REX prefix), and reg/mem32 to mmx and mmx to reg/mem32. Destination register bits above the copied value are filled with zeros.

Related Instructions

MOVDQA, MOVDQU, MOVDQ2Q, MOVQ, MOVQ2DQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions (All Modes)

Exception  Real  Virtual-8086  Protected  Description

Invalid opcode, #UD

X X X The MMX™ instructions are not supported, as indicated by EDX bit 23 of CPUID standard function 1.

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The instruction used XMM registers while CR4.OSFXSR=0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

x87 floating-point exception pending, #MF

X X X An x87 floating-point exception was pending and the instruction referenced an MMX register.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

MOVDDUP Move Double-Precision and Duplicate

Copies a quadword value from the source operand into both quadword halves of the XMM destination operand. The source operand may be an XMM register or the address of the least-significant byte of 64 bits of data in memory. When an XMM register is specified as the source operand, the lower 64 bits are duplicated and copied. When a memory address is specified, the 8 bytes of data at mem64 are duplicated and loaded.

The MOVDDUP instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
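The duplication can be expressed in two lines. The Python sketch below is illustrative only; `movddup` is a hypothetical function name and the source is passed as an integer whose low 64 bits hold the quadword to duplicate.

```python
MASK64 = (1 << 64) - 1


def movddup(src: int) -> int:
    """Return the 128-bit destination image for MOVDDUP.

    Only bits 63-0 of src are used; they appear in both halves of the result.
    """
    low = src & MASK64
    return (low << 64) | low
```

Any bits above bit 63 in `src` are ignored, which matches both the register form (upper quadword unused) and the memory form (only 8 bytes read).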

Related Instructions

MOVSHDUP, MOVSLDUP

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVDDUP xmm1, xmm2/mem64 F2 0F 12 /r Moves two copies of the lower 64 bits of the source XMM register or 64-bit memory operand to the lower and upper quadwords of the destination XMM register.

Figure: the low-order quadword (bits 63-0) of xmm2/mem64 is copied to both quadwords (bits 127-64 and 63-0) of xmm1.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

MOVDQ2Q Move Quadword to Quadword

Moves the low-order 64-bit value in an XMM register to a 64-bit MMX register.

The MOVDQ2Q instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MOVD, MOVDQA, MOVDQU, MOVQ, MOVQ2DQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVDQ2Q mmx, xmm F2 0F D6 /r Moves low-order 64-bit value from an XMM register to the destination MMX register.

Figure (movdq2q.eps): bits 63-0 of xmm are copied to bits 63-0 of mmx.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

General protection, #GP X The destination operand was in a non-writable segment.

x87 floating-point exception pending, #MF

X X X An exception was pending due to an x87 floating-point instruction.

MOVDQA Move Aligned Double Quadword

Moves an aligned 128-bit (double quadword) value:

from an XMM register or 128-bit memory location to another XMM register, or

from an XMM register to a 128-bit memory location or another XMM register.

A memory operand that is not aligned on a 16-byte boundary causes a general-protection exception.

The MOVDQA instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MOVD, MOVDQU, MOVDQ2Q, MOVQ, MOVQ2DQ

Mnemonic Opcode Description

MOVDQA xmm1, xmm2/mem128 66 0F 6F /r Moves 128-bit value from an XMM register or 128-bit memory location to the destination XMM register.

MOVDQA xmm1/mem128, xmm2 66 0F 7F /r Moves 128-bit value from an XMM register to the destination XMM register or 128-bit memory location.

Figure (movdqa.eps): bits 127-0 are copied from xmm2/mem128 to xmm1 in the first form, and from xmm2 to xmm1/mem128 in the second form.

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

MOVDQU Move Unaligned Double Quadword

Moves an unaligned 128-bit (double quadword) value:

from an XMM register or 128-bit memory location to another XMM register, or

from an XMM register to another XMM register or 128-bit memory location.

Memory operands that are not aligned on a 16-byte boundary do not cause a general-protection exception.

The MOVDQU instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
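An unaligned 128-bit load and store can be modeled on a byte array. The Python sketch below is illustrative: the function names are hypothetical, memory is a flat `bytearray`, values are interpreted in little-endian byte order as on x86, and the alignment-check (#AC) condition from the exception table is not modeled.

```python
def movdqu_load(memory: bytes, address: int) -> int:
    """Read 16 bytes at any address as a little-endian 128-bit value."""
    chunk = memory[address:address + 16]
    if len(chunk) != 16:
        raise IndexError("access runs past the end of the modeled memory")
    return int.from_bytes(chunk, "little")


def movdqu_store(memory: bytearray, address: int, value: int) -> None:
    """Write a 128-bit value as 16 little-endian bytes at any address."""
    if address < 0 or address + 16 > len(memory):
        raise IndexError("access runs past the end of the modeled memory")
    memory[address:address + 16] = value.to_bytes(16, "little")
```

Unlike the aligned forms, neither function tests `address % 16`; any in-range address is accepted.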

Related Instructions

MOVD, MOVDQA, MOVDQ2Q, MOVQ, MOVQ2DQ

Mnemonic Opcode Description

MOVDQU xmm1, xmm2/mem128 F3 0F 6F /r Moves 128-bit value from an XMM register or unaligned 128-bit memory location to the destination XMM register.

MOVDQU xmm1/mem128, xmm2 F3 0F 7F /r Moves 128-bit value from an XMM register to the destination XMM register or unaligned 128-bit memory location.

Figure (movdqu.eps): bits 127-0 are copied from xmm2/mem128 to xmm1 in the first form, and from xmm2 to xmm1/mem128 in the second form.

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

MOVHLPS Move Packed Single-Precision Floating-Point High to Low

Moves two packed single-precision floating-point values from the high-order 64 bits of an XMM register to the low-order 64 bits of another XMM register. The high-order 64 bits of the destination XMM register are not modified.

The MOVHLPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
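The high-to-low move can be sketched with shifts and masks. This Python fragment is illustrative; `movhlps` is a hypothetical name and both registers are modeled as 128-bit integers.

```python
MASK64 = (1 << 64) - 1


def movhlps(dest: int, src: int) -> int:
    """Return the new destination image for MOVHLPS dest, src.

    Bits 127-64 of src replace bits 63-0 of dest; bits 127-64 of dest
    are kept as they were.
    """
    high_of_src = (src >> 64) & MASK64
    return (dest & (MASK64 << 64)) | high_of_src
```

For example, with `dest = (0xD1 << 64) | 0xD0` and `src = (0x51 << 64) | 0x50`, the result keeps `0xD1` in the upper half and receives `0x51` in the lower half.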

Related Instructions

MOVAPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVSS, MOVUPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVHLPS xmm1, xmm2 0F 12 /r Moves two packed single-precision floating-point values from an XMM register to the destination XMM register.

Figure (movhlps.eps): the two doublewords in bits 127-64 of xmm2 are copied to bits 63-0 of xmm1.

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

MOVHPD Move High Packed Double-Precision Floating-Point

Moves a double-precision floating-point value:

from a 64-bit memory location to the high-order 64 bits of an XMM register, or

from the high-order 64 bits of an XMM register to a 64-bit memory location.

The low-order 64 bits of the destination XMM register are not modified.

The MOVHPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
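Both directions of the move can be sketched as follows. The Python model is illustrative: the function names are hypothetical, the XMM register is a 128-bit integer, and the memory operand is an 8-byte little-endian `bytes` object.

```python
import struct

MASK64 = (1 << 64) - 1


def movhpd_load(xmm: int, mem64: bytes) -> int:
    """MOVHPD xmm, mem64: replace bits 127-64 and keep bits 63-0."""
    high = int.from_bytes(mem64[:8], "little")
    return (high << 64) | (xmm & MASK64)


def movhpd_store(xmm: int) -> bytes:
    """MOVHPD mem64, xmm: return the 8 bytes written to memory."""
    return ((xmm >> 64) & MASK64).to_bytes(8, "little")
```

A round trip such as `movhpd_store(movhpd_load(x, struct.pack("<d", 2.5)))` returns the bytes of 2.5 regardless of the starting value `x`, while the low quadword of `x` is carried through the load unchanged.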

Related Instructions

MOVAPD, MOVLPD, MOVMSKPD, MOVSD, MOVUPD

Mnemonic Opcode Description

MOVHPD xmm, mem64 66 0F 16 /r Moves double-precision floating-point value from a 64-bit memory location to an XMM register.

MOVHPD mem64, xmm 66 0F 17 /r Moves double-precision floating-point value from an XMM register to a 64-bit memory location.

Figure (movhpd.eps): in the first form, the 64-bit value at mem64 is copied to bits 127-64 of xmm; in the second form, bits 127-64 of xmm are copied to mem64.

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception  Real  Virtual-8086  Protected  Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


MOVHPS Move High Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values:

from a 64-bit memory location to the high-order 64 bits of an XMM register, or

from the high-order 64 bits of an XMM register to a 64-bit memory location.

The low-order 64 bits of the destination XMM register are not modified.

The MOVHPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
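The exception tables in this volume name the relevant feature bits of CPUID standard function 1: EDX bit 25 for SSE, EDX bit 26 for SSE2, and ECX bit 0 for SSE3. A minimal sketch of decoding them follows. It assumes the EDX and ECX values have already been obtained by some other means; the function name `media_feature_support` is hypothetical.

```python
def media_feature_support(edx: int, ecx: int) -> dict:
    """Decode SSE/SSE2/SSE3 support from CPUID standard function 1 outputs."""
    return {
        "SSE": bool((edx >> 25) & 1),   # EDX bit 25
        "SSE2": bool((edx >> 26) & 1),  # EDX bit 26
        "SSE3": bool(ecx & 1),          # ECX bit 0
    }

# Hypothetical register values for illustration, not read from real hardware.
print(media_feature_support(edx=(1 << 25) | (1 << 26), ecx=0))
```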

Related Instructions

MOVAPS, MOVHLPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVSS, MOVUPS

Mnemonic Opcode Description

MOVHPS xmm, mem64 0F 16 /r Moves two packed single-precision floating-point values from a 64-bit memory location to an XMM register.

MOVHPS mem64, xmm 0F 17 /r Moves two packed single-precision floating-point values from an XMM register to a 64-bit memory location.

[Figure movhps.eps: the load form copies the two single-precision values in mem64 bits 31–0 and 63–32 into xmm bits 95–64 and 127–96; the store form copies xmm bits 95–64 and 127–96 into mem64 bits 31–0 and 63–32.]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


MOVLHPS Move Packed Single-Precision Floating-Point Low to High

Moves two packed single-precision floating-point values from the low-order 64 bits of an XMM register to the high-order 64 bits of another XMM register. The low-order 64 bits of the destination XMM register are not modified.

The MOVLHPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
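As an illustration of the register-to-register move described above, the following sketch models MOVLHPS on 128-bit Python integers. The function name `movlhps` is a hypothetical helper chosen for this example.

```python
MASK64 = (1 << 64) - 1  # selects bits 63-0

def movlhps(xmm1: int, xmm2: int) -> int:
    """Model of MOVLHPS xmm1, xmm2: low half of xmm2 goes to the high half of xmm1."""
    return ((xmm2 & MASK64) << 64) | (xmm1 & MASK64)

dst = (0xDDDDDDDDDDDDDDDD << 64) | 0x0123456789ABCDEF
src = (0xFFFFFFFFFFFFFFFF << 64) | 0x1122334455667788
result = movlhps(dst, src)
print(hex(result >> 64))      # low half of the source
print(hex(result & MASK64))   # low half of the destination, unchanged
```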

Related Instructions

MOVAPS, MOVHLPS, MOVHPS, MOVLPS, MOVMSKPS, MOVSS, MOVUPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVLHPS xmm1, xmm2 0F 16 /r Moves two packed single-precision floating-point values from an XMM register to another XMM register.

[Figure movlhps.eps: xmm2 bits 31–0 and 63–32 are copied into xmm1 bits 95–64 and 127–96.]

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.
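The four causes in the table above reduce to a short decision. The sketch below assumes the #UD conditions are evaluated before the #NM condition, which the table order suggests but does not state; the function name and its parameters are hypothetical.

```python
def media_instruction_fault(feature_supported: bool, cr0_em: bool,
                            cr4_osfxsr: bool, cr0_ts: bool):
    """Return '#UD', '#NM', or None for a register-only 128-bit media instruction."""
    # Invalid opcode: feature missing, CR0.EM set, or CR4.OSFXSR cleared.
    if not feature_supported or cr0_em or not cr4_osfxsr:
        return "#UD"
    # Device not available: CR0.TS set.
    if cr0_ts:
        return "#NM"
    return None

print(media_instruction_fault(True, False, True, False))  # None
print(media_instruction_fault(True, False, True, True))   # #NM
```

Instructions with a memory operand add the #SS, #GP, #PF, and (for some) #AC rows shown in their own tables; those are not modeled here.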


MOVLPD Move Low Packed Double-Precision Floating-Point

Moves a double-precision floating-point value:

from a 64-bit memory location to the low-order 64 bits of an XMM register, or

from the low-order 64 bits of an XMM register to a 64-bit memory location.

The high-order 64 bits of the destination XMM register are not modified.
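The low-half counterpart of the move can be sketched the same way. This is a minimal model on 128-bit Python integers; `movlpd_load` and `movlpd_store` are hypothetical names used only for this example.

```python
MASK64 = (1 << 64) - 1   # selects bits 63-0
HIGH64 = MASK64 << 64    # selects bits 127-64

def movlpd_load(xmm: int, mem64: int) -> int:
    """Model of MOVLPD xmm, mem64: replace bits 63-0, keep bits 127-64."""
    return (xmm & HIGH64) | (mem64 & MASK64)

def movlpd_store(xmm: int) -> int:
    """Model of MOVLPD mem64, xmm: the 64-bit value written to memory."""
    return xmm & MASK64

reg = (0x1111111111111111 << 64) | 0x2222222222222222
reg = movlpd_load(reg, 0xBBBBBBBBBBBBBBBB)
print(hex(reg >> 64))          # high half unchanged
print(hex(movlpd_store(reg)))  # 0xbbbbbbbbbbbbbbbb
```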

The MOVLPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MOVAPD, MOVHPD, MOVMSKPD, MOVSD, MOVUPD

Mnemonic Opcode Description

MOVLPD xmm, mem64 66 0F 12 /r Moves double-precision floating-point value from a 64-bit memory location to an XMM register.

MOVLPD mem64, xmm 66 0F 13 /r Moves double-precision floating-point value from an XMM register to a 64-bit memory location.

[Figure movlpd.eps: the load form copies mem64 bits 63–0 into xmm bits 63–0; the store form copies xmm bits 63–0 into mem64 bits 63–0.]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


MOVLPS Move Low Packed Single-Precision Floating-Point

Moves two packed single-precision floating-point values:

from a 64-bit memory location to the low-order 64 bits of an XMM register, or

from the low-order 64 bits of an XMM register to a 64-bit memory location.

The high-order 64 bits of the destination XMM register are not modified.

The MOVLPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

MOVLPS xmm, mem64 0F 12 /r Moves two packed single-precision floating-point values from a 64-bit memory location to an XMM register.

MOVLPS mem64, xmm 0F 13 /r Moves two packed single-precision floating-point values from an XMM register to a 64-bit memory location.

[Figure movlps.eps: the load form copies the two single-precision values in mem64 bits 31–0 and 63–32 into xmm bits 31–0 and 63–32; the store form copies xmm bits 31–0 and 63–32 into mem64 bits 31–0 and 63–32.]

Related Instructions

MOVAPS, MOVHLPS, MOVHPS, MOVLHPS, MOVMSKPS, MOVSS, MOVUPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of the control register (CR4) was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


MOVMSKPD Extract Packed Double-Precision Floating-Point Sign Mask

Moves the sign bits of two packed double-precision floating-point values in an XMM register to the two low-order bits of a 32-bit general-purpose register, with zero-extension.

The MOVMSKPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
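The sign-mask extraction can be illustrated with a small model. It assumes the XMM register is a 128-bit Python integer with the first double in bits 63–0; `pack_pd` and `movmskpd` are hypothetical helper names.

```python
import struct

def pack_pd(low: float, high: float) -> int:
    """Pack two doubles into a 128-bit integer, with `low` in bits 63-0."""
    return int.from_bytes(struct.pack("<2d", low, high), "little")

def movmskpd(xmm: int) -> int:
    """Result bit 0 is xmm bit 63, bit 1 is xmm bit 127; all other bits are 0."""
    return ((xmm >> 63) & 1) | (((xmm >> 127) & 1) << 1)

print(movmskpd(pack_pd(1.0, -2.0)))  # 2: only the high element is negative
print(movmskpd(pack_pd(-0.0, 0.0)))  # 1: negative zero has its sign bit set
```

Because the instruction copies raw sign bits, negative zero and negative NaN encodings contribute a 1 just as negative finite values do.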

Related Instructions

MOVMSKPS, PMOVMSKB

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVMSKPD reg32, xmm 66 0F 50 /r Move sign bits in an XMM register to a 32-bit general-purpose register.

[Figure movmskpd.eps: the sign bits of the two double-precision values (xmm bits 63 and 127) are copied to reg32 bits 0 and 1; reg32 bits 31–2 are cleared to 0.]

Exceptions

Exception (vector) Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.


MOVMSKPS Extract Packed Single-Precision Floating-Point Sign Mask

Moves the sign bits of four packed single-precision floating-point values in an XMM register to the four low-order bits of a 32-bit general-purpose register, with zero-extension.

The MOVMSKPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
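A minimal sketch of the four-element sign mask follows, under the assumption that the XMM register is a 128-bit Python integer with element 0 in bits 31–0. The names `pack_ps` and `movmskps` are hypothetical.

```python
import struct

def pack_ps(e0: float, e1: float, e2: float, e3: float) -> int:
    """Pack four singles into a 128-bit integer, with e0 in bits 31-0."""
    return int.from_bytes(struct.pack("<4f", e0, e1, e2, e3), "little")

def movmskps(xmm: int) -> int:
    """Result bit i is the sign bit of element i (xmm bit 32*i + 31)."""
    mask = 0
    for i in range(4):
        sign = (xmm >> (32 * i + 31)) & 1
        mask |= sign << i
    return mask

print(movmskps(pack_ps(-1.0, 2.0, -3.0, 4.0)))  # 5 (binary 0101)
```

A common use of such a mask is to branch on whether any or all elements of a packed comparison result are set.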

Related Instructions

MOVMSKPD, PMOVMSKB

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVMSKPS reg32, xmm 0F 50 /r Move sign bits in an XMM register to a 32-bit general-purpose register.

[Figure movmskps.eps: the sign bits of the four single-precision values (xmm bits 31, 63, 95, and 127) are copied to reg32 bits 0 through 3; reg32 bits 31–4 are cleared to 0.]

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.


MOVNTDQ Move Non-Temporal Double Quadword

Stores a 128-bit (double quadword) XMM register value into a 128-bit memory location. This instruction indicates to the processor that the data is non-temporal and is unlikely to be used again soon. The processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The exact method by which cache pollution is minimized depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” in Volume 1.

MOVNTDQ is weakly ordered with respect to other instructions that operate on memory. Software should use an SFENCE instruction to force strong memory ordering of MOVNTDQ with respect to other stores.

The MOVNTDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MOVNTI, MOVNTPD, MOVNTPS, MOVNTQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVNTDQ mem128, xmm 66 0F E7 /r Stores a 128-bit XMM register value into a 128-bit memory location, minimizing cache pollution.

[Figure movntdq.eps: xmm bits 127–0 are copied to mem128 bits 127–0.]

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (CR0.EM) was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0.

Device not available, #NM

X X X The task-switch bit (CR0.TS) was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from executing the instruction.
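The 16-byte alignment requirement in the table above is easy to test before issuing a non-temporal store. The sketch below models only that one #GP cause and none of the other rows; the function name `movnt_alignment_fault` is hypothetical.

```python
def movnt_alignment_fault(address: int):
    """Return '#GP' if a 128-bit store address is not 16-byte aligned, else None."""
    # Aligned addresses have their four low-order bits clear.
    return "#GP" if address & 0xF else None

print(movnt_alignment_fault(0x1000))  # None
print(movnt_alignment_fault(0x1008))  # #GP
```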


MOVNTPD Move Non-Temporal Packed Double-Precision Floating-Point

Stores two double-precision floating-point XMM register values into a 128-bit memory location. This instruction indicates to the processor that the data is non-temporal and is unlikely to be used again soon. The processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The exact method by which cache pollution is minimized depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” in Volume 1.

The MOVNTPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

MOVNTPD is weakly ordered with respect to other instructions that operate on memory. Software should use an SFENCE instruction to force strong memory ordering of MOVNTPD with respect to other stores.

Related Instructions

MOVNTDQ, MOVNTI, MOVNTPS, MOVNTQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVNTPD mem128, xmm 66 0F 2B /r Stores two packed double-precision floating-point XMM register values into a 128-bit memory location, minimizing cache pollution.

[Figure movntpd.eps: the two double-precision values in xmm bits 63–0 and 127–64 are copied to mem128 bits 63–0 and 127–64.]

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (CR0.EM) was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0.

Device not available, #NM

X X X The task-switch bit (CR0.TS) was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from executing the instruction.


MOVNTPS Move Non-Temporal Packed Single-Precision Floating-Point

Stores four single-precision floating-point XMM register values into a 128-bit memory location. This instruction indicates to the processor that the data is non-temporal and is unlikely to be used again soon. The processor treats the store as a write-combining (WC) memory write, which minimizes cache pollution. The exact method by which cache pollution is minimized depends on the hardware implementation of the instruction. For further information, see “Memory Optimization” in Volume 1.

The MOVNTPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

MOVNTPS is weakly ordered with respect to other instructions that operate on memory. Software should use an SFENCE instruction to force strong memory ordering of MOVNTPS with respect to other stores.

Related Instructions

MOVNTDQ, MOVNTI, MOVNTPD, MOVNTQ

rFLAGS Affected

None

Mnemonic Opcode Description

MOVNTPS mem128, xmm 0F 2B /r Stores four packed single-precision floating-point XMM register values into a 128-bit memory location, minimizing cache pollution.

[Figure movntps.eps: the four single-precision values in xmm bits 31–0, 63–32, 95–64, and 127–96 are copied to the corresponding bits of mem128.]

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (CR0.EM) was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (CR4.OSFXSR) was cleared to 0.

Device not available, #NM

X X X The task-switch bit (CR0.TS) was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from executing the instruction.


MOVQ Move Quadword

Moves a 64-bit value in one of the following ways:

from the low-order 64 bits of an XMM register or a 64-bit memory location to the low-order 64 bits of another XMM register, with zero-extension to 128 bits, or

from the low-order 64 bits of an XMM register either to the low-order 64 bits of another XMM register, with zero-extension to 128 bits, or to a 64-bit memory location.

The MOVQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
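The zero-extension behavior of MOVQ can be sketched on 128-bit Python integers. This is an illustration only; `movq_to_xmm` and `movq_to_mem` are hypothetical names.

```python
MASK64 = (1 << 64) - 1  # selects bits 63-0

def movq_to_xmm(src: int) -> int:
    """XMM destination: copy bits 63-0 of the source; bits 127-64 become 0."""
    return src & MASK64

def movq_to_mem(xmm: int) -> int:
    """Memory destination: the 64-bit value written is bits 63-0 of the register."""
    return xmm & MASK64

src = (0xFFFFFFFFFFFFFFFF << 64) | 0x0123456789ABCDEF
dst = movq_to_xmm(src)
print(hex(dst))        # 0x123456789abcdef
print(dst >> 64)       # 0: the upper half is cleared
```

Unlike MOVLPD, which preserves the upper half of the destination register, the register-destination forms of MOVQ always clear it.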

Related Instructions

MOVD, MOVDQA, MOVDQU, MOVDQ2Q, MOVQ2DQ

rFLAGS Affected

None

Mnemonic Opcode Description

MOVQ xmm1, xmm2/mem64 F3 0F 7E /r Moves 64-bit value from an XMM register or memory location to an XMM register.

MOVQ xmm1/mem64, xmm2 66 0F D6 /r Moves 64-bit value from an XMM register to an XMM register or memory location.

[Figure movq-128.eps: the first form copies xmm2/mem64 bits 63–0 into xmm1 bits 63–0 and clears xmm1 bits 127–64 to 0; the second form copies xmm2 bits 63–0 into xmm1/mem64 bits 63–0, clearing bits 127–64 when the destination is an XMM register.]

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


MOVQ2DQ Move Quadword to Quadword

Moves a 64-bit value from an MMX register to the low-order 64 bits of an XMM register, with zero-extension to 128 bits.

The MOVQ2DQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

MOVD, MOVDQA, MOVDQU, MOVDQ2Q, MOVQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVQ2DQ xmm, mmx F3 0F D6 /r Moves 64-bit value from an MMX™ register to an XMM register.

[Figure movq2dq.eps: mmx bits 63–0 are copied into xmm bits 63–0; xmm bits 127–64 are cleared to 0.]

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

x87 floating-point exception pending, #MF

X X X An exception was pending due to an x87 floating-point instruction.


MOVSD Move Scalar Double-Precision Floating-Point

Moves a scalar double-precision floating-point value:

from the low-order 64 bits of an XMM register or a 64-bit memory location to the low-order 64 bits of another XMM register, or

from the low-order 64 bits of an XMM register to the low-order 64 bits of another XMM register or a 64-bit memory location.

If the source operand is an XMM register, the high-order 64 bits of the destination XMM register are not modified. If the source operand is a memory location, the high-order 64 bits of the destination XMM register are cleared to all 0s.

This MOVSD instruction should not be confused with the MOVSD (move string doubleword) instruction with the same mnemonic in the general-purpose instruction set. Assemblers can distinguish the instructions by the number and type of operands.

The MOVSD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
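The difference between a register source and a memory source is the part of MOVSD that is easiest to get wrong, so a small model may help. It treats XMM registers as 128-bit Python integers; the three function names are hypothetical.

```python
MASK64 = (1 << 64) - 1   # selects bits 63-0
HIGH64 = MASK64 << 64    # selects bits 127-64

def movsd_reg_to_reg(xmm1: int, xmm2: int) -> int:
    """Register source: replace bits 63-0 of xmm1, keep its bits 127-64."""
    return (xmm1 & HIGH64) | (xmm2 & MASK64)

def movsd_mem_to_reg(mem64: int) -> int:
    """Memory source: bits 63-0 come from memory, bits 127-64 are cleared."""
    return mem64 & MASK64

def movsd_reg_to_mem(xmm2: int) -> int:
    """Memory destination: the 64-bit value written is bits 63-0 of xmm2."""
    return xmm2 & MASK64

old = (0xAAAAAAAAAAAAAAAA << 64) | 0x1
print(hex(movsd_reg_to_reg(old, 0x2) >> 64))  # upper half preserved
print(movsd_mem_to_reg(0x2) >> 64)            # 0: upper half cleared
```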

Mnemonic Opcode Description

MOVSD xmm1, xmm2/mem64 F2 0F 10 /r Moves double-precision floating-point value from an XMM register or 64-bit memory location to an XMM register.

MOVSD xmm1/mem64, xmm2 F2 0F 11 /r Moves double-precision floating-point value from an XMM register to an XMM register or 64-bit memory location.


Related Instructions

MOVAPD, MOVHPD, MOVLPD, MOVMSKPD, MOVUPD

rFLAGS Affected

None.

MXCSR Flags Affected

None.

[Figure movsd.eps: the register-to-register form copies xmm2 bits 63–0 into xmm1 bits 63–0; the load form copies mem64 bits 63–0 into xmm1 bits 63–0 and clears xmm1 bits 127–64 to 0; the store form copies xmm2 bits 63–0 into mem64 bits 63–0.]

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X The destination operand was in a non-writable segment.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


MOVSHDUP Move Single-Precision High and Duplicate

Moves two copies of the second doubleword of data in the source XMM register or 128-bit memory operand to bits 31–0 and bits 63–32 of the destination XMM register; moves two copies of the fourth doubleword of data in the source operand to bits 95–64 and bits 127–96 of the destination XMM register.

The MOVSHDUP instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
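The duplication pattern is easier to see on a list of four doublewords than in prose. The sketch below assumes the source is given as `[dw0, dw1, dw2, dw3]` with `dw0` in bits 31–0; `movshdup` is a hypothetical helper name.

```python
def movshdup(src: list) -> list:
    """Duplicate the odd-indexed doublewords: result is [dw1, dw1, dw3, dw3]."""
    return [src[1], src[1], src[3], src[3]]

print(movshdup([10, 11, 12, 13]))  # [11, 11, 13, 13]
```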

Related Instructions

MOVDDUP, MOVSLDUP

rFLAGS Affected

None.

MXCSR Flags Affected

None.

Mnemonic Opcode Description

MOVSHDUP xmm1, xmm2/mem128 F3 0F 16 /r Copies the second 32-bits from the source operand to the first and second 32-bit segments of the destination XMM register; copies the fourth 32-bits from the source operand to the third and fourth 32-bit segments of the destination XMM register.

[Figure: xmm2/mem128 bits 63–32 are copied to xmm1 bits 31–0 and 63–32; xmm2/mem128 bits 127–96 are copied to xmm1 bits 95–64 and 127–96.]

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


MOVSLDUP Move Single-Precision Low and Duplicate

Moves two copies of the first doubleword of data in the source XMM register or 128-bit memory operand to bits 31–0 and bits 63–32 of the destination XMM register; moves two copies of the third doubleword of data in the source operand to bits 95–64 and bits 127–96 of the destination XMM register.

The MOVSLDUP instruction is an SSE3 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
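As with the high-duplicate variant, the pattern can be shown on a list of four doublewords. The sketch assumes the source is `[dw0, dw1, dw2, dw3]` with `dw0` in bits 31–0; `movsldup` is a hypothetical helper name.

```python
def movsldup(src: list) -> list:
    """Duplicate the even-indexed doublewords: result is [dw0, dw0, dw2, dw2]."""
    return [src[0], src[0], src[2], src[2]]

print(movsldup([10, 11, 12, 13]))  # [10, 10, 12, 12]
```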

Related Instructions

MOVDDUP, MOVSHDUP

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

MOVSLDUP xmm1, xmm2/mem128 F3 0F 12 /r Copies the first 32-bits from the source operand to the first and second 32-bit segments of the destination XMM register; copies the third 32-bits from the source operand to the third and fourth 32-bit segments of the destination XMM register.

[Figure: xmm2/mem128 bits 31–0 are copied to xmm1 bits 31–0 and 63–32; xmm2/mem128 bits 95–64 are copied to xmm1 bits 95–64 and 127–96.]

Exceptions

Exception Real Virtual 8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE3 instructions are not supported, as indicated by ECX bit 0 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was

non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.
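The #UD and #NM conditions in the table depend only on processor state, so they can be sketched as a small decision function. This is a hedged model, not a statement of hardware priority: the function name and argument layout are hypothetical, the bit positions of CR0.EM (bit 2), CR0.TS (bit 3), and CR4.OSFXSR (bit 9) come from the AMD64 system-register definitions rather than from this page, and checking #UD before #NM is an assumption of the sketch.

```python
CPUID_ECX_SSE3 = 1 << 0  # CPUID standard function 1, ECX bit 0
CR0_EM = 1 << 2          # CR0.EM (emulate bit); position assumed from system docs
CR0_TS = 1 << 3          # CR0.TS (task-switch bit); position assumed from system docs
CR4_OSFXSR = 1 << 9      # CR4.OSFXSR; position assumed from system docs


def movsldup_state_fault(cpuid_ecx, cr0, cr4):
    """Return '#UD', '#NM', or None for the state checks in the table."""
    if not cpuid_ecx & CPUID_ECX_SSE3:
        return "#UD"  # SSE3 not supported
    if cr0 & CR0_EM:
        return "#UD"  # emulate bit set
    if not cr4 & CR4_OSFXSR:
        return "#UD"  # OS has not enabled FXSAVE/FXRSTOR support
    if cr0 & CR0_TS:
        return "#NM"  # task-switch bit set
    return None


print(movsldup_state_fault(cpuid_ecx=1, cr0=0, cr4=CR4_OSFXSR))  # None
```

Memory-related faults (#SS, #GP, #PF) depend on the operand address and are outside the scope of this sketch.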


MOVSS Move Scalar Single-Precision Floating-Point

Moves a scalar single-precision floating-point value:

from the low-order 32 bits of an XMM register to the low-order 32 bits of another XMM register or to a 32-bit memory location, or

from a 32-bit memory location to the low-order 32 bits of an XMM register, with zero-extension to 128 bits.

If the source operand is an XMM register, the high-order 96 bits of the destination XMM register are not modified. If the source operand is a memory location, the high-order 96 bits of the destination XMM register are cleared to all 0s.

The MOVSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
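The difference between the register-source and memory-source forms can be sketched in Python. The model below treats an XMM register as a list of four 32-bit lanes (index 0 is bits 31–0); the function names `movss_reg` and `movss_load` are hypothetical and exist only for this illustration.

```python
def movss_reg(dst, src):
    """Register source: lane 0 is replaced, bits 127-32 of dst are kept."""
    return [src[0]] + list(dst[1:])


def movss_load(mem32):
    """Memory source: lane 0 comes from memory, bits 127-32 are cleared."""
    return [mem32, 0, 0, 0]


print(movss_reg([1, 2, 3, 4], [9, 8, 7, 6]))  # [9, 2, 3, 4]
print(movss_load(5))                          # [5, 0, 0, 0]
```

The memory form does not need the old destination value at all, which is why `movss_load` takes only the loaded doubleword.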

Mnemonic Opcode Description

MOVSS xmm1, xmm2/mem32 F3 0F 10 /r Moves single-precision floating-point value from an XMM register or 32-bit memory location to an XMM register.

MOVSS xmm1/mem32, xmm2 F3 0F 11 /r Moves single-precision floating-point value from an XMM register to an XMM register or 32-bit memory location.


Related Instructions

MOVAPS, MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVUPS

rFLAGS Affected

None

MXCSR Flags Affected

None

[Figure movss.eps: three panels showing the low-order doubleword (bits 31–0) copied from xmm2 to xmm1, from mem32 to xmm1 with bits 127–32 cleared to 0, and from xmm2 to mem32.]

Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                                         X          The destination operand was in a non-writable segment.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                       X             X          An unaligned memory reference was performed while alignment checking was enabled.


MOVUPD Move Unaligned Packed Double-Precision Floating-Point

Moves two packed double-precision floating-point values:

from an XMM register or 128-bit memory location to another XMM register, or

from an XMM register to another XMM register or 128-bit memory location.

Memory operands that are not aligned on a 16-byte boundary do not cause a general-protection exception.

The MOVUPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
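To make the "no alignment requirement" point concrete, the sketch below reads two double-precision values from an arbitrary byte offset in a buffer. It is only a software analogy: the function name `movupd_load`, the `bytearray` standing in for memory, and the offset 3 are all assumptions of the example.

```python
import struct


def movupd_load(memory, address):
    """Read two little-endian doubles from any byte address (no alignment check)."""
    low, high = struct.unpack_from("<2d", memory, address)
    return [low, high]  # index 0 is bits 63-0, index 1 is bits 127-64


buf = bytearray(32)
struct.pack_into("<2d", buf, 3, 1.5, -2.25)  # deliberately misaligned offset
print(movupd_load(buf, 3))  # [1.5, -2.25]
```

An aligned-move model would add a check such as `address % 16 == 0` before the read; MOVUPD has no such check for the general-protection exception.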

Mnemonic Opcode Description

MOVUPD xmm1, xmm2/mem128 66 0F 10 /r Moves two packed double-precision floating-point values from an XMM register or unaligned 128-bit memory location to an XMM register.

MOVUPD xmm1/mem128, xmm2 66 0F 11 /r Moves two packed double-precision floating-point values from an XMM register to an XMM register or unaligned 128-bit memory location.


Related Instructions

MOVAPD, MOVHPD, MOVLPD, MOVMSKPD, MOVSD

rFLAGS Affected

None

MXCSR Flags Affected

None

[Figure movupd.eps: two panels showing both quadwords (bits 127–64 and bits 63–0) copied from xmm2/mem128 to xmm1, and from xmm2 to xmm1/mem128.]

Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                                         X          The destination operand was in a non-writable segment.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                       X             X          An unaligned memory reference was performed while alignment checking was enabled.


MOVUPS Move Unaligned Packed Single-Precision Floating-Point

Moves four packed single-precision floating-point values:

from an XMM register or 128-bit memory location to another XMM register, or

from an XMM register to another XMM register or 128-bit memory location.

Memory operands that are not aligned on a 16-byte boundary do not cause a general-protection exception.

The MOVUPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

MOVUPS xmm1, xmm2/mem128 0F 10 /r Moves four packed single-precision floating-point values from an XMM register or unaligned 128-bit memory location to an XMM register.

MOVUPS xmm1/mem128, xmm2 0F 11 /r Moves four packed single-precision floating-point values from an XMM register to an XMM register or unaligned 128-bit memory location.


Related Instructions

MOVAPS, MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS, MOVSS

rFLAGS Affected

None

MXCSR Flags Affected

None

[Figure movups.eps: two panels showing all four doublewords copied from xmm2/mem128 to xmm1, and from xmm2 to xmm1/mem128.]


Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                                         X          The destination operand was in a non-writable segment.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                       X             X          An unaligned memory reference was performed while alignment checking was enabled.


MULPD Multiply Packed Double-Precision Floating-Point

Multiplies each of the two packed double-precision floating-point values in the first source operand by the corresponding packed double-precision floating-point value in the second source operand and writes the result of each multiplication operation in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The MULPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
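The lane-by-lane behavior can be sketched in Python, whose `float` type is an IEEE 754 double on the usual implementations. The function name `mulpd` and the two-element list representation (index 0 is bits 63–0) are assumptions of this example, and it models only round-to-nearest results with no exception reporting.

```python
def mulpd(dst, src):
    """Multiply two packed doubles lane by lane; the result replaces dst."""
    if len(dst) != 2 or len(src) != 2:
        raise ValueError("expected two quadword lanes per operand")
    return [dst[0] * src[0], dst[1] * src[1]]


print(mulpd([1.5, -2.0], [4.0, 0.5]))  # [6.0, -1.0]
```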

Related Instructions

MULPS, MULSD, MULSS, PFMUL

rFLAGS Affected

None

Mnemonic Opcode Description

MULPD xmm1, xmm2/mem128 66 0F 59 /r Multiplies packed double-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and writes the results in the destination XMM register.

[Figure mulpd.eps: each quadword of xmm1 is multiplied by the corresponding quadword of xmm2/mem128, and the products are written to xmm1.]


MXCSR Flags Affected

FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ PE  UE  OE  ZE  DE  IE
                                       M   M   M       M   M
15  14 13  12  11  10  9   8   7   6   5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                     X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                     X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
SIMD Floating-Point Exception, #XF   X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)     X     X             X          A source operand was an SNaN value.
                                     X     X             X          ±Zero was multiplied by ±infinity.
Overflow exception (OE)              X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)             X     X             X          A rounded result was too small to fit into the format of the destination operand.
Denormalized-operand exception (DE)  X     X             X          A source operand was a denormal value.
Precision exception (PE)             X     X             X          A result could not be represented exactly in the destination format.
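The interaction between the MXCSR mask bits and CR4.OSXMMEXCPT in the rows above can be sketched as follows. The bit positions follow the MXCSR layout given for this instruction (flags IE through PE in bits 0–5, masks IM through PM in bits 7–12). The function name, the use of flag names as strings, and the example value 1F80h (a value with all six mask bits set) are assumptions of the sketch.

```python
FLAG_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5}
MASK_SHIFT = 7  # IM..PM sit in bits 7-12, in the same order as IE..PE


def simd_fp_fault(raised, mxcsr, osxmmexcpt):
    """Return '#XF', '#UD', or None for a collection of raised exception names."""
    for name in raised:
        masked = (mxcsr >> (FLAG_BITS[name] + MASK_SHIFT)) & 1
        if not masked:
            # Unmasked exception: the vector depends on CR4.OSXMMEXCPT.
            return "#XF" if osxmmexcpt else "#UD"
    return None  # every raised exception was masked


ALL_MASKED = 0x1F80
PM_UNMASKED = ALL_MASKED & ~(1 << 12)
print(simd_fp_fault(["PE"], ALL_MASKED, True))    # None
print(simd_fp_fault(["PE"], PM_UNMASKED, True))   # #XF
print(simd_fp_fault(["PE"], PM_UNMASKED, False))  # #UD
```

The sketch does not model the setting of the status flags themselves, only the choice between no fault, #XF, and #UD.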


MULPS Multiply Packed Single-Precision Floating-Point

Multiplies each of the four packed single-precision floating-point values in the first source operand by the corresponding packed single-precision floating-point value in the second source operand and writes the result of each multiplication operation in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The MULPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
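A Python model of the four-lane multiply needs one extra step, because Python floats are doubles: each product is rounded to single precision. The helper `to_single` and the function `mulps` are hypothetical names for this sketch, which assumes round-to-nearest and inputs that are already representable in single precision. The product of two singles is exact in double precision, so only one rounding takes place.

```python
import struct


def to_single(x):
    """Round a Python float to the nearest single-precision value."""
    # struct raises OverflowError if x is finite but too large for the format.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mulps(dst, src):
    """Multiply four packed singles lane by lane; the result replaces dst."""
    if len(dst) != 4 or len(src) != 4:
        raise ValueError("expected four doubleword lanes per operand")
    return [to_single(a * b) for a, b in zip(dst, src)]


print(mulps([1.5, 2.0, -3.0, 0.5], [2.0, 2.0, 2.0, 2.0]))  # [3.0, 4.0, -6.0, 1.0]
```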

Related Instructions

MULPD, MULSD, MULSS, PFMUL

rFLAGS Affected

None

Mnemonic Opcode Description

MULPS xmm1, xmm2/mem128 0F 59 /r Multiplies packed single-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and writes the results in the destination XMM register.

[Figure mulps.eps: each doubleword of xmm1 is multiplied by the corresponding doubleword of xmm2/mem128, and the products are written to xmm1.]


MXCSR Flags Affected

FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ PE  UE  OE  ZE  DE  IE
                                       M   M   M       M   M
15  14 13  12  11  10  9   8   7   6   5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                     X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                     X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
SIMD Floating-Point Exception, #XF   X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)     X     X             X          A source operand was an SNaN value.
                                     X     X             X          ±Zero was multiplied by ±infinity.
Overflow exception (OE)              X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)             X     X             X          A rounded result was too small to fit into the format of the destination operand.
Denormalized-operand exception (DE)  X     X             X          A source operand was a denormal value.
Precision exception (PE)             X     X             X          A result could not be represented exactly in the destination format.


MULSD Multiply Scalar Double-Precision Floating-Point

Multiplies the double-precision floating-point value in the low-order quadword of the first source operand by the double-precision floating-point value in the low-order quadword of the second source operand and writes the result in the low-order quadword of the destination (first source). The high-order quadword of the destination is not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location.

The MULSD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
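The memory-source form reads only 64 bits, which the following sketch makes explicit. The function name `mulsd_mem`, the byte buffer standing in for memory, and the two-element list for the XMM register (index 0 is bits 63–0) are assumptions of this example; rounding control and exception flags are not modeled.

```python
import struct


def mulsd_mem(dst, memory, address):
    """Model MULSD xmm1, mem64: only eight bytes are read from memory."""
    (src_low,) = struct.unpack_from("<d", memory, address)
    # The low quadword is replaced by the product; the high quadword is kept.
    return [dst[0] * src_low, dst[1]]


mem = struct.pack("<d", 2.0)
print(mulsd_mem([3.0, 7.0], mem, 0))  # [6.0, 7.0]
```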

Related Instructions

MULPD, MULPS, MULSS, PFMUL

rFLAGS Affected

None

Mnemonic Opcode Description

MULSD xmm1, xmm2/mem64 F2 0F 59 /r Multiplies low-order double-precision floating-point values in an XMM register and another XMM register or 64-bit memory location and writes the result in the low-order quadword of the destination XMM register.

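[Figure mulsd.eps: the low-order quadword (bits 63–0) of xmm1 is multiplied by the low-order quadword of xmm2/mem64, and the product is written to bits 63–0 of xmm1.]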


MXCSR Flags Affected

FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ PE  UE  OE  ZE  DE  IE
                                       M   M   M       M   M
15  14 13  12  11  10  9   8   7   6   5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                     X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                       X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF   X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)     X     X             X          A source operand was an SNaN value.
                                     X     X             X          ±Zero was multiplied by ±infinity.
Overflow exception (OE)              X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)             X     X             X          A rounded result was too small to fit into the format of the destination operand.
Denormalized-operand exception (DE)  X     X             X          A source operand was a denormal value.
Precision exception (PE)             X     X             X          A result could not be represented exactly in the destination format.


MULSS Multiply Scalar Single-Precision Floating-Point

Multiplies the single-precision floating-point value in the low-order doubleword of the first source operand by the single-precision floating-point value in the low-order doubleword of the second source operand and writes the result in the low-order doubleword of the destination (first source). The three high-order doublewords of the destination are not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location.

The MULSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
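The scalar form and one of its invalid-operation cases can be sketched together. In the model below, the names `mulss` and `mulss_invalid` are hypothetical, the register is a list of four lanes (index 0 is bits 31–0), and several things are deliberately left out: rounding of the product to single precision, SNaN detection, and the other exception classes. With the invalid-operation exception masked, zero times infinity yields a NaN, which is also what Python's float multiply returns.

```python
import math


def mulss_invalid(a, b):
    """True for the zero-times-infinity case of the invalid-operation exception."""
    return (a == 0.0 and math.isinf(b)) or (math.isinf(a) and b == 0.0)


def mulss(dst, src_low):
    """Return (new lanes, IE raised?); the three high-order lanes are kept."""
    ie = mulss_invalid(dst[0], src_low)
    return [dst[0] * src_low] + list(dst[1:]), ie


print(mulss([2.0, 1.0, 2.0, 3.0], 4.0))  # ([8.0, 1.0, 2.0, 3.0], False)
```

Because `-0.0 == 0.0` in Python and `math.isinf` accepts either sign, the check covers all four sign combinations of zero and infinity.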

Related Instructions

MULPD, MULPS, MULSD, PFMUL

rFLAGS Affected

None

Mnemonic Opcode Description

MULSS xmm1, xmm2/mem32 F3 0F 59 /r Multiplies low-order single-precision floating-point values in an XMM register and another XMM register or 32-bit memory location and writes the result in the low-order doubleword of the destination XMM register.

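[Figure mulss.eps: the low-order doubleword (bits 31–0) of xmm1 is multiplied by the low-order doubleword of xmm2/mem32, and the product is written to bits 31–0 of xmm1.]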


MXCSR Flags Affected

FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ PE  UE  OE  ZE  DE  IE
                                       M   M   M       M   M
15  14 13  12  11  10  9   8   7   6   5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                     X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                       X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF   X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)     X     X             X          A source operand was an SNaN value.
                                     X     X             X          ±Zero was multiplied by ±infinity.
Overflow exception (OE)              X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)             X     X             X          A rounded result was too small to fit into the format of the destination operand.
Denormalized-operand exception (DE)  X     X             X          A source operand was a denormal value.
Precision exception (PE)             X     X             X          A result could not be represented exactly in the destination format.


ORPD Logical Bitwise OR Packed Double-Precision Floating-Point

Performs a bitwise logical OR of the two packed double-precision floating-point values in the first source operand and the corresponding two packed double-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The ORPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
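Because the OR acts on bit patterns rather than on numeric values, a model has to convert each double to its 64-bit encoding first. The sketch below does this with `struct`; the function name `orpd` and the two-element list representation are assumptions of the example. Using -0.0 (only the sign bit set) as the second operand is one illustrative use, forcing the sign bit of each lane.

```python
import struct


def orpd(dst, src):
    """OR the 64-bit patterns of two packed doubles, lane by lane."""
    out = []
    for a, b in zip(dst, src):
        bits_a = struct.unpack("<Q", struct.pack("<d", a))[0]
        bits_b = struct.unpack("<Q", struct.pack("<d", b))[0]
        out.append(struct.unpack("<d", struct.pack("<Q", bits_a | bits_b))[0])
    return out


# OR with -0.0 sets the sign bit, so every lane becomes negative.
print(orpd([1.5, -2.0], [-0.0, -0.0]))  # [-1.5, -2.0]
```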

Related Instructions

ANDNPD, ANDNPS, ANDPD, ANDPS, ORPS, XORPD, XORPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

ORPD xmm1, xmm2/mem128 66 0F 56 /r Performs bitwise logical OR of two packed double-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure orpd.eps: each quadword of xmm1 is ORed with the corresponding quadword of xmm2/mem128, and the results are written to xmm1.]


Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                     X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.


ORPS Logical Bitwise OR Packed Single-Precision Floating-Point

Performs a bitwise logical OR of the four packed single-precision floating-point values in the first source operand and the corresponding four packed single-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The ORPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
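Since a bitwise OR has no carries between bits, the four-lane operation can also be modeled as a single 128-bit integer OR. The sketch below takes that shortcut; the function name `orps` and the choice of a Python integer as the 128-bit container are assumptions of the example. The sample call merges two registers whose non-zero lanes do not overlap, relying on +0.0 being encoded as all zero bits.

```python
import struct


def orps(dst, src):
    """OR two 128-bit values, each given as a sequence of four singles."""
    a = int.from_bytes(struct.pack("<4f", *dst), "little")
    b = int.from_bytes(struct.pack("<4f", *src), "little")
    return list(struct.unpack("<4f", (a | b).to_bytes(16, "little")))


print(orps([1.0, 2.0, 0.0, 0.0], [0.0, 0.0, 3.0, 4.0]))  # [1.0, 2.0, 3.0, 4.0]
```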

Related Instructions

ANDNPD, ANDNPS, ANDPD, ANDPS, ORPD, XORPD, XORPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

ORPS xmm1, xmm2/mem128 0F 56 /r Performs bitwise logical OR of four packed single-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure orps.eps: each doubleword of xmm1 is ORed with the corresponding doubleword of xmm2/mem128, and the results are written to xmm1.]


Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                     X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.


PACKSSDW Pack with Saturation Signed Doubleword to Word

Converts each 32-bit signed integer in the first and second source operands to a 16-bit signed integer and packs the converted values into words in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

Converted values from the first source operand are packed into the low-order words of the destination, and the converted values from the second source operand are packed into the high-order words of the destination.

The PACKSSDW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

For each packed value in the destination, if the value is larger than the largest signed 16-bit integer, it is saturated to 7FFFh, and if the value is smaller than the smallest signed 16-bit integer, it is saturated to 8000h.
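The conversion and packing order can be sketched in Python. The function names `saturate_s16` and `packssdw` are hypothetical, and each operand is modeled as a list of four signed 32-bit integers with index 0 holding bits 31–0. The result is a list of eight signed words, low-order word first; 7FFFh and 8000h appear as 32767 and -32768.

```python
def saturate_s16(value):
    """Clamp a signed integer to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, value))


def packssdw(dst, src):
    """Pack four doublewords from each operand into eight saturated words."""
    if len(dst) != 4 or len(src) != 4:
        raise ValueError("expected four doubleword lanes per operand")
    # Words from the first source fill the low half, the second source the high half.
    return [saturate_s16(v) for v in dst] + [saturate_s16(v) for v in src]


print(packssdw([1, -1, 70000, -70000], [32767, 32768, -32768, -32769]))
# [1, -1, 32767, -32768, 32767, 32767, -32768, -32768]
```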

Related Instructions

PACKSSWB, PACKUSWB

Mnemonic Opcode Description

PACKSSDW xmm1, xmm2/mem128 66 0F 6B /r Packs 32-bit signed integers in an XMM register and another XMM register or 128-bit memory location into 16-bit signed integers in an XMM register.

[Figure packssdw-128.eps: each doubleword of xmm1 and xmm2/mem128 is converted to a word with signed saturation; words from xmm1 fill bits 63–0 of the destination and words from xmm2/mem128 fill bits 127–64.]


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                     X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                     X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          A null data segment was used to reference memory.
                                     X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.


PACKSSWB Pack with Saturation Signed Word to Byte

Converts each 16-bit signed integer in the first and second source operands to an 8-bit signed integer and packs the converted values into bytes in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

Converted values from the first source operand are packed into the low-order bytes of the destination, and the converted values from the second source operand are packed into the high-order bytes of the destination.

The PACKSSWB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

For each packed value in the destination, if the value is larger than the largest signed 8-bit integer, it is saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 80h.
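A minimal scalar sketch of this behavior in C follows. The function names are hypothetical and the code only models the documented result; it does not use the instruction itself.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a signed 16-bit value to the signed 8-bit range. */
int8_t sat_i16_to_i8(int16_t v)
{
    if (v > INT8_MAX) return INT8_MAX;  /* saturate to 7Fh */
    if (v < INT8_MIN) return INT8_MIN;  /* saturate to 80h */
    return (int8_t)v;
}

/* Scalar model of PACKSSWB. */
void packsswb_model(int8_t dst[16], const int16_t src1[8], const int16_t src2[8])
{
    for (int i = 0; i < 8; i++) {
        dst[i]     = sat_i16_to_i8(src1[i]);  /* low-order bytes from the first source   */
        dst[i + 8] = sat_i16_to_i8(src2[i]);  /* high-order bytes from the second source */
    }
}
```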

Related Instructions

PACKSSDW, PACKUSWB

Mnemonic Opcode Description

PACKSSWB xmm1, xmm2/mem128 66 0F 63 /r Packs 16-bit signed integers in an XMM register and another XMM register or 128-bit memory location into 8-bit signed integers in an XMM register.

[Figure packsswb-128.eps: each 16-bit word in xmm1 and xmm2/mem128 is converted to an 8-bit byte in xmm1.]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PACKUSWB Pack with Saturation Signed Word to Unsigned Byte

Converts each 16-bit signed integer in the first and second source operands to an 8-bit unsigned integer and packs the converted values into bytes in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

Converted values from the first source operand are packed into the low-order bytes of the destination, and the converted values from the second source operand are packed into the high-order bytes of the destination.

The PACKUSWB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

For each packed value in the destination, if the value is larger than the largest unsigned 8-bit integer, it is saturated to FFh, and if the value is smaller than the smallest unsigned 8-bit integer, it is saturated to 00h.
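Because the inputs are signed and the outputs unsigned, negative words clamp to 00h. A scalar sketch in C, with hypothetical function names, makes this visible:

```c
#include <stdint.h>

/* Hypothetical helper: clamp a signed 16-bit value to the unsigned 8-bit range. */
uint8_t sat_i16_to_u8(int16_t v)
{
    if (v > UINT8_MAX) return UINT8_MAX;  /* saturate to FFh */
    if (v < 0) return 0;                  /* saturate to 00h */
    return (uint8_t)v;
}

/* Scalar model of PACKUSWB. */
void packuswb_model(uint8_t dst[16], const int16_t src1[8], const int16_t src2[8])
{
    for (int i = 0; i < 8; i++) {
        dst[i]     = sat_i16_to_u8(src1[i]);  /* low-order bytes from the first source   */
        dst[i + 8] = sat_i16_to_u8(src2[i]);  /* high-order bytes from the second source */
    }
}
```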

Related Instructions

PACKSSDW, PACKSSWB

Mnemonic Opcode Description

PACKUSWB xmm1, xmm2/mem128 66 0F 67 /r Packs 16-bit signed integers in an XMM register and another XMM register or 128-bit memory location into 8-bit unsigned integers in an XMM register.

[Figure packuswb-128.eps: each 16-bit word in xmm1 and xmm2/mem128 is converted to an 8-bit byte in xmm1.]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PADDB Packed Add Bytes

Adds each packed 8-bit integer value in the first source operand to the corresponding packed 8-bit integer in the second source operand and writes the integer result of each addition in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PADDB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each result are written in the destination.
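The wraparound behavior can be modeled in C as addition modulo 2^8 per byte. The sketch below uses hypothetical names and unsigned bytes; the same bit patterns result when the bytes are read as two's-complement signed values.

```c
#include <stdint.h>

/* One byte lane of PADDB: the sum is reduced modulo 2^8. */
uint8_t paddb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);  /* a + b is computed as int; the cast drops the carry */
}

/* Scalar model of PADDB over all 16 lanes; dst is also the first source. */
void paddb_model(uint8_t dst[16], const uint8_t src[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = paddb_lane(dst[i], src[i]);
}
```

For example, 7Fh + 01h yields 80h, which is 128 when read as unsigned and -128 when read as signed.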

Related Instructions

PADDD, PADDQ, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PADDB xmm1, xmm2/mem128 66 0F FC /r Adds packed byte integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure paddb-128.eps: each byte of xmm1 is added to the corresponding byte of xmm2/mem128.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PADDD Packed Add Doublewords

Adds each packed 32-bit integer value in the first source operand to the corresponding packed 32-bit integer in the second source operand and writes the integer result of each addition in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PADDD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each result are written in the destination.
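As a rough scalar model in C, each doubleword lane behaves like unsigned 32-bit addition, which wraps modulo 2^32. The function names below are hypothetical.

```c
#include <stdint.h>

/* One doubleword lane of PADDD: the carry out of bit 31 is discarded. */
uint32_t paddd_lane(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}

/* Scalar model of PADDD over four lanes; dst is also the first source. */
void paddd_model(uint32_t dst[4], const uint32_t src[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = paddd_lane(dst[i], src[i]);
}
```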

Related Instructions

PADDB, PADDQ, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PADDD xmm1, xmm2/mem128 66 0F FE /r Adds packed 32-bit integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure paddd-128.eps: each doubleword of xmm1 is added to the corresponding doubleword of xmm2/mem128.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PADDQ Packed Add Quadwords

Adds each packed 64-bit integer value in the first source operand to the corresponding packed 64-bit integer in the second source operand and writes the integer result of each addition in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PADDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each result are written in the destination.
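One consequence worth illustrating is that the two quadwords are independent: a carry out of the low quadword does not reach the high quadword, so PADDQ is not a 128-bit addition. A scalar sketch in C, with a hypothetical function name:

```c
#include <stdint.h>

/* Scalar model of PADDQ; dst is also the first source. */
void paddq_model(uint64_t dst[2], const uint64_t src[2])
{
    dst[0] += src[0];  /* a carry out of bit 63 is discarded ...        */
    dst[1] += src[1];  /* ... and is never added into the high quadword */
}
```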

Related Instructions

PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PADDQ xmm1, xmm2/mem128 66 0F D4 /r Adds packed 64-bit integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure paddq-128.eps: each quadword of xmm1 is added to the corresponding quadword of xmm2/mem128.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PADDSB Packed Add Signed with Saturation Bytes

Adds each packed 8-bit signed integer value in the first source operand to the corresponding packed 8-bit signed integer in the second source operand and writes the signed integer result of each addition in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PADDSB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

For each packed value in the destination, if the value is larger than the largest representable signed 8-bit integer, it is saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 80h.
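A scalar sketch of one lane in C: the sum is formed at wider precision and then clamped. The function name is hypothetical and the code only models the documented result.

```c
#include <stdint.h>

/* One byte lane of PADDSB: signed add with saturation. */
int8_t paddsb_lane(int8_t a, int8_t b)
{
    int sum = a + b;                      /* exact: the range is -256..254 */
    if (sum > INT8_MAX) return INT8_MAX;  /* saturate to 7Fh */
    if (sum < INT8_MIN) return INT8_MIN;  /* saturate to 80h */
    return (int8_t)sum;
}
```

Unlike PADDB, 100 + 100 gives 127 here rather than wrapping to a negative value.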

Related Instructions

PADDB, PADDD, PADDQ, PADDSW, PADDUSB, PADDUSW, PADDW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PADDSB xmm1, xmm2/mem128 66 0F EC /r Adds packed byte signed integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure paddsb-128.eps: each byte of xmm1 is added to the corresponding byte of xmm2/mem128, and each sum is saturated.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PADDSW Packed Add Signed with Saturation Words

Adds each packed 16-bit signed integer value in the first source operand to the corresponding packed 16-bit signed integer in the second source operand and writes the signed integer result of each addition in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PADDSW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

For each packed value in the destination, if the value is larger than the largest representable signed 16-bit integer, it is saturated to 7FFFh, and if the value is smaller than the smallest signed 16-bit integer, it is saturated to 8000h.
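The same clamp-after-widening idea applies to words. A scalar sketch of one lane in C, using a hypothetical function name:

```c
#include <stdint.h>

/* One word lane of PADDSW: signed add with saturation. */
int16_t paddsw_lane(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + b;           /* exact 32-bit sum */
    if (sum > INT16_MAX) return INT16_MAX;  /* saturate to 7FFFh */
    if (sum < INT16_MIN) return INT16_MIN;  /* saturate to 8000h */
    return (int16_t)sum;
}
```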

Related Instructions

PADDB, PADDD, PADDQ, PADDSB, PADDUSB, PADDUSW, PADDW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PADDSW xmm1, xmm2/mem128 66 0F ED /r Adds packed 16-bit signed integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure paddsw-128.eps: each word of xmm1 is added to the corresponding word of xmm2/mem128, and each sum is saturated.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PADDUSB Packed Add Unsigned with Saturation Bytes

Adds each packed 8-bit unsigned integer value in the first source operand to the corresponding packed 8-bit unsigned integer in the second source operand and writes the unsigned integer result of each addition in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PADDUSB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

For each packed value in the destination, if the value is larger than the largest unsigned 8-bit integer, it is saturated to FFh, and if the value is smaller than the smallest unsigned 8-bit integer, it is saturated to 00h.
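A scalar sketch of one lane in C, with a hypothetical function name. Since the sum of two unsigned bytes is never negative, only the upper bound (FFh) can be reached when adding.

```c
#include <stdint.h>

/* One byte lane of PADDUSB: unsigned add with saturation. */
uint8_t paddusb_lane(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;  /* exact: at most 510 */
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}
```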

Related Instructions

PADDB, PADDD, PADDQ, PADDSB, PADDSW, PADDUSW, PADDW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PADDUSB xmm1, xmm2/mem128 66 0F DC /r Adds packed byte unsigned integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure paddusb-128.eps: each byte of xmm1 is added to the corresponding byte of xmm2/mem128, and each sum is saturated.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PADDUSW Packed Add Unsigned with Saturation Words

Adds each packed 16-bit unsigned integer value in the first source operand to the corresponding packed 16-bit unsigned integer in the second source operand and writes the unsigned integer result of each addition in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PADDUSW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

For each packed value in the destination, if the value is larger than the largest unsigned 16-bit integer, it is saturated to FFFFh, and if the value is smaller than the smallest unsigned 16-bit integer, it is saturated to 0000h.
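A scalar sketch of one word lane in C, with a hypothetical function name; the sum is formed in 32 bits so that the overflow can be detected before clamping.

```c
#include <stdint.h>

/* One word lane of PADDUSW: unsigned add with saturation. */
uint16_t paddusw_lane(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;  /* exact: at most 131070 */
    return sum > UINT16_MAX ? UINT16_MAX : (uint16_t)sum;
}
```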

Related Instructions

PADDB, PADDD, PADDQ, PADDSB, PADDSW, PADDUSB, PADDW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PADDUSW xmm1, xmm2/mem128 66 0F DD /r Adds packed 16-bit unsigned integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure paddusw-128.eps: each word of xmm1 is added to the corresponding word of xmm2/mem128, and each sum is saturated.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PADDW Packed Add Words

Adds each packed 16-bit integer value in the first source operand to the corresponding packed 16-bit integer in the second source operand and writes the integer result of each addition in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PADDW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of each result are written in the destination.
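As a scalar model in C, each word lane is an addition modulo 2^16. The function names are hypothetical.

```c
#include <stdint.h>

/* One word lane of PADDW: the carry out of bit 15 is discarded. */
uint16_t paddw_lane(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Scalar model of PADDW over eight lanes; dst is also the first source. */
void paddw_model(uint16_t dst[8], const uint16_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = paddw_lane(dst[i], src[i]);
}
```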

Related Instructions

PADDB, PADDD, PADDQ, PADDSB, PADDSW, PADDUSB, PADDUSW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PADDW xmm1, xmm2/mem128 66 0F FD /r Adds packed 16-bit integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure paddw-128.eps: each word of xmm1 is added to the corresponding word of xmm2/mem128.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PAND Packed Logical Bitwise AND

Performs a bitwise logical AND of the values in the first and second source operands and writes the result in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PAND instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
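For illustration, the 128-bit operation can be modeled in C as two independent 64-bit ANDs. The xmm_bits type and the function name are hypothetical stand-ins for an XMM register and the instruction.

```c
#include <stdint.h>

/* Hypothetical 128-bit container: low and high quadwords. */
typedef struct { uint64_t lo, hi; } xmm_bits;

/* Scalar model of PAND: result = dst AND src. */
xmm_bits pand_model(xmm_bits dst, xmm_bits src)
{
    dst.lo &= src.lo;
    dst.hi &= src.hi;
    return dst;
}
```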

Related Instructions

PANDN, POR, PXOR

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PAND xmm1, xmm2/mem128 66 0F DB /r Performs bitwise logical AND of values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure pand-128.eps: bitwise AND of xmm1 and xmm2/mem128.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PANDN Packed Logical Bitwise AND NOT

Performs a bitwise logical AND of the value in the second source operand and the one’s complement of the value in the first source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PANDN instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
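Because the operand that is inverted is the first source (the destination), operand order matters. The following C sketch models the result with two 64-bit halves; the xmm_bits type and the function name are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 128-bit container: low and high quadwords. */
typedef struct { uint64_t lo, hi; } xmm_bits;

/* Scalar model of PANDN: result = (NOT dst) AND src. */
xmm_bits pandn_model(xmm_bits dst, xmm_bits src)
{
    dst.lo = ~dst.lo & src.lo;
    dst.hi = ~dst.hi & src.hi;
    return dst;
}
```

With a mask in the destination and data in the second source, the result keeps the data bits where the mask is 0 and clears them where the mask is 1. Swapping the two operands generally gives a different result.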

Related Instructions

PAND, POR, PXOR

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PANDN xmm1, xmm2/mem128 66 0F DF /r Performs bitwise logical AND NOT of values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure pandn-128.eps: xmm1 is inverted and then ANDed with xmm2/mem128.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PAVGB Packed Average Unsigned Bytes

Computes the rounded average of each packed unsigned 8-bit integer value in the first source operand and the corresponding packed 8-bit unsigned integer in the second source operand and writes each average in the corresponding byte of the destination (first source). The average is computed by adding each pair of operands, adding 1 to the 9-bit temporary sum, and then right-shifting the temporary sum by one bit position. The destination and source operands are an XMM register and another XMM register or 128-bit memory location.

The PAVGB instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
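The add, add-one, shift sequence described above can be sketched for one byte lane in C. The function name is hypothetical; the wider temporary stands in for the 9-bit sum.

```c
#include <stdint.h>

/* One byte lane of PAVGB: rounded average of two unsigned bytes. */
uint8_t pavgb_lane(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b + 1u;  /* at most 511, which fits in 9 bits */
    return (uint8_t)(sum >> 1);           /* halve; the added 1 rounds .5 upward */
}
```

For example, the average of 1 and 2 is 1.5, which this sequence rounds up to 2.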

Related Instructions

PAVGW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PAVGB xmm1, xmm2/mem128 | 66 0F E0 /r | Averages packed 8-bit unsigned integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure pavgb-128.eps: each of the 16 byte pairs in bits 127–0 of xmm1 and xmm2/mem128 is averaged, and each result is written to the corresponding byte of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PAVGW Packed Average Unsigned Words

Computes the rounded average of each packed unsigned 16-bit integer value in the first source operand and the corresponding packed 16-bit unsigned integer in the second source operand and writes each average in the corresponding word of the destination (first source). The average is computed by adding each pair of operands, adding 1 to the 17-bit temporary sum, and then right-shifting the temporary sum by one bit position. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The 128-bit form of the PAVGW instruction described here is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
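As a rough illustration of the 17-bit temporary sum mentioned above, the word-sized average can be sketched in scalar C. The function name pavgw_lane is hypothetical; the code models the arithmetic only.

```c
#include <stdint.h>

/* Scalar model of one PAVGW word lane: (a + b + 1) >> 1.
   The temporary sum is at most 65535 + 65535 + 1 = 131071 (17 bits),
   so it is held in a 32-bit variable before the shift. */
uint16_t pavgw_lane(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b + 1u;
    return (uint16_t)(sum >> 1);
}
```

Holding the sum in a wider type is what keeps the carry out of bit 15; truncating the sum to 16 bits before shifting would lose it.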

Related Instructions

PAVGB

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PAVGW xmm1, xmm2/mem128 | 66 0F E3 /r | Averages packed 16-bit unsigned integer values in an XMM register and another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure pavgw-128.eps: each of the eight word pairs in xmm1 and xmm2/mem128 (bits 127–112 through 15–0) is averaged, and each result is written to the corresponding word of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PCMPEQB Packed Compare Equal Bytes

Compares corresponding packed bytes in the first and second source operands and writes the result of each comparison in the corresponding byte of the destination (first source). For each pair of bytes, if the values are equal, the result is all 1s. If the values are not equal, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PCMPEQB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
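The all-1s/all-0s result described above can be illustrated with a scalar C model. This is a sketch under the assumption that dst stands for xmm1 and src for xmm2/mem128; the name pcmpeqb_model is hypothetical.

```c
#include <stdint.h>

/* Scalar model of PCMPEQB: each destination byte becomes 0xFF (all 1s)
   if the two source bytes are equal, otherwise 0x00 (all 0s). */
void pcmpeqb_model(uint8_t dst[16], const uint8_t src[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (dst[i] == src[i]) ? 0xFF : 0x00;
}
```

The result is a per-byte mask rather than a flag, which is consistent with the instruction leaving rFLAGS unaffected.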

Related Instructions

PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PCMPEQB xmm1, xmm2/mem128 | 66 0F 74 /r | Compares packed bytes in an XMM register and an XMM register or 128-bit memory location.

[Figure pcmpeqb-128.eps: each of the 16 byte pairs in bits 127–0 of xmm1 and xmm2/mem128 is compared, and all 1s or all 0s is written to the corresponding byte of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PCMPEQD Packed Compare Equal Doublewords

Compares corresponding packed 32-bit values in the first and second source operands and writes the result of each comparison in the corresponding 32 bits of the destination (first source). For each pair of doublewords, if the values are equal, the result is all 1s. If the values are not equal, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PCMPEQD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
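A compare mask of this kind is commonly combined with bitwise AND/OR to select between two values without branching. The sketch below shows that idiom in scalar C for one doubleword; the idiom is a general technique and not part of the instruction definition, and the names pcmpeqd_lane and select_if_equal are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one PCMPEQD doubleword lane:
   all 1s if the values are equal, all 0s otherwise. */
uint32_t pcmpeqd_lane(uint32_t a, uint32_t b)
{
    return (a == b) ? 0xFFFFFFFFu : 0x00000000u;
}

/* Branch-free select built on the mask (illustrative use only):
   returns x where key == match, otherwise y. */
uint32_t select_if_equal(uint32_t key, uint32_t match, uint32_t x, uint32_t y)
{
    uint32_t mask = pcmpeqd_lane(key, match);
    return (mask & x) | (~mask & y);
}
```

Because the mask is either all 1s or all 0s, exactly one of the two AND terms survives for each lane.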

Related Instructions

PCMPEQB, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PCMPEQD xmm1, xmm2/mem128 | 66 0F 76 /r | Compares packed doublewords in an XMM register and an XMM register or 128-bit memory location.

[Figure pcmpeqd-128.eps: each of the four doubleword pairs in xmm1 and xmm2/mem128 (bits 127–96, 95–64, 63–32, and 31–0) is compared, and all 1s or all 0s is written to the corresponding doubleword of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PCMPEQW Packed Compare Equal Words

Compares corresponding packed 16-bit values in the first and second source operands and writes the result of each comparison in the corresponding 16 bits of the destination (first source). For each pair of words, if the values are equal, the result is all 1s. If the values are not equal, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PCMPEQW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PCMPEQB, PCMPEQD, PCMPGTB, PCMPGTD, PCMPGTW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PCMPEQW xmm1, xmm2/mem128 | 66 0F 75 /r | Compares packed 16-bit values in an XMM register and an XMM register or 128-bit memory location.

[Figure pcmpeqw-128.eps: each of the eight word pairs in xmm1 and xmm2/mem128 (bits 127–112 through 15–0) is compared, and all 1s or all 0s is written to the corresponding word of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PCMPGTB Packed Compare Greater Than Signed Bytes

Compares corresponding packed signed bytes in the first and second source operands and writes the result of each comparison in the corresponding byte of the destination (first source). For each pair of bytes, if the value in the first source operand is greater than the value in the second source operand, the result is all 1s. If the value in the first source operand is less than or equal to the value in the second source operand, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PCMPGTB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
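Because the comparison is signed, a byte such as FFh is treated as -1 rather than 255. The scalar C sketch below models one lane and also shows a widely used workaround for unsigned data (flipping the sign bit of both operands first); that workaround is a general idiom, not something this manual prescribes, and the function names are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one PCMPGTB lane. The raw bytes are reinterpreted as
   two's-complement signed values before the comparison. */
uint8_t pcmpgtb_lane(uint8_t a, uint8_t b)
{
    int sa = (a < 128) ? (int)a : (int)a - 256;
    int sb = (b < 128) ? (int)b : (int)b - 256;
    return (sa > sb) ? 0xFF : 0x00;
}

/* Illustrative idiom (an assumption of this sketch, not from the manual):
   XORing both bytes with 80h maps unsigned order onto signed order,
   so the signed compare then yields an unsigned greater-than mask. */
uint8_t unsigned_gt_lane(uint8_t a, uint8_t b)
{
    return pcmpgtb_lane((uint8_t)(a ^ 0x80), (uint8_t)(b ^ 0x80));
}
```

For example, the signed compare reports that 01h is greater than FFh, while the sign-flipped variant reports that 200 is greater than 100.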

Related Instructions

PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTD, PCMPGTW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PCMPGTB xmm1, xmm2/mem128 | 66 0F 64 /r | Compares packed signed bytes in an XMM register and an XMM register or 128-bit memory location.

[Figure pcmpgtb-128.eps: each of the 16 byte pairs in bits 127–0 of xmm1 and xmm2/mem128 is compared, and all 1s or all 0s is written to the corresponding byte of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PCMPGTD Packed Compare Greater Than Signed Doublewords

Compares corresponding packed signed 32-bit values in the first and second source operands and writes the result of each comparison in the corresponding 32 bits of the destination (first source). For each pair of doublewords, if the value in the first source operand is greater than the value in the second source operand, the result is all 1s. If the value in the first source operand is less than or equal to the value in the second source operand, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PCMPGTD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PCMPGTD xmm1, xmm2/mem128 | 66 0F 66 /r | Compares packed signed 32-bit values in an XMM register and an XMM register or 128-bit memory location.

[Figure pcmpgtd-128.eps: each of the four doubleword pairs in xmm1 and xmm2/mem128 (bits 127–96, 95–64, 63–32, and 31–0) is compared, and all 1s or all 0s is written to the corresponding doubleword of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PCMPGTW Packed Compare Greater Than Signed Words

Compares corresponding packed signed 16-bit values in the first and second source operands and writes the result of each comparison in the corresponding 16 bits of the destination (first source). For each pair of words, if the value in the first source operand is greater than the value in the second source operand, the result is all 1s. If the value in the first source operand is less than or equal to the value in the second source operand, the result is all 0s. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PCMPGTW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PCMPGTW xmm1, xmm2/mem128 | 66 0F 65 /r | Compares packed signed 16-bit values in an XMM register and an XMM register or 128-bit memory location.

[Figure pcmpgtw-128.eps: each of the eight word pairs in xmm1 and xmm2/mem128 (bits 127–112 through 15–0) is compared, and all 1s or all 0s is written to the corresponding word of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PEXTRW Extract Packed Word

Extracts a 16-bit value from an XMM register, as selected by the immediate byte operand (as shown in Table 1-2), and writes it to the low-order word of a 32-bit general-purpose register, with zero-extension to 32 bits.

The 128-bit form of the PEXTRW instruction described here is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
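The word selection and zero-extension described above can be sketched in scalar C. The array layout (element 0 holding bits 15–0) and the name pextrw_model are assumptions made for this illustration.

```c
#include <stdint.h>

/* Scalar model of PEXTRW. xmm_words[0] holds bits 15-0 of the XMM
   register and xmm_words[7] holds bits 127-112. */
uint32_t pextrw_model(const uint16_t xmm_words[8], uint8_t imm8)
{
    unsigned sel = imm8 & 0x7u;       /* only bits 2-0 of imm8 select the word */
    return (uint32_t)xmm_words[sel];  /* upper 16 bits of the result are 0 */
}
```

Since only the low three bits of the immediate are used, an immediate of 0Bh selects the same word as 03h.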

Mnemonic | Opcode | Description
PEXTRW reg32, xmm, imm8 | 66 0F C5 /r ib | Extracts a 16-bit value from an XMM register and writes it to the low-order 16 bits of a general-purpose register.

[Figure pextrw-128.eps: imm8 controls a mux that selects one of the eight words of xmm; the selected word is written to bits 15–0 of reg32, and the upper bits of reg32 are filled with 0.]


Related Instructions

PINSRW

rFLAGS Affected

None

MXCSR Flags Affected

None

Table 1-2. Immediate-Byte Operand Encoding for 128-Bit PEXTRW

Immediate-Byte Bit Field | Value of Bit Field | Source Bits Extracted
2–0 | 0 | 15–0
 | 1 | 31–16
 | 2 | 47–32
 | 3 | 63–48
 | 4 | 79–64
 | 5 | 95–80
 | 6 | 111–96
 | 7 | 127–112

Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.


PINSRW Packed Insert Word

Inserts a 16-bit value from the low-order word of a 32-bit general-purpose register or a 16-bit memory location into an XMM register. The location in the destination register is selected by the immediate byte operand, as shown in Table 1-3. The other words in the destination register operand are not modified.

The 128-bit form of the PINSRW instruction described here is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
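The insert operation described above can be modeled in scalar C as a single array store. As before, the array layout (element 0 holding bits 15–0) and the name pinsrw_model are assumptions of this sketch.

```c
#include <stdint.h>

/* Scalar model of PINSRW with a register source. Only the low-order
   word of reg32 is used, and only the word selected by bits 2-0 of
   imm8 is written; the other seven words keep their values. */
void pinsrw_model(uint16_t xmm_words[8], uint32_t reg32, uint8_t imm8)
{
    xmm_words[imm8 & 0x7u] = (uint16_t)(reg32 & 0xFFFFu);
}
```

Calling the model with reg32 = ABCD1234h and imm8 = 5 writes 1234h to the word at bits 95–80 and leaves the rest of the register unchanged.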

Mnemonic | Opcode | Description
PINSRW xmm, reg32/mem16, imm8 | 66 0F C4 /r ib | Inserts a 16-bit value from a general-purpose register or memory location into an XMM register.

[Figure pinsrw-128.eps: bits 15–0 of reg32/mem16 are inserted into the word of xmm whose position is selected by imm8.]


Related Instructions

PEXTRW

rFLAGS Affected

None

MXCSR Flags Affected

None

Table 1-3. Immediate-Byte Operand Encoding for 128-Bit PINSRW

Immediate-Byte Bit Field | Value of Bit Field | Destination Bits Filled
2–0 | 0 | 15–0
 | 1 | 31–16
 | 2 | 47–32
 | 3 | 63–48
 | 4 | 79–64
 | 5 | 95–80
 | 6 | 111–96
 | 7 | 127–112

Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.


PMADDWD Packed Multiply Words and Add Doublewords

Multiplies each packed 16-bit signed value in the first source operand by the corresponding packed 16-bit signed value in the second source operand, adds the adjacent intermediate 32-bit results of each multiplication (for example, the multiplication results for the adjacent bit fields 63–48 and 47–32, and 31–16 and 15–0), and writes the 32-bit result of each addition in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PMADDWD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

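There is only one case in which the result of the multiplication and addition does not fit in a signed 32-bit destination. If all four of the 16-bit source operands used to produce a 32-bit multiply-add result have the value 8000h (-32768), each product is 4000_0000h and their sum is 8000_0000h (+2^31), which exceeds the largest signed 32-bit value, 7FFF_FFFFh. The 32-bit result written to the destination is 8000_0000h, which is incorrect because it represents -2^31 when interpreted as a signed value.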
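The multiply-add and its single overflow case can be checked with a scalar C model. This is a sketch of the arithmetic for one destination doubleword; the name pmaddwd_lane is hypothetical, and the 64-bit intermediate is a choice made here so that the wraparound is well defined in C.

```c
#include <stdint.h>

/* Scalar model of one PMADDWD destination doubleword.
   (a_lo, b_lo) and (a_hi, b_hi) are the two adjacent word pairs.
   Returns the raw 32 bits written to the destination. */
uint32_t pmaddwd_lane(int16_t a_lo, int16_t b_lo, int16_t a_hi, int16_t b_hi)
{
    /* Each product fits in 32 bits; the sum is formed in 64 bits and then
       truncated, which makes the one overflow case explicit. */
    int64_t sum = (int64_t)a_lo * b_lo + (int64_t)a_hi * b_hi;
    return (uint32_t)sum;
}
```

With all four inputs equal to INT16_MIN the model returns 8000_0000h, matching the case described above; every other input combination produces a sum that fits in a signed 32-bit value.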

Mnemonic | Opcode | Description
PMADDWD xmm1, xmm2/mem128 | 66 0F F5 /r | Multiplies eight packed 16-bit signed values in an XMM register and another XMM register or 128-bit memory location, adds intermediate results, and writes the result in the destination XMM register.

[Figure pmaddwd-128.eps: corresponding words of xmm1 and xmm2/mem128 are multiplied, adjacent pairs of products are added, and each sum is written to the corresponding doubleword of xmm1 (bits 127–96, 95–64, 63–32, and 31–0).]


Related Instructions

PMULHUW, PMULHW, PMULLW, PMULUDQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PMAXSW Packed Maximum Signed Words

Compares each of the packed 16-bit signed integer values in the first source operand with the corresponding packed 16-bit signed integer value in the second source operand and writes the numerically greater of the two values for each comparison in the corresponding word of the destination (first source). The first source/destination and second source operands are an XMM register and an XMM register or 128-bit memory location.

The 128-bit form of the PMAXSW instruction described here is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
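A scalar C model of the per-word signed maximum is sketched below. It assumes dst stands for xmm1 and src for xmm2/mem128; the names pmaxsw_lane and pmaxsw_model are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one PMAXSW word lane: the numerically greater
   of two signed 16-bit values. */
int16_t pmaxsw_lane(int16_t a, int16_t b)
{
    return (a > b) ? a : b;
}

/* Applies the lane model to all eight words of a 128-bit operand. */
void pmaxsw_model(int16_t dst[8], const int16_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = pmaxsw_lane(dst[i], src[i]);
}
```

Each lane is independent, so a negative value in one word has no effect on the result in its neighbors.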

Related Instructions

PMAXUB, PMINSW, PMINUB

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PMAXSW xmm1, xmm2/mem128 | 66 0F EE /r | Compares packed signed 16-bit integer values in an XMM register and another XMM register or 128-bit memory location and writes the greater value of each comparison in the destination XMM register.

[Figure pmaxsw-128.eps: for each of the eight word pairs in xmm1 and xmm2/mem128 (bits 127–112 through 15–0), the maximum is written to the corresponding word of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | A memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PMAXUB Packed Maximum Unsigned Bytes

Compares each of the packed 8-bit unsigned integer values in the first source operand with the corresponding packed 8-bit unsigned integer value in the second source operand and writes the numerically greater of the two values for each comparison in the corresponding byte of the destination (first source). The first source/destination and second source operands are an XMM register and an XMM register or 128-bit memory location.

The 128-bit form of the PMAXUB instruction described here is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
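The choice of an unsigned comparison matters for bytes with the high bit set. The scalar C sketch below models one PMAXUB lane and, purely for contrast, a signed maximum over the same raw bytes; the second function does not correspond to any instruction documented in this section, and both names are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one PMAXUB byte lane: unsigned maximum. */
uint8_t pmaxub_lane(uint8_t a, uint8_t b)
{
    return (a > b) ? a : b;
}

/* For contrast only: maximum of the same raw bytes when they are
   interpreted as two's-complement signed values. */
uint8_t signed_byte_max(uint8_t a, uint8_t b)
{
    int sa = (a < 128) ? (int)a : (int)a - 256;
    int sb = (b < 128) ? (int)b : (int)b - 256;
    return (sa > sb) ? a : b;
}
```

For the inputs FFh and 01h, the unsigned maximum is FFh (255), whereas the signed interpretation would pick 01h because FFh reads as -1.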

Related Instructions

PMAXSW, PMINSW, PMINUB

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic | Opcode | Description
PMAXUB xmm1, xmm2/mem128 | 66 0F DE /r | Compares packed unsigned 8-bit integer values in an XMM register and another XMM register or 128-bit memory location and writes the greater value of each compare in the destination XMM register.

[Figure pmaxub-128.eps: for each of the 16 byte pairs in bits 127–0 of xmm1 and xmm2/mem128, the maximum is written to the corresponding byte of xmm1.]


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
 | X | X | X | The emulate bit (EM) of CR0 was set to 1.
 | X | X | X | The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
 | | | X | A null data segment was used to reference memory.
 | X | X | X | A memory operand was not aligned on a 16-byte boundary.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.


PMINSW Packed Minimum Signed Words

Compares each of the packed 16-bit signed integer values in the first source operand with the corresponding packed 16-bit signed integer value in the second source operand and writes the numerically lesser of the two values for each comparison in the corresponding word of the destination (first source). The first source/destination and second source operands are an XMM register and an XMM register or 128-bit memory location.

The 128-bit form of the PMINSW instruction described here is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
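One plausible use of a packed signed minimum is to limit every word to an upper bound by taking the minimum against a vector filled with that bound. The scalar C sketch below illustrates this; the usage is an assumption of the example rather than something stated in this manual, and the names pminsw_lane and clamp_upper are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one PMINSW word lane: the numerically lesser
   of two signed 16-bit values. */
int16_t pminsw_lane(int16_t a, int16_t b)
{
    return (a < b) ? a : b;
}

/* Illustrative use: limit each of the eight words in dst to at most
   `limit`, as if the second operand held `limit` in every word. */
void clamp_upper(int16_t dst[8], int16_t limit)
{
    for (int i = 0; i < 8; i++)
        dst[i] = pminsw_lane(dst[i], limit);
}
```

Values already at or below the limit, including negative values, pass through unchanged.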

Related Instructions

PMAXSW, PMAXUB, PMINUB

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PMINSW xmm1, xmm2/mem128 66 0F EA /r Compares packed signed 16-bit integer values in an XMM register and another XMM register or 128-bit memory location and writes the lesser value of each compare in the destination XMM register.

[Figure pminsw-128.eps: each of the eight words of xmm1 (bits 127–0) is compared with the corresponding word of xmm2/mem128; the minimum of each pair is written to xmm1.]

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE instructions are not supported, as indicated by EDX bit 25 in CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       A memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

PMINUB Packed Minimum Unsigned Bytes

Compares each of the packed 8-bit unsigned integer values in the first source operand with the corresponding packed 8-bit unsigned integer value in the second source operand and writes the numerically lesser of the two values for each comparison in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PMINUB instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
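
A minimal scalar sketch of the byte-wise unsigned minimum described above follows. The function name `pminub` and the representation of an XMM register as a list of 16 byte values (index 0 is the low-order byte) are assumptions for illustration, not definitions from this manual.

```python
def pminub(xmm1, xmm2):
    """Return the unsigned per-byte minimum of two lists of 16 byte values."""
    assert len(xmm1) == len(xmm2) == 16
    assert all(0 <= v <= 0xFF for v in xmm1 + xmm2)
    return [min(a, b) for a, b in zip(xmm1, xmm2)]


# Bytes are compared as unsigned values, so 0x80 (128) is greater than 0x7F (127).
dest = pminub([0xFF, 0x00, 0x80, 0x7F] + [9] * 12,
              [0x01, 0x10, 0x7F, 0x80] + [3] * 12)
print([hex(b) for b in dest])
```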

Related Instructions

PMAXSW, PMAXUB, PMINSW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PMINUB xmm1, xmm2/mem128 66 0F DA /r Compares packed unsigned 8-bit integer values in an XMM register and another XMM register or 128-bit memory location and writes the lesser value of each comparison in the destination XMM register.

[Figure pminub-128.eps: each of the 16 bytes of xmm1 (bits 127–0) is compared with the corresponding byte of xmm2/mem128; the minimum of each pair is written to xmm1.]

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE instructions are not supported, as indicated by EDX bit 25 in CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       A memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

PMOVMSKB Packed Move Mask Byte

Moves the most-significant bit of each byte in the source operand to the destination, with zero-extension to 32 bits. The destination and source operands are a 32-bit general-purpose register and an XMM register. The result is written to the low-order word of the general-purpose register.

The PMOVMSKB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
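
The mask construction described above can be illustrated with a short Python sketch. The function name `pmovmskb` and the list-of-16-bytes representation (index 0 is the low-order byte) are assumptions for the example; the return value stands in for the 32-bit general-purpose register.

```python
def pmovmskb(xmm):
    """Collect the most-significant bit of each of 16 bytes into bits 15-0."""
    assert len(xmm) == 16
    mask = 0
    for i, byte in enumerate(xmm):
        mask |= ((byte >> 7) & 1) << i   # bit 7 of byte i becomes bit i
    return mask                          # bits 31-16 remain 0 (zero-extension)


low_only = pmovmskb([0x80] + [0x00] * 15)    # only the low-order byte has bit 7 set
high_only = pmovmskb([0x00] * 15 + [0xFF])   # only the high-order byte has bit 7 set
print(hex(low_only), hex(high_only))
```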

Related Instructions

MOVMSKPD, MOVMSKPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PMOVMSKB reg32, xmm 66 0F D7 /r Moves most-significant bit of each byte in an XMM register to low-order word of a 32-bit general-purpose register.

[Figure pmovmskb-128.eps: the most-significant bit of each of the 16 bytes of xmm (bits 7, 15, 23, ..., 127) is copied to bits 15–0 of reg32; the upper bits of reg32 are cleared to 0.]

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.

PMULHUW Packed Multiply High Unsigned Word

Multiplies each packed unsigned 16-bit value in the first source operand by the corresponding packed unsigned word in the second source operand and writes the high-order 16 bits of each intermediate 32-bit result in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PMULHUW instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
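
As a sketch of the arithmetic described above, the following Python model multiplies each pair of unsigned words and keeps the high-order 16 bits of the 32-bit product. The name `pmulhuw` and the list-of-eight-words representation (index 0 is the low-order word) are assumptions for illustration.

```python
def pmulhuw(xmm1, xmm2):
    """Return the high-order 16 bits of each unsigned 16x16-bit product."""
    assert len(xmm1) == len(xmm2) == 8
    assert all(0 <= v <= 0xFFFF for v in xmm1 + xmm2)
    return [((a * b) >> 16) & 0xFFFF for a, b in zip(xmm1, xmm2)]


# 0xFFFF * 0xFFFF = 0xFFFE0001, so the high-order word is 0xFFFE.
dest = pmulhuw([0xFFFF, 0x8000, 0x1234, 0, 0, 0, 0, 0],
               [0xFFFF, 0x0002, 0x0001, 0, 0, 0, 0, 0])
print([hex(w) for w in dest])
```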

Related Instructions

PMADDWD, PMULHW, PMULLW, PMULUDQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PMULHUW xmm1, xmm2/mem128 66 0F E4 /r Multiplies packed 16-bit values in an XMM register by the packed 16-bit values in another XMM register or 128-bit memory location and writes the high-order 16 bits of each result in the destination XMM register.

[Figure pmulhuw-128.eps: each of the eight words of xmm1 (bits 127–0) is multiplied by the corresponding word of xmm2/mem128; the high-order 16 bits of each product are written to xmm1.]

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE instructions are not supported, as indicated by EDX bit 25 in CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

PMULHW Packed Multiply High Signed Word

Multiplies each packed 16-bit signed integer value in the first source operand by the corresponding packed 16-bit signed integer in the second source operand and writes the high-order 16 bits of the intermediate 32-bit result of each multiplication in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PMULHW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
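
The signed high-word multiply described above can be sketched in Python as follows. The names `to_signed16` and `pmulhw`, and the representation of an XMM register as a list of eight raw 16-bit words (index 0 is the low-order word), are assumptions made for this example.

```python
def to_signed16(word):
    """Interpret a raw 16-bit pattern (0..0xFFFF) as a two's-complement value."""
    return word - 0x10000 if word & 0x8000 else word


def pmulhw(xmm1, xmm2):
    """Return the high-order 16 bits of each signed 16x16-bit product."""
    assert len(xmm1) == len(xmm2) == 8
    result = []
    for a, b in zip(xmm1, xmm2):
        product = to_signed16(a) * to_signed16(b)   # always fits in 32 signed bits
        result.append((product >> 16) & 0xFFFF)     # bits 31-16 of the product
    return result


# 0xFFFF is -1: (-1) * 1 = -1 = 0xFFFFFFFF, whose high-order word is 0xFFFF.
dest = pmulhw([0xFFFF, 0x8000, 0x7FFF, 0xFFFF, 0, 0, 0, 0],
              [0x0001, 0x8000, 0x7FFF, 0xFFFF, 0, 0, 0, 0])
print([hex(w) for w in dest])
```

Python's `>>` on a negative integer is an arithmetic shift, which is why masking with 0xFFFF yields the two's-complement high-order word.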

Related Instructions

PMADDWD, PMULHUW, PMULLW, PMULUDQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PMULHW xmm1, xmm2/mem128 66 0F E5 /r Multiplies packed 16-bit signed integer values in an XMM register and another XMM register or 128-bit memory location and writes the high-order 16 bits of each result in the destination XMM register.

[Figure pmulhw-128.eps: each of the eight words of xmm1 (bits 127–0) is multiplied by the corresponding word of xmm2/mem128; the high-order 16 bits of each product are written to xmm1.]

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

PMULLW Packed Multiply Low Signed Word

Multiplies each packed 16-bit signed integer value in the first source operand by the corresponding packed 16-bit signed integer in the second source operand and writes the low-order 16 bits of the intermediate 32-bit result of each multiplication in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PMULLW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
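
A minimal Python sketch of the low-word multiply described above is shown here. The name `pmullw` and the list-of-eight-raw-words representation (index 0 is the low-order word) are assumptions for illustration. The sketch multiplies the raw bit patterns as unsigned numbers; in two's-complement arithmetic the low-order 16 bits of the product are the same as for the signed multiplication.

```python
def pmullw(xmm1, xmm2):
    """Return the low-order 16 bits of each 16x16-bit product."""
    assert len(xmm1) == len(xmm2) == 8
    assert all(0 <= v <= 0xFFFF for v in xmm1 + xmm2)
    return [(a * b) & 0xFFFF for a, b in zip(xmm1, xmm2)]


# 0xFFFE is -2 as a signed word: (-2) * 2 = -4, whose low-order word is 0xFFFC.
dest = pmullw([0xFFFF, 0x0100, 0x0003, 0xFFFE, 0, 0, 0, 0],
              [0xFFFF, 0x0100, 0x0005, 0x0002, 0, 0, 0, 0])
print([hex(w) for w in dest])
```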

Related Instructions

PMADDWD, PMULHUW, PMULHW, PMULUDQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PMULLW xmm1, xmm2/mem128 66 0F D5 /r Multiplies packed 16-bit signed integer values in an XMM register and another XMM register or 128-bit memory location and writes the low-order 16 bits of each result in the destination XMM register.

[Figure pmullw-128.eps: each of the eight words of xmm1 (bits 127–0) is multiplied by the corresponding word of xmm2/mem128; the low-order 16 bits of each product are written to xmm1.]

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

PMULUDQ Packed Multiply Unsigned Doubleword and Store Quadword

Multiplies two pairs of 32-bit unsigned integer values in the first and second source operands and writes the two 64-bit results in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location. The values multiplied are the first (low-order) and third doublewords of each source operand, and the result of each multiply is stored in the first and second quadwords of the destination XMM register.

The PMULUDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
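
The lane selection described above can be sketched in Python. The name `pmuludq` and the representations are assumptions for illustration: each source is a list of four doublewords (index 0 is the low-order doubleword) and the result is a list of two quadwords (index 0 is the low-order quadword).

```python
def pmuludq(xmm1, xmm2):
    """Multiply doublewords 0 and 2 of each source; return two 64-bit products."""
    assert len(xmm1) == len(xmm2) == 4
    assert all(0 <= v <= 0xFFFFFFFF for v in xmm1 + xmm2)
    # Doublewords 1 and 3 of both sources are ignored.
    return [xmm1[0] * xmm2[0], xmm1[2] * xmm2[2]]


# 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001 fits in 64 bits without overflow.
dest = pmuludq([0xFFFFFFFF, 7, 2, 9],
               [0xFFFFFFFF, 8, 3, 10])
print([hex(q) for q in dest])
```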

Related Instructions

PMADDWD, PMULHUW, PMULHW, PMULLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PMULUDQ xmm1, xmm2/mem128 66 0F F4 /r Multiplies two pairs of 32-bit unsigned integer values in an XMM register and another XMM register or 128-bit memory location and writes the two 64-bit results in the destination XMM register.

[Figure pmuludq-128.eps: the doublewords in bits 31–0 and 95–64 of xmm1 are multiplied by the corresponding doublewords of xmm2/mem128; the two 64-bit products are written to bits 63–0 and 127–64 of xmm1.]

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

POR Packed Logical Bitwise OR

Performs a bitwise logical OR of the values in the first and second source operands and writes the result in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The POR instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
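
For completeness, the operation described above can be modelled in Python by treating each 128-bit operand as a single integer. The name `por` and the constant `MASK128` are assumptions for the example.

```python
MASK128 = (1 << 128) - 1   # all 128 bits set


def por(xmm1, xmm2):
    """Return the bitwise OR of two 128-bit values held as Python integers."""
    assert 0 <= xmm1 <= MASK128 and 0 <= xmm2 <= MASK128
    return xmm1 | xmm2


dest = por(0xF0F0, 0x0F0F)
print(hex(dest))
```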

Related Instructions

PAND, PANDN, PXOR

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

POR xmm1, xmm2/mem128 66 0F EB /r Performs bitwise logical OR of values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure por-128.eps: bits 127–0 of xmm1 are ORed with bits 127–0 of xmm2/mem128; the result is written to xmm1.]

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

PSADBW Packed Sum of Absolute Differences of Bytes Into a Word

Computes the absolute differences of eight corresponding packed 8-bit unsigned integers in the first and second source operands and writes the unsigned 16-bit integer result of the sum of the eight differences in a word in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The sum of the differences of the eight bytes in the high-order quadwords of the source operands is written in the least-significant word of the high-order quadword in the destination XMM register, with the remaining bytes cleared to all 0s. The sum of the differences of the eight bytes in the low-order quadwords of the source operands is written in the least-significant word of the low-order quadword in the destination XMM register, with the remaining bytes cleared to all 0s.

The PSADBW instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
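
The two sums described above can be sketched in Python. The name `psadbw` and the representations are assumptions for illustration: each source is a list of 16 byte values (index 0 is the low-order byte), and the result is a list of two quadword values (index 0 is the low-order quadword). Because each sum is at most 8 x 255 = 2040, it fits in the low-order word of its quadword, and the upper bytes are 0.

```python
def psadbw(xmm1, xmm2):
    """Return [low, high] sums of absolute byte differences, one per quadword."""
    assert len(xmm1) == len(xmm2) == 16
    assert all(0 <= v <= 0xFF for v in xmm1 + xmm2)

    def sad(start):
        # Sum |a - b| over the eight byte pairs beginning at index `start`.
        return sum(abs(a - b)
                   for a, b in zip(xmm1[start:start + 8], xmm2[start:start + 8]))

    return [sad(0), sad(8)]


dest = psadbw([1, 2, 3, 4, 5, 6, 7, 8] + [255] * 8,
              [8, 7, 6, 5, 4, 3, 2, 1] + [0] * 8)
print(dest)
```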

Mnemonic Opcode Description

PSADBW xmm1, xmm2/mem128 66 0F F6 /r Computes the sum of the absolute differences of two sets of packed 8-bit unsigned integer values in an XMM register and another XMM register or 128-bit memory location and writes the 16-bit unsigned integer result in the destination XMM register.

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE instructions are not supported, as indicated by EDX bit 25 in CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

[Figure psadbw-128.eps: the absolute differences of the eight byte pairs in each quadword of xmm1 and xmm2/mem128 are added; the two sums are written to bits 15–0 and 79–64 of xmm1, and the remaining bits are cleared to 0.]

PSHUFD Packed Shuffle Doublewords

Moves any one of the four packed doublewords in an XMM register or 128-bit memory location to each doubleword in another XMM register. In each case, the value of the destination doubleword is determined by a two-bit field in the immediate-byte operand, with bits 0 and 1 selecting the contents of the low-order doubleword, bits 2 and 3 selecting the second doubleword, bits 4 and 5 selecting the third doubleword, and bits 6 and 7 selecting the high-order doubleword. Refer to Table 1-4 on page 303. A doubleword in the source operand may be copied to more than one doubleword in the destination.

The PSHUFD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
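
The selection by two-bit fields described above can be sketched in Python. The name `pshufd` and the list-of-four-doublewords representation (index 0 is the low-order doubleword) are assumptions for the example.

```python
def pshufd(src, imm8):
    """Build each destination doubleword from the source lane chosen by imm8."""
    assert len(src) == 4 and 0 <= imm8 <= 0xFF
    # Destination lane i is selected by bits 2i+1..2i of the immediate byte.
    return [src[(imm8 >> (2 * i)) & 0b11] for i in range(4)]


src = [10, 11, 12, 13]
reversed_lanes = pshufd(src, 0b00011011)   # fields, low to high: 3, 2, 1, 0
broadcast_low = pshufd(src, 0b00000000)    # every field selects source lane 0
print(reversed_lanes, broadcast_low)
```

An immediate of 0b11100100 (0xE4) selects lanes 0, 1, 2, 3 in order and so copies the source unchanged.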

Mnemonic Opcode Description

PSHUFD xmm1, xmm2/mem128, imm8 66 0F 70 /r ib Moves packed 32-bit values in an XMM register or 128-bit memory location to doubleword locations in another XMM register, as selected by the immediate-byte operand.

[Figure pshufd.eps: the two-bit fields of imm8 (bits 7–0) control four multiplexers, each selecting one of the four doublewords of xmm2/mem128 for one doubleword of xmm1.]

Related Instructions

PSHUFHW, PSHUFLW, PSHUFW

rFLAGS Affected

None

MXCSR Flags Affected

None

Table 1-4. Immediate-Byte Operand Encoding for PSHUFD

Destination Bits Filled   Immediate-Byte Bit Field   Value of Bit Field   Source Bits Moved
31–0                      1–0                        0                    31–0
                                                     1                    63–32
                                                     2                    95–64
                                                     3                    127–96
63–32                     3–2                        0                    31–0
                                                     1                    63–32
                                                     2                    95–64
                                                     3                    127–96
95–64                     5–4                        0                    31–0
                                                     1                    63–32
                                                     2                    95–64
                                                     3                    127–96
127–96                    7–6                        0                    31–0
                                                     1                    63–32
                                                     2                    95–64
                                                     3                    127–96

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

PSHUFHW Packed Shuffle High Words

Moves any one of the four packed words in the high-order quadword of an XMM register or 128-bit memory location to each word in the high-order quadword of another XMM register. In each case, the value of the destination word is determined by a two-bit field in the immediate-byte operand, with bits 0 and 1 selecting the contents of the low-order word, bits 2 and 3 selecting the second word, bits 4 and 5 selecting the third word, and bits 6 and 7 selecting the high-order word. Refer to Table 1-5 on page 306. A word in the source operand may be copied to more than one word in the destination. The low-order quadword of the source operand is copied to the low-order quadword of the destination register.

The PSHUFHW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
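
A short Python sketch of the high-quadword shuffle described above follows. The name `pshufhw` and the list-of-eight-words representation (index 0 is the low-order word, so indices 4 to 7 form the high-order quadword) are assumptions for illustration.

```python
def pshufhw(src, imm8):
    """Shuffle words 4-7 of src according to imm8; copy words 0-3 unchanged."""
    assert len(src) == 8 and 0 <= imm8 <= 0xFF
    # Each two-bit field picks one of the four words of the high-order quadword.
    high = [src[4 + ((imm8 >> (2 * i)) & 0b11)] for i in range(4)]
    return src[:4] + high


src = [0, 1, 2, 3, 4, 5, 6, 7]
dest = pshufhw(src, 0b00011011)   # fields, low to high: 3, 2, 1, 0
print(dest)
```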

Mnemonic Opcode Description

PSHUFHW xmm1, xmm2/mem128, imm8 F3 0F 70 /r ib Shuffles packed 16-bit values in high-order quadword of an XMM register or 128-bit memory location and puts the result in high-order quadword of another XMM register.

[Figure pshufhw.eps: the two-bit fields of imm8 (bits 7–0) control four multiplexers, each selecting one of the four words in bits 127–64 of xmm2/mem128 for one word in bits 127–64 of xmm1.]

Related Instructions

PSHUFD, PSHUFLW, PSHUFW

rFLAGS Affected

None

MXCSR Flags Affected

None

Table 1-5. Immediate-Byte Operand Encoding for PSHUFHW

Destination Bits Filled   Immediate-Byte Bit Field   Value of Bit Field   Source Bits Moved
79–64                     1–0                        0                    79–64
                                                     1                    95–80
                                                     2                    111–96
                                                     3                    127–112
95–80                     3–2                        0                    79–64
                                                     1                    95–80
                                                     2                    111–96
                                                     3                    127–112
111–96                    5–4                        0                    79–64
                                                     1                    95–80
                                                     2                    111–96
                                                     3                    127–112
127–112                   7–6                        0                    79–64
                                                     1                    95–80
                                                     2                    111–96
                                                     3                    127–112

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

PSHUFLW Packed Shuffle Low Words

Moves any one of the four packed words in the low-order quadword of an XMM register or 128-bit memory location to each word in the low-order quadword of another XMM register. In each case, the selection of the value of the destination word is determined by a two-bit field in the immediate-byte operand, with bits 0 and 1 selecting the contents of the low-order word, bits 2 and 3 selecting the second word, bits 4 and 5 selecting the third word, and bits 6 and 7 selecting the high-order word. Refer to Table 1-6 on page 309. A word in the source operand may be copied to more than one word in the destination. The high-order quadword of the source operand is copied to the high-order quadword of the destination register.

The PSHUFLW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
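
The low-quadword shuffle described above can be sketched in Python in the same way. The name `pshuflw` and the list-of-eight-words representation (index 0 is the low-order word, so indices 0 to 3 form the low-order quadword) are assumptions for illustration.

```python
def pshuflw(src, imm8):
    """Shuffle words 0-3 of src according to imm8; copy words 4-7 unchanged."""
    assert len(src) == 8 and 0 <= imm8 <= 0xFF
    # Each two-bit field picks one of the four words of the low-order quadword.
    low = [src[(imm8 >> (2 * i)) & 0b11] for i in range(4)]
    return low + src[4:]


src = [0, 1, 2, 3, 4, 5, 6, 7]
dest = pshuflw(src, 0b00011011)   # fields, low to high: 3, 2, 1, 0
print(dest)
```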

Mnemonic Opcode Description

PSHUFLW xmm1, xmm2/mem128, imm8 F2 0F 70 /r ib Shuffles packed 16-bit values in low-order quadword of an XMM register or 128-bit memory location and puts the result in low-order quadword of another XMM register.

[Figure pshuflw.eps: the two-bit fields of imm8 (bits 7–0) control four multiplexers, each selecting one of the four words in bits 63–0 of xmm2/mem128 for one word in bits 63–0 of xmm1.]

Related Instructions

PSHUFD, PSHUFHW, PSHUFW

rFLAGS Affected

None

MXCSR Flags Affected

None

Table 1-6. Immediate-Byte Operand Encoding for PSHUFLW

Destination Bits Filled   Immediate-Byte Bit Field   Value of Bit Field   Source Bits Moved
15–0                      1–0                        0                    15–0
                                                     1                    31–16
                                                     2                    47–32
                                                     3                    63–48
31–16                     3–2                        0                    15–0
                                                     1                    31–16
                                                     2                    47–32
                                                     3                    63–48
47–32                     5–4                        0                    15–0
                                                     1                    31–16
                                                     2                    47–32
                                                     3                    63–48
63–48                     7–6                        0                    15–0
                                                     1                    31–16
                                                     2                    47–32
                                                     3                    63–48

Exceptions

Exception                   Real   Virtual-8086   Protected   Cause of Exception
Invalid opcode, #UD          X          X             X       The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                             X          X             X       The emulate bit (EM) of CR0 was set to 1.
                             X          X             X       The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM    X          X             X       The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X          X             X       A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP      X          X             X       A memory address exceeded a data segment limit or was non-canonical.
                                                      X       A null data segment was used to reference memory.
                             X          X             X       The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                         X             X       A page fault resulted from the execution of the instruction.

PSLLD Packed Shift Left Logical Doublewords

Left-shifts each of the packed 32-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding doubleword of the destination (first source). The first source/destination and second source operands are:

an XMM register and another XMM register or 128-bit memory location, or

an XMM register and an immediate byte value.

The low-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 31, the destination is cleared to all 0s.

The PSLLD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
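
The shift behavior described above, including the case of a count greater than 31, can be sketched in Python. The name `pslld` is an assumption, as is the representation: the first operand is a list of four doublewords (index 0 is the low-order doubleword) and the shift count is passed as a plain non-negative integer, standing in for either the immediate byte or the count read from the second source operand.

```python
def pslld(xmm, count):
    """Shift each 32-bit lane left by count bits, filling with zeros."""
    assert len(xmm) == 4 and count >= 0
    assert all(0 <= v <= 0xFFFFFFFF for v in xmm)
    if count > 31:
        return [0, 0, 0, 0]                       # destination cleared to all 0s
    return [(d << count) & 0xFFFFFFFF for d in xmm]   # bits shifted out are lost


src = [0x00000001, 0x80000000, 0xFFFFFFFF, 0x00000003]
by_one = pslld(src, 1)
by_32 = pslld(src, 32)
print([hex(d) for d in by_one], by_32)
```

Each lane is shifted independently, so the bit shifted out of one doubleword does not carry into the next.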

Mnemonic Opcode Description

PSLLD xmm1, xmm2/mem128 66 0F F2 /r Left-shifts packed doublewords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location.

PSLLD xmm, imm8 66 0F 72 /6 ib Left-shifts packed doublewords in an XMM register by the amount specified in an immediate byte value.

[Figure pslld-128.eps: each doubleword of xmm1 (bits 127-96, 95-64, 63-32, 31-0) is shifted left by the count in xmm2/mem128 or, in the immediate form, by imm8 (bits 7-0).]


Related Instructions

PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSLLDQ Packed Shift Left Logical Double Quadword

Left-shifts the 128-bit (double quadword) value in an XMM register by the number of bytes specified in an immediate byte value. The low-order bytes that are emptied by the shift operation are cleared to 0. If the shift value is greater than 15, the destination XMM register is cleared to all 0s.

The PSLLDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
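Unlike the other packed shifts, the count here is in bytes and applies to the whole register. A minimal Python sketch, assuming the 128-bit register is held in one integer and using the hypothetical helper name `pslldq`:

```python
MASK128 = (1 << 128) - 1

def pslldq(value, imm8):
    """Model PSLLDQ: shift a 128-bit value left by imm8 bytes."""
    if imm8 > 15:
        return 0                     # a count above 15 clears the register
    return (value << (8 * imm8)) & MASK128

shifted = pslldq(0x00112233445566778899AABBCCDDEEFF, 4)
```

With a count of 4, the four high-order bytes are discarded and four zero bytes enter at the low end.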

Related Instructions

PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PSLLDQ xmm, imm8 66 0F 73 /7 ib Left-shifts double quadword value in an XMM register by the amount specified in an immediate byte value.

[Figure pslldq.eps: the full 128-bit value in xmm (bits 127-0) is shifted left by the byte count in imm8 (bits 7-0).]


Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.


PSLLQ Packed Shift Left Logical Quadwords

Left-shifts each 64-bit value in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding quadword of the destination (first source). The first source/destination and second source operands are:

an XMM register and another XMM register or 128-bit memory location, or

an XMM register and an immediate byte value.

The low-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 63, the destination is cleared to all 0s.

The PSLLQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic Opcode Description

PSLLQ xmm1, xmm2/mem128 66 0F F3 /r Left-shifts packed quadwords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location.

PSLLQ xmm, imm8 66 0F 73 /6 ib Left-shifts packed quadwords in an XMM register by the amount specified in an immediate byte value.
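The register form takes its count from the low 64 bits of the second operand, as the table above states. The following Python sketch models that form; the name `psllq_reg` and the single-integer encoding of the 128-bit count operand are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1

def psllq_reg(lanes, src128):
    """Model the register form of PSLLQ on two 64-bit lanes."""
    count = src128 & MASK64          # only the low 64 bits supply the count
    if count > 63:
        return [0, 0]                # a count above 63 clears the destination
    return [(x << count) & MASK64 for x in lanes]

# The high quadword of the count operand (5 here) does not affect the result.
shifted = psllq_reg([1, 3], (5 << 64) | 2)
```

Both quadwords are shifted by the same count; there is no per-lane count in this instruction.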

[Figure psllq-128.eps: each quadword of xmm1 (bits 127-64, 63-0) is shifted left by the count in xmm2/mem128 or, in the immediate form, by imm8 (bits 7-0).]


Related Instructions

PSLLD, PSLLDQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSLLW Packed Shift Left Logical Words

Left-shifts each of the packed 16-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding word of the destination (first source). The first source/destination and second source operands are:

an XMM register and another XMM register or 128-bit memory location, or

an XMM register and an immediate byte value.

The low-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 15, the destination is cleared to all 0s.

The PSLLW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
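One way to read the rule above: shifting a word left by n is multiplication by 2**n, reduced modulo 2**16. The Python sketch below checks that equivalence; `psllw` is a hypothetical helper name and the eight-element list (index 0 holds bits 15-0) is an assumed representation.

```python
def psllw(lanes, count):
    """Model PSLLW on eight 16-bit lanes given as unsigned integers."""
    if count > 15:
        return [0] * 8               # a count above 15 clears every word
    return [(x << count) & 0xFFFF for x in lanes]

words = [1, 2, 3, 0x8000, 0xFFFF, 0, 7, 0x4000]
doubled = psllw(words, 1)
# Compare against multiplication by 2, keeping only the low 16 bits.
same_as_multiply = doubled == [(x * 2) % 0x10000 for x in words]
```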

Mnemonic Opcode Description

PSLLW xmm1, xmm2/mem128 66 0F F1 /r Left-shifts packed words in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location.

PSLLW xmm, imm8 66 0F 71 /6 ib Left-shifts packed words in an XMM register by the amount specified in an immediate byte value.


[Figure psllw-128.eps: each word of xmm1 (bits 127-112 through 15-0) is shifted left by the count in xmm2/mem128 or, in the immediate form, by imm8 (bits 7-0).]

Related Instructions

PSLLD, PSLLDQ, PSLLQ, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSRAD Packed Shift Right Arithmetic Doublewords

Right-shifts each of the packed 32-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding doubleword of the destination (first source). The first source/destination and second source operands are:

an XMM register and another XMM register or 128-bit memory location, or

an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are filled with the sign bit of the doubleword’s initial value. If the shift value is greater than 31, each doubleword in the destination is filled with the sign bit of the doubleword’s initial value.

The PSRAD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
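The sign-fill behavior can be sketched in Python, whose `>>` operator on negative integers is already an arithmetic shift. The helper name `psrad` and the use of unsigned 32-bit bit patterns for the lanes are assumptions of this sketch.

```python
def psrad(lanes, count):
    """Model PSRAD on four 32-bit lanes given as unsigned bit patterns."""
    count = min(count, 31)           # above 31, only copies of the sign bit remain
    out = []
    for x in lanes:
        signed = x - (1 << 32) if x & 0x80000000 else x
        out.append((signed >> count) & 0xFFFFFFFF)
    return out

result = psrad([0x80000000, 0x7FFFFFFF, 0xFFFFFFF0, 0x00000010], 4)
saturated = psrad([0x80000000, 0x7FFFFFFF, 0xFFFFFFF0, 0x00000010], 40)
```

In `saturated`, each negative lane becomes all 1s and each non-negative lane becomes all 0s, matching the greater-than-31 case in the text.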

Mnemonic Opcode Description

PSRAD xmm1, xmm2/mem128 66 0F E2 /r Right-shifts packed doublewords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location.

PSRAD xmm, imm8 66 0F 72 /4 ib Right-shifts packed doublewords in an XMM register by the amount specified in an immediate byte value.


[Figure psrad-128.eps: each doubleword of xmm1 (bits 127-96, 95-64, 63-32, 31-0) is shifted right by the count in xmm2/mem128 or, in the immediate form, by imm8 (bits 7-0).]

Related Instructions

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSRAW Packed Shift Right Arithmetic Words

Right-shifts each of the packed 16-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding word of the destination (first source). The first source/destination and second source operands are:

an XMM register and another XMM register or 128-bit memory location, or

an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are filled with the sign bit of the word’s initial value. If the shift value is greater than 15, each word in the destination is filled with the sign bit of the word’s initial value.

The PSRAW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
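A consequence of sign filling is that the shift rounds toward negative infinity rather than toward zero, so -7 shifted right by 1 gives -4. The Python sketch below shows this on 16-bit patterns; `to_signed16` and `psraw` are hypothetical helper names.

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement signed value."""
    return x - 0x10000 if x & 0x8000 else x

def psraw(lanes, count):
    """Model PSRAW on eight 16-bit lanes given as unsigned bit patterns."""
    count = min(count, 15)           # above 15, only copies of the sign bit remain
    return [(to_signed16(x) >> count) & 0xFFFF for x in lanes]

# 0xFFF9 is -7, 0x8000 is -32768.
halved = psraw([0xFFF9, 0x0007, 0x8000, 0x7FFF, 0, 0, 0, 0], 1)
as_signed = [to_signed16(x) for x in halved]
```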

Mnemonic Opcode Description

PSRAW xmm1, xmm2/mem128 66 0F E1 /r Right-shifts packed words in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location.

PSRAW xmm, imm8 66 0F 71 /4 ib Right-shifts packed words in an XMM register by the amount specified in an immediate byte value.


[Figure psraw-128.eps: each word of xmm1 (bits 127-112 through 15-0) is shifted right arithmetically by the count in xmm2/mem128 or, in the immediate form, by imm8 (bits 7-0).]

Related Instructions

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRLD, PSRLDQ, PSRLQ, PSRLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSRLD Packed Shift Right Logical Doublewords

Right-shifts each of the packed 32-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding doubleword of the destination (first source). The first source/destination and second source operands are:

an XMM register and another XMM register or 128-bit memory location, or

an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 31, the destination is cleared to 0.

The PSRLD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
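Because the vacated bits are always 0, a lane with its top bit set does not stay "negative" after the shift. A minimal Python sketch, with `psrld` as an assumed helper name operating on unsigned 32-bit lane values:

```python
def psrld(lanes, count):
    """Model PSRLD on four 32-bit lanes given as unsigned integers."""
    if count > 31:
        return [0, 0, 0, 0]          # a count above 31 clears the destination
    # Zero fill from the top: a plain shift of the unsigned pattern.
    return [(x & 0xFFFFFFFF) >> count for x in lanes]

result = psrld([0x80000000, 0xFFFFFFFF, 0x12345678, 0x00000001], 4)
```

The first lane becomes 0x08000000 here, with zeros rather than sign bits in the top four positions.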

Mnemonic Opcode Description

PSRLD xmm1, xmm2/mem128 66 0F D2 /r Right-shifts packed doublewords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location.

PSRLD xmm, imm8 66 0F 72 /2 ib Right-shifts packed doublewords in an XMM register by the amount specified in an immediate byte value.


[Figure psrld-128.eps: each doubleword of xmm1 (bits 127-96, 95-64, 63-32, 31-0) is shifted right by the count in xmm2/mem128 or, in the immediate form, by imm8 (bits 7-0).]

Related Instructions

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLDQ, PSRLQ, PSRLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSRLDQ Packed Shift Right Logical Double Quadword

Right-shifts the 128-bit (double quadword) value in an XMM register by the number of bytes specified in an immediate byte value. The high-order bytes that are emptied by the shift operation are cleared to 0. If the shift value is greater than 15, the destination XMM register is cleared to all 0s.

The PSRLDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
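A byte count of 8 moves the high quadword into the low quadword and zeroes the top half. The Python sketch below models this, assuming the register is held in a single integer; `psrldq` is a hypothetical helper name.

```python
MASK128 = (1 << 128) - 1

def psrldq(value, imm8):
    """Model PSRLDQ: shift a 128-bit value right by imm8 bytes."""
    if imm8 > 15:
        return 0                     # a count above 15 clears the register
    return (value & MASK128) >> (8 * imm8)

high_qword = psrldq(0x00112233445566778899AABBCCDDEEFF, 8)
```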

Related Instructions

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLQ, PSRLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PSRLDQ xmm, imm8 66 0F 73 /3 ib Right-shifts double quadword value in an XMM register by the amount specified in an immediate byte value.

[Figure psrldq.eps: the full 128-bit value in xmm (bits 127-0) is shifted right by the byte count in imm8 (bits 7-0).]


Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.


PSRLQ Packed Shift Right Logical Quadwords

Right-shifts each 64-bit value in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding quadword of the destination (first source). The first source/destination and second source operands are:

an XMM register and another XMM register or 128-bit memory location, or

an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 63, the destination is cleared to 0.

The PSRLQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
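For example, a count of 32 leaves the upper doubleword of each quadword in the lower half of that quadword. A Python sketch under the assumption that the two lanes are unsigned 64-bit integers (`psrlq` is a hypothetical name):

```python
def psrlq(lanes, count):
    """Model PSRLQ on two 64-bit lanes given as unsigned integers."""
    if count > 63:
        return [0, 0]                # a count above 63 clears the destination
    return [(x & 0xFFFFFFFFFFFFFFFF) >> count for x in lanes]

upper_halves = psrlq([0x1122334455667788, 0xFFFFFFFF00000000], 32)
```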

Mnemonic Opcode Description

PSRLQ xmm1, xmm2/mem128 66 0F D3 /r Right-shifts packed quadwords in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location.

PSRLQ xmm, imm8 66 0F 73 /2 ib Right-shifts packed quadwords in an XMM register by the amount specified in an immediate byte value.


[Figure psrlq-128.eps: each quadword of xmm1 (bits 127-64, 63-0) is shifted right by the count in xmm2/mem128 or, in the immediate form, by imm8 (bits 7-0).]

Related Instructions

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLW

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSRLW Packed Shift Right Logical Words

Right-shifts each of the packed 16-bit values in the first source operand by the number of bits specified in the second source operand and writes each shifted value in the corresponding word of the destination (first source). The first source/destination and second source operands are:

an XMM register and another XMM register or 128-bit memory location, or

an XMM register and an immediate byte value.

The high-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater than 15, the destination is cleared to 0.

The PSRLW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
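A count of 8, for instance, isolates the high byte of every word. The Python sketch below models the rule; `psrlw` is a hypothetical helper name and the lanes are assumed to be unsigned 16-bit values.

```python
def psrlw(lanes, count):
    """Model PSRLW on eight 16-bit lanes given as unsigned integers."""
    if count > 15:
        return [0] * 8               # a count above 15 clears every word
    return [(x & 0xFFFF) >> count for x in lanes]

high_bytes = psrlw([0xABCD, 0x00FF, 0xFF00, 0x8000, 0x1234, 0, 1, 0xFFFF], 8)
```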

Mnemonic Opcode Description

PSRLW xmm1, xmm2/mem128 66 0F D1 /r Right-shifts packed words in an XMM register by the amount specified in the low 64 bits of an XMM register or 128-bit memory location.

PSRLW xmm, imm8 66 0F 71 /2 ib Right-shifts packed words in an XMM register by the amount specified in an immediate byte value.


[Figure psrlw-128.eps: each word of xmm1 (bits 127-112 through 15-0) is shifted right by the count in xmm2/mem128 or, in the immediate form, by imm8 (bits 7-0).]

Related Instructions

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSUBB Packed Subtract Bytes

Subtracts each packed 8-bit integer value in the second source operand from the corresponding packed 8-bit integer in the first source operand and writes the integer result of each subtraction in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each result are written in the destination.

The PSUBB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
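The wraparound rule means the same result bits serve both the signed and the unsigned interpretation. A minimal Python sketch, assuming sixteen byte lanes held as unsigned values; `psubb` and `as_signed8` are hypothetical helper names.

```python
def psubb(dst, src):
    """Model PSUBB: per-byte dst - src, keeping only the low 8 bits."""
    return [(a - b) & 0xFF for a, b in zip(dst, src)]

def as_signed8(x):
    """Interpret an 8-bit pattern as a two's-complement signed value."""
    return x - 0x100 if x & 0x80 else x

a = [0x00, 0x80, 0x05, 0x7F] + [0] * 12
b = [0x01, 0x01, 0x03, 0xFF] + [0] * 12
diff = psubb(a, b)
signed_view = [as_signed8(x) for x in diff[:4]]
```

In the fourth lane, 127 minus -1 does not fit in a signed byte, so the stored pattern 0x80 reads back as -128; no flag records the overflow.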

Related Instructions

PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW

rFLAGS Affected

None

Mnemonic Opcode Description

PSUBB xmm1, xmm2/mem128 66 0F F8 /r Subtracts packed byte integer values in an XMM register or 128-bit memory location from packed byte integer values in another XMM register and writes the result in the destination XMM register.

[Figure psubb-128.eps: each byte of xmm2/mem128 (bits 127-0) is subtracted from the corresponding byte of xmm1 (bits 127-0).]


MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSUBD Packed Subtract Doublewords

Subtracts each packed 32-bit integer value in the second source operand from the corresponding packed 32-bit integer in the first source operand and writes the integer result of each subtraction in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PSUBD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each result are written in the destination.
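One practical reading of the modular behavior: the difference of two free-running 32-bit counters stays correct even when the later value has wrapped past zero. The Python sketch below illustrates this; `psubd` and the counter scenario are assumptions chosen for the example.

```python
MASK32 = 0xFFFFFFFF

def psubd(dst, src):
    """Model PSUBD: per-doubleword dst - src, keeping only the low 32 bits."""
    return [(a - b) & MASK32 for a, b in zip(dst, src)]

# Hypothetical counter samples; the first lane wrapped between readings.
later = [0x00000005, 0x00000100, 0x80000000, 0x00000000]
earlier = [0xFFFFFFFB, 0x000000F0, 0x00000001, 0x00000001]
elapsed = psubd(later, earlier)
```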

Related Instructions

PSUBB, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PSUBD xmm1, xmm2/mem128 66 0F FA /r Subtracts packed 32-bit integer values in an XMM register or 128-bit memory location from packed 32-bit integer values in another XMM register and writes the result in the destination XMM register.

[Figure psubd-128.eps: each doubleword of xmm2/mem128 (bits 127-96, 95-64, 63-32, 31-0) is subtracted from the corresponding doubleword of xmm1.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


PSUBQ Packed Subtract Quadword

Subtracts each packed 64-bit integer value in the second source operand from the corresponding packed 64-bit integer in the first source operand and writes the integer result of each subtraction in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PSUBQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each result are written in the destination.
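The wraparound rule can be checked with a small scalar model. The sketch below is an illustration in Python, not an implementation of the instruction; the function name `psubq` and the choice to represent an operand as a two-element list of unsigned 64-bit lane values (index 0 holding bits 63-0) are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1


def psubq(dst, src):
    """Subtract each 64-bit lane of src from dst, keeping the low 64 bits."""
    return [(a - b) & MASK64 for a, b in zip(dst, src)]


# 0 - 1 and 5 - 7 both borrow out of bit 63; the borrow is discarded.
result = psubq([0, 5], [1, 7])
print([hex(lane) for lane in result])
```

Read as unsigned values the two lanes wrap to the top of the 64-bit range; read as signed values the same bit patterns are -1 and -2, which is why one instruction serves both interpretations.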

Related Instructions

PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic                   Opcode        Description
PSUBQ xmm1, xmm2/mem128    66 0F FB /r   Subtracts packed 64-bit integer values in an XMM register or 128-bit memory location from packed 64-bit integer values in another XMM register and writes the result in the destination XMM register.

[Figure psubq-128.eps: each quadword of xmm2/mem128 (bits 127-64, 63-0) is subtracted from the corresponding quadword of xmm1.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PSUBSB Packed Subtract Signed With Saturation Bytes

Subtracts each packed 8-bit signed integer value in the second source operand from the corresponding packed 8-bit signed integer in the first source operand and writes the signed integer result of each subtraction in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PSUBSB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

For each packed value in the destination, if the value is larger than the largest signed 8-bit integer, it is saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 80h.
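A scalar model makes the two saturation limits concrete. This is a sketch under stated assumptions rather than a definitive implementation: the function name `psubsb` is hypothetical, and each operand is modeled as a list of sixteen Python integers already interpreted as signed bytes (-128 to 127).

```python
def psubsb(dst, src):
    """Subtract signed bytes lane by lane and clamp to -128..127 (80h..7Fh)."""
    return [max(-128, min(127, a - b)) for a, b in zip(dst, src)]


pad = [0] * 13
dst = [100, -100, 10] + pad
src = [-100, 100, 3] + pad

# 100 - (-100) = 200 clamps to 127 (7Fh); -100 - 100 = -200 clamps to -128 (80h);
# 10 - 3 = 7 is in range and passes through unchanged.
print(psubsb(dst, src)[:3])
```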

Related Instructions

PSUBB, PSUBD, PSUBQ, PSUBSW, PSUBUSB, PSUBUSW, PSUBW

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic                    Opcode        Description
PSUBSB xmm1, xmm2/mem128    66 0F E8 /r   Subtracts packed byte signed integer values in an XMM register or 128-bit memory location from packed byte integer values in another XMM register and writes the result in the destination XMM register.

[Figure psubsb-128.eps: each of the 16 bytes of xmm2/mem128 (bits 127-0) is subtracted from the corresponding byte of xmm1, and each result is saturated.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PSUBSW Packed Subtract Signed With Saturation Words

Subtracts each packed 16-bit signed integer value in the second source operand from the corresponding packed 16-bit signed integer in the first source operand and writes the signed integer result of each subtraction in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

For each packed value in the destination, if the value is larger than the largest signed 16-bit integer, it is saturated to 7FFFh, and if the value is smaller than the smallest signed 16-bit integer, it is saturated to 8000h.
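The same behavior can be modeled on a whole 128-bit value to show how the eight word lanes stay independent. In this Python sketch the names `signed16` and `psubsw` are hypothetical, and each operand is assumed to be a non-negative integer below 2^128 whose bits 15-0 form lane 0.

```python
def signed16(x):
    """Reinterpret a 16-bit pattern as a signed value."""
    return x - 0x10000 if x & 0x8000 else x


def psubsw(dst, src):
    """Signed saturating subtract on eight 16-bit lanes of a 128-bit integer."""
    result = 0
    for lane in range(8):
        shift = 16 * lane
        a = signed16((dst >> shift) & 0xFFFF)
        b = signed16((src >> shift) & 0xFFFF)
        diff = max(-0x8000, min(0x7FFF, a - b))  # clamp to 8000h..7FFFh
        result |= (diff & 0xFFFF) << shift
    return result


# Lane 0: 7FFFh - FFFFh (32767 - (-1)) saturates to 7FFFh.
# Lane 1: 8000h - 0001h (-32768 - 1) saturates to 8000h.
print(hex(psubsw(0x80007FFF, 0x0001FFFF)))
```

Because each lane is masked and clamped separately, a saturating lane never borrows from or carries into its neighbor.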

The PSUBSW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBUSB, PSUBUSW, PSUBW

rFLAGS Affected

None

Mnemonic                    Opcode        Description
PSUBSW xmm1, xmm2/mem128    66 0F E9 /r   Subtracts packed 16-bit signed integer values in an XMM register or 128-bit memory location from packed 16-bit integer values in another XMM register and writes the result in the destination XMM register.

[Figure psubsw-128.eps: each of the eight words of xmm2/mem128 (bits 127-0) is subtracted from the corresponding word of xmm1, and each result is saturated.]

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PSUBUSB Packed Subtract Unsigned and Saturate Bytes

Subtracts each packed 8-bit unsigned integer value in the second source operand from the corresponding packed 8-bit unsigned integer in the first source operand and writes the unsigned integer result of each subtraction in the corresponding byte of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

For each packed value in the destination, if the value is larger than the largest unsigned 8-bit integer, it is saturated to FFh, and if the value is smaller than the smallest unsigned 8-bit integer, it is saturated to 00h.
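For unsigned subtraction only the lower limit can actually be reached, because the difference of two values in 0 to 255 never exceeds 255. The Python sketch below illustrates this; the function name `psubusb` is hypothetical and each operand is assumed to be a list of sixteen integers in the range 0 to 255.

```python
def psubusb(dst, src):
    """Subtract unsigned bytes lane by lane; negative results clamp to 00h."""
    # The FFh clamp is kept for symmetry with the text above, although
    # a - b cannot exceed 255 when both inputs are valid unsigned bytes.
    return [min(255, max(0, a - b)) for a, b in zip(dst, src)]


pad = [0] * 12
dst = [10, 200, 0, 255] + pad
src = [20, 100, 1, 0] + pad

# 10 - 20 and 0 - 1 would be negative, so those lanes become 0.
print(psubusb(dst, src)[:4])
```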

The PSUBUSB instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSW, PSUBW

rFLAGS Affected

None

Mnemonic                     Opcode        Description
PSUBUSB xmm1, xmm2/mem128    66 0F D8 /r   Subtracts packed byte unsigned integer values in an XMM register or 128-bit memory location from packed byte integer values in another XMM register and writes the result in the destination XMM register.

[Figure psubusb-128.eps: each of the 16 bytes of xmm2/mem128 (bits 127-0) is subtracted from the corresponding byte of xmm1, and each result is saturated.]

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PSUBUSW Packed Subtract Unsigned and Saturate Words

Subtracts each packed 16-bit unsigned integer value in the second source operand from the corresponding packed 16-bit unsigned integer in the first source operand and writes the unsigned integer result of each subtraction in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

For each packed value in the destination, if the value is larger than the largest unsigned 16-bit integer, it is saturated to FFFFh, and if the value is smaller than the smallest unsigned 16-bit integer, it is saturated to 0000h.
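One consequence of clamping at 0000h is that, for any pair of lanes, at least one of the two differences a - b and b - a saturates to zero. A common idiom, which the manual does not describe, uses this to form an unsigned absolute difference by ORing both results. The Python sketch below illustrates it; `psubusw` and `abs_diff_u16` are hypothetical names, and operands are assumed to be lists of eight integers in the range 0 to 65535.

```python
def psubusw(dst, src):
    """Subtract unsigned words lane by lane; negative results clamp to 0000h."""
    return [max(0, a - b) for a, b in zip(dst, src)]


def abs_diff_u16(a, b):
    """|a - b| per lane: one saturating difference is 0, the other is the answer."""
    return [x | y for x, y in zip(psubusw(a, b), psubusw(b, a))]


a = [1000, 5, 65535, 0, 7, 7, 7, 7]
b = [1, 1000, 0, 65535, 7, 8, 6, 0]
print(abs_diff_u16(a, b))
```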

The PSUBUSW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBW

rFLAGS Affected

None

Mnemonic                     Opcode        Description
PSUBUSW xmm1, xmm2/mem128    66 0F D9 /r   Subtracts packed 16-bit unsigned integer values in an XMM register or 128-bit memory location from packed 16-bit integer values in another XMM register and writes the result in the destination XMM register.

[Figure psubusw-128.eps: each of the eight words of xmm2/mem128 (bits 127-0) is subtracted from the corresponding word of xmm1, and each result is saturated.]

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PSUBW Packed Subtract Words

Subtracts each packed 16-bit integer value in the second source operand from the corresponding packed 16-bit integer in the first source operand and writes the integer result of each subtraction in the corresponding word of the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of each result are written in the destination. The result is not saturated.
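To see how discarding the carry differs from saturation, the Python sketch below computes one 16-bit lane three ways: with wraparound as PSUBW does, with signed saturation as PSUBSW does, and with unsigned saturation as PSUBUSW does. The function names are hypothetical and each argument is assumed to be a 16-bit pattern in the range 0 to FFFFh.

```python
def signed16(x):
    """Reinterpret a 16-bit pattern as a signed value."""
    return x - 0x10000 if x & 0x8000 else x


def psubw_lane(a, b):
    """Wraparound: keep only the low-order 16 bits."""
    return (a - b) & 0xFFFF


def psubsw_lane(a, b):
    """Signed saturation to 8000h..7FFFh."""
    return max(-0x8000, min(0x7FFF, signed16(a) - signed16(b))) & 0xFFFF


def psubusw_lane(a, b):
    """Unsigned saturation: negative results become 0000h."""
    return max(0, a - b)


for a, b in [(0x0001, 0x0002), (0x8000, 0x0001)]:
    print(hex(a), hex(b),
          hex(psubw_lane(a, b)), hex(psubsw_lane(a, b)), hex(psubusw_lane(a, b)))
```

For 0001h - 0002h the wraparound and signed results agree (FFFFh, that is, -1) while the unsigned result clamps to 0. For 8000h - 0001h the wraparound and unsigned results agree (7FFFh) while the signed result clamps to 8000h.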

The PSUBW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW

Mnemonic                   Opcode        Description
PSUBW xmm1, xmm2/mem128    66 0F F9 /r   Subtracts packed 16-bit integer values in an XMM register or 128-bit memory location from packed 16-bit integer values in another XMM register and writes the result in the destination XMM register.

[Figure psubw-128.eps: each of the eight words of xmm2/mem128 (bits 127-0) is subtracted from the corresponding word of xmm1.]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PUNPCKHBW Unpack and Interleave High Bytes

Unpacks the high-order bytes from the first and second source operands and packs them into interleaved bytes in the destination (first source). The low-order bytes of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

If the second source operand is all 0s, the destination contains the bytes from the first source operand zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned 16-bit operands for subsequent processing that requires higher precision.
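The interleaving order is easier to follow with numbered bytes. In the Python sketch below, which is an illustration rather than a definitive implementation, the function name `punpckhbw` is hypothetical and each operand is assumed to be a list of sixteen byte values with index 0 holding bits 7-0. Bytes from the first source land in the even positions of the result and bytes from the second source in the odd positions.

```python
def punpckhbw(dst, src):
    """Interleave bytes 8..15 of dst and src; bytes 0..7 are ignored."""
    out = []
    for a, b in zip(dst[8:], src[8:]):
        out += [a, b]
    return out


first = list(range(0x00, 0x10))   # bytes 00h..0Fh
second = list(range(0x10, 0x20))  # bytes 10h..1Fh
print([hex(x) for x in punpckhbw(first, second)])
```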

The PUNPCKHBW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

rFLAGS Affected

None

Mnemonic                       Opcode        Description
PUNPCKHBW xmm1, xmm2/mem128    66 0F 68 /r   Unpacks the eight high-order bytes in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved bytes in the destination XMM register.

[Figure punpckhbw-128.eps: the high-order bytes (bits 127-64) of xmm1 and xmm2/mem128 are copied into alternating bytes of the destination xmm1 (bits 127-0).]

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PUNPCKHDQ Unpack and Interleave High Doublewords

Unpacks the high-order doublewords from the first and second source operands and packs them into interleaved doublewords in the destination (first source). The low-order doublewords of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

If the second source operand is all 0s, the destination contains the doubleword(s) from the first source operand zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit values to unsigned 64-bit operands for subsequent processing that requires higher precision.
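The zero-extension case can be verified by reading the interleaved result back as quadwords. The Python sketch below is a minimal model under stated assumptions: `punpckhdq` and `as_qwords` are hypothetical names, and each operand is a list of four unsigned 32-bit values with index 0 holding bits 31-0.

```python
def punpckhdq(dst, src):
    """Interleave doublewords 2 and 3 of dst and src; 0 and 1 are ignored."""
    return [dst[2], src[2], dst[3], src[3]]


def as_qwords(dwords):
    """Reinterpret four doublewords as two quadwords (low doubleword first)."""
    return [dwords[0] | dwords[1] << 32, dwords[2] | dwords[3] << 32]


values = [1, 2, 0xDEADBEEF, 0xFFFFFFFF]
widened = as_qwords(punpckhdq(values, [0, 0, 0, 0]))
print([hex(q) for q in widened])
```

With an all-zero second source, each quadword of the result equals one of the two high-order doublewords of the first source, with its upper 32 bits clear.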

The PUNPCKHDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PUNPCKHBW, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

Mnemonic                       Opcode        Description
PUNPCKHDQ xmm1, xmm2/mem128    66 0F 6A /r   Unpacks two high-order doublewords in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved doublewords in the destination XMM register.

[Figure punpckhdq-128.eps: the two high-order doublewords (bits 127-96 and 95-64) of xmm1 and xmm2/mem128 are copied into alternating doublewords of the destination xmm1 (bits 127-0).]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PUNPCKHQDQ Unpack and Interleave High Quadwords

Unpacks the high-order quadwords from the first and second source operands and packs them into interleaved quadwords in the destination (first source). The first source/destination is an XMM register, and the second source operand is another XMM register or 128-bit memory location. The low-order quadwords of the source operands are ignored.

If the second source operand is all 0s, the destination contains the quadword from the first source operand zero-extended to 128 bits. This operation is useful for expanding unsigned 64-bit values to unsigned 128-bit operands for subsequent processing that requires higher precision.
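With only two lanes per operand, the operation reduces to two shifts when the registers are modeled as integers. In this Python sketch the function name `punpckhqdq` is hypothetical and each operand is assumed to be a non-negative integer below 2^128.

```python
def punpckhqdq(dst, src):
    """High quadword of dst goes to bits 63-0, high quadword of src to bits 127-64."""
    return ((src >> 64) << 64) | (dst >> 64)


first = (0xAAAA << 64) | 0x1111
second = (0xBBBB << 64) | 0x2222

print(hex(punpckhqdq(first, second)))  # low quadwords 1111h and 2222h are dropped
print(hex(punpckhqdq(first, 0)))       # zero second source: high quadword zero-extended
```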

The PUNPCKHQDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

Mnemonic                        Opcode        Description
PUNPCKHQDQ xmm1, xmm2/mem128    66 0F 6D /r   Unpacks high-order quadwords in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved quadwords in the destination XMM register.

[Figure punpckhqdq.eps: the high-order quadword (bits 127-64) of xmm1 is copied to bits 63-0 of the destination xmm1, and the high-order quadword of xmm2/mem128 is copied to bits 127-64.]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PUNPCKHWD Unpack and Interleave High Words

Unpacks the high-order words from the first and second source operands and packs them into interleaved words in the destination (first source). The low-order words of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

If the second source operand is all 0s, the destination contains the words from the first source operand zero-extended to 32 bits. This operation is useful for expanding unsigned 16-bit values to unsigned 32-bit operands for subsequent processing that requires higher precision.
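The high-unpack instructions share one pattern: take the upper half of the lanes from each source and alternate them. The Python sketch below expresses that pattern once, with the lane count taken from the length of the input list; passing eight word values models PUNPCKHWD. The helper name `unpack_high` is hypothetical and is not part of any instruction set or library.

```python
def unpack_high(dst, src):
    """Interleave the upper half of the lanes of dst and src (index 0 = lowest lane)."""
    half = len(dst) // 2
    out = []
    for a, b in zip(dst[half:], src[half:]):
        out.extend((a, b))
    return out


# Eight 16-bit lanes per operand: the PUNPCKHWD case.
words = [0, 1, 2, 3, 4, 5, 6, 7]
other = [10, 11, 12, 13, 14, 15, 16, 17]
print(unpack_high(words, other))
```

Calling the same helper with sixteen, four, or two lanes gives the byte, doubleword, and quadword variants of the model.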

The PUNPCKHWD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

Mnemonic                       Opcode        Description
PUNPCKHWD xmm1, xmm2/mem128    66 0F 69 /r   Unpacks four high-order words in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved words in the destination XMM register.

[Figure punpckhwd-128.eps: the four high-order words (bits 127-64) of xmm1 and xmm2/mem128 are copied into alternating words of the destination xmm1 (bits 127-0).]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PUNPCKLBW Unpack and Interleave Low Bytes

Unpacks the low-order bytes from the first and second source operands and packs them into interleaved bytes in the destination (first source). The high-order bytes of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

If the second source operand is all 0s, the destination contains the bytes from the first source operand zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned 16-bit operands for subsequent processing that requires higher precision.
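The widening use described above can be demonstrated by interleaving with zeros and then reading the 16 result bytes as eight little-endian words. The Python sketch below does this with the standard `struct` module; the function name `punpcklbw` is hypothetical, and each operand is assumed to be a 16-byte `bytes` object whose first byte holds bits 7-0.

```python
import struct


def punpcklbw(dst: bytes, src: bytes) -> bytes:
    """Interleave bytes 0..7 of dst and src; bytes 8..15 are ignored."""
    out = bytearray()
    for a, b in zip(dst[:8], src[:8]):
        out += bytes((a, b))
    return bytes(out)


samples = bytes([0, 1, 127, 128, 255, 16, 32, 64]) + bytes(8)
zero = bytes(16)

# Each source byte becomes the low byte of a word whose high byte is 0.
words = struct.unpack("<8H", punpcklbw(samples, zero))
print(words)
```

The value 255 stays 255 after widening, which confirms that the extension is unsigned: no sign bit is propagated into the upper byte.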

The PUNPCKLBW instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLDQ, PUNPCKLQDQ, PUNPCKLWD

Mnemonic                       Opcode        Description
PUNPCKLBW xmm1, xmm2/mem128    66 0F 60 /r   Unpacks the eight low-order bytes in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved bytes in the destination XMM register.

[Figure punpcklbw-128.eps: the low-order bytes (bits 63-0) of xmm1 and xmm2/mem128 are copied into alternating bytes of the destination xmm1 (bits 127-0).]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PUNPCKLDQ Unpack and Interleave Low Doublewords

Unpacks the low-order doublewords from the first and second source operands and packs them into interleaved doublewords in the destination (first source). The high-order doublewords of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

If the second source operand is all 0s, the destination contains the doubleword(s) from the first source operand zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit values to unsigned 64-bit operands for subsequent processing that requires higher precision.
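A four-lane model shows which doublewords survive. The Python sketch below is illustrative only: the function name `punpckldq` is hypothetical, and each operand is assumed to be a list of four unsigned 32-bit values with index 0 holding bits 31-0.

```python
def punpckldq(dst, src):
    """Interleave doublewords 0 and 1 of dst and src; 2 and 3 are ignored."""
    return [dst[0], src[0], dst[1], src[1]]


first = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
second = [0xAAAAAAAA, 0xBBBBBBBB, 0xCCCCCCCC, 0xDDDDDDDD]

# Only the 1111..., 2222..., AAAA..., and BBBB... lanes appear in the result.
print([hex(x) for x in punpckldq(first, second)])
```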

The PUNPCKLDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLQDQ, PUNPCKLWD

Mnemonic                       Opcode        Description
PUNPCKLDQ xmm1, xmm2/mem128    66 0F 62 /r   Unpacks two low-order doublewords in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved doublewords in the destination XMM register.

[Figure punpckldq-128.eps: the two low-order doublewords (bits 63-32 and 31-0) of xmm1 and xmm2/mem128 are copied into alternating doublewords of the destination xmm1 (bits 127-0).]

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

PUNPCKLQDQ Unpack and Interleave Low Quadwords

Unpacks the low-order quadwords from the first and second source operands and packs them into interleaved quadwords in the destination (first source). The first source/destination is an XMM register, and the second source operand is another XMM register or 128-bit memory location. The high-order quadwords of the source operands are ignored.

If the second source operand is all 0s, the destination contains the quadword from the first source operand zero-extended to 128 bits. This operation is useful for expanding unsigned 64-bit values to unsigned 128-bit operands for subsequent processing that requires higher precision.

The PUNPCKLQDQ instruction is an SSE2 instruction. The presence of thisinstruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
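The interleave described above can be sketched as a small software model. This is an illustrative sketch only, not part of the instruction definition: the function name `punpcklqdq` and the choice to hold each 128-bit operand as a Python integer are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1  # selects bits 63-0 of a 128-bit value


def punpcklqdq(dst: int, src: int) -> int:
    """Model PUNPCKLQDQ on 128-bit operands held as Python integers.

    dst is the first source/destination, src is the second source.
    The high-order quadwords of both operands are ignored.
    """
    low_dst = dst & MASK64  # low quadword of the first source
    low_src = src & MASK64  # low quadword of the second source
    # Result: bits 63-0 come from the first source,
    # bits 127-64 come from the second source.
    return (low_src << 64) | low_dst
```

With a second source of all 0s, the model returns the low quadword of the first source zero-extended to 128 bits, matching the zero-extension use described above.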

Related Instructions

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD

Mnemonic Opcode Description

PUNPCKLQDQ xmm1, xmm2/mem128 66 0F 6C /r Unpacks low-order quadwords in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved quadwords in the destination XMM register.

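[Figure punpcklqdq.eps: the low-order quadword (bits 63–0) of xmm1 is copied to the low quadword of the destination, and the low-order quadword of xmm2/mem128 is copied to the high quadword.]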


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


PUNPCKLWD Unpack and Interleave Low Words

Unpacks the low-order words from the first and second source operands and packs them into interleaved words in the destination (first source). The high-order words of the source operands are ignored. The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

If the second source operand is all 0s, the destination contains the words from the first source operand zero-extended to 32 bits. This operation is useful for expanding unsigned 16-bit values to unsigned 32-bit operands for subsequent processing that requires higher precision.

The PUNPCKLWD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
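The word interleave and the zero-extension use case can be illustrated with a short software model. This is a sketch under stated assumptions: the names `punpcklwd` and `as_doublewords` are hypothetical helpers, and each 128-bit operand is represented as a list of eight 16-bit integers with index 0 holding bits 15–0.

```python
def punpcklwd(dst_words, src_words):
    """Interleave the four low-order words of two 8-word operands.

    Each operand is a list of eight 16-bit integers, index 0 = bits 15-0.
    """
    result = []
    for i in range(4):               # words 4..7 of both sources are ignored
        result.append(dst_words[i])  # even result word: from the first source
        result.append(src_words[i])  # odd result word: from the second source
    return result


def as_doublewords(words):
    """Reinterpret eight 16-bit words as four 32-bit doublewords."""
    return [words[i] | (words[i + 1] << 16) for i in range(0, 8, 2)]
```

Passing a second operand of all 0s and viewing the result with `as_doublewords` yields the four low-order words of the first operand, each zero-extended to 32 bits.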

Related Instructions

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ

Mnemonic Opcode Description

PUNPCKLWD xmm1, xmm2/mem128 66 0F 61 /r Unpacks the four low-order words in an XMM register and another XMM register or 128-bit memory location and packs them into interleaved words in the destination XMM register.

[Figure punpcklwd-128.eps: the four low-order words (bits 63–0) of xmm1 and of xmm2/mem128 are copied into interleaved words of the 128-bit destination.]


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


PXOR Packed Logical Bitwise Exclusive OR

Performs a bitwise exclusive OR of the values in the first and second source operands and writes the result in the destination (first source). The first source/destination operand is an XMM register and the second source operand is another XMM register or 128-bit memory location.

The PXOR instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
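As a minimal sketch of the operation, the 128-bit exclusive OR can be modeled on Python integers. The function name `pxor` and the integer representation of the operands are assumptions made for illustration.

```python
MASK128 = (1 << 128) - 1  # keeps results within 128 bits


def pxor(dst: int, src: int) -> int:
    """Model PXOR: bitwise exclusive OR of two 128-bit operands."""
    return (dst ^ src) & MASK128
```

Two properties follow directly from the definition of exclusive OR: XORing a value with itself gives all 0s, and XORing twice with the same second operand restores the original value.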

Related Instructions

PAND, PANDN, POR

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

PXOR xmm1, xmm2/mem128 66 0F EF /r Performs bitwise logical XOR of values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure pxor-128.eps: bits 127–0 of xmm1 are XORed with bits 127–0 of xmm2/mem128.]


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.

Device not available, #NM

X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


RCPPS Reciprocal Packed Single-Precision Floating-Point

Computes the approximate reciprocal of each of the four packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding doubleword of another XMM register. The rounding control bits (RC) in the MXCSR register have no effect on the result.

The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal. A source value that is ±zero or denormal returns an infinity of the source value’s sign. Results that underflow are changed to signed zero. For both SNaN and QNaN source operands, a QNaN is returned.

The RCPPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
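The error bound and the special-case results stated above can be expressed as a small check. This is a sketch for illustration only: the helper names are hypothetical, Python floats (double precision) stand in for single-precision values, and the threshold 2^-126 is the smallest normal single-precision magnitude, used here to classify zero and denormal inputs.

```python
import math

MAX_REL_ERROR = 1.5 * 2.0 ** -12   # bound quoted in the text above
MIN_NORMAL_F32 = 2.0 ** -126       # smallest normal single-precision magnitude


def rcp_special_case(x: float):
    """Return the documented result for NaN, zero, or denormal inputs, else None."""
    if math.isnan(x):
        return math.nan                       # SNaN or QNaN source gives a QNaN
    if abs(x) < MIN_NORMAL_F32:
        return math.copysign(math.inf, x)     # infinity with the source's sign
    return None


def within_rcp_bound(x: float, approx: float) -> bool:
    """Check that approx lies within the stated bound of the true reciprocal of x."""
    exact = 1.0 / x
    return abs(approx - exact) <= MAX_REL_ERROR * abs(exact)
```

For example, 0.3333 is an acceptable approximation of 1/3 under this bound, while 0.33 is not.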

Related Instructions

RCPSS, RSQRTPS, RSQRTSS

Mnemonic Opcode Description

RCPPS xmm1, xmm2/mem128 0F 53 /r Computes reciprocals of packed single-precision floating-point values in an XMM register or 128-bit memory location and writes result in the destination XMM register.

[Figure rcpps.eps: the reciprocal of each of the four doublewords of xmm2/mem128 is written to the corresponding doubleword of xmm1.]


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


RCPSS Reciprocal Scalar Single-Precision Floating-Point

Computes the approximate reciprocal of the low-order single-precision floating-point value in an XMM register or in a 32-bit memory location and writes the result in the low-order doubleword of another XMM register. The three high-order doublewords in the destination XMM register are not modified. The rounding control bits (RC) in the MXCSR register have no effect on the result.

The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal. A source value that is ±zero or denormal returns an infinity of the source value’s sign. Results that underflow are changed to signed zero. For both SNaN and QNaN source operands, a QNaN is returned.

The RCPSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

RCPPS, RSQRTPS, RSQRTSS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

RCPSS xmm1, xmm2/mem32 F3 0F 53 /r Computes reciprocal of scalar single-precision floating-point value in an XMM register or 32-bit memory location and writes the result in the destination XMM register.

[Figure rcpss.eps: the reciprocal of bits 31–0 of xmm2/mem32 is written to bits 31–0 of xmm1.]

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.


RSQRTPS Reciprocal Square Root Packed Single-Precision Floating-Point

Computes the approximate reciprocal of the square root of each of the four packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding doubleword of another XMM register. The rounding control bits (RC) in the MXCSR register have no effect on the result.

The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal square root. A source value that is ±zero or denormal returns an infinity of the source value’s sign. Negative source values other than –zero and –denormal return a QNaN floating-point indefinite value (see “Indefinite Values” in Volume 1). For both SNaN and QNaN source operands, a QNaN is returned.

The RSQRTPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
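The special-case rules above can be summarized in a per-element reference model. This is a hedged sketch, not the hardware algorithm: the function name `rsqrt_model` is hypothetical, Python floats stand in for single-precision values, and for ordinary inputs the model returns the exact reciprocal square root rather than the approximation the instruction produces.

```python
import math

MIN_NORMAL_F32 = 2.0 ** -126  # smallest normal single-precision magnitude


def rsqrt_model(x: float) -> float:
    """Per-element model of the documented RSQRTPS special cases."""
    if math.isnan(x):
        return math.nan                      # SNaN or QNaN source gives a QNaN
    if abs(x) < MIN_NORMAL_F32:
        return math.copysign(math.inf, x)    # ±zero or denormal: signed infinity
    if x < 0.0:
        return math.nan                      # other negative values: QNaN indefinite
    return 1.0 / math.sqrt(x)                # exact value; hardware is approximate


def rsqrtps_model(src):
    """Apply the model to four packed elements (index 0 = bits 31-0)."""
    return [rsqrt_model(v) for v in src]
```

Note the ordering of the checks: zero and denormal inputs are classified before the negative-value test, so that –zero and –denormal produce –infinity rather than a QNaN.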

Related Instructions

RSQRTSS, SQRTPD, SQRTPS, SQRTSD, SQRTSS

Mnemonic Opcode Description

RSQRTPS xmm1, xmm2/mem128 0F 52 /r Computes reciprocals of square roots of packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure rsqrtps.eps: the reciprocal square root of each of the four doublewords of xmm2/mem128 is written to the corresponding doubleword of xmm1.]


rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


RSQRTSS Reciprocal Square Root Scalar Single-Precision Floating-Point

Computes the approximate reciprocal of the square root of the low-order single-precision floating-point value in an XMM register or in a 32-bit memory location and writes the result in the low-order doubleword of another XMM register. The three high-order doublewords in the destination XMM register are not modified. The rounding control bits (RC) in the MXCSR register have no effect on the result.

The maximum error is less than or equal to 1.5 * 2^-12 times the true reciprocal square root. A source value that is ±zero or denormal returns an infinity of the source value’s sign. Negative source values other than –zero and –denormal return a QNaN floating-point indefinite value (see “Indefinite Values” in Volume 1). For both SNaN and QNaN source operands, a QNaN is returned.

The RSQRTSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

RSQRTPS, SQRTPD, SQRTPS, SQRTSD, SQRTSS

rFLAGS Affected

None

Mnemonic Opcode Description

RSQRTSS xmm1, xmm2/mem32 F3 0F 52 /r Computes reciprocal of square root of single-precision floating-point value in an XMM register or 32-bit memory location and writes the result in the destination XMM register.

[Figure rsqrtss.eps: the reciprocal square root of bits 31–0 of xmm2/mem32 is written to bits 31–0 of xmm1.]


MXCSR Flags Affected

None

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.
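The #UD and #NM conditions in the table above can be read as a simple predicate over the CPUID feature word and the control registers. The sketch below is illustrative only: the function name `sse_fault` is hypothetical, the bit positions for CR0.EM (bit 2), CR0.TS (bit 3) and CR4.OSFXSR (bit 9) are taken from the general x86 architecture definition rather than from this page, and reporting #UD ahead of #NM when both apply is an assumption of the model.

```python
CPUID_EDX_SSE = 1 << 25   # EDX bit 25 of CPUID standard function 1
CR0_EM = 1 << 2           # emulate bit (assumed position: CR0 bit 2)
CR0_TS = 1 << 3           # task-switch bit (assumed position: CR0 bit 3)
CR4_OSFXSR = 1 << 9       # FXSAVE/FXRSTOR support bit (assumed position: CR4 bit 9)


def sse_fault(cpuid_edx: int, cr0: int, cr4: int):
    """Return "#UD", "#NM", or None for an SSE instruction such as RSQRTSS."""
    if not cpuid_edx & CPUID_EDX_SSE:
        return "#UD"      # SSE instructions not supported
    if cr0 & CR0_EM:
        return "#UD"      # emulate bit set to 1
    if not cr4 & CR4_OSFXSR:
        return "#UD"      # OSFXSR cleared to 0
    if cr0 & CR0_TS:
        return "#NM"      # task-switch bit set to 1
    return None           # none of these conditions applies
```

For an SSE2 instruction, the same structure would apply with EDX bit 26 in place of bit 25.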


SHUFPD Shuffle Packed Double-Precision Floating-Point

Moves either of the two packed double-precision floating-point values in the first source operand to the low-order quadword of the destination (first source) and moves either of the two packed double-precision floating-point values in the second source operand to the high-order quadword of the destination. In each case, the value of the destination quadword is determined by the least-significant two bits in the immediate-byte operand, as shown in Table 1-7. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The SHUFPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
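The selection rule can be sketched as a short model. This is an illustration under stated assumptions: the function name `shufpd` is hypothetical, and each operand is represented as a two-element list with index 0 holding bits 63–0 and index 1 holding bits 127–64.

```python
def shufpd(dst, src, imm8: int):
    """Model SHUFPD on operands given as [low_quadword, high_quadword]."""
    low = dst[imm8 & 1]            # imm8 bit 0 selects from the first source
    high = src[(imm8 >> 1) & 1]    # imm8 bit 1 selects from the second source
    return [low, high]             # bits 7-2 of imm8 are not used by the model
```

For example, an immediate of 1 places the high quadword of the first source in the low half of the result and the low quadword of the second source in the high half.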

Mnemonic Opcode Description

SHUFPD xmm1, xmm2/mem128, imm8 66 0F C6 /r ib Shuffles packed double-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and puts the result in the destination XMM register.

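[Figure shufpd.eps: two multiplexers, controlled by the imm8 operand, select one quadword from xmm1 and one quadword from xmm2/mem128 for the low and high quadwords of the destination.]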


Related Instructions

SHUFPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Table 1-7. Immediate-Byte Operand Encoding for SHUFPD

Destination   Immediate-Byte   Value of    Source 1     Source 2
Bits Filled   Bit Field        Bit Field   Bits Moved   Bits Moved
63–0          0                0           63–0         —
63–0          0                1           127–64       —
127–64        1                0           —            63–0
127–64        1                1           —            127–64

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


SHUFPS Shuffle Packed Single-Precision Floating-Point

Moves two of the four packed single-precision floating-point values in the first source operand to the low-order quadword of the destination (first source) and moves two of the four packed single-precision floating-point values in the second source operand to the high-order quadword of the destination. In each case, the value of the destination doubleword is determined by a two-bit field in the immediate-byte operand, as shown in Table 1-8. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The SHUFPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
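The immediate-byte decoding can be sketched as follows. This is an illustrative model only: the function name `shufps` is hypothetical, and each operand is represented as a four-element list with index 0 holding bits 31–0.

```python
def shufps(dst, src, imm8: int):
    """Model SHUFPS on operands given as four doublewords, index 0 = bits 31-0."""
    # Split imm8 into four 2-bit selectors: bits 1-0, 3-2, 5-4, 7-6.
    sel = [(imm8 >> (2 * i)) & 3 for i in range(4)]
    return [
        dst[sel[0]],   # result bits 31-0: chosen from the first source
        dst[sel[1]],   # result bits 63-32: chosen from the first source
        src[sel[2]],   # result bits 95-64: chosen from the second source
        src[sel[3]],   # result bits 127-96: chosen from the second source
    ]
```

When both operands are the same register, the instruction acts as a general permutation of that register; in this model, `shufps(v, v, 0x1B)` reverses the order of the four elements.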

Mnemonic Opcode Description

SHUFPS xmm1, xmm2/mem128, imm8 0F C6 /r ib Shuffles packed single-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and puts the result in the destination XMM register.

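[Figure shufps.eps: multiplexers, controlled by the imm8 operand, select two doublewords from xmm1 and two doublewords from xmm2/mem128 for the low and high quadwords of the destination.]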


Related Instructions

SHUFPD

rFLAGS Affected

None

MXCSR Flags Affected

None

Table 1-8. Immediate-Byte Operand Encoding for SHUFPS

Destination   Immediate-Byte   Value of    Source 1     Source 2
Bits Filled   Bit Field        Bit Field   Bits Moved   Bits Moved
31–0          1–0              0           31–0         —
31–0          1–0              1           63–32        —
31–0          1–0              2           95–64        —
31–0          1–0              3           127–96       —
63–32         3–2              0           31–0         —
63–32         3–2              1           63–32        —
63–32         3–2              2           95–64        —
63–32         3–2              3           127–96       —
95–64         5–4              0           —            31–0
95–64         5–4              1           —            63–32
95–64         5–4              2           —            95–64
95–64         5–4              3           —            127–96
127–96        7–6              0           —            31–0
127–96        7–6              1           —            63–32
127–96        7–6              2           —            95–64
127–96        7–6              3           —            127–96


Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.


SQRTPD Square Root Packed Double-Precision Floating-Point

Computes the square root of each of the two packed double-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding quadword of another XMM register. Taking the square root of +infinity returns +infinity.

The SQRTPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

RSQRTPS, RSQRTSS, SQRTPS, SQRTSD, SQRTSS

rFLAGS Affected

None

Mnemonic Opcode Description

SQRTPD xmm1, xmm2/mem128 66 0F 51 /r Computes square roots of packed double-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure sqrtpd.eps: the square root of each of the two quadwords of xmm2/mem128 is written to the corresponding quadword of xmm1.]


MXCSR Flags Affected

Flag   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
State                                                  M                   M    M
Bit    15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X A source operand was negative (not including –0).

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


SQRTPS Square Root Packed Single-Precision Floating-Point

Computes the square root of each of the four packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the corresponding doubleword of another XMM register. Taking the square root of +infinity returns +infinity.

The SQRTPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

RSQRTPS, RSQRTSS, SQRTPD, SQRTSD, SQRTSS

rFLAGS Affected

None

Mnemonic Opcode Description

SQRTPS xmm1, xmm2/mem128 0F 51 /r Computes square roots of packed single-precision floating-point values in an XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure sqrtps.eps: the square root of each of the four doublewords of xmm2/mem128 is written to the corresponding doubleword of xmm1.]


MXCSR Flags Affected

Flag   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
State                                                  M                   M    M
Bit    15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP

X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

X X X The memory operand was not aligned on a 16-byte boundary.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X A source operand was negative (not including –0).

Denormalized-operand exception (DE)

X X X A source operand was a denormal value.

Precision exception (PE) X X X A result could not be represented exactly in the destination format.


SQRTSD Square Root Scalar Double-Precision Floating-Point

Computes the square root of the low-order double-precision floating-point value in an XMM register or in a 64-bit memory location and writes the result in the low-order quadword of another XMM register. The high-order quadword of the destination XMM register is not modified. Taking the square root of +infinity returns +infinity.

The SQRTSD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Related Instructions

RSQRTPS, RSQRTSS, SQRTPD, SQRTPS, SQRTSS

rFLAGS Affected

None

Mnemonic Opcode Description

SQRTSD xmm1, xmm2/mem64 F2 0F 51 /r Computes square root of double-precision floating-point value in an XMM register or 64-bit memory location and writes the result in the destination XMM register.

[Figure sqrtsd.eps: the square root of bits 63–0 of xmm2/mem64 is written to bits 63–0 of xmm1.]


MXCSR Flags Affected

Flag   FZ   RC      PM   UM   OM   ZM   DM   IM   DAZ  PE   UE   OE   ZE   DE   IE
State                                                  M                   M    M
Bit    15   14 13   12   11   10   9    8    7    6    5    4    3    2    1    0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception Real Virtual-8086 Protected Cause of Exception

Invalid opcode, #UD

X X X The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.

X X X The emulate bit (EM) of CR0 was set to 1.

X X X The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.

Device not available, #NM X X X The task-switch bit (TS) of CR0 was set to 1.

Stack, #SS X X X A memory address exceeded the stack segment limit or was non-canonical.

General protection, #GP X X X A memory address exceeded a data segment limit or was non-canonical.

X A null data segment was used to reference memory.

Page fault, #PF X X A page fault resulted from the execution of the instruction.

Alignment check, #AC X X An unaligned memory reference was performed while alignment checking was enabled.

SIMD Floating-Point Exception, #XF

X X X There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1.See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions

Invalid-operation exception (IE)

X X X A source operand was an SNaN value.

X X X A source operand was negative (not including –0).

Denormalized-operand exception (DE)

X X XA source operand was a denormal value.

Precision exception (PE)X X X A result could not be represented exactly in the destination

format.

SQRTSS Square Root Scalar Single-Precision Floating-Point

Computes the square root of the low-order single-precision floating-point value in an XMM register or 32-bit memory location and writes the result in the low-order doubleword of another XMM register. The three high-order doublewords of the destination XMM register are not modified. Taking the square root of +infinity returns +infinity.

The SQRTSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic                      Opcode        Description
SQRTSS xmm1, xmm2/mem32       F3 0F 51 /r   Computes square root of single-precision floating-point value in an XMM register or 32-bit memory location and writes the result in the destination XMM register.

[Figure sqrtss.eps: square root of bits 31–0 of xmm2/mem32, written to bits 31–0 of xmm1.]

Related Instructions

RSQRTPS, RSQRTSS, SQRTPD, SQRTPS, SQRTSD

rFLAGS Affected

None

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Value                                             M               M   M
Bit       15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                             Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                   X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                      X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                      X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                      X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM             X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                            X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP               X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                          X          A null data segment was used to reference memory.
Page fault, #PF                             X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF    X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)      X     X             X          A source operand was an SNaN value.
                                      X     X             X          A source operand was negative (not including –0).
Denormalized-operand exception (DE)   X     X             X          A source operand was a denormal value.
Precision exception (PE)              X     X             X          A result could not be represented exactly in the destination format.

STMXCSR Store MXCSR Control/Status Register

Saves the contents of the MXCSR register in a 32-bit location in memory. The MXCSR register is described in “Registers” in Volume 1.

The STMXCSR instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
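STMXCSR only stores the register; interpreting the stored doubleword is left to software. The following is a minimal sketch of such decoding, assuming the bit layout shown in the MXCSR Flags Affected tables of this chapter (IE in bit 0 through FZ in bit 15). The macro and helper names are hypothetical. The sample value 1F80h is the MXCSR reset value (all six exceptions masked); 1FA1h is made up.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_STATUS_MASK UINT32_C(0x003F)  /* IE, DE, ZE, OE, UE, PE: bits 0-5 */
#define MXCSR_DAZ         UINT32_C(0x0040)  /* bit 6 */
#define MXCSR_EXC_MASKS   UINT32_C(0x1F80)  /* IM, DM, ZM, OM, UM, PM: bits 7-12 */
#define MXCSR_RC_SHIFT    13                /* RC occupies bits 14-13 */
#define MXCSR_FZ          UINT32_C(0x8000)  /* bit 15 */

/* Hypothetical helper: the six exception status flags, IE in bit 0. */
static unsigned mxcsr_status(uint32_t m)
{
    return (unsigned)(m & MXCSR_STATUS_MASK);
}

/* Hypothetical helper: the 2-bit rounding-control field. */
static unsigned mxcsr_rc(uint32_t m)
{
    return (unsigned)((m >> MXCSR_RC_SHIFT) & 3u);
}

/* Hypothetical helper: nonzero when all six exception mask bits are set. */
static int mxcsr_all_masked(uint32_t m)
{
    return (m & MXCSR_EXC_MASKS) == MXCSR_EXC_MASKS;
}

/* Usage example on values as they might be stored by STMXCSR. */
void mxcsr_decode_example(void)
{
    assert(mxcsr_all_masked(UINT32_C(0x1F80)));
    assert(mxcsr_rc(UINT32_C(0x1F80)) == 0);
    assert(mxcsr_status(UINT32_C(0x1FA1)) == 0x21);  /* PE and IE set */
}
```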

Mnemonic                      Opcode        Description
STMXCSR mem32                 0F AE /3      Stores contents of MXCSR in 32-bit memory location.

Related Instructions

LDMXCSR

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                             Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                   X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                      X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                      X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM             X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                            X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP               X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                          X          A null data segment was used to reference memory.
                                                          X          The destination operand was in a non-writable segment.
Page fault, #PF                             X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X             X          An unaligned memory reference was performed while alignment checking was enabled.

SUBPD Subtract Packed Double-Precision Floating-Point

Subtracts each packed double-precision floating-point value in the second source operand from the corresponding packed double-precision floating-point value in the first source operand and writes the result of each subtraction in the corresponding quadword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The SUBPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
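The data flow described above can be sketched as a small software model in portable C. This is only an illustration: the `xmm_pd` type and `model_subpd` function are hypothetical names, and MXCSR flag updates, rounding control, and exceptions are not modeled.

```c
#include <assert.h>

/* Hypothetical stand-in for an XMM register holding two doubles;
   q[0] is the low-order quadword, q[1] the high-order quadword. */
typedef struct { double q[2]; } xmm_pd;

/* Model of the SUBPD data flow: each quadword of src is subtracted
   from the corresponding quadword of dst, and dst receives the result. */
static void model_subpd(xmm_pd *dst, const xmm_pd *src)
{
    for (int i = 0; i < 2; ++i)
        dst->q[i] -= src->q[i];
}

/* Usage example with exactly representable values. */
void subpd_example(void)
{
    xmm_pd x1 = {{5.0, 1.5}};
    xmm_pd x2 = {{2.0, 0.5}};
    model_subpd(&x1, &x2);
    assert(x1.q[0] == 3.0 && x1.q[1] == 1.0);
}
```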

Mnemonic                      Opcode        Description
SUBPD xmm1, xmm2/mem128       66 0F 5C /r   Subtracts packed double-precision floating-point values in an XMM register or 128-bit memory location from packed double-precision floating-point values in another XMM register and writes the result in the destination XMM register.

[Figure subpd.eps: each quadword of xmm2/mem128 (bits 127–64 and 63–0) is subtracted from the corresponding quadword of xmm1.]

Related Instructions

SUBPS, SUBSD, SUBSS

rFLAGS Affected

None

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Value                                             M   M   M       M   M
Bit       15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                             Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                   X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                      X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                      X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                      X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM             X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                            X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP               X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                          X          A null data segment was used to reference memory.
                                      X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                             X             X          A page fault resulted from the execution of the instruction.
SIMD Floating-Point Exception, #XF    X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)      X     X             X          A source operand was an SNaN value.
                                      X     X             X          +infinity was subtracted from +infinity.
                                      X     X             X          –infinity was subtracted from –infinity.
Denormalized-operand exception (DE)   X     X             X          A source operand was a denormal value.
Overflow exception (OE)               X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)              X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)              X     X             X          A result could not be represented exactly in the destination format.

SUBPS Subtract Packed Single-Precision Floating-Point

Subtracts each packed single-precision floating-point value in the second source operand from the corresponding packed single-precision floating-point value in the first source operand and writes the result of each subtraction in the corresponding doubleword of the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The SUBPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic                      Opcode        Description
SUBPS xmm1, xmm2/mem128       0F 5C /r      Subtracts packed single-precision floating-point values in an XMM register or 128-bit memory location from packed single-precision floating-point values in another XMM register and writes the result in the destination XMM register.

[Figure subps.eps: each doubleword of xmm2/mem128 (bits 127–96, 95–64, 63–32, and 31–0) is subtracted from the corresponding doubleword of xmm1.]

Related Instructions

SUBPD, SUBSD, SUBSS

rFLAGS Affected

None

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Value                                             M   M   M       M   M
Bit       15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                             Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                   X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                      X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                      X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                      X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM             X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                            X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP               X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                          X          A null data segment was used to reference memory.
                                      X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                             X             X          A page fault resulted from the execution of the instruction.
SIMD Floating-Point Exception, #XF    X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)      X     X             X          A source operand was an SNaN value.
                                      X     X             X          +infinity was subtracted from +infinity.
                                      X     X             X          –infinity was subtracted from –infinity.
Denormalized-operand exception (DE)   X     X             X          A source operand was a denormal value.
Overflow exception (OE)               X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)              X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)              X     X             X          A result could not be represented exactly in the destination format.

SUBSD Subtract Scalar Double-Precision Floating-Point

Subtracts the double-precision floating-point value in the low-order quadword of the second source operand from the double-precision floating-point value in the low-order quadword of the first source operand and writes the result in the low-order quadword of the destination (first source). The high-order quadword of the destination is not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 64-bit memory location.

The SUBSD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic                      Opcode        Description
SUBSD xmm1, xmm2/mem64        F2 0F 5C /r   Subtracts low-order double-precision floating-point value in an XMM register or in a 64-bit memory location from low-order double-precision floating-point value in another XMM register and writes the result in the destination XMM register.

[Figure subsd.eps: bits 63–0 of xmm2/mem64 are subtracted from bits 63–0 of xmm1.]

Related Instructions

SUBPD, SUBPS, SUBSS

rFLAGS Affected

None

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Value                                             M   M   M       M   M
Bit       15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                             Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                   X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                      X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                      X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                      X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM             X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                            X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP               X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                          X          A null data segment was used to reference memory.
Page fault, #PF                             X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF    X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)      X     X             X          A source operand was an SNaN value.
                                      X     X             X          +infinity was subtracted from +infinity.
                                      X     X             X          –infinity was subtracted from –infinity.
Denormalized-operand exception (DE)   X     X             X          A source operand was a denormal value.
Overflow exception (OE)               X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)              X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)              X     X             X          A result could not be represented exactly in the destination format.

SUBSS Subtract Scalar Single-Precision Floating-Point

Subtracts the single-precision floating-point value in the low-order doubleword of the second source operand from the single-precision floating-point value in the low-order doubleword of the first source operand and writes the result in the low-order doubleword of the destination (first source). The three high-order doublewords of the destination are not modified. The first source/destination operand is an XMM register. The second source operand is another XMM register or 32-bit memory location.

The SUBSS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
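To contrast the scalar form with the packed forms, the sketch below models only the data flow described above: the low-order doubleword is replaced by the difference and the three high-order doublewords pass through unchanged. The `xmm_ps` type and `model_subss` function are hypothetical names; MXCSR flags and exceptions are not modeled.

```c
#include <assert.h>

/* Hypothetical stand-in for an XMM register holding four floats;
   d[0] is the low-order doubleword. */
typedef struct { float d[4]; } xmm_ps;

/* Model of the SUBSS data flow: only d[0] is written. */
static void model_subss(xmm_ps *dst, const xmm_ps *src)
{
    dst->d[0] -= src->d[0];
}

/* Usage example: the upper three doublewords of the destination survive. */
void subss_example(void)
{
    xmm_ps x1 = {{10.0f, 1.0f, 2.0f, 3.0f}};
    xmm_ps x2 = {{4.0f, 9.0f, 9.0f, 9.0f}};
    model_subss(&x1, &x2);
    assert(x1.d[0] == 6.0f);
    assert(x1.d[1] == 1.0f && x1.d[2] == 2.0f && x1.d[3] == 3.0f);
}
```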

Mnemonic                      Opcode        Description
SUBSS xmm1, xmm2/mem32        F3 0F 5C /r   Subtracts low-order single-precision floating-point value in an XMM register or in a 32-bit memory location from low-order single-precision floating-point value in another XMM register and writes the result in the destination XMM register.

[Figure subss.eps: bits 31–0 of xmm2/mem32 are subtracted from bits 31–0 of xmm1.]

Related Instructions

SUBPD, SUBPS, SUBSD

rFLAGS Affected

None

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Value                                             M   M   M       M   M
Bit       15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                             Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                   X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                      X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                      X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                      X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM             X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                            X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP               X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                          X          A null data segment was used to reference memory.
Page fault, #PF                             X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF    X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)      X     X             X          A source operand was an SNaN value.
                                      X     X             X          +infinity was subtracted from +infinity.
                                      X     X             X          –infinity was subtracted from –infinity.
Denormalized-operand exception (DE)   X     X             X          A source operand was a denormal value.
Overflow exception (OE)               X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)              X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)              X     X             X          A result could not be represented exactly in the destination format.

UCOMISD Unordered Compare Scalar Double-Precision Floating-Point

Performs an unordered compare of the double-precision floating-point value in the low-order 64 bits of an XMM register with the double-precision floating-point value in the low-order 64 bits of another XMM register or a 64-bit memory location and sets the ZF, PF, and CF bits in the rFLAGS register to reflect the result of the compare. The OF, AF, and SF bits in rFLAGS are set to zero. The result is unordered if one or both of the operand values is a NaN.

If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

The UCOMISD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
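The mapping from compare result to ZF, PF, and CF can be sketched as a software model in portable C. It follows the result-of-compare encoding listed for this instruction (unordered 1/1/1, greater than 0/0/0, less than 0/0/1, equal 1/0/0) and assumes that "less than" means the first operand is less than the second. The `ucomi_flags` type and `model_ucomisd` function are hypothetical names; the MXCSR IE/DE updates and the #XF case are not modeled.

```c
#include <assert.h>
#include <math.h>   /* isnan, NAN */

/* Hypothetical container for the three flags written by the compare. */
typedef struct { int zf, pf, cf; } ucomi_flags;

/* Model of the UCOMISD flag result for first operand a, second operand b. */
static ucomi_flags model_ucomisd(double a, double b)
{
    ucomi_flags f = {0, 0, 0};            /* greater than: ZF=0 PF=0 CF=0 */
    if (isnan(a) || isnan(b)) {
        f.zf = f.pf = f.cf = 1;           /* unordered: all three set */
    } else if (a < b) {
        f.cf = 1;                         /* less than: CF=1 */
    } else if (a == b) {
        f.zf = 1;                         /* equal: ZF=1 */
    }
    return f;
}

/* Usage example: a NaN operand yields the unordered encoding. */
void ucomisd_example(void)
{
    ucomi_flags f = model_ucomisd(NAN, 1.0);
    assert(f.zf == 1 && f.pf == 1 && f.cf == 1);
}
```

Because PF is set only in the unordered case, code that branches on ZF or CF alone cannot distinguish "equal" or "less than" from "unordered"; testing PF first is the usual way to separate them.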

Mnemonic                      Opcode        Description
UCOMISD xmm1, xmm2/mem64      66 0F 2E /r   Compares scalar double-precision floating-point values in an XMM register and an XMM register or 64-bit memory location. Sets rFLAGS.

[Figure ucomisd.eps: bits 63–0 of xmm1 are compared with bits 63–0 of xmm2/mem64; the result is reflected in rFLAGS.]

Related Instructions

CMPPD, CMPPS, CMPSD, CMPSS, COMISD, COMISS, UCOMISS

rFLAGS Affected

Result of Compare  ZF  PF  CF
Unordered          1   1   1
Greater Than       0   0   0
Less Than          0   0   1
Equal              1   0   0

Flag      ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
Value                                          0               0   M   0   M   M
Bit       21  20   19   18  17  16  14  13–12  11  10  9   8   7   6   4   2   0

Note: Bits 31–22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Value                                                             M   M
Bit       15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                             Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                   X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                      X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                      X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                      X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM             X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                            X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP               X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                          X          A null data segment was used to reference memory.
Page fault, #PF                             X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF    X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)      X     X             X          A source operand was an SNaN value.
Denormalized-operand exception (DE)   X     X             X          A source operand was a denormal value.

UCOMISS Unordered Compare Scalar Single-Precision Floating-Point

Performs an unordered compare of the single-precision floating-point value in the low-order 32 bits of an XMM register with the single-precision floating-point value in the low-order 32 bits of another XMM register or a 32-bit memory location and sets the ZF, PF, and CF bits in the rFLAGS register to reflect the result. The OF, AF, and SF bits in rFLAGS are set to zero. The result is unordered if one or both of the operand values is a NaN.

If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

The UCOMISS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)

Mnemonic                      Opcode        Description
UCOMISS xmm1, xmm2/mem32      0F 2E /r      Compares scalar single-precision floating-point values in an XMM register and an XMM register or 32-bit memory location. Sets rFLAGS.

[Figure ucomiss.eps: bits 31–0 of xmm1 are compared with bits 31–0 of xmm2/mem32; the result is reflected in rFLAGS.]

Related Instructions

CMPPD, CMPPS, CMPSD, CMPSS, COMISD, COMISS, UCOMISD

rFLAGS Affected

Result of Compare  ZF  PF  CF
Unordered          1   1   1
Greater Than       0   0   0
Less Than          0   0   1
Equal              1   0   0

Flag      ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
Value                                          0               0   M   0   M   M
Bit       21  20   19   18  17  16  14  13–12  11  10  9   8   7   6   4   2   0

Note: Bits 31–22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. If the instruction causes an unmasked SIMD floating-point exception (#XF), the rFLAGS bits are not updated.

MXCSR Flags Affected

Flag      FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
Value                                                             M   M
Bit       15  14–13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that may be set to one or cleared to zero is M (modified). Unaffected flags are blank.

Exceptions

Exception                             Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                   X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                                      X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                      X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                      X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM             X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                            X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP               X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                          X          A null data segment was used to reference memory.
Page fault, #PF                             X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF    X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT = 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)      X     X             X          A source operand was an SNaN value.
Denormalized-operand exception (DE)   X     X             X          A source operand was a denormal value.

UNPCKHPD Unpack High Double-Precision Floating-Point

Unpacks the high-order double-precision floating-point values in the first and second source operands and packs them into quadwords in the destination (first source). The value from the first source operand is packed into the low-order quadword of the destination, and the value from the second source operand is packed into the high-order quadword of the destination. The low-order quadwords of the source operands are ignored. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The UNPCKHPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
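The quadword movement described above can be sketched as a small software model in portable C. The `xmm_pd` type and `model_unpckhpd` function are hypothetical names used only for illustration; no flags or exceptions are involved in the data movement itself.

```c
#include <assert.h>

/* Hypothetical stand-in for an XMM register holding two doubles;
   q[0] is the low-order quadword, q[1] the high-order quadword. */
typedef struct { double q[2]; } xmm_pd;

/* Model of the UNPCKHPD data flow. The low-order quadwords of both
   sources are ignored. */
static void model_unpckhpd(xmm_pd *dst, const xmm_pd *src)
{
    dst->q[0] = dst->q[1];  /* high of first source  -> low of destination  */
    dst->q[1] = src->q[1];  /* high of second source -> high of destination */
}

/* Usage example. */
void unpckhpd_example(void)
{
    xmm_pd x1 = {{1.0, 2.0}};
    xmm_pd x2 = {{3.0, 4.0}};
    model_unpckhpd(&x1, &x2);
    assert(x1.q[0] == 2.0 && x1.q[1] == 4.0);
}
```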

Mnemonic                      Opcode        Description
UNPCKHPD xmm1, xmm2/mem128    66 0F 15 /r   Unpacks high-order double-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and packs them into the destination XMM register.

[Figure unpckhpd.eps: bits 127–64 of xmm1 are copied to bits 63–0 of the destination, and bits 127–64 of xmm2/mem128 are copied to bits 127–64 of the destination.]

Related Instructions

UNPCKHPS, UNPCKLPD, UNPCKLPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Exceptions

Exception                             Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                   X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                                      X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                      X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM             X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                            X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP               X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                          X          A null data segment was used to reference memory.
                                      X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                             X             X          A page fault resulted from the execution of the instruction.

UNPCKHPS Unpack High Single-Precision Floating-Point

Unpacks the high-order single-precision floating-point values in the first and second source operands and packs them into interleaved doublewords in the destination (first source). The low-order quadwords of the source operands are ignored. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The UNPCKHPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
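The interleaving is easier to see on concrete values. The following Python sketch is a scalar model, not processor code: each operand is a list of four values ordered from bits 31-0 (index 0) to bits 127-96 (index 3), and the function name `unpckhps` is a hypothetical helper. The lane order used here (first-source element in the lower doubleword of each pair) follows the instruction's definition.

```python
def unpckhps(dst, src):
    """Model UNPCKHPS on four-element lists (index 0 = bits 31-0)."""
    # Indices 0 and 1 (the low-order quadword) of both operands are ignored.
    return [dst[2], src[2], dst[3], src[3]]


xmm1 = [1.0, 2.0, 3.0, 4.0]
xmm2 = [5.0, 6.0, 7.0, 8.0]
print(unpckhps(xmm1, xmm2))  # [3.0, 7.0, 4.0, 8.0]
```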

Related Instructions

UNPCKHPD, UNPCKLPD, UNPCKLPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

UNPCKHPS xmm1, xmm2/mem128 0F 15 /r Unpacks high-order single-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and packs them into the destination XMM register.

[Figure unpckhps.eps: the doublewords in bits 95-64 and 127-96 of xmm1 and of xmm2/mem128 are copied, interleaved, into the four doublewords of the destination xmm1.]


Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


UNPCKLPD Unpack Low Double-Precision Floating-Point

Unpacks the low-order double-precision floating-point values in the first and second source operands and packs them into the destination (first source). The value from the first source operand is packed into the low-order quadword of the destination, and the value from the second source operand is packed into the high-order quadword of the destination. The high-order quadwords of the source operands are ignored. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The UNPCKLPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
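One way to see what this data movement is good for is to pair it with its high-order counterpart. The Python sketch below is a scalar model under stated assumptions: operands are two-element lists ordered [low quadword, high quadword], and `unpcklpd` and `unpckhpd` are hypothetical helper functions, not library calls. Applied to the two rows of a 2x2 matrix, the pair yields the two columns.

```python
def unpcklpd(dst, src):
    """Model UNPCKLPD: low quadword of dst, then low quadword of src."""
    return [dst[0], src[0]]


def unpckhpd(dst, src):
    """Model UNPCKHPD, the high-order counterpart, in the same way."""
    return [dst[1], src[1]]


row0 = [1.0, 2.0]  # low = 1.0, high = 2.0
row1 = [3.0, 4.0]  # low = 3.0, high = 4.0

col0 = unpcklpd(row0, row1)
col1 = unpckhpd(row0, row1)
print(col0, col1)  # [1.0, 3.0] [2.0, 4.0]
```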

Related Instructions

UNPCKHPD, UNPCKHPS, UNPCKLPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

UNPCKLPD xmm1, xmm2/mem128 66 0F 14 /r Unpacks low-order double-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and packs them into the destination XMM register.

[Figure unpcklpd.eps: the low-order quadwords (bits 63-0) of xmm1 and of xmm2/mem128 are copied to bits 63-0 and bits 127-64 of the destination xmm1, respectively.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


UNPCKLPS Unpack Low Single-Precision Floating-Point

Unpacks the low-order single-precision floating-point values in the first and second source operands and packs them into interleaved doublewords in the destination (first source). The high-order quadwords of the source operands are ignored. The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The UNPCKLPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
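The same operation can also be expressed directly on 128-bit values. The Python sketch below is an illustration under stated assumptions: each register is modelled as a Python integer holding 128 bits, `unpcklps` is a hypothetical helper name, and the lane order (first-source doubleword in the lower position of each pair) follows the instruction's definition.

```python
MASK32 = 0xFFFFFFFF


def unpcklps(dst, src):
    """Model UNPCKLPS on 128-bit integers."""
    d0 = dst & MASK32          # dst bits 31-0
    d1 = (dst >> 32) & MASK32  # dst bits 63-32
    s0 = src & MASK32          # src bits 31-0
    s1 = (src >> 32) & MASK32  # src bits 63-32
    # Bits 127-64 of both operands are ignored.
    return d0 | (s0 << 32) | (d1 << 64) | (s1 << 96)


xmm1 = 0x44444444_33333333_22222222_11111111
xmm2 = 0x88888888_77777777_66666666_55555555
print(hex(unpcklps(xmm1, xmm2)))  # 0x66666666222222225555555511111111
```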

Related Instructions

UNPCKHPD, UNPCKHPS, UNPCKLPD

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

UNPCKLPS xmm1, xmm2/mem128 0F 14 /r Unpacks low-order single-precision floating-point values in an XMM register and another XMM register or 128-bit memory location and packs them into the destination XMM register.

[Figure unpcklps.eps: the doublewords in bits 31-0 and 63-32 of xmm1 and of xmm2/mem128 are copied, interleaved, into the four doublewords of the destination xmm1.]


Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


XORPD Logical Bitwise Exclusive OR Packed Double-Precision Floating-Point

Performs a bitwise logical Exclusive OR of the two packed double-precision floating-point values in the first source operand and the corresponding two packed double-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The XORPD instruction is an SSE2 instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
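Because the operation works on raw bit patterns rather than on numeric values, a typical use is flipping the sign bit of each element. The Python sketch below illustrates that idea under stated assumptions: operands are lists of two 64-bit integers holding IEEE 754 bit patterns (low quadword first), and `xorpd`, `to_bits`, and `from_bits` are hypothetical helper names introduced only for this example.

```python
import struct


def xorpd(dst, src):
    """Model XORPD on lists of two 64-bit bit patterns."""
    return [d ^ s for d, s in zip(dst, src)]


def to_bits(value):
    """Return the 64-bit IEEE 754 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def from_bits(pattern):
    """Return the Python float encoded by a 64-bit bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", pattern))[0]


SIGN_BIT = 1 << 63
xmm1 = [to_bits(1.5), to_bits(-2.0)]
mask = [SIGN_BIT, SIGN_BIT]  # sign bit set in each quadword

negated = [from_bits(p) for p in xorpd(xmm1, mask)]
print(negated)  # [-1.5, 2.0]
```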

Related Instructions

ANDNPD, ANDNPS, ANDPD, ANDPS, ORPD, ORPS, XORPS

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

XORPD xmm1, xmm2/mem128 66 0F 57 /r Performs bitwise logical XOR of two packed double-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure xorpd.eps: each quadword (bits 127-64 and bits 63-0) of xmm1 is XORed with the corresponding quadword of xmm2/mem128.]


Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE2 instructions are not supported, as indicated by EDX bit 26 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.


XORPS Logical Bitwise Exclusive OR Packed Single-Precision Floating-Point

Performs a bitwise Exclusive OR of the four packed single-precision floating-point values in the first source operand and the corresponding four packed single-precision floating-point values in the second source operand and writes the result in the destination (first source). The first source/destination operand is an XMM register. The second source operand is another XMM register or 128-bit memory location.

The XORPS instruction is an SSE instruction. The presence of this instruction set is indicated by a CPUID feature bit. (See “CPUID” in Volume 3.)
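Two properties of exclusive OR follow directly from this description: a register XORed with itself becomes all zeros, and applying the same mask twice restores the original value. The Python sketch below demonstrates both on a 128-bit integer model; `xorps` is a hypothetical helper name, and the sample constant is simply the four single-precision bit patterns for 1.0, 2.0, 3.0, and 4.0 placed side by side.

```python
MASK128 = (1 << 128) - 1


def xorps(dst, src):
    """Model XORPS on 128-bit integers (all four doublewords at once)."""
    return (dst ^ src) & MASK128


# Bit patterns of 1.0, 2.0, 3.0, 4.0 (highest doubleword first in the literal).
xmm0 = 0x3F800000_40000000_40400000_40800000
key = 0x80000000_80000000_80000000_80000000  # sign bit of each doubleword

print(xorps(xmm0, xmm0))                      # 0
print(xorps(xorps(xmm0, key), key) == xmm0)   # True
```

Since the operation is purely bitwise, this model gives the same 128-bit result regardless of whether the operand is viewed as four doublewords or two quadwords.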

Related Instructions

ANDNPD, ANDNPS, ANDPD, ANDPS, ORPD, ORPS, XORPD

rFLAGS Affected

None

MXCSR Flags Affected

None

Mnemonic Opcode Description

XORPS xmm1, xmm2/mem128 0F 57 /r Performs bitwise logical XOR of four packed single-precision floating-point values in an XMM register and in another XMM register or 128-bit memory location and writes the result in the destination XMM register.

[Figure xorps.eps: each of the four doublewords of xmm1 is XORed with the corresponding doubleword of xmm2/mem128.]

Exceptions

Exception                  Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          The SSE instructions are not supported, as indicated by EDX bit 25 of CPUID standard function 1.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                           X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.

Index

Numerics
16-bit mode ........ xvii
32-bit mode ........ xvii
64-bit mode ........ xvii

A
ADDPD ........ 3
ADDPS ........ 6
addressing
  RIP-relative ........ xxiii
ADDSD ........ 9
ADDSS ........ 12
ADDSUBPD ........ 15
ADDSUBPS ........ 18
ANDNPD ........ 21
ANDNPS ........ 23
ANDPD ........ 25
ANDPS ........ 27

B
biased exponent ........ xvii

C
CMPPD ........ 29
CMPPS ........ 33
CMPSD ........ 36
CMPSS ........ 39
COMISD ........ 42
COMISS ........ 45
commit ........ xviii
compatibility mode ........ xvii
CVTDQ2PD ........ 48
CVTDQ2PS ........ 50
CVTPD2DQ ........ 52
CVTPD2PI ........ 54
CVTPD2PS ........ 57
CVTPI2PD ........ 60
CVTPI2PS ........ 62
CVTPS2DQ ........ 64
CVTPS2PD ........ 67
CVTPS2PI ........ 69
CVTSD2SI ........ 72
CVTSD2SS ........ 75
CVTSI2SD ........ 78
CVTSI2SS ........ 81
CVTSS2SD ........ 84
CVTSS2SI ........ 86
CVTTPD2DQ ........ 89
CVTTPD2PI ........ 91
CVTTPS2DQ ........ 94
CVTTPS2PI ........ 97
CVTTSD2SI ........ 100
CVTTSS2SI ........ 103

D
direct referencing ........ xviii
displacements ........ xviii
DIVPD ........ 106
DIVPS ........ 109
DIVSD ........ 112
DIVSS ........ 115
double quadword ........ xviii
doubleword ........ xviii

E
eAX–eSP register ........ xxv
effective address size ........ xix
effective operand size ........ xix
eFLAGS register ........ xxv
eIP register ........ xxv
element ........ xix
endian order ........ xxvii
exceptions ........ xix
exponent ........ xvii

F
flush ........ xix
FXRSTOR ........ 118
FXSAVE ........ 120

H
HADDPD ........ 122
HADDPS ........ 124
HSUBPD ........ 127
HSUBPS ........ 130

I
IGN ........ xx
indirect ........ xx
instructions
  128-bit media ........ 1
  SSE ........ 1
  SSE-2 ........ 1

L
LDDQU ........ 133
LDMXCSR ........ 135
legacy mode ........ xx
legacy x86 ........ xx
long mode ........ xx
LSB ........ xxi
lsb ........ xx

M
mask ........ xxi
MASKMOVDQU ........ 137
MAXPD ........ 140
MAXPS ........ 142
MAXSD ........ 144
MAXSS ........ 146
MBZ ........ xxi
MINPD ........ 148
MINPS ........ 150
MINSD ........ 152
MINSS ........ 154
modes
  16-bit ........ xvii
  32-bit ........ xvii
  64-bit ........ xvii
  compatibility ........ xvii
  legacy ........ xx
  long ........ xx
  protected ........ xxii
  real ........ xxii
  virtual-8086 ........ xxiv
moffset ........ xxi
MOVAPD ........ 156
MOVAPS ........ 158
MOVD ........ 161
MOVDDUP ........ 164
MOVDQ2Q ........ 166
MOVDQA ........ 168
MOVDQU ........ 170
MOVHLPS ........ 172
MOVHPD ........ 174
MOVHPS ........ 176
MOVLHPS ........ 178
MOVLPD ........ 180
MOVLPS ........ 182
MOVMSKPD ........ 184
MOVMSKPS ........ 186
MOVNTDQ ........ 188
MOVNTPD ........ 190
MOVNTPS ........ 192
MOVQ ........ 194
MOVQ2DQ ........ 196
MOVSD ........ 198
MOVSHDUP ........ 201
MOVSLDUP ........ 203
MOVSS ........ 205
MOVUPD ........ 208
MOVUPS ........ 211
MSB ........ xxi
msb ........ xxi
MSR ........ xxvi
MULPD ........ 214
MULPS ........ 217
MULSD ........ 220
MULSS ........ 223

O
octword ........ xxi
offset ........ xxi
ORPD ........ 226
ORPS ........ 228
overflow ........ xxii

P
packed ........ xxii
PACKSSDW ........ 230
PACKSSWB ........ 232
PACKUSWB ........ 234
PADDB ........ 236
PADDD ........ 238
PADDQ ........ 240
PADDSB ........ 242
PADDSW ........ 244
PADDUSB ........ 246
PADDUSW ........ 248
PADDW ........ 250
PAND ........ 252
PANDN ........ 254
PAVGB ........ 256
PAVGW ........ 258
PCMPEQB ........ 260
PCMPEQD ........ 262
PCMPEQW ........ 264
PCMPGTB ........ 266
PCMPGTD ........ 268
PCMPGTW ........ 270
PEXTRW ........ 272
PINSRW ........ 274
PMADDWD ........ 277
PMAXSW ........ 279
PMAXUB ........ 281
PMINSW ........ 283
PMINUB ........ 285
PMOVMSKB ........ 287
PMULHUW ........ 289
PMULHW ........ 291
PMULLW ........ 293
PMULUDQ ........ 295
POR ........ 297
protected mode ........ xxii
PSADBW ........ 299
PSHUFD ........ 302
PSHUFHW ........ 305
PSHUFLW ........ 308
PSLLD ........ 311
PSLLDQ ........ 313
PSLLQ ........ 315
PSLLW ........ 317
PSRAD ........ 320
PSRAW ........ 323
PSRLD ........ 326
PSRLDQ ........ 329
PSRLQ ........ 331
PSRLW ........ 334
PSUBB ........ 337
PSUBD ........ 339
PSUBQ ........ 341
PSUBSB ........ 343
PSUBSW ........ 345
PSUBUSB ........ 347
PSUBUSW ........ 349
PSUBW ........ 351
PUNPCKHBW ........ 353
PUNPCKHDQ ........ 355
PUNPCKHQDQ ........ 357
PUNPCKHWD ........ 359
PUNPCKLBW ........ 361
PUNPCKLDQ ........ 363
PUNPCKLQDQ ........ 365
PUNPCKLWD ........ 367
PXOR ........ 369

Q
quadword ........ xxii

R
r8–r15 ........ xxvi
rAX–rSP ........ xxvi
RAZ ........ xxii
RCPPS ........ 371
RCPSS ........ 373
real address mode. See real mode
real mode ........ xxii
registers
  eAX–eSP ........ xxv
  eFLAGS ........ xxv
  eIP ........ xxv
  r8–r15 ........ xxvi
  rAX–rSP ........ xxvi
  rFLAGS ........ xxvii
  rIP ........ xxvii
relative ........ xxii
reserved ........ xxii
revision history ........ xiii
rFLAGS register ........ xxvii
rIP register ........ xxvii
RIP-relative addressing ........ xxiii
RSQRTPS ........ 375
RSQRTSS ........ 377

S
set ........ xxiii
SHUFPD ........ 379
SHUFPS ........ 381
SQRTPD ........ 384
SQRTPS ........ 386
SQRTSD ........ 388
SQRTSS ........ 390
SSE ........ xxiii
SSE2 ........ xxiii
SSE3 ........ xxiii
sticky bits ........ xxiv
STMXCSR ........ 392
SUBPD ........ 393
SUBPS ........ 396
SUBSD ........ 399
SUBSS ........ 402

T
TSS ........ xxiv

U
UCOMISD ........ 405
UCOMISS ........ 408
underflow ........ xxiv
UNPCKHPD ........ 411
UNPCKHPS ........ 413
UNPCKLPD ........ 415
UNPCKLPS ........ 417

V
vector ........ xxiv
virtual-8086 mode ........ xxiv

X
XORPD ........ 419
XORPS ........ 421
