SSE2 adds new math instructions for double-precision (64-bit) floating point and also extends MMX integer instructions to operate on 128-bit XMM registers.
from Wikipedia
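As a rough illustration of the double-precision support described above, the following C sketch adds two arrays of doubles two lanes at a time with the SSE2 intrinsic `_mm_add_pd`. The helper name `add_f64` is hypothetical, and the `__SSE2__` macro test assumes a GCC- or Clang-style compiler; where that macro is not defined, the function falls back to a plain scalar loop.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n doubles. */
void add_f64(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Each 128-bit XMM register holds two 64-bit doubles. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned load of 2 doubles */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```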
Some Related Sentences
SSE2 and new
Until SSE2, SSE integer instructions introduced with later SSE extensions could still operate on 64-bit MMX registers because the new XMM registers require operating system support.
SSE2 and math
SSE2 enables the programmer to perform SIMD math on any data type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to use the legacy MMX or FPU registers.
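At the small end of that range, a minimal C sketch of 8-bit integer SIMD math might look like the following. It adds unsigned bytes with saturation, sixteen per XMM register, using `_mm_adds_epu8`. The function name `add_sat_u8` is hypothetical, and the `__SSE2__` check again assumes a GCC- or Clang-style compiler.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: out[i] = min(a[i] + b[i], 255) for n bytes. */
void add_sat_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Sixteen 8-bit lanes fit in one 128-bit XMM register. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail with the same saturating behaviour. */
    for (; i < n; ++i) {
        unsigned s = (unsigned)a[i] + (unsigned)b[i];
        out[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}
```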
If code designed for x87 is ported to SSE2 double-precision floating point, which has lower precision than the x87 extended format, certain combinations of math operations or input datasets can result in measurable numerical deviation. This can be an issue in reproducible scientific computations, e.g. if the calculation results must be compared against results generated on a different machine architecture.
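The kind of deviation described here can be sketched in portable C by comparing a computation rounded to 64-bit double at every step with the same computation in `long double`. Both function names are hypothetical. The sketch assumes nothing about the target: where `long double` is the x87 80-bit format, `cancel_ext(1e16L, 1.0L)` keeps the small addend, while `cancel_f64(1e16, 1.0)` loses it; on platforms where `long double` is the same as `double`, both lose it.

```c
/* (big + small) - big, rounded to 64-bit double after each step.
   The volatile stores force rounding even if the compiler evaluates
   intermediate results in a wider format. */
double cancel_f64(double big, double small)
{
    volatile double t = big + small;
    volatile double r = t - big;
    return r;
}

/* The same computation carried out in long double, which is the
   80-bit extended format on typical x87 toolchains (an assumption
   about the platform, not a guarantee of the C language). */
long double cancel_ext(long double big, long double small)
{
    volatile long double t = big + small;
    volatile long double r = t - big;
    return r;
}
```

With `big = 1e16` and `small = 1.0`, the double version returns 0.0 because 1e16 + 1 is not representable with a 53-bit significand.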
SSE2 and instructions
SIMD instructions can be found, to one degree or another, on most CPUs, including IBM's AltiVec and SPE for PowerPC, HP's PA-RISC Multimedia Acceleration eXtensions (MAX), Intel's MMX and iwMMXt, SSE, SSE2, SSE3, SSSE3 and SSE4.x, AMD's 3DNow!, ARC's ARC Video subsystem, SPARC's VIS and VIS2, Sun's MAJC, ARM's NEON technology, MIPS' MDMX (MaDMaX) and MIPS-3D.
Thirty-two 128-bit vector registers are provided, compared to eight for SSE and SSE2 (extended to 16 in x86-64), and most AltiVec instructions take three register operands compared to only two register/register or register/memory operands on IA-32.
Many programmers consider SSE2 to be "everything SSE should have been", as SSE2 offers an orthogonal set of instructions for dealing with common data types.
SSE3, also called Prescott New Instructions (PNI), is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions.
In practice it is typical to use instructions which will execute on anything later than an Intel 80386 (or fully compatible clone) processor, or else anything later than an Intel Pentium (or compatible clone) processor. In recent years, however, various operating systems and application software have begun to require more modern processors, or at least support for later specific extensions to the instruction set (e.g. MMX, 3DNow!, SSE/SSE2/SSE3).
SSE2 extends MMX instructions to operate on XMM registers, allowing the programmer to completely avoid the eight 64-bit MMX registers "aliased" on the original IA-32 floating point register stack.
Other SSE2 extensions include a set of cache-control instructions intended primarily to minimize cache pollution when processing indefinite streams of information, and a sophisticated complement of numeric format conversion instructions.
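One of the cache-control instructions in question is the non-temporal store, exposed in C as the intrinsic `_mm_stream_si128`. The sketch below fills a buffer with it so that the written data is not meant to displace useful cache lines. The function name `fill_stream_i32` is hypothetical, and the `__SSE2__` check assumes a GCC- or Clang-style compiler. Because the streaming store requires a 16-byte aligned address, the sketch falls back to ordinary stores for unaligned buffers.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: set dst[0..n) to v. */
void fill_stream_i32(int32_t *dst, size_t n, int32_t v)
{
    size_t i = 0;
#if defined(__SSE2__)
    if (((uintptr_t)dst & 15u) == 0) {
        const __m128i vv = _mm_set1_epi32(v);  /* v in all four lanes */
        for (; i + 4 <= n; i += 4)
            _mm_stream_si128((__m128i *)(dst + i), vv);  /* non-temporal store */
        _mm_sfence();  /* order the streaming stores before later stores */
    }
#endif
    /* Ordinary stores for the tail, unaligned buffers, or non-SSE2 builds. */
    for (; i < n; ++i)
        dst[i] = v;
}
```

Whether streaming stores actually help depends on the buffer size and the processor, so this is worth measuring rather than assuming.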
Two major reasons are: accessing SSE2 data in memory not aligned to a 16-byte boundary can incur a significant penalty, and the throughput of SSE2 instructions in older x86 implementations was half that for MMX instructions.
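The alignment point can be made concrete with a small C sketch that sums 32-bit integers and picks the aligned load `_mm_load_si128` only when the pointer sits on a 16-byte boundary, using `_mm_loadu_si128` otherwise. The function name `sum_i32` is hypothetical, the `__SSE2__` check assumes a GCC- or Clang-style compiler, and the inputs are assumed small enough that the sum does not overflow.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: sum of p[0..n), assuming no overflow. */
int32_t sum_i32(const int32_t *p, size_t n)
{
    size_t i = 0;
    int32_t total = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    if (((uintptr_t)p & 15u) == 0) {
        /* 16-byte aligned: the aligned load is legal. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_epi32(acc, _mm_load_si128((const __m128i *)(p + i)));
    } else {
        /* Misaligned: the aligned load would fault, so use the unaligned form. */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(p + i)));
    }
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);   /* spill the four partial sums */
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; ++i)   /* scalar tail or non-SSE2 build */
        total += p[i];
    return total;
}
```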
Since GCC 3, GCC can automatically generate SSE/SSE2 scalar code when the target supports those instructions.
The Sun Studio Compiler Suite can also generate SSE2 instructions when the compiler flag -xvector=simd is used.
SSE2 and for
Thus SSE2 is much more suitable for scientific calculations than either SSE1 or 3DNow!, which were limited to only single precision.
SSE2, introduced with the Pentium 4, further extended the x86 SIMD instruction set with integer (8/16/32 bit) and double-precision floating-point data support for the XMM register file.
Rival chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.
As SSE2 does not have this problem, usually provides much better throughput and provides more registers in 64-bit code, it should be preferred for nearly all vectorization work.
SSE2 and the other SIMD instruction sets were intended primarily to improve CPU support for realtime graphics, notably gaming.
A CPU that is not marketed for this purpose or that has an alternative SIMD instruction set has no need for SSE2.
In the Windows 8 operating system, the NX feature, together with PAE and SSE2, is a hardware requirement for installing the OS.
Lazy Assembler is a freeware assembler not associated with Borland which is compatible with TASM ideal mode but with support for newer instructions not supported by TASM: MMX, SSE, SSE2, SSE3 (PNI), SSE4 (MNI), 3DNow! Pro.
This implementation includes AltiVec-accelerated code for PowerPC G4 and G5 processors that speeds up comparisons 10–20-fold, using a modification of the Wozniak, 1997 approach, and an SSE2 vectorization developed by Farrar, making optimal protein sequence database searches quite practical.
Since the introduction of SSE2, the x87 instructions are not as essential as they once were, but remain important as a scalar unit for numerical calculations sensitive to round-off error and requiring the 64-bit mantissa precision available in the 80-bit format.
SSE2 and 64-bit
Unlike SSE2, AltiVec supports a special RGB "pixel" data type, but it does not operate on 64-bit double precision floats, and there is no way to move data directly between scalar and vector registers.
SSE2 and also
SSE2 also allowed the MMX opcodes to use XMM register operands, but ended this support with SSE4 (and recently with SSE4.2, introduced in the Core microarchitecture).
SSE2 and MMX
The addition of integer support in SSE2 made MMX largely redundant, though further performance increases can be attained in some situations by using MMX in parallel with SSE operations.
This design is very different from comparable extensions on CISC processors, such as MMX, SSE, SSE2, SSE3, SSE4, and 3DNow!.
In this respect VIS is more similar to the design of MMX than other SIMD architectures such as SSE/SSE2/AltiVec.
It uses VIS on SPARC platforms (and MMX/SSE/SSE2 on x86/x64 platforms) to accelerate multimedia application execution.
Internally, the Efficeon had two arithmetic logic units, two load/store/add units, two execute units, two floating-point/MMX/SSE/SSE2 units, one branch prediction unit, one alias unit, and one control unit.
Although one SSE2 instruction can operate on twice as much data as an MMX instruction, performance might not increase significantly.
Internally, the Efficeon has two arithmetic logic units, two load/store/add units, two execute units, two floating-point/MMX/SSE/SSE2 units, one branch prediction unit, one alias unit, and one control unit.
implement intrinsics that map directly to the x86 SIMD instructions (MMX, SSE, SSE2, SSE3, SSSE3, SSE4).