SSE2-adds-new-math-instructions-for-double-precisi


Page "Streaming SIMD Extensions" ¶ 55

from Wikipedia


Some Related Sentences

SSE2 and new

Until SSE2, SSE integer instructions introduced with later SSE extensions could still operate on 64-bit MMX registers because the new XMM registers require operating system support.

SSE2 added 144 new instructions to SSE, which has 70 instructions.

SSE3 contains 13 new instructions over SSE2.

SSE2 and math

SSE2 enables the programmer to perform SIMD math on any data type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to use the legacy MMX or FPU registers.
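As a rough illustration of the sentence above, the following C sketch adds 64-bit floats and 8-bit integers using only XMM registers through the SSE2 intrinsics in `<emmintrin.h>`. The function names `add_f64` and `add_u8` and the `HAVE_SSE2_SKETCH` macro are hypothetical, introduced only for this example; a scalar loop handles leftover elements and non-SSE2 targets.

```c
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>            /* SSE2 intrinsics */
#define HAVE_SSE2_SKETCH 1        /* hypothetical guard macro for this sketch */
#endif

/* Add two arrays of doubles: two 64-bit lanes per 128-bit XMM register. */
void add_f64(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));   /* ADDPD */
    }
#endif
    for (; i < n; ++i)            /* scalar tail, or whole array without SSE2 */
        out[i] = a[i] + b[i];
}

/* Add two arrays of bytes with wraparound: sixteen 8-bit lanes per register. */
void add_u8(const unsigned char *a, const unsigned char *b,
            unsigned char *out, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi8(va, vb)); /* PADDB */
    }
#endif
    for (; i < n; ++i)
        out[i] = (unsigned char)(a[i] + b[i]);
}
```

Neither routine touches the MMX or x87 register files; the same 128-bit registers are simply reinterpreted as two doubles or sixteen bytes.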

If code designed for x87 is ported to the lower-precision double-precision SSE2 floating point, certain combinations of math operations or input datasets can result in measurable numerical deviation, which can be an issue in reproducible scientific computations, e.g. if the calculation results must be compared against results generated from a different machine architecture.
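The kind of deviation described above can be sketched in C by accumulating the same values once in `double` and once in `long double`. This assumes a toolchain where `long double` is the 80-bit x87 format (typical for GCC and Clang on x86 Linux, but not guaranteed; on other platforms it may be identical to `double` or wider). The function names are hypothetical.

```c
/* Sum n copies of 0.1 with a 64-bit double accumulator, the precision
   that SSE2 floating-point code works in. */
double sum_double(long n)
{
    double s = 0.0;
    for (long i = 0; i < n; ++i)
        s += 0.1;
    return s;
}

/* Same sum with a long double accumulator. ASSUMPTION: long double is the
   80-bit x87 extended format with a 64-bit mantissa on this platform. */
long double sum_extended(long n)
{
    long double s = 0.0L;
    for (long i = 0; i < n; ++i)
        s += 0.1;             /* the double constant 0.1 is widened exactly */
    return s;
}
```

For a large `n` the two results generally differ in their low-order digits, because each addition rounds to a different mantissa width. Whether that difference matters depends on the computation being reproduced.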

SSE2 and instructions

SIMD instructions can be found, to one degree or another, on most CPUs, including IBM's AltiVec and SPE for PowerPC, HP's PA-RISC Multimedia Acceleration eXtensions (MAX), Intel's MMX and iwMMXt, SSE, SSE2, SSE3, SSSE3 and SSE4.x, AMD's 3DNow!, ARC's ARC Video subsystem, SPARC's VIS and VIS2, Sun's MAJC, ARM's NEON technology, MIPS' MDMX (MaDMaX) and MIPS-3D.

Thirty-two 128-bit vector registers are provided, compared to eight for SSE and SSE2 (extended to 16 in x86-64), and most AltiVec instructions take three register operands compared to only two register/register or register/memory operands on IA-32.

Many programmers consider SSE2 to be "everything SSE should have been", as SSE2 offers an orthogonal set of instructions for dealing with common data types.

* SSE3, also called Prescott New Instructions (PNI), is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions.

In practice it is typical to use instructions which will execute on anything later than an Intel 80386 (or fully compatible clone) processor, or else anything later than an Intel Pentium (or compatible clone) processor, but in recent years various operating systems and application software have begun to require more modern processors, or at least support for later specific extensions to the instruction set (e.g. MMX, 3DNow!, SSE/SSE2/SSE3).

For example, a JIT compiler can choose SSE2 CPU instructions when it detects that the CPU supports them.
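The same detect-then-choose idea can be sketched for ahead-of-time compiled C. This example relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is only available on x86 targets; on other compilers or architectures the sketch simply reports no SSE2. The names `cpu_has_sse2` and `pick_kernel` are hypothetical.

```c
/* Returns 1 if the running CPU reports SSE2 support, 0 otherwise.
   ASSUMPTION: GCC or Clang on an x86 target; elsewhere this returns 0. */
int cpu_has_sse2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                       /* initialise CPU feature data */
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Pick a code path name at run time, the way a JIT or a dispatcher might. */
const char *pick_kernel(void)
{
    return cpu_has_sse2() ? "sse2" : "scalar";
}
```

A real dispatcher would typically store a function pointer to the chosen implementation once at start-up rather than returning a name.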

SSE2 extends MMX instructions to operate on XMM registers, allowing the programmer to completely avoid the eight 64-bit MMX registers "aliased" on the original IA-32 floating point register stack.

Other SSE2 extensions include a set of cache-control instructions intended primarily to minimize cache pollution when processing indefinite streams of information, and a sophisticated complement of numeric format conversion instructions.
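Both kinds of extension mentioned above can be sketched together in C: a format conversion (`_mm_cvtepi32_pd`, the CVTDQ2PD instruction) followed by a non-temporal store (`_mm_stream_pd`, MOVNTPD) that writes the result without filling the cache. The function name and guard macro are hypothetical, and the streaming path is only taken when the destination is 16-byte aligned, which the store requires.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>            /* SSE2 intrinsics */
#define HAVE_SSE2_SKETCH 1        /* hypothetical guard macro for this sketch */
#endif

/* Widen 32-bit integers to doubles, bypassing the cache on the way out. */
void widen_i32_to_f64_stream(const int32_t *src, double *dst, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    if (((uintptr_t)dst & 15u) == 0) {          /* MOVNTPD needs alignment */
        for (; i + 2 <= n; i += 2) {
            /* load two int32 values into the low 64 bits of an XMM register */
            __m128i vi = _mm_loadl_epi64((const __m128i *)(src + i));
            __m128d vd = _mm_cvtepi32_pd(vi);   /* int32 -> double conversion */
            _mm_stream_pd(dst + i, vd);         /* non-temporal store */
        }
        _mm_sfence();   /* make the streamed stores visible before returning */
    }
#endif
    for (; i < n; ++i)            /* scalar tail and fallback */
        dst[i] = (double)src[i];
}
```

Streaming stores tend to pay off only when the output is large and will not be read again soon; for small buffers an ordinary store is usually the better choice.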

SSE2 extends MMX instructions to operate on XMM registers.

Two major reasons are: accessing SSE2 data in memory not aligned to a 16-byte boundary can incur a significant penalty, and the throughput of SSE2 instructions in older x86 implementations was half that for MMX instructions.
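The alignment point can be made concrete with a small C sketch that checks the pointer and uses the aligned load (`_mm_load_pd`, MOVAPD) only when the data sits on a 16-byte boundary, falling back to the unaligned load (`_mm_loadu_pd`, MOVUPD) otherwise. The function name and guard macro are hypothetical, and the size of the penalty is not measured here.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>            /* SSE2 intrinsics */
#define HAVE_SSE2_SKETCH 1        /* hypothetical guard macro for this sketch */
#endif

/* Sum an array of doubles, choosing the load instruction by alignment. */
double sum_f64(const double *p, size_t n)
{
    double total = 0.0;
    size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    __m128d acc = _mm_setzero_pd();
    if (((uintptr_t)p & 15u) == 0) {
        for (; i + 2 <= n; i += 2)              /* 16-byte aligned: MOVAPD */
            acc = _mm_add_pd(acc, _mm_load_pd(p + i));
    } else {
        for (; i + 2 <= n; i += 2)              /* unaligned: MOVUPD */
            acc = _mm_add_pd(acc, _mm_loadu_pd(p + i));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);                  /* combine the two lanes */
    total = lanes[0] + lanes[1];
#endif
    for (; i < n; ++i)                          /* scalar tail and fallback */
        total += p[i];
    return total;
}
```

Allocating buffers on 16-byte boundaries in the first place avoids the unaligned branch entirely.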

Since GCC 3, GCC can automatically generate SSE/SSE2 scalar code when the target supports those instructions.

The Sun Studio Compiler Suite can also generate SSE2 instructions when the compiler flag -xvector=simd is used.

* SSE2 instructions

SSE2 and for

Thus SSE2 is much more suitable for scientific calculations than either SSE1 or 3DNow!, which were limited to only single precision.

SSE2, introduced with the Pentium 4, further extended the x86 SIMD instruction set with integer (8/16/32-bit) and double-precision floating-point data support for the XMM register file.

Rival chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

As SSE2 does not have this problem, usually provides much better throughput and provides more registers in 64-bit code, it should be preferred for nearly all vectorization work.

Automatic vectorization for SSE/SSE2 has been available since GCC 4.
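A loop of the following shape is a typical candidate for automatic vectorization. The sketch is plain C with no intrinsics; the function name `saxpy` is a conventional label used here for illustration. With GCC, building with something like `gcc -O3 -msse2` enables the tree vectorizer (`-ftree-vectorize` is part of `-O3`), though whether a given loop is actually vectorized depends on the compiler version and target.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for every element.
   `restrict` promises the compiler that x and y do not overlap, which
   removes one common obstacle to vectorizing the loop. */
void saxpy(float a, const float *restrict x, float *restrict y, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

The source stays portable: the same function compiles to scalar code on targets without SSE2.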

SSE2 and the other SIMD instruction sets were intended primarily to improve CPU support for realtime graphics, notably gaming.

A CPU that is not marketed for this purpose or that has an alternative SIMD instruction set has no need for SSE2.

In the Windows 8 operating system, the NX feature, together with PAE and SSE2, is a hardware requirement for installing the OS.

Lazy Assembler is a freeware assembler not associated with Borland which is compatible with TASM ideal mode but with support for newer instructions not supported by TASM: MMX, SSE, SSE2, SSE3 (PNI), SSE4 (MNI), 3DNow! Pro.

* Support for SSE2 and SSE3 extended instructions.

This implementation includes AltiVec-accelerated code for PowerPC G4 and G5 processors that speeds up comparisons 10–20-fold, using a modification of the Wozniak (1997) approach, and an SSE2 vectorization developed by Farrar, making optimal protein sequence database searches quite practical.

Since the introduction of SSE2, the x87 instructions are not as essential as they once were, but remain important as a scalar unit for numerical calculations sensitive to round-off error and requiring the 64-bit mantissa precision available in the 80-bit format.

SSE2 and 64-bit

Unlike SSE2, AltiVec supports a special RGB "pixel" data type, but it does not operate on 64-bit double precision floats, and there is no way to move data directly between scalar and vector registers.

SSE2 and also

SSE2 also allowed the MMX opcodes to use XMM register operands, but ended this support with SSE4 (and recently with SSE4.2, introduced in the Core microarchitecture).

All known x86-64 CPUs also implement SSE2.

SSE2 and MMX

The addition of integer support in SSE2 made MMX largely redundant, though further performance increases can be attained in some situations by using MMX in parallel with SSE operations.

This design is very different from comparable extensions on CISC processors, such as MMX, SSE, SSE2, SSE3, SSE4 and 3DNow!.

In this respect VIS is more similar to the design of MMX than other SIMD architectures such as SSE/SSE2/AltiVec.

It uses VIS on SPARC platforms (and MMX/SSE/SSE2 on x86/x64 platforms) to accelerate multimedia application execution.

Internally, the Efficeon had two arithmetic logic units, two load/store/add units, two execute units, two floating-point/MMX/SSE/SSE2 units, one branch prediction unit, one alias unit, and one control unit.

Therefore, it is possible to convert all existing MMX code to an SSE2 equivalent.
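As a sketch of such a conversion, consider a saturating 16-bit add. MMX code would use PADDSW on 64-bit registers (the `_mm_adds_pi16` intrinsic, four lanes); the SSE2 equivalent below uses the same operation on XMM registers (`_mm_adds_epi16`, eight lanes). The function name and guard macro are hypothetical, and a scalar loop with the same saturation semantics covers the tail and non-SSE2 targets.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>            /* SSE2 intrinsics */
#define HAVE_SSE2_SKETCH 1        /* hypothetical guard macro for this sketch */
#endif

/* Saturating add of signed 16-bit samples. */
void adds_i16(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    /* MMX would process 4 lanes per step with _mm_adds_pi16 on __m64;
       SSE2 processes 8 lanes per step on __m128i and needs no EMMS. */
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_adds_epi16(va, vb));
    }
#endif
    for (; i < n; ++i) {          /* scalar tail with the same clamping */
        int32_t s = (int32_t)a[i] + (int32_t)b[i];
        if (s > INT16_MAX) s = INT16_MAX;
        if (s < INT16_MIN) s = INT16_MIN;
        out[i] = (int16_t)s;
    }
}
```

Each SSE2 step handles twice as many lanes as the MMX version would, but the loads and stores still have to be fed from memory, so the observed speed-up can be smaller than the lane count suggests.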

Although one SSE2 instruction can operate on twice as much data as an MMX instruction, performance might not increase significantly.

Internally, the Efficeon has two arithmetic logic units, two load/store/add units, two execute units, two floating-point/MMX/SSE/SSE2 units, one branch prediction unit, one alias unit, and one control unit.

implement intrinsics that map directly to the x86 SIMD instructions (MMX, SSE, SSE2, SSE3, SSSE3, SSE4).
