Memory Copy, writes unprivileged and non-temporal. These instructions perform a memory copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPWTWN, then CPYMWTWN, and then CPYEWTWN.
CPYPWTWN performs some preconditioning of the arguments suitable for using the CPYMWTWN instruction, and performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWTWN performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYEWTWN performs the last part of the memory copy.
The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be performed.
For CPYPWTWN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is IMPLEMENTATION DEFINED.
Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWTWN, option A (which results in encoding PSTATE.C = 0):
After execution of CPYPWTWN, option B (which results in encoding PSTATE.C = 1):
For CPYMWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
For CPYMWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
For CPYEWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
For CPYEWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
sz | 0 | 1 | 1 | 1 | 0 | 1 | op1 | 0 | Rs | 0 | 1 | 0 | 1 | 0 | 1 | Rn | Rd | ||||||||||||||
o0 | op2 |
if !IsFeatureImplemented(FEAT_MOPS) || sz != '00' then UNDEFINED; CPYParams memcpy; memcpy.d = UInt(Rd); memcpy.s = UInt(Rs); memcpy.n = UInt(Rn); bits(4) options = op2; boolean rnontemporal = options<3> == '1'; boolean wnontemporal = options<2> == '1'; case op1 of when '00' memcpy.stage = MOPSStage_Prologue; when '01' memcpy.stage = MOPSStage_Main; when '10' memcpy.stage = MOPSStage_Epilogue; otherwise SEE "Memory Copy and Memory Set";
For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on UNPREDICTABLE behaviors, and particularly Memory Copy and Memory Set CPY*.
CheckMOPSEnabled(); CheckCPYConstrainedUnpredictable(memcpy.n, memcpy.d, memcpy.s); memcpy.nzcv = PSTATE.<N,Z,C,V>; memcpy.toaddress = X[memcpy.d, 64]; memcpy.fromaddress = X[memcpy.s, 64]; if memcpy.stage == MOPSStage_Prologue then memcpy.cpysize = UInt(X[memcpy.n, 64]); else memcpy.cpysize = SInt(X[memcpy.n, 64]); memcpy.implements_option_a = CPYOptionA(); boolean rprivileged = if options<1> == '1' then AArch64.IsUnprivAccessPriv() else PSTATE.EL != EL0; boolean wprivileged = if options<0> == '1' then AArch64.IsUnprivAccessPriv() else PSTATE.EL != EL0; AccessDescriptor raccdesc = CreateAccDescMOPS(MemOp_LOAD, rprivileged, rnontemporal); AccessDescriptor waccdesc = CreateAccDescMOPS(MemOp_STORE, wprivileged, wnontemporal); if memcpy.stage == MOPSStage_Prologue then if memcpy.cpysize > 0x007FFFFFFFFFFFFF then memcpy.cpysize = 0x007FFFFFFFFFFFFF; memcpy.forward = IsMemCpyForward(memcpy); if memcpy.implements_option_a then memcpy.nzcv = '0000'; if memcpy.forward then // Copy in the forward direction offsets the arguments. memcpy.toaddress = memcpy.toaddress + memcpy.cpysize; memcpy.fromaddress = memcpy.fromaddress + memcpy.cpysize; memcpy.cpysize = 0 - memcpy.cpysize; else if !memcpy.forward then // Copy in the reverse direction offsets the arguments. memcpy.toaddress = memcpy.toaddress + memcpy.cpysize; memcpy.fromaddress = memcpy.fromaddress + memcpy.cpysize; memcpy.nzcv = '1010'; else memcpy.nzcv = '0010'; memcpy.stagecpysize = MemCpyStageSize(memcpy); if memcpy.stage != MOPSStage_Prologue then memcpy.forward = memcpy.cpysize < 0 || (!memcpy.implements_option_a && memcpy.nzcv<3> == '0'); CheckMemCpyParams(memcpy, options); integer copied; boolean iswrite; AddressDescriptor memaddrdesc; PhysMemRetStatus memstatus; boolean fault = FALSE; integer B; if memcpy.implements_option_a then while memcpy.stagecpysize != 0 && !fault do // IMP DEF selection of the block size that is worked on. While many // implementations might make this constant, that is not assumed. B = CPYSizeChoice(memcpy); if memcpy.forward then assert B <= -1 * memcpy.stagecpysize; (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes( memcpy.toaddress + memcpy.cpysize, memcpy.fromaddress + memcpy.cpysize, memcpy.forward, B, raccdesc, waccdesc); if copied != B then fault = TRUE; else memcpy.cpysize = memcpy.cpysize + B; memcpy.stagecpysize = memcpy.stagecpysize + B; else assert B <= memcpy.stagecpysize; memcpy.cpysize = memcpy.cpysize - B; memcpy.stagecpysize = memcpy.stagecpysize - B; (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes( memcpy.toaddress + memcpy.cpysize, memcpy.fromaddress + memcpy.cpysize, memcpy.forward, B, raccdesc, waccdesc); if copied != B then fault = TRUE; memcpy.cpysize = memcpy.cpysize + B; memcpy.stagecpysize = memcpy.stagecpysize + B; else while memcpy.stagecpysize > 0 && !fault do // IMP DEF selection of the block size that is worked on. While many // implementations might make this constant, that is not assumed. B = CPYSizeChoice(memcpy); assert B <= memcpy.stagecpysize; if memcpy.forward then (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress, memcpy.fromaddress, memcpy.forward, B, raccdesc, waccdesc); if copied != B then fault = TRUE; else memcpy.fromaddress = memcpy.fromaddress + B; memcpy.toaddress = memcpy.toaddress + B; else (copied, iswrite, memaddrdesc, memstatus) = MemCpyBytes(memcpy.toaddress - B, memcpy.fromaddress - B, memcpy.forward, B, raccdesc, waccdesc); if copied != B then fault = TRUE; else memcpy.fromaddress = memcpy.fromaddress - B; memcpy.toaddress = memcpy.toaddress - B; if !fault then memcpy.cpysize = memcpy.cpysize - B; memcpy.stagecpysize = memcpy.stagecpysize - B; UpdateCpyRegisters(memcpy, fault, copied); if fault then if IsFault(memaddrdesc) then AArch64.Abort(memaddrdesc.vaddress, memaddrdesc.fault); else AccessDescriptor accdesc = if iswrite then waccdesc else raccdesc; HandleExternalAbort(memstatus, iswrite, memaddrdesc, B, accdesc); if memcpy.stage == MOPSStage_Prologue then PSTATE.<N,Z,C,V> = memcpy.nzcv;
Internal version only: aarchmrs v2024-03_relA, pseudocode v2024-03_rel, sve v2024-03_rel ; Build timestamp: 2024-03-26T09:45
Copyright © 2010-2024 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.