Assembly: MOVing between two memory addresses












18















I'm trying to learn assembly (so bear with me) and I'm getting a compile error on this line:



mov byte [t_last], [t_cur]


The error is



error: invalid combination of opcode and operands


I suspect the cause of this error is simply that it's not possible for a mov instruction to move between two memory addresses, but after half an hour of googling I haven't been able to confirm this. Is this the case?



Also, assuming I'm right, that means I need to use a register as an intermediate point for copying memory:



mov cl, [t_cur]
mov [t_last], cl


What's the recommended register to use (or should I use the stack instead)?






























  • 3





    Sometimes it's better to go to the source instead of googling. Here, for example, is Intel 64 & IA-32 instructions A-M, where you can see the operand combinations for mov: intel.com/Assets/PDF/manual/253666.pdf

    – Nick Dandoulakis
    Aug 19 '09 at 11:11











  • There're exceptions to the rule that an instruction cannot take two memory operands; see here.

    – legends2k
    Jun 13 '15 at 11:05
















assembly x86 instructions mov














edited Nov 25 '18 at 16:25 by phuclv

asked Aug 19 '09 at 10:46 by Justin




















5 Answers


















24














Your suspicion is correct, you can't move from memory to memory.



Any general-purpose register will do. Remember to PUSH the register first if you are not sure whether its contents are still needed, and to POP it to restore it once done.
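
As a minimal sketch of the advice above, assuming NASM syntax and 32-bit mode (for example, assembled with nasm -f elf32): t_cur and t_last are the byte variables from the question, while the two routine names and the initial values are made up for illustration.

```nasm
bits 32

section .data
t_cur:  db 42               ; source byte (value chosen arbitrarily)
t_last: db 0                ; destination byte

section .text

; Plain copy through CL, for when nothing useful lives in ECX.
copy_byte:
    mov cl, [t_cur]         ; load the source byte into CL
    mov [t_last], cl        ; store it; the operand size is implied by CL
    ret

; Same copy, but ECX is saved and restored around it.
copy_byte_preserving_ecx:
    push ecx                ; save the caller's ECX
    mov cl, [t_cur]
    mov [t_last], cl
    pop ecx                 ; restore it
    ret
```

The second routine only pays off when ECX holds a value that is still needed; otherwise the push and pop are two extra memory accesses for nothing.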



























  • 1





    Is there any advantage to using a register over pushing the data itself onto the stack?

    – Justin
    Aug 19 '09 at 11:02






  • 3





    Pushing on and later popping from the stack adds two additional memory accesses.

    – Gunther Piez
    Aug 19 '09 at 12:49



















5














It's really simple in 16 bit, just do the following:



    push di
    push si
    push cx
    cld                              ; clear the direction flag so SI and DI count upward
    mov cx,(number of bytes to move)
    lea di,(destination address)
    lea si,(source address)
    rep movsb                        ; copy CX bytes from DS:SI to ES:DI
    pop cx
    pop si
    pop di


Note: the pushes and pops are necessary only if you need to preserve the contents of the registers.
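
A concrete version of the same idea might look like the sketch below, assuming NASM syntax, 16-bit mode, and DS and ES pointing at the same segment. The names copy_block, src, dst, and len are hypothetical and exist only for this example.

```nasm
bits 16

; Assumption: DS and ES both address the segment that holds src and dst.
copy_block:
    push di
    push si
    push cx
    cld                     ; direction flag clear: SI and DI increment
    mov cx, len             ; number of bytes to copy
    mov di, dst             ; in NASM a bare label is its address
    mov si, src
    rep movsb               ; copy CX bytes from DS:SI to ES:DI
    pop cx
    pop si
    pop di
    ret

src: db "hello"
len equ $ - src             ; length of the source data in bytes
dst: times len db 0         ; destination buffer of the same size
```

For a single byte this is far more setup than the two-instruction copy through a register, so it is mainly of interest for larger blocks.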
































  • +1, since in some circumstances it's good to know all tools in your toolbox. rep movsb/movsw are 1 byte opcodes, IIRC

    – Aki Suihkonen
    Oct 11 '12 at 11:43











  • Depending on the architecture, you can use pusha instead of pushing all the registers individually and popa instead of popping them all.

    – BalinKingOfMoria
    Jun 2 '15 at 3:39











  • This works in 32-bit and 64-bit mode as well, except that it uses the registers of that mode (ESI, EDI, and ECX, or RSI, RDI, and RCX).

    – Rahly
    Aug 23 '16 at 3:41



















3














There's also a MOVS instruction for moving data from memory to memory:



MOV SI, OFFSET variable1
MOV DI, OFFSET variable2
MOVSB
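
The snippet above is in MASM-style syntax, while the question uses NASM. A rough NASM equivalent, assuming 16-bit mode and DS and ES addressing the same segment, could look like this; copy_one_byte and the initial values are illustrative assumptions.

```nasm
bits 16

; Assumption: DS and ES both address the segment holding the variables.
copy_one_byte:
    cld                     ; make SI and DI advance upward
    mov si, variable1       ; NASM has no OFFSET keyword; a bare label is the address
    mov di, variable2
    movsb                   ; copy one byte from DS:SI to ES:DI, then SI and DI each advance by 1
    ret

variable1: db 7             ; source byte (arbitrary value)
variable2: db 0             ; destination byte
```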





























  • Will work, but it requires extra care: you need to save si and di registers. I guess it's not worth it for copying one byte.

    – sharptooth
    Aug 19 '09 at 11:03






  • 6





    The string commands on x86 can be considered obsolete. Never use them. They are never faster than copying "by hand", but in most cases much slower.

    – Gunther Piez
    Aug 19 '09 at 12:51











  • @hirschhornsalz, sorry to necromance, but do you have any detailed info about the string commands being essentially obsolete?

    – Matthew Sainsbury
    Mar 29 '15 at 9:18






  • 2





    @Matt agner.org/optimize/optimizing_assembly.pdf

    – Gunther Piez
    Mar 30 '15 at 11:17











  • @hirschhornsalz thank you, that's a very useful document!

    – Matthew Sainsbury
    Mar 30 '15 at 14:04



















2














That's correct, x86 machine code can't encode an instruction with two explicit memory operands (arbitrary addresses specified in []).




  • Why isn't movl from memory to memory allowed?

  • What x86 instructions take two (or more) memory operands?



What's the recommended register




Any register you don't need to save/restore.



In all the mainstream 32-bit and 64-bit calling conventions, EAX, ECX, and EDX are call-clobbered, so AL, CL, and DL are good choices. In 64-bit mode, SIL, DIL, r8b, r9b and so on are also fine choices, but they require a REX prefix in the machine code, so there's a minor code-size reason to avoid them.



Generally avoid writing AH, BH, CH, or DH for performance reasons, unless you've read and understood the following links and any false dependencies or partial-register merging stalls aren't going to be a problem or happen at all in your code.




  • Why doesn't GCC use partial registers?

  • How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent
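
One common way to sidestep the partial-register concerns described above is to load the byte with movzx, which writes the whole 32-bit register instead of merging into its old value. The sketch below assumes NASM syntax and 32-bit mode; t_cur and t_last come from the question, and the routine name and initial values are hypothetical.

```nasm
bits 32

section .data
t_cur:  db 42               ; source byte (arbitrary value)
t_last: db 0                ; destination byte

section .text

copy_byte_movzx:
    movzx ecx, byte [t_cur] ; zero-extending load: writes all of ECX
    mov   [t_last], cl      ; byte store of the low 8 bits
    ret
```

Reading CL after a full-width write of ECX is not a problem; it is writing only part of a register that can create a dependency on the register's previous contents.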





(or should I use the stack instead)?




First of all, you can't push a single byte at all, so there's no way you could do a byte load / byte store from the stack. For a word, dword, or qword (depending on CPU mode), you could push [src] / pop [dst], but that's a lot slower than copying via a register. It introduces an extra store/reload store-forwarding latency before the data can be read from the final destination, and takes more uops.



Unless somewhere on the stack is the desired destination and you can't optimize that local variable into a register, in which case push [src] is just fine to copy it there and allocate stack space for it.
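
To make the comparison concrete, here is a small sketch of both approaches for a dword, assuming NASM syntax and 32-bit mode. The labels src and dst match the placeholders used in the text; the routine names and the initial value are assumptions made for the example.

```nasm
bits 32

section .data
src: dd 0x12345678          ; source dword (arbitrary value)
dst: dd 0                   ; destination dword

section .text

; Memory-to-memory copy bounced through the stack.
copy_via_stack:
    push dword [src]        ; load from src, store to the stack
    pop  dword [dst]        ; reload from the stack, store to dst
    ret

; The usual way: one load and one store through a scratch register.
copy_via_register:
    mov eax, [src]
    mov [dst], eax
    ret
```

Both routines leave the same value in dst. The stack version needs no scratch register, but it performs an extra store and reload on the way.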



See https://agner.org/optimize/ and other x86 performance links in the x86 tag wiki





































    -2














    Just want to discuss "memory barrier" with you.
    In C code



    a = b; // take the data from b and put it in a


    would be compiled to



    mov %eax, b # suppose %eax is used as the temp
    mov a, %eax


    The system cannot guarantee the atomicity of the assignment. That's why we need an rmb (read barrier).
































    • x86 can't atomically copy from memory to memory. Barriers don't create atomicity, they only stop reordering (compile time or run-time or both, depending on the barrier).

      – Peter Cordes
      Nov 14 '17 at 21:57











    • @YuvalKeysar: your edit left one bug unfixed (which I hadn't noticed before): in AT&T syntax, destination comes 2nd. This asm actually stores EAX into b, then loads EAX from a. This answer just needs to be deleted, IMO, because the discussion about barriers is nonsense.

      – Peter Cordes
      Jan 5 at 19:13













    Answer scored 24: answered Aug 19 '09 at 10:48 by Federico klez Culloca; edited Jul 10 '12 at 11:38 by Igor Skochinsky








    Answer scored 5: answered Jul 10 '12 at 0:49 by Will Mattison; edited Jul 10 '12 at 0:56













    Answer scored 3: answered Aug 19 '09 at 11:01 by ymv













    • Will work, but it requires extra care: you need to save si and di registers. I guess it's not worth it for copying one byte.

      – sharptooth
      Aug 19 '09 at 11:03






    • 6





      The string commands on x86 can be considered obsolete. Never use them. They are never faster than copying "by hand", but in most cases much slower.

      – Gunther Piez
      Aug 19 '09 at 12:51











    • @hirschhornsalz, sorry to necromance, but do you have any detailed info about the string commands being essentially obsolete?

      – Matthew Sainsbury
      Mar 29 '15 at 9:18






    • 2





      @Matt agner.org/optimize/optimizing_assembly.pdf

      – Gunther Piez
      Mar 30 '15 at 11:17











    • @hirschhornsalz thank you, that's a very useful document!

      – Matthew Sainsbury
      Mar 30 '15 at 14:04



















    • Will work, but it requires extra care: you need to save si and di registers. I guess it's not worth it for copying one byte.

      – sharptooth
      Aug 19 '09 at 11:03






    • 6





      The string commands on x86 can be considered obsolete. Never use them. They are never faster than copying "by hand", but in most cases much slower.

      – Gunther Piez
      Aug 19 '09 at 12:51











    • @hirschhornsalz, sorry to necromance, but do you have any detailed info about the string commands being essentially obsolete?

      – Matthew Sainsbury
      Mar 29 '15 at 9:18






    • 2





      @Matt agner.org/optimize/optimizing_assembly.pdf

      – Gunther Piez
      Mar 30 '15 at 11:17











    • @hirschhornsalz thank you, that's a very useful document!

      – Matthew Sainsbury
      Mar 30 '15 at 14:04

















    Will work, but it requires extra care: you need to save si and di registers. I guess it's not worth it for copying one byte.

    – sharptooth
    Aug 19 '09 at 11:03





    Will work, but it requires extra care: you need to save si and di registers. I guess it's not worth it for copying one byte.

    – sharptooth
    Aug 19 '09 at 11:03




    6




    6





    The string commands on x86 can be considered obsolete. Never use them. They are never faster than copying "by hand", but in most cases much slower.

    – Gunther Piez
    Aug 19 '09 at 12:51





    The string commands on x86 can be considered obsolete. Never use them. They are never faster than copying "by hand", but in most cases much slower.

    – Gunther Piez
    Aug 19 '09 at 12:51













    @hirschhornsalz, sorry to necromance, but do you have any detailed info about the string commands being essentially obsolete?

    – Matthew Sainsbury
    Mar 29 '15 at 9:18





    @hirschhornsalz, sorry to necromance, but do you have any detailed info about the string commands being essentially obsolete?

    – Matthew Sainsbury
    Mar 29 '15 at 9:18




    2




    2





    @Matt agner.org/optimize/optimizing_assembly.pdf

    – Gunther Piez
    Mar 30 '15 at 11:17





    @Matt agner.org/optimize/optimizing_assembly.pdf

    – Gunther Piez
    Mar 30 '15 at 11:17













    @hirschhornsalz thank you, that's a very useful document!

    – Matthew Sainsbury
    Mar 30 '15 at 14:04





    @hirschhornsalz thank you, that's a very useful document!

    – Matthew Sainsbury
    Mar 30 '15 at 14:04











    2














    That's correct, x86 machine code can't encode an instruction with two explicit memory operands (arbitrary addresses specified in )




    • Why isn't movl from memory to memory allowed?

    • What x86 instructions take two (or more) memory operands?



    Whats the recommended register




    Any register you don't need to save/restore.



    In all the mainstream 32-bit and 64-bit calling conventions, EAX, ECX, and EDX are call-clobbered, so AL, CL, and DL are good choices. In 64-bit mode, SIL, DIL, r8b, r9b and so on are also find choices, but require a REX prefix in the machine code so there's a minor code-size reason to avoid them.



    Generally avoid writing AH, BH, CH, or DH for performance reasons, unless you've read and understood the following links and any false dependencies or partial-register merging stalls aren't going to be a problem or happen at all in your code.




    • Why doesn't GCC use partial registers?

    • How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent





    (or should I use the stack instead)?




    First of all, you can't push a single byte at all, so there's no way you could do a byte load / byte store from the stack. For a word, dword, or qword (depending on CPU mode), you could push [src] / pop [dst], but that's a lot slower than copying via a register. It introduces an extra store/reload store-forwarding latency before the data can be read from the final destination, and takes more uops.



    Unless somewhere on the stack is the desired destination and you can't optimize that local variable into a register, in which case push [src] is just fine to copy it there and allocate stack space for it.



    See https://agner.org/optimize/ and other x86 performance links in the x86 tag wiki






    share|improve this answer




























      2














      That's correct, x86 machine code can't encode an instruction with two explicit memory operands (arbitrary addresses specified in )




      • Why isn't movl from memory to memory allowed?

      • What x86 instructions take two (or more) memory operands?



      Whats the recommended register




      Any register you don't need to save/restore.



      In all the mainstream 32-bit and 64-bit calling conventions, EAX, ECX, and EDX are call-clobbered, so AL, CL, and DL are good choices. In 64-bit mode, SIL, DIL, r8b, r9b and so on are also find choices, but require a REX prefix in the machine code so there's a minor code-size reason to avoid them.



      Generally avoid writing AH, BH, CH, or DH for performance reasons, unless you've read and understood the following links and any false dependencies or partial-register merging stalls aren't going to be a problem or happen at all in your code.




      • Why doesn't GCC use partial registers?

      • How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent





      (or should I use the stack instead)?




      First of all, you can't push a single byte at all, so there's no way you could do a byte load / byte store from the stack. For a word, dword, or qword (depending on CPU mode), you could push [src] / pop [dst], but that's a lot slower than copying via a register. It introduces an extra store/reload store-forwarding latency before the data can be read from the final destination, and takes more uops.



      Unless somewhere on the stack is the desired destination and you can't optimize that local variable into a register, in which case push [src] is just fine to copy it there and allocate stack space for it.



      See https://agner.org/optimize/ and other x86 performance links in the x86 tag wiki






        answered Nov 25 '18 at 18:42









        Peter Cordes























            -2














            I just want to discuss memory barriers here.
            In C code,



            a = b; // take the value of b and store it in a


            would be compiled to



            mov %eax, b # suppose %eax is used as the temp
            mov a, %eax


            The system cannot guarantee the atomicity of the assignment. That's why we need an rmb
            (read barrier).






            edited Jan 5 at 13:00









            Yuval Keysar










            answered Jul 17 '13 at 7:34









            Qylin













            • x86 can't atomically copy from memory to memory. Barriers don't create atomicity, they only stop reordering (compile time or run-time or both, depending on the barrier).

              – Peter Cordes
              Nov 14 '17 at 21:57











            • @YuvalKeysar: your edit left one bug unfixed (which I hadn't noticed before): in AT&T syntax, destination comes 2nd. This asm actually stores EAX into b, then loads EAX from a. This answer just needs to be deleted, IMO, because the discussion about barriers is nonsense.

              – Peter Cordes
              Jan 5 at 19:13




















