VSzA techblog

Shift vs. division in managed languages

2012-09-27

From time to time, I hear it from low-level coder monkeys posing as either tech gurus or teachers that using the shift operators (<< and >> in C syntax) instead of multiplication and division in cases when the factor/divisor is an integer power of 2 results in faster code. While I've always been skeptical about such speculations – and I've been reassured several times by many sources, including MSDN forums and Stack Overflow – I haven't tried it for myself, especially in managed languages such as C# that are compiled to a byte code first.

Although Mono is not the reference implementation of C# and the .Net virtual machine, it's the one that runs on my notebook and allows for easy static compilation which makes it possible for me to inspect the machine code generated from the .Net executable file. First, I wrote a simple program that reads a byte from the standard input, divides it by 2, and writes the result to the standard output (mainly to avoid optimization that would replace division with compile-time evaluation).

using System;

class Program
{
    static void Main()
    {
        int a = Console.Read();
        int b = a / 2;
        Console.WriteLine(b);
    }
}

I compiled it with the Mono C# compiler and verified that it works (T = 84 in ASCII).

$ mcs monodiv.cs
$ file monodiv.exe
monodiv.exe: PE32 executable (console) Intel 80386 Mono/.Net assembly, for MS Windows
$ echo T | ./monodiv.exe
42

Dumping the .Net bytecode reveals that the first pass of compilation uses division.

$ monodis monodiv.exe
...
.method private static hidebysig 
       default void Main ()  cil managed 
{
    // Method begins at RVA 0x2058
    .entrypoint
    // Code size 17 (0x11)
    .maxstack 2
    .locals init (
            int32   V_0,
            int32   V_1)
    IL_0000:  call int32 class [mscorlib]System.Console::Read()
    IL_0005:  stloc.0 
    IL_0006:  ldloc.0 
    IL_0007:  ldc.i4.2 
    IL_0008:  div 
    IL_0009:  stloc.1 
    IL_000a:  ldloc.1 
    IL_000b:  call void class [mscorlib]System.Console::WriteLine(int32)
    IL_0010:  ret 
} // end of method Program::Main

Finally, transforming the bytecode into machine code assures us again that premature optimization is the root of all evil, as the code executed by the CPU at runtime contains the good old shift right (shr) opcode.

$ mono --aot=full monodiv.exe 
Mono Ahead of Time compiler - compiling assembly /home/dnet/_projekt/monodiv/monodiv.exe
Code: 38 Info: 4 Ex Info: 6 Unwind Info: 9 Class Info: 30 PLT: 3 GOT Info: 14 GOT: 48 Offsets: 47
Compiled 2 out of 2 methods (100%)
Methods without GOT slots: 2 (100%)
Direct calls: 0 (100%)
JIT time: 0 ms, Generation time: 0 ms, Assembly+Link time: 0 ms.
$ file monodiv.exe.so
monodiv.exe.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
$ objdump -d monodiv.exe.so

monodiv.exe.so:     file format elf64-x86-64


Disassembly of section .text:

...

0000000000001020 <Program_Main>:
1020:       48 83 ec 08             sub    $0x8,%rsp
1024:       e8 17 00 00 00          callq  1040 <plt_System_Console_Read>
1029:       48 8b f8                mov    %rax,%rdi
102c:       c1 ef 1f                shr    $0x1f,%edi
102f:       03 f8                   add    %eax,%edi
1031:       d1 ff                   sar    %edi
1033:       e8 12 00 00 00          callq  104a <plt_System_Console_WriteLine_int>
1038:       48 83 c4 08             add    $0x8,%rsp
103c:       c3                      retq   
103d:       00 00                   add    %al,(%rax)
    ...

permalink


next posts >
< prev post

CC BY-SA RSS Export
Proudly powered by Utterson