In our previous post, we talked about what Cython is, why we should use it, and how to compile and run Cython code. With that knowledge in hand, it is time to explore Cython in depth.
If you are curious about why Cython improves performance, it comes down to two differences: runtime interpretation versus ahead-of-time compilation, and dynamic versus static typing. That's what we are going to discuss in this post.
Interpreted vs. Compiled Execution
So, let’s start with Interpreted Versus Compiled Execution. For a better understanding of how and why Cython improves the performance of Python code, it is useful to compare how the Python runtime runs Python code with how an operating system runs compiled C code. Before being run, Python code is automatically compiled to Python bytecode. Bytecodes are fundamental instructions to be executed, or interpreted, by the Python virtual machine (VM). Because the VM abstracts away all platform-specific details, Python bytecode can be generated on one platform and run anywhere else. It is up to the VM to translate each high-level bytecode into one or more lower-level operations that can be executed by the operating system and, ultimately, the CPU. This virtualized design is common and very flexible, bringing with it many benefits—first among them is not having to fuss with picky compilers! The primary downside is that the VM is slower than running natively compiled code.
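We can see this first compilation step directly from pure Python. The sketch below (using only the standard library) asks the interpreter to compile a snippet and then inspects the resulting code object, which holds the bytecode the VM will execute:

```python
# Python source is compiled to a code object containing bytecode
# before the VM ever executes a single instruction of it.
code = compile("a - b", "<example>", "eval")

print(type(code))    # the built-in code object type
print(code.co_code)  # the raw bytecode, as bytes
```

The `co_code` attribute is exactly the platform-independent bytecode described above: it can be generated on one machine and interpreted by the VM on any other.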
On the C side of the coin, there is no VM or interpreter, and there are no high-level bytecodes. C code is translated, or compiled, directly to machine code by a compiler. This machine code is consolidated into an executable or a compiled library. Because it is tailored to a specific platform and architecture, it can be run directly by the CPU, and it is as low-level as it gets.
There is a way to bridge the divide between the bytecode-executing VM and the machine code–executing CPU: the Python interpreter can run compiled C code directly and transparently to the end user. The C code must be compiled into a specific kind of dynamic library known as an extension module. These modules are full-fledged Python modules, but the code inside them has been precompiled into machine code by a standard C compiler. When running code in an extension module, the Python VM no longer interprets high-level bytecodes but instead runs machine code directly. This removes the interpreter's performance overhead for any operation inside the extension module.
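As a quick illustration, we can ask the import system which filename suffixes it treats as compiled extension modules (a minimal sketch using only the standard library; the exact suffixes vary by platform and Python version):

```python
import importlib.machinery

# Filename suffixes the import system recognizes as compiled
# extension modules (e.g. ".so" on Linux/macOS, ".pyd" on Windows).
print(importlib.machinery.EXTENSION_SUFFIXES)
```

When Cython builds an extension module, the resulting library carries one of these suffixes, and Python imports it like any ordinary `.py` module.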
How Does Cython Work?
But the main question remains: how does Cython fit into this discussion? As you know, we can use the Cython and standard C compilers to translate Cython source code into a compiled, platform-specific extension module. Whenever Python runs anything inside an extension module, it is running compiled code, so no interpreter overhead can slow things down. But how big a difference does interpretation versus direct execution make? It varies widely depending on the Python code in question, but usually we can expect around a 10 to 30 percent speedup from converting Python code into an equivalent extension module. Cython gives us this speedup for free, and we are happy to take it. But the real performance improvements come from replacing Python's dynamic dispatch with static typing. So let's try to understand this in detail.
Statically Typed Languages
Statically typed languages require the type of a variable to be fixed at compile time. Often we can accomplish this by explicitly declaring the type of a variable, or, when possible, the compiler can automatically infer a variable’s type. In either case, in the context where it is used, a variable has that type and only that type. But what benefits does static typing bring? Besides compile-time type checking, compilers use static typing to generate fast machine code that is tailored to that specific type.
Dynamically Typed Languages
Dynamically typed languages place no restrictions on a variable's type: the same variable can start out as an integer and end up as a string, a list, or an instance of a custom Python object. Dynamically typed languages are typically easier to write because the user does not have to explicitly declare variables' types, with the tradeoff that type-related errors are caught only at runtime.
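A short pure-Python sketch makes both points concrete: a name can be rebound to objects of completely different types, and a type error only surfaces when the offending line actually runs:

```python
# The same name can be rebound to objects of entirely different types.
x = 42           # x refers to an int
x = "hello"      # now a str
x = [1, 2, 3]    # now a list

# Type-related errors surface only at runtime, when the line executes:
try:
    "text" - 1
except TypeError as err:
    print("caught at runtime:", err)
```

A static compiler would have rejected `"text" - 1` at compile time; Python happily compiles it and only complains when it is executed.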
When running a Python program, the interpreter spends most of its time figuring out what low-level operation to perform and extracting the data to give to that operation. Because a variable can have any type at any time, the interpreter has to determine this in a completely general way. This is known as dynamic dispatch, and it is generally slow.
For a better understanding, let's walk through how each language executes the same operation. Consider what happens when the Python runtime evaluates a - b. Here are the steps of the complete execution:
- The interpreter inspects the Python object referred to by “a” for its type, which requires at least one pointer lookup at the C level.
- The interpreter asks that type for an implementation of the subtraction method, which may require one or more additional pointer lookups and internal function calls.
- If the method in question is found, the interpreter then has an actual function it can call, implemented either in Python or in C.
- The interpreter calls the subtract function and passes in a and b as arguments.
- The subtract function extracts the necessary internal data from a and b, which may require several more pointer lookups and conversions from Python types to C types. Only if this succeeds can it perform the actual operation that subtracts b from a.
- The result then must be placed inside a new Python object and returned. Only then is the operation complete.
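All of the steps above hide behind a single generic bytecode instruction. A small sketch with the standard-library `dis` module makes this visible (the opcode name varies by Python version, e.g. BINARY_SUBTRACT before 3.11 and BINARY_OP from 3.11 on):

```python
import dis

def subtract(a, b):
    return a - b

# Disassembling shows one generic subtraction instruction; the type
# inspection, method lookup, and data extraction described above all
# happen inside the interpreter while that one instruction executes.
dis.dis(subtract)
```

The bytecode itself carries no type information at all, which is precisely why the interpreter must rediscover the types on every execution.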
The situation for C is very different. Because C is compiled and statically typed, the C compiler can determine at compile time what low-level operations to perform and what low-level data to pass as arguments. At runtime, a compiled C program skips nearly all the steps that the Python interpreter must perform. For something like a - b with a and b both being fundamental numeric types, the compiler generates a handful of machine code instructions to load the data into registers, subtract them, and store the result.
That is a major factor in C's performance. The primary reason Cython yields such impressive performance boosts is that it brings static typing to a dynamic language. Static typing transforms runtime dynamic dispatch into type-optimized machine code. Before Cython, we could only benefit from this by reimplementing Python code in C. Cython makes it easy to keep our Python code as is and tap into C's static type system.
Now, let's understand this by looking at some code. In Cython, untyped dynamic variables behave exactly like Python variables. Take a look at this example:
a = [x + 1 for x in range(9)]
b = a
a[3] = 49.0
assert b[3] == 49.0
a = 10
assert isinstance(b, list)
We have a variable a. The assignment b = a makes both a and b refer to the same list object created on the first line. Modifying the list via a[3] = 49.0 modifies the same list referenced by b, so the assertion holds. The later assignment a = 10 leaves b referring to the original list object, while a now refers to a Python integer object. This reassignment changes a's type, which is perfectly valid Python code.
To statically type variables in Cython, we use the cdef keyword with a type and the variable name, as you can see here:
cdef int a
cdef int b
cdef float c
Using these statically typed variables looks just like Python (or C) code:
b = 0
a = b
c = 12.0
b = 2 * c
assert a != b
In this example, a = b copies the integer data of b into the memory location reserved for a. This means that a and b refer to independent entities and can evolve separately, which is why the final assertion holds: a is still 0 while b has become 24.
As with C, we can declare several variables of the same type at once and can provide an optional initial value, as you can see here:
cdef long int j = 0, k = 0
cdef float price = 0.0, margin = 1.0