Some Basic Python Optimisations
WARNING!
For CPython, if you want performance you should prefer algorithmic optimizations (i.e. find better algorithms for what you're doing) or, if that's not possible, rewrite parts of your code in Cython.

You should generally aim for readability and avoid micro-optimizations - except in cases where readability and speed coincide, like comprehension expressions: they are both more readable and faster (they are special-cased and emit fewer instructions than loops, because there is no possibility of break, continue or return - see compile.c, Ctrl+F, "List and set comprehensions").

You should always profile your application before optimizing, and you should profile with real (or realistic) data and in as complete a state as possible - concentrate on optimizing stuff that actually matters, not stuff you think matters.
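A quick, unscientific way to check the comprehension claim yourself (the function names below are mine, and the exact numbers will vary by machine and Python version - only the ordering tends to be stable):

```python
from timeit import timeit

def squares_loop():
    # Explicit loop: one CALL to result.append per iteration.
    result = []
    for n in range(1000):
        result.append(n * n)
    return result

def squares_comprehension():
    # Comprehension: special-cased bytecode, no attribute lookup or call.
    return [n * n for n in range(1000)]

print("loop:         ", timeit(squares_loop, number=10_000))
print("comprehension:", timeit(squares_comprehension, number=10_000))
```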
You have been warned. On to the fun part.
>>> from dis import dis
>>> def fn(a):
...     b = a
>>> dis(fn)
2 0 LOAD_FAST 0 (a)
3 STORE_FAST 1 (b)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
>>> dis(compile("b = a", "wtf", "exec"))
1 0 LOAD_NAME 0 (a)
3 STORE_NAME 1 (b)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE
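If you want to see what that bytecode difference costs in time rather than in instructions, here is a rough timeit sketch (function names are mine; module-level access uses the dictionary-based lookup, and binding to a local first switches the hot path to the fast variant):

```python
from timeit import timeit

x = 1  # module-level name: looked up by dictionary on each access

def use_global():
    total = 0
    for _ in range(1000):
        total += x      # dictionary-based lookup every iteration
    return total

def use_local():
    local_x = x         # bind once; later accesses use LOAD_FAST
    total = 0
    for _ in range(1000):
        total += local_x
    return total

print("global:", timeit(use_global, number=10_000))
print("local: ", timeit(use_local, number=10_000))
```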
As you can probably guess, LOAD_FAST and its counterpart STORE_FAST are faster than LOAD_NAME and STORE_NAME. The reason is that local variable (and parameter) access in functions is optimized - instead of using full dictionary lookups, it uses integer offsets; during compilation all local variables must be known for this offset table to work, which is why you can't dynamically add local variables to functions.

About why a = a + 1 is faster than a += 1:
>>> def fn(n):
...     n = n + 1
>>> dis(fn)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (1)
6 BINARY_ADD
7 STORE_FAST 0 (n)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> def fn(n):
...     n += 1
>>> dis(fn)
2 0 LOAD_FAST 0 (n)
3 LOAD_CONST 1 (1)
6 INPLACE_ADD
7 STORE_FAST 0 (n)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
So the only difference is INPLACE_ADD versus BINARY_ADD (this has nothing to do with binary numbers, it just means "add for two arguments").

These instructions are defined in ceval.c - to spare you the reading, they only differ in that for numbers one calls PyNumber_Add() while the other calls PyNumber_InPlaceAdd(). Those two are defined in abstract.c, where the main difference is that they call either binary_op1 or binary_iop1, defined in the same file.

binary_iop1 will check whether the object supports in-place operations (i.e. whether the existing object can be modified instead of a new copy being created and returned), but the answer in the case of integers (intobject.c) and floats (floatobject.c) is "no" (nb_inplace_add is zero - a new object is always created, except that integers with values between -5 and 256 inclusive are reused from a pool, but this is transparent to this code).

However, this is only determined after a couple of pointer dereferences and conditional branches, before execution falls back to the non-in-place version. That is the overhead you are seeing (less than 5% on my computer; sometimes I even get the same results).
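You can observe the difference between "no in-place support" (ints) and genuine in-place operations (lists) from Python, without reading any C - just watch the object identities:

```python
# Ints are immutable: += must produce a new object.
a = 1000                 # outside the small-integer pool
old_id = id(a)
a += 1
print(id(a) == old_id)   # False - a now names a freshly created int

# Lists do implement in-place addition: the same object is mutated.
lst = [1, 2]
old_id = id(lst)
lst += [3]
print(id(lst) == old_id) # True - same list, extended in place
print(lst)               # [1, 2, 3]
```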
Using a similar workflow, you can probably answer the rest of your questions yourself - just use the dis module to disassemble the Python code, then check how the instructions are handled in the source code of CPython on GitHub.

It's getting late, so I'll just leave this here and go to bed:
static long
int_hash(PyIntObject *v)
{
    /* XXX If this is changed, you also need to change the way
       Python's long, float and complex types are hashed. */
    long x = v->ob_ival;
    if (x == -1)
        x = -2;
    return x;
}
>>> [hash(n) for n in range(-4, 5)]
[-4, -3, -2, -2, 0, 1, 2, 3, 4]
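The special case exists because, at the C level, a hash function returning -1 signals an error to the interpreter - so -1 is never a valid hash value, and -2 is substituted, which makes -1 and -2 collide:

```python
# -1 is reserved as the C-level error return of hash functions,
# so hash(-1) is quietly mapped to -2 and collides with hash(-2).
print(hash(-1))              # -2
print(hash(-2))              # -2
print(hash(-1) == hash(-2))  # True
```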
Guest Author