Memory Management In Python Interning And Optimization

Follow @adarshpunj

Introduction

Behind Python’s simplicity lies a series of thoughtful decisions that make it so user-friendly. In this article, we will look at some of these decisions and understand how memory is managed in CPython.

is operator

Let’s start with the is operator.

We'll use the Python shell (or IDLE) for all of these examples:

>>> a = [1,2,3]
>>> b = [1,2,3]
>>> a is b

False

This is simply because a and b refer to two different objects, stored at two different locations in memory, even though their contents are equal.
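To make the difference between equality and identity concrete, here is a minimal sketch. The variable names same_value and same_object are only for illustration.

```python
# Two separately built lists with the same contents.
a = [1, 2, 3]
b = [1, 2, 3]

same_value = (a == b)    # == compares the contents of the two lists
same_object = (a is b)   # is compares identity (the same object or not)

print(same_value)    # True
print(same_object)   # False
```

In short, == asks "do these look the same?", while is asks "is this literally the same object?".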

Pointers and Memory

We can verify this using id(), which in CPython returns the memory address of an object.

>>> id(a) == id(b)

False

If we look at its implementation, we can confirm that the is operator does nothing more than compare the memory addresses of two objects:

int Py_Is(PyObject *x, PyObject *y)
{
    return (x == y);
}

Source: CPython on GitHub

Tip

Use the is operator instead of comparing memory addresses with id(); the id() route is costlier because it adds two function calls to the same check.

Now, if we do b = a, then b starts pointing to the same object in memory as a, hence a is b is True.
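A short sketch of what b = a means in practice: no copy is made, so a change made through one name is visible through the other.

```python
a = [1, 2, 3]
b = a            # no copy is made; b is a second name for the same list

b.append(4)      # mutate the list through b

print(a is b)    # True
print(a)         # [1, 2, 3, 4], the change shows up through a as well
```

If an independent list is needed, something like b = list(a) creates a new object, and a is b becomes False again.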

This is also true for all singletons in Python:

>>> a = None
>>> b = None
>>> a is b

True

Singletons are objects that exist only once in memory. If a variable is bound to a singleton object, it points to that one and only instance.
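As a small illustration of the singleton idea, the sketch below tries to get a "second" None and a "second" True and ends up with the existing objects. The describe function is a hypothetical helper showing the usual identity check against None.

```python
# Calling NoneType does not build a second None; it hands back the existing one.
NoneType = type(None)
another_none = NoneType()

print(another_none is None)   # True
print(bool(1) is True)        # True
print(bool(0) is False)       # True


def describe(value=None):
    # Because None is a singleton, an identity check is the idiomatic test.
    return "missing" if value is None else "given"


print(describe())     # missing
print(describe(0))    # given
```

Note that describe(0) reports "given": the identity check distinguishes None from other falsy values such as 0 or an empty string.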

However, integers are not singletons:

>>> a = 1234
>>> b = 1234
>>> a is b

False

Great, let’s try that again with a different integer:

>>> a = 5
>>> b = 5
>>> a is b

True

Oops, this looks weird. Actually, Python treats smaller integers a bit differently. When the interpreter starts, some integers are already allocated in memory. Hence, when you do a = 5, Python smartly points a to the already allocated object. The same goes for b = 5, so both names end up at the same address.

Here, “smaller integers” means integers in the range [-5, 256], which is the range CPython preallocates.
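One way to probe this range yourself is sketched below. It assumes CPython; is_cached is a hypothetical helper, not a standard function. The integers are built at runtime with int(str(n)), so the result reflects the preallocated objects rather than anything the compiler does with literals.

```python
def is_cached(n: int) -> bool:
    """Return True if two runtime-built ints equal to n are the same object."""
    # int(str(n)) builds the value at runtime, so the two operands
    # cannot be merged into one shared constant at compile time.
    first = int(str(n))
    second = int(str(n))
    return first is second


# Probe values around the edges of the range described above.
for n in (-6, -5, 0, 5, 256, 257):
    print(n, is_cached(n))
```

On a CPython build with the range given above, the values from -5 to 256 should print True and the others False. Other implementations or future versions may differ, which is exactly why code should not depend on it.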

Singleton objects

IDLE vs .py file

We know that integers are immutable objects (can’t be changed at a memory level), so doesn’t it make sense to reuse all immutable objects?

They are not going to change anyway, so why not allocate the object once, when it is first created, and simply point to that address whenever a new assignment is made with the same value?

That’s what Python often does. This is called interning.

Let’s rerun the same code in a .py file:

a = 1234
b = 1234

print(a is b)

True

This shouldn’t be surprising: when we run a .py file, Python compiles the whole file at once (unlike the shell or IDLE, where each statement is compiled and executed separately), so it can see that both assignments use the same constant and reuse a single object for it.
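The file-versus-shell difference can be reproduced inside a single script by controlling what gets compiled together. This is a rough sketch that assumes CPython; the names ns, ns2, together and separate are illustrative only.

```python
source_together = "a = 1234\nb = 1234\n"

# One compilation unit: both literals are compiled together, like a .py file.
ns = {}
exec(compile(source_together, "<demo>", "exec"), ns)
together = ns["a"] is ns["b"]
print(together)    # True on CPython: equal constants in one unit share an object

# Two compilation units: each statement is compiled on its own, like the shell.
ns2 = {}
exec(compile("a = 1234", "<line 1>", "exec"), ns2)
exec(compile("b = 1234", "<line 2>", "exec"), ns2)
separate = ns2["a"] is ns2["b"]
print(separate)    # usually False on CPython, but this is not guaranteed
```

The second result is deliberately hedged: it depends on the interpreter build and is the kind of detail that can change between versions.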

Interning responsibly

However, we shouldn’t count on this interning behaviour. That doesn’t mean memory management is weird or unpredictable.

The goal here is efficiency, regardless of how it is achieved — interning or not.

In some cases, the time spent on interning is a good trade-off. It might not be a great choice, though, for a very large string that probably isn’t going to be reused or compared; there it's better to save the time.

Since this is so subjective and depends on the context, Python gives you the flexibility to intern a string manually with its sys.intern() API.
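A minimal sketch of manual interning follows. The strings are assembled at runtime so that they start out as separate objects (the usual outcome on CPython, though not a guarantee); the names parts, s1, s2, i1 and i2 are illustrative.

```python
import sys

parts = ["memory", " ", "management"]

# Built at runtime, so these are equal strings that normally live in
# two separate objects.
s1 = "".join(parts)
s2 = "".join(parts)

# sys.intern() returns the canonical copy from the table of interned strings.
i1 = sys.intern(s1)
i2 = sys.intern(s2)

print(s1 == s2)   # True
print(i1 is i2)   # True
```

Once strings are interned, comparing them can be reduced to an identity check, which is where the saving comes from when the same strings are compared many times, for example as dictionary keys.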

Conclusion

We saw some of the choices Python makes to keep things efficient. These are implementation-level details, not guarantees that your code should rely on; the takeaway here is an understanding of the low-level design.

For example, interning -5 is a choice, and so is interning all strings of length 1. These might change over time.

There’s a lot more to this topic, e.g. garbage collection, which will be covered in upcoming articles.


Permanent link to this article
pygs.me/003
