Fixing and avoiding memory leaks in Python

12 Apr

Reference counting

Python uses reference counting for its memory management. This means that it keeps count of how many references your script still has to an object, and when that count becomes zero, the object is deleted.

a = "text"
b = a
a = None
b = None

The string object containing “text” is deleted after the last reference, in this case b, is set to None (or any other value).

The problem of circular references

A problem arises when one object holds a reference to another which has a reference back to the first, either directly or in a chain. This is not so hypothetical, as it often happens in hierarchical or cyclic graph structures. In the following tree structure for example, parent nodes hold references to child nodes, while the child nodes hold references to their parent.

class Node:
    def __init__(self, parent, data):
        self.parent = parent
        self.first = None
        self.last = None
        self.next = None
        self.data = data

        if parent:
            if self.parent.last:
                self.prev = self.parent.last
                self.parent.last.next = self
                self.parent.last = self
            else:
                self.prev = None
                self.parent.first = self
                self.parent.last = self
        else:
            self.prev = None

    def dump(self, indent = 0):
        text = " " * indent + self.data + "\n"

        node = self.first
        while node:
            text += node.dump(indent + 2)
            node = node.next

        return text

root = Node(None, "root")
leaf1 = Node(root, "leaf1")
leaf2 = Node(root, "leaf2")
leaf11 = Node(leaf1, "leaf11")
leaf12 = Node(leaf1, "leaf12")

print root.dump()

What this means is that even if root, leaf1, leaf2, leaf11 and leaf12 fall out of scope, the nodes still hold references to each other, so their reference count can’t reach zero. Python has a separate garbage collection procedure for such cases, however there are a few issues you need to be aware of. One is that this procedure isn’t run very often, so data can accumulate, and once it is run, it can stall the program for a moment if it needs to free many huge structures at once. While this is not a big problem in most cases, a more critical issue is that the objects won’t be collected if any of your objects make use of a __del__ destructor. Since Python has no idea in which order it needs to delete a circular chain of references to objects which use custom destructors. Adding the following code to the node class will keep python from collecting the tree.

def __del__(self):
        print "deleting " + self.data

Weak references

One solution is to use weak references in cases where we don’t actually own objects. In the tree example above, the parent nodes own their children, but the child nodes shouldn’t own their parent or siblings. These references should be weak references instead. To fix this example, the parent, prev and last references are made into weak references. The first and next references are kept, to make sure the nodes aren’t collected as long as the root exists.

import weakref

class Node:
    def __init__(self, parent, data):
        self.parent = weakref.ref(parent) if parent else None
        self.first = None
        self.last = None
        self.next = None
        self.data = data

        if parent:
            if parent.last:
                self.prev = parent.last
                parent.last().next = self
                parent.last = weakref.ref(self)
            else:
                self.prev = None
                parent.first = self
                parent.last = weakref.ref(self)
        else:
            self.prev = None

More information on the weakref module can he found here: http://docs.python.org/library/weakref.html.

Breaking the cycles

The other solution is to explicitly break the cycles when you don’t need the structure anymore. Thus adding a destroy method which explicitly releases all references.

def destroy(self):

    node = self.first
    while node:
        node.destroy()
        node = node.next

    self.first = None
    self.last = None
    self.prev = None
    self.next = None

Both solutions solve the memory leak. Weak references might be preferred however, since having to remember that you need to destroy certain objects is not Python-like, and will thus be easily overlooked.

3 Responses to “Fixing and avoiding memory leaks in Python”

  1. schlamar December 21, 2012 at 12:31 pm #

    `weakref.proxy` is much more suited in this case.

    • mflerackers December 21, 2012 at 11:16 pm #

      In this simple example it would make it look better by removing the need to dereference yes. Though I prefer ref as proxy comes with several problems when testing equality or hashing. Proxy is nice if you want to forget you are using a weakref and not a reference, but unless someone only needs limited functionality I would advise using ref instead.

  2. Sherif Saad March 21, 2015 at 7:58 pm #

    nice article, thank you very much

Leave a comment