Reference counting
Python uses reference counting for its memory management. This means that it keeps count of how many references your script still has to an object, and when that count becomes zero, the object is deleted.
a = "text" b = a a = None b = None
The string object containing “text” is deleted after the last reference, in this case b, is set to None (or any other value).
The problem of circular references
A problem arises when one object holds a reference to another which has a reference back to the first, either directly or in a chain. This is not so hypothetical, as it often happens in hierarchical or cyclic graph structures. In the following tree structure for example, parent nodes hold references to child nodes, while the child nodes hold references to their parent.
class Node: def __init__(self, parent, data): self.parent = parent self.first = None self.last = None self.next = None self.data = data if parent: if self.parent.last: self.prev = self.parent.last self.parent.last.next = self self.parent.last = self else: self.prev = None self.parent.first = self self.parent.last = self else: self.prev = None def dump(self, indent = 0): text = " " * indent + self.data + "\n" node = self.first while node: text += node.dump(indent + 2) node = node.next return text root = Node(None, "root") leaf1 = Node(root, "leaf1") leaf2 = Node(root, "leaf2") leaf11 = Node(leaf1, "leaf11") leaf12 = Node(leaf1, "leaf12") print root.dump()
What this means is that even if root, leaf1, leaf2, leaf11 and leaf12 fall out of scope, the nodes still hold references to each other, so their reference count can’t reach zero. Python has a separate garbage collection procedure for such cases, however there are a few issues you need to be aware of. One is that this procedure isn’t run very often, so data can accumulate, and once it is run, it can stall the program for a moment if it needs to free many huge structures at once. While this is not a big problem in most cases, a more critical issue is that the objects won’t be collected if any of your objects make use of a __del__ destructor. Since Python has no idea in which order it needs to delete a circular chain of references to objects which use custom destructors. Adding the following code to the node class will keep python from collecting the tree.
def __del__(self): print "deleting " + self.data
Weak references
One solution is to use weak references in cases where we don’t actually own objects. In the tree example above, the parent nodes own their children, but the child nodes shouldn’t own their parent or siblings. These references should be weak references instead. To fix this example, the parent, prev and last references are made into weak references. The first and next references are kept, to make sure the nodes aren’t collected as long as the root exists.
import weakref class Node: def __init__(self, parent, data): self.parent = weakref.ref(parent) if parent else None self.first = None self.last = None self.next = None self.data = data if parent: if parent.last: self.prev = parent.last parent.last().next = self parent.last = weakref.ref(self) else: self.prev = None parent.first = self parent.last = weakref.ref(self) else: self.prev = None
More information on the weakref module can he found here: http://docs.python.org/library/weakref.html.
Breaking the cycles
The other solution is to explicitly break the cycles when you don’t need the structure anymore. Thus adding a destroy method which explicitly releases all references.
def destroy(self): node = self.first while node: node.destroy() node = node.next self.first = None self.last = None self.prev = None self.next = None
Both solutions solve the memory leak. Weak references might be preferred however, since having to remember that you need to destroy certain objects is not Python-like, and will thus be easily overlooked.
`weakref.proxy` is much more suited in this case.
In this simple example it would make it look better by removing the need to dereference yes. Though I prefer ref as proxy comes with several problems when testing equality or hashing. Proxy is nice if you want to forget you are using a weakref and not a reference, but unless someone only needs limited functionality I would advise using ref instead.
nice article, thank you very much