\documentclass[a4paper,11pt]{article}
\title{A Guide to Python's Magic Methods}
\author{Rafe Kettler}
\date{\today}
\usepackage{fullpage}
\usepackage{underscore}
\usepackage{listings}
\lstloadlanguages{Python}
\lstset{
language = Python,
basicstyle = \ttfamily\footnotesize,
keepspaces = true,
showstringspaces = false
}
\newcommand{\code}[1]{\texttt{#1}}
\begin{document}
\maketitle
\section{Introduction}
This guide is the culmination of a few months' worth of blog posts. The subject is \textbf{magic methods}.

What are magic methods? They're everything in object-oriented Python. They're special methods that you can define to add ``magic'' to your classes. They're always surrounded by double underscores (e.g. \code{__init__} or \code{__lt__}). They're also not as well documented as they need to be. All of the magic methods for Python appear in the same section in the Python docs, but they're scattered about and only loosely organized. There's hardly an example to be found in that section (and that may very well be by design, since they're all detailed in the \emph{language reference}, along with boring syntax descriptions, etc.).

So, to fix what I perceived as a flaw in Python's documentation, I set out to provide some more plain-English, example-driven documentation for Python's magic methods. I started out with weekly blog posts, and now that I've finished with those, I've put together this guide.

I hope you enjoy it. Use it as a tutorial, a refresher, or a reference; it's just intended to be a user-friendly guide to Python's magic methods.
\section{Construction and Initialization}
Everyone knows the most basic magic method, \code{__init__}. It's the way that we can define the initialization behavior of an object. However, when I call \code{x = SomeClass()}, \code{__init__} is not the first thing to get called. That honor goes to a method called \code{__new__}, which actually creates the instance and then passes any arguments given at creation on to the initializer. At the other end of the object's lifespan, there's \code{__del__}. Let's take a closer look at these three magic methods:
\begin{description}
\item[\code{__new__(cls, [...)}]
\code{__new__} is the first method to get called in an object's instantiation. It takes the class, then any other arguments that it will pass along to \code{__init__}. \code{__new__} is used fairly rarely, but it does have its purposes, particularly when subclassing an immutable type like a tuple or a string. I don't want to go into too much detail on \code{__new__} because it's not too useful, but it is covered in great detail in the Python docs.
\item[\code{__init__(self, [...)}]
The initializer for the class. It gets passed whatever the primary constructor was called with (so, for example, if we called \code{x = SomeClass(10, 'foo')}, \code{__init__} would get passed \code{10} and \code{'foo'} as arguments). \code{__init__} is almost universally used in Python class definitions.
\item[\code{__del__(self)}]
If \code{__new__} and \code{__init__} formed the constructor of the object, \code{__del__} is the destructor. It doesn't implement behavior for the statement \code{del x} (so that code would not translate to \code{x.__del__()}). Rather, it defines behavior for when an object is garbage collected. It can be quite useful for objects that might require extra cleanup upon deletion, like sockets or file objects. Be careful, however, as there is no guarantee that \code{__del__} will be executed if the object is still alive when the interpreter exits, so \code{__del__} can't serve as a replacement for good coding practices (like always closing a connection when you're done with it). In fact, \code{__del__} should almost never be used because of the precarious circumstances under which it is called; use it with caution!
\end{description}
\noindent
Putting it all together, here's an example of \code{__init__} and \code{__del__} in action:
\lstinputlisting{listings/fileobject.py}
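\noindent
The remark above about \code{__new__} and immutable types can be made concrete with a small sketch. The class \code{Point} and its \code{label} attribute are hypothetical names chosen for illustration; the point is only that a \code{tuple}'s contents have to be fixed in \code{__new__}, because by the time \code{__init__} runs the tuple already exists.
\begin{lstlisting}
class Point(tuple):
    """A hypothetical immutable 2-D point built on tuple."""

    def __new__(cls, x, y):
        # tuple is immutable, so its contents must be fixed here;
        # by the time __init__ runs it is too late to change them
        return super(Point, cls).__new__(cls, (x, y))

    def __init__(self, x, y):
        # __init__ receives the same arguments that __new__ did
        self.label = 'Point(%r, %r)' % (x, y)

p = Point(3, 4)
print(p.label)        # Point(3, 4)
print(p == (3, 4))    # True
\end{lstlisting}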
\section{Making Operators Work on Custom Classes}
One of the biggest advantages of using Python's magic methods is that they provide a simple way to make objects behave like built-in types. That means you can avoid ugly, counter-intuitive, and nonstandard ways of performing basic operations. In some languages, it's common to do something like this:
\begin{lstlisting}
if instance.equals(other_instance):
    pass  # do something
\end{lstlisting}
\noindent
You could certainly do this in Python, too, but this adds confusion and is unnecessarily verbose. Different libraries might use different names for the same operations, making the client do way more work than necessary. With the power of magic methods, however, we can define one method (\code{__eq__}, in this case), and say what we \emph{mean} instead:
\begin{lstlisting}
if instance == other_instance:
    pass  # do something
\end{lstlisting}
\noindent
That's part of the power of magic methods. The vast majority of them allow us to define meaning for operators so that we can use them on our own classes as if they were built-in types.
\subsection{Comparison magic methods}
Python has a whole slew of magic methods designed to implement intuitive comparisons between objects using operators, not awkward method calls. They also provide a way to override the default Python behavior for comparisons of objects (by reference). Here's the list of those methods and what they do:
\begin{description}
\item[\code{__cmp__(self, other)}]
\code{__cmp__} is the most basic of the comparison magic methods. It actually implements behavior for all of the comparison operators (\code{<}, \code{==}, \code{!=}, etc.), but it might not do it the way you want (for example, if whether one instance is equal to another is determined by one criterion and whether an instance is greater than another is determined by something else). \code{__cmp__} should return a negative integer if \code{self < other}, zero if \code{self == other}, and positive if \code{self > other}. It's usually best to define each comparison you need rather than define them all at once, but \code{__cmp__} can be a good way to save repetition and improve clarity when you need all comparisons implemented with similar criteria.
\item[\code{__eq__(self, other)}]
Defines behavior for the equality operator, \code{==}.
\item[\code{__ne__(self, other)}]
Defines behavior for the inequality operator, \code{!=}.
\item[\code{__lt__(self, other)}]
Defines behavior for the less-than operator, \code{<}.
\item[\code{__gt__(self, other)}]
Defines behavior for the greater-than operator, \code{>}.
\item[\code{__le__(self, other)}]
Defines behavior for the less-than-or-equal-to operator, \code{<=}.
\item[\code{__ge__(self, other)}]
Defines behavior for the greater-than-or-equal-to operator, \code{>=}.
\end{description}
For an example, consider a class to model a word. We might want to compare words lexicographically (by the alphabet), which is the default comparison behavior for strings, but we also might want to do it based on some other criterion, like length or number of syllables. In this example, we'll compare by length. Here's an implementation:
\lstinputlisting{listings/word.py}
Now, we can create two \code{Word}s (by using \code{Word('foo')} and \code{Word('bar')}) and compare them based on length. Note, however, that we didn't define \code{__eq__} and \code{__ne__}. Defining them by length would lead to some weird behavior (notably that \code{Word('foo') == Word('bar')} would evaluate to true). It wouldn't make sense to test for equality based on length, so we fall back on \code{str}'s implementation of equality.

Now would be a good time to note that you don't have to define every comparison magic method to get rich comparisons. The standard library has kindly provided us with a class decorator in the module \code{functools} that will define all rich comparison methods if you only define \code{__eq__} and one other (e.g. \code{__gt__}, \code{__lt__}, etc.). This feature is only available in Python 2.7 and later, but when you get a chance it saves a great deal of time and effort. You can use it by placing \code{@total_ordering} above your class definition.
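
As a rough sketch of how the decorator is used, the class below defines only \code{__eq__} and \code{__lt__} and lets \code{total_ordering} supply the remaining ordering operators. \code{Version} and its helper \code{_key} are hypothetical names invented for this illustration.
\begin{lstlisting}
from functools import total_ordering

@total_ordering
class Version(object):
    """Hypothetical class ordered by a (major, minor) pair."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        # the tuple that every comparison is based on
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

print(Version(1, 2) < Version(1, 10))     # True, from our __lt__
print(Version(2, 0) >= Version(1, 10))    # True, supplied by total_ordering
\end{lstlisting}
Returning \code{NotImplemented} for foreign types is a design choice in this sketch: it lets Python try the other operand's comparison methods instead of failing immediately.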
\subsection{Numeric Magic Methods}
Just like you can create ways for instances of your class to be compared with comparison operators, you can define behavior for numeric operators. Buckle your seat belts, folks, there are a lot of these. For organization's sake, I've split the numeric magic methods into five categories: unary operators, normal arithmetic operators, reflected arithmetic operators (more on this later), augmented assignment, and type conversions.
\subsubsection{Unary operators and functions}
Unary operators and functions only have one operand, e.g. negation, absolute value, etc.
\begin{description}
\item[\code{__pos__(self)}]
Implements behavior for unary positive (e.g. \code{+some_object}).
\item[\code{__neg__(self)}]
Implements behavior for negation (e.g. \code{-some_object}).
\item[\code{__abs__(self)}]
Implements behavior for the built-in \code{abs()} function.
\item[\code{__invert__(self)}]
Implements behavior for inversion using the \code{\char126} operator.
\item[\code{__round__(self, n)}]
Implements behavior for the built-in \code{round()} function. \code{n} is the number of decimal places to round to.
\item[\code{__floor__(self)}]
Implements behavior for \code{math.floor()}, i.e., rounding down to the nearest integer.
\item[\code{__ceil__(self)}]
Implements behavior for \code{math.ceil()}, i.e., rounding up to the nearest integer.
\item[\code{__trunc__(self)}]
Implements behavior for \code{math.trunc()}, i.e., truncating to an integral.
\end{description}
\end{description}
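\noindent
A few of the unary methods listed above can be sketched on a toy class. \code{Vector2} is a hypothetical type made up for this example, and treating \code{abs()} as the Euclidean length is an assumption of the sketch, not a rule of the language.
\begin{lstlisting}
import math

class Vector2(object):
    """Hypothetical 2-D vector used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __neg__(self):
        # -v flips both components
        return Vector2(-self.x, -self.y)

    def __pos__(self):
        # +v returns an unchanged copy
        return Vector2(self.x, self.y)

    def __abs__(self):
        # abs(v) is taken to be the Euclidean length
        return math.hypot(self.x, self.y)

v = Vector2(3, -4)
w = -v
print((w.x, w.y))    # (-3, 4)
print(abs(v))        # 5.0
\end{lstlisting}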
\subsubsection{Normal arithmetic operators}
Now, we cover the typical binary operators (and a function or two): +, -, * and the like. These are, for the most part, pretty self-explanatory.
\begin{description}
\item[\code{__add__(self, other)}]
Implements addition.
\item[\code{__sub__(self, other)}]
Implements subtraction.
\item[\code{__mul__(self, other)}]
Implements multiplication.
\item[\code{__floordiv__(self, other)}]
Implements integer division using the \code{//} operator.
\item[\code{__div__(self, other)}]
Implements division using the \code{/} operator.
\item[\code{__truediv__(self, other)}]
Implements \emph{true} division. Note that this only works when \code{from __future__ import division} is in effect.
\item[\code{__mod__(self, other)}]
Implements modulo using the \code{\%} operator.
\item[\code{__divmod__(self, other)}]
Implements behavior for long division using the \code{divmod()} built in function.
\item[\code{__pow__}]
Implements behavior for exponents using the \code{**} operator.
\item[\code{__lshift__(self, other)}]
Implements left bitwise shift using the \code{<<} operator.
\item[\code{__rshift__(self, other)}]
Implements right bitwise shift using the \code{>>} operator.
\item[\code{__and__(self, other)}]
Implements bitwise and using the \code{\&} operator.
\item[\code{__or__(self, other)}]
Implements bitwise or using the \code{|} operator.
\item[\code{__xor__(self, other)}]
Implements bitwise xor using the \code{\char94} operator.
\end{description}
\subsubsection{Reflected arithmetic operators}
You know how I said I would get to reflected arithmetic in a bit? Some of you might think it's some big, scary, foreign concept. It's actually quite simple. Here's an example:
\begin{lstlisting}
some_object + other
\end{lstlisting}
\noindent
That was ``normal'' addition. The reflected equivalent is the same thing, except with the operands switched around:
\begin{lstlisting}
other + some_object
\end{lstlisting}
\noindent
So, all of these magic methods do the same thing as their normal equivalents, except that they perform the operation with \code{other} as the first operand and \code{self} as the second, rather than the other way around. In most cases, the result of a reflected operation is the same as its normal equivalent, so you may just end up defining \code{__radd__} as calling \code{__add__} and so on. Note that the object on the left hand side of the operator (\code{other} in the example) must either not define the non-reflected version of the operation or return \code{NotImplemented} from it. For instance, in the example, \code{some_object.__radd__} will only be called if \code{other} does not define \code{__add__} (or if its \code{__add__} returns \code{NotImplemented}).
\begin{description}
\item[\code{__radd__(self, other)}]
Implements reflected addition.
\item[\code{__rsub__(self, other)}]
Implements reflected subtraction.
\item[\code{__rmul__(self, other)}]
Implements reflected multiplication.
\item[\code{__rfloordiv__(self, other)}]
Implements reflected integer division using the \code{//} operator.
\item[\code{__rdiv__(self, other)}]
Implements reflected division using the \code{/} operator.
\item[\code{__rtruediv__(self, other)}]
Implements reflected \emph{true} division. Note that this only works when \code{from __future__ import division} is in effect.
\item[\code{__rmod__(self, other)}]
Implements reflected modulo using the \code{\%} operator.
\item[\code{__rdivmod__(self, other)}]
Implements behavior for long division using the \code{divmod()} built in function, when \code{divmod(other, self)} is called.
\item[\code{__rpow__}]
Implements behavior for reflected exponents using the \code{**} operator.
\item[\code{__rlshift__(self, other)}]
Implements reflected left bitwise shift using the \code{<<} operator.
\item[\code{__rrshift__(self, other)}]
Implements reflected right bitwise shift using the \code{>>} operator.
\item[\code{__rand__(self, other)}]
Implements reflected bitwise and using the \code{\&} operator.
\item[\code{__ror__(self, other)}]
Implements reflected bitwise or using the \code{|} operator.
\item[\code{__rxor__(self, other)}]
Implements reflected bitwise xor using the \code{\char94} operator.
\end{description}
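\noindent
To see the fallback to a reflected method in practice, here is a minimal sketch. \code{Meters} is a hypothetical class, and reusing \code{__add__} as \code{__radd__} is only appropriate because addition is commutative for this type.
\begin{lstlisting}
class Meters(object):
    """Hypothetical length type, for illustration only."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        # let Python try the other operand's reflected method
        return NotImplemented

    # addition is commutative here, so the reflected version can reuse __add__
    __radd__ = __add__

a = Meters(2) + 3    # calls Meters.__add__
b = 3 + Meters(2)    # int.__add__ returns NotImplemented, so Meters.__radd__ runs
print((a.value, b.value))    # (5, 5)
\end{lstlisting}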
\subsubsection{Augmented assignment}
Python also has a wide variety of magic methods to allow custom behavior to be defined for augmented assignment. You're probably already familiar with augmented assignment: it combines ``normal'' operators with assignment. If you still don't know what I'm talking about, here's an example:
\begin{lstlisting}
x = 5
x += 1 # in other words x = x + 1
\end{lstlisting}
Each of these methods should return the value that the variable on the left hand side should be assigned to (for instance, for \code{a += b}, \code{__iadd__} might return \code{a + b}, which would be assigned to \code{a}). Here's the list:
\begin{description}
\item[\code{__iadd__(self, other)}]
Implements addition with assignment.
\item[\code{__isub__(self, other)}]
Implements subtraction with assignment.
\item[\code{__imul__(self, other)}]
Implements multiplication with assignment.
\item[\code{__ifloordiv__(self, other)}]
Implements integer division with assignment using the \code{//=} operator.
\item[\code{__idiv__(self, other)}]
Implements division with assignment using the \code{/=} operator.
\item[\code{__itruediv__(self, other)}]
Implements \emph{true} division with assignment. Note that this only works when \code{from __future__ import division} is in effect.
\item[\code{__imod__(self, other)}]
Implements modulo with assignment using the \code{\%=} operator.
\item[\code{__ipow__}]
Implements behavior for exponents with assignment using the \code{**=} operator.
\item[\code{__ilshift__(self, other)}]
Implements left bitwise shift with assignment using the \code{<<=} operator.
\item[\code{__irshift__(self, other)}]
Implements right bitwise shift with assignment using the \code{>>=} operator.
\item[\code{__iand__(self, other)}]
Implements bitwise and with assignment using the \code{\&=} operator.
\item[\code{__ior__(self, other)}]
Implements bitwise or with assignment using the \code{|=} operator.
\item[\code{__ixor__(self, other)}]
Implements bitwise xor with assignment using the \code{\char94=} operator.
\end{description}
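\noindent
The return-value rule described before the list is easiest to see on a mutable object. In this sketch (with a hypothetical \code{Basket} class), \code{__iadd__} modifies the object in place and returns \code{self}, so the name on the left hand side is rebound to the very same object.
\begin{lstlisting}
class Basket(object):
    """Hypothetical container used to illustrate __iadd__."""

    def __init__(self):
        self.items = []

    def __iadd__(self, item):
        # mutate in place, then return the object that the name
        # on the left hand side should be rebound to
        self.items.append(item)
        return self

basket = Basket()
same = basket
basket += 'apple'
basket += 'pear'
print(basket.items)      # ['apple', 'pear']
print(basket is same)    # True: the object was modified in place
\end{lstlisting}
Forgetting the \code{return self} would rebind \code{basket} to \code{None}, which is a common mistake with these methods.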
\subsubsection{Type conversion magic methods}
Python also has an array of magic methods designed to implement behavior for built in type conversion functions like \code{float()}. Here they are:
\begin{description}
\item[\code{__int__(self)}]
Implements type conversion to int.
\item[\code{__long__(self)}]
Implements type conversion to long.
\item[\code{__float__(self)}]
Implements type conversion to float.
\item[\code{__complex__(self)}]
Implements type conversion to complex.
\item[\code{__oct__(self)}]
Implements type conversion to octal.
\item[\code{__hex__(self)}]
Implements type conversion to hexadecimal.
\item[\code{__index__(self)}]
Implements type conversion to an int when the object is used in a slice expression. If you define a custom numeric type that might be used in slicing, you should define \code{__index__}.
\item[\code{__trunc__(self)}]
Called when \code{math.trunc(self)} is called. \code{__trunc__} should return the value of \code{self} truncated to an integral type (usually a long).
\item[\code{__coerce__(self, other)}]
Method to implement mixed mode arithmetic. \code{__coerce__} should return \code{None} if type conversion is impossible. Otherwise, it should return a pair (2-tuple) of \code{self} and \code{other}, manipulated to have the same type.
\end{description}
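\noindent
As an illustration of the conversion methods, the sketch below implements \code{__float__}, \code{__int__} and \code{__index__} on a hypothetical \code{Ratio} class. Refusing non-whole values in \code{__index__} is a design assumption of this example.
\begin{lstlisting}
class Ratio(object):
    """Hypothetical rational number, for illustration only."""

    def __init__(self, num, den):
        self.num = num
        self.den = den

    def __float__(self):
        return float(self.num) / self.den

    def __int__(self):
        # truncate toward zero, as int() does for floats
        return int(float(self))

    def __index__(self):
        # only whole-valued ratios make sense as sequence indices
        if self.num % self.den != 0:
            raise TypeError('Ratio is not a whole number')
        return self.num // self.den

print(float(Ratio(1, 2)))       # 0.5
print(int(Ratio(7, 2)))         # 3
print('abcdef'[Ratio(4, 2)])    # c
\end{lstlisting}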
\section{Representing your Classes}
It's often useful to have a string representation of a class. In Python, there are a few methods that you can implement in your class definition to customize how built-in functions that return representations of your class behave.
\begin{description}
\item[\code{__str__(self)}]
Defines behavior for when \code{str()} is called on an instance of your class.
\item[\code{__repr__(self)}]
Defines behavior for when \code{repr()} is called on an instance of your class. The major difference between \code{str()} and \code{repr()} is intended audience. \code{repr()} is intended to produce output that is mostly machine-readable (in many cases, it could be valid Python code even), whereas \code{str()} is intended to be human-readable.
\item[\code{__unicode__(self)}]
Defines behavior for when \code{unicode()} is called on an instance of your class. \code{unicode()} is like \code{str()}, but it returns a unicode string. Be wary: if a client calls \code{str()} on an instance of your class and you've only defined \code{__unicode__()}, it won't work. You should always try to define \code{__str__()} as well in case someone doesn't have the luxury of using unicode.
\item[\code{__format__(self, formatstr)}]
Defines behavior for when an instance of your class is used in new-style string formatting. For instance, \code{"Hello, \{0:abc\}!".format(a)} would lead to the call \code{a.__format__("abc")}. This can be useful for defining your own numerical or string types that you might like to give special formatting options.
\item[\code{__hash__(self)}]
Defines behavior for when \code{hash()} is called on an instance of your class. It has to return an integer, and its result is used for quick key comparison in dictionaries. Note that this usually entails implementing \code{__eq__} as well. Live by the following rule: \code{a == b} implies \code{hash(a) == hash(b)}.
\item[\code{__nonzero__(self)}]
Defines behavior for when \code{bool()} is called on an instance of your class. Should return True or False, depending on whether you would want to consider the instance to be True or False.
\item[\code{__dir__(self)}]
Defines behavior for when \code{dir()} is called on an instance of your class. This method should return a list of attributes for the user. Typically, implementing \code{__dir__} is unnecessary, but it can be vitally important for interactive use of your classes if you redefine \code{__getattr__} or \code{__getattribute__} (which you will see in the next section) or are otherwise dynamically generating attributes.
\end{description}
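\noindent
Several of the methods above can be sketched together on one small class. \code{Money} is a hypothetical type; passing the format specification straight through to the numeric amount is just one possible way to implement \code{__format__}.
\begin{lstlisting}
class Money(object):
    """Hypothetical currency amount, for illustration only."""

    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def __repr__(self):
        # unambiguous, and valid Python for re-creating the object
        return 'Money(%r, %r)' % (self.amount, self.currency)

    def __str__(self):
        # readable form for end users
        return '%.2f %s' % (self.amount, self.currency)

    def __format__(self, formatstr):
        # apply the format spec to the number and keep the currency
        return format(self.amount, formatstr) + ' ' + self.currency

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return (self.amount, self.currency) == (other.amount, other.currency)

    def __hash__(self):
        # equal objects must hash equally, so hash the fields __eq__ uses
        return hash((self.amount, self.currency))

m = Money(5, 'EUR')
print(repr(m))                      # Money(5, 'EUR')
print(str(m))                       # 5.00 EUR
print('{0:08.3f}'.format(m))        # 0005.000 EUR
print(len({m, Money(5, 'EUR')}))    # 1
\end{lstlisting}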
\noindent
We're pretty much done with the boring (and example-free) part of the magic methods guide. Now that we've covered some of the more basic magic methods, it's time to move to more advanced material.
\section{Controlling Attribute Access}
Many people coming to Python from other languages complain that it lacks true encapsulation for classes (e.g. no way to define private attributes and then have public getters and setters). This couldn't be further from the truth: it just happens that Python accomplishes a great deal of encapsulation through ``magic'', instead of explicit modifiers for methods or fields. Take a look:
\begin{description}
\item[\code{__getattr__(self, name)}]
You can define behavior for when a user attempts to access an attribute that doesn't exist (either at all or yet). This can be useful for catching and redirecting common misspellings, giving warnings about using deprecated attributes (you can still choose to compute and return that attribute, if you wish), or deftly handling an \code{AttributeError}. It only gets called when a nonexistent attribute is accessed, however, so it isn't a true encapsulation solution.
\item[\code{__setattr__(self, name, value)}]
Unlike \code{__getattr__}, \code{__setattr__} is an encapsulation solution. It allows you to define behavior for assignment to an attribute regardless of whether or not that attribute exists, meaning you can define custom rules for any changes in the values of attributes. However, you have to be careful with how you use \code{__setattr__}, as the example at the end of the list will show.
\item[\code{__delattr__(self, name)}]
This is the exact same as \code{__setattr__}, but for deleting attributes instead of setting them. The same precautions need to be taken as with \code{__setattr__} as well in order to prevent infinite recursion (calling \code{del self.name} in the implementation of \code{__delattr__} would cause infinite recursion).
\item[\code{__getattribute__(self, name)}]
After all this, \code{__getattribute__} fits in pretty well with its companions \code{__setattr__} and \code{__delattr__}. However, I don't recommend you use it. \code{__getattribute__} can only be used with new-style classes (all classes are new-style in the newest versions of Python, and in older versions you can make a class new-style by subclassing \code{object}). It allows you to define rules for whenever an attribute's value is accessed. It suffers from some similar infinite recursion problems as its partners-in-crime (this time you call the base class's \code{__getattribute__} method to prevent this). It also mainly obviates the need for \code{__getattr__}, which only gets called when \code{__getattribute__} is implemented if it is called explicitly or an \code{AttributeError} is raised. This method can be used (after all, it's your choice), but I don't recommend it because it has a small use case (it's far more rare that we need special behavior to retrieve a value than to assign to it) and because it can be really difficult to implement bug-free.
\end{description}
You can easily cause a problem in your definitions of any of the methods controlling attribute access. Consider this example:
\begin{lstlisting}
# Wrong: infinite recursion
def __setattr__(self, name, value):
    self.name = value
    # since every time an attribute is assigned, __setattr__()
    # is called, this is recursion. So this really means
    # self.__setattr__('name', value). Since the method keeps
    # calling itself, the recursion goes on forever causing a crash

# Right: write to the instance's attribute dictionary directly
def __setattr__(self, name, value):
    self.__dict__[name] = value # assigning to the instance's dict of names
    # define custom behavior here
\end{lstlisting}
Again, Python's magic methods are incredibly powerful, and with great power comes great responsibility. It's important to know the proper way to use magic methods so you don't break any code.
So, what have we learned about custom attribute access in Python? It's not to be used lightly. In fact, it tends to be excessively powerful and counter-intuitive. But the reason why it exists is to scratch a certain itch: Python doesn't seek to make bad things impossible, but just to make them difficult. Freedom is paramount, so you can really do whatever you want. Here's an example of some of the special attribute access methods in action (note that we use \code{super} because not all classes have an attribute \code{__dict__}):
\lstinputlisting{listings/access.py}
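\noindent
The use of \code{__getattr__} for deprecated attributes, mentioned in the list above, might look roughly like the following. \code{Config}, its \code{_renamed} table, and the attribute names are all hypothetical; the sketch relies only on the fact that \code{__getattr__} runs after normal lookup has failed.
\begin{lstlisting}
import warnings

class Config(object):
    """Hypothetical class with one renamed attribute."""

    _renamed = {'colour': 'color'}    # old name -> new name

    def __init__(self):
        self.color = 'red'

    def __getattr__(self, name):
        # only reached when normal attribute lookup fails
        if name in self._renamed:
            new_name = self._renamed[name]
            warnings.warn('%s is deprecated; use %s' % (name, new_name),
                          DeprecationWarning)
            return getattr(self, new_name)
        raise AttributeError(name)

c = Config()
print(c.color)     # red (found normally; __getattr__ is not called)
print(c.colour)    # red (redirected, with a DeprecationWarning)
\end{lstlisting}
Raising \code{AttributeError} for every other name keeps \code{hasattr()} and similar tools behaving as usual.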
\section{Making Custom Sequences}
There are a number of ways to get your Python classes to act like built-in sequences (dict, tuple, list, string, etc.). These are by far my favorite magic methods in Python because of the absurd degree of control they give you and the way that they magically make a whole array of global functions work beautifully on instances of your class. But before we get down to the good stuff, a quick word on requirements.
\subsection{Requirements}
Now that we're talking about creating your own sequences in Python, it's time to talk about \emph{protocols}. Protocols are somewhat similar to interfaces in other languages in that they give you a set of methods you must define. However, in Python protocols are totally informal and require no explicit declarations to implement. Rather, they're more like guidelines.

Why are we talking about protocols now? Because implementing custom container types in Python involves using some of these protocols. First, there's the protocol for defining immutable containers: to make an immutable container, you need only define \code{__len__} and \code{__getitem__} (more on these later). The mutable container protocol requires everything that immutable containers require plus \code{__setitem__} and \code{__delitem__}. Lastly, if you want your object to be iterable, you'll have to define \code{__iter__}, which returns an iterator. That iterator must conform to an iterator protocol, which requires iterators to have methods called \code{__iter__} (returning itself) and \code{next}.
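
The iterator protocol described above can be sketched in a few lines. \code{Countdown} and \code{CountdownIterator} are hypothetical classes. One assumption to flag: Python 3 spells the iterator method \code{__next__} rather than \code{next}, so the sketch defines \code{__next__} and aliases \code{next} to it in order to run on either version.
\begin{lstlisting}
class Countdown(object):
    """Hypothetical iterable that counts down from start to 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # each loop gets a fresh, independent iterator
        return CountdownIterator(self.start)


class CountdownIterator(object):
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # iterators return themselves
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__    # Python 2 spells this method "next"

print(list(Countdown(3)))    # [3, 2, 1]
\end{lstlisting}
Keeping the iterable and its iterator as separate objects means the same \code{Countdown} can be looped over more than once.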
\subsection{The magic behind containers}
\begin{description}
\item[\code{__len__(self)}]
Returns the length of the container. Part of the protocol for both immutable and mutable containers.
\item[\code{__getitem__(self, key)}]
Defines behavior for when an item is accessed, using the notation \code{self[key]}. This is also part of both the mutable and immutable container protocols. It should also raise appropriate exceptions: \code{TypeError} if the type of the key is wrong and \code{KeyError} if there is no corresponding value for the key.
\item[\code{__setitem__(self, key, value)}]
Defines behavior for when an item is assigned to, using the notation \code{self[key] = value}. This is part of the mutable container protocol. Again, you should raise \code{KeyError} and \code{TypeError} where appropriate.
\item[\code{__delitem__(self, key)}]
Defines behavior for when an item is deleted (e.g. \code{del self[key]}). This is only part of the mutable container protocol. You must raise the appropriate exceptions when an invalid key is used.
\item[\code{__iter__(self)}]
Should return an iterator for the container. Iterators are returned in a number of contexts, most notably by the \code{iter()} built in function and when a container is looped over using the form \code{for x in container:}. Iterators are their own objects, and they also must define an \code{__iter__} method that returns \code{self}.
\item[\code{__reversed__(self)}]
Called to implement behavior for the \code{reversed()} built in function. Should return a reversed version of the sequence. Implement this only if the sequence class is ordered, like list or tuple.
\item[\code{__contains__(self, item)}]
\code{__contains__} defines behavior for membership tests using \code{in} and \code{not in}. Why isn't this part of a sequence protocol, you ask? Because when \code{__contains__} isn't defined, Python just iterates over the sequence and returns \code{True} if it comes across the item it's looking for.
\item[\code{__missing__(self, key)}]
\code{__missing__} is used in subclasses of \code{dict}. It defines behavior for whenever a key is accessed that does not exist in a dictionary (so, for instance, if I had a dictionary \code{d} and said \code{d["george"]} when \code{"george"} is not a key in the dict, \code{d.__missing__("george")} would be called).
\end{description}
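\noindent
As a small illustration of \code{__missing__}, the hypothetical \code{dict} subclass below computes and stores a value the first time an absent key is looked up. Using the length of the key as that value is an arbitrary choice for the sketch.
\begin{lstlisting}
class DefaultingDict(dict):
    """Hypothetical dict subclass that builds missing values on demand."""

    def __missing__(self, key):
        # called by dict.__getitem__ when key is absent
        value = self[key] = len(key)
        return value

d = DefaultingDict()
print(d['george'])       # 6, computed by __missing__ and stored
print('george' in d)     # True
print(d.get('ringo'))    # None: get() does not consult __missing__
\end{lstlisting}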
\subsection{An example}
For our example, let's look at a list that implements some functional constructs that you might be used to from other languages (Haskell, for example):
\lstinputlisting{listings/list.py}
\noindent
There you have it, a (marginally) useful example of how to implement your own sequence. Of course, there are more useful applications of custom sequences, but quite a few of them are already implemented in the standard library (batteries included, right?), like \code{Counter}, \code{OrderedDict}, and \code{namedtuple}.
\section{Reflection}
You can also control how reflection using the built-in functions \code{isinstance()} and \code{issubclass()} behaves by defining magic methods. The magic methods are:
\begin{description}
\item[\code{__instancecheck__(self, instance)}]
Checks if an instance is an instance of the class you defined (e.g. \code{isinstance(instance, class)}).
\item[\code{__subclasscheck__(self, subclass)}]
Checks if a class subclasses the class you defined (e.g. \code{issubclass(subclass, class)}).
\end{description}
The use case for these magic methods might seem small, and that may very well be true. I won't spend too much more time on reflection magic methods because they aren't very important, but they reflect something important about object-oriented programming in Python and Python in general: there is almost always an easy way to do something, even if it's rarely necessary. These magic methods might not seem useful, but if you ever need them you'll be glad that they're there (and that you read this guide!).
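
For the curious, here is a minimal sketch of both hooks. One detail worth knowing is that Python looks these methods up on the \emph{type} of the class being checked against, so they are defined on a metaclass. \code{QuackerMeta}, \code{Quacker} and \code{Duck} are hypothetical names, and the class is created by calling the metaclass directly only to sidestep the metaclass syntax, which differs between Python 2 and Python 3.
\begin{lstlisting}
class QuackerMeta(type):
    """Hypothetical metaclass: anything with a quack() method counts."""

    def __instancecheck__(cls, instance):
        return callable(getattr(instance, 'quack', None))

    def __subclasscheck__(cls, subclass):
        return callable(getattr(subclass, 'quack', None))

# Calling the metaclass directly avoids the metaclass syntax, which
# differs between Python 2 and Python 3.
Quacker = QuackerMeta('Quacker', (object,), {})

class Duck(object):
    def quack(self):
        return 'Quack!'

print(isinstance(Duck(), Quacker))    # True
print(issubclass(Duck, Quacker))      # True
print(isinstance(42, Quacker))        # False
\end{lstlisting}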
\section{Abstract Base Classes}
See http://docs.python.org/2/library/abc.html.
\section{Callable Objects}
As you may already know, in Python, functions are first-class objects. This means that they can be passed to functions and methods just as if they were objects of any other kind. This is an incredibly powerful feature.
A special magic method in Python allows instances of your classes to behave as if they were functions, so that you can ``call'' them, pass them to functions that take functions as arguments, and so on. This is another powerful convenience feature that makes programming in Python that much sweeter.
\begin{description}
\item[\code{__call__(self, [args...])}]
Allows an instance of a class to be called as a function. Essentially, this means that \code{x()} is the same as \code{x.__call__()}. Note that \code{__call__} takes a variable number of arguments; this means that you define \code{__call__} as you would any other function, taking however many arguments you'd like it to.
\end{description}
\code{__call__} can be particularly useful in classes whose instances need to change state often. ``Calling'' the instance can be an intuitive and elegant way to change the object's state. An example might be a class representing an entity's position on a plane:
\lstinputlisting{listings/call.py}
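\noindent
As a second small sketch, here is a callable instance being passed to a function that expects a function, in this case the \code{key} argument of \code{sorted()}. The class \code{ByField} and the sample data are invented for this example.
\begin{lstlisting}
class ByField(object):
    """Hypothetical callable that pulls one field out of a dict."""

    def __init__(self, field):
        self.field = field
        self.calls = 0

    def __call__(self, record):
        self.calls += 1  # unlike a plain function, state lives on the instance
        return record[self.field]


people = [{"name": "Bea", "age": 41}, {"name": "Al", "age": 29}]
by_age = ByField("age")

# sorted() neither knows nor cares that by_age is not a real function
youngest_first = sorted(people, key=by_age)
print(youngest_first[0]["name"])
\end{lstlisting}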
\section{Context Managers}
In Python 2.5, a new keyword was introduced along with a new method for code reuse: the \code{with} statement. The concept of context managers was hardly new in Python (it was implemented before as a part of the library), but not until PEP 343 was accepted did it achieve status as a first-class language construct. You may have seen \code{with} statements before:
\begin{lstlisting}
with open('foo.txt') as bar:
    # perform some action with bar
\end{lstlisting}
Context managers allow setup and cleanup actions to be taken for objects when their creation is wrapped with a \code{with} statement. The behavior of the context manager is determined by two magic methods:
\begin{description}
\item[\code{__enter__(self)}]
Defines what the context manager should do at the beginning of the block created by the \code{with} statement. Note that the return value of \code{__enter__} is bound to the \emph{target} of the \code{with} statement, or the name after the \code{as}.
\item[\code{__exit__(self, exception_type, exception_value, traceback)}]
Defines what the context manager should do after its block has been executed (or terminates). It can be used to handle exceptions, perform cleanup, or do something always done immediately after the action in the block. If the block executes successfully, \code{exception_type}, \code{exception_value}, and \code{traceback} will be \code{None}. Otherwise, you can choose to handle the exception or let the user handle it; if you want to handle it, make sure \code{__exit__} returns \code{True} after all is said and done. If you don't want the exception to be handled by the context manager, just let it happen.
\end{description}
\code{__enter__} and \code{__exit__} can be useful for specific classes that have well-defined and common behavior for setup and cleanup. You can also use these methods to create generic context managers that wrap other objects. Here's an example:
\lstinputlisting{listings/closer.py}
\noindent
Here's an example of \code{Closer} in action, using an FTP connection to demonstrate it (a closable socket):
\begin{lstlisting}
>>> from magicmethods import Closer
>>> from ftplib import FTP
>>> with Closer(FTP('ftp.somesite.com')) as conn:
...     conn.dir()
...
# output omitted for brevity
>>> conn.dir()
# long AttributeError message, can't use a connection that's closed
>>> with Closer(int(5)) as i:
...     i += 1
...
Not closable.
>>> i
6
\end{lstlisting}
\noindent
See how our wrapper gracefully handled both proper and improper uses? That's the power of context managers and magic methods. Note that the Python standard library includes a module \code{contextlib} that contains a context manager, \code{contextlib.closing()}, that does approximately the same thing (without any handling of the case where an object does not have a \code{close()} method).
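The return value of \code{__exit__} deserves one more illustration. Below is a minimal sketch of a context manager that handles one kind of exception and lets everything else through; the class name \code{Suppress} is hypothetical and written only for this example.
\begin{lstlisting}
class Suppress(object):
    """Hypothetical context manager that swallows one kind of exception."""

    def __init__(self, exception_class):
        self.exception_class = exception_class
        self.caught = None

    def __enter__(self):
        return self  # this is what gets bound to the name after "as"

    def __exit__(self, exception_type, exception_value, traceback):
        if exception_type is None:
            return False  # the block finished without an exception
        if issubclass(exception_type, self.exception_class):
            self.caught = exception_type
            return True   # handled here, so it is not re-raised
        return False      # anything else propagates to the caller


with Suppress(ZeroDivisionError) as guard:
    1 / 0  # raises; execution resumes after the with block

print(guard.caught is ZeroDivisionError)
\end{lstlisting}
\noindent
An exception of any other type raised inside the block would still reach the caller, because \code{__exit__} returns a false value for it.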
\section{Building Descriptor Objects}
Descriptors are classes that can be used as proxies for getting, setting, and deleting attributes. Descriptors are assigned as class attributes on a so-called owner class. When that attribute is accessed, the descriptor's special methods are called. Descriptors can be used to execute side-effects when attributes are updated, or to provide multiple views over an object's state (as shown in the example below).
A descriptor class implements at least one of \code{__get__}, \code{__set__}, or \code{__delete__}. In the following, \code{owner} is the owner class, and \code{instance} is the instance of the owner class.
\begin{description}
\item[\code{__get__(self, instance, owner)}]
Define behavior for when the attribute is retrieved. If the attribute is accessed via the class instead of an instance, \code{instance} is \code{None}.
\item[\code{__set__(self, instance, value)}]
Define behavior for when the attribute is assigned on an instance. (Note that assigning to the attribute on the class will not trigger this method, but simply replace the descriptor itself.)
\item[\code{__delete__(self, instance)}]
Define behavior for when the attribute is deleted.
\end{description}
\noindent
Now, an example of a useful application of descriptors: unit conversions. The \code{Distance} class can be accessed using either meters or feet. One of them is the authoritative value, and the other one is derived from that.
\lstinputlisting{listings/descriptor.py}
Some of Python's internal constructs, such as properties and bound methods, are implemented under the hood using descriptors.
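To illustrate the other use mentioned at the start of this section, running a side effect when an attribute is updated, here is a rough sketch of a validating descriptor. \code{Positive} and \code{Rectangle} are hypothetical names, and storing the value in the instance's \code{__dict__} under the same name is just one possible design.
\begin{lstlisting}
class Positive(object):
    """Hypothetical descriptor that rejects non-positive values."""

    def __init__(self, name):
        self.name = name  # key used in the owning instance's __dict__

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed via the class, not an instance
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("%s must be positive" % self.name)
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        del instance.__dict__[self.name]


class Rectangle(object):
    width = Positive("width")
    height = Positive("height")

    def __init__(self, width, height):
        self.width = width    # goes through Positive.__set__
        self.height = height


box = Rectangle(2, 3)
box.width = 5
print(box.width * box.height)
\end{lstlisting}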
\section{Copying}
Sometimes, particularly when dealing with mutable objects, you want to be able to copy an object and make changes without affecting what you copied from. This is where Python's \code{copy} module comes into play. However (fortunately), Python modules are not sentient, so we don't have to worry about a Linux-based robot uprising, but we do have to tell Python how to efficiently copy things.
\begin{description}
\item[\code{__copy__(self)}]
Defines behavior for \code{copy.copy()} for instances of your class. \code{copy.copy()} returns a \emph{shallow copy} of your object: the instance itself is a new instance, but its data is referenced rather than copied (and hence changes to data in a shallow copy may cause changes in the original).
\item[\code{__deepcopy__(self, memodict={})}]
Defines behavior for \code{copy.deepcopy()} for instances of your class. \code{copy.deepcopy()} returns a \emph{deep copy} of your object: the object \emph{and} its data are both copied. \code{memodict} is a cache of previously copied objects; it optimizes copying and prevents infinite recursion when copying recursive data structures. When you want to deep copy an individual attribute, call \code{copy.deepcopy()} on that attribute with \code{memodict} as the second argument.
\end{description}
What are some use cases for these magic methods? As always, in any case where you need more fine-grained control than what the default behavior gives you. For instance, if you are attempting to copy an object that stores a cache as a dictionary (which might be large), it might not make sense to copy the cache as well -- if the cache can be shared in memory between instances, then it should be.
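The cache scenario can be sketched as follows. The class \code{Lookup} and its attributes are hypothetical; the point is only that both kinds of copy keep sharing the cache while a deep copy duplicates the rest of the data.
\begin{lstlisting}
import copy


class Lookup(object):
    """Hypothetical object with its own data and a large shared cache."""

    def __init__(self, names, cache=None):
        self.names = names
        self.cache = {} if cache is None else cache

    def __copy__(self):
        # Shallow copy: new instance, same names list, same cache
        return Lookup(self.names, self.cache)

    def __deepcopy__(self, memodict):
        # copy.deepcopy() always supplies memodict, so no default is needed
        new = Lookup([], self.cache)   # the cache is shared on purpose
        memodict[id(self)] = new       # register early so cycles resolve
        new.names = copy.deepcopy(self.names, memodict)
        return new


original = Lookup(["a", "b"])
shallow = copy.copy(original)
deep = copy.deepcopy(original)

print(shallow.names is original.names)  # shallow copy references the data
print(deep.names is original.names)     # deep copy duplicated the data
print(deep.cache is original.cache)     # the cache is shared either way
\end{lstlisting}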
\section{Pickling}
If you spend time with other Pythonistas, chances are you've at least heard of pickling. Pickling is a serialization process for Python data structures, and can be incredibly useful when you need to store an object and retrieve it later. It's also a major source of worries and confusion.
Pickling is so important that it doesn't just have its own module (\code{pickle}), but its own \emph{protocol} and the magic methods to go with it. But first, a brief word on how to pickle existing types (feel free to skip it if you already know).
\subsection{Pickling: A Quick Soak in the Brine}
Let's dive into pickling. Say you have a dictionary that you want to store and retrieve later. You could write its contents to a file, carefully making sure that you write correct syntax, then retrieve it using either \code{exec()} or processing the file input. But this is precarious at best: if you store important data in plain text, it could be corrupted or changed in any number of ways to make your program crash or, worse, run malicious code on your computer. Instead, we're going to pickle it:
\begin{lstlisting}
import pickle
data = {'foo': [1, 2, 3],
        'bar': ('Hello', 'world!'),
        'baz': True}
jar = open('data.pkl', 'wb')
pickle.dump(data, jar) # write the pickled data to the file jar
jar.close()
\end{lstlisting}
\noindent
Now, a few hours later, we want it back. All we have to do is unpickle it:
\begin{lstlisting}
import pickle
pkl_file = open('data.pkl', 'rb') # connect to the pickled data
data = pickle.load(pkl_file) # load it into a variable
print data
pkl_file.close()
\end{lstlisting}
\noindent
What happens? Exactly what you expect. It's just like we had \code{data} all along.
Now, for a word of caution: pickling is not perfect. Pickle files are easily corrupted, both by accident and on purpose. Pickling may be more secure than using flat text files, but it still can be used to run malicious code. It's also not guaranteed to be compatible across versions of Python, so don't distribute pickled objects and expect people to be able to open them. However, it can also be a powerful tool for caching and other common serialization tasks.
\subsection{Pickling your own Objects}
Pickling isn't just for built-in types. It's for any class that follows the pickle protocol. The pickle protocol has several optional methods for Python objects to customize how they act (it's a bit different for C extensions, but that's not in our scope):
\begin{description}
\item[\code{__getinitargs__(self)}]
If you'd like for \code{__init__} to be called when your class is unpickled, you can define \code{__getinitargs__}, which should return a tuple of the arguments that you'd like to be passed to \code{__init__}. Note that this method will only work for old-style classes.
\item[\code{__getnewargs__(self)}]
For new-style classes, you can influence what arguments get passed to \code{__new__} upon unpickling. This method should also return a tuple of arguments that will then be passed to \code{__new__}.
\item[\code{__getstate__(self)}]
Instead of the object's \code{__dict__} attribute being stored, you can return a custom state to be stored when the object is pickled. That state will be used by \code{__setstate__} when the object is unpickled.
\item[\code{__setstate__(self, state)}]
When the object is unpickled, if \code{__setstate__} is defined the object's state will be passed to it instead of directly applied to the object's \code{__dict__}. This goes hand in hand with \code{__getstate__}: when both are defined, you can represent the object's pickled state however you want with whatever you want.
\item[\code{__reduce__(self)}]
When defining extension types (i.e., types implemented using Python's C API), you have to tell Python how to pickle them if you want Python to pickle them. \code{__reduce__()} is called when an object defining it is pickled. It can either return a string representing a global name that Python will look up and pickle, or a tuple. The tuple contains between 2 and 5 elements: a callable object that is called to recreate the object, a tuple of arguments for that callable object, state to be passed to \code{__setstate__} (optional), an iterator yielding list items to be pickled (optional), and an iterator yielding dictionary items to be pickled (optional).
\item[\code{__reduce_ex__(self, protocol)}]
\code{__reduce_ex__} exists for compatibility. If it is defined, \code{__reduce_ex__} will be called in preference to \code{__reduce__} on pickling; it takes a single integer argument, the pickle protocol version. \code{__reduce__} can be defined as well for older versions of the pickling API that did not support \code{__reduce_ex__}.
\end{description}
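As a quick sketch of the two-element tuple form of \code{__reduce__}, here is an ordinary (non-extension) class that asks to be rebuilt by calling its own constructor. The class \code{Point} is hypothetical, and the example assumes it is defined at the top level of a module so that \code{pickle} can find it by name.
\begin{lstlisting}
import pickle


class Point(object):
    """Hypothetical class that is rebuilt by calling its constructor."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce__(self):
        # (callable, arguments): unpickling calls Point(self.x, self.y)
        return (Point, (self.x, self.y))


restored = pickle.loads(pickle.dumps(Point(1, 2)))
print(restored.x + restored.y)
\end{lstlisting}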
\subsection{An Example}
Our example is a \code{Slate}, which remembers what its values have been and when those values were written to it. However, this particular slate goes blank each time it is pickled: the current value will not be saved.
\lstinputlisting{listings/slate.py}
\section{Conclusion}
The goal of this guide is to bring something to anyone that reads it, regardless of their experience with Python or object-oriented programming. If you're just getting started with Python, you've gained valuable knowledge of the basics of writing feature-rich, elegant, and easy-to-use classes. If you're an intermediate Python programmer, you've probably picked up some slick new concepts and strategies and some good ways to reduce the amount of code written by you and clients. If you're an expert Pythonista, you've been refreshed on some of the stuff you might have forgotten about and maybe picked up a few new tricks along the way. Whatever your experience level, I hope that this trip through Python's special methods has been truly magical (I couldn't resist the final pun).
% Begin the appendix
\newpage
\section{Appendix 1: How to Call Magic Methods}
Some of the magic methods in Python directly map to built-in functions; in this case, how to invoke them is fairly obvious. However, in other cases, the invocation is far less obvious. This appendix is devoted to exposing non-obvious syntax that leads to magic methods getting called.
\begin{center}
\begin{tabular}{| p{5cm} | p{5cm} | p{5cm} |}
\hline
\emph{Magic Method} & \emph{When it gets invoked} & \emph{Explanation}\\
\hline
\code{__new__(cls [,...])} & \code{instance = MyClass(arg1, arg2)} & \code{__new__} is called on instance creation\\
\hline
\code{__init__(self [,...])} & \code{instance = MyClass(arg1, arg2)} & \code{__init__} is called on instance creation\\
\hline
\code{__cmp__(self, other)} & \code{self == other}, \code{self > other}, etc. & Called for any comparison\\
\hline
\code{__pos__(self)} & \code{+self} & Unary plus sign\\
\hline
\code{__neg__(self)} & \code{-self} & Unary minus sign\\
\hline
\code{__invert__(self)} & \code{~self} & Bitwise inversion\\
\hline
\code{__index__(self)} & \code{x[self]} & Conversion when object is used as index\\
\hline
\code{__nonzero__(self)} & \code{bool(self)} & Boolean value of the object\\
\hline
\code{__getattr__(self, name)} & \code{self.name \# name doesn't exist} & Accessing nonexistent attribute\\
\hline
\code{__setattr__(self, name, val)} & \code{self.name = val} & Assigning to an attribute\\
\hline
\code{__delattr__(self, name)} & \code{del self.name} & Deleting an attribute\\
\hline
\code{__getattribute__(self, name)} & \code{self.name} & Accessing any attribute\\
\hline
\code{__getitem__(self, key)} & \code{self[key]} & Accessing an item using an index\\
\hline
\code{__setitem__(self, key, val)} & \code{self[key] = val} & Assigning to an item using an index\\
\hline
\code{__delitem__(self, key)} & \code{del self[key]} & Deleting an item using an index\\
\hline
\code{__iter__(self)} & \code{for x in self} & Iteration\\
\hline
\code{__contains__(self, value)} & \code{value in self}, \code{value not in self} & Membership tests using \code{in}\\
\hline
\code{__call__(self [,...])} & \code{self(args)} & ``Calling'' an instance\\
\hline
\code{__enter__(self)} & \code{with self as x:} & \code{with} statement context managers\\
\hline
\code{__exit__(self, exc, val, trace)} & \code{with self as x:} & \code{with} statement context managers\\
\hline
\code{__getstate__(self)} & \code{pickle.dump(self, pkl_file)} & Pickling\\
\hline
\code{__setstate__(self, state)} & \code{data = pickle.load(pkl_file)} & Pickling\\
\hline
\end{tabular}
\end{center}
\section{Appendix 2: Changes in Python 3}
Here, we document a few major places where Python 3 differs from 2.x in terms of its object model:
\begin{itemize}
\item Since the distinction between string and unicode has been done away with in Python 3, \code{__unicode__} is gone. \code{__bytes__} (which behaves similarly to \code{__str__} and \code{__unicode__} in 2.7) is called by the new \code{bytes()} built-in to construct a byte string.
\item Since division defaults to true division in Python 3, \code{__div__} is gone in Python 3; the \code{/} operator calls \code{__truediv__} instead.
\item \code{__coerce__} is gone due to redundancy with other magic methods and confusing behavior
\item \code{__cmp__} is gone due to redundancy with other magic methods
\item \code{__nonzero__} has been renamed to \code{__bool__}
\end{itemize}
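If you need a class to behave the same way under both versions, one possible approach (a sketch, not the only option) is to define the Python 3 names and alias the Python 2 names to them. The class \code{Money} below is hypothetical; \code{functools.total\_ordering} stands in for \code{__cmp__} by filling in the remaining ordering methods from \code{__eq__} and \code{__lt__}.
\begin{lstlisting}
import functools


@functools.total_ordering
class Money(object):
    """Hypothetical class written to work on both Python 2.7 and 3."""

    def __init__(self, amount):
        self.amount = amount

    def __bool__(self):            # Python 3 name
        return self.amount != 0
    __nonzero__ = __bool__         # Python 2 name

    def __truediv__(self, divisor):    # "/" on Python 3
        return Money(self.amount / float(divisor))
    __div__ = __truediv__              # "/" on Python 2

    # Rich comparisons replace __cmp__; total_ordering derives the rest
    def __eq__(self, other):
        return self.amount == other.amount

    def __lt__(self, other):
        return self.amount < other.amount


print(bool(Money(0)))
print((Money(10) / 4).amount)
print(Money(2) >= Money(1))
\end{lstlisting}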
\end{document}