Latest update: November 28, 2008
As usual, both I and O'Reilly's production staff have worked very hard to make this book as typo-free and accurate as possible. And as usual, a set of errata have managed to find their way into the book anyhow. Although this is perhaps the most typo-free book I have written to date, errata are an unavoidable fact of life in this field.
To put a more personal spin on that, after writing 9 computer books over the last 12 years, I believe I can state with some authority that book publishing is no place for a perfectionist to be. As you'll notice in the lists below, even if an author doesn't break things, the editing and production processes almost certainly will. Roughly one third of the errata listed here were introduced by editing and production, steps that are supposed to fix problems. Moreover, changes in Python itself can invalidate some material over time. I maintain this page both to address such factors, and to serve as a supplement to the book.
Below is the current and official list of corrections and notes for this edition of the book. Only the first of these sections is true errata; the second contains supplemental information for readers about both this book and changes in Python itself. Sections on this page:
Also see O'Reilly's errata page for this book, which may or may not intersect with my list here over time. Their list contains items emailed to O'Reilly, not to me. Please note: I check their page only occasionally, and some of its items may not be legitimate errata. In fact, its unconfirmed list often includes incorrect suggestions from readers that would break valid code. Be sure to look for errata here, or in O'Reilly's "confirmed" list only.
Finally, note that O'Reilly has been fixing some errata in later printings of this edition. In the corrections list below, the descriptions of items that have been patched begin with the date of the printing in which they were fixed, in parentheses. For example, "(fixed 7/08)" means the item was fixed in the 7/08 reprint. At this writing, the next reprint is scheduled for 11/08, and should fix most of the remaining items in the corrections list here. The date of a book reprint is listed on the inside of the second page.
If you find something else that looks like a possible error, please either contact O'Reilly, or contact me. We'd like to patch these in future printings or editions, and very much welcome the wide scrutiny this book has been fortunate enough to receive.
This section lists genuine corrections for the book -- miswordings, code bugs, typos, and so on. Most of these are minor typos. As of November 2008, there are 50 errata, which comes out to about 1 per 15 pages on average (at 746 total pages). Of these, 16 errata, roughly one third, were introduced by the production/editing process after I submitted the final draft. This list is ordered by page number. Items here:
(fixed 7/08) In the Python 3.0 changes section, this bullet item describes the demise of the `X` backquotes expression, but seems to show this using straight quotes as 'X'. This was changed this way somewhere in the production pipeline, but is a fairly minor issue, given the surrounding text. Backquotes are formatted properly later, on pages 96 and 136.
(fixed 7/08) Also in the Python 3.0 section, a comma in program code was deleted, also somewhere in the production process. The "except name value" should read as "except name, value" as described later in the book (see pages 583, 594, 612, and 614).
(fixed 7/08) The "Python mplements" should be "Python implements", of course. Alas, this was spelled correctly in the final draft I sent to O'Reilly, but was broken somewhere in the production pipeline. It originally had 2 spaces before the word, which accounts for the edit. (I make mistakes too, but I've come to believe that the number of typos in a book is directly proportional to the number of hands that touch it before it is published.)
There is a bogus "km" at the end of the output line on this page that looks like this: "1267650600228229401496703205376km". The "km" shouldn't be there, and I have no idea where it came from; it was not present in the final draft I sent to the publisher, prior to production edits. The correct output is more or less implied by the immediately following code listing which shows the same result without the "km", but this might confuse.
(fixed 7/08) The "...error text omittted..." should be: "...error text omitted...". This is in a harmless comment added for the book, not Python code or output.
The "...doesn't completely help, either because the hardware..." should be "...doesn't completely help either, because the hardware...". This is unfortunately another error introduced by the editing process; in this case, the sentence was changed from valid to grammatically incorrect. The original I submitted was worded this way:
Printing the result to produce the user-friendly display format doesn’t completely help either: the hardware related to floating-point math is inherently limited in accuracy
The second code listing on the page should have an "import decimal" line added at the beginning -- the whole module has not been imported anywhere above. It should read:
>>> import decimal
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.1428571428571428571428571429")
The footnote at the bottom of this page should be deleted altogether. It reflects an arguably horrible change made by an editor. Apparently, the editor reworded the first paragraph of this section slightly to change the segue to the next paragraph, but cut and pasted the original wording, in its entirety, into the footnote. Hence the redundancy!
(fixed 7/08) There is a space missing between "of" and "small" in the line: "...mutable sequence ofsmall integers...". This note was text I asked to have inserted late in the production process (during QC1) to address a Python 3.0 change. The space was present in the text I submitted, but in defense of production, this was a last-minute addition.
Minor typo: in the "%G" row, "f" at the end should be "F" (we lost the uppercase in Word somewhere along the way).
On page 161, near the end of Table 8-2, an entry lists D4 = dict.fromvalues(['a', 'b']) as an alternative construction technique. There is no fromvalues dictionary method. This should be dict.fromkeys(['a', 'b']), as shown later on page 170, and as can be surmised by running a dir(dict) call interactively.
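As a minimal sketch of this correction (the name D4 follows the table entry; the assertions are illustrative additions, not text from the book), the method's presence and default behavior can be checked like this:

```python
# dict.fromkeys builds a dictionary from a sequence of keys;
# with no second argument, every value defaults to None
D4 = dict.fromkeys(['a', 'b'])
assert D4 == {'a': None, 'b': None}

# dir(dict) lists fromkeys, but nothing called fromvalues
assert 'fromkeys' in dir(dict)
assert 'fromvalues' not in dir(dict)
```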
A "print" must be added at the start of this line in order to see the None return values shown (None displays as nothing, unless it's run through an explicit print). The current:
>>> d2.get('toast') # A key that is missing
None
should be:
>>> print d2.get('toast') # A key that is missing
None # Must print to see None
In the text: "or a series of assignments like D = [], D['a'] = 0, D['b'] = 0", the first assignment "D = []" should be "D = {}" to initialize a dictionary, not a list.
Also, in the last sentence of this paragraph, "because all the keys are the same", should be "because all the values are the same".
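To illustrate both fixes, here is a small sketch (the names E and L and the failure check are assumptions added for illustration). It shows the corrected series of assignments, the one-call alternative that works because all the values are the same, and what happens if the name starts out as a list:

```python
# a series of assignments: D must start as a dictionary, not a list
D = {}
D['a'] = 0
D['b'] = 0

# the same result in one call, possible because all the values are the same
E = dict.fromkeys(['a', 'b'], 0)
assert D == E == {'a': 0, 'b': 0}

# starting from a list instead fails on the first keyed assignment
L = []
try:
    L['a'] = 0
except TypeError:
    failed = True
else:
    failed = False
assert failed
```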
(fixed 7/08) In the interaction listing here, "# This works: can chage mutables inside" should be "# This works: can change mutables inside". This is in a harmless comment inserted for the book, not Python code or output.
The "outout.flush()" here should be "output.flush()", for consistency with the rest of this table. Note that this table is abstract, not actual code, so this is a minor typo.
At the end of the fourth paragraph on this page, "intended the loop four spaces" should be "indented the loop four spaces". This was also correct in the final draft of the book I submitted, but was broken by the production process because of a radical rewording on the part of editors aimed at fixing an unrelated typo.
The original wording in the final draft I submitted was: "Ignoring that, what one would often see in C++ code that the loop began being indented 4 spaces by the first person who worked on it:". This was edited into: "Ignoring that, here's the scenario I often encountered in C++ code. The first person who worked on the code intended the loop four spaces:". This rewording fixes the missing "is" in the original and is perhaps easier to parse, but introduces a typo of its own by missing the technical importance of indentation in this discussion.
Minor typo: "if" should be "of" here (especially in a section about if statements).
Minor typo: "the" should be added here.
Minor typo: "third" should be "fourth" here (this refers to an item in Table 11-1).
Minor typo: at the middle of the clause that starts "(recall that variables...", the "initial counters" should be "initialize counters".
(fixed 7/08) In the very first sentence on this page, the "rwords" should be "words". This was correct in the final draft of the book I submitted, but was broken by the production process as well (all the more frustrating, given that that process is supposed to fix spelling errors, not introduce them after the material leaves the author's hands).
(fixed 7/08) In the text here, "sys.sytdout" should be "sys.stdout". This is a minor typo, and is implied by the correct spelling in the immediately preceding code listing.
Minor typo: the "in the context an iteration tool" here should be "in the context of an iteration tool".
Minor typo: the user inputs in the 2nd last line here should be bold.
This appears to be version skew; Python 2.4's "open" docstring is what is shown in the book. It was changed in Python 2.5, where we do need to use "file" to get the output shown. Python 2.6's "open" output is similar to 2.5's, but has an extra sentence. In any case, use "file.__doc__" if you really want to match the output shown in the text in 2.5 and later.
(fixed 7/08) Sentence 2 of paragraph 4 on this page incorrectly states that the * operator, and hence the times function "will work on numbers (performing multiplication), two strings or a string and a number (performing repetition), or any other combination of objects supporting the expected interface". This is incorrect -- * works on two numbers, or a string and a number, but not on two strings. That is, string * string is not a valid operation. In fact, if you pass two strings into the times function, it generates an exception, the point of the next paragraph in the book.
This was also something I DID NOT SAY in the final draft I submitted to the publisher. My version mentions the numbers and string/number cases shown in the code listings, but not the obviously incorrect string/string case -- this third case was added by production. Specifically, the editing/production process broke the meaning here by adding the part "two strings or". This was despite the fact that the editors were not Python programmers, and did not ask about the accuracy of the insertion. Unfortunately, a minor insertion of text like this can have a major impact on meaning in a technical book, and editors are sometimes not as careful as they could be.
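The three cases can be checked directly. This is only a sketch: the body of times and the argument values are assumed here for illustration, since the listing itself is not reproduced on this page.

```python
def times(x, y):
    return x * y          # the meaning of * depends on the operand types

assert times(2, 4) == 8                   # number * number: multiplication
assert times('Ni', 4) == 'NiNiNiNi'       # string * number: repetition

# string * string is not a valid operation, and raises an exception
try:
    times('Ni', 'Ni')
except TypeError:
    outcome = 'TypeError'
else:
    outcome = 'no error'
assert outcome == 'TypeError'
```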
In the first code listing on this page, the 4th line is indented incorrectly. The "return action" line should be moved 4 spaces to the right, to be indented the same as the line "def action(X):". That is, the "action" function is returned by the "maker" function. As is, the code's indentation generates a syntax error. This will probably be easy for most readers to spot, especially given the description, and the other similar examples in this area of the book. It appears to have been the product of an unfortunate cut-and-paste from the IDLE GUI during the writing phase.
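For reference, the corrected indentation has the following shape. The names maker and action come from the listing; the argument N and the body of action are assumptions made here so that the sketch runs on its own.

```python
def maker(N):
    def action(X):
        return X ** N      # assumed body: uses N from the enclosing scope
    return action          # indented the same as "def action(X):"

f = maker(2)               # maker returns the nested function
assert f(3) == 9
```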
In the last code listing on this page, the 5th line is indented incorrectly. The "return acts" line should be moved 4 spaces to the right, to be indented the same as the line "for i in range(5):", the same as it is in the first code listing on this page. That is, the "acts" list is returned by the "makeActions" function. As is, the code's indentation generates a syntax error. This will probably be easy for most readers to spot, especially given the description, and the other similar example at the top of this page. Just like the prior errata, it appears to have been the product of an unfortunate cut-and-paste from the IDLE GUI during the writing phase.
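The corrected layout looks roughly like this. The names makeActions, acts, and the range(5) loop come from the listing; the lambda appended in the loop is an assumed body for illustration.

```python
def makeActions():
    acts = []
    for i in range(5):
        acts.append(lambda x, i=i: i ** x)   # assumed body: default saves current i
    return acts            # indented the same as the "for" line, so it runs after the loop

acts = makeActions()
assert len(acts) == 5
assert acts[2](2) == 4     # i was 2 when this function was made: 2 ** 2
```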
There is a typo in this sentence that inverts its meaning: the text "our programs with making multiple copies" should be "our programs without making multiple copies". This is implied by other discussion nearby, but might be confusing.
Minor typo: in this sentence, the ", it also possible" should be ", it is also possible".
This was a change made by editors during production on this book. The original I submitted didn't contain the first clause of this sentence, and began "Is it also possible", which was also wrong (this isn't a question), but in trying to repair the mistake, the editors introduced a brand new one of their own (!).
Minor typo: in this sentence, the "defines is own X" should be "defines its own X".
This was a change made by editors during production on this book. The original I submitted was very different, but grammatically correct. In fact, both this and the prior sentences were shorter in my version:
mod2.py imports the first and uses qualification to access the imported module’s attribute: ... And mod1.py imports the second, and fetches attributes in both the first and second files: ...
The good news is that the editors expanded each of the sentences around the code snippets here to make them more descriptive. The bad news is that they introduced this errata in the process.
There is a typo in this sentence that inverts its meaning: the text "with stopping and restarting" should be "without stopping and restarting". This is implied by other discussion on this page, but might be confusing.
Minor typo: this should read "having to go through the module".
At the end of the second line in this paragraph, "2.4" should be "2.5". The absolute/relative imports model is only partially implemented in 2.5, the version this book covers. The import syntax itself is enabled in 2.5, but the new absolute-by-default search order is not until 2.7, unless a "from __future__" is used.
(fixed 7/08) In 2 spots of this section, I transposed the I1/I2 and C2/C3 variable names unintentionally. The typo should be apparent from the surrounding text, but it could also be a bit confusing given the introductory nature of this section. Specifically: in this section's very first sentence, "C2.w" should be "C3.w"; in the last sentence of this section's second paragraph, "C3.w(I1)" should be "C3.w(I2).". The more realistic "bob.giveRaise" example at the end of this section clarifies the intent here. Also see pages 467 and 484 for more complete descriptions of method call mapping.
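The mapping being described can be sketched as follows. The class and instance names follow the corrected text, but the tree layout, the body of w, and the name attribute are assumptions for illustration only.

```python
class C2:
    pass

class C3:
    def w(self):
        return self.name       # self is the instance the call was made through

class C1(C2, C3):
    pass

I2 = C1()
I2.name = 'I2'

# a call through the instance is mapped to a class method call,
# with the instance passed in as the first argument
assert I2.w() == C3.w(I2) == 'I2'
```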
Minor typo: this should read "which would allow you to store...".
In the answer to quiz question #2 on this page, the text states that the inheritance search for an attribute looks "first in the instance object, then in the class the instance was created from, then in all higher superclasses, progressing from the top to the bottom of the object tree, and from left to right (by default)". The "top to the bottom" part of this is incorrect; the search actually proceeds from "bottom to top" in the tree, as implied by the earlier parts of this sentence.
This was also something I DID NOT SAY in the final draft I submitted to the publisher. In my version, the last part of this sentence correctly says the search is "depth-first and left-to-right (by default)", instead of top-to-bottom, which is clearly wrong. In fact, the correct bottom-to-top order through the search tree is stated explicitly numerous times in this chapter. This was broken by an editor who expanded the wording here, which is especially unfortunate in an answer to a chapter quiz question!
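A short sketch makes the search order concrete. The class names and attribute values here are hypothetical, chosen only to show which object wins at each step.

```python
class C2:
    x = 'C2.x'
    z = 'C2.z'

class C3:
    w = 'C3.w'
    z = 'C3.z'

class C1(C2, C3):              # C2 is listed first, so it is searched before C3
    x = 'C1.x'

I1 = C1()
I1.name = 'I1.name'

assert I1.name == 'I1.name'    # found in the instance itself
assert I1.x == 'C1.x'          # lowest class wins: bottom to top
assert I1.z == 'C2.z'          # leftmost superclass wins: left to right
assert I1.w == 'C3.w'          # found only after C1 and C2 are searched
```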
The comment here gives the wrong class name as is. It should read: "# self.data differs, MixedNames.data is the same". This is in a comment, of course, so it's not quite a code bug. The point being illustrated is that the instance and class attributes differ, because they are stored on different objects.
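The point of the corrected comment can be sketched like this. Only the names MixedNames and data come from the text above; the method bodies and values are assumptions for illustration.

```python
class MixedNames:
    data = 'spam'                          # class attribute, shared by all instances
    def __init__(self, value):
        self.data = value                  # instance attribute, one per instance
    def display(self):
        return (self.data, MixedNames.data)

x = MixedNames(1)
y = MixedNames(2)
assert x.display() == (1, 'spam')          # self.data differs,
assert y.display() == (2, 'spam')          # MixedNames.data is the same
```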
The "top to bottom" here must be "bottom to top", as in the prior errata.
Alas, this also reflects an inaccurate embellishment added by copy editors. My original final draft version of this reads as "(...) Python searches the namespace tree at and above object, for the first attr it can find.". I don't mind adding "bottom to top" as is implied by the figure caption at the bottom of this page, but the added "top to bottom" is incorrect. In fact, this is the second place where this incorrect description was added; it also happens in the answer to quiz #2 on page 463, as noted in the prior errata on this list. I don't think the editor quite appreciated what top-to-bottom really means, or how specific it is in this context.
In the middle of the first interaction listing on this page, the "4", which is the correct output of the line ">>> X[2]", appears at the very end of this line, rather than on a line all by itself as it should. As is, the "4" output appears at the end of the prior line's comment, incorrectly. This should be apparent given the interaction that follows, but might confuse.
This is apparently a typo we inherited from the 2nd Edition, and was introduced by the production cycle during the development of that edition. On the 2nd Edition, this was correct in the version submitted to the publisher, but was broken by the time that edition was published. The typo was present in the material I was given to work on for the 3rd Edition, and was never spotted.
The output of the 1st line displays as "True" today in Python 2.5, not as "1" (this is legacy from a prior edition's Python). It should be:
>>> 'p' in X
True
The user input "str(X), repr(X)" should be in bold font here, as they are elsewhere on this page.
(fixed 7/08) There is a case typo in the commented-out line here. It should read "#print C.m.X", with the uppercase "C". This is in a comment, so it's not a bug, and the code would fail if run even with the case fix. It might be confusing to some human readers, though, given that Python is case-sensitive.
Minor typo: change the "," to "." here.
Bad table reference: this should say "Table 27-1". This was a last-minute insert added for a 3.0 change, which I have no original record of (and no clue as to where this went bad).
The "Mixing from..." here should be "Mixing finally..." (as is, this confuses exceptions with imports).
This entire 4-line statement block here, from "try:" through "print 'caught'" should be in bold font, because it is both code and user input.
The last line in the code sample has a typo. Instead of: "print 'Got', sys.exc_info(){0], ...", it should say: "print 'Got', sys.exc_info()[0], ...".
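As a sketch of why square brackets are required here: sys.exc_info() returns a (type, value, traceback) tuple, which is indexed like any other tuple. The helper name catch and the exception raised are assumptions for illustration.

```python
import sys

def catch():
    try:
        raise IndexError('spam')
    except:
        # index the returned tuple with [0] and [1]
        return sys.exc_info()[0], sys.exc_info()[1]

exc_type, exc_value = catch()
print('Got %s %s' % (exc_type, exc_value))
assert exc_type is IndexError
```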
This is not an error or code bug, but the indentation in the first code section at the top of this page looks odd. It was munged by production, when the code was shifted to the right; it became too long to avoid line breaks, even though the original was within line-length limits. The code will run as is, but it looks choppy.
Specifically, 2 lines should be adjusted to make this easier to read. The "relief=RAISED)" line should be moved 10 spaces to the right, to line up vertically with "font" on the prior line. The "fill=X)" line should also be moved 7 spaces to the right, to line up with "win" on the prior line. There are more readable ways to code this, but these changes will help fix the choppiness.
At the top of this page, in the bold font comment, "client-side script" should say "server-side script". A minor typo, because this is a comment about an example that is not described at all, but included just as suggested reading by the final exercise.
In brief, CGI scripts run on a web server, to process form inputs and produce an HTML reply page, normally. They are the most basic way to add interaction to a web site, and larger web frameworks build upon their basic model. Client-side scripts are very different; they typically run in the context of a web browser using protocols like AJAX or applets to add interaction. The term client-side, though, can also refer to any non-web scripts that talk to servers (e.g., email and ftp scripts). To learn more about CGI server-side scripts, and network based scripting in general, see the book Programming Python or other resources.
This section collects notes about the text, as well as changes in Python, designed to augment or clarify the published material. These are not errata, and require no patches; they are just supplements for readers of this book. This list is ordered by date of addition, not page number. Items here:
This page in the book describes the upcoming set literal and set comprehension syntax to be added in Python 3.0: the new literal syntax {1, 3, 2} is equivalent to the current set([1, 3, 2]), and the new set comprehension syntax {f(x) for x in S if P(x)} is like the current generator expression set(f(x) for x in S if P(x)).
In addition, although not mentioned in the text, Python 3.0 will now also have a dictionary comprehension syntax: {key:val for (key, val) in zip(keys, vals)} works like the current dict(zip(keys, vals)), and {x:x for x in items} is like the current dict((x, x) for x in items). Here's a summary of all the comprehension alternatives in 3.0 (the last 2 are new):
>>> [x*x for x in range(10)] # list comprehension: builds list ([x, y] is a list)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> (x*x for x in range(10)) # generator expression: produces items (parens often optional)
<generator object at 0x009E7328>
>>> {x*x for x in range(10)} # set comprehension, new in 3.0 ({x, y} is a set)
{0, 1, 4, 81, 64, 9, 16, 49, 25, 36}
>>> {x:x*x for x in range(10)} # dictionary comprehension, new in 3.0 ({x:x, y:y} is a dict)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
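The equivalences described above can be checked with a few assertions, given an interpreter that accepts the new syntax (the new forms are syntax errors in 2.5). The sample data and the helper functions f and P are assumptions for illustration.

```python
keys, vals = ['a', 'b', 'c'], [1, 2, 3]
items = [1, 2, 3]
S = range(10)
f = lambda x: x * x            # hypothetical mapping function
P = lambda x: x % 2 == 0       # hypothetical filter test

# set literal and set comprehension versus the set() call forms
assert {1, 3, 2} == set([1, 3, 2])
assert {f(x) for x in S if P(x)} == set(f(x) for x in S if P(x))

# dictionary comprehensions versus the dict() call forms
assert {key: val for (key, val) in zip(keys, vals)} == dict(zip(keys, vals))
assert {x: x for x in items} == dict((x, x) for x in items)
```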
The 5th bullet up from the end of page xxxviii should probably mention that "print" might no longer be a reserved word in 3.0, because the new print built-in function is replacing the current statement. See the next item for more on the new print function; 3.0 changes are still somewhat uncertain.
The full list of anticipated 3.0 reserved word changes:
Note: in Python 2.6, "with" and "as" have already become reserved words, because the context manager statement has been officially enabled. This is suggested in the table on page 226. See pages 596-600 for a full discussion of this feature.
On pages xxxviii and 234, I mention that the print statement is to become a function call in 3.0, to support more features. To help describe how it will work, the following is a function that emulates much of the working of the 3.0 print function. Also note that "print" might no longer be a reserved word in 3.0; it is today, though, which is why we can't call this function "print":
"emulate most of the 3.0 print function"
import sys
def print30(*args, **kargs):
    sep  = kargs.get('sep', ' ')                    # keyword arg defaults
    end  = kargs.get('end', '\n')
    file = kargs.get('file', sys.stdout)
    output = sep.join([str(arg) for arg in args])   # sep goes between items only
    file.write(output + end)
if __name__ == '__main__':
    print30(1, 2, 3)                                # "1 2 3\n"
    print30(1, 2, 3, sep='')                        # "123\n"
    print30(4, 5, 6, sep='', end='')                # "456"
    print30()                                       # "\n"
    print30(1, 2, 3, sep='??', file=sys.stderr)     # "1??2??3\n"
On this page, the iteration alternatives timing script uses time.time to compute elapsed time. This works fine for this example, but on Windows, using time.clock instead may give better timer precision than time.time (time.clock is microsecond granularity). On Linux, however, time.time is the preferred alternative. See the library manuals for more details, or use the suggested timeit module to finesse such details altogether.
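One way to sidestep the platform choice, sketched here under the assumption that a simple elapsed-time helper is all that is needed, is to use timeit.default_timer, which the timeit module sets to a suitable timer for the platform. The helper name timer, the repetition count, and the function timed are illustrative, not the book's script.

```python
import timeit

def timer(func, *args):
    # default_timer is chosen by the timeit module for the current platform
    start = timeit.default_timer()
    for i in range(1000):
        func(*args)
    return timeit.default_timer() - start

elapsed = timer(pow, 2, 100)
print('%.6f seconds' % elapsed)
```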
On page 139, there is a loop that uses the ord(char) function to convert a binary digit string to integer. It works as shown and serves to demo ord, but note that the same effect can be had in Python 2.5 today by calling the built-in function int(binstr, 2), giving an explicit base of 2. See page 105 for other examples of using a base with this built-in function.
That is the only built-in support for binary representation in Python 2.5, though. Perhaps a more interesting exercise is to also convert the other way, from integer to binary digit string. As mentioned at the top of page xxxviii, Python 3.0 will support this with a new built-in function bin(int), and will also have a new 0b1010 binary literal syntax for integers (octal literals also become 0oNNN, not 0NNN, and hex remains 0xNNN). However, to-binary conversion can be emulated in code today, by using the bitwise operations described on page 103 to extract bits.
The following module, binary.py, shows one way to code this (albeit with possible platform size and endianness dependencies that could be tailored by checking sys.byteorder and sys.maxint):
def fromBinary(B):
    """
    convert binary string B to its decimal integer value;
    this can also be done today by the built-in: int(B, 2);
    caveat: this doesn't do any error checking in the string
    """
    I = 0
    while B:
        I = I * 2 + (ord(B[0]) - ord('0'))     # I * 2 is the same as I << 1
        B = B[1:]
    return I
def toBinary(I):
    """
    convert 32-bit integer I to a binary digits string;
    there is no built-in for this today, but 3.0 will have
    a new bin(I) call; 3.0 also supports binary literals:
    0b1010 will equal 10, and bin(10) will return "0b1010";
    caveat: this depends on integer size and bit endian-ness
    """
    B = ''
    while I:
        low = I & 0x00000001                   # extract low bit
        I = (I >> 1) & 0x7FFFFFFF              # shift off low bit, 0 high
        B = chr(ord('0') + low) + B            # or: '1' if low else '0'
    return B
if __name__ == '__main__':
    # self-test code
    for binstr in ('1101', '01111111', '10000000', '10000001', '0', '111'):
        print fromBinary(binstr)
    for intobj in (13, 127, 128, 129, 1, -1, -13):
        print toBinary(intobj)
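For comparison, the built-in tools mentioned earlier behave as follows. This sketch assumes an interpreter that already has bin and the new literal forms; only the int call with a base works in 2.5. Note that bin shows a sign for negative numbers, rather than the 32-bit pattern that toBinary produces.

```python
# from binary digit string to integer: available today
assert int('1101', 2) == 13
assert int('01111111', 2) == 127

# from integer to binary digit string: the new built-in
assert bin(13) == '0b1101'
assert bin(-13) == '-0b1101'       # signed, not a fixed-width bit pattern

# the new literal forms
assert 0b1010 == 10
assert 0o17 == 15
assert 0x1F == 31
```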
A reader filed an errata report with O'Reilly as serious, stating that the "D['quantity'] += 1" on the 3rd line from the bottom of this page should be "D['quantity'] + 1". This is not correct -- it must be "+=" here, not "+". The suggested change would break this example. Admittedly, the "+=" statement has not been explained yet at this point in the book, but as stated clearly on page 68, this chapter is a preview that does not explain most of its content in any sort of depth.
Please keep in mind that Chapter 4 is a preview intended to whet readers' appetites for later details, and deliberately avoids explaining much of its content. In this specific case, the "+=" means: add to the item in place, which is why it is different when printed immediately after this line. The statement shown is essentially shorthand, and equivalent to the longer: "D['quantity'] = D['quantity'] + 1", as covered in detail later in the book on pages 223-225.
To clarify, we might add a sentence at the very end of this page which reads: "Python's X += Y in-place addition statement used here is shorthand for X = X + Y." This seems spurious, though; if we tried to explain every unexplained item in this chapter, we'd wind up repeating the rest of the book!
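The difference between the two forms is easy to check. The starting dictionary below is an assumption for illustration; only the 'quantity' key comes from the text above.

```python
D = {'food': 'Spam', 'quantity': 4}

D['quantity'] += 1                     # in-place addition: changes the item
assert D['quantity'] == 5

D['quantity'] = D['quantity'] + 1      # the equivalent longer form
assert D['quantity'] == 6

D['quantity'] + 1                      # the suggested "+": result is discarded
assert D['quantity'] == 6              # so the item is unchanged
```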
Another reader wrote to ask about this. The choice is a bit ambiguous and subjective. Really, sets are both numeric in nature, and collections, so they could arguably be placed in either category. In this figure's tree, an item can't be in two categories at once, so the number choice is as good as the other. In fact, sets are covered in the Numbers chapter of this book for this reason, not in the collections parts. Sets will become more collection-like in 3.0 (with comprehensions and such), but they still have a dual-mode nature. To most people, though, set intersection, union, and difference, have a strong mathematical basis.
A reader wrote to suggest that the first line of the example at the end of this page should read "def knights(name)", and the fourth line should read "return action(name)". This is incorrect, but underscores the subtleties of nested function scopes in general. Here is the example code, and a few additional words of explanation.
>>> def knights1():
        title = 'Sir'
        action = (lambda x: title + ' ' + x)
        return action
>>> act = knights1()
>>> act('robin')
'Sir robin'
This example is similar to that on page 324. Here, the point is to use enclosing scope references to remember the current value of "title", for use when the function assigned to "action" is later called. "action" is not called by "knights", but is created and returned by it. Notice that "knights" is called with no arguments at the top of the next page; the function it creates and returns is assigned to "act". When "act" is finally called, string "robin" matches the lambda's "x" argument, but the value of "title" was remembered by the function object created during the "knights" call. Hence, "Sir" is tacked onto the front of the string returned by the lambda function.
If we make the changes suggested by the reader, we would need to pass an argument to "knights", and the "Sir robin" string would be passed back from the "knights" call, not the "act" call. And that's the larger point of the example: enclosing scope references are retained by nested functions, even after the call to the enclosing function has returned. See page 322 for a similar, and more deeply explained, example.
Confusing, perhaps, but that's what enclosing scope references are mostly for -- state retention from enclosing scopes. Lambdas in general are largely used for deferring execution of code, and for retaining state to be used in a later call. If the lambda in this example makes it more confusing than it need be, you can always achieve the same behavior with a nested def instead:
>>> def knights2():
        title = 'Miss'
        def action(x):
            return title + ' ' + x
        return action
>>> act = knights2()
>>> act('demeanor')
'Miss demeanor'
In the "Why You Will Care: File Scanners" box, second to last code example, xreadlines() is presented very briefly as an option. One reader wrote to point out to me that, according to the Python Library Reference, xreadlines() is "...Deprecated since release 2.3. Use "for line in file" instead", stating that it shouldn't be mentioned at all.
I know about the deprecation, of course, but I disagree with the argument. It doesn't matter that the library manual labels this deprecated; it is still used in much 2.X code that people have to use and maintain today. As of 2.6, in fact, xreadlines() is still available, and does not issue a deprecation warning. It is still part of standard Python.
Not to pick on this particular reader, but I get quite a few comments about omitting things like this, and they seem, frankly, a bit controlling. My job as author is to teach what people need, not what I personally believe they should do. Best practice does matter, of course, but this is one of many cases where common practice is just as important. For example, the fact that the file iterator version is preferred today is stated clearly in the immediately following section of this chapter. It's one thing to suggest better alternatives, though, and quite another to try and blot out history altogether. This is especially true when people still need to deal with that history today.
xreadlines() will certainly go away in a future 3.0 edition of this book, but because it is still present in existing code, it merits at least a sentence fragment in the current one. Ditto for xrange(), the memory-efficient alternative to range() until 3.0. For now, the very brief mention they get is justified.
And if you still aren't buying this, it might help you to know that I still occasionally teach Python to people who, for various sad reasons, are still compelled to use Python 1.5.2, where xreadlines() was a Good Thing. Although shiny new releases are always more fun to focus on, these people are Python programmers too.
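For readers who have not seen the preferred form, here is a minimal sketch of the file iterator. The temporary file and its contents are assumptions for illustration. The loop is written so that it runs in any recent Python, with the older 2.X call shown only in a comment.

```python
import os, tempfile

# make a small scratch file to read back
fd, path = tempfile.mkstemp()
os.close(fd)
f = open(path, 'w')
f.write('spam\neggs\n')
f.close()

lines = []
f = open(path)
for line in f:                 # file iterator: reads one line at a time
    lines.append(line)
f.close()

# older 2.X code may instead say: for line in open(path).xreadlines():
os.remove(path)
assert lines == ['spam\n', 'eggs\n']
```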
A reader wrote to suggest that a "self.writer.close()" call should be inserted after the loop in the Processor.process method on page 525, in order to properly close the output stream file object, and make the interaction at the top of page 526 work. The examples on page 526 do work as shown in the book without this change, but this raises some important points about files.
First of all, there are some subtle issues in this code, which, as the reader found, make explicit close calls tricky. Adding the close call won't work as is for the HTMLize class (you would need to add a close method to that class that does nothing but pass), and probably isn't what you want when sys.stdout is the writer on page 525 (you won't be able to print anymore).
More importantly, close calls are not generally required -- Python file objects automatically close themselves when garbage collected, and flush their output buffers in the process. Hence, an output file should be automatically flushed and finalized after the last reference to the file object is lost. This always happens when you exit Python or a Python script, and should happen if you don't save a file by assigning it to a variable.
The only place where you might notice exceptions to this rule is when working interactively -- to support debugging, some shells like IDLE may hold onto file objects longer than expected, thus preventing garbage collection and auto-close. This doesn't happen in the interactive session shown in the book. If that is an issue for you when testing interactively, though, try assigning the output file to a variable, and run the file close method through that variable after the process call returns:
>>> temp = open('spamup.txt', 'w')
>>> prog = converters.Uppercase(open('spam.txt'), temp)
>>> prog.process()
>>> temp.close()
It's probably better to handle this in your interactive session this way, rather than in the class itself, since this is only an issue in certain interactive shells. It's not an issue for Python itself.
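For cases where you do want a guaranteed close outside the interactive prompt, the same idea can be sketched with a try/finally around the write. This is only an illustration: write_upper is a hypothetical helper, not part of the book's converters module, and the temporary directory is an assumption made so the example is self-contained.

```python
import os
import tempfile

def write_upper(text, path):
    # Hypothetical helper (not from the book): write text in uppercase
    out = open(path, 'w')
    try:
        out.write(text.upper())
    finally:
        out.close()              # flush buffers and release the file now
    return out

path = os.path.join(tempfile.mkdtemp(), 'spamup.txt')
out = write_upper('spam\n', path)

inp = open(path)                 # read back through a named file object
result = inp.read()
inp.close()                      # explicit close, as in the session above
```

Because the close is run in a finally clause, the output is finalized even if the write raises an exception, without relying on garbage collection.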
Someone wrote to say that the first code snippet in sidebar "Why You Will Care: Callbacks" did not work when they typed it on their machine. This code is not intended to be a complete working program; the "...use message..." in the other listing in this sidebar attempts to imply as much.
To make the snippet actually work, though, you also need to import Tkinter, pack the button to arrange it with the geometry manager, and kick off the GUI event loop (unless your IDE is already running one). Here's the complete version:
import sys
from Tkinter import Button, mainloop
x = Button(
        text='Press me',
        command=(lambda: sys.stdout.write('Spam\n')))
x.pack()
mainloop()
This still won't work if you're on a machine without Tk GUI support installed (it should be on Mac, Windows, and most Linux). If you're really interested in Tkinter GUIs, though, that is largely the realm of the larger book Programming Python; it's fun stuff, but there's more to it than Learning Python can or should get into.
The real point of the sidebar was that lambdas defer execution; without the lambda in this example, the code would write to stdout when the Button is being created, instead of when it is later pressed. Lambdas also serve to save state information for later use; the lambda in this example defers the write call, but also effectively "remembers" both the function to be called, and the text to be printed.
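The deferral effect can be seen without a GUI at all. The following is a minimal sketch, not code from the book: register is a hypothetical stand-in for Button(command=...), and a list named log stands in for sys.stdout so the effect can be inspected.

```python
log = []                                   # stands in for sys.stdout here

def register(command):
    # Hypothetical stand-in for Button(command=...): save callback for later
    return command

# Without lambda: append runs right now, and its result (None) is registered
eager = register(log.append('Spam'))
count_after_eager = len(log)               # already 1: the call was not deferred

# With lambda: nothing runs until the saved callback is invoked
deferred = register(lambda: log.append('Spam'))
count_after_register = len(log)            # still 1: the call is pending
deferred()                                 # simulate the button press
count_after_press = len(log)               # now 2
```

The lambda version also shows the state retention described above: the function object remembers both log.append and the 'Spam' string until it is finally called.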
A reader asked if I could provide the second part of the solution to #4 on page 379 -- the part that asks you to generalize adder for any number of keyword arguments. This isn't given in its entirety in the solutions appendix. It's straightforward to iterate over dictionary keys, but more difficult to get the values to sum or concatenate.
Complete and alternative solutions for the "**" keywords problem are given in the script below. This is actually a fairly difficult problem due to the nested indexing requirement, unless you "cheat" by converting the dictionary to a list of values and fall back on the positional version. Run this on your machine to see its output.
# expanded solutions to Part IV #4

def adder1(*args):                  # sum any number positional args
    tot = args[0]
    for arg in args[1:]:
        tot += arg
    return tot

def adder2(**args):                 # sum any number keyword args
    tot = args[args.keys()[0]]
    for key in args.keys()[1:]:
        tot += args[key]
    return tot

def adder3(**args):                 # same but convert to list of values
    args = args.values()
    tot = args[0]
    for arg in args[1:]:
        tot += arg
    return tot

def adder4(**args):                 # same, but reuse positional version
    return adder1(*args.values())

if __name__ == '__main__':
    print adder1(1, 2, 3), adder1('aa', 'bb', 'cc')
    print adder2(a=1, b=2, c=3), adder2(a='aa', b='bb', c='cc')
    print adder3(a=1, b=2, c=3), adder3(a='aa', b='bb', c='cc')
    print adder4(a=1, b=2, c=3), adder4(a='aa', b='bb', c='cc')
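Note that adder2 and adder3 index the results of keys() and values(), which works in 2.X because those methods return lists. In 3.0 they return view objects that cannot be indexed or sliced, so a version meant to run on both lines might wrap the result in list() first. The following is a minimal sketch under that assumption; adder5 is a hypothetical name, not one used in the book.

```python
def adder5(**args):                  # hypothetical name: runs on 2.X and 3.0
    values = list(args.values())     # force a real list so indexing works
    tot = values[0]
    for value in values[1:]:
        tot += value
    return tot

total = adder5(a=1, b=2, c=3)
```

Because dictionary ordering is arbitrary in 2.X, the order in which string values are concatenated by any of the keyword versions is not guaranteed; sums of numbers are unaffected.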
The book discusses function decorators, available in Python 2.5, a way to add automatically-invoked logic to a function or method call. Python 2.6 and later extend this concept to add class decorators, a way to augment or manage instances when they are created.
This book discusses function decorators on pages 556-558, as a way to wrap up specific function or method calls with extra logic that generically augments the call in some fashion. For example, the decorator may add logic that adds call tracing, performs argument validity testing during debugging, times calls made to a function, and so on. To get started, here's a function decorator example taken from the book:
class tracer:
    def __init__(self, func):
        self.calls = 0
        self.func = func
    def __call__(self, *args):
        self.calls += 1
        print 'call %s to %s' % (self.calls, self.func.__name__)
        self.func(*args)

@tracer
def spam(a, b, c):            # Wrap spam in a decorator object
    print a, b, c             # same as: spam = tracer(spam)

>>> spam(1, 2, 3)             # Really calls the tracer wrapper object
call 1 to spam
1 2 3
>>> spam('a', 'b', 'c')       # Invokes __call__ in class
call 2 to spam
a b c
In this example, the tracer class saves away the decorated function, and intercepts later calls to it, in order to add a layer of trace logic that counts and prints each call. For function calls, this can be more convenient than modifying each call to account for the extra logic level, and avoids accidentally calling the original function directly. A non-decorator equivalent, such as the following, can be used on any function and without the special "@" syntax, but requires special syntax when the function is called, and does not ensure that the extra layer will be invoked for normal calls.
calls = 0
def tracer(func, *args):
    global calls
    calls += 1
    print 'call %s to %s' % (calls, func.__name__)
    func(*args)

def spam(a, b, c):
    print a, b, c

>>> spam(1, 2, 3)             # normal non-traced call: accidental?
1 2 3
>>> tracer(spam, 1, 2, 3)     # special traced call without decorators
call 1 to spam
1 2 3
To sample the full flavor of what function decorators are capable of, here is another example that times calls made to a decorated function, both for one call, and the total time among all calls. The decorator is applied to two functions, in order to compare the time requirements of list comprehensions and the map built-in call (see pages 366-369 in the book for another non-decorator example that times iteration alternatives like these):
import time

class timer:
    def __init__(self, func):
        self.func = func
        self.time = 0
    def __call__(self, *args):
        start = time.clock()
        result = self.func(*args)
        elapsed = time.clock() - start
        self.time += elapsed
        print elapsed, self.time
        return result

@timer
def builder_listcomp(N):
    return [x * 2 for x in range(N)]

@timer
def builder_mapcall(N):
    return map((lambda x: x * 2), range(N))

>>> builder_listcomp(5)
2.24190504809e-05 2.24190504809e-05
[0, 2, 4, 6, 8]
>>> x = builder_listcomp(50000) # time for this call, all calls
0.0167198624406 0.0167422814911
>>> x = builder_listcomp(500000)
0.164169258942 0.180911540433
>>> builder_listcomp.time # total time for all calls
0.18091154043321467
>>> builder_mapcall(5)
3.21269881738e-05 3.21269881738e-05
[0, 2, 4, 6, 8]
>>> x = builder_mapcall(50000)
0.034654753607 0.0346868805951
>>> x = builder_mapcall(500000)
0.233493223301 0.268180103896
>>> builder_mapcall.time
0.26818010389587243
In this case, a non-decorator approach would allow the subject functions to be used with or without timing, but it would also complicate the call signature when timing is desired (we'd add code at the call instead of at the def), and there would be no direct way to guarantee that all builder calls in a program are routed through timer logic, short of finding and potentially changing them all.
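For comparison, a non-decorator timer along these lines might look like the following. This is a sketch rather than the book's code: the timer function here takes the subject function as an explicit argument, the module-level total is an assumption made to keep the example short, and time.time is used in place of time.clock so the sketch also runs on newer Pythons.

```python
import time

total = 0                                      # total time for all timed calls

def timer(func, *args):
    # Hypothetical non-decorator timer: the caller passes the function in
    global total
    start = time.time()
    result = func(*args)
    elapsed = time.time() - start
    total += elapsed
    return result, elapsed

def builder_listcomp(N):
    return [x * 2 for x in range(N)]

plain = builder_listcomp(5)                    # untimed: normal call
timed, elapsed = timer(builder_listcomp, 5)    # timed: special call syntax
```

The untimed call remains available, but nothing forces callers to go through timer, which is the tradeoff described above.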
Python 2.6 and 3.0 extend decorators to work on classes too. As described in the rest of this section, the concept is similar, but class decorators augment instance creation calls with extra logic, instead of a particular function or method. Also like function decorators, class decorators are really just optional syntactic sugar, though some believe that they can make a programmer's intent more obvious and minimize erroneous calls.
Class decorators' semantics and syntax are very similar to function decorators. Rather than wrapping individual functions or methods, though, class decorators are a way to wrap up instance construction calls, with extra logic that manages or augments instances created. In the following, assuming that "foo" is a 1-argument function that returns a callable, the Python 2.6 class decorator syntax:
@foo
class A:
    pass

is now equivalent to the following -- the class is automatically
passed to the decorator function, and the decorator's result is
assigned back to the class name:

class A:
    pass
A = foo(A)
The net effect is that calling the class name later to create an instance winds up triggering the callable returned by the decorator function, instead of calling the original class itself.
Just as for functions, multiple class decorators result in multiple nested function calls, and hence multiple levels of wrapper logic around instance creation calls; the following are equivalent too:
@foo
@bar
class A:
    pass

# same as...

class A:
    pass
A = foo(bar(A))
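To confirm the nesting order for yourself, the decorators can record when they run. In this minimal sketch, foo and bar are hypothetical decorators that simply tag the class and return it unchanged (a class is itself a callable, so this satisfies the assumption above); it requires Python 2.6 or later.

```python
def foo(aClass):
    # Hypothetical decorator: record that foo ran, return the class itself
    aClass.tags = getattr(aClass, 'tags', []) + ['foo']
    return aClass

def bar(aClass):
    # Hypothetical decorator: record that bar ran, return the class itself
    aClass.tags = getattr(aClass, 'tags', []) + ['bar']
    return aClass

@foo
@bar
class A:
    pass

class B:
    pass
B = foo(bar(B))                # explicit form of the same nesting
```

In both forms the decorator nearest the class statement (bar) runs first, and foo is applied to its result.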
Here's a larger example, run under 2.6, to demonstrate -- the classic singleton coding pattern, where at most one instance of a class ever exists; "singleton" defines and returns a function for managing instances, and the "@" syntax automatically wraps up the class in this function:
def singleton(aClass):
    instances = {}
    def getInstance(*args):
        if aClass not in instances:
            instances[aClass] = aClass(*args)
        return instances[aClass]
    return getInstance

@singleton
class Person:
    def __init__(self, name, hours, rate):
        self.name = name
        self.hours = hours
        self.rate = rate
    def pay(self):
        return self.hours * self.rate

Now, when the Person class is later used, the wrapping logic layer provided
by the decorator routes instance construction calls to "getInstance", which
manages and shares a single instance, regardless of how many construction calls
are made:
>>> bob = Person('Bob', 40, 10)
>>> bob.name
'Bob'
>>> bob.pay()
400
>>> sue = Person('Sue', 50, 20) # same, single object
>>> sue.name
'Bob'
>>> sue.pay()
400
Let's look at a larger use-case example. On pages 527-528, the __getattr__ method is shown as a way to wrap up entire object interfaces of embedded instances. Here's the book's original example for reference, working on a built-in list object:
class wrapper:
    def __init__(self, object):
        self.wrapped = object                    # Save object
    def __getattr__(self, attrname):
        print 'Trace:', attrname                 # Trace fetch
        return getattr(self.wrapped, attrname)   # Delegate fetch

>>> x = wrapper([1,2,3]) # Wrap a list
>>> x.append(4) # Delegate to list method
Trace: append
>>> x.wrapped # Print my member
[1, 2, 3, 4]
In this code, the wrapper class intercepts access to any of the wrapped object's attributes, prints a message, and uses getattr to pass off the access to the wrapped object. This differs from function decorators, which wrap up just one specific method. In some sense, class decorators provide an alternative way to code the __getattr__ technique to wrap an entire interface. In 2.6, for example, the class example above can be coded as a class decorator that triggers wrapped instance creation, instead of passing an instance into the wrapper's constructor:
def Tracer(aClass):                                   # on @ decoration
    class Wrapper:
        def __init__(self, *args):                    # on instance creation
            self.wrapped = aClass(*args)              # use enclosing scope name
        def __getattr__(self, attrname):
            print 'Trace:', attrname
            return getattr(self.wrapped, attrname)
    return Wrapper

@Tracer
class Spam:                                           # like: Spam = Tracer(Spam)
    def display(self):
        print 'Spam!' * 8

@Tracer
class Person:
    def __init__(self, name, hours, rate):
        self.name = name
        self.hours = hours
        self.rate = rate
    def pay(self):
        return self.hours * self.rate

food = Spam()                                         # triggers Wrapper()
food.display()                                        # triggers __getattr__

bob = Person('Bob', 40, 50)
print bob.name
print bob.pay()
Here is the output produced: attribute fetches on instances of
both the Spam and Person classes invoke the __getattr__ logic
in the Wrapper class, because "food" and "bob" are really
instances of Wrapper, thanks to the decorator's redirection of
instance creation calls:
Trace: display
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
Trace: name
Bob
Trace: pay
2000
Notice that the preceding applies decoration to a user-defined class. Just like the book's original example, we can also use the decorator to wrap up a built-in type such as a list, as long as we subclass so as to allow decoration of instance creation. In the following, "x" is really a Wrapper again due to the indirection of decoration; notice how directly printing x invokes Wrapper's __getattr__, which in turn dispatches to the __repr__ of the built-in list superclass of the embedded instance:
@Tracer
class MyList(list): pass            # triggers Tracer()

>>> x = MyList([1, 2, 3])           # triggers Wrapper()
>>> x.append(4)                     # triggers __getattr__, append
Trace: append
>>> x.wrapped
[1, 2, 3, 4]
>>> x                               # triggers __getattr__, __repr__
Trace: __repr__
[1, 2, 3, 4]
Interestingly, the decorator function in this example can also be coded as a class instead of a function, with the proper operator overloading protocol. The following alternative works the same, because its __init__ is triggered when the "@" decorator is applied to the class, and its __call__ is triggered when a subject class instance is created. Our objects are really instances of Tracer this time, and we essentially just trade an enclosing scope reference for an instance attribute here:
class Tracer:
    def __init__(self, aClass):               # on @decorator
        self.aClass = aClass                  # use instance attribute
    def __call__(self, *args):                # on instance creation
        self.wrapped = self.aClass(*args)
        return self
    def __getattr__(self, attrname):
        print 'Trace:', attrname
        return getattr(self.wrapped, attrname)

@Tracer
class Spam:                                   # like: Spam = Tracer(Spam)
    def display(self):
        print 'Spam!' * 8

...
food = Spam()                                 # triggers __call__
food.display()                                # triggers __getattr__
... rest is the same ...
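One difference between the two codings is worth checking for yourself. In the class-based version, each decorated class name is rebound to a single Tracer object, and every creation call stores its new instance in that same object's "wrapped" attribute. Creating a second instance therefore appears to overwrite the first, whereas the function-based version makes a new Wrapper per instance. The following stripped-down sketch (tracing print removed, one-argument Person assumed for brevity) illustrates the effect:

```python
class Tracer:
    def __init__(self, aClass):               # on @ decoration
        self.aClass = aClass
    def __call__(self, *args):                # on instance creation
        self.wrapped = self.aClass(*args)     # replaces any prior instance
        return self                           # always the same Tracer object
    def __getattr__(self, attrname):
        return getattr(self.wrapped, attrname)

@Tracer
class Person:
    def __init__(self, name):
        self.name = name

bob = Person('Bob')
sue = Person('Sue')                           # overwrites the wrapped 'Bob'
```

After this runs, bob and sue are the same object and both report the name 'Sue', so the class-based coding as shown is best suited to classes that have just one live instance at a time.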
Of course, the preceding example ultimately still relies on __getattr__ to intercept fetches on a wrapped and embedded instance object. In fact, all we've really accomplished in either Tracer decorator version above is to move the instance creation call inside a class, instead of passing in the instance to a manager function. With the book's non-decorator version of this example, we would simply code instance creation differently:
class Spam:                      # non-decorator version
    ...                          # any class will do
food = wrapper(Spam())           # special creation syntax

@Tracer
class Spam:                      # decorator version
    ...                          # requires @ syntax at class
food = Spam()                    # normal creation syntax
Essentially, decorators simply shift special syntax requirements from the instance creation call, to the class statement itself. This is also true for the singleton example above -- rather than decorating, we could simply pass the class and its construction arguments into a manager function:
instances = {}
def getInstance(aClass, *args):
    if aClass not in instances:
        instances[aClass] = aClass(*args)
    return instances[aClass]

bob = getInstance(Person, 'Bob', 40, 10)    # versus: bob = Person('Bob', 40, 10)
Alternatively, we could use Python's introspection facilities to fetch the class from an already-created instance:
instances = {}
def getInstance(object):
    aClass = object.__class__
    if aClass not in instances:
        instances[aClass] = object
    return instances[aClass]

bob = getInstance(Person('Bob', 40, 10))    # versus: bob = Person('Bob', 40, 10)
Although the decorator versions are much more implicit than either of the prior two alternatives (at least until you become familiar with decorators), the decorator alternatives are also at least arguably less intrusive on the code that creates objects to be wrapped, and ensure that the wrapping layer gets invoked for normal class calls. The special "@" syntax makes the programmer's intent more clear to readers of the class, and retains the normal instance-creation coding style. This latter point may be less significant for class decorators than function decorators, because instance creation calls tend to appear less often than calls to general functions. Like function decorators, though, class decorators also remove the risk that a programmer may accidentally call an undecorated class directly, thereby missing out on the decoration logic.
On the other hand, the non-decorator alternatives for both tracing and singleton management shown here can be used with arbitrary classes, without requiring the special "@" syntax. In fact, the non-decorator options can be used on classes that may have been coded in the past, before decoration extensions were even foreseen. As one example, using the non-decorator wrapper with a built-in list, as shown in the previous section, is more straightforward than decorating; to be decorated, the list type must first be artificially subclassed.
Because they can be used with any class, the non-decorator alternatives to class decorators might be considered more general. However, they are also perhaps less obvious in intent at class statements (though more obvious at instance creation calls), and may be less forgiving to programmers who might forget to route new instances through the wrapping layer. The exact same is true of function decorators, though the tradeoff involves def statements and function calls, instead of class statements and instance creation calls. As with most coding alternatives, you should weigh such tradeoffs for yourself.
Also keep in mind that the utility of both function and class decorators can always be achieved without the "@" syntax; simply use the nested calls equivalence explicitly. In the first decorator example above, for instance, we could simply code "Person = singleton(Person)", and skip the decorator syntax altogether. Again, decorators are largely just syntactic sugar for a common coding pattern, but they can help to make the wrapping more obvious, and minimize the chance that clients of a function or class will inadvertently forget to use the wrapping logic. See Python 2.6 and 3.0 documentation for more details.
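The explicit equivalence mentioned here can be sketched as follows, reusing the singleton function from earlier on this page. The one-argument Person class is a simplification assumed for brevity; because no "@" syntax is involved, this form does not depend on the 2.6 class decorator extension.

```python
def singleton(aClass):
    instances = {}
    def getInstance(*args):
        if aClass not in instances:
            instances[aClass] = aClass(*args)
        return instances[aClass]
    return getInstance

class Person:                      # simplified: name only
    def __init__(self, name):
        self.name = name

Person = singleton(Person)         # manual rebinding, no "@" required

bob = Person('Bob')
sue = Person('Sue')                # same, single object
```

The rebinding line does exactly what the "@singleton" line did in the decorator version; the only difference is where the wrapping is spelled out.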
The book discusses the "%" formatting expression for strings, primarily on pages 140-143. Python 2.6 and 3.0 add a new, alternative way to format strings -- the string object's new "format" method. Depending on which source you cite, this new method is either simpler or more advanced than the traditional "%" expression. In any case, it's a reasonable alternative, which may or may not become as widespread as "%" over time.
In short, the new format() method uses the subject string as a template, takes its arguments to be values to be substituted, and assumes curly-braces in the subject string designate substitution targets and name arguments to insert by position or keyword. In Python 2.6, for example:
>>> template = '{0}, {1} and {2}' # position
>>> template.format('spam', 'ham', 'eggs')
'spam, ham and eggs'
>>> template = '{motto}, {pork} and {food}' # keyword
>>> template.format(motto='spam', pork='ham', food='eggs')
'spam, ham and eggs'
>>> template = '{motto}, {0} and {food}' # both
>>> template.format('ham', motto='spam', food='eggs')
'spam, ham and eggs'
Naturally, the string can be a literal, and arbitrary object types can be substituted:
>>> '{motto}, {0} and {food}'.format(42, motto=3.14, food=[1, 2])
'3.14, 42 and [1, 2]'
Beyond this, format calls can become more complex, to support more advanced usage. For instance, format strings can name object attributes and dictionary keys:
>>> 'My {1[spam]} runs {0.platform}'.format(sys, {'spam': 'laptop'})
'My laptop runs win32'
>>> 'My {config[spam]} runs {sys.platform}'.format(sys=sys, config={'spam': 'laptop'})
'My laptop runs win32'
More specific layouts can be achieved by adding a colon after the substitution target's identification, followed by a format specifier which can name field size, justification, and a specific type code:
>>> '{0:10} = {1:10}'.format('spam:', 123.4567)
'spam:      =    123.457'
>>> '{0:>10} = {1:<10}'.format('spam:', 123.4567)
'     spam: = 123.457   '
>>> '{0:e}, {1:.3e}, {2:g}'.format(3.14159, 3.14159, 3.14159)
'3.141590e+00, 3.142e+00, 3.14159'
>>> '{0:X}, {1:o}, {2:b}'.format(255, 255, 255)
'FF, 377, 11111111'
At least for positional references and dictionary keys, this begins to look very much like the current "%" formatting expression, especially in advanced use with type codes and extra formatting syntax. The current expression can't handle keywords, attribute references, and binary type codes, though:
>>> template = '%s, %s, %s'
>>> template % ('spam', 'ham', 'eggs')
'spam, ham, eggs'
>>> '%s, %s and %s' % (3.14, 42, [1, 2])
'3.14, 42 and [1, 2]'
>>> 'My %(spam)s runs %(platform)s' % {'spam': 'laptop', 'platform': sys.platform}
'My laptop runs win32'
>>> '%-10s = %10s' % ('spam', 123.4567)
'spam       =   123.4567'
>>> '%10s = %-10s' % ('spam', 123.4567)
'      spam = 123.4567  '
>>> '%e, %.3e, %g' % (3.14159, 3.14159, 3.14159)
'3.141590e+00, 3.142e+00, 3.14159'
>>> '%x, %o' % (255, 255)
'ff, 377'
As usual, the Python community will have to decide which technique proves itself better over time. Experiment with some of these on your own to get a feel for what is available, and be sure to see Python 2.6 and 3.0 documentation for more details.
Python 2.6 introduces a new numeric type, Fraction, which implements a rational number object. It essentially keeps both numerator and denominator explicitly, so as to avoid some of the inaccuracies and limitations of floating point math hardware.
Fraction is something of a cousin to the existing Decimal fixed-precision type described on pages 107-108, which also can be used to control numerical accuracy, by fixing decimal digits and specifying rounding or truncation policies. It's also used in similar ways -- like Decimal, this new type resides in a module; import its constructor, and pass in numerator and denominator to make one. The following interaction in Python 2.6 shows how:
>>> from fractions import Fraction
>>> x = Fraction(1, 3)
>>> y = Fraction(4, 6)
>>> x
Fraction(1, 3)
>>> y
Fraction(2, 3)
>>> print y
2/3
Once created, Fractions can be used in mathematical expressions as usual:
>>> x + y
Fraction(1, 1)
>>> x - y
Fraction(-1, 3)
>>> x * y
Fraction(2, 9)
Notice that this is different from floating-point type math, which is dependent on the underlying limitations of floating-point hardware:
>>> a = 1 / 3.
>>> b = 4 / 6.
>>> a
0.33333333333333331
>>> b
0.66666666666666663
>>> a + b
1.0
>>> a - b
-0.33333333333333331
>>> a * b
0.22222222222222221
This is especially true for floating-point values that cannot be represented accurately given their limited number of bits; both Fraction and Decimal provide ways to get exact results:
>>> 0.1 + 0.1 + 0.1 - 0.3
5.5511151231257827e-17
>>> from fractions import Fraction
>>> Fraction(1, 10) + Fraction(1, 10) + Fraction(1, 10) - Fraction(3, 10)
Fraction(0, 1)
>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')
Decimal('0.0')
>>> Fraction(1000, 1234567890)
Fraction(100, 123456789)
>>> 1000./1234567890
8.1000000737100011e-07
To support conversions, floating-point objects also now have a method that yields their numerator and denominator ratio, and float() accepts a Fraction as an argument. Trace through the following interaction to see how this pans out:
>>> (2.5).as_integer_ratio()
(5, 2)
>>> f = 2.5
>>> z = Fraction(*f.as_integer_ratio())
>>> z
Fraction(5, 2)
>>> x
Fraction(1, 3)
>>> x + z
Fraction(17, 6)
>>> float(x)
0.33333333333333331
>>> float(z)
2.5
>>> float(x + z)
2.8333333333333335
Finally, some type-mixing is allowed in expressions, though Fraction must sometimes be manually propagated to retain accuracy. Study the following interaction to see how this works:
>>> x
Fraction(1, 3)
>>> x + 2
Fraction(7, 3)
>>> x + 2.0
2.3333333333333335
>>> x + (1./3)
0.66666666666666663
>>> x + (4./3)
1.6666666666666665
>>> x + Fraction(*(4./3).as_integer_ratio())
Fraction(22517998136852479, 13510798882111488)
>>> 22517998136852479 / 13510798882111488.
1.6666666666666667
>>> x + Fraction(4, 3)
Fraction(5, 3)
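When a float has already crept into a calculation, one way to recover a simpler rational value is the Fraction type's limit_denominator method, which returns the closest fraction whose denominator does not exceed a given bound. The sketch below assumes a bound of 10 is appropriate for the data at hand; that choice is an illustration only, and a poorly chosen bound can silently round away real information.

```python
from fractions import Fraction

x = Fraction(1, 3)

# Exact rational value of the float 4./3, which is not exactly 4/3
approx = Fraction(*(4.0 / 3).as_integer_ratio())

# Closest fraction with a denominator of at most 10 (an assumed bound)
clean = approx.limit_denominator(10)

total = x + clean
```

Here the simplified value is Fraction(4, 3), so the sum matches the result of adding Fraction(4, 3) directly, as in the last line of the interaction above.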
For more details on the Fraction type, see Python 2.6 and 3.0 documentation.
Someone wrote to ask for clarification on the internal implementation of lists and dictionaries. The book touches very briefly on these: on page 153, it explains that lists are implemented as C arrays of pointers instead of linked lists; and on page 161, it states that dictionaries are implemented as expandable hashtables.
There's more to it than this, of course, and these are low-level internal details that most programmers don't need to care about. Further, this varies in alternate Pythons. Jython and IronPython, for instance, may use very different techniques, and even standard Python is free to change the details of how this works over time (in fact, it has). To underscore how carefully Python has been optimized, though, here are a few more words on the subject as of Python 2.6.
Basically, lists are stored as arrays of pointers to other objects, and the arrays used to implement lists are overallocated. The array's allocated block of memory includes extra space at the end to allow for future expansion. That way, most additions don't require making a new array and copying over -- most appends simply store a pointer near the end of the block, and most insertions and deletions require just a quick memory copy to shift some items within the block.
Eventually, if the list grows large enough to overflow its array's block of memory, a new, larger block is created, overallocated again; all items in the old block are copied over; and a header in the list object is set to point to the new, larger block. The list's array is also shrunk if it becomes less than half full, by copying all items to a new, smaller block. Copies can be expensive, but by including space in the arrays for future expansion, and delaying contraction until the arrays are half empty, the need to reallocate and copy over is relatively rare.
This scheme turns out to be better on space and time than a linked-list structure, at least for typical Python code. Python core developers actually did some heavy-duty analysis of how both lists and dictionaries are commonly used, in order to come up with the optimal data structures for Python. For lists, the cost of occasionally shifting and copying items in arrays is less than the memory management overheads associated with a linked-list structure, where items are kept in individual blocks.
If you want to see how this works for lists, check out the listobject.c file in Python's source code distribution (one of the advantages of open source). Although more complex in the past, today Python uses a fairly simple algorithm to compute the size of over-allocation in the blocks: (newsize >> 3) + (newsize < 9 ? 3 : 6). Per this file in Python 2.6: "This over-allocates proportional to the list size, making room for additional growth. The over-allocation is mild, but is enough to give linear-time amortized behavior over a long sequence of appends() in the presence of a poorly-performing system realloc()." In other words, they've done a great job at handling low-level details, so we don't have to.
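To get a feel for how mild this over-allocation is, the formula quoted above can be replayed in Python. This is only a simulation sketch under simplifying assumptions: it models a list that starts empty and grows by appends alone, takes the new block size to be the requested size plus the computed over-allocation, and ignores shrinking and every other detail of the real C code. The function names are hypothetical.

```python
def over_allocation(newsize):
    # Formula quoted above: (newsize >> 3) + (newsize < 9 ? 3 : 6)
    return (newsize >> 3) + (3 if newsize < 9 else 6)

def simulate_appends(count):
    allocated = 0                   # slots available in the current block
    resizes = []                    # block size after each reallocation
    for size in range(1, count + 1):
        if size > allocated:        # block is full: reallocate, with extra room
            allocated = size + over_allocation(size)
            resizes.append(allocated)
    return resizes

growth = simulate_appends(100)      # block sizes seen during 100 appends
```

Under these assumptions, 100 appends trigger only 10 reallocations, with block sizes of 4, 8, 16, 25, 35, 46, 58, 72, 88, and 106 slots.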
Dictionaries use a similar expandable model, though they are hashtables, and their structure is thus a bit more complex. Essentially, Python dictionaries today use table probing instead of chains of items at hash table slots, along with hashing algorithms tailored for common Python usage patterns. According to Python 2.6's dictobject.c file: "This is based on Algorithm D from Knuth Vol. 3, Sec. 6.4. Open addressing is preferred over chaining since the link overhead for chaining would be substantial (100% with typical malloc overhead)."
In terms of memory, dictionary tables also begin small or presized, and may grow or shrink over time; they double or quadruple in size when they become 2/3 full, and may shrink as items are removed. Also from dictobject.c: "If fill >= 2/3 size, adjust size. Normally, this doubles or quaduples the size, but it's also possible for the dict to shrink [...] Quadrupling the size improves average dictionary sparseness (reducing collisions) at the cost of some memory and iteration speed (which loops over every possible entry). It also halves the number of expensive resize operations in a growing dictionary. Very large dictionaries (over 50K items) use doubling instead. This may help applications with severe memory constraints." In other words, dictionaries are already more efficient than you or I could probably make them.
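The general idea of open addressing with growth at 2/3 full can be illustrated with a toy table. To be clear, this is not how dictobject.c is coded: the sketch assumes simple linear probing (Python's actual probe sequence is more elaborate), always quadruples, never shrinks, and omits deletion entirely. The ProbingTable class and its attribute names are hypothetical.

```python
class ProbingTable:
    """Toy open-addressing hash table; linear probing, not CPython's scheme."""
    def __init__(self, size=8):
        self.size = size
        self.fill = 0
        self.resizes = 0
        self.slots = [None] * size               # each slot: None or (key, value)

    def _find(self, key):
        index = hash(key) % self.size
        while self.slots[index] is not None and self.slots[index][0] != key:
            index = (index + 1) % self.size      # collision: probe the next slot
        return index

    def __setitem__(self, key, value):
        index = self._find(key)
        if self.slots[index] is None:
            self.fill += 1
        self.slots[index] = (key, value)
        if self.fill * 3 >= self.size * 2:       # 2/3 full: grow the table
            self._resize(self.size * 4)

    def __getitem__(self, key):
        entry = self.slots[self._find(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _resize(self, newsize):
        old = [entry for entry in self.slots if entry is not None]
        self.size, self.fill = newsize, 0
        self.slots = [None] * newsize
        self.resizes += 1
        for key, value in old:                   # rehash items into the new table
            self[key] = value

table = ProbingTable()
table[1] = 'one'
table[9] = 'nine'               # 9 % 8 == 1: collides with key 1, probes onward
for n in range(20, 30):
    table[n] = n * n            # sixth item fills 2/3 of 8 slots: quadruple to 32
```

Because items live directly in the table's slots, no per-item link blocks are allocated, which is the overhead the quoted comment says chaining would incur.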
As another optimization, Python 2.6 also maintains tables of up to 80 empty lists and dictionaries, to be reused. Object deallocations add to a table if it's not full, and new object requests take from a table if it isn't empty, thereby minimizing the number of expensive memory allocation/release operations. This is similar in spirit to the caches of reused small integers and strings that Python keeps internally, except that integers and strings can be referenced many times while being retained in the cache; lists and dictionaries, being mutable, cannot be shared by multiple references, and so must be removed from the table when in use. That is, a given integer or string in the cache can be reused any number of times, whereas the list and dictionary tables are just temporary staging areas for recently-freed and soon-to-be-reallocated objects. (Python's string and number reuse caches are described in the book, first on page 114, and again on pages 119-121).
See these Python source files for more details. Again, keep in mind that this is all prone to change over time, and specific to Python implementations. In general, Python programmers aren't supposed to have to care about the underlying C implementation details, though they can help you understand performance implications, and are instructive to study in general.
[November 2008] I've begun receiving emails from readers wondering if a Python 3.0 version of this book is in the works. While we will publish a 3.0 edition eventually, it won't happen in the near term, because virtually every Python programmer will be using the 2.X line for some time to come. The 3.0 user base is almost non-existent today, and isn't expected to become dominant for at least 1-2 years. Although a 3.0 book might appeal to some early adopters, it would alienate the vast majority of people using Python today.
As a comparison (and to help you guess when a 3.0 edition might happen), I don't expect any of the students in my live classes to have to know 3.0 specifically for at least one year. They have 2.X dependencies in their work that preclude 3.0 adoption today. Because of that, for all of 2009, I will be teaching with Python 2.6 in classes, and pointing out upcoming 3.0 changes along the way.
This is essentially what the current edition of this book does too. The 3rd Edition of this book is based upon Python 2.5, with discussion of upcoming Python 3.0 features in notes and a Preface section. Because of this approach, this book applies to 2.6 directly, and can be used by 3.0 adopters in conjunction with the 3.0 notes it includes. For reasons described in more detail below, I recommend readers use this book to start out with 2.X Python today, and explore differences in 3.0 later as it becomes more widely used.
Python 2.6 was released in October 2008, one year after this book was published. Python 2.6 is fully backward-compatible with 2.5, and is a continuation of the 2.X Python line which simply adds a handful of minor and optional extensions. Because of that, this book applies completely and directly to 2.6, as well as earlier 2.X versions.
For instance, 2.6 introduces the new string format method, class decorators, and fractional numbers, described earlier on this page. Other 2.6 features such as "with" context managers (now enabled in 2.6), and absolute/relative imports (still partially enabled in 2.6) are already covered in the current edition. With the exception of "with" and "as" becoming official reserved words in 2.6, these are all non-intrusive extensions.
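As a quick reminder of what the three 2.6 additions named above look like, here is a brief sketch. The register function, the registry list, and the Spam class are hypothetical names used only for illustration; the code should run unchanged on 2.6 and later.

```python
from fractions import Fraction

# 1) The string format method, an alternative to the % formatting expression.
msg = '{0} + {1} = {2}'.format(1, 2, 1 + 2)

# 2) A class decorator: a callable applied to a class right after it is created.
registry = []

def register(cls):
    # Hypothetical helper: record the class name, then return the class unchanged.
    registry.append(cls.__name__)
    return cls

@register
class Spam(object):
    pass

# 3) Fraction numbers: exact rational arithmetic, without floating-point error.
half = Fraction(1, 3) + Fraction(1, 6)
```

None of these changes the meaning of existing 2.5 code, which is why they qualify as non-intrusive extensions.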
On the other hand, Python 3.0, currently due to be released in December 2008, will not be backward compatible with the 2.X Python line. That is, most 2.X code will not run under 3.0 unchanged. Fundamental changes, such as the new print function, the strings/bytes distinction, and dictionary method changes, guarantee some code breakage. A script to be shipped with 3.0, "2to3", will automatically convert much 2.X code to run on 3.0, but this addresses existing code, not new development. Although most of the language is the same in 3.0 and many application programmers will not notice a major difference, 3.0 does introduce new tools and techniques not fully explored in 2.X-based books.
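To illustrate the kind of breakage involved, the following sketch exercises the three changes named above. It is written for 3.0 and later, and will not run as is under 2.X; the variable names are arbitrary, and the 2.X forms appear only in comments.

```python
import io

# print is a function in 3.0; the 2.X statement form (print 'spam') is a syntax error.
buf = io.StringIO()
print('spam', 'eggs', sep='-', file=buf)

# Text and binary data are distinct types in 3.0, and do not mix implicitly.
text = 'spam'
data = text.encode('ascii')        # a bytes object, not a str
try:
    text + data
    mixed = True
except TypeError:
    mixed = False                  # str + bytes is an error in 3.0

# Dictionary methods changed: keys() returns a view object, not a list,
# so 2.X code such as "ks = d.keys(); ks.sort()" fails; use sorted() instead.
d = {'b': 2, 'a': 1}
keys = d.keys()
ordered = sorted(keys)

# has_key() is gone in 3.0; the "in" expression works in both lines.
found = 'a' in d
```

The sorted() call and the "in" expression are also valid in 2.X, so code written that way today needs no conversion for these particular changes.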
Most observers do not expect Python 3.0 to be the most widely used version for perhaps two years or more, for a variety of reasons. Many popular 3rd-party extensions for Python are not expected to become available in 3.0 form for up to one year after 3.0 release. Moreover, most Python programmers today must use systems and code based upon Python 2.X, and so may be unable to migrate to 3.0 for years to come. In fact, because the existing code and user bases are so large today, the 2.X line will be developed and fully supported in parallel with the 3.X line for perhaps 3 to 5 more years, with 2.7 and 2.8 releases already planned.
The net effect of this dual-version strategy is that the Python user base may be split for the next few years, between the 2.X and 3.X lines. The 2.X camp will dominate in the near term, but will likely be overtaken by the 3.X line over time.
This can make it difficult for newcomers to decide which version to get started with: does one jump up to 3.0 immediately, or start with the more widely used 2.X line? Most programmers have no choice today: 2.X is required by nearly all existing Python-based systems. If you have the luxury of truly starting from scratch, though, the choice is less clear.
Because almost all programmers need to learn and use 2.X code today, unless you have more specific needs, I recommend starting out with the 2.X coverage in this book, and studying 3.0 changes slowly, using the resources in this book as well as those available on-line. The core ideas stressed in this book are the same, regardless of which version of Python you use. The differences are largely in smaller details. Most of what you learn for 2.X today will apply to Python 3.0 in the future, if and when you are able to migrate.
For more details on the 2.6/3.0 fork, see the release pages at www.python.org. Also note that Python core developers themselves suggest a similar approach: writing 2.X code now, and using the auto-conversion script to move to 3.0 when the time comes. A book is more focused on teaching than coding, of course, but the same general recommendation applies.