AWS News Blog
Amazon DevCon – Guido van Rossum
Guido van Rossum is the creator of Python.
Guido started by saying, “This is my favorite talk for a technical audience. I used to always talk about What’s new in 1.0? What’s new in 2.0? They were too much like laundry lists. I think it’s more interesting to talk in detail about one or two interesting features.”
About Guido: “I’ve been a geek all my life. I work in a cool startup, Elemental Security.”
Guido said Python is “a dynamically-typed, object-oriented, buzzword-loaded language”. Mostly procedural, very extensible. Used at Google, ILM, NASA, Nokia, etc.
Why use Python? Dynamic languages are more productive. Code is more readable, maintainable, and has fast, high-level data types. “Developer time is more expensive than CPU time.”
Don’t use Python (yet)… for packet filters, MP3 codecs, etc. Instead, write in C/C++ and wrap Python around it.
Language features:
- no declarations
- indentation+colon for statement grouping
- doc string part of function syntax
- parallel assignment (to swap a and b: “a, b = b, a”)
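The features listed above can be seen together in a few lines. This is only an illustrative sketch in current Python 3 syntax; the function name `swap_pair` and the sample values are made up for the example and do not come from the talk.

```python
def swap_pair(a, b):
    """Return the two arguments in swapped order."""
    # The string above is the doc string; it is part of the function syntax.
    # No declarations: names come into existence when they are assigned.
    # The colon and the indentation group these statements into the body.
    a, b = b, a  # parallel assignment swaps without a temporary variable
    return a, b


# The same parameters accept values of any type.
first, second = swap_pair(1, "two")
```

After this runs, `first` holds `"two"` and `second` holds `1`, and the doc string is available as `swap_pair.__doc__`.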
Common uses:
- Server-side programming
- Client-side programming
- XML processing
- Databases
- GUI programming
- Scientific/numeric computing
- Scripting Unix and Windows
- Rapid prototyping (e.g. at Google)
- Programming education – good language for teaching
Standard Library: The stuff you’d expect, plus an enormous number of third-party functions. Everything except a few applications is open source. 80-90% of third-party add-ons are open source. License is BSD-ish.
There are about 60 developers. Everyone is basically a volunteer, although some are paid by their employers to work on Python since they use it for their jobs. The Python development team spans the globe.
Process for introducing new features: new features are voted in by consensus among the developers on python-dev, but it’s not a democracy. Guido is the “BDFL: Benevolent Dictator For Life.” He says this is OK, because he’s a fairly typical user, and he lets everyone give input before he makes the call.
Releases happen every 12-18 months. Minor releases are purely focused on stability and backwards compatibility. Code is compatible backward and forward between the different releases. Previous release is kept alive for most of the lifetime of the next release. Python-dev will introduce more backwards incompatibilities out of necessity at some point. But it won’t be like, “Let’s design a new language but give it the same name.”
Case Study 1: Iterators and Generators
Evolution of the for loop: Guido showed the evolution of “for” through Pascal, C, and then Python.
for <variable> in <sequence>:
Guido then described the evolution of sequences in Python:
built-in sequence types > user-defined sequence types > lazy sequences > pseudo-sequences
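As a rough illustration of the "user-defined sequence types" step, a class that only supports indexing can already be used in a for loop: the loop asks for item 0, 1, 2, and so on until `IndexError` is raised. The class `Squares` below is hypothetical and written in Python 3 syntax; it is a sketch of the idea, not code from the talk.

```python
class Squares:
    """Hypothetical user-defined sequence: item i is i squared."""

    def __init__(self, length):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # The for loop calls this with 0, 1, 2, ... until IndexError is raised.
        if not 0 <= index < self.length:
            raise IndexError(index)
        return index * index


collected = []
for value in Squares(4):
    collected.append(value)
```

Because each item is computed on request, the same shape also covers the "lazy sequences" step: nothing is stored ahead of time.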
The Iterator protocol was introduced in release 2.2:
for <variable> in <iterable>:
The iterator supports only one method: next(). It just loops through, and there’s no index to increment. Using an iterator is actually faster (Guido recalls 40% faster) than loops with sequences. Other alternatives were more expensive too (e.g., creating a tuple). They did not introduce any backwards incompatibilities with iterators. Any sequence will continue to work.
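A minimal sketch of the protocol described above, assuming Python 3, where the method is spelled `__next__()` and is normally called through the built-in `next()` rather than as `.next()`. The class `Countdown` is a made-up example; the object being looped over additionally provides `__iter__()`, which returns the iterator.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in a for loop.
        return self

    def __next__(self):  # spelled next() in Python 2.2
        if self.current <= 0:
            raise StopIteration  # tells the for loop to stop
        value = self.current
        self.current -= 1
        return value


countdown_values = [n for n in Countdown(3)]

# An existing sequence still works: iter() wraps it in an iterator.
list_iterator = iter(["a", "b"])
first_item = next(list_iterator)
```

Note that there is no index anywhere in the loop; the iterator keeps its own position.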
Looping over a dictionary in Python 2.1:
for key in d.keys():
    print key, "->", d[key]
In Python 2.2:
for key in d:
    print key, "->", d[key]
Savings: Python 2.1 copies the keys into a list; Python 2.2 doesn’t.
Downside: With Python 2.2 option, dictionary has to remain unchanged during looping.
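The trade-off can be sketched as follows in Python 3 syntax. The key names are made up, and the `RuntimeError` shown is how CPython reports a dictionary whose size changed during iteration; treat this as an illustration rather than a guarantee for every implementation.

```python
d = {"a": 1, "b": 2}

# Iterating over the dictionary directly visits its keys without building a list.
pairs = [(key, d[key]) for key in d]

# Adding keys inside the loop changes the dictionary's size; CPython detects
# this on the next loop pass and raises RuntimeError.
try:
    for key in d:
        d[key + "_copy"] = d[key]
    mutation_error = None
except RuntimeError as exc:
    mutation_error = exc

# Looping over a snapshot of the keys (what d.keys() returned in Python 2.1)
# leaves the dictionary free to change during the loop.
d = {"a": 1, "b": 2}
for key in list(d):
    d[key + "_copy"] = d[key]
```

The snapshot costs the copy that direct iteration saves, so it is only worth paying when the loop really needs to modify the dictionary.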
Loop over all lines of a file in Python 2.1:
line = fp.readline()
while line:
    <do something with line>
    line = fp.readline()
In Python 2.2:
for line in fp:
    <do something with line>
This is also 40% faster and looks better in Python 2.2.
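Both file-reading styles can be compared side by side. In this sketch, `io.StringIO` stands in for an open text file so that no file on disk is needed; the sample text is invented, and the code uses Python 3 syntax.

```python
import io

# io.StringIO stands in for an open text file.
fp = io.StringIO("first line\nsecond line\nthird line\n")

# Older style: call readline() until it returns an empty string.
old_style = []
line = fp.readline()
while line:
    old_style.append(line.rstrip("\n"))
    line = fp.readline()

# Iterator style: the file object hands out one line per loop pass.
fp.seek(0)
new_style = []
for line in fp:
    new_style.append(line.rstrip("\n"))
```

Both loops collect the same three lines; the second one simply has less bookkeeping.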
Remember coroutines? Or think of a parser and a tokenizer. A parser likes to sit in a loop and occasionally ask the tokenizer for the next token. The tokenizer would like to sit in a loop and occasionally give the parser the next token. How can both sides be happy?
Generators let you write both sides (consumer and producer) as a loop:
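def tokenizer():  # producer (generator)
    while True:
        yield token  # hand over the next token (scanning elided)
def parser(tokenStream):  # consumer
    while True:
        token = tokenStream.next()  # ask the producer for the next token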
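To make the producer/consumer outline concrete, here is a toy version in Python 3 syntax. The tokenizer just splits on whitespace and the parser just sums integers; both behaviors are assumptions chosen to keep the example short, not what was shown in the talk.

```python
def tokenizer(text):  # producer (generator)
    """Yield whitespace-separated tokens one at a time (toy tokenizer)."""
    for word in text.split():
        yield word  # execution pauses here until the consumer asks again


def parser(token_stream):  # consumer
    """Sum integer tokens, pulling each one from the stream on demand."""
    total = 0
    while True:
        try:
            token = next(token_stream)  # token_stream.next() in Python 2.2
        except StopIteration:
            break  # the producer has run out of tokens
        total += int(token)
    return total


result = parser(tokenizer("1 2 3 4"))
```

Each side is written as its own loop, and control passes back and forth at every `yield` and `next()`.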
Generator functions are useful iterator filters. For example, A B C D goes in and A A B B C C D D goes out. Iterator algebra: all kinds of iterators can be combined.
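The A B C D example can be written as a short generator, and two such filters can be chained to hint at what "iterator algebra" means. The function names `double` and `upper` are made up for this sketch, which uses Python 3 syntax.

```python
def double(iterable):
    """Yield every item of the input twice."""
    for item in iterable:
        yield item
        yield item


def upper(iterable):
    """Yield each string item converted to upper case."""
    for item in iterable:
        yield item.upper()


doubled = list(double("ABCD"))
# Filters compose: the output of one generator feeds the next.
combined = list(double(upper("ab")))
```

Here `doubled` is `['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D']`, and no intermediate list is built between the chained filters.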
Generators are used in the Standard Library: tokenize module (a tokenizer for Python code), difflib module (generalized diff library – uses generators to minimize memory usage during diffs), os.walk() (directory tree walker).
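As one example of these library generators in use, the sketch below walks the tokens of a small piece of Python source with `tokenize.generate_tokens()` and keeps only the names. It assumes Python 3, and the source string is invented for the illustration.

```python
import io
import tokenize

source = "total = price * 2\n"

# generate_tokens() takes a readline callable and yields one token at a time.
names = []
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    if tok.type == tokenize.NAME:
        names.append(tok.string)
```

Because tokens arrive one at a time, a caller can stop early without tokenizing the rest of the input.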
Generator Expressions came along in Python 2.4. They are written like list comprehensions but yield values one at a time, so sums and other calculations can finish faster and use less memory because no intermediate list is created.
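A small comparison, in Python 3 syntax, of a list comprehension and a generator expression computing the same sum. The variable names and the range are arbitrary choices for this sketch.

```python
numbers = range(1, 11)

# List comprehension: builds the full list of squares before summing.
sum_from_list = sum([n * n for n in numbers])

# Generator expression: feeds squares to sum() one at a time, with no list.
sum_from_generator = sum(n * n for n in numbers)

# A generator expression can also be stored and consumed lazily.
evens = (n for n in numbers if n % 2 == 0)
first_even = next(evens)
```

Both sums come out to 385; the difference is only in whether the intermediate list is ever created.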
Q: Can you talk about the differences between Ruby and Python?
A: Ruby seems to me like an excellent attempt at cleaning up Perl. Ruby to me has a lot of Perlesque features.
Q: In a number of scripting languages, there’s always a problem with debugging time. Any suggestions for that with Python?
A: Run your code through PyChecker (http://pychecker.sourceforge.net/). It’s very good. Unit testing is the other half of the picture, but you can’t unit test everything perfectly. For example, I don’t know how to force an I/O exception in all cases.
Q: What is stopping Python from being a premier, first-class language like C++ or Java?
A: Purely a mindset thing. It will probably take another generation of programmers. There are some cases where it runs slower. For example, it can’t count to one million as fast as Java. But it can scan a file for a particular string faster than Java.
Q: Do you ever see any languages out there with features that you want to get into Python?
A: All the time. I have to be careful, because people come to me all the time asking for new features and fixes. If the language changes too much, we’ll lose our existing community. For the last 4-5 weeks, I’ve been looking at a way to add optional type declarations. That’s a delicate subject, and there’s a lot to learn from other languages. The right solution will end up being unique to Python though. It has to feel like Python.