Saturday, November 03, 2007

Beautiful code

I've read "Beautiful Code", a collection of 33 essays by leading computer scientists, edited by Andy Oram and Greg Wilson. What follows is by no means a review, just my impressions.
  1. Chapter 1, A Regular Expression Matcher, by Brian Kernighan. An interesting insight into implementation of a simplified regular expression matcher. Shows off recursion-based elegant and compact C code.
  2. Chapter 2, Subversion's Delta Editor: Interface as Ontology, by Karl Fogel. Describes a piece of Subversion version control system, the Delta Editor, which is the data structure that expresses the difference between two source code trees. The data structure is really an interface (being implemented in C, a language that doesn't support interfaces natively, it's a structure of function pointers). Instead of representing the delta explicitly, it provides a standard interface for a delta producer to provide the delta data, and for the delta consumer to use it. This way it can handle deltas that can't fit in memory due to sheer size or that are not available all at once (e.g. are downloaded from network). Various advantages of this design are discussed. An interesting approach of representing data as an interface.
  3. Chapter 3, The Most Beautiful Code I Never Wrote, by Jon Bentley. Praises brevity and simplicity. Examines the code that analyses QuickSort algorithm (which itself is viewed as the most beautiful code by the author). The performance analysis code gets progressively smaller and smaller and simpler and simpler through careful mathematical analysis. An interesting example of algorithm development and optimization, though, perhaps, mostly of academic value.
  4. Chapter 4, Finding Things, by Tim Bray. Explores a few common, if somewhat disconnected, topics related to search. Interesting for a brief instroduction into Ruby language, and its comparison to Java in terms of performance (Java wins). Touches regular expressions, associative arrays, binary search and then quickly discusses web search. I find this chapter rather superficial and unfocused.
  5. Chapter 5, Correct, Beautiful, Fast (In That Order): Lessons From Designing XML Verifiers, by Elliotte Rusty Harold. Describes the author's quest to implement verification of XML names in an XML parser correctly, but without sacrificing performance. A few smart improvement eventually lead the author to a surprisingly simple and efficient solution. "If there's a moral to this story, it is this: do not let performance considerations stop you from doing what is right. You can always make the code faster with a little cleverness. You can rarely recover so easily from a bad design".
  6. Chapter 6, Framework for Integrated Test: Beauty through Fragility, by Michael Feathers. Describes the desing of Framework for Intergated Testing (FIT), an automated testing framework. Talks how this framework is different from most of other frameworks by being very open, flexible and extensible. This radical alternative to traditional framework design (with few well-defined points of extensibility) proves useful in its application domain (testing).
  7. Chapter 7, Beautiful Tests, by Alberto Savoia. Talks of the beauty of testing. Sets off by describing the infamous binary search implementation bug and continues to test the supposedly corrected implementation using JUnit testing framework. Demonstrates the concepts of smoke testing, boundary value testing, testing theories, mutation testing, performance testing. Conclusion:
    • Some tests are beautiful for their simplicity and efficiency
    • Other tests are beautiful because, in the process of writing them, they help you improve the code they are meant to test in subtle but important ways
    • Finally, some tests are beautiful for their breadth and thoroughness
  8. Chapter 8, On-the-Fly Code Generation for Image Processing. Demonstrates an unusual optimization technique, on-the-fly code generation for time-critical procedures. The trick is that some branching and calculations are performed at code-generation time (once) rather than at run time. The .NET-based example uses System.Reflection.Emit namespace classes to emit Intermediate Language instructions at run time. Interesting, if highly-specialized technique of limited general utility. Conclusion: "Now, you might regard FilterMethodIL as ugly, and I'd be willing to concede that it sure isn't the prettiest code I've ever seen. But when an algorithm clocks in at a quarter of the execution time of some earlier code, then the only word that I find appropriate is beautiful".
  9. Chapter 9, Top-Down Operator Precedence, by Douglas Crockford. The author creates a JavaScript language parser in JavaScript using "Top-Down Operator Precedence" technique. The chapter is beyond me; didn't understand much.
  10. Chapter 10, The Quest for an Accelerated Population Count, by Henry S. Warren, Jr. Describes the algorithm for "population count", i.e. counting the number of 1-bits in a byte or word. Employs interesting mathematical techniques to speed up the algorithm. The topic is a little outside of my expertise and interests.
  11. Chapter 11, Secure Communication: The Technology of Freedom, by Ashish Gulhati. Describes the evolution of Cryptonite, the author's web-based email system that emphasized PGP-based security. Interesting for describing the system's architecture and its evolution, as well as PGP overview. Ends with a philosophical pitch for secure communications as a basis for personal freedom.
  12. Chapter 12, Growing Beautiful Code in BioPerl, by Lincoln Stein. Describes the author's Perl module for graphics in DNA research. Interesting insight into framework design.
  13. Chapter 13, The Design of the Gene Sorter, by Jim Kent. Describes the Gene Sorter, a web-based gene display program (http://genome.ucsc.edu/cgi-bin/hgNear). While I have no doubt it is very useful, I don't share the author's opinion on its beauty. I don't think CGI scripts are beautiful at all, especially when written in C. Something that starts a new process on each page access and has to keep persistent state in a database doesn't apeal to me. The "cart" mechanism is a neat trick, but it just patches the inherent limitation of CGI architecture. The polymorphic column structure is also a neat trick, but why attempt object-oriented programming in C when there is C++? Text file-based configuration of columns doesn't look particularly beautiful to me either. Why filters are not SQL-based? Why read all data from the database and filter in memory when the database can return only relevant data? To summarize, I am not impressed. Tastes differ.
  14. Chapter 14, How Elegant Code Evolves With Hardware: The Case Of Gaussian Elimination, by Jack Dongarra and Piotr Luszczek. Describes the evolution of linear algebra libraries (LINPACK, LAPACK, ScaLAPACK) with the evolution of hardware. Informative how algorithms change with the hardware, but I fail to see the beauty of this arcane Fortran code.
  15. Chapter 15, The Long-Term Benefits of Beautiful Design, by Adam Kolawa. Describes the design of LAPACK linear algebra library and is related to the previous chapter. With all the due respect to this venerable library, it's hard to see the beauty in this highly specialized Fortran code.
  16. Chapter 16, The Linux Kernel Driver Model: The Benefits of Working Together, by Greg Kroah-Hartman. An interesting insight into how the Linux kernel represents hardware devices. Fine example of object-oriented programming in C. Stresses the typically Linuxish development approach "make it harder so people have to think very carefully" by explaining how no type checking is done on device structures. In author's words: "It keeps easy hacks from springing up within the kernel and forces everyone to be very exact in their logic". Also emphasizes the interative and collaborative nature of Linux development.
  17. Chapter 17, Another Level of Indirection, by Diomidis Spinellis. Explains the layering of FreeBSD I/O subsystem and how it achieves a very fine separation of concerns among the layers, with resulting flexibility and maintainability. Touches the subject of "domain-specific languages". Ends with a warning to not overuse layering approach, since "layers are for cakes, not for software".
  18. Chapter 18, Python's Dictionary Implementation: Being All Things to All People, by Andrew Kuchling. Explains the design of Dictionary type in Python. Based on general-purpose hash table, the type has several special cases for more efficient handling of common scenarios, such as small dictionaries (5 elements of fewer) and dictionaries with string-type keys. Discusses the approaches to collision handing (carefully tuned open addressing), resizing and iteration.
  19. Chapter 19, Multi-Dimensional Iterators in NumPy, by Travis E. Oliphant. Describes the design of iterators for multidimensional arrays in NumPy, a Python package. Implements clever tricks to iterate over arrays of arbitrary dimensions efficiently. Interesting, but I must admit I did not understand the feature the author calls "broadcasting".
  20. Chapter 20, A Highly Reliable Enterprise System for NASA's Mars Rover Mission, by Ronald Mak. Discusses the design of Collaborative Information Portal (CIP), a large web-based information sharing and management application used by NASA Mars Exploration Rover mission. Gives a good overview of Service-Oriented Architecture in Enterprise Java Beans based applications. Explains the details in a case study of the Streamer Service (CIP's component). Great article on enterprise application design. My favorite quote: "Martians have zero tolerance for ugly software".
  21. Chapter 21, ERP5: Designing for Maximum Adaptability, by Rogerio Atem de Carvalho and Rafael Monnerat. Describes the design of EPR5, an open source ERP application based on Zope platform. Very difficult to read, I didn't understand much.
  22. Chapter 22, A Spoonful of Sewage, by Bryan Cantrill. A very dynamic and well-narrated story of the author's hunt for an elusive bug in Solaris thread synchronization code. Explains the problem of priority inversion, its theoretical solution, and the bug in the code that was supposed to solve it. The resolution of the bug, although not utterly beautiful on the surface, is truly beautiful, the author argues, because "in the seven years ... no one has done better".
  23. Chapter 23, Distributed Programming with MapReduce, by Jeff Dean and Sanjay Ghemawat. An interesting explanation of how Google's MapReduce programming model works (break the task into subtasks, run them in parallel, combine results), mentioning some of the specific challenges it faces due to its sheer scale (e.g. failure of some computers).
  24. Chapter 24, Beautiful Concurrency, by Simon Peyton Jones. Now that the hardware evolution trend is to increase the number of CPU cores rather than clock speed, the article says, concurrency is very important. It is also hard. The article explans the numerous problems of lock-based concurrency management, concluding that "the fundamental shortcoming of lock-based programming is that locks and conditional variables do not support modular programming" (locking logic is hard to modularize and disentangle from application logic). The article then goes on to describe the concept of Software Transactional Memory (STM) and how it overcomes the shortcomings of locks. The concept is presented using Haskell programming language and a brief introduction to it is included. Interesting for the overview of concurrency problems, introduction to Haskell and STM. Whether STM can be practially implemented in more mainstream languages remains to be seen.
  25. Chapter 25, Syntactic Abstraction: The syntax-case Expander, by Kent Dybvig. Starts with a brief overview of syntactic abstraction in programming languages (ability to define custom syntax constructs, such as preprocessor macros in C). Goes on to describe a macro expansion algorithm in Scheme, which I fail to penetrate.
  26. Chapter 26, Labor-Saving Architecture: An Object-Oriented Framework for Networked Software, by William Otte and Douglas C. Schmidt. Talks about object-oriented framework design and explores the subject in detail by presenting an example of Logging Server Framework. A very useful and interesting practical case study of OO design.
  27. Chapter 27, Integrating Business Partners the RESTful Way, by Andrew Patzer. Describes an approach to Web Services that is more light-weight than the popular (if overhyped) SOAP architecture. This light-weight approach is based on exchange of XML-encoded requests and responses using simple GET or POST requests over HTTP. As the author puts it, "Although I didn't know it at the time, this architectural style is now commonly referred to as REST or Representational State Transfer". Interesting for introduction to REST. I would disagree with the author on some of the implementation choices, though, e.g. there must be a better way to determine the request type than looking for a specific long string within the request body (e.g. listing on page 454).
  28. Chapter 28, Beautiful Debugging, by Andreas Zeller. Talks about delta debugging, an approach aiming to isolate bugs automatically by isolating the source code change that caused a failure. The concept is then expanded to automatic debugging through isolation of program state change that caused a failure. Interesting, but I still have doubts if the approach can be practically useful.
  29. Chapter 29, Treating Code as an Essay, by Yukihiro Matsumoto. This is the shortest and the least practical essay in the book. In fact, it doesn't describe any real piece of code or any practical problem. Perhaps it is indicative that it is the only essay written by a Japanese author (the author of Ruby language). This concise article talks about the abstract nature of code beauty. The author defines code beauty as combination of brevity, familiarity, simplicity, flexibility and balance.
  30. Chapter 30, When a Button Is All That Connects You to the World, by Arun Mehta. Apparently, no book published in the US can be complete without touching the subject of persons with disabilities in some way. This article talks about the author's noble attempt to design a software system for professor Stephen Hawking, eminent theoretical physicist, who, due to a severe disability, can only press one button. This extreme limitation makes for tough design problems and smart solutions. This said, the code examples in Visual Basic do not seem very beautiful to me. The author seems to acknowledge this: "I might warn you that it [the code] bears some resemblance to a bowl of spaghetti".
  31. Chapter 31, Emacspeak: The Complete Audio Desktop, by TV Raman. Describes Emacspeak, a complete Emacs-based audio desktop for visually impaired users. It's interesting to rethink the UI fundamentals we take for granted. Among other things, the article is interesting for a brief introduction of Emacs Lisp "advice" facility and Aspect Oriented Programming. Another notable thing is how the choices the author made to make web content accessible "in an eyes-free environment" were later paralleled in mainstream to make the web accessible to automated processing (such as RSS feeds).
  32. Chapter 32, Code in Motion, by Laura Wingerd and Christopher Seiwald. The article talks about "Seven Pillars of Pretty Code", a set of rules for good code formatting, as practiced by the authors of Perforce software configuration management system.
  33. Chapter 33, Writing Programs for "The Book," by Brian Hayes. The chapter describes the author's quest for the perfect solution to a simple computational geometry problem: given three points in the plane, do all of the points lie along the same line? That's one of those problems that's easy to solve in a wrong or ugly way. After trying several approaches, the author finally finds the perfect algorithm. I can't help adding a personal comment: it's frustrating that many great programmers lack fundamental mathematical training. Had the author known a little more linear algebra, the solution would be obvious from the beginning.

1 comment:

Anonymous said...

Keep up the good work.