Guide on how to deal with large software systems:

In this piece, I share a short guide to dealing with a large software system, told alongside my introduction to and experience with the OpenEthereum codebase. Before I begin, a shout-out to Fan Long for helping me with the theoretical design and the project implementation (more on the project after the paper is out).

My colleagues and I designed a blockchain state sharding protocol capable of handling arbitrary cross-shard smart contract transactions. Then came my turn to implement the design on top of the Ethereum protocol by modifying OpenEthereum. The implementation language is Rust, which provides built-in memory safety guarantees. While getting first-hand experience with OpenEthereum, I realized the need for a guide to dealing with large software systems that do not have extensive documentation. Since most open source codebases do not come with such a guide, it is up to the individual to explore and understand the system. Thus, with this short article, I try to summarise three key practices that will help a first-time systems programmer:

  1. Be absolutely sure about what is going on: although I was familiar with the theory behind the Ethereum protocol, I had no idea about its implementation. As it turns out, the code is organized into 13 crates, each consisting of libraries and sub-modules. In a codebase of this size, you do not want to modify any line that you are not sure about. I emphasize this because one module is usually linked to several others, so it is very easy to break the entire system with a small, uninformed change. After looking through online resources, all I could find was a YouTube video briefly explaining each module, which was not very helpful. Then I realized that one should not depend on others to understand open source code. The whole system is available to you, so hunt down everything that you can. Believe me, you can do it. For example, one helpful resource that I found soon after was the Rust documentation generated by cargo (the cargo doc command). Although it was not detailed, it came in handy every time I wanted to dig into a certain module.

    When it comes to understanding object-oriented code, my advice is to use the best IDE (integrated development environment) available to you. If you want to challenge this advice, let me know upfront and we can bet on it. I used CLion by JetBrains (the best one for Ubuntu, in my view). It lets you perform global searches throughout the project, which accelerates the learning process. On top of this, you do not have to worry about memorizing syntax, and in some cases you do not even have to know the language well, because the IDE fills in the details for you. All you need is to understand the underlying concepts of the language, such as its memory allocation design.

  2. Do not make changes: once you understand the purpose of a class or a module, try to add entities on top instead of changing the existing code, i.e. no deletions. For instance, try to extend the methods and implementations of an existing class instead of creating a new trait or a new struct. Such additive modifications keep interference with the existing implementation to a minimum. If anything goes wrong, you know what was recently added and thus where to look.
  3. When in doubt, println!: languages like Rust provide crates, generally known as loggers, that let you print conditionally during code execution. If anything goes wrong, which it will, do not try to guess the cause in your head. Instead, add log statements at intermediate points in the code, run again, observe, and debug. Unless compile time is very long, follow this empirical strategy for optimized performance; optimized because it makes debugging enjoyable and lets you work for longer stretches without feeling frustrated.

    If you are working on optimizing the performance of the program, print intermediate timestamps throughout the program until you find the performance bottleneck. If you want to avoid lots of print messages, use binary search to find the bottleneck: time the two halves of the suspect region, then repeat on the slower half. This cuts the number of places you have to instrument from O(n) to O(log(n)).
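The timing advice in practice 3 can be sketched in a few lines of Rust using only the standard library. This is a minimal illustration and not code from OpenEthereum: `timed`, `bisect_bottleneck`, and the simulated per-stage costs are all hypothetical names and data made up for the example. It also assumes that a single stage dominates the run time; if the cost is spread evenly, bisection will not point at anything useful.

```rust
use std::ops::Range;
use std::time::{Duration, Instant};

/// Runs `stage`, prints how long it took to stderr, and returns both the
/// stage's result and the elapsed time. (Hypothetical helper for this sketch.)
fn timed<T>(label: &str, stage: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = stage();
    let elapsed = start.elapsed();
    eprintln!("[timing] {label}: {elapsed:?}");
    (result, elapsed)
}

/// Narrows down the slowest of `stages` consecutive stages by measuring the
/// two halves of the current range and descending into the more expensive one.
/// `measure` is an assumed callback that returns the cost of running the
/// stages in the given range (in real code it would wrap them in `timed`).
fn bisect_bottleneck(
    stages: usize,
    mut measure: impl FnMut(Range<usize>) -> u128,
) -> Option<usize> {
    if stages == 0 {
        return None;
    }
    let (mut lo, mut hi) = (0, stages);
    while hi - lo > 1 {
        let mid = lo + (hi - lo) / 2;
        if measure(lo..mid) >= measure(mid..hi) {
            hi = mid; // the first half is at least as slow: look there
        } else {
            lo = mid; // the second half is slower
        }
    }
    Some(lo)
}

fn main() {
    // Simulated per-stage costs in microseconds; stage 5 is the hot spot.
    let costs: [u128; 8] = [3, 2, 4, 1, 2, 90, 3, 2];
    let mut probes = 0;
    let slowest = bisect_bottleneck(costs.len(), |range| {
        probes += 1;
        costs[range].iter().sum()
    });
    println!("slowest stage: {slowest:?} after {probes} measurements");

    // Timing a real piece of work looks like this.
    let (total, _elapsed) = timed("sum_costs", || costs.iter().sum::<u128>());
    println!("total simulated cost: {total}");
}
```

On the simulated costs the search settles on stage 5 after three rounds of halving. In a real codebase each measurement means another run of the program, so this only pays off when instrumenting every stage at once would produce too much output to read.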

Now that you have successfully added your first modification to the code, repeat the above steps anywhere from 2 (for fun) to 20 (module-level changes) to 120 (project-level changes) times, depending on your goal, until you finish your desired implementation.
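As a reminder of what an additive modification in the spirit of practice 2 can look like, here is a minimal Rust sketch. The `Account` struct and its methods are hypothetical stand-ins, not types from OpenEthereum. The sketch relies on the fact that Rust allows several `impl` blocks for the same type within a crate, so new methods can live in their own block while the existing one stays untouched.

```rust
// Hypothetical stand-in for a struct that already exists in the codebase.
pub struct Account {
    balance: u64,
}

// Existing code: left exactly as it was found.
impl Account {
    pub fn new(balance: u64) -> Self {
        Account { balance }
    }

    pub fn balance(&self) -> u64 {
        self.balance
    }
}

// New code: a separate impl block that only adds behaviour.
// If something breaks, this block is the first place to look.
impl Account {
    /// Returns the balance that would remain after debiting `amount`,
    /// or `None` if the account cannot cover it. Does not modify the account.
    pub fn balance_after_debit(&self, amount: u64) -> Option<u64> {
        self.balance.checked_sub(amount)
    }
}

fn main() {
    let account = Account::new(100);
    println!("current balance: {}", account.balance());
    println!("after debiting 30: {:?}", account.balance_after_debit(30));
    println!("after debiting 101: {:?}", account.balance_after_debit(101));
}
```

Keeping additions in their own block (or their own file) also makes the diff against upstream easy to read, since every changed line is a new line.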

Happy programming!