Abdul Qadir
How do we easily and scalably patch hundreds of thousands of lines of source code? Read about how we used a simple yet powerful data structure, the Abstract Syntax Tree (AST), to build a system that, from a single central point, maps source code dependencies and in turn patches all of them.
A software system is usually built with assumptions about how its dependencies, such as the underlying language, frameworks and libraries, behave. Changes in these dependencies can ripple into the software system itself. For example, the popular Python package pandas recently released version 1.0.0, which deprecated or changed several features that existed in version 0.25.x. An organization may have many systems using pandas 0.25.x, so upgrading to 1.0.0 requires the developers of every system to go through the pandas release notes and patch their code accordingly.
Almost every language has a way to generate an AST from its code. We use Python to build several critical parts of our systems, so this article uses Python for its examples, but the lessons apply to any other language as well.
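A minimal illustration using Python's built-in `ast` module: parse a two-line program and dump its tree. The `indent` argument of `ast.dump` requires Python 3.9 or newer, and the exact dump text differs slightly between Python versions.

```python
import ast

code = "var = 1\nprint(var)\n"

# Parse the source text into an AST; the root node is an ast.Module
tree = ast.parse(code)

# Compact, single-line dump of the whole tree
print(ast.dump(tree))

# Indented ("prettified") dump; the indent argument needs Python 3.9+
print(ast.dump(tree, indent=4))
```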
Looking at the ast.dump output, we can see that the root object, of type Module, has an attribute body whose value is a list of 2 nodes: one representing var = 1 and the other representing print(var). The first node, an Assign node representing var = 1, has a targets attribute (a list, since Python allows chained assignments such as a = b = 1) holding the LHS var, and a value attribute holding the RHS 1. Let's see if we can print the RHS.
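A short sketch of that, assuming Python 3.8 or newer, where number literals are represented by `ast.Constant` (older versions used `ast.Num` with an `n` attribute):

```python
import ast

tree = ast.parse("var = 1\nprint(var)\n")

# tree.body[0] is the ast.Assign node for `var = 1`
assign = tree.body[0]

# LHS: targets is a list; its single element is an ast.Name
print(assign.targets[0].id)

# RHS: an ast.Constant whose .value is the Python object 1
print(assign.value.value)
```

This prints `var` and then `1`.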
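ASTs can also be modified and turned back into code. The sketch below, which assumes Python 3.9+ for `ast.unparse`, changes the RHS of var = 1 to 2 and regenerates the source:

```python
import ast

tree = ast.parse("var = 1\nprint(var)\n")

# Modify the AST in place: change the RHS constant of `var = 1` from 1 to 2
tree.body[0].value.value = 2

# Regenerate source code from the modified AST (ast.unparse needs Python 3.9+)
new_code = ast.unparse(tree)
print(new_code)
```

This prints `var = 2` followed by `print(var)`. Note that `ast.unparse` does not preserve comments or the original formatting of the code.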
Now that we understand ASTs and how to generate, inspect and modify them and re-create code from them, let's go back to the problem of writing patch scripts that modify a system's code to use pandas 1.0.0 instead of pandas 0.25.x. We call these AST-based patch scripts “IntelliPatch”.
The IntelliPatch needs to do the following:
Below is the IntelliPatch script that does that.
intelli_patch.py
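A hypothetical, minimal sketch of an IntelliPatch (not Soroco's intelli_patch.py): pandas 1.0.0 removed the long-deprecated `as_matrix()` method of DataFrame and Series, and `to_numpy()` (available since pandas 0.24) is its replacement. The sketch uses an `ast.NodeTransformer` and assumes Python 3.9+ for `ast.unparse`; the names `AsMatrixToNumpy` and `intelli_patch` are our own.

```python
import ast


class AsMatrixToNumpy(ast.NodeTransformer):
    """Rewrite zero-argument `<expr>.as_matrix()` calls to `<expr>.to_numpy()`."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        # Visit children first so nested calls such as f(df.as_matrix()) are patched
        self.generic_visit(node)
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "as_matrix"
            and not node.args
            and not node.keywords
        ):
            # as_matrix(columns=...) has no direct to_numpy() equivalent, so only
            # calls without arguments are rewritten; the rest are left for review
            func.attr = "to_numpy"
        return node


def intelli_patch(source: str) -> str:
    """Parse `source`, apply the transformer and regenerate code (Python 3.9+)."""
    tree = AsMatrixToNumpy().visit(ast.parse(source))
    return ast.unparse(tree)


legacy_code = (
    "import pandas as pd\n"
    "df = pd.DataFrame({'a': [1, 2]})\n"
    "values = df.as_matrix()\n"
    "subset = df.as_matrix(columns=['a'])\n"
)
print(intelli_patch(legacy_code))
```

In the printed result, `values = df.to_numpy()` is patched while the `as_matrix(columns=...)` call is left for a human, since `to_numpy()` takes no `columns` argument. Because the type of `df` cannot be known statically, the transformer rewrites every zero-argument `.as_matrix()` call, which is one more reason its changes need human review.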
One can extend the patch script to handle all of the backward incompatibilities in pandas 1.0.0, and then write an outer function that goes through every Python file of a system, reads its code, patches it and writes it back to disk.
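One possible shape for that outer function, as a sketch: `patch_directory` is a hypothetical helper that accepts any patch function mapping source text to source text. Because a round trip through `ast.unparse` reformats code and drops comments, the sketch compares ASTs rather than raw text and only rewrites files whose code actually changed.

```python
import ast
from pathlib import Path
from typing import Callable, List


def patch_directory(root: str, patch: Callable[[str], str]) -> List[Path]:
    """Apply `patch` to every .py file under `root`; return the files rewritten."""
    changed: List[Path] = []
    for path in sorted(Path(root).rglob("*.py")):
        original = path.read_text(encoding="utf-8")
        patched = patch(original)
        # Compare ASTs instead of raw text: ast.dump ignores comments, formatting
        # and line numbers, so files whose code did not actually change are skipped.
        if ast.dump(ast.parse(patched)) != ast.dump(ast.parse(original)):
            path.write_text(patched, encoding="utf-8")
            changed.append(path)
    return changed
```

It could be called as `patch_directory("path/to/system", my_patch)`, where `my_patch` is any IntelliPatch-style function. Files that are rewritten still lose their comments, so the result must be reviewed before committing.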
It is important that a developer review the changes made by an IntelliPatch before committing them. For example, if the code is hosted on git, the developer should run git diff and review its output.
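For code that is not under version control, Python's standard `difflib` module can produce a similar unified diff for review. A minimal sketch, with `review_diff` as a hypothetical helper name:

```python
import difflib


def review_diff(original: str, patched: str, filename: str = "file.py") -> str:
    """Return a unified diff between the original and patched source, git-style."""
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            patched.splitlines(keepends=True),
            fromfile=f"a/{filename}",
            tofile=f"b/{filename}",
        )
    )


print(review_diff("values = df.as_matrix()\n", "values = df.to_numpy()\n"))
```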
At Soroco, we have written five IntelliPatch scripts so far, which were run on 10 systems. Each script successfully parsed and patched about 150,000 lines of code across these 10 systems. In terms of productivity, this effort took one of our engineers three full days to complete. This engineer learnt about ASTs before implementing these solutions.
Of the five scripts, one was unique: a code scrubber rather than a traditional patch. The need arose because an external party wanted to review the outline of the code without seeing its actual logic and specifics. Hence, we wrote a scrubber that strips out logic and other key elements of the code while retaining only the imports, class and function definitions, docstrings, type annotations and some very specific information required for the review. The AST thus proved to be a valuable tool for building a code scrubber as well.
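A simplified, hypothetical sketch of such a scrubber, assuming Python 3.9+: at module and class level it keeps only the docstring, imports, function and class definitions and annotated declarations (with their values dropped), and it replaces every function body with its docstring plus `...`. Decorators and default argument values are left untouched, so a real scrubber would likely need to hide those too.

```python
import ast

# Statements that survive scrubbing at module and class level
KEPT_STATEMENTS = (
    ast.Import, ast.ImportFrom, ast.FunctionDef,
    ast.AsyncFunctionDef, ast.ClassDef, ast.AnnAssign,
)


def _is_docstring(stmt: ast.stmt) -> bool:
    return (
        isinstance(stmt, ast.Expr)
        and isinstance(stmt.value, ast.Constant)
        and isinstance(stmt.value.value, str)
    )


def _ellipsis() -> ast.Expr:
    return ast.Expr(value=ast.Constant(value=Ellipsis))


def _outline(body: list) -> list:
    """Keep a leading docstring and outline-level statements, drop the rest."""
    kept = [
        stmt for i, stmt in enumerate(body)
        if isinstance(stmt, KEPT_STATEMENTS) or (i == 0 and _is_docstring(stmt))
    ]
    for stmt in kept:
        if isinstance(stmt, ast.AnnAssign):
            stmt.value = None  # keep `name: Type`, hide the assigned value
    return kept or [_ellipsis()]


class Scrubber(ast.NodeTransformer):
    def visit_Module(self, node: ast.Module) -> ast.Module:
        node.body = _outline(node.body)
        self.generic_visit(node)
        return node

    def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
        node.body = _outline(node.body)
        self.generic_visit(node)
        return node

    def visit_FunctionDef(self, node):
        # Keep the signature (with annotations) and docstring; replace the logic
        docstring = [node.body[0]] if _is_docstring(node.body[0]) else []
        node.body = docstring + [_ellipsis()]
        return node

    visit_AsyncFunctionDef = visit_FunctionDef


def scrub(source: str) -> str:
    return ast.unparse(Scrubber().visit(ast.parse(source)))
```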
Now that we have seen how useful ASTs can be for writing intelligent patch scripts, this section explains how they can be used to assess code quality.
variable_name_check.py
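A hypothetical, minimal sketch of such a check (not the variable_name_check.py script itself). It assumes the convention is snake_case for assigned variable names, with UPPER_CASE allowed for constants, and relies on the fact that a name being assigned to appears as an `ast.Name` with an `ast.Store` context. Function parameters (`ast.arg` nodes) and attribute targets such as `self.x` are not checked.

```python
import ast
import re

# Assumed convention: snake_case variables, or UPPER_CASE for constants
VALID_NAME = re.compile(r"^(?:[a-z_][a-z0-9_]*|[A-Z_][A-Z0-9_]*)$")


def check_variable_names(source: str) -> list:
    """Return sorted (line, name) pairs for assigned names that break the convention."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        # An ast.Name with a Store context is a name being assigned to:
        # assignment targets, for-loop variables, `with ... as x`, etc.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if not VALID_NAME.match(node.id):
                problems.append((node.lineno, node.id))
    return sorted(problems)


print(check_variable_names("myVar = 1\nMAX_SIZE = 10\nfor _ in range(3):\n    total = myVar\n"))
```

For this input the check reports only `(1, 'myVar')`: `MAX_SIZE`, `_` and `total` follow the assumed convention, and the second `myVar` is a read (`ast.Load`), not an assignment.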
The usefulness of ASTs extends far beyond what this article covers. For example, the ASTs of the files in a system can be used to create a call graph. A call graph recorded at run time covers only the code paths that actually executed, whereas a call graph built statically from ASTs also includes code paths that never ran, which makes it far more comprehensive (although dynamic features such as getattr or callbacks can still hide some calls). The call graph can then be used to generate human-readable documentation of the system. We have built such functionality at Soroco, which we call “LiveDoc”, but that is a topic for another article 🙂
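As a rough, hypothetical sketch of the first step, the function below builds a purely name-based call graph for a single module. It records method calls by attribute name only, attributes calls made inside nested functions to the enclosing function as well, and cannot see dynamic calls, so it illustrates the idea rather than how LiveDoc works.

```python
import ast
from collections import defaultdict


def build_call_graph(source: str) -> dict:
    """Map each function defined in `source` to the sorted names it calls."""
    graph = defaultdict(set)
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        callees = graph[func.name]  # creates an entry even if nothing is called
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                if isinstance(node.func, ast.Name):          # foo()
                    callees.add(node.func.id)
                elif isinstance(node.func, ast.Attribute):   # obj.foo()
                    callees.add(node.func.attr)
    return {name: sorted(names) for name, names in graph.items()}


example = "def a():\n    b()\n    helper.run()\n\ndef b():\n    pass\n"
print(build_call_graph(example))
```

For `example`, the result maps `a` to `['b', 'run']` and `b` to an empty list.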