PhD Project Summary
Retrofitting strong type checks onto weakly typed languages.
is the single-sentence explanation of my PhD topic.
Many large scale software projects are continue to be developed using languages with very weak type systems. These languages, such as C and Java, were developed a long time ago, when many of the techniques we know of today were either not known yet, seen as too arcane, or inefficient.1
These languages are very much still relevant, both because the redevelopment of software in a new language is expensive and because they offer other features that make them useful int he domains they were designed for.
Weak type systems are only able to express limited, simple relationships between data. Programs which require more complex data interactions must circumvent the type system to achieve them. This carries inherent risks, as the type system can no longer guarantee safety. Especially larger applications often require complex type interactions to achieve modularity and extensibility. At the same time it is very difficult to manually ensure that these are safe, especially in large code bases.
There are already a plethora of tools for checking certain properties of C, such as memory safety. We can loosely place these tools on a spectrum from thorough to fast. Generally speaking the more sophisticated the analysis, the longer it takes. Thorough techniques, such as Model Checking, which simulates a large part of all possible program executions, can give very strong guarantees for the safety and correctness of a program. These are usually applied only on small, highly safety critical programs, because they need a lot of resources and time to compute. They are not well suited for large code bases and agile development practices, as are common these days.
The goal of my PhD is to bring some of the simpler, advanced techniques that are common in most modern languages, such as polymorphism, to languages such as C. This tool would be targeted to doing fast on-line analysis, providing feedback directly during the development process, or perhaps slightly slower as part of a code review routine.
I want to achieve this by transforming code, reifying polymorphic type variables and making their manipulations explicit. Then locations where data may be used in a not type-safe way are identified. In a third step the transformed program is sliced. This means most of the program is discarded, leaving only what may influence the types in the type-unsafe locations. This reduced program is fed to a Symbolic Execution Engine, which simulates the execution and computes the precise shape of the data at runtime. If the shape is as the code expects, we know it is in fact safe, if it is not, we can inform the programmer.
Some current languages are also designed with weak type systems, such as Go. It seems we just do not learn.↩
I am a member of the following research groups:
My research interests span several, loosely connected areas.
My undergraduate (and masters) studies revolved mainly around Parallel and Distributed Computing. What tickled my fancy in particular is how to design simple and safe abstractions on top of the complicated protocols that are necessary to make parallel and distributed programs behave in predictable ways.
This area, which I'll broadly call Abstraction and API Design is the second of my areas of interest. I say broadly because, for me, this encompasses designing library interfaces as well as entire languages and language features. During my masters studies I was involved with this as a student assistant working on Compiler Construction. One subarea here that I'd like to note in particular is Functional Programming which I find to be good basis for API and language design. It is extraordinarily expressive, flexible but also has a declarative nature, describing intent rather than method. In my opinion this not only makes code more readable, but also easier to transform and optimise.
Last but not least I am interested in program analysis, in particular Type Systems. It's useful to make programs safer, but is also a tool that can be leveraged when designing API's as it can guide the users towards correct usage.