Article adaptation of Master’s thesis

In the last few weeks I have adapted my Master’s thesis into an article. While working with my results, which were very specific to a certain algorithm, I found that they should be easy to generalise. Together with my Master’s thesis advisor, I worked through this generalisation, ending with far stronger results than those in my thesis.

First, some simple background information. My thesis was about a small part of a query rewriting process, called Ontology Base Data Access (OBDA) rewriting. The full rewriting process rewrites a query (usually a Description logic formula) according to a set of axioms (eg., every house is a building, or everyone who has a child is a parent), before reformulating the result into a different language (usually SQL). The first of these stages is known as ontology rewriting, the second as unfolding. The complete rewriting process can be very time consuming, and so it is often useful to apply knowledge of how a certain query should be rewritten (stored in the form of what we call a perfect rewriting). We obtain such knowledge either through system analysis, or through storing the results of previous rewritings. The topic of my thesis was analysing the effect of applying perfect rewritings in combination with an ontology rewriting algorithm called the Tree-witness rewriting.

As we reread my results, we realised that we should be able to improve on them. My key result had been a set of observations about which perfect rewritings were most useful to apply in combination with the Tree-witness rewriting, and we felt that we should be able to quantify my results. While working on our improvements, we soon realised that the mechanics of the ontology rewriting were not relevant to the way we measured the usefulness of a perfect rewriting. Instead of trying to predict the savings achieved when applying a perfect rewriting, we could measure this saving by applying the rewriting algorithm.

Again, some theory. Rewriting a query in an OBDA system can result in an exponential increase in query size. In most applications, the resulting query contains much redundancy, and can be shortened significantly, at the cost of much computation. If we happen to know how a query is rewritten, we say we have a perfect rewriting of that query, and we do not need to rewrite it again. Another interesting situation is when we have a perfect rewriting for a part of a query. It turns out that, under certain conditions, it is possible to apply a perfect rewriting to parts of a query. The result is that we have to rewrite a smaller query, which is very good news considering how poorly OBDA rewriting scales with query size.

The realisation that came upon us was that the exact gain of applying a perfect rewriting was reflected by how much work went into finding that perfect rewriting in the first place. By quantifying this work, we could create a measure that could order perfect rewritings according to how useful they would be. Why is this important? Because it is not always possible to apply any combinations of perfect rewritings. We may have to chose between different combinations, and when we do, it is very good to have a measure of their usefulness.

The final paper, which is more a continuation of my Master’s thesis that an adaptation, was submitted to the 28th International Workshop on Description Logics.