School of Computing

Suggested PhD Projects

Semantics in Genetic Programming

Contact: Colin Johnson

Genetic Programming (GP) is a method of problem solving that takes its inspiration from biological evolution. Specifically, GP is concerned with evolving a population of programs (or other executable structures) to solve a specific problem. GP has been successful in a number of areas, e.g. data mining, bioinformatics, game AI, circuit design, symbolic regression, and many others.

In GP the semantics of programs are important, i.e. what programs actually do, as opposed to their syntax, e.g. their text. Much work in GP has focused on the syntactic aspects and neglected the semantic ones. In the last few years, however, a substantial amount of work has been carried out on semantic aspects of GP, both at Kent and elsewhere. For example, we can study the semantic effects of evolutionary operators (e.g., does the exchange of text between two programs result in a program that has aspects of the behaviour of the two parent programs?), the semantic diversity of the population (e.g., is the initial population semantically diverse? How does this semantic diversity change over an evolutionary run?), and aspects concerned with fitness (e.g., can we define a fitness function in terms of the desired behaviour of programs and measure the distance of a program from that ideal semantics?).
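As a minimal sketch of the fitness question, assume that a program's semantics can be approximated by the vector of outputs it produces on a fixed set of sample inputs. That sampling choice, the Euclidean distance, and all names below (`semantics`, `semantic_distance`, the example programs) are illustrative assumptions rather than a method prescribed by this project.

```python
from typing import Callable, List, Sequence

Program = Callable[[float], float]


def semantics(program: Program, inputs: Sequence[float]) -> List[float]:
    """Approximate a program's semantics by its outputs on sample inputs."""
    return [program(x) for x in inputs]


def semantic_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two semantic vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


# Hypothetical target behaviour: f(x) = x**2 + x.
inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
target = semantics(lambda x: x * x + x, inputs)

# Two syntactically different programs with identical behaviour ...
prog_a = lambda x: x * (x + 1)
prog_b = lambda x: x + x * x
# ... and a syntactically similar program with different behaviour.
prog_c = lambda x: x * (x - 1)

dist_a = semantic_distance(semantics(prog_a, inputs), target)
dist_b = semantic_distance(semantics(prog_b, inputs), target)
dist_c = semantic_distance(semantics(prog_c, inputs), target)
```

Here `prog_a` and `prog_b` differ as text but are at distance zero from the target, whereas `prog_c` differs from `prog_a` by a single symbol yet is semantically far from it.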

There is a lot of scope for exploring different aspects of this. A good starting point would be to look at the papers that I have written with my research student Lawrence Beadle (e.g. on semantic initialisation, crossover and mutation), and the work by groups in Dublin and Poznan.

Deep Genetic Programming

Contact: Colin Johnson

In the last few years, deep learning using neural networks has become the most prominent technique in machine learning (see the Goodfellow/Bengio/Courville book.) It could be argued that the success of deep learning has little to do with neural networks as such; it is the development of deep, layered representations that is important, together with means of training these. In particular, there are powerful ideas such as stacked autoencoders, where layers of unsupervised representation extract increasingly complex representations in a layered fashion before finally applying a supervised process to solve a particular problem; these are concerned with training regimes rather than neural networks as such.

This opens an interesting area for investigation, which is the development of deep learning algorithms that use representations other than neural networks. One idea would be to use code fragments, in the tradition of methods such as genetic programming (see the GP Field Guide). The aim of this project would be to investigate the power of combining GP-like representations with deeply layered representations and training regimes from the deep learning tradition. We would aim to answer questions such as whether these approaches can produce results on standard test problems that are as accurate as, or more accurate than, neural-based representations, whether they produce more comprehensible outputs, and whether the models learned are smaller than the neural models.

This leads on to a deeper investigation, which is whether this approach can be used to learn models that combine pattern recognition and reasoning methods. Current deep learning methods are good at learning to recognise patterns in e.g. visual images, but weak at problems that require e.g. arithmetical or logical reasoning. One approach would be to create deep learning methods that have “recognition” layers and “reasoning” layers; the challenge then is to devise appropriate training regimes. For the philosophically inclined, we can see this as a challenge of how to combine different kinds of knowledge: how to combine a posteriori knowledge (from experience) with a priori knowledge (independent of experience), and in particular Kant’s notion of synthetic a priori concepts. There might be some useful clues from the philosophical literature about how to represent and combine these ideas.

Some related ideas can be found in the recent paper by Lino Rodriguez-Coayahuitl, Alicia Morales-Reyes, Hugo Jair Escalante (Structurally Layered Representation Learning: Towards Deep Learning through Genetic Programming, Proceedings of the 2018 European Conference on Genetic Programming).

Giving Artificial Intelligence a Mind’s Eye

Contact: Colin Johnson

"If you dangle a rat by the tail, which is closer to the ground, its nose or ears?" (Shanahan, 2015). This is a trivial question for a person, but a very difficult question for a computer. A person would solve the problem by visualising the scenario in their mind’s eye. The aim of this project would be to give this capacity of mind’s eye visualisation to an artificial intelligence system, and test the effectiveness of that system.

One way to represent this mind’s eye representation would be for the system to represent what it learns as objects in a game engine such as Unity. The idea would be that we train a system not just to associate labels with objects, but to associate rich, active models, including physical models and models of behaviour. Then, we could answer questions about the world by setting up those models in the game engine, and letting them interact.

There are a number of possible ways to test such a system. One might be computational thinking systems such as Bebras, because these exemplify exactly what we want AI systems to do: bridge the gap between informal descriptions of tasks and computational descriptions.

The recent papers by Kunda (2018) and Ha and Schmidhuber (2018) give useful background for this project and are a good starting point for understanding more.

References
David Ha and Jürgen Schmidhuber (2018) World Models.
Maithilee Kunda (2018) Visual mental imagery: A view from artificial intelligence. Cortex, 105:155–172.
Murray Shanahan (2015), The Technological Singularity, MIT Press.

Machine Learning of Adjectives

Contact: Colin Johnson

Recent advances in machine learning have focused on two topics. One is classification problems—associating labels with data. Another is learning behaviours through reinforcement learning. Looked at from a linguistic perspective, we can see these as being the nouns and verbs in language. However, a nuanced view of the world requires other components of language, in particular adjectives and adverbs; and ways of composing these components.

There has been some work in AI and machine learning on specific aspects of adjectival and adverbial descriptions. For example, there is work in computer vision on learning the concept of “uprightness”, and on ideas of texture recognition; in audio processing there is some work on timbre spaces. However, there hasn’t been a systematic attempt to investigate the capacity of machine learning to detect adjectives, or to build new learning approaches that are particularly suited to adjectival learning. To do so would add much richness to AI systems.

Therefore, the aim of this project is to build an appropriate dataset of data with adjectival descriptions, to test the capacity of existing machine learning methods to learn them, and by reflecting on the successes and failures of those tests to devise new machine learning methods that are specifically adapted to adjectives and adverbs.

One specifically interesting set of continuum adjectives is those that describe distance from a desirable state. Examples include the idea of “blurriness” in visual images and the idea of “clarity” in sound. If a system could be devised that could learn these concepts, then it could be used to build fast learning systems that move from a less desired to a more desired state (e.g. cleaning a noisy sound recording). There is a sketch of how this might work in this paper: https://kar.kent.ac.uk/69595/

Genetic programming for temperature weather derivatives

Contact: Michael Kampouridis

Weather derivatives are financial instruments used as part of a risk management strategy to reduce risk associated with adverse or unexpected weather conditions. Just like traditional contingent claims, whose payoffs depend upon the price of some fundamental, a weather derivative has an underlying measure, such as rainfall, temperature, humidity, or snowfall. However, in the majority of weather derivatives, the underlying is a temperature index. Hence, the proposed work will focus on temperature weather derivatives.
The problem of temperature weather derivatives can be divided into two main parts: (i) temperature prediction, and (ii) pricing of weather derivative contracts. This project will use an evolutionary approach, called Genetic Programming (GP), to predict future temperature and derive pricing equations. GP is a nature-inspired algorithm, which uses the principle of natural evolution to find computer programs that perform well in a given task. One of the main advantages of GP is its ability to perform well in high-dimensional combinatorial problems, such as weather derivative pricing.
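To make the notion of a temperature index concrete, the sketch below assumes the commonly used heating degree days (HDD) index with an 18 °C base, and a simple call option whose payoff is a tick size times the amount by which the index exceeds a strike. The choice of index, the base temperature, the contract form, and the function names are assumptions for illustration; the project description does not fix any of them.

```python
from typing import Sequence


def heating_degree_days(daily_avg_temps: Sequence[float], base: float = 18.0) -> float:
    """Sum, over all days, of how far the daily average falls below the base."""
    return sum(max(0.0, base - t) for t in daily_avg_temps)


def call_payoff(index_value: float, strike: float, tick_size: float) -> float:
    """Payoff of a call on the index: tick_size per index point above the strike."""
    return tick_size * max(0.0, index_value - strike)


# Hypothetical daily average temperatures in degrees Celsius.
temps = [10.0, 12.5, 18.0, 20.0, 15.0]
hdd = heating_degree_days(temps)  # 8.0 + 5.5 + 0.0 + 0.0 + 3.0
payoff = call_payoff(hdd, strike=10.0, tick_size=20.0)
```

In this framing, part (i) of the problem supplies the predicted temperatures and part (ii) values contracts written on the resulting index.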


Financial forecasting with directional changes

Contact: Michael Kampouridis

In the aftermath of a global financial crisis, it is more important than ever to have a better understanding of the markets and be able to forecast their movement. Directional changes (DC) is a new concept based on the idea that an event-based system can capture significant points in price movements that traditional physical-time methods cannot. For instance, anyone using daily closing prices would never notice the Dow Jones Industrial Average flash crash of 6 May 2010, in which an almost 1000 point loss (about 9%) took place, only for most of those losses to be recovered within minutes. Hence, instead of looking at the market from an interval-based perspective, it is proposed to record the key events in the market (e.g., changes in the stock price by a pre-specified percentage), and summarise the data based on these events.
This project will use Genetic Programming (GP) methods to create trading strategies. GP is an evolutionary technique inspired by natural evolution, where computer programs act as the individuals of a population. GP has been extensively used in the past for financial forecasting, and has shown it is able to identify patterns in financial data.
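The event-based summary described above can be sketched as follows. This is a minimal illustration, assuming a single threshold `theta` expressed as a fraction (e.g. 0.05 for 5%) and a plain list of prices; the function name and the example series are hypothetical.

```python
from typing import List, Optional, Tuple


def directional_change_events(prices: List[float], theta: float) -> List[Tuple[int, str]]:
    """Return (index, kind) pairs at which a directional change is confirmed.

    A "down" event is confirmed when the price has fallen by at least theta
    from the highest price of the current upward run; an "up" event when it
    has risen by at least theta from the lowest price of the current
    downward run.
    """
    events: List[Tuple[int, str]] = []
    if not prices:
        return events
    high = low = prices[0]
    trend: Optional[str] = None  # unknown until the first event is confirmed
    for i, price in enumerate(prices):
        if trend in (None, "up") and price <= high * (1 - theta):
            trend = "down"
            low = price  # start tracking the low of the new downward run
            events.append((i, "down"))
        elif trend in (None, "down") and price >= low * (1 + theta):
            trend = "up"
            high = price  # start tracking the high of the new upward run
            events.append((i, "up"))
        else:
            high = max(high, price)
            low = min(low, price)
    return events


prices = [100.0, 102.0, 101.0, 96.0, 95.0, 97.0, 100.0, 104.0, 103.0]
events = directional_change_events(prices, theta=0.05)
```

With these example prices, the series is summarised by two events, a downturn confirmed at index 3 and an upturn confirmed at index 6, regardless of how much clock time separates the observations.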

Quantum Stochastic Thermodynamics of Computation

Contact: Dominique Chu

Any sort of computation done in the real world must correspond to a manipulation of some concrete physical system. In practice this is often electronic hardware, but in principle computation can be implemented with many different media, including chemical systems and mechanical devices. Any such manipulation requires physical resources in that it takes some time to complete, consumes energy (work) and increases the entropy of the world. Depending on how the computation is implemented, more or less of these resources is consumed. The key question of this project is what the minimum amount of resources needed to perform a computation is.

This interdisciplinary project will investigate minimal energy requirements of computation, probing fundamental physical limits. This will be done using the framework of quantum stochastic thermodynamics, which will be applied to models of minimal computers in order to derive fundamental limits to the resource usage of computation.

The project will be suitable for a physics graduate with interest in computation. It will require the willingness and ability to learn the basics of quantum stochastic thermodynamics, as well as the fundamentals of theoretical computer science.

Understanding Spiking Neural Networks

Contact: Dominique Chu

Spiking Neural Networks (SNN) are brain-like neural networks. Unlike standard rate coding neural networks, signals are encoded in time. This makes them ideal for processing data that has a temporal component, such as time-series data, video or music. Another advantage of SNNs is that there exists neuromorphic hardware that can efficiently simulate SNNs.
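As an illustration of what encoding signals in time can mean, the sketch below simulates a single leaky integrate-and-fire neuron with a forward Euler update. The neuron model, the parameter values, and the function name are assumptions chosen for brevity; the project itself does not commit to a particular neuron model.

```python
from typing import List, Sequence


def lif_spike_times(
    input_current: Sequence[float],
    dt: float = 1.0,
    tau: float = 10.0,
    threshold: float = 1.0,
    v_reset: float = 0.0,
) -> List[int]:
    """Return the time-step indices at which the neuron spikes."""
    v = v_reset
    spikes: List[int] = []
    for step, current in enumerate(input_current):
        # The membrane potential leaks towards zero and integrates the input.
        v += dt * (-v / tau + current)
        if v >= threshold:
            spikes.append(step)
            v = v_reset  # reset after emitting a spike
    return spikes


# Constant inputs of different strength over 50 time steps.
weak = lif_spike_times([0.15] * 50)
strong = lif_spike_times([0.30] * 50)
```

The stronger input produces an earlier first spike and more spikes overall, so the magnitude of the input is carried by when the spikes occur and not by a single activation value.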

SNNs are generally thought to be “more powerful” than standard rate coding networks. However, it is not clear precisely in what sense they are more powerful, or what precisely it is that makes them more powerful.

The idea of this project is to investigate this claim using a combination of mathematical and computational methods. As such the project will require an interdisciplinary research methodology at the interface between mathematics, computer science and neuroscience.

The project would be suitable for a student who wishes to become an expert in an up-and-coming method in artificial intelligence. It has scope for theoretical investigations, but will also require implementing neural networks.

Training algorithms for spiking neural networks

Contact: Dominique Chu

Spiking neural networks encode information through the temporal order of the signals. They are more realistic models of the brain than standard artificial neural networks and they are also more efficient in encoding information. Spiking neural networks are therefore very popular in brain simulations. A disadvantage of spiking neural networks is that there are not many efficient training algorithms available.

This project will be about finding novel training algorithms for spiking neural networks and comparing the trained networks with standard artificial neural networks on a number of benchmark AI tasks. An important part of this project will be not only to evaluate how well these spiking neural networks perform in relation to standard networks, but also to understand whether or not they are, as is often claimed, more efficient in the sense that they need smaller networks or fewer computing resources.

The main approach will be to gain inspiration from existing theories about how the human brain develops and learns. These existing theories will then be adapted so as to develop efficient training algorithms. This project will be primarily within AI, but it will also provide the opportunity to learn and apply techniques and ideas from computational neuroscience.

Computational models of brain-like networks

Contact: Dominique Chu

The human brain consists of billions of individual neurons whose firing patterns collectively create the computation that allows us to walk, be conscious and perform tasks. We are still far away from understanding in detail how the brain works. However, we are now able to address questions about how brain-like systems can perform particular tasks. For example, how are we able to keep an internal map of our location in space? How are we able to hear a tune and then immediately reproduce it? Or how can we learn to perform mental arithmetic?

This project will construct minimal neural network models that can perform such tasks. The main purpose of the project is not primarily to understand how the brain actually solves these tasks, but to understand what the minimal networks are that can perform particular tasks.

Machine Learning for the Pharmacology of Ageing

Contact: Alex Freitas

Recently, there has been a growing interest in ageing research, since the proportion of elderly people in the world’s population is expected to increase substantially in the next few decades. As people live longer, it becomes increasingly common for a person to suffer from multiple age-related diseases. Since old age is the ultimate cause or the greatest risk factor for most of these diseases, progress in ageing research has the potential to lead to a more cost-effective treatment of many age-related diseases in a holistic fashion. In this context, researchers have collected a significant amount of data about ageing-related genes and medical drugs affecting an organism’s longevity – mainly about simpler model organisms, rather than humans. This data is often freely available on the web, which has facilitated the application of machine learning methods to the pharmacology or biomedicine of ageing, to try to discover some knowledge or patterns in such datasets.

This project will focus on developing machine learning algorithms for analysing data about the pharmacology of ageing, i.e., data about medical drugs or chemical compounds that can be used as an intervention against ageing, mainly in model organisms. The broad type of machine learning method to be developed will be supervised machine learning (mainly classification), but the specific type of algorithm to be developed will be decided later, depending on the student’s interest and suitability to the target datasets. Note that, although this is an interdisciplinary project, this is a project for a PhD in Computer Science, so the student will be expected to develop a novel machine learning method. As examples of interdisciplinary papers on machine learning for ageing research, see e.g. (the first paper is particularly relevant for this project, whilst the second includes a broader discussion about machine learning for ageing research):

D.G. Barardo, D. Newby, D. Thornton, T. Ghafourian, J.P. de Magalhaes and A.A. Freitas. Machine learning for predicting lifespan-extending chemical compounds. Aging (Albany NY), 9(7), 1721-1737, 2017.
F. Fabris, J.P. de Magalhaes, A.A. Freitas. A review of supervised machine learning applied to ageing research. Biogerontology, 18(2), 171-188, April 2017.

Predicting Recovery from Stroke with Machine Learning

Contact: Howard Bowman

Contact: Marek Grzes

This PhD research would focus on interpretable machine learning applied to data acquired from stroke patients (https://www.ucl.ac.uk/ploras/). This work will be with Professor Cathy Price (Wellcome Centre for Human Neuroimaging, UCL), whose (PLORAS) team has collected one of the largest data sets of stroke patients (greater than 1,000), including structural MRI scans, behaviour and demographics. A key focus of Cathy Price’s work is to predict the recovery trajectory of stroke patients from their structural MRI scans, particularly patients with language deficits (i.e. that are aphasic). Progress has been made on this using traditional and now deep learning methods.

Critical to clinical uptake of machine learning in this area is the ability to interpret the predictions it provides in a fashion that can be communicated to clinicians, patients and carers. The PhD student would work on this topic, using methods such as neural-symbolic techniques. The student will be located in the School of Computing at the University of Kent, but will regularly visit and work closely with Cathy Price’s team at the Wellcome Centre for Human Neuroimaging. Expertise in machine learning will be provided by Dr Thomas Hope (Wellcome Centre for Human Neuroimaging, UCL) and Dr Marek Grzes (School of Computing, University of Kent).

Relevant articles:
Besold, T. R., Garcez, A. D. A., Bader, S., Bowman, H., Domingos, P., Hitzler, P., ... & de Penning, L. (2017). Neural-symbolic learning and reasoning: A survey and interpretation. arXiv preprint arXiv:1711.03902.

Hope, T. M., Seghier, M. L., Leff, A. P., & Price, C. J. (2013). Predicting outcome and recovery after stroke with lesions extracted from MRI images. NeuroImage: clinical, 2, 424-433.

Seghier, M. L., Patel, E., Prejawa, S., Ramsden, S., Selmer, A., Lim, L., ... & Price, C.J. (2016). The PLORAS database: a data repository for predicting language outcome and recovery after stroke. Neuroimage, 124, 1208-1212.

Computational Modelling of Attention

Contact: Howard Bowman

Humans are very good at prioritising competing processing demands. In particular, perception of a salient environmental event can interrupt ongoing processing, causing attention, and accompanying processing resources, to be redirected to the new event. A classic example of this is the well-known Cocktail Party Effect. Not only are we easily able to follow just one conversation when several people are speaking, but the occurrence of a salient phrase in a peripheral conversation stream, such as somebody mentioning our name, causes auditory attention to be redirected. It is also clear that emotions, motivation and physiological state in general, play a key role in such prioritisation.

The proposed PhD will investigate the construction of computational models of these cognitive phenomena, with particular emphasis on neural level modelling. The modelling work will be guided by (and will also guide) the converging evidence now being made available by behavioural studies and brain mapping (both fMRI and EEG), some of which is being collected in Bowman’s research groups at Kent and Birmingham. A particular focus will be on extending Bowman & Wyble’s Simultaneous Type/Serial Token model. One possible line would be adding spatial attentional mechanisms to the model and making the model a more complete theory of conscious perception.

Experimental Studies of Human Attention Using Behavioural and EEG Methods

Contact: Howard Bowman

Humans are very good at prioritising competing processing demands. In particular, perception of a salient environmental event can interrupt ongoing processing, causing attention, and accompanying processing resources, to be redirected to the new event. A classic example of this is the well-known Cocktail Party Effect. Not only are we easily able to follow just one conversation when several people are speaking, but the occurrence of a salient phrase in a peripheral conversation stream, such as somebody mentioning our name, causes auditory attention to be redirected. It is also clear that emotions, motivation and physiological state in general, play a key role in such prioritisation.

Through the combination of behavioural experimentation and the recent application of brain imaging, modern cognitive neuroscience is starting to clarify the mechanisms that underlie human deployment of attention. In particular, a number of experimental paradigms have started to reveal how timing constraints and sensitivity to salient events are reconciled in humans. Two experimental paradigms that are currently being explored in the attention research group at Kent are the attentional blink and the Emotional Stroop Effect. In particular, we have developed detailed theoretical accounts of both these phenomena.

The proposed PhD will undertake experimental studies targeted at evaluating a number of key predictions arising from our theoretical accounts of these phenomena. Many of these predictions focus on the nature of human conscious perception and the constraints that govern whether a stimulus is, or is not, perceived. In this respect, we are particularly evaluating the relationship between encoding into working memory and conscious perception.

The proposed experimental methods are traditional behavioural experimentation and electrophysiological work (i.e. EEG and ERP recording). The latter of these will be performed using the School of Computing at Kent’s BioSemi EEG system. There is also the possibility to run functional magnetic resonance imaging studies through Bowman’s part-time Professorship in the School of Psychology at the University of Birmingham.

Lie Detection and Brain-Computer Interaction on the Fringe of Awareness

Contact: Howard Bowman

We have developed methods to detect with EEG when a subject perceives a salient stimulus amongst a list of rapidly presented stimuli (10 per second). The majority of stimuli presented at this rate are not consciously reportable. However, the brain is selectively processing stimuli at such speeds; for example, it can detect the presence and identity of stimuli that fit a task template (e.g. the image containing an animal) or are intrinsically (and personally) salient (e.g. the word “spider” for a spider phobic).

Such modes of presentation have been extensively studied from a theoretical perspective, thereby clarifying perceptual and attentional processing in humans. While continuing to investigate such fringe awareness theoretically, we are also exploring applications of such techniques in brain-computer interaction. The space of potential applications of such methods is broad, including lie detection, interacting with vegetative and coma patients, control of computing and mobility devices, brain-salience directed search and retrieval, and adaptive computer interfaces. In this context, we have developed a brain-computer interface called the P3-Rapid method and a lie detector called the fringe-P3 method. All these areas, along with theoretical investigations of perception and attention, are potential subjects for PhD research.

Connectionism and Consciousness

Contact: Howard Bowman

How consciousness emerges from the physical matter of the brain remains one of the greatest mysteries of science. However, as a result of modern neuroscience and brain imaging techniques, theories of the neural mechanisms underlying conscious experience are starting to be proposed. For example, there are theories concerning synchronous firing of neurons and consciousness and there are explanations that focus on brain regions, e.g. the what and where pathways from visual cortex. In addition, neural network modelling is playing an important role in this debate. For example, explanations focussing on synchronous neural spiking have been investigated using neural network simulations.

There is considerable room for PhD level research on using neural networks to simulate theories of consciousness. One direction would be to develop models of how masking works in psychological studies of perception. It is well known that following a stimulus by a mask prevents conscious experience of the stimulus (in fact, related effects arise if the mask is presented at the same time as the stimulus). However, even though we have no awareness of the stimulus, our motor system still responds to it. Notice that such masking underlies the subliminal presentation of frames during films.

Despite the fact that such masking has empirically been investigated very extensively and indeed many theories of its functioning exist, there is currently no comprehensive computational model of the phenomenon. Thus, a possible avenue for a PhD in this area would be to construct neural network models of the competing theories of masking in order to verify their validity.

Computational creativity and automated evaluation

Contact: Anna Jordanous

In exploring how computers can perform creative tasks, computational creativity research has produced many systems that can generate creative products or creative activity. Evaluation, a critical part of the creative process, has not been employed to such a great extent within creative systems. Recent work has concentrated on evaluating the creativity of such computational systems, but there are two issues. Firstly, recent work in evaluation of computational creativity has consisted of the system(s) being evaluated by external evaluators, rather than by the creative system evaluating itself, or evaluation by other creative software agents that may interact with that system. Secondly, incorporation of self-evaluation into computational creativity systems *as part of guiding the creative process* is under-explored. (Anna currently has one PhD student at Kent and one external PhD student exploring different approaches to this latter issue, but there are many other possible avenues for investigation.)

In this project the candidate will experiment with incorporating evaluation methods into a creative system and analyse the results to explore how computational creativity systems can incorporate self-evaluation. The creative systems studied could be in the area of musical or linguistic creativity, or in a creative area of the student's choosing. It is up to the student to decide whether to focus on evaluation methods for evaluating the quality of output from a creative system or the creativity of the system itself (or both). The PhD candidate would be required to propose how they would explore the above scenarios, for a more specific project. Anna is happy to guide students in this and help them develop their research proposal.

Please note: Anna is on maternity leave until August 2019. If you are interested in one of these projects, please contact Colin Johnson, who will act on her behalf for PhD applications.

Expressive musical performance software

Contact: Anna Jordanous

Traditionally, when computational software performs music the performances can be criticised for being too unnatural, lacking interpretation and, in short, being too mechanical. However, much progress has been made within the field of expressive musical performance and musical interpretation. Alongside these advances have been interesting findings in musical expectation (i.e. what people expect to hear when listening to a piece of music), as well as work on emotions that are present within music and on how information and meaning are conveyed in music. Each of these advances raises questions of how the relevant aspects could be interpreted by a musical performer.

Potential application areas for computer systems that can perform music in an appropriately expressive manner include, for example, improving playback in music notation editors (like Sibelius), or the automated performance of music generated on-the-fly for 'hold' music (played when waiting on hold during phone calls). Practical work exploring this could involve writing software that performs existing pieces, or could be to write software that can improvise, interpreting incoming sound/music and generating an appropriate sonic/musical response to it in real time.

Please note: Anna is on maternity leave until August 2019. If you are interested in one of these projects, please contact Colin Johnson, who will act on her behalf for PhD applications.

Digital preservation of the information within musical/sonic material

Contact: Anna Jordanous

Digital preservation of audio material raises many interesting questions to be investigated, including how to archive a sound, what metadata to keep, and future-proofing. Of particular interest is how to explore issues of retention of musical/sonic information from relevant digital audio material, for later access and analysis. Sound and music are typically very open to interpretation, with much information being conveyed through musical/sonic material.

Music Information Retrieval (MIR) allows us to see what information is communicated by musical material, using techniques from Computing and Music. Typically MIR is applied to digital rather than physical materials and comes in a variety of forms that could be explored, such as using digital tools or computational analysis for informing and enhancing musicological analysis or musical interpretation. In this PhD project, the PhD candidate will carry out such explorations, towards the development of an archive or a methodology for existing archives to access and retrieve musical information from archive music-based data.

Please note: Anna is on maternity leave until August 2019. If you are interested in one of these projects, please contact Colin Johnson, who will act on her behalf for PhD applications.

Music on the Semantic Web

Contact: Anna Jordanous

The Semantic Web is a vision of the Web where items on the web are data, which get linked together if they are data referring to similar things. In the Semantic Web, "a computer program can learn enough about what the data means to process it." (Tim Berners-Lee, Weaving the Web, 2000) There are some data and ontologies (computational models of knowledge) published on the Semantic Web about music, for example the Music Ontology (musicontology.com).

Research is starting to emerge on using information retrieval in conjunction with data on the Semantic Web; this project proposes that the PhD candidate explores how Music Information Retrieval (MIR) can be enhanced using Semantic Web data and tools. During this PhD project, the candidate would look at a particular question in music information retrieval, such as how to use MIR to perform computational musicological analysis or how to identify music that is intended to express similar meanings or emotions. (Alternatively the candidate may wish to address a different music information retrieval problem, in an area of specific interest to them; this is welcome.) The PhD candidate would explore how this MIR question can be addressed by using music-specific Semantic Web data/models/technologies to enhance the process of identifying relevant information.

It is expected that the PhD candidate will produce computational tools or software that engage directly with the Semantic Web in order to perform the music information retrieval task. The performance of Semantic Web-enhanced solutions should be compared to traditional MIR solutions for that task, if any exist, and evaluated for the accuracy and comprehensiveness with which the tools or software carry out the task.

Please note: Anna is on maternity leave until August 2019. If you are interested in one of these projects, please contact Colin Johnson, who will act on her behalf for PhD applications.

Chatbots for Language Tutoring

Contact: Marek Grzes

The old proverb says: "The more languages you speak, the more human you are". More pragmatically, knowing a foreign language opens doors to many opportunities. These benefits motivate people worldwide to spend billions of dollars every year on foreign language learning. It is clear that learners who engage in meaningful conversations (not necessarily oral) make faster progress. Unfortunately, communication with native speakers is rare or even impossible when the learner is in their country of origin. Chatbots provide easy access to lifelike conversations for millions of people worldwide, regardless of the target language or location of the learner. Imagine that an accurate and entertaining chatbot for teaching English is available on the web or as a contact person on Skype, Kik Messenger, Gadu-Gadu, or any other platform of this kind. The benefits for students, as well as the business opportunities for industry, are evident.

This research topic could be engaging for both academically oriented PhD students and those who are interested in a commercial deployment of their PhD work. It would require very good programming skills as well as interest in machine learning, statistical natural language processing, and information retrieval.

The student would be encouraged to take a highly personal interpretation of the problem.

Mining Educational Data using Deep Neural Networks / Machine Learning

Contact: Marek Grzes

The School of Computing at the University of Kent has a Computing Education Research group that is internationally recognised for its contributions to research related to teaching programming languages. The members of our Computing Education Research group have access to large amounts of educational data that needs to be analysed using machine learning techniques. If you would like to work on an applied project in which advanced machine learning (e.g. deep neural networks) would be used to analyse educational data, I could serve as one of your supervisors. It would make sense if your main supervisor was from our Computing Education Research group. Note that I am a member of the Computational Intelligence Research group, and my core research interests are in machine learning.

Probabilistic Planning with Constraints

Contact: Marek Grzes

Many applications of probabilistic planning and artificial intelligence involve various types of constraints that should be satisfied, at least with high probability. For example, in intelligent tutoring systems, it may be desirable to minimise the number of turns while ensuring that the probability of the student completing the task is maximised. Partially observable Markov decision processes (POMDPs) provide a natural framework for designing applications that continuously make decisions based on noisy sensor measurements. POMDPs represent a factual mathematical model, which makes them useful both for developing new algorithms and for creating specialisations for particular applications, such as intelligent tutoring systems. A particular focus of this project could be extending the recent work of Grzes & Poupart to situations that involve constraints.
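To illustrate how a POMDP agent reasons from noisy observations, the following is a minimal sketch of the standard Bayesian belief update, written in Python. It is not taken from the work of Grzes & Poupart and does not handle constraints; the two-state tutoring model, its probabilities, and all names (belief_update, "knows", "confused", "hint") are hypothetical and chosen only for illustration.

```python
def belief_update(belief, transition, observation, action, obs):
    """Return the new belief after taking `action` and observing `obs`.

    belief:      dict state -> probability
    transition:  transition[action][state][next_state] -> probability
    observation: observation[action][next_state][obs] -> probability
    """
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        # Prediction step: where could the system be after the action?
        predicted = sum(transition[action][s][s_next] * belief[s] for s in states)
        # Correction step: weight by how likely the observation is there.
        unnormalised[s_next] = observation[action][s_next][obs] * predicted
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical tutoring model: the student either knows the concept or is confused.
transition = {"hint": {"knows": {"knows": 1.0, "confused": 0.0},
                       "confused": {"knows": 0.3, "confused": 0.7}}}
observation = {"hint": {"knows": {"correct": 0.9, "wrong": 0.1},
                        "confused": {"correct": 0.2, "wrong": 0.8}}}

belief = {"knows": 0.5, "confused": 0.5}
updated = belief_update(belief, transition, observation, "hint", "correct")
print(updated)
```

In this toy model a correct answer after a hint shifts most of the probability mass towards "knows". A constrained planner would additionally have to track quantities such as the expected number of turns, which this sketch leaves out.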

This project is very technical, and it would be suitable for an individual who wishes to work on the core algorithms for artificial intelligence planning. The ability to write software (in Matlab at least) would be essential, together with a good mathematical background (especially knowledge of linear algebra).

This research challenge can be addressed in various ways, and the methods investigated within this project would depend on the preferences, interests, skills and long-term objectives of the particular student.

Information Visualization Directed by Graph Data Mining

Contact: Peter Rodgers

Data visualization techniques are failing in the face of large data sets. This project attempts to increase the scale of graph data that can be visualized by developing data mining techniques to guide interactive visualization. This combination of information visualization and data mining promises to greatly increase the size of the data sets that analysts can understand, and will advance the state of the art in both disciplines. On successful completion, publications in high quality venues are envisaged.

This project is algorithmically demanding, requiring good coding skills. The implementation language is negotiable, but Java, JavaScript or C++ are all reasonable target languages. Data will be derived from publicly available network intrusion or social network data sets.

Tasks in this research project include:

  1. implementing graph display software and interface.
  2. developing project specific visualization algorithms.
  3. integrating graph pattern matching and other graph data mining systems into the visualization algorithms.

Visualizing Complex Data

Contact: Peter Rodgers

There is a lot of data that contains both associations (usually visualized as a graph) and set membership (visualized with various techniques, such as Euler diagrams, bubble sets or linear diagrams). Visualizing both of these types of relationship is a challenging task. It is also very topical, and significant research efforts are going into this work at present. This project would look at developing techniques for data of limited size, then examine the scalability of the system by developing a separate overview visualization, which summarizes the data. The user would then be able to examine the overview for interesting data, filter the data, and then examine interesting areas in detail.

The project would involve identifying an application area (for example, social network data or gene data), then developing the two visualization systems and the interaction between them. The result would then be tested for success by observing subject specialists as they use the software.

Motif Finding in Set Based Data

Contact: Peter Rodgers

Seeking overrepresented subgraphs, or "motifs", in graphs is widespread. For example, motifs are used to analyse gene data [DD07], social networks [WG06] and criminal patterns [DM15]. The process works by sampling a large number of subgraphs of similar size, placing isomorphic subgraphs into the same bucket, and counting the size of each bucket. Data analysts then examine subgraphs that occur more often than would be expected by chance.
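The sample, bucket and count steps can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the function names are hypothetical, the sampler grows a connected node set from a random start (which does not sample subgraphs uniformly), and isomorphism is decided by brute force over all relabellings, which is only practical for very small subgraphs.

```python
import itertools
import random
from collections import Counter


def canonical_form(nodes, edges):
    """Return a label shared by all isomorphic graphs on these nodes.

    Brute force: try every relabelling of the nodes to 0..k-1 and keep the
    smallest sorted edge list. Only practical for small k.
    """
    nodes = list(nodes)
    best = None
    for perm in itertools.permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        form = tuple(sorted(tuple(sorted((relabel[u], relabel[v]))) for u, v in edges))
        if best is None or form < best:
            best = form
    return best


def sample_connected_subgraph(adj, k, rng):
    """Grow a connected node set of size k from a random start node.

    Returns None if the start node's component has fewer than k nodes.
    """
    chosen = {rng.choice(sorted(adj))}
    while len(chosen) < k:
        frontier = sorted({n for c in chosen for n in adj[c]} - chosen)
        if not frontier:
            return None
        chosen.add(rng.choice(frontier))
    return chosen


def count_motifs(adj, k, samples, seed=0):
    """Sample subgraphs, bucket them by isomorphism class, and count each bucket."""
    rng = random.Random(seed)
    buckets = Counter()
    for _ in range(samples):
        nodes = sample_connected_subgraph(adj, k, rng)
        if nodes is None:
            continue
        # Induced edges among the sampled nodes.
        edges = [(u, v) for u, v in itertools.combinations(sorted(nodes), 2) if v in adj[u]]
        buckets[canonical_form(nodes, edges)] += 1
    return buckets


# Undirected graph as a dict of neighbour sets: a triangle with a pendant node.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for form, count in count_motifs(graph, k=3, samples=200).items():
    print(form, count)
```

Deciding whether a bucket is larger than expected by chance would require running the same count on randomized graphs for comparison, which is not shown here.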

This project would apply the same general process to set-based data to see whether insights can be derived from detecting motifs. An example of such data is social network data, where people might have shared interests and are therefore in the same set. Tags on data (such as Twitter hashtags) can also be used to classify items into sets.

The tasks in this project would be:

  • to derive suitable real-world data
  • to work out randomization strategies that produce random data sets for comparison
  • to design methods to sample small sections (set systems) of this data
  • to develop fast set system isomorphism algorithms (equivalent to the hypergraph isomorphism problem)
  • to implement statistical analysis methods to indicate overrepresented small set systems
  • to analyse these to see if useful information can be derived from the overrepresented cases
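To make the notion of set system isomorphism concrete, here is a small Python sketch of a brute-force baseline. It is deliberately not a fast algorithm of the kind the project would aim to develop: it tries every renaming of the elements, so it is exponential in the number of elements. The function name and the example data are hypothetical.

```python
import itertools


def canonical_set_system(sets):
    """Canonical label for a small set system (a collection of sets).

    Two set systems receive the same label exactly when some renaming of the
    elements turns one collection of sets into the other. Elements must be
    sortable (for example, all strings).
    """
    elements = sorted(set().union(*sets))
    best = None
    for perm in itertools.permutations(range(len(elements))):
        rename = dict(zip(elements, perm))
        form = tuple(sorted(tuple(sorted(rename[e] for e in s)) for s in sets))
        if best is None or form < best:
            best = form
    return best


# Two interest groups that share one member, under different names.
a = [{"alice", "bob"}, {"bob", "carol"}]
b = [{"x", "y"}, {"z", "x"}]
# Two interest groups with no member in common.
c = [{"p", "q"}, {"r", "s"}]

print(canonical_set_system(a) == canonical_set_system(b))  # same structure
print(canonical_set_system(a) == canonical_set_system(c))  # different structure
```

Sampled set systems could then be placed into buckets keyed by this label, in the same way that isomorphic subgraphs are bucketed in graph motif finding.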

[DD07] Das, M. K., & Dai, H. K. (2007). A survey of DNA motif finding algorithms. BMC Bioinformatics, 8(7), 1.
[DM15] Davies, T., & Marchione, E. (2015). Event networks and the identification of crime pattern motifs. PLoS ONE, 10(11), e0143638. doi:10.1371/journal.pone.0143638
[WG06] Wei, X., & Gang, Z. (2006). U.S. Patent Application No. 11/603,284.


School of Computing, University of Kent, Canterbury, Kent, CT2 7NF

Enquiries: +44 (0)1227 824180 or contact us.

Last Updated: 13/03/2019