Research Seminar Informatica

contact info

Secretariat Faculty of Science
Open Universiteit
Valkenburgerweg 177
6419 AT Heerlen
Netherlands
 
phone: +31 (0)45 576 2877
email: secretariaat.bw@ou.nl
www: http://cs.ou.nl/
twitter: @ou_informatica

OUrsi stands for "Open University Research Seminar Informatica". The OUrsi is the research seminar of the Department of Computer Science of the Open University of the Netherlands. OUrsi talks typically take place on the last Tuesday of the month at 11:00 in Heerlen, and are normally streamed online for the benefit of interested parties elsewhere. The OUrsi features presentations on ongoing research in all areas of Computer Science.

If you are interested in giving a presentation, or attending one (live in Heerlen or online), please contact Daniel Stanley Tan. Below are listed the scheduled presentations, with the upcoming OUrsi highlighted.

Presentations

No. Date Presenter Title  

108. 23 July Stef Joosten Schema-changing data migrations abstract

Schema-changing data migrations
Stef Joosten

Once information systems are in production, they can last for decades. During that time, data migration is sometimes unavoidable. And what if that data migration also involves a changing database schema? Can you take those kinds of risks with production data?

OU colleague Stef Joosten has investigated this issue. To what extent can such data migrations be automated? In a scientific article, he and his son, Sebastiaan, have developed a theory that allows schema-changing data migrations (SCDMs) to be performed in a manageable way. In this meeting, he will explain how it works and give a short demonstration.

107. 2 July Rick Koenders Output-decomposed learning of Mealy Machines abstract

Output-decomposed learning of Mealy Machines
Rick Koenders

Model learning can be used to infer the behaviour of black-box systems. However, models of software systems can be very large, which makes them difficult to learn with existing algorithms. We present an active automata learning algorithm which learns a decomposition of a finite state machine, based on projecting onto individual outputs. When projecting the outputs to a smaller set, the model itself is reduced in size. By having several such projections, we do not lose any information and the full system can be reconstructed. Depending on the structure of the system, this reduces the number of queries drastically, as shown by a preliminary evaluation of the algorithm.
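The projection idea can be sketched concretely (a toy Python illustration, not the paper's algorithm; the example machine and projection maps are invented):

```python
# A Mealy machine maps (state, input) -> (next_state, output). A
# projection keeps the transitions but maps each output through h,
# shrinking the output alphabet; jointly injective projections lose
# no information, so the full machine can be reconstructed.

def project(machine, h):
    """Return a copy of `machine` with outputs mapped through h."""
    return {k: (s, h(o)) for k, (s, o) in machine.items()}

def run(machine, start, word):
    """Execute an input word, collecting the produced outputs."""
    state, outs = start, []
    for a in word:
        state, o = machine[(state, a)]
        outs.append(o)
    return outs

# Original machine over outputs {0, 1, 2, 3}.
m = {
    ("q0", "a"): ("q1", 1), ("q0", "b"): ("q0", 2),
    ("q1", "a"): ("q0", 3), ("q1", "b"): ("q1", 0),
}

# Two projections: low bit and high bit of each output. Each component
# is smaller than m, yet together they determine the original outputs.
lo = project(m, lambda o: o & 1)
hi = project(m, lambda o: o >> 1)

word = ["a", "b", "a", "b", "a"]
recombined = [h * 2 + l for h, l in zip(run(hi, "q0", word),
                                        run(lo, "q0", word))]
assert recombined == run(m, "q0", word)
```

Each projected machine can be learned independently and may have far fewer states than the original, which is where the query savings come from.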
106. 25 June Ashish Sai Advances in blockchain energy and environmental footprint research abstract

Advances in blockchain energy and environmental footprint research
Ashish Sai

Interest in the energy and environmental impact of cryptocurrencies like Bitcoin and Ethereum, and alternatives like proof-of-stake (PoS) and proof-of-spacetime (PoSt), is growing, yet existing studies often inaccurately estimate energy consumption due to flawed methodologies and assumptions. This work evaluates previous models' reliability by leveraging literature from social energy sciences and information systems. We conduct a systematic review, propose a quality assessment framework, and introduce refined models for PoS-based Ethereum and PoSt-based Filecoin, improving data quality, transparency, and documentation. Our goal is to enhance scientific rigor in blockchain energy studies, offering a code of conduct for research methodologies to standardize the assessment of blockchain systems' energy and environmental impacts.
105. 11 June Jesse Heyninck LogicLM: Combining Logic Programs with Language Models abstract

LogicLM: Combining Logic Programs with Language Models
Jesse Heyninck

In this talk I'll present the project proposal LogicLM (joint work with Stefano Bromuri).

(Large) language models ((L)LMs) like ChatGPT are becoming frontier technologies, but their reasoning is opaque and unreliable. The goal of this project is to combine LLMs with symbolic AI, ensuring verifiable and transparent reasoning. This is done by fine-tuning an LM to convert natural language into formal logic and vice versa, and by integrating this LM with LM-based approaches for information retrieval and generation. This results in an architecture that takes advantage of the strengths of LMs while bypassing their weaknesses by leaning on symbolic AI.

104. 28 May Jos Craaijo libLISA: Instruction Discovery and Analysis on x86-64 abstract slides

libLISA: Instruction Discovery and Analysis on x86-64
Jos Craaijo

Even though heavily researched, a full formal model of the x86-64 instruction set is still not available. We present libLISA, a tool for automated discovery and analysis of the ISA of a CPU. This produces the most extensive formal x86-64 model to date, with over 118000 different instruction groups. The process requires as little human specification as possible: specifically, we do not rely on a human-written (dis)assembler to dictate which instructions are executable on a given CPU, or what their in- and outputs are.

The generated model is CPU-specific: behavior that is "undefined" is synthesized for the current machine. Producing models for five different x86-64 machines, we mutually compare them, discover undocumented instructions, and generate instruction sequences that are CPU-specific. Experimental evaluation shows that we enumerate virtually all instructions within scope, that the instructions' semantics are correct w.r.t. existing work, and that we improve existing work by exposing bugs in their handwritten models.

103. 14 May Daniel Stanley Tan ConCoNet: Class-agnostic counting with positive and negative exemplars abstract slides

ConCoNet: Class-agnostic counting with positive and negative exemplars
Daniel Stanley Tan

Class-agnostic counting is usually phrased as a matching problem between a user-defined exemplar patch and a query image. The count is derived based on the number of objects similar to the exemplar patch. However, defining a target class using only positive exemplar patches inevitably miscounts unintended objects that are visually alike to the exemplar. In this paper, we propose to include negative exemplars that define what not to count. This allows the model to calibrate its notion of what is similar based on both positive and negative exemplars. It effectively disentangles visually similar negatives, leading to a more discriminative definition of the target object. We designed our method such that it can be incorporated with other class-agnostic counting models. Moreover, application-wise, our model can be used into a semi-automatic labeling tool to simplify the job of the annotator.
102. 16 April Stijn de Gouw Analysis of Java's BitSet library

101. Nico Naus Translation validation for decompilation abstract

Translation validation for decompilation
Nico Naus

Translation of code is essential to all that we do as computer scientists. When you compile code, source code is translated multiple times before being turned into a binary file. It is obviously very important that these translations are correct; otherwise the meaning of a program changes after translation. In this talk, we will take a look at a way of proving these kinds of translations correct. As a use case, we will look at the translation from x86-64 binaries to an intermediate representation. This translation is one of the very first steps in decompilation, a process that can be seen as the (partial) inverse of compilation. In doing so, we will discuss how we assign meaning to a program in the first place, what is meant by a "correct translation", and finally how we can show this correctness. The context of disassembly adds a few challenges, which will also be addressed.

Since this is OUrsi 101, it was suggested I should keep the talk entry level, so no prior knowledge of any of the above will be required!

100. 5 March Luc Edixhoven Expressive specification and verification of choreographies abstract slides

Expressive specification and verification of choreographies
Luc Edixhoven

Choreographies describe possible sequences of interactions among a set of agents. They can be used to specify and analyse these sequences, and they guarantee certain safety properties by construction. As a drawback, choreographic languages typically have limitations on their expressiveness, which limits their applicability. This talk will give a high-level overview of our results in raising the expressiveness of choreographies. It was initially prepared as a 20-minute talk for PROMIS in early November 2023; in the time remaining, I can delve further into whichever of the presented topics the audience deems most interesting.
99. 20 February Hugo Jonker Fighting scientific fraud by looking at people, not papers abstract slides

Fighting scientific fraud by looking at people, not papers
Hugo Jonker

Incentives in science push scientists to publish more and be cited more often. Some scientists use nefarious means to increase their publication metrics, for example by abusing their power as reviewers or editors to solicit references or even authorships. Most traditional approaches to fighting scientific fraud focus on identifying malfeasance in the contents: plagiarism, perturbed images, etc. Instead, we propose to look for identifying characteristics of the person committing fraud, not at the contents.
This opens up a new way to help fight scientific fraud, taking a more generic view of the process. On the one hand this entails that concrete evidence of specific cases of fraud is out of scope; on the other hand, the new approach offers a way to investigate people, not papers.
98. 6 February Bea Waelbers (OU, Faculty of Management) What the Ear Cannot Hear: Predicting Customer Satisfaction in Voice-To-Voice Service Interactions From Text and Audio Signals abstract slides

What the Ear Cannot Hear: Predicting Customer Satisfaction in Voice-To-Voice Service Interactions From Text and Audio Signals
Bea Waelbers (OU, Faculty of Management)

In today's business environment centered around customer satisfaction, our interdisciplinary research pioneers the exploration of predicting satisfaction by analyzing voice-to-voice service interactions using machine learning. Integrating textual and auditory signals from over 1,100 real service interactions, our study breaks from the tradition of text-focused analyses and recognizes the crucial role of auditory features like tone and intonation. This work not only advances the methodology for predicting customer satisfaction but also serves as a vital bridge between consumer psychology, service management, and machine learning, offering practical strategies for service firms to enhance customer interactions and informed decision-making.
97. 23 January Tom Dooney cDVGAN: Improved Conditional GANs for Generalized Gravitational Wave Transient Generation abstract slides

cDVGAN: Improved Conditional GANs for Generalized Gravitational Wave Transient Generation
Tom Dooney

We present Conditional Derivative GAN (cDVGAN), a novel conditional GAN framework for simulating multiple classes of gravitational wave (GW) transients in the time domain. cDVGAN can also generate generalized hybrid waveforms that span the variation between waveform classes through class-interpolation in the conditioned class vector. cDVGAN transforms the typical 2-player adversarial game of GANs into a 3-player game with an auxiliary critic analyzing the derivatives of time series signals. Our results show that this provides synthetic data that better captures the features of the original waveforms. cDVGAN conditions on three waveform classes, two preprocessed from LIGO blip and tomte glitch classes from its 3rd observing run (O3), and the third representing binary black hole (BBH) mergers. Our proposed cDVGAN outperforms 4 different baseline GAN models in replicating the features of the three waveform classes. Specifically, our experiments show that training convolutional neural networks (CNNs) with our cDVGAN generated data improves the detection of waveforms embedded in detector noise beyond the synthetic data from other state-of-the-art GAN models. Our best synthetic dataset yields as much as a 7% increase in area-under-the-curve (AUC) performance compared to the next best synthetic dataset from baseline GANs. Moreover, training the CNN with hybrid waveforms from our cDVGAN outperforms CNNs trained only on the standard waveform classes when identifying real signals embedded in LIGO detector background at signal-to-noise ratios (SNRs) ranging from 1 to 16. Lastly, we illustrate cDVGAN as a viable data augmentation method that can surpass traditional data augmentation approaches.
 

2023

96. 19 December Ali Ozbakir Unlocking Knowledge: Building a Multi-Document Q&A Bot with LLM, Embeddings, and Conversational Memory abstract slides

Unlocking Knowledge: Building a Multi-Document Q&A Bot with LLM, Embeddings, and Conversational Memory
Ali Ozbakir

Conducting a comprehensive literature review in research domains often involves navigating an extensive corpus of PDF documents, leading to challenges in information retrieval and synthesis. Contradictions and nuanced perspectives within various sources can complicate the process, demanding a more efficient and interactive approach.
In this initial attempt, I explore a Large Language Model solution that integrates LangChain, OpenAI, and ChromaDB to streamline literature review workflows. Acknowledging token limitations, I address bottlenecks by embedding multiple PDF documents into queries, employing techniques like splitting and chunking. To facilitate this, a Python script processes the research-oriented documents, extracting their text and splitting the content into manageable, partially overlapping chunks. These chunks are then embedded using OpenAI's embedding model ("text-embedding-ada-002") and stored in a vector database for persistence, creating a foundation for real-time interaction and dynamic information synthesis within the research corpus.
Despite token limitations, this retrieval augmented approach demonstrates promising strides in enhancing the efficiency and effectiveness of literature reviews in research applications. By integrating multiple PDF documents into queries and employing advanced chunking and embedding techniques, researchers can better manage, understand, and synthesize information from diverse sources, paving the way for further advancements in scientific research communication.
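The splitting-and-chunking step described above can be sketched as follows (a minimal illustration; the chunk size and overlap are invented, and the embedding and vector-database calls are omitted so the snippet stays self-contained):

```python
def chunk_text(text, size=200, overlap=50):
    """Split `text` into chunks of at most `size` characters, with
    `overlap` characters shared between consecutive chunks so that
    sentences cut at a boundary survive into the next chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than the chunk size")
    step = size - overlap
    return [text[start:start + size]
            for start in range(0, max(len(text) - overlap, 1), step)]

# Stand-in for text extracted from a PDF document.
document = "".join(str(i % 10) for i in range(500))
chunks = chunk_text(document)
# Each chunk would then be embedded (e.g. with "text-embedding-ada-002")
# and written to the vector database for later retrieval.
```

The overlap trades a little storage for robustness: a fact straddling a chunk boundary still appears whole in at least one chunk.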
95. 12 December Clara Maathuis, Stefano Bromuri, Greg Alpár Evidence-based Software Engineering with LLMs: a case study about inclusivity abstract slides

Evidence-based Software Engineering with LLMs: a case study about inclusivity
Clara Maathuis, Stefano Bromuri, Greg Alpár

Software affects nearly all aspects of our lives. Software engineering, encompassing the entire software lifecycle from conception through design and development to maintenance and continuous improvement, is significantly influenced by emerging technologies such as large language models (LLMs) and generative artificial intelligence (GenAI), often epitomized by ChatGPT. These advancements have the potential to substantially change various aspects of the software development process. Given their impact, there is a growing consideration within the industry to delegate tasks to LLMs and GenAI rather than relying solely on human experts. However, the reliance on these tools raises concerns about addressing historical biases embedded in their training data. Our research focuses on assessing the impact of LLMs on software engineering, particularly emphasizing inclusivity. More specifically, we explore how dyslexia affects learning to read with software assistance. Leveraging literature reviews and comprehensive LLM prompt experiments, we share our current findings and invite audience participation to advance our research agenda. We extend our invitation to this OUrsi presentation not only to computer scientists, artificial intelligence researchers, and software engineers but also to experts in psychology and other fields interested in dyslexia and various learning difficulties and disabilities.
94. 21 November, 11:30 - 12:30 Natasha Alechina Reinforcement learning to logical specifications abstract slides

Reinforcement learning to logical specifications
Natasha Alechina

I will talk about an approach to safe reinforcement learning, which we call Pure Past Action Masking (PPAM). In PPAM, actions are disallowed ("masked") according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system, rather than just the current state of the (possibly hidden) MDP. The features used in the safety constraint need not be the same as those used by the learning agent, allowing a clear separation of concerns between the safety constraints and reward specifications of the (learning) agent. We prove that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that PPAMs are as expressive as shields, another approach to enforcing non-Markovian constraints in RL. This talk is based on joint work with Giovanni Varricchione, Giuseppe De Giacomo, Mehdi Dastani, Brian Logan, and Giuseppe Perelli.
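The masking mechanism can be sketched in miniature (an illustrative toy, not the paper's PPLTL machinery; the constraint here is a hand-written pure-past predicate):

```python
# A pure-past constraint is a predicate over the history of actions
# taken so far; before each step, any action whose extended history
# would violate the constraint is masked out, so the learner only ever
# sees safe choices.

def no_repeat(history, action):
    """Pure-past constraint: never execute the same action twice in a row."""
    return not history or history[-1] != action

def allowed(actions, history, constraint):
    """Return the subset of actions the mask permits in this step."""
    return [a for a in actions if constraint(history, a)]

history = []
for _ in range(3):
    mask = allowed(["left", "right"], history, no_repeat)
    action = mask[0]  # a learning agent would choose among `mask`
    history.append(action)
# history alternates: no action ever repeats twice in a row
```

Note the constraint inspects only the past, never the future, which is exactly what makes it enforceable online during training.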
93. 7 November Gideon Mailette de Buy Wenniger MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction abstract slides

MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction
Gideon Mailette de Buy Wenniger

Automatic assessment of the quality of scholarly documents is a difficult task with high potential impact. Multimodality, in particular the addition of visual information next to text, has been shown to improve the performance on scholarly document quality prediction (SDQP) tasks. We propose the multimodal predictive model MultiSChuBERT. It combines a textual model based on chunking full paper text and aggregating computed BERT chunk-encodings (SChuBERT), with a visual model based on Inception V3. Our work contributes to the current state-of-the-art in SDQP in three ways. First, we show that the method of combining visual and textual embeddings can substantially influence the results. Second, we demonstrate that gradual unfreezing of the weights of the visual sub-model reduces its tendency to overfit the data, improving results. Third, we show the retained benefit of multimodality when replacing standard BERTBASE embeddings with more recent state-of-the-art text embedding models.
Using BERTBASE embeddings, on the (log) number of citations prediction task with the ACL-BiblioMetry dataset, our MultiSChuBERT (text+visual) model obtains an R2 score of 0.454 compared to 0.432 for the SChuBERT (text only) model. Similar improvements are obtained on the PeerRead accept/reject prediction task. In our experiments using SciBERT, scincl, SPECTER and SPECTER2.0 embeddings, we show that each of these tailored embeddings adds further improvements over the standard BERTBASE embeddings, with the SPECTER2.0 embeddings performing best.

92. 23 October Anshuman Mohan Formal Abstractions for Packet Scheduling abstract slides

Formal Abstractions for Packet Scheduling
Anshuman Mohan

Early programming models for software-defined networking (SDN) focused on basic features for controlling network-wide forwarding paths, but more recent work has considered richer features, such as packet scheduling and queueing, that affect performance. In particular, PIFO trees, proposed by Sivaraman et al., offer a flexible and efficient primitive for programmable packet scheduling. Prior work has shown that PIFO trees can express a wide range of practical algorithms including strict priority, weighted fair queueing, and hierarchical schemes. However, the semantic properties of PIFO trees are not well understood.
This paper studies PIFO trees from a programming language perspective. We formalize the syntax and semantics of PIFO trees in an operational model that decouples the scheduling policy running on a tree from the topology of the tree. Building on this formalization, we develop compilation algorithms that allow the behavior of a PIFO tree written against one topology to be realized using a tree with a different topology. Such a compiler could be used to optimize an implementation of a PIFO tree, or realize a logical PIFO tree on a target with a fixed topology baked into the hardware. To support experimentation, we develop a software simulator for PIFO trees, and we present case studies illustrating its behavior on standard and custom algorithms.
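The scheduling primitive at the heart of this work can be sketched in a few lines (a toy sketch; the class name and ranks are invented, and a real PIFO tree composes such queues according to a topology):

```python
import heapq
from itertools import count

# A PIFO (push-in, first-out) queue: push inserts an element at an
# arbitrary rank, pop always releases the lowest-ranked element. A
# sequence number breaks ties so equal-rank elements leave in FIFO
# order, matching the "first-out" half of the name.

class PIFO:
    def __init__(self):
        self._heap = []
        self._seq = count()

    def push(self, item, rank):
        heapq.heappush(self._heap, (rank, next(self._seq), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

# Strict priority falls out directly: lower rank leaves first.
q = PIFO()
q.push("bulk", rank=2)
q.push("voice", rank=0)
q.push("video", rank=1)
order = [q.pop() for _ in range(3)]
```

Schemes like weighted fair queueing arise from the same primitive by computing ranks differently at push time, which is what makes the abstraction programmable.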

91. 10 October Mina Alishahi Privacy-preserving data analysis abstract slides

Privacy-preserving data analysis
Mina Alishahi

Privacy and fairness are foundational principles in the realm of trustworthy AI. In the context of machine learning, privacy forms the theoretical and methodological bedrock, facilitating the development of algorithms and methodologies that empower data insights while upholding the rights and expectations of data subjects. Fairness, on the other hand, represents equity and justice, ensuring that AI algorithms treat all individuals impartially and without discrimination. In a world where AI plays a pivotal role in decisions spanning from loans to hiring, fairness ensures that these decisions remain unbiased, transparent, and just. In this presentation, we delve into the integration of privacy in clustering algorithms and feature selection, and we explore emerging avenues in privacy-preserving machine learning focusing on the aspect of fairness.
90. 4 July Parsa Karimi The AUTOLINK project abstract

The AUTOLINK project
Parsa Karimi

In this presentation, I will introduce myself, touching upon my academic and industrial background. Following that, I will discuss the role I will occupy for the next few years: as a PhD student in the AUTOLINK project. I will discuss the overall project goals and objectives, and give some initial ideas of how Testar can be of value in this project. Finally, I will focus on the details of my contributions to the project and discuss some initial ideas.
89. 20 June Niels Doorn Enhancing Software Testing Education abstract slides

Understanding the Sensemaking Process in Test Case Design: Enhancing Software Testing Education
Niels Doorn

Software testing is a widely used technique for quality assurance in the software industry. However, in computer science education, software testing is often neglected, and students struggle to effectively test their software. Teaching software testing is challenging, as it requires students to allocate multiple cognitive resources simultaneously. Despite various attempts to address this issue, progress in improving software testing education has been limited. To enhance pedagogical approaches in software testing, it is crucial to gain a deeper understanding of the sensemaking process that occurs when students design test cases. In an initial exploratory study, we identified four distinct sensemaking approaches employed by students during test model creation. Building upon these findings, we conducted a follow-up study with 50 students from a large university in Spain. In this study, we provided the participants with a specialized web-based tool for modeling test cases. They were tasked with creating test models based on given descriptions of test problems. We evaluated the fit between the models and the test problems, examined the sensemaking processes employed by students, and gathered their perspectives on the assignment. To capture comprehensive data, we collected textual, graphical, and video data, which were analyzed using an iterative inductive analysis process. The insights gained from our study shed light on the sensemaking processes involved in test case modeling. We refined our previous findings and identified new sensemaking approaches. These results have significant implications for influencing the sensemaking process in software testing education. By addressing potential misconceptions and fostering desired mental models for test case design, we can improve the effectiveness of software testing education. Our findings provide a foundation for further research and exploration in this domain. 
By gaining a deeper understanding of the sensemaking process, we can develop interventions and pedagogical strategies to enhance software testing education and equip students with the necessary skills for effective software testing.
88. 13 June Lianne Hufkens Grammar-based action selection rules for scriptless testing abstract slides

Grammar-based action selection rules for scriptless testing
Lianne Hufkens

Scriptless testing at the GUI level involves generating test sequences on the fly. These test sequences consist of user interactions on the GUI. The creation of these sequences is based on action selection. The most commonly used action selection rules are based on stochastic methods. This presentation covers a new approach based on a grammar that allows for the specification of action selection rules, which enables mimicking user behaviour when interacting with web applications. While grammars have been used in software testing to generate input data for test cases, this approach utilizes grammars to specify action selection rules to generate test sequences. The effectiveness of this approach has been evaluated by applying it to filling web forms when randomly testing with the TESTAR tool.
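The idea of deriving test sequences from a grammar can be sketched as follows (a toy illustration with an invented grammar; TESTAR's actual rule language is richer):

```python
import random

# Nonterminals expand via randomly chosen productions until only
# concrete GUI actions remain, yielding a test sequence that mimics
# user behaviour such as form-filling.

GRAMMAR = {
    "session": [["fill_form", "submit"]],
    "fill_form": [["type_name"], ["type_name", "type_email"]],
}

def generate(symbol, rng):
    """Expand `symbol` into a flat sequence of concrete actions."""
    if symbol not in GRAMMAR:           # terminal: a concrete action
        return [symbol]
    rule = rng.choice(GRAMMAR[symbol])  # pick a production at random
    return [a for part in rule for a in generate(part, rng)]

seq = generate("session", random.Random(0))
```

Unlike purely stochastic action selection, every generated sequence respects the grammar's ordering constraints (here: a form is always filled before it is submitted).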
87. 23 May Joshua & Tobias Veni interview Thunderdome abstract

Veni interview Thunderdome
Joshua & Tobias

Joshua and Tobias will both present their 5-minute presentations and welcome excruciating audience scrutiny in a 15-minute interview round, all in order to help prepare them for their Veni interviews.
86. 9 May Daniel Stanley Tan Mask-controlled generative adversarial network for image retargeting abstract slides

Mask-controlled generative adversarial network for image retargeting
Daniel Stanley Tan

Image retargeting aims to resize a photograph without distorting its content. Recent solutions leverage generative adversarial networks (GANs) to learn an image's internal distribution. Patches are then replicated and generated to fill the target aspect ratio to preserve the internal distribution. Although these approaches are somewhat successful, they do not have a sense of image semantics or the image's contents, causing semantic errors during generation (e.g., a person with two heads). Our model addresses this by allowing user intervention. The users can preserve the objects they desire through user-defined masks. Our model enforces the masked object to appear at the user-defined location. It also utilizes a de-association regularizer to loosen the association of the selected object with its surroundings. Our method prevents object distortion and enables the user to remove or relocate the object from the input image while allowing the user to set its target size.
85. 18 April Jesse Heyninck Algebraic foundations of hybrid artificial intelligence abstract

Algebraic foundations of hybrid artificial intelligence
Jesse Heyninck

For the robustness and trustability of AI, the presence of an internal model of the world as the basis of reasoning and decision making is essential. For example, the absence of such a model is arguably the root of many reported problems of recent show ponies of AI such as ChatGPT and BERT. These approaches have been dubbed model-agnostic AI, as they essentially leverage computation and large datasets to memorize training examples and interpolate patterns within these datasets to very similar cases without any notion of a model of the studied domain. This approach is often contrasted with model-based or symbolic AI, where a formal representation of knowledge is at the core of any formalism. A lot of the research in model-based AI is, not surprisingly, dependent on the modelling language. What this means is that many important advances made in this domain have stayed confined to specific formalisms. Examples include important contributions to the scalability, dynamism and integration of sub-symbolic approaches. I will present a proposal for a top-down methodology for combining symbolic with sub-symbolic AI. This gives a general answer to the challenge of how to combine data-driven machine-learning approaches in AI with knowledge-based approaches while retaining the strength of these complementary methodologies. In order to ensure high-quality AI-components, we will also develop techniques to ensure scalability and explainability.
84. 28 March Tobias Kappé Completeness and the Finite Model Property for Kleene Algebra, Reconsidered abstract slides

Completeness and the Finite Model Property for Kleene Algebra, Reconsidered
Tobias Kappé

Kleene Algebra (KA) is the algebra of regular expressions. Central to the study of KA is Kozen's (1994) completeness result, which says that any equivalence valid in the language model of KA follows from the axioms of KA. Also of interest is the finite model property (FMP), which says that false equivalences always have a finite counterexample. Palka (2005) showed that, for KA, the FMP is equivalent to completeness.
We provide a unified and elementary proof of both properties. In contrast with earlier completeness proofs, this proof does not rely on minimality or bisimilarity techniques for deterministic automata. Instead, our approach avoids deterministic automata altogether, and uses Antimirov's derivatives and the well-known transition monoid construction.
Our results are fully verified in the Coq proof assistant.
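Antimirov's partial derivatives can be illustrated concretely (a toy Python sketch with an invented tuple encoding of regular expressions; the actual development is carried out and verified in Coq):

```python
# Regular expressions as tuples: ("0",) empty, ("1",) epsilon,
# ("c", a) a letter, ("+", e, f) union, (".", e, f) concatenation,
# ("*", e) star.

def nullable(e):
    """Does e accept the empty word?"""
    tag = e[0]
    if tag in ("0", "c"):
        return False
    if tag == "+":
        return nullable(e[1]) or nullable(e[2])
    if tag == ".":
        return nullable(e[1]) and nullable(e[2])
    return True  # "1" and "*"

def pderiv(a, e):
    """Partial derivatives of e by letter a: a *set* of expressions
    whose languages jointly form the left quotient of e by a."""
    tag = e[0]
    if tag in ("0", "1"):
        return set()
    if tag == "c":
        return {("1",)} if e[1] == a else set()
    if tag == "+":
        return pderiv(a, e[1]) | pderiv(a, e[2])
    if tag == ".":
        ds = {(".", d, e[2]) for d in pderiv(a, e[1])}
        return ds | pderiv(a, e[2]) if nullable(e[1]) else ds
    return {(".", d, e) for d in pderiv(a, e[1])}  # "*"

def matches(e, word):
    """Run the nondeterministic derivative automaton on a word."""
    states = {e}
    for a in word:
        states = set().union(*(pderiv(a, s) for s in states))
    return any(nullable(s) for s in states)

# (a|b)*a: all words over {a, b} ending in 'a'.
e = (".", ("*", ("+", ("c", "a"), ("c", "b"))), ("c", "a"))
```

Because partial derivatives return sets of expressions rather than a single one, they build a nondeterministic automaton directly, which is why the proof can avoid determinization and minimality arguments.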

83. 14 March Simone Tummers (OU, Health Psychology) Investigating behaviour change using a Bayesian network approach abstract slides

Investigating behaviour change using a Bayesian network approach
Simone Tummers

BACKGROUND: Physical activity (PA) is known to be beneficial for health, but adherence to international PA guidelines is low across different subpopulations. Interventions have been designed to stimulate PA of different target groups by influencing relevant psycho-social determinants, essentially based on a combination of the Integrated Model for Change, the Theory of Planned Behaviour, its successor the Reasoned Action Approach, and the self-determination theory. The current study investigates the pathways through which interventions influence PA. Further, gender differences in pathways of change are studied.

METHODS: An integrated dataset of five different randomised controlled trial intervention studies is analysed by estimating a Bayesian network. The data include measurements, at baseline and at 3, 6 (short-term), and 12 (long-term) months after the baseline, of important socio-cognitive determinants of PA, demographic factors, and PA outcomes. A fragment is extracted from the Bayesian network consisting of paths between the intervention variable, determinants, and short- and long-term PA outcomes. For each relationship between variables, a stability indicator and its mutual information are computed. Such a model is estimated for the full dataset, and in addition such models are estimated based only on male and female participants' data to investigate gender differences.

RESULTS: The general model (for the full dataset) shows complex paths, indicating that the intervention affects short-term PA via the direct determinants of intention and habit and that self-efficacy, attitude, intrinsic motivation, social influence concepts, planning and commitment have an indirect influence. The model also shows how effects are maintained in the long term and that previous PA behaviour, intention and attitude pros are direct determinants of long-term PA. The gender-specific models show similarities as well as important differences between the structures of paths for the male and female subpopulations. For both subpopulations, intention and habit play an important role for short-term effects and maintenance of effects in the long term. Differences are found in the role of self-efficacy in paths of behaviour change and in the fact that attitude is relevant for males, whereas planning plays a crucial role for females. The average of these differences in subpopulation mechanisms appears to be presented in the general model.

CONCLUSIONS: While previous research provided limited insight into how interventions influence PA through relevant determinants, the Bayesian network analyses show the relevance of the determinants mentioned by the theoretical framework. The model clarifies the role that different determinants play, especially in interaction with each other. The Bayesian network provides new knowledge about the complex working mechanism of interventions to change PA by giving an insightful overview of influencing paths. Furthermore, by presenting subpopulation-specific networks, the difference between the influence structure of males and females is illustrated. These new insights can be used to improve interventions in order to enhance their effects. To accomplish this, we have developed a new methodology based on a Bayesian network analysis which may be applicable in various other studies.
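The per-edge mutual information mentioned in the methods can be sketched for discrete variables (a minimal illustration; the study itself used a Bayesian-network toolchain, and these toy observations are invented):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete
    variables, estimated from paired observations."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum over observed joint outcomes: p(x,y) * log2(p(x,y) / (p(x)p(y))),
    # rewritten with counts so only one division by n is needed per term.
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Perfectly dependent binary variables share one bit of information;
# independent ones share none.
dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A high mutual information between a determinant and a PA outcome flags a strong (not necessarily causal) statistical link along an edge of the network.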
82. 21 February Ton Poppe GraphCase: Structural Node Embeddings beyond Node Properties abstract slides

GraphCase: Structural Node Embeddings beyond Node Properties
Ton Poppe

Graph embeddings typically focus on the graph structure and thereby miss out on other aspects of graphs, such as edge attributes. However, many real-world settings can be modeled as graphs with edge attributes, e.g., roles in an organizational hierarchy or bank transfers in the financial industry. In this paper, we present GraphCase, a novel role-oriented node embedding algorithm that is inductive and scalable for attributed directed graphs. GraphCase effectively includes edge information in node embeddings and is designed to capture the structural similarity of nodes. We test GraphCase in an artificial setting (barbell and ring graphs) as well as apply GraphCase to real-world data (Enron e-mail dataset and BZR chemical compounds dataset) showing its superior performance. The derived abstract embeddings can be made transparent with local sensitivity analysis allowing for explainable AI in downstream tasks.
81. 12 January Alaaeddin Swidan Hedy in Arabic abstract slides

Hedy in Arabic
Alaaeddin Swidan

Hedy is a gradual programming language that aims to teach kids programming by building up concepts and syntax step by step. Hedy was launched in early 2020 and since then over 2 million Hedy programs have been created worldwide. From its inception Hedy was multilingual in its UI, for example in buttons and error messages. But to be truly inclusive, kids must be able to program in their own natural language! Thus, since early 2022, Hedy also supports keywords in multiple languages, including Dutch, French, Spanish, Hindi and Arabic. In this presentation we provide an overview of the challenges we met and the changes that were necessary to make Hedy work multilingually, with a special emphasis on Arabic, which included tricky changes to the website, the parser, and the syntax highlighter.
80. 12 January Murat Firat Embedding ML models into Linear Optimization abstract slides

Embedding ML models into Linear Optimization
Murat Firat

In this presentation, we describe a case study on embedding machine learning into linear optimization in order to optimize expected project scheduling.
 

2022

79. 13 December Marloes Venema (RU) Attaining Basically Everything in Attribute-Based Encryption -- Simplifying the Design of Practical Schemes via Pair Encodings abstract slides

Attaining Basically Everything in Attribute-Based Encryption -- Simplifying the Design of Practical Schemes via Pair Encodings
Marloes Venema (RU)

Attribute-based encryption (ABE) cryptographically implements fine-grained access control on data. In particular, data can be stored by an entity that is not necessarily trusted to enforce access control. Instead, access control can be externally enforced by a trusted authority. For this reason, pairing-based ABE has been considered for various use cases, e.g., cloud settings, which require the ABE scheme to satisfy several desirable properties. Additionally, the scheme should be efficient enough for the setting that is considered. However, measuring the efficiency of schemes is difficult, and may provide an inaccurate view on which schemes are efficient. Furthermore, schemes that can satisfy all desirable properties may be generally less efficient than schemes that satisfy fewer properties. In this talk, I will give an overview of some use cases and the properties that are desirable. I will also give an overview of the pair encodings framework, which is a common abstraction of many pairing-based ABE schemes. We use it to set up a framework, called ABE Squared, for analyzing the efficiency of schemes more accurately than was previously done, and eventually, to design schemes that can satisfy all properties with the required efficiency.
78. 15 November Tobias Kappé Reasoning about program equivalence using (Prob)GKAT abstract slides

Reasoning about program equivalence using (Prob)GKAT
Tobias Kappé

Which refactoring operations never change the meaning of an imperative program? Is there a set of equivalences that can be used to prove such correctness claims? Can we automatically decide whether a certain equivalence is valid? This talk will give a tour of Guarded Kleene Algebra with Tests (GKAT), a (co)algebraic framework that presents positive answers to these questions, as well as a probabilistic extension called ProbGKAT.
This is joint work with Nate Foster, Dexter Kozen, Justin Hsu, Wojciech Rozowski, Todd Schmid, Steffen Smolka, and Alexandra Silva.
77. 1 November Hugo Jonker Are some prices more equal than others? Spoiler: yes. abstract slides

Are some prices more equal than others? Spoiler: yes.
Hugo Jonker

Is the price you're seeing fair? That's a perennial question. One factor that may undermine price fairness is price differentiation -- asking different prices for the same item. This need not violate fairness, e.g., prices go up when the item becomes more scarce (e.g., flight tickets).
Price differentiation neither requires nor enforces fairness. Indeed, it is nowadays common for online shops to offer different views on their items (e.g., desktop site, mobile site, country-specific sites, etc.). Whether these preserve price fairness has hitherto not been investigated. Our goal is to evaluate whether using a different platform affects item price.
We devise a method to detect such platform-based price differentiation and perform a case study of platform effects on flight pricing. We acquire pricing data of flights by synchronised scraping of multiple platforms (mobile app, desktop site for different countries, mobile site) over 45 days, gathering extra item attributes to be able to determine item equivalence across different platforms. We analyse the collected data in various ways, comparing between platforms but also providing a holistic view on the acquired data.
Our results show no differences between platforms for most investigated companies, though for some, using a mobile phone is frequently more expensive.

76. 18 October Diego Velázquez (UNAM) Pattern models: A Dynamic Epistemic Logic For Distributed System Analysis abstract slides

Pattern models: A Dynamic Epistemic Logic For Distributed System Analysis
Diego Velázquez (UNAM)

We introduce pattern models, a dynamic epistemic logic designed for analyzing distributed systems. First, we present a version of pattern models in which the full-information protocol, deeply studied in distributed computability, is fixed in the product definition of pattern models. Later, we parametrize this logic to add the capability to model the dynamics of arbitrary protocols. We give a systematic construction of pattern models for a large variety of distributed-computing models called dynamic-network models. For a proper subclass of dynamic-network models called oblivious, the communication pattern model remains the same throughout the computation, so that an oblivious distributed-computing model can be described using constant space.
75. 4 October Ella Roubtsova Constraint Formalization for Automated Assessment of Enterprise Models abstract slides

Constraint Formalization for Automated Assessment of Enterprise Models
Ella Roubtsova

Enterprises always do their business within some restrictions. In a team of enterprise architects, the restrictions are transformed into modelling conventions and the corresponding modelling constraints that should be consistently applied across all enterprise models. This paper presents an approach for refining and formalizing modelling conventions into modelling constraints and using them for assessment of enterprise models by a software component called ArchiChecker. What is specific to the proposed approach is that the modelling conventions are first visualized and formalized using the types of elements and relationships of the ArchiMate modelling language, which is also used for modelling enterprise views. The ArchiMate elements and relationships serve as types to formulate constraints. The elements and relationships in an ArchiMate model are instances of the ArchiMate elements and relationships. Using these types and instances, the ArchiChecker automatically generates lists of violations of modelling conventions in the enterprise models. Each violation shows how a specific enterprise view deviates from a given modelling convention. The paper reports a case study applying the proposed approach to enterprise modelling views and modelling conventions used in a medical center. The case study is used to discuss the added value of formalization and automated assessment of modelling constraints in enterprise modelling.
74. 20 September Stijn de Gouw Verifying JDK's identity hashmap implementation abstract slides

Verifying JDK's identity hashmap implementation
Stijn de Gouw

Hash maps are a common and important data structure in efficient algorithm implementations. Despite their widespread use, real-world implementations are not regularly verified. We present the first such case study: the IdentityHashMap class in the Java JDK. We specified its behavior using the Java Modeling Language (JML) and proved correctness for the main insertion and lookup methods with KeY, a semi-interactive theorem prover for JML-annotated Java programs. Furthermore, we report how unit testing and bounded model checking can be leveraged to find a suitable specification more quickly. We also investigated where the bottlenecks in the verification of hash maps lie for KeY by comparing the required automatic proof effort for different hash map implementations, and draw conclusions for the choice of hash map implementations regarding their verifiability.
73. 24 June Ashish Sai Promoting rigour in Blockchain's energy & environmental footprint research abstract slides

Promoting rigour in Blockchain's energy & environmental footprint research
Ashish Sai

Harald made the national news with his research on the energy consumption of bitcoin. Various other researchers also investigated this issue, ending up with wildly different estimates. In this talk, I investigate the underlying assumptions and discuss how some assumptions strongly impact these estimates: small changes in an assumption leading to huge changes in the resulting estimate. Finally, I propose a set of guidelines to promote rigour in scientific estimates of environmental footprint research.
72. 24 June Daniel Stanley Tan Anomaly detection, image forensics, and creative AI abstract slides

Anomaly detection, image forensics and creative AI
Daniel Stanley Tan

In this talk, I present an overview of my research in three AI-related areas: anomaly detection, image forensics, and creative AI.
Anomaly detection
I will discuss defect detection, where small changes from the expected baseline in images need to be automatically recognised. A second topic in this vein is the automated detection of crop pests and diseases from images taken by drones. This would allow large farms to far better ascertain the current health status of their crops and take appropriate targeted measures.
Image forensics
We dive into detection of faked images. The underlying principle is that for each image, its processing pipeline results in a spatial signature. When combining (parts of) multiple images, these spatial signatures are misaligned with respect to each other, indicating a forgery.
Creative AI
We discuss ways to learn the style of images (e.g., Van Gogh's "Starry Night") and then apply that to other images (e.g., photos of your holiday).
71. 14 June Jesse Heyninck Lexicographic Entailment, Syntax Splitting and the Drowning Problem abstract slides

Lexicographic Entailment, Syntax Splitting and the Drowning Problem
Jesse Heyninck

Lexicographic inference [Lehmann, 1995] is a well-known and popular approach to reasoning with non-monotonic conditionals. It is a logic of very high quality, as it extends rational closure and avoids the so-called drowning problem. It seems, however, that this high quality comes at a cost, as reasoning on the basis of lexicographic inference is of high computational complexity. In this paper, we show that lexicographic inference satisfies syntax splitting [Kern-Isberner et al., 2020], which means that we can restrict our attention to parts of the belief base that share atoms with a given query, thus seriously restricting the computational costs for many concrete queries. Furthermore, we make some observations on the relationship between c-representations and lexicographic inference, and reflect on the relation between syntax splitting and the drowning problem.
70. 31 May Stefano Schivo Convinces biologists that formal verification is actually a thing, hilarity ensues abstract slides

Convinces biologists that formal verification is actually a thing, hilarity ensues
Stefano Schivo

We often hear people talk about "Multidisciplinary Research": regrettably, sometimes this leads us to think "buzz-word". But what happens in practice when people from different disciplines (try to) collaborate? What does this look like? What kind of challenges and rewards can we expect? With this talk I want to provide some reflection points on multidisciplinary collaborations. And maybe even give you some confidence to try working with "outsiders" yourself, who knows? This presentation is centered around my personal experience in collaborating with biologists. I will show off some interesting results, and reflect on what we learned from each other during this time. Potential generic advice and considerations are mainly left to the public.
69. 17 May Freek Verbeek Formally Verified Lifting of C-compiled x86-64 Binaries abstract slides

Formally Verified Lifting of C-compiled x86-64 Binaries
Freek Verbeek

Lifting binaries to a higher-level representation is an essential step for decompilation, binary verification, patching and security analysis. In this paper, we present the first approach to provably overapproximative x86-64 binary lifting. A stripped binary is verified for certain sanity properties such as return address integrity and calling convention adherence. Establishing these properties allows the binary to be lifted to a representation that contains an overapproximation of all possible execution paths of the binary. The lifted representation contains disassembled instructions, reconstructed control flow, invariants and proof obligations that are sufficient to prove the sanity properties as well as correctness of the lifted representation. We apply this approach to Linux Foundation and Intel's Xen hypervisor, covering about 400K instructions. This demonstrates that our approach is the first provably overapproximative binary lifting approach that scales to commercial off-the-shelf systems. The lifted representation is exportable to the Isabelle/HOL theorem prover, allowing formal verification of its correctness. If our technique succeeds and the proof obligations are proven true, then -- under the generated assumptions -- the lifted representation is correct.
68. 10 May Harald Vranken Detection of Botnets via DGA-Generated Domain Names abstract slides

Detection of Botnets via DGA-Generated Domain Names
Harald Vranken

Botnets often apply domain name generation algorithms (DGAs) to evade detection by generating large numbers of pseudo-random domain names, of which only a few are registered by cybercriminals. In this paper, we address how DGA-generated domain names can be detected by means of machine learning and deep learning. We first present an extensive literature review of recent prior work in which machine learning and deep learning have been applied for detecting DGA-generated domain names. We observe that a common methodology is still missing and that, due to the use of different datasets, experimental results can hardly be compared. We next propose the use of TF-IDF to measure the frequencies of the most relevant n-grams in domain names, and use these as features in learning algorithms. We perform experiments with various machine-learning and deep-learning models using TF-IDF features, of which a deep MLP model yields the best results. For comparison, we also apply an LSTM model with an embedding layer to convert domain names from a sequence of characters into a vector representation. The performance of our LSTM and MLP models is rather similar, achieving AUCs of 0.994 and 0.995, and average F1-scores of 0.907 and 0.891, respectively.
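The TF-IDF feature extraction described above can be sketched in a few lines. The domain names and the fixed n-gram length below are illustrative assumptions; real pipelines typically combine several n-gram sizes and use smoothed IDF variants:

```python
from collections import Counter
from math import log

def char_ngrams(s, n=3):
    """All overlapping character n-grams of a string."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def tfidf(domains, n=3):
    """Per-domain TF-IDF weights of character n-grams (minimal, unsmoothed sketch)."""
    docs = [Counter(char_ngrams(d, n)) for d in domains]
    df = Counter(g for doc in docs for g in doc)  # document frequency per n-gram
    ndocs = len(docs)
    out = []
    for doc in docs:
        total = sum(doc.values())
        out.append({g: (c / total) * log(ndocs / df[g]) for g, c in doc.items()})
    return out

# Hypothetical examples: a readable domain vs. a DGA-looking one.
feats = tfidf(["openuniversiteit", "xq3kzv9f2p"])
```

The resulting per-domain weight dictionaries could then be turned into fixed-length vectors over the most relevant n-grams and fed to a classifier such as an MLP.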
67. 19 April Luc Edixhoven Balanced ω-regular languages abstract slides

Balanced ω-regular languages
Luc Edixhoven

Last time around I gave a talk about balanced regular languages: regular languages over an alphabet of different types of opening and closing parentheses, where the parentheses must be properly paired in some way (cf. the Dyck language). I showed how we can express precisely all these languages by enriching regular expressions with a parameterised shuffle operator. This time I will talk about how we adapted our previous constructions and proofs to languages of infinite words, formally known as omega-languages. While undoubtedly fun, infinity (rather expectedly) introduces additional complications. The visual analogy using jigsaw puzzle pieces makes a triumphant return.
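For readers unfamiliar with the starting point, the classic Dyck condition over several types of parentheses (strict nesting, a special case of proper pairing) can be checked as follows; the parameterised shuffle from the talk admits more general pairings than this small sketch covers:

```python
def is_balanced(word, pairs=(("(", ")"), ("[", "]"), ("{", "}"))):
    """Check strict Dyck-style nesting over several parenthesis types.
    Non-parenthesis characters are ignored."""
    opens = dict(pairs)
    closers = set(opens.values())
    stack = []
    for ch in word:
        if ch in opens:
            stack.append(opens[ch])          # remember the required closer
        elif ch in closers:
            if not stack or stack.pop() != ch:
                return False                 # wrong closer, or nothing open
    return not stack                         # everything opened was closed
```

For example, `([]{})` is balanced, while `([)]` is not under strict nesting (though interleavings like it are exactly what shuffle-based definitions can permit).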
66. 15 March Sung-Shik Jongmans A Predicate Transformer for Choreographies abstract slides

A Predicate Transformer for Choreographies
Sung-Shik Jongmans

Construction and analysis of distributed systems is difficult; choreographic programming is a deadlock-freedom-by-construction approach to simplify it. In this talk, I present a new theory of choreographic programming. It supports for the first time: construction of distributed systems that require decentralised decision making (i.e., if/while-statements with multiparty conditions); analysis of distributed systems to provide not only deadlock freedom but also functional correctness (i.e., pre/postcondition reasoning). Both contributions are enabled by a single new technique, namely a predicate transformer for choreographies. This is joint work with Petra van den Bos (University of Twente); our paper has been accepted for publication in ESOP 2022.
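To illustrate what a predicate transformer is, without any of the choreographic or multiparty machinery of the talk, the following sketch defines the weakest precondition semantically for deterministic, terminating statements; all names here are hypothetical:

```python
# Statements are modelled as deterministic state transformers (dict -> dict),
# predicates as functions from states to booleans.
def wp(stmt, post):
    """Weakest precondition: wp(S, Q) holds in a state exactly when Q holds
    after running S from that state (deterministic, terminating S)."""
    return lambda state: post(stmt(state))

def assign(var, expr):
    """The statement 'var := expr(state)'."""
    return lambda st: {**st, var: expr(st)}

# Example: S is x := x + 1 and Q is x > 0, so wp(S, Q) is x + 1 > 0.
S = assign("x", lambda st: st["x"] + 1)
Q = lambda st: st["x"] > 0
pre = wp(S, Q)
```

Pre/postcondition reasoning as in the talk then amounts to checking that a desired precondition implies the computed `wp`.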
65. 22 February Frouke Hermens Predicting choice from eye movements abstract slides

Predicting choice from eye movements
Frouke Hermens

This talk will discuss ongoing work on whether eye movements reveal a decision maker's final decision. The dataset under consideration examined decisions on whether or not to get screened for Chlamydia (for women, a disease with few symptoms, but with a risk of serious complications when undetected). The set-up is relevant for current acute questions such as whether to get a booster vaccine against Covid-19. Various machine learning (classification) models were fitted on information about what was shown on the screen, the regions fixated, and what was shown in the regions fixated. Classification performance was not improved by including eye tracking data and was at a relatively low 71% correct, only marginally above the base rate of 69.5% accept decisions. I would like to discuss what these results mean for the publication of machine learning outcomes (only publish when you achieve good classification?) and how many attempts at fitting models are needed before concluding there may be insufficient information in the data for accurate classification.
64. 11 January Hugo Jonker Ethics in CS research: a tale of two failures abstract slides

Ethics in CS research: a tale of two failures
Hugo Jonker

In 2021, two different CS research efforts stirred up quite some hullabaloo for what was considered to be ethics failures on part of the researchers and their institutions.

In this talk, I discuss both cases (the "Hypocrite Commits" S&P'21 paper and the Princeton & Radboud CCPA study), identifying the aspects under dispute thanks to the power of hindsight. Moreover, I (briefly) discuss the 1978 US Belmont Report on ethical guidelines for Human Subject research, and the Menlo report describing an ethical framework for research involving ICT. Finally, I discuss deception research, a type of research where subjects are deliberately misinformed about the nature of the research. In other fields, this type of research has already been recognised as requiring extra care for the subjects, and as something preferably to be avoided.

 

2021

63. 2 November Mo Liu, University of Lorraine Expressivity of some versions of APAL abstract slides

Expressivity of some versions of APAL
Mo Liu, University of Lorraine

Arbitrary public announcement logic (APAL) is a logic of change of knowledge with modalities representing quantification over announcements. It extends public announcement logic (PAL). We present three rather different versions of APAL: FSAPAL only quantifies over announcements containing a finite subset of all propositional variables. SCAPAL only quantifies over announcements containing variables occurring in the formula bound by the quantifier. IPAL quantifies over announcements implying a given formula. We determine the relative expressivity of FSAPAL, SCAPAL and IPAL in relation to APAL and PAL.
62. 2 November Marta Gawek, University of Lorraine Dynamic Epistemic Separation Logic abstract slides

Dynamic Epistemic Separation Logic
Marta Gawek, University of Lorraine

We investigate extensions of separation logic with epistemic and dynamic epistemic modalities. Separation logics are based on the intuitionistic logic of bunched implications (BI) or its classical counterpart Boolean BI. These logics combine additive and multiplicative connectives in the language, expressing the notions of resource composition and resource decomposition. Dynamic Epistemic Separation Logic (DESL) is a generalization of the Public Announcement Separation Logic (PASL) of Courtault et al. We present the syntax and semantics of DESL as well as reduction axioms for the elimination of dynamic modalities.
61. 2 November Krisztina Fruzsa, TU Wien Knowledge-based analysis of the Firing Rebels problem abstract slides

Knowledge-based analysis of the Firing Rebels problem
Krisztina Fruzsa, TU Wien

We provide an epistemic analysis of a simple variant of the fundamental consistent broadcasting primitive for byzantine fault-tolerant asynchronous distributed systems. Our Firing Rebels with Relay (FRR) primitive enables agents in the system to trigger an action (FIRE) at all correct agents, in an all-or-nothing fashion. By using our epistemic reasoning framework developed for byzantine fault-tolerant distributed systems, we discover the necessary and sufficient state of knowledge that needs to be acquired by an individual correct agent in order to FIRE according to the specification of the problem. It involves eventual common hope (a modality related to belief), which we show to be attained already by achieving eventual mutual hope in the case of FRR.
60. 5 October Hugo Jonker HLISA: a Human-Like Simulation API for webpage interaction abstract slides

HLISA: a Human-Like Simulation API for webpage interaction
Hugo Jonker

Automated browsers (web bots) are an invaluable tool for studying the web. However, research has shown that web bots can be distinguished from regular browsers, and that they may be served different content as a consequence. This undermines their utility as a measurement tool. So far, three methods have been used to detect web bots: browser fingerprints, order of site traversal, and aspects of page interaction. While site traversal depends on the study being executed, the other two aspects can be controlled in a generic fashion. The identifiability of web bot fingerprints has been extensively studied, but how to alter the fingerprint has received less attention. In this paper, we study which method of altering the fingerprint incurs the fewest side effects. Secondly, we provide an initial investigation of how the interaction API of Selenium differs from human interaction. We incorporate these results into HLISA, an API that simulates interaction like humans. Finally, we discuss the conceptual arms race between simulators and detectors and find that, conceptually, detecting HLISA requires modelling human interaction.
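One small ingredient of simulating human-like interaction is timing. The sketch below generates per-character typing delays from a clipped normal distribution; the distribution and its parameters are assumptions for illustration, not HLISA's actual timing model:

```python
import random

def humanlike_delays(text, rng=None, mean=0.12, sd=0.04, floor=0.03):
    """One delay (in seconds) per character, drawn from a normal distribution
    clipped from below. Parameters are illustrative assumptions only; real
    human inter-keystroke timing is more structured (e.g. key-pair dependent)."""
    rng = rng or random.Random()
    return [max(floor, rng.gauss(mean, sd)) for _ in text]

# Deterministic example with a fixed seed.
delays = humanlike_delays("hello world", random.Random(42))
```

A bot driver could sleep for each generated delay between keystrokes instead of sending the whole string at once, which is one of the tell-tale differences between raw automation APIs and human input.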
59. 7 September Fabian van den Broek Getting attribute-based signatures in the legal framework abstract slides

Getting attribute-based signatures in the legal framework
Fabian van den Broek

Digital signatures were invented in the 70s. In The Netherlands, digital signatures have been recognised by law since the 2000s. A special variant of digital signatures, attribute-based signatures, was first proposed in 2010. Since then, various studies have explored the possibilities of attribute signatures and enriched their application area. For example, attribute-based signatures are incorporated into the IRMA framework to enable non-interactive proofs of attributes. This has brought attribute-based signatures (literally) into the hands of many ordinary citizens. However, to date, attribute-based signatures are not specifically mentioned in the law. In this presentation, we discuss the advantages and limits of using attribute-based signatures by means of several case studies. Based on these case studies, we analyze whether and to what extent attribute-based signatures fit within current legal regulations. We consider both regular attribute-based signatures and anonymous attribute-based signatures. The latter are of particular interest, as the law defines signatures as containing an identifying aspect -- which is absent in these anonymous signatures.
58. 7 July Gideon Maillette de Buy Wenniger Efficient deep learning using richer context, with applications to handwriting recognition and scholarly document quality prediction abstract slides

Efficient deep learning using richer context, with applications to handwriting recognition and scholarly document quality prediction
Gideon Maillette de Buy Wenniger

Using richer context is one of the driving forces behind many advances in the performance of deep learning models, as it was for other machine learning approaches before the advent of deep learning. My interest in richer context for machine learning models goes back to my PhD [1,2,3], in which reordering context labels were applied to improve translations produced by a hierarchical statistical machine translation model. More recently I worked on neural handwriting recognition [4], in which multi-dimensional long short-term memories (MDLSTMs) are applied to encode images while considering the full context of all other pixels in the image. In this application computational efficiency is crucial, and I developed various new strategies to increase it. The second application I will cover is scholarly document quality prediction. In this task, some property indicative of or correlated with quality is predicted from the textual and/or visual contents of a scholarly document alone. This property can be 1) the acceptance/rejection of a scientific paper in a journal or major conference, or 2) the log of the number of citations. The latter property, while imperfect as a notion of quality, is attractive because of its availability for huge amounts of scientific publications. I will discuss two new approaches presented in my recent work that leverage ways to increase context for quality prediction models. The first approach [5] does so by incorporating document structure information into the prediction, and the second approach [6] by full-text encoding, facilitated by the use of a BERT+GRU model in combination with chunking of the full document text. Using these approaches, new state-of-the-art results were achieved for the accept/reject prediction and log-number-of-citations prediction tasks.
I will end my talk with an outlook on my current ongoing work to make scholarly document quality prediction more explainable using saliency maps and other explainable deep learning methods. The work on hierarchical statistical machine translation, only mentioned briefly, is joint work together with my former PhD supervisor Khalil Sima'an. The work on handwriting recognition is joint work with Lambert Schomaker and Andy Way. Finally, the work on scholarly document quality prediction is joint work with Thomas van Dongen, Lambert Schomaker, Elerei Aedmaa, Herbert Kruitbosch, and Edwin Valentijn. See the below list of references for more information about the research.
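The chunking of the full document text mentioned above can be sketched as follows; the chunk size and overlap are illustrative values, not those used in the cited papers:

```python
def chunk_tokens(tokens, size=512, overlap=50):
    """Split a long token sequence into overlapping fixed-size chunks so that
    every token appears in at least one chunk. Chunks are at most `size` long
    and consecutive chunks share `overlap` tokens of context."""
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(1, len(tokens) - overlap), step)]

# Example: a 1000-token "document" becomes a handful of overlapping chunks.
chunks = chunk_tokens(list(range(1000)))
```

In a BERT+GRU-style pipeline, each chunk would be encoded separately by the transformer and the sequence of chunk encodings aggregated by the recurrent layer.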
57. 22 June Hans van Ditmarsch Gossip and Knowledge abstract slides

Gossip and knowledge
Hans van Ditmarsch

Optimal schedules to distribute information by one-to-one communication between nodes connected in a network have been a well-studied phenomenon in network theory since the 1970s. One can take these communicative actions to be telephone calls, and protocols to spread information this way are known as gossip protocols or epidemic protocols. A common abstraction is to call the information of each agent its secret, and to take as the goal of information dissemination that all agents know all secrets: that is the termination condition of gossip protocols. Following early investigations assuming a global scheduler, it is now typically assumed that gossip protocols are distributed in some way, where the only role of the environment is to ensure randomization. Statistical approaches to gossip have flourished since then, wherein network topology is an important parameter. In epistemic gossip protocols, an agent (node) will call another agent not because it is so instructed by a scheduler, or at random, but based on its knowledge or ignorance of the distribution of secrets over the network, and of other agents' knowledge or ignorance of that. One such protocol requires that an agent may only call another agent if it does not know the other agent's secret. Epistemic features of gossip protocols may affect their termination, the (order of complexity of the) expectation of termination, their reachability (what distributions of secrets may occur before all agents know all secrets), and so on. Variations involve agents exchanging telephone numbers in addition to secrets (which results in network expansion), or agents exchanging knowledge about secrets; we may also assume common knowledge of the protocol; further generalizations would involve multi-casts. We present a survey of distributed epistemic gossip protocols.
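The protocol mentioned above, where an agent may only call another agent if it does not know that agent's secret, can be simulated in a few lines. This is a sketch with random selection among permitted calls; real analyses consider all possible schedules:

```python
import random

def lns_gossip(n, seed=0):
    """Simulate the epistemic gossip protocol in which agent a may call
    agent b only if a does not yet know b's secret. In a call, both agents
    exchange all secrets they know. Returns the number of calls until every
    agent knows every secret, or None if no permitted call remains first
    (the protocol can terminate unsuccessfully)."""
    rng = random.Random(seed)
    secrets = [{i} for i in range(n)]  # agent i starts knowing only secret i
    calls = 0
    while any(len(s) < n for s in secrets):
        permitted = [(a, b) for a in range(n) for b in range(n)
                     if a != b and b not in secrets[a]]
        if not permitted:
            return None
        a, b = rng.choice(permitted)
        merged = secrets[a] | secrets[b]
        secrets[a], secrets[b] = merged, set(merged)
        calls += 1
    return calls
```

For two agents a single call always suffices; for larger networks, any successful execution needs at least 2n-4 calls (n >= 4), a classic result from the gossip literature.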
56. 15 June Joshua Moerman Weighted Register Automata abstract slides

Weighted Register Automata
Joshua Moerman

Since I am new at the Open Universiteit, I will first introduce myself. After that I will present recent research, which will be presented at LICS 2021. Register automata are an extension of automata that deal with data, possibly from an infinite domain. Many algorithms from classical automata theory continue to work on deterministic register automata. However, nondeterministic register automata are strictly more expressive and some properties become undecidable. Motivated by a recent result that the subclass of unambiguous register automata has decidable equivalence, we investigate weighted register automata, the common generalisation of register automata and weighted automata. These include unambiguous register automata. We show that equivalence is also decidable for these automata. In our view, the main contribution is the development of the mathematical theory of so-called orbit-finitely spanned vector spaces, on which our decision procedure is based. This is joint work with Mikołaj Bojańczyk and Bartek Klin.
55. 25 May Nico Naus Exploit Logic abstract slides

Exploit Logic
Nico Naus

Automatic exploit generation is a relatively new area of research. Work in this area aims to automate the manual and labour-intensive task of finding exploits in software. With Exploit Logic, we aim to lay the foundation for a new automatic exploit generation system. Exploit Logic formally defines the relation between exploits and the preconditions which allow them to occur. This relation is used to calculate the search space of preconditions for a specific program and exploit. This presentation is an introduction to Exploit Logic and the current state of my research.
54. 20 April Frouke Hermens Data driven group comparisons of eye fixations to dynamic stimuli abstract slides

Data driven group comparisons of eye fixations to dynamic stimuli
Frouke Hermens

Recent advances in software and hardware have allowed eye tracking to move away from static images to more ecologically relevant video streams. The analysis of eye tracking data for such dynamic stimuli, however, is not without challenges. The frame-by-frame coding of regions of interest (ROIs) is labour intensive, and computer vision techniques to automatically code such ROIs are not yet mainstream, restricting the use of such stimuli. Combined with the more general problem of defining relevant ROIs for video frames, methods are needed that facilitate data analysis. Here we present a first evaluation of an easy-to-implement data-driven method with the potential to address these issues. To test the new method, we examined the differences in eye movements of self-reported politically left- or right-wing leaning participants to video clips of left- and right-wing politicians. The results show that our method can accurately predict group membership on the basis of eye movement patterns, isolate video clips which best distinguish people on the political left-right spectrum and reveal the section of each video clip with the largest group differences. Our methodology thereby aids the understanding of group differences in gaze behaviour, and the identification of critical stimuli for follow-up studies or for use in saccade diagnosis.
53. 23 March Vincent van der Meer Deducing File History from NTFS timestamps abstract slides

Deducing File History from NTFS timestamps
Vincent van der Meer

Proper attribution is a key factor in digital forensics. In this paper, we investigate one such area of attribution: file timestamp history. This is crucial for determining whether a file could have been accessible to a suspect. In particular, we determine file history given a file's current NTFS timestamps. That is, we infer all possible sequences of file system operations which culminate in the file's current NTFS timestamps. To this end, we provide an exhaustive overview of such operations and their effect on timestamps, clarifying inconsistencies in the literature. Based upon this overview, we propose a method for reconstructing possible histories of operations, accounting for various forms of timestamp forgery. We provide an implementation of this method, which depicts possible histories graphically.
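The idea of inferring possible operation histories from timestamps can be sketched with a heavily simplified model. The operation catalogue below is hypothetical (the paper's overview is far more detailed), but it shows the shape of the search: enumerate operation sequences and keep those whose final timestamps satisfy the observed pattern.

```python
from itertools import product

# Hypothetical, heavily simplified effects of file system operations on
# the NTFS timestamps (C)reated, (M)odified, (A)ccessed; each operation
# happens at a strictly later time than the previous one.
EFFECTS = {
    "create": ("C", "M", "A"),
    "modify": ("M", "A"),
    "access": ("A",),
    "copy":   ("C", "A"),      # copying resets Created but keeps Modified
}

def possible_histories(pattern, max_len=3):
    """All operation sequences up to max_len, starting with a single
    'create', whose resulting timestamps satisfy the given pattern."""
    found = []
    for length in range(1, max_len + 1):
        for ops in product(EFFECTS, repeat=length):
            if ops[0] != "create" or "create" in ops[1:]:
                continue
            ts = {}
            for time, op in enumerate(ops):   # apply the effects in order
                for field in EFFECTS[op]:
                    ts[field] = time
            if pattern(ts):
                found.append(ops)
    return found

# Which short histories leave Created newer than Modified -- the classic
# hint that a file was copied after it was last edited?
copied_after_edit = possible_histories(lambda ts: ts["C"] > ts["M"])
```

The real method must also account for timestamp forgery, which this toy enumeration does not model.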
52. 9 March Stef Joosten Programming with relation algebras abstract slides

Programming with relation algebras
Stef Joosten

Why use relation algebra as a programming language? Well, besides being cool for using formal specifications directly for programming, it gives neat language properties such as declarative semantics, strong typing, reactive programming, and calculating with programs. And it solves real engineering problems that will help address the software crisis (Fred Brooks, The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, 1975). This presentation gives an overview of the language Ampersand, its formal foundations and its applications in industry. It contains a demonstration. We end by discussing some research challenges that are still open.
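To give a flavour of calculating with relations (plain set-based relation algebra, not Ampersand's own syntax), composition, converse, and checking a rule as an inclusion can be written directly. The example rule and relation names are our own.

```python
def compose(r, s):
    """Relation composition r;s."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}

def converse(r):
    """Relation converse r~."""
    return {(b, a) for (a, b) in r}

def holds(lhs, rhs):
    """A rule 'lhs |- rhs' holds when lhs is included in rhs."""
    return lhs <= rhs

# Example rule: owner~;owner |- I, i.e. 'owner' is univalent
# (each account relates to at most one owning person).
owner = {("acct1", "alice"), ("acct2", "bob")}
identity = {(p, p) for (_, p) in owner}
univalent = holds(compose(converse(owner), owner), identity)
```

In Ampersand, rules like this are stated declaratively and the system enforces them as invariants of the information system; here the inclusion check merely illustrates the algebra.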
51. 23 February Olivia Rodríguez Valdés Towards a testing tool that learns to test abstract slides

Towards a testing tool that learns to test
Olivia Rodríguez Valdés

GUI testing is a type of testing that verifies software as a whole through its GUI, interacting with widgets and testing the System Under Test (SUT) from the user's perspective. Guiding the testing process of the SUT in a smarter way should lead to better error detection and recognition. We will study the application of Reinforcement Learning techniques in automated GUI testing. Using the scriptless GUI testing tool TESTAR as a vehicle, we will focus our research on improving the action selection mechanisms that guide the learning process. We will tailor rewards towards the goal of enhancing the effectiveness and efficiency of testing.
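One of the simplest action selection mechanisms from reinforcement learning that could serve as a baseline in such a setting is epsilon-greedy selection over learned action values. The sketch below is a generic illustration, not TESTAR's actual mechanism; all names are our own.

```python
import random

def select_action(q_values, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy action selection: with probability epsilon pick a
    random action (exploration), otherwise pick the action with the
    highest learned value (exploitation). Unseen actions default to 0.0."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_values.get(a, 0.0))

# Hypothetical learned values for two GUI actions on the current screen.
q = {"click_ok": 1.0, "type_text": 0.2}
```

The research question is precisely how to shape the rewards behind such values so that exploration of the GUI finds faults efficiently.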
50. 26 January Luc Edixhoven Expressing safe communication protocols abstract slides

Expressing safe communication protocols
Luc Edixhoven

Parallelisation is becoming ever more important in modern-day computing and, consequently, also proper communication between different threads, processes and machines. However, existing static methods to verify the safety of message passing-based communication protocols are generally limited in their expressiveness. As a result, many protocol implementations which are, in fact, safe, cannot be formally verified as such. Our aim is to find an expressive formal basis which is able to support, if possible, any safe protocol. Our first step in this adventure deals with regular protocols; we base our approach on formal language theory, by enriching regular expressions with a parameterised interleaving operator.
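The interleaving (shuffle) operator at the heart of this approach can be made concrete for two fixed strings: a word is in their shuffle iff it can be split into two subsequences equal to them. A standard dynamic program decides this; the code is our own illustration, not the parameterised operator from the research.

```python
from functools import lru_cache

def is_interleaving(w, x, y):
    """Is w an interleaving (shuffle) of x and y, preserving the
    relative order of letters within each? Dynamic programming over
    the pair of prefix lengths consumed from x and y."""
    if len(w) != len(x) + len(y):
        return False

    @lru_cache(maxsize=None)
    def dp(i, j):
        if i == len(x) and j == len(y):
            return True
        k = i + j   # next position of w to match
        return (i < len(x) and w[k] == x[i] and dp(i + 1, j)) or \
               (j < len(y) and w[k] == y[j] and dp(i, j + 1))

    return dp(0, 0)
```

Lifting this from two strings to whole (regular) languages, with a parameter bounding the interleaving, is where the expressiveness questions of the research begin.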
 

2020

49. 8 December Bastiaan Heeren Software technology for automated feedback generation abstract slides

Software technology for automated feedback generation
Bastiaan Heeren

Intelligent Tutoring Systems (ITSs) that offer well-designed feedback have the potential to enhance education in many problem domains. However, the cost of developing such systems is known to be high, with an estimated 200 hours of development for one hour of practice. The Ideas research project aims at simplifying ITS construction by using software technology for automated feedback generation. Examples of such techniques are embedded domain-specific languages, combinators and their algebraic laws, (datatype-)generic programming, and rewriting. Over the past 14 years, our approach has been tested on many problem domains, and has supported the exploration of new feedback opportunities. In this presentation, I will explain two techniques: the embedding of light-weight rewrite rules in problem-solving procedures, and the use of generic traversals for controlling rule application. The techniques will be illustrated by a simple problem-solving procedure for rewriting propositions into negation normal form.
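The negation normal form procedure mentioned at the end can be expressed as a handful of rewrite rules. The plain recursive sketch below (with our own tuple encoding of formulas) applies double-negation elimination and De Morgan's laws until every negation sits directly on a variable.

```python
def nnf(f):
    """Rewrite a proposition into negation normal form. Formulas are
    nested tuples: ('var', name), ('not', f), ('and', f, g), ('or', f, g)."""
    op = f[0]
    if op == 'var':
        return f
    if op in ('and', 'or'):
        return (op, nnf(f[1]), nnf(f[2]))
    # op == 'not': push the negation inward
    g = f[1]
    if g[0] == 'var':
        return f                          # a literal is already in NNF
    if g[0] == 'not':
        return nnf(g[1])                  # ¬¬p  =>  p
    dual = 'or' if g[0] == 'and' else 'and'
    return (dual, nnf(('not', g[1])), nnf(('not', g[2])))  # De Morgan
```

In the Ideas approach, rules like these would be light-weight rewrite rules embedded in a problem-solving procedure, with generic traversals controlling where they are applied; the sketch above hard-wires one traversal order.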
48. 24 November Alaaeddin Swidan Learning programming for school-age children: challenges and opportunities abstract slides

Learning programming for school-age children: challenges and opportunities
Alaaeddin Swidan

There are currently a variety of programming environments and tools meant to teach children programming, as one aspect of a larger goal of developing their computational thinking. Block-based programming is one of the popular choices to introduce programming to children. Due to the low ceiling that such environments provide, there comes a point where students are introduced to other "traditional" programming languages. With this learning path in mind, in addition to the idea that learning programming is difficult, there arise many challenges that children and educators face. In this presentation we will focus on two of them: challenges to achieve higher conceptual development in block-based languages and the efforts needed for a successful transfer of knowledge from block based to textual programming languages.
47. 3 November Sung-Shik Jongmans Discourje: Runtime Verification of Communication Protocols in Clojure abstract slides

Discourje: Runtime Verification of Communication Protocols in Clojure
Sung-Shik Jongmans

This talk presents Discourje: a runtime verification framework for communication protocols in Clojure. Discourje guarantees safety of protocol implementations relative to specifications, based on an expressive new version of multiparty session types. The framework has a formal foundation and is itself implemented in Clojure to offer a seamless specification-implementation experience. Benchmarks show Discourje's overhead can be less than 5% for real/existing concurrent programs.
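The essence of such runtime verification can be sketched in a few lines (in Python rather than Clojure, and with a plain finite-state machine instead of multiparty session types): every send is checked against the specification before it is allowed to proceed. All names below are our own.

```python
class Monitor:
    """A minimal runtime monitor for a communication protocol: the
    specification is a finite-state machine over (sender, receiver,
    label) messages, and each send is checked before it proceeds."""
    def __init__(self, transitions, start):
        self.transitions = transitions  # {(state, msg): next_state}
        self.state = start

    def send(self, sender, receiver, label):
        key = (self.state, (sender, receiver, label))
        if key not in self.transitions:
            raise RuntimeError(f"protocol violation in state {self.state}: "
                               f"{sender}->{receiver}:{label}")
        self.state = self.transitions[key]

# A two-party ping/pong protocol: the client asks, the server replies.
spec = {(0, ("client", "server", "ping")): 1,
        (1, ("server", "client", "pong")): 0}
m = Monitor(spec, start=0)
m.send("client", "server", "ping")
m.send("server", "client", "pong")
```

Discourje's session-type specifications are far more expressive than a flat state machine, but the safety guarantee has the same flavour: an implementation that deviates from the specification is stopped at run time.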
46. 14 October Stefano Bromuri Deep Learning applied to finance and logistics abstract slides

Deep Learning applied to finance and logistics
Stefano Bromuri

This presentation is meant to be a highlight concerning the activities of the CAROU team in the Heerlen campus for all computer scientists that may be interested in participating. Specifically, this presentation starts by revisiting the concepts of machine learning (ML) and deep learning (DL). After discussing the building blocks of ML and DL, the presentation then focuses on three projects in the AI domain currently carried out by CAROU with and for industrial partners in the Heerlen campus. The three domains considered are emotion analytics (APG), logistics (DHL) and customer segmentation (APG). Ethical considerations are discussed for each of the three projects.
45. 29 September Marloes Venema (Radboud University) Systemizing Attribute-Based Encryption -- In Search of Practical Decentralized ABE abstract slides

Systemizing Attribute-Based Encryption -- In Search of Practical Decentralized ABE
Marloes Venema

Attribute-based encryption (ABE) cryptographically implements fine-grained access control on data. As such, data can be stored by an entity that is not necessarily trusted to enforce access control, or an entity that is not even trusted to have access to the plaintext data at all. Instead, access control can be externally enforced by a trusted entity. For instance, you can store your personal data in the cloud without the storage servers getting actual access to it. You can also manage access to the data, and share it with the people with whom you want to share that data. Additionally, some multi-authority variants of ABE -- which do not have a central authority -- can effectively and securely implement access control in multiple-domain settings. We show that, despite the amount of work that has been done in the area of ABE, a practical and privacy-friendly scheme does not exist yet. To make ABE more practical and privacy friendly, we outline important open problems.
44. 15 September Christof Ferreira Torres (Université du Luxembourg) Ægis: Shielding Vulnerable Smart Contracts Against Attacks abstract slides

Ægis: Shielding Vulnerable Smart Contracts Against Attacks
Christof Ferreira Torres (Université du Luxembourg)

In recent years, smart contracts have suffered major exploits, costing millions of dollars. Unlike traditional programs, smart contracts are deployed on a blockchain. As such, they cannot be modified once deployed. Though various tools have been proposed to detect vulnerable smart contracts, the majority fails to protect vulnerable contracts that have already been deployed on the blockchain. Only very few solutions have been proposed so far to tackle the problem of post-deployment. However, these solutions suffer from low precision and are not generic enough to prevent any type of attack. In this work, we introduce Ægis, a dynamic analysis tool that protects smart contracts from being exploited during runtime. Its capability of detecting new vulnerabilities can easily be extended through so-called attack patterns. These patterns are written in a domain-specific language that is tailored to the execution model of Ethereum smart contracts. The language enables the description of malicious control and data flows. In addition, we propose a novel mechanism to streamline and speed up the process of managing attack patterns. Patterns are voted upon and stored via a smart contract, thus leveraging the benefits of tamper-resistance and transparency provided by the blockchain. We compare Ægis to current state-of-the-art tools and demonstrate that our solution achieves higher precision in detecting attacks. Finally, we perform a large-scale analysis on the first 4.5 million blocks of the Ethereum blockchain, thereby confirming the occurrences of well-reported attacks, as well as uncovering unreported attacks in the wild.
43. 12 May Stijn de Gouw Verifying OpenJDK's LinkedList using KeY abstract slides

Verifying OpenJDK's LinkedList using KeY
Stijn de Gouw

As a particular case study of the formal verification of state-of-the-art, real software, we discuss the specification and verification of a corrected version of the implementation of a linked list as provided by the Java Collection framework.
42. 11 May Clara Maathuis Social Manipulation Campaigns abstract slides

Social Manipulation Campaigns
Clara Maathuis

The increased media attention for the growing number of social manipulation campaigns aimed at influencing or damaging perceptions, processes, and systems has, in recent years, rerouted and rewritten the thoughts and agendas of academic researchers, professionals, and decision makers. This implied that, after a long struggle, seeing through offensive lenses came to be perceived as the right approach to advancing effective, efficient, and adaptive intelligent solutions, programs, and strategies for defence, deterrence, and resilience when fighting such threats. However, these measures should embed socio-technical constraints, complexities, requirements, and values in their design phase, and should further consider hybrid implementation and evaluation methods. Along these lines, this presentation aims at i) introducing this phenomenon, and ii) discussing some incidents and solutions for countering such threats.
41. 14 April 2020 Freek Verbeek Formal Proofs of Return Address Integrity abstract slides

Formal Proofs of Return Address Integrity
Freek Verbeek

We present a methodology for generating a characterization of the memory used by an assembly program, as well as a formal proof that the assembly is bounded to the generated memory regions. A formal proof of memory usage is required for compositional reasoning over assembly programs. Moreover, it can be used to prove low-level security properties, such as integrity of the return address of a function. Our verification method is based on interactive theorem proving, but provides automation by generating pre- and postconditions, invariants, control-flow, and assumptions on memory layout. As a case study, three binaries of the Xen hypervisor are disassembled. These binaries are the result of a complex build-chain compiling production code, and contain various complex and nested loops, large and compound data structures, and functions with over 100 basic blocks. The methodology has been successfully applied to 251 functions, covering 12,252 assembly instructions.
 

2019

40. 5 November Sung-Shik Jongmans Distributed programming using role-parametric session types in Go abstract slides

Distributed programming using role-parametric session types in Go
Sung-Shik Jongmans

In this talk, I'll present a framework for the static specification and safe programming of message passing protocols where the number and kinds of participants are dynamically instantiated. More precisely, we developed the first theory of distributed multiparty session types (MPST) to support parameterised protocols with indexed roles; our framework statically infers the different kinds of participants induced by a protocol definition as role variants, and produces decoupled endpoint projections of the protocol onto each variant. This enables safe MPST-based programming of the parameterised endpoints in distributed settings: each endpoint can be implemented separately by different programmers, using different techniques (or languages). We proved the decidability of role variant inference and well-formedness checking, and the correctness of projection. We implemented our theory as a toolchain for programming such role-parametric MPST protocols in Go. Our approach is to generate API families of lightweight, protocol- and variant-specific type wrappers for I/O. The APIs ensure a well-typed Go endpoint program (by native Go type checking) will perform only compliant I/O actions w.r.t. the source protocol. We leveraged the abstractions of MPST to support the specification and implementation of Go applications involving multiple channels, possibly over mixed transports (e.g., Go channels, TCP), and channel passing via a unified programming interface. We evaluated the applicability and run-time performance of our generated APIs using microbenchmarks and real-world applications. This is joint work with David Castro, Raymond Hu, Nicholas Ng, and Nobuko Yoshida; the talk is based on a paper that was published in POPL 2019 (https://dx.doi.org/10.1145/3290342).
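The core notion of endpoint projection can be illustrated on a toy global protocol (our own encoding, far simpler than the role-parametric theory in the talk): keep, for one role, only the interactions it participates in, as sends or receives.

```python
def project(global_type, role):
    """Project a global protocol (a list of (sender, receiver, label)
    interactions) onto one role, keeping only the send/receive actions
    that role is involved in."""
    local = []
    for sender, receiver, label in global_type:
        if role == sender:
            local.append(("send", receiver, label))
        elif role == receiver:
            local.append(("recv", sender, label))
    return local

# A toy three-role request/response protocol through a proxy.
g = [("client", "proxy", "req"),
     ("proxy", "server", "req"),
     ("server", "proxy", "resp"),
     ("proxy", "client", "resp")]
```

In the MPST framework of the talk, such projections are inferred per role variant and turned into generated Go APIs, so that the native type checker rejects non-compliant I/O; the sketch above only shows the projection step itself.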
39. 24 September Hugo Jonker Fingerprint surface-based detection of web bot detectors abstract slides

Fingerprint surface-based detection of web bot detectors
Hugo Jonker

Web bots are used to automate client interactions with websites, which facilitates large-scale web measurements. However, websites may employ web bot detection. When they do, their response to a bot may differ from responses to regular browsers. The discrimination can result in deviating content, restriction of resources or even the exclusion of a bot from a website. This places strict restrictions upon studies: the more bot detection takes place, the more results must be manually verified to confirm the bot's findings. To investigate the extent to which bot detection occurs, we reverse-analysed commercial bot detection. We found that in part, bot detection relies on the values of browser properties and the presence of certain objects in the browser's DOM model. This part strongly resembles browser fingerprinting. We leveraged this for a generic approach to detect web bot detection: we identify what part of the browser fingerprint of a web bot uniquely identifies it as a web bot by contrasting its fingerprint with those of regular browsers. This leads to the fingerprint surface of a web bot. Any website accessing the fingerprint surface is then accessing a part unique to bots, and thus engaging in bot detection. We provide a characterisation of the fingerprint surface of 14 web bots. We show that the vast majority of these frameworks are uniquely identifiable through well-known fingerprinting techniques. We design a scanner to detect web bot detection based on the reverse analysis, augmented with the found fingerprint surfaces. In a scan of the Alexa Top 1 Million, we find that 12.8% of websites show indications of web bot detection.
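The notion of a fingerprint surface can be captured as a set difference: the attribute/value pairs in the bot's browser fingerprint that no regular browser exhibits. The attribute names below are illustrative examples, not the paper's actual characterisation.

```python
def fingerprint_surface(bot_fp, browser_fps):
    """The fingerprint surface of a web bot: the attribute/value pairs
    in the bot's fingerprint that occur in no regular browser's
    fingerprint. A site reading any of these attributes and values is,
    in effect, engaging in bot detection."""
    regular = set().union(*(fp.items() for fp in browser_fps))
    return dict(set(bot_fp.items()) - regular)

# Hypothetical fingerprints: one bot, two regular browsers.
bot = {"webdriver": True, "plugins": 0, "userAgent": "Mozilla/5.0"}
browsers = [{"webdriver": False, "plugins": 3, "userAgent": "Mozilla/5.0"},
            {"webdriver": False, "plugins": 5, "userAgent": "Mozilla/5.0"}]
surface = fingerprint_surface(bot, browsers)
```

A scanner can then flag a website as performing bot detection when its scripts read any attribute in the computed surface.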
38. 5 July Christof Ferreira Torres (Université du Luxembourg) The Art of The Scam: Demystifying Honeypots in Ethereum Smart Contracts abstract slides

The Art of The Scam: Demystifying Honeypots in Ethereum Smart Contracts
Christof Ferreira Torres

Modern blockchains, such as Ethereum, enable the execution of so-called smart contracts -- programs that are executed across a decentralised network of nodes. As smart contracts become more popular and carry more value, they become more of an interesting target for attackers. In the past few years, several smart contracts have been found to be vulnerable and thus exploited by attackers. However, a new trend towards a more proactive approach seems to be on the rise, where attackers do not search for vulnerable contracts anymore. Instead, they try to lure their victims into traps by deploying vulnerable-looking contracts that contain hidden traps. This new type of contracts is commonly referred to as honeypots. In this paper, we present the first systematic analysis of honeypot smart contracts, by investigating their prevalence, behaviour and impact on the Ethereum blockchain. We develop a taxonomy of honeypot techniques and use this to build HONEYBADGER: a tool that employs symbolic execution and well defined heuristics to expose honeypots. We perform a large-scale analysis on more than 2 million smart contracts and show that our tool not only achieves high precision, but also high scalability. We identify 690 honeypot smart contracts as well as 240 victims in the wild, with an accumulated profit of more than $90,000 for the honeypot creators. Our manual validation shows that 87% of the reported contracts are indeed honeypots.
37. 18 June Greg Alpár How can resistance turn into curiosity? -- The Open Maths approach abstract, slides

How can resistance turn into curiosity? -- The Open Maths approach
Greg Alpár

Open Maths is an initiative, based on state-of-the-art research results, to make mathematics accessible and enjoyable for students in higher education and in life-long learning. In this talk, I will give an overview of the first two Open Maths courses and their impact on students. Be prepared for some surprises as we delve deeper into one particular aspect of this new approach!
36. 28 May Vincent van der Meer Investigating File Fragmentation abstract slides

Investigating File Fragmentation
Vincent van der Meer

In digital forensics, data recovery tools are used to find lost or deleted evidence. One popular approach is file carving, which recovers files based on their contents instead of metadata (such as a file system). To recover fragmented files, file carvers must make assumptions about fragmentation of files on the examined devices. There has been one large-scale study on file fragmentation, on devices from 1998-2006. Significant advances in hard- and software since then more than merit a new study. In this paper, we examine file fragmentation of 220 systems (334 drives) that serve as the primary computer of their owners. To this end, we have devised a privacy-friendly approach to acquiring the data. Only meta-information of the file system was collected. Moreover, meta-information was pro-actively filtered to avoid any potential privacy-sensitive issues. Our results indicate that the average level of fragmentation is lower compared to previous studies. Interestingly, we find that fragmentation not only concerns gaps between blocks of the same file, but also the order of these blocks. In particular, we find that a type of fragmentation where a later block is stored on disk before an earlier block of the same file occurs in nearly half of all fragmented files. This type of out-of-order fragmentation is rarely discussed in the literature, yet its high rate of occurrence should impact the practice of file carving.
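The out-of-order fragmentation discussed here is easy to state precisely in terms of a file's block list (logical position mapped to physical block number); the classifier below is a sketch under that simplified model, with our own category names.

```python
def classify_fragmentation(blocks):
    """Classify a file's on-disk layout from its list of physical block
    numbers in logical (file) order. 'contiguous' means no gaps;
    'out-of-order' means a later part of the file is stored at a lower
    physical address than an earlier part."""
    if all(b2 == b1 + 1 for b1, b2 in zip(blocks, blocks[1:])):
        return "contiguous"
    if any(b2 < b1 for b1, b2 in zip(blocks, blocks[1:])):
        return "out-of-order"
    return "fragmented"
```

A file carver that assumes fragments always appear on disk in file order would miss exactly the "out-of-order" cases, which is why their high rate of occurrence matters.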
35. 29 January Frank Tempelman The Semi-Direct Control System - a vehicle for AI research abstract slides

The Semi-Direct Control System - a vehicle for AI research
Frank Tempelman

In my talk next Tuesday, after a brief initial introduction of myself, I will give an overview of my possible research topics, particularly in the field of artificial intelligence and, given my background and contacts, especially in combination with aviation. I will discuss the Semi-Direct Control System (SDCS), an alternative control system for unmanned aerial vehicles (UAVs), mainly applicable in long-distance so-called 'dog fights', in which aircraft engage each other at close mutual range. The problem here is that a direct control system, in which a pilot who is often thousands of kilometres away steers the vehicle directly with control equipment on the basis of the UAV's sensor information, is too slow. This could be solved by giving the UAV a much greater degree of autonomy: give it more intelligence and you no longer need to supply it with a constant stream of commands, because it can manage on its own. We are not there yet, however, and the SDCS offers a middle ground, in which the frequency of commands forms a trade-off with the intelligence required on board the UAV. The SDCS thereby offers an attractive and practically applicable research topic for (increasing) autonomy in aircraft and other vehicles, in areas including knowledge acquisition, formalisation and representation, image recognition, and decision logic. It would be very interesting to map out this trade-off between the degree of control and the intelligence required in the aircraft.
 

2018

34. 11 December Bernard van Gastel On the security of self-encrypting SSDs abstract

On the security of self-encrypting SSDs
Bernard van Gastel

Various SSDs offer an option to enable encryption at a firmware level. The security goals should be similar to those of disk encryption handled by software. And indeed, Bitlocker, the disk encryption built into Windows, fully relies on firmware-level encryption if it is present. However, when Bernard and Carlo Meijer (RU) investigated drives from Samsung and Crucial, they found several issues with the afforded security, such as a master password set to the empty string.
33. 16 October Johan Jeuring Intelligent Programming Tutors abstract, slides

Intelligent Programming Tutors
Johan Jeuring, OU & University of Utrecht

Intelligent programming tutors (IPTs) are used to support students who learn how to program. There exist IPTs for almost any programming language. In this talk I will give a brief introduction to IPTs. In particular, I will discuss what kind of feedback and hints these tutors give, and what technologies they use to calculate feedback.
32. 25 Sept Josje Lodder Developing an e-learning tool for inductive proofs abstract, slides

Developing an e-learning tool for inductive proofs
Josje Lodder

Proving is a central subject of most logic courses. Students attending such courses not only have to learn how to prove consequences in a formal system, but also have to reason about formal systems. A typical example of an exercise that occurs in many textbooks is the following: prove that the number of left parentheses in a logical formula is equal to the number of right parentheses. These kinds of properties can be proved by structural induction, where the structure of the proof follows the structure of the inductive definition of the logical language. Students have conceptual as well as technical problems with inductive proofs. In this talk I will discuss our ideas and first results concerning the development of an e-learning tool that helps students practice inductive proofs.
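The textbook property from the abstract (every fully parenthesised formula has as many left as right parentheses) can also be checked mechanically; the recursive renderer below mirrors the inductive definition of the language (the encoding is our own), which is exactly the structure the induction proof follows.

```python
def render(f):
    """Render a propositional formula, given as a variable name (string)
    or a nested tuple (op, left, right), with full parenthesisation."""
    if isinstance(f, str):
        return f
    op, left, right = f
    return "(" + render(left) + " " + op + " " + render(right) + ")"

def parens_balanced(f):
    """The property proved by structural induction: #'(' equals #')'.
    Base case: a variable contributes zero of each. Step case: a
    compound formula adds one of each to two subformulas that balance
    by the induction hypothesis."""
    s = render(f)
    return s.count("(") == s.count(")")
```

The recursion in `render` visits formulas exactly as the induction proof does, which is the intuition the e-learning tool must get across.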
31. 26 June Hugo Jonker Investigating Adblock wars abstract, slides

Investigating Adblock wars
Hugo Jonker

We investigate the extent to which websites react to filter list updates. We collected data on the Alexa Top-10K for 120 days, and on specific sites newly added to filter lists for 140 days. By evaluating how long a filter rule triggers on a website, we can gauge how long it remains effective. We matched websites with both regular adblocking filter lists (Easylist) and with filter lists that remove anti-adblocking (primarily Ublock Protector / Nano Protector). From our data, we observe that the effectiveness of the Easylist adblocking filter decays initially, but after ~80 days seems to stabilize. However, we found no evidence for any significant decay in effectiveness of the more specialized, but less widely used, Ublock Protector / Nano Protector list.
30. 24 April Stijn de Gouw Human-in-the-Loop Simulation of Cloud Services abstract, slides

Human-in-the-Loop Simulation of Cloud Services
Stijn de Gouw

We present an integrated tool-suite for the simulation of software services which are offered on the Cloud. The tool-suite uses the Abstract Behavioral Specification (ABS) language for modeling the software services and their Cloud deployment. For the real-time execution of the ABS models we use a Haskell backend which is based on a source-to-source translation of ABS into Haskell. The tool-suite then allows Cloud engineers to interact in real time with the execution of the model by deploying and managing service instances. The resulting human-in-the-loop simulation of Cloud services can be used both for training purposes and for (semi-)automated support for the real-time monitoring and management of the actual service instances.
29. 27 March Benjamin Krumnow The Shepherd Project: Automated security audits of website login processes abstract, slides

The Shepherd Project: Automated security audits of website login processes
Benjamin Krumnow

HTTP is stateless: the protocol itself does not remember earlier requests. HTTP cookies were created to address this. Cookies allow websites to remember things, such as whether you are logged in or not. After logging in, the cookies authenticate the user, so that the user does not have to type his password with every click. If an attacker steals these cookies, he can thus impersonate a user. In 2010, the Firesheep plugin for Firefox demonstrated how easy it was to steal such cookies for many popular websites (Gmail, Facebook, etc.). The problem that Firesheep exploited was that cookies were sent over an insecure channel (HTTP instead of HTTPS). Such an attack is called a session hijacking attack.

In the Shepherd project, we investigate the adoption of countermeasures against session hijacking attacks. Over the last year, we have developed an automatic selenium-based scanner. This scanner performs an automated audit of the security of login processes of web sites.
In this OUrsi talk, I will give you an overview of the current scanning framework, explain how it assesses login security and talk about our current study, where we analyse over 65K websites. I hope that we will have lively discussion to identify possible unclear points and future directions.
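One concrete check such a login-security audit can perform, sketched here in plain Python (Shepherd itself is Selenium-based; this is not its actual code): inspect the Set-Cookie header of the session cookie for the attributes that blunt session hijacking.

```python
def audit_set_cookie(header):
    """Flag session-hijacking risks in a Set-Cookie header: without
    'Secure' the cookie may travel over plain HTTP (the Firesheep
    scenario), and without 'HttpOnly' it can be read by injected
    scripts."""
    attrs = [part.strip().lower() for part in header.split(";")[1:]]
    issues = []
    if "secure" not in attrs:
        issues.append("missing Secure")
    if "httponly" not in attrs:
        issues.append("missing HttpOnly")
    return issues
```

An automated scanner logs in, captures the session cookie, and applies checks like this one at scale across thousands of sites.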

28. 27 February Vincent van der Meer Future-proofing recovery of deleted files abstract, slides

Maintaining the ability to recover deleted files: on the further development and future-proofing of this digital forensic specialization
Vincent van der Meer

Reconstructing deleted files, known as file carving, is a crucial part of digital forensic work, often concerning child pornography cases. The explosive growth of data makes file carving increasingly time-consuming and therefore increasingly inadequate, so that evidence remains hidden. File carving is still set up separately per device and per app, resulting in considerable delays in developing file carving for new apps. This research will investigate how the app-specific part can be made as small as possible through abstraction and generalization. A spin-off will be a widely applicable, modular, maintainable and expandable framework with a corresponding benchmark to compare the results of new techniques.
27. 19 February Hassan Alizadeh Intrusion Detection and Traffic Classification using Application-aware Traffic Profiles abstract, slides

Intrusion Detection and Traffic Classification using Application-aware Traffic Profiles
Hassan Alizadeh

Along with the growing number of applications and end-users, online network attacks and advanced generations of malware have continuously proliferated. Many studies have addressed the issue of intrusion detection by inspecting aggregated network traffic with no knowledge of the responsible applications/services. Such systems fail to detect intrusions in applications whenever their abnormal traffic fits into the network normality profiles. This research addresses the problem of detecting intrusions in (known) applications when their traffic exhibits anomalies. To do so, a network monitoring tool needs to: (1) identify the source application responsible for the traffic; (2) have per-application traffic profiles; and (3) detect deviations from profiles given a set of traffic samples. This work is mainly devoted to the last two topics. Following an initial survey of traffic classification techniques, this research proposes the use of three kinds of Gaussian Mixture Model (GMM) learning approaches, namely classic, unsupervised, and Universal Background Model (UBM), to build per-application profiles. The effectiveness of the proposed approaches has been demonstrated by conducting different sets of experiments on two public datasets collected from real networks, where the source of each traffic flow is provided. Experimental results indicate that while the proposed approaches are effective for most applications, the UBM approach significantly outperforms the others.
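A drastically reduced version of a per-application profile, a single Gaussian over one flow feature standing in for the Gaussian mixtures used in this work, already shows the detection principle: flag traffic that deviates too far from the application's learned normality. All names and numbers below are illustrative.

```python
from statistics import mean, stdev

class AppProfile:
    """A per-application traffic profile reduced to a single Gaussian
    over one flow feature (e.g. packet size) -- a toy stand-in for the
    GMM/UBM profiles used in the research."""
    def __init__(self, samples):
        self.mu = mean(samples)
        self.sigma = stdev(samples)

    def is_anomalous(self, value, k=3.0):
        """Flag values more than k standard deviations from the mean."""
        return abs(value - self.mu) > k * self.sigma

# Hypothetical packet sizes observed for one application's normal traffic.
web = AppProfile([500, 520, 480, 510, 495])
```

A real GMM profile would model several modes per application and score whole sets of flows, but the decision, "does this traffic fit the application's profile?", has the same shape.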
26. 19 February Fenia Aivaloglou How Kids Code and How We Know abstract, slides

How Kids Code and How We Know
Fenia Aivaloglou

Programming education nowadays begins as early as elementary school age. Block-based programming languages like Scratch are increasingly popular, and there is substantial research showing that they are suitable for early programming education. However, how do children program, and are they really learning? In this talk we will focus on Scratch and explore the programs that children write. By scraping the Scratch public repository we have downloaded and analyzed 250K projects in terms of complexity, used abstractions and programming concepts, and code smells. We will further discuss the effect of code smells on novice Scratch programmers and our experiments in Dutch classrooms. Focusing on how to teach good programming practices to children, we designed the edX MOOC "Scratch: Programmeren voor kinderen" and analyzed the results of its first run, in which more than 2K children actively participated. We found factors that can predict successful course completion, as well as age-related differences in the students' learning performance. To further examine children's motivation and self-efficacy and their relation to learning performance, we ran a Scratch course in three schools; we will present our initial findings.
25. 30 January Greg Alpár Blockchain for identity management, seriously? abstract, slides

Blockchain for identity management, seriously?
Greg Alpár

Right now (Dec. 11, 2017, 15:45) 1 Bitcoin is worth €14,228.71. This amazing success of a cryptocurrency has made blockchain, the technology underlying Bitcoin, so exciting for businesses that it is often called the next big revolution after the internet. Blockchains are proposed to solve all kinds of problems in our digital world. Besides the financial context, identity is considered another promising area for blockchains. Is it a good idea? In this OUrsi session, I would like to share with you some thoughts on the application of blockchains in identity management. Since there are no big results in our study yet, this talk will hopefully turn into a fruitful discussion.
 

2017

24. 30 November Pekka Aho Automated Model Extraction and Testing of Graphical User Interface Applications abstract, slides

Automated Model Extraction and Testing of Graphical User Interface Applications
Pekka Aho

The industry practice for test automation through graphical user interfaces is script-based. In some tools the scripts can be recorded from the user (capture & replay tools); in others, the scripts are written manually. The main challenge with script-based test automation is the maintenance effort required when the GUI changes. One attempt to reduce this effort has been model-based testing, which aims to generate test cases from manually created models. When the system under test (SUT) changes, changing the models and re-generating the test cases is supposed to reduce the maintenance effort. However, creating formal models that allow test case generation requires very specialized expertise. Random testing (monkey testing) is a maintenance-free and scriptless approach to automating GUI testing. In random GUI testing, the tool emulates an end-user by randomly interacting with the GUI under test: pressing buttons and menus and providing input to other types of GUI widgets. More recent research aims to extract GUI models automatically through runtime analysis while automatically exploring and interacting with the GUI (or during random testing). Test case generation based on extracted models is problematic, but by comparing extracted GUI models of consecutive versions it is possible to detect changes in the GUI. This approach aims to address the maintenance problem with scriptless regression testing and change analysis.
23. 30 November Twan van Laarhoven Local network community detection: K-means clustering and conductance abstract, slides

Local network community detection: K-means clustering and conductance
Twan van Laarhoven
Joint work with Elena Marchiori

Local network community detection is the task of finding a single community of nodes in a graph, concentrated around a few given seed nodes. The aim is to perform this search in a localized way, without having to look at the entire graph. Conductance is a popular objective function used in many algorithms for local community detection. In this talk I will introduce a continuous relaxation of conductance and show that continuous optimization of this objective still leads to discrete communities. I will investigate the relation of conductance with weighted kernel k-means for a single community, which leads to the introduction of a new objective function, σ-conductance; conductance is obtained by setting σ to 0. I will present two algorithms for optimizing σ-conductance, which show promising results on benchmark datasets.
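For readers unfamiliar with the objective, here is a minimal sketch (illustrative, not the talk's algorithm) of discrete conductance — the fraction of a community's edge volume that crosses its boundary; the example graph is invented:

```python
def conductance(edges, community):
    """Conductance of a node set S in an undirected graph:
    cut(S, V\\S) / min(vol(S), vol(V\\S)),
    where vol() is the sum of node degrees on one side."""
    S = set(community)
    cut = vol_S = vol_rest = 0
    for u, v in edges:
        in_u, in_v = u in S, v in S
        if in_u != in_v:           # edge crosses the boundary
            cut += 1
        # each endpoint contributes 1 to its side's volume
        vol_S += in_u + in_v
        vol_rest += (not in_u) + (not in_v)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else 0.0

# Two triangles joined by a single bridge edge: each triangle is a
# natural community with low conductance.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(conductance(edges, {0, 1, 2}))  # 1/7 ≈ 0.143
```

A lower value means a better community: few boundary edges relative to internal volume. Local algorithms try to minimize this quantity while only touching nodes near the seeds.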
22. 27 June Jeroen Keiren An O(m log n) algorithm for stuttering equivalence abstract, slides

An O(m log n) algorithm for stuttering equivalence
Jeroen Keiren

In this presentation, I will discuss a new algorithm to determine stuttering equivalence with time complexity O(m log n), where n is the number of states and m is the number of transitions of a Kripke structure. Theoretically, this algorithm substantially improves upon existing algorithms, which all have time complexity of the order O(mn) at best. Moreover, when compared with other algorithms, the new algorithm has better or equal space complexity. Practical results confirm these findings: they show that the algorithm can outperform existing algorithms by several orders of magnitude, especially when the Kripke structures are large.
21. 13 June Stef Joosten Software development in relation algebra with Ampersand abstract, slides

Software development in relation algebra with Ampersand
Stef Joosten

Relation algebra can be used as a programming language for building information systems. We will demonstrate this principle by discussing a database-application for legal reasoning. In this case study, we will focus on the mechanisms of programming in relation algebra. Besides being declarative, relation algebra comes with attractive promises for developing big software. The case study was compiled with Ampersand, a compiler whose design and development is the focus of a research collaboration between Ordina and the Open University.
20. 9 May,
13.30 - 14.30
Hieke Keuning Code Quality Issues in Student Programs abstract, slides

Code Quality Issues in Student Programs
Hieke Keuning

Because low-quality code can cause serious problems in software systems, students learning to program should pay attention to code quality early. Although many studies have investigated mistakes that students make during programming, we do not know much about the quality of their code. We have examined the presence of quality issues related to program flow, choice of programming constructs and functions, clarity of expressions, decomposition and modularization in a large set of student Java programs. We investigated which issues occur most frequently, whether students are able to solve these issues over time, and whether the use of code analysis tools has an effect on issue occurrence. We found that students hardly fix issues, in particular issues related to modularization, and that the use of tooling does not have much effect on the occurrence of issues.
19. 4 April Greg Alpár The Alphabet of ABCs abstract, slides

The Alphabet of ABCs
Greg Alpár

The digital world is becoming more and more a part of the physical world. Data, bound to our identity, relationships and communications, is processed all around the world. This may result in privacy harms, including unwanted advertising, data creep, exclusion, insecurity, etc. Although much of this data could be used without linking it to you personally, currently everybody seems to accept that we have to give up control over our privacy. A fundamental technological paradigm today is that you almost always have to identify yourself in order to use a given digital service. What if we could avoid the need for this? For instance, you could just prove that you have a train ticket without a central party keeping an account about you and all your trips. Or you could merely show that you are over 18 years of age, without having to show (and let be stored) your name and credential number, so you are eligible to buy a bottle of 'jenever'. Or you could shop online while giving away as little personal information as you used to in a department store in the 70s. Attribute-based credentials (ABCs) enable a wide variety of scenarios in which you can prove that you are entitled to do things that you can otherwise only do by using identifiers. In the last OUrsi meeting Fabian presented the IRMA technology, one of the most important practical ABC projects, and its applications. In this OUrsi meeting, I will talk about the neat cryptographic and mathematical techniques that allow for the above-mentioned high level of flexibility and practical privacy. (Don't get scared, you won't need much background in mathematics to understand my presentation!)
18. 7 March Rolando Trujillo
(University of Luxembourg)
Counteracting active attacks in social network graphs abstract, slides

Counteracting active attacks in social network graphs
Rolando Trujillo

Active privacy attacks in social networks attempt to de-anonymize a social graph by using social links previously established between the attackers and their victims. Malicious users are hard to identify a priori, so active attacks are typically prevented by anonymizing (perturbing) the social graph prior to publication. This talk is an introduction to this type of privacy attack and the anonymization techniques that have been proposed to counteract it. Towards the end of the talk we will discuss drawbacks of current approaches and possible improvements.
17. 7 February Fabian van den Broek IRMA, as simple as ABC abstract, slides

IRMA, as simple as ABC
Fabian van den Broek

I will present the IRMA technology. IRMA (I Reveal My Attributes) is an attribute-based credential system in which users can obtain credentials stating attributes about themselves (e.g. I'm an employee of the Open University, with employee number 1234). These attributes can be disclosed selectively to verifiers (e.g. I'm an employee of the Open University, xxxxxxxxxxxxxxxxxxxxx), offering a very flexible and potentially privacy-friendly way of authentication. This presentation will mostly focus on the progress we have made in the last two years and the progress we hope to achieve in the coming years.
 

2016

16. 25 October Jeroen Keiren Game-based approach to branching bisimulation abstract, slides

Game-based approach to branching bisimulation
Jeroen Keiren

Branching bisimilarity and branching bisimilarity with explicit divergences are typically used in process algebras with silent steps when relating implementations to specifications. When an implementation fails to conform to its specification, i.e., when both are not related by branching bisimilarity (with explicit divergence), pinpointing the root causes can be challenging. In this work, we provide characterisations of branching bisimilarity (with explicit divergence) as games between Spoiler and Duplicator, offering an operational understanding of both relations. Moreover, we show how such games can be used to assist in diagnosing non-conformance between implementation and specification.
15. 4 October Arjen Hommersom Flexible Probabilistic Logic for Solving the Problem of Multimorbidity abstract, slides

Flexible Probabilistic Logic for Solving the Problem of Multimorbidity
Arjen Hommersom

One of the central questions in artificial intelligence is how to combine knowledge representation principles with machine learning methods. Effectively dealing with structure and relations in data opens up a wealth of applications in areas such as economics, biology, and healthcare. In this talk I will discuss a recent project proposal, which aims to develop a new methodology that provides powerful methods for deriving knowledge from structured data using state-of-the-art probabilistic logics. These methods will be applied to deal with a highly significant and challenging problem in modern medicine: managing patients with multimorbidity, i.e., patients that have multiple chronic conditions. The main purpose of this talk is to obtain feedback as this project will likely soon be presented to an evaluation committee in the exact sciences.

14. 28 June Hugo Jonker Gaming the H-index abstract, slides

Gaming the H-index
Hugo Jonker

In this talk, we discuss some early results looking into formally modelling the scientific process, and elucidating what attacks can be performed on bibliographic metrics. Specifically, we distinguish between hacking, fraud, and gaming, provide initial notions of these three terms and initial results towards identifying which gaming attacks are possible on the H-index.
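For reference, the metric under discussion is easy to compute; the sketch below (illustrative, not from the talk) also shows why small, targeted citation bumps can move it:

```python
def h_index(citations):
    """The H-index: the largest h such that at least h papers
    have h or more citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Five papers with citation counts [10, 8, 5, 4, 4] give h = 4.
print(h_index([10, 8, 5, 4, 4]))  # 4
# One 'gaming' move: pushing the two papers with 4 citations to 5
# (e.g. via self-citations) lifts the index to 5.
print(h_index([10, 8, 5, 5, 5]))  # 5
```

The papers just below the current threshold are the cheapest targets, which is exactly the kind of manipulation a formal model of bibliographic metrics can make precise.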
13. 31 May Josje Lodder Strategies for axiomatic Hilbert-style proofs abstract, slides

Strategies for axiomatic Hilbert-style proofs
Josje Lodder

One of the topics students encounter in a logic course is a derivation system such as natural deduction or axiomatic (Hilbert-style) derivation. Finding derivations is not always easy. Students need a lot of practice with this, and an e-learning system can help. For natural deduction various supporting tools exist, but for axiomatic derivation there are no tools that help students construct these derivations. To be able to give targeted feedback and feedforward, we have developed strategies with which we can automatically generate axiomatic proofs, and we will also use these strategies to track students' solutions, so that we can give targeted feedback at any moment.
12. 26 April Bastiaan Heeren Ask-Elle: an Adaptable Programming Tutor for Haskell Giving Automated Feedback abstract, slides

Ask-Elle: an Adaptable Programming Tutor for Haskell Giving Automated Feedback
Bastiaan Heeren

Ask-Elle is a tutor for learning the higher-order, strongly typed functional programming language Haskell. It supports the stepwise development of Haskell programs by verifying the correctness of incomplete programs and by providing hints. Programming exercises are added to Ask-Elle by providing a task description for the exercise, one or more model solutions, and properties that a solution should satisfy. The properties and model solutions can be annotated with feedback messages, and the amount of flexibility that is allowed in student solutions can be adjusted. The main contribution of our work is the design of a tutor that combines (1) the incremental development of different solutions in various forms to a programming exercise with (2) automated feedback and (3) teacher-specified programming exercises, solutions, and properties. The main functionality is obtained by means of strategy-based model tracing and property-based testing. We have tested the feasibility of our approach in several experiments, in which we analyse, among other things, both intermediate and final student solutions to programming exercises.
11. 29 March Hugo Jonker MitM attacks evolved abstract, slides

MitM attacks evolved
Hugo Jonker

The security community seems to be thoroughly familiar with man-in-the-middle attacks. However, the common perception of this type of attack is outdated. It originates from when network connections were fixed, not mobile, before 24/7 connectivity became ubiquitous. The common perception of this attack stems from an era before the vulnerability of the protocol's context was realised. Thanks to revelations by Snowden and by currently available man-in-the-middle tools focused on protocol meta-data (such as so-called `Stingrays' for cellphones), this view is no longer tenable. Security protocols that only protect the contents of their messages are insufficient. Contemporary security protocols must also take steps to protect their context: who is talking to whom, where is the sender located, etc. In short: the attacker has evolved. It's high time for our security models and requirements to catch up.
10. 26 February
13.45-14.30, Utrecht
Tanja Vos TESTAR: Automated Testing at the User Interface Level abstract, slides

TESTAR: Automated Testing at the User Interface Level
Tanja Vos

Testing applications at the user interface level is an important yet expensive and labour-intensive activity. Several tools exist to automate UI-level testing; these tools are based on capture & replay or visual image recognition. We present TESTAR, a tool for automated GUI testing that takes a totally different approach and has been demonstrated to be highly useful in practice. Using TESTAR, you can start testing immediately and you do not need to specify test cases in advance. TESTAR automatically generates and executes random test sequences based on a structure that is automatically derived from the UI through the accessibility API. Defects that can be found this way are crashes or suspicious outputs. For defects related to functionality, expected outputs need to be added to the tool; this can be done incrementally. The structure on which testing is based is built automatically during testing, so the UI is not assumed to be fixed and tests will keep running even as the UI evolves, which reduces the maintenance problem that threatens the approaches mentioned earlier.
9. 23 February Freek Verbeek Cross-layer invariants for Network-on-Chips abstract, slides

Cross-layer invariants for Network-on-Chips
Freek Verbeek

One of the major challenges in the design and verification of manycore systems is cache coherency. In bus-based architectures, this is a well-studied problem. When replacing the bus by a communication network, however, new problems arise. Cross-layer deadlocks can occur even when the protocol and the network are both deadlock-free when considered in isolation. To counter this problem, we propose a methodology for deriving cross-layer invariants for Network-on-Chips. These invariants relate the state of the protocols run by the cores to the state of the communication network. We show how they can be used to prove the absence of cross-layer deadlocks. Our approach is generally applicable and shows promising scalability.
8. 26 January Harald Vranken About a recently submitted research proposal
 

2015

7. 27 October Bernard van Gastel Estimating the energy consumption of software by means of a type system abstract, slides

Estimating the energy consumption of software by means of a type system
Bernard van Gastel

Energy consumption plays an increasingly decisive role in ICT. In many application areas it is important to predict energy consumption and to be able to optimize for it. Such methods must be practical: they should give fast feedback and be applicable to large programs and libraries. Because energy consumption strongly depends on the hardware used, it must be possible to configure and swap the targeted hardware quickly. In this context we propose a type system that can be used statically to derive energy consumption compositionally. This type system works on a small theoretical programming language in which hardware interactions are made explicit.
6. 29 Sept
10.00-11.00, Eindhoven
Sung-Shik Jongmans Protocol Programming with Automata: (Why and) How? abstract, slides

Protocol Programming with Automata: (Why and) How?
Sung-Shik Jongmans

With the advent of multicore processors, exploiting concurrency has become essential in writing scalable programs on commodity hardware. One important aspect of exploiting concurrency is programming protocols. In this talk, I explain my recent work on programming protocols based on a theory of automata.
5. 25 August Sylvia Stuurman Setup for an experiment on teaching refactoring slides
4. 30 June Jeroen Keiren State Space Explosion: Facing the Challenge abstract, slides

State Space Explosion: Facing the Challenge
Jeroen Keiren

When processes run in parallel, the total number of states in a system grows exponentially; this is typically referred to as the state space explosion problem. The holy grail in model checking is finding a way in which we can effectively cope with this state space explosion. In this talk I discuss a line of research in which we use an equational fixed point logic and parity games as a basis for verification. I will focus on the latter, and discuss ideas that can be used to reduce these parity games. For those wondering whether this general theory has any applications at all I discuss industrial applications of this technology. If you want to know what these applications are, come listen to my talk!
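The exponential growth mentioned above is easy to see in a toy sketch (illustrative, not from the talk): under interleaving parallel composition, the global state space is the Cartesian product of the local state spaces.

```python
from itertools import product

def global_states(local_state_sets):
    """Interleaving parallel composition: a global state picks one
    local state per process, so the global state space is the
    Cartesian product of the local state spaces."""
    return list(product(*local_state_sets))

# Two two-state processes already give 2 * 2 = 4 global states:
print(len(global_states([("idle", "busy")] * 2)))  # 4

# The count grows exponentially with the number of processes:
for n in (5, 10, 20, 30):
    print(n, 2 ** n)  # global states of n parallel two-state processes
```

Thirty two-state processes already yield over a billion global states, which is why model checkers rely on reductions (such as the parity game reductions discussed in the talk) rather than naive enumeration.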
3. 26 May Arjen Hommersom Probabilistic Models for Understanding Health Care Data abstract, slides

Probabilistic Models for Understanding Health Care Data
Arjen Hommersom

In the first part of this talk, I will give an overview of my recent research in using probabilistic models for health care applications. In the second part, I will zoom into recent results from the NWO CAREFUL project, where we attempt to reconcile probabilistic automata learning with Bayesian network modelling. In particular, the aim is to identify the probabilistic structure of states within a probabilistic automaton. We propose a new expectation-maximisation algorithm that exploits the state-based structure of these models. This model is currently being applied to learn models from psychiatry data to provide novel insights into the treatment of psychotic depressions. Preliminary results will be discussed.
2. 28 April Hugo Jonker FP-Block: usable web privacy by controlling browser fingerprinting abstract, slides

FP-Block: usable web privacy by controlling browser fingerprinting
Hugo Jonker

Online tracking of users is used for benign goals, such as detecting fraudulent logins, but also to invade user privacy. We posit that for non-oppressed users, tracking within one website does not have a substantial negative impact on privacy, while it enables legitimate benefits. In contrast, cross-domain tracking negatively impacts user privacy, while being of little benefit to the user. Existing methods to counter tracking treat any and all tracking similarly: client-side storage is blocked, or all sites are fed random characteristics to hamper re-identification. We develop a tool, FP-Block, that counters cross-domain tracking while allowing intra-domain tracking. For each visited site, FP-Block generates a unique, consistent fingerprint: a "web identity". This web identity is then used for any interaction with that site, including for third-party embedded content. This ensures that ubiquitously embedded parties will see different, unrelatable fingerprints from different sites, and can thus no longer track the user across the web.
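A toy sketch of the per-site "web identity" idea (an illustrative assumption about the mechanism, not FP-Block's actual implementation): derive fingerprint attributes deterministically from a local secret and the visited site, so each site sees a consistent but site-specific fingerprint.

```python
import hashlib

# Tiny illustrative attribute pools; a real tool would draw from a
# much larger space of fingerprintable characteristics.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_0)",
]
SCREENS = ["1920x1080", "1366x768", "2560x1440"]

def web_identity(secret, site):
    """Derive a deterministic per-site fingerprint: the same site
    always sees the same attributes, but two sites cannot correlate
    their views through the fingerprint alone."""
    digest = hashlib.sha256((secret + "|" + site).encode()).digest()
    return {
        "user_agent": USER_AGENTS[digest[0] % len(USER_AGENTS)],
        "screen": SCREENS[digest[1] % len(SCREENS)],
    }

secret = "local-per-user-secret"
# Consistent within a site (intra-domain tracking still works)...
assert web_identity(secret, "shop.example") == web_identity(secret, "shop.example")
# ...while different sites generally receive different identities.
print(web_identity(secret, "shop.example"))
print(web_identity(secret, "news.example"))
```

With only a handful of attribute values, two sites can of course still collide by chance; the point of the sketch is the derivation pattern: per-site determinism from a secret that never leaves the client.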
1. 31 March Christoph Bockisch A design method for modular energy-aware software abstract, slides

A design method for modular energy-aware software
Christoph Bockisch

Nowadays, achieving green software by reducing the overall energy consumption of a system is becoming more and more important. A well-known solution is to make the software energy-aware by extending its functionality with energy optimizers, which monitor the energy consumption of the software and adapt it accordingly. Modular design of energy-aware software is necessary to make the extensions manageable and to cope with the complexity of the software. To this end, we propose a model of energy utilization -- or, more generally, resource utilization -- and an accompanying design method for components in energy-intensive systems. Optimizer components can analyze such Resource Utilization Models (RUMs) of other components and adapt their behavior accordingly. We show how compact RUMs can be extracted from existing implementations. Using the Counterexample-Guided Abstraction Refinement (CEGAR) approach, along with model-checking tools, abstract models can be generated that help establish key properties relating to energy consumption. The approach is illustrated by the concrete example of a network manager sub-system.

 
