Graduated students

This page describes final projects by students who finished their studies under my direct supervision. Students I currently supervise are listed on the supervision page.

Graduated PhD students:

Vincent van der Meer: Improving Foundations of File Recovery.
Publications: WIFS'19, DFRWS-APAC'20, WSDF@ARES'21, WSDF@ARES'23, DFRWS-EU'24.
Benjamin Krumnow: Web Scrapology — overcoming limits of automating web measurements.
Publications: ESORICS'19, MADWeb'20, IMC'21, J. Computers & Security, CoNEXT'22, MADWeb'23.
Naipeng Dong: Enforced Privacy: from practice to theory.
Publications: FAST'10, FHIES'11, ESORICS'12, ESORICS'13.

Graduated MSc students:

Siebren Lepstra: Preparing Passports for the Post Quantum Era.
Nick Borchers: Bits don't lie — detecting NTFS driver fingerprints.
Jan Ouwehand: Enabling users to enforce privacy — towards a privacy-preserving document life cycle when digitizing and sharing documents.
Bas van de Louw: Browser-based port scanning.
Wibren Wiersma: Using graph-based anomaly detection to uncover scientific fraud.
Jeroen Hoebert: Using GUI testing to automate website security analysis.
Ewoud Westerbaan: Acquisition and integration of public data to improve detection of scientific fraud.
Jorgos Korres: Towards better web measurements by mitigating impact factors.
Godfried Meesters: Synchronising Distributed Scraping.
Thesis contributed to a publication at MADWeb'23.
Jeroen Pinoy: Nothing to see here! On the awareness of and preparedness and defenses against cloaking malicious web content delivery.
Koen Aerts: Cookie dialogs and their compliance.
Mart Peters: Advanced file format validation for file carving.
Nils de Jong: Refining user context detection on smartphones.
Aksel Harrewijn: Turning It Off: context-driven prevention of passive WiFi-tracking.
Gabry Vlot: Automated data extraction: what you see might not be what you get.
Thesis resulted in a publication at ESORICS'19.
Niels Tielenburg: Automating outlier detection in academic publishing.

Research internships (MSc students):

David Roefs: Camouflaging OpenWPM.
Internship report contributed to a publication at IMC'21.
Rick Dolfing: Towards a Fingerprint Surface.

Graduated BSc students:

Nick Huijsmans, Bart Kuijsten: Reconstructing Files with Out-of-Order Fragmentation.
Violeta Sizonenko: Measuring Evolution of Cookie Dialogues.
Evert van Kammen, Serge Gordts, Chris Adoorn: Mining Software Repositories To Discover The Usage Of Concurrency.
Camile Lendering: Inside the Cookie Jar: Tracking Consent in Digital Advertising.
Koen Berkhout and Maarten Meyns: Cookie Dialog Compliance.
Zowie van Dillen: A Measured Evaluation of Artificial Filesystem Aging Tools.
Wouter Hueting: Feasibility of Simulating the Java Programming Process.
Mitchel Jansen: Recognising client-side behavioral detection of web bots.
Bart van Vulpen: Towards finding browser fingerprinters through automated static analysis of JavaScript code.
Daniel Goßen: Design and implementation of a stealthy OpenWPM web scraper.
Thesis contributed to a publication at IMC'21.
Jelle Bouma: Interpreting NTFS time-stamps.
Thesis resulted in a publication at WSDF@ARES'23, which won the best paper award.
Robbert Noordzij: Synthetic Fragmentation Experiments using WildFragSim.
Yvonne Vollebregt: File dating based on the physical location of the File.
Nataliya Yasko and Siebren Cosijn: FP-Block 2.0: preventing browser fingerprinting.
Alan Verresen and Jelle Kalkman: Shepherd: enabling large-scale scanning of websites after social single sign-on.
Nick Nibbeling: Comparing privacy plugins.
Annet Vink and Katleen de Nil: 50 ways to lose your cover.
Bas Doorn and Lucas Vos: Private Tweeting.
Marc Sleegers: Counting sheep: analysing online authentication security.
Thesis resulted in a publication at MADWeb'20.
Gert-Jan den Besten: S3 - Securely Sharing Selfies.
Else van Schaijk and Ton Poppe: A GIMP-plugin for deblurring uniform motion blurred text images.
Christof Ferreira Torres: Fingerprint privacy: a fresh perspective on web privacy.
Thesis resulted in a publication at ESORICS'15.
François Lange: Deblurring text: from theory towards an implementation.
Xin Zhu: Autogenerating context to manage sensitive data in Android-phones.
Filipe Ferreira: SpyDroid.

Graduated BPMIT students: As a courtesy to the then newly started Information Sciences group, from 2015 to 2018 I helped out by supervising a few of their master's students in their final projects for the BPMIT programme. The BPMIT MSc programme focuses on the management aspects of IT projects. As such, it is more in line with the field of Management than with Computer Science, which is reflected in the methodology of the theses below.

Alex Trappenberg: Social engineering binnen Nederlandse zelfstandige bestuursorganen (PDF).
Jeroen Bakker: Informatiebeveiliging in huisartsenpraktijken (PDF).
Dennis Spijker: Social engineering binnen de Nederlandse Rijksoverheid (PDF).
Mustafa Nizami: 'Veilig' gedrag tegen Social Engineering in Industriële Automatisering (IA) omgeving van Rijkswaterstaat (PDF).
Krijn van der Laan: Risicomanagement van social engineering (PDF).

Improving Foundations of File Recovery

Vincent van der Meer, PhD thesis, Open Universiteit.
Defended: September 2024.
Co-supervised with Prof. dr. ir. H. Vranken and Dr. J. van den Bos.
Link: thesis.
Publications: WIFS'19, DFRWS-APAC'20, WSDF@ARES'21, WSDF@ARES'23, DFRWS-EU'24.
Best paper awards: WSDF@ARES'23, DFRWS-EU'24.
Related MSc projects: 1.
Related BSc projects: 1, 2, 3, 4, 5.

Challenges in file recovery are manifold. Of particular importance is the challenge posed by file fragmentation, where a file is spread across multiple fragments at different locations on the storage medium. Prior to this work, the most recent survey measuring file fragmentation was severely outdated, leaving a gap in the data available to the field of digital forensics. This dissertation addresses that gap by collecting data on fragmentation in the wild, resulting in the largest dataset on file fragmentation since 2007. The data revealed that (1) the average rate of fragmentation has decreased, largely due to automated, scheduled defragmentation, and (2) the sheer volume of fragmented data has risen, a trend tied to the increasing capacity of hard drives. Interestingly, it also uncovered that nearly half of all fragmented files exhibit out-of-order fragmentation, a form of file fragmentation not accounted for in current file carving tools.
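To make the distinction between ordinary and out-of-order fragmentation concrete, here is a minimal sketch in Python. It is not taken from the dissertation: the representation of a file as a list of (start cluster, length) extents in logical file order is an assumption, as are the function names.

```python
from typing import List, Tuple

# Assumed representation: one (start_cluster, length_in_clusters) pair per
# extent, listed in the logical order of the file's contents.
Extent = Tuple[int, int]


def merge_adjacent(extents: List[Extent]) -> List[Extent]:
    """Join extents that directly follow each other on disk."""
    merged: List[Extent] = []
    for start, length in extents:
        if merged and merged[-1][0] + merged[-1][1] == start:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((start, length))
    return merged


def is_fragmented(extents: List[Extent]) -> bool:
    """A file is fragmented if it occupies more than one contiguous run."""
    return len(merge_adjacent(extents)) > 1


def is_out_of_order(extents: List[Extent]) -> bool:
    """True if a later part of the file is stored before an earlier part."""
    starts = [start for start, _ in merge_adjacent(extents)]
    return any(later < earlier for earlier, later in zip(starts, starts[1:]))
```

A carver that only scans forward from a file header will never find a fragment for which `is_out_of_order` holds, since that fragment lies before the point the scan has already passed.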

The collected dataset enabled a detailed study of file timestamps, including their manipulation and the impact of file operations on these timestamps. By examining the state of a file prior to specific operations (effectively, analyzing the reverse effect of file operations on timestamps) we were able to reconstruct potential file histories. This methodology, along with its visualization, was showcased and automated through the development of two artifacts.

Lastly, the dissertation focuses on the JPEG file format. Due to the lack of effective algorithms for identifying fragmentation points in JPEG files, recovering these files presents a considerable challenge. Following an in-depth analysis of the JPEG decoding process, we developed a validation algorithm for JPEG files. Unlike many existing approaches, this algorithm operates on deterministic principles, guaranteeing consistent results with identical inputs. The algorithm underwent rigorous and extensive testing, applying it to a wide array of JPEG files that included both baseline and progressive formats. The results are exceptionally convincing: in average-case scenarios, the fragmentation point was detected in 99.997% of cases (for baseline encoded JPEGs) within 4 kilobytes (the most common block size of NTFS). Even under the most challenging conditions, a detection rate of 99.4% was achieved. These outcomes underscore the effectiveness of the algorithm, leading us to conclude that the longstanding problem of JPEG fragmentation point detection has been effectively solved.

Web Scrapology — overcoming limits of automating web measurements

Benjamin Krumnow, PhD thesis, Open Universiteit.
Defended: December 2023.
Co-supervised with Prof. dr. ir. H. Vranken and Prof. dr. S. Karsch (TH Köln).
Link: thesis.
Publications: ESORICS'19, MADWeb'20, IMC'21, J. Computers & Security, CoNEXT'22, MADWeb'23.
Related MSc projects: 1, 2, 3, 4.
Related internship projects: 1.
Related BSc projects: 1, 2.

In general, automated measurement tools should be able to collect data from the Web as users experience it. That is, they should be able to go beyond the Web that is shown only to automated visitors. This becomes a necessity when web measurement tools are used to gain insight into online privacy and security. To this end, measurement tooling must overcome two obstacles. First, websites limit reachability of content. They do so not to stop automated measurements, but for functionality reasons. Second, websites intentionally oppose automated visitors. They may use blocking mechanisms such as captchas, or, more insidiously, tailor responses to automated visitors, e.g., leaving out advertisements or videos. From a measurement perspective, these obstacles may result in blind spots which undermine a study's validity and limit its significance.

This thesis shows that both types of obstacles can be overcome, albeit at the cost of considerable engineering effort. We use the resulting proof-of-concept tooling to analyse various aspects of web security that were previously unreachable at scale. We find that significant differences arise from overcoming the aforementioned obstacles. We therefore conclude that, for some types of web measurements, such efforts are necessary.

Enforced Privacy: from practice to theory

Naipeng Dong, PhD thesis, University of Luxembourg.
Defended: November 2013.
Co-supervised with Prof. Dr. S. Mauw and Dr. J. Pang.
Link: thesis.
Publications: FAST'10, FHIES'11, ESORICS'12, ESORICS'13.

The project focused on formalising enforced privacy. The starting point was to investigate practical requirements for enforced privacy in voting, healthcare, and auctions. Using the resulting domain-specific formalisations, a generalisation step was made, which captures not only privacy-reducing conspiracies, but also privacy-preserving coalitions.

Reconstructing Files with Out-of-Order Fragmentation -- a new approach to file carving

Nick Huijsmans and Bart Kuijsten, bachelor project (OU).
Finished: January 2025.
Links: thesis, source code.

File carving, the recovery of files without file system metadata, faces significant challenges when dealing with out-of-order (OoO) fragmented files. While only approximately 4.4% of files in NTFS systems are fragmented, about 46% of these fragmented files show OoO fragmentation, highlighting the importance of addressing this challenge.
To enable systematic evaluation of carving algorithms, we developed a test set generator capable of producing disk images with controlled, adjustable levels of fragmentation. Our analysis of existing tools, including Foremost, PhotoRec, Scalpel, and JPG-Carve, revealed significant performance degradation when handling OoO fragmented files. That analysis provides input for a comprehensive framework to improve file carving strategies for OoO fragmented files. We introduce a novel two-stage carving framework that significantly enhances recovery performance. The first stage focuses on efficient recovery of non-fragmented files while minimizing memory usage, thereby reducing the search space for the more complex fragmented cases. The second stage implements a best-first search strategy, with a heuristic based on statistical analysis of the WildFrag dataset, which reduces the search space by 93%.
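The idea of a best-first search over candidate block orderings can be illustrated with a small Python sketch. It is not the students' implementation. The cost function `gap_cost` is a hypothetical stand-in for the WildFrag-derived heuristic, and `is_valid_file` stands in for whatever file validator decides that a reassembled block sequence is a complete file.

```python
import heapq
from typing import Callable, Iterable, List, Optional, Tuple

# Block numbers in the order in which they are appended to the recovered file.
State = Tuple[int, ...]


def gap_cost(state: State) -> int:
    """Hypothetical heuristic: contiguous blocks are free, jumps cost their distance."""
    return sum(0 if b == a + 1 else abs(b - a) for a, b in zip(state, state[1:]))


def best_first_carve(
    header_block: int,
    candidate_blocks: Iterable[int],
    is_valid_file: Callable[[State], bool],
    cost: Callable[[State], int] = gap_cost,
) -> Optional[State]:
    """Always expand the cheapest partial reassembly first."""
    blocks: List[int] = list(candidate_blocks)
    # Heap entries are (cost, insertion counter, state); the counter breaks ties.
    frontier: List[Tuple[int, int, State]] = [(0, 0, (header_block,))]
    pushed = 1
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_valid_file(state):
            return state
        for block in blocks:
            if block not in state:  # each block is used at most once
                nxt = state + (block,)
                heapq.heappush(frontier, (cost(nxt), pushed, nxt))
                pushed += 1
    return None  # every ordering was tried without producing a valid file
```

For example, `best_first_carve(7, [3, 4, 7, 8], lambda s: s == (7, 8, 3, 4))` recovers a file whose second half is stored before its first half. Without pruning the search is exhaustive in the worst case, which is why a heuristic that shrinks the search space matters.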

Measuring Evolution of Cookie Dialogues

Violeta Sizonenko, bachelor project (RU).
Finished: October 2024.
Links: thesis.

In their thesis on detecting cookie dialog dark patterns, Koen Berkhout and Maarten Meyns found that, within the EU, French websites (that is, .fr websites) offer a "reject all" option significantly more often than websites from other countries. They hypothesized that this could be due to the French DPA imposing harsh fines in early 2022 for lacking this button. This project developed a method for obtaining historical data from the Web Archive to track the evolution of cookie dialogs over time.

Preparing Passports for the Post Quantum Era

Siebren Lepstra, master thesis (OU).
Finished: October 2024.
Links: thesis.

Quantum computers pose a significant threat to electronic machine readable travel documents, such as passports, which are used for long periods, from 10 to 15 years. The long lifespan of these documents makes the potential impact of quantum computers even more critical, as the current security protocols rely on classical asymmetric cryptography, which will be vulnerable in a post-quantum era. In particular, passive authentication, the protocol responsible for safeguarding the integrity of the travel document, is at risk. This research evaluates and benchmarks three quantum-resistant signature algorithms currently being standardized by NIST: Dilithium, Falcon, and SPHINCS+. The findings indicate that Dilithium and Falcon are well-suited to replace the Public Key Infrastructure (PKI) used in passive authentication due to their small digital footprint. SPHINCS+ is, by contrast, less suitable for this application due to its large signature size. Consequently, this study establishes a new quantum-resistant PKI and eMRTD, marking a significant step forward in securing travel documents against future quantum threats.

Bits don't lie — detecting NTFS driver fingerprints

Nick Borchers, master thesis (OU).
Finished: December 2023.
Links: github repo, thesis.

NTFS has been the default file system in Windows since Windows 2000. In addition to a plethora of NTFS drivers by Microsoft, there are also third-party drivers, such as a macOS driver by Paragon and NTFS-3g, which is included by default in Ubuntu. These drivers may all behave subtly differently. Disk allocation strategies, fragmentation patterns and other properties may reveal which NTFS driver operated on a disk. The goal of this project is to find characteristics that indicate a specific driver. Being able to identify one or more drivers used on a disk from its contents has applications in digital forensics, such as finding hidden OSes, or assisting in determining file origins.

Enabling users to enforce privacy — towards a privacy-preserving document life cycle when digitizing and sharing documents

Jan Ouwehand, master thesis (OU).
Finished: December 2023.
Links: github repo, thesis.

Offices often have networked printer/scanner devices (multi-functional printers, MFPs). Scanning a document on an MFP typically requires a user to log in, after which the scan is made and a mail with a PDF of the scan is sent to the user. While this could provide some security, plaintext copies of the PDF may appear in many locations: on the MFP's onboard storage, in the MFP's backend, in the organisation's mail server, and, lastly, in the user's mailbox. This project investigated the possibility of direct and seamless encryption. Prior to scanning, the user's (NFC-capable) smartphone creates an encryption key together with the MFP. From then on, the PDF is encrypted. The second part consists of an Outlook plugin, which uses Bluetooth to check for the presence of the smartphone in order to acquire the decryption key when the user opens the PDF.
Lastly, the project investigated the possibility for extending this strong privacy protection for digital sharing of the scanned document. This resulted in an initial proposal for a standard way to make company document sharing policies public, and a way to integrate that automatically into the user's mailer.

Browser-based port scanning

Bas van de Louw, master thesis (OU).
Finished: December 2023.
Links: Github repo, thesis.

This project investigated browser-based port scanning, a technique that allows for the detection of open ports on a target system through the use of a web browser. The study first investigates the optimal strategy for browser-based port scanning, then the feasibility of using port scanning to identify specific programs running on a user's system, and lastly the uniqueness of browser-based port scanning fingerprints. The study demonstrates that browser-based port scanning can serve as an effective alternative to traditional port scanning techniques. The results suggest that browser-based port scanning can accurately identify specific programs running on a user's system. This has concerning implications because browser-based port scanning is a client-side, local operation on the user's system, unlike regular port scanning, which might be leveraged to bypass intrusion detection systems, such as a firewall. Furthermore, the study estimates the uniqueness of browser-based port scanning fingerprints, which has significant implications for user privacy and internet anonymity. The study reveals that browser-based port scanning fingerprints are distinct enough to be employed as a means of tracking users across various websites, highlighting the need for enhanced privacy measures by modern web browsers.

Mining Software Repositories To Discover The Usage Of Concurrency

Evert van Kammen, Serge Gordts, and Chris Adoorn, bachelor thesis (OU).
Finished: October 2023.
Links: thesis, commit extractor.

Multicore CPUs have become standard in computers, and recent trends raise the number of cores further. It seems, however, that programmers are not making optimal use of this available computing power. That is: most programs do not leverage parallel programming to a significant degree. If true, this implies that a lot of speedup can be gained by better facilitating the development of parallel programming. The goal of this project was to perform a proof-of-concept study on Github, measuring the prevalence of parallel programming primitives used in projects in three languages, in order to ascertain whether parallel programming is indeed an infrequent occurrence.

Inside the Cookie Jar: Tracking Consent in Digital Advertising

Camile Lendering, bachelor thesis (RU).
Finished: July 2023.
Links: thesis, Github.

Cookie consent dialogs are on just about any website one goes to (except this one :). Accepting cookies is typically made far easier than rejecting cookies. Moreover, thanks to the use of dark patterns, a user may believe they have rejected all cookies, yet still receive some non-functional cookies. In this project, the student developed a way to investigate ad vendor compliance as well as Cookie Management Platform compliance. This was done by setting cookie choices directly via a cookie, thereby bypassing the need to interact with the cookie dialog. This method allows for testing whether the injection was successful; the analysis continued only in cases where it was. Various choices were tested, including logically inconsistent ones (e.g., reject all purposes combined with accept all vendors). Violations of the explicitly set (and processed) cookie preferences were common.

Using graph-based anomaly detection to uncover scientific fraud

Wibren Wiersma, master thesis (RU).
Finished: September 2022.
Links: thesis, GitHub.

The volume of academic publications doubles every 10-15 years. Logically, the volume of academic fraud would then similarly double. Despite more attention, efforts to detect fraud are still severely lagging behind the torrent of publications. Moreover, current methods for detecting fraud focus on evaluating individual papers for tell-tale signs of fraud (plagiarism, faked data, image manipulation, etc.). However, papers do not commit fraud; their authors do.
This thesis explores the idea put forth by Westerbaan that, since academic fraud ultimately benefits its perpetrator, such fraud should lead to a cycle in a graph of the publication process. Data acquisition and integration builds upon and improves Westerbaan's efforts, and a new approach to outlier detection for exponential distributions is proposed. The resulting graph database allows for finding cycles not related to known fraud types, which raises the possibility that unknown types of fraud may also be detected via this method.
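The notion of a cycle in a graph of the publication process can be illustrated with a small depth-first search in Python. This is only a sketch. The thesis uses a graph database, and the adjacency-list representation and node labels below are assumptions made for illustration.

```python
from typing import Dict, List, Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one directed cycle as a list of nodes, or None if the graph is acyclic."""
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: the path closes into a cycle.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

With hypothetical labels, `find_cycle({"author": ["paper"], "paper": ["venue"], "venue": ["author"]})` returns `["author", "paper", "venue", "author"]`. Whether such a cycle indicates fraud still requires manual investigation.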

Using GUI testing to automate website security analysis

Jeroen Hoebert, master thesis (OU).
Finished: September 2022.
Links: thesis.

TESTAR is a Java-based tool for GUI testing. It has recently been expanded to enable testing of websites (using Selenium + Webdriver). TESTAR does this in a random fashion: it does not follow a pre-programmed path over the site, but selects links at random to follow. It continues this process to establish a complete picture of a website. This allows for a more holistic view on the security of the site: does one part of the site adversely affect security of another part? The goal of this project was to incorporate scanning for various security aspects into a holistic security assessment tool using TESTAR, such as cookie security, HTTP headers, use of insecure connections, injection attacks, etc. The results have been incorporated into the TESTAR tool.

Cookie dialog compliance

Koen Berkhout and Maarten Meyns, BSc thesis, Open University.
Finished: August 2022.
Links: thesis, Cookie Dialog Evaluation Assistant plugin, Automated cookie dialog crawler & evaluator.

Although the presence of cookie dialogs suggests that users can decide which cookies they accept, the possibility exists that websites do not always fully comply with cookie preferences. The goal of this project is to test this assumption in a systematic and automated way. To reach this goal the project is divided into two substudies, each of which is mainly carried out by one student. The first substudy, conducted by Koen Berkhout, investigated the compliance by a limited number of top websites with the cookie preferences stated by their visitors. Reviewers evaluate cookie dialogs, assisted by a Chrome plugin and backend that supplies the website to review, records cookies set initially and after making choices, and the number of clicks needed to "deny all" in detail. The second substudy, conducted by Maarten Meyns, investigated scaling this up to automatically studying thousands of sites. This study automated the ability to detect cookie dialogs and interact with these cookie dialogs. This resulted in an automated crawler that relies on pretrained ML models to detect cookie dialogs and classify options in these dialogs. The results are indicative of widespread use of dark patterns, and of failure to comply with legal obligations on an unprecedented scale.

Acquisition and integration of public data to improve detection of scientific fraud

Ewoud Westerbaan, MSc thesis, Open University.
Finished: April 2022.
Links: thesis.

The quantity of scientific publications increases at an ever-increasing pace. Current anti-fraud measures focus on specific suspects and specific attack modes (plagiarism detection, etc.), and cannot keep up. In this thesis, we investigated whether it is possible to offset this deluge by focusing only on high-impact cases, in a more generic approach. To that end, we enriched existing public datasets with publicly available data. In the resulting data set, we performed group-based outlier detection to identify individuals of interest, that is, those where manual investigation is warranted.

Towards better web measurements by mitigating impacting factors

Jorgos Korres, MSc thesis, Open University.
Finished: January 2022.
Links: thesis.

Webpages change frequently, for a variety of reasons. Sometimes, this even results in different visitors being shown different content. Such differences can impact web studies, especially studies where multiple scraping runs are compared to each other. This project constructed a taxonomy of different causes for such differences, and investigated mitigations across the entire spectrum.

A Measured Evaluation of Artificial Filesystem Aging Tools

Zowie van Dillen, BSc thesis, Open University.
Finished: September 2021.
Links: thesis, disk measurement tool.

Filesystems accumulate file fragmentation over the course of years of active use. This phenomenon is known as "filesystem aging". For validating the performance of algorithms related to filesystem aging, it is a huge benefit to have a tool that can help generate filesystem aging in a comparatively short period of time, such as a single day. These tools exist, and are known as "artificial aging tools". In this thesis we compare the performance of several artificial aging tools: Geriatrix, Impressions and Compilebench. The comparison is based upon how much they fragment the filesystem, as well as upon how realistic the filesystems they generate are. We conclude that, by our metrics, Impressions generates filesystem aging most efficiently. We also conclude that none of the aging tools produce allocation patterns that are similar to the allocation patterns of computers that were aged the normal way.

Synchronising Distributed Scraping

Godfried Meesters, MSc thesis, Open University.
Finished: August 2021.
Links: thesis, DiffScraper synchronisation tool, mobile+desktop scraper.
Publication: MADWeb'23.

Price differentiation refers to a commercial strategy of charging different prices for the same product or service. A given e-commerce company can offer the same items through multiple outlets, such as a website or a mobile application. To assist in comparing outlets, data needs to be collected simultaneously on a large scale. Manual data collection is possible, but the amount of data that can be collected manually is limited. In this study, a distributed and synchronized web scraping system is designed. The design accommodates an unlimited number of web bots, which take jobs from a pub/sub system and synchronize with each other. To validate the design, an experiment with price differentiation in the travel industry is conducted with a focus on flight ticket prices. In the experiment, prices are collected from the company's app, from a desktop version of the website, and from the mobile version of their website.
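The core requirement, that all bots query the same item at the same moment, can be sketched in Python with a barrier. This is a deliberately simplified stand-in: the thesis design is distributed and built on a pub/sub system, whereas the sketch below runs threads in one process. The outlet names and fetch functions are hypothetical.

```python
import threading
from typing import Callable, Dict, List


def scrape_in_sync(
    outlets: Dict[str, Callable[[str], float]],
    jobs: List[str],
) -> Dict[str, Dict[str, float]]:
    """Run one bot per outlet; all bots start each job at the same time."""
    barrier = threading.Barrier(len(outlets))
    results: Dict[str, Dict[str, float]] = {job: {} for job in jobs}
    lock = threading.Lock()

    def bot(name: str, fetch: Callable[[str], float]) -> None:
        for job in jobs:
            barrier.wait()  # block until every bot is ready for this job
            price = fetch(job)
            with lock:
                results[job][name] = price

    threads = [
        threading.Thread(target=bot, args=(name, fetch))
        for name, fetch in outlets.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Each fetch function would wrap a real scraper for one outlet (app, desktop site, mobile site). Because every bot waits at the barrier before each job, price differences between outlets are less likely to be caused by differences in collection time.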

Nothing to see here! On the awareness of and preparedness and defenses against cloaking malicious web content delivery

Jeroen Pinoy, MSc thesis (2 yr CS programme), Open University.
Finished: August 2021.
Links: thesis, cloaking test site source.

Website cloaking is a technique that enables websites to deliver different content to different clients, with the goal of hiding particular content from certain clients. Website cloaking is based on client detection, which is achieved via browser fingerprinting. In an attempt to hide their malicious web pages from detection, cyber criminals use cloaking. They use vulnerability detection to only target clients that seem vulnerable. On top of that, they also provide benign content in case they suspect someone or something is trying to detect them. On the other hand, security analysts use security web crawlers, automated tools that crawl web pages and analyze them, for example to find malicious web pages. One example of such tools is the honeyclient, also known as a client honeypot web bot. Honeyclients are browser clients that are purposefully left vulnerable or that emulate vulnerable browsers. They are the client equivalent of a so-called server honeypot [QH10; QZ11], a server that is left vulnerable on purpose to lure in attackers, thus distracting and detecting them. The goal of a honeyclient is to detect webpages delivering malicious code. They are a potential counter to cloaking. While there is prior research into bot detection and browser fingerprinting [JKV19], it is currently not clear to what extent security web crawlers are distinguishable from regular clients, and thus whether cybercriminals can avoid sending malware to such clients by using generic cloaking techniques. It is also not clear to what extent cyber security professionals and their organisations are aware of and prepared for web based attacks using cloaking, or how their awareness and preparedness could be improved. In this work, we investigate to what extent security web crawlers can be detected by browser fingerprinting techniques, and provide suggestions for how to improve them to better hide from those techniques.
We survey security analysts and analyse a set of threat intelligence sharing communities, to gauge awareness of cloaking as an available detection evasion method for cybercriminals. Finally, we investigate one final technique, the use of Cache-Control: no-store, which an attacker might be able to use to thwart forensic analysis.

Cookie dialogs and their compliance

Koen Aerts, MSc thesis, Open University.
Finished: July 2021.
Links: thesis, cookie-dialog crawler, analysis crawler.

All those annoying cookie dialogs you encounter on the web every day? These are governed by specific European laws. Earlier research has shown that many of these cookie dialogs do not conform to the legal requirements. However, since the government entities tasked with upholding these laws are extremely understaffed, these practices can continue unpunished. Therefore, Koen looked into the possibility of automating compliance checking of cookie dialogs, showing with a proof-of-concept that a pro-active, wide-scale audit process supported by automation is possible.

Essentially, Koen extended a web crawler with ways to recognize (certain characteristics of) cookie dialogs and tested this crawler against lots of European websites. Comparing this to, for instance, the number of advertising cookies that are being set before users have consented, gives some truly shocking results!

Advanced file format validation for file carving

Mart Peters, MSc thesis, Open University.
Finished: July 2021.
Links: thesis, software.

In this thesis we investigate how file format specifications can guide file format validation. We propose a method to determine whether file format validation is feasible and how this can be achieved using existing validation techniques. We approached this problem from a file format perspective, because file format validation relies on properties of a file format. We analyzed popular file formats of commonly used file types to identify and generalize file format concepts that recur across the different file format specifications. Based on these concepts, we derived a method to determine the feasibility of file format validation.

To verify the proposed method, we applied it to a complex file format. The PST file format, which Outlook uses for storing e-mails and calendar items, was identified as a suitable candidate, because related work found that PST files are frequently fragmented on a system. We implemented a PST file validator using the validation techniques suggested by the method. The implemented PST validator was able to recognize file fragments and can be used to reconstruct the original file from them.

Feasibility of simulating the Java programming process

Wouter Hueting, BSc thesis, Open University.
Finished: June 2021.
Links: thesis, software: host for simulator, simulator, experiment to measure impact of using clientside program.

Data on file system aging is relevant for file system design and forensic research. However, acquisition of large quantities of real-life data is hard: many hard disks need to be found, and their owners must be willing to share the state of their file system. Privacy issues increase the effort needed to acquire data. Therefore, simulation is an accepted way to age file systems. However, current approaches do not simulate actual computer use, but focus mainly on disk interaction.

In this project, we investigate the feasibility of simulating actual computer use. To this end, we use as input the treasure trove of actual usage data on the programming process publicly available on GitHub. We construct a proof-of-concept simulator inside a virtual machine which runs an IDE. In this IDE, we replay the full history, as stored in Git, of the programming process. This includes typing commits at human-like speeds, committing to a local Git repository, working on different branches, merging, etc. We encounter and solve several issues, amongst which: incompatibilities between the IDE and the data stored in Git (e.g., filenames containing spaces are disallowed in the IDE), communication between IDE and host, and handling branching in the IDE. The resulting proof-of-concept simulator shows feasibility by being able to replay a small project fully. There remain various obstacles to scaling this up to many projects, which are left as future work.

Camouflaging OpenWPM

David Roefs, Research Internship, Radboud University.
Finished: June 2021.
Links: internship report, HLISA software library.

Vlot found in his thesis that a significant proportion of web sites try to detect scrapers. OpenWPM is a popular scraper used in over 60 published scientific studies. If web sites detect that OpenWPM is visiting, they may present a different site than they would to a human visitor. In short, to make OpenWPM more reliable as a data gathering tool, a stealthy version is needed.

In this internship, we investigated how interaction characteristics of scrapers (mouse clicks, mouse movement, typing, scrolling, focus changes) differ from when humans interact with a page. We developed a new interaction library (HLISA) which addresses these shortcomings. HLISA, a Human-Like Interaction Selenium API, provides an interaction API that closely resembles human interaction. Finally, we considered the arms race between simulators and detectors, and where HLISA falls within this arms race.

Recognising client-side behavioral detection of web bots

Mitchel Jansen, BSc thesis, Radboud University.
Finished: January 2021.
Links: thesis.

Scraper detection may influence the result of web studies. Scraper detection may be done by browser fingerprinting techniques, as investigated by Gabry Vlot. An alternative is to look for telltale signs of scraper behaviour. This thesis investigated behavioral detection. It analysed known scraper detection scripts for signs of behavioral detection, incorporated this into a static analysis/scoring mechanism and used this in a scan of statically loaded scripts in the Tranco Top 10K web sites. Several scripts exhibiting behavioral detection were found.

Towards finding browser fingerprinters through automated static analysis of JavaScript code

Bart van Vulpen, BSc thesis, Radboud University.
Finished: April 2020.
Links: thesis.

Browser fingerprinting is a technique to reidentify browsers by recognising their unique set of attributes (screen resolution, version number, fonts, etc.) and behaviour (canvas fingerprinting, audio fingerprinting, etc.). While there are known ways to detect specific instances of browser fingerprinting, there is no generic approach to detect all commercial browser fingerprinting yet. This thesis provides a first effort towards such a generic approach.

Design and implementation of a stealthy OpenWPM web scraper

Daniel Goßen, BSc thesis, Radboud University.
Finished: April 2020.
Links: thesis.

Web sites employ scraper detection for a variety of reasons. Such scraper detection can result in omitting content or even blocking behaviour when a scraper is encountered. This obviously interferes with scrapers designed to study the web. This thesis examines the extent to which OpenWPM, a popular research-oriented scraping tool, is distinguishable using modern approaches (fingerprint surface and JavaScript templates). Then, it investigates the extent to which existing countermeasures aid in lessening the distinctiveness of OpenWPM. Finally, building upon approaches from these countermeasures, it incorporates a significant distinction-lessening measure into OpenWPM and manually validates its effectiveness.

Refining user context detection on smartphones

Nils de Jong, MSc thesis, Open University.
Finished: October 2019.
Links: thesis.

Smartphones have a wide array of sensors and radio devices which can be used to determine context. This thesis investigated machine learning approaches to refine context recognition on smartphones. It found that multi-label classification is a promising avenue to help refine contexts. Specifically, distinguishing between multiple levels of location (building, floor, room) can help in refining the classification of current activity.

Interpreting NTFS Time-stamps

Jelle Bouma, BSc thesis, Open University.
Finished: August 2019.
Publication: ARES-WSDF'23, best paper award.
Links: thesis, GitHub.

The NTFS file system records 8 time stamps per file. This thesis investigates how these time stamps may change due to regular (user-triggered) operations. The resulting list of time stamp effects is used to develop an approach to recovering which operations have been applied to a file, based on the current state of these 8 time stamps. Finally, a proof-of-concept implementation of this approach was created.

Synthetic Fragmentation Experiments using WildFragSim

Robbert Noordzij, BSc thesis, Open University.
Finished: August 2019.
Links: thesis.

Longitudinal studies on fragmentation and block allocation in hard disks are hard to execute: simulations will not accurately model disks found in real life, while convincing a large group of disk owners to participate in a years-long study has its own problems. In this research, a tool is developed to improve simulations to more closely mimic human behaviour - in particular, writing rhythm. The tool is currently geared towards investigating fragmentation effects.

File dating based on the Physical Location of the File

Yvonne Vollebregt, BSc thesis, Open University.
Finished: August 2019.
Links: thesis.

In digital forensic investigations, it is necessary not only to prove that a specific file existed on the examined device, but also to establish bounds on how long it existed there. A file that existed only for a split second will likely be regarded very differently than a file that existed on the device for months. To establish such bounds, two file dating mechanisms were tested: dating the entire disk based on the physical location of files, and dating a single file based on the dates of 10 neighboring files. The second method proved more useful, but neither will be sufficient in all cases.
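The second mechanism can be illustrated with a minimal sketch. It assumes that each file with a known date is described by a hypothetical (start block, creation time) pair, and that "neighboring" means closest in physical location; the thesis may define neighbours and combine their dates differently.

```python
from datetime import datetime, timedelta
from statistics import median_low


def date_bounds_from_neighbours(files, target_block, k=10):
    """Return (earliest, median, latest) creation time among the k nearest files.

    files: list of (start_block, created) tuples for files with known dates.
    This layout is an assumption made for illustration only.
    """
    if not files:
        raise ValueError("need at least one dated file")
    # Pick the k files whose starting block lies closest to the target block.
    nearest = sorted(files, key=lambda f: abs(f[0] - target_block))[:k]
    dates = [created for _, created in nearest]
    # median_low returns an actual element, so it works on datetime values.
    return min(dates), median_low(dates), max(dates)


# Twelve fictional files, 100 blocks apart, created one day after another.
example = [(100 * (i + 1), datetime(2019, 1, 1) + timedelta(days=i))
           for i in range(12)]
earliest, middle, latest = date_bounds_from_neighbours(example, target_block=650)
```

The earliest and latest neighbour dates give rough bounds for the file of interest; a wide gap between them signals that the estimate should not be trusted.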

FP-Block 2.0: preventing browser fingerprinting

Nataliya Yasko and Siebren Cosijn, BSc thesis, Open University.
Finished: August 2019.
Links: thesis, software.

FP-Block is a browser plugin (developed by Christof Ferreira Torres) that prevents fingerprint-based cross-site tracking. Since its release in 2015, several critiques have been published. Moreover, Firefox moved to a new plugin model, which necessitated an update. FP-Block 2.0 addresses several shortcomings of the original FP-Block, improving tracking resistance. Specifically, resistance to canvas fingerprinting and font fingerprinting is repaired and an initialisation bug is fixed. In addition, several small inconsistencies in the used web identities were uncovered and addressed. Moreover, fingerprint generation is now done a priori, and fingerprint selection is done in near-constant time. This allows FP-Block to be used on millions of sites without the fingerprint selection process grinding the browser to a halt.

Shepherd: Enabling Large-Scale Scanning of Websites after Social Single Sign-on

Alan Verresen and Jelle Kalkman, BSc thesis, Open University.
Finished: August 2019.
Links: thesis. (this software is not made publicly available)

Shepherd is a tool for automating website logins, enabling studies of post-login content. Until now, Shepherd worked only with domain-specific credentials. This project added support for single sign-on logins, such as logging in with your Facebook or Google account. The resulting extension to Shepherd generalised the concept of single sign-on, and can also support non-western identity providers.

Comparing privacy plugins

Nick Nibbeling, BSc thesis, Radboud University.
Finished: July 2019.
Links: thesis.

This thesis compares several plugins that claim to offer improved privacy to their users.

Research Internship: Towards a Fingerprint Surface

Rick Dolfing, MSc research internship, Radboud University.
Finished: April 2019.
Links: thesis.

When fingerprinting is discussed, a lot of different terminology is used. Studies use different terms, are not always clear about the studied features and propose methods which lack a fundamental approach. There is an understanding about the concept of fingerprinting, but it lacks a fundamental way to reason about it. We talk about a fingerprint surface and propose a taxonomy to use when discussing this topic. We applied our taxonomy to important studies over multiple years and reason about countermeasures. We argue that determining the fingerprint surface is a fundamentally hard problem, as it is hard to be complete in every category. We identified the one category that can be complete and we introduce the Prop-test to return the entire fingerprint surface for this category.

Turning It Off: context-driven prevention of passive WiFi-tracking

Aksel Harrewijn, MSc thesis, Open University.
Finished: May 2019.
Links: thesis (in Dutch).

Smartphone tracking has become popular in public spaces. This thesis investigates whether countering this via a simple machine-learning based app that turns off WiFi when outside trusted areas is a viable solution. In comparison to a simple polling solution, it guards privacy better, thereby showing the potential for using machine learning to counter WiFi-tracking.

Automated data extraction: what you see might not be what you get

Gabry Vlot, MSc thesis, Open University.
Finished: July 2018.
Links: thesis, software: webbot detection scanner, fingerprinting webserver tailored for webbots (both on GitHub).
Publication: ESORICS'19.

More and more research relies on automatically visiting web pages and processing the results. These investigations typically do not account for the possibility that a web site returns different content to a scraper or web bot than to a regular user. In this project, we provide a taxonomy of how to detect web bots. We provide a characterisation of the client-side detectable fingerprint surface of 9 web bots. Using these findings, we design a generic detector that detects web bot detection. We implemented the detector and found that ~11% of web sites in the Alexa Top 1 million employ some form of web bot detection.

50 ways to lose your cover

Annet Vink and Katleen de Nil, BSc thesis, Open University.
Finished: July 2018.
Links: thesis, cover song.

In 2010, Eckersley released a web site that measured various attributes, and computed how unique each visitor was compared to the visitors that preceded them. This was the first (public) foray into browser fingerprinting: tracking a browser based on its attributes. After attracting half a million visitors, Eckersley was able to determine for many different attributes of a browser (screen resolution, user agent, etc.) how much impact they had on privacy.
Recently, Torres & Jonker investigated fingerprinting on mobile phones and found that the incidence of fingerprinting in apps is significantly higher than the most recent data on fingerprinting in web sites. This project developed a test setup, similar to Eckersley's, to be able to test to what extent any given attribute on a mobile phone impacts privacy.

Automating outlier detection in academic publishing

Niels Tielenburg, MSc thesis, Open University.
Finished: June 2017.
Links: thesis, software.

In recent years, several egregious cases of scientific fraud emerged (e.g. Diederik Stapel, Hyung-In Moon). These people were detected by peers suspicious of uncharacteristic speed in the scientific process - not by any procedures in the scientific publishing process.

This project took a first step towards addressing this. Due to the ever-increasing volume of research output, automation is necessary. Unfortunately, detecting fraud cannot be fully automated - some types of fraud strongly resemble behaviour of excellent researchers. The scope of the project was therefore to design a method to identify those scientists where manual investigation is warranted.

The project designed heuristics on publication data and provided tooling to automatically gather publication data. This helps to identify outliers. Such outliers were then further investigated by a partially automated comparison with their scientific peers (co-authors, co-editors, papers published in the same venue, etc.). Outliers who stand out from their scientific peers are of interest and could be manually investigated.
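As an illustration of how one such heuristic could flag outliers, the sketch below compares each author's publication rate with that of their peers. The abstract does not list the heuristics actually designed, so both the choice of metric (papers per year) and the scoring rule (a modified z-score based on the median absolute deviation, with the customary cut-off of 3.5) are assumptions. The author names are fictional.

```python
from statistics import median


def flag_outliers(papers_per_year, threshold=3.5):
    """Return authors whose output is unusually high compared with their peers.

    papers_per_year: dict mapping an author name to average papers per year.
    Uses the modified z-score 0.6745 * (x - median) / MAD, which is a
    stand-in technique and not necessarily the one used in the thesis.
    """
    values = list(papers_per_year.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # no spread among peers, so no score can be computed
    return sorted(
        author
        for author, v in papers_per_year.items()
        if 0.6745 * (v - centre) / mad > threshold
    )


peers = {"author A": 4, "author B": 5, "author C": 6, "author D": 5, "author E": 40}
flagged = flag_outliers(peers)
```

A flagged author is only a candidate for manual investigation; as noted above, excellent researchers can produce the same signal.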

Private tweeting

Bas Doorn and Lucas Vos, BSc thesis, Open Universiteit.
Finished: June 2017.
Links: thesis, software.

Social networks provide wonderful possibilities for interaction. In turn, they learn the social network of each user. The idea behind this project is to create a layer on top of an existing social network (e.g. Twitter), that allows a user to use all the advantages of a social network without revealing the full extent of his/her network.

Counting Sheep: analysing online authentication security

Marc Sleegers, BSc thesis, Open Universiteit.
Finished: March 2017.
Links: thesis, poster CSW-NL.
Publication: MADWeb'20.

Modern websites store an authentication cookie on the client computer when the login process succeeds. This cookie allows the end user to remain authenticated for the remainder of the session, forgoing the need to supply credentials to each following request. An attacker can steal this cookie in a session hijacking attack, effectively authenticating as the victim without needing the username and the password.

As such, it is vital that authentication cookies are used securely over an encrypted connection. In 2010, Firesheep showed that websites typically protected the login process, but dropped down to an insecure connection immediately afterwards. While this secured credentials, it left the authentication cookie exposed. As a result, Firesheep allowed any attacker to trivially hijack sessions on sites such as Google and Facebook. The websites in the spotlight quickly implemented a secure connection across the board, but it is unknown if others followed suit.

Analysing how widespread "faux" online authentication security still is first requires a way to identify domains that are vulnerable to session hijacking. We conclude that this type of faux web security can be identified by analysing the authentication cookies of a site. During initial testing we found that the problem still exists today, despite the internet ecosystem appearing to be more secure. Additionally, we found that mobile apps suffer from the same vulnerabilities. Following these results, we developed a tool, Shepherd, which analyses a given number of domains using a pre-emptive and generic approach. Using this tool, we were able to automate the entire login process for 4689 unique domains without the need for prior information such as credentials. We found that four out of five authenticated domains (3764) are indeed still vulnerable to session hijacking.
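A minimal sketch of such a cookie check follows. The abstract does not state which cookie properties Shepherd inspects, so this example assumes the simplest signal: a cookie set without the Secure attribute may be sent over an unencrypted connection and is therefore exposed. Deciding which cookie is the authentication cookie is outside the scope of this sketch, and the header values are made up.

```python
from http.cookies import SimpleCookie


def insecure_cookies(set_cookie_headers):
    """Return the names of cookies that were set without the Secure attribute.

    set_cookie_headers: list of Set-Cookie header values, one per cookie.
    """
    exposed = []
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            # morsel["secure"] is True when the flag is present, "" otherwise.
            if not morsel["secure"]:
                exposed.append(name)
    return exposed


headers = [
    "sessionid=abc123; Path=/; HttpOnly",
    "csrftoken=xyz789; Path=/; Secure",
]
exposed = insecure_cookies(headers)
```

Here only `sessionid` is reported: it is marked HttpOnly, which hides it from scripts, but nothing prevents it from travelling over plain HTTP.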

S3 - Securely Sharing Selfies

Gert-Jan den Besten, BSc thesis, Open Universiteit.
Finished: November 2016.
Links: thesis (NL), developed app.

Photo sharing has become very popular. In this project, we combine photos with context information: information about when and where the photo was taken. This is used to ensure access is only granted in similar circumstances.

A GIMP-plugin for deblurring uniform motion blurred text images

Else van Schaijk, Ton Poppe, BSc thesis, Open Universiteit.
Finished: July 2016.
Links: thesis (NL), software.
News: SoylentNews.org.

Blurring is a way to add "smearing" to pictures. This can be used to hide text: a strong enough blur smears the text such that it becomes illegible. This may be used to hide information. For example, a Dutch newspaper broke a story on national exams being available on the black market by showing the first page of the exam - with the questions blurred. This raises the question: can the blur be reversed?

This project built on the work of F. Lange in the Deblurring Text project (see below). Based on this, the students investigated the state of the art in text deblurring, selected the then-best algorithm, by Pan et al., and implemented it as a GIMP plugin.

Fingerprint Privacy: A Fresh Perspective on Web Privacy

Christof Ferreira-Torres, BSc thesis, University of Luxembourg.
Finished: June 2014.
Links: thesis, software.
Publication: ESORICS'15.

Web tracking is pervasive. Common tracking techniques depend on client-side storage and can thus be thwarted by savvy users. However, an emerging class of trackers is using so-called browser fingerprinting to track users across different sites. These trackers take a series of measurements of the client's browser, such as screen resolution, time zone settings, operating system, etc. The combination of all these measurements (the "fingerprint") is often unique. Thus, users can be tracked by their fingerprint.

Existing measures for online privacy fall into two categories: blocking the trackers or faking the measurements they take. Fingerprinting trackers are embedded into commonly-used web widgets, such as social media buttons or video. Most users would not want to block such widgets, thus ensuring that the tracker is not blocked. This leaves faking the fingerprint as the most viable counter. However, research had already shown that it is often easy to detect faking - in fact, it becomes another vector for identification!

In this project, we proposed a new perspective on online privacy: consistent faking. We developed a Firefox plugin, FP-Block, that successfully prevents fingerprinting trackers from cross-site tracking. The plugin generates fingerprints based on real-world usage data, and ensures that any two generated fingerprints are distinct in at least three attributes. Moreover, the fingerprints are consistent - no iPhones running Flash. Finally, the plugin ensures that any third parties whose widgets are embedded in a requested page receive the same fingerprint as used for the original request. For example, if sites A and B both have a Facebook button, Facebook will see a different fingerprint on each of the two sites.
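The two guarantees described above (fingerprints that differ in at least three attributes, and one fingerprint per visited site that embedded third parties also receive) can be sketched as follows. This is an illustration, not FP-Block's code: the attribute pools, the `IdentityStore` class and the rejection-sampling strategy are all hypothetical, and the consistency check between attributes is left out.

```python
import random

# Hypothetical attribute pools; FP-Block itself draws on real-world usage data.
ATTRIBUTES = {
    "screen": ["1920x1080", "1366x768", "1440x900", "2560x1440"],
    "timezone": ["UTC-5", "UTC", "UTC+1", "UTC+8"],
    "language": ["en-US", "nl-NL", "de-DE", "fr-FR"],
    "platform": ["Win32", "MacIntel", "Linux x86_64", "Linux armv8l"],
    "color_depth": ["16", "24", "30", "32"],
}
MIN_DISTANCE = 3  # any two fingerprints differ in at least three attributes


def distance(a, b):
    """Number of attributes on which two fingerprints differ."""
    return sum(a[key] != b[key] for key in ATTRIBUTES)


class IdentityStore:
    """Keeps one generated fingerprint per first-party site."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._by_site = {}

    def fingerprint_for(self, first_party):
        # Third parties embedded in a page get the fingerprint of that page,
        # so the lookup key is always the first-party site.
        if first_party not in self._by_site:
            self._by_site[first_party] = self._generate()
        return self._by_site[first_party]

    def _generate(self, max_attempts=10_000):
        # Rejection sampling: redraw until far enough from all existing ones.
        for _ in range(max_attempts):
            candidate = {k: self._rng.choice(v) for k, v in ATTRIBUTES.items()}
            if all(distance(candidate, fp) >= MIN_DISTANCE
                   for fp in self._by_site.values()):
                return candidate
        raise RuntimeError("attribute pools too small for another fingerprint")


store = IdentityStore(seed=1)
on_site_a = store.fingerprint_for("a.example")  # what a widget sees on site A
on_site_b = store.fingerprint_for("b.example")  # what the same widget sees on B
```

Because the store is keyed on the first-party site, a widget embedded on both sites observes two fingerprints that differ in at least three attributes, while repeat visits to the same site keep presenting the same one.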

Deblurring text: from theory towards an implementation

François Lange, BSc thesis, University of Luxembourg.
Finished: June 2014.
Links: thesis.

This project investigated several approaches to deblurring: mathematically inverting the (unknown) blur operation, using pre-blurred letters as a basis for comparison, etc. The field dedicated to mathematically inverting an unknown blur operation proved very rich. The then-best text deblurring algorithm, by Cho et al., was chosen for implementation as a GIMP plugin. Three of the five steps of Cho et al.'s algorithm were implemented within the time frame of the project.

Autogenerating context to manage sensitive data in Android-phones

Xin Zhu, BSc thesis, University of Luxembourg.
Finished: June 2013.
Links: thesis.

SnapChat was a popular app for sharing photos. The premise / promise of the app was that a shared photo would only be accessible for 10 seconds. It turned out that this was not due to any security, but merely due to hiding the shared photo from the normal user interface by renaming it. This project provided an initial investigation of how to do this with actual security in mind, and how a phone's sensors might be leveraged to support security of shared photos.

SpyDroid

Filipe Ferreira, BSc thesis, University of Luxembourg.
Finished: February 2012.
Links: thesis.

In 2011, researchers showed that it is possible to determine what is typed on a keyboard by having a nearby phone detect vibrations from the typing and decoding this. The original research was carried out on an iPhone. The goal of this project was to investigate how to achieve the same on an Android phone.

The project resulted in an Android app that registers vibrations using a phone's three accelerometers and its gyroscope. These are sent to a back-end server for further processing. Moreover, the project showed that:

a stand-alone application, running only on the phone, lacks the processing power to duplicate the original results;
the rate at which sensor data is made available via the Android framework is insufficient to duplicate these results;
this rate cannot be improved by going to the lower JNI layer;
even accessing the Linux kernel directly does not provide a sufficient data rate.


phone:	+31 (0)45 576 2143
email:	hugo.jonker@ou.nl
www:	http://www.open.ou.nl/hjo
twitter:	@hugojonker