A new kind of fingerprint / Analysis of keyboard dynamics earns student trip to statistics convention

Andrew Simpson explains his project to other students at the SDSU Data Science Symposium Feb. 6-7. On Feb. 23, he was notified that the project had won first place in the American Statistical Association’s Section on Statistics in Defense and National Security and that he would be able to present his work at the Joint Statistical Meeting in Toronto Aug. 5-10.

SDSU doctoral student Andrew Simpson has been chosen as one of 22 students to present his poster at the Joint Statistical Meeting of the American Statistical Association in Toronto Aug. 5-10.

It is the largest gathering of statisticians and data scientists in North America with nearly 7,000 attendees, including 1,000 students. His adviser, Semhar Michael, an associate professor of statistics at SDSU, said, “The student paper competition is a big part of this conference and, to my knowledge, this is the first time an SDSU student has been selected.”

Simpson, a first-year doctoral student in statistics, received a $1,000 travel award to attend the conference for having the best paper in the Statistics in Defense and National Security section. Simpson, of East Bethel, Minnesota, explained that the American Statistical Association sponsors contests in about 22 subject areas. Winners are invited to make presentations at the international gathering.

Kurt Cogswell, head of the SDSU Department of Mathematics and Statistics, called Simpson’s honor “a remarkable achievement, particularly for a person in the first year of a Ph.D. program. This bodes well for an outstanding career employing statistics to have high impact on matters of national security.” 

Simpson anticipates a career analyzing defense and national security problems from a statistical perspective, working in either academia or a national laboratory.

But that is a ways down the road. He expects to complete his doctoral degree in 2026. He already holds a master’s degree in statistics (2022) and a bachelor’s degree in mathematics and data science (2021), both from SDSU.


Understanding keyboard dynamics

The paper that earned him the trip to Toronto is “Finite Mixture Modeling for Hierarchically Structured Data with Application to Keystroke Dynamics.”

He explained that keystroke dynamics describe how long it takes a person to press and release keys on a keyboard.
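To make the idea of press-and-release timing concrete, here is a minimal Python sketch of two timing features that are commonly derived from keystroke logs: how long each key is held, and the gap between releasing one key and pressing the next. The KeyEvent record, the function names and the millisecond values are all hypothetical, made up for illustration; they are not taken from Simpson's paper or from any particular logging software.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """One keystroke (hypothetical record layout, times in milliseconds)."""
    key: str
    press_ms: float    # moment the key went down
    release_ms: float  # moment the key came back up


def hold_times(events):
    """How long each key was held down: release time minus press time."""
    return [e.release_ms - e.press_ms for e in events]


def flight_times(events):
    """Gap between releasing one key and pressing the next one."""
    return [b.press_ms - a.release_ms for a, b in zip(events, events[1:])]


# Made-up timings for someone typing "cat"
events = [
    KeyEvent("c", 0.0, 95.0),
    KeyEvent("a", 140.0, 230.0),
    KeyEvent("t", 310.0, 390.0),
]

holds = hold_times(events)      # one value per key
flights = flight_times(events)  # one value per pair of consecutive keys
print(holds, flights)
```

Two people typing the same word would typically produce different lists of numbers here, which is the sense in which a typing rhythm can act like a fingerprint.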

“Since everyone has a slightly different typing rhythm, the user of a computer system can be identified by how they type. We can think of this as being similar to a fingerprint to unlock your phone or computer. Since passwords are often compromised, monitoring keystroke dynamics can add an extra layer of security by detecting unusual keystrokes, at which point it may be assumed that unauthorized personnel are using the system,” Simpson said.

Collecting this data would be the function of a software program that a user would put on the computer. It is still a developing technology, he said.

Simpson’s work focuses on analyzing the data collected by that software.


Dealing with multiple users

“Many current methods for building a model of a user’s keystrokes focus on the cases when there is a single user who has authorized access to the computer system. In real-life scenarios, this is not always the case. For example, on a family computer or within a business or government agency, there may be many people authorized to access a system.

“The paper focused on developing new ways to be able to adequately group or separate keystroke patterns and build a model of users’ keystrokes in which many users have access to the same system.

“For example, if two people have authorized access to a system and it is unknown which user was using the system at any given moment, one would want to be able to separate all the keystrokes and say this set of keystrokes came from user one and this other set came from user two.”

The aim of Simpson’s 15 months of research and writing was to figure out which keystrokes belong to which user. “Essentially what you’re doing is you’re reaching in there and trying to pull apart this big mess of keystrokes, separate which user created which keystroke. Once you do that, you can create a model for each of these users.

“Then we can move on to Part II at a later time. That is determining if the keystrokes came from an authorized user or is it from an unknown user,” Simpson explained.


Used Gaussian mixture models

In developing his method, Simpson used a series of multilayered, overlapping bell curves, known in the statistical field as multilayer Gaussian mixture models, to represent the complexity present in a keystroke dynamics dataset.
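The idea of pulling apart overlapping bell curves can be illustrated with a much simpler model than the one in the paper. The Python sketch below fits an ordinary single-layer, two-component Gaussian mixture to simulated key-hold times using the standard expectation-maximization (EM) procedure, then assigns each keystroke to the more likely curve. It is not Simpson's multilayer method. The two simulated users, their average hold times of 80 and 130 milliseconds, and every function name are assumptions chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Height of a bell curve with the given mean and standard deviation at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(xs, iterations=100):
    """Fit a one-dimensional, two-component Gaussian mixture with EM."""
    lo, hi = min(xs), max(xs)
    means = [lo, hi]                   # start one curve at each extreme
    spread = (hi - lo) / 4.0 or 1.0
    sds = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: probability that each observation belongs to each curve
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each curve from its weighted observations
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = nk / len(xs)
    # Assign each observation to the curve that most likely produced it
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, sds, weights, labels


# Simulated hold times (ms) for two hypothetical users sharing one computer
random.seed(0)
true_users = [0] * 200 + [1] * 200
hold_ms = ([random.gauss(80, 8) for _ in range(200)]
           + [random.gauss(130, 10) for _ in range(200)])

means, sds, weights, labels = fit_two_component_gmm(hold_ms)
accuracy = sum(l == t for l, t in zip(labels, true_users)) / len(true_users)
print("estimated means:", means)
print("share of keystrokes assigned to the right user:", accuracy)
```

In this toy setup the model is never told which user typed which keystroke; it recovers the grouping from the shape of the data alone. Real keystroke data is far messier, with many keys and many timing features per user, which is the complexity the multilayer models are meant to capture.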

“We were able to show that our proposed method not only outperformed traditional methods in terms of separating keystrokes by user, but it also provided a more accurate model of the users,” Simpson said.

He worked under the guidance of Michael, who has done extensive research on Gaussian models.