Statistical foundations of forensic toolmark analysis

Academic Internship


I have been running internships with UPenn undergraduate students to study the statistical foundations of forensic toolmark analysis. We work in a socially distanced way from different cities, and we meet every week to present new research and discuss it as a group.

The image below is an example of a comparison of microscopic screwdriver striation marks. The four marks on the left were made by the same screwdriver at different angles, and the mark on the right is the questioned mark. Toolmark analysis seeks to answer whether the mark on the right was made by the same tool as the marks on the left.

Image taken from the Busey Lab.

Fall 2020 Internship

The team is working on three research tracks:

  1. Black-box studies: Learn about what makes a good black-box study.
    1. What is the purpose of a black-box study?
    2. What is the best experimental design for such a study? (And what makes other designs not useful?)
    3. Which contextual information is relevant to forensic examinations (and should be included in a black-box study), and which is irrelevant?
  2. Database design: Design a database of toolmarks.
    1. What tools are the best for an initial study on toolmark analysis?
    2. Which factors (e.g. angle, force, surface materials) should be selected and why?
    3. How can we design a database that helps us learn about difficult (“complex”) and easy comparisons?
  3. Algorithmic design: Design an algorithm to evaluate the similarity between toolmarks.
    1. Which machine learning tools are most useful for toolmarks?
    2. What are the benefits and limitations of deep learning?
    3. What measure of uncertainty might be a good fit for toolmark comparisons?


Amy Win

Amy Win is a senior at the University of Pennsylvania majoring in criminology and minoring in South Asia Studies. She is interested in refugee law and Asian criminology through a postcolonial lens. She is currently working with Southeast Asian refugees in South Philadelphia through Southeast by Southeast, a community center that provides ESL classes and citizenship classes among other types of support. As an intern, she is focusing on designing a black-box study that tests how forensic examiners make their decisions on toolmark analyses with and without contextual information.

Michaela Rieser

Michaela Rieser is a third-year undergraduate student at the University of Pennsylvania studying chemistry, with a strong interest in criminology. She hopes to pursue a career in forensic science. Michaela also serves as the women’s captain for Penn’s club cross country and track team, which has allowed her to further develop her leadership skills. For this fall internship, Michaela is researching toolmark analysis and assisting in developing a database of toolmarks.

Thomas Kyong

Thomas Kyong (C’23) is a sophomore at the University of Pennsylvania who plans to pursue a dual degree in criminology at the College of Arts & Sciences and statistics at the Wharton School. Passionate about applying quantitative data to the field of law, Thomas hopes to use statistical academic research to influence criminal justice policy reform as a lawyer and politician.

As a Stavros Niarchos Foundation Paideia Fellow, Thomas is an elected member of the Penn Undergraduate Assembly, an executive officer of the Penn Government & Politics Association, and involved in the Undergraduate Statistics Society. During his free time, he enjoys cooking intercultural recipes, laughing with friends, and running/biking up mountains.


Summer 2020 Internship

The team worked on five research tracks:

  1. Previous literature: Learn about the literature in the field of forensic toolmark analysis and find potential gaps. Some questions of interest are:
    1. What is the state of the academic literature about forensic toolmark analysis?
    2. Which researchers are currently working in this field?
    3. What is the audience for this research?
  2. Manufacturing: Understand factory procedures for the manufacturing of screwdrivers. Some questions of interest are:
    1. What are the most commonly used brands? Do different brands use the same factory?
    2. How are common screwdrivers manufactured and how does this affect their toolmarks?
    3. Is it possible to acquire consecutively manufactured tools?
  3. Simulations: Understand some researchers’ procedures for simulating toolmarks from a real tool. Some questions of interest are:
    1. How do they do this?
    2. What software do they use?
    3. Is it actually useful for examiners currently, or is it just a project for the future?
  4. Procedures: Find standard operating procedures (SOPs) and training manuals for forensic toolmark analysis. Some questions of interest are:
    1. How heterogeneous are laboratories in their forensic toolmark analysis?
    2. How is toolmark analysis done in practice?
    3. What are some measures to ensure accuracy and consistency in practice?
  5. Software package: Understand the capabilities of the R software package “bulletr”, which is designed for analyzing bullets. Some questions of interest are:
    1. How are bullets different from toolmarks in terms of algorithmic analysis?
    2. How much of this package can we use in making a new one for toolmark analysis?
    3. What are the best file formats for images?



Adalyn Richards

Adalyn Richards is a rising sophomore at the University of Pennsylvania studying political science and economics. She has a strong interest in law, particularly criminal and immigration law, and is passionate about learning foreign languages. She is a notary public and has worked at the Colorado Bar Association but hopes to expand her experience to include research in forensic science. As an intern, she is focusing on previous literature regarding toolmark analysis and seeks to answer the following questions: What have researchers been working on? Who is doing this research? Who cares about it?

Johanna Doherty

Johanna Doherty is a rising junior at Penn majoring in Criminology. She is interested in criminal justice reform as well as the effect that interaction with the justice system has on mental health. She has previously interned with the Joseph J. Peters Institute, where she worked with survivors of trauma. For this summer internship, she is focusing on procedures, including researching standard operating procedures for forensic toolmark analysis labs and determining the heterogeneity between different manuals.

Tori Borlase

Tori Borlase is a rising junior at the University of Pennsylvania studying Philosophy, Politics, and Economics. Tori has worked for judges in the Wake County Justice System, defense lawyers, and other legal experts, and plans to become a lawyer in order to work on civil rights issues. She is interested in equal employment law, criminal justice reform, and environmental law. As an intern, she is working on two tracks: Simulations (understanding Chumbley et al.’s procedures for simulating toolmarks from a real tool) and Manufacturing (understanding factory procedures for the manufacturing of screwdrivers).

Melina Muthuswamy

Melina Muthuswamy is a rising third-year undergraduate student at Penn in the Computer Science department in the School of Engineering. She is interested in encouraging high school students to participate in engineering and is a member of the board for PennApps, UPenn’s student-run hackathon. For this summer internship, she is focusing on two tracks: Simulations (understanding Chumbley et al.’s procedures for simulating toolmarks from a real tool) and Software package (understanding the current capabilities of R packages for analyzing bullets and how they can be applied to toolmarks).

Research Questions

(Fall 2020)



(Summer 2020)

What is the Interpol report about and why is it important?

The “Interpol Report of Shoe and Tool Marks 2016-2019” is a comprehensive literature review of recent developments in forensic science regarding toolmarks. The report summarizes and relates over 150 relevant studies and divides findings into three subsections: shoemarks, striated and impression toolmarks, and invasive toolmarks. For each subsection, the report covers topics such as software for toolmark analysis and the variability of toolmark characteristics. It also identifies gaps in the existing literature and aims to inform future research to advance the accuracy and reliability of forensic toolmark analysis, which is often used as pivotal evidence in criminal proceedings. (Reported by Adalyn Richards.)

How heterogeneous are crime laboratories in their forensic toolmark analysis procedures?

Based on toolmark analysis procedural manuals collected from the Virginia Department of Forensic Science, the Arizona Department of Forensic Science, the Texas Department of Safety, and many others, the standard operating procedures are predominantly homogeneous. Each manual mentions similar or identical methods for both creating test toolmarks and analyzing/comparing test and evidence toolmarks. Additionally, each manual is similarly lacking in standardized characteristics for comparison between test and evidence toolmarks. Overall, standard operating procedures differ little in their guidelines and language. (Reported by Johanna Doherty.)

According to the Interpol Report, what are some recent developments in automated toolmark comparison analysis?

Until recently, most forensic comparison software used only one similarity metric when comparing toolmarks to identify a potential match. Recent research indicates that using a multi-feature similarity score could yield more accurate results when comparing marks. For example, Hare et al. developed a multi-feature algorithm to compare bullet land impressions by standardizing mark characteristics with three height-value variables that represent the grooves and contours of a mark. Similarly, Keglevic et al. presented a multi-feature convolutional neural network to efficiently match toolmarks from large databases. In both cases, the multi-feature algorithm outperformed the baseline. (Reported by Adalyn Richards.)

What is axial rotation, and why does it matter in toolmark analysis?

In the context of toolmarks, axial rotation is the rotation of a tool around its own axis. For example, think about cutting a cake. Normally, you would position the sharp edge of the knife so that it cuts into the cake with ease. Now think about rotating the knife 90° so that the side edge of the knife comes in contact with the cake; it would make an entirely different mark in the icing. In the same way, the axial rotation of a tool can drastically change the mark that it makes. Previous studies have primarily focused on the angle of attack between a tool and a substrate. Two recent publications, however, have focused on the effect of axial rotation in toolmark analysis. Both studies found that comparison algorithms cannot successfully identify matches when the marks are made at significantly different axial angles. These findings highlight a need for technology that can distort topographical data by scaling it to allow for comparison of toolmarks made at varying axial angles. (Reported by Adalyn Richards.)
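One simple way to picture the scaling idea is sketched below. It assumes, purely for illustration, that rotating a flat blade by an axial angle compresses the spacing of its striae by the cosine of that angle, so dividing positions by the cosine brings the mark back to a common scale. This geometric model, the function name, and the sample values are assumptions and not the method of the cited studies.

```python
import math

def rescale_to_reference(positions_mm, axial_angle_deg):
    """Stretch striation positions measured at a given axial angle back to
    the scale of a mark made at 0 degrees (assumed cosine-compression model)."""
    if not 0 <= axial_angle_deg < 90:
        raise ValueError("axial_angle_deg must be in [0, 90)")
    scale = math.cos(math.radians(axial_angle_deg))
    return [p / scale for p in positions_mm]

# Hypothetical striae positions (mm) from a mark made at a 60-degree axial angle.
rotated = [0.5, 1.0, 1.5]
restored = rescale_to_reference(rotated, 60)  # roughly twice the original spacing
```

Real marks would also change in depth and shape under rotation, so a single scale factor is at best a first approximation.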

How do Chumbley et al. simulate toolmarks?

In “Virtual Tool Mark Generation for Efficient Striation Analysis,” Chumbley et al. describe the simulation process in a few distinct steps. First, the tool tip geometry is scanned and digitized using an IFM detector and then passed through an algorithmic cleaning process so that the digitized geometry more closely resembles the screwdriver tip. The study used an Alicona microscope to obtain the surface geometry of the tool tip and the marked plates, and the retrieved geometry was cleaned and refined through spike removal. Then, using OpenGL, the 3D surface was projected in the direction of tool travel. This virtual simulation of the tool tip (or a specific edge) is “squished” between two planes, and the resulting indentations are measured with a stylus profilometer. (Reported by Tori Borlase.)
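The projection step can be pictured with a small sketch. It assumes the scanned tip is a grid of heights (rows along the direction of travel, columns across the blade edge) in which lower values reach deeper into the marked surface. The grid layout, function name, and numbers are illustrative assumptions rather than details taken from Chumbley et al.

```python
def project_along_travel(height_grid):
    """Collapse a tool-tip height grid into a 1D striation profile.

    Each column is one position across the blade edge; each row is one
    step along the direction of travel. The deepest-reaching point in a
    column (its minimum height) is assumed to be the one that cuts the mark.
    """
    if not height_grid or not height_grid[0]:
        raise ValueError("height_grid must be a non-empty 2D grid")
    return [min(column) for column in zip(*height_grid)]

# Hypothetical 3x3 scan of a tool tip (arbitrary height units).
tip = [
    [0.30, 0.10, 0.25],
    [0.20, 0.15, 0.05],
    [0.40, 0.12, 0.20],
]
profile = project_along_travel(tip)  # [0.2, 0.1, 0.05]
```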

How are common screwdrivers manufactured?

Flat head screwdrivers are manufactured in a few simple steps. First, metal is shaped into a long, slim cylinder that is cut to the size of the screwdriver shaft. Then, the ends of the metal shafts are shaved down and squished into a flattened shape. These metal ends are then trimmed to create the typical shape of a flat head screwdriver, and the point is ground against a wheel to fine-tune the tip. Depending on the size of the screwdriver head being manufactured, these steps will change slightly. To help it grip screws, the screwdriver head is put through a blast of an abrasive solution.

Phillips head screwdrivers, on the other hand, require a different process. While they still start out as a long, slim cylinder of metal, once they are cut to size, they are shaved down into a cross-like shape, forming bevels. These screwdrivers are also textured, but with a different machine, which presses impressions into the screwdriver tip. Finally, both types of screwdrivers are fitted with handles for easy gripping. (Reported by Tori Borlase.)

How many forensic labs in the United States perform toolmark analysis?

Based on information from the National Accreditation Board’s database, 208 forensic labs in the US are accredited to perform firearm and toolmark analysis. However, since firearm analysis and toolmark analysis are combined under a single accreditation, there is no guarantee that an accredited lab actually performs both of these services. After reaching out to the labs listed on the NAB’s Accreditation database, I have determined that only about 50% of accredited labs actually perform toolmark analysis. The directors of many of these labs explained that they no longer perform toolmark analysis, despite their active accreditation, due to lack of demand for toolmark testing. (Reported by Johanna Doherty.)

What is virtual microscopy, and does it work for toolmark analysis?

Virtual microscopy (VM) is another name for 3D topographical surface data. In the context of toolmarks, VM is a virtual 3D rendering of a mark made by a tool. 3D data is increasingly popular in forensics due to its potential advantages over 2D data, including enhanced viewing, data sharing, and annotating capabilities, as well as improved detail and the potential to create virtual archives that can validate future technology and help solve crimes. Emerging VM systems typically come with two components: a data acquisition instrument to scan the toolmark and a software component to analyze the 3D marks. As with any new technology, researchers must validate VM systems before implementing them in real casework. A recent article published in the Journal of Forensic Sciences validated a VM system by administering proficiency tests for cartridge case identification to fifty-six examiners in fifteen laboratories. The examiners, who were trained to use the VM software, correctly identified 100% of matches with zero false positives. This study provides strong evidence for the integration of VM in toolmark analysis, but critics express doubt due to the small sample size of marks used in the experiment. (Reported by Adalyn Richards.)

What are degraded toolmarks, and how do forensic examiners analyze them?

Degraded toolmarks are fragmented striations or impressions made by a tool that cannot be recovered in full due to the passage of time, mishandling of evidence, or simply an incomplete mark found at a crime scene. In the context of forensic analysis, fragmented marks can make it difficult to determine whether two marks were created by the same tool, but recent research proposes a method to better integrate degraded marks in toolmark analysis. In a study by Hare et al., researchers artificially degraded bullet land impressions by deleting portions of the lands to emulate fragmentation. They then trained a matching algorithm to analyze these fragmented marks: after a smoothing step, the algorithm counts the number of matching striae between the two marks and divides that number by the length of the overlapping region, so as not to penalize a mark for having a smaller surface area. Notably, they found that algorithm performance declines as a function of degradation up to a threshold of about fifty percent. (Reported by Adalyn Richards.)
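The normalization described above can be sketched in a few lines. This is a minimal illustration, not Hare et al.’s implementation: it assumes the striae have already been detected as positions (in millimeters) along each mark, and the matching tolerance, function name, and sample values are hypothetical.

```python
def matches_per_length(striae_a, striae_b, overlap_length, tolerance=0.01):
    """Count striae in mark A that pair with an unused stria in mark B
    within `tolerance`, then divide by the length of the overlapping region."""
    if overlap_length <= 0:
        raise ValueError("overlap_length must be positive")
    unused = sorted(striae_b)
    matched = 0
    for pos in sorted(striae_a):
        # Nearest still-unpaired stria in B that lies within the tolerance.
        best = min(
            (i for i, other in enumerate(unused) if abs(other - pos) <= tolerance),
            key=lambda i: abs(unused[i] - pos),
            default=None,
        )
        if best is not None:
            matched += 1
            unused.pop(best)
    return matched / overlap_length

# Hypothetical full mark (1.0 mm overlap) and a fragment of it (0.5 mm overlap).
full = matches_per_length([0.1, 0.3, 0.5, 0.7], [0.1, 0.3, 0.5, 0.7], 1.0)
fragment = matches_per_length([0.1, 0.3], [0.1, 0.3], 0.5)
```

In this made-up example the fragment has half as many matching striae as the full mark, but both give the same score per unit length, which is the point of dividing by the overlap.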

Do all forensic crime labs accredited in toolmark analysis have standard operating procedures for this service?  

No. While the vast majority of accredited labs have specific standard operating manuals for toolmark analysis, a few labs do not keep updated procedures for this service. For example, the Albuquerque Police Department Crime Lab says that rather than rely on a manual, they “focus on the use of the scientific method and training in order to establish a methodology of how to approach each case depending on what evidence is available.” The Oakland Police Department Crime Lab explained that due to the lack of requests for toolmark analysis, they no longer keep a firmly written, updated copy of their toolmark procedural manual. Lastly, the San Diego Police Department Crime Lab wrote that they “do not have any formal process for toolmark analysis.” After gathering information on procedures from the majority of the crime labs in the United States, it is clear that while most of them do have procedural manuals for toolmark analysis, a few labs do not have formally written procedures for a variety of reasons. (Reported by Johanna Doherty.)

How unique are individual toolmarks? Does each tool produce a mark so unique that it can be traced back to the tool of origin?

In a study about the individuality of toolmarks, Hadler et al. sought to answer these questions quantitatively. First, they created striated toolmarks using a variety of screwdrivers at different angles of attack and in different media. Then, they created an automated toolmark comparison system with two parts: (A) a NanoFocus MicroSurf white light confocal sensor for non-contact 3D data acquisition, and (B) data analysis software with a signature generation component to determine individual characteristics. The software was designed to compare individual characteristics of two or more marks and produce a similarity metric between zero and one. Finally, the authors compared the statistical distributions of the similarity values from the matching and nonmatching pairs of marks. This revealed that “while it is not possible to prove uniqueness statistically, the results of this study provide support for the concept that tool marks contain measurable features that exhibit a high degree of individuality.” The error rate was 0.00% with the exception of one false negative. (Reported by Adalyn Richards.)
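To make the comparison of matching and nonmatching score distributions concrete, here is a minimal sketch that turns two lists of similarity scores into error rates at a decision threshold. The threshold rule, the function name, and all scores are made up for illustration; they are not data or methods from the study.

```python
def error_rates(match_scores, nonmatch_scores, threshold):
    """Call a pair a 'match' when its similarity is at or above `threshold`,
    then report how often that call is wrong for each group."""
    false_negatives = sum(s < threshold for s in match_scores)
    false_positives = sum(s >= threshold for s in nonmatch_scores)
    return {
        "false_negative_rate": false_negatives / len(match_scores),
        "false_positive_rate": false_positives / len(nonmatch_scores),
    }

# Invented similarity scores in [0, 1]; one known match scores unusually low.
known_matches = [0.91, 0.88, 0.95, 0.42, 0.90]
known_nonmatches = [0.10, 0.22, 0.05, 0.31, 0.18]
rates = error_rates(known_matches, known_nonmatches, threshold=0.5)
```

With these invented numbers, one of the five known matches falls below the threshold and no nonmatch rises above it, the same kind of outcome as a single false negative with no false positives.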