Data science of forensic toolmark analysis


I am developing an algorithm to aid forensic examiners with toolmark comparisons. Given a toolmark from a crime scene and a suspect’s tool, can we determine whether the mark was made by that tool, and if so with what confidence? This is joint work with Heike Hofmann of Iowa State University, and with funding from the Center for Statistics and Applications in Forensic Evidence.

The images below are examples of microscopic 3D images of screwdriver striation marks. Toolmark analysis seeks to answer whether the mark on the right (questioned mark) was made by the same tool as the marks on the left (known marks).

From the 3D marks we can extract a signal that allows us to analyze the 3D profile of the toolmark. These signals look like the following:
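As a rough illustration (not the project's actual pipeline), a profile signal can be extracted from a 3D scan by collapsing the height map along the striation direction and removing the low-frequency shape of the plate. The height map here is a hypothetical NumPy array:

```python
import numpy as np

def extract_signal(surface, poly_degree=2):
    """Reduce a 3D toolmark scan to a 1D profile signal.

    `surface` is a 2D array of surface heights whose rows run along the
    striation direction. Taking the median across rows suppresses noise;
    subtracting a low-order polynomial fit removes the overall curvature
    of the plate so that only the striae remain in the signal.
    """
    profile = np.nanmedian(surface, axis=0)   # collapse to a 1D profile
    x = np.arange(profile.size)
    trend = np.polyval(np.polyfit(x, profile, poly_degree), x)
    return profile - trend                    # the detrended "signal"

# Toy example: a curved plate with fine striations on top
x = np.linspace(0, 1, 200)
striae = 0.05 * np.sin(40 * np.pi * x)
surface = np.tile(10 * (x - 0.5) ** 2 + striae, (50, 1))
signal = extract_signal(surface)
```

The polynomial degree and the use of a median (rather than a mean) are illustrative choices; real scans would also need outlier and missing-data handling.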

Experiment manual

This Toolmark manual describes the experimental setup in detail and makes the research replicable.

Current Team

Sheng Gao

Sheng Gao is a Ph.D. candidate in statistics at the Wharton School, University of Pennsylvania. His research interests include the theory and methodology of high-dimensional statistics, as well as differential privacy and deep learning. He is also working at the intersection of neuroscience and high-dimensional statistics. Sheng is applying clustering algorithms to 3D toolmark data to help determine which marks were made by the same tool.

Previous members

I have been running internships with UPenn undergraduate students to study the statistical foundations of forensic toolmark analysis.

Sheqi Zhang

Sheqi Zhang is a master’s student in criminology at Penn. Her research interest is using computational methods including game theory, multiagent planning, stochastic modeling, and network analysis to model societal problems and provide solutions. Previously, she graduated from UC Berkeley with a B.A. degree in computer science and history.

Saara Ghani

Saara is a Pakistani international student majoring in Criminology and minoring in Survey Research and Data Analytics. She is an intern with Judge Stephanie Sawyer at The Sentencing Foundation and is a co-chair for Take Back the Night at Abuse and Sexual Assault Prevention (ASAP) at Penn.


Nina Barretts (REU summer 2022)

Figure: Nina Barretts presenting a poster about this project at the International Association for Identification Conference, August 2022.

I am an undergraduate Forensic Science major, with a concentration in Biology, at the University of New Haven. I am an Honors Program student and the Sergeant at Arms of the Forensic Science Student Association. I expect to graduate with my B.S. in the Fall 2023 semester, and I will be continuing my education with an M.S. in Cellular and Molecular Biology as part of UNH’s pathway program. I have been working on the toolmark project this summer, as a CSAFE REU intern, under Dr. Cuellar. I am thrilled at the progress we made on the project and the opportunity to present the research at the IAI conference in Omaha, NE.

Jonathan Palomo

Jonathan Palomo is a junior at the University of Pennsylvania studying physics and statistics. His academic interests span both fields and include solid-state physics, general relativity, high-dimensional inference, and Bayesian inference. In Summer 2022, he will be interning at Discover Financial Services on their data and analytics team. After undergrad, Jonathan wants either to be a full-fledged data scientist in industry or to go to grad school to study physics or statistics. Jonathan is working on generating 3D data of toolmarks for analysis with a machine learning algorithm.

Naomi Korn

Naomi Korn is a sophomore at the University of Pennsylvania studying Philosophy, Politics, and Economics (PPE) and minoring in statistics in the College. She is interested in data analytics, especially related to current events and sports. She is part of the Penn Sports Analytics Group where she helps analyze data for Penn athletics. Naomi is working on generating 3D data of toolmarks for analysis with a machine learning algorithm.

Jordan Wong

Jordan Wong is a junior at the University of Pennsylvania studying Systems Engineering and concentrating in Data Science. He is primarily interested in applying statistical and analytical tools to solve different societal issues. In the past, Jordan worked on a start-up called MyMakery, which aimed to make internships better by providing a comprehensive internship management platform to companies. For this internship, Jordan is focusing on refining the algorithm that aims to identify tools from their toolmarks. Jordan is working on generating 3D data of toolmarks for analysis with a machine learning algorithm, as well as on the stitching of 3D data.

Ryan Gross

Ryan Gross is a second year Ph.D. student in the Statistics Department at the Wharton School. He previously graduated from Rutgers University with degrees in Math, Statistics, and Economics, and conducted research analyzing the effects of disability on employment outcomes. Along with studying the statistical analysis of toolmarks, he is interested in Bayesian methodology and its applications to urban analytics, sports analytics, and various other fields. (Spring 2021)

Brianna Fisher

Brianna Fisher is a rising sophomore at the University of Pennsylvania majoring in Criminology and minoring in Survey Research and Data Analytics. After interning at a Criminal Defense Firm in her hometown of South Florida, she refined her interest in law to criminal law, and specifically criminal prosecution. Brianna also has a strong interest in using statistical analysis to identify and solve social justice issues, and plans to work on criminal justice reform in the future. For this summer internship, Brianna is researching toolmark analysis and assisting in developing a database of toolmarks.

Amy Win

Amy Win is a senior at the University of Pennsylvania majoring in criminology and minoring in South Asia Studies. She is interested in refugee law and Asian criminology through a postcolonial lens. She is currently working with Southeast Asian refugees in South Philadelphia through Southeast by Southeast, a community center that provides ESL classes and citizenship classes among other types of support. As an intern, she is focusing on designing a black box study that tests how forensic examiners make their decisions on toolmark analyses with and without contextual information. (Fall 2020)

Thomas Kyong

Thomas Kyong (C’23) is a sophomore at the University of Pennsylvania who plans to pursue a dual degree in criminology at the College of Arts & Sciences and statistics at the Wharton School. Passionate about applying quantitative data to the field of law, Thomas hopes to use statistical research to influence criminal justice policy reform as a lawyer and politician. As a Stavros Niarchos Foundation Paideia Fellow, Thomas is an elected member of the Penn Undergraduate Assembly, an executive officer of the Penn Government & Politics Association, and a member of the Undergraduate Statistics Society. During his free time, he enjoys cooking intercultural recipes, laughing with friends, and running and biking up mountains. (Fall 2020)

Adalyn Richards

Adalyn Richards is a rising sophomore at the University of Pennsylvania studying political science and economics. She has a strong interest in law, particularly criminal and immigration law, and is passionate about learning foreign languages. She is a notary public and has worked at the Colorado Bar Association but hopes to expand her experience to include research in forensic science. As an intern, she is focusing on previous literature regarding toolmark analysis and seeks to answer the following questions: What have researchers been working on? Who is doing this research? Who cares about it? (Summer 2020)

Johanna Doherty

Johanna Doherty is a rising junior at Penn majoring in Criminology. She is interested in criminal justice reform as well as the effect that interaction with the justice system has on mental health. She has previously interned with the Joseph J. Peters institute, where she worked with survivors of trauma. For this summer internship, she is focusing on procedures, including researching standard operating procedures for forensic tool mark analysis labs and determining the heterogeneity between different manuals. (Summer 2020)

Tori Borlase

Tori Borlase is a rising junior at the University of Pennsylvania studying Philosophy, Politics, and Economics. Tori has worked for judges in the Wake County Justice System, defense lawyers, and other legal experts, and plans to become a lawyer in order to work on civil rights issues. She is interested in equal employment law, criminal justice reform, and environmental law. As an intern, she is working on Simulations: Understanding Chumbley et al.’s procedures for simulating toolmarks from a real tool, and Manufacturing: Understanding factory procedures for the manufacturing of screwdrivers. (Summer 2020)

Melina Muthuswamy

Melina Muthuswamy is a rising third-year undergraduate student at Penn in the Computer Science department in the School of Engineering. She is interested in spreading and encouraging participation in engineering for high school students and is a member of the board for PennApps, UPenn’s student-run hackathon. For this summer internship she is focusing on Simulations: understanding Chumbley et al.’s procedures for simulating toolmarks from a real tool, and Software packages: understanding the current capabilities of R packages to analyze bullets and how they can be applied to toolmarks. (Summer 2020)

Research reports

In spring 2021, the team is working on one research track:

  1. Pre-processing of toolmarks: Determine the best way to process toolmarks before they are compared.
    1. How should one reshape the toolmark signals so they are comparable without losing relevant information?
    2. What type of controls should one use so comparisons are fair?
    3. Can one use simulations to determine what types of controls are most useful?

In fall 2020, the team worked on three research tracks:

  1. Black-box studies: Learn about what makes a good black-box study.
    1. What is the purpose of a black-box study?
    2. What is the best experimental design for such a study? (And what makes other designs not useful?)
    3. Which contextual information is relevant to forensic examinations (to be included in a black box study), and which is irrelevant?
  2. Database design: Design a database of toolmarks.
    1. What tools are the best for an initial study on toolmark analysis?
    2. Which factors (e.g. angle, force, surface materials) should be selected and why?
    3. How can we design a database that helps us learn about difficult (“complex”) and easy comparisons?
  3. Algorithmic design: Design an algorithm to evaluate the similarity between toolmarks.
    1. Which machine learning tools are most useful for toolmarks?
    2. What are the benefits and limitations of deep learning?
    3. What measure of uncertainty might be a good fit for toolmark comparisons?

In summer 2020, the team worked on five research tracks:

  1. Previous literature: Learn about the literature in the field of forensic toolmark analysis and find potential gaps. Some questions of interest are:
    1. What is the state of the academic literature about forensic toolmark analysis?
    2. Which researchers are currently working in this field?
    3. What is the audience for this research?
  2. Manufacturing: Understand factory procedures for the manufacturing of screwdrivers. Some questions of interest are:
    1. What are the most commonly used brands? Do different brands use the same factory?
    2. How are common screwdrivers manufactured and how does this affect their toolmarks?
    3. Is it possible to acquire consecutively manufactured tools?
  3. Simulations: Understand some researchers’ procedures for simulating toolmarks from a real tool. Some questions of interest are:
    1. How do they do this?
    2. What software do they use?
    3. Is it actually useful for examiners currently, or is it just a project for the future?
  4. Procedures: Find standard operating procedures (SOPs) and training manuals for forensic toolmark analysis. Some questions of interest are:
    1. How heterogeneous are laboratories in their forensic toolmark analysis?
    2. How is toolmark analysis done in practice?
    3. What are some measures to ensure accuracy and consistency in practice?
  5. Software package: Understand the capabilities of the R software package “bulletr,” which is designed for analyzing bullets. Some questions of interest are:
    1. How are bullets different from toolmarks in terms of algorithmic analysis?
    2. How much of this package can we use in making a new one for toolmark analysis?
    3. What are the best file formats for images?

Reporting by students

(Spring 2021)

In progress…

(Fall 2020)

How are random forests used in toolmark algorithms?

A random forest produces a similarity score based on three-dimensional topographic scans of engraved areas. It is based on cross correlations, the number of matching striae, and the number of consecutive matching striae. The output of the random forest is a score between 0 and 1 representing the algorithm’s assessment of the similarity of the two signatures: a score close to 1 indicates similar signatures that may have originated from the same source, while a score close to 0 indicates different signatures that may have originated from different sources. Random forests use SAM scores to aggregate land-to-land scores into engraved areas and are not capable of automatically detecting parts of lands with well-expressed striae.
(Reported by Thomas Kyong.)
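The scoring step can be sketched with scikit-learn (a hedged illustration with invented feature values, not the published model): a forest is trained on labeled same-source and different-source pairs, and the proportion of trees voting “same source” serves as the 0-to-1 similarity score.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical training features for pairs of signatures: peak cross-
# correlation, number of matching striae, and number of consecutive
# matching striae (CMS). The distributions below are made up.
same = np.column_stack([rng.normal(0.8, 0.1, n),
                        rng.poisson(12, n),
                        rng.poisson(6, n)])
diff = np.column_stack([rng.normal(0.3, 0.15, n),
                        rng.poisson(4, n),
                        rng.poisson(1, n)])
X = np.vstack([same, diff])
y = np.repeat([1, 0], n)  # 1 = same source, 0 = different source

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The forest's vote proportion acts as the similarity score in [0, 1].
score = forest.predict_proba([[0.85, 13, 7]])[0, 1]  # a "same source"-like pair
```

Thresholding this score (and reporting its uncertainty) is where the statistical questions about toolmark comparisons begin.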

How are cross correlations used in toolmark algorithms?

A cross correlation describes the similarity between two surfaces. Scholars compute the correlation between two marks over fixed-size windows at various locations along the mark, so that all computed correlations are based on the same number of pairs of data points. To align two signatures pairwise, researchers generally use the alignment that maximizes the correlation. Cross correlations range from -1 to 1, and these values are critical input features for a random forest.
 (Reported by Thomas Kyong.)
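A minimal sketch of this windowed comparison (illustrative, not the published implementation): slide a fixed-size window of one signal along the other and keep the lag with the highest Pearson correlation.

```python
import numpy as np

def align_by_ccf(sig_a, sig_b, window=50):
    """Slide a fixed-size window of sig_b along sig_a and return the lag
    that maximizes the Pearson correlation, plus that correlation. Every
    correlation uses the same number of point pairs, so values at
    different lags are directly comparable."""
    seg_b = sig_b[:window]
    best_lag, best_r = 0, -1.0
    for lag in range(len(sig_a) - window + 1):
        r = np.corrcoef(sig_a[lag:lag + window], seg_b)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Toy example: the same mark signal, offset by 40 samples
t = np.linspace(0, 4 * np.pi, 300)
mark = np.sin(t) + 0.1 * np.cos(5 * t)
shifted = mark[40:]
lag, r = align_by_ccf(mark, shifted)
```

In practice one would also search multiple window positions within each mark, as the passage describes, rather than only the leading window.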

What is a black box study?

A black box study aims to find error rates in a forensic science discipline by focusing on the accuracy of conclusions made by examiners, without assessing how the conclusions were reached. The examiner is viewed as a “black box” which provides output, such as conclusions, that can vary and is measured by researchers who input test samples. Few black box studies have been completed across forensic science disciplines. One black box study, conducted in 2014 by the Ames Laboratory, which is affiliated with Iowa State University, focused on error rates in firearm analysis. Another black box study, by the FBI in 2011, focused on error rates in latent print examination. Both studies had sets where examiners compared samples from a source they did not know to samples from known sources. By knowing the sources of all the test samples and whether or not each set had a match, researchers were able to measure each examiner’s accuracy. A black box study is not to be confused with a white box study, as the latter allows researchers to also assess how each examiner arrived at their conclusion. (Reported by Amy Win.)

What is the President’s Council of Advisors on Science and Technology (PCAST) report on forensic science from 2016?

The President’s Council of Advisors on Science and Technology (PCAST) is a group of scientists and engineers who are appointed by the President to report and give recommendations for improving policies on science and technology. The 2016 PCAST report on forensic science evaluated the use of forensic science in criminal courts based on each forensic discipline’s scientific validity and reliability. For each discipline, it identified existing knowledge and resources as well as gaps in the clarity of scientific standards for validity and reliability. Ultimately, the report recommended that more research, especially black box studies, be conducted so that error rates can be determined and forensic methods can be evaluated. For toolmark analysis, the report evaluated prior studies and recognized only one black box study as accurately and effectively carried out: the 2014 Ames Laboratory study on error rates in firearm analysis. (Reported by Amy Win.)

What is the Association of Firearm and Toolmark Examiners (AFTE) glossary?

The Association of Firearm and Toolmark Examiners is an international professional organization for practitioners of firearm and toolmark identification, where practitioners can exchange information, methods, best practices, and standards, and call for further research in the discipline. The AFTE glossary provides common terminology that can be used by firearm and toolmark examiners. Many studies on firearm and toolmark analysis have referred to AFTE’s listed definitions of possible analysis conclusions, which include identification, elimination, and inconclusive (the inconclusive conclusion offers three types of reasoning for why the examiner determined the analysis was inconclusive). For toolmarks specifically, the AFTE glossary includes terminology on the types of characteristics and marks a toolmark sample may have, as well as types of tools. (Reported by Amy Win.)

What types of surfaces are the best for a study on toolmark analysis?

When producing toolmarks, soft metals or wax sheets should be used to prevent damaging the tool. The Brinell hardness scale, which rates the hardness of materials using an indentation penetration test, characterizes lead and aluminum as the two softest metals. A cheap and non-toxic alternative to lead is jewelry wax. The quality of toolmarks produced on hard wax (for example, Green Matt jewelry wax) is comparable to toolmarks made on lead sheets. Soft wax can also be used to produce high-quality toolmarks if chilled to around -18 °C. (Reported by Michaela Rieser)

What kind of lead sheets should be used for toolmark analysis and how should they be handled?

If toolmark analysts choose to work with lead, 1/16 in, 1/8 in, or 1/4 in thick sheets should be used. When working with 1/4 in lead sheets, metal shears are required to cut the sheet. If the lead sheets are thinner than 1/16 in, there is a greater risk of tearing through the sheet when making toolmarks. Toolmark analysts should take proper precautions when working with lead sheets: gloves are necessary when handling them, and protective eyewear and masks are recommended. (Reported by Michaela Rieser)

(Summer 2020)

What is the Interpol report about and why is it important?

The “Interpol Report of Shoe and Tool Marks 2016-2019” is a comprehensive literature review of recent developments in forensic science regarding toolmarks. The report summarizes and relates over 150 relevant studies and divides findings into three subsections: shoemarks, striated and impression toolmarks, and invasive toolmarks. For each subsection, the report covers topics such as software for toolmark analysis and the variability of toolmark characteristics. It also identifies gaps in the existing literature and aims to inform future research to advance the accuracy and reliability of forensic toolmark analysis, which is often used as pivotal evidence in criminal proceedings. (Reported by Adalyn Richards.)

How heterogeneous are crime laboratories in their forensic toolmark analysis procedures?

Based on toolmark analysis procedural manuals collected from the Virginia Department of Forensic Science, the Arizona Department of Forensic Science, the Texas Department of Public Safety, and many others, the standard operating procedures are predominantly homogeneous. Each manual mentions similar or identical methods for both creating test toolmarks and analyzing and comparing test and evidence toolmarks. Additionally, each manual is similarly lacking in standardized characteristics for comparison between test and evidence toolmarks. Overall, standard operating procedures are predominantly homogeneous in their guidelines and language. (Reported by Johanna Doherty.)

According to the Interpol Report, what are some recent developments in automated toolmark comparison analysis?

Until recently, most forensic comparison software used only one similarity metric when comparing toolmarks to identify a potential match. Recent research indicates that using a multi-feature similarity score can yield more accurate results when comparing marks. For example, Hare et al. developed a multi-feature algorithm to compare bullet land impressions by standardizing mark characteristics with three height-value variables that represent the grooves and contours of a mark. Similarly, Keglevic et al. presented a multi-feature convolutional neural network to efficiently match toolmarks from large databases. In both cases, the multi-feature algorithm outperformed the baseline. (Reported by Adalyn Richards.)

What is axial rotation, and why does it matter in toolmark analysis?

In the context of toolmarks, axial rotation is the rotation of a tool around its own axis. For example, think about cutting a cake. Normally, you would position the sharp edge of the knife so that it cuts into the cake with ease. Now think about rotating the knife 90º so that the side edge of the knife comes in contact with the cake: it would make an entirely different mark in the icing. In the same way, the axial rotation of a tool can drastically change the mark that it makes. Previous studies have primarily focused on the angle of attack between a tool and a substrate. Two recent publications, however, have focused on the effect of axial rotation in toolmark analysis. Both studies found that comparison algorithms cannot successfully identify matches when the marks are made at significantly different axial angles. These findings highlight a need for technology that can rescale topographical data to allow for comparison of toolmarks made at varying axial angles. (Reported by Adalyn Richards.)

How do Chumbley et al. simulate toolmarks?

In “Virtual Tool Mark Generation for Efficient Striation Analysis,” Chumbley et al. describe the simulation process in a few distinct steps. First, the tool tip geometry is scanned and digitized: the study used an Alicona infinite focus microscope (IFM) to obtain the surface geometry of the tool tip and the marked plates, and the retrieved geometry was cleaned and refined through an algorithmic spike-removal process so that it better resembles the physical screwdriver tip. Then, using the OpenGL graphics library, the 3D surface was projected in the direction of the tool travel. This virtual tool tip (or a specific edge) is “squished” between two planes, and the resulting indentations are measured with a stylus profilometer. (Reported by Tori Borlase.)
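The projection step can be caricatured in a few lines (a hypothetical sketch, not the authors’ code): after a crude spike-removal pass, the virtual mark depth at each cross-track position is set by the deepest point of the digitized tip along the travel direction.

```python
import numpy as np

def despike(tip, threshold=3.0):
    """Crude stand-in for the scan-cleaning step: replace points more
    than `threshold` standard deviations from the median height."""
    med = np.median(tip)
    cleaned = tip.copy()
    cleaned[np.abs(tip - med) > threshold * tip.std()] = med
    return cleaned

def virtual_mark(tip, base_depth=1.0):
    """Project the tip height map along the travel axis (axis 0): each
    cross-track column is cut to the lowest (most protruding) tip point,
    so columns that protrude more leave a deeper mark."""
    edge = tip.min(axis=0)                   # lowest tip point per column
    return base_depth + (edge.max() - edge)  # depth profile of the mark

# Toy tip: flat except one column that protrudes 0.5 units further
tip = np.zeros((10, 5))
tip[:, 2] -= 0.5
mark = virtual_mark(despike(tip))
```

The real procedure additionally models the tool angle and pressure; this sketch only captures the geometric projection idea.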

How are common screwdrivers manufactured?

Flat head screwdrivers are manufactured in a few simple steps. First, metal is shaped into a long, slim cylinder that is cut to the size of the screwdriver shaft. Then, the end of the metal shaft is shaved down and pressed into a flattened shape. The metal end is then trimmed to create the typical shape of a flat head screwdriver, with the tip being ground against a wheel to fine-tune it. Depending on the size of the screwdriver head being manufactured, these steps change slightly. To give the head a grip on screws, it is put through a blast of an abrasive solution.

Phillips head screwdrivers, on the other hand, require a different process. While they also start out as a long, slim cylinder of metal, once cut to size they are shaved down into a cross-like shape, forming bevels. These screwdrivers are also texturized, but using a different machine, which creates impressions in the screwdriver tip. Finally, both types of screwdrivers are fitted with handles for easy gripping. (Reported by Tori Borlase.)

How many forensic labs in the United States perform toolmark analysis?

Based on information from the National Accreditation Board’s database, 208 forensic labs in the US are accredited to perform firearm and toolmark analysis. However, since firearm analysis and toolmark analysis are combined under a single accreditation, there is no guarantee that an accredited lab actually performs both of these services. After reaching out to the labs listed on the NAB’s Accreditation database, I have determined that only about 50% of accredited labs actually perform toolmark analysis. The directors of many of these labs explained that they no longer perform toolmark analysis, despite their active accreditation, due to lack of demand for toolmark testing. (Reported by Johanna Doherty.)

What is virtual microscopy, and does it work for toolmark analysis?

Virtual microscopy (VM) is another name for 3D topographical surface data. In the context of toolmarks, VM is a virtual 3D rendering of a mark made by a tool. 3D data is increasingly popular in forensics due to its potential advantages over 2D data, including enhanced viewing, data sharing, and annotating capabilities, as well as improved detail and the potential to create virtual archives that can validate future technology and help solve crimes. Emerging VM systems typically come with two components: a data acquisition instrument to scan the toolmark and a software component to analyze the 3D marks. As with any new technology, researchers must validate VM systems before implementing them in real casework. A recent article published in the Journal of Forensic Sciences validated a VM system by administering proficiency tests for cartridge case identification to fifty-six examiners in fifteen laboratories. The examiners, who were trained to use the VM software, correctly identified 100% of matches with zero false positives. This study provides strong evidence for the integration of VM in toolmark analysis, but critics express doubt due to the small sample size of marks used in the experiment. (Reported by Adalyn Richards.)

What are degraded toolmarks, and how do forensic examiners analyze them?

Degraded toolmarks are fragmented striations or impressions made by a tool that cannot be recovered in full due to the passage of time, mishandling of evidence, or simply an incomplete mark found at a crime scene. In the context of forensic analysis, fragmented marks can make it difficult to determine whether two marks were created by the same tool, but recent research proposes a method to better integrate degraded marks in toolmark analysis. In a study by Hare et al., researchers artificially degraded bullet land impressions by deleting portions of the lands to emulate fragmentation. They then trained a matching algorithm to analyze these fragmented marks by performing a smoothing step, counting the number of matching striae between the two marks, and dividing that count by the length of the overlapping region, so as not to punish a mark for having a smaller surface area. Notably, they found that algorithm performance declines as a function of degradation up to a threshold of about fifty percent. (Reported by Adalyn Richards.)
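The overlap-normalized matching idea can be sketched as follows (a toy stand-in, using simple local-maxima peak detection rather than the paper’s method, with missing data encoded as NaN):

```python
import numpy as np

def local_peaks(x):
    """Indices of simple local maxima (greater than both neighbors)."""
    return [i for i in range(1, len(x) - 1) if x[i] > x[i - 1] and x[i] > x[i + 1]]

def striae_match_rate(sig_a, sig_b, tol=2):
    """Score two possibly degraded signals: find peak (stria) locations
    in each, count peaks of sig_a with a peak of sig_b within `tol`
    samples, and divide by the length of the overlapping (non-missing)
    region so a fragmented mark is not punished for its smaller area."""
    overlap = ~np.isnan(sig_a) & ~np.isnan(sig_b)
    if overlap.sum() == 0:
        return 0.0
    peaks_a = local_peaks(np.where(overlap, sig_a, -np.inf))
    peaks_b = local_peaks(np.where(overlap, sig_b, -np.inf))
    matches = sum(any(abs(p - q) <= tol for q in peaks_b) for p in peaks_a)
    return matches / overlap.sum()

# Toy example: a full mark vs. a fragment of the same mark
t = np.linspace(0, 6 * np.pi, 121)
full = np.sin(t)
fragment = full.copy()
fragment[:30] = np.nan          # simulate a degraded / partial mark
score = striae_match_rate(full, fragment)
```

The normalization by overlap length is the key point from the passage; everything else (peak finding, tolerance) is a placeholder.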

Do all forensic crime labs accredited in toolmark analysis have standard operating procedures for this service?  

No, while the vast majority of accredited labs have specific standard operating manuals for toolmark analysis, a few labs do not keep updated procedures for this service. For example, the Albuquerque Police Department Crime Lab says that rather than rely on a manual, they “focus on the use of the scientific method and training in order to establish a methodology of how to approach each case depending on what evidence is available.” The Oakland Police Department Crime Lab explained that due to the lack of requests for toolmark analysis, they no longer keep a firmly written, updated copy of their toolmark procedural manual. Lastly, the San Diego Police Department Crime Lab wrote that they “do not have any formal process for toolmark analysis.” After gathering information on procedures from the majority of the crime labs in the United States, it is clear that while most of them do have procedural manuals for toolmark analysis, a few labs do not have formally written procedures for a variety of reasons. (Reported by Johanna Doherty.)

How unique are individual toolmarks? Does each tool produce a mark so unique that it can be traced back to the tool of origin?

In a study about the individuality of toolmarks, Hadler et al. sought to quantitatively answer these questions. First, they created striated toolmarks using a variety of screwdrivers at different angles of attack and in different media. Then, they built an automated toolmark comparison system with two parts: (A) a NanoFocus MicroSurf white-light confocal sensor, a non-contact 3D data acquisition instrument, and (B) data analysis software with a signature generation component to determine individual characteristics. The software was designed to compare the individual characteristics of two or more marks and produce a similarity metric between zero and one. Finally, the authors compared the statistical distributions of the similarity values from the matching and nonmatching pairs of marks. This revealed that “while it is not possible to prove uniqueness statistically, the results of this study provide support for the concept that tool marks contain measurable features that exhibit a high degree of individuality.” The error rate was 0.00% with the exception of one false negative. (Reported by Adalyn Richards.)
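That final step, comparing the score distributions of matching and nonmatching pairs, can be illustrated with invented numbers (not Hadler et al.’s data): draw similarity scores for each group and read off error rates at a decision threshold.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical similarity scores in [0, 1] for known matching and known
# nonmatching pairs of marks; the distributions below are made up.
matching = np.clip(rng.normal(0.9, 0.05, 500), 0, 1)
nonmatching = np.clip(rng.normal(0.2, 0.10, 500), 0, 1)

threshold = 0.5
false_negatives = np.mean(matching < threshold)      # missed identifications
false_positives = np.mean(nonmatching >= threshold)  # wrongful identifications
separation = matching.min() - nonmatching.max()      # > 0 means no overlap
```

The degree of overlap between the two distributions is what limits how low both error rates can simultaneously be, whatever threshold is chosen.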