Security SIG: N-Gram Analysis in Suspect Author Identification of Anonymous Email

2:00pm to 3:30pm, Bits & Pieces, Room 306 Sansom Place West

In response to an investigation in which a company received a series of credible threats of workplace violence via anonymous email, Paul Herrmann developed a system and protocol that uses current linguistic techniques to successfully identify the perpetrator. Empirical authorship analysis has a long history, primarily in relation to literary works of unknown or disputed authorship. One such technique, N-Gram analysis, has shown promise in identifying the most likely author of a text of unknown authorship when presented with candidate authors who have text samples of undisputed authorship.
This presentation will introduce N-Gram analysis, the technique used in the developed solution, and review current academic research on it. The mathematics of N-Gram analysis will be presented, including key parameters, and Paul will review the applied mathematics specific to email analysis as implemented in the solution. Design models for system testing and error rate determination will be explored, along with the rationale for the specific technique chosen to validate the system and its error rate. The forensic protocol for using the technique will be discussed, including protections against the introduction of bias. Opportunities for further research, and the evidentiary positioning of the technique in relation to the Daubert test, will also be discussed.
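
For readers unfamiliar with the idea, the following is a minimal sketch of character n-gram author profiling. It is not the presenter's system: the function names, the relative-difference distance, and the default parameters (n-gram length `n` and profile size `size`) are illustrative assumptions chosen for brevity.

```python
from collections import Counter


def ngram_profile(text, n=3, size=500):
    """Relative frequencies of the `size` most common character n-grams."""
    # Normalize case and whitespace so formatting differences do not dominate.
    text = " ".join(text.lower().split())
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return {g: c / total for g, c in counts.most_common(size)}


def dissimilarity(p1, p2):
    """Sum of squared relative differences; 0.0 means identical profiles."""
    score = 0.0
    for g in set(p1) | set(p2):
        f1 = p1.get(g, 0.0)
        f2 = p2.get(g, 0.0)
        # f1 + f2 > 0 because g occurs in at least one profile.
        score += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return score


def most_likely_author(questioned, samples, n=3, size=500):
    """Return (closest candidate, all scores) for a questioned text.

    `samples` maps each candidate author to text of undisputed authorship.
    """
    q = ngram_profile(questioned, n, size)
    scores = {
        author: dissimilarity(q, ngram_profile(text, n, size))
        for author, text in samples.items()
    }
    return min(scores, key=scores.get), scores
```

Calling `most_likely_author(questioned_email, {"candidate_a": known_text_a, "candidate_b": known_text_b})` returns the candidate whose profile is closest to the questioned text, together with every candidate's score. A lowest score on its own says nothing about reliability, which is why testing and error rate determination matter.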

The session will conclude with an open discussion on the ethical implications of the technique.

Presenter: Paul Herrmann, ISC Information Security

Location: Bits & Pieces, Room 306 Sansom Place West

iCalendar: http://www.upenn.edu/computing/group/sigcal/Mar1624N-GramAnalys.ics