Characterizing a Spam Traffic

December 21, 2006

NUCLEO MILENIO CIW SEMINAR

Wednesday, December 21, 10:15 AM
DCC Auditorium
Blanco Encalada 2120, third floor
Santiago, Chile

Characterizing a Spam Traffic

Dr. Virgilio Almeida
Depto. de Ciencia da Computacao
Univ. Fed. de Minas Gerais
Belo Horizonte
Brazil

Presentation available: PDF 600 KB

The rapid increase in the volume of unsolicited commercial e-mail, also known as spam, is beginning to take its toll on system administrators, business corporations, and end users. Widely varying estimates of the cost associated with spam are available in the literature. However, a quantitative analysis of the determinant characteristics of spam traffic is still an open problem. This talk presents an extensive characterization of spam traffic.

As a basis for our characterization, standard spam detection techniques are used to classify over 360 thousand e-mails arriving at a large university into two categories, namely spam and non-spam. For each of the two resulting workloads, as well as for the aggregate workload, we analyze a set of parameters. The aim is to identify the characteristics that significantly distinguish spam from non-spam traffic, to assess the qualitative impact of spam on the aggregate traffic and, possibly, to draw insights into the design of more effective spam detection techniques.

Our characterization reveals significant differences between the spam and non-spam traffic patterns. The e-mail arrival process, the e-mail size distribution, and the distributions of popularity and temporal locality of e-mail recipients are key workload aspects that distinguish spam from traditional e-mail traffic. We conjecture that these differences are a consequence of the inherently different modes of operation of spam and non-spam senders. Whereas non-spam e-mail transmissions are typically driven by bilateral social relationships, spam transmission is usually a unilateral action, based solely on the sender's will to reach as many users as possible.
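
As a rough illustration of the kind of per-workload comparison described above, the following minimal Python sketch splits a small message log into spam, non-spam, and aggregate workloads and computes a few simple statistics for each. The record layout, the sample values, and the function names (`summarize`, `characterize`) are assumptions made for this example; they are not the data, metrics, or code used in the talk.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical records: (arrival_time_seconds, size_bytes, recipient, is_spam).
# The values are invented for illustration; they are not data from the talk.
MESSAGES = [
    (0, 1200, "alice", False),
    (5, 4000, "bob", True),
    (9, 4100, "carol", True),
    (30, 800, "alice", False),
    (31, 3900, "dave", True),
    (90, 1500, "alice", False),
]


def summarize(messages):
    """Return simple arrival, size, and recipient statistics for one workload."""
    times = sorted(m[0] for m in messages)
    # Gaps between consecutive arrivals approximate the arrival process.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    # Message counts per recipient approximate recipient popularity.
    recipients = Counter(m[2] for m in messages)
    return {
        "count": len(messages),
        "mean_interarrival": mean(gaps) if gaps else None,
        "median_size": median(m[1] for m in messages),
        "distinct_recipients": len(recipients),
    }


def characterize(messages):
    """Summarize the spam, non-spam, and aggregate workloads separately."""
    spam = [m for m in messages if m[3]]
    non_spam = [m for m in messages if not m[3]]
    return {
        "spam": summarize(spam),
        "non_spam": summarize(non_spam),
        "aggregate": summarize(messages),
    }


if __name__ == "__main__":
    for name, stats in characterize(MESSAGES).items():
        print(name, stats)
```

In this toy log, the spam messages go to three different recipients while all non-spam messages go to a single one, which loosely mirrors the unilateral versus bilateral sending behavior conjectured in the abstract. A real study would use far larger logs and full distributions rather than single summary values.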

Short Bio

Virgilio Almeida is a professor in the Computer Science Department at the Federal University of Minas Gerais, Brazil. His research interests include performance analysis and the modeling of large-scale distributed systems, such as the Web. He has held visiting professor positions at Boston University and the Polytechnic University of Catalunya in Barcelona, as well as visiting appointments at Xerox PARC and Hewlett-Packard Research Laboratory.