Wednesday, October 3, 2012

SMTP Dialects, or how to detect bots by looking at SMTP conversations

It is somewhat surprising that, in 2012, we are still struggling to fight spam. In fact, any victory we score against botnets is only temporary, and spam levels rise again after some time. As an example, the amount of spam received worldwide dropped dramatically when Microsoft shut down the Rustock botnet, but it has been rising again since then.

For these reasons, we need new techniques to detect and block spam. Current techniques mostly fall into two categories: content analysis and origin analysis. Content analysis techniques look at what is being sent, and typically analyze the content of an email to see if it is indicative of spam (for example, if it contains words that are frequently linked to spam content). Origin analysis techniques, on the other hand, look at who is sending an email, and flag the email as spam if the sender (for example, the IP address the email is coming from) is known to be malicious. Both categories of techniques have problems in practice. Content analysis is usually very resource intensive, and cannot be run on every email sent to large, busy mail servers; it can also be evaded by carefully crafting the spam email. Origin analysis techniques, for their part, often have coverage problems, and fail to flag as malicious many sources that are actually sending out spam.
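
To make the distinction concrete, here is a minimal sketch of an origin-analysis check, assuming a standard DNS-based blacklist such as Spamhaus ZEN (the lookup convention is real; everything around it is simplified for illustration):

    import socket

    def is_listed(ip, zone="zen.spamhaus.org"):
        # DNSBLs are queried by reversing the IP's octets and appending
        # the zone: 203.0.113.7 -> 7.113.0.203.zen.spamhaus.org
        query = ".".join(reversed(ip.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)  # any answer means the IP is listed
            return True
        except socket.gaierror:          # NXDOMAIN: the IP is not listed
            return False

    print(is_listed("203.0.113.7"))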

In our paper B@BEL: Leveraging Email Delivery for Spam Mitigation, which was presented at the USENIX Security Symposium last August, we propose to look at how emails are sent instead. The idea behind our approach is simple: the SMTP protocol, which is used to send emails on the Internet, follows Postel's Law, which states: "Be liberal in what you accept, but conservative in what you send". As a consequence, email software developers can come up with their own interpretation of the SMTP protocol and still be able to successfully send emails. We call these variations of the protocol SMTP dialects. In the paper we show how it is possible to figure out which software (legitimate or malicious) sent a certain email just by looking at the SMTP messages exchanged between the client and the server. We also show how it is possible to enumerate the dialects spoken by spamming bots, and to leverage them for spam mitigation.
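
To give a flavor of the idea, consider two hypothetical dialects that differ only in small formatting details of the same SMTP commands. This is a toy sketch, not the B@BEL implementation, and the two profiles below are made up for illustration:

    import re

    # Hypothetical command-format profiles: one per mail client.
    DIALECTS = {
        "legit-mta": [r"EHLO [\w.-]+", r"MAIL FROM:<[^>]+> SIZE=\d+"],
        "spam-bot":  [r"HELO [\w.-]+", r"MAIL FROM: <[^>]+>"],  # note the extra space
    }

    def match_dialect(observed):
        # Return the dialects whose command patterns match an observed session.
        return [name for name, patterns in DIALECTS.items()
                if len(patterns) == len(observed)
                and all(re.fullmatch(p, line) for p, line in zip(patterns, observed))]

    session = ["HELO mail.example.com", "MAIL FROM: <bot@example.com>"]
    print(match_dialect(session))  # -> ['spam-bot']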

Although not perfect, this technique, when used in conjunction with existing ones, allows us to catch more spam, and it is a useful advancement in the war against spamming botnets.

Wednesday, July 25, 2012

Fake followers on Twitter: my two cents

During the last few days, a huge fuss has been made about this report. The article, written by Italian professor Marco Camisani Calzolari, describes a system to detect fake followers on Twitter. It reports that many of the Twitter followers of corporations and celebrities (up to 45%) are actually fake. Among these celebrities are Italian public figures and politicians such as Beppe Grillo and Nichi Vendola. The news got a lot of attention in Italy, and was reported by the foreign press as well (most notably by the Guardian and the Huffington Post). Of course, a lot of outrage was generated by the supporters of this or that politician, and many people argued that the study wasn't correct. Today, Italian economics professor Francesco Sacco declared that the study actually has an error margin of 1%, and should be considered correct.

Now, I am a researcher, and I am not very interested in flame wars between opposing political factions. However, I am quite disappointed that the Italian press, as well as some foreign newspapers, treated this study as reputable without at least checking with an expert. As of today, a few days after the news was first published, the only person from academia who has reviewed the article is an economics professor. With all due respect, I think that somebody with a degree in computer science and some experience in machine learning and artificial intelligence would be a better person to review this article, and to judge how reasonable the proposed approach actually is.

I decided to write this blog post because I have been reading a lot of comments on this article, but most of them were just flames, and very few of them analyzed the proposed approach in detail. So I decided to analyze it myself; after all, I have been doing research in this field for quite a while now. In the academic world, we have a procedure called peer review. When somebody submits a paper to a journal or a conference, the paper gets read by two or three other researchers, who evaluate the validity of the proposed approach and how reasonable the results sound. If the reviewers think the paper is good enough, it gets published. Otherwise, the author has to make some changes to the paper and submit it elsewhere.

Camisani didn't go through this process; he just uploaded the paper to his website. For this reason, neither the approach nor the results have been vetted. Let's play what-if, and pretend that this paper had actually been submitted to a conference, and that I had been assigned to review it. Here is what I would have written:

The paper proposes a method to detect fake Twitter accounts (i.e., bots) that follow popular accounts, such as the ones belonging to celebrities and corporations. To this end, the author identified a number of features that are typical of "human" activity, as well as ones that are indicative of automatic, "bot-like" activity. For each account taken into consideration, if the account shows features that are typical of a human, it gets "human points". Conversely, if it shows features that are typical of a bot, it gets "bot points". The totals of human and bot points are then passed to a decision procedure, which decides whether the account is real or not. Here comes the first problem with the article: the decision procedure is not described at all. How many "bot" points does an account need to score to be considered a bot? How is this total weighed against the "human" points? And how are the accounts that lie in the grey area in the middle classified? Also, the classification features are not justified. Why are they typical of human or bot activity? Why is posting from multiple applications a sign of being human? On the contrary, this could be a sign of being a bot, since Twitter periodically blocks offending applications and miscreants have to create new ones. Moreover, the classification procedure seems to be ad hoc and unverified. Using a standard classification algorithm and a training phase on labeled data would have helped - a lot.
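
For the record, setting up such a training phase is straightforward. A minimal sketch, using an off-the-shelf decision tree with made-up features and labels (the feature set here is purely hypothetical):

    from sklearn.tree import DecisionTreeClassifier

    # Features per account: [tweets_per_day, followers/friends ratio, has_profile_image]
    X_train = [
        [5.0,   1.2,  1],   # labeled human
        [0.1,   0.9,  1],   # labeled human
        [200.0, 0.01, 0],   # labeled bot
        [150.0, 0.02, 0],   # labeled bot
    ]
    y_train = ["human", "human", "bot", "bot"]

    clf = DecisionTreeClassifier().fit(X_train, y_train)
    print(clf.predict([[120.0, 0.05, 0]]))  # -> ['bot']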

The second problem with the paper is that it is not clear how the followers for the analysis were chosen. Only "up to" 10,000 followers for each account were checked, allegedly selected at random. This was done, I believe, because Twitter limits the number of queries that can be issued each hour. However, the technical details of how the whole process was performed are missing. Without such details, it is impossible to evaluate how accurate the results are. Stating that half of somebody's followers are fake really just means that, according to the algorithm, 5,000 of the sampled followers are possibly fake.
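
For comparison, here is roughly what a documented sampling procedure could look like. The function fetch_follower_page() is a hypothetical wrapper around the Twitter API, which returns follower IDs one page at a time under an hourly query quota:

    import random
    import time

    def sample_followers(fetch_follower_page, max_sample=10000, per_hour=150):
        # Collect all follower IDs page by page, respecting the hourly quota,
        # then draw a uniform random sample so the estimate stays unbiased.
        ids, cursor, calls = [], -1, 0
        while cursor != 0:              # Twitter's cursoring: -1 = start, 0 = done
            if calls and calls % per_hour == 0:
                time.sleep(3600)        # wait out the hourly API quota
            page, cursor = fetch_follower_page(cursor)
            ids.extend(page)
            calls += 1
        return random.sample(ids, min(max_sample, len(ids)))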

A third problem is that it is impossible to check whether the detected accounts are actually fake. The problem is known to be very hard, because it is pretty much impossible to distinguish a bot from a fairly inactive account. Twitter itself relies on manual analysis to sort out this kind of issue.

The last problem is that this paper doesn't cite any previous research in the field, and there has been a wealth of it. As a consequence, it is impossible to judge how sound the results are compared to the state of the art. However, this was not the goal of the paper. The goal was to get publicity, and that worked perfectly.

My verdict? REJECT.

Friday, May 4, 2012

Poultry Markets: On the Underground Economy of Twitter Followers

Twitter has become such an important medium that companies and celebrities use it extensively to reach their customers and their fans. Nowadays, creating a large and engaged network of followers can make the difference between succeeding and failing in marketing. However, creating such a network takes time, especially when the party building it does not have an established reputation among the public.

For this reason, a number of websites have emerged to help Twitter users create a large network of followers. These websites promise to provide their subscribers with followers in exchange for a fee. In addition, some of these services offer to spread promotional messages in the network. We call this phenomenon Twitter Account Markets, and we study it in our paper "Poultry Markets: On the Underground Economy of Twitter Followers", which will appear at the SIGCOMM Workshop on Online Social Networks (WOSN) later this year.

Typically, the services offered by a Twitter Account Market are accessible through a webpage, similar to the one below. Customers can buy followers at rates between $20 and $100 per 1,000 followers. In addition, markets typically offer the possibility of having content sent out by a certain number of accounts, again in exchange for a fee.

[Image: screenshot of a typical Twitter Account Market webpage]

All the Twitter Account Markets we analyzed offer both "free" and "premium" versions of their services. While premium customers pay for their services, the free ones gain followers by giving away their Twitter credentials (a clever form of phishing). Once the market administrator gets the credentials for an account, he can make it follow other Twitter accounts (the free or premium customers of the market), or send out "promoted" content (typically spam) from it. For convenience, the market administrator typically authorizes an OAuth application using his victim's stolen credentials. By doing this, he can easily administer a large number of accounts by leveraging the Twitter API.
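
To see why this scales so well for the operator, here is a sketch of such a mass-follow operation using a standard Twitter library (tweepy); the application keys, victim tokens, and account name are all placeholders:

    import tweepy

    # Per-victim OAuth tokens, obtained when the victims "subscribed" for free.
    victim_tokens = [("token1", "secret1"), ("token2", "secret2")]

    for token, secret in victim_tokens:
        auth = tweepy.OAuthHandler("APP_KEY", "APP_SECRET")  # the market's OAuth app
        auth.set_access_token(token, secret)
        api = tweepy.API(auth)
        # Each victim account is made to follow a paying customer.
        api.create_friendship(screen_name="paying_customer")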

Twitter Account Markets are a big problem on Twitter: first, an account with an inflated number of followers looks more trustworthy to other social network users. Second, these services introduce spam into the network.

Of course, Twitter does not like this behavior. In fact, they introduced a clause in their Terms of Service that specifically forbids participating in Twitter Account Market operations, and they periodically suspend the OAuth applications used by these markets. However, since the market administrator holds the credentials to his victims' accounts, he can simply authorize a new application and continue his operation.

In our paper, we propose techniques to detect both the victims and the customers of Twitter Account Markets. We believe that an effective way of mitigating this problem would be to focus on the customers, rather than on the victims: since participating in a Twitter Account Market violates the Terms of Service, Twitter could suspend the customers' accounts, and hit the market on the economic side.

Friday, January 27, 2012

Knowing a Bot’s true name, or how to find interesting malware samples

Folklore tells us that by knowing a creature's true name, one obtains great power over it. This is the reason why daemons and their kind usually don't tell you their true name, and pop stars oftentimes go by pseudonyms.
In a far more prosaic fashion, security researchers often struggle to find malware samples to run in their experiments. One reason is that, most of the time, antivirus companies don't agree on a name for a malware family.

The way antivirus companies come up with names for malware is funny in itself, and it often generates laughter in the cybercrime community. An example is the Cutwail botnet, whose real name is "Psyche Evolution"; how people came up with Cutwail is a mystery.
The fact is that I was looking for samples of the "Donbot" bot to validate some novel research. According to M86, this botnet is responsible for about 20% of worldwide spam. I looked in Anubis, our honeypot system that collects thousands of malware samples, with no results. I even started wondering whether the bot really existed, or it was just a legend.

After I had almost lost hope, I was told that Donbot is also known as Buzus. I have no idea what the true name of the bot is, but by using the second name I was suddenly able to find working samples. Which, for a poor grad student struggling with experiments, is good enough.
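
The practical lesson: when querying a sample repository, expand a family name into all of its known aliases first. A trivial sketch, with an alias table containing only the names mentioned in this post:

    # Hypothetical alias table; a real one needs constant curation.
    ALIASES = [
        {"donbot", "buzus"},
        {"cutwail", "psyche evolution"},
    ]

    def expand(name):
        # Return every known alias of the family that `name` belongs to.
        name = name.lower()
        for family in ALIASES:
            if name in family:
                return family
        return {name}

    print(expand("Donbot"))  # -> {'donbot', 'buzus'}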