Learning to Detect Phishing Emails

December 28, 2006

Phishers launched a record number of attacks in January 2006, as reported by the Anti-Phishing Working Group. These attacks often take the form of an email that purports to be from a trusted entity, such as eBay or PayPal. The email states that the user needs to provide information, such as credit card numbers, identity information, or login credentials, often to correct an alleged problem with an account. Some users fall for these attacks by providing the requested information, which can lead to fraudulent charges against credit cards, withdrawals from bank accounts, or other harmful consequences.

The first attempts at applying learning to this problem took the form of browser toolbars, such as the Spoofguard and Netcraft toolbars. Our research group is currently conducting a study to measure the accuracy of these and other toolbars more precisely, and preliminary results indicate that a large percentage of phishing emails make it past them. From a learning perspective, one of the biggest drawbacks is that a toolbar in the web browser has access to less information than a filter that sees the email itself. Moreover, by the time a toolbar can act, the user has already partially fallen for the attack by clicking on a link in the email, which could expose the user to spyware and malware loaded by attacks on insecure web browsers.

Furthermore, because detection happens in the browser, the contextual information and features of the email are no longer available. In theory, this loss of information should reduce the accuracy that such systems are able to achieve. Aside from accuracy, toolbars suffer from a number of other problems.

Ideally, phishing detection algorithms should require minimal user interaction, running either as a server-side filter to complement existing spam filters or as a filter running locally in the client. This approach has several benefits over other methods. First, by removing user interaction, there is no chance for the user to dismiss a warning dialog and go on to provide information to an attacker. Second, the contextual information in the email remains available to the detector. Finally, operating on the email rather than in the browser makes server-side detection possible. Server-side detection avoids the transmission cost of sending the email (and the subsequent requests for images linked from an HTML email) to the user, as well as the cost of evaluating certain features on the client (such as WHOIS lookups, whose results a server can cache for reuse).
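
The caching benefit mentioned above can be illustrated with a short sketch. It assumes a per-domain lookup that is slow enough to be worth memoizing; `expensive_registration_lookup` is a hypothetical offline stand-in for a real WHOIS query (it returns placeholder values, not real registration data), and the other function names are likewise illustrative.

```python
from functools import lru_cache
from urllib.parse import urlparse


def expensive_registration_lookup(domain: str) -> int:
    """Hypothetical offline stand-in for a WHOIS query.

    Returns a placeholder "domain age in days". A real server-side filter
    would query a registry here, which is the slow, network-bound step
    worth caching.
    """
    return len(domain) * 30  # deterministic placeholder, not real data


@lru_cache(maxsize=100_000)
def domain_age_days(domain: str) -> int:
    # Only the first request for a given domain reaches the expensive lookup;
    # repeats across many users' emails are served from the in-memory cache.
    return expensive_registration_lookup(domain)


def domain_ages_for_email(urls: list[str]) -> list[int]:
    """Look up the (cached) age of every domain linked from one email."""
    return [domain_age_days(urlparse(u).hostname or "") for u in urls]


if __name__ == "__main__":
    inbox = [
        ["http://login.example.com/a"],
        ["http://login.example.com/b", "http://other.example.org/"],
        ["http://login.example.com/c"],
    ]
    for urls in inbox:
        domain_ages_for_email(urls)
    info = domain_age_days.cache_info()
    print(f"lookups performed: {info.misses}, served from cache: {info.hits}")
```

In this toy inbox, four links across three emails touch only two distinct domains, so only two lookups run and the other two are answered from the cache. A filter running on each user's client could not share the cache this way.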

In this work we present an algorithm, which we call “PILFER” (phishing identification by learning on features of email received). Our implementation is not optimal: it does not make use of all the information potentially available to a server-side filter. However, we obtain high accuracy rates, and posit that further work in this area is warranted.
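
The general idea of learning on features of email received (extract a feature vector from the raw message, then let a trained classifier decide) can be sketched as below. This is a minimal illustration under stated assumptions: the four features, the helpers `extract_features`, `train_logistic`, and `predict`, and the two toy messages are all hypothetical and are not claimed to be PILFER's actual feature set, model, or data. Plain logistic regression is used only because it fits in a few lines.

```python
import math
import re
from email import message_from_string
from ipaddress import ip_address
from urllib.parse import urlparse

# Illustrative patterns only; a production filter would use a real HTML parser.
URL_RE = re.compile(r'https?://[^\s"\'<>]+', re.I)
ANCHOR_RE = re.compile(r'<a\s[^>]*href=["\']([^"\']+)["\'][^>]*>(.*?)</a>', re.I | re.S)


def _is_ip(host: str) -> bool:
    try:
        ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(raw_email: str) -> list[float]:
    """Hypothetical features: [is_html, has_ip_url, n_link_domains, has_mismatched_anchor]."""
    msg = message_from_string(raw_email)
    bodies, is_html = [], 0.0
    for part in msg.walk():  # yields the message itself if it is not multipart
        ctype = part.get_content_type()
        if ctype in ("text/plain", "text/html"):
            payload = part.get_payload(decode=True) or b""
            bodies.append(payload.decode(part.get_content_charset() or "utf-8", "replace"))
            is_html = max(is_html, float(ctype == "text/html"))
    text = "\n".join(bodies)
    hosts = {urlparse(u).hostname or "" for u in URL_RE.findall(text)} - {""}
    has_ip_url = float(any(_is_ip(h) for h in hosts))
    # Flag an anchor whose visible text is a URL pointing somewhere other than its href.
    mismatch = 0.0
    for href, shown_text in ANCHOR_RE.findall(text):
        shown = URL_RE.search(shown_text)
        if shown and urlparse(shown.group()).hostname != urlparse(href).hostname:
            mismatch = 1.0
    return [is_html, has_ip_url, float(len(hosts)), mismatch]


def predict(w: list[float], b: float, x: list[float]) -> float:
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, epochs=500, lr=0.1):
    """Plain stochastic-gradient logistic regression (chosen for brevity)."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y  # derivative of log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Two made-up toy messages, for illustration only (not real training data).
PHISH = (
    "Subject: Verify your account\n"
    "Content-Type: text/html\n\n"
    '<p>Click <a href="http://192.0.2.7/login">https://www.paypal.com/verify</a></p>\n'
)
HAM = (
    "Subject: Lunch\n"
    "Content-Type: text/plain\n\n"
    "The menu is at https://www.example.org/menu\n"
)

if __name__ == "__main__":
    xs = [extract_features(PHISH), extract_features(HAM)]
    w, b = train_logistic(xs, [1.0, 0.0])
    for name, x in zip(("PHISH", "HAM"), xs):
        print(name, x, round(predict(w, b, x), 3))
```

Features such as the mismatched-anchor check read the email's HTML directly, which is the kind of context a browser toolbar no longer sees once the user has clicked through. A real deployment would train on a large labeled corpus rather than two toy messages.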
