2009
12.11

Grouping Malware

Grouping malware with similar binary structure saves time and effort. For a lone part-time researcher, such a productivity gain is invaluable. As you collect malware, over time you will accumulate samples, many of them: perhaps 2000. Processing all of them could be costly. To save time and effort, we want to set aside near-duplicates and other samples of the same family. What can one do?

For this problem, we assume all the files are malicious, since honeypots do not collect innocent software.

One way is to use virus scanners to scan and classify the files. After a scan, group together all the files detected as, for example, “Conficker.B”. As the Conficker family is quite prevalent, identifying such duplicates can save a lot of time and effort, since analyzing just one or two of them is sufficient. However, the drawback is that all the undetected samples are left as one big group that you must analyze one by one.

An extract of a clamscan result:

/tmp/4c71b97435a24ffb8fd7fedd1b1790e1: OK
/tmp/82dd3a3d386d4ea09870dcee4a75a531: OK
/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin: OK
/tmp/24bd1722b994f7daa193458348108bfc.bin: OK
/tmp/39960c5ff1922466ded71a4a2799c295: Trojan.VanBot-366 FOUND
/tmp/33f5f14c33bf2f71556204705407a885: W32.Virut-54 FOUND
/tmp/880ce6df69aaeb1d3c57e756f53dd158.bin: Trojan.Delf-911 FOUND
/tmp/7e0ce66bb299370010016f4522152969: Trojan.VanBot-366 FOUND
/tmp/4f2d9f8129e7d7fd9b37f700aacdc9aa.bin: Trojan.Hupigon-25647 FOUND
/tmp/5b69ff6f331ece36558516f66306f969: Trojan.Small-4287 FOUND
/tmp/078aedb8630339487cf39d028b0156bd.bin: OK
/tmp/417bdef0688996a845701da9dcf1b145: Trojan.VanBot-366 FOUND
/tmp/eda3b7766c23dfffc0b85d0ba546b0c1: W32.Virut-54 FOUND
/tmp/86f22ff53382dbb54e2c22560a3db373: Trojan.VanBot-366 FOUND
/tmp/a4a41d2122c4d3552e3d59315f42d4e3: W32.Virut-54 FOUND

In the above, without signatures, how can you tell whether 4c71b97435a24ffb8fd7fedd1b1790e1 and 82dd3a3d386d4ea09870dcee4a75a531 are from the same family? How can you tell which malware is unique? You have to analyze them. Now scale the problem up to perhaps 600 undetected samples, with only yourself to analyze them.

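The other way is to use ssdeep, a fuzzy hashing tool. It matches inputs that are similar, for example files that differ in only a few bytes or in length. It produces a hash signature like md5 does, but unlike md5, changing a single byte will not produce a wildly different signature. The idea behind ssdeep (context-triggered piecewise hashing) is to chop a file into many pieces, where a rolling hash over the content decides where each piece ends, and to compute a small hash for each piece; the signature is the concatenation of these piece hashes.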
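The piecewise idea can be sketched in a few lines of Python. This is a simplified illustration of context-triggered piecewise hashing, not ssdeep itself, and several parts are assumptions: a plain rolling sum over a 7-byte window stands in for ssdeep's rolling hash, the blocksize is fixed at a hypothetical 64, each piece is hashed with 32-bit FNV-1 and reduced to one base64 character, and difflib's SequenceMatcher stands in for ssdeep's own edit-distance scoring. Real ssdeep also picks the blocksize from the file length and emits a second hash at twice the blocksize.

```python
import difflib
import random

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7                 # bytes in the rolling window (ssdeep also uses 7)
FNV_OFFSET = 0x811C9DC5
FNV_PRIME = 0x01000193


def piecewise_hash(data: bytes, blocksize: int = 64) -> str:
    """Simplified context-triggered piecewise hash (ssdeep-like, NOT ssdeep)."""
    sig = []
    window = bytearray()
    rolling = 0                # sum of the last WINDOW bytes
    piece_hash = FNV_OFFSET    # FNV-1 hash of the current piece
    piece_len = 0
    for byte in data:
        window.append(byte)
        rolling += byte
        if len(window) > WINDOW:
            rolling -= window.pop(0)
        piece_hash = ((piece_hash * FNV_PRIME) & 0xFFFFFFFF) ^ byte
        piece_len += 1
        # Context trigger: local content decides where a piece ends, so an
        # edit only disturbs the pieces near the place where it happened.
        if rolling % blocksize == blocksize - 1:
            sig.append(B64[piece_hash % 64])   # one character per piece
            piece_hash, piece_len = FNV_OFFSET, 0
    if piece_len:
        sig.append(B64[piece_hash % 64])       # trailing partial piece
    return "".join(sig)


def similarity(sig1: str, sig2: str) -> int:
    """0-100 score; difflib is a stand-in for ssdeep's own scoring."""
    # autojunk=False: otherwise characters that recur in a long signature
    # are treated as junk and the ratio collapses.
    matcher = difflib.SequenceMatcher(None, sig1, sig2, autojunk=False)
    return round(100 * matcher.ratio())


# Demo on random stand-in bytes (not a real executable).
rng = random.Random(2009)
file1 = bytes(rng.randrange(256) for _ in range(20000))
file2 = file1 + b"1\n"          # what `echo 1 >> file2.exe` appends
h1, h2 = piecewise_hash(file1), piecewise_hash(file2)
print(h1[-10:], h2[-10:], similarity(h1, h2))
```

Because piece boundaries come from the content itself, appending bytes only touches the last character or two of the signature. The flip side is that changing one byte inside nearly every piece would alter nearly every character.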

Below I take a sample exe file (“file1.exe”), copy it, append to the copy with echo 1 (which adds two bytes, “1” and a newline) to get “file2.exe”, and compute the md5 sums of both files.

$ cp file1.exe file2.exe
$ echo 1 >> file2.exe

$ md5sum file1.exe file2.exe
72bdd3bd37a0b5d1dd5f1be80cb29639  file1.exe
a626b78fa6ba13fdd9cfddb9f55ee7c6  file2.exe

Just two extra bytes at the end, and the md5 hash is completely different. Now let us compute the ssdeep hashes of the two files.

(broken into lines for clarity)

$ ssdeep -b file1.exe file2.exe
ssdeep,1.0--blocksize:hash:hash,filename
768:my+qxlsz7yiV0+7YUaFhLFAtVI0xbM
LvzEg1B1Ki8nJ78
:R+qxlsHvGhLFyI0l8tC5J78,"file1.exe"
768:my+qxlsz7yiV0+7YUaFhLFAtVI0xbM
LvzEg1B1Ki8nJ7V
:R+qxlsHvGhLFyI0l8tC5J7V,"file2.exe"

The fields are separated by colons: the first (768) is the blocksize, followed by two ssdeep hashes, one computed at the blocksize (my+qxlsz7yiV0+7YUaFhLFAtVI0xbMLvzEg1B1Ki8nJ7V) and one at twice the blocksize (R+qxlsHvGhLFyI0l8tC5J7V); after the comma comes the file name (“file2.exe”). The important part is the two hashes, the signatures of the file. The hashes of the two files are nearly identical; only the last character differs (“8” vs “V”).
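To compare signatures in bulk, it helps to split each line into its fields. A minimal sketch, assuming the single-line format ssdeep prints (rejoined here from the wrapped output above); SsdeepSig and parse_ssdeep_line are hypothetical names, and the split relies on the hashes being base64 characters that never contain a colon or comma.

```python
from typing import NamedTuple


class SsdeepSig(NamedTuple):
    blocksize: int
    hash1: str        # piece hashes at blocksize
    hash2: str        # piece hashes at 2 * blocksize
    filename: str


def parse_ssdeep_line(line: str) -> SsdeepSig:
    """Split 'blocksize:hash1:hash2,"filename"' into its fields."""
    sig, _, name = line.strip().partition(",")
    blocksize, hash1, hash2 = sig.split(":")
    return SsdeepSig(int(blocksize), hash1, hash2, name.strip('"'))


line1 = '768:my+qxlsz7yiV0+7YUaFhLFAtVI0xbMLvzEg1B1Ki8nJ78:R+qxlsHvGhLFyI0l8tC5J78,"file1.exe"'
line2 = '768:my+qxlsz7yiV0+7YUaFhLFAtVI0xbMLvzEg1B1Ki8nJ7V:R+qxlsHvGhLFyI0l8tC5J7V,"file2.exe"'
a, b = parse_ssdeep_line(line1), parse_ssdeep_line(line2)

# Positions where the first-level hashes of the two files differ.
diff = [i for i, (x, y) in enumerate(zip(a.hash1, b.hash1)) if x != y]
print(a.blocksize, a.filename, b.filename, diff)
```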

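If you have a large number of unidentified malware samples, antivirus scanners will not help to classify them, but ssdeep can try. Below is extracted output of matching files with ssdeep (-d compares the files against each other, -r recurses into directories); the number in parentheses is the match score from 0 to 100. Each file name is the md5 of the file itself.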

$ ssdeep -dr .


/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/fa7c91b738e763eccf69676bd393925e.bin (88)
/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/ae142ce3b35cc04f5648a0c17c37ea30.bin (82)
/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/794b74fc4e833d245eb005e078dc21da.bin (82)
/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/46fb9678675df8dc83d38761a76c7950.bin (99)
/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/f412d41aacb4b16ded7b158b89fd3552.bin (90)
/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/4bfba885ed3dc4ba800446df49051af0.bin (82)
/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/13776c2b604290906305a56c4e7c61e5.bin (99)
/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/5a8424f4e1504b5823ca8742e2b1ce8d.bin (82)

In the above, all of these files are undetected malware with completely different md5 hashes, yet ssdeep can relate them. A sample that does not match any other file can be assumed to be unique in your collection, and you should pay more attention to it. Moreover, even packed executables (tested on UPX) can still be matched: UPX is essentially a compressor, so similar code compresses into similar binary patterns.

There are a few caveats. First, remember that ssdeep only hashes small pieces of the file. If bytes vary slightly but all over the file (through obfuscation, for example a 1-byte change every 100 bytes, such as inserted no-ops), almost every piece changes and ssdeep will fail to find matches. Second, for botnet credential identification, similar files may contain very different login credentials and could be wrongly discarded because of their highly similar binary structure. However, you can analyze the access control logic in one of these duplicated samples and then generalize how the login credentials are obtained from the rest.

With ssdeep, you can now sort duplicated undetected malware into groups for more efficient analysis.
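One way to turn the pairwise "matches" lines into groups is a union-find, sketched below in Python. The threshold of 80 and the file /tmp/unmatched-example.bin are hypothetical; the regex assumes paths without spaces. Since ssdeep only prints pairs that match, files that match nothing must be passed in separately (all_files) to show up as single-member groups, which are the unique samples worth a closer look.

```python
import re
from collections import defaultdict

# One line of `ssdeep -dr` output: "<file> matches <file> (<score>)"
MATCH_RE = re.compile(r"^(?P<a>\S+) matches (?P<b>\S+) \((?P<score>\d+)\)$")


def group_matches(lines, all_files=(), threshold=80):
    """Union-find over ssdeep match lines; returns sorted lists of file names."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    for name in all_files:                  # files that may match nothing
        find(name)
    for line in lines:
        m = MATCH_RE.match(line.strip())
        if m is None:
            continue                        # skip blank or unrelated lines
        find(m["a"])
        find(m["b"])
        if int(m["score"]) >= threshold:
            union(m["a"], m["b"])

    groups = defaultdict(list)
    for name in parent:
        groups[find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())


sample = [
    "/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/fa7c91b738e763eccf69676bd393925e.bin (88)",
    "/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin matches /tmp/46fb9678675df8dc83d38761a76c7950.bin (99)",
]
files = [
    "/tmp/72bdd3bd37a0b5d1dd5f1be80cb29639.bin",
    "/tmp/fa7c91b738e763eccf69676bd393925e.bin",
    "/tmp/46fb9678675df8dc83d38761a76c7950.bin",
    "/tmp/unmatched-example.bin",   # hypothetical sample that matches nothing
]
for group in group_matches(sample, files):
    print(len(group), group[0])
```

Raising the threshold splits groups apart; lowering it merges more loosely related variants.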

===

ssdeep – http://ssdeep.sourceforge.net/

UPX – http://upx.sourceforge.net/

2009
12.08

This post is more of a report than a discovery in spam collection. I was working on setting up a spampot using spampot.py, which Neale Pickett wrote back in 2003. Although the result did not meet my expectations, it did give me more information about setting up a spampot.

Goal

The goal of running a spampot (a honeypot that only cares about spam) is to collect spam and analyze its trends; hopefully we can find some interesting techniques that spammers/hackers use in junk and phishing emails.

Approach
So far, I know of at least two methods of hosting a spampot. I made up the names below; if there are formal names for them, please let me know.

Open Relay Spampot: This kind of honeypot runs as an open mail relay server. In case you are not familiar with the term, an open relay lets anyone send messages through the server anonymously.

Closed Relay Spampot: The spampot runs as a closed mail relay server. To expose the server to spammers, you need your own domain bound to this server, with one or more email addresses exposed to spammers/hackers. For example, we can bind onhacks.org to a spampot and expose spam@onhacks.org as one of the email addresses. Methods for increasing the exposure of an email address are out of scope here; we can discuss them later.

In my setup, I decided to run the spampot as an open mail relay server.
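To make the open relay idea concrete, here is a hypothetical Python sketch of the SMTP conversation on the spampot side. It is not spampot.py's code: the class name, reply texts and captured-message format are assumptions for illustration, and the socket handling (reading CRLF-terminated lines on port 25) is left out. The key property is that every RCPT TO is accepted, whatever the domain, so the server looks like an open relay, yet each message is only stored and never delivered.

```python
from typing import Optional


class SpampotSession:
    """Hypothetical SMTP dialogue for an open-relay-style spampot."""

    def __init__(self, captured: list):
        self.captured = captured    # receives one dict per message
        self.in_data = False
        self.reset()

    def reset(self) -> None:
        self.mail_from = None
        self.rcpt_to = []
        self.data_lines = []

    def handle(self, line: str) -> Optional[str]:
        """Process one line (without CRLF); return the reply, if any."""
        if self.in_data:
            if line == ".":                      # end of message
                self.in_data = False
                self.captured.append({
                    "from": self.mail_from,
                    "to": list(self.rcpt_to),
                    "body": "\r\n".join(self.data_lines),
                })
                self.reset()
                return "250 OK: queued"          # nothing is actually relayed
            # Undo SMTP dot-stuffing (RFC 5321, section 4.5.2).
            self.data_lines.append(line[1:] if line.startswith(".") else line)
            return None

        upper = line.upper()
        if upper.startswith(("HELO", "EHLO")):
            return "250 spampot"
        if upper.startswith("MAIL FROM:"):
            self.reset()
            self.mail_from = line[10:].strip().strip("<>")
            return "250 OK"
        if upper.startswith("RCPT TO:"):
            if self.mail_from is None:
                return "503 Need MAIL first"
            self.rcpt_to.append(line[8:].strip().strip("<>"))
            return "250 OK"                      # any domain: open relay
        if upper == "DATA":
            if not self.rcpt_to:
                return "503 Need RCPT first"
            self.in_data = True
            return "354 End data with <CR><LF>.<CR><LF>"
        if upper in ("RSET", "NOOP"):
            if upper == "RSET":
                self.reset()
            return "250 OK"
        if upper == "QUIT":
            return "221 Bye"
        return "500 Command not recognized"


captured = []
session = SpampotSession(captured)
dialogue = ["EHLO spammer.example", "MAIL FROM:<a@example.com>",
            "RCPT TO:<victim@example.net>", "DATA", "Subject: hi", "",
            "..hidden dot", ".", "QUIT"]
replies = [session.handle(line) for line in dialogue]
print(replies)
print(captured)
```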

Setup
I have VirtualBox installed on top of Windows 7, with Ubuntu as the guest OS, because spampot.py appears to have been written for *nix systems. Since port 25 is the default port for SMTP, we need to forward packets from the host (Win7) to the guest (Ubuntu) so that the spampot in the guest OS can accept incoming connections on host port 25.

(Assuming that you are using NAT for VirtualBox)
To enable port forwarding, forward HostPort 25 to GuestPort 25. For more detail on port forwarding in VirtualBox, please refer to the Tombuntu article (reference 3).

However, you will soon discover that port forwarding does not work for reserved ports (< 1024). This is easily resolved by running VirtualBox with administrator rights (i.e. Run As Administrator).

spampot.py requires Sendmail to be installed on Linux. Since sendmail is itself a service listening on port 25, I do the following to switch over to spampot.py:

sudo /etc/init.d/sendmail stop
sudo spampot.py 0.0.0.0

Of course, you can make this run automatically when the system starts.
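Before launching spampot.py, it can help to confirm that sendmail has really released port 25. A small sketch using only Python's standard socket module; is_listening is a hypothetical helper name. It simply reports whether anything accepts TCP connections on the given address.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:          # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    # After `sudo /etc/init.d/sendmail stop` this should print False;
    # after `sudo spampot.py 0.0.0.0` it should print True.
    print("port 25 in use:", is_listening("127.0.0.1", 25))
```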

The last step is to add a DNS record pointing to my machine; I have smtp.onhacks.org. pointing to it. Since this is still an experiment, the machine is running at home on a dynamic IP, so I need to update the record often.

Result
So far, I have received 0 messages after running the spampot for a few days. From googling around, it looks like open relay spampots are not that popular anymore: many server admins are aware that spammers abused open mail relay servers, and they no longer allow open relaying. As a result, sending spam through open relay servers is no longer efficient for spammers.

I will keep the spampot running for a while and see whether we get more spam through the open relay honeypot. Afterward, I will work on a closed relay spampot.

Reference

  1. Open mail relay – Wikipedia
  2. spampot.py – written by Neale Pickett
  3. Configure Port Forwarding to a VirtualBox Guest OS – Tombuntu
  4. SpamPots Project – Cert.org
  5. Brazilian Honeypots Alliance
2009
11.29

Last night, I was woken by a call saying that a server was not working. This server hosts an online judging system (similar to uva.onlinejudge.org, which has algorithmic problems that users can solve). I took a quick look at the compilation process and the web pages; everything looked good except that it always returned “Compilation Error” no matter what the source code contained (even a HelloWorld!). When I compiled the source code manually, the error message gave more detail about the root cause… Not enough space to link the object files! When I ran “df”, it showed the data partition at 100% usage!!

After a deeper investigation, I discovered that one of the users was preparing questions on the machine and had unexpectedly generated 12GB of test data. Since this is a very old machine, it has only a 14GB hard disk for data storage, which already held 2GB of data. This is a kind of DoS attack, since no one could submit source code to the judging system even though they could still browse it.

Lesson learned: We should restrict each user's storage usage instead of leaving it unlimited.
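As a starting point for that restriction, a hedged Python sketch that reports how much data each user owns under a directory and how full its partition is. The function names and the 1000-byte demo limit are hypothetical, and this only reports: actual enforcement would normally come from filesystem disk quotas (the Linux quota tools such as setquota and edquota), which refuse further writes once a user exceeds the limit. In practice, root would point at the judge's data partition.

```python
import os
import shutil
import stat
import tempfile
from collections import Counter


def usage_by_owner(root: str) -> Counter:
    """Sum the sizes of regular files under root, keyed by owner UID."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue                  # vanished or unreadable
            if stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, etc.
                totals[st.st_uid] += st.st_size
    return totals


def over_limit(root: str, per_user_limit: int) -> list[tuple[int, int]]:
    """(uid, bytes) pairs whose usage exceeds per_user_limit."""
    return sorted((uid, used) for uid, used in usage_by_owner(root).items()
                  if used > per_user_limit)


def percent_used(path: str) -> float:
    """Share of the partition holding path that is in use (cf. df)."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


# Usage sketch on a throwaway directory standing in for the data partition.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "testdata.in"), "wb") as f:
        f.write(b"x" * 5000)              # stand-in for oversized test data
    demo_usage = usage_by_owner(demo_root)
    demo_over = over_limit(demo_root, per_user_limit=1000)
    demo_pct = percent_used(demo_root)
print(demo_usage, demo_over, round(demo_pct, 1))
```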

Any other suggestions to prevent this from happening again?

2009
11.20

Details from Jose Nazario of Arbor Networks: http://asert.arbornetworks.com/2009/11/malicious-google-appengine-used-as-a-cnc/

Log0 has been quite busy lately.

2009
11.16

BotHerder 0.1 is now available for download here, or at the source page. A help file is included as README in the zip.

I did not intend to release this tool when I first built it, but it has become more useful. There are many features I would like to add in the future, such as supporting general botnet communication, making it easier to use and automate, and even making it scriptable.

2009
11.14

The slide deck of “A DIY Botnet Tracking System” is here: http://www.slideshare.net/log0/a-diy-botnet-tracking-system

I will post the source code of the tool once I have updated it with a HELP document. Feel free to email me =)

BTW, hac.ka is my friend and the other OnHacks teammate whom I mentioned during my final speech. He works on Email and DNS related items.
2009
11.06

The 7th Microsoft Security Intelligence Report is out! Interested readers should check it out. =)

http://www.microsoft.com/security/portal/Threat/SIR.aspx
2009
10.31

Yup, I suddenly decided to be a speaker rather than a seat warmer.

The topic will be “A DIY Botnet Tracking System”. I will share the botnet tracking tool I built myself and the problems one might run into while doing so.

If you are planning to show up at ISF 2009, be sure to drop by and grab a drink!

2009
10.31

Botnets in Q3 2009

Sharing several news articles on botnets:

ClickForensics: Botnets Accounted for 42.6 Percent of All Click Fraud in Q3 2009.

Symantec: Botnets Generate 87.9% of Total Spam Messages

DarkReading: Botnet Unleashes Variety Of New Phishing Attacks <– this one in particular impersonates Microsoft support to trick users into downloading a “cleanup tool”.

2009
10.28

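About OWASP:

OWASP is an open-source, non-profit, global security organization dedicated to research in application software security. Our mission is to make application software more secure and to enable enterprises and organizations to make clearer decisions about application security risks. OWASP currently has 130 chapters worldwide with nearly ten thousand members, who together have driven the development of application security technologies such as security standards, security testing tools, and security guides.

In recent years, the OWASP summits and the OWASP annual conferences in various countries have been hugely successful, raising awareness and understanding of application security among millions of IT professionals and providing clear guidance on application security for all kinds of enterprises. At this first OWASP China annual conference, OWASP security experts will give exciting talks.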

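About CISRG:

CISRG is an active technical research team whose members each have their own research focus. Current research areas include: operating system kernels, reverse engineering, vulnerability discovery, web vulnerability discovery and exploitation, penetration testing, and information gathering and social engineering.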

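Call for topics (not limited to the following)

  • Application threat modeling and defense techniques
  • Web 2.0 security technologies
  • Web application vulnerability discovery and analysis
  • Data and database security
  • Browser security (Firefox, IE, Safari, Chrome, etc.)
  • Operating system research (Vista, Windows 7)
  • Reverse engineering
  • Forward-looking anti-malware technologies
  • Vulnerability discovery techniques
  • Smart mobile device security research
  • Hardware device security research
  • Forensic analysis
  • Intrusion detection
  • Peer-to-peer networks
  • Penetration testing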

Attendee ticket prices

Registration before October 31: ¥300
Registration after October 31: ¥500

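Payment method

Account name: 杭州安恒信息技术有限公司 (Hangzhou Anheng Information Technology Co., Ltd.)
Account number: 77818100000385
Bank: 杭州银行科技支行 (Bank of Hangzhou, Technology Sub-branch)
Payment note: write your name and state that it is for the annual conference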


Conference schedule

November 12, 2009
November 13, 2009
Two full days


Venue

Shanghai, China
Detailed address: to be determined


Contact us

Contact: Ms. 刘彦俊 (Liu Yanjun)
Phone: +86 137 1380 7300
Email: rip@owasp.org

===

I should be there. Are you coming? =)