09.12
I disappeared again after my last post on spam collections and DNS misconfigurations. Today I read log0's post, in which he calls for bots and tools for his security research. Did anything in it look familiar to you? Look at how log0 shows his contact to us: “log0 [ at ] gmail [ dot ] com”. We have been using this format for quite some time, ever since we realized that showing the full form of an address (e.g. spam@onhacks.org) increases the chance of it being exposed to spammers.
However, these kinds of representation have been appearing on the Internet for the last few years. Have you ever considered that a clever spammer only needs to modify a few lines of code in a bot, changing the target strings it looks for, and then everything works just as it did in the past?
The most interesting thing is that RSnake blogged his findings on this form of email representation last Tuesday. In short, he googled “at gmail dot com” and, surprisingly, found at least 6 email addresses on the first result page. There are many variations, but they all follow the same pattern. Here are some examples:
spam at onhacks dot com
spam [at] onhacks [dot] com
spam (at) onhacks (dot) com
spam <at> onhacks <dot> com
spam “at” onhacks “dot” com
(Obviously, I am trying my best to let spammers know my address)
I spent an hour writing a very simple PoC parser to retrieve email addresses from the result page mentioned above. There are clearly at least 4 valid email addresses on that page, and it is not hard for a bot to harvest them. The parser simply looks for one ‘at’ keyword followed by one ‘dot’ keyword, in the pattern: [any word] “at” [any word] “dot” [any word]. The code is poorly written; I will improve it later this week.
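To make the pattern concrete, here is a minimal Python sketch of that kind of matcher. It is not the PoC parser mentioned above, just an illustration written under my own assumptions: the function name extract_obfuscated_emails, the set of wrapper characters, and the rule that the last word is alphabetic are all hypothetical choices. It only handles a single “dot”, and it will happily produce false positives on ordinary sentences such as “look at google dot com”.

```python
import re

# Assumption: the keyword may be wrapped in brackets, parentheses,
# angle brackets, or quotes, as in the variations listed in the post.
_OPEN = r"[\[\(<\"“']?"
_CLOSE = r"[\]\)>\"”']?"

# [any word] "at" [any word] "dot" [any word]
PATTERN = re.compile(
    rf"(?P<user>[\w.+-]+)\s*{_OPEN}\s*\bat\b\s*{_CLOSE}\s*"
    rf"(?P<host>[\w-]+)\s*{_OPEN}\s*\bdot\b\s*{_CLOSE}\s*"
    rf"(?P<tld>[A-Za-z]{{2,}})",
    re.IGNORECASE,
)


def extract_obfuscated_emails(text):
    """Return every 'user at host dot tld' match, rebuilt as user@host.tld."""
    return [
        f"{m['user']}@{m['host']}.{m['tld']}".lower()
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    samples = [
        "spam at onhacks dot com",
        "spam [at] onhacks [dot] com",
        "spam (at) onhacks (dot) com",
        "spam <at> onhacks <dot> com",
        "spam “at” onhacks “dot” com",
    ]
    for line in samples:
        print(line, "->", extract_obfuscated_emails(line))
```

Every variation in the list collapses to the same address, which is the whole point: the wrapper characters add nothing once the bot knows to look for the two keywords.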
It is not difficult to spot the pattern shared by these email addresses; it is a piece of cake even for a primary school student. So what kind of representation should we use to show our email address on the Internet? Display a JPEG of the address? Without noise added to the image, reading it is as easy as running text recognition. With noise on the image, it becomes more like a CAPTCHA. Since most CAPTCHA solvers target a specific type of CAPTCHA, it may take more time to “decrypt” an email address “encrypted” as a CAPTCHA. However, it is not unsolvable.
What is the takeaway, then? Better not to show your address on the web at all! Or “encrypt” it as a CAPTCHA, so that your email address at least has less chance of being captured by spammers.
Yes. Even if you use an image, it will only deter normal people. Those who spam (and spam makes up a large share of what is on the Internet) have custom OCR engines that break CAPTCHAs, so OCR/CAPTCHA is not a hard obstacle for them.
@log0
I don't think OCR works very well on those CAPTCHAs. Neural networks are probably the most popular method for breaking CAPTCHAs, which is why I said it may take more time to break one: usually an NN can only break CAPTCHAs of one specific pattern.
The fact is that we can only say yes or no to showing our email address; we cannot choose who we receive email from. It is just like the case in Hong Kong, where telemarketers will call your mobile at any time, selling everything. There is no way to block only the spam calls.
You may want to have a look at http://mailhide.recaptcha.net/
The ReCaptcha project has been around for some time and it exploits the difficulty of using OCRs or standard image processing techniques to digitize old books.
If the spammers manage to develop an algorithm that solves ReCaptcha effectively, they will be doing a service to mankind by improving the digitization of old books…
LP, this is a great tool for hiding addresses. It is a good way to prevent email address harvesting, or at least to increase the effort it takes.
I have found that some spammers don't collect email addresses at all; instead, they randomly generate addresses from name + something + domain.
If we can't stop spam from being sent out, how about we stop receiving it?