Use RegEx to filter spam from your mail server - part 2
July 9, 2017
On the 4th of July I wrote an article explaining how you can use Regular Expressions (RegEx) to create spam filters that can be applied to a mail server for your commercially hosted domains. This article shows how to create RegEx filters to block spam based on the IP addresses of the mail servers found in the headers of incoming emails.
If you haven't read the first article in this series, I recommend you do so now. It has lots of important information that this article builds upon. It will open in a new tab so you can refer to it as necessary.
Email messages contain a section that is normally hidden from view when you read the body text. This section is called the email headers, and it contains the actual routing details for each incoming and outgoing message. Some of those details can be, and frequently are, forged by spammers. But others are not easily forged, including certain numeric entries that relate to the IP addresses of the email servers through which the message has passed.
So, without any further ado, let's look at a spam filter to block unwanted IP addresses.
The following header details came from a spam email foisting counterfeit sunglasses from China. I don't care to receive any email from China at this point in time and these spam messages for counterfeit goods never contain an unsubscribe link. I have replaced personally identifiable details with the word REDACTED.
Return-path: <[email protected]>
Envelope-to: REDACTED
Delivery-date: Sun, 09 Jul 2017 03:59:20 -0600
Received: from [47.94.42.47] (port=2712 helo=qq5.wolegequ.co)
    by REDACTED.com with esmtp (Exim 4.87)
    (envelope-from <[email protected]>)
    id 1dU8zo-002MYl-1T
    for REDACTED; Sun, 09 Jul 2017 03:59:20 -0600
Message-ID: <[email protected]>
From: "bxjoag" <[email protected]>
To: REDACTED
Subject: Ray Ban Sunglasses sale with 80% discount REDACTED
Date: Sun, 9 Jul 2017 17:59:08 +0800
MIME-Version: 1.0
Content-Type: text/html;
    charset="utf-8"
Content-Transfer-Encoding: base64
X-mailer: Pkx 4
In the above headers, the line beginning with "Received: from [47.94.42.47]" contains the IP address of the mail server that delivered the email to my hosted domain's email system. If I copy the numbers 47.94.42.47 and paste them into the IP input box at tcpiputils.com and submit it, the results show that the IP is registered to a company in Hangzhou, Zhejiang, China. Additionally, they reveal that this IP is part of a very large block of 262,144 addresses (262,142 of them usable as hosts), known as a CIDR block, ranging from 47.92.0.0 through 47.95.255.255, which is written in CIDR shorthand notation as 47.92.0.0/14. The entire block is assigned to China.
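If you would rather check the size and limits of a CIDR block yourself than rely on a lookup site, here is a minimal sketch. Python and its standard ipaddress module are my assumption for illustration only; they are not part of the cPanel filter. Note that Python counts every address in the block, including the network and broadcast addresses, so it reports 262,144, which is two more than the number of usable hosts.

```python
import ipaddress

# The /14 block reported for the spam source in the header above.
block = ipaddress.ip_network("47.92.0.0/14")

first_ip = block[0]            # lowest address in the block
last_ip = block[-1]            # highest address in the block
total = block.num_addresses    # every address, network and broadcast included

# Does the address from the "Received:" line fall inside the block?
spam_ip = ipaddress.ip_address("47.94.42.47")
inside = spam_ip in block

print(first_ip, "-", last_ip, "|", total, "addresses | contains spam IP:", inside)
```

The second number in the range (92 through 95) is the only part that varies at the /14 boundary, which is why a filter for the whole block only needs a small range on that one group of digits.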
Say you want to create a spam filter that will block just that one IP address. Here is how you would do that. First, you would create a new email filter by logging into your (Apache server based) hosting account, then following the click route to "cPanel > Email > Account Filtering > Email Filters > New Filter."
Select "All email addresses" then type in a name for the filter, like "Block Chinese IPs."
First Rule = Any Header > Matches Regex:
Received:\ from\ \[47\.94\.42\.47\]
Actions: Discard
Click on Create Filter and it should be saved to your Filters list.
If you want to block all 262,144 addresses from 47.92.0.0 through 47.95.255.255, you'll need slightly more advanced RegEx, thusly:
Received:\ from\ \[47\.9[2-5](\.\d{1,3}){2}\]
Here is an explanation of the two RegEx filters.
"Received:\ from\ " is the beginning of the pertinent line in the header. A backslash followed by a space (typed with the space key) is how one designates a literal blank space. Many RegEx interpreters will accept a space typed with the space key on its own, but some ignore unescaped spaces (for instance, when running in a free-spacing mode), and some may throw an error, ignoring all or part of your filter. Play it safe and escape your blank spaces with a \ .
Next, \[ and \] are how you escape the bracket characters if you need to use them literally. Since opening and closing brackets surround the IP address in the header, you must put a backslash before each of them. The reason you need to escape the brackets is that they also have a particular meaning to the RegEx interpreter. Unescaped brackets around numbers, letters, or other characters mean "match any one character from the set or range listed between the opening and closing brackets."
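The two escapes described so far (spaces and brackets) can be tried out with a short sketch. I am assuming Python's re module here purely as a stand-in; the engine behind your mail server's filters may behave differently, so treat this as an illustration rather than a test of cPanel itself.

```python
import re

header = "Received: from [47.94.42.47] (port=2712 helo=qq5.wolegequ.co)"

# 1. Spaces. In Python's default mode a plain space and an escaped space behave
#    the same, but in verbose (free-spacing) mode unescaped spaces are ignored.
escaped_space = r"Received:\ from\ \["
plain_space = r"Received: from \["
space_results = {
    "escaped, default": bool(re.search(escaped_space, header)),
    "plain, default": bool(re.search(plain_space, header)),
    "escaped, verbose": bool(re.search(escaped_space, header, re.VERBOSE)),
    "plain, verbose": bool(re.search(plain_space, header, re.VERBOSE)),
}

# 2. Brackets. Escaped brackets match the literal [ and ] in the header.
literal = re.search(r"\[47\.94\.42\.47\]", header)
# Unescaped brackets form a character class: ONE character out of 4 7 . 9 2
char_class = re.search(r"[47\.94\.42\.47]", header)

print(space_results)
print(literal.group())     # the whole bracketed address
print(char_class.group())  # only the first single character found from the set
```

In this sketch only the "plain, verbose" case fails to match, which is the kind of surprise that escaping your spaces avoids. The unescaped bracket version does match, but it grabs a single character instead of the full address, so it would fire on almost any header containing one of those digits.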
The numbers making up the single IP address are actual numbers found in the header, so they can be pasted in as found. But, the dots between groups of numbers have a meaning to the RegEx engine (an unescaped period matches any single character). To avoid mistaken matches, we must escape those dots (periods) with a leading backslash, like this: \.
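To see why the dots matter, here is a small sketch comparing the escaped and unescaped versions. Python's re module is my assumption for the demonstration, and the "fake" line is made-up text of my own, not something taken from a real email.

```python
import re

strict = r"\[47\.94\.42\.47\]"   # dots escaped: literal periods only
loose = r"\[47.94.42.47\]"       # dots unescaped: any character is accepted

real = "Received: from [47.94.42.47] (port=2712 helo=qq5.wolegequ.co)"
fake = "Received: from [47x94y42z47] (made-up text, not a real header)"

results = {
    "strict/real": bool(re.search(strict, real)),
    "strict/fake": bool(re.search(strict, fake)),
    "loose/real": bool(re.search(loose, real)),
    "loose/fake": bool(re.search(loose, fake)),
}
print(results)
```

Both versions catch the real header, but only the unescaped one also matches the nonsense line, because each bare dot happily stands in for x, y and z.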
Next, in the second filter that encompasses the entire /14 CIDR, the numbers making up a range are within brackets. Here again is that expression: 47\.9[2-5](\.\d{1,3}){2}
That range in our example is from 92 through 95, and is coded using: 9[2-5]. This covers the digit 9, followed by any one digit from 2 through 5. We could also write the actual digits inside the brackets as [2345], but I find the shorthand with a dash between the lowest and highest digits much easier to write.
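Before saving a range filter, it can be worth checking that it catches both edges of the block and nothing just outside it. The following is a rough sketch using Python's re module (my assumption, not what the mail server runs); is_blocked is a hypothetical helper that wraps each test address in a bare "Received: from [...]" line.

```python
import re

# The range filter from this article, compiled with Python's re module.
range_filter = re.compile(r"Received:\ from\ \[47\.9[2-5](\.\d{1,3}){2}\]")

def is_blocked(ip: str) -> bool:
    """Build a minimal 'Received: from [ip]' line and test it against the filter."""
    return range_filter.search(f"Received: from [{ip}]") is not None

# One address just below the block, three inside it, one just above it.
samples = ["47.91.255.255", "47.92.0.0", "47.94.42.47", "47.95.255.255", "47.96.0.0"]
verdicts = {ip: is_blocked(ip) for ip in samples}

for ip, blocked in verdicts.items():
    print(ip, "blocked" if blocked else "allowed")
```

Only the three addresses from 47.92.0.0 through 47.95.255.255 are blocked; the neighbors at 47.91 and 47.96 pass through.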
Finally, the expression (\.\d{1,3}){2} translates to "a dot, followed by one, two or three numeric digits, twice" - because \d means one numeric digit. The curly brackets hold a multiplier or multiplier range for whatever immediately precedes the left curly bracket. In {1,3} that is the \d, and in {2} it is the whole group inside the parentheses, which is therefore matched 2 times. The reason I used the shorthand of one to three digits is that in IPv4 addresses each group of numbers ranges from 0 through 255. Strictly speaking, \d{1,3} would also accept impossible values like 999, but those should never appear in a genuine "Received: from" line, so the shortcut is safe here. It is much simpler to write (\.\d{1,3}){2} than a strict long form such as: (\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){2}. One has 14 characters while the other has 40, and on real headers they accomplish the same results.
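The trade-off between a short form and a strict long form can be checked with a sketch like this one. Python's re module is my assumption, and the strict_octet pattern is simply one common way of writing "0 through 255", not necessarily the form you will see elsewhere. The address 47.94.999.1 is deliberately impossible.

```python
import re

# Short form: any one to three digits per group.
loose_tail = r"(\.\d{1,3}){2}"

# Strict form: 250-255, 200-249, 100-199, or 0-99.
strict_octet = r"(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
strict_tail = r"(\." + strict_octet + r"){2}"

loose = re.compile(r"\[47\.9[2-5]" + loose_tail + r"\]")
strict = re.compile(r"\[47\.9[2-5]" + strict_tail + r"\]")

samples = ["[47.94.42.47]", "[47.95.255.255]", "[47.94.999.1]"]
comparison = {
    s: (bool(loose.search(s)), bool(strict.search(s))) for s in samples
}

for text, (loose_hit, strict_hit) in comparison.items():
    print(text, "loose:", loose_hit, "strict:", strict_hit)
```

The two forms agree on every valid address and differ only on the impossible one, which is why the short form is good enough for filtering real mail headers.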
You can add more "Received: from" IP lines as add-on rules by using the + symbol on the right of the last rule and choosing the correct new conditions and expressions. Always save your existing filters before making additions, in case you make a mistake, or the changes don't take (it happens on some cPanel installations).
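Here is a rough sketch of how a filter with several add-on rules behaves, assuming the rules are joined with "or" so that any single match discards the message. Python's re module is again my assumption, should_discard is a hypothetical helper, and the second rule uses 203.0.113.x, an address range reserved for documentation, purely as a placeholder for whatever range you add next.

```python
import re

rules = [
    # The /14 range from this article.
    r"Received:\ from\ \[47\.9[2-5](\.\d{1,3}){2}\]",
    # Hypothetical second rule, using a documentation-only range as a placeholder.
    r"Received:\ from\ \[203\.0\.113\.\d{1,3}\]",
]
compiled = [re.compile(rule) for rule in rules]

def should_discard(headers: str) -> bool:
    """Return True if any one rule matches (rules assumed to be joined with 'or')."""
    return any(rule.search(headers) for rule in compiled)

checks = {
    "first rule": should_discard("Received: from [47.94.42.47] (port=2712)"),
    "second rule": should_discard("Received: from [203.0.113.9] (port=25)"),
    "no rule": should_discard("Received: from [198.51.100.7] (port=25)"),
}
print(checks)
```

Keeping each range as its own rule makes it easy to test or remove one without disturbing the others.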
What we've learned today
Regular expressions can be used to match numbers making up IP addresses of spamming mail servers.
Coming up in the next installment, I will show you how to combine numeric rules into one long line of code and how to safely edit them.
If you like this article please share it.
The content on this blog may be reprinted provided you do not modify the content and that you give credit to Wizcrafts and provide a link back to the blog home page, or individual blog articles you wish to reprint. Commercial use, or derivative work requires written permission from the author.