## Some Spam Karma Automation in Python

Being a WordPress blogger, I unfortunately have to wade through tons and tons of comment spam. As of now, I have caught 18,119 spam comments since the inception of this blog in September 2005. That's more than 1,000 spams per month! That's some serious spam.

The problem is that some of it is getting through the filters and tests of Spam Karma 2 (the single best spam protection that you can get for WordPress). As such, I have had to tighten some of the rules. Unfortunately, this causes some legit comments to get flagged as spam. If you have fallen victim to this, please contact me with the date and the post of the comment, and I'll retrieve it from the database.

Well, I can't have this happening, and if it does, I need to catch it. So, in an effort to learn Python better (seeing as my job requires it), I set out to write a script that collects each day's spam and emails me a notification at the end of the day. This proved to be a bit tricky, but thanks to the wonderful Python language, it was a lot of fun, and not too bad. With the script written, I threw it in a cron job that runs each night just before the new day begins.

First, let me mention that I know some of this code isn't as efficient as it could be (if you're an uber-Python hacker, please be gentle). The problem is, I am a complete n00b to Python, so I am sure that there are much better ways to achieve the results that I am looking for. Still, it runs quickly, and I'm proud of it. In any event, here is the code:

    #!/usr/bin/python
    #
    # Script details:
    # Connects to your wordpress database, runs a query, returns the result, prints to a file, and emails you a notification
    # Spam Karma 2 provides the ability to email you spam comments under a certain karma threshold.  Unfortunately, the
    # email sent is not in a good format, and the email can be fairly large in size.  This script returns the comment spams
    # into a comma-separated file left on the server, and just emails you a notification to review the file.
    #
    # To take the best advantage of this script, place in a directory on your server away from the web document root, then
    # place into cron to run at the end of each day.  The cron syntax:
    #
    # 59 23 * * * ~/sk_mysql.py
    #
    # Author: Aaron Toponce
    # Version: 0.3
    # License: GPL v. 2

    import time, MySQLdb, smtplib, re, csv
    from datetime import datetime

    # Function declarations
    def strip_tags(value):
        value = re.sub(r'<[^>]*?>', '', value) # Returns the given HTML with all tags stripped
        value = re.sub(r'\[[^\]]*?\]', '', value) # Returns the given phpBB with all tags stripped
        value = re.sub(r'[\r\n\$]*?\n','',value) # Returns the given text with no newlines
        return value

    # Database connection and query
    Con = MySQLdb.Connect(host='127.0.0.1', port=3306, user='root', passwd='cs3210!', db='wordpress')
    Cursor = Con.cursor()
    sql = 'SELECT c.comment_ID, s.karma, c.comment_author, c.comment_author_email, c.comment_date, c.comment_content FROM wp_comments c, wp_sk2_spams s WHERE c.comment_ID = s.comment_ID AND c.comment_date like \'%%%s%%\' AND c.comment_approved = \'spam\' AND s.karma > -50 ORDER BY c.comment_ID'  % time.strftime('%Y-%m-%d')
    Cursor.execute(sql)
    Results = Cursor.fetchall()
    Con.close()

    # Creating a modifiable list
    NewList1 = []
    for i in Results:
        NewList2 = []
        for j in range(len(i)):
            if j == 5:
                NewList2.append(strip_tags(i[j]))
            elif j == 4:
                NewList2.append(i[j].strftime('%Y-%m-%d %H:%M:%S'))
            else:
                NewList2.append(i[j])
        NewList1.append(NewList2)

    # Creating a .csv file for mailing
    now = datetime.now()
    outfile = '/var/www/pthree/mysql/%s.csv' % now.strftime('%Y-%m-%d')
    output = csv.writer(open(outfile, 'w'))
    for item in csv.reader(['id,karma,author,email,date,content']):
        output.writerow(item)
    for item in NewList1:
        output.writerow(item)

    # Email to the user
    server = smtplib.SMTP('localhost')
    sender = 'root@localhost'
    reply_to = 'root@localhost'
    recipient = 'aaron@localhost'
    subject = '[Pthree.org] MySQL wordpress spam comments results'
    headers = 'From: %s\r\nReply-To: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n' % (sender, reply_to, recipient, subject)
    text = '"http://www.pthree.org/mysql/%s.csv" is ready for review.' % now.strftime('%Y-%m-%d')
    message = headers + text
    server.sendmail(sender, recipient, message)
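As a rough illustration of the clean-up step in the script, the sketch below runs a `strip_tags()`-style function over a made-up comment. The two tag patterns are an assumption about what the script intends (dropping `<...>` HTML tags and `[...]` phpBB tags), and the sample text is hypothetical.

```python
import re

def strip_tags(value):
    # Assumed patterns: the goal is to drop HTML tags, phpBB tags, and newlines
    value = re.sub(r'<[^>]*?>', '', value)       # HTML tags such as <b> or </b>
    value = re.sub(r'\[[^\]]*?\]', '', value)    # phpBB tags such as [url=...] or [/url]
    value = re.sub(r'[\r\n\$]*?\n', '', value)   # line breaks (\n and \r\n)
    return value

# Hypothetical spam comment, invented for this example
sample = ('<b>Cheap pills</b> at \r\n'
          '[url=http://example.com/]example.com[/url] \n'
          'Buy now!')

cleaned = strip_tags(sample)
print(cleaned)
```

Note that line breaks are removed rather than replaced, so words on either side of a break run together unless the text already has a space there. That keeps each comment on a single line of the CSV file.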

UPDATE (11-Mar-2007): Realizing that Python has a CSV module, I added it to the code and removed what I had been typing up by hand. Not only did that remove 6 lines of code, it also seems to execute a bit quicker. Cheers!
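One practical benefit of the `csv` module over building lines by hand is that it quotes fields containing commas or double quotes. The sketch below shows this with made-up rows shaped like the query result; the data is hypothetical, and an in-memory buffer stands in for the real output file.

```python
import csv
import io

# Hypothetical rows: id, karma, author, email, date, content
rows = [
    [101, -12, 'spammer', 'spam@example.com', '2007-03-11 04:15:00', 'Buy now, cheap!'],
    [102, -3, 'bot', 'bot@example.com', '2007-03-11 09:30:00', 'He said "click here"'],
]

buffer = io.StringIO()  # stand-in for the .csv file on disk
writer = csv.writer(buffer)
writer.writerow(['id', 'karma', 'author', 'email', 'date', 'content'])  # header row
writer.writerows(rows)                                                  # data rows

print(buffer.getvalue())
```

Reading the buffer back with `csv.reader` returns the original field values as strings, which would not be guaranteed if the commas and quotes had been written out unescaped.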

1. March 7, 2007 at 7:50 pm

Doesn't Spam Karma 2 have a built-in spam digest plugin?

2. March 7, 2007 at 8:41 pm

Is Spam Karma better than Akismet?

Ever since I started using it (in place of challenge questions), my spam problem has been fairly minimal.

Granted, I don't have nearly the spam that you do...

3. March 7, 2007 at 8:46 pm

Jeremy- It does have that option, and I should specify it in the post. Unfortunately, it sends the report inline in the email, and the layout is not friendly. This script writes the query result to a comma-separated file, which I can view just as if I were looking at the database result itself. Also, the file is not emailed, but stored on the server.

MDL- Yes. Spam Karma 2 is much better than Akismet. Akismet is good, but does not have nearly the options that Spam Karma 2 does. I think you'll be pleasantly surprised.

4. March 8, 2007 at 3:14 am

Why is Spam Karma 2 the single best?

I use Protect web forms captcha, I never get any spam.

K,

5. EvilDead | March 8, 2007 at 5:01 am

Hi Aaron,

Just to let you know about list comprehension:

    NewList2 = [i[j] for j in range(4)]

has the same effect as:

    NewList2 = []
    for j in range(4):
        NewList2.append(i[j])

This is useful in Python; you may need it to write clearer code if you need to use Python in your job.
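Applied to the script's per-column clean-up, the same idea might look like the sketch below. It is only an illustration: `clean_field` is a hypothetical helper, the `replace()` call is a stand-in for the script's `strip_tags()`, and the sample row is made up.

```python
from datetime import datetime

def clean_field(index, field):
    # Column 4 is the comment date and column 5 the comment body, as in the script
    if index == 4:
        return field.strftime('%Y-%m-%d %H:%M:%S')
    if index == 5:
        return field.replace('\n', ' ')  # stand-in for strip_tags()
    return field

# Hypothetical query result with a single row
Results = [
    (7, -20, 'bot', 'bot@example.com', datetime(2007, 3, 8, 5, 1, 0), 'line one\nline two'),
]

# One nested list comprehension replaces both for loops
NewList1 = [[clean_field(j, field) for j, field in enumerate(i)] for i in Results]
print(NewList1)
```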

6. EvilDead | March 8, 2007 at 5:04 am

Oh, I forgot the strings. In Python, you may use either ' or " for strings, depending on your needs. So you can write this:

    text = "'%s' is ready for review." % outfile

This produces the same result as your code without the escape characters.

7. March 8, 2007 at 6:39 am

Karl- Spam Karma 2 is the best because of the sheer amount of control that you have over your system. CAPTCHAs are good, and you may not be getting any spam through, but they aren't perfect, and when you're getting hit as heavily as I am, you need something better.

EvilDead- Cool. Thanks for the tip. And I'm aware of the strings.

8. Yoni | March 8, 2007 at 12:08 pm

Just minor python style points -- you use capitalized variable names, which is generally frowned on, just as it is in Java. Classes are capped, variables aren't. So NewList1 should be something like "prettyResults" or something else meaningful.

Also, you use full tabs; it's far more common to stick with four spaces (you can set your text editor to use spaces when you hit tab) unless you're maintaining all-tabbed legacy code. It makes for more consistent display across editors and platforms.

For that matter, what are you using NewList1 for, anyways? You never read the value...

not so much criticisms as style pointers; looks pretty good otherwise. Enjoy python!

9. Mike | March 9, 2007 at 11:04 am

good to see someone actually posting code to a blog.

10. March 9, 2007 at 8:27 pm

Use the CSV module, Luke^WAaron!