Being a WordPress blogger, unfortunately, I have to wade through tons and tons of comment spam. To date, I have caught 18,119 spam comments since the inception of this blog in September 2005. That's more than 1,000 spam comments per month! That's some serious spam.
The problem, though, is that some of it is getting past the Spam Karma 2 filters and tests (the single best spam protection for WordPress that you can get). As such, I have had to tighten some of the rules. Unfortunately, this causes some legitimate comments to get flagged as spam. If you have fallen victim to this, please contact me with the date and the post of the comment, and I'll retrieve it from the database.
Well, I can't have this happening. If it does, I need to catch it. So, in an effort to learn Python better (seeing as my job requires it), I set out to write a script that emails me the results of each day's spam at the end of the day. This proved to be a bit tricky, but thanks to the wonderful Python language, it was a lot of fun, and not too bad. With the script written, I threw it in a cron job that runs each night just before the beginning of the new day.
First, let me mention that I know some of this code isn't as efficient as it could be (if you're an uber-Python hacker, please be gentle). The problem is, I am a complete n00b to Python. As such, I am sure that there are much better ways to achieve the results that I am looking for. However, it runs quickly, and I'm proud of it. At any rate, here is a sample output and here is the code:
# Script details:
# Connects to your WordPress database, runs a query, writes the results to a file, and emails you a notification.
# Spam Karma 2 provides the ability to email you spam comments under a certain karma threshold. Unfortunately, the
# email sent is not in a good format, and the email can be fairly large in size. This script writes the spam comments
# to a comma-separated file left on the server, and just emails you a notification to review the file.
# To take the best advantage of this script, place it in a directory on your server away from the web document root, then
# add it to cron to run at the end of each day. The cron syntax:
# 59 23 * * * ~/sk_mysql.py
# Author: Aaron Toponce
# Version: 0.3
# License: GPL v. 2
import time, MySQLdb, smtplib, re, csv
from datetime import datetime
# Function declarations
def strip_html(value):
    return re.sub(r'<[^>]*?>', '', value) # Returns the given HTML with all tags stripped
def strip_bbcode(value):
    return re.sub(r'\[[^\]]*?\]', '', value) # Returns the given phpBB with all tags stripped
def strip_newlines(value):
    return re.sub(r'[\r\n]+', '', value) # Returns the given text with no newlines
# Database connection and query
Con = MySQLdb.Connect(host='127.0.0.1', port=3306, user='root', passwd='cs3210!', db='wordpress')
Cursor = Con.cursor()
sql = 'SELECT c.comment_ID, s.karma, c.comment_author, c.comment_author_email, c.comment_date, c.comment_content FROM wp_comments c, wp_sk2_spams s WHERE c.comment_ID = s.comment_ID AND c.comment_date like \'%%%s%%\' AND c.comment_approved = \'spam\' AND s.karma > -50 ORDER BY c.comment_ID' % time.strftime('%Y-%m-%d')
Cursor.execute(sql)
Results = Cursor.fetchall()
# Creating a modifiable list
NewList1 = []
for i in Results:
    NewList2 = []
    for j in range(len(i)):
        if j == 5:
            # comment_content: strip HTML, phpBB tags, and newlines
            NewList2.append(strip_newlines(strip_bbcode(strip_html(i[j]))))
        elif j == 4:
            # comment_date: convert the datetime value to a string
            NewList2.append(str(i[j]))
        else:
            NewList2.append(i[j])
    NewList1.append(NewList2)
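# Creating a .csv file for mailing
now = datetime.now()
outfile = '/var/www/pthree/mysql/%s.csv' % now.strftime('%Y-%m-%d')
csvfile = open(outfile, 'w')
output = csv.writer(csvfile)
for item in csv.reader(['id,karma,author,email,date,content']):
    output.writerow(item) # Header row
for item in NewList1:
    output.writerow(item) # One row per spam comment
csvfile.close() # Make sure the file is complete before the notification goes out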
# Email to the user
server = smtplib.SMTP('localhost')
sender = 'root@localhost'
reply_to = 'root@localhost'
recipient = 'aaron@localhost'
subject = '[Pthree.org] MySQL wordpress spam comments results'
headers = 'From: %s\r\nReply-To: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n' % (sender, reply_to, recipient, subject)
text = '"http://www.pthree.org/mysql/%s.csv" is ready for review.' % now.strftime('%Y-%m-%d')
message = headers + text
server.sendmail(sender, recipient, message)
server.quit()
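For anyone curious what the cleanup step does to a comment before it is written to the file, here is a minimal, standalone sketch. The name `clean_comment` and the sample text are made up for illustration, and it assumes Python 3. Unlike the script, it replaces line breaks with a space rather than removing them, so that words on adjacent lines don't run together.

```python
import re

def clean_comment(text):
    """Strip HTML tags, BBCode tags, and line breaks from a comment body."""
    text = re.sub(r'<[^>]*?>', '', text)     # HTML tags such as <a href="...">
    text = re.sub(r'\[[^\]]*?\]', '', text)  # BBCode tags such as [url=...]
    text = re.sub(r'[\r\n]+', ' ', text)     # runs of line breaks become one space
    return text.strip()

# Hypothetical spam comment mixing HTML, BBCode, and a Windows line break
sample = '<a href="http://example.com">Cheap pills</a>\r\n[url=http://example.com]click[/url]'
print(clean_comment(sample))
```

Keeping each comment on a single line makes the resulting file easier to skim, which is the whole point of reviewing it by hand.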
UPDATE (11-Mar-2007): Realizing that Python has a CSV module, I added it to the code and removed what I had been typing up by hand. That cut 6 lines of code, and the script seems to execute a bit quicker. Cheers!
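To show why the csv module beats typing the commas and quotes by hand, here is a small sketch. It assumes Python 3 and uses an invented sample row (not real data from this blog), writing to an in-memory buffer instead of a file on the server. The module takes care of quoting fields that contain commas or double quotes, which spam comments are full of.

```python
import csv
import io

# Invented sample row: id, karma, author, email, date, content
rows = [
    [101, -12.5, 'Bob "the spammer"', 'bob@example.com',
     '2007-03-11 10:15:00', 'Buy now, pay later'],
]

buffer = io.StringIO()  # stands in for the .csv file on disk
writer = csv.writer(buffer)
writer.writerow(['id', 'karma', 'author', 'email', 'date', 'content'])
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original fields (as strings)
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed[1][5])
```

A hand-rolled `','.join(...)` would have split "Buy now, pay later" into two columns, so letting the module do the quoting is the safer choice.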