
Some Spam Karma Automation in Python

Being a WordPress blogger, unfortunately, I have to wade through tons and tons of comment spam. As of now, I have caught 18,119 comment spams since this blog started in September 2005. That's more than 1,000 spams per month! That's some serious spam.

The problem, though, is that some of it gets through the Spam Karma 2 filters and tests (the single best spam protection for WordPress that you can get). As a result, I have had to tighten some of the rules. Unfortunately, this causes some legitimate comments to be flagged as spam. If you have fallen victim to this, please contact me with the date and the post of the comment, and I'll retrieve it from the database.

Well, I can't have this happening. If it does, I need to catch it. So, in an effort to learn Python better (seeing as my job requires it), I set out to write a script that emails me the results of each day's spam at the end of the day. This proved to be a bit tricky, but thanks to the wonderful Python language, it was a lot of fun and not too bad. Once the script was written, I put it in a cron job that runs each night just before midnight.

First, let me mention that I know some of this code isn't as efficient as it could be (if you're an uber-Python hacker, please be gentle). I am a complete n00b to Python, so I am sure there are much better ways to get the results I am looking for. Still, it gets those results, it runs quickly, and I'm proud of it. At any rate, here is a sample output and here is the code:

#!/usr/bin/python
#
# Script details:
# Connects to your wordpress database, runs a query, returns the result, prints to a file, and emails you a notification
# Spam Karma 2 provides the ability to email you spam comments under a certain karma threshold.  Unfortunately, the
# email sent is not in a good format, and the email can be fairly large in size.  This script returns the comment spams
# into a comma-separated file left on the server, and just emails you a notification to review the file.
#
# To take the best advantage of this script, place in a directory on your server away from the web document root, then
# place into cron to run at the end of each day.  The cron syntax:
#
# 59 23 * * * ~/sk_mysql.py
#
# Author: Aaron Toponce
# Version: 0.3
# License: GPL v. 2

import time, MySQLdb, smtplib, re, csv
from datetime import datetime

# Function declarations
def strip_tags(value):
    value = re.sub(r'<[^>]*?>', '', value) # Returns the given HTML with all tags stripped
    value = re.sub(r'\[[^\]]*?\]', '', value) # Returns the given phpBB/BBCode with all [tags] stripped
    value = re.sub(r'[\r\n]+', ' ', value) # Replaces line breaks with a space so each comment stays on one line
    return value

# Database connection and query
Con = MySQLdb.Connect(host='127.0.0.1', port=3306, user='root', passwd='cs3210!', db='wordpress')
Cursor = Con.cursor()
sql = 'SELECT c.comment_ID, s.karma, c.comment_author, c.comment_author_email, c.comment_date, c.comment_content FROM wp_comments c, wp_sk2_spams s WHERE c.comment_ID = s.comment_ID AND c.comment_date like \'%%%s%%\' AND c.comment_approved = \'spam\' AND s.karma > -50 ORDER BY c.comment_ID'  % time.strftime('%Y-%m-%d')
Cursor.execute(sql)
Results = Cursor.fetchall()
Con.close()

# Creating a modifiable list
NewList1 = []
for i in Results:
    NewList2 = []
    for j in range(len(i)):
        if j == 5:
            NewList2.append(strip_tags(i[j]))
        elif j == 4:
            NewList2.append(i[j].strftime('%Y-%m-%d %H:%M:%S'))
        else:
            NewList2.append(i[j])
    NewList1.append(NewList2)

# Creating a .csv file for mailing
now = datetime.now()
outfile = '/var/www/pthree/mysql/%s.csv' % now.strftime('%Y-%m-%d')
with open(outfile, 'w') as f: # The with block closes (and flushes) the file before the email goes out
    output = csv.writer(f)
    output.writerow(['id', 'karma', 'author', 'email', 'date', 'content']) # Header row
    output.writerows(NewList1)

# Email to the user
server = smtplib.SMTP('localhost')
sender = 'root@localhost'
reply_to = 'root@localhost'
recipient = 'aaron@localhost'
subject = '[Pthree.org] MySQL wordpress spam comments results'
headers = 'From: %s\r\nReply-To: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n' % (sender, reply_to, recipient, subject)
text = '"http://www.pthree.org/mysql/%s.csv" is ready for review.' % now.strftime('%Y-%m-%d')
message = headers + text
server.sendmail(sender, recipient, message)
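One possible refinement of the query step, sketched under some assumptions: the table and column names are the ones the script above already uses, but `daily_spam_query` and `DAILY_SPAM_SQL` are hypothetical names, and the date filter is swapped from `LIKE '%YYYY-MM-DD%'` to a half-open range on `comment_date` (midnight today up to, but not including, midnight tomorrow). Passing the values as parameters lets MySQLdb, which uses `%s` placeholders, do the quoting instead of building the SQL by string formatting, and a range comparison can use an index on `comment_date` if one exists, where a leading-wildcard `LIKE` cannot. The explicit `JOIN ... ON` is equivalent to the comma join in the original query.

```python
from datetime import date, datetime, timedelta

# Same tables and columns as the script; the date filter is a half-open range.
# The %s markers are MySQLdb placeholders, filled in by cursor.execute().
DAILY_SPAM_SQL = (
    "SELECT c.comment_ID, s.karma, c.comment_author, c.comment_author_email, "
    "c.comment_date, c.comment_content "
    "FROM wp_comments c JOIN wp_sk2_spams s ON c.comment_ID = s.comment_ID "
    "WHERE c.comment_date >= %s AND c.comment_date < %s "
    "AND c.comment_approved = 'spam' AND s.karma > %s "
    "ORDER BY c.comment_ID"
)


def daily_spam_query(day, min_karma=-50):
    """Hypothetical helper: return (sql, params) covering [day 00:00, next day 00:00)."""
    start = datetime.combine(day, datetime.min.time())
    end = start + timedelta(days=1)
    return DAILY_SPAM_SQL, (start, end, min_karma)


sql, params = daily_spam_query(date.today())
# With a live connection, this replaces the formatted string:
#     Cursor.execute(sql, params)
```

Because the range is built from the day passed in, the same helper could report on a past day (say, if the cron job ever misses a night) without touching the SQL.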

UPDATE (11-Mar-2007): Realizing that Python has a CSV module, I added it to the code and removed the CSV formatting I was doing by hand. It removed 6 lines of code, and it seems to execute a bit quicker. Cheers!
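To illustrate what the csv module buys here (David Adam's point in the comments about escaping commas): `csv.writer` uses minimal quoting by default, so a field containing the delimiter is wrapped in double quotes and comment text with commas stays in one column. It also accepts a plain list for the header row, with no `csv.reader` round-trip needed. This is a minimal sketch writing to an in-memory buffer; `rows_to_csv` and the sample row are made up for illustration.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header row plus data rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)   # default quoting is csv.QUOTE_MINIMAL
    writer.writerow(header)    # a plain list is enough for the header
    writer.writerows(rows)
    return buf.getvalue()


header = ['id', 'karma', 'author', 'email', 'date', 'content']
rows = [[42, -12, 'bob', 'bob@example.com', '2007-03-11 09:30:00',
         'Cheap pills, cheap prices']]
print(rows_to_csv(header, rows))
```

The second output line ends in `"Cheap pills, cheap prices"`, quoted because of the embedded comma, so a spreadsheet or `csv.reader` reads it back as a single field.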

{ 10 } Comments

  1. Jeremy using Firefox 2.0.0.1 on Ubuntu | March 7, 2007 at 7:50 pm

    Doesn't Spam Karma 2 have a built-in spam digest plugin?

  2. MDL using Firefox 2.0.0.2 on Ubuntu | March 7, 2007 at 8:41 pm

    Is Spam Karma better than Akismet?

    Ever since I started using it (in place of challenge questions), my spam problem has been fairly minimal.

    Granted, I don't have nearly the spam that you do...

  3. Aaron using Firefox 2.0.0.2 on Ubuntu | March 7, 2007 at 8:46 pm

    Jeremy- It does have that option, and I should have mentioned it in the post. Unfortunately, it sends the report inline in the email, and the layout is not friendly. This script writes the query results to a comma-separated file, which I can view just as if I were looking at the database result itself. And the file is not emailed, but stored on a server.

    MDL- Yes. Spam Karma 2 is much better than Akismet. Akismet is good, but it does not have nearly the options that Spam Karma 2 does. I think you'll be pleasantly surprised.

  4. Karl Lattimer using Firefox 2.0.0.2 on Ubuntu | March 8, 2007 at 3:14 am

    Why is Spam Karma 2 the single best?

    I use Protect web forms captcha, I never get any spam.

    K,

  5. EvilDead using Firefox 2.0.0.2 on Ubuntu | March 8, 2007 at 5:01 am

    Hi Aaron,

    Just to let you know about list comprehension:

    NewList2 = [i[j] for j in range(4)]

    has the same effect as:

    NewList2 = []
    for j in range(4):
        NewList2.append(i[j])

    This is useful in Python; it helps you write clearer code if you need to use Python in your job :-)

  6. EvilDead using Firefox 2.0.0.2 on Ubuntu | March 8, 2007 at 5:04 am

    Oh, I forgot the strings. In Python, you may use either ' or " for strings, depending on your needs. So you can write this:

    text = "'%s' is ready for review." % outfile

    This produces the same result as your code without the escape characters.

  7. Aaron using Firefox 2.0.0.2 on Ubuntu | March 8, 2007 at 6:39 am

    Karl- Spam Karma 2 is the best because of the sheer amount of control that you have over your system. CAPTCHAs are good, and you may not be getting any spam through, but they aren't perfect, and when you're getting hit as heavily as I am, you need something better.

    EvilDead- Cool. Thanks for the tip. And I'm aware of the strings. :)

  8. Yoni using Firefox 2.0.0.2 on Windows XP | March 8, 2007 at 12:08 pm

    Just minor python style points -- you use capitalized variable names, which is generally frowned on, just as it is in Java. Classes are capped, variables aren't. So NewList1 should be something like "prettyResults" or something else meaningful.

    Also, you use full tabs; it's far more common to stick with four spaces (you can set your text editor to use spaces when you hit tab) unless you're maintaining all-tabbed legacy code. It makes for more consistent display across editors and platforms.

    For that matter, what are you using NewList1 for, anyways? You never read the value...

    not so much criticisms as style pointers; looks pretty good otherwise. Enjoy python!

  9. Mike using Firefox 2.0.0.2 on Mac OS | March 9, 2007 at 11:04 am

    good to see someone actually posting code to a blog.

  10. David Adam using Firefox 1.5.0.10 on Windows XP | March 9, 2007 at 8:27 pm

    Use the CSV module, Luke^WAaron!

    (It makes your code more readable and has the advantage of escaping commas for you.)
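Pulling together EvilDead's list-comprehension tip and Yoni's naming advice from the comments above, the row-cleanup loop in the script could be condensed along these lines. This is a sketch only: `clean_row`, `clean_results`, and the sample row are hypothetical, and it assumes each row arrives in the query's column order (id, karma, author, email, date, content), with the date as a `datetime` the way MySQLdb returns DATETIME columns. Unpacking the tuple by name replaces the `j == 4` / `j == 5` index checks.

```python
import re
from datetime import datetime


def strip_tags(value):
    """Remove HTML tags, BBCode-style [tags], and line breaks, as in the script."""
    value = re.sub(r'<[^>]*?>', '', value)      # HTML tags such as <b> and </b>
    value = re.sub(r'\[[^\]]*?\]', '', value)   # BBCode tags such as [url] and [/url]
    return re.sub(r'[\r\n]+', ' ', value)       # collapse line breaks into one space


def clean_row(row):
    """Return a list with the date (column 4) and content (column 5) formatted."""
    comment_id, karma, author, email, comment_date, content = row
    return [comment_id, karma, author, email,
            comment_date.strftime('%Y-%m-%d %H:%M:%S'),
            strip_tags(content)]


# Hypothetical sample row shaped like one result of Cursor.fetchall()
results = [(42, -12, 'bob', 'bob@example.com',
            datetime(2007, 3, 11, 9, 30), '<b>Buy</b> [url]now[/url]\r\nplease')]

clean_results = [clean_row(row) for row in results]
print(clean_results)
```

`clean_results` can then go straight to `writerows()` in place of `NewList1`.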
