Building the spam-toasting moderator bot

A general chat area: here you can post anything that doesn't belong in another forum.
Fleexy
Tool Smith
Posts: 1432
Joined: Fri Dec 12, 2008 1:21
Location: Abiathar C&C


Post by Fleexy »

There's been a little bit of discussion about the feasibility of automatic spam removal in the off-topic thread, but I wanted to make a new thread to organize discussion on it.

Basically, the idea is to have a robot with some moderator privileges that sweeps the forum every few minutes or so, runs some heuristics on new topics to determine whether they're spam, and, if they are, locks, deletes, or moves them. Possible specifics (open to discussion):
  • Every 10 minutes, download each subforum index
  • For each topic whose author's account was created less than a week ago and has not been confirmed as good, visit the topic
  • Check for the following suspicious things: links to external (non-PCKF, non-K:M) sites, angle brackets, lack of line breaks, all-lowercase username, several posts in quick succession (especially in the same subforum)
  • Approve accounts whose posts or profile contain: links to shikadi.net, an avatar, an e-mail address
  • Delete topics deemed spam, or just lock/move if in testing mode
  • Send a message to moderators containing heuristic evidence (why the bot thought it was spam) and the offending username
  • Send a message to the offending user explaining why their posts were removed, apologizing in case it was an error, and providing contact details for human admins
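The checks in the list above could be sketched roughly like this in Python (one of the languages floated later in the thread). It is a minimal illustration, not the bot's actual code: the `Account`/`Post` classes, the `TRUSTED_DOMAINS` entries (`pckf.com` standing in for PCKF, with the K:M host left for whoever fills it in), the 200-character "long post" cutoff and the "3 posts within 5 minutes" burst rule are all assumptions made for the example. It returns a list of reasons so the same strings can go into the message to moderators.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Assumed allow-list: "pckf.com" stands in for PCKF; add the K:M host as needed.
TRUSTED_DOMAINS = ("pckf.com", "shikadi.net")
URL_RE = re.compile(r"https?://[^\s<>\"'\[\]]+", re.IGNORECASE)


@dataclass
class Account:
    name: str
    registered: datetime
    has_avatar: bool = False
    has_email: bool = False        # public e-mail address on the profile
    confirmed_good: bool = False


@dataclass
class Post:
    body: str
    posted_at: datetime


def host_in(url, domain):
    """True if the URL's host is `domain` or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)


def needs_check(account, now, max_age=timedelta(days=7)):
    """Only accounts younger than a week and not yet confirmed are inspected."""
    return not account.confirmed_good and now - account.registered < max_age


def is_approved(account, posts):
    """Whitelist signals: an avatar, an e-mail address, or a link to shikadi.net."""
    if account.has_avatar or account.has_email:
        return True
    return any(host_in(u, "shikadi.net") for p in posts for u in URL_RE.findall(p.body))


def spam_evidence(account, posts, burst=timedelta(minutes=5), burst_count=3,
                  long_post=200):
    """Collect human-readable reasons to include in the report to moderators."""
    reasons = []
    for p in posts:
        external = [u for u in URL_RE.findall(p.body)
                    if not any(host_in(u, d) for d in TRUSTED_DOMAINS)]
        if external:
            reasons.append("external link(s): " + ", ".join(external))
        if "<" in p.body or ">" in p.body:
            reasons.append("angle brackets in post")
        if len(p.body) > long_post and "\n" not in p.body:
            reasons.append("long post without line breaks")
    if account.name.islower():
        reasons.append("all-lowercase username")
    # Sliding window over sorted timestamps: burst_count posts inside `burst`.
    times = sorted(p.posted_at for p in posts)
    for i in range(len(times) - burst_count + 1):
        if times[i + burst_count - 1] - times[i] <= burst:
            reasons.append(f"{burst_count} posts within {burst}")
            break
    return reasons


# Usage: a day-old account posting the same link three times in two minutes.
now = datetime(2015, 8, 4, 12, 0)
suspect = Account("cheapwatches", registered=now - timedelta(days=1))
posts = [Post("buy now http://spam.example/x", now - timedelta(minutes=m)) for m in (0, 1, 2)]
if needs_check(suspect, now) and not is_approved(suspect, posts):
    for reason in spam_evidence(suspect, posts):
        print(reason)
```

Returning reasons rather than a bare yes/no keeps the bot's decisions auditable, which matters when a false positive has to be explained to the affected user.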
I think I could develop such a program if it's wanted. Starting tomorrow, I have three weeks of almost complete free time. Some of the processing routines are already written; I used them in the forum downloader. Before I can go ahead, this is what I'm looking for:
  • Interest in the project, and admin approval
  • A new subforum, preferably hidden to other users, for testing and fake-spamming
  • Eventually: moderator privileges for the bot (I'll have it just log at first)
  • Eventually: a machine to run it on (will probably require .NET, but that can be used on Linux now!)
I'm open to suggestions for its operating procedures and general discussion on forum automation.
namida
Vortininja
Posts: 118
Joined: Mon Jun 22, 2015 1:35

Post by namida »

I'd strongly advise against outright deleting posts; users sometimes post something important (which may be falsely flagged as spam) that they have no other copy of and can't be bothered typing out again.

Rather, my suggestion would be:
- Topics: Move them to a hidden board. That way, a mod/admin can move them back if it's a false positive.
- Posts: Delete (or maybe just blank? - this way, it can easily be restored to the same position) the post, but save a copy of its contents, perhaps in a dedicated thread for this purpose in the same hidden board.

That is assuming the forum software doesn't have some kind of "hide post" or "undelete" option.
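A rough Python sketch of that "never delete outright" policy, assuming the bot can tell whether a flagged post opened its topic. `TRASH_FORUM_ID`, `Flagged`, `Quarantine` and the in-memory `archive` are hypothetical names; a real bot would post the saved copies to a dedicated thread as suggested above rather than keep them in a dict. Testing mode only logs, in line with the "just log at first" plan from the opening post.

```python
from dataclasses import dataclass, field
from enum import Enum

TRASH_FORUM_ID = 99  # hypothetical id of the locked "Trash" subforum


class Action(Enum):
    LOG_ONLY = "log only (testing mode)"
    MOVE_TOPIC = "move topic to trash forum"
    BLANK_POST = "blank post, keep a copy"


@dataclass
class Flagged:
    topic_id: int
    post_id: int
    opens_topic: bool   # True if this post started the topic
    body: str


@dataclass
class Quarantine:
    testing: bool = True
    archive: dict = field(default_factory=dict)   # post_id -> original text

    def plan(self, item):
        """Choose a reversible action; nothing is ever deleted outright."""
        if self.testing:
            return Action.LOG_ONLY, None
        if item.opens_topic:
            # Whole topic goes to the trash forum, where a mod can move it back.
            return Action.MOVE_TOPIC, TRASH_FORUM_ID
        # A reply is blanked in place, so it can be restored to the same position.
        self.archive[item.post_id] = item.body
        return Action.BLANK_POST, None

    def restore(self, post_id):
        """Return the saved text so a moderator can put a false positive back."""
        return self.archive.pop(post_id)
```

Keeping the decision separate from the forum calls that carry it out makes it easy to run the bot in log-only mode first and switch on real moves later.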
Levellass
S-Triazine
Posts: 5265
Joined: Tue Sep 23, 2008 6:40

Post by Levellass »

I agree; if we build this thing we want to limit its scope. What assurance do we have that our universe is safe?
What you really need, not what you think you ought to want.
Flaose
Vorticon Elder
Posts: 568
Joined: Sat Oct 27, 2007 20:30
Location: The Frozen Hell

Post by Flaose »

Unfortunately we can't make a hidden subforum (as far as I can tell) with this version of phpBB. However, we can make a locked Trash subforum that suspected spam can be moved to. Since inbox space on the forum is so limited, I am also a little concerned about the inbox filling up and losing the ability to receive bot notifications.
Cerebral Cortex 314 - For All of your Commander Keen Needs.
Eat at Joe's
KeenEmpire
Intellectuality
Posts: 855
Joined: Thu Nov 01, 2007 0:38

Post by KeenEmpire »

.NET seems like a huge dependency, and though parts of it have been open-sourced, I haven't heard anything about people using it on Linux. Is it really necessary? You don't want to be stuck with needing a Windows Server (like I've experienced for a previous project).

As long as it doesn't require unique libraries, it should be doable in a cross-platform language like python.
"In order to ensure our security, and continuing stability, the Kingdom has been reorganized into the First Vorticon Intellectuality!"
namida
Vortininja
Posts: 118
Joined: Mon Jun 22, 2015 1:35

Post by namida »

Flaose wrote:Unfortunately we can't make a hidden subforum (as far as I can tell) with this version of phpBB. However, we can make a locked Trash subforum that suspected spam can be moved to.
I don't know exactly how phpBB works (I'm an SMF user myself), but shouldn't it be possible to simply configure the forum's permissions so that only Moderators and Administrators (and perhaps specifically-approved regular users) can view its contents? Even if it doesn't completely hide the forum from view, this should at least mean regular users can't see its contents - and since they'll most likely be able to work out that it exists anyway, at most they might get an indication of how many topics are in it. Anyway, isn't it possible to upgrade to a newer version? (I would assume that - provided there's an admin around with direct access to the database - you could always create a second copy, perhaps on a separate server to avoid damage to the existing board, to test the upgrade process before actually applying it to the site.)

As for the language used to code it: keep in mind that just because a language exists doesn't mean someone will know how to use it. I'm sure I could make something similar in Delphi if I had a need to, but I don't know the first thing about Python (beyond that it exists), for example.
KeenEmpire
Intellectuality
Posts: 855
Joined: Thu Nov 01, 2007 0:38

Post by KeenEmpire »

Python is just an example - though, as languages go, a relatively straightforward one. My concern is more with the first paragraph.

A few years ago, we wanted to use and customize a C# project. It was pretty much a nonstarter. The features used did not work in Mono, and the only alternative was to pay for a Windows server just to run the project. A program is not useful if it comes with too many shackles.
Last edited by KeenEmpire on Tue Aug 04, 2015 11:38, edited 1 time in total.
namida
Vortininja
Posts: 118
Joined: Mon Jun 22, 2015 1:35

Post by namida »

Yes - but the point here is, Fleexy may only be familiar with .NET coding. In which case, unless someone else develops it, it may have to run on .NET. That being said, isn't there a compatible framework for Linux? (The name "Mono" rings a bell here...)
KeenEmpire
Intellectuality
Posts: 855
Joined: Thu Nov 01, 2007 0:38

Post by KeenEmpire »

As always, it's up to whoever steps up to the plate. However, Fleexy has indicated some free time, so it might be a good opportunity to pick up a more cross-platform language as well. Additional languages are generally easier to learn, and on top of that python is often seen as one of the better teaching languages. Plus, it'd be another thing to add to the résumé ;)
Keening_Product
Kuliwho?
Posts: 2167
Joined: Fri Jan 20, 2012 7:02
Location: Tied up in the Oracle Chamber's basement

Post by Keening_Product »

Radical idea: Fleexy sets up a server in his free weeks, hosts the forum, upgrades phpBB and then doesn't need to make the bot 8)
Keening_Product was defeated before the game.

"Wise words. One day I may even understand what they mean." - Levellass
lemm
Blorb
Posts: 696
Joined: Fri Jul 03, 2009 10:18
Location: canada lol

Post by lemm »

Keening_Product wrote:Radical idea: Fleexy sets up a server in his free weeks, hosts the forum, upgrades phpBB and then doesn't need to make the bot 8)
Yeah, why are we still using decade-old PHPBB, when we could be using Simple Machines?
namida
Vortininja
Posts: 118
Joined: Mon Jun 22, 2015 1:35

Post by namida »

lemm wrote:
Keening_Product wrote:Radical idea: Fleexy sets up a server in his free weeks, hosts the forum, upgrades phpBB and then doesn't need to make the bot 8)
Yeah, why are we still using decade-old PHPBB, when we could be using Simple Machines?
I'd think actually changing software (as opposed to just upgrading to a new version) would be quite a hassle, unless the idea was to simply archive the old boards and start fresh. That being said, importing the old content as actual content can be done if you're determined enough - the current iteration of Lemmings Forums imported the posts from three past iterations, which involved two different forum software packages, and only one of these had a proper SQL backup available.

I'd be happy to help with such an effort if need be; either in a hands-on capacity, or just giving advice (although given my history here, the staff might prefer to limit it to advice). I'd offer the source code of the various code snippets I wrote for the Lemmings Forums job, but I'm not sure if I still have them anymore...
Levellass
S-Triazine
Posts: 5265
Joined: Tue Sep 23, 2008 6:40

Post by Levellass »

*Sits back and watches the strange words float by*
Fleexy
Tool Smith
Posts: 1432
Joined: Fri Dec 12, 2008 1:21
Location: Abiathar C&C

Post by Fleexy »

Yeah, upgrading the forum would definitely be a nicer option; we just need access to the server/database. I suppose we could just register a new domain and move over there, leaving this place behind, but then we risk splitting into two communities, or having to constantly cross-post content. That could be solved by locking every subforum here and leaving stickies with signposts to the new place, but I digress.

On programming language: .NET is my main thing, but Java would also be doable if there's a decent HTML parsing library for it. (For the scraper, I used .NET and the HTML Agility Pack, which is awesome.) .NET is actually pretty cross-platform now, what with its core libraries and ASP.NET being open-source. Since it would just be a console program, there wouldn't be any Win32 emulation issues (which we still have with Windows Forms applications). Of course, I would also be open to learning and doing it in Python; it's something I have been sort of poking at for a bit.
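For comparison, Python's standard-library `html.parser` can play roughly the role the HTML Agility Pack plays in .NET. The sketch below pulls topic ids and titles out of a subforum index page. It assumes phpBB marks topic links with `class="topictitle"` and an `href` containing `t=<id>`; that should be checked against this board's actual markup before relying on it. Fetching the page (for example with `urllib.request.urlopen`) is left out.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class TopicListParser(HTMLParser):
    """Collect (topic_id, title) pairs from a subforum index page.

    Assumes topic links carry class="topictitle" and a "t=<id>" query
    parameter; verify against the real HTML before relying on it.
    """

    def __init__(self):
        super().__init__()
        self.topics = []
        self._topic_id = None   # id of the topic link we're currently inside
        self._title = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)     # attribute values arrive already unescaped
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "topictitle" in classes:
            query = parse_qs(urlparse(attrs.get("href") or "").query)
            if "t" in query:
                self._topic_id = int(query["t"][0])
                self._title = []

    def handle_data(self, data):
        if self._topic_id is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._topic_id is not None:
            self.topics.append((self._topic_id, "".join(self._title).strip()))
            self._topic_id = None


def parse_topics(html):
    """Parse one index page and return its topic list."""
    parser = TopicListParser()
    parser.feed(html)
    parser.close()
    return parser.topics
```

The topic ids returned here are what the bot would then visit when an author looks new enough to need checking.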

As people have mentioned, it's probably a good idea to not insta-delete suspected spam in case a legitimate user posts something that looks like spam (e.g. they have terrible syntax and capitalization habits). I'm also not sure if it's a good idea to lock threads automatically, since this version of phpBB has a security vulnerability that effectively gives administrative power to the owner of a locked post that includes an image. Yay PHP! (OK, so maybe it's not quite so dramatic or deterministic, but it is a serious issue.) It's probably better to move them to a different forum...

I see the Trash Can subforum has been created, so I guess the administration is on board with this? I've created an account called FleexBot for the bot to eventually use; I'm open to other names if people don't like that one. Though I did notice when registering that new accounts have to be activated by administrators; are spammers getting accidentally approved? If we had a bit of a process (maybe require a quick personal e-mail to a board admin) before allowing accounts to post, an automated system wouldn't really be necessary. Though there are probably other interesting services a mod bot could provide, e.g. welcome messages to new users, fixups to BBCode links, moderator elections, whatever else you can think of.
namida
Vortininja
Posts: 118
Joined: Mon Jun 22, 2015 1:35

Post by namida »

Fleexy wrote:Yeah, upgrading the forum would definitely be a nicer option; we just need access to the server/database.
The admins don't have this? That's a bit messy.

This is actually exactly like the case with Lemmings Forums in that regard (except that the existing setup was starting to fall apart - little capacity to deal with spambots aside from deleting their posts; if the site went down our only option was to wait and hope it came back up; etc). Of the three admins, two had gone completely AWOL, and the third didn't have server / database access.

In the end, we did go with the move to a new server - in this case it was a fairly small community, making it easier to get everyone on board (and we were at least able to disable posting and put notification topics up) - and I rigged up some very kludgy but ultimately functional code to process HTML backups of all the posts and convert them into SQL files to import to the new site's database. Needless to say, I've ensured on the new site that all admins have server and database access (even if they don't fully know how to use it - you can ask for help if needed on how to use server access, but you're up garg creek without a paddle if no one who can give you that access is around), and even the moderators at least have access to a full backup of the site's files and database (minus the database password). To minimize the need to remember a new URL, the new site was "lemmingsforums.net" (the old one being "lemmingsforums.com"). There are limitations without database access - for example, there's no way to know who's voted in a poll, so polls either have to be imported read-only, not imported at all, or the votes reset to zero - but ultimately, given the circumstances, I think it went well. We also had access to a very old database backup which allowed us to at least import some of the user accounts, although many people did need to re-register on the new site; but that's arguably a minor issue.
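The importer code itself isn't shown, but the last step described, turning scraped posts into an SQL file, might look something like this minimal Python sketch. The table and column names are placeholders, not the real phpBB or SMF schema, and the escaping assumes MySQL's default mode, where backslash is an escape character. When a live database connection is available, parameterized queries through a DB-API driver avoid hand-escaping entirely.

```python
def sql_literal(value):
    """Render a value as a MySQL literal.

    Assumes MySQL's default mode, where backslash is an escape character,
    so backslashes are doubled as well as single quotes.
    """
    if value is None:
        return "NULL"
    if isinstance(value, bool):          # check bool before int (bool is an int)
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    # Escape backslashes first so the quote-doubling isn't affected.
    text = str(value).replace("\\", "\\\\").replace("'", "''")
    return "'" + text + "'"


def insert_statement(table, row):
    """One INSERT per scraped post; `row` maps column names to values."""
    columns = ", ".join(row)
    values = ", ".join(sql_literal(v) for v in row.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


def write_sql_file(path, table, rows):
    """Write all rows as INSERT statements, one per line, ready for import."""
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(insert_statement(table, row) + "\n")
```

For example, `insert_statement("posts", {"post_id": 1, "poster": "namida", "post_text": "It's here"})` yields `INSERT INTO posts (post_id, poster, post_text) VALUES (1, 'namida', 'It''s here');`.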

A similar process would likely be feasible here (though I'd have to look at the HTML source of a page to work out how exactly a similar importer could be made). There's a much larger community, but at a quick glance it wouldn't seem we're dealing with too much more in terms of number of posts (and any difference would be negated when we consider that in addition to the at-the-time current iteration of Lemmings Forums, we also imported posts from two older iterations for which backups existed).

Of course, given the need to (as much as possible) shut down the existing site, the staff of the current site would need to be on-board with any such idea; otherwise indeed, a split community is likely to result (either that or the new site fails altogether); and at least one admin would need to be available to assist with it (or give admin powers to a user who's conducting the move). Thus, I would not recommend trying such an option without having at least the staff - and ideally, a decent portion of the community - on board with it.