Background
I get a lot of email. I am on many mailing lists and I do a lot of volunteer work for the SCA, which means even more email. I also get a lot of bacn: catalogs and newsletters that I signed up for and don't mind getting. I get notifications from FriendFeed, Facebook, Flickr, Twitter, and other online applications, which is also bacn, I suppose. Of course, in addition to all that, I get spam. If you want an indication of how much spam I get, check this out. Even though Gmail claims to clear anything older than thirty days out of my spam folder, I have spam in there right now going back to Feb 25th, 2009, a total of 19,060 items. Since today is 4/25/2009, that works out to roughly 9,500 spam messages per month (19,060 across two months). All in all, it's fairly easy to estimate that at least 10,000 items come to me every month, yet I haven't gone insane. Why? Well, Google does a decent job of catching much of the spam, but I have something else in my pocket called POPFile.
POPFile has been around for a while (their download archive page puts the first release in October of 2002) and I first heard about it on The Screen Savers about 6 years ago. So you might understand why I was yelling at Leo Laporte in my car while listening to a recent podcast in which he said that he had too many emails in his inbox. He was the one who introduced me to POPFile! POPFile is not just a spam killer; it classifies your email into any number of buckets that you want. It took care of my bacn before there was a term for bacn. It was able to distinguish my SCA email from my personal email, and I could categorize responses from forums I participated in separately from newsletters and other notifications.
I had changed my email address a couple of times because it was just awash in spam. With POPFile I was able to use my jeffmartin.com address without worrying about my 5% signal getting buried under the 95% noise. This worked great for about 6 years because I was using a POP email account and the Outlook email client. But then I got my G1 Android phone, and it had Gmail built in. I wanted to join the crowd and get my email on my phone, but I was stuck on the POP account and had used POPFile for so long that I didn't want to give it up. Luckily, Google and POPFile conspired to help me out. Gmail recently started supporting IMAP, and POPFile added an IMAP module to its core installation. Now I have my email anywhere I can log into the Internet, and it's all sorted for me when I get there so I can focus on what I want to work on.
Why is POPFile so Awesome?
You may be saying, "But Jeff, my email is already pretty well sorted. I use Gmail filters and stuff goes where it needs to." You might be right, and filters may be enough for the light email user, but you have to spend time setting up each filter, and it's possible that stuff you don't expect will slip through. For instance, if you support a product or service, or have a website that you get email from, you might want all that email sorted into different categories. You don't know the email addresses of the people who will be writing to you, and you can't guarantee the subject will have particular words in it even if you ask people to do it that way. Maybe you participate in a lot of forums and you have email notifications turned on so that you can see when someone responds to your post. With POPFile, you can start posting at a new forum and have pretty high confidence that response notices will be correctly sorted without having to set up a new filter for that particular forum. Here is the really killer feature: POPFile learns from its mistakes (and you can change your mind and teach it new tricks). When a piece of email goes to the wrong place (or, if POPFile is unsure, into an Unclassified bucket), you can move the email to the right place and POPFile will learn from that. (Learning from moved messages is a feature of the IMAP module; the normal method of training POPFile is to use its web interface to "teach it".) It learns using a naive Bayes classifier, which is built on the theorem of Thomas Bayes that was published posthumously in 1763, and that appeals to the history buff in me. This Bayesian approach is now the basis for many "learning" software programs.
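To make the "learning" part a little more concrete, here is a minimal sketch of a naive Bayes bucket classifier in Python. It is not POPFile's actual code; the class name, the bucket names, and the sample messages are all made up for illustration, and a real classifier handles headers, word stemming, and uncertainty far more carefully.

```python
import math
from collections import Counter, defaultdict


class BucketClassifier:
    """Tiny naive Bayes text classifier (illustrative only, not POPFile's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages trained

    def train(self, bucket, text):
        """Teach the classifier that `text` belongs in `bucket`."""
        self.word_counts[bucket].update(text.lower().split())
        self.message_counts[bucket] += 1

    def classify(self, text, default="unclassified"):
        """Return the most likely bucket, or `default` if nothing was trained."""
        if not self.message_counts:
            return default
        vocabulary = {w for counts in self.word_counts.values() for w in counts}
        vocab_size = max(len(vocabulary), 1)
        total_messages = sum(self.message_counts.values())
        scores = {}
        for bucket, counts in self.word_counts.items():
            # Start from the log prior: how common this bucket is overall.
            score = math.log(self.message_counts[bucket] / total_messages)
            total_words = sum(counts.values())
            for word in text.lower().split():
                # Laplace smoothing keeps unseen words from zeroing the score.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[bucket] = score
        return max(scores, key=scores.get)


# Hypothetical training data: two buckets, two messages each.
clf = BucketClassifier()
clf.train("spam", "cheap pills buy now")
clf.train("spam", "buy cheap watches now")
clf.train("sca", "event schedule for the next tournament")
clf.train("sca", "tournament volunteers needed for the event")

print(clf.classify("buy cheap pills"))
print(clf.classify("tournament schedule"))
```

Correcting a mistake is just another call to `train` with the right bucket, which is roughly what happens when you move a message to a different label.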
Setup
So here is how you set it up.
What you need to start
- First, you obviously need a Gmail account. I use a Google Apps account so I can have Gmail delivered to my own domain.
(As an aside, I also have a regular Gmail address that I use for Google services that don't support a Google Apps login; that account just forwards to my main account.)
- A computer that can run POPFile. POPFile runs as a service in the background and has a web interface. POPFile runs on Windows but has cross-platform versions. I am not a Mac guy or a Linux guy, but I don't see any reason why it wouldn't run on those operating systems.
I use my desktop for this task. It's pretty much always on and connected to the Internet, so it can have my email sorted and ready to go when I want to access Gmail. I don't see why this wouldn't work with a laptop running POPFile too, but it may take a little time to sort your email if you haven't been connected to the Internet for a while, so you could log in to Gmail and still be waiting for POPFile to sort your inbox. Also, if you check your email from another computer while your laptop is offline, your email won't be sorted.
Directions
- Set up IMAP in your Gmail account using these instructions.
- Download and Install Popfile.
- Setting up Popfile:
- Enable the IMAP module by following the directions here.
- Once the IMAP module is turned on, you need to configure it with your Gmail information.
- Create the labels you want in Gmail and the matching buckets in POPFile, then map each bucket to its label. Make sure you match your spam bucket to the standard spam label in Gmail.
- You will now need to train POPFile so it knows what types of email you want under each label. Initially, all of your incoming email will be delivered to the Unclassified bucket. After you start training POPFile, you should be pretty happy with the results. According to POPFile's stats, the average accuracy of the sorting is over 98% after 500 emails.
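Before matching buckets to labels, it can help to check that every label you plan to use already exists in Gmail. The sketch below uses Python's standard imaplib against Gmail's IMAP server (imap.gmail.com on port 993). The bucket names and the mapping are hypothetical, and "[Gmail]/Spam" is an assumption about how the spam label appears over IMAP for an English-language account; your account may show a different name. This is a helper for checking your setup, not part of POPFile.

```python
import imaplib

# Hypothetical buckets mapped to the label names Gmail exposes over IMAP.
BUCKET_TO_LABEL = {
    "spam": "[Gmail]/Spam",  # assumed name of the standard spam label
    "sca": "SCA",
    "forums": "Forums",
    "newsletters": "Newsletters",
}


def parse_label(line):
    """Pull the label name out of one IMAP LIST response line.

    Assumes the server quotes names and uses "/" as the delimiter,
    e.g. b'(\\HasNoChildren) "/" "INBOX"'.
    """
    text = line.decode("utf-8", errors="replace")
    return text.rsplit(' "/" ', 1)[-1].strip('"')


def fetch_labels(user, password):
    """Log in to Gmail over IMAP and return the list of label names."""
    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    try:
        conn.login(user, password)
        _status, lines = conn.list()
        return [parse_label(line) for line in lines]
    finally:
        conn.logout()


def missing_labels(bucket_to_label, existing_labels):
    """Return the buckets whose Gmail label does not exist yet."""
    existing = set(existing_labels)
    return sorted(b for b, label in bucket_to_label.items() if label not in existing)
```

Calling `missing_labels(BUCKET_TO_LABEL, fetch_labels(user, password))` with your own credentials would list the buckets that still need a label created in Gmail.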
Conclusion
Hopefully you can stop being afraid to leave your email address out in the open now. Hopefully people can stop going from one free email to another when the spam gets too bad. Hopefully you found this article helpful. POPFile is a great way to not only filter out your spam but help you get things done by pre-sorting your mail into the buckets you want.