Pavo's Postfix Page

Why does Courier-IMAP's pop3d get slow?

So this doesn't really relate directly to postfix, but what the hell, it's my webpage and I think you should know this bit of information. Unless you're fortunate enough to only have users who want to access their mailboxes from a UNIX shell, you probably run either a POP3 or IMAP server. This document describes a condition that causes one such POP3 server to be very slow in specific conditions. It also suggests a possible solution.

Courier-IMAP is a very nice suite of email software developed by Sam Varshavchik. It is a subset of the complete Courier MTA. Since I've been bitten by the Postfix bug, I choose to run Postfix for my SMTP server instead of running the entire Courier suite. This is not to say that Postfix is better than Courier; I have not evaluated the SMTP functions of Courier in enough detail to make such a comparison. Courier-IMAP includes a simple but effective POP3 server. The server builds on the foundation of Courier to provide simple access to the same mailboxes that Courier's IMAP server does. The IMAP server is very fast and scalable in production. The POP3 server is adequate. The project isn't called Courier-POP3, now is it?

When it is used for the tasks the POP3 protocol was designed for, Courier's POP3 server works like a charm. But in the real world users don't read RFCs and can't be bothered to learn to use IMAP. They will happily check the "Keep mail on server" button every time. Here we find the problem. POP3 is not a mailbox management protocol. That is what IMAP is for. You shouldn't try to use one to do the other's job. But users will do it anyway, keeping hundreds of messages on the server at a time. "What's the big deal?" First, this consumes storage space, which is always costly. That isn't Courier's problem, however. It also means that the entire mailbox must be scanned each time the user logs in, perhaps every minute!

Because Courier's POP3 server is basic, it makes assumptions about how you will use it, and it adheres strictly to the RFC (RFC 1939). Of specific concern to us is Section 11, "Message Format". Here is that section:

11. Message Format

   All messages transmitted during a POP3 session are assumed to conform
   to the standard for the format of Internet text messages [RFC822].

   It is important to note that the octet count for a message on the
   server host may differ from the octet count assigned to that message
   due to local conventions for designating end-of-line.  Usually,
   during the AUTHORIZATION state of the POP3 session, the POP3 server
   can calculate the size of each message in octets when it opens the
   maildrop.  For example, if the POP3 server host internally represents
   end-of-line as a single character, then the POP3 server simply counts
   each occurrence of this character in a message as two octets.  Note
   that lines in the message which start with the termination octet need
   not (and must not) be counted twice, since the POP3 client will
   remove all byte-stuffed termination characters when it receives a
   multi-line response.

What this means is that pop3dserver must read the entire mailbox, character by character, counting every newline character twice. Under normal operation this isn't a problem. If the user is logging in to check their mail, they probably need the server to read the entire mailbox anyhow (because they want to download all those messages). As was mentioned, a problem develops when the user does not want to download those messages. If the user is leaving the messages on the server, even for a day or two, the mailbox can grow to megabytes in size (almost 200 megabytes for one of my more troublesome users).

So pop3dserver will read every character of every message in the mailbox. Compound this amount of I/O with the overhead of NFS and you're in for a world of hurt, especially if you've got a few dozen of these users on your server. The politically correct solution is to either correct the user's behavior or switch them to using IMAP. Unfortunately for most SysAdmins this isn't an option. They can't get all the users using IMAP even if they wanted to. That is the problem I faced.

My solution was to follow some advice I heard on the Courier-Users list. You are on the Courier-Users list, aren't you? If you examine the source code for pop3dserver you will notice this function:

Quoting from imap/pop3dserver.c of Courier-IMAP 1.5.3:

    /*
    ** The RFC is pretty strict in stating that octet size must count the CR
    ** in the CRLF endofline.
    */

    static void calcsize(struct msglist *m)
    {
    FILE    *f=fopen(m->filename, "r");
    int     c, lastc;

            m->size=0;
            if (f == 0)
            {
                    perror("pop3d");
                    return;
            }
            lastc='\n';
            while ((c=getc(f)) >= 0)
            {
                    if (c == '\n')  ++m->size;
                    ++m->size;
                    lastc=c;
            }
            if (lastc != '\n')      m->size += 2;

            if (ferror(f))
            {
                    perror("pop3d");
                    return;
            }
            fclose(f);
    }

Notice the while loop that calls getc() on every character of the file. This is where your pop3dserver will spend most of its time. If you're OK with throwing the RFC out the window then read on. Otherwise thanks for playing, come again soon.

Who cares about counting newline characters as two bytes? Certainly not popular POP3 client software. So how can we get an approximate message size faster? Well, there are two options. The first will work on almost any server. The second will only work if your SMTP server supports Maildir++. Courier's MTA supports it and Postfix can be patched to support it.

Right now I'm not going to post any patches to add either of these options. You'll need to do some C programming yourself. If my employer's legal department gets on the ball and 'OK's releasing code, I will publish patches.

Right, so back to the action. The first option is to replace all that reading with a simple call to the stat() function. This will give you the file size and cost less than reading the entire file (since the size is stored in the inode). Every server I've ever used supports stat(), and it's safe enough over NFS (for this purpose).
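
A minimal sketch of the stat() approach might look like this. The helper name approx_size_stat is hypothetical and does not come from the Courier source; it only illustrates the idea of asking the inode for the size instead of reading the file. It assumes messages are stored with single-character line endings, so the result will be smaller than the strict RFC count by roughly one octet per line.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
** Hypothetical helper (not part of Courier-IMAP): approximate a message's
** size from the inode instead of reading the whole file. Newlines are
** counted once, not twice, so this is NOT the RFC 1939 octet count.
** Returns -1 if stat() fails.
*/
static long approx_size_stat(const char *filename)
{
struct stat sb;

        assert(filename != NULL);
        if (stat(filename, &sb) < 0)
        {
                perror("pop3d");
                return -1;
        }
        return (long)sb.st_size;        /* size in bytes as stored on disk */
}
```

In calcsize() the result would presumably take the place of the getc() loop, with m->size set from the return value (falling back to 0, as the original does, when the call fails).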

The second option is even faster. Maildir++ encodes the message size into the filename. Since you've already been given the filename, just parse out the size and return it. This method doesn't even access the filesystem. Very fast indeed.
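
As a rough sketch of the filename approach: Maildir++ delivery agents record the size as a ",S=<bytes>" field in the message filename, so the parsing can be done with a couple of string calls. The helper name size_from_filename is hypothetical, and the sketch assumes every delivery agent writing to the mailbox actually adds the field; when it is missing the function returns -1 so the caller can fall back to another method.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
** Hypothetical helper (not part of Courier-IMAP): pull the size out of a
** Maildir++ style filename such as
**     cur/1012345678.M1P2.host,S=4096:2,S
** Returns the number after ",S=", or -1 if the field is absent or empty.
** No filesystem access is made.
*/
static long size_from_filename(const char *filename)
{
const char *base, *p;

        assert(filename != NULL);

        /* Only look at the last path component */
        base = strrchr(filename, '/');
        base = base ? base + 1 : filename;

        p = strstr(base, ",S=");
        if (p == NULL)
                return -1;
        p += 3;                         /* skip over ",S=" */
        if (*p < '0' || *p > '9')
                return -1;              /* no digits follow the marker */
        return strtol(p, NULL, 10);     /* stops at the first non-digit */
}
```

For example, size_from_filename("cur/1012345678.M1P2.host,S=4096:2,S") yields 4096, while a filename without the field yields -1. The number is the size of the file on disk, so like the stat() result it counts each newline once.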

The end result of either of these methods is a much faster pop3dserver when your users are keeping large amounts of email on the server. Follow these instructions and you will forever bear the scarlet letter of being non-RFC-compliant. But at least your servers will stop paging you all the time.


Joshua E. Warchol