December 3, 2010

 Disabling Linux updatedb for Atmail users directory

If you are running a large-scale Atmail implementation under Linux, by default in most distributions the "updatedb" command is run daily via Cron to index the filesystem HDD.

For large systems with 100k+ email messages, running updatedb against the system maildir directory is time consuming and uses a lot of resources.

A simple solution is to disable indexing of the Atmail users maildir by editing:

/etc/updatedb.conf

Search and append the following in bold:

PRUNEFS = "auto afs iso9660 sfs udf"
PRUNEPATHS = "/afs /media /net /sfs /tmp /udev /var/spool/cups /var/spool/squid /var/tmp /usr/local/atmail/users"

If you are running an NFS or storage array server, check your machine does not index the maildir via the system updatedb command.

This is just one simple configuration option that should not be overlooked.


Filed under: Data Mining/SQL Queries,Database,Optimization — info @ 4:14 am

 

 Log purging in Atmail made easy

When using Atmail in production, its wise to setup a script to automate purging of logs for performance. Atmail records all user logins ( Webmail, POP3/IMAP ) , SMTP transactions ( Send / Received ) , Spam ( RBL, DKIM, Spamassassin ) and Virus logs.

If you have a production system in use for sometime, your logs may have millions of rows and using valuable memory/CPU resources via MySQL. While you can use the Atmail Webadmin and see graphs for logs over 1 year+, this can effect the performance of Atmail if your hardware and DB are not optimized.

We recommend the following setup via Cron, create the file below:
/etc/cron.weekly/purge-atmail-logs.sh

#!/bin/sh

mysql -u[username] -p[password] atmaildbname < /usr/local/atmail/purge-atmail-logs.sql

Save the file, then chmod 755 /etc/cron.weekly/purge-atmail-logs.sh

Next, create the SQL query, change the date range as per your needs
/usr/local/atmail/purge-atmail-logs.sql

delete from Log_Error where LogDate < DATE_SUB(NOW(), INTERVAL 4 MONTH);
delete from Log_Login where LogDate < DATE_SUB(NOW(), INTERVAL 4 MONTH);
delete from Log_RecvMail where LogDate < DATE_SUB(NOW(), INTERVAL 4 MONTH);
delete from Log_SendMail where LogDate < DATE_SUB(NOW(), INTERVAL 4 MONTH);
delete from Log_Spam where LogDate < DATE_SUB(NOW(), INTERVAL 4 MONTH);
delete from Log_Virus where LogDate < DATE_SUB(NOW(), INTERVAL 4 MONTH);

optimize table Log_Error;
optimize table Log_Login;
optimize table Log_RecvMail;
optimize table Log_SendMail;
optimize table Log_Spam;
optimize table Log_Virus;

Save, then Cron will automatically purge logs older then 4 months, each week.

Try it out on your server

/etc/cron.weekly/purge-atmail-logs.sh
Tweak as per your requirements, and remember to optimize your MySQL setup for the Atmail database needs! Full tutorial at: http://atmail.com/kb/2010/importance-of-tuning-mysql-with-innodb_buffer_pool_size-and-key_buffer_size/


Filed under: Anti-Spam,Data Mining/SQL Queries — info @ 2:27 am

 

July 29, 2010

 Importance of tuning MySQL with innodb_buffer_pool_size and key_buffer_size

Out of the box, MySQL has very limited memory allocation for cache use. If you are using Atmail with MySQL in production, it is imperative you review and optimize MySQL for your hardware and memory. You can increase performance dramatically and reduce I/O usage by tweaking MySQL to use more of your system memory for the cache and buffers.

The two easy options for increasing performance for mysql are the innodb_buffer_pool_size and key_buffer_size options. Atmail uses InnoDB tables for the user authentication & log-files, and if you have a large userbase performance can be dramatically improved by increasing the innodb_buffer_pool_size. Other tables such as the UserSettings, Abook, use the MyISAM table format which uses the key_buffer_size option.

For a production machine running all the Atmail services with 2GB of RAM we'd recommend the following option:

/etc/my.cnf:

innodb_buffer_pool_size=256M
key_buffer_size=256M

If you have 4GB RAM, double the above. Once enabled reboot MySQL, and you will see via the process table the daemon will be allocated more memory.

Tune and check for your environment, and remember not to use the mysqld defaults for a production environment!


Filed under: Data Mining/SQL Queries,Database — info @ 3:44 pm

 

November 29, 2009

 Installing Apache Solr with Dovecot for fulltext search results

Tutorial on installing Fulltext search with Dovecot + Apache Sol + Lucene + Atmail 6 ( CentOS/Fedora/Linux )

So, your a modern day mail user - You have several folders, each with over 5,000+ messages. Searching your mail for previous clients, customers, friends and family is a pain, slow, and search results are not what you want. The default IMAP search feature just can't scale to modern day usage.
Using the fulltext-search feature of Dovecot ( FTS ) you can index your mailbox with fulltext search capabilities, with 10x speed improvement, and search results that are relevant! This will enable you to login via the Atmail 6 Webmail interface and data-mine your mail better.
The FTS/Solr feature of Dovecot enables the Text/Body keywords on the IMAP search to call the index server.

Download

First, download java for your environment. Java 1.5+ is required

Download from Sun and install under /opt/

http://java.sun.com/javase/downloads/5u21/jdk

cd /opt/ ; wget [large-url-from-sun]
chmod 755 filename.bin
./filename.bin

This will install and extract Java out into the /opt/jdk* directory.

Next, configure the alternatives command to find the new java location.

# alternatives --install /usr/bin/java java /opt/jdk1.5.0_21/bin/java 2
# alternatives --config java
There are 2 programs which provide 'java'.

Selection    Command
-----------------------------------------------
*+ 1           /usr/lib/jvm/jre-1.4.2-gcj/bin/java
2           /opt/jdk1.5.0_21/bin/java

Enter to keep the current selection[+], or type selection number: 2

Next, run

# java -version
java version "1.5.0_21"

And you should see the version string configured via the new Java

Install Ant

The Apache Ant project is required to build the Solr server, download the latest Ant version and install under /opt/

http://ant.apache.org/bindownload.cgi

cd /opt/ ; tar xfvz  apache-ant-1.7.1-bin.tar.gz

Next, update your PATH to reflect the ant/bin directory

Example using bash

vi ~/.bash_profile

set the path to Ant

Replace with your real Ant directory/version

PATH=$PATH:$HOME/bin:/opt/apache-ant-1.7.1/bin/

Then, for the current session run

# export PATH=$PATH:$HOME/bin:/opt/apache-ant-1.7.1/bin/

Then, check ant is available to execute in the path

# ant -v
Apache Ant version 1.7.0 compiled on December 13 2006

Download Solr

http://lucene.apache.org/solr/

Download Apache solar and install under /opt/

# cd /opt/ ; wget http://apacheurl/

# tar xfvz apache-solr*

# cd /opt/apache-solr-1.4.0/example/

Copy the Dovecot XML file as the default schema

cp /usr/local/atmail/server_source/dovecot/doc/solr-schema.xml  solr/conf/schema.xml

Next, run the daemon

# cd /opt/apache-solr-1.4.0/

# ant example

Then you should receive various output, and

BUILD SUCCESSFUL
Total time: 34 seconds

Next, you need to start the Solr daemon

cd example

nohup java -jar start.jar &

You should see a successful bind

2009-11-29 19:38:53.961::INFO:  Started SocketConnector @ 0.0.0.0:8983

Setup firewall access

By default, Solr should only be accessible via the localhost interface, to the local server. Do not open the service to the world for security issues

Verify service working

# telnet localhost 8983
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
GET /solr/

...Welcome to Solr..

Next, using an outside host, make sure the connection to port 8983 cannot be made - You want only the localhost to access Solr for security.

Congratulations - Solr is installed!

Startup daemon
You could write your own start shell script, or a simple approach is to add to /etc/rc.local to startup on boot.

/etc/rc.local

# Start Solr
cd /opt/apache-solr-1.4.0/example/
nohup java -jar start.jar &

Dovecot recompile
The next step required dovecot to be recompiled with the solr plugin install

# service atmailserver stop

# cd /usr/local/atmail/server_source/dovecot/

# ./configure --prefix=/usr/local/atmail/mailserver/ --with-mysql --with-docs=no --with-solr

# make install

Dovecot binary will be replaced, verify the FTS plugin is available

# ls -l /usr/local/atmail/mailserver/lib/dovecot/ | grep solr
-rw-r--r-- 1 root   root   167408 Nov 29 19:48 lib21_fts_solr_plugin.a
-rwxr-xr-x 1 root   root     1195 Nov 29 19:48 lib21_fts_solr_plugin.la
-rwxr-xr-x 1 root   root   108649 Nov 29 19:48 lib21_fts_solr_plugin.so

Edit the Dovecot config for FTS
Edit

/usr/local/atmail/mailserver/etc/dovecot.conf

Find

protocol imap {
mail_plugins = quota imap_quota

Replace with

protocol imap {
mail_plugins = quota imap_quota fts fts_solr

Next find

# Maildir configuration. Quota checks are done via the maildirsize - faster
plugin {
quota = maildir:user
quota_rule = *:storage=5GB
}

Add

# Maildir configuration. Quota checks are done via the maildirsize - faster
plugin {
fts = solr
fts_solr = break-imap-search url=http://localhost:8983/solr/
quota = maildir:user
quota_rule = *:storage=5GB
}

Note the line

fts_solr = break-imap-search debug url=http://localhost:8983/solr/

Since the Solr server supports context sensitive and lucense search terms, this is not "technically" RFC compliant - However end users will only benefit from this new feature, not be hindered.

Next, restart services

# service atmailserver restart

Test index

# telnet localhost imap
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE STARTTLS AUTH=PLAIN AUTH=LOGIN] Atmail IMAP/POP3 server ready
1 login user@domain password
2 select Inbox
3 SEARCH text "keyword"

The first search will take longer, this is the first index of the mailbox, dovecot will send the results via HTTP/XML to Solr to store and retrieve.

* OK Indexed 11% of the mailbox, ETA 1:12
* OK Indexed 30% of the mailbox, ETA 1:07
* OK Indexed 49% of the mailbox, ETA 0:52
* OK Indexed 61% of the mailbox, ETA 0:43
* OK Indexed 80% of the mailbox, ETA 0:22
* OK Mailbox indexing finished
* SEARCH 857 1486 1526 1544 1589 1590 1591 1608 1737 1753 2660 2864 3967 4124 4127 4129 4159 4235 4254 4273 4682 4683 4684 4685 4770 5132 5467 5470 5735 5788 5838 5903 5904 5905 5906 5907 5996 6024 6113 6144
1 OK Search completed (114.650 secs)

Run the same query again:

1 search text "sync"
* SEARCH 857 1486 1526 1544 1589 1590 1591 1608 1737 1753 2660 2864 3967 4124 4127 4129 4159 4235 4254 4273 4682 4683 4684 4685 4770 5132 5467 5470 5735 5788 5838 5903 5904 5905 5906 5907 5996 6024 6113 6144
1 OK Search completed (0.000 secs).

Reduced from 114secs to 0 secs for a 10,000 message folder running on a P4 with 1GB RAM - Thank you Solr !

Each time you receive new mail and issue a search, dovecot will index the "new" messages to Solr to update the index accordingly.

Now, kick back and data-mine your IMAP folders for your valuable information. Launch Atmail 6 and use the new Webmail interface to search, and enjoy the new results!


Filed under: Atmail 6,Data Mining/SQL Queries,Uncategorized — info @ 8:40 pm

 

August 24, 2009

 Data Mining the Atmail Log Tables

As handy and powerful as Atmail's webadmin is there is of course some information it does not glean from the DB and display in the pretty interface. Here are a handful of SQL queries that you may find useful for squeezing some extra info out of the Atmail DB.

Total number of accounts that have never been logged into

mysql> select count(Users.Account) from Users, UserSession where Users.Account = UserSession.Account and from_unixtime(UserSession.LastLogin) = Users.DateCreate;

List of accounts that have never been used

mysql> select Users.Account from Users, UserSession where Users.Account = UserSession.Account and from_unixtime(UserSession.LastLogin) = Users.DateCreate;

List of accounts that have not been used for X days/weeks/months/years

mysql> select Account, LastLogin from UserSession where from_unixtime(LastLogin) > date_sub(NOW(), interval 1 year);

Alter the "interval 1 year" part to suit the desired time frame. It should be in the format "interval [num] [unit]" where [num] is the number of [unit]s and [unit] is one of "day, week, month, year" (note that you do not use plurals for the units even if it seems grammatically correct, ie use "interval 2 month" NOT "interval 2 months").

Find the Account that receives the most email

mysql> select Account, count(Account) as emailsReceived from Log_RecvMail group by Account order by emailsReceived desc limit 1;

Or if you want a full list of emails received per account leave off the "limit 1":

mysql> select Account, count(Account) as emailsReceived from Log_RecvMail group by Account order by emailsReceived desc;

Find the account which sends the most mail

mysql> select Account, count(Account) as emailsSent from Log_SendMail where Account != "System" group by Account order by emailsSent desc limit 1;

Again you can leave off the "limit 1" and get a full listing of amount of emails sent per account.

Average number of messages sent per day (overall and per user)

For overall avg:
mysql> create view v as select count(*) as emailsSent from Log_SendMail group by date(LogDate);
mysql> select avg(emailsSent) from v;

For per-user avg:
mysql> create view v2 as select Account, date(LogDate) as day, count(*) as emailsSent from Log_SendMail group by day, Account;
mysql> select avg(emailsSent), Account from v2 group by Account;

Average number of messages received per day (overall and per user)

For overall avg:
mysql> create view v3 as select count(*) as emailsRec from Log_RecvMail group by date(LogDate);
mysql> select avg(emailsRec) from v3;

For per-user avg:
mysql> create view v4 as select Account, date(LogDate) as day, count(*) as emailsRec from Log_RecvMail group by day, Account;
mysql> select avg(emailsRec), Account from v4 group by Account;

OK that's it for now...


Filed under: Atmail 5,Atmail 6,Data Mining/SQL Queries,Database,Uncategorized — Brad Kowalczyk @ 11:09 pm