Configure Apache access logs on Ubuntu server

Tags: http logs, apache

My previous articles all covered how to configure HTTP logs in IIS. It’s now time to see how to monitor web site activity on an Apache web server running on Linux. This article is based on an Ubuntu installation that hosts a web site. We’ll first configure the Apache access logs so that the log files are easy to archive and synchronize. We’ll then add useful extra fields and share the log files so that they can be downloaded and synchronized from a remote workstation. Finally we’ll see how to analyze these logs on a Windows workstation.

Presentation of the environment

Here is how the Ubuntu server was prepared for this article. If your configuration is slightly different, it should not be a problem.

  • Ubuntu 16.04 LTS server

  • LAMP (Linux Apache / MySQL / PHP) has been installed

  • SSH has been enabled

  • A WordPress site has been installed

  • The WordPress site is accessible from a URL like www.wptest.com and wptest.com with redirection

  • HTTPS has been activated for the WordPress site

  • If the web site is accessed in HTTP a redirection occurs to HTTPS

This is a very basic WordPress site.

WordPress test web site

Log rotation

In the default configuration the web activity on the Apache server is stored in the file /var/log/apache2/access.log

If you don’t do anything, this file keeps growing and becomes harder and harder to consult. The first thing to do is therefore to configure log rotation so that log lines are written to a new file every day. This makes it possible to archive old log files and makes recent web activity easier to consult.

The access log configuration is stored either in the Apache configuration file /etc/apache2/apache2.conf or in the virtual site configuration files. By default you’ll have the following two virtual site configuration files:

  • /etc/apache2/sites-available/000-default.conf (default HTTP site)

  • /etc/apache2/sites-available/default-ssl.conf (default HTTPS site)

In both files you have by default the following log directive

CustomLog /var/log/apache2/access.log combined

You’ll find more information on how to configure Apache log files and the CustomLog directive in the Apache log files documentation.

To put log rotation in place we will pipe the log rows to the rotatelogs program (see the rotatelogs documentation). Its arguments tell rotatelogs in which folder to store the log files, how to name them, and after how many seconds a new file must be created.

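The following new CustomLog directive rotates the log file every day (86400 seconds). Note that rotatelogs bases the rotation interval on UTC unless its -l option is given, so by default the new file starts at midnight UTC rather than at local midnight: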

CustomLog "||/usr/bin/rotatelogs /var/log/apache2/site000/access/access-%Y-%m-%d.log 86400" combined

Before applying the new configuration we need to create the folder to store the log files of our web site

sudo mkdir /var/log/apache2/site000

Inside it we create a subfolder dedicated to access logs, in case we later want to store error logs in a separate subfolder.

sudo mkdir /var/log/apache2/site000/access

Then we can change the virtual site configuration files:

sudo nano /etc/apache2/sites-available/000-default.conf

Replace the CustomLog directive with the new one mentioned before. Then save and exit (CTRL+O and CTRL+X)

Do the same with the https virtual site

sudo nano /etc/apache2/sites-available/default-ssl.conf
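If you prefer not to edit the files by hand, the replacement can also be scripted. The following is only a sketch: the function name set_rotating_customlog is made up for this example, it assumes every active CustomLog line in the file should be replaced, and it is meant to be run on a copy of the virtual site file that you review before copying it back.

```shell
# Hypothetical helper (not part of Apache): replace every active CustomLog
# directive in a virtual site file with the rotatelogs version from this article.
# $1 = path to the configuration file (ideally a copy), $2 = log format name.
set_rotating_customlog() {
  conf_file="$1"
  log_format="${2:-combined}"
  new_directive="CustomLog \"||/usr/bin/rotatelogs /var/log/apache2/site000/access/access-%Y-%m-%d.log 86400\" $log_format"
  # Keep the original indentation (\1); commented-out lines (#CustomLog) are not touched.
  # The untouched original is saved next to the file with a .bak suffix.
  sed -i.bak -E "s#^([[:space:]]*)CustomLog[[:space:]].*#\1${new_directive}#" "$conf_file"
}

# Possible usage, working on a copy first:
#   cp /etc/apache2/sites-available/000-default.conf /tmp/000-default.conf
#   set_rotating_customlog /tmp/000-default.conf combined
#   diff /tmp/000-default.conf.bak /tmp/000-default.conf
#   sudo cp /tmp/000-default.conf /etc/apache2/sites-available/000-default.conf
```

Working on a copy keeps the live configuration untouched until you have checked the result with diff.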

Now to apply the new configuration we restart Apache gracefully

sudo apache2ctl graceful
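A typo in a configuration file can prevent Apache from reloading, so it can be worth checking the syntax first. A minimal sketch, where the function name reload_apache_if_valid is just an illustration:

```shell
# Hypothetical wrapper: only reload Apache when the configuration parses.
# "apache2ctl configtest" checks the syntax of the configuration files
# and reports "Syntax OK" or the first error it finds.
reload_apache_if_valid() {
  sudo apache2ctl configtest && sudo apache2ctl graceful
}

# Possible usage:
#   reload_apache_if_valid
```

If the syntax check fails, the graceful restart is skipped and the running server keeps its previous configuration.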

Then, once there has been some activity on the web server, we can take a look at the newly created log folder

ls /var/log/apache2/site000/access

We see that a file with today’s date has been created.

Log folder with log files generated by Apache rotatelogs
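To avoid typing the date by hand, the name of the current file can be computed. This sketch assumes rotatelogs is started without the -l option, in which case the date in the file name is, as far as I can tell, expressed in UTC. The function name todays_access_log is invented for the example.

```shell
# Hypothetical helper: print the path of the log file rotatelogs is currently
# writing to. $1 = log folder (defaults to the folder used in this article).
# Assumption: rotatelogs runs without -l, so file names follow UTC dates.
todays_access_log() {
  log_dir="${1:-/var/log/apache2/site000/access}"
  printf '%s/access-%s.log\n' "$log_dir" "$(date -u +%Y-%m-%d)"
}

# Possible usage:
#   tail "$(todays_access_log)"
```

If you added -l to the rotatelogs command, drop the -u option of date so that both use local time.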

We can display the content of the file with the following command (replace the date in the command)

tail /var/log/apache2/site000/access/access-2018-04-26.log

And we see this

Content of an Apache access log file
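With one file per day, archiving becomes a matter of handling files that are no longer written to. As an illustration only, the sketch below compresses log files older than a given number of days. The function name compress_old_logs and the default of 7 days are assumptions made for this example, not something rotatelogs does for you.

```shell
# Hypothetical helper: gzip daily access logs that have not been modified
# for more than N days. $1 = log folder, $2 = age in days (default 7).
compress_old_logs() {
  log_dir="$1"
  days="${2:-7}"
  # -mtime +N matches files last modified more than N days ago,
  # so the file currently being written is never selected.
  find "$log_dir" -type f -name 'access-*.log' -mtime +"$days" -exec gzip {} +
}

# Possible usage (as root, for example from a daily cron job):
#   compress_old_logs /var/log/apache2/site000/access 7
```

Keep in mind that compressed files get a .gz suffix, so any tool that synchronizes or parses the folder must be able to cope with them.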

Add additional fields

The logged fields are defined with the LogFormat directive in the /etc/apache2/apache2.conf configuration file. Here are the default LogFormat directives.

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referrer
LogFormat "%{User-agent}i" agent

In our CustomLog directive we use the combined format. This is a standard format that contains the client IP address, the authenticated user if any, the date and time, the request line (method, path and query, protocol), the HTTP status, the number of bytes sent, the referrer and the user agent.

The variables used in the LogFormat directive are explained in the Apache logging module documentation.

A few important fields are missing from the combined log format.

  • You don’t have the time taken to serve the request. This is important if you want to monitor the performance of your web site.

  • You don’t have the port used by the request. This is important if you want to differentiate HTTP (port 80) from HTTPS (port 443) requests. Of course you could write them to different log files, as HTTPS is configured in a different virtual site. However, if you follow the SEO best practice of avoiding duplicate content, every HTTP request to your web site is automatically redirected to HTTPS. If the original HTTP requests and the redirected requests are in the same file, they are easier to follow.

  • You don’t have the hostname entered in the browser. This is also important if you want to avoid duplicate content. For example you may choose to have your main URL with the www and automatically redirect requests without the www to that main URL. With the hostname in the log file you can check that these redirections work as expected.

Now that we have identified three missing fields that matter to us, we will create a new LogFormat directive. We start from the combined log format and add the new fields at the end. Adding them at the end matters: any log tool that parses the file can still extract the standard fields by simply ignoring the end of each log line.

We will add the following variables:

  • %{ms}T Time taken by the request in milliseconds (the unit syntax requires Apache 2.4.13 or later)

  • %p Port used by the request on the server

  • %{Host}i Host name provided by the browser

So this leads us to the following new directive that creates a new log format named extended.

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %{ms}T %p %{Host}i" extended

We edit the Apache configuration file

sudo nano /etc/apache2/apache2.conf

We add the new LogFormat directive and then we save and exit.

Then we need to change the CustomLog directives in the virtual site configuration files to use this new extended format.

CustomLog "||/usr/bin/rotatelogs /var/log/apache2/site000/access/access-%Y-%m-%d.log 86400" extended

First we edit the HTTP virtual site configuration file

sudo nano /etc/apache2/sites-available/000-default.conf

In the CustomLog directive replace combined with extended. Then save and exit.

Then we edit the HTTPS virtual site configuration file

sudo nano /etc/apache2/sites-available/default-ssl.conf

In the CustomLog directive replace combined with extended. Then save and exit.

Now to apply the new configuration we restart Apache gracefully

sudo apache2ctl graceful

Then, once there has been some activity, we can display the last lines of the log file (replace the date in the following command)

tail /var/log/apache2/site000/access/access-2018-04-26.log

And we can see the new fields

Apache log file with fields time taken, port and host added
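Because the three new fields sit at the end of each line, they are easy to pick up with awk directly on the server. The sketch below lists the requests that took at least a given number of milliseconds. It assumes the extended format defined above, and the function name slow_requests and the 500 ms threshold in the usage line are arbitrary choices for this example.

```shell
# Hypothetical helper: list requests slower than a threshold.
# $1 = minimum time in ms, $2 = log file written with the "extended" format.
# Output columns: time in ms, port, host, request line.
slow_requests() {
  min_ms="$1"
  log_file="$2"
  # Splitting on double quotes gives: $2 = request line and
  # $NF = the text after the user agent, i.e. " <ms> <port> <host>".
  awk -F'"' -v min="$min_ms" '{
    split($NF, extra, " ")   # extra[1] = ms, extra[2] = port, extra[3] = host
    if (extra[1] + 0 >= min + 0) print extra[1], extra[2], extra[3], $2
  }' "$log_file"
}

# Possible usage:
#   slow_requests 500 /var/log/apache2/site000/access/access-2018-04-26.log
```

Splitting on the double quotes rather than on spaces avoids being thrown off by the spaces inside the user agent string.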

Access logs remotely

Analyzing the logs directly on the server is a little difficult so we will see now how we can retrieve them remotely.

Access logs remotely with the admin account

As SSH is enabled it’s very easy to access the log folder through the SFTP protocol by using FileZilla for example.

In FileZilla we create a new site in the Site manager.

  • Specify the web site hostname as host

  • Select SFTP as protocol

  • Select Normal as Logon type

  • And specify the credentials of your admin account

FileZilla site configuration to access Apache log files

Once connected you can browse the whole server folder tree and you can retrieve more specifically the content of the log folder.

Apache log files in FileZilla

However, only an admin account (member of the adm group) can retrieve the logs this way. A user who is not a member of the adm group gets an access denied error when trying to access /var/log/apache2.

Share the log folder for a less privileged account

You may want to allow an auditor from outside your organization to access the logs. In that case you cannot hand over the credentials of an account that has shell access and can browse the whole folder tree.

To work around that, we will create a specific account that is only allowed to read the content of the log folder.

It’s possible with SSH to share a specific folder without giving access to anything else. However, there is a condition on this folder: the folder and all its parent folders must be owned by root and must not be writable by the group or by others. Unfortunately that’s not the case for folders inside /var/log/.

We don’t want to alter permissions in /var/log/ to avoid any problems with the system logging. Instead we will create a symbolic link to access the log folder from the folder /var/www/, where the web site root folder is already located. This folder has the required permissions by default.

To do that we execute the following command:

sudo ln -s /var/log/apache2/site000 /var/www/logs

We can check that the link is working

ls /var/www/logs/access

And we can see the content of the web site log folder from the new location

Test access to Apache log folder through a symbolic link
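Before pointing SSH at the link, you can check that every folder on the path satisfies the ownership condition. The following is a rough sketch of such a check and not a reproduction of what sshd does internally. The function names are invented, and it relies on the GNU version of stat shipped with Ubuntu.

```shell
# Hypothetical check: is one folder owned by root (uid 0) and not writable
# by group or others? Symbolic links are followed (stat -L).
chroot_component_ok() {
  dir="$1"
  owner=$(stat -L -c '%u' "$dir") || return 1
  perms=$(stat -L -c '%A' "$dir") || return 1   # e.g. drwxr-xr-x
  case "$perms" in
    # 6th character = group write bit, 9th character = others write bit
    ?????w*|????????w*) echo "NOT OK  $perms uid=$owner $dir"; return 1 ;;
  esac
  if [ "$owner" != "0" ]; then
    echo "NOT OK  $perms uid=$owner $dir"
    return 1
  fi
  echo "ok      $perms uid=$owner $dir"
}

# Walk /, /var, /var/www, ... and report each component of the path.
check_chroot_path() {
  target="$1"
  current=""
  result=0
  chroot_component_ok / || result=1
  old_ifs="$IFS"
  IFS="/"
  for part in $target; do
    [ -n "$part" ] || continue
    current="$current/$part"
    chroot_component_ok "$current" || result=1
  done
  IFS="$old_ifs"
  return "$result"
}

# Possible usage:
#   check_chroot_path /var/www/logs
```

Running the same function on /var/log/apache2/site000 should show which parent folder fails the condition and why the symbolic link is needed.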

Now that the log folder is ready to be shared we create a user named log-guy

sudo adduser log-guy

Once the user is created we edit the SSH configuration

sudo nano /etc/ssh/sshd_config

We add in the configuration file the following lines that will only allow the log-guy account to access the log folder.

Match User log-guy  
ForceCommand internal-sftp  
PasswordAuthentication yes
ChrootDirectory /var/www/logs  
PermitTunnel no
AllowAgentForwarding no  
AllowTcpForwarding no  
X11Forwarding no

We can then save, quit, and restart the SSH service to apply the new configuration

sudo systemctl restart sshd
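A mistake in sshd_config can lock you out of a remote server, so it is prudent to validate the file before restarting the service and to keep your current SSH session open until a new connection works. A minimal sketch, with a function name chosen for this example:

```shell
# Hypothetical wrapper: restart the SSH service only if the configuration
# is valid. "sshd -t" is the test mode: it checks the configuration file
# and the host keys, prints any error, and exits without starting a server.
restart_sshd_if_valid() {
  sudo sshd -t && sudo systemctl restart sshd
}

# Possible usage:
#   restart_sshd_if_valid
```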

We are now ready to do a test with FileZilla

We create a new site with the same settings but with the log-guy account credentials

Site in FileZilla to access shared Apache log files

When we connect, we land directly in the log folder with no way to go up to a parent folder, and we can see our log files in the access subfolder.

Apache shared log files in FileZilla

Synchronize logs

Now that you can access the Apache logs remotely, you probably don’t want to retrieve new log files manually on a regular basis. To automate this you can use one of the free synchronization tools that support the SFTP protocol.

For example FreeFileSync will do the job.

Synchronize Apache logs with FreeFileSync

If your workstation is on Windows you can also use the LogSync command line tool presented in a previous article.

The main advantage of this tool is that only the newly appended part of a log file is retrieved. So if you need to get the latest web requests several times a day, it avoids downloading the whole file every time, which saves bandwidth and speeds up the synchronization.

Support for the SFTP protocol has been added to the tool for this article. Please refer to the previous article for a full explanation of the tool.

Download LogSync

First run the following command to store the password

LogSync encrypt -n LogGuyPassword

You are then prompted to enter the password that will be encrypted and saved under the name LogGuyPassword.

Store and encrypt password for LogSync

Then launch the following command to start the synchronization. The destination folder (e.g. d:\Logs\wptest) needs to be created first.

LogSync sftpsync -h www.wptest.com -l d:\Logs\wptest -r /access -u log-guy -p LogGuyPassword

Apache log files synchronized by LogSync

We now have our log files on Windows, and it is very easy to keep them synchronized by launching a batch file.

Apache log files synchronized on Windows

Analyze the logs

Now that we have all logs on our workstation we can start analyzing them. If you are on a Windows workstation you can use the free edition of the HttpLogBrowser.

Download it from this link. Make sure you have the just released version 2.0.0.11 (or later), which fixes some issues with Apache logs and adds support for the three fields we added previously.

Once it is installed and launched, open the Files menu, select Folder and browse for the local log folder.

Load log folder in the HttpLogBrowser

Once the log files have been processed you can see the log rows in the application. In particular you clearly see that the port and host fields have been correctly parsed and you see that requests to the port 80 or host wptest.com are redirected to the canonical URL https://www.wptest.com.

Apache logs loaded in the HttpLogBrowser

To take the analysis further you could, for example, click on the value 80 in the Port statistics to check that all HTTP requests are correctly redirected.
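If you also want to cross-check this from a shell on the server, a rough equivalent is to list the requests received on port 80 that were not answered with a redirect. This is only a sketch. It assumes the extended format defined earlier, treats the statuses 301, 302, 307 and 308 as redirects, and uses a function name invented for the example.

```shell
# Hypothetical helper: print the log lines of requests received on port 80
# whose status is not a redirect. $1 = log file in the "extended" format.
unredirected_http_requests() {
  awk -F'"' '{
    split($3, s, " ")    # text after the request line: s[1] = status, s[2] = bytes
    split($NF, e, " ")   # text after the user agent: e[1] = ms, e[2] = port, e[3] = host
    if (e[2] + 0 == 80 && s[1] !~ /^30[1278]$/) print
  }' "$1"
}

# Possible usage:
#   unredirected_http_requests /var/log/apache2/site000/access/access-2018-04-26.log
```

An empty output means every request logged on port 80 in that file was redirected.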

You can also see that the TimeTaken field was retrieved. In the field statistics panel you can see a histogram like the one below, which lets you check the response speed of your web site:

Histogram of the time taken by web requests in ms on an Apache server

You can easily filter on the slowest requests: click at a specific position in the histogram to move the cursor, then click on the link TimeTaken > X below the chart.

You see that the HttpLogBrowser makes it very easy to filter on specific field values to only display the web requests you are interested in. I’ll let you discover by yourself all the features of this tool.

Conclusion

You learned in this article how to configure Apache access logs on an Ubuntu server and more specifically how to configure the log rotation, how to add custom fields and how to share the log folder to retrieve the log files from a remote workstation. Then you also learned how to easily synchronize these log files and how to analyze them from a Windows machine with the free edition of the HttpLogBrowser.
