Simple Web Log Analyzer - Google Analytics Replacement

In this article, we will cover using an open-source program called GoAccess. This program allows for generating statistics reports based on server access logs.

3 months ago


When running any website, it is useful to analyze the server request logs. This analysis can show how visitors use the site, which pages are requested most often, and how much spam traffic is coming in.

The most popular web analytics service is Google Analytics. It can track website activity such as session duration, pages per session, and bounce rate. While it is a great option and works well with minimal setup, it does send data about your visitors' activity to Google. This is not a huge deal, since that data is not meant to include personally identifiable information. There are, however, plenty of open-source options that generate similar statistics.

In this article, we will cover using an open-source program called GoAccess which can serve a similar function. While there is more setup and maintenance involved, using an open-source system means that all of the access log data will remain on your server.

Prerequisites

Set up GoAccess

In this article, the log analyzer we are going to use is GoAccess. To install it, first SSH to your server (for a useful ssh utility check out this article) and then run the install command for your distro.

Debian / Ubuntu

apt-get install goaccess

Fedora

yum install goaccess

Arch Linux

pacman -S goaccess

Gentoo

emerge net-analyzer/goaccess

OS X / Homebrew

brew install goaccess

FreeBSD

pkg install sysutils/goaccess

OpenSUSE

zypper ar -f obs://server:http http
zypper ref && zypper in goaccess

After getting the program installed, run a simple version command to verify it's working:

goaccess -V

(Screenshot: goaccess version command output)

Next, we need to locate the access logs for the HTTP server. In this article, I'm only going to cover Nginx but GoAccess works for all major servers. The default log location for Nginx servers is /var/log/nginx/access.log.
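Before pointing GoAccess at the log, it can help to confirm the file actually contains the requests we expect. The snippet below is a rough sketch, not part of GoAccess: top_paths is a made-up helper name, and it assumes the log is in the common or combined format, where the request path is the seventh whitespace-separated field.

```shell
# Print the N most requested paths from an access log (default N = 10).
# Usage: top_paths LOGFILE [N]
# Assumes common/combined log format, where the request is written as
# "METHOD PATH PROTOCOL" and the path is therefore the 7th field.
top_paths() {
    awk '{ hits[$7]++ } END { for (p in hits) print hits[p], p }' "$1" |
        sort -k1,1nr -k2,2 |
        head -n "${2:-10}"
}
```

Running top_paths /var/log/nginx/access.log 5 would list the five most requested paths, each prefixed by its hit count. If the output looks wrong, the log is probably in a custom format.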

To view the CLI version of GoAccess, run the following command. If your log lives somewhere other than the default location, substitute that path.

goaccess /var/log/nginx/access.log

That command should show a screen like the following:

(Screenshot: Log Format Configuration screen)

Here we need to tell GoAccess what format our logs are in. For Nginx select the "Common Log Format (CLF)" option. Log format information for other servers can be found on the GoAccess documentation site here. Press Space to select the option, then Enter to continue.
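If you are not sure which format your log uses, you can inspect its first line. Nginx's predefined default format is named "combined", which is the common format plus a quoted referrer and user agent at the end of each line. The sketch below guesses between the two. The guess_log_format name is hypothetical, and the regular expressions are simplified assumptions about these two layouts only.

```shell
# Guess a GoAccess --log-format name from the first line of a log file.
# Prints COMBINED, COMMON, or UNKNOWN.
# Usage: guess_log_format LOGFILE
guess_log_format() {
    # host ident user [timestamp] "request" status bytes
    clf='^[^ ]+ [^ ]+ [^ ]+ \[[^]]+\] "[^"]*" [0-9]{3} ([0-9]+|-)'
    line=$(head -n 1 "$1")
    if printf '%s\n' "$line" | grep -Eq "$clf \"[^\"]*\" \"[^\"]*\"\$"; then
        # Common fields followed by "referrer" "user agent"
        echo COMBINED
    elif printf '%s\n' "$line" | grep -Eq "$clf\$"; then
        echo COMMON
    else
        echo UNKNOWN
    fi
}
```

A result of UNKNOWN most likely means a custom log_format is configured, in which case the GoAccess documentation on custom format strings is the place to look.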

If all of that worked, you should see GoAccess's main CLI interface, which looks like the screenshot below.

(Screenshot: GoAccess CLI)

From here you can see which URLs and static files are being requested, along with the number of requests and visitors. This view is packed with information and very useful, but it is rather annoying to SSH into the server every time you want to look at it. Luckily, GoAccess has several other output options, one of which is generating static HTML reports.

Generate HTML report

To generate an HTML report with GoAccess, use the following command. Substitute /var/www/static/report.html with wherever you want to store the report. I chose this directory because it sits under /var/www, a common location for static files on Nginx servers.

goaccess /var/log/nginx/access.log -o /var/www/static/report.html --log-format=COMMON

This will generate the HTML report, which looks like a mess of JavaScript if you open the raw file. That is because the report is a single static HTML page with everything bundled into one file and no external dependencies.

Configure Nginx to show report

Now to view this report we need to configure Nginx to serve the HTML page we just created. On my server (which is running ByteUnits) it was just a matter of updating the Nginx config files to map the /static/ path on the domain to the /var/www/static directory. This separates our reports from the normal blog functionality.

The location of the Nginx config files will vary based on your setup. In the case of my Ghost server, they are under /etc/nginx/sites-available, and there are two configs, one for plain HTTP and one for HTTPS. Using Vim or Nano, edit both config files to add the following location block:

location /static/ {
    alias /var/www/static/;
}
Nginx Config Location Block

With the full file looking something like this (yours may vary depending on your existing setup):

# byteunits.com-ssl.conf

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    server_name byteunits.com;
    root /var/www/socialtea_ghost/system/nginx-root;

    ssl_certificate /etc/letsencrypt/byteunits.com/fullchain.cer;
    ssl_certificate_key /etc/letsencrypt/byteunits.com/byteunits.com.key;
    include /etc/nginx/snippets/ssl-params.conf;

    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $http_host;
        proxy_pass http://127.0.0.1:2369;
    }

    location /static/ {
        alias /var/www/static/;
    }

    client_max_body_size 50m;
}
Example full Nginx Config

Finally, restart the Nginx server and your report should be available at <hostname>/static/report.html
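On a systemd-based distro, that step might look like the sketch below. It is an illustration rather than the exact commands used on my server: reload_nginx is a made-up function name, it assumes it is run as root, and it uses a reload instead of a full restart, which is normally enough for Nginx to pick up a config change. Checking the config first with nginx -t means a typo in the new location block will not take the site down.

```shell
# Validate the Nginx configuration and reload only if the check passes.
# Assumes systemd and root privileges (for example, run from a root shell).
reload_nginx() {
    if nginx -t; then
        systemctl reload nginx
    else
        echo "nginx config test failed; not reloading" >&2
        return 1
    fi
}
```

After defining the function, calling reload_nginx either reloads the server or leaves the running configuration untouched and reports the failure.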

Note: If you are also running a Ghost blog on this server, restart Ghost as well with the ghost restart command from the Ghost install root directory.

(Screenshot: HTML Report Version)

Schedule report with Cron

The report above looks great, but it is only a snapshot. Since it is a static file, it will not update until the GoAccess command is run again, so we would still have to SSH into the server every time we want fresh numbers.

To automatically update the report we can set up a simple crontab to run the generate command periodically for us.

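To start, open the crontab editor. Because the job needs to read the Nginx log and write into the web directory, edit root's crontab rather than calling sudo inside the job, since cron cannot answer a password prompt:

sudo crontab -e

At the bottom of the file add the following line, which reuses the output path and log format from the report command above:

*/1 * * * * goaccess /var/log/nginx/access.log -o /var/www/static/report.html --log-format=COMMON

Then save and exit (:wq if your editor is Vim) and the crontab should be in place. Note that this entry runs the generate command every minute, which is probably excessive; a schedule such as */15 * * * * (every 15 minutes) is likely enough for most sites.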

And with any luck, your report should re-generate automatically and you can just refresh the page to see updated numbers!
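One refinement worth considering: if the page is requested while GoAccess is still writing it, Nginx could serve a half-written file. The sketch below writes to a temporary file in the same directory and renames it into place once GoAccess succeeds. The update_report name is made up for this example, and it assumes the same goaccess options used earlier. The temporary file keeps an .html extension because GoAccess picks the output type from the file extension.

```shell
# Regenerate the report and swap it into place only on success.
# Usage: update_report LOGFILE REPORT [FORMAT]
update_report() {
    log=$1
    report=$2
    format=${3:-COMMON}
    # Hidden temp file next to the final report, so the rename stays on one
    # filesystem; the .html suffix tells GoAccess to emit HTML.
    tmp="$(dirname "$report")/.report.$$.html"
    if goaccess "$log" -o "$tmp" --log-format="$format"; then
        mv "$tmp" "$report"
    else
        # Keep the previous report and clean up the partial file.
        rm -f "$tmp"
        return 1
    fi
}
```

Saved in a script that ends with a call such as update_report /var/log/nginx/access.log /var/www/static/report.html, this could replace the goaccess command in the crontab entry.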

Summary

In this article, we set up an open-source log analyzer to generate simple user statistics on our site. This setup leaves all of the access data on our server and gives us complete control over the statistics.

Tyler Moon

Published 3 months ago