Some time ago I put together a website visitor logger. At first I built it as an experiment to see how many of my blog visitors are bots. I also wanted to see whether I could use its data to implement a better anti-bot solution for Google Analytics (GA).
I store session data in a MySQL database; a session in my app has the lifespan of a session cookie. The data consists of easy-to-obtain numerical and boolean values. Besides running analytics, I can now also browse individual sessions, which was not possible with GA. At the moment this is the feature I find most valuable in the app. Referrer spam has also disappeared, but only because the logger uses a custom protocol.
For bot detection I implemented the following techniques:
- Counting mouse and scroll events.
- Recording page view and session duration.
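As a rough illustration of how these two signals could be tracked per session, here is a minimal sketch. The `Session` class, its field names and the `record_event` method are hypothetical and are not taken from the logger's actual code; timestamps are assumed to be Unix seconds.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-session record; all names here are assumptions."""
    started_at: float          # Unix time of the first page view
    last_seen_at: float        # Unix time of the latest recorded activity
    mouse_events: int = 0
    scroll_events: int = 0

    def record_event(self, kind: str, timestamp: float) -> None:
        # Count the event by type and extend the session's observed lifetime.
        if kind == "mouse":
            self.mouse_events += 1
        elif kind == "scroll":
            self.scroll_events += 1
        self.last_seen_at = max(self.last_seen_at, timestamp)

    @property
    def duration(self) -> float:
        # Session duration in seconds, never negative.
        return max(0.0, self.last_seen_at - self.started_at)

    @property
    def has_interaction(self) -> bool:
        # True if at least one mouse or scroll event was seen.
        return self.mouse_events > 0 or self.scroll_events > 0


session = Session(started_at=100.0, last_seen_at=100.0)
session.record_event("mouse", 104.0)
session.record_event("scroll", 142.5)
print(session.duration, session.has_interaction)
```

Both signals end up as plain numbers and booleans, which matches the kind of values described above as easy to obtain and store.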
I have put together some statistics based on the October data from this blog.
|Metric|Value|
|---|---|
|Total number of sessions|1396|
|Total number of pageviews|2148|
|Total number of distinct IP addresses|1202|
The percentages below are relative to the total number of sessions.
Mouse and scroll events
|Metric|Sessions|Share of all sessions|
|---|---|---|
|Sessions with mouse events|935|66.98%|
|Sessions with scroll events|899|64.40%|
|Sessions with mouse or scroll events|1074|76.93%|
Session duration
|Metric|Sessions|Share of all sessions|
|---|---|---|
|Sessions longer than 5 seconds|1267|90.76%|
|Sessions longer than 15 seconds|1096|78.51%|
|Sessions longer than 30 seconds|885|63.40%|
|Sessions longer than 60 seconds|849|60.82%|
|Sessions longer than 120 seconds|619|44.34%|
|Sessions longer than 240 seconds|592|42.41%|
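Each percentage is simply the session count divided by the total of 1396 sessions. A small sketch of that arithmetic follows; the helper name `share` is made up for this example.

```python
TOTAL_SESSIONS = 1396  # total number of sessions from the statistics above


def share(count: int, total: int = TOTAL_SESSIONS) -> str:
    """Return count as a percentage of total, formatted to two decimals."""
    return f"{100 * count / total:.2f}%"


print(share(935))   # sessions with mouse events
print(share(1074))  # sessions with mouse or scroll events
print(share(885))   # sessions longer than 30 seconds
```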
In the end I decided to filter out sessions shorter than 30 seconds. This seems to keep the statistics clean enough without losing too much information. Most of the data that interests me is in longer multi-page sessions anyway.
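The filter itself is a single comparison. The sketch below assumes sessions are available as dictionaries with a `duration` value in seconds; that shape, the sample values and the function name are illustrative only. In the real app the same condition could just as well live in the database query.

```python
MIN_DURATION_SECONDS = 30  # threshold chosen above


def keep_session(duration_seconds: float,
                 threshold: float = MIN_DURATION_SECONDS) -> bool:
    """Keep only sessions strictly longer than the threshold."""
    return duration_seconds > threshold


# Made-up sample data, not taken from the real logs.
sessions = [
    {"id": 1, "duration": 4.2},
    {"id": 2, "duration": 30.0},
    {"id": 3, "duration": 95.0},
    {"id": 4, "duration": 31.5},
]

kept = [s for s in sessions if keep_session(s["duration"])]
print([s["id"] for s in kept])
```

A session of exactly 30 seconds is dropped here, in line with the "longer than 30 seconds" wording in the table.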
Once I had enough useful data in the database, I added a frontend with a couple of tables and charts. It has very few analytical features: it only shows the top entry pages and top pages for the last month. The goal was not to replace GA, which is much more powerful with its predefined and custom views, but to complement it with the ability to browse individual sessions. This feature was very simple to add to the app. The session page just lists all visited pages ordered by view start time.
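To show how little logic these views need, here is a sketch of both: a per-session timeline sorted by view start time, and a count of entry pages across sessions. The record layout (`session_id`, `url`, `started_at`) and the sample rows are assumptions for illustration, not the app's actual schema or data.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Made-up page views; the field names are hypothetical.
page_views = [
    {"session_id": "a", "url": "/post-2", "started_at": datetime(2000, 1, 1, 10, 5)},
    {"session_id": "a", "url": "/post-1", "started_at": datetime(2000, 1, 1, 10, 1)},
    {"session_id": "b", "url": "/post-1", "started_at": datetime(2000, 1, 1, 11, 0)},
    {"session_id": "c", "url": "/about", "started_at": datetime(2000, 1, 1, 12, 0)},
]


def session_timeline(views, session_id):
    """Return one session's page views ordered by view start time."""
    own = [v for v in views if v["session_id"] == session_id]
    return sorted(own, key=lambda v: v["started_at"])


def top_entry_pages(views):
    """Count how often each URL was the first page of a session."""
    by_session = defaultdict(list)
    for view in views:
        by_session[view["session_id"]].append(view)
    entries = Counter(
        min(session_views, key=lambda v: v["started_at"])["url"]
        for session_views in by_session.values()
    )
    return entries.most_common()


print([v["url"] for v in session_timeline(page_views, "a")])
print(top_entry_pages(page_views))
```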
Screenshot: list of sessions
Screenshot: single session
Source code and documentation can be found in the GitHub repository.