Raivo Laanemets. Full-stack freelancer.

Announcement: Session Viewer


Some time ago I put together a website visitor logger. At first I built it as an experiment to see how many of my blog's visitors are bots. I also wanted to see whether I could use its data to implement a better anti-bot solution for Google Analytics (GA).

I store session data (a session in my app has the lifespan of a session cookie) in a MySQL database. The data includes easy-to-obtain numerical and boolean values. Besides running analytics, I can now also browse individual sessions, which was not possible with GA. At the moment this is the feature I find most valuable in the app. Referrer spam has also disappeared, but that is because the logger uses its own custom protocol.

Bot detection

For bot detection I implemented the following techniques:

  1. Checking for the specific JavaScript global properties described in this Stack Overflow thread.
  2. Counting mouse and scroll events.
  3. Recording page view and session duration.
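The three signals above could be combined per session roughly as follows. This is a minimal Python sketch, not the app's actual code: the `Session` record, its field names, the `suspicious_signals` function and the 5-second default are all hypothetical, and it assumes the browser-side script has already reported which of the checked globals it found.

```python
from dataclasses import dataclass, field

# Global properties checked in the browser (the ones listed under Results).
BOT_GLOBALS = frozenset({
    "window.callPhantom", "window.Buffer", "window.emit",
    "window.spawn", "window.webdriver", "window.domAutomation",
})


@dataclass
class Session:
    """Hypothetical per-session record; the field names are assumptions."""
    globals_found: set = field(default_factory=set)
    mouse_events: int = 0
    scroll_events: int = 0
    duration_seconds: float = 0.0


def suspicious_signals(session, min_duration=5.0):
    """Return the signals that suggest the session came from a bot."""
    signals = []
    if session.globals_found & BOT_GLOBALS:
        signals.append("automation global present")
    if session.mouse_events == 0 and session.scroll_events == 0:
        signals.append("no mouse or scroll events")
    if session.duration_seconds < min_duration:
        signals.append("short session")
    return signals
```

An empty list means none of the three checks fired. It does not prove that the visitor was human.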

Results

I have put together some statistics based on the October data from this blog.

Total number of sessions: 1396
Total number of pageviews: 2148
Total number of distinct IP addresses: 1202

Global variables

By the number of sessions.

window.callPhantom: 3 (0.21%)
window.Buffer: 0 (0.00%)
window.emit: 0 (0.00%)
window.spawn: 0 (0.00%)
window.webdriver: 0 (0.00%)
window.domAutomation: 0 (0.00%)

Mouse and scroll events

Sessions with mouse events: 935 (66.98%)
Sessions with scroll events: 899 (64.40%)
Sessions with mouse or scroll events: 1074 (76.93%)

Session length

Sessions longer than 5 seconds: 1267 (90.67%)
Sessions longer than 15 seconds: 1096 (78.51%)
Sessions longer than 30 seconds: 885 (63.40%)
Sessions longer than 60 seconds: 849 (60.81%)
Sessions longer than 120 seconds: 619 (44.34%)
Sessions longer than 240 seconds: 592 (42.41%)
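A breakdown like the one above can be derived from raw session durations. The following is a minimal Python sketch, not the app's actual code: the function name `longer_than_table` and the plain list of durations in seconds are assumptions made for illustration, and the sample durations are made up.

```python
def longer_than_table(durations, thresholds=(5, 15, 30, 60, 120, 240)):
    """For each threshold, count the sessions strictly longer than it.

    Returns a list of (threshold_seconds, count, percent) tuples, where
    percent is relative to the total number of sessions.
    """
    total = len(durations)
    rows = []
    for threshold in thresholds:
        count = sum(1 for seconds in durations if seconds > threshold)
        percent = round(100.0 * count / total, 2) if total else 0.0
        rows.append((threshold, count, percent))
    return rows


# Usage with made-up durations (not the blog's data).
for threshold, count, percent in longer_than_table([2, 8, 20, 45, 300]):
    print(f"Sessions longer than {threshold} seconds: {count} ({percent:.2f}%)")
```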

Results summary

Detecting bots through JavaScript global variables does not work well for generic bots. A number of bots, including Google's and Baidu's indexers, pass the check. They execute JavaScript and stay on a page for 5-10 seconds. I verified this by randomly picking short sessions and looking up the locations of their IP addresses (the app has built-in support for this through the IP-Lookup service from WhatIsMyIPAddress.com).

I did find some referral spam through manual inspection. There were 1-2 sessions per week with a referrer pointing to free-share-buttons spam. The bot executed JavaScript but none of the globals above were set. The sessions were shorter than 10 seconds and had no mouse or keyboard events.

In the end I decided to keep only sessions longer than 30 seconds. This seems to keep the statistics clean enough without losing too much information. Most of the interesting data for me is in longer multi-page sessions anyway.
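The 30-second cut-off could be applied like this before computing statistics. This is a minimal sketch, assuming each session is a dict with a hypothetical `duration_seconds` key. The real app stores sessions in MySQL and its schema is not shown in this post.

```python
MIN_SESSION_SECONDS = 30  # cut-off chosen in the post


def filter_sessions(sessions, min_seconds=MIN_SESSION_SECONDS):
    """Keep only sessions that lasted longer than `min_seconds`."""
    return [s for s in sessions if s["duration_seconds"] > min_seconds]


# Usage with made-up sessions; the `id` key is also hypothetical.
sessions = [
    {"id": 1, "duration_seconds": 8},
    {"id": 2, "duration_seconds": 30},
    {"id": 3, "duration_seconds": 95},
]
kept = filter_sessions(sessions)
print([s["id"] for s in kept])
```

Using a strict comparison mirrors the "longer than" wording of the session length statistics.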

Session Viewer

Once I had enough useful data in the database, I added a frontend with a couple of tables and charts. It has very few analytical features: it shows only last month's top entry pages and top pages. The goal was not to replace GA, which is much more powerful with all its predefined and custom views, but to complement it with the ability to browse individual sessions. This feature was very simple to add to the app. The session page just lists all visited pages ordered by view start time.
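Ordering a session's page views by start time is a plain sort. A minimal Python sketch follows, assuming each page view is a dict with hypothetical `path` and `started_at` keys. The sample paths and timestamps are made up and the app's real data model may differ.

```python
from datetime import datetime


def session_timeline(page_views):
    """Return a session's page views ordered by view start time."""
    return sorted(page_views, key=lambda view: view["started_at"])


# Usage with made-up page views.
views = [
    {"path": "/example-post", "started_at": datetime(2000, 1, 1, 12, 5, 40)},
    {"path": "/", "started_at": datetime(2000, 1, 1, 12, 4, 10)},
]
for view in session_timeline(views):
    print(view["started_at"].strftime("%H:%M:%S"), view["path"])
```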

Screenshot: Session Viewer, list of sessions

Screenshot: Session Viewer, single session

Source code and documentation can be found in the GitHub repository.

