Blog of Raivo Laanemets

Stories about web development, freelancing and personal computers.

Blog: 200


The blog now has over 200 articles. Three years ago, when I had 100 articles, I wrote an overview of the blog statistics. Recently I looked through my analytics and put together a new summary.

Writing statistics

Based on the tags, I write about the same topics as before. My most frequently used tags are:

Tag Article count
tools 40
prolog 34
javascript 33
bug 27
node.js 19

However, I have been writing a lot less in the last 2 years:

Year Article count
2018 13
2017 15
2016 45
2015 70
2014 38
2013 19
2012 1

In 2017, I volunteered as a board member of the apartment association where I live. We went through a major reconstruction project, and it took the majority of my time outside my main work, leaving very little for writing. I resigned in 2018 to reduce my workload, but I felt very tired for the rest of the year and did not find much motivation to write anything.

Reading statistics

I switched from Google Analytics to my own solution at the end of 2016. The main reason was the constant analytics spam. While the situation has improved a bit, it still seems to be a problem to this day.

I also wanted to explore individual user sessions. This feature later became available in Google Analytics as User Explorer, but it did not exist when I built my own solution.

General

These statistics cover the period from 2016-10-01 to 2018-11-30.

Session count 32591
Session count (bots*) 1575
Pageviews 48952
Pageviews (bots) 1693

A session in this analytics is defined by the lifetime of a session cookie. When a reader closes the browser and opens it again, it counts as a new session.

*There are more statistics about bots below.
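The session definition above could look roughly like the following. This is only a minimal sketch, not the actual analytics code: the cookie name `sid`, the helper names, and the identifier format are assumptions made for illustration. The only details taken from the post are that the cookie is a session cookie and that Math.random() produces the identifier.

```javascript
// Hypothetical cookie name; the real analytics code is not shown in the post.
const COOKIE_NAME = 'sid';

// Extracts the session ID from a cookie string such as document.cookie.
function readSessionId(cookieString) {
  for (const part of cookieString.split(';')) {
    const [name, ...rest] = part.trim().split('=');
    if (name === COOKIE_NAME) {
      return rest.join('=');
    }
  }
  return null;
}

// Generates a new identifier from a random source (Math.random by default).
// The random source is a parameter only to make the function testable.
function newSessionId(random = Math.random) {
  return random().toString(36).slice(2);
}

// Returns the existing session ID, or a new one together with the cookie
// string to set. A cookie without Expires/Max-Age lives until the browser
// is closed, which is what makes it a session cookie.
function ensureSession(cookieString, random = Math.random) {
  const existing = readSessionId(cookieString);
  if (existing !== null) {
    return { id: existing, setCookie: null };
  }
  const id = newSessionId(random);
  return { id, setCookie: `${COOKIE_NAME}=${id}; path=/` };
}
```

In a browser this would be called as `ensureSession(document.cookie)`, and a non-null `setCookie` value would be assigned to `document.cookie`.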

Session length (time) histogram:

Range (s) Session count Session count (%)
0-10 9222 28.30
11-30 4579 14.05
31-180 5905 18.12
181-1800 7501 23.02
1800+ 5384 16.52

It is surprising that so many people spend so much time on the blog: almost 40% of the sessions last longer than 3 minutes.
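A histogram like the one above can be computed with a few lines of code. The following is a sketch under assumptions: the input is taken to be a plain array of non-negative session lengths in seconds, the function name is made up, and a session of exactly 1800 seconds is placed in the 181-1800 range because the table does not say where that boundary falls.

```javascript
// Inclusive upper bounds (in seconds) of the ranges used in the table above.
const BUCKETS = [
  { label: '0-10', max: 10 },
  { label: '11-30', max: 30 },
  { label: '31-180', max: 180 },
  { label: '181-1800', max: 1800 },
  { label: '1800+', max: Infinity }
];

// Counts sessions per range and computes each range's share in percent.
// Assumes every length is a non-negative number.
function sessionLengthHistogram(lengthsInSeconds) {
  const counts = BUCKETS.map(() => 0);
  for (const length of lengthsInSeconds) {
    const index = BUCKETS.findIndex((bucket) => length <= bucket.max);
    counts[index] += 1;
  }
  const total = lengthsInSeconds.length;
  return BUCKETS.map((bucket, i) => ({
    range: bucket.label,
    count: counts[i],
    percent: total === 0 ? 0 : Number((100 * counts[i] / total).toFixed(2))
  }));
}
```

With an SQL database the same grouping would more likely be done in a query, but the bucketing logic stays the same.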

Session page count histogram:

Page count Session count Session count (%)
1 27605 84.70
2 2727 8.37
3 1016 3.11
4 391 1.20
5 248 0.76
6+ 604 1.85

Most of the sessions include only a single pageview. Considering that Google is the main source of traffic, most people come to read a specific article to obtain a specific piece of information.

Top stories

Title Page views Page views (%)
Beware of European Business Number spam/scam 13110 26.78
Front page 4277 8.73
Ending (online) fraudulent agreements 1884 3.85
Chrome 59/60+ font rendering in Linux 1827 3.73
KnockoutJS: show spinner while loading page 1544 3.15
Fixing Bootstrap Woff2 CORS issues 952 1.94
Computer upgrade (Slackware on Skylake) 941 1.92
All projects page 931 1.90
Git and SSH: key_read: uudecode XXX failed 793 1.62
Python RPi.GPIO threading broken 764 1.56

My most read article is about the European Business Number letter scam. When I wrote it in 2015, there was no clear information available online on whether the letter was a scam or not.

The most popular technical article, Chrome 59/60+ font rendering in Linux, fixes subpixel font rendering on the Chrome or Chromium web browser in Linux. There was a similar issue with the Chrome version 69/70 upgrade as well.

Tag pages

The total number of tag page visits was 763, which is 1.56% of all pageviews. Top tags:

Tag Views
prolog 25
ebn 20
freelancing 19
slackware 19
tools 18
node.js 17
now 15
javascript 14
hardware 13
cycling 13

The traffic through the tag pages is negligible, and there are no interesting outliers to draw conclusions from.

Top referrals

The top referrals table does not include internal links.

Source Session count Session count (%)
No referrer 9276 28.46
Google* 20124 61.74
raspberrypi.org 281 0.86
duckduckgo.com 270 0.82
blog.hqcodeshop.fi 195 0.60
bing.com 194 0.60
linuxquestions.org 194 0.60
disq.us 186 0.57
skrblik.cz 147 0.45
facebook.com 146 0.44

*Google has a massive number of domains. All these are summed here. Compared to the previous stats period, t.co (Twitter) is missing.

The majority of traffic comes from the Google search engine. Other traffic sources are specific articles shared on online discussion sites.
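Summing all Google domains into one row could be done along these lines. This is a rough sketch and not the query actually used for the table: the function names are hypothetical, and the pattern for recognizing Google hosts is a heuristic that assumes domains of the form google.com, google.de, or google.co.uk, with or without subdomains. Filtering out internal links is left out.

```javascript
// Maps a referrer URL to the source label used for counting.
function referrerSource(referrer) {
  if (!referrer) {
    return 'No referrer';
  }
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (err) {
    return 'No referrer'; // Unparseable value, treated as missing.
  }
  host = host.replace(/^www\./, '');
  // Heuristic (assumption): "google." followed by a 2-3 letter TLD and an
  // optional 2-letter country part, e.g. google.com, google.co.uk.
  if (/(^|\.)google\.[a-z]{2,3}(\.[a-z]{2})?$/.test(host)) {
    return 'Google';
  }
  return host;
}

// Counts sessions per source, given one referrer string per session.
function countSources(referrers) {
  const counts = new Map();
  for (const referrer of referrers) {
    const source = referrerSource(referrer);
    counts.set(source, (counts.get(source) || 0) + 1);
  }
  return counts;
}
```

A maintained list of Google domains would be more precise than a pattern, at the cost of having to keep it up to date.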

Client technology

Platform Session count Session count (%)
Win32 17920 54.98
Linux x86_64 6005 18.43
MacIntel 3748 11.50
Win64 1114 3.42
Linux armv8l 1112 3.41
iPhone 1049 3.22
Linux armv7l 929 2.85
iPad 266 0.81
Linux i686 175 0.54
Linux aarch64 83 0.25

I also have the visitors' raw user agent strings, but extracting the browser name and version from them would be too much work.

I find it surprising that there is a sizable number of iPhone and Android (Linux armv8l and Linux armv7l) visitors.

Bot detection

As my analytics solution is unique, it is not targeted by attacks like the Measurement Protocol spam. The only sources of false results are occasional visits by scriptable browsers such as Headless Chrome and PhantomJS. There are also a number of web crawlers that execute JavaScript. From the analytics, I can find these (name containing Bot, Crawler, or Spider):

Bot Session count Average session length (s) Average page count per session
Baiduspider/2.0 677 1.53 1.00
YandexMobileBot/3.0 198 3.26 1.01
Googlebot/2.1 159 231.41 45.88
IndeedBot 1.1 115 596.00 115.00
YandexBot/3.0 109 4.85 1.00
Applebot 0.1 102 2.76 1.00
AhrefsBot/5.2 92 3.00 1.00
Sogou web spider/4.0 79 2.53 1.00
Baiduspider-render/2.0 72 2.90 1.00
DnyzBot/1.0 27 7.44 1.00

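We can see an anomaly in the table: both Googlebot and IndeedBot have a massive page count per session. This is due to their JavaScript Math.random() being deterministic. The analytics uses this function to generate unique session identifiers, so separate visits by these bots can end up sharing an identifier and get counted as a single long session.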
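The name filter and the columns of the bot table could be reproduced roughly as follows. This is a sketch under assumptions: the session record shape (`name`, `seconds`, `pages`) and the function names are invented for illustration, and the match is made case-insensitive because names such as Baiduspider use lowercase.

```javascript
// Matches names containing Bot, Crawler, or Spider in any letter case.
const BOT_PATTERN = /bot|crawler|spider/i;

function isNamedBot(name) {
  return BOT_PATTERN.test(name);
}

// Groups bot sessions by name and computes the session count, the average
// session length, and the average page count per session.
// Each session is assumed to look like { name, seconds, pages }.
function summarizeBots(sessions) {
  const byBot = new Map();
  for (const session of sessions) {
    if (!isNamedBot(session.name)) {
      continue;
    }
    const entry = byBot.get(session.name) || { sessions: 0, seconds: 0, pages: 0 };
    entry.sessions += 1;
    entry.seconds += session.seconds;
    entry.pages += session.pages;
    byBot.set(session.name, entry);
  }
  return [...byBot]
    .map(([name, e]) => ({
      name,
      sessions: e.sessions,
      averageSeconds: e.seconds / e.sessions,
      averagePages: e.pages / e.sessions
    }))
    .sort((a, b) => b.sessions - a.sessions);
}
```

A filter like this only catches crawlers that identify themselves by name, which is why headless browsers need a separate check.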

Headless browsers

The headless browser detection algorithm checks for the existence of specific global variables in JavaScript. The variables:

Variable Session count Session count (%)
window.callPhantom 731 2.24
window.Buffer 0 0
window.emit 0 0
window.spawn 0 0
window.webdriver 0 0
window.domAutomation 0 0

The window.callPhantom global variable indicates the PhantomJS headless browser.
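A check for the variables in the table can be sketched as below. The function name and the choice to pass the global object as a parameter are assumptions made for illustration. The property names come from the table, and how the actual analytics script reports the result is not shown in the post.

```javascript
// Property names from the table above.
const HEADLESS_MARKERS = [
  'callPhantom',
  'Buffer',
  'emit',
  'spawn',
  'webdriver',
  'domAutomation'
];

// Returns the marker names that exist on the given global object.
// In a browser the argument would be `window`.
function detectHeadlessMarkers(globalObject) {
  return HEADLESS_MARKERS.filter((name) => name in globalObject);
}
```

In a page this would be called as `detectHeadlessMarkers(window)`, and a non-empty result would be stored with the session. The check has to run in the browser: a server-side JavaScript global object can legitimately contain names such as Buffer.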

Feed access

The only source of statistics for the Atom feed is the web server access log. I can see that the feed is periodically fetched by the Gwene and Feedly services. I do not know how many readers are behind those, and I have no way to figure it out.

Yearly growth

Year Sessions
2018 12387
2017 15283
2015 8000
2014 4729
2013 1067

I have lost the data for 2016, as I switched from one analytics solution to another in that year. The data for 2018 is incomplete and does not include December.

Summary and the future

This is my second large overview of the blog statistics. I have concluded from the data above:

  • The blog is mostly reached through Google search;
  • Visitors are looking for specific information;
  • Most-read articles:
    • provide crucial information;
    • fix technical problems.

It is very likely that the blog will keep going in the same direction. One of its old goals was to also be a marketing channel for my consulting activities. It proved to be inefficient, especially compared to networking, and I have abandoned this idea. It does serve as a nice online overview of my past projects when someone is interested in that.

Ads

I have been asked to include sponsored links on the blog, but I think the blog is too small for that. I also don't see the blog having random ads from ad networks. If anything, something in the form of paid reviews might appear. In any case, I want to retain total control over the content.

Future analytics

I'm satisfied with neither Google Analytics nor my current solution. The current solution was built as an experiment to detect bots anyway. The only nice thing about it was the use of an SQL database, which made analytics really easy. I would like to see something tightly integrated with the blog engine.

