For those not in the know, Distrowatch is an excellent site that tracks the latest releases of dozens of different Linux and BSD distributions. For each distro, there is an information page. There is also a table on the main page (right) showing how many page views each information page gets per day.
While I have no problem with the presence of the table, I have serious reservations about the way some people interpret the data. I’ve begun to write this post so many times and never gotten around to finishing it. The reason it keeps floating to the top of my to-write pile is the volume of people flagrantly bandying H.P.D. statistics around as if they mean something in the real world.
I’m going to say a few things that sound an awful lot like "Distribution X isn’t really the most popular" but I’m going to try and give some plausible, even logical evidence to back those statements up so please don’t skip straight to the comment section in an attempt to call me a Distribution Y fanboy. Let’s see where this journey takes us…
What do the H.P.D. figures really mean?
H.P.D. stands for hits per day. This count represents how many unique computers have visited a distro’s information page. This means that if one person sits on the Ubuntu information page and reloads over and over again, only one vote is tallied.
So in short, H.P.D. stats show how many people are visiting a certain Distrowatch page.
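To make the reload example concrete, here is a rough sketch of how that kind of counting could work. I don't know how Distrowatch actually implements it; this assumes a "unique computer" is identified by its IP address and that the count is kept per page per day. The log entries, addresses and the `unique_hits_per_day` function are all made up for illustration.

```python
from collections import defaultdict


def unique_hits_per_day(requests):
    """Count distinct visitor addresses for each (day, page) pair.

    `requests` is an iterable of (day, ip, page) tuples. Identifying a
    visitor by IP address is an assumption, not Distrowatch's known method.
    """
    visitors = defaultdict(set)
    for day, ip, page in requests:
        # A set ignores repeats, so reloading the page adds nothing.
        visitors[(day, page)].add(ip)
    return {key: len(ips) for key, ips in visitors.items()}


# Hypothetical request log: one visitor reloads the Ubuntu page three times.
requests = [
    ("day1", "10.0.0.1", "ubuntu"),
    ("day1", "10.0.0.1", "ubuntu"),
    ("day1", "10.0.0.1", "ubuntu"),
    ("day1", "10.0.0.2", "ubuntu"),
    ("day1", "10.0.0.1", "pclinuxos"),
]

hits = unique_hits_per_day(requests)
print(hits)
```

Four requests to the Ubuntu page collapse into two hits because only two distinct addresses made them. Note that the count says nothing about where those visitors came from or why they clicked.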
Why doesn’t that equate to popularity?
For starters, as I’ve just said, all it conclusively tells us is how many people are visiting the page. It doesn’t break that figure down into people who are just coming from inside distrowatch.com, search engine hits or even the community forums for a distribution (in an effort to game the statistics).
I mention search engine hits because I believe them to be highly enlightening to the whole argument. Yes, I know this is fuzzy science but I believe there is logic behind my thinking. Let’s compare the first two distributions in the table on Google. For quick comparison, I’ve highlighted the position of Distrowatch in green.
As you can see, the PCLinuxOS information page features a lot higher up. The Ubuntu info page falls way below "the fold" (most people would have to scroll down to see it) and is therefore a lot less likely to get visited from that search.
To back this up with even fuzzier statistics, we can look at Google Trends (as Seopher did when trying to argue the other way). Ubuntu gets so many more searches than PCLinuxOS that the latter doesn’t even peak off the chart’s baseline. Assuming Distrowatch appears in equal measure in these searches, the position has a massive effect on the likelihood of clickage.
You also have to take into account what purpose people have in mind when they click through to the information page. Quite simply: they want to find out more about the distribution. So what are you getting at, Captain Obvious?
Lots more people know what Ubuntu is all about. For Linux newcomers I would go as far as saying that Ubuntu is probably a synonym for Linux, just as many people assume a PC runs Windows. It’s many people’s first experience of the wide world of Linux distros. From that assumption you could conclude that fewer people click through to Ubuntu (or other, older distributions) because they’re well known.
The same applies for both people on search engines and people already on the Distrowatch site.
Put simply, some distributions have users who like them too much. They’re willing to be militant in the ways they promote their favourite distribution in the hope that they can get it more attention, including organising community members to visit polls, the Distrowatch page and other sites to inflate its statistics.
The proof required to make such a claim usually evaporates shortly after being discovered but plenty of people in this thread have anecdotal evidence.
It wouldn’t be so bad if it was just a positive campaign — it would still wreck the DW stats — but for the most part where there are people promoting, there are people smearing any criticism. I was lynched when I dared mention Ubuntu while reviewing another distro because they thought it was an irrelevant comparison. Sorry but if I think something is done better elsewhere, I’m going to tell you all about it.
Giving an example of something done better helps people differentiate between distributions, and it also helps developers see what they should borrow for their own distribution.
Anyway, you might argue that this should apply to all distributions and that all the fanboys cancel each other out but I believe that to be untrue. People just seem to get more fanatical about some distros than others. Who in their right mind would get their knickers in a twist over Red Hat Enterprise Linux?
What can you use these statistics to show?
Not too much, or at least not accurately. The fact is there are too many variables in the mix to say more than "Distribution X’s Distrowatch information page gets more hits than Distribution Y’s". People who try to use them as the prime basis for a distribution popularity value are misrepresenting the figures.
But why does it matter? Why do people insist on using irrelevant things like estimated popularity to compare distributions? If you’ve got something good to say about something a distro does, say it! Conversely, if there’s something you think spoils the finished product, tell people.