Do I get to be grumpy on my own blog?
I studied statistics more than a half century ago, taught it, and worked with numbers for dozens of years in my working life. That is the basis for my grump.
One particular statistical process was awesomely powerful and would make each of our lives better if it were used more often – but I rarely see it used. It is statistical branching for reduction of variance. It is easy to explain.
Start with a survey questionnaire, designed to match couples, that has been filled out by a group of people. A few years after the matching, take the questionnaires from couples who didn't last more than a year and, as a separate batch, the questionnaires from couples who did. What you are trying to find out is the set of questionnaire responses that separates the two groups.
The outcome will not be the crap you get in supermarket magazines, the ten attributes of happy couples. What you get will be clusters of attributes that occur together in happy couples.
For example: couples who love dogs, focus on outdoor activities, like parties and love home-cooked meals. Couples who love cats, like long walks, romantic movies, sports cars and late-night TV. Those are clusters, not top ten attributes. Those distinct clusters are much more like real humans.
If statistical branching for reduction of variance were used more often, websites would be good at guessing what movies, books, and music we like, instead of making the stupid recommendations we get now.
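The math is easy. Variance measures how spread out a set of numbers is: find the average, take each number's difference from that average, square each difference, and add the squares up. (Statisticians divide that sum by the count, or the count minus one, to get the variance itself; for comparing splits, the sum is what counts.) Take two numbers: 4 and 9. The average is 6.5, each number is 2.5 away from it, 2.5 squared is 6.25, and 6.25 + 6.25 is 12.5, a measure of variance. Take 4 and 6: the average is 5, each number is 1 away, 1 squared is 1, and 1 + 1 gives a measure of variance of 2. The closer together the numbers, the smaller the total.

The process for reduction of variance is to take a whole pile of numbers and get the sum of all the variance. Then split the pile into two groups and see if the sums for the two groups, added together, are less than the total for the whole pile. This is done for every possible split, on a computer, and the split that lowers the total the most is kept. Then each group is split the same way, and again, and you get a result that looks like branches on a tree.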
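Here is a minimal Python sketch of that arithmetic, assuming the sum of squared differences from the average is the "variance" total being compared. The pile `[2, 3, 4, 10, 11, 12]` is made up for illustration. For a single list of numbers, the best two-group split always keeps the low numbers together and the high numbers together, so it is enough to try cutting the sorted list at each point.

```python
def sum_sq_dev(values):
    """Sum of squared differences from the average (the variance total)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values):
    """Try every cut of the sorted pile into a low group and a high group.

    Returns (low_group, high_group, total) for the cut whose two group
    totals, added together, are smallest.
    """
    ordered = sorted(values)
    best = None
    for cut in range(1, len(ordered)):
        low, high = ordered[:cut], ordered[cut:]
        total = sum_sq_dev(low) + sum_sq_dev(high)
        if best is None or total < best[2]:
            best = (low, high, total)
    return best


print(sum_sq_dev([4, 9]))  # 12.5
print(sum_sq_dev([4, 6]))  # 2.0

pile = [2, 3, 4, 10, 11, 12]  # hypothetical pile of numbers
low, high, total = best_split(pile)
print(sum_sq_dev(pile), low, high, total)  # 100.0 [2, 3, 4] [10, 11, 12] 4.0
```

The whole pile has a total of 100; cutting it into `[2, 3, 4]` and `[10, 11, 12]` brings the summed total down to 4, which is the reduction of variance the splitting is looking for.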
You follow your answers to the questionnaire out along the branches of the tree, and when you get to the end of a branch, you will find the people on that same branch: the people who are like you.
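A minimal sketch of growing such a tree from questionnaires and then following one person's answers out to the end of a branch. Everything here is a hypothetical assumption for illustration: the eight couples, the three yes/no questions (`dogs`, `parties`, `late_tv`), and coding "lasted more than a year" as 1 and "did not" as 0. At each branch it picks the question whose yes/no split gives the smallest summed variance, and it stops when no question lowers the total. Methods of this kind are what are now usually called regression or decision trees; this is a toy version, not any particular package's algorithm.

```python
# Hypothetical data: (couple, yes/no answers, 1 = lasted more than a year, 0 = did not)
COUPLES = [
    ("A", {"dogs": True, "parties": True, "late_tv": False}, 1),
    ("B", {"dogs": True, "parties": True, "late_tv": False}, 1),
    ("C", {"dogs": True, "parties": True, "late_tv": False}, 1),
    ("D", {"dogs": True, "parties": False, "late_tv": False}, 0),
    ("E", {"dogs": False, "parties": False, "late_tv": True}, 1),
    ("F", {"dogs": False, "parties": True, "late_tv": False}, 0),
    ("G", {"dogs": False, "parties": True, "late_tv": False}, 0),
    ("H", {"dogs": False, "parties": False, "late_tv": False}, 0),
]
QUESTIONS = ["dogs", "parties", "late_tv"]  # hypothetical questions


def sum_sq_dev(values):
    """Sum of squared differences from the average."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def outcomes(rows):
    return [outcome for _, _, outcome in rows]


def grow(rows, questions):
    """Split on the question that most reduces summed variance, then repeat on each side."""
    best = None
    for q in questions:
        yes = [r for r in rows if r[1][q]]
        no = [r for r in rows if not r[1][q]]
        if not yes or not no:
            continue  # this question does not divide the group
        total = sum_sq_dev(outcomes(yes)) + sum_sq_dev(outcomes(no))
        if best is None or total < best[0]:
            best = (total, q, yes, no)
    # No question lowers the total: this group is the end of a branch.
    if best is None or best[0] >= sum_sq_dev(outcomes(rows)):
        return {"leaf": rows}
    _, q, yes, no = best
    rest = [x for x in questions if x != q]
    return {"question": q, "yes": grow(yes, rest), "no": grow(no, rest)}


def follow(tree, answers):
    """Walk out along the branches using one person's answers; return the couples there."""
    while "question" in tree:
        tree = tree["yes"] if answers[tree["question"]] else tree["no"]
    return tree["leaf"]


tree = grow(COUPLES, QUESTIONS)
me = {"dogs": False, "parties": False, "late_tv": True}
print([name for name, _, _ in follow(tree, me)])  # ['E']
```

With this made-up data the tree splits first on `dogs`, then on `parties` for the dog lovers and on `late_tv` for everyone else. The branches where couples lasted are clusters (dogs plus parties; no dogs plus late-night TV) rather than a top ten list.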
Three cheers for statistical branching for reduction of variance.