Tuesday, March 27, 2018

Why Statistics, Surveys, and Polls are Often Wrong.

Are surveys, polls, and statistics actual science or just junk science?

People like to criticize the United States and say that our electoral process is flawed.  But you have to admit our electoral process is certainly not predictable.  There were just two elections, one in Russia and one in Egypt, and no one doubted the outcome of those elections for even a nanosecond.  Whereas in America, it's still possible for a surprise winner whether you like it or not.

We have a real democracy (sort of) and the winners are not predetermined by the powers-that-be.  If that were the case, then Hillary would be President today.  So maybe President Trump is a good thing - it's proof that we do have a democracy or at least an electoral system that is not outright rigged.  And no, Russian influence on people isn't the same thing as rigging an election.  People listening to Russian trolls are just being foolish.  But the Russian trolls aren't pushing the voting levers in Pennsylvania, Indiana, Ohio, and Wisconsin - people are.   Foolish people, but people nevertheless.

On the eve of that election, the polls showed that Hillary would win in a cake walk.  Granted, she did win the popular vote, but pollsters also know about the Electoral College - and they still predicted she would win.  The states that she lost - the critical "blue wall" - turned out to be anything but blue.  How could they get this so wrong?

Statistics, surveys, and polls are often wrong - dead wrong - for a variety of reasons.  Surveys are particularly problematic as they often rely on self-reported data. And when you conduct a survey, it's almost impossible to get a cross-section sample of a population group because the very nature of the survey selects certain people.

For example, if you called me on the telephone saying you're conducting a survey, I probably would hang up on you. Even if it was a legitimate survey, I would be skeptical that it wasn't some sort of scam designed to take my money. And moreover I'm a busy person. I have better things to do than answer a bunch of stupid questions. And often, surveys go on and on with far too many questions. If in written form, they often go on for page after page. I noted before when we bought our pickup truck we received this enormous survey from some Automotive Quality Group that asked increasingly personal questions. After I reached page five, I realized I was being an idiot, and threw the entire thing in the trash. And that's the other problem with surveys - many of them are just bogus and an attempt to harvest your demographic data.

Surveys tend to be answered by people who are either lonely or are older or have a lot of time on their hands or really think that "their opinion matters". Younger people are more jaded today and don't believe somebody calling them on a landline with a survey.  Of course, most young people today don't even have a landline. So the very nature of a survey ends up being a filtering factor. The people answering the survey are not a representative sample of the population as a whole.
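To see how much that filtering can move a result, here is a rough sketch in Python. Every number in it is made up for illustration (the two groups, their sizes, how often they pick up the phone, and how they would answer), so treat it as a toy model under those assumptions and not as data from any real poll.

```python
# Hypothetical groups, invented for illustration only:
# (share of population, chance of answering the survey, share who would say "yes")
groups = {
    "older, has landline": (0.30, 0.20, 0.60),
    "younger, mobile only": (0.70, 0.02, 0.40),
}

def true_yes_share(groups):
    """Share of the whole population that would say yes if everyone answered."""
    return sum(pop * yes for pop, _, yes in groups.values())

def surveyed_yes_share(groups):
    """Share of actual respondents who say yes, given unequal response rates."""
    respondents = sum(pop * resp for pop, resp, _ in groups.values())
    yes_respondents = sum(pop * resp * yes for pop, resp, yes in groups.values())
    return yes_respondents / respondents

print(f"whole population: {true_yes_share(groups):.1%} yes")
print(f"survey result:    {surveyed_yes_share(groups):.1%} yes")
```

With these invented inputs, the survey overstates the "yes" share simply because the group that answers the phone leans that way. Nobody had to lie for the result to be off.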

But worse than that, the people who answer the survey often lie. I noted before that when surveyed, 70% of Americans claim they pay off their credit cards every month.  The credit card companies, which have computers which track actual data, show that 70% of Americans carry a balance. Clearly a group of at least 40% are lying to themselves or to the pollsters.
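The 40% figure is just arithmetic, and a few lines of Python make it explicit. This sketch assumes both 70% figures describe the same population of cardholders, which is how I am reading them. The function name is my own invention for illustration.

```python
def min_overlap_pct(claimed_pct, observed_pct):
    """Smallest possible share of people who must be in both groups.

    If two groups together add up to more than 100% of the same
    population, the excess has to be people counted in both.
    """
    return max(0.0, claimed_pct + observed_pct - 100.0)

claim_pay_in_full = 70.0  # survey figure quoted above
carry_balance = 70.0      # card-company figure quoted above

floor = min_overlap_pct(claim_pay_in_full, carry_balance)
print(f"At least {floor:.0f}% claim to pay in full yet carry a balance")
# prints: At least 40% claim to pay in full yet carry a balance
```

Note that 40% is a floor, not an estimate. The real overlap could be larger, but it cannot be smaller if both figures are right.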

And this is the problem with things like focus groups as well. You get together a bunch of people in a conference room and ask them what they think of the new Pontiac Aztek.  They  make a bunch of half-ass suggestions, and if you followed them, you'd end up with a car that is so ugly and unsalable no one would end up buying it - not even the people in the focus group. The focus group people have no skin in the game, and they're not obligated to buy the car, so they're going to say ridiculous things - often egged on by other people in the group. Sure, let's put a tent in the hatchback.  What could possibly go wrong? 

So that's the big problem right off the bat - bad data. It's hard to harvest data in a manner which is entirely "clean" that obtains accurate data for an entire population or a truly representative sample of that population. Granted, today, with computers and social media and whatnot online, it's getting easier to track people's actual purchases and actions which is a far greater indication of their real feelings than what they say in a survey or poll.

But even then, the data can be flawed. As I noted in my knitting experiment posting, I went on to YouTube and liked nothing but knitting videos.  Even today, YouTube is convinced that I'm a knitting fanatic.  And even though it's gone back to World War II bombers and hot rods of the 1960s (for the most part) it still offers suggestions for knitting videos - nearly a year later.

The crude algorithms used by many commercial programs assume that since you bought a washing machine that all you're interested in is buying washing machines. However, these are usually purchases you make once, and you're not interested in buying again and again - or at least you hope not to, in less than 15 year intervals.

So you have to find a way to collect accurate data - either exact data for every person in the population you are sampling, or a truly representative sample of that population - and make sure that the answers are not self-serving or false or otherwise skewed or distorted.

Then, your next task is to make sense of that data - to show that there is some correlation between the data and some other phenomenon, and then show that there's a causation between these two events. And here's where it gets tricky.  Because oftentimes the surveys are conducted with the answer already in mind. The researcher is doing research to prove a particular theory.  And thus, they fit the data to the theory rather than the other way around.  And oftentimes this research is funded by a company which has a vested interest in having the results come out a certain way.

It is human nature - present in all of us. Clients would call me up and ask me if they should get a patent. They were already convinced they should get a patent, they just wanted me to validate their decision.  And they would selectively give me information steering me to agree with their decision that they should get a patent.  But when I pressed them for more details, I would find out that somebody else already patented the invention, or they didn't really invent it, or they had abandoned the invention or whatnot.

A reader who is an accountant complains of the same thing with regard to his clients. They call him and ask him if they should take Social Security at 62, 65, or 70.  They've already made up their minds, they just want the accountant to tell them what they already believe.  So they give him selective facts to steer him to the same decision - so they can blame him later on if it goes horribly wrong.

It's human nature for us as people to seek out validation of our preconceived notions.  And it's also human nature for us as professionals giving advice to want to please people and give them the advice they want to hear. This is the sort of thing that usually ends up causing trouble for people and corporations.  "Sure you can steer the Titanic at high speed through an iceberg field - at night! - what could go wrong? Full speed ahead, Captain!"  Nobody wants to be the naysayer and say, "Gee, maybe something could go horribly wrong!"

In Academia there's another effect, publish or perish.  Professors at universities don't get tenure by not publishing papers in scholarly journals.  They need to get noticed.  Thus, they have to keep coming up with "startling new research" to justify their hefty salaries and pensions.  So when you do "research" you want to have something that will make a big splash in the media or at least generate a couple of articles - preferably with quotes from you and your associates.

Again, this is human nature.  I'm not necessarily criticizing professors and their ilk for doing this - they have to in order to keep their jobs.  So they come up with startling revelations based on statistical data - revelations which may or may not be true.

And one way to fudge the data is to change your definition of terms.  As I noted before in an earlier posting about homeless children, the federal government claims that homelessness for children encompasses living with somebody else other than your parents.  Thus, for example, if a child is living with his grandmother because his mother is a crack addict, then he is deemed to be homeless.  A child living in an RV park, even if they're in a multimillion-dollar bus motorhome, is deemed to be homeless.  A child living in a motel that's provided by the city as part of housing for the poor is deemed to be homeless.  Based on these definitions it's damn hard not to be homeless!

When you throw in the specious definitions of homelessness, you can come up with some alarming statistics that something like one out of five children is homeless or some sort of nonsense like that. And this helps your department because then you can get more funding from Congress when you go before them and the bright lights and cameras, and tell them with a tear in your eye about all the homeless children in America who are forced to live with their grandparents.  And Senator Klaghorn, not wanting to appear to be a complete asshole, will vote for another billion dollars for your department to combat homelessness.

And that's just one example.  Every other organization, government agency, corporation, and whatnot does the same thing.  Defense contractors regularly crank up paranoia about Russian and Chinese aircraft carriers, even though we have more aircraft carriers than the rest of the world combined, times two or three.  But reality doesn't get funding for yet another aircraft carrier, so you get people all riled up about the Chinese taking over.  And some fake CGI pictures of the "new Chinese Carrier!" certainly help crank up the angst.

So where does this leave us?  Is all of this data false or "fake news" as our President likes to say?  Not necessarily.  You just have to filter all this information, and not just take it on faith that what is being reported in the press is actually true.  The press likes click-bait. Back in the day, they wanted to sell newspapers - today, they want your eyeballs.  Do research of your own online, and use your common sense. Discount anything from wildly radical politicized websites and for God's sake, stop listening to memes and all that crap!   Stop believing what is convenient for you to believe and confront harsh realities, instead.

In a previous posting I took a dump on two professors, one from Ithaca, New York and the other from Australia, who claimed that taking early retirement causes you to die.  I pissed on this because it's typical of the sort of bullshit statistics and pseudoscience that is being promulgated today.  Sociology is not a science, just advanced navel-gazing.  And the ultimate conclusion from that "study" - in addition to being questionable - is really worthless and not really research.  Because the information is not really helpful to anyone, from people contemplating retirement, to policy makers, to insurance companies or whatever.  As I noted in that earlier posting, people usually retire early because they have to - they have health issues that force them to retire, or are laid-off in their 50's and never work again.   Telling them they are going to die a premature death, based on some sketchy observations, isn't helping much.

I suppose, however, that policy makers might take this data and encourage people to retire early.  After all, the sooner people die, the sooner they stop collecting social security, right?

But of course, the authors of the study had to publish something in order to justify their existence. And I'm not sure why somebody from Australia is an expert on American Social Security. Apparently you have to go to Australia to get an objective view of what's going on. Or perhaps they are offering junk degrees on American Social Security studies at Australian universities.

Why does it matter?  It matters because people read about these surveys and studies in the press and take the information to heart.  They stop eating one thing and start eating another, because some "study" said that margarine made with trans-fats was better for your heart than real butter.  Whoops.   Pseudo-science is misleading and causes people to lose faith in science.   You've heard it before, I am sure, people saying, "well those 'scientists' said we should all stop eating this, and start eating that, and then they changed their minds - again!"   And the problem isn't "scientists" but pseudo-scientists publishing statistical nonsense.   The real dirt hasn't changed much, all along.

Pseudo-science is not science.  And surveys and statistical studies are often the stuff of pseudo-science.  When you see some article that is based on surveys, statistics, or polls, bear that in mind.  Someone is trying to convince you of something, in a way that shuts down all conversation or discussion.   After all, surveys, polls, and statistics can never be wrong, right?

Wrong!