Annual Survey Griping Post [Nov. 26th, 2012|11:07 pm]
Running commentary of anger as I analyze results from the Less Wrong Survey:

Ten duplicate entries.

Respondent 117 (now deleted) gave FUCK YOU as an answer to about half the questions.

One person not giving a random number for the "Is the tallest redwood tree taller or shorter than your randomly generated number" question, but still declaring that the tree was shorter.

Two people started the survey before it was open, even though it said in huge letters "DO NOT START SURVEY, IT IS NOT OPEN YET" and even though the first question on the survey was "Do you understand the survey is not yet open and that you should not take it?" One of these people had IQ 140. He ticked "Yes" to the "Do you understand..." question.

Two people refused to answer any question denominated in "feet" because they claimed not to know what feet were or how they converted to meters. If only there was some sort of globe-spanning computer network that might be able to offer this information!

One person gave their home country as "Australia Germ"

On the "how long have you been in this community, please give your answer in months" question, four people answered in years (I know because they put the word "years" after their answer). About 50 people put the word "months" after their answer, even though I said SO MANY TIMES not to do that, forcing me to go through and delete before the stats program would think of it as a number. A bunch of people tried to write in "less than...", which got cut off because I only put a few characters in that text box BECAUSE THEY WERE SUPPOSED TO PUT A NUMBER.

Also, one person said they'd been in the community "15 minutes". Even if that's true, can't you phrase it as .0003 months?

Also, someone either spent 2010 months in the community or didn't read the question. "Why does my mean length of time in community keep coming out so h...oh, yes, that makes sense."
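
For anyone cleaning a similar column, here is a minimal sketch of the by-hand step described above: strip a trailing "months", convert an explicit "years" answer, and flag everything else for manual review. The function name `parse_months` and the unit rules are illustrative assumptions, not what was actually used on the survey data.

```python
import re

# Hypothetical helper: accepts a bare number, optionally followed by a unit word.
_ANSWER = re.compile(r"^\s*(\d*\.?\d+)\s*(months?|years?|yrs?)?\s*$", re.IGNORECASE)

def parse_months(raw):
    """Return the answer in months as a float, or None if it cannot be parsed."""
    match = _ANSWER.match(raw)
    if match is None:
        return None  # e.g. "less than", "15 minutes": needs a human
    value = float(match.group(1))
    unit = (match.group(2) or "months").lower()
    if unit.startswith("y"):
        value *= 12  # answers given in years become months
    return value
```

Note that this cannot catch an answer like "2010": it is a perfectly good number, just not a plausible count of months, so an outlier check would still be needed.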

It's hilarious to answer a question asking for a number with "9000" or "9001". Please, keep doing it. The statistics program knows that it's a joke and doesn't actually try to work that number into the mean at all.

If I ask for the average amount of time per day you spend doing something, please don't put 10-30. That's not an average, that's a range. The average of 10-30 is 20. This number also has the advantage of being machine-readable without getting evaluated to negative 20 or making me go back and change it by hand.
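
The range-to-midpoint substitution above can be sketched as follows. This is an assumption-laden illustration: `midpoint_or_number` is a made-up name, and replacing a range with its midpoint is a judgment call rather than a standard rule.

```python
import re

# Matches answers shaped like "10-30" or "2030 - 2150" (non-negative numbers only).
_RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*$")

def midpoint_or_number(raw):
    """Return the midpoint of a range, a plain number, or None if neither fits."""
    match = _RANGE.match(raw)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return (low + high) / 2
    try:
        return float(raw)
    except ValueError:
        return None  # e.g. "very little"
```

Treating the hyphen as a range separator before attempting a numeric conversion is what keeps "10-30" from being read as a subtraction.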


You know what sort of entity I can do statistics on? Numbers! "very little" IS NOT A NUMBER!

Numbers given in the "In what year will the Singularity happen?" question: 10000, 34000000000, "2030-2150", "if possible 2500 to", "don't have sufficien". You may have noticed that the last three were not, technically, numbers. If so, give yourself a pat on the back as you are smarter than my survey respondents.

How many hours a day do you spend writing?: "I don't write". This would be what we mathematically gifted people call "0".

How many hours a day do you spend writing?: "5/10/2012." I admit this contains numbers, but I think you can aspire to do even better.

How many hours a day do you spend writing?: "6/7/2012." Huh. I can't imagine two people would answer with dates. Maybe if you write 6/7, Google autocompletes it to be a date during this year? Well, serves you right for not just putting 6.5.

Number of languages: "Two". Okay, you caught me, I should have put "numeral of languages".

Number of languages: "0". This person doesn't even have a clue what he's reading, and he still managed to follow instructions better than the bilingual guy above (though to be fair, the person above never claimed one of his languages was English).

Number of languages: "Hebrew". Okay, let me put this in terms you can understand:
עברית היא לא מספר מזוין !!!

How much do you give to charity: "dont". THERE IS A NUMBER FOR THAT!

I soooo fail to understand this program's logic in making Political Compass Left/Right a number but Political Compass Authoritarian/Libertarian a string. Unless...wait, someone gave a reasonable answer for the first, but "W" for the second. Okay. "On a scale of one to ten, how do you..." "W! The answer is W!"

Binning websites (for the "what website first referred you to Less Wrong?" question) is no fun. Program counts tvtropes, TVTropes, TvTropes, TvTropes.org, www.tvtropes.org, http://tvtropes.org, and so on as all being different websites until I tell it not to. And that doesn't even get around to the people who think they can write "That one site with the funny stuff about TV shows". Also, "xkcd forums" vs. "xkcd fora".
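
A rough sketch of how the spelling variants above could be collapsed before counting. The normalization rules and the name `normalize_site` are assumptions for illustration; free-text descriptions and "forums" vs. "fora" would still need to be mapped by hand.

```python
import re
from collections import Counter

def normalize_site(raw):
    """Reduce a referrer answer to a lowercase site name without URL decoration."""
    site = raw.strip().lower()
    site = re.sub(r"^https?://", "", site)        # drop the scheme
    site = re.sub(r"^www\.", "", site)            # drop a leading www.
    site = site.rstrip("/")                       # drop trailing slashes
    site = re.sub(r"\.(org|com|net)$", "", site)  # drop a common top-level domain
    return site

answers = ["tvtropes", "TVTropes", "TvTropes", "TvTropes.org",
           "www.tvtropes.org", "http://tvtropes.org"]
counts = Counter(normalize_site(a) for a in answers)
```

With these rules, all six spellings listed above fall into a single bin.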

Apparently Less Wrong referred 3 people to itself? Some sort of Moebius hyperlink?

Just to check if everyone understood how to do the percent questions correctly, I had a question "Have you read the instructions?" The two answers were "Yes, I understand I should answer in percentages between 0 and 100" and "No, I don't read instructions and am going to ruin the survey results for everyone." 1138 people (96.2%) chose the first option, and 12 people (1%) the second. That sort of puts those "X% of people believe Obama is secretly a Muslim" polls in a different light, at least to me.

So even though there was an "Unfriendly AI" checkbox, you had to ignore it so you could write in "Unempathetic AI". Annnnd one guy put "Natural aging process" as the disaster most likely to destroy the human race, which is so close to being a legitimate alternative reading of the question that I'm almost not annoyed. Almost.


...gah, this is a site where the average IQ is like 130 to 140, brilliant people in neuroscience and artificial intelligence, and yet I still find myself wishing some of these people were monolingual silence speakers.

From: anholt
2012-11-27 04:36 am (UTC)
In just one case I see there, it's possible people weren't being jackasses. Given that HPMOR is written by "Less Wrong" on fanfiction.net, and the author's notes point to the website pretty regularly, people that are new might not know who the author is and have put that down. I'd still bet on jackasses, though.

I'm seriously hoping I didn't screw up and put any alpha in with my numerics so as to contribute to your rage.
From: gwern branwen
2012-11-27 04:40 am (UTC)
Data cleaning & formatting is *always* the worst part.

> One person gave their home country as "Australia Germ"

I bet they were trying to write 'Australia Germany'. At least, I hope they were.
From: nancylebov
2012-11-29 12:18 am (UTC)
I hope so, too, but I'm leaving the possibility open that they meant Austria Germany.
From: cakoluchiam
2012-11-27 04:49 am (UTC)
You are my hero.

Are the original individual answers (of those who didn't mark a preference for them to remain confidential) going to be made available for the general public to do their own statistics on?

Also, Regular Expressions can help you get through that website formatting thing.
From: squid314
2012-11-27 05:08 am (UTC)
Yes, but I was planning on giving out only the cleaned-up version so that other people trying to work with the data wouldn't have to repeat my frustration. Can you think of a reason that would be a bad idea?
From: maniakes
2012-11-27 05:17 am (UTC)
No attempted SQL injection attacks? I am disappoint.
From: cakoluchiam
2012-11-27 08:24 pm (UTC)
Alternatively, the attack was successful and the 1150 entries Scott analyzed were only the ones input before the one with the attack (or after, depending on the method).
From: Mark Eichenlaub
2012-11-27 05:57 am (UTC)
Why not put some form validation into the survey? If you want a number, only accept a number, etc.
From: ciphergoth
2012-11-27 06:06 am (UTC)
Because it's done using Google Forms. Writing and hosting a web application for this would be *far* too big a job.
From: ciphergoth
2012-11-27 06:07 am (UTC)
Thank you for putting yourself through this *again* having already done it once. Now I really hope I avoided all of these mistakes :)
From: ipslore
2012-11-27 06:58 am (UTC)
A smarter kind of stupid, eh?
From: (Anonymous)
2012-11-27 07:59 am (UTC)
Thanks for taking a bullet for the rest of us. An occasional fun part of cleaning up data files is that you get to include amusing footnotes in your article, e.g. "One respondent who answered several questions with obscenities was excluded from analysis." Or, from an actual publication, "A male subject completely misunderstood instructions and his data were omitted from analysis."

"5/10/2012" and "6/7/2012" were probably entered as "5-10" and "6-7". I had one survey with a confidence interval question come back full of dates; now I always make sure to have two blanks. Although I would've expected "10-30" to become October 30th.

From: Joshua Fox
2012-11-27 08:32 am (UTC)


Almost all these can be prevented with good validation in the form processing, whether client-side or server-side.

It might be possible to do this without developing a full webapp, using some easy-to-use SaaS that is more sophisticated than Google Forms: SurveyMonkey, Wufoo, etc.

And there are plenty of software engineers in the LW crowd -- I'll bet you can find a volunteer next year.

Edited at 2012-11-27 08:42 am (UTC)
From: marycatelli
2012-11-27 01:27 pm (UTC)

Re: Software

How true.
From: cartesiandaemon
2012-11-27 08:58 am (UTC)
ROFL. That is quite hilarious, although I sympathise with the frustration!

Edit: I hope I didn't do any of those. The embarrassing thing is that I can very easily imagine doing so, in that if someone asked you that question colloquially, a non-numeric answer might well be most useful.

In fact, I'm trying to avoid victim-blaming by giving Scott lots of gratuitous speculation on how to avoid this sort of answer, because I think that might be useful, but also, (a) some of it is only useful if you can recommend how to make a form with more validation and (b) it WAS clear, so even if Scott can alleviate the problem, it's not his fault that it occurred.

In fact, an interesting question might be, is there any way of phrasing the form so it's more obviously going to be computer parsed, so even if people have been already told, it's intuitive to them that writing English answers won't help? (Another question would be, would anyone who knows regexes help tidy the data up?)

Edited at 2012-11-27 10:46 am (UTC)
From: naath
2012-11-27 09:57 am (UTC)
Ugh, some people...

I have been known to claim my first language is maths; but I'm not sure how "silence" is a language at all...

(I didn't do your survey because I don't really read LW)
From: cakoluchiam
2012-11-27 08:34 pm (UTC)
Dang it, I forgot to put "logic" as a language!
From: kerrypolka
2012-11-27 10:05 am (UTC)
...gah, this is a site where the average IQ is like 130 to 140, brilliant people in neuroscience and artificial intelligence

Well, indeed.

Are the IQs self-reported?
From: khoth
2012-11-27 10:08 am (UTC)
I think it's mostly self-reported results of random online IQ tests.
From: deiseach
2012-11-27 12:45 pm (UTC)
There was a survey? And I missed it? Aw, darn: I love filling in surveys! Though I probably would have made a mess of it, so it's as well I didn't know about it until this post.

I will now indulge in some smug triumphalist religious believer gloating over the foibles of self-identified smart people, being fed up to the back teeth with one too many self-identified smart people condescending to announce that smart people don't fall for that religion stuff, only dumb hicks.

Yes, I am a dumb hick, but that's not the point.
From: tremensdelirium
2012-11-27 02:06 pm (UTC)
This reminds me of my job 2-3 years ago. Although I was dealing with professionals being paid to fill out a form as their primary work product...

Basically the only answer is to make ill-formed data impossible, by forcing numeric/date responses when appropriate, and using drop-downs, checkboxes, and radio buttons for category responses. Free text is basically only worth it when you want to find out what "other" meant.
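
As a rough illustration of the "only accept a number" idea raised in this thread, a server-side check might look something like the sketch below. The function `validate_number` and its error messages are hypothetical; this is not a feature of Google Forms or of any particular survey tool.

```python
import math

def validate_number(raw, minimum=None, maximum=None):
    """Return (value, None) if raw is an acceptable number, else (None, message)."""
    try:
        value = float(raw.strip())
    except ValueError:
        return None, "Please enter a number only, with no units or words."
    if not math.isfinite(value):
        return None, "Please enter a finite number."
    if minimum is not None and value < minimum:
        return None, f"Please enter a number of at least {minimum}."
    if maximum is not None and value > maximum:
        return None, f"Please enter a number of at most {maximum}."
    return value, None
```

Rejecting the answer at entry time, with a message saying why, puts the correction in the hands of the person who knows what they meant.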
From: deiseach
2012-11-27 03:57 pm (UTC)
"Two people refused to answer any question denominated in "feet" because they claimed not to know what feet were"

That one are easy! "Feet" is the thingies what you wears your shoes on!

Re: form filling - oh, yes. Working in a local government (education) office, we get the grant forms, the applications for courses forms, the applications for employment forms, all kinds of forms as filled in by the public. Invariably, people get confused and put answers on the wrong lines, fill in the wrong sections, leave sections blank that need to be filled in, or are completely terrified of these being "official forms" to the point they ask us to fill them in for them (true story: an art teacher asked me to fill in an application form for her, seeing as how I was the clerical officer and knew how these things worked.)

Our national government in their wisdom (bless their little hearts) decided this year that instead of live persons handing out forms and taking them back and processing them, all in the name of greater efficiency and cost-cutting, they would put student grant application forms online.

That worked about as well as you'd expect: there's a two-month delay in processing the applications and getting the grants awards paid, and they've actually had to take on extra staff to speed up the process.

Yes, indeedy, as we could have told them if they'd asked. But nobody listens to the lowly paper-pushers on the coalface. (This is another reason why I'm so sceptical of the previous post quoting the libertarian stating her preference for the "smart, agenty physicist versus public health official", because I've seen these snazzy new ideas being put forward by consultants and outside experts, but nobody ever asks for input from those of us who deal with the public and know about the likely pitfalls.)

From: multiheaded
2012-11-27 02:09 pm (UTC)
God, that made my evening. Stay classy, LW!
