Lies, Damn Lies, And Now Statistics
We've been at the game at least since Democritus, and god knows I've made it a pet subject on the blog over the years. At what point, we wonder, does a pile of stuff instead become a collection of tiny things? When is it appropriate to break something up and consider it in terms of little constituent parts, and when are the small-parts contributions better thought of as a collective whole? Once you get past the philosophical wankery (and let's face it, I'm unlikely to), then I suppose there are really only two practical answers to that: when you can't avoid the quantized nature of things, and when the math is easier. I think it's a little bit funny that a few centuries of science based on analog mathematics (with continuous, nicely differentiable functions) finally coughed up everything digital (breaking it up into approximate chunks), and I have often found it fascinating how the same systems can be usefully described as discrete, continuous, and then discrete again, depending on what scale you dial in at, or what you're trying to prove. Electronics, for example: you start with quanta (electrons), which average out to make analog structures (semiconductor devices, let's say), and then put those together to make a logic network that'll do it all bitwise, allowing only ones or zeroes (I'm using one to write this). And sometimes your semiconductor theory gives you localized states to deal with; sometimes the analog nature of a transistor or diode is important. I think one reason that things like macroeconomics and evolution appeal to me is they're large-scale ensemble effects that are logical extensions of (well, evolution is anyway), but seemingly independent phenomena from, the things that make them up, which in those cases are our very lives.
Maybe you'll forgive me for dipping into this well yet again. I had to sit through a weeklong industrial statistics class earlier this month, and this is the sort of thing that I was daydreaming about (well, once I got tired of thinking up wiseass comments and imagining people naked). It was an effort to fuzz over the whole mind-crushing boredom of it all.
Most people loathe stats for the terminally dull math it throws at you, and that's a reputation that's probably deserved, but at least digging through the justifications and proofs has a way of adding a kind of legitimacy to the knowledge. Getting through it makes you feel like a smart person. That's not the class I took the other week. There, we trained with a computer program that runs all the equations behind the scenes--elegantly enough if you stick to the problems it was designed for (but what kind of engineer would I be if I did that?)--and the practical application got taught without drumming up even the mathematical gravitas you'd need to count back change. It's a well-oiled teaching method that gets across how to use a mathematical tool without any underlying idea of how the math might work, and okay, knowing how to use it is the take-home you'd want even if you did take the time to watch the gears turning, and the instructor did a good job of getting across what he tried to. But it's a special kind of tedium to spend a 40-hour week absorbing the knowledgeable huckster routine from someone you're pretty sure isn't as smart as you. Christ, it reminded me of those long-ago nights of sitting through driver's ed.
(Full disclosure: I had a stats class back in college that taught nothing of perceivable relevance whatsoever. It taught some math, but I didn't learn any of that either, or at least none of it stuck in my head beyond the final. I didn't feel the least bit smart, but still got an A. Not sure how that happened.)
Anyway, the dorky daydreams. It struck me that when you hit that border between chunky and creamy, where you can't really decide whether to count things up or do clean math on some variable, that is exactly where you get "statistics." Invoking a distribution function is exactly the point where you know damn well the data consist of tallied events but you're going to call it a smooth curve anyway, and statistical analysis is supposed to be what tells you whether that's worth doing and how legal it really is, whichever way things go. In the manufacturing world, one primary concern is sampling and measurement: it's an important question whether you can compare results from measurements that will vary, that is, whether the data are really telling you anything. We're all used to thinking like this, but most scientists I've known aren't terribly rigorous about considering error in their experimentation and data-gathering, although then again, we're usually more interested in understanding relationships that come from somewhere. More curve fitting, fewer t-tests.
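To make that chunky-vs-creamy border concrete (and to prove I absorbed something that week), here's a quick toy in Python. All the numbers in it are made up by me for illustration: tally some genuinely discrete events (coin-flip counts), see how well the smooth normal curve stands in for the pile of tallies, then compute a Welch's t-statistic by hand to ask whether two noisy measurement batches actually differ or are just varying.

```python
import math
import random

random.seed(42)

# Tallied events: count heads in n flips, over and over.
n, p, trials = 100, 0.5, 2000
counts = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

# The smooth-curve stand-in: a binomial has mean np and sd sqrt(np(1-p)),
# and for large n the normal curve with those parameters is "legal" to use.
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

sample_mean = sum(counts) / trials
sample_sd = math.sqrt(sum((c - sample_mean) ** 2 for c in counts) / (trials - 1))

print(f"smooth curve says: mean={mu:.1f}, sd={sigma:.2f}")
print(f"tallied data says: mean={sample_mean:.1f}, sd={sample_sd:.2f}")

def welch_t(a, b):
    """Welch's t-statistic: (difference of means) / (combined standard error)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Two batches of measurements that vary; is the difference real?
batch1 = [random.gauss(10.0, 0.5) for _ in range(20)]
batch2 = [random.gauss(10.2, 0.5) for _ in range(20)]
print(f"Welch t = {welch_t(batch1, batch2):.2f}")
```

The point of the toy is the first two print lines: the discrete tallies and the continuous curve agree almost exactly, which is the whole license for doing clean math on chunky data.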
I'll leave the economist-bashing aside today and note that as researchers, if we're chasing something like the scientific method, then we have some working assumptions and models going in. We have some prior experience, sometimes whole fields of it, of how things tend to relate, might relate, or fucking well better relate. One of the most annoying things that got pushed in the class, and that I know is used in industrial research, is the development of "models" through statistical design of experiments. The idea is to throw a bunch of ingredients together in a way that best infers dependencies, which is a neat scientific tool, and sometimes exactly the right one, but the problem is that it offers no real understanding. It is meant to address the what, but utterly leaves off the why. [I feel better about things like evolutionary algorithms, where a solution is chased down through randomly mutated generations; maybe you don't know the intimate details inside there either, but it's a really clever approach at that higher level of granularity.] If it's formulation work you're doing, then you end up doing chemistry with a completely optional understanding of, well, chemistry, and this just annoys me on some level. You really should have some fundamental understanding of how materials are known to interact. The instructor referred to these sorts of insights, a little dismissively, as "local knowledge," but if it's science, the local knowledge is exactly what you're getting at.
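For what it's worth, the bracketed aside above can be sketched in a few lines. This is a toy of my own devising, with an arbitrary target function and made-up population and mutation numbers: mutate, score, keep the winners, repeat, and you home in on an answer without ever writing down why it's the answer.

```python
import random

random.seed(1)

def fitness(x):
    # The "black box" being optimized; it happens to peak at x = 3,
    # but nothing in the algorithm below knows or cares about that.
    return -(x - 3.0) ** 2

# Random starting population of candidate solutions.
population = [random.uniform(-10, 10) for _ in range(20)]

for generation in range(100):
    # Score everyone, keep the best half, refill with mutated copies.
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [x + random.gauss(0, 0.3) for x in survivors]

best = max(population, key=fitness)
print(f"best after 100 generations: {best:.3f}")
```

The survivors pass through unmutated each round, so the best answer never gets worse; it just creeps toward the peak, what-first and why-never.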