Talk:Statistics theory

From Citizendium
Revision as of 20:22, 1 March 2009 by imported>Gene Shackman
This article is developing and not approved.
Definition: A branch of mathematics that specializes in enumeration, or counted, data and their relation to measured data.
Checklist and Archives
 Workgroup category mathematics [Please add or review categories]
 Talk Archive none  English language variant American English

Definition of a statistic

The modified sentence:

"More generally, a statistic can be any measure within a data sample. This would be some quantification of a random variable, or variables, of interest, such as a height, weight, polling results, test performance, and so on"

does not have the same meaning as the original

"More generally, a statistic can be any measurable function of the data samples, the latter being realizations of the random variables which are of interest such as the height of people, polling results, students' performance on a test, and so on."

In particular, a measure and a measurable function are not the same thing and the new sentence obfuscates the definition of a statistic. The point is that there is a precise definition of a statistic in mathematical statistics which is based on measure theoretic probability theory. For this purpose I provide a reference for this definition. An intuitive definition as given in the second paragraph of the article is fine as a gentle introduction, but it should also be complemented by a more rigorous mathematical definition.

I agree that my original sentence may not have been very readable, so to strike a compromise I combined the good parts of both sentences and produced what now appears in the article. Cheers, --Hendra I. Nurdin 17:25, 10 November 2007 (CST)

Outstanding edit! --Michael J. Formica 19:17, 10 November 2007 (CST)
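
The distinction drawn above between a measure and a measurable function can be illustrated with a small sketch. It is only an illustration under stated assumptions: the sample values and the names `empirical_measure`, `sample_mean` and `sample_range` are hypothetical and do not come from the article. A measure assigns a number to a set of outcomes (an event), whereas a statistic is a function evaluated on the data sample itself.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical data sample: heights in centimetres (illustrative values only).
sample = [150.0, 160.0, 170.0, 180.0]


def empirical_measure(data: Sequence[float], event: Callable[[float], bool]) -> float:
    """A measure acts on events (sets of outcomes).

    Here the event is given as a membership test, and the empirical measure
    returns the fraction of the sample that falls inside it.
    """
    return sum(1 for x in data if event(x)) / len(data)


def sample_mean(data: Sequence[float]) -> float:
    """A statistic acts on the data: it is a function of the sample."""
    return mean(data)


def sample_range(data: Sequence[float]) -> float:
    """Another statistic: the spread between the largest and smallest value."""
    return max(data) - min(data)


# The measure answers "how much of the sample lies in this set?"
share_at_least_170 = empirical_measure(sample, lambda x: x >= 170.0)

# The statistics answer "what single value summarises this sample?"
average_height = sample_mean(sample)
height_range = sample_range(sample)
```

The two kinds of object take different inputs (a set versus a sample), which is why replacing "measurable function" with "measure" changes the meaning of the sentence.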


"A data sample is regarded as instances of a random variable of interest..."
I think referring to "random variable" here narrows the focus a little too much.
Statistics is largely about extracting concise info from large piles of data. Sometimes, the data set is best described without reference to a numerical random variable, f.i. the fact that the most common 1st name in this or that town is "Billy" is a perfectly good statistic, ditto that "I" is the most commonly used word in English.
Ragnar Schroder 18:09, 8 December 2007 (CST)
It should be noted that a random variable need not be numerical, but of course numerics is important for quantitative analysis. For example, one can have a random variable X take values on the discrete set {'Billy', 'James', 'Agnes', 'Jill'} endowed with the discrete topology and then take the Borel σ-algebra to be that generated by the open sets of that discrete topology. But ultimately this set can be mapped to numerical values, e.g., by the 1-to-1 assignment 'Billy'->0, 'James'->1, 'Agnes'->2, 'Jill'->3.
I really have no idea how you would manage to extricate statistics from random variables and, more generally, probability theory, for what would then be the theoretical basis (if any) for explaining your data and justifying your methods? Are there examples of notions in statistics that cannot be given a firm footing with mathematical statistics? Hendra I. Nurdin 00:56, 9 December 2007
Not 'extricate', but rather 'deemphasize'. Rvs are just an ad hoc artifact of the mathematical model of the situation at hand - after all, not even coin-flipping has a unique a priori given random variable associated with it.
Like in your example above, there's an infinity of functions to choose from, with no formal reason to prefer one to the other.
Sometimes, like when the statistic in question is the population mode, they're not really called upon.
Of course, your point that one ultimately can't live without them is well taken.
Btw. thanks for informing me that rvs need not be numbers, I didn't realize that. I appreciate the enlightenment.
Ragnar Schroder 19:57, 8 December 2007 (CST)
Though a random variable may not be explicitly called upon, it does not mean that it is not implicitly used in a certain problem. It's only that these details are usually just swept under the rug in applications. Hendra I. Nurdin 17:59, 9 December 2007 (CST)
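
The points made in this thread (a random variable can take names as values, any 1-to-1 numerical assignment is as good as another, and the mode does not depend on that choice) can be sketched as follows. This is a minimal illustration, assuming a made-up list of first names; the helper `mode` and the second encoding `relabelled` are hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical sample of first names (illustrative data only).
names = ["Billy", "James", "Billy", "Agnes", "Jill", "Billy", "James"]


def mode(values):
    """Return the most frequent value in the sample."""
    return Counter(values).most_common(1)[0][0]


# The 1-to-1 assignment mentioned in the discussion.
encoding = {"Billy": 0, "James": 1, "Agnes": 2, "Jill": 3}
# A different, equally valid 1-to-1 assignment (an assumption for illustration).
relabelled = {"Billy": 3, "James": 2, "Agnes": 1, "Jill": 0}


def mode_via(code_map, values):
    """Encode the names as numbers, take the mode, then decode back to a name."""
    decode = {code: name for name, code in code_map.items()}
    return decode[mode([code_map[v] for v in values])]


mode_direct = mode(names)                        # computed on the names themselves
mode_encoded = mode_via(encoding, names)         # computed through the first encoding
mode_relabelled = mode_via(relabelled, names)    # computed through the second encoding
```

All three routes give the same name, which reflects both remarks: the numerical coding is arbitrary, yet the statistic is still a well-defined function of the underlying (non-numerical) random variable.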

Readability

Ragnar, Hendra: I am reading your discussion about random variables with much interest. I have a concern about the readability of the article, and I am wondering if we could address it. I have a Masters degree in Stats, and, yet, I am struggling with the language that we are using to present the initial concepts here. Both the NY and the London Times are written on a 5th grade (by American standards) reading level. Do you think we could tone the article down to be more readable? Blessings... --Michael J. Formica 06:20, 9 December 2007 (CST)

IMHO it seems to read just fine, I mean there is a gentle general introduction about the subject that is worded to be suitable for lay people, and then there's also a more technical definition as well for those who have a math background. I think inclusion of a few examples or nice applications in the article will help clarify things... but then again you'd need to explain more clearly by what you mean when you say "readable" ...
I have to stress, as I have also mentioned to Ragnar on other occasions, that this is *foremost* a math article sitting in the Mathematics Workgroup. Thus one should not expect such articles to be written to cater exclusively to lay people/general public -- there should be some balance in the presentation. At least if I need to look up a math related topic, I would expect to get some mathematics though I may then realize that I do not yet have all the background necessary to completely understand an article (which simply means I have some catching up to do if I really want/need to understand).
Good math articles on sites like CZ or WP can potentially also be nice quick/initial refs for active math-inclined grad students (not necessarily studying math) or researchers who need to look up a certain definition or get some feeling for a topic they are not yet familiar with (for all its woes, WP does have a lot of good math articles, though I think their statistics article isn't up to scratch the last time I saw it) -- but how can they do that if the article is written to be devoid of "serious" math content?
Sorry, if perhaps I had misunderstood your concerns...Hendra I. Nurdin 18:33, 9 December 2007 (CST)
You did, so we agree, then, to disagree. I was not referring to the content, but the manner in which it is (now, was) presented. I have revised the article for readability and interior definition, without compromising the content. --Michael J. Formica 09:25, 10 December 2007 (CST)
Thanks for the edit. We should try to insert an illustrative example of how statistics work -- I'll see if I can do that in the near future. BTW, is there something we are disagreeing on? Cheers, Hendra I. Nurdin 16:35, 10 December 2007 (CST)


Readability is my main concern: In order for intellectual progress to occur, it's imperative that the present state of knowledge is absorbed as fast as possible by as many fertile minds as possible.
However, different people have different models and ways of gaining "understanding". Some people learn from being presented a step-by-step reasoning chain, others learn best from being presented with an overview they can intuit on.
Writing an article for one group seems to make it rather unreadable for the other. I really have no good solution for that conundrum, other than to try to keep things as down to earth simple as possible.
I think a formal theory about this has even been developed: Antoni KĘPIŃSKI's theory of informational metabolism, explained here, here and here.
socioniko.net expounds a union of Jung's and KĘPIŃSKI's model.
I'm not a psychologist, so I may be wrong in much of the above. Your insights are welcome.
Ragnar Schroder 22:39, 10 December 2007 (CST)
Sure, I know where you're coming from, as I'm sure you also know where I'm coming from. I also can't see how to get out of the conundrum. So, let's just keep trying to add some improvements and see how it goes... Hendra I. Nurdin 00:43, 11 December 2007 (CST)

Hi all. I took a look at this page and also wanted to help with the readability. I also hope that CZ can be very readable, while still covering complex topics. I made a few changes, like a new introduction, that talks about statistics in very general terms. I also think that formulas are okay, as long as there is good explanation text close to it. Hope I can help with this. Gene Shackman 04:35, 28 February 2009 (UTC)

Statistics or statistical mathematics?

I am attempting to draft an article on economic statistics and it would be useful to add, and draw upon, a link to this article.

Unfortunately, I find that the article is confined to the mathematics of statistics with no reference to the fact that their usefulness depends upon the methods by which they are categorised, collected and aggregated by professional statisticians. If it is not considered appropriate to include such material in this article, may I suggest that its title should be changed to "statistical theory" or "statistical mathematics" in order to make room for a new article on statistics that would relate to their relevance to disciplines other than mathematics, and to their other users. Nick Gardner 12:04, 28 January 2009 (UTC)

I have altered the opening sentence of this article as a reminder of the existence of professional statisticians - without whose achievements in the production of statistics, there would be nothing for academics to manipulate. Their work is interesting, demanding and important, and I am prepared to contest any attempts to define them out of existence. Nick Gardner 22:42, 10 February 2009 (UTC)

The revised opening statement avoids the absurdity of saying that statistics is a branch of mathematics, but it is still an invitation to confused thinking. It should be stated clearly that a statistic is an item of information that can exist independently of mathematics - and that is also true of statistics (singular). Mathematics often assists the interpretation of statistics, but it is not necessary for that purpose. Everyone can interpret the statement that more people live in America than in France without the benefit of help from mathematicians - even if that statement is expressed numerically. That is perhaps a rather silly reductio ad absurdum, but the serious point is that there are many problems of statistical inference that can be solved without the use of mathematics - and that failure to reach the correct solution to such problems is a common source of error (ask your banking friends). Nick Gardner 10:59, 28 February 2009 (UTC)

I also revised the opening of this article to put in more everyday examples or understanding. I kept 'mathematics' in the start because statistics does rely on mathematics, and because as a relative newbie, I didn't want to completely remove it. I also moved up the statement that their "usefulness depends upon the methods by which they are categorised, collected and aggregated". That would be good to mention in the introduction.
I'm also going through and thinking of revising this article quite a bit, to make it much more everyday, showing how it relates to everyday life, and make it more accessible to lay people, but still keep in some of the formulas and theory stuff too. Is that okay with folks? Gene Shackman 15:55, 28 February 2009 (UTC)

external links

I had a couple of external links to online statistical books. These would be very useful if people wanted more detail. Any particular reason why these were removed? Gene Shackman 15:48, 28 February 2009 (UTC)

Gene, they were not removed. Someone correctly relocated them to the external links subpage which is one of the ways that Citizendium differs from Wikipedia. Milton Beychok 17:41, 28 February 2009 (UTC)
Yes, that was me. See the "external links" tab at the top of the page. Click the "?" mark next to it to learn more about our use of subpages. Hope this helps. Chris Day 20:47, 28 February 2009 (UTC)

The workgroup categories in the Metadata template

I am curious as to why the Metadata template lists only the Mathematics and Psychology Workgroups as categories. Statistics are utilized in many other disciplines as well ... for example, Healing Arts (Medicine), Economics, Engineering and Politics. I agree that Mathematics is certainly appropriate but should not the two others allowed by the Metadata template perhaps be reconsidered? Milton Beychok 17:41, 28 February 2009 (UTC)

Hi Milton. I added this to the introduction "Statistics is used in a very wide variety of fields. For example, statistics is used to develop and analyze psychological tests and public opinion surveys, in program evaluation to determine whether a program works or how it can be improved, in medicine with clinical trials to test the safety and effectiveness of new drugs, and in many other areas." Could you add something about how statistics is used in engineering, and any other discipline? I'm not sure how to add the workgroups, but adding others sounds good to me. Gene Shackman 20:16, 28 February 2009 (UTC)
The assignment of workgroups is done in the metadata. The metadata can be seen by clicking the orange "M" at the top right of the talk page header. More information on that can be found by clicking the link to a description of the "article checklist". Chris Day 20:50, 28 February 2009 (UTC)

Okay to make major revisions?

I'm going through this article and thinking of revising this article quite a bit, to make it much more everyday, showing how it relates to everyday life, and make it more accessible to lay people, but still keep in some of the formulas and theory stuff too. I'd probably delete much of what is already here and rewrite. Is that okay with folks? Would folks prefer if I put a rewrite in some kind of sandbox first? Can someone let me know how to do that? Gene Shackman 03:51, 1 March 2009 (UTC)

Check out this subpage, Statistics/Advanced. I forget where the discussion is on the forum but this was also done for Quantum mechanics. Chris Day 04:18, 1 March 2009 (UTC)
The advanced statistics page looks like what the statistics page used to look like. Which one is the "statistics" page? I guess people would first go to the statistics page and then there would be a link to the advanced statistics page, right? So is it okay if I change this statistics page? Gene Shackman 05:33, 1 March 2009 (UTC)
I'd say go ahead. The advanced page is a subpage, you can see it as a tab at the top of the statistics page. It is the exact same content as the article before you started editing, I just copied and pasted it from the history. It does not exist as an article in its own right since it is part of the Statistics cluster of pages. They all share the same metadata (see Template:Statistics/Metadata). Chris Day 13:57, 1 March 2009 (UTC)
Somewhere, there should be a place for:
  • Do not use statistics as a drunkard uses a lamp-post: for support rather than illumination
  • Statistics are like a bikini. What they reveal is suggestive; what they conceal is vital.
There is the lovely operations research parable of where to armor a B-17. Trying to remember if I did write that up somewhere...
Apropos quantum mechanics, I have had it suggested that my attempts to set the timing and points on an engine followed the Uncertainty Principle. Electronic ignition is better.
Further, Schrodinger's Cat is totally unnecessary and inhumane. Only brief feline exposure is quite adequate to show that the velocity and position of a moving cat cannot be predicted. Confirming the legend, we have a small cutout next to one of the house doors, so, indeed, it is routine to see Cats Walking Through Walls. Howard C. Berkowitz 19:11, 1 March 2009 (UTC)

Good start

On rereading the introduction, I wonder if it's wise to be introducing "model" this early in the discussion. It's one thing to talk about descriptive statistics, statistical experiments in the sense of hypothesis testing/Type I/II error, etc., but the economy? I think of statistics as inputs into econometric, meteorological, or other models of complex systems, but let's not confuse those with an introduction to statistics.

Indeed, computer simulation mixes simulation and analytic modeling more than I find ideal. While I have some moderate experience with simulation software, I'd much rather stay discipline specific with modeling. In military work, the Lanchester equations are only a starting point, essentially a historical one. Of course, Murphy's Laws of Combat can never be ignored. Howard C. Berkowitz 20:30, 1 March 2009 (UTC)

No consultation?

I think there should have been some effort to contact the original contributors of this article to ask what they would think of these major changes. CZ provides a method to contact authors via email as necessary. Everything I have contributed to the article, along with some others, has been happily deleted. I think the best approach would have been to reach a compromise rather than deleting contents outright.

This article is sitting in a mathematics section of CZ, yet all of the maths and the discussions of some of the abstract concepts that lie at the heart of statistics (and frankly provide justification for it) have been removed. I think foundations are important and deserve some discussion in this article. Perhaps this article should be moved to another workgroup or since "Statistics is not mathematics" as some would claim (perhaps it isn't just mathematics, but see also my remark below), maybe a new CZ Workgroup called Statistics would be in order then?

I don't think statistics has quite the same relationship as physics to mathematics. Physics does not rely on mathematics to justify its laws (e.g., Newton's law or the laws of quantum mechanics), at least not at the outset, but my feeling is statistics does. It relies on probability theory to give foundations to its methods. Without these justifications, the methods of statistics would all just be ad hoc and can be dangerous when applied in practice (due to unjustified conclusions). Take bootstrapping for example and check some of Brad Efron's early papers on this topic. Using any statistical method without some mathematical theory supporting it would be akin to selling snake oil.

The issue with statistics is that it is used in a variety of fields, including engineering (my field), so people of different backgrounds have different ideas about what "statistics" is. I had always appreciated the foundations of statistics because it gives me a sense of confidence in the statistical tools that I use and about conclusions that I can draw with them. Is there any other way one can achieve this? Hendra I. Nurdin 21:33, 1 March 2009 (UTC)
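
For readers unfamiliar with the bootstrapping example mentioned above, here is a minimal sketch of the plain nonparametric bootstrap used to estimate the standard error of a sample mean. It is an illustration only, not a reproduction of the procedure in any particular paper: the data values, the function name `bootstrap_standard_error`, the number of resamples and the seed are all assumptions made for the example.

```python
import random
from statistics import mean, stdev


def bootstrap_standard_error(data, statistic, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement.

    Each resample has the same size as the original data; the spread of the
    statistic across resamples serves as the standard-error estimate.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in range(n)]
        replicates.append(statistic(resample))
    return stdev(replicates)


# Hypothetical measurements (illustrative values only).
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]

se_bootstrap = bootstrap_standard_error(data, mean)
# Textbook formula for the standard error of the mean, for comparison.
se_formula = stdev(data) / len(data) ** 0.5
```

For the mean, the resampling estimate should land close to the textbook formula, and it is the underlying probability theory that explains why and when this agreement can be trusted, which is the point being argued in the comment.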

Good point that you should have been e-mailed. But do note that your version is still in the advanced subpage; tab at the top. At this point I'm not sure what to do here, that was my quick reaction to what I saw. The other option is to have the original article as the main one with a Student Level subpage. In many ways this will probably be a test case with regard to the level of the CZ audience. Possibly it is time to push this article towards approval and see where that takes us? Chris Day 21:59, 1 March 2009 (UTC)
Thanks, Chris. I did not know that. Yeah, I guess this is an important article and maybe it is time to get it polished. It would be great to have some statistician on board though, but I don't see that there are any around. Hendra I. Nurdin 00:17, 2 March 2009 (UTC)
Okay, I suppose I was rather quick to make changes. Next time I'll search for the original authors and send emails. Just to note, however, I asked a couple of times about making changes. See my note above:
"I'm also going through and thinking of revising this article quite a bit, to make it much more everyday, showing how it relates to everyday life, and make it more accessible to lay people, but still keep in some of the formulas and theory stuff too. Is that okay with folks? Gene Shackman 15:55, 28 February 2009 (UTC) "
and another one earlier today or yesterday.
My point of view was trying to make this article more concept driven, less formula driven, at least in the beginning. If you all want to revert it to the original, that's fine by me. Gene Shackman 22:59, 1 March 2009 (UTC)
Gene, I don't subscribe to deletions (unless it's something cranky) and obviously you've already spent some time on this. I think it is best to just leave it as it is for now, and maybe I can add in some stuff to the article in the next few days -- it's just hard to find time to do stuff on CZ these days. Of course, no one is stopping you from doing further work on the article. I do note your remarks, but because I do not always go to CZ these days I did not see them. But if there was an email about it, I would have been able to get involved in the discussions.
One point is that in the current article there's no discussion of why statistics is called statistics, and what is meant by a "statistic" as in the older versions, although examples are given such as mean, median, standard deviations. Thanks. Hendra I. Nurdin 00:17, 2 March 2009 (UTC)

Suggestion? There is a feature in "my preferences" that lets CZ email a user if someone makes a change to an article on your watchlist. I'm concerned that requiring users to contact past editors before making changes to an article would not encourage wiki collaboration. Users should feel free to edit any article that they feel they can improve. Of course large deletions such as this need to be explained on the talk page, as Gene did. D. Matt Innis 00:55, 2 March 2009 (UTC)

I put back in the stuff I took out (with a little changed headers). Gene Shackman 02:22, 2 March 2009 (UTC)