Philosophy of Statistics (Handbook of the Philosophy of Science)
Philosophy of Statistics (Stanford Encyclopedia of Philosophy)
However, the output will also depend on the prior probability over the hypotheses, and generally speaking it will only tend to the maximum likelihood estimator when the sample size tends to infinity.
Most of the controversy over the Bayesian method concerns the probability assignment over hypotheses. One important set of problems surrounds the interpretation of those probabilities as beliefs, as to do with a willingness to act, or the like. Another set of problems pertains to the determination of the prior probability assignment, and the criteria that might govern it.
The overall question here is how we should understand the probability assigned to a statistical hypothesis. Naturally the interpretation will be epistemic: It makes little sense to attempt a physical interpretation since the hypothesis cannot be seen as a repeatable event, or as an event that might have some tendency of occurring. This leaves open several interpretations of the probability assignment as a strength of belief.
One very influential interpretation of probability as degree of belief relates probability to a willingness to bet against certain odds. The claim that degrees of belief are correctly expressed in a probability assignment is then supported by a so-called Dutch book argument: an agent whose betting odds violate the axioms of probability can be offered a combination of bets that guarantees her a loss. This interpretation associates beliefs directly with their behavioral consequences. There are several problems with this interpretation of the probability assignment over hypotheses. For one, it seems to make little sense to bet on the truth of a statistical hypothesis, because such hypotheses cannot be falsified or verified.
Consequently, a betting contract on them will never be cashed. More generally, it is not clear that beliefs about statistical hypotheses are properly framed by connecting them to behavior in this way. A somewhat different problem is that the Bayesian formalism, in particular its use of probability assignments over statistical hypotheses, suggests a remarkable closed-mindedness on the part of the Bayesian statistician. It is quite a strong assumption, even of an ideally rational agent, that she is indeed equipped with a real-valued function that expresses her opinion over the hypotheses.
Moreover, the probability assignment over hypotheses seems to entail that the Bayesian statistician is certain that the true hypothesis is included in the model.
This is an unduly strong claim to which a Bayesian statistician will have to commit at the start of her analysis. It sits badly with broadly shared methodological insights. In this regard Bayesian statistics does not do justice to the nature of scientific inquiry, or so it seems.
The problem just outlined takes a mathematically more sophisticated form in the problem that Bayesians expect to be well-calibrated. This problem, as formulated by Dawid, concerns a Bayesian forecaster, e.g., a weatherman who issues probabilistic predictions. It is then shown that such a weatherman believes of himself that in the long run he will converge onto the correct probability with probability 1.
Yet it seems reasonable to suppose that the weatherman realizes something could potentially be wrong with his meteorological model, and so sets his probability for correct prediction below 1. The weatherman is thus led to incoherent beliefs. It seems that Bayesian statistical analysis places unrealistic demands, even on an ideal agent. For the moment, assume that we can interpret the probability over hypotheses as an expression of epistemic uncertainty. Then how do we determine a prior probability? Perhaps we already have an intuitive judgment on the hypotheses in the model, so that we can pin down the prior probability on that basis.
Or else we might have additional criteria for choosing our prior. However, several serious problems attach to procedures for determining the prior. First consider the idea that the scientist who runs the Bayesian analysis provides the prior probability herself. One obvious problem with this idea is that the opinion of the scientist might not be precise enough for a determination of a full prior distribution. It does not seem realistic to suppose that the scientist can transform her opinion into a single real-valued function over the model, especially not if the model itself consists of a continuum of hypotheses.
But the more pressing problem is that different scientists will provide different prior distributions, and that these different priors will lead to different statistical results. In other words, Bayesian statistical inference introduces an inevitable subjective component into scientific method. It is one thing that the statistical results depend on the initial opinion of the scientist. But it may so happen that the scientist has no opinion whatsoever about the hypotheses. How is she supposed to assign a prior probability to the hypotheses then?
The prior will have to express her ignorance concerning the hypotheses. The leading idea in expressing such ignorance is usually the principle of indifference: For a finite number of hypotheses, indifference means that every hypothesis gets equal probability. For a continuum of hypotheses, indifference means that the probability density function must be uniform.
Nevertheless, there are different ways of applying the principle of indifference and so there are different probability distributions over the hypotheses that can count as expressions of ignorance. This insight is nicely illustrated in Bertrand's paradox. Jaynes provides a very insightful discussion of this riddle and also argues that it may be resolved by relying on invariances of the problem under certain transformations.
But the general message for now is that the principle of indifference does not lead to a unique choice of priors. The point is not that ignorance concerning a parameter is hard to express in a probability distribution over those values. It is rather that in some cases, we do not even know what parameters to use to express our ignorance over.
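The non-uniqueness can be made concrete with a minimal sketch. The choice of the alternative parameter phi = theta**2 is an assumption made purely for illustration and does not come from the text; any nonlinear reparameterization would make the same point. The function name is hypothetical.

```python
# Two "indifferent" priors for the same unknown chance theta in [0, 1].
# Assumption for illustration: the alternative parameterization is phi = theta**2.

def prob_theta_below(t, parameter="theta"):
    """Prior probability that theta <= t under a uniform density on the named parameter."""
    if parameter == "theta":
        return t        # uniform on theta: P(theta <= t) = t
    if parameter == "phi":
        return t ** 2   # uniform on phi: P(theta <= t) = P(phi <= t**2) = t**2
    raise ValueError("unknown parameter")

p_theta = prob_theta_below(0.5, "theta")
p_phi = prob_theta_below(0.5, "phi")
print(p_theta, p_phi)  # 0.5 0.25
```

Both priors are uniform, and yet they disagree on the probability that theta lies below one half, so an appeal to indifference alone does not settle which of them expresses ignorance.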
In part the problem of the subjectivity of Bayesian analysis may be resolved by taking a different attitude to scientific theory, and by giving up the ideal of absolute objectivity. Indeed, some will argue that it is just right that the statistical methods accommodate differences of opinion among scientists. However, this response misses the mark if the prior distribution expresses ignorance rather than opinion. There is also a more positive answer to worries over objectivity, based on so-called convergence results.
It turns out that the impact of prior choice diminishes with the accumulation of data, and that in the limit the posterior distribution will converge to a set, possibly a singleton, of best hypotheses, determined by the sampled data and hence completely independent of the prior distribution. However, in the short and medium run the influence of subjective prior choice remains. Summing up, it remains problematic that Bayesian statistics is sensitive to subjective input.
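A small numerical sketch may help to see both halves of this claim. It assumes a Bernoulli model with conjugate Beta priors, which is a convenient special case and not something the text commits to; the two priors and the sample frequency of 0.6 are hypothetical.

```python
def posterior_mean(a, b, successes, n):
    """Posterior mean of theta for a Beta(a, b) prior after n Bernoulli trials."""
    return (a + successes) / (a + b + n)

gaps = []
for n in (10, 100, 10_000):
    successes = round(0.6 * n)                      # hypothetical data: 60% successes
    sceptic = posterior_mean(1, 9, successes, n)    # prior mean 0.1
    optimist = posterior_mean(9, 1, successes, n)   # prior mean 0.9
    gaps.append(abs(optimist - sceptic))
    print(n, round(sceptic, 4), round(optimist, 4))
```

With these priors the gap between the two posterior means is 8 / (10 + n): it shrinks towards zero as the sample grows, but it is still 0.4 after ten observations, which is the short-run sensitivity the passage describes.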
The undeniable advantage of the classical statistical procedures is that they do not need any such input, although arguably the classical procedures are in turn sensitive to choices concerning the sample space (Lindley). Against this, Bayesian statisticians point to the advantage of being able to incorporate initial opinions into the statistical analysis.
The philosophy of Bayesian statistics offers a wide range of responses to the problems outlined above. Some Bayesians bite the bullet and defend the essentially subjective character of Bayesian methods. Others attempt to remedy or compensate for the subjectivity, by providing objectively motivated means of determining the prior probability or by emphasizing the objective character of the Bayesian formalism itself. One very influential view on Bayesian statistics buys into the subjectivity of the analysis.
So-called personalists or strict subjectivists argue that it is just right that the statistical methods do not provide any objective guidelines, pointing to radically subjective sources of any form of knowledge. The problems on the interpretation and choice of the prior distribution are thus dissolved, at least in part. However, it deserves emphasis that a subjectivist view on Bayesian statistics does not mean that all constraints deriving from empirical fact can be disregarded.
Nobody denies that if you have further knowledge that imposes constraints on the model or the prior, then those constraints must be accommodated. For example, today's posterior probability may be used as tomorrow's prior, in the next statistical inference. The point is that such constraints concern the rationality of belief and not the consistency of the statistical inference per se.
Subjectivist views are most prominent among those who interpret probability assignments in a pragmatic fashion, and motivate the representation of belief with probability assignments by the afore-mentioned Dutch book arguments.
Central to this approach is the work of Savage and De Finetti. Savage proposed to axiomatize statistics in tandem with decision theory, a mathematical theory about practical rationality. He argued that by themselves the probability assignments do not mean anything at all, and that they can only be interpreted in the context where an agent faces a choice between actions. De Finetti took a similar line. Remarkably, it thus appears that the subjectivist view on Bayesian statistics is based on the same behaviorism and empiricism that motivated Neyman and Pearson to develop classical statistics.
Notice that all this makes one aspect of the interpretation problem discussed above more pressing: how can a prior over statistical hypotheses be related to belief understood as a willingness to act? One response to this question is to turn to different motivations for representing degrees of belief by means of probability assignments. Following work by De Finetti, several authors have proposed vindications of probabilistic expressions of belief that are not based on behavioral goals, but rather on the epistemic goal of holding beliefs that accurately represent the world. A strong generalization of this idea is achieved in Schervish, Seidenfeld and Kadane, which builds on a longer tradition of using scoring rules for achieving statistical aims.
An alternative approach is that any formal representation of belief must respect certain logical constraints. However, the original subjectivist response to the issue that a prior over hypotheses is hard to interpret came from De Finetti's so-called representation theorem, which shows that every prior distribution can be associated with its own set of predictions, and hence with its own behavioral consequences.
In other words, De Finetti showed how priors are indeed associated with beliefs that can carry a betting interpretation. De Finetti's representation theorem relates rules for prediction, as functions of the given sample data, to Bayesian statistical analyses of those data, against the background of a statistical model. See Festa and Suppes for useful introductions. De Finetti considers a process that generates a series of time-indexed observations, and he then studies prediction rules that take these finite segments as input and return a probability over future events, using a statistical model that can analyze such samples and provide the predictions.
The key result of De Finetti is that a particular statistical model, namely the set of all distributions in which the observations are independently and identically distributed, can be equated with the class of exchangeable prediction rules, namely the rules whose predictions do not depend on the order in which the observations come in. Let us consider the representation theorem in some more formal detail.
For simplicity, say that the process generates time-indexed binary observations, i.e., each observation is either 0 or 1. De Finetti relates the exchangeable prediction rules for such a process to a Bayesian inference over a specific type of statistical model, namely the model of Bernoulli hypotheses, each of which fixes a constant chance for the observations. The representation theorem states that there is a one-to-one mapping of priors over Bernoulli hypotheses and exchangeable prediction rules. Next to the original representation theorem derived by De Finetti, several other and more general representation theorems were proved.
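For the binary case, the content of the theorem can be written out as follows. The notation is an assumption introduced here and is not taken from the text: e_1, ..., e_n are the binary observations, n_1 is the number of 1s among them, \theta is the chance of a 1 under a Bernoulli hypothesis, and p(\theta) is a prior density (in general a prior probability measure) over those hypotheses. The statement concerns exchangeable assignments over infinitely extendable sequences.

```latex
% Assumed notation: e_i in {0,1}, n_1 = number of 1s in e_1..e_n,
% \theta = Bernoulli chance of a 1, p(\theta) = prior density over [0,1].
P(e_1, \ldots, e_n) \;=\; \int_0^1 \theta^{\,n_1} (1 - \theta)^{\,n - n_1} \, p(\theta) \, d\theta

% The associated prediction rule follows by conditioning:
P(e_{n+1} = 1 \mid e_1, \ldots, e_n)
  \;=\; \frac{\int_0^1 \theta^{\,n_1 + 1} (1 - \theta)^{\,n - n_1} \, p(\theta) \, d\theta}
             {\int_0^1 \theta^{\,n_1} (1 - \theta)^{\,n - n_1} \, p(\theta) \, d\theta}
```

The right-hand sides depend on the sample only through n_1 and n, so the order of the observations is irrelevant, which is exactly what exchangeability requires.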
Representation theorems equate a prior distribution over statistical hypotheses to a prediction rule, and thus to a probability assignment that can be given a subjective and behavioral interpretation. This removes the worry expressed above, that the prior distribution over hypotheses cannot be interpreted subjectively because it cannot be related to belief as a willingness to act. However, for De Finetti the representation theorem provided a reason for doing away with statistical hypotheses altogether, and hence for the removal of a notion of probability as anything other than subjective opinion.
Not all subjectivists are equally dismissive of the use of statistical hypotheses. Jeffrey has proposed so-called mixed Bayesianism in which subjectively interpreted distributions over the hypotheses are combined with a physical interpretation of the distributions that hypotheses define over sample space. Romeijn argues that priors over hypotheses are an efficient and more intuitive way of determining inductive predictions than specifying properties of predictive systems directly.
This advantage of using hypotheses seems in agreement with the practice of science, in which hypotheses are routinely used, and often motivated by mechanistic knowledge on the data generating process. The fact that statistical hypotheses can strictly speaking be eliminated does not take away from their utility in making predictions. Despite its—seemingly inevitable—subjective character, there is a sense in which Bayesian statistics might lay claim to objectivity. It can be shown that the Bayesian formalism meets certain objective criteria of rationality, coherence, and calibration.
Bayesian statistics thus answers to the requirement of objectivity at a meta-level. Arguments supporting the Bayesian way of accommodating data, namely by conditionalization, have been provided in a pragmatic context by dynamic Dutch book arguments, whereby probability is interpreted as a willingness to bet (cf. Maher, van Fraassen). Similar arguments have been advanced on the grounds that our beliefs must accurately represent the world, along the lines of De Finetti.
An important distinction must be made in arguments that support the Bayesian way of accommodating evidence, namely the distinction between Bayes' theorem and Bayes' rule. Arguments that support the representation of the epistemic state of an agent by means of probability assignments also provide support for Bayes' theorem as a constraint on degrees of belief. Bayes' rule, by contrast, presents a constraint on probability assignments that represent epistemic states of an agent at different points in time. In the philosophy of statistics many Bayesians adopt Bayes' rule implicitly, but in what follows I will only assume that Bayesian statistical inferences rely on Bayes' theorem.
Whether the focus lies on Bayes' rule or on Bayes' theorem, the common theme in the above-mentioned arguments is that they approach Bayesian statistical inference from a logical angle, and focus on its internal coherence or consistency. While its use in statistics is undeniably inductive, Bayesian inference thereby obtains a deductive, or at least non-ampliative character: everything that is concluded is already contained in the probability assignment that serves as premise. The conclusions, in turn, are straightforward consequences of this probability assignment. They can be derived by applying theorems of probability theory, most notably Bayes' theorem.
Bayesian statistical inference thus becomes an instance of probabilistic logic (cf. Hailperin, Halpern, Haenni et al.). Summing up, there are several arguments showing that statistical inference by Bayes' theorem, or by Bayes' rule, is objectively correct. These arguments invite us to consider Bayesian statistics as an instance of probabilistic logic.
Such appeals to the logicality of Bayesian statistical inference may provide a partial remedy for its subjective character. Moreover, a logical approach to the statistical inferences avoids the problem that the formalism places unrealistic demands on the agents, and that it presumes the agent to have certain knowledge.
Much like in deductive logic, we need not assume that the inferences are psychologically realistic, nor that the agents actually believe the premises of the arguments. Rather the arguments present the agents with a normative ideal and take the conditional form of consistency constraints. An important instance of probabilistic logic is presented in inductive logic, as devised by Carnap, Hintikka and others (Carnap; Hintikka and Suppes; Carnap and Jeffrey; Hintikka and Niiniluoto; Kuipers; Paris; Nix and Paris; Paris and Waterhouse). Historically, Carnapian inductive logic developed prior to the probabilistic logics referenced above, and more or less separately from the debates in the philosophy of statistics.
But the logical systems of Carnap can quite easily be placed in the context of a logical approach to Bayesian inference, and doing this is in fact quite insightful. For simplicity, we choose a setting that is similar to the one used in the exposition of the representation theorem, namely a binary data generating process, i.e., a process whose observations are either 0 or 1, together with prediction rules that map each finite sample onto a probability for the next observation. Carnap derived such rules from constraints on the probability assignments over the samples.
Some of these constraints boil down to the axioms of probability. Other constraints, exchangeability among them, are independently motivated, by an appeal to a so-called logical interpretation of probability. Under this logical interpretation, the probability assignment must respect certain invariances under transformations of the sample space, in analogy to logical principles that constrain truth valuations over a language in a particular way.
Carnapian inductive logic is an instance of probabilistic logic, because its sequential predictions are all based on a single probability assignment at the outset, and because it relies on Bayes' theorem to adapt the predictions to sample data. One important difference with Bayesian statistical inference is that, for Carnap, the probability assignment specified at the outset only ranges over samples and not over hypotheses.
However, by De Finetti's representation theorem Carnap's exchangeable rules can be equated to particular Bayesian statistical inferences. A further difference is that Carnapian inductive logic gives preferred status to particular exchangeable rules. In view of De Finetti's representation theorem, this comes down to the choice for a particular set of preferred priors.
As further developed below, Carnapian inductive logic is thus related to objective Bayesian statistics. It is a moot point whether further constraints on the probability assignments can be considered as logical, as Carnap and followers have it, or whether the title of logic is best reserved for the probability formalism in isolation, as De Finetti and followers argue.
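The link between Carnapian rules and Bayesian priors can be checked numerically in a simple case. The sketch below uses the standard textbook form of Carnap's lambda-rule for a binary process, which is not quoted from the text, and compares it with the predictions of a Bayesian who starts from a uniform prior over the Bernoulli hypotheses. The function names and the sample (7 ones in 10 observations) are hypothetical.

```python
def carnap_rule(n1, n, lam):
    """Carnap's lambda-rule for a binary process: probability that the
    next observation is 1, given n1 ones among n observations."""
    return (n1 + lam / 2) / (n + lam)

def bayes_predictive_uniform(n1, n, steps=100_000):
    """Predictive probability of a 1 under a uniform prior over the Bernoulli
    parameter, computed by midpoint-rule integration (no closed form used)."""
    num = 0.0
    den = 0.0
    for i in range(steps):
        theta = (i + 0.5) / steps
        like = theta ** n1 * (1 - theta) ** (n - n1)  # likelihood of the sample
        num += theta * like
        den += like
    return num / den

carnap = carnap_rule(7, 10, lam=2)
bayes = bayes_predictive_uniform(7, 10)
print(round(carnap, 4), round(bayes, 4))
```

For lambda = 2 the two numbers coincide up to integration error, so this particular rule corresponds to a uniform prior. More generally, the lambda-rule matches a symmetric Beta(lambda/2, lambda/2) prior, which is one way of seeing that preferring a rule amounts to preferring a prior.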
A further set of responses to the subjectivity of Bayesian statistical inference targets the prior distribution directly. The literature proposes several objective criteria for filling in the prior over the model. Each of these lays claim to being the correct expression of complete ignorance concerning the value of the model parameters, or of minimal information regarding the parameters. Three such criteria are discussed here. In the context of Bertrand's paradox we already discussed the principle of indifference, according to which probability should be distributed evenly over the available possibilities.
A further development of this idea is presented by the requirement that a distribution should have maximum entropy. Notably, the use of entropy maximization for determining degrees of belief finds much broader application than only in statistics. In objective Bayesian statistics, the idea is applied to the prior distribution over the model.
However, for continuous models the maximum entropy distribution depends crucially on the metric over the parameters in the model. The burden of subjectivity is thereby moved to the parameterization, but of course it may well be that we have strong reasons for preferring a particular parameterization over others.
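For a finite model the connection between maximum entropy and indifference is easy to verify. The following is a minimal sketch with three hypothetical candidate priors over four hypotheses; it uses Shannon entropy with natural logarithms and does not attempt the continuous case, where the dependence on the parameterization arises.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

# Hypothetical candidate priors over four hypotheses.
candidates = {
    "uniform": [0.25, 0.25, 0.25, 0.25],
    "peaked": [0.7, 0.1, 0.1, 0.1],
    "certain": [1.0, 0.0, 0.0, 0.0],
}
entropies = {name: entropy(dist) for name, dist in candidates.items()}
best = max(entropies, key=entropies.get)
print(best, round(entropies[best], 4))
```

Among these candidates the uniform prior has the highest entropy, log 4, in line with the principle of indifference for a finite number of hypotheses.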
There are other approaches to the objective determination of priors. In view of the above problems, a particularly attractive method for choosing a prior over a continuous model is proposed by Jeffreys. The general idea of so-called Jeffreys priors is that the prior probability assigned to a small patch in the parameter space is proportional to what may be called the density of the distributions within that patch. Intuitively, a patch in which many mutually distinct distributions are packed together should receive more prior probability than a patch within which the distributions hardly differ.
More technically, such a density is expressed by a prior distribution that is proportional to the square root of the determinant of the Fisher information. A key advantage of these priors is that they are invariant under reparameterizations of the parameter space. A final method of defining priors goes under the name of reference priors (Berger et al.). The proposal starts from the observation that we should minimize the subjectivity of the results of our statistical analysis, and hence that we should minimize the impact of the prior probability on the posterior. The idea of reference priors is exactly that they allow the sample data a maximal say in the posterior distribution.
But since at the outset we do not know what sample we will obtain, the prior is chosen so as to maximize the expected impact of the data. The expectation must itself be taken with respect to some distribution over sample space, but again, it may well be that we have strong reasons for this latter distribution. A different response to the subjectivity of priors is to extend the Bayesian formalism, in order to leave the choice of prior to some extent open.
The subjective choice of a prior is in that case circumvented. Two such responses will be considered in some detail. Recall that a prior probability distribution over statistical hypotheses expresses our uncertain opinion on which of the hypotheses is right. The central idea behind hierarchical Bayesian models (Gelman et al.) is that the same pattern of putting a prior over statistical hypotheses can be repeated on the level of priors itself.
More precisely, we may be uncertain over which prior probability distribution over the hypotheses is right. If we characterize possible priors by means of a set of parameters, we can express this uncertainty about prior choice in a probability distribution over the parameters that characterize the shape of the prior.
In other words, we move our uncertainty one level up in a hierarchy. The idea of hierarchical Bayesian modeling (Gelman et al.) relates naturally to the Bayesian comparison of Carnapian prediction rules. Hierarchical Bayesian modeling can also be related to another tool for choosing a particular prior distribution over hypotheses, namely the method of empirical Bayes, which estimates the prior that leads to the maximal marginal likelihood of the model.
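The empirical Bayes idea can be illustrated with a rough sketch. It assumes a family of Beta(a, b) priors over a Bernoulli parameter, several groups of binomial data that share the same prior, and a coarse grid search in place of a proper optimizer. The data, the grid and the function names are all hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(data, a, b):
    """Log marginal likelihood of binomial counts under a shared Beta(a, b) prior.
    Binomial coefficients are omitted because they do not depend on a or b."""
    return sum(log_beta(a + k, b + n - k) - log_beta(a, b) for k, n in data)

# Hypothetical data: (successes, trials) for five groups.
data = [(2, 10), (3, 10), (1, 10), (2, 10), (4, 10)]

# Coarse grid over the parameters that characterize the shape of the prior.
grid = [0.5 * i for i in range(1, 41)]  # 0.5, 1.0, ..., 20.0
best_a, best_b = max(
    ((a, b) for a in grid for b in grid),
    key=lambda ab: log_marginal(data, ab[0], ab[1]),
)
prior_mean = best_a / (best_a + best_b)
print(best_a, best_b, round(prior_mean, 3))
```

The selected prior has its mean close to the overall success rate in the data. A fully hierarchical analysis would instead place a distribution over (a, b) and average over it, where empirical Bayes simply picks the maximizing pair.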
In the philosophy of science, hierarchical Bayesian modeling has made a first appearance due to Henderson et al. There is also a response that avoids the choice of a prior altogether. This response starts with the same idea as hierarchical models: rather than a single prior over the hypotheses, we consider a set of priors. But instead of defining a distribution over this set, proponents of interval-valued or imprecise probability claim that our epistemic state regarding the priors is better expressed by this set of distributions, and that sharp probability assignments must therefore be replaced by lower and upper bounds to the assignments.
Now the idea that uncertain opinion is best captured by a set of probability assignments, or a credal set for short, has a long history and is backed by an extensive literature. In light of the main debate in the philosophy of statistics, the use of interval-valued priors indeed forms an attractive extension of Bayesian statistics. These theoretical developments may look attractive, but the fact is that they mostly enjoy a cult status among philosophers of statistics and that they have not moved the statistician in the street.
On the other hand, standard Bayesian statistics has seen a steep rise in popularity over the past decade or so, owing to the availability of good software and numerical approximation methods. And most of the practical use of Bayesian statistics is more or less insensitive to the potentially subjective aspects of the statistical results, employing uniform priors as a neutral starting point for the analysis and relying on the afore-mentioned convergence results to wash out the remaining subjectivity (cf. Gelman and Shalizi).
However, this practical attitude of scientists towards modelling should not be mistaken for a principled answer to the questions raised in the philosophy of statistics (see Morey et al.). In the foregoing we have seen how classical and Bayesian statistics differ. But the two major approaches to statistics also have a lot in common. Most importantly, all statistical procedures rely on the assumption of a statistical model, here referring to any restricted set of statistical hypotheses.
Moreover, they are both aimed at delivering a verdict over these hypotheses. Whereas in Bayesian statistics the model presents a very strong assumption, classical statistics does not endow the model with a special epistemic status. But across the board, the adoption of a model is absolutely central to any statistical procedure. A natural question is whether anything can be said about the quality of the statistical model, and whether any verdict on this starting point for statistical procedures can be given.
Surely some models will lead to better predictions, or be a better guide to the truth, than others. The evaluation of models touches on deep issues in the philosophy of science, because the statistical model often determines how the data-generating system under investigation is conceptualized and approached (Kieseppa). Model choice thus resembles the choice of a theory, a conceptual scheme, or even of a whole paradigm, and thereby might seem to transcend the formal frameworks for studying theoretical rationality (cf. Carnap, Jeffrey).
Despite the fact that some considerations on model choice will seem extra-statistical, in the sense that they fall outside the scope of statistical treatment, statistics offers several methods for approaching the choice of statistical models. There are in fact very many methods for evaluating statistical models (Claeskens and Hjort, Wagenmakers and Waldorp). In first instance, the methods occasion the comparison of statistical models, but very often they are used for selecting one model over the others.
In what follows we only review prominent techniques that have led to philosophical debate: Akaike's information criterion, the Bayesian information criterion, and furthermore the computation of marginal likelihoods and posterior model probabilities, both associated with Bayesian model selection.
We leave aside methods that use cross-validation as they have, undeservedly, not received as much attention in the philosophical literature. Akaike's information criterion, modestly termed An Information Criterion or AIC for short, is based on the classical statistical procedure of estimation (see Burnham and Anderson, Kieseppa). The idea is to judge a model by how close its best estimate lies to the true distribution that generates the data. This proximity is often equated with the expected predictive accuracy of the estimate, because if the estimate and the true distribution are closer to each other, their predictions will be better aligned to one another as well.
In the derivation of the AIC, the so-called relative entropy or Kullback-Leibler divergence of the two distributions is used as a measure of their proximity, and hence as a measure of the expected predictive accuracy of the estimate. Naturally, the true distribution is not known to the statistician who is evaluating the model. If it were, then the whole statistical analysis would be useless.
It turns out, however, that the proximity of the estimate to the true distribution can itself be estimated without bias from the sample, which yields the expression AIC[M] = -2 log P(s | ĥ(s)) + 2d. Here s is the sample, ĥ(s) is the maximum likelihood estimate (MLE) within the model M, and d is the number of dimensions, or independent parameters, of M. The MLE of the model thereby features in an expression of the model quality. As can be seen from the expression, a model with a smaller AIC is preferable. Notice that the number of dimensions, or independent parameters, in the model increases the AIC and thereby lowers the eligibility of the model: of two models that fit the sample equally well, the one with fewer parameters is preferred. For this reason, statistical model selection by the AIC can be seen as an independent motivation for preferring simple models over more complex ones (Sober and Forster). But this result also invites some critical remarks.
For one, we might impose other criteria than merely the unbiasedness on the estimation of the proximity to the truth, and this will lead to different expressions for the approximation. Moreover, it is not always clearcut what the dimensions of the model under scrutiny really are. For curve fitting this may seem simple, but for more complicated models or different conceptualizations of the space of models, things do not look so easy (cf. Myung et al., Kieseppa).
A prime example of model selection is presented in curve fitting.
Different models are characterized by polynomials of different degrees that have different numbers of parameters. Estimations fix the parameters of these polynomials. Various other prominent model selection tools are based on methods from Bayesian statistics. They all start from the idea that the quality of a model is expressed in the performance of the model on the sample data. Because of this, there is a close connection with the hierarchical Bayesian modelling referred to earlier (Gelman). The central notion in the Bayesian model selection tools is thus the marginal likelihood of the model, i.e., the average of the likelihoods of the hypotheses in the model, weighted by the prior distribution over those hypotheses.
One way of evaluating models, known as Bayesian model selection, is by comparing the models on their marginal likelihood, or else on their posteriors (cf. Kass and Raftery). Usually the marginal likelihood cannot be computed analytically. Numerical approximations can often be obtained, but for practical purposes it has proved very useful, and quite sufficient, to employ an approximation of the marginal likelihood. This approximation has become known as the Bayesian information criterion, or BIC for short (Schwarz, Raftery). It turns out that this approximation shows remarkable similarities to the AIC: BIC[M] = -2 log P(s | ĥ(s)) + d log n, where s is the sample, ĥ(s) is the maximum likelihood estimate within the model M, d is the number of independent parameters of M, and n is the number of data points in the sample. The dependence on n is the only difference with the AIC, but a major difference in how the model evaluation may turn out.
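The following sketch shows how the two criteria can come apart. It uses the standard forms of the AIC (penalty 2d) and the BIC (penalty d log n), and, to keep it short, it compares two Bernoulli models in place of polynomial curves: a fair-coin model without free parameters and a model with one free chance parameter. The sample of 60 successes in 100 trials and the function names are hypothetical.

```python
import math

def log_likelihood(theta, successes, n):
    """Log-likelihood of a Bernoulli hypothesis with chance theta."""
    return successes * math.log(theta) + (n - successes) * math.log(1 - theta)

def aic(max_log_lik, d):
    """Akaike's information criterion: -2 log L + 2 d."""
    return -2 * max_log_lik + 2 * d

def bic(max_log_lik, d, n):
    """Bayesian information criterion: -2 log L + d log n."""
    return -2 * max_log_lik + d * math.log(n)

n, successes = 100, 60  # hypothetical sample

# M0: fair coin, no free parameters. M1: unknown chance, MLE = successes / n.
ll0 = log_likelihood(0.5, successes, n)
ll1 = log_likelihood(successes / n, successes, n)

scores = {
    "M0": (aic(ll0, 0), bic(ll0, 0, n)),
    "M1": (aic(ll1, 1), bic(ll1, 1, n)),
}
for name, (a, b) in scores.items():
    print(name, round(a, 2), round(b, 2))
```

On this sample the AIC is lower for the one-parameter model while the BIC is lower for the fair-coin model, because the BIC's penalty of log 100 per parameter exceeds the gain in fit whereas the AIC's penalty of 2 does not.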
The concurrence of the AIC and the BIC seems to give a further motivation for our intuitive preference for simple models over more complex ones. Indeed, other model selection tools, like the deviance information criterion (Spiegelhalter et al.) and the approach based on minimum description length (Grunwald), also result in expressions that feature a term that penalizes complex models. However, this is not to say that the dimension term that we know from the information criteria exhausts the notion of model complexity.
There is ongoing debate in the philosophy of science concerning the merits of model selection in explications of the notion of simplicity, informativeness, and the like (see, for example, Sober, Romeijn and van de Schoot, Romeijn et al., Sprenger). There are also statistical methods that refrain from the use of a particular model, by focusing exclusively on the data or by generalizing over all possible models.
Some of these techniques are properly localized in descriptive statistics. Statistical methods that do not rely on an explicit model choice have unfortunately not attracted much attention in the philosophy of statistics, but for completeness' sake they will be briefly discussed here.
One set of methods, and a quite important one for many practicing statisticians, is aimed at data reduction.
Often the sample data are very rich. The first step in a statistical analysis may then be to pick out the salient variability in the data, in order to scale down the computational burden of the analysis itself. The technique of principal component analysis (PCA) is designed for this purpose (Jolliffe). Given a set of points in a space, it seeks out the set of vectors along which the variation in the points is large. For example, if the points in a plane are scattered along a diagonal line, the vector along that diagonal is called the principal component of the data. In richer data structures, and using a more general measure of variation among points, we can find the first component in a similar way.
Moreover, we can repeat the procedure after subtracting the variation along the last found component, by projecting the data onto the plane perpendicular to that component. This allows us to build up a set of principal components of diminishing importance.
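For points in a plane the first principal component can be computed by hand. The sketch below takes the direction of largest variance to be the leading eigenvector of the sample covariance matrix, which is the usual way of implementing PCA, and uses the closed-form solution for a symmetric 2x2 matrix. The data points and the function name are hypothetical.

```python
import math

def first_principal_component(points):
    """Return (unit vector, variance along it) for 2-D points, using the
    closed-form eigen-decomposition of the 2x2 sample covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Matching eigenvector; fall back to an axis when the covariance is zero.
    if sxy != 0:
        vx, vy = sxy, lam - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam, lam / (sxx + syy)

# Hypothetical points scattered around the diagonal y = x.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1)]
component, variance, share = first_principal_component(points)
print(tuple(round(c, 3) for c in component), round(share, 3))
```

The component comes out close to the diagonal direction and accounts for nearly all of the variance. In two dimensions the second component is simply the perpendicular direction; in higher dimensions one would repeat the search on the data projected away from the components already found.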