wmwera.blogg.se

Define data dredging

Data mining is the process by which large-scale data sets are examined in order to find previously unknown links between different variables. It has developed as a research tool in response to the increasing power and capabilities of computers and technology over the last few decades, which allow far larger amounts of data to be handled at once than was previously possible. Instead of formulating a hypothesis from observations and then collecting data to test it, as in conventional research, data mining takes data that has already been collected and analyses it to see whether links between different variables can be found. Hypotheses about why these associations exist can then be formed.

Data mining initially appears to be quite similar to normal statistics. Fundamentally, it is just examining data to see whether there is a correlation between two different variables. Where it differs from conventional statistics is in its scale and in the types of data it can handle. Ideally, the data examined in data mining should encompass a whole population rather than just a representative sub-sample. This data may be numerical, but it does not have to be: it could be images, such as CT or MRI scans, examined to find links between different radiological features and diseases.

For an example of how data mining techniques can be used to study the health of populations, check out this paper looking at diabetes in Saudi Arabia. Using pre-collected data about non-communicable diseases, the researchers were able to analyse the effect of different diabetes treatments in different age groups and build models that allowed predictions about treatment efficacy to be made.

Diabetes Health Care in Young and Old Patients

Data mining is a brilliant tool for research, but like most tools it can be misused. Data dredging is what happens when data mining is abused and the same data set is examined too many times, for too many possible associations. If enough different variables are looked at, some will show correlations that occur solely by chance rather than representing a true relationship.
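The point that some correlations appear solely by chance can be put into rough numbers. This is a back-of-the-envelope sketch, not something from the original post. It assumes that m comparisons are made and that each one is tested at a significance level α (conventionally 0.05). The first formula holds whatever the relationship between the tests. The second formula additionally assumes the tests are independent.

```latex
\mathbb{E}[\text{chance findings}] = m\,\alpha,
\qquad
P(\text{at least one chance finding}) = 1 - (1 - \alpha)^{m}
```

For example, 40 variables give m = 40 × 39 / 2 = 780 possible pairs. That means about 0.05 × 780 = 39 chance correlations would be expected even if no real relationships existed. With only m = 20 independent tests, the probability of at least one false finding is already 1 − 0.95^20 ≈ 0.64.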
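The chance-correlation problem can also be shown with a small simulation. The following is a minimal Python sketch under stated assumptions, not the method of any study mentioned here. It invents 40 variables measured on 100 hypothetical subjects. Every variable is independent random noise, so there are no true relationships to find. The sketch then dredges the data by testing every pair of variables for correlation. As a simple stand-in for a full significance test, it uses Fisher's z-transform as an approximate two-sided test at roughly the 5% level. The function names `pearson_r`, `looks_significant` and `dredge` are illustrative only.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def looks_significant(r, n, z_crit=1.96):
    """Approximate two-sided test at about the 5% level via Fisher's z-transform.

    This is an approximation chosen for illustration (assumption), not an exact test.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > z_crit


def dredge(num_variables=40, num_subjects=100, seed=0):
    """Generate independent noise variables and test every pair for correlation.

    Returns (number of pairs tested, number of pairs that look 'significant').
    Because every variable is pure noise, every 'hit' is a chance finding.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    data = [[rng.gauss(0.0, 1.0) for _ in range(num_subjects)]
            for _ in range(num_variables)]
    tested = 0
    hits = 0
    # Examine the same data set over and over: one test per pair of variables.
    for a, b in itertools.combinations(range(num_variables), 2):
        tested += 1
        if looks_significant(pearson_r(data[a], data[b]), num_subjects):
            hits += 1
    return tested, hits


if __name__ == "__main__":
    tested, hits = dredge()
    print(f"{tested} pairs tested, {hits} look significant "
          f"(about {0.05 * tested:.0f} expected by chance alone)")
```

With 780 pairs tested, roughly 5% of them typically come out as "significant" correlations. This happens even though no variable is related to any other, which is exactly the trap that data dredging falls into.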














