Topic: Clustering? | Mathematical Musings

Tagged: 6-8 Statistics, Clustering

This topic has 4 replies, 3 voices, and was last updated 12 years, 1 month ago by Cathy Kessel.

Viewing 5 posts - 1 through 5 (of 5 total)

Author

Posts
March 24, 2013 at 3:46 am #1833

jrhiglenn
Participant

I am an 8th grade math teacher. I have a question about one standard. I believe I have finally found the answer, but I still question why the standard is written as it is.
8th Grade Stats and Probability
8.SP.1. Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.

The question is this. What is clustering supposed to mean?
I have taught stats at the high school level. I have taken two college level stats classes. I have read numerous research papers. I have searched the internet for the meaning of this term. I believe I have finally found the answer. The Math in Focus 8th grade textbook uses this term to discuss the position of data points on a graph relative to the line of fit.
I believe that including this term in the standards as it is makes it sound like it is a term used in the field of statistics. But as I said, in my experience and review of “things statistical” I cannot find anything besides this text which uses it that way.
Please let me know if and where I am in error in my thinking.

March 24, 2013 at 11:37 am #1836

Cathy Kessel
Participant

It sounds as if you’ve been looking for a term from statistics rather than statistics education. An analogue: the CCSS uses terms from mathematics education that are not used in mathematics (e.g., “counting on,” “add within 20”).

“Clustering” is used in the American Statistical Association’s GAISE report (Guidelines for Assessment and Instruction in Statistics Education), which is listed in the CCSS document under “works consulted” and can be downloaded here: http://www.amstat.org/education/publications.cfm. It’s also used in the 6–8 Statistics and Probability Progression for the CCSS which is here: http://ime.math.arizona.edu/progressions/.

March 25, 2013 at 8:27 am #1837

jrhiglenn
Participant

I checked out the GAISE document. After looking through it, I can only find the word “cluster,” and I do not believe it is used there in the same way as the word “clustering” in the CCSS, which seems to have been lifted almost verbatim from the Math in Focus text that I have.
In fact, the GAISE document uses the word cluster to mean “grouped together” in my reading of the word. And in another location, it is distinctly discussing a cluster analysis type approach where names are grouped with the lengths associated with them. (FIG 37.)
“Looking for clusters and gaps in the distribution helps students identify the shape of the distribution. Students should develop a sense of why a distribution takes on a particular shape for the context of the variable being considered.”
“Does the distribution have one main cluster (or mound) with smaller groups of similar size on each side of the cluster? If so, the distribution might be described as symmetric.”
“Does the distribution have one main cluster with smaller groups on each side that are not the same size? Students may classify this as “lopsided,” or may use the term asymmetrical.”
“Why does the distribution take this shape? Using the dotplot from above, students will recognize both groups have distributions that are “lopsided,” with the main cluster on the lower end of the distributions and a few values to the right of the main mound.”
“Students will notice that the distribution of the poultry hot dogs has two distinct clusters. What might explain the gap and two clusters?”
“It is interesting to note the two apparent clusters of data for poultry hot dogs.”

“Random selection tends to produce some sample means that underestimate the population mean and some that overestimate the population mean, such that the sample means cluster somewhat evenly around the population mean value (i.e., random selection tends to be unbiased).”
“Figure 37: Names clustered by length”

From the Progressions document we read:
“Working with paired measurement variables that might be associated linearly or in a more subtle fashion, students construct a scatter plot, describing the pattern in terms of clusters, gaps, and unusual data points (much as in the univariate situation). Then, they look for an overall positive or negative trend in the cloud of points, a linear or nonlinear (curved) pattern, and strong or weak association between the two variables, using these terms in describing the nature of the observed association between the variables. 8.SP.1”
This use of the term cluster does not match the Math in Focus textbook discussion, which in my mind states that a discussion about “clustering” is a discussion about the closeness of data points to a line of fit drawn on the graph.
However, in one of the figures this document does use the term “clustered about the line,” which is the Math in Focus textbook rendering of the term.
In the Progressions document we read the following statements which do not use a consistent meaning of the word cluster.

“Students extend their knowledge of symmetric shapes, describe data displayed in dot plots and histograms in terms of symmetry. They identify clusters, peaks, and gaps, recognizing common shapes and patterns in these displays of data distributions (MP7).”

“Which measure will tend to be closer to where the data on prices of a new pair of jeans actually cluster?”

“Students realize that the mean may not represent the largest cluster of data points, and that the median is a more useful measure of center. In like fashion, the IQR is a more useful measure of spread, giving the spread of the middle 50% of the data points.”
I think this indicates a lack of consistency and clarity about what the term means. Because mathematics is a language-intensive content area, we should be as precise as possible with terms. There are thousands of math teachers around the country who must read this standard and teach what they believe it means. I believe the standard lacks the clarity to do this. In addition, if we are to make mathematics “real world,” then we should be using language consistent with those who are out in the “real world” using the mathematics we are teaching.

Please point out my errors where needed.

March 25, 2013 at 3:44 pm #1843

Bill McCallum
Keymaster

I agree there are two uses of the word “cluster” here, one referring to univariate data and the other referring to bivariate data. But in both cases the word means the same thing, namely an informal notion of data points being close to each other in a group. And that is certainly the sense intended in the standards, not any formal statistical construct.

March 25, 2013 at 4:39 pm #1847

Cathy Kessel
Participant

I think that part of the issue may be use of “cluster” as a verb. jrhiglenn, as you point out, the GAISE document does not contain the word “clustering.” It does, however, use “cluster” as a verb in two of the instances you noted: “Random selection tends to produce some sample means that underestimate the population mean and some that overestimate the population mean, such that the sample means cluster somewhat evenly around the population mean value (i.e., random selection tends to be unbiased)” and “Names clustered by length”.

The 6-8 Progression example also uses “cluster” as a verb: “Which measure will tend to be closer to where the data on prices of a new pair of jeans actually cluster?” Another example illustrates the “clustering” mentioned in 8.SP.1: “the points are closely clustered about the line” (p. 11, shown under a line in a scatterplot).
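
For readers who want to see the two informal readings discussed in this thread side by side, here is a minimal Python sketch. It is an illustration only and does not come from the standards, GAISE, the Progressions, or any textbook. All data are made up, and the helper names (`fit_line`, `mean_abs_residual`, `split_at_gaps`) and the `min_gap` threshold are hypothetical choices. The first part treats “clustered about the line” as small average vertical distance from a least-squares line; the second treats “clusters and gaps” as groups of nearby values separated by a large gap.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_residual(xs, ys):
    """Average vertical distance from the points to their fitted line."""
    slope, intercept = fit_line(xs, ys)
    return mean(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


def split_at_gaps(values, min_gap):
    """Group sorted values, starting a new group at each gap >= min_gap."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev >= min_gap:
            groups.append([cur])
        else:
            groups[-1].append(cur)
    return groups


# Made-up bivariate data: same x values, two different sets of y values.
xs = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0]   # points hug a line
loose = [4.0, 1.0, 9.0, 5.0, 13.0, 10.0]  # points scatter widely

tight_spread = mean_abs_residual(xs, tight)
loose_spread = mean_abs_residual(xs, loose)
print("tight:", round(tight_spread, 3), "loose:", round(loose_spread, 3))

# Made-up univariate data with one obvious gap.
values = [90, 95, 100, 105, 140, 145, 150]
groups = split_at_gaps(values, min_gap=20)  # threshold is an arbitrary choice
print(groups)
```

In this sketch the “tight” data have a much smaller average residual than the “loose” data, and the univariate values split into two groups at the gap. Both are simply ways of quantifying points being close together, which matches the informal sense described above rather than any formal cluster analysis.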