Clustering for Data Mining: A Data Recovery Approach (Chapman & Hall/CRC Computer Science & Data Analysis)
Boris Mirkin
This book gives a smooth, motivated and example-rich
introduction to clustering that is innovative in many respects.
It answers important questions that are rarely addressed,
if addressed at all.
Examples:
(a) what to do if the user has no idea of the number
of clusters and/or their location - use what is called intelligent k-means;
(b) what to do if the data contain both numeric and categorical
features - use what is called a three-step standardization procedure;
(c) how to catch anomalous patterns; (d) how to validate clusters, etc.
Some of these answers may be open to criticism; however, some motivation is always
supplied, and the results are always reproducible and thus testable.
The book introduces a number
of non-conventional cluster interpretation aids derived from a data
geometry view adopted by the author and based on what are referred to as
contribution weights, which basically show those elements of cluster
structures that distinguish clusters from the rest. These contribution
weights, applied to categorical data, appear to be highly compatible
with what statisticians such as A. Quetelet and K. Pearson developed
over the past couple of centuries, which is a highly original and welcome
development. The book reviews a rich set of approaches that have accumulated
in such hot areas as text mining and bioinformatics, and shows that
clustering is not just a set of naive methods for data processing but
forms an evolving area of data science.
I adopted the book as a text for my data mining courses at the bachelor's
and master's degree levels.