978 Book reviews / Information Processing and Management 44 (2008) 973–984


managing change, and communicating with individuals at all levels within the organization. The most basic requirement, however, is that records managers return to a position of professional leadership. In Choksy’s words: “We must now lead ourselves into managing records. We must stop following the archivists and librarians. Why we have followed the utterances of a few archivists as if they were our gurus and worshipped at the altar of library science is unclear. They do not manage business information; they manage cultural information” (p. 202). It is the records manager who must domesticate the free-range information within the organization.

In conclusion, Carol E.B. Choksy has written an insightful and provocative book. Readers (and reviewers) may disagree with some of what she wrote, but we all will leave with a new appreciation for the complex role of the records manager in the decades ahead.

Gregory S. Hunter

Palmer School of Library and Information Science,

Long Island University

E-mail address: [email protected]

Available online 24 October 2007

doi:10.1016/j.ipm.2007.08.004


Myatt, G.J. (2007). Making sense of data: a practical guide to exploratory data analysis and data mining (pp. 280). Wiley

Making Sense of Data by Glenn Myatt is a quick overview of data analysis techniques (i.e., I finished the book on the flight from Dulles to O’Hare). The book takes an interesting approach to understanding data. From the title, I thought the paradigm was going to be an information science viewpoint on data (e.g., the inductive model of data, information, knowledge, and insight). Instead, the book took a statistical approach to making sense of data. However, the author wrapped these standard statistical approaches in a project management framework, thereby achieving the same data, information, knowledge, and insight end. This combination of statistical methods and a project management framework was interesting, and one could see the relationship, although the connections between the two paradigms were not thoroughly made throughout the book.

Chapter 1 provides an overview of the book, specifically a four-step process for any exploratory analysis/data mining project, with the four steps being: (1) problem definition, (2) data preparation, (3) implementation of the analysis, and (4) deployment of the results. The author correctly points out that there will be some overlap among the four steps. The problem definition is given short shrift, only a paragraph. The same is true of data preparation. Implementation of the analysis (i.e., decision making) is presented as one of three categories: (1) summarizing the data, (2) finding hidden relationships, and (3) making predictions. Again, there is a great deal of interplay among the three. The author does a great job of illustrating this in Fig. 1.2 (page 4), where the overlap of the three categories and several common methods of statistical analysis and data mining are presented in a Venn diagram. A brief mention of the deployment of the results ends the chapter.

Chapter 2 focuses on problem definition, with coverage of defining the objectives and deliverables. There is also a brief and out-of-place mention of project roles and project plans. Neither is presented in any practical detail. This chapter, as with most of the chapters, ends with a case study and suggestions for further reading, which are actually quite good.

Chapter 3 deals with data preparation, which is an often overlooked aspect of data analysis, but which is critical to the confidence with which future decisions or implications from findings can be made. This chapter is presented from a database point of view, although there are certainly non-database data that one could address. For me, the most interesting aspect of this chapter was the data transformation methods.


Chapter 4 was on tables and graphs, addressing primarily the basics of each type of presentation style.

Chapter 5 deals with summarizing and the ability to make general statements about statistical data, addressing descriptive, inferential, and comparative approaches.

Chapter 6 addresses the importance of grouping data in order to find hidden relationships, to become familiar with the data, and to simplify the data. Several clustering approaches were discussed from a conceptual standpoint. There was little presentation on automated tools to accomplish large-scale data analysis, data mining, or text mining.

Chapter 7 dealt with predictive models on data where estimates or forecasts are needed. The author touches on the use of predictive models for prioritization, decision support, and understanding relationships.

Chapter 8 concerned the deployment aspects of a data analysis project, including the deliverables (i.e., reports, integration into existing systems, and standalone software). This chapter also dealt with the specific activities involved in the deployment phase, including planning, executing, measuring progress, and monitoring performance. The chapter ended with brief discussions of possible deployment scenarios.

Chapter 9 presented an overview of the exploratory data analysis process, namely reviewing the four steps outlined in the project management process, which are problem definition, data preparation, implementation of the analysis, and deployment.

The book ends with some statistical tables. It has a seven-page glossary that is actually quite useful. The book has a wholly inadequate bibliography of one and a half pages. The index is a little more than five pages.

Overall, the idea of embedding data analysis and mining within a project management approach was interesting, and the combination certainly has merit. However, the book really fell short in both areas. I did not find it useful as a statistical analysis reference (and certainly not as a data mining reference), and it was lacking in the project management area. The book tries to be a reference for both areas, and it ends up doing neither well.

Bernard J. Jansen

The Pennsylvania State University,

College of Information Sciences and Technology,

329F Information Sciences and Technology Building,

University Park,

PA 16802,

USA

E-mail address: jjansen@acm.org

URL: http://ist.psu.edu/faculty_pages/jjansen/

Available online 21 September 2007

doi:10.1016/j.ipm.2007.08.003

Paradigms lost: The life and deaths of the printed word, William Sonn. The Scarecrow Press (2006). 398 pp., ISBN-13: 978-0-8108-5262-4; ISBN-10: 0-8108-5262-4, $35

The term disruptive technology or innovation was popularized by Harvard Business School professor Clayton Christensen.1 It is often used to denote that which alters the course of a life, an industry, or a whole society. William Sonn’s book convinces us that all of the “technology” associated over the years with the spread of the printed word, or more accurately recorded knowledge, is the perfect example of a disruptive technology.

This extensively researched book is a history primarily of the “printed word”. It gives us an insight into the profound effect of access to information on humankind and the society in which we live. No dry academic

1 Christensen, Clayton. 2000. The Innovator’s Dilemma. New York: Collins.