StatBase: library statistics made easy


Article information:
To cite this document: Alexandria Payne, John Curtis, (2014), "StatBase: library statistics made easy", Library Hi Tech, Vol. 32 Iss 3, pp. 546-559.
Permanent link to this document: http://dx.doi.org/10.1108/LHT-04-2014-0031



StatBase: library statistics made easy

Alexandria Payne
Department of Digital Services & Information Technology, Newport News Public Library System, Newport News, Virginia, USA, and

John Curtis
Department of Technical Services, Newport News Public Library System, Newport News, Virginia, USA

Abstract

Purpose – The purpose of this paper is to detail a Library open source software (OSS) development project resulting in the launch of StatBase, a statistical gathering and data visualization tool, so that organizations can adopt a locally managed alternative to costly data aggregation tools.

Design/methodology/approach – This case study is based on a literature review, an Agile development framework, and user experience modeling. The software solution features a Joomla framework with contributed modules and open source architecture.

Findings – This case study demonstrates the creation and practical implementation of a scalable OSS platform for data management and analysis.

Practical implications – Provides a frame of reference and methodology for libraries, both public and academic, seeking to implement a web-based resource to gather, organize, and interpret statistical metrics via a centralized, lightweight, open source architecture.

Originality/value – This case study provides a detailed scope and step-by-step technology process description by which an organization can adopt or model the StatBase solution for business metrics.

Keywords Library management, Systems analysis, Public libraries, Data analysis, Design and development, Technology led strategy

Paper type Case study

Introduction

Institutions are tasked with monitoring their efforts through both subjective, outcome-based assessment and objective, numerical assessment, which must then translate into a collective measurement of an organization's respective success. To do this, an organization must gather, consolidate, and manipulate large volumes of descriptive statistics. From this mass of metrics, the information organization must then attempt to derive usage trends and insight into the viability of its business processes. StatBase was designed to help meet these demands.

Background

The Newport News Public Library System collects standardized metrics for each core service area, including Reference, Circulation, Technology Services, and Patron Registration. From 1999 to 2011, roughly 3,698 statistical entries were collected and stored in various spreadsheet formats, resulting in over 170 individual documents deposited in folders on a shared file server. Essentially, the Library's data hierarchy consisted of individual metric categories, per Library Branch location, per fiscal year. So, each metric within a fiscal year required a data set representative of location, type, span (date or range), and sum.

Statistical entry prior to the adoption of StatBase was served via a Visual Basic (VB) enhanced Microsoft Excel worksheet collection. The workflow for data entry within


this system required meticulous scrutiny; consequently, the process was time consuming, with between one and four hours dedicated to the average user entry workflow, depending on the level of data depth. To begin the process, users were required to access a series of navigational menus, each providing button-based topical selections, stored in separate dedicated spreadsheets in order to input data into an array of subsequent location-specific spreadsheets. Post-entry, the Library's users would then glean statistics from the multiple Branch-specific spreadsheets for each of the various core service area categories. This process would invariably result in another set of spreadsheets so that users would have a space to combine data sets in order to view multiple locations' information.

A primary issue with the Library's reliance on spreadsheet-based statistical entry derived from the numerous options a user was confronted with upon accessing, and throughout, the data entry workflow. With no clear navigational delineation, ambiguous instructional notations, and obscure outputs, the process appeared mired in inconvenience. A secondary concern was the necessity of single-user point-of-access due to the nature of the document management and hosting via the file server. Additionally, and most importantly, the spreadsheet method lacked the ability to deliver on-demand access to queried data sets and did not offer any form of scalability without comprehensive architecture or code revision.

Literature review

Decades of previous scholarship debate which library statistics merit serious analysis. As far back as the late 1960s, academic librarians questioned definitions and standards, aggregation methods, and the utility of library statistics. Therefore, the core of historical library scholarship centers on what should be measured rather than how measurements should be contained or used in illustrated analysis.

Eli M. Oboler's (1967) article "Academic library statistics revisited", published in College & Research Libraries, provides a historically relevant criticism regarding the lack of uniformity in defining statistical categories and warns that inaccurate statistics are more harmful than non-existent data. Additionally, Oboler's article suggests that non-book formats, such as microforms, incorporated into collections foreshadow the data-collection challenges for present-day public libraries attempting to track increasingly diverse materials.

Similarly, N. Radford’s,“The Problems of Academic Library Statistics,” released in a1968 issue of Library Quarterly, calls for greater uniformity in published statistics; theauthor’s focus centers on major publications about academic libraries by federalentities such as the US Office of Education and large professional organizations like theAmerican Library Association and the Association of Research Libraries. In laterdecades, some of the same inquiries persist, as shown in Yan Quan and Zweizig, 2001Library Quarterly article titled “The Use of National Public Library Statistics by PublicLibrary Directors” and AM Schrader’s“400 Million Circs, 40 Million ReferenceQuestions: What Does This Mean and Does Anybody Care? Getting Beyond LibraryStatistics to Library Value with Help from Canada’s National Core Library StatisticsProgram,” released in a 2006 publication for Argus.

As many present-day libraries confront a growing demand to demonstrate value and return on investment, acknowledging the need for library statistics is now ubiquitous; however, few case studies have been published that address the need for tools that gather and display data or propose a model for the tool itself. Furthermore, public libraries seem almost removed from the conversation, with the possible exception of some discourse regarding reference statistics. Consequently, reference librarians, perhaps horrified by the continued use of pencil marks on paper as the


predominant data gathering tool, have contributed robust conversation about both data analysis and gathering, as demonstrated in works by Smith (2006), Meserve et al. (2009), Aguilar et al. (2010), and Garrison (2010).

Most relevant to this case study are Morton-Owens and Hanson's 2012 Information Technology & Libraries article, "Trends at a Glance: A Management Dashboard of Library Statistics," and Wiegand and Humphrey's 2013 Code4Lib Journal contribution, "Visualizing Library Statistics using Open Flash Chart 2 and Drupal." Both articles maintain a focus on data visualization tools built from open source components. Hanson and Morton-Owens detail a "dashboard" approach to the visualization of library metrics, albeit in a medical school library setting and aimed at librarians and library administrations for strategic planning purposes. Hanson and Morton-Owens' impressive case study also shares code and open source building blocks, but, despite the claim that programmers of "moderate experience" should be able to replicate much of the article's work, the project for the NYU Health Sciences Library utilizes some "homegrown" software and local resources that are likely beyond the reach of most small and medium-sized public libraries due to staffing limitations. Similarly, at UNC-Wilmington, Humphrey and Wiegand detail their work to make traditional library data more compelling for non-librarians, university administrators in particular. While admitting mixed results, Humphrey and Wiegand's overall efforts yield positive outcomes, and their case study embodies the open source ethos by sharing code examples and the specific tools used to create their data visualizations.

Assessment

Recognizing the need for a more efficient data management methodology, and lacking a widespread or formalized method to reference, the Newport News Public Library conducted an analysis of the data entry and statistical management workflows in order to identify potential improvements and assess future needs. Specifically, the Library sought out user feedback and conducted workflow assessment with the spreadsheet entry method prior to forming any conclusions regarding alternative resources. The results of the assessment led to the formation of the desired basic criteria for a systemic data entry and visualization tool.

To begin, a System Services and Design Team (SSDT) consisting of the Library's major stakeholders, including the Digital Services and Support Services managers as well as an Information Technology Analyst, was formed with the goal of evaluation, planning, and execution of a more efficient data workflow. The SSDT then divided the primary assessment responsibilities of research, user experience (UX), and documentation so that each facet of analysis was treated by an appropriate subject matter expert. The product research, headed by the Digital Services Manager, consisted of identifying and explaining how the spreadsheet system was established, maintained, and put to use by the Library. The Support Services Manager, leading the UX effort, sought out opinion, evaluated responses, and identified workflow behaviors. Finally, the project documentation was consolidated and maintained by the Information Technology Analyst. Via a recurring weekly meeting format, the group compared assessment results, incorporated feedback, and strategized further research opportunities in order to formulate the basic criteria and desired outputs for management of the Library's statistics.

Workflow evaluation

Essentially, with a VB-provided graphic user interface (GUI) for the menu navigation, the spreadsheet system's entry options were made accessible after enabling VB


permissions (macros) each time a file was opened. Consequently, upon menu or sub-menu access, the user would also be prompted to save the document. The recurring save prompt was due to the action of clicking on a button selection, which opened a sub-file within the VB nested spreadsheet environment and constituted a file revision. Within the workflow, the "main menu" or spreadsheet No. 1 would transport users to a specified location's "sub menu" within spreadsheet No. 2, which listed the available data categories. After navigating to the correct location array, the user would then select the individual metric set for data entry. So, users entering for the "Administration" location could then select the "Circulation" data category. Within the circulation category, the user would then encounter a sub-sub-menu allowing options for data manipulation. From this tertiary menu, options for recording, saving, or resetting data were presented alongside the option to view the raw data set. In other words, after the data category selection, the user would access a subsequent sub-menu, and then begin entry on the corresponding spreadsheet via a VB enhanced form (Figure 1).

Data compilation required a series of menu navigation choices in order to achieve various data displays. Users seeking to view multiple data sets were required to import data from various spreadsheet arrays into a separate "master" spreadsheet, thus creating a high level of data redundancy. For example, to view system-level circulation data, information from each branch-specific spreadsheet array would be imported into a separate system-wide spreadsheet. Information would then be cross-populated and stored simultaneously in several files.

Throughout this entry and compilation process, each document would necessitate several revisions, and the content would be saved and re-saved repeatedly for each user. Consequently, this method of data curation engendered a high level of vulnerability, since any user could rewrite or save over a data set either intentionally, as with revisions, or in error. The fidelity of each source document was made susceptible by the numerous overwrites throughout the entry process across multiple versions of the software application. Over time, this loss of fidelity created errors in formulae, formatting, and display.

[Figure 1. Legacy workflow: main menu → sub menu → tertiary menu]


Assessment conclusion

Although the Library's data backups were conducted daily for all systems, the Library Department's file data, incorporated alongside and within the city's data packaging, was written entirely to offsite tape storage. Complete directories of content were lumped onto the same tape, making retrieval in the event of a network or hardware failure a lengthy process involving multiple inter-departmental personnel. If the version in the backup differed from the version inadvertently lost or damaged, data recovery would be compromised. If the user simply forgot the title of the wanted file, recovery would also be compromised.

Therefore, the location and comparison of Library data, either within a fiscal year data set or between multiple categorical data sets, was difficult since the data were largely fragmented into various file and folder containers. If a user did not know all the parameters of the data set, such as filename, fiscal year, or service unit, they were at a disadvantage in locating information. Additionally, users were forced to create separate spreadsheets for analysis of data contained in source files, and local access to the network directory via select, manually configured workstations was the only means by which users could open these files.

Over time, this antiquated system of localized document manipulation perpetuated duplication of effort and exhausted storage resources. The spreadsheet storage resource did not have the ability to host a self-sustained, or parallel, data visualization component. Spreadsheet modifications, across software versions, frequently resulted in corrupted formatting.

Given the legacy (spreadsheet) system's shortfalls, including issues with multiple user access, backup and recovery, and custom data display options, the SSDT concluded that a system design and deployment effort was mandated. Specifically, the team addressed the need for a lightweight, centrally managed, web-based resource which would be facilitated by an open source architecture, further minimizing cost and staffing allocations. In order to achieve this goal, the SSDT, based on the data entry workflow assessment, determined that the new system should ideally meet the following criteria: have a content management system (CMS) structure; consist of an open source software (OSS) and Windows, Apache, MySQL, PHP (WAMP) framework; be supported by an active development community and well documented; have a defined release schedule; have native functionality for use of add-ons, plug-ins, or contributed modules; and be highly scalable (Figure 2).

Candidate evaluation

Development methodology

With the basic system criteria in hand, the SSDT began the newly termed StatBase Project with specific guidelines for how the effort was to progress. The team first prepped the historical files by ingesting the spreadsheet data sets into a MySQL format which could then be used as a springboard for application development using actual records. This groundwork planning enabled the use of a controlled data environment that served as a lynchpin for integration into the WAMP framework. With a database ready to launch alongside most "out of the box" open source software (OSS) CMS solutions, the team quickened the timeline needed for research and headed into the development arena. The SSDT negated a burdensome "what if" question (Can the data be made relational?) by using orphaned spreadsheet data to develop its new host environment.
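The article does not reproduce the team's ingest scripts. As a minimal sketch of this spreadsheet-to-MySQL groundwork, assuming CSV exports of the legacy spreadsheets and a hypothetical statistics table mirroring the location, type, span, and sum attributes noted in the Background section, the loading step might look like this:

<?php
// Minimal ingest sketch (hypothetical schema and filenames): load a CSV
// export of a legacy spreadsheet into MySQL so the historical records
// can serve as a controlled test database for the CMS trials.
$pdo = new PDO('mysql:host=localhost;dbname=statbase_test', 'user', 'pass',
    array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));

$pdo->exec("CREATE TABLE IF NOT EXISTS statistics (
    id INT AUTO_INCREMENT PRIMARY KEY,
    location VARCHAR(64) NOT NULL,   -- Library Branch
    category VARCHAR(64) NOT NULL,   -- e.g. Circulation, Reference
    span     DATE        NOT NULL,   -- date or start of range
    value    INT         NOT NULL    -- the recorded sum
)");

$insert = $pdo->prepare(
    'INSERT INTO statistics (location, category, span, value)
     VALUES (?, ?, ?, ?)');

// Each CSV row: location, category, date, sum.
$fh = fopen('fy2011_circulation.csv', 'r');
fgetcsv($fh); // skip the header row
while (($row = fgetcsv($fh)) !== false) {
    $insert->execute($row);
}
fclose($fh);
?>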

Using elements of the Agile development process as well as strategies gleaned from the review of available Information Science literature, the team sketched a course for


project development and delivery, including a phased software review and deployment schedule. The team sought to identify promising OSS frameworks, create a beta environment for testing, and then evaluate the results via workflow demonstration and UX feedback. Ultimately, the project outline consisted of: a CMS candidate pool; a research and development (R&D) evaluation; and a final selection analysis (Figure 3).

Maintaining a UX perspective, the team embraced the feature driven development (FDD) approach lent by the Agile process. As Grant Cause (2004) explains, FDD assures that each primary facet of the software distribution aligns with a business-oriented goal, thus ensuring the highest level of project value and return on investment. Likewise, the SSDT felt that this was the ideal tactic for developing a workflow resource to accommodate a specific business objective, namely the statistical aggregation component. By anticipating and outlining user needs through the use of the workflow analysis, the team sought to correlate software features with project goals in order to address the specific strengths and weaknesses of each potential candidate framework during the development period.

For the candidate pool, the team selected the three most popular WAMP CMS architectures: Drupal, WordPress, and Joomla. According to a W3Techs 2013 survey, these candidates collectively account for more than 74.3 percent of the active web site market share, and the team intuited that the pool provided the broadest spectrum of available, ready-to-deploy, OSS systems. At the outset, the candidate pool fulfilled several core criteria, including the OSS architecture and active development

[Figure 2. Open source architecture]

[Figure 3. Project methodology: candidates (Drupal, WordPress, Joomla) → evaluation (goals, features) → selection (analyze results)]


community requirements. Moreover, the CMS candidates shared similarities in features such as community-contributed code and installation requirements.

Drupal

The SSDT began the candidate evaluation with Drupal version 7.9 during a phase named Trial No. 1. Drupal offered a promising framework due to its node format, content types, and module availability. Drupal nodes are a core feature of the CMS, as nodes allow all content to be categorically represented, thus allowing developers the flexibility to define node-based content types that more accurately reflect the information structure via nomenclature. For example, using nodes, a developer could content-type and store information consisting of numerical values as "price," "date," "age," "height," or any number of more meaningful categorical representations. Through the node format, the CMS gains the ability to recognize that "price" would have different behaviors than "date," and each content type would be published and permissioned according to the administrator's previously established guidelines. Consequently, Drupal modules build upon content-typed data to further render, organize, and display information according to function.
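For illustration only (this is not code from the trial), a Drupal 7 module can declare such a node-based content type through the hook_node_info() hook; the module and type names below are hypothetical:

<?php
// statdemo.module (hypothetical): declare a "price" content type so the
// CMS treats these numerical values as their own node-based type with
// distinct publishing and permission behaviors.
function statdemo_node_info() {
  return array(
    'price' => array(
      'name'        => t('Price'),
      'base'        => 'node_content',
      'description' => t('A numerical value stored as its own content type.'),
    ),
  );
}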

Unfortunately, the very strengths that appeared initially attractive became weaknesses when applied to the project's specifications and development environment. Having a balanced perspective and managing expectations of progress is key. At the outset of the trial, "the 'Drupal way' begins to dominate your lexicon as you balance between solving a problem with a quick solution or solving a problem with the most flexible solution" (Mitchell, 2013). Rare is the project where time is not a concern, and Drupal quickly monopolized project resources. Data hierarchy creation involved more custom code and PHP database manipulation than the developers were comfortable allocating. Furthermore, many Drupal modules, such as Webform, are distributed with prerequisite modules and/or code libraries (CiviCRM and jQuery TokenInput) in order to enable full functionality. This level of customization was deemed problematic since future stability would be impacted by modular revisions. According to a Drupal project web post, there exist "several API changes that may make add-on modules incompatible with the new version until they are updated" (https://drupal.org/project/webform).

Webform is perhaps the best example of how Drupal failed to align CMS function with the project's stated goals of scalability and native module functionality. Although the module's "results can be exported into Excel or other spreadsheet applications" and it "provides some basic statistical review" (https://drupal.org/project/webform), Webform essentially digests the data into separate tables and uses features to render the data according to established function (Table I). The table illustrates how the module forces population of data into module-defined tables, eliminating the potential to make use of archived data sets from the test database. Webform maintained the ability to create forms, but the module lacked the capability to connect data to tables outside of the ones that were module-created.
Table I. Evaluated Drupal modules

Component | Name | Description
Table | node | Establishes the node id (NID) as a primary key; the type and name of data are defined in this table
Table | webform | Relates each created form with a NID; links data to content type
Table | webform_component | Relates each NID to a component id (CID) or field name
Table | webform_submissions | Relates each NID to a submission id (SID); establishes a date-time and IP address for each form submission
Table | webform_submitted_data | Relates and places the data input for each field, associating it with the NID, CID, and SID. All form data is contained in this table and is related to webform, webform_component, and webform_submissions
Feature | webform conditional | Gives the ability to set a condition on previously defined fields; function is limited to dropdown selectable fields
Feature | webform default fields | Gives the ability to set default entry fields for articles, basic pages, and panels. The feature does not allow for the creation of fields that hold default information that can be edited and used in multiple forms
Feature | webform export | Gives the ability to export forms to a delimited file
Feature | webform import | Enables import of data from a delimited file into a pre-established web form

Other modules tested include: Data Table, which provides the ability to adopt a table but prevents association of the table with a data collection method; Form Builder, which lends the ability to create forms but disables the configurations page and "does not provide any permanent mechanism for saving data with the created forms"; and Tablefield and Tableform, modules that have the ability to create a table through HTML and CSS in a form but do not give the ability to import tables from a database (https://drupal.org/project/form_builder). These modules showed initial promise, but the native functionality of each proved insufficient to host and manipulate data to and


from the test database. The use of node relationships throughout the Drupal CMS made association of a form to pre-established relatable tables difficult. The team simply could not quickly find a built-in, non-PHP-saturated method to integrate an existing SQL database into the Drupal schema. Due to this inability to create a connection from a module-created form to a pre-established table, Drupal was unable to fulfill the project's required goals.

WordPress

For Trial No. 2, the team proceeded to the next evaluation CMS candidate, WordPress. While WordPress is primarily a blogging platform, the team observed that the CMS has powerful entry and data management tools. Additionally, WordPress enjoys a broad-based development community and a dedicated patronage amongst amateur and professional digital media consumers alike, including Forbes and CNN. WordPress allows for custom post types, form generator plug-ins, and has a sophisticated API.

While custom posts allow content to be defined and grouped, the form generator plug-ins allow real flexibility by providing WYSIWYG-like, almost effortless form creation. Using a drag-and-drop interface, the user is able to frame out web forms using only a few mouse clicks and definition entries. Content is generated on the end-user side and is then inserted into tables based on the post attributes, such as name, date, and category. These posts are intended to be iterative and non-static, so form generation outside of contact and feedback capabilities is somewhat limited. Content is also designed to be natively placed and accessed via a post, so users cannot define separate "displayed content types," only the post category with content types employed as filler.
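As a brief, hypothetical illustration of the custom post type mechanism (not a plug-in the team evaluated), a WordPress site can register a new content grouping in a few lines:

<?php
// Hypothetical snippet: register a "stat_entry" custom post type on the
// init hook so statistical records are defined and grouped apart from
// ordinary blog posts.
add_action('init', function () {
    register_post_type('stat_entry', array(
        'labels'   => array('name' => 'Statistics'),
        'public'   => true,
        'supports' => array('title', 'custom-fields'),
    ));
});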

For example, the team evaluated several modules to bring more extensibility to the standard WordPress distribution using plug-in utility rather than custom coding in the core: Contact Form 7 and the DB Extension package allowed assigned attributes, but


on the front-end only via a pre-populated interface; Contact Form Manager only allowed template selections; Library Custom Post Types provided pages that were "content typed" but disallowed or masked database modifications for the assigned attributes; Visual Form Builder allowed attribute placement via a drag-and-drop interface, so no hardcoding was necessary, but the attributes could not be allocated specific assignment in the database schema due to predefined field typing. These modules all failed to fully integrate with the test database design.

Consequently, WordPress, while under development, quickly began to illustrate shortcomings when attached to the test database. Most data management in WordPress is intended to be end-user generated and imported via the static or versioned post. WordPress suffered versioning deprecation with the available web form plug-ins, and development for new releases appears cautious at best. While one of the strengths of WordPress is its stability, that stability comes at a cost to the development of more robust database plug-ins and expansion code. Also, WordPress has only had four major versions since inception, compared to standard yearly releases for the other CMS options (both proprietary and open source). The plug-in extensions are fiercely moderated, so the code is highly stable, but custom options do not fare well with upgrades or changes to the core via plug-in or other modifications. In fact, "the official WordPress distribution only supports the MySQL database engine," so integrating with future data sources, and accommodating the project goal of scalability, was deemed unlikely (https://codex.wordpress.org/Using_Alternative_Databases). Ultimately, the many available plug-ins mostly address cosmetic issues rather than core functionality or enhanced schema extensibility.

WordPress presents a low-maintenance, easy-to-deploy installation with a robust plug-in library. Furthermore, the CMS offers a light core file footprint, requires minimal system resources, and maintains a straightforward site architecture. However, some of the CMS drawbacks include: heavy system use for routine operations (i.e. page refreshing and caching); intensive database queries; pre-defined locations for attributes and elements; and little flexibility for content management outside of the post. Content creation is central to the WordPress platform, but importing previous-generation content into a template/form format is not supported without a large amount of custom code.

Joomla

Progressing to Trial No. 3, after encountering two successive development failures, the SSDT began to question the use of a legacy database schema import into an OSS CMS. However, the decision was made to proceed through the third and final candidate trial prior to altering the data set or database schema so that a consistent test environment could be employed to equitably assess all candidate systems. So, the team began the Joomla installation using the same MySQL test database as the previous trials.

Joomla provides the same lightweight framework as WordPress coupled with the ability to add extensive functionality and granular customizations through add-on modules, as with Drupal. Joomla has been adopted by leading cultural and educational institutions, such as Harvard University and the Guggenheim Museum, as well as leading corporate establishments, like IHOP and Citibank, so the CMS met several project goals at the outset, including the robust extension library and release schedule. However, for the purposes of the StatBase project, what set Joomla apart from its peers is not a core feature but its contributed module extensibility. Joomla offered a wide selection of contributed extensions with little to no dependency as well as the ability to nest published articles, creating a native hierarchy within the navigation.


Using the ChronoForms extension, the team rapidly incorporated basic form elements (such as textboxes and hidden fields like "date" and "creator") to construct a web form outline which could then be applied to all data entry scenarios within the system via Joomla's published article content type. ChronoForms offers advanced elements, like Panes, Panels, and Page Breaks, further expanding the format. Specifically, the Form Wizard edit options contain all the fields within the form, so to edit one of the form elements, one has only to select the configure button on the desired element and enter the parameter(s) needed to either pull or push the data to the tables. Basically, creating a data entry or data display form merely required the creation of simple HTML to define the form space and PHP to facilitate the assignment of data attributes to specific form elements, thus customizing the representation of the existing table structure (Janes, 2010).

Custom code allowing for concise HTML and PHP entry, and tool tips and validations specific to each form element, made for the rapid deployment of a complete templatized user interface for data entry. Specifically, the ChronoForms extension allows the option for legacy records to populate within the form for edits, which served a dual purpose in the StatBase environment: to ensure that users maintained a method for data revision and to accommodate the deletion of inaccurate data post-submission. The code snippet below illustrates the simple HTML/PHP structure of the form as well as the reference to the database and "Overdue" table:

<?php
// Identify the form record being edited, then stamp it with the current
// date-time and the logged-in user's name via Joomla's database object.
$idform = $_GET['idform'];
$today = date('y-m-d h:i:s');
$user = JFactory::getUser();
$name = $user->name;
$db =& JFactory::getDBO();
$query = "UPDATE overdue SET `modified` = '$today', `modified_by` = '$name'
          WHERE idform = '$idform';";
$db->setQuery($query);
$db->query();
?>

The above code queries the fiscal year data set in the test database and locates the table(s) where data entry will be completed and stored. In this case, information regarding the number of overdue (billing) notices for a particular month is recorded. For the subsequent data display article, where users are directed upon entry completion, similar coding is employed in order to query data into a table format for verification and potential editing.
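The article does not print that display-side code; following the structure of the entry snippet above, a sketch of the "similar coding" might resemble the following, with column names beyond the overdue table assumed for illustration:

<?php
// Sketch of a data display query in the same Joomla style as the entry
// form above: pull the overdue-notice records into an HTML table for
// verification and potential editing. Column names are illustrative.
$db = JFactory::getDBO();
$db->setQuery("SELECT idform, month, notices, modified, modified_by
               FROM overdue ORDER BY month");
$rows = $db->loadObjectList();

echo '<table><tr><th>Month</th><th>Notices</th><th>Last modified</th></tr>';
foreach ($rows as $row) {
    echo '<tr><td>' . $row->month . '</td><td>' . $row->notices .
         '</td><td>' . $row->modified . ' (' . $row->modified_by . ')</td></tr>';
}
echo '</table>';
?>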

Plotalot (pr. plot-a-lot) is an additional Joomla extension that fulfilled a key project objective, allowing the team to incorporate simple yet effective data visualizations into the CMS. Using SQL queries to call data into standardized table and chart formats, Plotalot renders information in its host articles and provides colorful, easy-to-understand bar or pie chart representations of data sets based on table content. The extension makes use of the JavaScript-based Google Visualization API in order to render responsive and dynamic data displays that are largely cross-browser compatible. The extension also provides an alternative method to view records post-entry through interactive hover and click options for each data display.
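Plotalot charts are driven by administrator-entered SQL. A hedged example of the kind of query one might configure, with table and column names assumed rather than taken from the article, is:

-- Hypothetical Plotalot chart query: total overdue notices per month,
-- feeding a bar chart rendered through the Google Visualization API.
SELECT month, SUM(notices)
FROM overdue
GROUP BY month
ORDER BY month;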

By happy coincidence, the team also discovered Joomla's Akeeba Backup Core extension. The Akeeba Backup Core utility provides the administrator with the ability to schedule backups and restore data. Though not a project requirement, Akeeba


Backup Core creates, either via scheduled CRON job or ad hoc command, a full or partial data archive, thus ensuring a high level of system stability. Users have the option to granularly assign backup actions to specific files, folders, or databases. With easy-to-deploy backups, a full site restoration in the event of a hardware or network failure is much timelier. Through the use of the Akeeba Backup Core extension, the team was able to package and clone the entire software distribution for installation on local staff workstations for UX testing.

Beta testing via a UX group, a select group of departmental super-users, ensured limited implementation or post-production discontinuity. The UX effort allowed the continuation of the FDD process by providing the team with enhancement requests. The SSDT would evaluate each enhancement request in relation to the project goals for potential inclusion in the software build. When necessary, the team revisited the development effort in order to accommodate the requested modification(s), and then another round of evaluation and testing via the UX group was completed. This cyclical approach mitigated quality assurance risks and ensured user buy-in during the launch phase. Throughout UX testing, the team maintained the philosophy that any mandated software resource constitutes a workflow disruption, and, in gaining refinements based on actual staff input, the team ensured that this unavoidable change agent was associated with positive expectations of performance.

Several key software enhancements were completed as a result of UX feedback: navigation structure, article linking and redirect, permissioned views, and an "export search" function. UX feedback became a critical success indicator pre-launch, as many features used in real-time data entry were unique to the web environment and had not been previously documented as part of the legacy workflow assessment. Specifically, the UX-suggested permissioned views limit user access to the articles/forms that are associated with users' workgroups, thus minimizing unnecessary/cluttered information displays. And the "export search" function allows a categorical data query of fiscal year, form name, unit location, and/or month attributes coupled with the option to export the full queried data set.

Conclusion

The StatBase project was completed in approximately 12 months. From data discovery and workflow assessment to developing a database schema and host CMS, the project team capitalized on the Agile framework to rapidly deploy a critical organizational resource. Assessment of the candidate CMSs, underpinned by Agile's feature driven development, focussed the software enhancement and deployment processes on user behaviors and project outputs. Moreover, the UX perspective and development feedback allowed for rigorous beta testing.

While Joomla yielded the largest area of overlap between the project's stated goals and CMS native functionality, the other evaluated CMSs offered insights into data migration and management. Drupal testing resulted in a better understanding of how to leverage data schemas, and the WordPress trial lent the team a greater appreciation of overall site esthetic and the use of templates to minimize user navigation efforts. Leveraging Agile to learn from development disruptions, the team transformed testing failures and incompatible user behaviors into actionable system modifications involving targeted content rendering and permissioning.

In focussing on CMS extensibility in relation to project goals, the team recognized that a paid product could possibly deliver the same features, but at an unacceptable level of cost and loss of customization in relation to the importance of the workflow.


Modern libraries are constantly in a state of change and becoming increasingly data driven, and library organizations must continue to modify their data in order to gather relevant insights. Therefore, the StatBase project aimed to deliver a feature-rich workflow tool that would enable easy, scalable additions to the core framework, ensuring future use at minimal cost.

Implementation of StatBase within the Library Department coincided with the beginning of the municipal fiscal year. Product integration with the data entry workflow was aided by 30-minute staff training sessions at each Library location immediately prior to, and during, the software launch. Documentation and a frequently asked questions (FAQ) section provided staff with supporting information, and a system overview was incorporated into the Department's staff-orientation procedure, allowing all new staff a general understanding of the Library's key metrics and driving business outputs. These training efforts provided staff with additional support during the transition from a document-based to a web-based data management system.

According to staff-supplied feedback, the implementation of StatBase directly reduced the data entry workflow effort by an estimated 50 percent. Prior to the web-based utility, the document management and data entry workflows required a per-branch obligation of approximately eight hours per month, with an additional thirty hours for fiscal year close-out, totaling over 500 hours annually for the Library System. The StatBase resource shaved this workflow commitment to five hours per month, per branch, plus eight hours for system-wide fiscal rollover, totaling an estimated 248 hours for Library System data management annually. Key differences in system-wide versus branch-maintained data sets, in addition to the fiscal year rollover workflow, contributed to these time savings. Effort reduction is also attributed to: persistent data integrity, simultaneous data access, and streamlined data displays. StatBase eliminated the risk, and therefore the time spent in correction, of data corruption from month-to-month entry by offering web form entry accessed according to established login permissions and by reducing the quantity of single-point (field-specific) entry points. Upon user login, branch-specific information auto-populates, saving users keystroke entry time and eliminating redundant data storage.

In addition to the savings in staff time, StatBase offers the unexpected advantage of metadata. Time-stamped entries allow for the isolation and review of post-date, or revision, actions with regard to data. So, if data entry occurs past a valid timeframe or is revised by an alternate user, this information can be isolated in the administration table view and edited or removed without any required action from the original or subsequent authors. Additionally, the system has allowed Library administrators and managers a macro-level view of departmental productivity. In particular, one Library manager exported system metadata to gauge cataloging productivity using non-standard metric sets, like record creation frequency. Using StatBase data sets, the manager created a future projection using several fiscal years' worth of statistics to estimate the number of items cataloged for the coming months. These data were then broken down according to item class, which in turn allowed the manager to identify future staff workload by service area, estimate periods of inactivity, and advise on increased or reduced shelving needs. Using a linear regression model, the manager exploited the title additions and discards data sets to chart future resource allocations and establish relationships between fiscal year spending, collection size, and staff effort.
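The article does not show the manager's model; as a rough sketch of such a least-squares projection, using made-up figures rather than Library data, one could compute:

<?php
// Rough sketch (illustrative data): fit y = a + b*x by ordinary least
// squares over several fiscal years of title additions, then project
// the next year's total, echoing the projection described above.
$years  = array(2009, 2010, 2011, 2012);   // x: fiscal years
$titles = array(8200, 8750, 9100, 9640);   // y: items cataloged (made up)

$n  = count($years);
$mx = array_sum($years) / $n;              // mean of x
$my = array_sum($titles) / $n;             // mean of y

$sxy = $sxx = 0;
for ($i = 0; $i < $n; $i++) {
    $sxy += ($years[$i] - $mx) * ($titles[$i] - $my);
    $sxx += ($years[$i] - $mx) * ($years[$i] - $mx);
}

$b = $sxy / $sxx;                          // slope
$a = $my - $b * $mx;                       // intercept

printf("Projected FY2013 additions: %.0f\n", $a + $b * 2013);
?>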

StatBase has allowed insight into seemingly diverse and unrelated data sets. While door counts and patron events seem directly associated, less obvious is the correlation between computer classes and electronic resource or typewriter use. Discovering


connections between metrics, and measuring the relevance of these associations, is a key advantage in having a malleable data management resource.

Ultimately, StatBase grows with the data, making the collection effort an easier process. The StatBase system allows customization of web entry forms so that libraries can track, change, and visualize data. Prebuilt forms include: Reference, Door Count, Patron Registration, Circulation, Outreach, Interlibrary Loan, Acquisitions, Finance, and Instructor-Led Courses. Moreover, StatBase has several additional features that allow core functionality customization: editable web forms and fields; user permissioning by unit or service location; immediate data visualization; an export option for advanced reporting or ingest; and synchronous user access. StatBase is made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Organizations are free to use, share, and even modify the software package provided they attribute and share the work under a similar license.

References

Aguilar, P., Keating, K. and Swanback, S. (2010), "Click it, no more tick it: online reference statistics", Reference Librarian, Vol. 51 No. 4, pp. 290-299.

Cause, G. (2004), "Delivering real business value using FDD", Methods & Tools, available at: www.methodsandtools.com/archive/archive.php?id=19 (accessed February 12, 2014).

Garrison, J.S. (2010), "Making reference service count: collecting and using reference service statistics to make a difference", Reference Librarian, Vol. 51 No. 3, pp. 202-211.

Janes, B. (2010), ChronoForms 3.1 for Joomla! Site Cookbook: 80 Recipes for Building Attractive and Interactive Joomla! Forms, Packt, Birmingham.

Meserve, H., Belanger, S., Bowlby, J. and Rosenblum, L. (2009), "Developing a model for reference research statistics: applying the 'Warner model' of reference question classification to streamline research services", Reference & User Services Quarterly, Vol. 48 No. 3, pp. 247-258.

Mitchell, F. (2013), "The 7 stages of Drupal's learning curve", available at: http://sixrevisions.com/web-development/drupal-learning-curve/ (accessed February 13, 2014).

Morton-Owens, E. and Hanson, K. (2012), "Trends at a glance: a management dashboard of library statistics", Information Technology & Libraries, Vol. 31 No. 3, pp. 36-51.

Oboler, E. (1967), "Academic library statistics revisited", College & Research Libraries, Vol. 28 No. 6, pp. 407-410.

Radford, N. (1968), "The problems of academic library statistics", Library Quarterly, Vol. 38 No. 3, pp. 231-248.

Schrader, A. (2006), "400 million circs, 40 million reference questions: what does this mean and does anybody care? Getting beyond library statistics to library value with help from Canada's National Core Library Statistics Program", Argus, Vol. 35 No. 1, pp. 15-22.

Smith, M. (2006), "A tool for all places: a web-based reference statistics system", Reference Services Review, Vol. 34 No. 2, pp. 298-315.

Wiegand, L. and Humphrey, B. (2013), "Visualizing library statistics using Open Flash Chart 2 and Drupal", Code4Lib Journal, No. 19, pp. 1-11.

Yan Quan, L. and Zweizig, D. (2001), "The use of national public library statistics by public library directors", Library Quarterly, Vol. 71 No. 4, pp. 467-497.

Further reading

Bauer, M. (2005), "Successful web development methodologies article", available at: www.sitepoint.com/successful-development (accessed February 12, 2014).

Brannon, S. (2011), "National public library statistics: a literature and methodology review, 1999-2009", Library Student Journal, Vol. 6, January, p. 1.


Dolling, A. and Peppier, C. (2001), "Web-based collection of public libraries statistics", IFLA Journal, Vol. 27 No. 4, pp. 215-220.

Haug, N. (2004), "Webform", available at: https://drupal.org/project/webform (accessed February 12, 2014).

Haug, N. (2006), “Form Builder”, available at: https://drupal.org/project/form_builder (accessedFebruary 12, 2014).

Marriott, J. and Waring, E. (2011), The Official Joomla! Book, Addison-Wesley, Upper Saddle River, NJ.

Nebulon, Pty. Ltd. (2012), “I.T. Solutions that make a difference”, available at: www.nebulon.com/index.html (accessed February 12, 2014).

Open Source Matters, Inc. (2014), "What is Joomla?", available at: www.joomla.org/about-joomla.html (accessed February 13, 2014).

Palmer, S. (2009), “An introduction to feature-driven development”, available at: http://agile.dzone.com/articles/introduction-feature-driven (accessed February 12, 2014).

WordPress.org (2013a), "Using alternative databases", available at: https://codex.wordpress.org/Using_Alternative_Databases (accessed February 13, 2014).

WordPress.org (2013b), "Webforms", available at: https://drupal.org/taxonomy/term/35580 (accessed February 12, 2014).

About the authors

Alexandria Payne, a graduate of the University of Tennessee School of Information Sciences, is the Digital Services Manager for the Newport News Public Library System. She is primarily responsible for the Library's web-based products, including the web page, digital archive, online public access catalog, vendor-supplied e-content, and the web-based statistical content management system StatBase. Moreover, Mrs Payne coordinates the selection, testing, and implementation of new software releases and enhancements as well as the evaluation, acquisition, and licensing of all electronic resources. Prior to her role as Digital Services Manager, Mrs Payne worked as a Media Coordinator for Scripps Networks and a freelance Knowledge Management Specialist for Gannett Media Technologies International (GMTI), and served as a frequent contributor of biographical and bibliographical abstracts for the Thomson Gale Contemporary Authors (CA) project. Alexandria Payne is the corresponding author and can be contacted at: [email protected]

John Curtis, a graduate of the Simmons College Graduate School of Library and Information Science, is the Support Services Manager for the Newport News Public Library System. He is the Head of the Technical Services department, primarily responsible for acquisitions, cataloging, and processing of the system's print and digital media. Prior to his time with the Newport News Public Library System, Mr Curtis served as a Catalog Librarian at Hampton University.
