Tao Zhang

April 2013 · Purdue · research

Discovery tool log analysis

Purdue University Libraries website

I worked with Dr. Xi Niu at the University of North Carolina at Charlotte to integrate qualitative user studies with log analysis and to uncover contextual information about users' search tasks.

Introduction

Purdue University Libraries implemented a new discovery tool in late 2012. During its beta testing period, both the new discovery tool (Primo) and the existing tool (VuFind) were available on the Libraries website. This period provided a unique opportunity to study and compare user activity in the two tools.

Comparison of the Primo and VuFind discovery tool interfaces

Method

We collected logs from the servers of the two discovery tools from November to December 2012. The data fields in the logs from both systems included IP address, date, time, URL, referrer URL, and user agent.

The referrer URL is the page on which the user clicked a link that led to the current URL. The user agent is a string that identifies the user’s browser and provides certain system details to the servers hosting the discovery tools.
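As an illustration of how these six fields might be pulled out of a raw log line, here is a minimal Python sketch. It is not the Perl script used in the study. It assumes the servers wrote the common Apache "combined" log format, which the write-up does not confirm, and the sample line, host name, and the names `LOG_PATTERN` and `parse_log_line` are made up for the example.

```python
import re
from datetime import datetime

# Assumed layout: Apache "combined" log format (not confirmed by the study).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_log_line(line):
    """Return the six fields named in the text, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "ip": match["ip"],
        "date": stamp.date().isoformat(),
        "time": stamp.time().isoformat(),
        "url": match["url"],
        "referrer": match["referrer"],
        "user_agent": match["user_agent"],
    }

# Invented sample line (documentation IP range, placeholder host).
sample = (
    '192.0.2.10 - - [15/Nov/2012:10:32:05 -0500] '
    '"GET /search?q=climate+change HTTP/1.1" '
    '200 5123 "http://www.example.edu/" "Mozilla/5.0 (Windows NT 6.1)"'
)
record = parse_log_line(sample)
```

Returning `None` for lines that do not match lets a caller skip malformed entries instead of stopping on the first one.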

The logs were processed with a Perl script to extract the data fields, which were then analyzed in SAS and R.

Workflow for processing and analyzing discovery tool logs
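The results are reported per search session, but the write-up does not say how requests were grouped into sessions. One common approach, sketched below in Python as an assumption and not as the study's method, treats requests from the same IP address and user agent as one session until a 30-minute gap occurs. The cutoff, the tuple layout, and the name `group_into_sessions` are all hypothetical.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed cutoff, not taken from the study

def group_into_sessions(requests, gap=SESSION_GAP):
    """Group (ip, user_agent, timestamp, url) tuples into per-visitor sessions."""
    sessions = []
    open_session = {}  # (ip, user_agent) -> session currently being extended
    for ip, agent, stamp, url in sorted(requests, key=lambda r: r[2]):
        key = (ip, agent)
        current = open_session.get(key)
        # Start a new session for a new visitor or after a long pause.
        if current is None or stamp - current[-1][0] > gap:
            current = []
            sessions.append(current)
            open_session[key] = current
        current.append((stamp, url))
    return sessions

# Invented requests for illustration.
requests = [
    ("192.0.2.10", "Mozilla/5.0", datetime(2012, 11, 15, 10, 0), "/search?q=a"),
    ("192.0.2.10", "Mozilla/5.0", datetime(2012, 11, 15, 10, 5), "/search?q=a+b"),
    ("192.0.2.10", "Mozilla/5.0", datetime(2012, 11, 15, 11, 0), "/search?q=c"),
    ("192.0.2.77", "Mozilla/5.0", datetime(2012, 11, 15, 10, 2), "/search?q=d"),
]
sessions = group_into_sessions(requests)
```

In this toy input the first visitor produces two sessions, because the third request arrives 55 minutes after the second, and the second visitor produces one.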

Results

Most users started with the broadest and default search, that is, keyword search.

Chart showing use of different search fields in the discovery tools
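A tally like the one charted above could be produced by reading the search field from each session's first search URL. The Python sketch below is only an illustration: the query parameter name `type`, the default label `keyword`, and the URLs are placeholders, and the real parameter names in the Primo and VuFind logs may differ.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

FIELD_PARAM = "type"       # hypothetical parameter name
DEFAULT_FIELD = "keyword"  # assumed label when no field is given

def search_field(url):
    """Return the search field named in the URL, or the default field."""
    params = parse_qs(urlparse(url).query)
    return params.get(FIELD_PARAM, [DEFAULT_FIELD])[0]

def first_search_fields(sessions):
    """Count the search field of the first search URL in each session."""
    return Counter(search_field(urls[0]) for urls in sessions if urls)

# Invented sessions, each a list of search URLs in time order.
sessions = [
    ["/search?q=solar+cells", "/search?q=solar+cells+efficiency&type=title"],
    ["/search?q=hemingway&type=author"],
    ["/search?q=water+quality"],
]
counts = first_search_fields(sessions)
```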

We compared facet usage between the two tools; the figures are summarized below. Nested facet selections were very rare.

                                 VuFind   Primo
Facet usage in search sessions   8.4%     9.7%
Top used facets
  • Format
  • Access
  • Topic
  • Library Building
  • Author
  • Online
  • Peer-Review
  • On Shelf
  • Format
  • Subject
  • Publication Date
  • Library

Searches for non-electronic resources (e.g., print books) involved longer queries, more query submissions, and a higher percentage of reformulated queries than searches for electronic resources (e.g., journal articles).

                                      Non-electronic resources    Electronic resources
                                      Mean (standard deviation)   Mean (standard deviation)
Query length                          5.1 (5.4)                   4.1 (4.0)
Number of query submissions           3.6 (5.4)                   2.6 (2.3)
Percentage of searches reformulated   61.0%                       57.8%
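Summary figures of this kind (mean, standard deviation, and percentage reformulated) can be computed from per-search records as in the Python sketch below. The study ran this analysis in SAS and R, so this is a stand-in. The records are toy values invented for illustration, the field names are hypothetical, and the sample standard deviation is used although the write-up does not say which form was reported.

```python
from statistics import mean, stdev

# Toy records, invented for illustration: one entry per search.
searches = [
    {"query_length": 3, "submissions": 1, "reformulated": False},
    {"query_length": 5, "submissions": 2, "reformulated": True},
    {"query_length": 4, "submissions": 3, "reformulated": True},
    {"query_length": 8, "submissions": 2, "reformulated": True},
]

def summarize(searches):
    """Return mean/SD pairs and the percentage of reformulated searches."""
    lengths = [s["query_length"] for s in searches]
    submissions = [s["submissions"] for s in searches]
    reformulated = sum(1 for s in searches if s["reformulated"])
    return {
        # stdev() is the sample standard deviation (n - 1 denominator).
        "query_length": (mean(lengths), stdev(lengths)),
        "submissions": (mean(submissions), stdev(submissions)),
        "pct_reformulated": 100.0 * reformulated / len(searches),
    }

summary = summarize(searches)
```

Running `summarize` once per resource type (non-electronic and electronic) would give the two columns being compared.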

From the logs we identified three distinct patterns of query reformulation:

  • Narrowing: Users narrowed a search by adding one or more terms that specified additional information, such as content, time, or format.
  • Parallel: Users moved to a parallel search through synonym replacement, format change, or spelling correction.
  • Mixed: Users combined narrowing and parallel movement within one search session.
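To make the three patterns concrete, the Python sketch below labels a session from its sequence of queries using a simple term-set heuristic. This is an assumption for illustration and not the coding scheme used in the study: a step counts as narrowing when every earlier term is kept and at least one is added, as parallel when terms are swapped, and steps that only remove terms are labeled "other" because the text does not name that case. The function names and queries are invented.

```python
def classify_step(previous, current):
    """Label one reformulation step between two consecutive queries."""
    before = set(previous.lower().split())
    after = set(current.lower().split())
    if before == after:
        return "repeat"
    if before < after:   # every earlier term kept, at least one added
        return "narrowing"
    if after < before:   # terms only removed; not named in the text
        return "other"
    return "parallel"    # some terms swapped (synonym, format, spelling)

def classify_session(queries):
    """Label a whole session as narrowing, parallel, mixed, or none."""
    steps = {classify_step(a, b) for a, b in zip(queries, queries[1:])}
    steps -= {"repeat", "other"}
    if steps == {"narrowing", "parallel"}:
        return "mixed"
    return steps.pop() if steps else "none"

# Invented sessions for illustration.
narrowing = classify_session(["civil war", "civil war diaries"])
parallel = classify_session(["global warming", "climate change"])
mixed = classify_session(["jane austin", "jane austen", "jane austen ebook"])
```

A heuristic this simple cannot tell a synonym replacement from a change of topic, which is one reason a scheme like this would need manual review of the sessions it labels.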

Visualization of the patterns of query reformulation:

Visualization of query reformulation patterns across search sessions

Publication

Niu, X., Zhang, T., & Chen, H. (2014). Study of user search activities with two discovery tools at an academic library. International Journal of Human-Computer Interaction, 30(5), 422-433.