Discovery tool log analysis
Purdue University Libraries website
I worked with Dr. Xi Niu at the University of North Carolina at Charlotte to integrate qualitative user studies with log analysis and to uncover the context of users' search tasks.
Introduction
Purdue University Libraries implemented a new discovery tool in late 2012. During its beta testing period, both the new discovery tool (Primo) and the existing tool (VuFind) were available on the Libraries' website. The beta period provided a unique opportunity to study and compare user activity in both tools.
Method
We collected server logs from both discovery tools from November to December 2012. The data fields in both systems' logs included IP address, date, time, URL, referrer URL, and user agent.
The referrer URL is the URL of the page on which the user clicked the link that led to the current URL. The user agent is a string that identifies the user's browser and reports certain system details to the server hosting the discovery tool.
The logs were processed with a Perl script that extracted these data fields, which were then analyzed in SAS and R.
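The field-extraction step can be sketched in Python, although the original processing used Perl. This minimal sketch assumes the logs follow the Apache "combined" log format, which the source does not confirm. `LogRecord`, `parse_line`, and the sample line are hypothetical illustrations, not the authors' script.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumption: each log line uses the Apache "combined" format:
# IP, identity, user, [timestamp], "request", status, size, "referrer", "agent".
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"\S+ (?P<url>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


class LogRecord(NamedTuple):
    ip: str
    timestamp: datetime  # carries both the date and the time fields
    url: str
    referrer: str
    user_agent: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Extract the six fields from one line; return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return LogRecord(m["ip"], ts, m["url"], m["referrer"], m["agent"])


if __name__ == "__main__":
    sample = ('192.0.2.10 - - [05/Nov/2012:14:03:27 -0500] '
              '"GET /search?q=climate+change HTTP/1.1" 200 5120 '
              '"https://example.edu/" "Mozilla/5.0 (Windows NT 6.1)"')
    rec = parse_line(sample)
    print(rec.url)  # → /search?q=climate+change
```

Lines that do not match the pattern return `None`. This lets malformed entries be counted and skipped rather than aborting the whole run.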
Results
Most users started with keyword search, which is both the broadest search type and the default.
We compared facet usage between the two tools and found that nested facet selections were very rare.
| | VuFind | Primo |
|---|---|---|
| Facet usage in search sessions | 8.4% | 9.7% |
| Top used facets | | |
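A session-level facet usage rate like the one in the table could be computed roughly as follows. This is a sketch under stated assumptions. Sessions are keyed on IP address plus user agent and are split after 30 minutes of inactivity. A facet selection is marked by a hypothetical `facet` URL parameter. The source describes neither the session definition nor the actual URL schemes of Primo or VuFind, and `sessionize`, `uses_facet`, and `facet_usage_rate` are illustrative names.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs, urlparse

# Assumption: a session is one (IP, user agent) pair with no gap over 30 minutes.
SESSION_TIMEOUT = timedelta(minutes=30)
# Hypothetical URL parameter that marks a facet selection.
FACET_PARAM = "facet"

Hit = Tuple[str, str, datetime, str]  # (ip, user_agent, timestamp, url)


def sessionize(hits: Iterable[Hit]) -> List[List[Hit]]:
    """Group hits by (ip, user agent) and split wherever the gap exceeds the timeout."""
    sessions: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: (h[0], h[1], h[2])):
        prev = sessions[-1][-1] if sessions else None
        same_client = prev is not None and prev[:2] == hit[:2]
        if same_client and hit[2] - prev[2] <= SESSION_TIMEOUT:
            sessions[-1].append(hit)
        else:
            sessions.append([hit])
    return sessions


def uses_facet(url: str) -> bool:
    """True if the request URL carries the (hypothetical) facet parameter."""
    return FACET_PARAM in parse_qs(urlparse(url).query)


def facet_usage_rate(sessions: List[List[Hit]]) -> float:
    """Share of sessions in which at least one request applied a facet."""
    if not sessions:
        return 0.0
    with_facet = sum(any(uses_facet(h[3]) for h in s) for s in sessions)
    return with_facet / len(sessions)
```

Sorting by client and then by time places each client's hits next to each other. A single pass is then enough to split them into sessions.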
Searches for non-electronic resources (e.g., print books) had longer queries, more query submissions, and a higher percentage of reformulated queries than searches for electronic resources (e.g., journal articles).
| | Non-electronic resources, mean (SD) | Electronic resources, mean (SD) |
|---|---|---|
| Query length | 5.1 (5.4) | 4.1 (4.0) |
| Number of query submissions | 3.6 (5.4) | 2.6 (2.3) |
| Percentage of searches reformulated | 61.0% | 57.8% |
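The three measures in the table could be derived from sessionized queries along the following lines. The source defines neither measure precisely, so the sketch makes two assumptions. Query length is counted in whitespace-separated terms, and a session counts as reformulated when it contains more than one distinct query. `summarize` is a hypothetical helper, and its standard deviations are sample standard deviations.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize(sessions: List[List[str]]) -> Dict[str, float]:
    """Compute the table's measures; each session is its ordered list of queries.

    Assumptions: query length = number of whitespace-separated terms; a session
    is "reformulated" if it holds more than one distinct (lowercased) query.
    stdev() is the sample standard deviation and needs at least two values.
    """
    lengths = [len(q.split()) for s in sessions for q in s]
    submissions = [len(s) for s in sessions]
    reformulated = [len({q.strip().lower() for q in s}) > 1 for s in sessions]
    return {
        "query_length_mean": mean(lengths),
        "query_length_sd": stdev(lengths),
        "submissions_mean": mean(submissions),
        "submissions_sd": stdev(submissions),
        "pct_reformulated": 100 * sum(reformulated) / len(sessions),
    }
```

Running `summarize` separately on non-electronic and electronic sessions would produce the two columns. The source does not say how sessions were assigned to a resource type.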
From the logs we identified three distinct patterns of query reformulation:
- Narrowing: users narrowed a search by adding one or more terms that specify information such as content, time, or format.
- Parallel: users shifted laterally to a related query through synonym replacement, format change, or spelling correction.
- Mixed: users combined narrowing and parallel moves within one search session.
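One rough way to label these patterns automatically is to compare the term sets of consecutive queries. This heuristic is an illustration, not the classification used in the study. It labels each change between queries as follows:

- Adding terms while keeping all earlier ones counts as narrowing.
- Replacing terms while keeping the query the same size counts as parallel.
- Any other change is labeled "other".

The heuristic cannot tell a synonym from an unrelated replacement, which would require a thesaurus or manual coding. `classify_move` and `session_pattern` are hypothetical names.

```python
from typing import List, Optional


def classify_move(prev: str, curr: str) -> Optional[str]:
    """Label one reformulation step with a simple term-set heuristic."""
    a, b = set(prev.lower().split()), set(curr.lower().split())
    if a == b:
        return None           # resubmission, not a reformulation
    if a < b:
        return "narrowing"    # all old terms kept, new ones added
    if len(a) == len(b):
        return "parallel"     # same size, some terms replaced (synonym, spelling, format)
    return "other"            # e.g. terms dropped; outside the two patterns


def session_pattern(queries: List[str]) -> Optional[str]:
    """Return "narrowing", "parallel", "mixed", or None for one session."""
    moves = {classify_move(p, c) for p, c in zip(queries, queries[1:])}
    has_narrowing, has_parallel = "narrowing" in moves, "parallel" in moves
    if has_narrowing and has_parallel:
        return "mixed"
    if has_narrowing:
        return "narrowing"
    if has_parallel:
        return "parallel"
    return None
```

Term sets ignore word order, so reordering a query counts as a resubmission rather than a reformulation. This is a deliberate simplification.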
Figure: Visualization of the query reformulation patterns.
Publication
Niu, X., Zhang, T., & Chen, H. (2014). Study of user search activities with two discovery tools at an academic library. International Journal of Human-Computer Interaction, 30(5), 422-433.