Tabbed Searching Technology

From Dryad wiki

Dryad harvests records from other scientific data repositories (like KNB and TreeBASE) via OAI-PMH. (See description of the Harvesting process.) Harvested data is put into its own DSpace collection (one collection per harvested resource). We want to provide a search across all these collections, but display them differently than we would just another search facet.

We want to present each data collection in a separate search tab. The Dryad tab should be the default, but we want to give the user the option to view the results from these other collections. We want to display a hit count so the user will know, without clicking on the tab, how many resources are available from the other collection. This requires some customization of the DSpace Discovery module.

Functionality

The additional functionality that we want to provide is, at least at this point, unique to Dryad, so we wanted to make as few changes to the underlying Discovery module as possible. Instead, we provide a thin layer over the Discovery module that treats our harvested collections differently from other collections (and from other search facets in general).

This additional layer is implemented in the Dryad theme's XSLTs and in a few additional JavaScript files, also kept in the Dryad theme. The values that allow for this functionality are hard-coded in the XSLTs and JavaScript files, which works fine for Dryad but would need to change for the functionality to work in a generic DSpace that adopted the same use of collections as Dryad. Because collection IDs are hard-coded, the collections need to have the same IDs across different Dryad instances.

Workflow

When a user searches Dryad, the Discovery module is used to query a Solr index and return results. This takes place through a mix of Solr and Discovery-specific search and display syntax. The tabbed searching that sits on top of Discovery works directly with Solr, bypassing the Discovery module, with one exception: it must take the Discovery query that is embedded in the page's metadata and convert any Discovery-specific syntax into the corresponding Solr syntax.

This translation is done so that Solr can be queried directly to get the number of hits for the same search performed against a harvested collection. Parsing the Discovery query syntax is also important because the tabbed search layer must create a URL with the Discovery syntax, so that when a user clicks on a search tab, that search is performed in Dryad (using the Discovery module). The process works like this:

  1. User queries Dryad
  2. Discovery module handles query, returns results, and puts query URL into the page's metadata
  3. Dryad XSLT processes page metadata and uses the functions in the DryadSearch.xsl file to parse out the elements of the query (deduping when necessary, changing syntax to pure Solr syntax, etc.)
  4. Dryad XSLT puts the cleaned query URL into a class attribute for each tab that will be displayed in HTML
  5. Dryad XSLT associates each tab with a JavaScript file for that harvested collection
  6. Dryad XSLT creates a link back into Dryad with the parameters required to query that harvested collection using the Discovery module (each collection has a collection ID within DSpace that can be passed in as a location parameter to narrow a search to a collection)
  7. The user's browser loads the HTML that is created by the Dryad XSLT
  8. On page load, the browser runs the JavaScript file associated with each of the harvested collections; each one queries Solr for the same search, but against a different collection than the one used by the Discovery module
  9. Instead of returning lots of XML results, Solr just returns the wrapper information for such a search, which includes the total number of results found (this is controlled by setting the Solr search parameter rows to 0)
  10. The JavaScript that performed the async search then updates the HTML page to include the hit count (in parentheses) in the tab for the externally harvested collection
  11. Each tab ends up containing the number of hits the search would retrieve and a URL for performing that search in Dryad; the user can now click on that tab and display the results in the main Dryad interface
  12. The toggling of which tab is active takes place in the Dryad XSLT and is based on the collection IDs that are passed as location parameters in the URL query; these are currently hard-coded values in the XSLT, so any additional Dryad instances need to ensure that they use the same collection IDs for collections (which should happen automatically when a database is copied from one instance to another)
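
The hit-count part of this process (steps 8 to 10) can be sketched roughly as follows. This is a minimal illustration, not the code in solr-common.js: the function names, the Solr base URL, and the `location.coll` filter field are all assumptions, as is the choice to show the count in parentheses after the tab name. The only details taken from the description above are that Solr is queried directly with rows set to 0 and that the total hit count is read from the response wrapper.

```javascript
// Hypothetical helpers; names, base URL, and the "location.coll" field
// are assumptions for illustration, not values taken from solr-common.js.

// Build a Solr select URL that asks only for the hit count of a search
// restricted to one collection.
function buildCountUrl(solrBase, query, collectionId) {
  const params = new URLSearchParams();
  params.set("q", query);                               // the user's search terms
  params.append("fq", "location.coll:" + collectionId); // restrict to one collection (assumed field)
  params.set("rows", "0");                              // return no documents, only the wrapper
  params.set("wt", "json");                             // ask Solr for a JSON response
  return solrBase + "/select?" + params.toString();
}

// Read the total number of hits from a parsed Solr JSON response.
function extractHitCount(solrResponse) {
  return solrResponse.response.numFound;
}

// Format the text shown on a tab: the name followed by the hit count.
function tabLabel(name, hitCount) {
  return name + " (" + hitCount + ")";
}
```

In a browser, the URL from `buildCountUrl` would be requested asynchronously, the parsed response passed to `extractHitCount`, and the tab text replaced with the result of `tabLabel`.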

Configuration

Configuration is hard-coded in the XSLT for now. It consists of associating an externally harvested collection with a particular DSpace collection ID, which is then used in the DryadSearch.xsl.
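
To make the shape of this configuration concrete, here is a sketch written in JavaScript rather than XSLT. It is only an illustration of the idea: the location values are made-up placeholders, and the `query` parameter name, the function names, and the example URLs are assumptions. It shows a table that associates each tab with a collection's location value, a lookup that decides which tab is active from the location parameter in the search URL, and a helper that builds the link behind a tab.

```javascript
// Hypothetical tab table: the location values below are placeholders,
// not the collection IDs used by any real Dryad instance.
const TABS = [
  { name: "Dryad", location: "1" },
  { name: "TreeBASE", location: "2" },
  { name: "LTER", location: "3" },
];

// Decide which tab is active from the "location" parameter of a search URL.
// Falls back to the default tab when no known location is present.
function activeTab(searchUrl, tabs, defaultName) {
  const location = new URL(searchUrl).searchParams.get("location");
  const match = tabs.find((tab) => tab.location === location);
  return match ? match.name : defaultName;
}

// Build the link behind a tab: the same query, narrowed to that tab's collection.
// The "query" parameter name is an assumption.
function tabLink(baseUrl, query, tab) {
  const params = new URLSearchParams({ query: query, location: tab.location });
  return baseUrl + "?" + params.toString();
}
```

Because the table is keyed on fixed values, every instance that shares this configuration has to assign the same IDs to the same collections, which is the constraint described above.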

The tabbed search process also references the DryadUtils.xsl and is initiated from template calls in the Dryad.xsl.

There is a common JavaScript file called solr-common.js, and the collection-specific values are put into small JavaScript files, one to be called for each collection:

* solr-treebase.js
* solr-dryad.js
* solr-lter.js

(There is probably an opportunity for refactoring here: instead of using a separate JavaScript file for each collection, the XSLT could pass two parameters into the common file.)
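
One possible shape for that refactoring is sketched below. It assumes the two parameters are a collection ID and a tab name, which the text above does not specify, and all function names are hypothetical. The Solr request and the page update are passed in as functions so that the shared logic does not depend on any one collection.

```javascript
// Hypothetical refactoring sketch: one shared entry point replaces the
// per-collection files. fetchCount(collectionId, done) stands in for the
// Solr request; setTabText(tabName, text) stands in for the page update.
function makeTabUpdater(fetchCount, setTabText) {
  return function updateTab(collectionId, tabName) {
    fetchCount(collectionId, function (count) {
      setTabText(tabName, tabName + " (" + count + ")");
    });
  };
}

// Example wiring with stand-ins for the Solr request and the DOM update.
const labels = {};
const updateTab = makeTabUpdater(
  function (collectionId, done) {
    // Pretend collection "5" has 12 hits and everything else has none.
    done(collectionId === "5" ? 12 : 0);
  },
  function (tabName, text) {
    labels[tabName] = text;
  }
);

// The XSLT would emit one call like this per tab.
updateTab("5", "TreeBASE");
updateTab("6", "LTER");
```

With this arrangement, adding a harvested collection would mean emitting one more call from the XSLT rather than adding another JavaScript file.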

Relation to DSpace

The Dryad tabbed search functionality is related to, and uses, the DSpace Discovery module. The version of DSpace currently used by Dryad is 1.6.2. The version of Discovery that Dryad uses is tagged version 0.9.4.

Discovery is going to be more completely integrated into the DSpace core going forward (though it will remain a distinct module as DSpace moves towards structuring the entire codebase using smaller, interrelated modules).