Search System Technology

The Dryad search system is based on the DSpace Discovery system.

SOLR indexes

  • authority -- terms for autocompletion using controlled vocabularies, including HIVE
  • dataoneMNlog -- log of accesses through the DataONE API
  • dryad -- local storage of DOIs (this index needs to be renamed)
  • search -- primary search index
  • statistics -- log of accesses to item pages and bitstream downloads
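
As a rough illustration of how these indexes might be inspected, the sketch below builds a query URL for Solr's standard select handler. The base URL is an assumption (the real URLs are whatever dspace.cfg specifies), and the helper name solr_select_url is hypothetical.

```python
from urllib.parse import urlencode

# Index names as listed above.
SOLR_INDEXES = ("authority", "dataoneMNlog", "dryad", "search", "statistics")


def solr_select_url(index, query="*:*", rows=10,
                    base_url="http://localhost:8080/solr"):
    """Build a select-handler URL for one of the Solr indexes.

    base_url is an assumed placeholder; use the URL configured in dspace.cfg.
    """
    if index not in SOLR_INDEXES:
        raise ValueError(f"unknown index: {index!r}")
    # q, rows and wt are standard Solr query parameters.
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{index}/select?{params}"
```

Calling solr_select_url("statistics") gives a URL that matches every document in the statistics index; the resulting URL can be opened in a browser or fetched with any HTTP client.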

Configuration

  • dspace.cfg -- specifies the URLs for the various Solr indexes and the fields that are used within the search system
  • dspace-solr-search.cfg -- specifies the parameters that are added to both automatic and user-generated queries
  • solr/(index name)/conf -- specifies the fields stored in an index, and how those fields are processed

Maintenance

Primary maintenance of the index is performed with:

/opt/dryad/bin/dspace update-discovery-index

Options available:

  • -i <internal_item_id>
    • re-index a single item
  • -f
    • force every item to be re-indexed, even if it is up-to-date
  • -b
    • rebuild by dropping the index and reindexing everything
  • -r <handle>
    • remove an item from the index
  • -o
    • optimize the solr indexes on disk
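
The options above can be wrapped in a small helper when the command is driven from a script. This is only a sketch: the function name build_reindex_command is hypothetical, and because this page does not say whether the options can be combined, the helper conservatively accepts at most one per call.

```python
DSPACE = "/opt/dryad/bin/dspace"


def build_reindex_command(item_id=None, force=False, rebuild=False,
                          remove_handle=None, optimize=False):
    """Return the argument list for one update-discovery-index run.

    With no options, the command is the plain index update.
    """
    options = []
    if item_id is not None:
        options.append(["-i", str(item_id)])   # re-index a single item
    if force:
        options.append(["-f"])                 # re-index every item
    if rebuild:
        options.append(["-b"])                 # drop and rebuild the index
    if remove_handle is not None:
        options.append(["-r", remove_handle])  # remove an item
    if optimize:
        options.append(["-o"])                 # optimize the indexes on disk
    if len(options) > 1:
        # Assumption: combining options is not documented, so refuse it.
        raise ValueError("pass at most one option per run")
    command = [DSPACE, "update-discovery-index"]
    for option in options:
        command.extend(option)
    return command
```

The returned list can be passed to subprocess.run(command, check=True) on a host where DSpace is installed.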

For convenience, the dryad-utils package has a script that will update the index for items that were archived during a particular date range.

dryad-utils/reindex-discovery.py --date_from 2017-01-01 --date_to 2017-01-31
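
To re-index a longer period in smaller batches, the same script could be invoked once per month. The sketch below only generates the command lines; the helper names month_ranges and reindex_commands are hypothetical, and splitting by calendar month is an assumption about what makes a convenient batch size, not a requirement of the script.

```python
import calendar
from datetime import date

SCRIPT = "dryad-utils/reindex-discovery.py"


def month_ranges(year):
    """Yield (first_day, last_day) ISO date strings for each month of a year."""
    for month in range(1, 13):
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        yield (date(year, month, 1).isoformat(),
               date(year, month, last_day).isoformat())


def reindex_commands(year):
    """Build one reindex-discovery.py argument list per calendar month."""
    return [[SCRIPT, "--date_from", start, "--date_to", end]
            for start, end in month_ranges(year)]
```

Each entry in reindex_commands(2017) can then be run in turn, for example with subprocess.run, from the directory that contains dryad-utils.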