A11: Finding a Needle in a Field of Haystacks:
Lightweight Metadata Search for Large-Scale Distributed
Research Repositories
Session: Poster Reception
Author
Event Type: ACM Student Research Competition, Poster, Reception
Time: Tuesday, November 14th, 5:15pm - 7pm
Location: Four Seasons Ballroom
Description: Fast, scalable, and distributed search services are
commonly available for single nodes, but lead to high
infrastructure costs when scaled across tens of
thousands of filesystems and repositories, as is the
case with Globus. Endpoint-specific indexes may instead
be stored on their respective nodes, but while this
distributes storage costs between users, it also creates
significant query overhead. Our solution provides a
compromise by introducing two levels of indexes: a
single centralized "second-level index" (SLI) that
aggregates and summarizes terms from each endpoint; and
many endpoint-level indexes that are referenced by the
SLI and used only when needed. We show, via experiments
on Globus-accessible filesystems, that the SLI reduces
the amount of space needed on central servers by over
96% while also reducing the set of endpoints that need
to execute user queries.
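
The two-level arrangement described above can be illustrated with a minimal sketch. The abstract does not say how the SLI summarizes terms, so this sketch assumes the simplest possible summary: a map from each term to the set of endpoints that contain it. The class names (EndpointIndex, SecondLevelIndex), the endpoint identifiers, the file paths, and the AND-style query semantics are all hypothetical and chosen only for illustration. In a real deployment the endpoint search would be a remote call rather than a local method.

```python
from collections import defaultdict


class EndpointIndex:
    """Hypothetical endpoint-level index: term -> set of file paths."""

    def __init__(self, endpoint_id):
        self.endpoint_id = endpoint_id
        self.postings = defaultdict(set)
        self.queries_served = 0  # counts how often this endpoint is queried

    def add(self, path, terms):
        for term in terms:
            self.postings[term.lower()].add(path)

    def search(self, terms):
        # Assumption: a query matches files containing ALL of the terms.
        self.queries_served += 1
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()


class SecondLevelIndex:
    """Hypothetical central summary: term -> set of endpoint ids.

    It stores no file paths, only which endpoints mention each term.
    """

    def __init__(self):
        self.term_to_endpoints = defaultdict(set)
        self.endpoints = {}

    def register(self, endpoint):
        # Summarize the endpoint's current terms; call after it is populated.
        self.endpoints[endpoint.endpoint_id] = endpoint
        for term in endpoint.postings:
            self.term_to_endpoints[term].add(endpoint.endpoint_id)

    def candidates(self, terms):
        # Only endpoints that contain every query term can hold a match.
        sets = [self.term_to_endpoints.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()

    def search(self, terms):
        # Forward the query only to candidate endpoints.
        results = {}
        for eid in sorted(self.candidates(terms)):
            hits = self.endpoints[eid].search(terms)
            if hits:
                results[eid] = hits
        return results


# Two made-up endpoints with made-up files.
lab_a = EndpointIndex("lab-a")
lab_a.add("/data/run1.h5", ["climate", "temperature"])
lab_a.add("/data/notes.txt", ["climate", "notes"])

lab_b = EndpointIndex("lab-b")
lab_b.add("/genomes/s1.fa", ["genome", "sample"])

sli = SecondLevelIndex()
sli.register(lab_a)
sli.register(lab_b)

results = sli.search(["climate", "temperature"])
```

In this toy setup the query is forwarded only to lab-a, since the summary shows that lab-b holds neither term. The central structure also stays small because it records endpoint identifiers rather than per-file postings. The sketch makes no attempt to reproduce the space savings reported in the abstract.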




