A12: Applying Image Feature Extraction to Cluttered
Scientific Repositories
Session
Poster Reception
Author
Event Type
ACM Student Research Competition
Poster
Reception
Time
Tuesday, November 14th, 5:15pm - 7pm
Location
Four Seasons Ballroom
Description
Over time, many scientific repositories and file systems
become disorganized and accumulate poorly described,
error-ridden data. As a result, researchers often find it
difficult to discover crucial data. In this poster, we
present a collection of image processing modules that
together extract metadata from a variety of image
formats. We implement these modules in Skluma, a system
designed to automatically extract metadata from
structured and semi-structured scientific formats. Our
modules apply several image metadata extraction
techniques: processing file system metadata, reading
header information, computing color content statistics,
extracting text, clustering images by their features, and
predicting tags with a supervised learning model. Our
goal is to collect a large amount of metadata that can
then be used to organize, understand, and analyze the
data stored in a repository.
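
As a rough illustration of the first three techniques named in the description (file system metadata, header information, and color content statistics), the following is a minimal sketch in Python using only the standard library. It is not Skluma's code: every function name here is hypothetical, the header reader handles only PNG files, and the color statistics assume the pixels have already been decoded into (R, G, B) tuples by some other tool.

```python
import os
import struct
from statistics import mean

# Fixed 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def file_system_metadata(path):
    """Collect basic metadata from the file system (hypothetical helper)."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": st.st_size,
        "modified": st.st_mtime,
    }


def png_header_metadata(path):
    """Read width, height, bit depth, and color type from a PNG IHDR chunk.

    Returns an empty dict when the file is not a PNG.
    """
    with open(path, "rb") as f:
        head = f.read(26)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type,
    # then the IHDR fields (width and height are big-endian 32-bit integers).
    if len(head) < 26 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        return {}
    width, height, bit_depth, color_type = struct.unpack(">IIBB", head[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
    }


def color_statistics(pixels):
    """Compute per-channel mean, min, and max over decoded (R, G, B) pixels."""
    pixels = list(pixels)
    if not pixels:
        return {}
    channels = list(zip(*pixels))  # one tuple of values per channel
    return {
        "mean_rgb": tuple(mean(c) for c in channels),
        "min_rgb": tuple(min(c) for c in channels),
        "max_rgb": tuple(max(c) for c in channels),
    }


def extract_metadata(path, pixels=None):
    """Merge the outputs of the individual extractors into one record."""
    record = {"file": file_system_metadata(path)}
    record["header"] = png_header_metadata(path)
    if pixels is not None:
        record["color"] = color_statistics(pixels)
    return record
```

Calling `extract_metadata("scan.png", pixels)` on a hypothetical file would return a nested dictionary with one entry per extractor. Keeping each extractor as a separate function mirrors the modular design the description mentions, so further steps such as text extraction or tag prediction could be added as additional entries.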