Optimizing Word2Vec Performance on Multicore Systems
Author/Presenter
Event Type
Workshop
Applications
Architectures
Graph Algorithms
SIGHPC Workshop
Time: Monday, November 13th, 11:55am - 12:20pm
Location: 507
Description: The Skip-gram with negative sampling (SGNS) method of
Word2Vec is an unsupervised approach for mapping words in a
text corpus to low-dimensional real vectors. The learned
vectors capture semantic relationships between
co-occurring words and can be used as inputs to many
natural language processing and machine learning tasks.
There are several high-performance implementations of
the Word2Vec SGNS method. In this paper, we introduce a
new optimization called context combining to further
boost SGNS performance on multicore systems. For
processing the One Billion Word benchmark dataset on a
16-core platform, we show that our approach is 3.53X
faster than the original multithreaded Word2Vec
implementation and 1.28X faster than a recent parallel
Word2Vec implementation. We also show that our accuracy
on benchmark queries is comparable to state-of-the-art
implementations.
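
For readers unfamiliar with SGNS, the following is a minimal sketch of one textbook training update for a single (center word, context word) pair with a few negative samples. It is not the context combining optimization, which the abstract does not describe in detail, and it is not taken from any of the implementations mentioned above. The function names, the learning rate, and the toy vocabulary and dimension are illustrative assumptions.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sgns_step(w_in, w_out, center, context, negatives, lr=0.025):
    """Apply one SGNS update for a (center, context) pair and return its loss.

    w_in and w_out are lists of vectors (lists of floats) holding the input
    and output embeddings. All names here are hypothetical.
    """
    v = w_in[center]
    grad_v = [0.0] * len(v)
    loss = 0.0
    # The true context word has label 1; each negative sample has label 0.
    for target, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = w_out[target]
        p = sigmoid(sum(a * b for a, b in zip(v, u)))
        loss -= math.log(max(p if label == 1.0 else 1.0 - p, 1e-12))
        g = lr * (label - p)
        for i in range(len(v)):
            grad_v[i] += g * u[i]  # uses the output vector before its update
            u[i] += g * v[i]
    # Update the center word's input vector once, after all targets.
    for i in range(len(v)):
        v[i] += grad_v[i]
    return loss


# Toy usage: 5-word vocabulary, 8-dimensional vectors (assumed values).
rng = random.Random(0)
dim, vocab = 8, 5
w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(vocab)]
w_out = [[0.0] * dim for _ in range(vocab)]
losses = [sgns_step(w_in, w_out, 0, 1, [2, 3]) for _ in range(200)]
```

Repeating the update on the same pair lowers the loss, because the dot product with the true context vector grows while the dot products with the negative samples shrink. Each update touches only a handful of vectors, which is why parallel implementations focus on how such updates are batched and shared across cores.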




