Posted On: Jan 18, 2018
You can now use Amazon SageMaker’s BlazingText implementation of the Word2Vec algorithm to generate word embeddings from a large number of documents. Word embeddings represent each unique word in a collection of text documents as a vector of numbers. Words that are similar will have similar vectors (that is, they will be close together in the low-dimensional space of the embeddings), while words that are less similar will be farther apart. Word2Vec is used in a variety of Natural Language Understanding (NLU) tasks, such as semantic similarity, sentiment analysis, machine translation, and question answering. It has also recently been used successfully in tasks like recommendation and segmentation, where similar embeddings may indicate, for example, that two movies tend to be watched by similar users at similar times. Amazon SageMaker’s BlazingText implementation has been engineered for speed and scale, and it can produce embeddings quickly on either GPU or CPU hardware.
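To make the idea of "close" vectors concrete, here is a minimal sketch that compares word vectors using cosine similarity, a common way to measure closeness between embeddings. The three-dimensional vectors and the helper names (`cosine_similarity`, `most_similar`) are hypothetical, chosen only for illustration. They are not BlazingText output and not part of the SageMaker API, and real embeddings typically have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, made up for this example only.
toy_embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "banana": [0.1, 0.0, 0.9],
}


def most_similar(word, embeddings):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


if __name__ == "__main__":
    for other in ("queen", "banana"):
        score = cosine_similarity(toy_embeddings["king"], toy_embeddings[other])
        print(f"king vs {other}: {score:.3f}")
    print("closest to king:", most_similar("king", toy_embeddings))
```

With these made-up vectors, "king" scores higher against "queen" than against "banana", which is the behavior the paragraph above describes: related words sit close together and unrelated words sit farther apart.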
The BlazingText implementation of the Word2Vec algorithm is available today in the US East (N. Virginia), US East (Ohio), EU (Ireland), and US West (Oregon) AWS Regions. To learn more, visit the Amazon SageMaker documentation for BlazingText Word2Vec.