Commit 51ade51a authored by Sean Owen

[SPARK-16440][MLLIB] Undeleted broadcast variables in Word2Vec causing OoM for long runs

## What changes were proposed in this pull request?

Unpersist the broadcast variables `expTable`, `bcVocab`, and `bcVocabHash` at the end of `Word2Vec.fit` for more timely and reliable resource cleanup.
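The following is a minimal sketch of the broadcast lifecycle the change relies on. It assumes Spark's Scala API (`SparkContext.broadcast`, `Broadcast.value`, `Broadcast.unpersist`). The object `BroadcastCleanupSketch`, the method `countKnownWords`, and the toy data are hypothetical and are not part of Word2Vec. In the sketch, a lookup table is broadcast, read by one job, and unpersisted as soon as no later job needs it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example; only the broadcast handling is meant to resemble Word2Vec.fit.
object BroadcastCleanupSketch {
  /** Counts in-vocabulary words, releasing the broadcast once no job needs it. */
  def countKnownWords(sc: SparkContext, sentences: Seq[Seq[String]], vocab: Seq[String]): Int = {
    // Ship the lookup table to executors once instead of inside every task closure
    // (loosely analogous to bcVocabHash).
    val bcVocabHash = sc.broadcast(vocab.zipWithIndex.toMap)
    val known = sc.parallelize(sentences)
      .map(_.count(w => bcVocabHash.value.contains(w)))
      .reduce(_ + _)
    // No later job reads bcVocabHash, so drop the executor-side copies now instead
    // of waiting for the driver-side Broadcast object to be garbage collected.
    bcVocabHash.unpersist()
    known
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("broadcast-cleanup-sketch").setMaster("local[2]"))
    try {
      val n = countKnownWords(sc, Seq(Seq("a", "b"), Seq("c", "a", "x")), Seq("a", "b", "c"))
      println(s"known words: $n") // known words: 4
    } finally {
      sc.stop()
    }
  }
}
```

Without the explicit `unpersist()`, executor copies are removed only after the driver-side `Broadcast` object is garbage collected and Spark's `ContextCleaner` reacts. That timing depends on driver GC, which is why explicit cleanup is more predictable on long runs.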

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #14153 from srowen/SPARK-16440.
parent 3d6f679c
```diff
@@ -434,6 +434,9 @@ class Word2Vec extends Serializable with Logging {
       bcSyn1Global.unpersist(false)
     }
     newSentences.unpersist()
+    expTable.unpersist()
+    bcVocab.unpersist()
+    bcVocabHash.unpersist()
     val wordArray = vocab.map(_.word)
     new Word2VecModel(wordArray.zipWithIndex.toMap, syn0Global)
```