+Tools for computing distributed representation of words
+-------------------------------------------------------

-We provide an implementation of the Continuous Bag-of-Words (CBOW) and the Skip-gram model (SG).
+We provide an implementation of the Continuous Bag-of-Words (CBOW) and the Skip-gram model (SG), as well as several demo scripts.

-Given a text corpus, the word2vec program learns a vector for every word using the Continuous
-Bag-of-Words or the Skip-Gram model. The user needs to specify the following:
+Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous
+Bag-of-Words or the Skip-Gram neural network architectures. The user needs to specify the following:
 - desired vector dimensionality
 - the size of the context window for either the Skip-Gram or the Continuous Bag-of-Words model
- - Whether hierarchical sampling is used
- - Whether negative sampling is used, and if so, how many negative samples should be used
- - A threshold for downsampling frequent words
- - Number of threads to use
- - Whether to save the vectors in a text format or a binary format
+ - training algorithm: hierarchical softmax and / or negative sampling
+ - threshold for downsampling the frequent words
+ - number of threads to use
+ - the format of the output word vector file (text or binary)

+Usually, the other hyper-parameters, such as the learning rate, do not need to be tuned for different training sets; an example training command is shown below.

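+As a sketch of a typical invocation (the text8 corpus and the exact flag values here are one reasonable
+choice rather than required settings), the following trains a 200-dimensional CBOW model with a window of
+size 5, 5 negative samples, a 1e-3 downsampling threshold, 12 threads, and binary output:
+
+./word2vec -train text8 -output vectors.bin -cbow 1 -size 200 -window 5 -negative 5 -hs 0 -sample 1e-3 -threads 12 -binary 1
+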
-Thus the programs require a very modest number of parameter. In particular, learning rates
-need not be selected.
+The script demo-word.sh downloads a small (100MB) text corpus from the web, and trains a small word vector model. After the training
+is finished, the user can interactively explore the similarity of the words.

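+For instance (assuming vectors.bin is the output file written by the training step above), word similarities
+can be queried interactively with:
+
+./distance vectors.bin
+
+The distance tool then prompts for a word and prints the words whose vectors are closest to it.
+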
-The file demo-word.sh downloads a small (100MB) text corpus, and trains a 200-dimensional CBOW model
-with a window of size 5, negative sampling with 5 negative samples, a downsampling of 1e-3, 12 threads, and binary files.
-
-./word2vec -train text8 -output vectors.bin -cbow 1 -size 200 -window 5 -negative 5 -hs 0 -sample 1e-3 -threads 12 -binary 1
-
-
-Then, to evaluate the fidelity of our vectors, we can run the command, which will run
-a battery of tests on the vectors to determine their fidelity. The tests evaluate
-the vectors' ability to perform linear analogies.
-
-./distance vectors.bin
+More information about the scripts is provided at https://code.google.com/p/word2vec/
