Commit cf6ef09

Add simple sentence clustering demo for the Universal Sentence Encoder. (#145)
1 parent aab9e53 commit cf6ef09

9 files changed: +7077 -3 lines changed


universal-sentence-encoder/.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
 node_modules/
 .cache/
-dist/
+dist/
+.DS_Store

universal-sentence-encoder/README.md

Lines changed: 16 additions & 2 deletions
@@ -1,8 +1,22 @@
 # Universal Sentence Encoder lite
 
-The Universal Sentence Encoder ([Cer et al., 2018](https://arxiv.org/pdf/1803.11175.pdf)) is a model that encodes text into 512-dimensional embeddings. These embeddings can then be used as inputs to natural language processing tasks such as [sentiment classification](https://en.wikipedia.org/wiki/Sentiment_analysis) and [textual similarity](https://en.wikipedia.org/wiki/Semantic_similarity) analysis.
+The Universal Sentence Encoder ([Cer et al., 2018](https://arxiv.org/pdf/1803.11175.pdf)) (USE) is a model that encodes text into 512-dimensional embeddings. These embeddings can then be used as inputs to natural language processing tasks such as [sentiment classification](https://en.wikipedia.org/wiki/Sentiment_analysis) and [textual similarity](https://en.wikipedia.org/wiki/Semantic_similarity) analysis.
 
-This module is a TensorFlow.js [`FrozenModel`](https://js.tensorflow.org/api/latest/#loadFrozenModel) converted from the Universal Sentence Encoder lite ([module on TFHub](https://tfhub.dev/google/universal-sentence-encoder-lite/2)), a lightweight version of the original. The lite model is based on the Transformer ([Vaswani et al, 2017](https://arxiv.org/pdf/1706.03762.pdf)) architecture, and uses an 8k word piece [vocabulary](https://storage.googleapis.com/tfjs-models/savedmodel/universal_sentence_encoder/vocab.json).
+This module is a TensorFlow.js [`FrozenModel`](https://js.tensorflow.org/api/latest/#loadFrozenModel) converted from the USE lite ([module on TFHub](https://tfhub.dev/google/universal-sentence-encoder-lite/2)), a lightweight version of the original. The lite model is based on the Transformer ([Vaswani et al, 2017](https://arxiv.org/pdf/1706.03762.pdf)) architecture, and uses an 8k word piece [vocabulary](https://storage.googleapis.com/tfjs-models/savedmodel/universal_sentence_encoder/vocab.json).
+
+In [this demo](./demo/index.js) we embed six sentences with the USE, and render their self-similarity scores in a matrix (redder means more similar):
+
+![selfsimilarity](./images/self_similarity.jpg)
+
+*The matrix shows that USE embeddings can be used to cluster sentences by similarity.*
+
+The sentences (taken from the [TensorFlow Hub USE lite colab](https://colab.sandbox.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder_lite.ipynb#scrollTo=_GSCW5QIBKVe)):
+1. I like my phone.
+2. Your cellphone looks great.
+3. How old are you?
+4. What is your age?
+5. An apple a day, keeps the doctors away.
+6. Eating strawberries is healthy.
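
Note on the README hunk above: the self-similarity scores it describes are pairwise dot products between sentence embeddings. The following is a minimal sketch of that computation, assuming the embeddings have already been read back from the tensor into plain JavaScript arrays. The helper name `selfSimilarity` and the toy 2-dimensional vectors are hypothetical and are not part of this commit.

```javascript
// Hypothetical helper: pairwise dot products between row vectors.
// `embeddings` is an array of equal-length numeric arrays.
function selfSimilarity(embeddings) {
  const n = embeddings.length;
  const scores = Array.from({length: n}, () => new Array(n).fill(0));
  for (let i = 0; i < n; i++) {
    // The matrix is symmetric, so compute only j >= i and mirror the value.
    for (let j = i; j < n; j++) {
      let dot = 0;
      for (let k = 0; k < embeddings[i].length; k++) {
        dot += embeddings[i][k] * embeddings[j][k];
      }
      scores[i][j] = dot;
      scores[j][i] = dot;
    }
  }
  return scores;
}

// Toy unit vectors standing in for 512-dimensional USE embeddings.
const toy = [[1, 0], [0.6, 0.8], [0, 1]];
const toyScores = selfSimilarity(toy);
console.log(toyScores);
```

With unit-length vectors the diagonal is approximately 1 and each off-diagonal entry is the cosine of the angle between two sentences, which is why similar sentences show up as redder cells.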
 
 ## Usage
 
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "presets": [
+    [
+      "env",
+      {
+        "esmodules": false,
+        "targets": {
+          "browsers": [
+            "> 3%"
+          ]
+        }
+      }
+    ]
+  ],
+  "plugins": [
+    "transform-runtime"
+  ]
+}
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
+# Universal Sentence Encoder Demo
+
+## Contents
+
+The demo shows how to use embeddings produced by the Universal Sentence Encoder.
+
+## Setup
+
+cd into the demos folder:
+
+```sh
+cd universal-sentence-encoder/demos
+```
+
+Install dependencies and prepare the build directory:
+
+```sh
+yarn
+```
+
+To watch files for changes, and launch a dev server:
+
+```sh
+yarn watch
+```
+
+## If you are developing universal-sentence-encoder locally, and want to test the changes in the demos
+
+Install yalc:
+```sh
+npm i -g yalc
+```
+
+cd into the universal-sentence-encoder folder:
+```sh
+cd universal-sentence-encoder
+```
+
+Install dependencies:
+```sh
+yarn
+```
+
+Publish universal-sentence-encoder locally:
+```sh
+yalc push
+```
+
+Cd into the demos and install dependencies:
+
+```sh
+cd demos
+yarn
+```
+
+Link the local universal-sentence-encoder to the demos:
+```sh
+yalc link @tensorflow-models/universal-sentence-encoder
+```
+
+Start the dev demo server:
+```sh
+yarn watch
+```
+
+To get future updates from the universal-sentence-encoder source code:
+```sh
+# cd up into the universal-sentence-encoder directory
+cd ../
+yarn build && yalc push
+```
Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+<!-- Copyright 2019 Google LLC. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================-->
+<!DOCTYPE html>
+<html>
+
+<head>
+  <title>TensorFlow.js Universal Sentence Encoder lite demo</title>
+  <style>
+    h1 {
+      margin-bottom: 35px;
+    }
+
+    #main {
+      padding-top: 30px;
+      font-family: Helvetica, sans-serif;
+      max-width: 960px;
+      min-width: 600px;
+      width: 60vw;
+      margin-left: auto;
+      margin-right: auto;
+    }
+
+    #sentences-container {
+      flex: 1 1 auto;
+    }
+
+    #sentences-container > div {
+      margin-bottom: 10px;
+    }
+
+    #container {
+      display: flex;
+      flex-direction: row;
+    }
+
+    #self-similarity-matrix {
+      position: relative;
+    }
+
+    .labels {
+      position: absolute;
+    }
+
+    .x-axis {
+      bottom: 100%;
+      width: 100%;
+      height: 20px;
+    }
+
+    .x-axis > div {
+      transform: translateX(-50%);
+    }
+
+    .y-axis {
+      right: 100%;
+      height: 100%;
+      width: 20px;
+    }
+
+    .y-axis > div {
+      transform: translateY(-50%);
+    }
+
+    .labels > div {
+      position: absolute;
+    }
+
+    #description {
+      margin-bottom: 50px;
+      line-height: 1.6;
+    }
+  </style>
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+</head>
+
+<body>
+  <div id='main'>
+    <h1>Universal Sentence Encoder lite demo</h1>
+    <div id="description">This demo is taken from the <a target="_blank" href="https://colab.sandbox.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder_lite.ipynb#scrollTo=_GSCW5QIBKVe">TensorFlow Hub Universal Sentence Encoder lite colab</a>. It shows the model's ability to group sentences by semantic similarity using their embeddings. The matrix on the right shows self-similarity scores (dot products) between the embeddings for the sentences on the left. The redder the cell, the higher the similarity score.</div>
+    <div id="loading">
+      Loading the model...
+    </div>
+    <div id="container">
+      <div id="sentences-container"></div>
+      <div id="self-similarity-matrix">
+        <div class="labels y-axis"></div>
+        <div class="labels x-axis"></div>
+        <canvas></canvas>
+      </div>
+    </div>
+  </div>
+  <script src="index.js"></script>
+</body>
+
+</html>
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+/**
+ * @license
+ * Copyright 2019 Google LLC. All Rights Reserved.
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * =============================================================================
+ */
+
+import * as use from '@tensorflow-models/universal-sentence-encoder';
+import {interpolateReds} from 'd3-scale-chromatic';
+
+const sentences = [
+  'I like my phone.', 'Your cellphone looks great.', 'How old are you?',
+  'What is your age?', 'An apple a day, keeps the doctors away.',
+  'Eating strawberries is healthy.'
+];
+
+const init = async () => {
+  const model = await use.load();
+
+  document.querySelector('#loading').style.display = 'none';
+  renderSentences();
+
+  const embeddings = await model.embed(sentences);
+
+  const matrixSize = 250;
+  const cellSize = matrixSize / sentences.length;
+  const canvas = document.querySelector('canvas');
+  canvas.width = matrixSize;
+  canvas.height = matrixSize;
+
+  const ctx = canvas.getContext('2d');
+
+  const xLabelsContainer = document.querySelector('.x-axis');
+  const yLabelsContainer = document.querySelector('.y-axis');
+
+  for (let i = 0; i < sentences.length; i++) {
+    const labelXDom = document.createElement('div');
+    const labelYDom = document.createElement('div');
+
+    labelXDom.textContent = i + 1;
+    labelYDom.textContent = i + 1;
+    labelXDom.style.left = (i * cellSize + cellSize / 2) + 'px';
+    labelYDom.style.top = (i * cellSize + cellSize / 2) + 'px';
+
+    xLabelsContainer.appendChild(labelXDom);
+    yLabelsContainer.appendChild(labelYDom);
+
+    for (let j = i; j < sentences.length; j++) {
+      const sentenceI = embeddings.slice([i, 0], [1]);
+      const sentenceJ = embeddings.slice([j, 0], [1]);
+      const sentenceITranspose = false;
+      const sentenceJTranspose = true;
+      const score =
+          sentenceI.matMul(sentenceJ, sentenceITranspose, sentenceJTranspose)
+              .dataSync();
+
+      ctx.fillStyle = interpolateReds(score);
+      ctx.fillRect(j * cellSize, i * cellSize, cellSize, cellSize);
+      ctx.fillRect(i * cellSize, j * cellSize, cellSize, cellSize);
+    }
+  }
+};
+
+init();
+
+const renderSentences = () => {
+  sentences.forEach((sentence, i) => {
+    const sentenceDom = document.createElement('div');
+    sentenceDom.textContent = `${i + 1}) ${sentence}`;
+    document.querySelector('#sentences-container').appendChild(sentenceDom);
+  });
+};
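
Note on the drawing loop in this file: each score is painted at `(j * cellSize, i * cellSize)` and mirrored across the diagonal. The following is a minimal sketch of that layout step on its own, separated from the canvas so it can be checked in isolation. The helper name `layoutCells`, the field names, and the clamping to `[0, 1]` are assumptions added for illustration and are not part of this commit.

```javascript
// Hypothetical helper: turn an n x n score matrix into canvas rectangles.
function layoutCells(scores, matrixSize) {
  const n = scores.length;
  const cellSize = matrixSize / n;
  const cells = [];
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      cells.push({
        x: j * cellSize,  // column j sets the horizontal offset
        y: i * cellSize,  // row i sets the vertical offset
        size: cellSize,
        // Assumption: clamp so the value stays in the [0, 1] range
        // that a color ramp expects.
        t: Math.min(1, Math.max(0, scores[i][j])),
      });
    }
  }
  return cells;
}

// Toy 2 x 2 score matrix drawn on a 250 px canvas.
const cells = layoutCells([[1, 0.25], [0.25, 1.2]], 250);
console.log(cells);
```

Each resulting `t` could then be passed to a color interpolator such as the `interpolateReds` function this file imports, and each rectangle to `ctx.fillRect(x, y, size, size)`.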
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+{
+  "name": "tfjs-models",
+  "version": "0.0.1",
+  "description": "",
+  "main": "index.js",
+  "license": "Apache-2.0",
+  "private": true,
+  "engines": {
+    "node": ">=8.9.0"
+  },
+  "dependencies": {
+    "@tensorflow-models/universal-sentence-encoder": "0.0.1",
+    "@tensorflow/tfjs": "^0.14.2",
+    "d3-scale-chromatic": "^1.3.3"
+  },
+  "scripts": {
+    "watch": "cross-env NODE_ENV=development parcel index.html --no-hmr --open ",
+    "build": "cross-env NODE_ENV=production parcel build index.html --no-minify --public-url ./",
+    "lint": "eslint ."
+  },
+  "devDependencies": {
+    "babel-core": "~6.26.3",
+    "babel-plugin-transform-runtime": "~6.23.0",
+    "babel-polyfill": "~6.26.0",
+    "babel-preset-env": "~1.6.1",
+    "clang-format": "~1.2.2",
+    "cross-env": "^5.2.0",
+    "dat.gui": "~0.7.2",
+    "eslint": "~4.19.1",
+    "eslint-config-google": "~0.9.1",
+    "parcel-bundler": "~1.10.3",
+    "yalc": "~1.0.0-pre.23"
+  },
+  "eslintConfig": {
+    "extends": "google",
+    "rules": {
+      "require-jsdoc": 0,
+      "valid-jsdoc": 0
+    },
+    "env": {
+      "es6": true
+    },
+    "parserOptions": {
+      "ecmaVersion": 8,
+      "sourceType": "module"
+    }
+  },
+  "eslintIgnore": [
+    "dist/"
+  ]
+}
