---
layout: page
title: QMiner
subtitle: a Node.js addon for data processing
use-site-title: true
---
<hr>
<script src="https://embed.runkit.com" data-element-id="word-count"></script>
<script src="https://embed.runkit.com" data-element-id="keyword-search"></script>
<script src="https://embed.runkit.com" data-element-id="text-search"></script>
<script src="https://embed.runkit.com" data-element-id="text-stream"></script>
<div class="row">
<div class="col-xl-3 col-lg-3 col-md-3"> </div>
<div class="col-xl-6 col-lg-6 col-md-6">
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> $ npm install qminer --save </code></pre></div></div>
</div>
</div>
<hr>
<div class="row">
<div class="col-xl-6 col-lg-6 col-md-6">
<h2 class="post-title"> Word count</h2>
<p> It's the 'Hello World!' example of text mining tools. QMiner can count words,
but the library is better suited to more sophisticated text mining applications! </p>
<p>Let's define a schema with a text field and push some example sentences into the store. The frequency
of the keywords is provided by the "keywords" aggregate, which returns a weighted and sorted
vector of keywords: <code>[ (yellow, 0.60), (pen, 0.49), (green, 0.37), (blue, 0.37), (marker, 0.31) ]</code>.
QMiner automatically discards unimportant connection words such as "this" and "is".</p>
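<p>To see the idea without the library, here is a minimal plain-JavaScript sketch that drops connection
words and counts what is left. It does not use QMiner: the <code>STOPWORDS</code> list and the
<code>countKeywords</code> helper are hypothetical names made up for this illustration, and the sketch
reports raw counts rather than the weights returned by the aggregate, so the ranking need not match.</p>

```javascript
// Plain-JavaScript sketch of keyword counting; it does not use QMiner.
// STOPWORDS is a hypothetical two-word list chosen for this example.
const STOPWORDS = new Set(["this", "is"]);

function countKeywords(sentences) {
  const counts = new Map();
  for (const sentence of sentences) {
    // lowercase the sentence and keep runs of letters as tokens
    const tokens = sentence.toLowerCase().match(/[a-z]+/g) || [];
    for (const token of tokens) {
      if (STOPWORDS.has(token)) continue; // drop connection words
      counts.set(token, (counts.get(token) || 0) + 1);
    }
  }
  // most frequent keywords first
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const keywordCounts = countKeywords([
  "This pen is green.",
  "This pen is yellow.",
  "This pen is blue.",
  "This marker is yellow."
]);
// print one keyword-count pair per line
keywordCounts.forEach(([keyword, count]) => console.log(keyword, count));
```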
<p>Play with it on <a href="https://runkit.com/carolninap/5a21517b57d217001278fd2a" class="post-read-more">RunKit</a>!</p>
</div>
<div class="col-xl-6 col-lg-6 col-md-6">
<div id="word-count">
var qm = require('qminer');
// create the base object with the desired schema
var base = new qm.Base({
mode: 'createClean',
schema: [{ name: 'tweets',
fields: [{ name: 'text', type: 'string' }]
}]
});
// push the data
let tweetStore = base.store('tweets');
tweetStore.push({text: "This pen is green."});
tweetStore.push({text: "This pen is yellow."});
tweetStore.push({text: "This pen is blue."});
tweetStore.push({text: "This marker is yellow."});
// get the distribution of keywords
let distribution = tweetStore.allRecords.aggr(
{ name: "test", type: "keywords", field: "text" });
// output the sorted keyword-weight pairs
distribution.keywords.forEach((obj) => {
console.log(obj.keyword, obj.weight);
});
</div>
</div>
</div>
<hr>
<div class="row">
<div class="col-xl-6 col-lg-6 col-md-6">
<div id="keyword-search">
var qm = require('qminer');
// create the base object with the desired schema
var base = new qm.Base({
mode: 'createClean',
schema: [{ name: 'tweets',
fields: [{ name: 'text', type: 'string' }],
keys: [{ "field": "text", "type": "text_position" }]
}]
});
// push the data
let tweetStore = base.store('tweets');
tweetStore.push({text: "This pen is green."});
tweetStore.push({text: "This pen is yellow."});
tweetStore.push({text: "This pen is blue."});
tweetStore.push({text: "This marker is yellow."});
let query = {
$from: "tweets",
$or: [{text: "pen"}, {text: "yellow"}]
};
var res = base.search(query);
res.toJSON().records.forEach((obj) => {
console.log(obj.text);
});
</div>
</div>
<div class="col-xl-6 col-lg-6 col-md-6">
<h2 class="post-title"> Keyword search</h2>
<p>Let's see which tweet includes the keyword 'pen' OR 'yellow'!</p>
<p>We define the schema and push data into the store. Note that we index keywords by adding the
<code>keys: []</code> configuration vector to the schema description. The query is run
to retrieve the list of matching records from the store. All four records are included in the result,
because each of them contains 'pen' or 'yellow'.</p>
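<p>Now replace the second line of the query object with <code> "text": "pen", "text": "yellow" </code>
and hit Run. A JavaScript object literal keeps only the last of two identical keys, so the query reduces
to <code>text: "yellow"</code> and the result now has the two records that include 'yellow'. </p>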
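<p>What indexing keywords buys you can be sketched in plain JavaScript: an inverted index maps every
token to the records that contain it, and an OR query is the union of those record sets. This is only
an illustration under simple assumptions (lowercase letter-only tokens); <code>buildIndex</code> and
<code>searchAny</code> are hypothetical helpers, not part of QMiner, and say nothing about how the
library stores its keys.</p>

```javascript
// Plain-JavaScript sketch of a keyword index; it does not use QMiner.
const tweets = [
  "This pen is green.",
  "This pen is yellow.",
  "This pen is blue.",
  "This marker is yellow."
];

// map each lowercase token to the set of record ids that contain it
function buildIndex(texts) {
  const index = new Map();
  texts.forEach((text, id) => {
    const tokens = text.toLowerCase().match(/[a-z]+/g) || [];
    for (const token of tokens) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  });
  return index;
}

// OR query: union of the id sets of all requested keywords
function searchAny(index, keywords) {
  const ids = new Set();
  for (const keyword of keywords) {
    for (const id of index.get(keyword) || []) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

const index = buildIndex(tweets);
const hits = searchAny(index, ["pen", "yellow"]);
// print the text of every matching record
hits.forEach((id) => console.log(tweets[id]));
```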
<p>Play with it on <a href="https://runkit.com/carolninap/5a2164ea416c870012531290" class="post-read-more">RunKit</a>!</p>
</div>
</div>
<hr>
<div class="row">
<div class="col-xl-6 col-lg-6 col-md-6">
<h2 class="post-title"> Nearest neighbor </h2>
<p>Let's see which tweet is the most similar to an input tweet!</p>
<p>We define the schema and push data in the store. Then we create a feature space over the text
of all the training tweets. The query is then used to create an ordered list of similar tweets.
Here's the similarity vector: <code>0.71, 0.05, 0.02, 0.62</code>. The first and the last tweets
from the store are the most similar to the query tweet while the second and third are not too similar.</p>
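<p>The ranking step can be sketched in plain JavaScript as a cosine similarity between word-count
vectors. This sketch does not use QMiner and applies no term weighting or stopword removal, so its
numbers differ from the vector above; <code>termCounts</code>, <code>norm</code> and <code>cosine</code>
are hypothetical helpers written for this illustration.</p>

```javascript
// Plain-JavaScript sketch of bag-of-words cosine similarity; it does not use QMiner.
function termCounts(text) {
  const counts = new Map();
  const tokens = text.toLowerCase().match(/[a-z]+/g) || [];
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

// Euclidean length of a word-count vector
function norm(counts) {
  let sumOfSquares = 0;
  for (const value of counts.values()) sumOfSquares += value * value;
  return Math.sqrt(sumOfSquares);
}

// cosine of the angle between two word-count vectors
function cosine(a, b) {
  let dot = 0;
  for (const [token, value] of a) dot += value * (b.get(token) || 0);
  const denominator = norm(a) * norm(b);
  return denominator === 0 ? 0 : dot / denominator;
}

const tweets = [
  "This pen is green.",
  "This pen is yellow.",
  "This pen is blue.",
  "This marker is yellow."
];
const query = termCounts("The pen and the marker are green.");
const similarities = tweets.map((tweet) => cosine(termCounts(tweet), query));
// one similarity per stored tweet, in store order
console.log(similarities.map((s) => s.toFixed(2)).join(", "));
```

<p>Even in this unweighted form the first tweet scores highest, because it shares two words with the
query while each of the others shares only one.</p>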
<p>In the query tweet, try making the words 'pen' and 'marker' plural and hit Run. Suddenly only the first
tweet is computed as being similar to the query: <code>0.97, 0, 0, 0</code>, even though
not much changed content-wise. Now add the following to the feature space definition after the last
'text': <code>, tokenizer: { type: "unicode", stopwords: "en", stemmer: "porter" }</code> and hit Run!
The similarities return to their original values, because <code>stemmer: "porter"</code> makes sure these
minor differences are ignored.</p>
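<p>Why stemming helps can be shown with a deliberately crude stand-in. The sketch below is plain
JavaScript and does not use QMiner; <code>naiveStem</code> only strips a trailing "s" from longer words,
which is an assumption made for illustration and far simpler than the Porter stemmer.</p>

```javascript
// Plain-JavaScript sketch of token normalization; it does not use QMiner.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Crude stand-in for a real stemmer: strips one trailing "s" from words longer than three letters.
function naiveStem(token) {
  const looksPlural = token.length > 3 ? token.endsWith("s") : false;
  return looksPlural ? token.slice(0, -1) : token;
}

// tokens that both texts have in common after applying the given normalization
function sharedTokens(a, b, normalize) {
  const left = new Set(tokenize(a).map(normalize));
  const right = new Set(tokenize(b).map(normalize));
  return [...left].filter((token) => right.has(token));
}

const query = "The pens and the markers are green.";
const tweet = "This pen is green.";
const withoutStemming = sharedTokens(query, tweet, (token) => token);
const withStemming = sharedTokens(query, tweet, naiveStem);
console.log(withoutStemming, withStemming);
```

<p>Without normalization the plural query shares only 'green' with the tweet; once plurals are reduced,
'pen' matches again.</p>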
<p>Play with it on <a href="https://runkit.com/carolninap/5a2167f7416c8700125315cf" class="post-read-more">RunKit</a>!</p>
</div>
<div class="col-xl-6 col-lg-6 col-md-6">
<div id="text-search">
var qm = require('qminer');
// create the base object with the desired schema
var base = new qm.Base({
mode: 'createClean',
schema: [{ name: 'tweets',
fields: [{ name: 'text', type: 'string' }]
}]
});
// create the feature space object
var ftr = new qm.FeatureSpace(base, { type: "text", source: "tweets", field: "text" });
// push the data
let tweetStore = base.store('tweets');
tweetStore.push({text: "This pen is green."});
tweetStore.push({text: "This pen is yellow."});
tweetStore.push({text: "This pen is blue."});
tweetStore.push({text: "This marker is yellow."});
// update the feature space with the data
ftr.updateRecords(tweetStore.allRecords);
let query = tweetStore.newRecord({"text": "The pen and the marker are green."});
let vector = ftr.extractSparseVector(query);
let matrix = ftr.extractSparseMatrix(tweetStore.allRecords);
let sim = matrix.multiplyT(vector);
sim.print();
</div>
</div>
</div>
<hr>
<div class="row">
<div class="col-xl-6 col-lg-6 col-md-6">
<div id="text-stream">
var qm = require('qminer');
// create the base object
let base = new qm.Base({
mode: 'createClean',
schema: [{
name: 'People',
fields: [
{ name: 'Name', type: 'string', primary: true },
{ name: 'Gender', type: 'string' }
]}
]});
let ps = base.store('People');
// create a custom stream object
let s = [];
// each element of the object has to comply with the base schema definition
s.push(ps.newRecord({ Name: 'John', Gender: 'Male' }));
s.push(ps.newRecord({ Name: 'Mary', Gender: 'Female' }));
s.push(ps.newRecord({ Name: 'Jill', Gender: 'Female' }));
s.push(ps.newRecord({ Name: 'Jack', Gender: 'Male' }));
s.push(ps.newRecord({ Name: 'Mary', Gender: 'Female' }));
s.push(ps.newRecord({ Name: 'Andy', Gender: 'Male' }));
s.push(ps.newRecord({ Name: 'Andy', Gender: 'Male' }));
// create your custom stream aggregate
var stream = new qm.StreamAggr(base, new function () {
var data = {};
this.onAdd = function (rec) {
data[rec.Name] = data[rec.Name] == undefined ? 1 : data[rec.Name] + 1;
};
this.saveJson = function (limit) {
return data;
};
this.getFloat = function (name) {
return data[name] == undefined ? null : data[name];
};
this.getInteger = function (name) {
return data[name] == undefined ? null : data[name];
};
});
// start ingesting the stream
s.forEach((obj, idx) => {
stream.onAdd(obj);
console.log("[" + idx + "] John:" + stream.getFloat("John") + " --- Mary:" + stream.getFloat("Mary"));
});
</div>
</div>
<div class="col-xl-6 col-lg-6 col-md-6">
<h2 class="post-title"> Text streams</h2>
<p>QMiner can also process streaming data. Here's a custom-defined stream aggregate.
For native aggregates, check the 'Time series' menu tab.</p>
<p>We define the schema as usual. We simulate our stream by creating the <code>s</code> vector.
Each element of the vector has to comply with the predefined schema. Then we define a custom
JavaScript stream aggregate that counts the frequency of the names.</p>
<p>Note that when ingesting the stream, QMiner keeps only the model in memory and discards
the data itself. The frequency of the two selected names is displayed after each new data
point comes in:
<br><code>"[0] John:1 --- Mary:null"</code>
<br><code>"[1] John:1 --- Mary:1"</code>
<br><code>"[2] John:1 --- Mary:1"</code>
<br><code>"[3] John:1 --- Mary:1"</code>
<br><code>"[4] John:1 --- Mary:2"</code>
<br><code>"[5] John:1 --- Mary:2"</code>
<br><code>"[6] John:1 --- Mary:2"</code>
</p>
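<p>The "keep the model, discard the data" idea can be sketched in plain JavaScript as well. The sketch
does not use QMiner: <code>NameCounter</code> and the <code>people</code> generator are hypothetical
names, and the generator hands over one record at a time so that only the counts stay in memory.</p>

```javascript
// Plain-JavaScript sketch of a counting stream aggregate; it does not use QMiner.
class NameCounter {
  constructor() {
    this.counts = new Map(); // the model: the only state that is kept
  }
  // update the model with one incoming record
  onAdd(record) {
    this.counts.set(record.Name, (this.counts.get(record.Name) || 0) + 1);
  }
  // current count for a name, or null if the name has not been seen
  getCount(name) {
    return this.counts.has(name) ? this.counts.get(name) : null;
  }
  // snapshot of the whole model as a plain object
  saveJson() {
    return Object.fromEntries(this.counts);
  }
}

// hypothetical record source: yields one record at a time and stores none
function* people() {
  for (const name of ["John", "Mary", "Jill", "Jack", "Mary", "Andy", "Andy"]) {
    yield { Name: name };
  }
}

const counter = new NameCounter();
const lines = [];
let idx = 0;
for (const record of people()) {
  counter.onAdd(record);
  lines.push("[" + idx + "] John:" + counter.getCount("John") + " --- Mary:" + counter.getCount("Mary"));
  idx += 1;
}
lines.forEach((line) => console.log(line));
```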
<p>Now let's change this example to do word count. First change the fields on lines 7-9 to
<code>{ name: 'text', type: 'string' }</code>. Now let's make sure each element of the
stream complies with the schema, so update lines 17-23 to look like
<code>s.push(ps.newRecord({ text: 'John'}));</code>. Replace line 29 with
<code>data[rec.text] = data[rec.text] == undefined ? 1 : data[rec.text] + 1;</code>.
Finally, replace line 45 with <code>console.log(stream.saveJson());</code>. Now try it out!</p>
<p>The resulting output, which counts how often each word occurs, looks like:
<br><code>{John: 1}</code>
<br><code>{John: 1, Mary: 1}</code>
<br><code>{Jill: 1, John: 1, Mary: 1}</code>
<br><code>{Jack: 1, Jill: 1, John: 1, Mary: 1}</code>
<br><code>{Jack: 1, Jill: 1, John: 1, Mary: 2}</code>
<br><code>{Andy: 1, Jack: 1, Jill: 1, John: 1, Mary: 2}</code>
<br><code>{Andy: 2, Jack: 1, Jill: 1, John: 1, Mary: 2}</code>
</p>
<p>Play with it on <a href="https://runkit.com/carolninap/5a295beaaeefa1001221f121" class="post-read-more">RunKit</a>!</p>
</div>
</div>
<div class="row">
<div class="col-xl-10 col-lg-10 col-md-10" align="center">
<h2 class="post-title"> Check out QMiner at Node.js Interactive!</h2>
<iframe width="640" height="360" src="https://www.youtube.com/embed/liA0ahRj9Nw" frameborder="0" allowfullscreen></iframe>
</div>
</div>