This repository was archived by the owner on Jan 26, 2021. It is now read-only.

The topics don't match on every inference run. @feiga #83

@RyanPeking


For the same sentence, the results look like this:

first infer:
0 11:1 18:1 32:1 63:1 69:1 75:1 91:2 110:1 172:1 174:2 218:1 269:1 347:2 359:2

the next infer:
0 13:1 28:2 66:1 110:1 135:2 151:1 181:1 235:1 240:1 261:1 284:1 317:1 353:1 355:1 360:2
The result is different on every inference run.

But when there are only a few topics, there is no problem:
0 1:8 8:6 15:1
0 1:9 7:1 8:4 15:1

This confuses me.

@feiga
Thank you very much. You are right! I fixed two things in my latest commit:

  1. To keep doc_topic_counter intact, inference now runs slice by slice within each iteration, as you mentioned above.
  2. When sampling in the inference phase, the word-related terms of Pi, i.e., n_sw_beta, n_s_beta_sum, n_tw_beta, and n_t_beta_sum, SHOULD BE FIXED, which our previous discussion overlooked.
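
To illustrate the second point, here is a minimal sketch of inference with the word-related term held fixed. It uses plain collapsed Gibbs sampling rather than LightLDA's Metropolis-Hastings sampler, and every name in it (infer_doc_topics, word_topic, topic_sum, and so on) is hypothetical, not taken from the repository. The point is only that the training counts are read and never updated, while the per-document counter is the only state that changes.

```python
import numpy as np

def infer_doc_topics(doc_words, word_topic, topic_sum, alpha, beta,
                     n_iters=50, seed=0):
    """Sample topic counts for one unseen document (illustrative only).

    word_topic (V x K) and topic_sum (K,) are training-phase counts.
    They are treated as constants: nothing in this function writes to them.
    """
    rng = np.random.default_rng(seed)
    vocab_size, n_topics = word_topic.shape

    # Word-related term of the conditional, computed once and held fixed.
    word_term = (word_topic + beta) / (topic_sum + vocab_size * beta)

    # Random initial assignment; only the document counter is mutable state.
    z = rng.integers(n_topics, size=len(doc_words))
    doc_topic = np.bincount(z, minlength=n_topics).astype(np.int64)

    for _ in range(n_iters):
        for i, w in enumerate(doc_words):
            doc_topic[z[i]] -= 1                      # remove current token
            p = (doc_topic + alpha) * word_term[w]    # unnormalized conditional
            z[i] = rng.choice(n_topics, p=p / p.sum())
            doc_topic[z[i]] += 1                      # add it back
    return doc_topic
```

In this sketch a fixed seed gives the same counts on every call; with different seeds the counts can still vary somewhat, since the sampler is stochastic.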

After doing so, the results get much better. Here are the first 2 documents:
============training phase=============
0 260:1 549:2 778:1 1178:2 1309:1 1789:1 1843:2 2131:2 2390:3 2886:1
1 93:1 140:1 204:1 278:4 320:2 404:1 814:1 856:1 1164:2 1496:1 1627:4 1629:1 2059:1 2122:1 2177:1 2430:1 2686:1 2818:1 2880:1
==============inference phase=========
0 47:1 559:1 778:1 1178:2 1345:2 1843:1 2131:4 2390:3 2886:1
1 93:1 204:1 278:4 320:2 404:2 600:1 711:1 856:1 1164:2 1461:1 1496:1 1627:4 2059:1 2122:1 2144:1 2430:1 2518:1 2818:1

I think it is almost correct.

However, I think there are some defects in the current logic. First, it is unnecessary to rebuild the alias table per slice/block/iteration. Second, it is unnecessary to build an alias table for every word in the large training-phase vocabulary. Maybe it would be better to limit the user's input to just one block and generate just one slice per block, without vocabulary splitting. What do you think?
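
As a rough sketch of what I mean by building the tables once: the code below builds one alias table per distinct word in the inference input, using Vose's alias method, and then reuses it for every draw. This is not the repository's implementation. The function names and the word_topic_probs mapping (word id to a normalized topic distribution derived from the fixed training counts) are assumptions made for illustration.

```python
import random

def build_alias(probs):
    """Build an alias table (Vose's method) for a normalized distribution."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]          # keep s with this probability
        alias[s] = l                 # otherwise fall through to l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:          # leftovers are (numerically) exactly 1
        prob[i] = 1.0
        alias[i] = i
    return prob, alias

def sample_alias(prob, alias, rng):
    """Draw one index in O(1) from a prebuilt alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]

def build_tables_once(input_words, word_topic_probs):
    """One table per distinct input word, not per word in the full vocab."""
    return {w: build_alias(word_topic_probs[w]) for w in set(input_words)}
```

Because the word-related terms are fixed during inference, these tables never go stale, so they can be built before the first iteration and shared across all slices and iterations.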

Thanks

Originally posted by @hiyijian in #14 (comment)
