[SYSTEMDS-3549] Hidden Markov model builtin #1838

cluelesprogrammer · 2023-06-07T17:58:54Z

No description provided.

Baunsgaard · 2023-06-07T19:38:16Z

Hi @cluelesprogrammer ,

Yes, the files are located correctly.

The next steps (as we texted about before) are:

Take a bit of inspiration from the other files and define an interface for these methods: hmm.dml takes an input X that you want to train on, and hmmPredict.dml takes an input X that you want to predict on. As a starting point, just make the methods call stop("NotImplemented").
Integrate them into the system by adding them to /src/main/java/org/apache/sysds/common/Builtins.java; again, look at how other builtin functions are done.
Then try to call the builtin from a new file that is not in the repository; perhaps make a /temp directory and a file called test.dml that calls your new builtin.
Add a test that verifies that the builtin is working (that is, throwing the not-implemented error) in src/test/java/org/apache/sysds/test/functions/builtin, in either part1 or part2; again, find some inspiration in other tests.

Once the above is done, start on the HMM implementation.

I suggest you just do the work on top of this PR.

Baunsgaard · 2023-06-09T09:25:07Z

scripts/builtin/hmm.dml

+# P     Probability of the set of outputs
+# --------------------------------------------------------------------------------------------
+
+hmm = function(Matrix[Double] X) 


Since you are allowed to define more than one function in these files, we require you to mark the main function with the prefix m_; here it becomes m_hmm.

Baunsgaard

It looks good so far; you just have to write the test matrix to disk and verify that the test executes correctly with the correct output.

Baunsgaard · 2023-06-16T10:49:16Z

src/test/java/org/apache/sysds/test/functions/builtin/part1/BuiltinHMMPredictTest.java

+			writeInputMatrixWithMTD("X", X, true);
+			"""
+
+			runTest(true, false, null, -1);


This only runs your code; it does not verify that anything happened.

Alternatively, you can call it like this:

String stdout = runTest(null).toString();

This gives you the output of the run, provided you call setOutputBuffering(true) beforehand.

Baunsgaard

Hi @cluelesprogrammer

To finish the submission I expect at least:

A test that verifies behavior, not just that your method runs.
If it is impossible to implement the HMM itself, then at least test the HMM predict according to what you think it should do, and clearly indicate where in the code elements are missing for a full implementation. Otherwise, a full simple version of the HMM is required; it does not have to be efficient.

Best regards
Sebastian

Baunsgaard · 2023-07-07T13:13:05Z

scripts/builtin/hmm.dml

+        iteration, break.
+        */
+    }
+}


Newlines at the end of files.

I do not understand this. Could you please clarify?

Simple: GitHub indicates with a big red mark that there is no newline at the end of the file. The code does work, but we try to keep this consistent across files.

Should I add or remove a newline?

Baunsgaard

Hi @cluelesprogrammer

I went through the code again.
It looks okay, so I would say it passes as an AMLS project.
But I did leave some coding comments that we need to look into before we merge it.

Thanks for the PR, and for any help with these comments (but do not feel forced to do it).

Baunsgaard · 2023-07-23T20:09:14Z

scripts/builtin/hmm.dml

+
+    #X should have the size of 1 * ncols
+    #should be transposed for the unique function
+    unique_X = matrix(X, rows=ncol(X), cols=1)


To transpose, call t(X).

Baunsgaard · 2023-07-23T20:09:47Z

scripts/builtin/hmm.dml

+
+    search = TRUE
+    nr_states = 2
+    seed = 42


make seed an argument initially

Baunsgaard · 2023-07-23T20:14:41Z

scripts/builtin/hmm.dml

+    #X should have the size of 1 * ncols
+    #should be transposed for the unique function
+    unique_X = matrix(X, rows=ncol(X), cols=1)
+    nr_outputs = unique_vals(unique_X)


We have a builtin function to calculate the number of unique values:
countDistinct()

You can give it a direction if you want the number of distinct values per column, per row, or over the entire matrix.

Baunsgaard · 2023-07-23T20:15:36Z

scripts/builtin/hmm.dml

+    seed = seed+1
+    B = col_normalize(matrix(1/nr_outputs,rows=nr_states, cols=nr_outputs) + rand(rows=nr_states, cols=nr_outputs, seed=seed))
+    seed = seed+1
+    ip = matrix(1/nr_states, rows=nr_states, cols=1) + rand(rows=nr_states, cols=1, seed=seed)


Also, each call to rand has to have a unique seed. One suggestion is to increment the seed before each rand call (+= is not supported, so you have to assign it back with seed = seed + 1 between the lines).
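The seed-bumping pattern can be illustrated outside DML. The following is a plain-Python stand-in, not SystemDS code; `draw` is a hypothetical helper that plays the role of a seeded rand call.

```python
import random

def draw(n, seed):
    """Hypothetical stand-in for a seeded rand call: n uniform values in [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

seed = 42
a = draw(4, seed)
seed = seed + 1              # bump the seed between calls, as suggested above
b = draw(4, seed)

same_seed_again = draw(4, 42)

print(a == same_seed_again)  # reusing a seed reproduces the same values
print(a == b)                # a bumped seed yields a different draw
```

Reusing one seed for several calls would make the "random" perturbations of A, B, and ip identical cell for cell, which is why each call gets its own seed.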

Baunsgaard · 2023-07-23T20:18:12Z

scripts/builtin/hmm.dml

+    #initialize state transition and emmission matrices uniformly
+    A = col_normalize(matrix(1/nr_states, rows=nr_states, cols=nr_states) + rand(rows=nr_states, cols=nr_states, seed=seed))
+    seed = seed+1
+    B = col_normalize(matrix(1/nr_outputs,rows=nr_states, cols=nr_outputs) + rand(rows=nr_states, cols=nr_outputs, seed=seed))


In both calls, the constant matrix holds the same value in every cell.
Perhaps that value can be calculated once and then assigned into a matrix?
Even better would be to pass it into the arguments of the rand call; that would reduce the current three materialized matrices to only one per line (lines 98, 100, and 102).
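The idea of folding the constant into the random draw can be sketched in plain Python (a stand-in, not SystemDS code). It assumes the constant would be passed through the range arguments of the random call: adding a constant c to uniform noise in [0, 1) gives the same values as drawing directly from [c, c + 1).

```python
import math
import random

nr_states = 2
c = 1 / nr_states            # the constant currently written into every cell

# Current approach: constant matrix plus uniform noise in [0, 1).
rng = random.Random(42)
shifted_after = [c + rng.random() for _ in range(6)]

# Suggested approach: draw directly from [c, c + 1) with the same seed.
rng = random.Random(42)
shifted_inside = [rng.uniform(c, c + 1) for _ in range(6)]

print(all(math.isclose(x, y) for x, y in zip(shifted_after, shifted_inside)))
```

The sketch covers only the addition step; the subsequent col_normalize call in the diff is unaffected by where the shift happens.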

Baunsgaard · 2023-07-23T20:27:28Z

scripts/builtin/hmm.dml

+        [i, j] = index(trans_id, nr_states)
+        for (t in 1:(T-1)) {
+            #indices for alpha and beta
+            ot1 = output_t(X, t+1)


Remove the method output_t and do the extraction here; it should automatically be recognized as a scalar.
Do you cast to int because you want to round? If so, call round() instead.

Baunsgaard · 2023-07-23T20:29:49Z

scripts/builtin/hmm.dml

+    nr_states = nrow(alpha)
+    T = ncol(alpha)
+
+    gamma = rand(rows=nr_states, cols=T)


This rand call seems to be there just to materialize a matrix?
If so, we could consider changing it to a matrix(rows...) call,
but that might not be the best solution.

Baunsgaard · 2023-07-23T20:34:22Z

scripts/builtin/hmm.dml

+            den_it = sum(alpha[,t] * beta[,t])
+            gamma[i, t] = num_it / den_it
+        }
+    }


I think these parfors can be rewritten as:

num_it_M = alpha * beta
den_it_M = rowsum(num_it_M)
gamma = num_it_M / den_it_M

This is because we support direct matrix and vector operations,
and I think this maps nicely onto it. (But I might have an error here.)
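To check the vectorized form against the loops, here is a plain-Python sketch (a stand-in for the DML, with made-up example values). One caveat, in line with the reviewer's own hedge: the denominator in the quoted diff, sum(alpha[,t] * beta[,t]), sums over states for a fixed t. With states as rows and time steps as columns, that is a per-column sum, so the sketch normalizes by column sums rather than row sums.

```python
def gamma_loops(alpha, beta):
    """Nested-loop version, mirroring the quoted diff."""
    n, T = len(alpha), len(alpha[0])
    gamma = [[0.0] * T for _ in range(n)]
    for t in range(T):
        den = sum(alpha[k][t] * beta[k][t] for k in range(n))
        for i in range(n):
            gamma[i][t] = alpha[i][t] * beta[i][t] / den
    return gamma

def gamma_vectorized(alpha, beta):
    """Whole-matrix version: elementwise product, then divide by column sums."""
    n, T = len(alpha), len(alpha[0])
    num = [[alpha[i][t] * beta[i][t] for t in range(T)] for i in range(n)]
    col_sums = [sum(num[i][t] for i in range(n)) for t in range(T)]
    return [[num[i][t] / col_sums[t] for t in range(T)] for i in range(n)]

# Illustrative values only: 2 states (rows), 3 time steps (columns).
alpha = [[0.30, 0.10, 0.05], [0.20, 0.15, 0.02]]
beta = [[0.20, 0.50, 1.0], [0.25, 0.40, 1.0]]

g_loop = gamma_loops(alpha, beta)
g_vec = gamma_vectorized(alpha, beta)
print(g_vec)
```

Each column of gamma sums to 1, which is a quick sanity check for either version.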

Baunsgaard · 2023-07-23T20:36:19Z

scripts/builtin/hmm.dml

+    for (t in 2:T) {
+        ot = output_t(X, t)
+        for (i in 1:nr_states) {    
+            alpha[i, t] = B[i, ot] * sum(alpha[,t-1]* A[ ,i]) 


Move alpha[,t-1] to the outer loop.
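A plain-Python sketch of the forward recursion with the previous column hoisted (a stand-in for the DML, not the PR's code). The initialization alpha[i][0] = ip[i] * B[i][obs[0]] is not part of the quoted diff and is assumed here as the usual starting point; A[k][i] is taken to be the transition weight from state k to state i, matching the A[,i] access in the diff. The parameter values are made up.

```python
def forward(A, B, ip, obs):
    """Return alpha[i][t] for states i and 0-based time steps t."""
    n, T = len(A), len(obs)
    alpha = [[0.0] * T for _ in range(n)]
    for i in range(n):
        alpha[i][0] = ip[i] * B[i][obs[0]]          # assumed initialization
    for t in range(1, T):
        ot = obs[t]
        prev = [alpha[k][t - 1] for k in range(n)]  # hoisted: read once per t
        for i in range(n):
            alpha[i][t] = B[i][ot] * sum(prev[k] * A[k][i] for k in range(n))
    return alpha

A = [[0.7, 0.3], [0.4, 0.6]]     # transitions, rows sum to 1
B = [[0.9, 0.1], [0.2, 0.8]]     # emissions per state
ip = [0.5, 0.5]                  # initial state probabilities
obs = [0, 1, 0]                  # 0-based observation symbols

alpha = forward(A, B, ip, obs)
likelihood = sum(alpha[i][-1] for i in range(len(A)))
print(likelihood)
```

Summing the last column of alpha gives the probability of the observation sequence under these parameters.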

Baunsgaard · 2023-07-23T20:36:54Z

scripts/builtin/hmm.dml

+    for (t in (T-1):1) {
+        ot = output_t(X, t)
+        for (i in 1:nr_states) {
+            beta[i, t] = sum(beta[, t+1] * t(A[i, ]) * B[ , ot])


Move beta[,t+1] to the outer loop if possible.
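The same hoisting can be sketched for the backward pass in plain Python (a stand-in for the DML, with made-up parameters). This sketch uses the textbook recursion, which weights beta at t+1 by the emission of the observation at t+1; the quoted diff reads output_t(X, t), and since its indexing convention is not shown, treat that choice as an assumption. The product of beta[,t+1] and the emission column does not depend on i, so it is computed once per time step.

```python
def backward(A, B, obs):
    """Return beta[i][t] for states i and 0-based time steps t."""
    n, T = len(A), len(obs)
    beta = [[1.0] * T for _ in range(n)]    # beta at the last step is 1
    for t in range(T - 2, -1, -1):
        o_next = obs[t + 1]
        # hoisted: this product is the same for every state i
        weighted = [beta[j][t + 1] * B[j][o_next] for j in range(n)]
        for i in range(n):
            beta[i][t] = sum(A[i][j] * weighted[j] for j in range(n))
    return beta

A = [[0.7, 0.3], [0.4, 0.6]]     # transitions, rows sum to 1
B = [[0.9, 0.1], [0.2, 0.8]]     # emissions per state
ip = [0.5, 0.5]                  # initial state probabilities
obs = [0, 1, 0]                  # 0-based observation symbols

beta = backward(A, B, obs)
likelihood = sum(ip[i] * B[i][obs[0]] * beta[i][0] for i in range(len(A)))
print(likelihood)
```

The likelihood recovered from beta should match the one obtained from a forward pass over the same parameters, which makes a convenient cross-check.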

j143 · 2023-12-17T14:46:10Z

Hi @cluelesprogrammer, thanks a lot for contributing to SystemDS. Do you need some help with addressing the review comments?

Your PR looks good as is. Let me know if you would like to address the comments.

Kindly let us know if there is any assistance we could provide.

Regards,
Janardhan

added empty files

2166ed1

Baunsgaard changed the title ~~added empty files~~ [SYSTEMDS-3549] Hidden Markov model builtin Jun 7, 2023

Baunsgaard marked this pull request as draft June 7, 2023 19:39

just func defs and comments

5fc6aa6

Baunsgaard reviewed Jun 9, 2023

changed builtin order and working functions

e2bc786

Baunsgaard mentioned this pull request Jun 16, 2023

[SYSTEMDS-3545] img_cutout_linearized #1845

Closed

Sandesh added 2 commits June 16, 2023 12:33

hmmPredictTest start

83900b1

added hmmPredict test dml file

31d8886

Baunsgaard reviewed Jun 16, 2023

Sandesh added 14 commits June 16, 2023 15:14

fixed function name

f6afde2

added hmm tests

ba030ad

Start of Baum-welch algo

dadb65e

more complete Test

f1fff1b

Removed compilation errors in tests

1b77d3a

spelling change and maybe others

5c01b01

changed file name therefore deleted

e5dc3d2

put in proper return function names

948b92b

working hmmpredict implementation

d3e6ccb

working hmmpredict implementation

9d9c42d

added some comments

44dc75c

some more comments

cb6b9b3

standardized and working

6b3c012

hmm baum welch implementation

69c8e1b

Baunsgaard reviewed Jul 7, 2023

Sandesh and others added 4 commits July 8, 2023 17:07

ouput definition added

947ae7d

changed docu and structure

99a805b

working HMM implementation

0e28eb7

fixed hmm documentation and indents in hmmPredict

6faa754

sb0476679 added 5 commits July 19, 2023 11:03

hmm and hmmPredict work now!!

40d1d48

refactored for reproducibility

948af5b

added more iterations for optimization

7602afc

changed hyper-parameters and deterministic behavior

6046879

the tests should be working now

5f2c864

cluelesprogrammer marked this pull request as ready for review July 23, 2023 20:01

Baunsgaard reviewed Jul 23, 2023

j143 added this to the next-release milestone Feb 8, 2024
