Strange code generated from streamAsVector/streams. #9

pjonsson · 2012-10-29T19:59:18Z

This is a distilled test case that at least shows some inefficient code being generated. Putting it here so that it doesn't get lost.

cexp :: Vector1 WordN -> Vector1 WordN -> Vector (Vector1 WordN)
cexp n = streamAsVector (scan (\p k -> p) n)

I get the generated code:

    uint32_t v43;
    struct array e0 = {0};
    uint32_t e1;
    struct array e2 = {0};
    int e3;

    v43 = getLength(v1);
    initArray(&e0, (0 - sizeof(struct array)), v43);
    e1 = 0;
    initArray(&e2, sizeof(uint32_t), getLength(v0));
    copyArray(&e2, v0);
    for(uint32_t v34 = 0; v34 < v43; v34 += 1)
    {
        e1 = (e1 + 1);
        initArray(&e2, sizeof(uint32_t), getLength(&e2));
        copyArray(&e2, &e2);
        initArray(&at(struct array,&e0,v34), sizeof(uint32_t), getLength(&e2));
        copyArray(&at(struct array,&e0,v34), &e2);
    }
    initArray(out, (0 - sizeof(struct array)), getLength(&e0));
    copyArray(out, &e0);
    freeArray(&e0);
    freeArray(&e2);

e2 is initialized and populated just before the for-loop. Inside the for-loop e2 is re-initialized and copied onto itself. The generated code might be correct, but should in that case not init and copy arrays to themselves. I have done some preliminary tracing and the generated program structure is immediately visible in the output from fromCore.

The text was updated successfully, but these errors were encountered:

emwap · 2012-10-30T07:55:40Z

Thank you. Using arrays as the state in a loop is a known deficiency. The copying of e2 onto itself comes from scan (\p k -> p). I guess we could optimize that...

I assume that \p k -> p is something more complicated in your code. If not, the code for cexp can be expressed as follows:

cexp2 :: Vector1 WordN -> Vector1 WordN -> Vector (Vector1 WordN)
cexp2 v1 v2 = replicate (length v2) v1

pjonsson · 2012-10-30T10:38:33Z

Known deficiency as "does not work" or "is slow"?

My code is more complicated; here's an excerpt from a larger testcase:

initArray(&e3, sizeof(int32_t), 8); at(int32_t,&e3,0) = -32768; ... at(int32_t,&e3,7) = -32768; for(uint32_t v23 = 0; v23 < v35; v23 += 1) { e2 = (e2 + 1); initArray(&e3, sizeof(int32_t), 8); for(uint32_t v29 = 0; v29 < 8; v29 += 1) { at(int32_t,&e3,v29) = at(int32_t,&e3,at(uint32_t,&v36,v29)); } initArray(&at(struct array,&e1,v23), sizeof(int32_t), getLength(&e3)); copyArray(&at(struct array,&e1,v23), &e3); }

So all elements of e3 are initialized to -32768 before the loop. First thing that happens inside the loop is a call to initArray() which might (?) allocate new memory for e3. After that there's a small loop that both read and write elements in e3.

It might work for this test case as well, but is this safe in general? My concern is that we might read uninitialized memory in the first round of the loop if we are unlucky.

emwap · 2012-10-30T11:01:36Z

Known deficiency as "does not work" or "is slow"?

Is slow. The code should be correct but it does too much copying.
My code is more complicated; here's an excerpt from a larger testcase:

initArray(&e3, sizeof(int32_t), 8);
at(int32_t,&e3,0) = -32768;
...
at(int32_t,&e3,7) = -32768;
for(uint32_t v23 = 0; v23 < v35; v23 += 1)
{
e2 = (e2 + 1);
initArray(&e3, sizeof(int32_t), 8);
for(uint32_t v29 = 0; v29 < 8; v29 += 1)
{
at(int32_t,&e3,v29) = at(int32_t,&e3,at(uint32_t,&v36,v29));
}
initArray(&at(struct array,&e1,v23), sizeof(int32_t), getLength(&e3));
copyArray(&at(struct array,&e1,v23), &e3);
}

So all elements of e3 are initialized to -32768 before the loop. First thing that happens inside the loop is a call to initArray() which might (?) allocate new memory for e3. After that there's a small loop that both read and write elements in e3.

It might work for this test case as well, but is this safe in general? My concern is that we might read uninitialized memory in the first round of the loop if we are unlucky.

It should be safe since it relies on the invariant that initArray will only reallocate the buffer if there is not enough room to hold the data. The generated code will in this case always have the same parameters to initArray and thus there will be no reallocation.

It is of course a bit fragile and I will change the classification of this issue to a bug. We need to remove the extra calls to initArray.

pjonsson · 2012-10-31T10:28:54Z

I figured out why I'm worried. Here's a test case:

cexp3 :: Vector1 WordN -> Vector1 WordN -> Vector (Vector1 WordN)
cexp3 n = streamAsVector (scan (\p k -> p ++ p) n)

Generated code:

    initArray(&e2, sizeof(uint32_t), getLength(v0));
    copyArray(&e2, v0);
    for(uint32_t v36 = 0; v36 < v46; v36 += 1) {
        e1 = (e1 + 1);
        initArray(&e2, sizeof(uint32_t), (getLength(&e2) + getLength(&e2)));
        copyArray(&e2, &e2);
        copyArrayPos(&e2, getLength(&e2), &e2);
        initArray(&at(struct array,&e0,v36), sizeof(uint32_t), getLength(&e2));
        copyArray(&at(struct array,&e0,v36), &e2);
    }

The problem is definitely there in the output from fromCore.

emwap · 2012-10-31T11:10:19Z

That is most certainly a bug. The length of e2 will double for each lap.

emwap · 2012-10-31T21:42:12Z

The fix for aliasing in 3046e0b (issue #12), helps for this issue in that the generated code is correct. However, the increased copying really hurts in this example since we are copying arrays.

I will leave this issue open until the extra copying has been removed.

    uint32_t v46;
    struct array v32 = {0};
    uint32_t v33;
    struct array v35 = {0};
    uint32_t v37;
    struct array v39 = {0};

    v46 = getLength(v1);
    initArray(&v32, (0 - sizeof(struct array)), v46);
    v33 = 0;
    initArray(&v35, sizeof(uint32_t), getLength(v0));
    copyArray(&v35, v0);
    for(uint32_t v36 = 0; v36 < v46; v36 += 1)
    {
        v37 = v33;
        v33 = (v37 + 1);
        initArray(&v39, sizeof(uint32_t), getLength(&v35));
        copyArray(&v39, &v35);
        initArray(&v35, sizeof(uint32_t), (getLength(&v39) + getLength(&v39)));
        copyArray(&v35, &v39);
        copyArrayPos(&v35, getLength(&v39), &v39);
        initArray(&at(struct array,&v32,v36), sizeof(uint32_t), getLength(&v39));
        copyArray(&at(struct array,&v32,v36), &v39);
    }

pjonsson · 2012-11-01T13:01:20Z

The code given to feldspar-compiler for cexp7 looks like this:

(bind (getRef var10) (\var12 -> (then (setRef var10 (var12 + 1)) (setMArr var9 var11 (var0 ! var12)))))

So there's a sequence of evaluation order imposed upon us. Looking at the generated code in the previous comment: why do we have to update v33 at the beginning of the loop? If that update was performed at the end of the loop things would look much better. I tried wiggling with the Stream library implementation but I seem unable to change the order of the array update and the setRef.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strange code generated from streamAsVector/streams. #9

Strange code generated from streamAsVector/streams. #9

pjonsson commented Oct 29, 2012

emwap commented Oct 30, 2012

pjonsson commented Oct 30, 2012

emwap commented Oct 30, 2012

pjonsson commented Oct 31, 2012

emwap commented Oct 31, 2012

emwap commented Oct 31, 2012

pjonsson commented Nov 1, 2012

Strange code generated from streamAsVector/streams. #9

Strange code generated from streamAsVector/streams. #9

Comments

pjonsson commented Oct 29, 2012

emwap commented Oct 30, 2012

pjonsson commented Oct 30, 2012

emwap commented Oct 30, 2012

pjonsson commented Oct 31, 2012

emwap commented Oct 31, 2012

emwap commented Oct 31, 2012

pjonsson commented Nov 1, 2012