Data corruption when using db_field

**Summary**
The mongoengine code makes the implicit assumption that db field names and model field names only overlap if they refer to the same field. If this condition is not satisfied, either by explicit model design (test case 1 and 2) or by garbage/old data in the database (test case 3 and 4), all kinds of data corruption happen.

**How to reproduce**
Run the attached file [db_field_test.txt](https://github.com/MongoEngine/mongoengine/files/1663777/db_field_test.txt) with ``python2 -R db_field_test.txt``. The expected (bug free) output would be that in all four test cases f, g, h show the same values. But the actual output is like this:
```
name      x1         x2         y1         y2         z1         z2        
f1        True       None       True       None       True       False     
g1        True       None       None       True       False      None      
h1        True       None       None       True       False      True      

name       x1         x2         y1         y2         z1         z2        
f2         True       None       True       None       True       False     
g2         None       True       None       True       None       False     
h2         True       None       True       None       True       False     

name       w1        
f3         True      
g3         False     

name       w1        
f4         True      
g4         False     
```
Some bugs are dependent on the order iterators return dictionary items, so several runs might be necessary to see the bugs (and that's the reason for the -R flag).

The test program defines a strict / dynamic document with the following fields
```
    w1 = fields.BooleanField(db_field='w2')

    x1 = fields.BooleanField(db_field='x2')
    x2 = fields.BooleanField(db_field='x3')

    y1 = fields.BooleanField(db_field='y0')
    y2 = fields.BooleanField(db_field='y1')

    z1 = fields.BooleanField(db_field='z2')
    z2 = fields.BooleanField(db_field='z1')
```
In each of the four test cases it creates an object f and sets some of the fields to True or False, as shown in the output. The object is saved and loaded again (g). The object h has set the same fields with the same values as f, but using the constructor instead of attribute access (`h = Doc(x1=False, ...)`.
The first and third test cases use strict documents, the second and fourth test cases use a dynamic document.
In the last two test cases, the field 'w1' is set directly in the database to False after f is saved and before g is loaded.

**Analysis**
In the mongoengine code, two patterns are used which work if the above assumption holds, but break down if not:

1. Field names can be converted multiple times to db names (or model names).
2. Model field names and db field names can be mixed in the same data structure.

In `base.document.BaseDocument.__init__`, field names are converted from db names to model names, but the field names should already be model names. This explains h1. This conversion *only happens if the document is strict*, therefore h2 does not show the bug. 

In `base.document.BaseDocument._from_son`, field names are converted to db names. Again, this does not make sense, as the SON object uses db names when loading objects from the database. (As this bug is "repaired" by `__init__` for strict documents, only g2.x1 shows the wrong value and not g1.x1.)
The names/values are copied to `data` dictionary, there they are converted to model names by overwriting `data` in a loop and deleting the db name from it. At this point, the order of the items returned by `field.iteritems()` matters, as this might result in multiple conversions in the `data` dictionary. For example y1 is saved in the database as y0. The conversion does not change it, so it is y0 in `data`. In the loop it is copied to y1. but if `field.iteritems()` returns y2 after y1, the loop treats y1 as the db name of the model field y2, and therefore copies it to `data['y2']`.

The 3rd and 4th test case, the (double) conversion to db names in `base.document.BaseDocument._from_son`is the culprit. In the database, w1 and w2 exists. The latter by saving f, the former by setting it explicitly. Both are converted to db name w2 in a loop over `son.iteritems()`. If w1 was last, the wrong data is loaded.
It is remarkable that this is even the case with strict documents (g3), as fields not defined in a strict model are filtered at the last moment.

If both x1 and x2 are set (test case not included) and saved, this conversion of db names (x2 and x3) would both copy to x3, resulting in only x2 is present after loading, with the value of either the original x1 or x2 (depending of the order of `son.iteritems()`).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Data corruption when using db_field #1730

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Data corruption when using db_field #1730

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions