Skip to content

Data corruption when using db_field #1730

Open
@th-erker

Description

@th-erker

Summary
The mongoengine code makes the implicit assumption that db field names and model field names only overlap if they refer to the same field. If this condition is not satisfied, either by explicit model design (test case 1 and 2) or by garbage/old data in the database (test case 3 and 4), all kinds of data corruption happen.

How to reproduce
Run the attached file db_field_test.txt with python2 -R db_field_test.txt. The expected (bug free) output would be that in all four test cases f, g, h show the same values. But the actual output is like this:

name      x1         x2         y1         y2         z1         z2        
f1        True       None       True       None       True       False     
g1        True       None       None       True       False      None      
h1        True       None       None       True       False      True      

name       x1         x2         y1         y2         z1         z2        
f2         True       None       True       None       True       False     
g2         None       True       None       True       None       False     
h2         True       None       True       None       True       False     

name       w1        
f3         True      
g3         False     

name       w1        
f4         True      
g4         False     

Some bugs are dependent on the order iterators return dictionary items, so several runs might be necessary to see the bugs (and that's the reason for the -R flag).

The test program defines a strict / dynamic document with the following fields

    w1 = fields.BooleanField(db_field='w2')

    x1 = fields.BooleanField(db_field='x2')
    x2 = fields.BooleanField(db_field='x3')

    y1 = fields.BooleanField(db_field='y0')
    y2 = fields.BooleanField(db_field='y1')

    z1 = fields.BooleanField(db_field='z2')
    z2 = fields.BooleanField(db_field='z1')

In each of the four test cases it creates an object f and sets some of the fields to True or False, as shown in the output. The object is saved and loaded again (g). The object h has set the same fields with the same values as f, but using the constructor instead of attribute access (h = Doc(x1=False, ...).
The first and third test cases use strict documents, the second and fourth test cases use a dynamic document.
In the last two test cases, the field 'w1' is set directly in the database to False after f is saved and before g is loaded.

Analysis
In the mongoengine code, two patterns are used which work if the above assumption holds, but break down if not:

  1. Field names can be converted multiple times to db names (or model names).
  2. Model field names and db field names can be mixed in the same data structure.

In base.document.BaseDocument.__init__, field names are converted from db names to model names, but the field names should already be model names. This explains h1. This conversion only happens if the document is strict, therefore h2 does not show the bug.

In base.document.BaseDocument._from_son, field names are converted to db names. Again, this does not make sense, as the SON object uses db names when loading objects from the database. (As this bug is "repaired" by __init__ for strict documents, only g2.x1 shows the wrong value and not g1.x1.)
The names/values are copied to data dictionary, there they are converted to model names by overwriting data in a loop and deleting the db name from it. At this point, the order of the items returned by field.iteritems() matters, as this might result in multiple conversions in the data dictionary. For example y1 is saved in the database as y0. The conversion does not change it, so it is y0 in data. In the loop it is copied to y1. but if field.iteritems() returns y2 after y1, the loop treats y1 as the db name of the model field y2, and therefore copies it to data['y2'].

The 3rd and 4th test case, the (double) conversion to db names in base.document.BaseDocument._from_sonis the culprit. In the database, w1 and w2 exists. The latter by saving f, the former by setting it explicitly. Both are converted to db name w2 in a loop over son.iteritems(). If w1 was last, the wrong data is loaded.
It is remarkable that this is even the case with strict documents (g3), as fields not defined in a strict model are filtered at the last moment.

If both x1 and x2 are set (test case not included) and saved, this conversion of db names (x2 and x3) would both copy to x3, resulting in only x2 is present after loading, with the value of either the original x1 or x2 (depending of the order of son.iteritems()).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions