Description
Summary
The mongoengine code implicitly assumes that db field names and model field names only overlap if they refer to the same field. If this condition is not satisfied, either through explicit model design (test cases 1 and 2) or through garbage/old data in the database (test cases 3 and 4), several kinds of data corruption occur.
How to reproduce
Run the attached file db_field_test.txt with python2 -R db_field_test.txt. The expected (bug-free) output would show the same values for f, g and h in all four test cases. The actual output looks like this:
name x1 x2 y1 y2 z1 z2
f1 True None True None True False
g1 True None None True False None
h1 True None None True False True
name x1 x2 y1 y2 z1 z2
f2 True None True None True False
g2 None True None True None False
h2 True None True None True False
name w1
f3 True
g3 False
name w1
f4 True
g4 False
Some of the bugs depend on the order in which iterators return dictionary items, so several runs might be necessary to see them (that is the reason for the -R flag, which randomizes hashing).
The test program defines a strict and a dynamic document with the following fields:
w1 = fields.BooleanField(db_field='w2')
x1 = fields.BooleanField(db_field='x2')
x2 = fields.BooleanField(db_field='x3')
y1 = fields.BooleanField(db_field='y0')
y2 = fields.BooleanField(db_field='y1')
z1 = fields.BooleanField(db_field='z2')
z2 = fields.BooleanField(db_field='z1')
In each of the four test cases, it creates an object f and sets some of the fields to True or False, as shown in the output. The object is saved and loaded again (g). The object h is given the same fields with the same values as f, but through the constructor instead of attribute access (h = Doc(x1=False, ...)).
The first and third test cases use strict documents, the second and fourth test cases use a dynamic document.
In the last two test cases, the field 'w1' is set directly in the database to False after f is saved and before g is loaded.
Analysis
In the mongoengine code, two patterns are used which work if the above assumption holds, but break down if not:
- Field names can be converted multiple times to db names (or model names).
- Model field names and db field names can be mixed in the same data structure.
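A minimal sketch of why the first pattern is unsafe, using the x fields from the test document. The mapping and the helper to_db_name are hypothetical stand-ins for illustration, not mongoengine's actual code:

```python
# Hypothetical mapping from model field names to db field names,
# copied from the x fields of the test document.
MODEL_TO_DB = {"x1": "x2", "x2": "x3"}


def to_db_name(name):
    """Translate a name to its db name; unknown names pass through unchanged."""
    return MODEL_TO_DB.get(name, name)


once = to_db_name("x1")    # correct db name of x1
twice = to_db_name(once)   # converting again lands on the db name of x2
print(once)   # x2
print(twice)  # x3
```

The conversion is only idempotent if no db name is also a model name of a different field, which is exactly the assumption described in the summary.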
In base.document.BaseDocument.__init__, field names are converted from db names to model names, even though they should already be model names. This explains h1. The conversion only happens if the document is strict, therefore h2 does not show the bug.
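The effect on h1 can be reproduced with plain dictionaries. This is a simplified sketch, assuming the constructor looks up every keyword in a reverse (db name to model name) map. The names DB_TO_MODEL and init_kwargs are hypothetical:

```python
# Field definitions from the test document: model name -> db name.
MODEL_TO_DB = {
    "x1": "x2", "x2": "x3",
    "y1": "y0", "y2": "y1",
    "z1": "z2", "z2": "z1",
}
# Reverse map: db name -> model name.
DB_TO_MODEL = {db: model for model, db in MODEL_TO_DB.items()}


def init_kwargs(**values):
    """Treat constructor keywords as db names and map them to model names."""
    return {DB_TO_MODEL.get(key, key): value for key, value in values.items()}


# Same values as f1, but passed through the (simulated) constructor.
h1 = init_kwargs(x1=True, y1=True, z1=True, z2=False)
```

Here x1 survives because it is not a db name, y1 ends up in y2, and z1 and z2 swap values, which matches the h1 row in the output above.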
In base.document.BaseDocument._from_son, field names are converted to db names. Again, this does not make sense, as the SON object already uses db names when objects are loaded from the database. (As this bug is "repaired" by __init__ for strict documents, only g2.x1 shows the wrong value and not g1.x1.)
The names/values are copied to the data dictionary, where they are converted to model names by overwriting data in a loop and deleting the db name from it. At this point, the order of the items returned by field.iteritems() matters, as it might result in multiple conversions within the data dictionary. For example, y1 is saved in the database as y0. The conversion does not change it, so it is y0 in data. In the loop it is copied to y1. But if field.iteritems() returns y2 after y1, the loop treats y1 as the db name of the model field y2, and therefore copies it to data['y2'].
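The order dependence of the in-place renaming can be shown with a small simulation. This is a sketch of the pattern described above, not the actual mongoengine loop. The function rename_in_place and the explicit field orderings are assumptions for illustration:

```python
def rename_in_place(data, fields_in_order):
    """Rename db keys to model keys inside one dict, in the given field order."""
    for model_name, db_name in fields_in_order:
        if db_name != model_name and db_name in data:
            data[model_name] = data[db_name]
            del data[db_name]
    return data


stored = {"y0": True}  # y1=True was saved under its db name y0

# y2 is visited before y1: y0 is renamed exactly once.
good = rename_in_place(dict(stored), [("y2", "y1"), ("y1", "y0")])

# y1 is visited before y2: y0 becomes y1, then y1 is mistaken
# for the db name of y2 and renamed a second time.
bad = rename_in_place(dict(stored), [("y1", "y0"), ("y2", "y1")])

print(good)  # {'y1': True}
print(bad)   # {'y2': True}
```

The second ordering reproduces the g1 row, where y1 is None and y2 is True.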
In the 3rd and 4th test cases, the (double) conversion to db names in base.document.BaseDocument._from_son is the culprit. In the database, both w1 and w2 exist: the latter from saving f, the former from setting it explicitly. Both are converted to the db name w2 in a loop over son.iteritems(). If w1 comes last, the wrong data is loaded.
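A minimal simulation of this collision, assuming the conversion simply looks up every key of the stored document in the model-to-db map. The helper convert_keys is hypothetical and only mirrors the pattern described above:

```python
# Field definition from the test document: model name w1 -> db name w2.
MODEL_TO_DB = {"w1": "w2"}


def convert_keys(son_items):
    """Map every stored key to a db name, as if it were a model name."""
    data = {}
    for key, value in son_items:
        data[MODEL_TO_DB.get(key, key)] = value
    return data


# Stored document: w2=True written by saving f, w1=False written directly.
w1_first = convert_keys([("w1", False), ("w2", True)])
w1_last = convert_keys([("w2", True), ("w1", False)])

print(w1_first)  # {'w2': True}
print(w1_last)   # {'w2': False}
```

When the stray w1 key is processed last, its value overwrites the legitimate w2 entry, which corresponds to g3 and g4 showing False.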
It is remarkable that this happens even with strict documents (g3), as fields not defined in a strict model are filtered out only at the last moment.
If both x1 and x2 are set (test case not included) and saved, the conversion of the db names (x2 and x3) would copy both to x3. As a result, only x2 is present after loading, with the value of either the original x1 or the original x2 (depending on the order of son.iteritems()).
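Both stages can be chained in a short simulation of this last scenario. It is a simplified sketch under the assumptions stated in the analysis. The function load and the list FIELDS are hypothetical names, and the values x1=True, x2=False are chosen only for illustration:

```python
# x fields from the test document as (model name, db name) pairs.
FIELDS = [("x1", "x2"), ("x2", "x3")]
MODEL_TO_DB = dict(FIELDS)


def load(raw_items):
    """Simulate loading: convert keys to db names, then rename to model names."""
    # Stage 1: stored keys are treated as model names and mapped to db names.
    data = {}
    for key, value in raw_items:
        data[MODEL_TO_DB.get(key, key)] = value
    # Stage 2: db names are renamed to model names in place.
    for model_name, db_name in FIELDS:
        if db_name in data:
            data[model_name] = data.pop(db_name)
    return data


# Saving x1=True, x2=False stores {"x2": True, "x3": False}.
x2_first = load([("x2", True), ("x3", False)])
x3_first = load([("x3", False), ("x2", True)])

print(x2_first)  # {'x2': False}
print(x3_first)  # {'x2': True}
```

In both orderings x1 is lost, and x2 carries whichever stored value was processed last.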