
Model gives inaccurate results post conversion to tflite #685

Closed
AD-lite24 opened this issue Sep 1, 2024 · 9 comments
Labels
OP:Add OP:Sub

Comments


AD-lite24 commented Sep 1, 2024

Issue Type

Others

OS

Linux

onnx2tf version number

1.25.7

onnx version number

1.16.2

onnxruntime version number

1.18.1

onnxsim (onnx_simplifier) version number

0.4.36

tensorflow version number

2.17.0

Download URL for ONNX

https://huggingface.co/onnx-community/metric3d-vit-small/blob/main/onnx/model.onnx

Parameter Replacement JSON

N/A

Description

  1. To deploy a monodepth model on edge devices. This is R&D work and problem exploration, with massive impact since nothing except TFLite seems to work with Snapdragon SoCs.

  2. The model outputs are not correct at all. I did a lot of inspection; here are my findings.

Here are the input details for the tflite conversion

input_details = [{
    'name': 'pixel_values',
    'index': 0,
    'shape': np.array([1, 480, 640, 3], dtype=np.int32),
    'shape_signature': np.array([1, 480, 640, 3], dtype=np.int32),
    'dtype': <class 'numpy.float32'>,
    'quantization': (0.0, 0),
    'quantization_parameters': {
        'scales': np.array([], dtype=np.float32),
        'zero_points': np.array([], dtype=np.int32),
        'quantized_dimension': 0
    },
    'sparsity_parameters': {}
}]

and here are the output details


output_details = [
    {
        'name': 'Identity',
        'index': 1765,
        'shape': np.array([1, 476, 628], dtype=np.int32),
        'shape_signature': np.array([1, 476, 628], dtype=np.int32),
        'dtype': <class 'numpy.float32'>,
        'quantization': (0.0, 0),
        'quantization_parameters': {
            'scales': np.array([], dtype=np.float32),
            'zero_points': np.array([], dtype=np.int32),
            'quantized_dimension': 0
        },
        'sparsity_parameters': {}
    },
    {
        'name': 'Identity_1',
        'index': 1785,
        'shape': np.array([1, 3, 476, 628], dtype=np.int32),
        'shape_signature': np.array([1, 3, 476, 628], dtype=np.int32),
        'dtype': <class 'numpy.float32'>,
        'quantization': (0.0, 0),
        'quantization_parameters': {
            'scales': np.array([], dtype=np.float32),
            'zero_points': np.array([], dtype=np.int32),
            'quantized_dimension': 0
        },
        'sparsity_parameters': {}
    },
    {
        'name': 'Identity_2',
        'index': 1784,
        'shape': np.array([1, 476, 628], dtype=np.int32),
        'shape_signature': np.array([1, 476, 628], dtype=np.int32),
        'dtype': <class 'numpy.float32'>,
        'quantization': (0.0, 0),
        'quantization_parameters': {
            'scales': np.array([], dtype=np.float32),
            'zero_points': np.array([], dtype=np.int32),
            'quantized_dimension': 0
        },
        'sparsity_parameters': {}
    }
]
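
For reference, a minimal sketch of how dumps like these can be produced (the model file name is an assumption):

import tensorflow as tf
from pprint import pprint

# Hypothetical file name for the converted model
interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
interpreter.allocate_tensors()
pprint(interpreter.get_input_details())
pprint(interpreter.get_output_details())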

Upon inspection of the ONNX file, the ONNX model has 3 outputs (see the quick dimension check after this list):

  • predicted_depth: tensor: float32[batch_size, 4*floor(3.5*floor(height/14)), 4*floor(3.5*floor(width/14))]
  • predicted_normal: tensor: float32[batch_size, 3, 4*floor(3.5*floor(height/14)), 4*floor(3.5*floor(width/14))]
  • normal_confidence: tensor: float32[batch_size, 4*floor(3.5*floor(height/14)), 4*floor(3.5*floor(width/14))]
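
A quick check that, for the fixed 480×640 input, these formulas give exactly the 476×628 spatial shapes seen in the TFLite output details above:

import math

def out_dim(d: int) -> int:
    # 4 * floor(3.5 * floor(d / 14)), taken from the ONNX output shape annotation
    return 4 * math.floor(3.5 * math.floor(d / 14))

print(out_dim(480), out_dim(640))  # 476 628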

So in the tflite file, Identity, Identity_1, and Identity_2 each correspond to one of these. For predicted depth it could be either Identity or Identity_2; I tried both, but neither gives accurate results at all.

Identity gives values in the range of [-5000, -1000], which does not seem plausible for either confidence or depth values, while
Identity_2 gives values in the range of [10, 50], which seems more reasonable but is still not accurate.

I am not sure whether I was supposed to follow pre- or post-processing steps different from the ONNX pipeline. TFLite often needs different steps, but I don't know exactly what they are.

This is an example of running inference from the ONNX file, which works absolutely fine. Did the conversion process break it, or is there something additional I need to do to fix the results?

Please also find the reference to an old issue which helped me with the conversion process.

I also created a Colab notebook to make it easier to see the inferences from the ONNX file. For the same image bus.jpg, the range of values with ONNX is [4.7, 24.7].

PINTO0309 commented Sep 1, 2024

As I commented in the previous issue, if you find it troublesome to correct the Transpose, change the input resolution of the model to a fixed resolution. You should start by checking for precision errors yourself, making sure to include the -cotof option when converting. onnx2tf is imperfect when it comes to converting dynamic tensors.
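
For example, a hedged sketch of such a conversion command (assuming the -ois/--overwrite_input_shape option from the README is used to pin the input shape; adjust the file name and input name to your case):

onnx2tf -i metric3d-vit-small.onnx -ois pixel_values:1,3,480,640 -cotof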

There are hundreds of issues in this repository with the same question, so it's a good idea to search for the issue first.

  1. https://github.com/PINTO0309/onnx2tf/issues?q=label%3A%22Dynamic+batch+%2F+Dynamic+shape%22+
  2. https://github.com/PINTO0309/onnx2tf/issues?q=label%3A%22Parameter+replacement%22+
  3. https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#parameter-replacement

It's mentally painful to be asked to answer the same thing over and over again. I might delete the Issues tab soon.

PINTO0309 added the Dynamic batch / Dynamic shape and Parameter replacement labels Sep 1, 2024

AD-lite24 commented Sep 1, 2024

Really sorry about this. Referring back to the previous issue, the model input resolution was fixed to [1, 3, 480, 640], so I believed the dynamic input size was no longer a problem. With -cotof I can see a pretty bad divergence, with an abs error of 4340.93 in the final output. But from that step I thought the input tensors were no longer dynamic, and I am reshaping my images according to the fixed tensor input.

Regardless, I do not wish to take up any more of your time; I will try to figure out why the values are diverging.


PINTO0309 commented Sep 1, 2024

Using the -cotof option will probably tell you which OPs the conversion got wrong.


Your model is a ViT model with a huge number of parameters, so the auto-correction by onnx2tf may be skipped.

total_output_size: int = 0
for gs_graph_output in gs_graph.outputs:
    op_output_size: int = 1
    if gs_graph_output.shape is not None:
        for s in gs_graph_output.shape:
            if isinstance(s, int):
                op_output_size *= s
    # Total bytes
    total_output_size += op_output_size * dtype_sizes.get(gs_graph_output.dtype, 4)
# When exact inference mode is enabled and the total size of the tensor of inference results exceeds approximately 80% of available RAM
mem_available = psutil.virtual_memory().available * 0.80 // 1024 // 1024 // 1024
total_output_size_gb = (total_output_size // 1024 // 1024 // 1024)
if (not disable_strict_mode and total_output_size_gb > mem_available):
    if tmp_onnx_path:
        os.remove(tmp_onnx_path)
        os.remove(tmp_onnx_external_weights_path)
    raise Exception(
        f'The tool skipped dummy inference to avoid SWAP processing because the total size of the tensor of inference results exceeded about {mem_available} GB. (results: {total_output_size_gb} GB)'
    )

The dummy inference function is necessary to automatically correct model conversion errors, but it consumes a large amount of RAM for models with large structures.

https://github.com/PINTO0309/onnx2tf/tree/main?tab=readme-ov-file#3-accuracy-check


onnx2tf -i metric3d-vit-small.onnx -cotof


Your model appears to consume 130GB of RAM for auto-correction. MatMul has too many elements.


onnx2tf seems to make a mysterious and fatal error in the constant calculations in this part. There may be a problem in the optimization of two consecutive Sub ops, i.e. y = (200 - x) - 200.
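
For illustration only (a minimal NumPy sketch of the algebra being folded, not onnx2tf's actual optimizer code): folding two consecutive Sub ops y = (c1 - x) - c2 should reduce to (c1 - c2) - x, and a sign slip in that constant fold would produce exactly this kind of large constant offset.

import numpy as np

x = np.random.rand(4).astype(np.float32)
c1, c2 = np.float32(200.0), np.float32(200.0)

reference = (c1 - x) - c2          # y = (200 - x) - 200, i.e. -x
correct_fold = (c1 - c2) - x       # constants folded first: 0 - x
wrong_fold = (c1 + c2) - x         # hypothetical sign slip: off by 400

print(np.allclose(reference, correct_fold))  # True
print(np.allclose(reference, wrong_fold))    # False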


It's probably a bug in the optimization process of arithmetic operations. Sorry for blaming you so much.

PINTO0309 added the Bug label and removed the Parameter replacement and Dynamic batch / Dynamic shape labels Sep 1, 2024

PINTO0309 commented Sep 1, 2024

  • Tests - OK - The error of the final output was less than 1e-4.


AD-lite24 commented Sep 1, 2024

Ah, that is amazing. Brilliant as always. The results have improved significantly compared to the gibberish output from before. The values are still not accurate in real-world tests, though, and the high-frequency features were missed by the converted file.

Do you know how we can find out how the pre- and post-processing steps change after conversion? Maybe the image needs normalisation as well, or the output needs some sort of scaling? The ONNX range of values was [4.7, 24.7] while the TFLite range is [0.68631876, 11.6238165], and with metric depth such a disparity matters. This does seem at odds with the calculated error of 1e-4, but I am not sure. I will run some more experiments to figure out the exact disparity.
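
For reference, a minimal sketch of such an experiment (assuming onnxruntime is installed; file names are placeholders): feed both runtimes the exact same all-ones tensor, so that any difference comes from the conversion itself rather than from pre/post-processing.

import numpy as np
import onnxruntime as ort
import tensorflow as tf

# Same dummy input for both models (all ones, independent of real preprocessing)
x_nchw = np.ones([1, 3, 480, 640], dtype=np.float32)  # ONNX expects NCHW
x_nhwc = x_nchw.transpose(0, 2, 3, 1)                  # TFLite expects NHWC

sess = ort.InferenceSession("model.onnx")
onnx_depth = sess.run(None, {"pixel_values": x_nchw})[0]  # first ONNX output: predicted_depth

interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], x_nhwc)
interpreter.invoke()
# Note: the TFLite output order may not match the ONNX output order (see the discussion below)
tflite_depth = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])

print("max abs diff:", np.abs(onnx_depth - tflite_depth).max())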

Anyway, this is certainly nothing you should worry about; you have helped me a lot. Maybe the model simply cannot be converted with very high precision. Again, thanks a lot for taking the time to fix these issues!


PINTO0309 commented Sep 1, 2024

I can't say anything for sure, just guessing how you plan to use the model, but here are some common patterns of loss of accuracy that can occur after conversion to TensorFlow:

  1. If you eventually quantize the model to INT8/Float16, accuracy is likely to degrade if the normalization step is performed at the beginning of the model. This is because the results vary greatly depending on the type of calibration performed during quantization; doing the normalization in preprocessing instead does not necessarily result in a loss of accuracy.
  2. AveragePool has to be converted quite roughly because there is practically no OP in TensorFlow that performs the equivalent operation. Therefore, large errors may occur due to differences in padding processing and rounding at the edges of the image. There is a devastating specification difference between PyTorch/ONNX and TensorFlow in padding processing.
  3. Certain operations (OPs) have a significant divergence between TensorFlow's internal implementation and ONNX and PyTorch's internal implementation. This may involve minor effects such as the internal rounding of numbers being different between ONNX and TensorFlow, or it may be fatal, with a bug on the TensorFlow side having been left unfixed for a long time.
  4. Even if you convert the formula into a completely compatible pattern, there may be a large error that cannot be tolerated. This is also an issue with the internal implementation of ONNX.
  5. I always check that the error of the final output converges to around 1e-4, but you may want to check the per-OP errors reported by the -cotof option yourself. The reason it is better to re-check the -cotof accuracy results is that large errors may be rounded or flattened in the OP processing of Softmax, Convolution, or Pooling. However, since the structure of the ViT model is very large, the amount of accuracy-check output produced by -cotof becomes enormous, and it is very difficult to visually check the error results of all OPs.
  6. TensorFlow Lite's converter sometimes optimizes models arbitrarily, most notably by replacing Div with Mul operations. This may seem like a simple change, replacing division with multiplication by the reciprocal, but in practice, due to TensorFlow's internal rounding, errors often occur when an OP that was originally expressed as a division is rewritten to be processed with Mul. You can see this by looking closely at the results of converting your ViT model with -cotof: errors occur in all Div OPs. Sqrt, which has similar internal behavior, is also prone to errors. (See the small float32 demo after this list.)
  7. TFLiteConverter does not guarantee the order of input and output OPs of models generated by Keras. Therefore, when a model with multiple input OPs or multiple output OPs is converted to tflite, the meaning of the input and output order in ONNX and the meaning of the input and output order in tflite may be randomly swapped. To deal with such a strange specification of TFLiteConverter, which may seem like a bug, onnx2tf implements an option called -coion, which writes an inferable signature into the model using input and output names. By using interpreter.get_signature_runner(), you can match input tensors and output tensors using the model's input and output names, so processing can be performed normally even if the input and output order is broken.
    https://github.com/PINTO0309/onnx2tf?tab=readme-ov-file#14-inference-with-dynamic-tensors-in-tflite
    • e.g.
      import numpy as np
      import tensorflow as tf
      from pprint import pprint
      
      interpreter = tf.lite.Interpreter(model_path="saved_model/osnet_x0_25_msmt17_float32.tflite")
      tf_lite_model = interpreter.get_signature_runner()
      inputs = {
          'images': np.ones([5,256,128,3], dtype=np.float32),
      }
      tf_lite_output = tf_lite_model(**inputs)
      print(f"[TFLite] Model Predictions shape: {tf_lite_output['output'].shape}")
      print(f"[TFLite] Model Predictions:")
      pprint(tf_lite_output)
    • Your ONNX and TFLite with -coion
      https://github.com/PINTO0309/onnx2tf/releases/download/1.25.9/metric3d-vit-small-with-coion.zip
      import numpy as np
      import tensorflow as tf
      from pprint import pprint
      
      interpreter = tf.lite.Interpreter(model_path="metric3d-vit-small_float32.tflite")
      tf_lite_model = interpreter.get_signature_runner()
      inputs = {
          'pixel_values': np.ones([1,480,640,3], dtype=np.float32),
      }
      tf_lite_output = tf_lite_model(**inputs)
      print(f"[TFLite] Model Predictions shape.1: {tf_lite_output['predicted_depth'].shape}")
      print(f"[TFLite] Model Predictions shape.2: {tf_lite_output['predicted_normal'].shape}")
      print(f"[TFLite] Model Predictions shape.3: {tf_lite_output['normal_confidence'].shape}")
      
      ###### Input/output order is irrelevant
      # print(f"[TFLite] Model Predictions shape.1: {tf_lite_output['predicted_normal'].shape}")
      # print(f"[TFLite] Model Predictions shape.2: {tf_lite_output['normal_confidence'].shape}")
      # print(f"[TFLite] Model Predictions shape.3: {tf_lite_output['predicted_depth'].shape}")
      
      print(f"[TFLite] Model Predictions:")
      pprint(tf_lite_output)
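
As a small illustration of point 6 (a toy float32 demo, not code from the converter): dividing by a constant and multiplying by its reciprocal do not produce bit-identical results for many inputs.

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000, dtype=np.float32)
d = np.float32(7.0)

div = x / d                      # the original Div
mul = x * (np.float32(1.0) / d)  # the converter-style rewrite: multiply by the reciprocal

print("fraction of elements that differ:", np.mean(div != mul))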

Note that onnx2tf fixes all elements to 1 when performing accuracy checks. Do not use real images or test data. This is because the type of input data is not known.

If the error check using the -cotof option with all test data set at 1 converges to an error of around 1e-4, the inference results of your test code should also be within an error of around 1e-4 for all elements.

The onnx range of values were [4.7, 24.7] and the tflite range is [0.68631876, 11.6238165] and with metric depth such disparity matters.

Therefore, as you point out, a situation where the final output values of the two models differ by more than a factor of 10 is clearly not a problem with the models themselves.

PINTO0309 pinned this issue Sep 1, 2024
PINTO0309 removed the Bug label Sep 1, 2024
AD-lite24 commented:

If you eventually quantize the model to INT8/Float16, accuracy is likely to degrade if the normalization step is performed at the beginning of the model. This is because the results vary greatly depending on the type of calibration performed during quantization; doing the normalization in preprocessing instead does not necessarily result in a loss of accuracy.

Makes sense. I am using float32 here, but yes, it still holds true.

Thanks for the detail; I get why conversion is so hard, especially for these larger models. So from what I can tell it is just not that easy to convert this particular model accurately, and any arbitrary changes made by TFLite can't be predicted (though normalization is not a big factor).

Therefore, as you point out, a situation where the final output values of the two models differ by more than a factor of 10 is clearly not a problem with the models themselves.

So the issue is TFLite? The values are close but not accurate enough, yet that can't be explained since the -cotof test gives an error of less than 1e-4. Maybe some depth scaling? I will figure out the scale factor if there is one. Thanks a bunch!
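
If it does turn out to be a pure scale difference, a rough way to estimate it (a hypothetical helper for illustration, not something provided by onnx2tf):

import numpy as np

def estimate_global_scale(ref_depth: np.ndarray, test_depth: np.ndarray) -> float:
    """Median per-pixel ratio; roughly constant if the two maps differ only by a global scale."""
    return float(np.median(ref_depth / np.clip(test_depth, 1e-6, None)))

# Toy usage: a pure 2x scale difference is recovered
ref = np.random.rand(476, 628).astype(np.float32) * 20.0 + 5.0
test = ref / 2.0
print(estimate_global_scale(ref, test))  # -> 2.0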

PINTO0309 commented:

I have to attend a conference for the next three days, so my investigation and definitive answer will be a little delayed.

PINTO0309 unpinned this issue Sep 6, 2024
PINTO0309 pinned this issue Sep 6, 2024