Wrong parsing of numeric values as part of units #383

Open
bergsmat opened this issue Feb 4, 2025 · 15 comments

bergsmat commented Feb 4, 2025

It seems like strings representing integers cannot be coerced to units (except for 1) and "dot" means exponentiation (except after zero). Can you point me to the documentation? Thanks in advance.

> as_units('1')
1 [1]
> as_units('2')
Error in as_units.call(expr, check_is_valid = check_is_valid) : 
  is.language(x) is not TRUE
> as_units('2.1')
1 [2.]
> as_units('2.2')
1 [2.^2]
> as_units('2.3')
1 [2.^3]
> as_units('0.3')
Error: In ‘`0.`^(3)’, ‘0.’ is not recognized by udunits.
Enchufa2 (Member) commented Feb 4, 2025

Sure, in ?as_units:

‘as_units’, a generic with methods for a character string and
for quoted language. Note, direct usage of this function by
users is typically not necessary, as coercion via ‘as_units’
is automatically done with ‘units<-’ and ‘set_units’.

:) Basically, if you use it, use it in the same way as set_units, i.e. with unit strings, not numbers. The only exception is 1, which has a special meaning in udunits2: it means unitless.

bergsmat (Author) commented Feb 4, 2025

Thanks. I read the help for ?set_units. I didn't see text or examples indicating that a dot implies exponentiation (nor in the vignettes). I'm not familiar with that convention, so I'm just trying to reassure myself that this is so. It becomes relevant, say, in physiology, where estimated glomerular filtration rate is commonly expressed in ml/min/1.73m^2, which could be problematic if not handled delicately.

Enchufa2 (Member) commented Feb 5, 2025

Wow, that's some unit! :) Question: do you mean ml/min/1.73/m^2, in other words, ml/min/(1.73m^2)? Or do you mean exactly ml/min/1.73m^2?

I'm not sure we meant the dot to imply anything. Parsing units is a complex problem, and we probably just didn't consider a case like this. We'll need to check the parser and adapt it to support this.

Enchufa2 (Member) commented Feb 5, 2025

In fact, when there is an actual unit after the number, the problem is a different one:

# this is wrong: "1.73m" appears twice in the denominator
unclass(as_units("ml/min/1.73m^2"))
#> [1] 1
#> attr(,"units")
#> $numerator
#> [1] "ml"
#> 
#> $denominator
#> [1] "1.73m" "1.73m" "min"  
#> 
#> attr(,"class")
#> [1] "symbolic_units"

Enchufa2 added the bug label on Feb 5, 2025.
Enchufa2 changed the title from "How does as_units() treat the dot in a character string?" to "Wrong parsing of numeric values as part of units" on Feb 5, 2025.
Enchufa2 (Member) commented Feb 5, 2025

Some more digging:

library(units)
#> udunits database from /usr/share/udunits/udunits2.xml

# supported by udunits2, but I think this is NOT what we want
units:::R_ut_format(units:::R_ut_parse("ml/min/1.73m^2"))
#> [1] "9.63391136801542e-09 m⁵·s⁻¹"

# supported too
units:::R_ut_format(units:::R_ut_parse("ml/min/1.73/m^2"))
#> [1] "9.63391136801542e-09 m·s⁻¹"

# avoid our parsing
x <- structure(
  1, units = structure(
    list(numerator = "ml/min/1.73/m^2", denominator = NULL),
    class="symbolic_units"), 
  class="units")

# conversion works as expected
set_units(x, "m/s")
#> 9.633911e-09 [m/s]

# fine... in a way
unclass(as_units("ml/min/1.73/m^2"))
#> [1] 0.5780347
#> attr(,"units")
#> $numerator
#> [1] "ml"
#> 
#> $denominator
#> [1] "m"   "m"   "min"
#> 
#> attr(,"class")
#> [1] "symbolic_units"

# but we ignore the number
set_units(1, "ml/min/1.73/m^2")
#> Warning in `units<-.numeric`(`*tmp*`, value = as_units(value, ...)): numeric
#> value 0.578034682080925 is ignored in unit assignment
#> 1 [ml/m^2/min]
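The numbers in this output can be checked by hand. Taking 1 ml = 10⁻⁶ m³ and 1 min = 60 s, the factor udunits2 reports for "ml/min/1.73/m^2" is ml/min divided by 1.73 m², and the 0.5780347 that set_units() warns about is simply 1/1.73:

$$
\frac{1\ \mathrm{ml\,min^{-1}}}{1.73\ \mathrm{m^2}}
= \frac{10^{-6}\ \mathrm{m^3}}{60\ \mathrm{s}\times 1.73\ \mathrm{m^2}}
\approx 9.634\times 10^{-9}\ \mathrm{m\,s^{-1}},
\qquad
\frac{1}{1.73}\approx 0.5780347
$$

The string "ml/min/1.73m^2" yields the same factor but in m⁵·s⁻¹, which suggests udunits2 reads it left to right as (ml/min/1.73)·m^2 rather than ml/min/(1.73 m^2).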

bergsmat (Author) commented Feb 5, 2025

@Enchufa2 To your question above, I think I mean ml/min/(1.73m^2). One might express filtration rate as volume per time. Since values vary by body size, a common practice is to express the result relative to the body surface area of a reference adult (63 kg body weight, 1.7 m height). Literally: "milliliters per minute per 1.73 square meters of body surface area". https://en.wikipedia.org/wiki/Glomerular_filtration_rate
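Until the parser handles a numeric factor inside a unit string, one possible workaround is to give the reference area its own unit name so that no number appears in the string. This is a minimal sketch, not something proposed in the thread: it assumes the units package's install_unit() and remove_unit() (units >= 0.7), and the symbol refBSA is a made-up name for 1.73 m², not a standard udunits2 unit.

```r
library(units)

# Hypothetical unit symbol "refBSA": the 1.73 m^2 reference body surface area.
# The name is an illustrative assumption, not a udunits2 unit.
install_unit("refBSA", "1.73 m^2")

# eGFR per reference body surface area; the unit string contains no number
egfr <- set_units(90, "ml/min/refBSA")
print(egfr)

# Convert to SI: 90 * 1e-6 m^3 / 60 s / 1.73 m^2, about 8.67e-07 m/s
egfr_si <- set_units(egfr, "m/s")
print(egfr_si)

# Remove the custom unit again when it is no longer needed
remove_unit("refBSA")
```

Defining the factor as a named unit keeps the conversion exact inside udunits2, at the cost of having to register the symbol in every session.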

Enchufa2 (Member) commented Feb 5, 2025

I suspected that much, thanks for confirming. We'll look into this to support this use case.

pepijn-devries commented

I think this is also related to the issue raised here: a negative exponent in scientific notation is not parsed and throws an error:

library(units)
#> udunits database from C:/ProgramData/R/win-library/4.3/units/share/udunits/udunits2.xml
set_units(1, "1e1g")
#> 1 [1e1g]
set_units(1, "1e+1g")
#> 1 [1e+1g]
set_units(1, "0.1g")
#> 1 [0.1g]
set_units(1, "1e-1g")
#> Error: cannot convert g into 1e
#> Did you try to supply a value in a context where a bare expression was expected?

Created on 2025-03-05 with reprex v2.0.2

Enchufa2 (Member) commented Mar 6, 2025

The fact that it gets through with a positive exponent is just an artifact. Scientific notation is neither supported nor on the roadmap.

pepijn-devries commented

OK, good to know. I will create a workaround for this myself for my purposes.
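One shape such a workaround could take is to split a leading numeric factor, including scientific notation such as "1e-5", off the unit string, multiply it into the value, and hand only the remaining unit to set_units(). This is a minimal base-R sketch under the assumption that the number, if present, comes first in the string; split_numeric_prefix() is a hypothetical helper, not part of units.

```r
# Hypothetical helper (not part of the units package): split a leading numeric
# factor, including scientific notation, off each unit string.
# Assumes any number appears at the very start of the string.
split_numeric_prefix <- function(x) {
  num_re <- "^\\s*([+-]?(?:[0-9]+\\.?[0-9]*|\\.[0-9]+)(?:[eE][+-]?[0-9]+)?)\\s*(.*)$"
  has_num <- grepl(num_re, x, perl = TRUE)

  multiplier <- rep(1, length(x))
  unit <- x
  multiplier[has_num] <- as.numeric(sub(num_re, "\\1", x[has_num], perl = TRUE))
  unit[has_num] <- sub(num_re, "\\2", x[has_num], perl = TRUE)

  # "1e+6/ul" leaves "/ul"; turn it into the valid unit string "1/ul"
  unit <- sub("^/", "1/", unit)
  # A bare number leaves an empty unit; treat it as unitless "1"
  unit[unit == ""] <- "1"

  data.frame(multiplier = multiplier, unit = unit, stringsAsFactors = FALSE)
}

res <- split_numeric_prefix(c("1e-1g", "1e-5 g/l", "1e+6/ul", "0.1g", "mg/L"))
print(res)
```

Each value would then be multiplied by its multiplier before the cleaned unit string is passed to set_units(). Numbers in the middle of a string, such as "ug/100 g bdwt", are deliberately left alone by this sketch.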

Enchufa2 (Member) commented Mar 6, 2025

What's your use case, BTW?

pepijn-devries commented

It's a package that creates a database from publicly available data (see link below). It includes tables with units for numerical data. However, these text fields are not standardised and contain a lot of annotations and inconsistencies. So I'm writing a function that sanitises these text fields before parsing them with the units package. This is the code so far, but it is a work in progress:

https://github.com/pepijn-devries/ECOTOXr/blob/units/R/process_unit.r

The intention is to create a column with mixed_units, then allow the user to convert the mixed units to a specific unit (if possible) or return NA if they are not convertible. Some unit conversions will require additional information (like molar mass or solvent density), but that will be hard to automate, I guess.

Enchufa2 (Member) commented Mar 6, 2025

But what does the data look like?

I'm asking because, if you are trying to parse data with units, the quantities package provides parsers.

pepijn-devries commented

Thanks for pointing this out. I think these parsers assume consistent formatting of the input, whereas my input is a lot messier and not always consistent. That's why I need to do some tidying before parsing.

Below is a random sample of 50 units from the database:

 [1] "CHLA:CHLB"       "g/1.8 kg soil"   "tillers/m2"      "cfu/ml"          "AI ng/mg bdwt"   "umol/L"          "mg %"           
 [8] "1e+6/ul"         "nmol/org/0.5 h"  "g/m2 soil"       "mg/kg bdwt/wk"   "umhos/cm2"       "ug/egg"          "mBq/ml"         
[15] "1e-5 g/l"        "nCi/L"           "litter %"        "wk"              "mg/g clay"       "uM/h/mg pro"     "ml/64 m2"       
[22] "ml/wk"           "AI mmol/L"       "ug/100 g bdwt"   "umol/L"          "mmol/100 g bdwt" "BH"              "ae mg/org"      
[29] "%ML"             "ml/10 kg diet"   "NT"              "umol/kg bdwt"    "AI ng/eu"        "pmol/mg/d"       "pt/40 gal"      
[36] "sst"             "AI g/dn(Std)"    "mmol/L soil"     "dS/m"            "g/linear ft"     "ad/jv"           "mg/g pro"       
[43] "FTS:PLC"         "g/jv"            "ug/lf"           "cmol+/kg"        "OD/mi/mg pro"    "g/1.2 kg soil"   "pair"           
[50] "bt"             

Enchufa2 (Member) commented Mar 6, 2025

I see... Yes, quantities' parsers have some flexibility, but otherwise assume consistency, so they cannot help here.
