pH harmonization issues #454
Comments
Potentially similar issues were observed with total nitrogen in the same example data set.
I think other characteristics in the "Physical" characteristic type group should be reviewed as well.
@cefergus and @wokenny13 - should suspended fraction pH results be harmonized with all other pH results? This is the current harmonization table, where the included instances for pH all assume no fraction.
I am not too familiar with the fraction components for pH in terms of what is commonly used by other organizations/states for analysis. Was "suspended" recently added as an allowable fraction text for "pH"? Does the current harmonization table get updated regularly? How does that update process work? Do we know what percentage of pH results have the fraction "suspended"? What is the logic behind assuming the fraction is NA for dissolved and total pH? If dissolved and total are being harmonized to NA, perhaps it makes sense to harmonize suspended to NA as well. But I am not sure what is best.
I found the "suspended" pH results in a test data set I downloaded from the WQP (see first post in this issue). There are quite a few combinations of characteristic/fraction/speciation that are not included in the current harmonization reference table. The ref table was created by pulling the most common combinations of characteristic/fraction/speciation from WQX (see this related issue for more details: #319). I am also working on addressing issue 319 to update/add more combinations to the reference table. I am not sure what percentage of pH results have the fraction suspended, and I can't come up with an easy/efficient way to determine that. I'm currently running a modified version of the new combo script from issue 319 and looping it over 100 random datasets to generate a list of combinations that are not currently in the harmonization reference table. I can let you know how many data sets the pH/suspended combination pops up in. I am looking through previous issues and documentation to see if I can find any discussion re: NA fraction for pH. I haven't found any yet, but I will link it here if I do!
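The check described in this comment (listing characteristic/fraction/speciation combinations that are missing from the reference table, and measuring how often pH carries a suspended fraction) could be sketched in base R roughly as follows. This is only a minimal sketch on made-up data: the column names `characteristic`, `fraction`, and `speciation` are hypothetical placeholders, not the actual TADA or WQX column names, and it is not the script from issue 319.

```r
# Toy results table; values are illustrative only, not real WQP data.
results <- data.frame(
  characteristic = c("PH", "PH", "PH", "PH", "NITROGEN"),
  fraction       = c(NA, "TOTAL", "SUSPENDED", "DISSOLVED", "TOTAL"),
  speciation     = c(NA, NA, NA, NA, "AS N"),
  stringsAsFactors = FALSE
)

# Toy reference table of combinations already covered (hypothetical).
ref <- data.frame(
  characteristic = c("PH", "PH", "PH", "NITROGEN"),
  fraction       = c(NA, "TOTAL", "DISSOLVED", "TOTAL"),
  speciation     = c(NA, NA, NA, "AS N"),
  stringsAsFactors = FALSE
)

# Build one key per row; paste() turns NA into the text "NA",
# so missing fractions/speciations still compare cleanly.
make_key <- function(df) {
  paste(df$characteristic, df$fraction, df$speciation, sep = "_")
}

# Combinations present in the results but absent from the reference table.
missing_combos <- unique(results[!(make_key(results) %in% make_key(ref)), ])
print(missing_combos)

# Percentage of pH results whose fraction is "SUSPENDED".
# %in% treats NA as a non-match, so NA fractions count as "not suspended".
ph <- results[results$characteristic == "PH", ]
pct_suspended <- 100 * mean(ph$fraction %in% "SUSPENDED")
print(pct_suspended)
```

Looping this over many downloaded data sets and binding the `missing_combos` results together would give a running list of uncovered combinations, at the cost of one download per data set.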
I'm not that familiar with different fractions of pH, but after some googling it seems plausible that there could be suspended pH measurements, and including them in the TADA harmonization table makes sense. Does suspended mean that the sample is a mixture of liquid and solids (e.g., sediment)? The alternative would be pH from a filtered sample? It seems like the pH measurements could differ depending on whether there are solids (e.g., certain rock types that may have lower/higher pH) in the sample vs. if it's been filtered. I'm thinking of mine waste samples or things along those lines.
After TADA_UnitConversion and TADA_HarmonizeSynonyms, pH results from some data sets are being grouped into multiple TADA.ComparableDataIdentifiers.
A data set that can be used to see this is:
data <- TADA_DataRetrieval(
  statecode = "IL",
  startDate = "2010-01-01",
  endDate = "2020-12-31",
  huc = c("0714010505", "0714010504", "0714010508", "0714010501", "0714010503"),
  characteristicType = "Physical",
  applyautoclean = TRUE
)
Ideally, all of these pH results would be identified with the same TADA.ComparableDataIdentifier.
To solve this, we could edit the metadata for pH entries using the harmonization table (we can specify in the assumptions/notes column that fraction is not needed for pH). All of these should harmonize to PH_NA_NA_NONE.
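A quick way to confirm whether pH results are split across identifiers is to count the distinct TADA.ComparableDataIdentifier values among the pH rows. The sketch below uses a small made-up data frame rather than the Illinois download; the `TADA.CharacteristicName` column name is an assumption here, and every identifier string other than PH_NA_NA_NONE is invented for illustration.

```r
# Toy stand-in for a harmonized data set; values are illustrative only.
toy <- data.frame(
  TADA.CharacteristicName = c("PH", "PH", "PH", "TEMPERATURE"),
  TADA.ComparableDataIdentifier = c(
    "PH_NA_NA_NONE",
    "PH_SUSPENDED_NA_NONE",   # made-up identifier for a suspended-fraction result
    "PH_NA_NA_NONE",
    "TEMPERATURE_NA_NA_DEG C" # made-up identifier for a non-pH result
  ),
  stringsAsFactors = FALSE
)

# Tabulate identifiers for the pH rows only.
is_ph <- toy$TADA.CharacteristicName == "PH"
ph_ids <- table(toy$TADA.ComparableDataIdentifier[is_ph])
print(ph_ids)

# TRUE only when every pH result shares a single identifier.
ph_is_harmonized <- length(ph_ids) == 1
print(ph_is_harmonized)
```

Running the same tabulation before and after a change to the harmonization table would show whether the pH groups have collapsed into one.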