
Create report about population #47

Open: wants to merge 5 commits into main


anitacaron (Collaborator) commented Jun 24, 2024

Fixes #46

I'd like to make sure we have all the information needed. The report has more than 12 thousand rows, so I'll upload it to Google Drive and share a download link.

Here's a sample of the table:

[Screenshot 2024-06-24 at 16 55 20: sample of the report table]

anitacaron self-assigned this Jun 24, 2024
Melek-C (Collaborator) commented Jun 25, 2024

Hi @anitacaron, it looks like there are many duplicates. If you drop the family column, the table would be much smaller.
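A minimal sketch of the effect being described: dropping one column and then removing rows that have become identical. It assumes the report is a list of rows with hypothetical column names (`population_name`, `region_name`, `population_size`, `family`) and uses placeholder values, not real report data.

```python
# Hypothetical rows: one row per (population, family) pair, placeholder values only.
rows = [
    {"population_name": "Pop A", "region_name": "Region 1", "population_size": "2000", "family": "Family X"},
    {"population_name": "Pop A", "region_name": "Region 1", "population_size": "2000", "family": "Family Y"},
    {"population_name": "Pop B", "region_name": "Region 2", "population_size": "500", "family": "Family X"},
]


def drop_column_and_dedupe(rows, column):
    """Remove `column` from every row, then keep only the first copy of each remaining row."""
    seen = set()
    result = []
    for row in rows:
        reduced = {k: v for k, v in row.items() if k != column}
        key = tuple(sorted(reduced.items()))  # hashable, order-independent row signature
        if key not in seen:
            seen.add(key)
            result.append(reduced)
    return result


deduped = drop_column_and_dedupe(rows, "family")
print(len(rows), "->", len(deduped))  # 3 -> 2
```

The trade-off is that the family information is lost, which is why the report keeps one row per family.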

anitacaron (Collaborator, Author)

Yes, but I did that so it can be easily grouped by family or country of origin. Or don't you need the family information? Can I get feedback from Meriem and Mariem, please?
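For illustration, a sketch of why one row per (population, family) pair is convenient: grouping by family or by country becomes a single pass over the rows. The column names, including `country` as a stand-in for the country of origin, are assumptions, and the values are placeholders.

```python
from collections import defaultdict

# Hypothetical long-format rows (one family per row); column names are assumptions.
rows = [
    {"population_name": "Pop A", "country": "Country 1", "family": "Family X"},
    {"population_name": "Pop A", "country": "Country 1", "family": "Family Y"},
    {"population_name": "Pop B", "country": "Country 2", "family": "Family X"},
]


def group_populations(rows, key):
    """Map each distinct value of `key` to the sorted list of population names that have it."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key]].add(row["population_name"])
    return {k: sorted(v) for k, v in groups.items()}


print(group_populations(rows, "family"))
# {'Family X': ['Pop A', 'Pop B'], 'Family Y': ['Pop A']}
print(group_populations(rows, "country"))
# {'Country 1': ['Pop A'], 'Country 2': ['Pop B']}
```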

Melek-C (Collaborator) commented Jun 27, 2024

Hi @mariemh23, could you please check whether this table works for plotting the map?

mariemh23 (Collaborator)

I think yes, it looks fine.

mariemh23 (Collaborator)

Hi Anita,
I took a closer look at the table, and I think something is wrong with the database.

First, there seems to be a separator problem in the table: some values have been shifted into the wrong column. For example, some values in the region_name column contain population_name values (e.g., line 6628).
Similarly, the population_size column also has shifted values, and some values are missing. Some values contain the symbols ">" or "<" (e.g., "< 1 million", ">3 million"), which prevents converting population sizes into numerical values.
One last point, as Alia raised earlier: we need a top-level family group, because with so many distinct values it will be difficult to assign colors distinct enough for each group to be easily visible on the map.
Best,
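A minimal sketch for locating shifted rows like the one at line 6628, assuming the report is exported as delimited text (CSV here; the actual export format is an assumption). A row whose field count differs from the header is a common symptom of an unquoted separator inside a value. The sample data and the function name `find_misaligned_rows` are hypothetical.

```python
import csv
import io


def find_misaligned_rows(text, delimiter=","):
    """Return (line_number, field_count) for data rows whose field count differs from the header.

    Line numbers assume one record per physical line (no multi-line quoted fields).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    bad = []
    for line_number, fields in enumerate(reader, start=2):  # header is line 1
        if len(fields) != len(header):
            bad.append((line_number, len(fields)))
    return bad


# Hypothetical export: the second data row has an unquoted comma inside a value,
# which pushes every later value one column to the right.
sample = (
    "population_name,region_name,population_size\n"
    "Pop A,Region 1,2000\n"
    "Pop B,Region 2, North,500\n"
)
print(find_misaligned_rows(sample))  # [(3, 4)]
```

When writing the report, `csv.writer` with its default `QUOTE_MINIMAL` setting quotes any value that contains the delimiter, which avoids this kind of shift in the first place.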

anitacaron (Collaborator, Author)

@mariemh23, I can fix the region_name and family columns. However, population_size reflects what is available in the ontology, and some values are missing there. It would be good to discuss this with @abenkahla so that I can either change the ontology or add a post-processing step that removes the ">" symbol from the population size annotation.
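A sketch of what such a post-processing step could look like, assuming size annotations look like the examples quoted above ("< 1 million", ">3 million") or plain numbers. The function name, the regular expression, and the "thousand"/"billion" multipliers are assumptions, not part of the ontology or report code.

```python
import re

# Multipliers for word suffixes; "million" matches the quoted examples,
# "thousand" and "billion" are assumptions added for completeness.
MULTIPLIERS = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}

# Optional "<" or ">" bound, a number (digits, dots, commas), optional unit word.
SIZE_PATTERN = re.compile(r"^\s*([<>])?\s*([\d.,]+)\s*([a-z]+)?\s*$", re.IGNORECASE)


def parse_population_size(raw):
    """Return (bound, value): bound is '<', '>' or None; value is an int, or None if unparseable."""
    if raw is None:
        return None, None
    match = SIZE_PATTERN.match(raw)
    if not match:
        return None, None
    bound, number, unit = match.groups()
    try:
        value = float(number.replace(",", ""))  # treat commas as thousands separators
    except ValueError:
        return None, None
    if unit:
        multiplier = MULTIPLIERS.get(unit.lower())
        if multiplier is None:
            return None, None  # unknown unit: leave for manual review
        value *= multiplier
    return bound, int(value)


for raw in ["< 1 million", ">3 million", "12000", ""]:
    print(repr(raw), "->", parse_population_size(raw))
```

Keeping the bound in its own column, rather than only deleting the symbol, preserves the information that a value is a lower or upper limit.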

Concatenate family because this is used as a single group.
Get the population region by the geographic location of the population.
Note: this link is not available in the ontology; I made the change to generate the report.
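One way to read "concatenate family" is collapsing the per-family rows into one row per population, with all family values joined into a single string. A sketch under that assumption; the ` | ` separator, the row layout, and the function name are hypothetical, and the actual commit may work on the ontology query rather than on rows.

```python
def concatenate_families(rows, separator=" | "):
    """Collapse rows that differ only in `family` into one row whose family values are joined."""
    merged = {}  # dicts keep insertion order, so output follows first appearance
    for row in rows:
        # Every column except "family" identifies the population row.
        key = tuple((k, v) for k, v in row.items() if k != "family")
        families = merged.setdefault(key, [])
        if row["family"] not in families:  # skip repeated family values
            families.append(row["family"])
    return [dict(key, family=separator.join(fams)) for key, fams in merged.items()]


rows = [
    {"population_name": "Pop A", "family": "Family X"},
    {"population_name": "Pop A", "family": "Family Y"},
    {"population_name": "Pop B", "family": "Family X"},
]
print(concatenate_families(rows))
# [{'population_name': 'Pop A', 'family': 'Family X | Family Y'}, {'population_name': 'Pop B', 'family': 'Family X'}]
```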
anitacaron requested a review from mariemh23 July 1, 2024 17:21
anitacaron (Collaborator, Author)

@mariemh23 I've updated the table in the Google spreadsheet. Could you please check?

Melek-C (Collaborator) commented Jul 2, 2024

Hi @anitacaron, thanks for the update. We just checked the table with @mariemh23: all the columns look good except the family one, which is not standardized or presented in a harmonized way. We should move forward with the map draft in the meantime, until we fix the family column.

mariemh23 (Collaborator)

Hi Anita,
Thank you for your quick reply.
I would also like to ask about the language location column. Is it possible to have the geo-location coordinates in a separate column, apart from the language name?
Many thanks
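If the split is done in post-processing, a sketch could look like this. The cell format is an assumption (a language name followed by "latitude, longitude" in parentheses); the real column may use a different layout, in which case only the pattern would need to change. The function name is hypothetical.

```python
import re

# Assumed cell format: "Language X (36.8, 10.18)". This layout is hypothetical.
LOCATION_PATTERN = re.compile(
    r"^(?P<name>.*?)\s*\(\s*(?P<lat>-?\d+(?:\.\d+)?)\s*,\s*(?P<lon>-?\d+(?:\.\d+)?)\s*\)\s*$"
)


def split_language_location(cell):
    """Return (language_name, latitude, longitude); coordinates are None if no pair is found."""
    match = LOCATION_PATTERN.match(cell)
    if not match:
        return cell.strip(), None, None
    return match.group("name"), float(match.group("lat")), float(match.group("lon"))


print(split_language_location("Language X (36.8, 10.18)"))
print(split_language_location("Language Y"))
```

Rows where no coordinate pair is found keep the original text as the name, so they can be reviewed by hand rather than silently dropped.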

anitacaron (Collaborator, Author)

> it's not standardized and not presented in a harmonized way

@Melek-C, let me know how I should change the family column.

> it possible to have the geo-location coordinates in a separate column (separated from the language name).

@mariemh23, yes, I can do that in the spreadsheet, but this is how it's available in the ontology.

Melek-C (Collaborator) commented Jul 3, 2024

Hi @anitacaron, the family column is a bit tricky. I think we should keep just one term in the column and select only the big families (there are different subfamilies).
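A sketch of reducing the family column to one top-level term, which would also keep the number of map colors small. The subfamily-to-family lookup table is a placeholder (the real mapping would have to be agreed on by the team, or stored as a separate annotation in the ontology), and the ` | ` separator for multiple terms is an assumption.

```python
# Hypothetical lookup from subfamily to top-level family; placeholder names only.
TOP_FAMILY = {
    "Subfamily A1": "Family A",
    "Subfamily A2": "Family A",
    "Subfamily B1": "Family B",
}


def to_top_family(family_value, separator=" | "):
    """Reduce a (possibly multi-term) family value to a single top-level family term.

    Terms missing from TOP_FAMILY are treated as already top-level. If the terms
    map to different top families, return "Unresolved" so the row can be reviewed
    by hand instead of being silently assigned.
    """
    terms = [t.strip() for t in family_value.split(separator) if t.strip()]
    tops = {TOP_FAMILY.get(t, t) for t in terms}
    return tops.pop() if len(tops) == 1 else "Unresolved"


print(to_top_family("Subfamily A1 | Subfamily A2"))  # Family A
print(to_top_family("Subfamily A1 | Subfamily B1"))  # Unresolved
```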

anitacaron (Collaborator, Author)

Maybe we could put the subfamilies in another annotation to make the ontology clearer?

Melek-C (Collaborator) commented Jul 8, 2024

> Maybe we could put the subfamilies in another annotation to make the ontology clearer?

It could be interesting.

anitacaron (Collaborator, Author)

Do we have a final decision about the family annotation for the report? 😄

Melek-C (Collaborator) commented Jul 18, 2024

Hi @anitacaron, we are trying to resolve some ambiguities in the family annotation, as it's not standardized. We will get back to you soon.

Successfully merging this pull request may close these issues.

Create report with population size, family, region and geo location
3 participants