IMPC web api tasks migration #427
…b_apiimpc_gene_diseases_mapperpy-to-use-airflow 399 migrate impc etljobsloadimpc web apiimpc gene diseases mapperpy to use airflow
…sk, use shared one
…b_apiimpc_batch_query_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_batch_query_mapper to use Airflow
…b_apiimpc_gene_search_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_gene_search_mapper to use Airflow
…b_apiimpc_idg_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_idg_mapper.py to use Airflow
…b_apiimpc_external_links_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_external_links_mapper to use Airflow
…b_apiimpc_gene_histopathology_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_external_links_mapper to use Airflow
…b_apiimpc_gene_images_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_gene_images_mapper to use Airflow
…b_apiimpc_phenotype_pleiotropy_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_phenotype_pleiotropy_mapper.py to use Airflow
…b_apiimpc_embryo_landing_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_embryo_landing_mapper.py to use Airflow
…b_apiimpc_gene_stats_results_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_gene_stats_results_mapper.py to use Airflow
"associationCurated", col("associationCurated").astype(BooleanType())
)

max_disease_df.coalesce(100).write.option("ignoreNullFields", "false").json(
Changed from repartition(500) to coalesce(100) for local development
impc_images_df.repartition(500).write.option("ignoreNullFields", "false").json(
output_path
)
impc_images_df.coalesce(100).write.option("ignoreNullFields", "false").json(
Changed from repartition(500) to coalesce(100) for local development
stats_results_df = stats_results_df.withColumn(
"femaleMutantCount", col("femaleMutantCount").astype(IntegerType())
)
stats_results_df.distinct().coalesce(5).write.option(
Changed from repartition(1000) to coalesce(5) for local development
…b_apiimpc_gene_summary_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_gene_summary_mapper.py to use Airflow
"embryoExpressionObservationsAverage"
),
)
gene_avg_df.coalesce(10).write.option("ignoreNullFields", "false").json(
Changed from repartition(1) to coalesce(10) for local development
gene_avg_df.coalesce(10).write.option("ignoreNullFields", "false").json(
output_path + "_avgs"
)
gene_df.coalesce(10).write.option("ignoreNullFields", "false").json(
Changed from repartition(100) to coalesce(10) for local development
…b_apiimpc_phenotype_search_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_phenotype_search_mapper.py to use Airflow
…b_apiimpc_phenotype_summary_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_phenotype_summary_mapper.py to use Airflow
…b_apiimpc_images_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_images_mapper.py to use Airflow
…b_apiimpc_histopathology_datasets_mapperpy-to-use-airflow Migrate impc_etl/jobs/load/impc_web_api/impc_histopathology_datasets_mapper.py to use Airflow
For any transformation that is needed only in development, read the environment first:
environment = Variable.get("environment", "development")
And then branch on it:
if environment == 'development':
    ...
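One way to apply this suggestion to the partitioning changes flagged above is sketched here. It is only an illustration: `limit_partitions` and its parameter names are hypothetical and not part of this PR, and the `environment` string is assumed to come from the `Variable.get("environment", "development")` call in the comment above. The helper only relies on the DataFrame exposing `coalesce(n)` and `repartition(n)`, as PySpark DataFrames do.

```python
from typing import Any


def limit_partitions(
    df: Any,
    environment: str,
    production_partitions: int,
    development_partitions: int,
) -> Any:
    """Pick a partitioning strategy based on the environment (hypothetical helper).

    `df` is assumed to be a PySpark DataFrame, or any object exposing
    `coalesce(n)` and `repartition(n)`.
    """
    if environment == "development":
        # Development only: merge existing partitions without a full shuffle.
        return df.coalesce(development_partitions)
    # Any other environment keeps the original repartition behaviour.
    return df.repartition(production_partitions)
```

With a helper like this, a write such as the `impc_images_df` one could become `limit_partitions(impc_images_df, environment, 500, 100).write.option("ignoreNullFields", "false").json(output_path)`, so the `repartition(500)` path stays in place outside development.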
This branch contains all the migrated tasks derived from the original impc_web_api_mapper.py task.
Added a shared utils module.