diff --git a/Jupyterhub/convert_to_geotiff.qmd b/Jupyterhub/convert_to_geotiff.qmd deleted file mode 100644 index a192f87..0000000 --- a/Jupyterhub/convert_to_geotiff.qmd +++ /dev/null @@ -1,217 +0,0 @@ -# Jupyter Notebook Guide for Disaster COG Processing - -This guide helps you get started with converting disaster satellite imagery to Cloud Optimized GeoTIFFs (COGs). - -## Quick Start - -### 🚀 Option 1: Simple Template (Recommended for Most Users) - -Use `templates/simple_disaster_template.ipynb` for a streamlined experience with just 5 cells: - -1. **Open the notebook** - ```bash - jupyter notebook templates/simple_disaster_template.ipynb - ``` - -2. **Configure your event** (Cell 1) - - Set `EVENT_NAME` (e.g., '202408_TropicalStorm_Debby') - - Set `PRODUCT_NAME` (e.g., 'landsat8') - - Modify filename functions to control output names - -3. **Run the cells in order** - - Cell 2: Imports and initializes - - Cell 3: Discovers files and shows preview - - Cell 4: Processes all files - - Cell 5: Reviews results - -### 🎛️ Option 2: Advanced Template (For Power Users) - -Use `templates/disaster_processing_template.ipynb` for full control over: -- Memory management -- Chunk configurations -- Processing parameters -- Verification options -- Detailed error handling - -## Configuration Examples - -### Basic Configuration - -```python -EVENT_NAME = '202408_TropicalStorm_Debby' -PRODUCT_NAME = 'landsat8' -BUCKET = 'nasa-disasters' -SOURCE_PATH = f'drcs_activations/{EVENT_NAME}/{PRODUCT_NAME}' -DESTINATION_BASE = 'drcs_activations_new' -OVERWRITE = False # Set True to replace existing files -``` - -### Custom Filename Functions - -Define how your files are renamed: - -```python -def create_truecolor_filename(original_path, event_name): - """Create filename for trueColor products.""" - filename = os.path.basename(original_path) - stem = os.path.splitext(filename)[0] - date = extract_date_from_filename(stem) - - if date: - stem_clean = re.sub(r'_\d{8}', '', stem) - 
return f"{event_name}_{stem_clean}_{date}_day.tif" - return f"{event_name}_{stem}_day.tif" -``` - -### Map Products to Filename Functions - -```python -FILENAME_CREATORS = { - 'trueColor': create_truecolor_filename, - 'colorInfrared': create_colorinfrared_filename, - 'naturalColor': create_naturalcolor_filename, -} -``` - -## File Organization - -The system automatically: -- **Discovers** files in your S3 source path -- **Categorizes** them by product type (trueColor, NDVI, etc.) -- **Applies** the appropriate filename function -- **Saves** to organized output directories - -### Default Output Structure -``` -drcs_activations_new/ -├── imagery/ -│ ├── trueColor/ -│ ├── colorIR/ -│ └── naturalColor/ -├── indices/ -│ ├── NDVI/ -│ └── MNDWI/ -└── SAR/ - └── processed/ -``` - -## Common Patterns - -### Process Multiple Product Types - -The system automatically detects and processes different product types: - -```python -# Files are auto-categorized by these patterns: -- 'trueColor' → imagery/trueColor/ -- 'colorInfrared' → imagery/colorIR/ -- 'NDVI' → indices/NDVI/ -- 'MNDWI' → indices/MNDWI/ -- 'SAR' → SAR/processed/ -``` - -### Custom No-Data Values - -```python -NODATA_VALUES = { - 'NDVI': -9999, # Specific value for NDVI - 'MNDWI': -9999, # Specific value for MNDWI - 'trueColor': None, # Auto-detect for imagery -} -``` - -### Override Output Directories - -```python -OUTPUT_DIRS = { - 'trueColor': 'Landsat/trueColor', - 'colorInfrared': 'Landsat/colorIR', - 'naturalColor': 'Landsat/naturalColor', -} -``` - -## Troubleshooting - -### Issue: "No files found" -- Check `SOURCE_PATH` is correct -- Verify files exist: `aws s3 ls s3://bucket/path/` - -### Issue: "Failed to connect to S3" -- Check AWS credentials: `aws configure list` -- Ensure bucket access permissions - -### Issue: Files being skipped -- Files already exist in destination -- Set `OVERWRITE = True` to reprocess - -### Issue: Wrong filenames -- Modify 
filename creator functions -- Re-run from discovery step to preview - -### Issue: Processing is slow -- Large files take time (normal) -- System automatically uses GDAL optimization -- Files >1.5GB use optimized chunking - -## Performance Tips - -1. **File Size Optimization** - - Files <1.5GB: Processed whole (fastest) - - Files >1.5GB: Smart chunking - - Files >7GB: Ultra-large file handling - -2. **Compression** - - Uses ZSTD level 22 (maximum compression) - - Automatic predictor selection - - Intelligent resampling based on data type - -3. **Parallel Processing** - - For batch processing multiple events, use: - ```python - from batch_processor_parallel import process_files_parallel - ``` - -## Advanced Features - -### Using the Helper Module Directly - -```python -from notebooks.notebook_helpers import quick_process - -results = quick_process({ - 'event_name': '202408_TropicalStorm_Debby', - 'bucket': 'nasa-disasters', - 'source_path': 'drcs_activations/202408_TropicalStorm_Debby/landsat8', - 'destination_base': 'drcs_activations_new', - 'overwrite': False, - 'filename_creators': FILENAME_CREATORS -}) -``` - -### Batch Processing Multiple Events - -```python -events = [ - '202408_TropicalStorm_Debby', - '202409_Hurricane_Example', - '202410_Wildfire_Sample' -] - -for event in events: - config['event_name'] = event - config['source_path'] = f'drcs_activations/{event}/landsat8' - processor = SimpleProcessor(config) - processor.connect_to_s3() - processor.discover_files() - processor.process_all() -``` - -## Next Steps - -1. Start with the simple template -2. Run a small test batch -3. Verify output filenames are correct -4. Process full dataset -5. Check results in S3 - -For more details, see the main [README.md](README.md) or review the [RESAMPLING_GUIDE.md](RESAMPLING_GUIDE.md) for data type handling. 
\ No newline at end of file diff --git a/Jupyterhub/jupyterhub-training-guide.qmd b/Jupyterhub/jupyterhub-training-guide.qmd index 676e39a..0e312a5 100644 --- a/Jupyterhub/jupyterhub-training-guide.qmd +++ b/Jupyterhub/jupyterhub-training-guide.qmd @@ -7,13 +7,7 @@ 4. [Working with Jupyter Notebooks](#working-with-jupyter-notebooks) 5. [Data Management](#data-management) 6. [Environment and Package Management](#environment-and-package-management) -7. [Terminal and Command Line Access](#terminal-and-command-line-access) -8. [Collaboration and Sharing](#collaboration-and-sharing) -9. [Resource Management](#resource-management) -10. [Best Practices](#best-practices) -11. [Troubleshooting](#troubleshooting) -12. [Keyboard Shortcuts](#keyboard-shortcuts) -13. [Resources and Links](#resources-and-links) +7. [Shutting Down](#shutting-down-properly) --- @@ -45,8 +39,6 @@ The **Disasters Hub** (https://hub.disasters.2i2c.cloud/) is a specialized Jupyt ✅ **Pre-configured Environments** - Common packages already installed ✅ **Persistent Storage** - Your work is saved between sessions ✅ **Collaboration Ready** - Share notebooks with team members -✅ **Scalable Resources** - Access to GPU and high-memory instances when needed - --- ## Getting Started @@ -58,7 +50,11 @@ The **Disasters Hub** (https://hub.disasters.2i2c.cloud/) is a specialized Jupyt - Go to: [https://hub.disasters.2i2c.cloud/](https://hub.disasters.2i2c.cloud/) - Bookmark this URL for easy access -2. **Authentication** +2. **First-Time Login** + - Must sign in through [Keycloak - CI Logon](https://cilogon.org/) + - After Keycloak has been completed, request to be added to the Disasters Jupyterhub account + +3. 
**Authentication** - You'll see a login screen with authentication options - Common authentication methods: - **GitHub**: Use your GitHub credentials @@ -66,10 +62,6 @@ The **Disasters Hub** (https://hub.disasters.2i2c.cloud/) is a specialized Jupyt - **Institutional Login**: Use your organization's credentials - Select your authentication method and follow the prompts -3. **First-Time Login** - - Accept terms of service if prompted - - Your home directory will be created automatically - - Initial setup may take 30-60 seconds ### Server Selection @@ -78,19 +70,12 @@ After login, you may be presented with server options: ``` Server Options: ┌─────────────────────────────────────┐ -│ • Small (2 CPU, 4GB RAM) │ -│ • Medium (4 CPU, 8GB RAM) │ -│ • Large (8 CPU, 16GB RAM) │ -│ • GPU Instance (if available) │ +│ • Small (4 CPU, 4GB RAM) │ +│ • Medium (4 CPU, 7GB RAM) │ +│ • Large (4 CPU, 15GB RAM) │ +│ • Additional resources if needed │ └─────────────────────────────────────┘ ``` - -**Tips for Server Selection:** -- Start with **Small** for basic notebook work -- Use **Medium** for data processing tasks -- Choose **Large** for machine learning or big data -- Select **GPU** only when needed (limited availability) - --- ## JupyterHub Interface Overview @@ -136,13 +121,6 @@ Once logged in, you'll see the JupyterLab interface: - Line/column position - File encoding and type -### Creating Your First Notebook - -1. Click the **Python 3** icon in the Launcher -2. Or: File → New → Notebook -3. Select kernel (usually Python 3) -4. 
Rename your notebook: Right-click on "Untitled.ipynb" → Rename - --- ## Working with Jupyter Notebooks @@ -150,6 +128,7 @@ Once logged in, you'll see the JupyterLab interface: ### Notebook Basics A Jupyter notebook consists of **cells** that can contain: + - **Code**: Executable Python (or other language) code - **Markdown**: Formatted text, equations, and images - **Raw**: Unformatted text @@ -204,27 +183,6 @@ The **kernel** is the computational engine that executes your code. - **[*]**: Cell currently executing - **[1]**: Cell execution number -### Notebook Best Practices - -1. **Use meaningful cell divisions** - - One concept or operation per cell - - Separate imports, data loading, processing, visualization - -2. **Document your work** - ```python - # Good practice: Add comments and markdown cells - # Load disaster response data - df = pd.read_csv('disaster_data.csv') - - # Data preprocessing - df['date'] = pd.to_datetime(df['date']) - df = df.dropna() - ``` - -3. **Clear output before sharing** - - Kernel → Restart & Clear Output - - Reduces file size and removes sensitive output - --- ## Data Management @@ -234,22 +192,14 @@ The **kernel** is the computational engine that executes your code. #### Uploading Files 1. **Drag and drop** files directly into the file browser 2. **Upload button**: Click the ⬆ button in the file browser toolbar -3. **Terminal upload**: Use `wget` or `curl` in terminal - ```bash - wget https://example.com/data.csv - curl -O https://example.com/data.zip - ``` #### Downloading Files 1. **Right-click** file in browser → Download -2. **From notebook**: - ```python - from IPython.display import FileLink - FileLink('results.csv') # Creates downloadable link - ``` ### Working with Cloud Storage +Credentials for reading from S3 are already integrated within the Disasters Hub! 
+ #### AWS S3 Integration ```python import boto3 @@ -262,38 +212,11 @@ df = pd.read_csv('s3://bucket-name/path/to/file.csv') df.to_csv('s3://bucket-name/output/results.csv', index=False) ``` -#### Google Cloud Storage -```python -# Read from GCS -df = pd.read_csv('gs://bucket-name/path/to/file.csv') - -# Using gsutil in terminal -!gsutil cp gs://bucket/file.csv ./data/ -``` - -### Data Organization - -Recommended directory structure: -``` -home/ -├── data/ -│ ├── raw/ # Original, immutable data -│ ├── processed/ # Cleaned, transformed data -│ └── external/ # Data from external sources -├── notebooks/ -│ ├── exploratory/ # Initial explorations -│ ├── analysis/ # Detailed analysis -│ └── reports/ # Final reports -├── scripts/ # Reusable Python scripts -├── results/ # Output files, figures -└── requirements.txt # Package dependencies -``` - ### Data Persistence ⚠️ **Important**: Your home directory is persistent, but understand the storage limits: -- **Home directory**: Usually 10-100 GB (persistent) +- **Home directory**: 100 GB/user (persistent) - **Shared data**: Read-only datasets available to all users - **Temporary storage**: `/tmp` cleared on restart - **Best practice**: Store large datasets in cloud storage, not home directory @@ -372,205 +295,6 @@ pip install -r requirements.txt --- -## Terminal and Command Line Access - -### Opening Terminal - -1. **From Launcher**: Click "Terminal" icon -2. **From menu**: File → New → Terminal -3. **Keyboard shortcut**: (varies by setup) - -### Common Terminal Commands - -```bash -# Navigation -pwd # Print working directory -ls -la # List files with details -cd ~/notebooks # Change directory - -# File operations -mkdir project # Create directory -cp file1.txt file2.txt # Copy file -mv oldname newname # Move/rename -rm file.txt # Delete file (careful!) 
- -# File viewing -cat file.txt # Display file contents -head -n 10 data.csv # First 10 lines -tail -n 10 log.txt # Last 10 lines -less large_file.txt # Page through file - -# Process management -ps aux # List processes -top # Monitor resources -kill -9 PID # Kill process - -# Git operations -git status -git add . -git commit -m "message" -git push -``` - -### Working with Data Files - -```bash -# Count lines in file -wc -l data.csv - -# View CSV structure -head -1 data.csv | tr ',' '\n' | nl - -# Search in files -grep "pattern" file.txt -grep -r "pattern" ./directory - -# Compress/decompress -zip archive.zip file1 file2 -unzip archive.zip -tar -czf archive.tar.gz directory/ -tar -xzf archive.tar.gz -``` - ---- - -## Collaboration and Sharing - -### Sharing Notebooks - -#### Method 1: Direct File Sharing -1. Download notebook: File → Download as → Notebook (.ipynb) -2. Share via email, Slack, or file sharing service -3. Recipient uploads to their JupyterHub - -#### Method 2: Using Git -```bash -# Initialize repository -git init -git add notebook.ipynb -git commit -m "Add analysis notebook" -git remote add origin https://github.com/user/repo.git -git push -u origin main -``` - -#### Method 3: Export Formats -- **HTML**: File → Export Notebook As → HTML -- **PDF**: File → Export Notebook As → PDF (requires LaTeX) -- **Python script**: File → Export Notebook As → Python -- **Markdown**: File → Export Notebook As → Markdown - -### Real-time Collaboration - -Some JupyterHub deployments support real-time collaboration: - -1. **Share workspace link**: Get shareable link from hub admin -2. **Collaborative editing**: Multiple users can edit simultaneously -3. **See collaborator cursors**: Real-time cursor positions -4. **Chat integration**: Built-in chat for discussion - -### Version Control Best Practices - -1. **Clear outputs before committing**: - ```bash - jupyter nbconvert --clear-output notebook.ipynb - ``` - -2. 
**Use .gitignore**: - ``` - .ipynb_checkpoints/ - __pycache__/ - *.pyc - .DS_Store - data/ # Don't commit large data files - ``` - -3. **Notebook diff tools**: - ```bash - # Install nbdime for better notebook diffs - pip install nbdime - nbdime config-git --enable - ``` - ---- - -## Resource Management - -### Understanding Resource Limits - -Your JupyterHub instance has resource limits: - -```python -# Check available resources -import psutil - -# Memory -memory = psutil.virtual_memory() -print(f"Total RAM: {memory.total / 1e9:.2f} GB") -print(f"Available: {memory.available / 1e9:.2f} GB") -print(f"Used: {memory.percent}%") - -# CPU -print(f"CPU cores: {psutil.cpu_count()}") -print(f"CPU usage: {psutil.cpu_percent()}%") - -# Disk -disk = psutil.disk_usage('/') -print(f"Disk space: {disk.total / 1e9:.2f} GB") -print(f"Disk used: {disk.percent}%") -``` - -### Monitoring Resource Usage - -#### JupyterLab Extension -- Install Resource Usage extension -- Shows real-time memory and CPU usage in status bar - -#### Command line monitoring -```bash -# Real-time resource monitoring -top -htop # If installed - -# Memory usage -free -h - -# Disk usage -df -h -du -sh * # Directory sizes -``` - -### Optimizing Resource Usage - -1. **Clear variables when done**: - ```python - # Clear specific variable - del large_dataframe - - # Clear all variables - %reset -f - - # Garbage collection - import gc - gc.collect() - ``` - -2. **Use efficient data types**: - ```python - # Use categories for strings with few unique values - df['category'] = df['category'].astype('category') - - # Use smaller numeric types when possible - df['count'] = df['count'].astype('int32') # Instead of int64 - ``` - -3. 
**Process data in chunks**: - ```python - # Read large CSV in chunks - chunk_size = 10000 - for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size): - process_chunk(chunk) - ``` - ### Shutting Down Properly Always shut down kernels and terminals when done: @@ -580,369 +304,11 @@ Always shut down kernels and terminals when done: 3. **Hub Control Panel**: File → Hub Control Panel → Stop My Server 4. **Logout**: File → Log Out -⚠️ **Important**: Idle servers may be automatically culled after a period of inactivity (usually 1-2 hours). - ---- - -## Best Practices - -### Project Organization - -1. **Use consistent naming**: - ``` - 2024-01-15_earthquake_analysis.ipynb # Good - untitled1.ipynb # Bad - ``` - -2. **Create project templates**: - ```python - # notebook_template.ipynb - - # 1. Imports - import pandas as pd - import numpy as np - import matplotlib.pyplot as plt - - # 2. Configuration - pd.set_option('display.max_columns', None) - plt.style.use('seaborn') - - # 3. Data Loading - - # 4. Data Exploration - - # 5. Analysis - - # 6. Results - ``` - -3. **Document dependencies**: - ```python - # Generate requirements.txt - !pip freeze > requirements.txt - ``` - -### Security Considerations - -1. **Never commit credentials**: - ```python - # Bad - api_key = "sk-abc123def456" - - # Good - Use environment variables - import os - api_key = os.environ.get('API_KEY') - ``` - -2. **Use secrets management**: - ```python - # Store secrets in .env file - from dotenv import load_dotenv - load_dotenv() - - # Access secrets - secret = os.getenv('SECRET_KEY') - ``` - -3. **Be careful with outputs**: - - Clear cells containing sensitive information - - Review notebooks before sharing - -### Performance Tips - -1. **Vectorize operations**: - ```python - # Slow - results = [] - for i in range(len(df)): - results.append(df.iloc[i]['column'] * 2) - - # Fast - results = df['column'] * 2 - ``` - -2. 
**Use built-in functions**: - ```python - # Use pandas/numpy operations instead of loops - df['new_col'] = df['col1'] + df['col2'] # Vectorized - ``` - -3. **Profile your code**: - ```python - %%time # Time entire cell - - %timeit function() # Time single line - - # Detailed profiling - %load_ext line_profiler - %lprun -f function_to_profile function_to_profile() - ``` - ---- - -## Troubleshooting - -### Common Issues and Solutions - -#### Kernel Won't Start -- **Check resources**: Server might be full -- **Try different kernel**: Some kernels may be broken -- **Restart server**: Hub Control Panel → Stop → Start - -#### Package Import Errors -```python -# Check if package is installed -import importlib -if importlib.util.find_spec("package_name") is None: - !pip install package_name - -# Restart kernel after installation -from IPython import get_ipython -get_ipython().kernel.do_shutdown(True) -``` - -#### Out of Memory Errors -1. Clear unnecessary variables: `del variable_name` -2. Use smaller data samples for testing -3. Request larger server instance -4. Process data in chunks - -#### Notebook Won't Save -- **Check disk space**: `df -h` in terminal -- **Check file permissions**: `ls -la notebook.ipynb` -- **Save with new name**: File → Save As -- **Download backup**: File → Download - -#### Connection Issues -- **Check internet connection** -- **Try different browser** -- **Clear browser cache** -- **Check if hub is under maintenance** - -### Getting Help - -1. **Built-in help**: - ```python - help(function_name) - function_name? # Quick help - function_name?? # Source code - ``` - -2. **Documentation**: - - JupyterHub docs: https://jupyterhub.readthedocs.io - - JupyterLab docs: https://jupyterlab.readthedocs.io - - 2i2c docs: https://docs.2i2c.org - -3. 
**Community support**: - - Discourse forum - - GitHub issues - - Stack Overflow with tags: `jupyter`, `jupyterhub` - ---- - -## Keyboard Shortcuts - -### Command Mode (Blue cell border) -Press `Esc` to enter command mode - -| Shortcut | Action | -|----------|--------| -| `Enter` | Enter edit mode | -| `A` | Insert cell above | -| `B` | Insert cell below | -| `D,D` | Delete cell | -| `Y` | Change to code cell | -| `M` | Change to markdown cell | -| `Shift+Up/Down` | Select multiple cells | -| `Shift+M` | Merge selected cells | -| `C` | Copy cell | -| `X` | Cut cell | -| `V` | Paste cell below | -| `Shift+V` | Paste cell above | -| `Z` | Undo cell deletion | -| `0,0` | Restart kernel | -| `I,I` | Interrupt kernel | - -### Edit Mode (Green cell border) -Press `Enter` to enter edit mode - -| Shortcut | Action | -|----------|--------| -| `Esc` | Enter command mode | -| `Ctrl+Enter` | Run cell | -| `Shift+Enter` | Run cell, select below | -| `Alt+Enter` | Run cell, insert below | -| `Ctrl+S` | Save notebook | -| `Tab` | Code completion | -| `Shift+Tab` | Tooltip | -| `Ctrl+]` | Indent | -| `Ctrl+[` | Dedent | -| `Ctrl+A` | Select all | -| `Ctrl+Z` | Undo | -| `Ctrl+Y` | Redo | - -### JupyterLab Shortcuts - -| Shortcut | Action | -|----------|--------| -| `Ctrl+Shift+C` | Command palette | -| `Ctrl+B` | Toggle left sidebar | -| `Ctrl+Shift+D` | Toggle file browser | -| `Ctrl+Shift+F` | Find and replace | -| `Ctrl+Shift+[` | Previous tab | -| `Ctrl+Shift+]` | Next tab | -| `Alt+W` | Close tab | - ---- - -## Resources and Links - -### Official Documentation - -- **JupyterHub Documentation**: https://jupyterhub.readthedocs.io -- **JupyterLab Documentation**: https://jupyterlab.readthedocs.io -- **Jupyter Notebook Documentation**: https://jupyter-notebook.readthedocs.io -- **2i2c Infrastructure Guide**: https://docs.2i2c.org - -### Tutorials and Learning Resources - -- **Jupyter Tutorial**: https://jupyter.org/try -- **Real Python Jupyter Guide**: 
https://realpython.com/jupyter-notebook-introduction/ -- **DataCamp Jupyter Tutorial**: https://www.datacamp.com/tutorial/tutorial-jupyter-notebook -- **Official Jupyter Examples**: https://github.com/jupyter/jupyter/wiki/Gallery-of-Jupyter-Notebooks - -### Disaster Response Specific Resources - -- **NASA Disasters Program**: https://disasters.nasa.gov -- **USGS Hazards Data**: https://www.usgs.gov/natural-hazards -- **NOAA Disaster Data**: https://www.ncdc.noaa.gov/billions/ -- **Copernicus Emergency Management**: https://emergency.copernicus.eu - -### Python Libraries for Disaster Analysis - -```python -# Geospatial analysis -import geopandas as gpd -import rasterio -import xarray as xr -import folium - -# Data processing -import pandas as pd -import numpy as np -import dask.dataframe as dd - -# Visualization -import matplotlib.pyplot as plt -import seaborn as sns -import plotly.express as px - -# Machine learning -from sklearn import * -import tensorflow as tf -import torch - -# Earth observation -import ee # Google Earth Engine -import planetary_computer as pc -import pystac_client -``` - -### Helpful Extensions - -Install JupyterLab extensions for enhanced functionality: - -```bash -# Variable inspector -jupyter labextension install @lckr/jupyterlab_variableinspector - -# Table of contents -jupyter labextension install @jupyterlab/toc - -# Git integration -pip install jupyterlab-git - -# Code formatter -pip install jupyterlab-code-formatter -``` - -### Community and Support - -- **Jupyter Discourse Forum**: https://discourse.jupyter.org -- **Stack Overflow**: https://stackoverflow.com/questions/tagged/jupyter -- **GitHub Issues**: https://github.com/jupyterhub/jupyterhub/issues -- **2i2c Support**: https://2i2c.org/support -- **Gitter Chat**: https://gitter.im/jupyterhub/jupyterhub - -### Quick Reference PDFs - -- **JupyterLab Cheat Sheet**: https://www.datacamp.com/cheat-sheet/jupyterlab-cheat-sheet -- **Jupyter Shortcuts PDF**: 
https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/ -- **Markdown Guide**: https://www.markdownguide.org/cheat-sheet/ - ---- - -## Appendix: Sample Workflow - -Here's a complete example workflow for disaster analysis: - -```python -# 1. Setup and Imports -import pandas as pd -import geopandas as gpd -import matplotlib.pyplot as plt -import folium -from datetime import datetime, timedelta -import warnings -warnings.filterwarnings('ignore') - -# 2. Load Data -# Earthquake data -earthquakes = pd.read_csv('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_month.csv') -earthquakes['time'] = pd.to_datetime(earthquakes['time']) - -# 3. Data Processing -# Filter recent events -recent = earthquakes[earthquakes['time'] > datetime.now() - timedelta(days=7)] - -# Convert to GeoDataFrame -geometry = gpd.points_from_xy(recent.longitude, recent.latitude) -geo_df = gpd.GeoDataFrame(recent, geometry=geometry, crs='EPSG:4326') - -# 4. Analysis -print(f"Total earthquakes in last 7 days: {len(recent)}") -print(f"Average magnitude: {recent['mag'].mean():.2f}") -print(f"Largest earthquake: {recent['mag'].max():.2f}") - -# 5. Visualization -# Static plot -fig, ax = plt.subplots(figsize=(12, 8)) -world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres')) -world.plot(ax=ax, color='lightgray', edgecolor='black') -geo_df.plot(ax=ax, color='red', markersize=geo_df['mag']**2, alpha=0.6) -plt.title('Recent Earthquakes (M4.5+)') -plt.show() - -# Interactive map -m = folium.Map(location=[0, 0], zoom_start=2) -for idx, row in geo_df.iterrows(): - folium.CircleMarker( - location=[row['latitude'], row['longitude']], - radius=row['mag']*2, - popup=f"M{row['mag']} - {row['place']}", - color='red', - fill=True - ).add_to(m) -m.save('earthquake_map.html') - -# 6. Export Results -geo_df.to_csv('processed_earthquakes.csv', index=False) -print("Analysis complete! 
Results saved.") -``` +⚠️ **Important**: Idle servers will be automatically culled after a period of inactivity (usually 1-2 hours). --- -*Last Updated: 2024* +*Last Updated: 2025* *Version: 1.0* *Disasters Hub Training Guide* diff --git a/Jupyterhub/setup-disaster-repo.qmd b/Jupyterhub/setup-disaster-repo.qmd deleted file mode 100644 index fa4f4b5..0000000 --- a/Jupyterhub/setup-disaster-repo.qmd +++ /dev/null @@ -1,492 +0,0 @@ -# Setting Up Disaster Repository - Step-by-Step Guide - -## Table of Contents -1. [Prerequisites](#prerequisites) -2. [GitHub Account Setup](#github-account-setup) -3. [Configure Git Identity](#configure-git-identity) -4. [GitHub Authentication Setup](#github-authentication-setup) -5. [Clone the Repository](#clone-the-repository) -6. [Working with Branches](#working-with-branches) -7. [Making Changes and Pushing](#making-changes-and-pushing) -8. [Troubleshooting Common Issues](#troubleshooting-common-issues) - ---- - -## Prerequisites - -Before starting, ensure you have: -- Git installed in your JupyterHub environment -- Access to terminal in JupyterHub -- Internet connection -- GitHub account (we'll create one if needed) - -Check if Git is installed: -```bash -git --version -``` - -If not installed, contact your JupyterHub administrator. - ---- - -## GitHub Account Setup - -### Step 1: Create GitHub Account (if you don't have one) - -1. Visit [https://github.com](https://github.com) -2. Click **Sign up** -3. Enter your details: - - **Username**: Choose carefully (this is permanent and public) - - **Email**: Use your professional/institutional email - - **Password**: Create a strong password -4. Verify your email address -5. Complete profile setup - -### Step 2: Enable Two-Factor Authentication (Recommended) - -1. Go to **Settings** → **Password and authentication** -2. Click **Enable two-factor authentication** -3. Use an authenticator app (Google Authenticator, Authy, or Microsoft Authenticator) -4. 
Save backup codes securely - ---- - -## Configure Git Identity - -Configure Git with your GitHub account information: - -```bash -# Set your name (visible in commits) -git config --global user.name "Your Full Name" - -# Set your email (MUST match your GitHub account email) -git config --global user.email "your.email@example.com" - -# Set default branch name to main -git config --global init.defaultBranch main - -# Enable colored output for better readability -git config --global color.ui auto - -# Verify your configuration -git config --list -``` - -**Example:** -```bash -git config --global user.name "Kyle Lesinger" -git config --global user.email "kyle.lesinger@example.com" -``` - ---- - -## GitHub Authentication Setup - -Since GitHub no longer supports password authentication, you need to use either: -1. **Personal Access Token** (Easier for JupyterHub) -2. **SSH Keys** (More secure, one-time setup) -3. **GitHub CLI** (Recommended - handles auth automatically) - -### Option 1: GitHub CLI Authentication (Recommended) - -```bash -# Authenticate with GitHub CLI -gh auth login - -# Follow the prompts: -# 1. Choose: GitHub.com -# 2. Choose: HTTPS (recommended for JupyterHub) -# 3. Choose: Login with a web browser -# 4. Copy the one-time code shown -# 5. Press Enter to open browser (or manually visit https://github.com/login/device) -# 6. Enter the code and authorize - -# Verify authentication -gh auth status -``` - -### Option 2: Personal Access Token - -1. Go to GitHub.com → **Settings** → **Developer settings** -2. Click **Personal access tokens** → **Tokens (classic)** -3. Click **Generate new token** → **Generate new token (classic)** -4. Name it: "JupyterHub Access" -5. Set expiration (90 days recommended) -6. Select scopes: - - ✅ `repo` (Full control of private repositories) - - ✅ `workflow` (Update GitHub Action workflows) -7. Click **Generate token** -8. **COPY THE TOKEN IMMEDIATELY** (you won't see it again!) 
- -Store the token securely for use when pushing: -```bash -# Store credentials (will be saved after first use) -git config --global credential.helper store -``` - -### Option 3: SSH Key Setup - -```bash -# Generate SSH key -ssh-keygen -t ed25519 -C "your.email@example.com" -# Press Enter for default location -# Optionally set a passphrase - -# Display your public key -cat ~/.ssh/id_ed25519.pub - -# Copy the entire output, then: -# 1. Go to GitHub.com → Settings → SSH and GPG keys -# 2. Click "New SSH key" -# 3. Paste your key and save - -# Test SSH connection -ssh -T git@github.com -``` - ---- - -## Clone the Repository - -### Step 1: Clone the Repository - -```bash -# Navigate to your workspace -cd ~/ - -# Clone the repository (creates a new folder called 'conversion_scripts') -git clone https://github.com/kyle-lesinger/conversion_scripts.git - -# Navigate into the repository -cd conversion_scripts - -# Verify the clone -ls -la -git status -``` - -### Step 2: Verify Remote Configuration - -```bash -# Check current remotes -git remote -v - -# You should see: -# origin https://github.com/kyle-lesinger/conversion_scripts.git (fetch) -# origin https://github.com/kyle-lesinger/conversion_scripts.git (push) -``` - -### Step 3: (Optional) Switch to SSH Remote - -If you set up SSH keys and prefer using SSH: - -```bash -# Remove HTTPS remote -git remote remove origin - -# Add SSH remote -git remote add origin git@github.com:kyle-lesinger/conversion_scripts.git - -# Verify the change -git remote -v -``` - ---- - -## Working with Branches - -### Create a New Branch - -Always create a new branch for your work instead of committing directly to main: - -```bash -# Make sure you're on the main branch -git checkout main - -# Pull latest changes -git pull origin main - -# Create and switch to a new branch -git checkout -b feature/your-feature-name - -# Example branch names: -# git checkout -b feature/add-preprocessing -# git checkout -b bugfix/fix-data-pipeline -# git 
checkout -b docs/update-readme -``` - -### Verify Your Branch - -```bash -# Check which branch you're on -git branch - -# List all branches (local and remote) -git branch -a -``` - ---- - -## Making Changes and Pushing - -### Step 1: Make Your Changes - -```bash -# Create or edit files -echo "# Conversion Scripts" > README.md -echo "This repository contains data conversion scripts." >> README.md - -# Check what files have changed -git status -``` - -### Step 2: Stage and Commit Changes - -```bash -# Add specific files -git add README.md - -# Or add all changes -git add . - -# Commit with descriptive message -git commit -m "Add README with project description" - -# View commit history -git log --oneline -``` - -### Step 3: Push to GitHub - -#### First Time Push (new branch): -```bash -# Push and set upstream branch -git push -u origin feature/your-feature-name - -# If using Personal Access Token, enter: -# Username: your-github-username -# Password: your-personal-access-token (NOT your GitHub password!) -``` - -#### Subsequent Pushes: -```bash -# After upstream is set, simply: -git push -``` - -### Step 4: Create Pull Request - -```bash -# Using GitHub CLI (if authenticated) -gh pr create --title "Add README documentation" --body "Added project description" - -# Or manually: -# 1. Visit https://github.com/kyle-lesinger/conversion_scripts -# 2. Click "Compare & pull request" button -# 3. Add title and description -# 4. Click "Create pull request" -``` - ---- - -## Complete Workflow Example - -Here's a complete example workflow from start to finish: - -```bash -# 1. Configure Git (one-time setup) -git config --global user.name "Kyle Lesinger" -git config --global user.email "kyle.lesinger@example.com" - -# 2. Authenticate with GitHub CLI -gh auth login -# Follow the interactive prompts - -# 3. Clone the repository -cd ~/ -git clone https://github.com/kyle-lesinger/conversion_scripts.git -cd conversion_scripts - -# 4. 
Create a new branch -git checkout -b feature/add-conversion-script - -# 5. Create a new file -cat > convert_data.py << 'EOF' -#!/usr/bin/env python3 -""" -Data conversion utility script -""" - -def convert_format(input_file, output_file): - """Convert data from one format to another""" - print(f"Converting {input_file} to {output_file}") - # Add conversion logic here - -if __name__ == "__main__": - convert_format("input.txt", "output.json") -EOF - -# 6. Stage and commit -git add convert_data.py -git commit -m "Add data conversion utility script" - -# 7. Push to GitHub -git push -u origin feature/add-conversion-script - -# 8. Create pull request -gh pr create --title "Add data conversion script" --body "Initial conversion utility" -``` - ---- - -## Troubleshooting Common Issues - -### Issue 1: Authentication Failed - -**Error:** `remote: Invalid username or password` - -**Solution:** -```bash -# Use Personal Access Token instead of password -# When prompted for password, paste your token - -# Or use GitHub CLI -gh auth login -``` - -### Issue 2: Permission Denied (publickey) - -**Error:** `git@github.com: Permission denied (publickey)` - -**Solution:** -```bash -# Check if SSH key exists -ls -la ~/.ssh/ - -# Generate new key if needed -ssh-keygen -t ed25519 -C "your.email@example.com" - -# Add to SSH agent -eval "$(ssh-agent -s)" -ssh-add ~/.ssh/id_ed25519 - -# Add public key to GitHub account -cat ~/.ssh/id_ed25519.pub -# Copy output and add to GitHub.com β†’ Settings β†’ SSH Keys -``` - -### Issue 3: Remote Already Exists - -**Error:** `error: remote origin already exists` - -**Solution:** -```bash -# Remove existing remote -git remote remove origin - -# Add new remote -git remote add origin https://github.com/kyle-lesinger/conversion_scripts.git -``` - -### Issue 4: Rejected Push (Non-fast-forward) - -**Error:** `! 
[rejected] main -> main (non-fast-forward)` - -**Solution:** -```bash -# Pull latest changes first -git pull origin main --rebase - -# Then push -git push origin main -``` - -### Issue 5: Wrong Branch - -**Error:** Working on main branch instead of feature branch - -**Solution:** -```bash -# Create new branch with current changes -git checkout -b feature/my-changes - -# Push to new branch -git push -u origin feature/my-changes -``` - ---- - -## Best Practices - -1. **Always work in branches** - Never commit directly to main -2. **Pull before pushing** - Always sync with remote before pushing -3. **Use descriptive commit messages** - Explain what and why -4. **Commit frequently** - Small, logical commits are better -5. **Keep tokens secure** - Never commit tokens or passwords -6. **Test locally** - Run your code before committing - ---- - -## Quick Command Reference - -```bash -# Clone repository -git clone https://github.com/kyle-lesinger/conversion_scripts.git - -# Create branch -git checkout -b feature/new-feature - -# Check status -git status - -# Add files -git add . - -# Commit -git commit -m "Description of changes" - -# Push new branch -git push -u origin feature/new-feature - -# Push existing branch -git push - -# Pull latest changes -git pull origin main - -# Switch branches -git checkout branch-name - -# List branches -git branch -a - -# Delete local branch -git branch -d branch-name - -# View commit history -git log --oneline --graph -``` - ---- - -## Additional Resources - -- [GitHub Docs](https://docs.github.com) -- [Git Documentation](https://git-scm.com/doc) -- [GitHub CLI Manual](https://cli.github.com/manual) -- [Pro Git Book (Free)](https://git-scm.com/book) - ---- - -## Getting Help - -If you encounter issues not covered here: - -1. Check the repository issues: https://github.com/kyle-lesinger/conversion_scripts/issues -2. Ask in the JupyterHub support channel -3. 
Consult the comprehensive [Git/GitHub guide](../Github/git-github-comprehensive-guide.md) - ---- - -*Last Updated: 2024* -*Version: 1.0* \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml index 79d5c13..a21761a 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -56,9 +56,7 @@ website: text: JupyterHub contents: - Jupyterhub/jupyterhub-training-guide.qmd - - Jupyterhub/setup-disaster-repo.qmd - - Jupyterhub/convert_to_geotiff.qmd - - Jupyterhub/simple_disaster_template.ipynb + - Jupyterhub/clone_conversion_repo.ipynb - section: workflow2.qmd text: Data Workflow Diagrams contents: diff --git a/workflow.qmd b/workflow.qmd index c6572af..df2c26f 100644 --- a/workflow.qmd +++ b/workflow.qmd @@ -1,52 +1,13 @@ --- -title: "U.S. Greenhouse Gas Center: Data Flow Diagrams" +title: "NASA Disasters: Data Flow Diagrams" --- -Welcome to the homepage for [U.S. Greenhouse Gas (GHG) Center](https://earth.gov/ghgcenter) data flow diagrams. These diagrams summarize the process a dataset goes through from acquisition to integration in the U.S. GHG Center. +Welcome to the homepage for [NASA Disasters](https://appliedsciences.nasa.gov/what-we-do/disasters) data flow diagrams. These diagrams summarize the process of how to find, download, and process data for NASA Disasters. Click on a dataset name to view the data flow diagram for that dataset. 
 
-[View the US GHG Center Data Catalog](https://earth.gov/ghgcenter/data-catalog)
+[View the NASA Disasters Resources](https://appliedsciences.nasa.gov/what-we-do/disasters/practitioner-resources#portal)
 
-- [Air-Sea CO₂ Flux, ECCO-Darwin Model v5 Data Flow Diagram](data_workflow/eccodarwin-co2flux-monthgrid-v5_Data_Flow.qmd)
+- [NRT Data Download](data_workflow2/NRT_data_download.qmd)
 
-- [Atmospheric Carbon Dioxide Concentrations from the NOAA Global Monitoring Laboratory Data Flow Diagram](data_workflow/noaa-gggrn-co2-concentrations_Data_Flow.qmd)
-
-- [Atmospheric Methane Concentrations from the NOAA Global Monitoring Laboratory Data Flow Diagram](data_workflow/noaa-gggrn-ch4-concentrations_Data_Flow.qmd)
-
-- [Carbon Dioxide and Methane Concentrations from the Indianapolis Flux Experiment (INFLUX) Data Flow Diagram](data_workflow/influx-testbed-ghg-concentrations_Data_Flow.qmd)
-
-- [Carbon Dioxide and Methane Concentrations from the Los Angeles Megacity Carbon Project Data Flow Diagram](data_workflow/lam-testbed-ghg-concentrations_Data_Flow.qmd)
-
-- [Carbon Dioxide and Methane Concentrations from the Northeast Corridor (NEC) Urban Test Bed Data Flow Diagram](data_workflow/nec-testbed-ghg-concentrations_Data_Flow.qmd)
-
-- [CarbonTracker-CH₄ Isotopic Methane Inverse Fluxes Data Flow Diagram](data_workflow/ct-ch4-monthgrid-v2023_Data_Flow.qmd)
-
-- [EMIT Methane Point Source Plume Complexes Data Flow Diagram](data_workflow/emit-ch4plume-v1_Data_Flow.qmd)
-
-- [Geostationary Satellite Observations of Extreme and Transient Methane Emissions from Oil and Gas Infrastructure Complexes Data Flow Diagram](data_workflow/goes-ch4plume-v1_Data_Flow.qmd)
-
-- [GOSAT-based Top-down Total and Natural Methane Emissions Data Flow Diagram](data_workflow/gosat-based-ch4budget-yeargrid-v1_Data_Flow.qmd)
-
-- [GRA²PES Greenhouse Gas and Air Quality Species Data Flow Diagram](data_workflow/gra2pes-ghg-monthgrid-v1_Data_Flow.qmd)
-
-- [MiCASA Land Carbon Flux Data Flow Diagram](data_workflow/micasa-carbonflux-daygrid-v1_Data_Flow.qmd)
-
-- [OCO-2 GEOS Column CO₂ Concentrations Data Flow Diagram](data_workflow/oco2geos-co2-daygrid-v10r_Data_Flow.qmd)
-
-- [OCO-2 MIP Top-Down CO₂ Budgets Data Flow Diagram](data_workflow/oco2-mip-co2budget-yeargrid-v1_Data_Flow.qmd)
-
-- [ODIAC Fossil Fuel CO₂ Emissions Data Flow Diagram](data_workflow/odiac-ffco2-monthgrid-v2024_Data_Flow.qmd)
-
-- [SEDAC Gridded World Population Density Data Flow Diagram](data_workflow/sedac-popdensity-yeargrid5yr-v4.11_Data_Flow.qmd)
-
-- [U.S. Gridded Anthropogenic Methane Emissions Inventory Data Flow Diagram](data_workflow/epa-ch4emission-grid-v2express_Data_Flow.qmd)
-
-- [Vulcan Fossil Fuel CO₂ Emissions Data Flow Diagram](data_workflow/vulcan-ffco2-yeargrid-v4_Data_Flow.qmd)
-
-- [Wetland Methane Emissions, LPJ-EOSIM Model Data Flow Diagram](data_workflow/lpjeosim-wetlandch4-grid-v1_Data_Flow.qmd)
-
-
-## Contact
-
-For technical help or general questions, please contact the support team using the [feedback form](https://docs.google.com/forms/d/e/1FAIpQLSeVWCrnca08Gt_qoWYjTo6gnj1BEGL4NCUC9VEiQnXA02gzVQ/viewform).
\ No newline at end of file
+- [NRT Directory Structure](data_workflow2/NRT_directory_structure.qmd)
\ No newline at end of file
diff --git a/workflow2.qmd b/workflow2.qmd
deleted file mode 100644
index df2c26f..0000000
--- a/workflow2.qmd
+++ /dev/null
@@ -1,13 +0,0 @@
----
-title: "NASA Disasters: Data Flow Diagrams"
----
-
-Welcome to the homepage for [NASA Disasters](https://appliedsciences.nasa.gov/what-we-do/disasters) data flow diagrams. These diagrams summarize the process of how to find, download, and process data for NASA Disasters.
-
-Click on a dataset name to view the data flow diagram for that dataset.
-
-[View the NASA Disasters Resources](https://appliedsciences.nasa.gov/what-we-do/disasters/practitioner-resources#portal)
-
-- [NRT Data Download](data_workflow2/NRT_data_download.qmd)
-
-- [NRT Directory Structure](data_workflow2/NRT_directory_structure.qmd)
\ No newline at end of file