Skip to content

Epidemiological Data Visualisation

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

ukhsa-collaboration/epiviz

Repository files navigation

epiviz

Lifecycle: experimental R-CMD-check

epiviz provides easy-to-use data visualisation functions for R data science products. The package includes functions to create a range of static and interactive visualisations. Each visualisation function can produce either a static (ggplot2) or dynamic (plotly/leaflet) output using the same set of parameters, allowing users to easily switch between visualisation types by simply changing the dynamic flag.

Installation

epiviz can be installed from GitHub using the following code:

# Install epiviz
remotes::install_github('ukhsa-collaboration/epiviz')

# To install the development version
remotes::install_github("ukhsa-collaboration/epiviz@dev")

Features

The following visualisation functions are available in the current release of epiviz:

  • line_chart(): Creates line charts with options for grouping, dual axes, threshold lines, and confidence limits (as ribbons or error bars).
  • point_chart(): Creates scatter/point charts with options for grouping, sizing points by values, custom shapes, and confidence intervals.
  • col_chart(): Creates column charts with options for grouping, dual axes, threshold lines, and confidence limits (as ribbons or error bars).
  • epi_curve(): Creates epidemic curves with options for different time periods, rolling averages, cumulative sums, and grouping by categories.
  • epi_map(): Creates choropleth maps with customisable colour scales, labels, and boundaries for different administrative levels.
  • age_sex_pyramid(): Creates age-sex pyramids with customisable age bands and confidence intervals.

All visualisation functions follow a consistent interface:

function_name(
  dynamic = FALSE,  # Set to TRUE for interactive plotly/leaflet output
  base = NULL,      # Optional base plot to add to
  params = list(    # List of parameters controlling the visualisation
    df = data,      # Data frame containing the data to visualise
    ...             # Function-specific parameters
  ),
  ...               # Additional arguments passed to underlying plotting functions
)

Key Features

  • Consistent Interface: All functions use the same parameter structure, making it easy to learn and use the package.
  • Static/Dynamic Flexibility: Switch between static (ggplot2) and interactive (plotly/leaflet) visualisations by changing a single parameter.
  • Customisation Options: Extensive parameters for customising colours, labels, axes, legends, and more.
  • Confidence Intervals: Support for displaying confidence intervals across different visualisation types.
  • Grouping Support: Easily create grouped visualisations with appropriate legends.
  • Dual Axes: Support for secondary y-axes in applicable chart types.
  • Threshold Lines: Add horizontal reference lines to applicable chart types.

Experimental Features

LLM Interpretation

The package includes an experimental function llm_interpret() that uses Large Language Models (LLMs) to automatically interpret epidemiological data or visualisations. This function can:

  • Generate narrative interpretations of data frames
  • Describe and analyse ggplot visualisations
  • Provide epidemiologically relevant observations
# Example: Interpret a data frame
interpretation <- llm_interpret(
  input = summarised_df,
  word_limit = 150
)

# Example: Interpret a ggplot visualisation
plot <- line_chart(
  dynamic = FALSE,
  params = list(
    df = summarised_df,
    x = "specimen_date",
    y = "count",
    group_var = "organism_species_name"
  )
)
plot_interpretation <- llm_interpret(plot)

Supported LLM Providers

The function supports multiple LLM providers:

  • OpenAI: Models like gpt-4o, gpt-4o-mini, o1-mini
  • Google Gemini: Models like gemini-1.5-flash
  • Anthropic Claude: Models like claude-1

Environment Variable Setup

To use the llm_interpret() function, you need to set up the following environment variables:

# In your .Renviron file or before calling the function:
Sys.setenv(LLM_PROVIDER = "openai")  # Choose from: "openai", "gemini", "claude"
Sys.setenv(LLM_API_KEY = "your-api-key-here")
Sys.setenv(LLM_MODEL = "gpt-4o")  # Use an appropriate model for your chosen provider

You can set these environment variables in your .Renviron file for persistent configuration:

# .Renviron file
LLM_PROVIDER=openai
LLM_API_KEY=your-api-key-here
LLM_MODEL=gpt-4o

Package data

epiviz includes an anonymised sample dataset from the SGSS laboratory database: lab_data()

The Second Generation Surveillance System (SGSS) is a data asset held by the UK Health Security Agency (formerly Public Health England) which routinely and automatically collects laboratory data from across England.

Laboratories return data on organisms isolated from samples such as the organism species, specimen type, sampling date and antimicrobial susceptibility testing results. These data are routinely used for public health surveillance and epidemiology in England.

Examples

Line Chart

# Create a static line chart
line_chart(
  dynamic = FALSE,
  params = list(
    df = summarised_df,
    x = "specimen_date",
    y = "count",
    group_var = "organism_species_name",
    line_colour = c("blue", "green", "orange"),
    line_type = c("solid", "dotted", "dashed")
  )
)

# Convert to an interactive plotly chart by changing dynamic to TRUE
line_chart(
  dynamic = TRUE,
  params = list(
    df = summarised_df,
    x = "specimen_date",
    y = "count",
    group_var = "organism_species_name",
    line_colour = c("blue", "green", "orange"),
    line_type = c("solid", "dotted", "dashed")
  )
)

Age-Sex Pyramid

# Create a static age-sex pyramid
age_sex_pyramid(
  dynamic = FALSE,
  params = list(
    df = lab_data,
    var_map = list(
      date_of_birth = 'date_of_birth',
      sex = 'sex'
    ),
    grouped = FALSE,
    colours = c("pink", "blue")
  )
)

# Convert to an interactive plotly chart by changing dynamic to TRUE
age_sex_pyramid(
  dynamic = TRUE,
  params = list(
    df = lab_data,
    var_map = list(
      date_of_birth = 'date_of_birth',
      sex = 'sex'
    ),
    grouped = FALSE,
    colours = c("pink", "blue")
  )
)

About

Epidemiological Data Visualisation

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages