
Generating Datasets

Caleb P. Charpentier edited this page Sep 27, 2023 · 2 revisions

Mesh Generating Functions and Datasets in TraitBlender

Overview

TraitBlender generates datasets by combining a trait-generating function with custom tabular data. This guide explains how the two fit together, how to prepare your data, and how to generate a dataset.

Making Mesh Generating Functions

The Basics

To use TraitBlender effectively, you'll need a trait-generating function written in Python. The function should take one parameter for each trait you're interested in, named after that trait. For example, if you're studying eye diameter, your function should have a parameter named Eye_Diameter.
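As a minimal sketch of this naming convention, the hypothetical function below takes one parameter per trait. The function name make_fish, the Body_Length trait, and the returned dictionary are assumptions made for illustration only; this page does not specify what the function should build or return.

```python
import inspect


def make_fish(Eye_Diameter, Body_Length):
    """Hypothetical trait-generating function: one parameter per trait."""
    # Placeholder body (assumption): a real function would build a mesh
    # from these values instead of returning them.
    return {"Eye_Diameter": float(Eye_Diameter), "Body_Length": float(Body_Length)}


# The parameter names are what the tabular data will need to refer to.
trait_names = list(inspect.signature(make_fish).parameters)
print(trait_names)
```

Listing the parameter names with inspect.signature is a convenient way to see which trait names the function exposes.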

Examples

To be added

Generating Datasets

Preparing Your Data

  1. Tabular Data: TraitBlender requires a CSV file in which each column corresponds to a parameter of your trait-generating function. The column names in the CSV must match the function's parameter names.
  2. Special Columns: Your CSV file must also include an identifier column named "label", "species", or "tip" (case-insensitive).
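The two requirements above can be checked before running anything in TraitBlender. The sketch below is one possible check, not part of TraitBlender itself: the helper check_columns, the function make_fish, and the example data are all hypothetical. It also assumes that every function parameter needs its own column, which this page does not state explicitly.

```python
import csv
import inspect
import io

ID_COLUMNS = {"label", "species", "tip"}  # identifier names listed above


def make_fish(Eye_Diameter, Body_Length):
    """Hypothetical trait-generating function used only for this check."""
    return None


def check_columns(csv_text, func):
    """Return (identifier_columns, missing_params, unmatched_columns)."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # Identifier columns are matched case-insensitively.
    id_cols = [c for c in header if c.strip().lower() in ID_COLUMNS]
    trait_cols = [c for c in header if c not in id_cols]
    params = list(inspect.signature(func).parameters)
    # Parameters with no column, and columns with no parameter.
    missing = [p for p in params if p not in trait_cols]
    unmatched = [c for c in trait_cols if c not in params]
    return id_cols, missing, unmatched


example_csv = "Species,Eye_Diameter,Body_Length\nfish_a,1.2,10.5\nfish_b,0.9,8.1\n"
id_cols, missing, unmatched = check_columns(example_csv, make_fish)
print(id_cols, missing, unmatched)
```

A CSV that passes has one identifier column and empty missing and unmatched lists.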

Generating Processes

You can generate this tabular data in various ways. For example, in phylogenetics, you might generate datasets in R using phylogenetic trees and macroevolutionary models such as Brownian motion. Any process will work, as long as the resulting CSV file meets the requirements above.
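As one illustration of such a process, the sketch below simulates trait values under Brownian motion and writes them as CSV text. It is a deliberate simplification: it assumes a star phylogeny, so every tip evolves independently from the root for the same amount of time and shared ancestry is ignored. The function simulate_star_bm, the trait names, and all parameter values are hypothetical.

```python
import csv
import io
import random


def simulate_star_bm(tip_names, traits, root_value=1.0, sigma=0.1, time=1.0, seed=0):
    """Simulate each trait independently for each tip under Brownian motion.

    On a star tree each tip value is root_value + Normal(0, sigma**2 * time).
    """
    rng = random.Random(seed)
    sd = sigma * time ** 0.5  # standard deviation after `time` units
    rows = []
    for tip in tip_names:
        row = {"species": tip}  # identifier column
        for trait in traits:
            row[trait] = root_value + rng.gauss(0.0, sd)
        rows.append(row)
    return rows


traits = ["Eye_Diameter", "Body_Length"]
rows = simulate_star_bm(["sp_1", "sp_2", "sp_3"], traits)

# Write the rows as CSV text; swap the buffer for a real file to save it.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["species"] + traits)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Simulating on a real tree would introduce covariance between related tips, which is what dedicated phylogenetics packages handle.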

Examples

To be added

Generating an Entire Dataset

You can generate a complete dataset using the generate_dataset.py script. This script requires three arguments:

  • make_mesh_function_path: The absolute path to the Python file containing your trait-generating function. This file should include the function and any required imports.
  • csv_file_path: The path to your CSV file containing the tabular data.
  • json_file_path: The path to a JSON file generated using the "Export Settings" button in the TraitBlender GUI. This ensures that the same settings are applied across the dataset.
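Before running the script, it can help to confirm that the three inputs are in place. The sketch below is a hypothetical pre-flight check and not part of generate_dataset.py; the helper check_inputs and the throwaway file names are assumptions. It does not show how the script itself is invoked, since that is not described here.

```python
import json
import os
import tempfile


def check_inputs(make_mesh_function_path, csv_file_path, json_file_path):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if not os.path.isabs(make_mesh_function_path):
        problems.append("make_mesh_function_path is not an absolute path")
    for name, path in [
        ("make_mesh_function_path", make_mesh_function_path),
        ("csv_file_path", csv_file_path),
        ("json_file_path", json_file_path),
    ]:
        if not os.path.isfile(path):
            problems.append(f"{name} does not point to an existing file")
    if os.path.isfile(json_file_path):
        try:
            with open(json_file_path, encoding="utf-8") as handle:
                json.load(handle)
        except json.JSONDecodeError:
            problems.append("json_file_path does not contain valid JSON")
    return problems


# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for filename, content in [
        ("make_fish.py", "def make_fish(Eye_Diameter):\n    pass\n"),
        ("traits.csv", "species,Eye_Diameter\nsp_1,1.0\n"),
        ("settings.json", "{}"),
    ]:
        paths[filename] = os.path.join(tmp, filename)
        with open(paths[filename], "w", encoding="utf-8") as handle:
            handle.write(content)
    problems = check_inputs(paths["make_fish.py"], paths["traits.csv"], paths["settings.json"])
    relative_problems = check_inputs("make_fish.py", paths["traits.csv"], paths["settings.json"])

print(problems)
```

An empty list means the function file path is absolute, all three files exist, and the settings file parses as JSON.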