Datasets

Datasets are the primary storage containers for your structured and unstructured research data. They provide a version-controlled environment for organizing the files and folders that Workflows and Sandboxes use in analyses.

Creating and Organizing Datasets

To start uploading data, click the New dataset button in the list view. Each dataset requires a Name and a Version (e.g., 1.0.0); a description is optional but improves traceability.

File Management

Inside a dataset, you have full control over the organizational structure:

Folder Structure: Use the New folder button to create a hierarchical organization for your data.
Uploading: Use Upload for individual files, or Upload folder to preserve the directory structure from your local machine. Drag-and-drop is also supported.
File Actions: Individual files within a dataset can be downloaded, or referenced by their S3 URI, which you copy with the Copy S3 URI icon. The URI is particularly useful when referencing specific data paths in your own scripts or workflows.
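As an illustration of using a copied S3 URI in a script, the sketch below splits a URI into its bucket and object key, which is the form most S3 clients expect. The helper name parse_s3_uri and the example URI are hypothetical, and the sketch assumes the copied value follows the usual s3://bucket/key layout.

```python
def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an S3 URI of the form s3://bucket/key into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the remainder is the key.
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"S3 URI has no bucket: {uri!r}")
    return bucket, key


# Hypothetical URI; replace it with the value copied from the dataset view.
bucket, key = parse_s3_uri("s3://example-bucket/datasets/demo/samples.csv")
print(bucket)
print(key)
```

The resulting pair could then be passed to an S3 client of your choice, for example boto3's download_file(Bucket, Key, Filename). Whether your environment holds credentials for the bucket is an assumption to verify separately.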

Dataset Management

The top-right corner of the dataset view provides tools for maintaining the asset:

Edit: Modify the dataset name and description after creation.
Delete: Permanently remove the dataset and all associated files. A confirmation dialog appears first; once confirmed, the deletion cannot be undone.

The list view provides an overview of all project datasets, displaying their total size, version number, and last modification date. Use the search bar or the sort toggle (e.g., Newest, Largest) to quickly locate specific assets.

Ready to define groups for your analysis? Let's dive into Cohorts! 🧑‍🤝‍🧑
