Datasets

Organize and version your training data.

Datasets View

Creating a Dataset

Navigate to Deep Learning PlatformDatasets → Click Create

Create Dataset Form

Basic Information

Dataset Name* (Required)

  • Enter a descriptive name for the dataset

  • Example: imagenet-1k, coco-2017, custom-dataset

Version* (Required)

  • Semantic version (major.minor.patch)

  • Default: 1.0.0

  • Helper text: "Semantic version (major.minor.patch)"

Description* (Required)

  • Detailed description of dataset contents and purpose

Dataset Configuration

Dataset Type* (Required)

  • Select data type from dropdown:

    • Tabular

    • Image

    • Text

    • Audio

    • Video

    • Others

  • Default: tabular

Task Type* (Required)

  • Select task type:

    • Classification

    • Regression

    • Detection

    • Segmentation

    • Others

  • Default: classification

Data Format* (Required)

  • Select file format:

    • CSV

    • JSON

    • Parquet

    • TFRecord

    • Others

  • Default: csv

Status* (Required)

  • Current dataset status:

    • Uploading

    • Processing

    • Ready

    • Failed

  • Default: uploading

Metadata

Tags (Optional)

  • Comma-separated tags for organization

  • Example: computer-vision, training, augmented

License (Required)

  • Select license:

    • MIT

    • CC BY 4.0

    • CC0

    • Apache 2.0

    • Proprietary

  • Default: MIT

Public Access (Checkbox)

  • Make dataset accessible to all organization members

Actions

  • Cancel: Discard and close

  • Create Dataset: Submit and create the dataset

Example Configuration

Viewing Dataset Details

To view detailed information about a dataset:

  1. Navigate to Deep Learning PlatformDatasets

  2. Click on a dataset from the list

  3. View comprehensive details in the modal dialog

View Dataset Details

Details Panel Sections:

Basic Information:

  • Dataset Name: e.g., "Time Series Sales Data"

  • Version: Semantic version (e.g., 1.1.0)

  • Description: Full description of the dataset

Dataset Configuration:

  • Dataset Type: Time Series, Tabular, Image, Text, Audio, Video

  • Task Type: Forecasting, Classification, Regression, etc.

  • Data Format: CSV, JSON, Parquet, TFRecord, etc.

  • Status: Processing, Uploading, Ready, Failed

Metadata:

  • Tags: Comma-separated tags (e.g., "time-series,forecasting,sales,multivariate,business")

  • License: Proprietary, MIT, CC BY 4.0, etc.

  • Public Access: Checkbox for visibility

Editing a Dataset

To update dataset information:

  1. Open dataset details page

  2. Click Edit button (or three-dot menu → Edit)

  3. Modify editable fields in the Edit Dataset modal

Edit Dataset Form
  1. Click Update Dataset to save changes

[!NOTE] The Edit form is identical to the View form, but with editable fields and an "Update Dataset" button.

Editable Fields:

  • ✅ Description

  • ✅ Tags

  • ✅ License

  • ✅ Public Access

  • ❌ Dataset Name (cannot edit)

  • ❌ Version (create new version instead)

  • ❌ Data Type (cannot edit)

  • ❌ Data Format (cannot edit)

Downloading Dataset

To download dataset files:

  1. Open dataset details

  2. Click Download button

  3. Select download format:

    • Original format

    • Compressed archive

    • Specific splits (train/val/test)

  4. Download will start

Download Options:

  • Full dataset

  • Specific splits only

  • Sample subset

  • Metadata only

Creating a New Version

To create a new version of a dataset:

  1. Open dataset details

  2. Click New Version button

  3. Enter new version number

  4. Upload updated data

  5. Document changes in description

  6. Click Create

When to Version:

  • Added new samples

  • Fixed data errors

  • Changed preprocessing

  • Updated annotations

  • Removed corrupted files

Deleting a Dataset

To remove a dataset:

  1. Navigate to dataset details

  2. Click Delete button

  3. Confirm deletion

[!WARNING] You cannot delete a dataset that is being used by running experiments. Stop experiments first.

Before Deleting:

  • Check for experiments using this dataset

  • Download data if needed

  • Update documentation

  • Consider archiving instead

Dataset Versioning

When to Create New Version:

  • Added new samples

  • Fixed annotation errors

  • Changed preprocessing

  • Updated splits

  • Removed corrupted data

Best Practices

Organization:

  • Use consistent naming conventions

  • Document data collection process

  • Include data cards/datasheets

  • Provide sample data for preview

Quality Control:

  • Validate data integrity

  • Check for label errors

  • Monitor class balance

  • Document known issues

Next Steps

Last updated