Datasets

Upload, manage, and query synthetic training datasets. Lucitra accepts data in COCO, KITTI, nuScenes, or custom formats and stores it on Google Cloud Storage with signed upload URLs.

Supported Formats

COCO

Object detection, instance segmentation, and keypoints. The most common format for 2D vision tasks.

KITTI

3D bounding boxes, point clouds, and stereo pairs. Standard for autonomous driving benchmarks.

nuScenes

Multi-sensor, multi-frame sequences with ego pose. Designed for full autonomous driving stacks.

Custom

Bring your own annotation schema. Define a format adapter and Lucitra handles the rest.
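As an illustration of what a format adapter does, the sketch below maps a hypothetical custom annotation record to a COCO-style one. The input field names (`box`, `label`) and the function shape are assumptions for this example, not Lucitra's actual adapter interface.

```python
"""Illustrative sketch: translating a custom annotation record into a
COCO-style annotation. The custom schema here is hypothetical."""

def to_coco(record, image_id, ann_id, category_ids):
    """Map {"box": [x1, y1, x2, y2], "label": str} to a COCO annotation,
    converting corner coordinates to [x, y, width, height]."""
    x1, y1, x2, y2 = record["box"]
    w, h = x2 - x1, y2 - y1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_ids[record["label"]],
        "bbox": [x1, y1, w, h],
        "area": w * h,
        "iscrowd": 0,
    }
```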

Create a Dataset

Creating a dataset returns a time-limited signed URL for uploading your data file directly to cloud storage.
Step 1: Create the dataset record

Send a POST request with your project ID, dataset name, format, and optional metadata.
```bash
curl -X POST https://api.lucitra.io/v1/datasets \
  -H "Authorization: Bearer luci_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_abc123",
    "name": "warehouse-v3",
    "format": "coco",
    "metadata": {
      "simulator": "isaac-sim",
      "version": "4.5.0",
      "scene_count": 5000
    }
  }'
```

Response:

```json
{
  "id": "ds_7kx9m2",
  "upload_url": "https://storage.googleapis.com/lucitra-datasets/...",
  "expires_at": "2026-03-06T13:00:00Z"
}
```
id (string, required)
Unique dataset identifier. Use this in validation and report endpoints.

upload_url (string, required)
Pre-signed GCS URL for uploading your data file. Valid for 1 hour.

expires_at (string, required)
ISO 8601 timestamp when the upload URL expires.
Step 2: Upload your data file

Use the signed URL from the response to upload your dataset archive via a PUT request.
```bash
curl -X PUT "${UPLOAD_URL}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @warehouse-v3.tar.gz
```
The upload URL expires after 1 hour. If it expires before your upload completes, create a new dataset to get a fresh URL.
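The two steps above can be sketched as a small Python client using only the standard library. The endpoint and header shapes follow the curl examples in this section; the helper names and the `metadata` handling are this sketch's own.

```python
"""Sketch of the two-step dataset upload flow: create the record, then
PUT the archive to the signed URL before it expires."""
import json
import urllib.request

API_BASE = "https://api.lucitra.io/v1"

def build_create_payload(project_id, name, fmt, metadata=None):
    """Assemble the request body for POST /v1/datasets; metadata is optional."""
    payload = {"project_id": project_id, "name": name, "format": fmt}
    if metadata:
        payload["metadata"] = metadata
    return payload

def create_dataset(api_key, payload):
    """Create the dataset record; returns the parsed JSON response
    containing id, upload_url, and expires_at."""
    req = urllib.request.Request(
        f"{API_BASE}/datasets",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def upload_archive(upload_url, path):
    """PUT the dataset archive to the signed URL from create_dataset."""
    with open(path, "rb") as f:
        req = urllib.request.Request(
            upload_url,
            data=f.read(),
            headers={"Content-Type": "application/octet-stream"},
            method="PUT",
        )
        urllib.request.urlopen(req)
```

Because the URL is valid for only an hour, call `upload_archive` promptly after `create_dataset` rather than batching creations up front.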

Request Body

project_id (string, required)
The project this dataset belongs to.

name (string, required)
A human-readable name for the dataset.

format (string, required)
Annotation format. One of coco, kitti, nuscenes, or custom.

metadata (object, optional)
Arbitrary key-value pairs for tracking simulator version, scene parameters, or any other context.

List Datasets

Retrieve all datasets belonging to a project with pagination support.
```bash
curl "https://api.lucitra.io/v1/datasets?project_id=proj_abc123&limit=20&offset=0" \
  -H "Authorization: Bearer luci_your_api_key"
```
project_id (string, required)
Filter datasets to this project.

limit (integer, default: 20)
Maximum number of datasets to return.

offset (integer, default: 0)
Number of datasets to skip for pagination.
Response:

```json
{
  "datasets": [
    {
      "id": "ds_7kx9m2",
      "project_id": "proj_abc123",
      "name": "warehouse-v3",
      "format": "coco",
      "scene_count": 5000,
      "total_size_bytes": 2147483648,
      "uploaded_at": "2026-03-06T12:05:00Z"
    }
  ],
  "total": 1
}
```
datasets (array, required)
Array of dataset objects.

total (integer, required)
Total number of datasets matching the query, regardless of limit and offset.
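Since total reports the full match count regardless of limit and offset, a client can page through every dataset deterministically. The sketch below shows that loop, with `fetch_page` standing in for an HTTP call to the list endpoint above; its name and signature are this example's own.

```python
"""Sketch of offset-based pagination over GET /v1/datasets, driven by
the `total` field returned with the first page."""

def page_offsets(total, limit):
    """Offsets needed to cover `total` items, `limit` at a time."""
    return list(range(0, total, limit))

def list_all(fetch_page, limit=20):
    """Collect every dataset across pages. `fetch_page(limit, offset)`
    must return a body shaped like the list response above."""
    first = fetch_page(limit, 0)
    items = list(first["datasets"])
    for off in page_offsets(first["total"], limit)[1:]:
        items.extend(fetch_page(limit, off)["datasets"])
    return items
```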

Get a Single Dataset

Retrieve full details for a specific dataset by ID.
```bash
curl "https://api.lucitra.io/v1/datasets/ds_7kx9m2" \
  -H "Authorization: Bearer luci_your_api_key"
```

Response:

```json
{
  "id": "ds_7kx9m2",
  "project_id": "proj_abc123",
  "name": "warehouse-v3",
  "gcs_path": "gs://lucitra-datasets/proj_abc123/ds_7kx9m2/warehouse-v3.tar.gz",
  "format": "coco",
  "scene_count": 5000,
  "total_size_bytes": 2147483648,
  "metadata": {
    "simulator": "isaac-sim",
    "version": "4.5.0",
    "scene_count": 5000
  },
  "uploaded_at": "2026-03-06T12:05:00Z"
}
```
id (string, required)
Unique dataset identifier.

project_id (string, required)
The project this dataset belongs to.

name (string, required)
Human-readable dataset name.

gcs_path (string, required)
Internal Google Cloud Storage path where the data is stored.

format (string, required)
Annotation format: coco, kitti, nuscenes, or custom.

scene_count (integer, required)
Number of scenes detected in the dataset after upload processing.

total_size_bytes (integer, required)
Total size of the uploaded file in bytes.

metadata (object, optional)
User-provided metadata from dataset creation.

uploaded_at (string, required)
ISO 8601 timestamp of when the upload completed.
Use the gcs_path value when configuring provenance tracking in the compliance engine. It uniquely identifies the stored artifact.
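When wiring gcs_path into other tooling, it can help to split it into bucket and object name. A minimal sketch, assuming the gs:// form shown in the response above:

```python
"""Sketch: split a gs://bucket/object path into its two components."""

def parse_gcs_path(gcs_path):
    """Return (bucket, object_name) from a gs:// URI."""
    if not gcs_path.startswith("gs://"):
        raise ValueError(f"not a GCS path: {gcs_path}")
    bucket, _, obj = gcs_path[len("gs://"):].partition("/")
    return bucket, obj
```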