Datasets

Upload, manage, and query synthetic training datasets. Lucitra accepts data in COCO, KITTI, nuScenes, or custom formats and stores it on Google Cloud Storage with signed upload URLs.

Supported Formats

COCO

Object detection, instance segmentation, and keypoints. The most common format for 2D vision tasks.

KITTI

3D bounding boxes, point clouds, and stereo pairs. Standard for autonomous driving benchmarks.

nuScenes

Multi-sensor, multi-frame sequences with ego pose. Designed for full autonomous driving stacks.

Custom

Bring your own annotation schema. Define a format adapter and Lucitra handles the rest.
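As an illustration of what a format adapter does, the sketch below maps a hypothetical custom annotation record to a COCO-style one. The input field names (`box`, `label`) and the function shape are assumptions for this example, not Lucitra's actual adapter interface.

```python
"""Illustrative sketch: translating a custom annotation record into a
COCO-style annotation. The custom schema here is hypothetical."""

def to_coco(record, image_id, ann_id, category_ids):
    """Map {"box": [x1, y1, x2, y2], "label": str} to a COCO annotation,
    converting corner coordinates to [x, y, width, height]."""
    x1, y1, x2, y2 = record["box"]
    w, h = x2 - x1, y2 - y1
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_ids[record["label"]],
        "bbox": [x1, y1, w, h],
        "area": w * h,
        "iscrowd": 0,
    }
```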

Create a Dataset

Creating a dataset returns a time-limited signed URL for uploading your data file directly to cloud storage.
Step 1: Create the dataset record

Send a POST request with your project ID, dataset name, format, and optional metadata.
```bash
curl -X POST https://api.lucitra.io/v1/datasets \
  -H "Authorization: Bearer luci_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "proj_abc123",
    "name": "warehouse-v3",
    "format": "coco",
    "metadata": {
      "simulator": "isaac-sim",
      "version": "4.5.0",
      "scene_count": 5000
    }
  }'
```

Response:

```json
{
  "id": "ds_7kx9m2",
  "upload_url": "https://storage.googleapis.com/lucitra-datasets/...",
  "expires_at": "2026-03-06T13:00:00Z"
}
```
id (string, required)
Unique dataset identifier. Use this in validation and report endpoints.

upload_url (string, required)
Pre-signed GCS URL for uploading your data file. Valid for 1 hour.

expires_at (string, required)
ISO 8601 timestamp when the upload URL expires.
Step 2: Upload your data file

Use the signed URL from the response to upload your dataset archive via a PUT request.
```bash
curl -X PUT "${UPLOAD_URL}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @warehouse-v3.tar.gz
```
The upload URL expires after 1 hour. If it expires before your upload completes, create a new dataset to get a fresh URL.
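The two steps above can be sketched as a small Python client using only the standard library. The endpoint and header shapes follow the curl examples in this section; the helper names and the `metadata` handling are this sketch's own.

```python
"""Sketch of the two-step dataset upload flow: create the record, then
PUT the archive to the signed URL before it expires."""
import json
import urllib.request

API_BASE = "https://api.lucitra.io/v1"

def build_create_payload(project_id, name, fmt, metadata=None):
    """Assemble the request body for POST /v1/datasets; metadata is optional."""
    payload = {"project_id": project_id, "name": name, "format": fmt}
    if metadata:
        payload["metadata"] = metadata
    return payload

def create_dataset(api_key, payload):
    """Create the dataset record; returns the parsed JSON response
    containing id, upload_url, and expires_at."""
    req = urllib.request.Request(
        f"{API_BASE}/datasets",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def upload_archive(upload_url, path):
    """PUT the dataset archive to the signed URL from create_dataset."""
    with open(path, "rb") as f:
        req = urllib.request.Request(
            upload_url,
            data=f.read(),
            headers={"Content-Type": "application/octet-stream"},
            method="PUT",
        )
        urllib.request.urlopen(req)
```

Because the URL is valid for only an hour, call `upload_archive` promptly after `create_dataset` rather than batching creations up front.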

Request Body

project_id (string, required)
The project this dataset belongs to.

name (string, required)
A human-readable name for the dataset.

format (string, required)
Annotation format. One of coco, kitti, nuscenes, or custom.

metadata (object, optional)
Arbitrary key-value pairs for tracking simulator version, scene parameters, or any other context.

List Datasets

Retrieve all datasets belonging to a project with pagination support.
```bash
curl "https://api.lucitra.io/v1/datasets?project_id=proj_abc123&limit=20&offset=0" \
  -H "Authorization: Bearer luci_your_api_key"
```
project_id (string, required)
Filter datasets to this project.

limit (integer, default: 20)
Maximum number of datasets to return.

offset (integer, default: 0)
Number of datasets to skip for pagination.
Response:

```json
{
  "datasets": [
    {
      "id": "ds_7kx9m2",
      "project_id": "proj_abc123",
      "name": "warehouse-v3",
      "format": "coco",
      "scene_count": 5000,
      "total_size_bytes": 2147483648,
      "uploaded_at": "2026-03-06T12:05:00Z"
    }
  ],
  "total": 1
}
```
datasets (array, required)
Array of dataset objects.

total (integer, required)
Total number of datasets matching the query, regardless of limit and offset.
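Since total reports the full match count regardless of limit and offset, a client can page through every dataset deterministically. The sketch below shows that loop, with `fetch_page` standing in for an HTTP call to the list endpoint above; its name and signature are this example's own.

```python
"""Sketch of offset-based pagination over GET /v1/datasets, driven by
the `total` field returned with the first page."""

def page_offsets(total, limit):
    """Offsets needed to cover `total` items, `limit` at a time."""
    return list(range(0, total, limit))

def list_all(fetch_page, limit=20):
    """Collect every dataset across pages. `fetch_page(limit, offset)`
    must return a body shaped like the list response above."""
    first = fetch_page(limit, 0)
    items = list(first["datasets"])
    for off in page_offsets(first["total"], limit)[1:]:
        items.extend(fetch_page(limit, off)["datasets"])
    return items
```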

Get a Single Dataset

Retrieve full details for a specific dataset by ID.
```bash
curl "https://api.lucitra.io/v1/datasets/ds_7kx9m2" \
  -H "Authorization: Bearer luci_your_api_key"
```

Response:

```json
{
  "id": "ds_7kx9m2",
  "project_id": "proj_abc123",
  "name": "warehouse-v3",
  "gcs_path": "gs://lucitra-datasets/proj_abc123/ds_7kx9m2/warehouse-v3.tar.gz",
  "format": "coco",
  "scene_count": 5000,
  "total_size_bytes": 2147483648,
  "metadata": {
    "simulator": "isaac-sim",
    "version": "4.5.0",
    "scene_count": 5000
  },
  "uploaded_at": "2026-03-06T12:05:00Z"
}
```
id (string, required)
Unique dataset identifier.

project_id (string, required)
The project this dataset belongs to.

name (string, required)
Human-readable dataset name.

gcs_path (string, required)
Internal Google Cloud Storage path where the data is stored.

format (string, required)
Annotation format: coco, kitti, nuscenes, or custom.

scene_count (integer, required)
Number of scenes detected in the dataset after upload processing.

total_size_bytes (integer, required)
Total size of the uploaded file in bytes.

metadata (object, optional)
User-provided metadata from dataset creation.

uploaded_at (string, required)
ISO 8601 timestamp of when the upload completed.
Use the gcs_path value when configuring provenance tracking in the compliance engine. It uniquely identifies the stored artifact.
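When wiring gcs_path into other tooling, it can help to split it into bucket and object name. A minimal sketch, assuming the gs:// form shown in the response above:

```python
"""Sketch: split a gs://bucket/object path into its two components."""

def parse_gcs_path(gcs_path):
    """Return (bucket, object_name) from a gs:// URI."""
    if not gcs_path.startswith("gs://"):
        raise ValueError(f"not a GCS path: {gcs_path}")
    bucket, _, obj = gcs_path[len("gs://"):].partition("/")
    return bucket, obj
```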