How to Add Data/Masks to Labelbox

[TOC]

Introduction

This document is intended to be read after you have reviewed the official Labelbox documentation, and it assumes an understanding of how data rows work in the tool. Its purpose is to explain how we at Genentech can upload images, DICOM stacks, or masks to Labelbox without having them leave our cloud. Due to privacy concerns, we do not want data leaving any storage controlled by Genentech, and we have had Labelbox disable our ability to upload images or masks directly to their servers. As a result, some of the Labelbox documentation will not work for us, which is where this document comes in. In this document you will learn to:

  1. Link an image from a Genentech S3 to Labelbox
  2. Create a segmentation mask that can be used by Labelbox
  3. Link a segmentation mask from a Genentech S3 to Labelbox

:pencil:
Labelbox only has access to one specific portion of our S3 storage. Talk with DevOps to find out which folder is shared. An example of such a shared folder is the ck9g2vm820ffs0996ouensbsk directory in the ai-labelbox-prod-migration S3 bucket for the ACE AI team. Attempting to reference files from any part of a bucket that is not shared with Labelbox will result in permission-denied errors.


Uploading Images or Videos to Labelbox


:warning: Note that each workspace has its own S3 bucket defined. The example below is for the ACE AI team. Your bucket name will be different if you have a different workspace. Please follow up with your DevOps team to get the name of your bucket.


You will never actually upload an image or a video to Labelbox directly. Instead, you must upload your images or videos to an S3 folder that is shared with Labelbox. Then, when you create a new data row in Labelbox, you reference the S3 file. For example, to add a new data row you could use the command: dataset.create_data_row(row_data="<file location on S3>")

Where an example of <file location on S3> could be: https://ai-labelbox-prod-migration.s3.us-west-2.amazonaws.com/ck9g2vm820ffs0996ouensbsk/CERA/slices/105012_M12_OD_OCT/105012_M12_OD_OCT-038.png

The process is the same whether you are using images such as PNG files, videos, or DICOM files.
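As a minimal sketch of the step above, the helper below builds the HTTPS location of an S3 object and rejects keys outside the shared folder before a data row is created. The function names and the BUCKET, REGION, and SHARED_PREFIX constants are assumptions for illustration (the values are copied from the ACE AI example), and dataset is assumed to be a Labelbox dataset object exposing create_data_row as used above.

```python
from urllib.parse import quote

# Assumed values taken from the ACE AI example above; yours will differ.
BUCKET = "ai-labelbox-prod-migration"
REGION = "us-west-2"
SHARED_PREFIX = "ck9g2vm820ffs0996ouensbsk/"


def s3_https_url(bucket, region, key, shared_prefix):
    """Return the HTTPS location of an S3 object inside the shared folder."""
    if not key.startswith(shared_prefix):
        raise ValueError(f"{key!r} is outside the folder shared with Labelbox")
    # quote() percent-encodes characters that are unsafe in a URL path; "/" is kept.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def add_data_row(dataset, key):
    """Register an already-uploaded S3 object as a data row (hypothetical helper)."""
    url = s3_https_url(BUCKET, REGION, key, SHARED_PREFIX)
    return dataset.create_data_row(row_data=url)
```

Checking the prefix locally is only a convenience: it surfaces a misplaced file early instead of as a permission-denied error inside Labelbox.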

Import a Segmentation Mask To the Project

Creating and Uploading the Segmentation Mask

To upload a segmentation mask to your project, you first upload an image of the mask to an S3 folder that Labelbox can access and then link to that location in your mask-defining JSON file. The mask should be an RGB mask in uint8 format, with each color corresponding to a different class and the dimensions of the mask matching those of the image on which it will be overlaid. The PNG format has worked well previously. Generate the PNG locally, and then upload the mask image to a location on an S3 bucket that Labelbox has permission to access. Once the file has been uploaded, it must be signed so that Labelbox can actually access the file through their servers. To sign the file, use the function generate_presigned_url. For more information on signing files, see the boto3 documentation. An example of signing a file can be seen below:

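import boto3
from botocore.config import Config

# bucket_name and s3_key identify the uploaded mask; the URL expires after 3600 s.
s3 = boto3.client('s3', config=Config(signature_version='s3v4', region_name='us-west-2'))
url = s3.generate_presigned_url(ClientMethod='get_object',
                                Params={'Bucket': bucket_name, 'Key': s3_key},
                                ExpiresIn=3600)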

The output URL will then be used when assigning a mask to a data row.
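The mask itself has to exist before it can be uploaded and signed. As a minimal sketch of that first step, the code below turns a 2-D array of class ids into an RGB uint8 mask and writes it as a PNG. It assumes NumPy and Pillow are installed. The TRAIN_ID_TO_COLOR table and the function names are hypothetical, so substitute your project's own class-to-color mapping.

```python
import numpy as np

# Hypothetical class-id -> RGB color table; use your project's own mapping.
TRAIN_ID_TO_COLOR = {1: (255, 0, 0), 2: (0, 255, 0)}


def class_map_to_rgb(class_map, id_to_color):
    """Turn an (H, W) array of class ids into an (H, W, 3) uint8 RGB mask."""
    class_map = np.asarray(class_map)
    # Pixels whose class id is not in the table stay black (0, 0, 0).
    rgb = np.zeros(class_map.shape + (3,), dtype=np.uint8)
    for class_id, color in id_to_color.items():
        rgb[class_map == class_id] = color
    return rgb


def save_mask_png(rgb, path):
    """Write the RGB mask as a PNG; PNG is lossless, so class colors are preserved."""
    from PIL import Image  # Pillow, imported lazily so the helper above works without it
    Image.fromarray(rgb).save(path)
```

The class map must have the same height and width as the image the mask will be overlaid on. A lossy format such as JPEG would blur the class colors, which is one reason to stay with PNG.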

Uploading Segmentation Masks to A Labelbox Project

To add annotations to a project, you first attach each annotation to a data row and then add the resulting dictionary to the project. Include the signed URL from the previous section in the annotation dictionary you are creating. The way to add a mask URL for a single frame is something like:

color = train_id_to_color[idx]
mask = {'instanceURI': signed_url, 'colorRGB': color}
data = {'uuid': str(uuid4()),
        'schemaId': tool_schema_id,
        'mask': mask,
        'dataRow': {'id': dr.uid}}

Where dr is a data row in a dataset, tool_schema_id is the unique id of the tool you are assigning the mask to, and uuid4 generates a unique identifier using the uuid library (imported with from uuid import uuid4).

Note that even though the mask file might cover multiple classes, you have to create one instance as shown above for each class individually, making sure to match each color in the mask with the correct class.
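The one-annotation-per-class rule can be sketched as a small loop. The function name and the two lookup dictionaries are hypothetical. The dictionary keys (uuid, schemaId, mask, instanceURI, colorRGB, dataRow) are the ones used in the snippet above.

```python
from uuid import uuid4


def build_mask_annotations(data_row_id, signed_url, class_colors, tool_schema_ids):
    """Build one mask annotation per class for a single image.

    class_colors:    {class name: (R, G, B)} as painted in the mask PNG.
    tool_schema_ids: {class name: schema id of the matching Labelbox tool}.
    """
    annotations = []
    for name, color in class_colors.items():
        annotations.append({
            "uuid": str(uuid4()),  # fresh identifier for every annotation
            "schemaId": tool_schema_ids[name],
            "mask": {"instanceURI": signed_url, "colorRGB": list(color)},
            "dataRow": {"id": data_row_id},
        })
    return annotations
```

Every annotation points at the same signed mask URL; only the color and the tool schema id change from class to class.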

To add a segmentation mask annotation to a DICOM file, you add one mask per frame as shown below (not specifying tools here):

frames.append({'index' : slice_n, 'instanceURI' : signed_url})

Once you have a list of all the frames and their corresponding masks, add it to another dictionary, where it fits in as:

data = {
    'uuid': str(uuid4()),
    'groupKey': 'axial',
    'masks': {'frames': frames, 'instances': instances},
    'dataRow': {'id': dr.uid}
}

Where instances is the variable that controls the matching of mask color to class. An example of instances is below:

instances = [{'colorRGB': list(train_id_to_color[idx]),
              'schemaId': tool_name_schema_id[raw_masks_to_use[label]['tool_name']]}
             for idx, label in enumerate(raw_masks_to_use)]
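Putting the DICOM pieces together, the sketch below assembles frames, instances, and the enclosing dictionary in one place. The function name and its arguments are hypothetical: frame_urls is assumed to map each slice number to the signed URL of that slice's mask, and the groupKey default of 'axial' is taken from the example above.

```python
from uuid import uuid4


def build_dicom_mask_annotation(data_row_id, frame_urls, class_colors,
                                tool_schema_ids, group_key="axial"):
    """Assemble the mask annotation for one DICOM data row.

    frame_urls:      {slice number: signed URL of that slice's mask PNG}.
    class_colors:    {class name: (R, G, B)} as painted in the mask PNGs.
    tool_schema_ids: {class name: schema id of the matching Labelbox tool}.
    """
    # One entry per frame, ordered by slice number.
    frames = [{"index": slice_n, "instanceURI": url}
              for slice_n, url in sorted(frame_urls.items())]
    # One entry per class, tying a mask color to its tool.
    instances = [{"colorRGB": list(color), "schemaId": tool_schema_ids[name]}
                 for name, color in class_colors.items()]
    return {
        "uuid": str(uuid4()),
        "groupKey": group_key,
        "masks": {"frames": frames, "instances": instances},
        "dataRow": {"id": data_row_id},
    }
```

Unlike the single-image case, the colors are declared once in instances and apply to every frame, so the same color table must be used when generating each slice's mask.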

Summary

Overall, the important thing to remember is that you need to know which of your S3 directories are shared with Labelbox and put your data there. Additionally, if you are adding segmentation masks, they need to be saved to the correct S3 directories, and you need to produce a signed URL referencing each of them so that Labelbox can read them correctly.