Policy

Before ingesting or migrating any data, we must understand and validate the source of the data being brought into our environment. This is required for both privacy and security reasons.

This policy applies to all DevOps engineers on the ACE Infra team who are asked to import data.

Procedure and Process

The process and procedures below should be followed for every data onboarding request. Create a ticket in the GitLab roadmap project and document the following details. Note that existing data can be used only if the use case, or a secondary use case, matches. See step 4.

What is the name of the dataset?
Does an SRA exist for the use case? If yes, review the SRA.
Has the data gone through a privacy review? Confirm with Murali.
Is the data from the EU or California?
Is the data already being used by another team in Genentech? If yes, does this use case match the current use of the data by the other team? Please explain.
What is the source of the data?
Does the dataset include study investigator data, patient data, both, or another type of data?
    a. Does the dataset consist of images, data files, or voice recordings?
    b. For each type of data (investigators, patients, etc.) in the dataset, answer the following questions:
    c. Is the data pseudonymized, anonymized, or neither?
    d. Is any personally identifiable information or personal sensitive data included? This can include, but is not limited to, age, birthday, gender, health information, genetic data, medical treatments, and clinical/medical information.
    e. Do voice recordings or images include any personally identifiable information or personal sensitive data?
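As an illustration only, the questions above could be captured as a structured record so that an incomplete ticket is easy to spot. This is a minimal sketch, not an existing tool: the class `OnboardingRequest`, its field names, and the helpers `unanswered` and `blocking_issues` are all hypothetical. The only rule it enforces is the one stated above, that existing data can be used only if the use case matches.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class OnboardingRequest:
    """Hypothetical record of the answers documented in the onboarding ticket."""
    dataset_name: Optional[str] = None
    sra_reviewed: Optional[bool] = None
    privacy_review_done: Optional[bool] = None
    from_eu_or_california: Optional[bool] = None
    used_by_another_team: Optional[bool] = None
    use_case_matches_existing: Optional[bool] = None  # only relevant if reused
    data_source: Optional[str] = None
    data_subjects: Optional[str] = None  # e.g. "investigators", "patients", "both"
    contains_personal_data: Optional[bool] = None


def unanswered(request: OnboardingRequest) -> List[str]:
    """Return the names of questions that still have no answer."""
    missing = []
    for f in fields(request):
        # The use-case match question only applies when another team uses the data.
        if f.name == "use_case_matches_existing" and not request.used_by_another_team:
            continue
        if getattr(request, f.name) is None:
            missing.append(f.name)
    return missing


def blocking_issues(request: OnboardingRequest) -> List[str]:
    """List reasons the request is not ready to proceed (assumed rules)."""
    issues = [f"unanswered: {name}" for name in unanswered(request)]
    if request.used_by_another_team and request.use_case_matches_existing is False:
        issues.append("existing data reuse requires a matching use case")
    return issues
```

An empty list from `blocking_issues` would mean every question has an answer and the reuse rule is satisfied. It does not replace the SRA or privacy review itself.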

Overview Data Flow Diagram