nGene API

nGene - What is it?

nGene is a clinical data platform that integrates internal and external clinical data across all modalities and studies, and gives users access to that data through a unified interface.

ngene-Business_Architecture

Source: https://docs.google.com/presentation/d/1Rb9NKtwN0GeWVB_HAPslE0_Zti4ZnSrtuQ6TqOK-CCo/edit#slide=id.g157a51256bf_0_3164

nGene API - Technical Architecture

nGene-API-Architecture-14SEP2022 (2)

Source: https://lucid.app/lucidchart/50c00c4e-570e-493b-b2e8-6787f1dc4c5a/edit?invitationId=inv_a9e5bb73-0a56-4966-8570-3bd123f76b19&page=sH7B5EqIqGX8z#

nGene - Code Base

  1. Infrastructure code will be owned by ACE-Infra Team.

    Infrastructure components include the K8s namespace, role, role binding, user-to-role assignment, ECR access policy, EKS cluster access, Cognito user pool, and Cognito user pool access.

    Terraform files: https://github.com/gred-ecdi/terraform-ace-prod/tree/master/us-west-2/infra-ngene-api-test-usw2

  2. K8s component will be owned by Data Engineering Team.

    K8s components include the ACM certificate, Route 53 DNS, EKS role for Neptune access, Helm chart, K8s deployment, ingress, service, service account, ConfigMap, and HPA.

    Terraform files: https://github.com/gred-ecdi/ace-de-data-platform/tree/feat/deployment/infra/deployment/test

nGene - CI/CD

To speed up deployments, we provide a CI/CD pipeline (Lucid Chart: ngene-ci_cd).

  1. Any change to the code base via push_request on the develop branch triggers the creation of a Docker image, which is then pushed to the GitHub repository and ECR.

  2. To push Docker images to ECR, we use self-hosted runners. Based on this policy, self-hosted runners have full access to all repositories on ECR. Once a self-hosted GitHub runner has authenticated to AWS, it can push to and pull from the registry.

  3. K8s components of the app are managed as IaC with Terraform. If a new Docker image is created or any Terraform file changes, the Terraform workflow is triggered.

  4. On TF Cloud, a team called de has been created for data engineering. We added the team's API key to the repository's GitHub secret TF_API_TOKEN, which lets the runner authenticate to TF Cloud.

  5. Finally, the TF Cloud Agent runs terraform init, terraform validate, terraform plan, and terraform apply against the ace-test EKS cluster.
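
The trigger rule in steps 3 and 5 can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual workflow definition: the function names are hypothetical, and recognising Terraform files by extension is an assumption (the real workflow may filter on repository paths instead).

```python
from typing import Iterable, List

# Assumption: Terraform files are recognised by extension. The real workflow
# may use path filters instead.
TERRAFORM_SUFFIXES = (".tf", ".tfvars")

# Stages the TF Cloud Agent runs, in the order given in step 5.
TERRAFORM_STAGES = ["init", "validate", "plan", "apply"]


def should_run_terraform(changed_files: Iterable[str], image_built: bool) -> bool:
    """Return True if a new image was built OR any Terraform file changed."""
    terraform_changed = any(
        path.endswith(TERRAFORM_SUFFIXES) for path in changed_files
    )
    return image_built or terraform_changed


def planned_commands(changed_files: Iterable[str], image_built: bool) -> List[str]:
    """List the terraform commands that would run, or nothing if not triggered."""
    if not should_run_terraform(changed_files, image_built):
        return []
    return [f"terraform {stage}" for stage in TERRAFORM_STAGES]
```

For example, `planned_commands(["README.md"], image_built=False)` yields an empty list, while a change to any `.tf` file yields all four stages.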

nGene - API User Access Management

User authentication is managed manually via AWS Cognito. The nGene team has not requested authorization.

SSO Login —> https://wam.roche.com/idp/startSSO.ping?PartnerSpId=urn:amazon:webservices_ACEPROD.
Cognito Userpool —> https://us-west-2.console.aws.amazon.com/cognito/users/?region=us-west-2#/pool/us-west-2_SJHnUP2Hr/details?_k=ou0lb8

Authentication and authorization via enterprise Single Sign-On (PingFederate) or Keycloak are being considered.
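
For orientation, tokens issued by a Cognito user pool carry an `iss` claim derived from the region and pool ID, and the pool's public signing keys are published at a well-known JWKS URL. The sketch below builds both values for the user pool linked above and performs a basic claim check. It is an assumption that the nGene API validates tokens this way; the helper names are hypothetical, and the check deliberately omits signature verification, which a real service must perform against the JWKS keys.

```python
import time
from typing import Optional

REGION = "us-west-2"
USER_POOL_ID = "us-west-2_SJHnUP2Hr"  # taken from the Cognito console URL


def cognito_issuer(region: str, user_pool_id: str) -> str:
    """Issuer (`iss`) value that Cognito puts into tokens from this pool."""
    return f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"


def cognito_jwks_url(region: str, user_pool_id: str) -> str:
    """Location of the pool's public keys, used to verify token signatures."""
    return cognito_issuer(region, user_pool_id) + "/.well-known/jwks.json"


def claims_look_valid(claims: dict, region: str, user_pool_id: str,
                      now: Optional[float] = None) -> bool:
    """Check issuer and expiry only. Signature verification is NOT done here."""
    current = time.time() if now is None else now
    issuer_ok = claims.get("iss") == cognito_issuer(region, user_pool_id)
    not_expired = claims.get("exp", 0) > current
    return issuer_ok and not_expired
```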

nGene - Data Engineering Namespace Access

The Data Engineering team has access to the ngene namespace to inspect ingress, pod, and service logs.
Since SSO hasn't been enabled for the Data Engineering team, access is provisioned by adding team members to the aws-auth ConfigMap in the kube-system namespace.

Refer to the code below:

https://github.com/gred-ecdi/terraform-ace-prod/blob/master/global/iam-user-mgmt/main.tf
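
As a rough illustration of what such a mapping looks like, each IAM user added to the aws-auth ConfigMap becomes a `mapUsers` entry with `userarn`, `username`, and `groups` keys. The sketch below builds such entries in Python. The account ID, user name, and group name are hypothetical placeholders, and the actual entries are generated by the Terraform code linked above.

```python
from typing import Dict, List


def map_user_entry(account_id: str, iam_user: str, groups: List[str]) -> Dict:
    """Build one aws-auth `mapUsers` entry for an IAM user."""
    return {
        "userarn": f"arn:aws:iam::{account_id}:user/{iam_user}",
        "username": iam_user,
        "groups": list(groups),
    }


def map_users(account_id: str, iam_users: List[str], groups: List[str]) -> List[Dict]:
    """Build entries for a whole team that shares the same K8s groups."""
    return [map_user_entry(account_id, user, groups) for user in iam_users]


# Hypothetical values, for illustration only.
entries = map_users("111122223333", ["jane.doe"], ["ngene-viewers"])
```

The group named in each entry only grants access once a role binding in the ngene namespace refers to it.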

nGene API & Neptune Connectivity: AWS SigV4 signer sidecar

To use IAM authentication for Neptune, the application must sign its requests with the AWS SigV4 signing process so that it can authenticate to NeptuneDB. The AWS SDK handles this automatically if the application uses it.
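
To give a sense of what the signing process involves, the sketch below shows the SigV4 signing-key derivation, a chain of HMAC-SHA256 operations over the date, region, and service name. It covers only this one step. Building the canonical request and the string to sign is omitted, and the credentials in the usage line are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Single HMAC-SHA256 step used throughout the SigV4 key derivation."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str,
                       region: str, service: str) -> bytes:
    """Derive the SigV4 signing key. date_stamp has the form YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Hex signature that ends up in the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


# Placeholder secret; "neptune-db" is the service name used for Neptune.
key = derive_signing_key("EXAMPLE-SECRET", "20220914", "us-west-2", "neptune-db")
```

Because the key depends on the date, region, and service, a signature produced for one of them is not valid for another.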

For nGene, developers need to be able to use Neptune and non-Neptune graph databases interchangeably. That is, they don't want to use the AWS SDK but rather standard Python/Gremlin drivers. We use an aws4-proxy sidecar to achieve this.

aws4-proxy is a small sidecar proxy service that runs alongside the application and exposes a local port that the application uses as if it were the NeptuneDB endpoint. The proxy intercepts every request, signs it, and forwards it to the real NeptuneDB endpoint with proper authentication.

Currently, the sidecar is configured as follows (as another container inside the pod):

- name: aws-sigv4-sidecar
  image: "{{ .Values.sidecar.image.repository }}:{{ .Values.sidecar.image.tag }}"
  imagePullPolicy: {{ .Values.sidecar.image.pullPolicy }}
  args:
  # Service name and region used for SigV4 signing
  - --service=neptune-db
  - --region=us-west-2
  # Real NeptuneDB endpoint and port that signed requests are sent to
  - --endpoint=ngene-api-test-cluster.cluster-ro-c7cfiskz6bjy.us-west-2.neptune.amazonaws.com
  - --endpoint-port=8182
  # Local port exposed to the application container
  - --port=8080
  ports:
  - name: http
    containerPort: 8080
    protocol: TCP

This exposes http://localhost:8080 to the application as if it were the NeptuneDB endpoint. IAM credentials are picked up automatically from the IAM role assigned to the pod (IRSA).
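
From the application's side, a query then looks like an ordinary unauthenticated call to a Gremlin server. The sketch below is a minimal example using only the Python standard library and Neptune's HTTP Gremlin endpoint (`POST /gremlin` with a JSON body). It assumes the sidecar forwards the request path unchanged. The function names are hypothetical, and nGene itself may use a Gremlin driver pointed at the same local port rather than raw HTTP.

```python
import json
import urllib.request

SIDECAR_URL = "http://localhost:8080"  # local port exposed by the sidecar


def build_gremlin_request(query: str,
                          base_url: str = SIDECAR_URL) -> urllib.request.Request:
    """Build a plain HTTP request; no AWS credentials or signing code needed."""
    body = json.dumps({"gremlin": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/gremlin",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str) -> dict:
    """Send the query through the sidecar (requires the pod environment)."""
    request = build_gremlin_request(query)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

Swapping Neptune for another Gremlin-compatible database then only means changing the base URL, which is what keeps the application code free of AWS-specific logic.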