Grumpy Gopher Tool

What is grumpy-gopher

grumpy-gopher is a tool that facilitates ETL processes for the Data Engineering team.

It pulls data from the SFTP servers of remote vendor CROs into our internal clinical S3 buckets. It consists of a collection of bash scripts, a staging server that runs the scripts to pull data from the CROs, and a separate SFTP server used specifically for pulling internal Tibco/PD data.

Note that there are both UAT and Prod environments, each with its own buckets. Both environments pull the same data.

Where it is hosted:

EC2: grumpy-gopher

IP: 10.158.21.103

ssh Command: ssh -i <US_WEST_2_KEY> ec2-user@10.158.21.103

What data sources/CROs are being pulled:

  • Infinata (Felix)
  • Endpointclinical (IXRS gRED)
  • Bracketglobal (IXRS gRED)
  • Pra
  • Pd (IXRS PD)
  • Ppd
  • Covance

S3 bucket locations:

The bucket URIs are listed below. Note that there are corresponding UAT bucket locations as well.

  • s3://gcore-etl/clinical/prod/felix/
  • s3://gcore-etl/clinical/prod/ixrs/gred/
  • s3://gcore-etl/clinical/prod/pra/
  • s3://gcore-etl/clinical/prod/ixrs/pd/
  • s3://gcore-etl/clinical/prod/ppd/
  • s3://gcore-etl/clinical/prod/covance/
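
Because the UAT locations mirror the prod ones, a small helper can build either URI from an environment name and a source path. This is only a sketch and is not part of grumpy-gopher: the function name bucket_uri is hypothetical, and the "uat" path segment is an assumption inferred from the prod URIs above, so verify it against the real buckets before relying on it.

```shell
# Hypothetical helper, not part of grumpy-gopher.
# Builds the bucket URI for an environment and a source path.
# ASSUMPTION: UAT URIs differ from the prod URIs only in the
# environment segment ("uat" instead of "prod").
bucket_uri() {
  env_name=$1      # "prod" or "uat" (the "uat" segment is assumed)
  source_path=$2   # e.g. "felix", "ixrs/gred", "pra"
  printf 's3://gcore-etl/clinical/%s/%s/\n' "$env_name" "$source_path"
}

# Print the prod URI for every source path documented above.
for p in felix ixrs/gred pra ixrs/pd ppd covance; do
  bucket_uri prod "$p"
done
```

With the AWS CLI installed and credentials configured, aws s3 ls "$(bucket_uri prod felix)" would list the contents of the Felix prod location.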

How it runs:

The script runs as a cron job that transfers the same data to the UAT and PROD S3 environments. It currently runs under the user xander. To see the crontab file, switch to the xander user:

sudo su xander
crontab -l

Here is what the crontab file looks like:

0   7          * * 0 /home/xander/grumpy-gopher -v --uat infinata
0   1,7,13,19  * * * /home/xander/grumpy-gopher -v --uat endpointclinical
0   3,9,15,21  * * * /home/xander/grumpy-gopher -v --uat bracketglobal
30  3,15       * * * /home/xander/grumpy-gopher -v --uat pra
30  6,12,18,0  * * * /home/xander/grumpy-gopher -v --uat pd
45   6,18      * * * /home/xander/grumpy-gopher-ppd -v --dev ppd
30 8,20      * * * /home/xander/grumpy-gopher-ppd -v --uat ppd
0   5,17,23     * * * /home/xander/grumpy-gopher-v1 -v --uat covance

0  2,8,14,20  * * * /home/xander/grumpy-gopher --prod endpointclinical
0  4,10,16,22 * * * /home/xander/grumpy-gopher --prod bracketglobal
30 4,16       * * * /home/xander/grumpy-gopher --prod pra
30 7,13,19,1  * * * /home/xander/grumpy-gopher --prod pd
0  8          * * 0 /home/xander/grumpy-gopher --prod infinata
30 10,22       * * * /home/xander/grumpy-gopher --prod ppd
0 6,18,0     * * * /home/xander/grumpy-gopher-v1 -v --prod covance
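
Each crontab entry starts with five schedule fields (minute, hour, day of month, month, day of week) followed by the command. A day-of-week value of 0 means Sunday, so the infinata jobs run once a week while the other jobs run several times a day. The sketch below splits an entry into those fields to make a schedule easier to read. The function name describe_cron_line is hypothetical and not part of grumpy-gopher, and the times are whatever time zone cron uses on the host, which is not stated here.

```shell
# Hypothetical helper, not part of grumpy-gopher.
# Splits one crontab entry into its five schedule fields and the command.
# The subshell body "( ... )" keeps the variables out of the caller's scope.
describe_cron_line() (
  printf '%s\n' "$1" | {
    # read splits on runs of whitespace; the last variable takes the rest.
    read -r minute hour dom month dow command
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
      "$minute" "$hour" "$dom" "$month" "$dow"
    printf 'command=%s\n' "$command"
  }
)

# Weekly job: minute 0, hour 7, day-of-week 0 (Sunday).
describe_cron_line '0   7          * * 0 /home/xander/grumpy-gopher -v --uat infinata'

# Twice-daily job: minute 30 of hours 4 and 16, every day.
describe_cron_line '30 4,16       * * * /home/xander/grumpy-gopher --prod pra'
```

Comparing the two halves of the crontab this way shows that each prod job is scheduled one hour after its UAT counterpart for endpointclinical, bracketglobal, pra, pd and infinata.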

The log file is located at /var/log/messages. To follow it as it is written:

tail -f /var/log/messages
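
/var/log/messages is shared with other services, so filtering helps when looking for one transfer. The helper below is a sketch only: the name gopher_log is hypothetical, and it assumes that grumpy-gopher log lines contain both the string "grumpy-gopher" and the source name (for example "pra"), which should be checked against real log output. Reading the file may also require sudo.

```shell
# Hypothetical helper, not part of grumpy-gopher.
# Prints log lines that mention grumpy-gopher and a given source.
# ASSUMPTION: log lines contain the string "grumpy-gopher" and the
# source name passed on the crontab command line (e.g. "pra").
gopher_log() {
  source_name=$1
  log_file=${2:-/var/log/messages}   # default to the documented log file
  grep 'grumpy-gopher' "$log_file" | grep -- "$source_name"
}
```

For example, gopher_log covance would show only the lines for the covance transfers, and a second argument can point the helper at a rotated log file.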

Code Location:

Grumpy-gopher code is hosted at: https://code.roche.com/ace/gcore-platform/-/tree/released/legacy/scripts

Tibco data

Note that PD data comes from Roche Tibco, which transfers it to an intermediary SFTP server that stores the files. Grumpy-gopher then transfers these files from the internal ACE SFTP server to the gcore-etl S3 buckets.

SFTP server

Hosted on: 10.158.21.159

Name: sftp

On the SFTP server the config is located at: /etc/ssh/sshd_config

Config is:

#### sftp user [{'name': 'sftpuser', 'status': 'active'}, 'sftpgroup'] - inserted by ansible ####
Match Group sftpgroup
      ChrootDirectory "/opt/sftp/pd" 
      ForceCommand internal-sftp 
      PermitTunnel no
      AllowAgentForwarding no
      X11Forwarding no
      AllowTcpForwarding no	
#### sftp user [{'name': 'sftpuser', 'status': 'active'}, 'sftpgroup'] - inserted by ansible ####
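
Taken together, these directives restrict members of sftpgroup to SFTP only (ForceCommand internal-sftp), confine them to /opt/sftp/pd (ChrootDirectory), and disable tunnelling and forwarding. In sshd_config a Match block runs until the next Match line or the end of the file. The sketch below prints just the block for one group so it can be reviewed without reading the whole file. The function name match_block is hypothetical, and it only handles the simple "Match Group <name>" form shown above.

```shell
# Hypothetical helper, not part of grumpy-gopher or the ansible role.
# Prints the "Match Group <name>" block of an sshd_config file.
match_block() {
  config_file=$1
  group_name=$2
  awk -v group="$group_name" '
    # Every Match line starts a new block; note whether it is the wanted one.
    $1 == "Match" { in_block = ($2 == "Group" && $3 == group) }
    # Inside the wanted block, print non-empty lines that are not comments.
    in_block && NF && $1 !~ /^#/ { print }
  ' "$config_file"
}
```

On the SFTP server, match_block /etc/ssh/sshd_config sftpgroup would print the block above without the ansible marker comments. After any change to the file, sudo sshd -t checks the configuration for validity before the service is restarted.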