What is grumpy-gopher
grumpy-gopher is a tool that facilitates ETL processes for the Data Engineering team.
It pulls data from the SFTP servers of remote vendors (CROs) into our internal clinical S3 buckets. It consists of a collection of bash scripts, a staging server that runs the scripts to pull data from the CROs, and a separate SFTP server used specifically for pulling internal Tibco/PD data.
Note that there are both UAT and Prod environments, each with its own buckets, and both pull the same data.
Where it is hosted:
| EC2 | IP | ssh Command |
|---|---|---|
| grumpy-gopher | 10.158.21.103 | ssh -i <US_WEST_2_KEY> ec2-user@10.158.21.103 |
What data sources/CROs are being pulled
- Infinata (Felix)
- Endpointclinical (IXRS gRED)
- Bracketglobal (IXRS gRED)
- Pra
- Pd (IXRS PD)
- Ppd
- Covance
S3 bucket locations:
The prod bucket URIs are listed below. Note that there are corresponding UAT bucket locations as well.
- s3://gcore-etl/clinical/prod/felix/
- s3://gcore-etl/clinical/prod/ixrs/gred/
- s3://gcore-etl/clinical/prod/pra/
- s3://gcore-etl/clinical/prod/ixrs/pd/
- s3://gcore-etl/clinical/prod/ppd/
- s3://gcore-etl/clinical/prod/covance/
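The source names and the bucket prefixes above appear to line up through the labels in parentheses (for example, Infinata with Felix, and both IXRS gRED vendors with ixrs/gred). A minimal sketch of that mapping, assuming the correspondence holds. The function name `s3_uri_for` is hypothetical, and the environment segment for UAT is a guess, since only prod URIs are listed on this page:

```shell
# Map a data source name to its S3 URI, following the lists above.
# ASSUMPTION: the UAT layout mirrors prod with "uat" in place of "prod";
# only the prod URIs are documented.
s3_uri_for() {
    environment="$1"   # "prod" (documented) or "uat" (assumed layout)
    src="$2"           # source name, e.g. "pd" or "covance"
    case "$src" in
        infinata)                       prefix="felix" ;;
        endpointclinical|bracketglobal) prefix="ixrs/gred" ;;
        pra)                            prefix="pra" ;;
        pd)                             prefix="ixrs/pd" ;;
        ppd)                            prefix="ppd" ;;
        covance)                        prefix="covance" ;;
        *) echo "unknown source: $src" >&2; return 1 ;;
    esac
    echo "s3://gcore-etl/clinical/${environment}/${prefix}/"
}

s3_uri_for prod pd         # prints s3://gcore-etl/clinical/prod/ixrs/pd/
s3_uri_for prod infinata   # prints s3://gcore-etl/clinical/prod/felix/
```

Verify any UAT URI against the actual bucket before relying on it.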
How it runs:
The script runs as a cron job that transfers the same data to both the UAT and PROD S3 environments. It currently runs under the user xander. To see the crontab file, switch to the xander user:
sudo su xander
crontab -l
Here is what the crontab file looks like:
0 7 * * 0 /home/xander/grumpy-gopher -v --uat infinata
0 1,7,13,19 * * * /home/xander/grumpy-gopher -v --uat endpointclinical
0 3,9,15,21 * * * /home/xander/grumpy-gopher -v --uat bracketglobal
30 3,15 * * * /home/xander/grumpy-gopher -v --uat pra
30 6,12,18,0 * * * /home/xander/grumpy-gopher -v --uat pd
45 6,18 * * * /home/xander/grumpy-gopher-ppd -v --dev ppd
30 8,20 * * * /home/xander/grumpy-gopher-ppd -v --uat ppd
0 5,17,23 * * * /home/xander/grumpy-gopher-v1 -v --uat covance
0 2,8,14,20 * * * /home/xander/grumpy-gopher --prod endpointclinical
0 4,10,16,22 * * * /home/xander/grumpy-gopher --prod bracketglobal
30 4,16 * * * /home/xander/grumpy-gopher --prod pra
30 7,13,19,1 * * * /home/xander/grumpy-gopher --prod pd
0 8 * * 0 /home/xander/grumpy-gopher --prod infinata
30 10,22 * * * /home/xander/grumpy-gopher --prod ppd
0 6,18,0 * * * /home/xander/grumpy-gopher-v1 -v --prod covance
The log file is located at:
tail -f /var/log/messages
Code Location:
Grumpy-gopher code is hosted at: https://code.roche.com/ace/gcore-platform/-/tree/released/legacy/scripts
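As a reading aid for the crontab schedule listed earlier: the first two fields of each entry are the minute and a comma-separated list of hours, and in standard cron a `0` in the fifth field means Sunday. The sketch below expands those two fields into run times. The helper name `cron_times` is hypothetical, and it only handles the plain comma-list form used in this crontab, not ranges or steps:

```shell
# Expand the minute and hour fields of a crontab entry into HH:MM run times.
# Handles only "N" and "N,N,..." hour fields, as used in the crontab above.
cron_times() {
    minute="$1"
    hours="$2"
    # Split the comma-separated hour list into one hour per line.
    echo "$hours" | tr ',' '\n' | while read -r hour; do
        printf '%02d:%02d\n' "$hour" "$minute"
    done
}

# The UAT pd entry is "30 6,12,18,0 * * *":
cron_times 30 6,12,18,0
# prints:
# 06:30
# 12:30
# 18:30
# 00:30
```

Comparing the output for a UAT entry and its prod counterpart shows that the two environments are staggered rather than run at the same time.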
Tibco data
Note that PD data comes from Roche Tibco and is first transferred to an intermediary SFTP server that stores the files. Grumpy-gopher then transfers these files from the internal ACE SFTP server to the gcore-etl S3 buckets.
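The two hops described above can be sketched roughly as follows. This is not the actual grumpy-gopher code, only an illustration built from standard `sftp` batch mode and `aws s3 sync`. The user `sftpuser` (taken from the sshd_config comment on this page), the staging directory, the remote path, and the helper names `run` and `transfer_pd` are all assumptions. It defaults to a dry run that only prints the commands:

```shell
# Sketch of the PD flow: internal SFTP server -> local staging dir -> S3.
# NOT the real grumpy-gopher script; several values are assumptions.
SFTP_HOST="10.158.21.159"                       # internal sftp server (from this page)
SFTP_USER="sftpuser"                            # ASSUMED: user named in the sshd_config comment
STAGING_DIR="${STAGING_DIR:-/tmp/pd-staging}"   # HYPOTHETICAL local staging directory
S3_TARGET="s3://gcore-etl/clinical/prod/ixrs/pd/"
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print commands only (default)

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

transfer_pd() {
    run mkdir -p "$STAGING_DIR"
    # Hop 1: fetch the files from the SFTP root into the staging directory.
    # Batch mode reads sftp commands from stdin and needs key-based auth.
    run sftp -b - "$SFTP_USER@$SFTP_HOST" <<EOF
lcd $STAGING_DIR
mget *
EOF
    # Hop 2: copy the staged files into the PD prefix in S3.
    run aws s3 sync "$STAGING_DIR" "$S3_TARGET"
}

transfer_pd
```

With `DRY_RUN=1` nothing is contacted, so the sketch is safe to read through and adapt before pointing it at real hosts.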
SFTP server
Hosted on: 10.158.21.159
Name: sftp
On the SFTP server the config is located at:
/etc/ssh/sshd_config
Config is:
#### sftp user [{'name': 'sftpuser', 'status': 'active'}, 'sftpgroup'] - inserted by ansible ####
Match Group sftpgroup
ChrootDirectory "/opt/sftp/pd"
ForceCommand internal-sftp
PermitTunnel no
AllowAgentForwarding no
X11Forwarding no
AllowTcpForwarding no
#### sftp user [{'name': 'sftpuser', 'status': 'active'}, 'sftpgroup'] - inserted by ansible ####
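After editing this file, it can help to confirm that the `Match Group` block still forces `internal-sftp` and sets a chroot. The sketch below does a simple text check with `awk`. The function name `check_sftp_match` is hypothetical, and the check is case-sensitive and line-based, so it is a quick sanity check rather than a full sshd_config parser:

```shell
# Check that the "Match Group <group>" block in an sshd_config-style file
# sets ChrootDirectory and forces internal-sftp. Prints what it finds and
# returns non-zero if either setting is missing or unexpected.
check_sftp_match() {
    config="$1"   # path to the config file
    group="$2"    # group name on the "Match Group" line
    awk -v group="$group" '
        # Any Match line starts a new block; note whether it is ours.
        $1 == "Match" { in_block = ($2 == "Group" && $3 == group); next }
        in_block && $1 == "ChrootDirectory" { chroot = $2 }
        in_block && $1 == "ForceCommand"    { force = $2 }
        END {
            print "ChrootDirectory=" chroot
            print "ForceCommand=" force
            exit !(chroot != "" && force == "internal-sftp")
        }
    ' "$config"
}

# Usage on the SFTP server (reading the file may require sudo):
#   check_sftp_match /etc/ssh/sshd_config sftpgroup
```

For a syntax check of the whole file, `sshd -t` runs sshd in test mode and reports configuration errors without starting the daemon.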