Issue Description
On May 14 at 4:02 PM PDT, Arindam notified us in Slack that the Surf3 dev cluster (http://surf3devcluster.c19hvxj0isaj.us-west-2.redshift.amazonaws.com/) had been deleted, along with any of its snapshots. Upon investigation, we found that the root user had been used to delete the cluster. The relevant fields from the CloudTrail event are:
- Username: root
- Event Name: DeleteCluster
- Source IP: 73.162.153.91
- User Agent: aws-internal/3 aws-sdk-java/1.11.965 Linux/4.9.230-0.1.ac.224.84.332.metal1.x86_64 OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/Oracle_Corporation
- Resources:
  [
  {"resourceType":"AWS::EC2::VPC","resourceName":"vpc-06c52ff99c03314f6"},
  {"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamRedshiftRole_NoGlue_FullAccess"},
  {"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamRedshiftRoleSurf3"},
  {"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamRedshiftSpectrumGlueRoleSurf3"},
  {"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/xander-mar30-redshift-role"},
  {"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamSurf3GlueNoFullAccessTest"},
  {"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/GlueETLPOCGlueServiceRole"},
  {"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamSurf3DevSpecificARNRole"},
  {"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/e2e_surf3_03302020"},
  {"resourceType":"AWS::EC2::SecurityGroup","resourceName":"sg-01ff6bae5097432a8"},
  {"resourceType":"AWS::Redshift::ClusterSubnetGroup","resourceName":"default"},
  {"resourceType":"AWS::Redshift::Cluster","resourceName":"surf3devcluster"},
  {"resourceType":"AWS::Redshift::ClusterParameterGroup","resourceName":"default.redshift-1.0"}
  ]

The source IP address 73.162.153.91 geolocates to Belmont, California, on Comcast's network, i.e. a user in Belmont connecting through Comcast, as shown by this ip2location lookup:
$ curl "https://api.ip2location.com/v2/?ip=73.162.153.91&key={YOUR_API_KEY}&package=WS24&addon=continent,country,region,city,geotargeting,country_groupings,time_zone_info"
{
  "country_code": "US",
  "country_name": "United States of America",
  "region_name": "California",
  "city_name": "Belmont",
  "latitude": "37.52021",
  "longitude": "-122.2758",
  "zip_code": "94002",
  "time_zone": "-07:00",
  "isp": "Comcast Cable Communications LLC",
  "domain": "comcast.net",
  "net_speed": "DSL",
  "idd_code": "1",
  "area_code": "650",
  "weather_station_code": "USCA0082",
  "weather_station_name": "Belmont",
  "mcc": "-",
  "mnc": "-",
  "mobile_brand": "-",
  "elevation": "12",
  "usage_type": "ISP",
  "credits_consumed": 33,
  "continent": {
    "name": "North America",
    "code": "NA",
    "hemisphere": [
      "north",
      "west"
    ]
  }
}

Full CloudTrail logs for all actions taken by the root user from May 12 to May 18 have been downloaded and saved as a CSV file in Google Drive (here).
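To narrow the exported CSV down to the root user's activity, a minimal Python sketch could look like the following. It assumes the export uses CloudTrail event-history column names such as `Event time`, `User name`, `Event name`, and `Source IP address`, and that event times are ISO 8601 strings in UTC; both are assumptions to check against the actual file header. The function names `root_events` and `load_root_events` are hypothetical.

```python
import csv
from datetime import datetime, timezone
from typing import Iterable, Iterator, List, Mapping, Optional

# ASSUMED column names from a CloudTrail event-history CSV export.
# Check them against the header row of the actual file before use.
TIME_COL = "Event time"
USER_COL = "User name"
EVENT_COL = "Event name"
IP_COL = "Source IP address"


def parse_event_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2021-05-14T15:42:00Z into an aware datetime."""
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalise it first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # CloudTrail times are UTC; treat any timestamp without an offset as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def root_events(
    rows: Iterable[Mapping[str, str]],
    start: datetime,
    end: datetime,
    event_name: Optional[str] = None,
) -> Iterator[Mapping[str, str]]:
    """Yield rows made by the root user with start <= event time < end (aware datetimes)."""
    for row in rows:
        if row.get(USER_COL) != "root":
            continue
        if event_name is not None and row.get(EVENT_COL) != event_name:
            continue
        if start <= parse_event_time(row[TIME_COL]) < end:
            yield row


def load_root_events(
    path: str, start: datetime, end: datetime, event_name: Optional[str] = None
) -> List[Mapping[str, str]]:
    """Read the CSV at `path` and return the matching root-user rows."""
    with open(path, newline="") as f:
        # Materialise the list while the file is still open.
        return list(root_events(csv.DictReader(f), start, end, event_name))
```

Because the bounds are timezone-aware, they can be given in PDT directly, e.g. `pdt = timezone(timedelta(hours=-7))` (with `timedelta` imported from `datetime`) and `load_root_events(path, datetime(2021, 5, 12, tzinfo=pdt), datetime(2021, 5, 19, tzinfo=pdt), "DeleteCluster")`. CloudTrail records event times in UTC, so the 8:42 AM PDT deletion in the timeline would appear as 15:42 UTC in the raw data.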
Timeline
All times are in PDT.
- May 14, 2021 @ 8:42 AM: The DeleteCluster call was initiated from the AWS console
- May 14, 2021 @ 4:02 PM: Arindam notified Todd and Hooman over Slack that the cluster was missing
- May 14, 2021 @ 8:18 PM: Hooman started the investigation
- May 14, 2021 @ 8:38 PM: Hooman completed the investigation and shared his findings with Arindam in Slack
- May 18, 2021 @ 7:30 PM: Hooman documented the root cause analysis here
How to Avoid the Issue in the Future
- The root user credentials have been rotated, and only a small number of infrastructure team members have access to them.
- The root user must never be used to access the infrastructure.
- Alerting will be set up to notify the team whenever the root user is used to log in.
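One way to implement the alerting item is to match CloudTrail records against the root-usage condition from the CIS AWS Foundations Benchmark, whose CloudWatch Logs metric filter pattern is `{ $.userIdentity.type = "Root" && $.userIdentity.invokedBy NOT EXISTS && $.eventType != "AwsServiceEvent" }`. The Python sketch below applies the same condition to parsed CloudTrail records, using the standard record fields `userIdentity.type`, `userIdentity.invokedBy`, `eventType`, and `eventName`. The function names are hypothetical, and the alarm and notification wiring (for example a CloudWatch alarm on the metric filter) is not shown.

```python
import json
from typing import Any, List, Mapping


def is_root_usage(record: Mapping[str, Any]) -> bool:
    """Return True if a CloudTrail record reflects direct use of the root user.

    Mirrors the CIS metric filter: the identity type is "Root", no AWS service
    made the call on the root user's behalf (no invokedBy), and the event is
    not an AwsServiceEvent.
    """
    identity = record.get("userIdentity") or {}
    return (
        identity.get("type") == "Root"
        and "invokedBy" not in identity
        and record.get("eventType") != "AwsServiceEvent"
    )


def is_root_console_login(record: Mapping[str, Any]) -> bool:
    """Narrower check for the login case: a root ConsoleLogin event."""
    return is_root_usage(record) and record.get("eventName") == "ConsoleLogin"


def root_alerts(raw_log: str) -> List[str]:
    """Parse a CloudTrail log file body ({"Records": [...]}) and describe each root usage."""
    records = json.loads(raw_log).get("Records", [])
    return [
        f'{r.get("eventTime")} {r.get("eventName")} from {r.get("sourceIPAddress")}'
        for r in records
        if is_root_usage(r)
    ]
```

The broader `is_root_usage` check flags any direct root activity, not only logins, which matches the intent that the root user is never used for infrastructure access. Running `root_alerts` over a downloaded CloudTrail log file is one way to spot-check the condition before enabling the alarm.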