AWS Infrastructure: Redshift Cluster Deletion

Issue Description

On May 14th at 4:02 PM PDT, Arindam notified us in Slack that the Surf3 dev cluster (http://surf3devcluster.c19hvxj0isaj.us-west-2.redshift.amazonaws.com/) had been deleted along with any snapshots. Upon investigation, we found that the root user had been used to delete the cluster. Here are the relevant log lines from CloudTrail:

  • Username: root
  • Event Name: DeleteCluster
  • Source IP: 73.162.153.91
  • User Agent: aws-internal/3 aws-sdk-java/1.11.965 Linux/4.9.230-0.1.ac.224.84.332.metal1.x86_64 OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/Oracle_Corporation
  • Resources:
{"resourceType":"AWS::EC2::VPC","resourceName":"vpc-06c52ff99c03314f6"},
{"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamRedshiftRole_NoGlue_FullAccess"},
{"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamRedshiftRoleSurf3"},
{"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamRedshiftSpectrumGlueRoleSurf3"},
{"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/xander-mar30-redshift-role"},
{"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamSurf3GlueNoFullAccessTest"},
{"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/GlueETLPOCGlueServiceRole"},
{"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/AdamSurf3DevSpecificARNRole"},
{"resourceType":"AWS::IAM::Role","resourceName":"arn:aws:iam::319647376096:role/e2e_surf3_03302020"},
{"resourceType":"AWS::EC2::SecurityGroup","resourceName":"sg-01ff6bae5097432a8"},
{"resourceType":"AWS::Redshift::ClusterSubnetGroup","resourceName":"default"},
{"resourceType":"AWS::Redshift::Cluster","resourceName":"surf3devcluster"},
{"resourceType":"AWS::Redshift::ClusterParameterGroup","resourceName":"default.redshift-1.0"}
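
The fields listed above correspond to the `userIdentity.type`, `eventName`, `sourceIPAddress`, and `userAgent` keys of a CloudTrail record. As a minimal sketch, assuming the records have already been parsed into Python dictionaries, a filter like the following could pull out root-user calls for a given event name. The function name `root_events` and the sample records are hypothetical and use a documentation-range IP address, not data from this incident.

```python
from typing import Any, Dict, Iterable, List


def root_events(records: Iterable[Dict[str, Any]], event_name: str) -> List[Dict[str, str]]:
    """Summarize every record in which the root user called `event_name`."""
    matches = []
    for record in records:
        identity = record.get("userIdentity", {})
        if identity.get("type") == "Root" and record.get("eventName") == event_name:
            matches.append({
                "eventName": record["eventName"],
                "sourceIPAddress": record.get("sourceIPAddress", ""),
                "userAgent": record.get("userAgent", ""),
            })
    return matches


# Hypothetical records shaped like CloudTrail entries (illustrative values only).
sample = [
    {
        "eventName": "DeleteCluster",
        "userIdentity": {"type": "Root"},
        "sourceIPAddress": "203.0.113.10",
        "userAgent": "aws-internal/3",
    },
    {
        "eventName": "DescribeClusters",
        "userIdentity": {"type": "IAMUser"},
        "sourceIPAddress": "203.0.113.20",
        "userAgent": "aws-cli",
    },
]

for hit in root_events(sample, "DeleteCluster"):
    print(hit["eventName"], hit["sourceIPAddress"], hit["userAgent"])
```

Matching on `userIdentity.type` rather than a username string is a design choice in this sketch: CloudTrail marks root activity with the type `Root`.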

The geolocation for IP address 73.162.153.91 points to a Comcast customer in the city of Belmont, California:

$ curl "https://api.ip2location.com/v2/?ip=73.162.153.91&key={YOUR_API_KEY}&package=WS24&addon=continent,country,region,city,geotargeting,country_groupings,time_zone_info"
 
{
    "country_code": "US",
    "country_name": "United States of America",
    "region_name": "California",
    "city_name": "Belmont",
    "latitude": "37.52021",
    "longitude": "-122.2758",
    "zip_code": "94002",
    "time_zone": "-07:00",
    "isp": "Comcast Cable Communications LLC",
    "domain": "comcast.net",
    "net_speed": "DSL",
    "idd_code": "1",
    "area_code": "650",
    "weather_station_code": "USCA0082",
    "weather_station_name": "Belmont",
    "mcc": "-",
    "mnc": "-",
    "mobile_brand": "-",
    "elevation": "12",
    "usage_type": "ISP",
    "credits_consumed": 33,
    "continent": {
        "name": "North America",
        "code": "NA",
        "hemisphere": [
            "north",
            "west"
        ]
    }
}

Full CloudTrail log details for actions taken by the root user from May 12 through May 18 have been downloaded. The logs can be found here in GDrive as a CSV.

Timeline

All times are in PDT.

  • May 14, 2021 @ 8:42 AM: The DeleteCluster call was initiated from the console
  • May 14, 2021 @ 4:02 PM: Arindam notified Todd and Hooman over Slack that the cluster was missing
  • May 14, 2021 @ 8:18 PM: Hooman started the investigation
  • May 14, 2021 @ 8:38 PM: Hooman completed the investigation and notified Arindam in Slack of his findings
  • May 18, 2021 @ 7:30 PM: Hooman documented the root cause analysis here

How to Avoid the Issue in the Future

  • Root user credentials have been rotated, and only a small number of infrastructure team members have access to the root user.
  • The root user is never used for access to the infrastructure.
  • Alerting will be set up to fire whenever the root user is used to log in.
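
The root-login alert in the last item could be built around a check like the one below. This is only a sketch under stated assumptions, not the alerting that will actually be deployed. It assumes CloudTrail sign-in records are delivered to some handler as parsed dictionaries, and relies on CloudTrail recording console sign-ins with the event name `ConsoleLogin` and root activity with a `userIdentity.type` of `Root`. The function names `is_root_console_login` and `alert_text` are hypothetical.

```python
from typing import Any, Dict


def is_root_console_login(record: Dict[str, Any]) -> bool:
    """Return True when a CloudTrail record is a console sign-in by the root user."""
    return (
        record.get("eventName") == "ConsoleLogin"
        and record.get("userIdentity", {}).get("type") == "Root"
    )


def alert_text(record: Dict[str, Any]) -> str:
    """Build a short alert message; the wording is illustrative only."""
    source_ip = record.get("sourceIPAddress", "unknown")
    return f"Root user console login detected from {source_ip}"


# Hypothetical sign-in record (documentation-range IP, not incident data).
event = {
    "eventName": "ConsoleLogin",
    "userIdentity": {"type": "Root"},
    "sourceIPAddress": "203.0.113.10",
}

if is_root_console_login(event):
    print(alert_text(event))
```

How the message is delivered (email, Slack, paging) is left open here, since the source does not say which channel will be used.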