Implementing sudo in GCP

by Christopher Suarez

Working across multiple GCP projects for development, testing and production increases the risk of accidentally performing the right operation in the wrong environment, which can of course have disastrous results. Using Terraform (which is the best way to manage infrastructure, in our opinion) can make the damage larger still, since a single apply can change many resources at once.

And while the large, extensible set of roles and permissions in GCP might make it easier to implement a more or less static principle of least privilege (POLP), most of the time all users, including administrators, only need read access to their GCP resources. Permanent write access does little to protect against accidental destructive operations. In my own experience, it often ends with an (often infrequently audited) list of people having the owner role.

Worth noting is that the owner role, like the other basic roles, doesn't support IAM conditions, so granting it temporarily through a time-bound conditional binding is not an option.
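
To make the limitation concrete, here is a minimal sketch of what a time-bound binding looks like for a role that does accept conditions. The helper `temporary_binding` and the `BASIC_ROLES` set are illustrative names, not part of any Google library; the expression uses the `request.time` attribute from IAM Conditions.

```python
from datetime import datetime, timedelta, timezone

# Basic roles cannot carry IAM conditions, so the sketch rejects them up front.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def temporary_binding(role, member, expires_at):
    """Build a policy binding that expires at `expires_at` (hypothetical helper).

    `expires_at` is assumed to be a timezone-aware datetime.
    """
    if role in BASIC_ROLES:
        raise ValueError(f"{role} is a basic role and does not support conditions")
    timestamp = expires_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "role": role,
        "members": [member],
        "condition": {
            "title": "temporary-access",
            "expression": f'request.time < timestamp("{timestamp}")',
        },
    }


if __name__ == "__main__":
    expiry = datetime.now(timezone.utc) + timedelta(hours=1)
    print(temporary_binding("roles/storage.admin", "user:jane@example.com", expiry))
```

Calling the helper with `roles/owner` raises a `ValueError`, which mirrors why a different mechanism is needed for temporary owner access.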

Sudo in the cloud?

We have found that a very efficient way to minimize the risk of screwing things up is to implement a variant of the sudo pattern.

We prefer the following approach:

  • Add our engineers to a Google group.
  • Grant that Google group the Project IAM Admin role.
  • Allow them to sudo to the owner role.
  • Have a service periodically remove the owner role from any user.
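
The lifecycle in the list above can be modelled on a plain policy dictionary, without touching GCP. This is only a sketch for reasoning about the pattern: the function names `sudo` and `sweep` are made up here, and the policy shape follows the `bindings` / `role` / `members` layout that the Resource Manager API returns.

```python
import copy

OWNER = "roles/owner"


def sudo(policy, member):
    """Return a copy of `policy` with `member` added to the owner binding."""
    policy = copy.deepcopy(policy)
    bindings = policy.setdefault("bindings", [])
    owner = next((b for b in bindings if b["role"] == OWNER), None)
    if owner is None:
        owner = {"role": OWNER, "members": []}
        bindings.append(owner)
    if member not in owner.setdefault("members", []):
        owner["members"].append(member)
    return policy


def sweep(policy):
    """Return a copy of `policy` with every user removed from the owner role."""
    policy = copy.deepcopy(policy)
    for binding in policy.get("bindings", []):
        if binding["role"] == OWNER:
            binding["members"] = [
                m for m in binding.get("members", []) if not m.startswith("user:")
            ]
    # Drop bindings that ended up without members.
    policy["bindings"] = [b for b in policy.get("bindings", []) if b.get("members")]
    return policy


if __name__ == "__main__":
    base = {"bindings": [{"role": "roles/resourcemanager.projectIamAdmin",
                          "members": ["group:engineers@example.com"]}]}
    elevated = sudo(base, "user:jane@example.com")
    print(elevated)
    print(sweep(elevated))
```

Both functions return new dictionaries rather than mutating their input, which keeps the model easy to test.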

This has turned out to save the day many times.

A word of caution

Make sure there is always someone with the role of Organization Administrator in your organization - don’t get locked out.

A sudo implementation

A nice way to implement sudo is with shell scripts, Workflows, Python and Cloud Scheduler. Engineers sudo using the shell script and preferably unsudo when they no longer need elevated access. In case they forget, a scheduled workflow runs periodically to remove the owner role from everyone in the project.

Sudo script

A sudo shell script could look like the one below. Key points are:

  • Provide a list of projects the user can access.
  • Let the user choose the project they want to sudo.
  • Alternatively, let the user specify the sudo project as an argument.
#!/bin/bash

# Get the active account
get_active_account() {
  gcloud auth list --filter=status:ACTIVE --format="value(account)"
}

# List all projects the user can see
list_gcp_projects() {
  gcloud projects list --format="value(projectId)"
}

# Set the user as an owner for the selected project
set_owner() {
  local project_id=$1
  local account=$2

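  # Only report success if gcloud actually applied the binding
  if gcloud projects add-iam-policy-binding "$project_id" \
    --member="user:${account}" --role="roles/owner" --no-user-output-enabled; then
    echo "You've been set as an owner for project: $project_id"
  else
    echo "Failed to set owner for project: $project_id" >&2
    return 1
  fi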
}

main() {
  local account
  account=$(get_active_account)

  # If a project ID is provided as an argument, use it directly
  if [[ -n "$1" ]]; then
    set_owner "$1" "$account"
    exit $?
  fi

  local projects=$(list_gcp_projects)

  echo "Select a project from the list:"
  select project_id in $projects; do
    if [[ -z "$project_id" ]]; then
      echo "Invalid option. Exiting..."
      exit 1
    fi

    set_owner "$project_id" "$account"
    break
  done
}

main "$@"
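
The matching unsudo step is not shown above, so here is a minimal sketch of one, written in Python. It assumes the `gcloud` CLI is installed and authenticated; the function names `unsudo_command` and `unsudo` are illustrative. The `run` parameter exists only so the sketch can be exercised without calling `gcloud`.

```python
import subprocess


def unsudo_command(project_id, account):
    """Build the gcloud command that removes the owner binding for `account`."""
    return [
        "gcloud", "projects", "remove-iam-policy-binding", project_id,
        f"--member=user:{account}",
        "--role=roles/owner",
        "--no-user-output-enabled",
    ]


def unsudo(project_id, account, run=subprocess.run):
    """Run the command and return True if gcloud exited successfully."""
    result = run(unsudo_command(project_id, account), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Print the command instead of running it; replace with unsudo(...) to apply.
    print(" ".join(unsudo_command("my-project", "jane@example.com")))
```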

Unsudo workflow

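An unsudo workflow would first call a function that makes sure every owner is also an IAM admin, and then call a second function that removes the owner privileges. Note that if Google groups are used for IAM admin rights and the organization is set up properly, the first step could be skipped. The definition below is written as a Terraform template, so the ${...} expressions are filled in by Terraform before the workflow is deployed.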

  - initialize:
      assign:
        - project: ${var.project_id}
        - add_iam_admin_role_function_url: ${google_cloudfunctions2_function.add_iam_admin_role.service_config[0].uri}
        - remove_owner_role_function_url: ${google_cloudfunctions2_function.remove_owner_role.service_config[0].uri}
  - add_iam_admin_role:
      call: http.get
      args:
        url: ${google_cloudfunctions2_function.add_iam_admin_role.service_config[0].uri}
        auth:
          type: OIDC
          audience: ${google_cloudfunctions2_function.add_iam_admin_role.service_config[0].uri}
      result: add_iam_admin_role_result
  - remove_owner_role:
      call: http.get
      args:
        url: ${google_cloudfunctions2_function.remove_owner_role.service_config[0].uri}
        auth:
          type: OIDC
          audience: ${google_cloudfunctions2_function.remove_owner_role.service_config[0].uri}
      result: remove_owner_role_result
  - final:
      return: "Workflow completed"
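
The first function called by the workflow is not listed in this post. As a rough idea of what its core logic might look like, the sketch below works on a policy dictionary and adds every user who holds the owner role to an unconditional IAM admin binding. The function name `ensure_owners_are_iam_admins` is an assumption, and fetching and writing the policy is left out.

```python
import copy

OWNER = "roles/owner"
IAM_ADMIN = "roles/resourcemanager.projectIamAdmin"


def ensure_owners_are_iam_admins(policy):
    """Return (new_policy, added) where `added` lists the newly granted members."""
    policy = copy.deepcopy(policy)
    bindings = policy.setdefault("bindings", [])

    # Collect user owners, keeping order and dropping duplicates.
    owners = list(dict.fromkeys(
        m for b in bindings if b["role"] == OWNER
        for m in b.get("members", []) if m.startswith("user:")
    ))

    # Only an unconditional binding counts as a permanent IAM admin grant.
    admin = next(
        (b for b in bindings if b["role"] == IAM_ADMIN and "condition" not in b),
        None,
    )
    current = set(admin.get("members", [])) if admin else set()
    added = [m for m in owners if m not in current]

    if added:
        if admin is None:
            admin = {"role": IAM_ADMIN, "members": []}
            bindings.append(admin)
        admin.setdefault("members", []).extend(added)
    return policy, added


if __name__ == "__main__":
    example = {"bindings": [{"role": OWNER, "members": ["user:jane@example.com"]}]}
    print(ensure_owners_are_iam_admins(example))
```

Running this step before the owner role is removed means nobody loses the ability to sudo again afterwards.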

Unsudo cloud function

The actual cloud function that removes users from sudo would do something like:

  • Obtain the list of users who have the owner role.
  • Validate that they have the IAM admin role.
  • Remove the owner role binding from the user.
from googleapiclient import discovery
from google.auth import default

def remove_owner_role(request):
    # Resolve the project from application default credentials
    _, project_id = default()
    if not project_id:
        return "Error: Project id not found", 500
    service = create_service()
    print(f"Project: {project_id}")

    if not remove_owner(service, project_id):
        return "Failed to remove owner role.", 500

    return "Successfully removed owner role.", 200

def create_service():
    """Provides a service using application default credentials."""
    return discovery.build('cloudresourcemanager', 'v1')

def get_policy(crm_service, project_id, version=3):
    """Gets IAM policy for a project."""
    policy = (
        crm_service.projects()
        .getIamPolicy(
            resource=project_id,
            body={"options": {"requestedPolicyVersion": version}},
        )
        .execute()
    )
    return policy

def set_policy(crm_service, project_id, policy):
    """Sets IAM policy for a project."""
    crm_service.projects().setIamPolicy(resource=project_id, body={"policy": policy}).execute()

def has_iam_admin_role(crm_service, project_id, user_email):
    """Checks if the specified user has the IAM admin role."""
    policy = get_policy(crm_service, project_id)
    iam_admin_binding = next((b for b in policy["bindings"] if b["role"] == "roles/resourcemanager.projectIamAdmin"), None)

    # Check if the user is in the IAM admin members list
    if iam_admin_binding and f"user:{user_email}" in iam_admin_binding["members"]:
        return True
    return False

def remove_owner(crm_service, project_id):
    """Removes the owner role."""
    policy = get_policy(crm_service, project_id)
    owner_binding = next((b for b in policy["bindings"] if b["role"] == "roles/owner"), None)

    if owner_binding:
        members_to_remove = []  # List to store members who will have their owner role removed
        for member in owner_binding["members"]:
            user_email = member.split(":")[1]  # Extract email from the "user:email" format
            if has_iam_admin_role(crm_service, project_id, user_email):
                members_to_remove.append(member)
            else:
                print(f"User {user_email} does not have IAM admin role. Refusing to remove owner role.")
                return False

        # Remove the members from the owner role and print their emails
        for member in members_to_remove:
            user_email = member.split(":")[1]
            print(f"Removing owner role for user: {user_email}")
            owner_binding["members"].remove(member)

        set_policy(crm_service, project_id, policy)
        return True
    return False


if __name__ == "__main__":
    # Simulate a call to the primary function
    result, status_code = remove_owner_role(None)
    print(f"Result: {result}, Status Code: {status_code}")
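
One thing to keep in mind with the function above is that it assumes every owner is listed as `user:email`. Policies can also contain service accounts, groups and deleted principals, so a slightly more defensive way to read a member string might look like the sketch below. The helper `parse_member` is hypothetical and is not used by the code above.

```python
def parse_member(member):
    """Split an IAM member string into (kind, identifier, deleted)."""
    deleted = member.startswith("deleted:")
    if deleted:
        member = member[len("deleted:"):]

    kind, sep, identifier = member.partition(":")
    if not sep:
        # Special identifiers such as "allUsers" have no "kind:" prefix.
        return member, "", deleted

    # Deleted principals carry a "?uid=..." suffix after the identifier.
    identifier = identifier.split("?", 1)[0]
    return kind, identifier, deleted


if __name__ == "__main__":
    for raw in ("user:jane@example.com",
                "serviceAccount:ci@my-project.iam.gserviceaccount.com",
                "deleted:user:joe@example.com?uid=123"):
        print(raw, "->", parse_member(raw))
```

With the kind available, the cleanup could skip anything that is not a `user` instead of refusing to continue.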

Complete implementation

A complete git repo containing source code, terraform module, cloud function implementation, etc. can be found here.

If you have any questions about what we do or if you think we can help in any way, please reach out on X or LinkedIn. We would love to hear your thoughts on what we are doing.