Cloud Operations Engineer

We are seeking a Cloud Operations Engineer to join our Managed Services team. This role focuses on AWS cloud operations, monitoring, incident response, and continuous improvement of client and internal cloud environments.


Key Responsibilities

  • Respond to and execute CloudOps tickets following established runbook procedures.

  • Handle system-generated alerts and client-raised tickets, including scenarios without predefined procedures.

  • Provide on-call support for urgent and high-priority incidents across first- and second-tier response teams.

  • Support and maintain internal cloud infrastructure used to deliver MSP services.

  • Establish and manage integrations between client cloud environments and internal monitoring tools.

  • Configure monitoring and observability tools, including defining appropriate thresholds and scopes.

  • Own the creation, maintenance, and enhancement of monitors and dashboards across client AWS accounts.

  • Create new monitoring assets for newly released AWS services.

  • Create and update Cloud Operations documentation, runbooks, and training materials.

  • Improve internal training processes related to CloudOps and client environments.

  • Communicate with internal development teams to drive continuous improvement of internal tools and services.

  • Prepare audit reports based on security scans and monitoring results.

  • Provide risk assessments and remediation recommendations, and present audit findings to internal stakeholders and client teams.

  • Collaborate with internal engineering and development teams to enhance service delivery.

  • Share technical knowledge and best practices with other team members.

  • Participate in client interactions in a consultative and support-oriented capacity.



Requirements

We are seeking a Cloud Operations Engineer to join our growing Managed Service Provider (MSP) team. This role is ideal for an operations-focused engineer who enjoys troubleshooting, monitoring, automation, and ensuring the reliability and security of cloud environments while supporting client workloads in a 24x7 operational model.

Role & Responsibilities

  • Respond to and resolve CloudOps tickets using established runbooks and operational procedures

  • Handle system-generated alerts and client-reported issues, including scenarios where standard procedures may not yet exist

  • Participate in on-call rotations, supporting both Tier 1 and Tier 2 response teams for urgent and high-priority incidents

  • Provide operational support for Caylent’s internal cloud infrastructure that enables MSP client services

  • Integrate client cloud environments into Caylent’s monitoring and observability platforms

  • Recommend and configure appropriate monitoring thresholds, scopes, alerts, and dashboards

  • Own the creation, maintenance, and continuous improvement of monitors and dashboards across client environments

  • Prepare audit and security scan reports, provide recommendations based on findings, and present results to internal teams or clients

  • Create, update, and maintain operational documentation, runbooks, and training materials

  • Collaborate with internal development teams to continuously improve tooling and operational capabilities

  • Identify opportunities to automate operational tasks and improve efficiency across environments

Required Skills & Experience

  • Bachelor’s degree in Computer Science, Information Technology, or a related field

  • 2+ years of general IT experience (systems, infrastructure, software, or programming)

  • 1+ year of hands-on experience with Amazon Web Services (AWS)

  • AWS certifications (Cloud Practitioner, Solutions Architect Associate, SysOps Associate, or Developer Associate)

  • Experience working with third-party security, monitoring, or observability tools

  • Familiarity with Agile or other project management methodologies

  • Exposure to scalable and automated cloud infrastructure

  • Strong understanding of cloud infrastructure and core AWS services

  • Experience with infrastructure as code, configuration management, and CI/CD practices

  • Experience supporting, operating, or troubleshooting cloud-based systems

  • Willingness to work the afternoon shift (2 PM to 11 PM)

  • Strong communication skills with the ability to clearly document and explain issues and solutions

  • Effective time and self-management skills in a fast-paced, ticket-driven environment

  • Strong problem analysis and troubleshooting mindset

  • Solid technical aptitude with a willingness to learn continuously and grow

  • Ability to collaborate effectively with clients, engineers, and cross-functional teams


Signs You May Be a Great Fit 

  • Impact: Play a pivotal role in shaping a rapidly growing venture studio through cloud-driven digital transformation.

  • Culture: Thrive in a collaborative, innovative environment that values creativity, ownership, and agility.

  • Growth: Access professional development opportunities and mentorship from experienced peers.

  • Benefits: Competitive salary, wellness packages, and flexible work arrangements that support your lifestyle and goals.