Cloud Operations Engineer
We are seeking a Cloud Operations Engineer to join our Managed Services team. This role focuses on AWS cloud operations, monitoring, incident response, and continuous improvement of client and internal cloud environments.
Key Responsibilities
Respond to and execute CloudOps tickets following established runbook procedures.
Handle system-generated alerts and client-raised tickets, including scenarios without predefined procedures.
Provide on-call support for urgent and high-priority incidents across first- and second-tier response teams.
Support and maintain internal cloud infrastructure used to deliver MSP services.
Establish and manage integrations between client cloud environments and internal monitoring tools.
Configure monitoring and observability tools, including defining appropriate thresholds and scopes.
Own the creation, maintenance, and enhancement of monitors and dashboards across client AWS accounts.
Create new monitoring assets for newly released AWS services.
Create and update Cloud Operations documentation, runbooks, and training materials.
Improve internal training processes related to CloudOps and client environments.
Communicate with internal development teams to drive continuous improvement of internal tools and services.
Prepare audit reports based on security scans and monitoring results.
Provide risk assessments and remediation recommendations and present audit findings to internal stakeholders and client teams.
Collaborate with internal engineering and development teams to enhance service delivery.
Share technical knowledge and best practices with other team members.
Participate in client interactions in a consultative and support-oriented capacity.
Requirements
We are seeking a Cloud Operations Engineer to join our growing Managed Service Provider (MSP) team. This role is ideal for an operations-focused engineer who enjoys troubleshooting, monitoring, automation, and ensuring the reliability and security of cloud environments while supporting client workloads in a 24x7 operational model.
Role & Responsibilities
Respond to and resolve CloudOps tickets using established runbooks and operational procedures
Handle system-generated alerts and client-reported issues, including scenarios where standard procedures may not yet exist
Participate in on-call rotations, supporting both Tier 1 and Tier 2 response teams for urgent and high-priority incidents
Provide operational support for Caylent’s internal cloud infrastructure that enables MSP client services
Integrate client cloud environments into Caylent’s monitoring and observability platforms
Recommend and configure appropriate monitoring thresholds, scopes, alerts, and dashboards
Own the creation, maintenance, and continuous improvement of monitors and dashboards across client environments
Prepare audit and security scan reports, provide recommendations based on findings, and present results to internal teams or clients
Create, update, and maintain operational documentation, runbooks, and training materials
Collaborate with internal development teams to continuously improve tooling and operational capabilities
Identify opportunities to automate operational tasks and improve efficiency across environments
Required Skills & Experience
Bachelor’s degree in Computer Science, Information Technology, or a related field
2+ years of general IT experience (systems, infrastructure, software, or programming)
1+ year of hands-on experience with Amazon Web Services (AWS)
AWS certifications (Cloud Practitioner, Solutions Architect Associate, SysOps Associate, or Developer Associate)
Experience working with third-party security, monitoring, or observability tools
Familiarity with Agile or other project management methodologies
Exposure to scalable and automated cloud infrastructure
Strong understanding of cloud infrastructure and core AWS services
Experience with infrastructure as code, configuration management, and CI/CD practices
Experience supporting, operating, or troubleshooting cloud-based systems
Willingness to work the afternoon shift (2 PM to 11 PM)
Strong communication skills with the ability to clearly document and explain issues and solutions
Effective time and self-management skills in a fast-paced, ticket-driven environment
Strong problem analysis and troubleshooting mindset
Solid technical aptitude with a willingness to learn continuously and grow
Ability to collaborate effectively with clients, engineers, and cross-functional teams
Signs You May Be a Great Fit
Impact: Play a pivotal role in shaping a rapidly growing venture studio through cloud-driven digital transformation.
Culture: Thrive in a collaborative, innovative environment that values creativity, ownership, and agility.
Growth: Access professional development opportunities and mentorship from experienced peers.
Benefits: Competitive salary, wellness packages, and flexible work arrangements that support your lifestyle and goals.