AgileEngine is a top-ranking provider of software solutions to Fortune 500, Global 500, and Future 50 companies. Listed on the Inc. 5000 among the fastest-growing US companies, we are always open to talented software, UX, and data experts in the Americas, Europe, and Asia.

If you like a challenging environment where you're working with the best and are encouraged to learn and experiment daily, there's no better place, guaranteed! :)

What you will do
- Operations and Service Availability: Participate in a 24/7 operations team to guarantee service availability, managing day-to-day alerts, system checks, and issue escalations;
- Monitoring and Troubleshooting: Actively monitor and troubleshoot alerts and issues within SaaS environments. Utilize custom dashboards for effective troubleshooting as needed;
- Infrastructure Knowledge: Gain proficiency in our existing infrastructure, particularly Docker Swarm, to effectively manage and support the environment;
- Root Cause Analysis (RCA): Conduct thorough RCAs to identify the root causes of issues and implement corrective actions to prevent future occurrences;
- Alert Management: Investigate alerts, create action plans, and delegate tasks to the appropriate team members;
- Support and Communication: Handle support requests, engage in customer calls to explain RCAs, and communicate effectively with managers, teams, and customers about product monitoring risks, issues, and changes;
- Automation and Feedback: Identify automation opportunities to streamline RCAs and provide valuable feedback to the product and engineering teams to enhance product performance, logging, tracing, and monitoring;
- Documentation and Compliance: Maintain process and procedure documentation and conduct internal audits to ensure SaaS infrastructure security and compliance;
- Collaboration and Improvement: Work collaboratively with support teams and customers to identify and resolve SaaS environment issues. Contribute to the improvement of monitoring, alerting, and overall system health.

Must haves
- Ability to operate independently and collaboratively in a team environment;
- Proficient in EKS, Terraform, Helm, Docker, and Docker Swarm;
- Strong sense of responsibility and accountability for delivering high-quality work;
- Excellent communication skills, with the ability to effectively convey issues and RCAs to customers;
- Experience with AWS, cloud and network administration, and SaaS product/application support;
- Knowledgeable in infrastructure, security, compliance, Prometheus, Grafana, Linux, and scripting (Python, shell);
- Understanding of APIs, databases, systems architecture, and design.

The benefits of joining us

Professional growth
Accelerate your professional journey with mentorship, TechTalks, and personalized growth roadmaps.

Competitive compensation
We match your ever-growing skills, talent, and contributions with competitive USD-based compensation and budgets for education, fitness, and team activities.

A selection of exciting projects
Join projects with modern solutions development and top-tier clients that include Fortune 500 enterprises and leading product brands.

Flextime
Tailor your schedule for an optimal work-life balance, with the option of working from home or going to the office, whichever makes you the happiest and most productive.