Job Requirements
More than 5 years of relevant work experience and a college degree or above in computer science or a related major;
Proficient with the Linux operating system, with a deep understanding of how the system works;
Proficient in at least one scripting language and one statically typed language; experience in large-scale system design is preferred;
Familiar with the TCP/IP and HTTP protocols; a deep understanding of the protocols and hands-on troubleshooting experience are preferred;
Familiar with container and container orchestration technology; production operations and maintenance experience with k8s is preferred;
Familiar with database principles; a deep understanding of common database engines is preferred;
Deep understanding of distributed systems and familiarity with the open source infrastructure components commonly used in Internet services (nginx, redis, kafka, mysql, hbase, zookeeper, hadoop, etc.);
Experience in big data operations, maintenance, and development, and with machine learning algorithms, is preferred;
Experience with continuous integration/continuous deployment is a plus; experience managing very large-scale clusters is preferred;
Strong sense of responsibility, proactive, eager to learn, and team-oriented;
Bonus points: familiarity with common vulnerabilities in operations and maintenance tools, and with common vulnerabilities and optimization of Linux servers.
Job Responsibilities
Responsible for implementing online monitoring and alerting for the department’s core systems and applications to ensure stable operation;
Participate in online incident response, including fault analysis and localization, resolution, and tracking of follow-up improvements;
Perform resource usage statistics, performance evaluation, and capacity planning for the system;
Promote the adoption of DevOps in the department and comprehensively improve operations and maintenance capabilities (continuous integration, application release, continuous deployment, monitoring and alerting, emergency plans, intelligent operations and maintenance, etc.);
Promote the standardization, automation, and intelligence (AIOps) of operations and maintenance.