admin posted on 2020-11-20 12:11:55

The translator of 谷歌运维解密 (the Chinese edition of Google's Site Reliability Engineering book) explains SRE


• Production administrators
• Ensure user-visible uptime and service quality
• Authority over the production environment
• Growing together with the site
• Steep learning curve, mostly due to complexity
• Continuous retraining, sites always being improved
• Infrastructure
• Specializations for shared infrastructure
• Ensure those components have good reliability


it just works
• Service Level Objective (SLO)
• Monitoring/Deployment
• Capacity Planning
• One against a hundred
• Team manages monitoring and develops automation
• Implies use of scripting and data analysis tools
• Most failures need automated recoveries in place
• Firefighter and arsonist in one
• Elevated risk during convenient working hours
• Learn of old-age mortality risk during the preceding workday
• Infant mortality ideally also avoids meals
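
The Service Level Objective (SLO) bullet above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming an availability SLO measured over a fixed time window. The function names `error_budget` and `budget_remaining`, and the 99.9% over 30 days figures, are illustrative assumptions and do not come from the talk.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Downtime a service may accumulate in `window` while still meeting `slo`.

    `slo` is an availability target expressed as a fraction, e.g. 0.999.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> timedelta:
    """Error budget left after `downtime` has already been spent in the window.

    A negative result means the SLO has been missed for this window.
    """
    return error_budget(slo, window) - downtime


if __name__ == "__main__":
    window = timedelta(days=30)          # assumed measurement window
    budget = error_budget(0.999, window)  # assumed 99.9% availability target
    print("allowed downtime:", budget)
    print("left after a 20 minute outage:",
          budget_remaining(0.999, window, timedelta(minutes=20)))
```

Under these assumptions a 99.9% target leaves roughly 43 minutes of downtime per 30 days, which is why most failures need automated recovery rather than a human response.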


Coders
• Not administration
• Heavy (addicted) users of the alerting system
• Holes may cause outage before notification occurs
• Routinely use multiple layers, levels and viewpoints
• Design the manual and automatic escalation paths
• Responsible for the future
• Responsible for enabling growth and scaling
• Plan for requirements, identify inefficiencies
• File bugs and, where appropriate, fix them too
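
The bullet about designing manual and automatic escalation paths can be sketched as a small policy table. This is only an illustration under stated assumptions: the `Tier` class, the `tiers_notified` function, the tier names, and the timeouts are all hypothetical and are not taken from the talk or from any particular alerting product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str             # who or what is notified at this step (hypothetical)
    ack_timeout_min: int  # minutes to wait for an acknowledgement before escalating


def tiers_notified(policy: list[Tier], minutes_unacked: float) -> list[str]:
    """Return the tiers notified so far for an alert left unacknowledged
    for `minutes_unacked` minutes, in escalation order."""
    notified = []
    deadline = 0  # minute at which the next tier in the policy is notified
    for tier in policy:
        if minutes_unacked < deadline:
            break
        notified.append(tier.name)
        deadline += tier.ack_timeout_min
    return notified


# Assumed example policy: an automatic step first, then manual steps.
POLICY = [
    Tier("automated-recovery", 5),
    Tier("primary-oncall", 15),
    Tier("secondary-oncall", 30),
    Tier("manager", 0),
]

if __name__ == "__main__":
    for minutes in (0, 5, 49, 60):
        print(minutes, tiers_notified(POLICY, minutes))
```

Keeping the path as data makes it easy to review for holes, such as a tier with no one behind it, before an outage exposes them.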





