This post was last edited by FYIRH on 2022-8-10 17:22
To download the latest translated version, follow the WeChat official account ITILXF and reply with "可用性" (availability).
2.1 PURPOSE AND DESCRIPTION
The availability management practice ensures that requirements for the availability of services and resources are understood and fulfilled efficiently and in line with the organization’s strategy and commitments. To enable this, this practice is applied throughout the organization’s product and service lifecycle, from ideation to operations.
This practice is extremely important when products and services are planned and designed; decisions made at this stage will affect availability levels and related constraints, as well as the organization’s ability to monitor and manage these aspects.
Availability is an important service characteristic from the consumers’ perspective, and therefore it is subject to negotiation, agreement, monitoring, and reporting. These activities involve multiple practices (including the business analysis, relationship management, service design, service level management (SLM), and measurement and reporting practices, among others), and the availability management practice is used in conjunction with those to ensure that availability is sufficiently and consistently addressed.
Theoretically, availability is simple to measure and understand; it depends on how frequently the service fails and how quickly it recovers after a failure. These characteristics are often expressed as mean time between failures (MTBF) and mean time to restore service (MTRS):
● MTBF measures how frequently the service fails. For example, on average, a service with an MTBF of four weeks fails 13 times each year.
● MTRS measures how quickly service is restored after a failure. For example, on average, a service with an MTRS of four hours will fully recover from failure in four hours.
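The arithmetic behind these two examples can be sketched in a few lines of Python. The function names are illustrative, and the availability estimate assumes that MTBF counts only the uptime between the end of one failure and the start of the next, which is one common convention rather than something the text prescribes.

```python
HOURS_PER_WEEK = 7 * 24
WEEKS_PER_YEAR = 52


def failures_per_year(mtbf_weeks: float) -> float:
    """Expected number of failures per year for a given MTBF in weeks."""
    return WEEKS_PER_YEAR / mtbf_weeks


def steady_state_availability(mtbf_hours: float, mtrs_hours: float) -> float:
    """Fraction of time the service is up.

    Assumption: MTBF is pure uptime between failures, so one full
    failure cycle lasts MTBF + MTRS.
    """
    return mtbf_hours / (mtbf_hours + mtrs_hours)


mtbf_weeks = 4
mtrs_hours = 4

print(failures_per_year(mtbf_weeks))  # 13.0
availability = steady_state_availability(mtbf_weeks * HOURS_PER_WEEK, mtrs_hours)
print(round(availability * 100, 2))  # 99.41
```

Under these assumptions, a service that fails every four weeks and takes four hours to restore is up roughly 99.41% of the time.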
In practice, availability is a complex characteristic. Measuring and understanding it requires multiple measurements, together with agreements about how those measurements should be interpreted in the context of a service. Availability depends on the service architecture, the importance of certain service components or service actions, the criteria of unavailability, service hours, and other parameters.
Availability from the perspective of a user or a group of users can be different from the availability measured from the provider’s or customer’s perspective. For example, a service that is unavailable to five users in a group of 200 will be perceived by the five as interrupted, but the agreed availability targets for the group may still be met.
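To make the difference between the two perspectives concrete, the sketch below compares per-user availability with availability averaged over the whole group. The 160 agreed service hours, the 8-hour outage, and the 99.5% target are made-up figures for illustration, and weighting by user-hours is only one possible way to aggregate.

```python
def per_user_availability(service_hours: float, downtime_hours: float) -> float:
    """Availability as experienced by a single user."""
    return (service_hours - downtime_hours) / service_hours


def group_availability(service_hours: float, downtime_by_user: dict, group_size: int) -> float:
    """Availability averaged over all users, weighted by user-hours."""
    total_user_hours = service_hours * group_size
    lost_user_hours = sum(downtime_by_user.values())
    return (total_user_hours - lost_user_hours) / total_user_hours


SERVICE_HOURS = 160   # hypothetical agreed service time for the period
TARGET = 0.995        # hypothetical agreed availability target
GROUP_SIZE = 200

# Five of the 200 users lose the service for 8 hours each.
downtime_by_user = {f"user{i}": 8.0 for i in range(1, 6)}

affected_view = per_user_availability(SERVICE_HOURS, 8.0)
group_view = group_availability(SERVICE_HOURS, downtime_by_user, GROUP_SIZE)

print(round(affected_view, 5), affected_view >= TARGET)  # 0.95 False
print(round(group_view, 5), group_view >= TARGET)        # 0.99875 True
```

The five affected users see 95% availability and a missed target, while the group-level figure still meets it.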
The availability management practice should ensure a transparent, consistent, and practical understanding of availability (expected, agreed, designed, and actual) among all relevant parties.
When a service is provided to thousands or even millions of people, there is not usually a single generic availability agreement with customers, but the overall service availability is critical for the service provider. Such services are usually designed for high availability, where reliability (high MTBF) is balanced with fast recovery (short MTRS).
Availability is closely connected to service performance, capacity, continuity, and information security. The ITIL management practice guides that discuss these areas often address the same characteristics of configuration items and services, but focus on different aspects of their quality. These practices can significantly benefit from sharing resources of all four dimensions of service management; however, clear separation of responsibilities is required in some cases, especially in heavily regulated areas, such as service continuity and information security.
2.2 TERMS AND CONCEPTS
Service availability is central to business success; there is a direct correlation between service availability and customer and user satisfaction. However, it is possible to achieve customer satisfaction even when services fail. The way in which a service provider reacts in a failure situation has a major influence on customer perception.
It is difficult to improve availability without understanding how the services support the consumer.
2.2.1 Vital business function
Vital business function (VBF) is a term used to reflect the part of a service that is critical to the organization’s success. A service may also support a number of business functions that are not vital.
For example, the VBFs of an email service would be sending and receiving email and accessing archived messages. The ability to access a calendar may not be vital.
This distinction between vital and non-vital functions is important and should influence availability design and associated costs. Generally, the more vital the business function, the more resilient and available it needs to be.
2.2.2 Availability for different types of services
Availability can be defined differently for different types of service offerings. For example, if the service offering:
● Enables business operations (such as a loan approval process or financial reporting process), availability is normally defined in terms of the execution of business operations.
● Provides access to a resource (such as network, print, or email services), availability is defined and measured in terms of resource availability.
● Includes fulfilment actions (such as user support), availability is often not an applicable measure. Instead, the focus should be on timely request completion.
2.2.3 Availability criteria
Defining availability requirements for services is often complicated. A service may have multiple functions and customers, each of whom may have different availability requirements for each function.
Quite often, for non-functional requirements, the line between underperformance (the service being slow, insecure, non-compliant, and so on) and unavailability is difficult to identify.
When defining service availability, it is essential to consider the following:
● the criticality of business functions that are enabled by the service
● thresholds for various forms of underperformance and unavailability; for example, delays in sending or receiving email may be treated as service level degradation, not service unavailability, until they reach the agreed threshold
● the number of users, business units, and/or sites that are impacted; for example, the service may only be considered unavailable if more than a certain percentage of users are impacted
● whether certain vital users, business units, sites, and so on, are impacted; for example, an email service may be considered available as long as users who need to communicate directly with customers and partners are able to use it
● the service delivery schedule and peak hours: a service that only has outages at night or on weekends may not be considered unavailable.
These factors reflect how the service provider and customers define unavailability. It is good practice to document the agreed availability criteria for the service in a service level agreement.
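As an illustration, agreed criteria of this kind could be encoded as a small classification rule. The sketch below is hypothetical throughout: the field names, the labels, the thresholds, and the order in which the checks are applied are assumptions, and real criteria would come from the service level agreement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AvailabilityCriteria:
    max_affected_user_share: float  # above this share the service counts as unavailable
    max_delay_seconds: float        # delays up to this value count as degradation only
    service_hours: range            # hours of the day inside the agreed delivery schedule


@dataclass
class Disruption:
    hour_of_day: int
    affected_users: int
    total_users: int
    vital_users_affected: bool
    delay_seconds: Optional[float]  # None means the service does not respond at all


def classify(d: Disruption, c: AvailabilityCriteria) -> str:
    """Classify a disruption under one hypothetical set of agreed criteria."""
    if d.hour_of_day not in c.service_hours:
        return "not counted"        # outside the agreed delivery schedule
    if d.delay_seconds is not None and d.delay_seconds <= c.max_delay_seconds:
        return "degraded"           # slow, but within the agreed delay threshold
    share = d.affected_users / d.total_users
    if d.vital_users_affected or share > c.max_affected_user_share:
        return "unavailable"
    return "partial impact"         # some users affected, below the agreed thresholds


criteria = AvailabilityCriteria(0.10, 300.0, range(8, 18))

print(classify(Disruption(10, 5, 200, False, None), criteria))    # partial impact
print(classify(Disruption(22, 200, 200, True, None), criteria))   # not counted
print(classify(Disruption(10, 200, 200, True, 120.0), criteria))  # degraded
print(classify(Disruption(10, 50, 200, False, None), criteria))   # unavailable
```

Writing the rule down this explicitly tends to expose ambiguities, such as how a short delay that hits vital users should be treated, before they become disputes over reported figures.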
2.2.4 Availability metrics
Availability is one of the most essential indicators of service quality, so service providers must be able to measure, assess, and report availability. A widely accepted practice is to report availability as a percentage, which can be calculated using a simple formula:
Availability (%) = (agreed service time - downtime) / agreed service time × 100
This formula can be useful, especially for resource provision services, but it does not reflect the business impacts of complicated service disruption scenarios.
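A minimal sketch of the percentage calculation follows, using a hypothetical 30-day period of round-the-clock service. The two outage patterns are invented to show the limitation just described: they produce the same percentage even though their business impact could differ considerably.

```python
def availability_percent(agreed_service_hours: float, downtime_hours: float) -> float:
    """Availability as a percentage of the agreed service time."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime must lie between 0 and the agreed service time")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


AGREED_HOURS = 30 * 24  # hypothetical: 30 days of 24x7 service

one_long_outage = [3.6]            # a single outage of 3.6 hours
many_short_outages = [0.1] * 36    # thirty-six outages of six minutes each

print(round(availability_percent(AGREED_HOURS, sum(one_long_outage)), 2))     # 99.5
print(round(availability_percent(AGREED_HOURS, sum(many_short_outages)), 2))  # 99.5
```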
The ideal availability metric would measure financial losses due to service unavailability. Unfortunately, it is often difficult or impossible to measure or estimate such a metric. Therefore, the service provider and the customer should define a set of acceptable metrics that reflect how the consumer loses money due to service outages, even if these metrics may be slightly inaccurate.
The following factors should be considered:
● The longer the cumulative service downtime is, the higher the losses are.
● The longer a single service outage is, the higher the losses are. In most cases, financial losses grow exponentially during an outage. The service provider may face fines, regulatory judgments, diminished competitive advantage, reputational damage, and so on.
● The more frequent the outages are, the higher the losses are, because the expenses associated with managing a loss event and restarting business operations are high.
Availability can be measured, assessed, and reported in various ways. These include, but are not limited to, the following metrics:
● MTBF
● minimum time between failures
● number of service disruptions
● total downtime over the period
● maximum single outage
● MTRS.
When defining metrics to measure availability, it is crucial to reflect the business impact of service disruptions rather than the technical availability of service components.
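The listed metrics can all be derived from one set of outage records. The sketch below assumes non-overlapping outages given as (start, end) pairs inside the reporting period, and computes MTBF as total uptime divided by the number of failures; both the data and that MTBF convention are assumptions for illustration.

```python
from datetime import datetime, timedelta


def availability_metrics(outages, period_start, period_end):
    """Compute the listed metrics from (start, end) outage records."""
    outages = sorted(outages)
    durations = [end - start for start, end in outages]
    total_downtime = sum(durations, timedelta())
    # Uptime between the end of one outage and the start of the next.
    gaps = [outages[i + 1][0] - outages[i][1] for i in range(len(outages) - 1)]
    count = len(outages)
    uptime = (period_end - period_start) - total_downtime
    return {
        "disruptions": count,
        "total_downtime": total_downtime,
        "max_single_outage": max(durations) if durations else timedelta(0),
        "mtrs": total_downtime / count if count else None,
        "mtbf": uptime / count if count else None,  # assumed: uptime per failure
        "min_time_between_failures": min(gaps) if gaps else None,
    }


# Hypothetical outage log for a 30-day reporting period.
outages = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 11, 0)),
    (datetime(2024, 1, 12, 14, 0), datetime(2024, 1, 12, 14, 30)),
    (datetime(2024, 1, 20, 8, 0), datetime(2024, 1, 20, 9, 0)),
]
metrics = availability_metrics(outages, datetime(2024, 1, 1), datetime(2024, 1, 31))

for name, value in metrics.items():
    print(f"{name}: {value}")
```

For this sample log the function reports three disruptions, 3.5 hours of total downtime, a longest outage of 2 hours, and an MTRS of 1 hour 10 minutes.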
2.2.5 Availability measurement
Availability measurements are based on accurately tracked periods of downtime. Therefore, one of the most important objectives for the availability management practice is to design and manage availability monitoring tools and translate the resulting data into meaningful service availability information.
Incident management records are a source of service disruption data. However, availability data based on incident logs is often unreliable and difficult to align with the agreed service availability metrics.
Infrastructure monitoring tools are common sources of availability data. However, although information from these tools is useful when measuring the availability of resource provision services, it is less useful when measuring the availability of services that enable business operations. Tools such as real user monitoring and business transaction monitoring are more useful for these services.
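One way to turn monitoring output into downtime periods is sketched below. It assumes a hypothetical business transaction check that records a timestamped pass or fail once a minute, and it treats downtime as starting at the first failed check and ending at the next successful one. Real monitoring tools expose their own data formats, so the input shape here is purely illustrative.

```python
from datetime import datetime, timedelta


def downtime_periods(samples):
    """Convert time-ordered (timestamp, ok) samples into (start, end) downtime periods."""
    periods = []
    down_since = None
    last_timestamp = None
    for timestamp, ok in samples:
        if not ok and down_since is None:
            down_since = timestamp                    # first failed check opens a period
        elif ok and down_since is not None:
            periods.append((down_since, timestamp))   # first good check closes it
            down_since = None
        last_timestamp = timestamp
    if down_since is not None:
        # Still failing at the last sample: close the period there.
        periods.append((down_since, last_timestamp))
    return periods


# Hypothetical results of a check run once a minute from 09:00.
start = datetime(2024, 1, 5, 9, 0)
results = [True, True, False, False, False, True, True, False, True]
samples = [(start + timedelta(minutes=i), ok) for i, ok in enumerate(results)]

periods = downtime_periods(samples)
total = sum((end - begin for begin, end in periods), timedelta())

for begin, end in periods:
    print(begin.time(), "to", end.time())
print(total)  # 0:04:00
```

The resolution of the result is bounded by the sampling interval, so an outage shorter than the interval between checks may go unnoticed.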