Dealing with information security incidents. Investigating an information security (IS) incident while the trail is hot. OWA log analysis

Incident Management Process

Unfortunately, the world is not perfect, and this applies equally to IT services. Failures may occur in the provision of IT services: a service may become unavailable or work with errors, someone may gain unauthorized access to information, and so on. That is, negative deviations from the normal provision of the service may occur. In ITIL, these deviations are called incidents.

An incident is an unplanned interruption or reduction in the quality of an IT service. The failure of a configuration item that has not yet affected the service is also an incident, such as the failure of one drive from a mirror array.

In some cases, the incident may go unnoticed by users, while in others it may have a significant financial, reputational and other negative impact on the business. If the incident did occur, then it is necessary to minimize its negative impact.

How can this be done? In one case, by "repairing" the fault as quickly as possible; in another, by restoring the most important functions as soon as possible; in a third, by applying a workaround; and so on.

A workaround is the reduction or elimination of the impact of an incident or problem for which a full resolution is not yet available.

As a rule, the activities of IT departments related to the resolution of incidents have a significant impact on the perception of IT by users as a whole. In order to effectively manage these activities, an appropriate course of action must be determined. In accordance with ITIL recommendations, an incident management process should be built for this.

Incident Management is the process responsible for managing the life cycle of all incidents. Incident management ensures that the business impact is minimized and the service is restored to normal operation as quickly as possible.

To achieve this goal, the objectives of the incident management process are:

  • Ensuring the use of standard methods and procedures for effective and prompt response, analysis, documentation, ongoing management and reporting in the course of incident resolution.
  • Increasing transparency and communication when resolving incidents between business and IT.
  • Improving business perception of IT through a professional approach to incident resolution.
  • Aligning incident resolution priorities with business priorities.
  • Maintaining user satisfaction with the quality of IT services.

Incident Management Process Activities

Incidents can occur in any part of the infrastructure. Often they are reported by users, but they can also be detected by IT employees, including on the basis of information from monitoring systems.

In most cases, incidents are logged by the Service Desk, where they are reported. All incidents should be logged immediately upon receipt of a report for the following reasons:

  • it is difficult to accurately record information about an incident if this is not done immediately;
  • monitoring the progress of work to resolve the incident is possible only if the incident is registered;
  • logged incidents help in diagnosing new incidents;
  • problem management can use logged incidents when searching for root causes;
  • it is easier to determine the degree of impact if all messages (calls) are registered;
  • without registering incidents, it is impossible to monitor compliance with service level agreements (SLAs);
  • immediate logging of incidents prevents situations where either several people are working on the same incident, or no one is doing anything to resolve it.

All relevant information about the incident should be recorded and available to support teams.

An example of incident information:

When an incident is initially recorded, it must be categorized.

A category is a named group of objects that have something in common. Categories are used to group similar objects. For example, cost types group costs of the same type, incident categories group incidents of the same type, and CI types group configuration items of the same type.

Correct categorization of incidents helps to route them immediately to the appropriate group and to analyze incidents from various perspectives. It also forms the basis for finding and eliminating the causes of incidents as part of the problem management process.

Each incident is assigned a specific priority.

Priority is based on impact and urgency and is used to determine the required processing time.

Urgency is a measure of how quickly, from the moment an incident occurs, it will have a significant impact on the business.

Impact (degree of impact) is a measure of the effect of the incident on business processes.

So, in effect, priority is a number based on urgency (how quickly it needs to be fixed) and impact (how much damage will be done if not fixed quickly). Priority = Urgency x Degree of impact. Based on the priority, the order in which incidents are resolved is determined.
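The formula above can be sketched in a few lines of Python. This is only an illustration: the three-point scale, the `Level` enum, and the function names are assumptions made for the example, not something ITIL prescribes. Here a higher score means the incident is handled earlier.

```python
from enum import IntEnum


class Level(IntEnum):
    """Three-point scale used for both urgency and impact (an assumption)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def priority_score(urgency: Level, impact: Level) -> int:
    """Priority = Urgency x Impact; a higher score is handled first."""
    return int(urgency) * int(impact)


def resolution_order(incidents: list[tuple[str, Level, Level]]) -> list[str]:
    """Return incident ids sorted from highest to lowest priority score.

    Each incident is a tuple of (id, urgency, impact).
    """
    ranked = sorted(
        incidents,
        key=lambda item: priority_score(item[1], item[2]),
        reverse=True,
    )
    return [incident_id for incident_id, _, _ in ranked]
```

Many organizations use a lookup matrix instead of a plain product, so that specific urgency and impact combinations can be tuned individually; the product is simply the most direct reading of the formula.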

Priority is set according to the following factors:

  • Urgency
  • Impact on business
  • Risk to life or limb
  • Number of affected services
  • Financial losses
  • Impact on business reputation
  • Impact on compliance with laws and other regulations, etc.

Taking into account the established priority and existing agreements (SLA), the user is informed about the maximum estimated time for resolving the incident (the deadline). This deadline is also recorded. The incident is assigned a unique number, and the user is informed of this number so that the incident can be accurately identified in subsequent calls.
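As a rough sketch of this step, the snippet below derives a deadline from a numeric priority score and issues a unique incident number. The SLA targets, the 1 to 9 score range (higher meaning more urgent), and the `INC-` number format are all hypothetical values chosen for the example; real targets come from the agreed SLA.

```python
from datetime import datetime, timedelta
from itertools import count

# Hypothetical SLA targets: (minimum priority score, maximum resolution time).
# Ordered from the highest score band to the lowest.
SLA_TARGETS = [
    (9, timedelta(hours=1)),
    (6, timedelta(hours=4)),
    (3, timedelta(hours=8)),
    (1, timedelta(hours=24)),
]

_sequence = count(1)  # in-memory counter; a real system would persist this


def next_incident_number() -> str:
    """Return a unique incident number such as 'INC-000001' (assumed format)."""
    return f"INC-{next(_sequence):06d}"


def resolution_deadline(registered_at: datetime, priority: int) -> datetime:
    """Return the latest resolution time to communicate to the user."""
    for min_score, max_time in SLA_TARGETS:
        if priority >= min_score:
            return registered_at + max_time
    raise ValueError(f"priority score must be at least 1, got {priority}")
```

Both the deadline and the number would be stored in the incident record, so that later calls from the user can be matched to the same incident.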

When a user reports an incident, Service Desk specialists should immediately carry out a preliminary diagnosis in order to obtain the information needed to determine the cause of the incident, if possible, as well as to categorize it correctly and pass it to the next support line. If the incident is within the competence of the Service Desk employee, it can be resolved immediately. Incidents that have no ready solution, or that are beyond the competence of the employee handling them, are passed by the Service Desk to a next-level support team with more experience and knowledge. This team investigates and resolves the incident or forwards it to the next level of support.

In the process of resolving an incident, various specialists can update its record: changing the current status, adding information about the actions performed, revising the classification, and updating the time spent and the code of the employee who did the work.

In most cases, the Service Desk is responsible for monitoring the progress of the solution, as the "owner" of all incidents. This service should also inform the user about the status of the incident. User feedback may be appropriate after a status change, such as forwarding an incident to the next support line, a change in estimated time to resolve, escalation, etc. During monitoring, functional escalation to other support teams or hierarchical escalation to make management decisions is possible.

Escalation - activities aimed at obtaining additional resources when they are needed to meet service level targets or customer expectations. Escalation may be required within any IT service management process, but is most commonly associated with incident management, problem management, and customer complaint management. There are two types of escalation: functional escalation and hierarchical escalation.

After the successful completion of the analysis and resolution of the incident, the employee records information about the applied solution. If at a certain point in time it is not possible to completely resolve the incident, its impact, if possible, should be reduced by applying a workaround. In the worst case, if no solution is found, the incident remains open.

After implementing a solution that satisfies the user, the support team forwards the incident back to the Service Desk. The Service Desk contacts the employee who reported the incident in order to receive confirmation that the issue was successfully resolved. If the employee confirms this, the incident can be closed; otherwise, the process resumes at the appropriate level. When an incident is closed, its final category, priority, the services affected, and the CI that caused the failure must be updated.

Policies and basic principles of the incident management process

Incident management process policies must be followed to ensure the effectiveness and efficiency of the process, and may include the following aspects:

  • Good coordination between users and incident responders
  • Incidents must be resolved within the time frame agreed with the business
  • User satisfaction must be ensured at all stages of incident resolution
  • Incident management activities should be aligned with service levels and support objectives based on real business needs
  • All incidents are managed and their data is stored in a single management system
  • All incidents must have a standard categorization scheme that matches the business processes of the enterprise.
  • Incident records should be checked regularly for correct entry and correct classification.
  • All incident records should, to the extent possible, have a common format and set of information fields.
  • There should be a common set of criteria agreed with the business to prioritize and escalate incidents

The following describes the basic principles that should be taken into account when implementing incident management.

Timescales - for all stages of incident processing, timescales must be agreed upon (they will differ depending on the priority level of the incident). All support groups should be fully aware of these time frames.

Many incidents are not new - they are related to something that has already happened before and may happen again. For this reason, it would be wise to define "standard" incident models in advance and apply them when relevant incidents occur.

An incident model is a predefined way of handling a specific type of incident.

An incident model may include the following aspects:

  • A predefined sequence of actions to handle a specific type of incident
  • Predefined responsibilities
  • Precautions to take before resolving the incident
  • Time frames and escalation procedures
  • Evidence of activity (records, logs)

Major incidents are identified as part of the incident management process.

A major incident causes a significant loss to the business and should have separate handling procedures.

Incidents must be tracked throughout their life cycle to ensure that they are handled properly and that their status can be reported. In an incident management system, status codes can be linked to incidents to indicate where they are in the life cycle.

The status of an incident indicates its current stage of processing. Examples of statuses are:

  • new;
  • accepted;
  • planned;
  • assigned;
  • active;
  • postponed;
  • resolved;
  • closed.
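One way to make such statuses enforceable in a tool is a small state machine. The sketch below is illustrative only: the status names are English renderings of the list above, and the table of allowed transitions is an assumption, since the text does not define which moves are valid.

```python
from enum import Enum


class Status(Enum):
    NEW = "new"
    ACCEPTED = "accepted"
    PLANNED = "planned"
    ASSIGNED = "assigned"
    ACTIVE = "active"
    POSTPONED = "postponed"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Hypothetical transition table: which statuses may follow each status.
ALLOWED_TRANSITIONS = {
    Status.NEW: {Status.ACCEPTED},
    Status.ACCEPTED: {Status.PLANNED, Status.ASSIGNED},
    Status.PLANNED: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.ACTIVE},
    Status.ACTIVE: {Status.POSTPONED, Status.RESOLVED, Status.ASSIGNED},
    Status.POSTPONED: {Status.ACTIVE},
    # A resolved incident is reopened if the user does not confirm the fix.
    Status.RESOLVED: {Status.CLOSED, Status.ACTIVE},
    Status.CLOSED: set(),
}


def change_status(current: Status, new: Status) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(
            f"transition {current.value} -> {new.value} is not allowed"
        )
    return new
```

Rejecting invalid moves at this level keeps the status history consistent, which in turn makes status-based reports more trustworthy.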

Incident Management Process Metrics

To manage and evaluate the effectiveness of the incident management process, and to ensure feedback with other management processes, ITIL suggests using the following key metrics (CSFs and KPIs):

  • CSF Fast resolution of incidents, minimizing their impact on the business
    • KPI Average time taken to resolve an incident
    • KPI Distribution of incidents by status
    • KPI Percentage of incidents resolved by first line support
    • KPI Percentage of incidents resolved remotely
    • KPI Number of resolved incidents that did not affect the business
  • CSF Maintaining IT service quality
    • KPI Total number of incidents (benchmark)
    • KPI Backlog queue size per service
    • KPI Number and percentage of major incidents per service
  • CSF Maintaining user satisfaction
    • KPI Average user/customer survey score
    • KPI Percentage of satisfied responses relative to the total number of survey participants
  • CSF Increasing transparency and communication in incident resolution between business and IT support staff
    • KPI Average number of calls to the help desk or other contacts from users about incidents that have already been reported
    • KPI The number of complaints and problems regarding the content and quality of communications when resolving incidents
  • CSF Alignment of Incident Management Activity Priorities with Business Priorities
    • KPI Percentage of incidents resolved without violating SLA goals
    • KPI Average cost per incident
  • CSF Ensuring that standard methods and procedures are used when dealing with incidents
    • KPI Number and percentage of misassigned incidents
    • KPI Number and percentage of misclassified incidents
    • KPI Number and percentage of incidents handled by Service Desk employees
    • KPI Number and percentage of incidents related to changes and releases

Risks and difficulties

When implementing incident management, consider the following possible risks and difficulties:

  • The need for early detection of incidents - configuration of event management (monitoring) tools will be required, as well as user training in informing about incidents
  • The need for total registration of incidents
  • The need to introduce an adequate automated management system and to ensure its integration with other IT management systems (for example, the CMS)
  • The need for high availability of a single point of contact
  • The need to ensure that process is followed and identify process bypasses - if users fix emerging errors themselves or contact specialists directly without following established procedures, the IT organization will not receive information about the level of service actually provided, the number of errors, and much more. Management reports will also not adequately reflect the situation.
  • Lack of resources for dealing with incidents, overload with incidents, and postponing work "for later" - with an unexpected increase in the number of incidents, there may not be enough time for proper registration, because the next user needs to be served before the information about the previous user's incident has been fully entered. In this case, incident descriptions may not be accurate enough, and the procedures for assigning incidents to support groups may not be performed properly. As a result, the solutions are of poor quality and the workload increases even more. When the number of open incidents begins to grow rapidly, the emergency allocation of additional resources within the organization can prevent staff overload.
  • Lack of Service Catalog and Service Level Agreements (SLAs) - If the services and products supported are not well defined, then it can be difficult for those involved in incident management to justifiably refuse assistance to users.
  • Lack of commitment to the process approach on the part of management and staff - handling incidents using the process approach usually requires a change in culture and a higher level of responsibility for their work on the part of staff. This can cause serious resistance within the organization. Effective incident management requires employees to understand and really commit to a process approach, not just participation.

Business Value

By implementing an incident management process in accordance with ITIL recommendations and solving all the difficulties that may arise during implementation, the following value for the business as a whole can be obtained:

  • Ability to reduce unplanned work and costs for business and IT caused by incidents
  • The ability to detect and resolve incidents, reducing downtime and increasing the availability of business services
  • Ability to allocate IT resources according to their business priority
  • Ability to initiate service improvement based on knowledge of the nature of incidents
  • Ability to identify needs for additional staff training

The incident management process is significantly "visible" to the business and allows you to see the results relatively quickly after its implementation. Therefore, incident management is often one of the first processes implemented in the transition to a process-based IT management organization. An additional benefit of this is the fact that incident management allows you to highlight other areas of IT management that require attention - thus ensuring that the necessary resources are allocated to implement other IT management processes.

Over time, there may be a need to change the IT infrastructure. This can be caused by a number of reasons - the need to fix a problem, a desire to improve the quality of IT services, an aging infrastructure, or a change in legislation.

Experience shows that if changes are not properly controlled, incidents can often occur as a result of their implementation: failures in the normal provision of services. The reasons for such incidents can be various: negligence of employees, lack of resources, insufficient training, poor analysis of the impact of the change, imperfect testing, etc. The number of incidents may increase, each of them will require urgent action, which in turn may lead to the emergence of new incidents. Daily planning often fails to accommodate the increasing workload.

Change - the addition, modification, or removal of anything that could affect IT services. The scope should include all changes to architectures, processes, tools, metrics, and documentation, as well as changes to IT services and other configuration items.

A number of Service Transition processes are responsible for providing change control in ITIL: Change Management, Service Asset and Configuration Management, and Release and Deployment Management.

Change Management is the process responsible for managing the lifecycle of all changes so that beneficial changes can be implemented with minimal disruption to IT services.

To achieve this goal, the objectives of the change management process are:

  • Respond to changing customer business requirements, maximizing business value and reducing incidents, failures and rework
  • Respond to change requests from business and IT to ensure services meet business needs
  • Ensure that all changes are recorded, evaluated, authorized, prioritized, planned, tested, implemented, documented, and reviewed in a controlled manner
  • Ensure that all configuration item changes are logged in the configuration management system (CMS)
  • Optimize business risks

The scope of the change management process includes changes to IT infrastructure, processes, tools, metrics and documentation, as well as changes to IT services and other configuration items.

Change Management Activities

The figure shows the general scheme of the change management process. To ensure change control, all changes must be logged. If a change within the scope of the process needs to be made, a request for change (RFC) must be filed.

A change request is a formal proposal to make a change. The change request includes details of the proposed change and may be written on paper or in electronic format. The term "change request" is often misused to mean "change record" or "change" by itself.

Within the ITIL change management process, there are three types of changes:

A standard change is a pre-authorized change that is low risk, relatively common, and follows a procedure or work instruction. Examples are resetting a password or providing a new employee with standard equipment. RFCs are not required to implement standard changes; they are recorded and tracked using another mechanism, such as service requests.

An emergency change is a change that must be implemented as quickly as possible, such as to resolve a major incident or install a security update. The change management process usually provides a specific procedure for emergency change management.

A normal change is a change that is not urgent or standard. Normal changes are handled through specific steps in the change management process.

Thus, if a change falls into the category of standard, it should be managed as part of the service request management process. Whether a particular change is standard or normal is determined by each organization independently. For emergency changes, the usual procedures are not used, as the necessary resources are provided immediately.
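The three change types and their routing can be sketched as a simple classifier. Everything specific here is an assumption for illustration: the catalogue of pre-authorized models, the function names, and the rule that a change matching a standard model is treated as standard even when it is urgent. As noted above, each organization decides this for itself.

```python
from enum import Enum
from typing import Optional


class ChangeType(Enum):
    STANDARD = "standard"
    EMERGENCY = "emergency"
    NORMAL = "normal"


# Hypothetical catalogue of pre-authorized, low-risk change models.
STANDARD_CHANGE_MODELS = {"password_reset", "new_employee_equipment"}


def classify_change(model: Optional[str], is_urgent: bool) -> ChangeType:
    """Classify a change as standard, emergency, or normal."""
    if model in STANDARD_CHANGE_MODELS:
        return ChangeType.STANDARD
    if is_urgent:
        return ChangeType.EMERGENCY
    # Neither pre-authorized nor urgent: a normal change.
    return ChangeType.NORMAL


def handling_process(change_type: ChangeType) -> str:
    """Name the process or procedure that handles each change type."""
    routes = {
        ChangeType.STANDARD: "service request management",
        ChangeType.EMERGENCY: "emergency change procedure",
        ChangeType.NORMAL: "normal change management steps (RFC)",
    }
    return routes[change_type]
```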

The following is an example of information that may be included in Requests for Change (RFCs):

  • request identification number;
  • problem/known error number (if any) associated with the request;
  • description and definition of the corresponding configuration items;
  • the reason for the change, including justification and expected business result;
  • current and new version of the configuration item being changed;
  • the name, address and telephone number of the person making the request;
  • date of the request;
  • preliminary assessment of the necessary resources and time;
  • etc.

A change request is created by an initiator, which can be an individual or a group of people. If a significant change is required, a change proposal may be required.

Change Proposal - A document containing a high-level description of a potential service or significant change, an associated business case, and an expected implementation schedule. Change proposals are typically created as part of the service portfolio management process and submitted to the change management process for authorization. As part of the change management process, the potential impact on other services, shared resources, and the overall change plan is assessed.

All change requests received must be logged and a change record must be created for each change.

Change record - a record containing detailed information about a change. Each change record documents the life cycle of a single change. A change record is created for each change request received, even if the request is subsequently rejected.

After a Request for Change (RFC) is filed, Change Management does an initial check to see if any of the requests are unclear, illogical, impractical, or unnecessary. Such requests are rejected with an explanation of the reasons. The employee making the request should always be given the opportunity to defend their request.

In order to evaluate the change, ITIL suggests answering 7 questions (7 ‘R’s):

  • Who RAISED the change? (the initiator)
  • What is the REASON for the change?
  • What is the RETURN required from the change? (the expected result)
  • What are the RISKS involved in the change?
  • What RESOURCES are required to deliver the change?
  • Who is RESPONSIBLE for the build, test and implementation of the change?
  • What is the RELATIONSHIP between this change and other changes?
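The seven questions lend themselves to a simple completeness check before an RFC goes to assessment. The sketch below assumes, purely for illustration, that an RFC is held as a dictionary with one free-text answer per question; the key names are hypothetical.

```python
# One key per question of the 7 Rs (key names are an assumption).
SEVEN_RS = (
    "raised",
    "reason",
    "return",
    "risks",
    "resources",
    "responsible",
    "relationship",
)


def unanswered_questions(rfc: dict[str, str]) -> list[str]:
    """Return the 7 R questions that are missing or blank in the RFC."""
    return [key for key in SEVEN_RS if not rfc.get(key, "").strip()]
```

For example, an RFC that names its initiator and reason but leaves the risks blank would be returned to the initiator with `risks` among the listed gaps.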

When a Request for Change (RFC) is accepted for processing, the change record includes the information necessary for further processing of the change.

The following information may be added to the record later:

  • assigned priority;
  • impact assessment and cost implications;
  • category;
  • recommendations from the change management process manager;
  • date and time when the change was authorized;
  • scheduled implementation date;
  • back-out plan (return to the original state);
  • support requirements;
  • change plan;
  • information about the developer and employees responsible for implementing the change;
  • the actual date and time of the change;
  • date of evaluation of the results;
  • test results and problems found;
  • reasons for rejecting the request (if necessary);
  • evaluation of results.

Upon receipt of a change request (RFC), its priority and category are determined. Priority indicates how important this request is compared to others. This, in turn, is determined by its urgency and degree of impact.

An example of a priority coding system:

  • Low priority - the change is desirable, but implementation may be delayed until a more convenient time (for example, until the next release or scheduled maintenance).
  • Normal priority - no great urgency or high impact, but the change should not be delayed.
  • High priority - the change addresses a serious error affecting a number of users, or a new atypical error affecting a large group of users, or is related to other urgent issues.
  • Highest priority - the request for change (RFC) concerns an issue that has a significant impact on a customer-critical service. Changes with this priority are classified as "emergency".

An example of categories based on the degree of impact:

  • Low impact - a change that requires little work.
  • Significant impact - a change that requires significant effort and has a significant impact on IT services. These changes are discussed by the change board (CAB) to determine the required effort (resources, etc.) and potential impact.
  • Highest impact - a change that requires significant effort. The process manager must first obtain authorization for the change from IT management or the IT steering committee, after which the change is submitted to the change board (CAB).

Change Advisory Board (CAB), referred to here as the change board - a group of people who help evaluate, prioritize, authorize, and schedule changes. The change board typically includes representatives from the IT service provider, the business, and third parties (such as contractors).

These codes can be represented in numbers, for example: low = 1 / high = 3

Most of the changes fall into the first two categories. Based on the assessment of the impact of the change, the level of change authorization (change authority) should be determined, for example, as shown in the figure.
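Since the figure is not reproduced here, the sketch below only illustrates the idea of mapping an impact category to a chain of authorizers, using the numeric codes mentioned above (low = 1, high = 3). The routes for significant and highest impact follow the category descriptions in the text; the authorizer for low-impact changes is not stated there, so the "change manager" entry is an assumption, as are all the names in the code.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Impact categories coded as numbers, low = 1 to highest = 3."""
    LOW = 1
    SIGNIFICANT = 2
    HIGHEST = 3


def authorization_path(impact: Impact) -> list[str]:
    """Return who must authorize a change, in order, for a given impact."""
    if impact is Impact.LOW:
        # Assumption: low-impact changes are authorized by the change manager.
        return ["change manager"]
    if impact is Impact.SIGNIFICANT:
        return ["change board (CAB)"]
    # Highest impact: management authorization first, then the CAB.
    return ["IT management or IT steering committee", "change board (CAB)"]
```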

In addition to the classification, the teams involved in the work on the technical solution and the services affected by the change should also be identified.

Once a change is approved by the appropriate authority, it is communicated to the technicians who will develop and implement it. Implementation is coordinated as part of the change management process, while the actual development, testing and implementation are carried out as part of the release and deployment management process. The change is implemented after the test results have been approved within the change management process.

As part of the change management process, a schedule of changes is maintained.

Change schedule - a document listing all approved changes and the planned dates for their implementation, as well as approximate dates for implementing later changes.

Members of the Change Board (CAB) provide advice on how to plan for change, as staffing availability, resources, costs, various aspects of the services involved, and customer input need to be taken into account. The Change Board (CAB) serves as an advisory body and meets on a regular basis. Change planning information should be distributed well in advance of the change board meeting. Relevant documentation and information on agenda items should also be circulated prior to the meeting.

The change board meeting agenda should include a number of standing items, including:

  • Unsuccessful or unauthorized changes
  • Requests for change (RFCs) to be assessed by change board members, in order of priority
  • Requests for change (RFCs) already assessed by change board members
  • Planning of changes and updates to the change schedule
  • Reviews of the changes made
  • The change management process itself, including additions and changes to the process
  • Process achievements and business benefits obtained through the change management process
  • Outstanding changes and changes in progress
  • Scheduling of change requests for consideration at the next change board meeting
  • Check for unauthorized changes detected by the service asset and configuration management process

As part of the overall plan for implementing the change, a back-out procedure for reverting to the original state should be developed in case the change does not achieve the desired result. Change management should not approve the implementation of a change in the absence of a back-out procedure.

It is necessary to evaluate the changes made, with the possible exception of standard changes. If necessary, the change board (CAB) decides on further follow-up actions. The following questions should be considered:

  • Did the change achieve its goals?
  • Are users and customers satisfied?
  • Were there any side effects?
  • Were the resources used to implement the change as planned?
  • Was the change implemented on time and without cost overrun?
  • Did the implementation plan function correctly?
  • Did the recovery plan function correctly if needed?
  • Etc.

If the change is successful, the change request (RFC) can be closed. This occurs during the post-implementation review (PIR) phase. If the change fails, the process resumes from the point of failure using a new approach. Sometimes it is better to back out and create a new or modified change request (RFC). Continuing with a failed change often makes the situation worse.

Post-implementation review (PIR) is a review performed after a change or project has been implemented. It determines the success of the change or project and identifies opportunities for improvement.

Depending on the nature of the change, the evaluation can be carried out either after a few days or after a few months. For example, a change to a personal computer that is used daily may be evaluated after a few days, while a change to a system that is used once a week may only be possible to evaluate after three months.

Making emergency changes

No matter how well the planning is done, there may be changes that require the highest priority. Emergency changes are very important for the company and they should be implemented as soon as possible. They require separate procedures for urgent processing, but with overall control from the change management process. In the event of such a situation, an Emergency Change Board (eCAB) meeting may be arranged.

Emergency Change Advisory Board (eCAB) - a subgroup of the change board that makes decisions about emergency changes. Its membership can be decided at the time the meeting is convened, and the need for each member's participation is determined by the nature of the emergency change.

If there is no time for this, or if the request arrives outside business hours, there must be an alternative way to obtain change authorization. It does not have to be a face-to-face meeting; a conference call can be used instead.

Policies and basic principles of the change management process

Change management process policies must be followed to ensure the effectiveness and efficiency of the process, and may include the following:

  • Absolute inadmissibility of unauthorized changes, creation of a culture of change
  • Alignment of the change management process with the customer's own change management and project processes
  • Categorization of changes, e.g. innovative, exploratory, preventive, corrective changes
  • Determination of responsibility for changes at all stages of the service life cycle
  • Separation of responsibility for management
  • Create a single point of responsibility for changes to reduce the chance of conflicting changes and the risk of failure in the production environment

Change Management Process Metrics

To manage and evaluate the effectiveness of the change management process, as well as to provide feedback to other management processes, ITIL suggests using the following key indicators:

  • Percentage of changes that satisfied customer requirements
  • The benefit of the change, expressed as "the value of the improvements made" + "the negative impact avoided" compared to the cost of the change
  • Reducing service disruptions, defects and rework caused by inaccurate specifications or insufficient impact assessment
  • Reducing the number of unauthorized changes
  • Reduction in the change request backlog, the share of unplanned changes and the number of emergency fixes
  • Reducing the number of changes that require recovery
  • Reducing the number of unsuccessful changes
  • Average time to implement a change, by urgency/priority/type
  • Number of incidents related to the change
  • Accuracy of change estimates
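The benefit indicator above is essentially a ratio, and several of the other indicators are simple shares of the change log. A minimal sketch in Python, assuming a hypothetical `Change` record with invented field names and invented monetary estimates (ITIL does not prescribe a data model):

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical change record; all field names are assumptions."""
    value_of_improvements: float    # estimated value gained
    negative_impact_avoided: float  # estimated loss prevented
    cost: float                     # cost of implementing the change
    rolled_back: bool = False       # the change required recovery
    unauthorized: bool = False      # the change bypassed authorization


def benefit_ratio(change):
    """(value of improvements + negative impact avoided) / cost of the change."""
    return (change.value_of_improvements + change.negative_impact_avoided) / change.cost


def share(changes, predicate):
    """Percentage of changes for which predicate(change) is true."""
    if not changes:
        return 0.0
    return 100.0 * sum(1 for c in changes if predicate(c)) / len(changes)


# Invented sample data for illustration only.
changes = [
    Change(8000, 2000, 5000),
    Change(1000, 0, 4000, rolled_back=True),
    Change(3000, 3000, 2000),
    Change(500, 0, 1000, unauthorized=True),
]

print(benefit_ratio(changes[0]))                  # 2.0
print(share(changes, lambda c: c.rolled_back))    # 25.0
print(share(changes, lambda c: c.unauthorized))   # 25.0
```

A ratio above 1 suggests the change paid for itself; tracking the shares from period to period gives the "reduction" indicators in the list.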

Business Value

By implementing a change management process in accordance with ITIL recommendations and resolving all the difficulties that may arise during implementation, the following value for the business as a whole can be obtained:

  • Prioritization of and response to change requests from the business and customers
  • Cost-effective implementation of changes that meet the agreed service requirements
  • Reduction in the number of unsuccessful changes leading to service interruption, defects and rework
  • Implementation of changes within the time frames set by the business
  • Tracking of changes throughout the life cycle of the service and of the customers' assets
  • Better estimates of the quality, time and cost of changes
  • Assessment of risks associated with service changes (commissioning or decommissioning)
  • Higher staff productivity through fewer unplanned or "urgent" changes and, as a result, higher service availability
  • Reduced mean time to recovery through faster and more successful implementation of corrective changes
  • Liaison with the business change process to identify business improvement opportunities

Would you like the services provided to you to be of high quality? Presumably, yes. One of the main tasks of ITSM, and of ITIL in particular, is to provide quality IT services.

IT service management (ITSM) is the implementation and management of quality IT services that meet business needs.

IT service providers and customers do not always agree about the quality of services.

Quality is the ability of a product, service, or process to deliver the value expected by the consumer. For example, the quality of a component can be considered high if its performance meets expectations and provides the required reliability.

The above is the definition of quality according to ITIL. That is, if we want to provide quality services, they must meet the expectations of the customer.

As the saying goes: "You can't manage what you can't measure." Thus, in order to provide quality services, it is first necessary to clarify the customer's expectations for IT services, agree on them, limit them where a requirement is unrealistic, and express them in measurable form. It then remains to ensure that the actual parameters of the service meet those expectations and to confirm this with appropriate reporting.

According to ITIL, service level management is a vital process for every IT service provider organization: it is responsible for agreeing and documenting service level targets and responsibilities in service level agreements (SLAs) and service level requirements (SLRs) for each service and the associated IT activities.

Service level management is the process responsible for negotiating achievable service level agreements and ensuring that they are met. Service Level Management is responsible for ensuring that IT service management processes, operational level agreements, and underpinning contracts are appropriate for the agreed service level targets. Service Level Management monitors and reports on service levels, conducts regular service evaluations with customers, and identifies necessary improvements.

A service level agreement (SLA) is an agreement between an IT service provider and a customer. The service level agreement describes the IT service, documents the service level targets, specifies the areas of responsibility of the parties - the IT service provider and the customer. A single SLA can cover multiple IT services or multiple customers.

Service level requirement (SLR) is a customer requirement for an IT service. Service level requirements are based on business objectives and are used to negotiate and agree on service level targets.

Through the formation of service level objectives, service level management sets the requirements and performance parameters for a number of other operational and tactical ITIL processes, such as: incident management, service request management, problem management, change management, release management, availability management, etc.

A service level target is a commitment documented in a service level agreement. Service level targets are based on service level requirements and are needed to ensure that an IT service meets business objectives. Service level targets should be SMART and are usually based on key performance indicators.

If the service level targets accurately reflect business requirements, then the service delivered by the provider will be in line with those requirements and meet customer and user expectations of service quality. If the targets do not match business needs, then the provider's activities and service levels will not meet business expectations, and problems may arise. A service level agreement is, in effect, a level of assurance or warranty regarding the quality of each service that the service provider delivers to the business.

Service Level Management is the process that links the IT service provider and the customer. This process has the following tasks:

  • Define, document, agree, monitor, report and evaluate the level of IT services provided
  • Maintain and improve relationships and communications with business and customers
  • Ensure that there are precise and measurable goals for all IT services
  • Monitor and improve customer satisfaction with service quality
  • Ensure that IT and customers have clear and unambiguous expectations of the level of service
  • Ensure that proactive service level improvements are implemented where justified and practicable.

Service level management should provide a point of regular contact and communication with the customers and business managers of the organization. In doing so, it should represent the IT service provider to the business and the business to the IT service provider.

The scope of the service level management process should include:

  • Organization of relations with business
  • Discussing and agreeing on current requirements and goals, documenting and maintaining SLAs for the services provided
  • Discussing and agreeing on requirements and goals, documenting and maintaining SLRs for planned new and changed services.
  • Develop and maintain Operational Level Agreements (OLAs) to support SLA goals.
  • Evaluation of all underpinning contracts (UC) and their alignment with the SLA objectives, together with supplier management.
  • Proactive failure prevention, risk mitigation, and implementation of service improvements together with other processes.
  • Reporting and evaluation of all services and analysis of any deviations from SLA targets.
  • Initiate and coordinate the Service Improvement Plan (SIP).

An operational level agreement (OLA) is an agreement between an IT service provider and another part of the same organization.

An underpinning contract (UC) is an agreement between an IT service provider and a third party. The third party provides goods or services that support the delivery of IT services to the customer. The underpinning contract defines the targets and responsibilities required to achieve the agreed service level targets in one or more service level agreements.

A service improvement plan (SIP) is a formal plan for implementing improvements to a process or IT service.

Service Level Management Process Activities

The figure shows a general diagram of the service level management process.


As businesses become more dependent on IT services, the demand for high-quality IT services increases. As defined above, the quality of a service is determined by the customer's expectations, as well as by the continuous management of those expectations, the stability of the service and the acceptability of its cost. Therefore, the best way to ensure an appropriate level of quality is to discuss the matter with the customer directly.

Customer requirements must be presented in measurable terms so that they can be used in the development and monitoring of IT services. If the metrics are not agreed with the customer, it will not be possible to verify whether the services comply with the agreements reached.

The first step to agreeing on current or future IT services should be to identify and define customer needs in the form of service level requirements (SLRs). In addition to performing this activity at the very beginning of this process, it is recommended to do it regularly at the request of the customer or at the initiative of the IT organization itself and cover both new and existing services.

The initial definition of what should be included in service level requirements and service level agreements is a very difficult task. Consideration should be given to the capabilities and limitations of all processes in relation to the measurability and achievability of certain service objectives.

If there is any doubt about the achievability of the service targets requested by the business, those targets can be included in a pilot agreement and monitored and evaluated during a warranty period. This will help to gather the necessary statistics and make the necessary corrections.

While many organizations seek to document the services they provide first and foremost by entering into appropriate service level agreements, agreeing on service level requirements for new services being developed or acquired is also a very important task.

Service level requirements should be an integral part of the service design criteria, which also includes functional specifications. They should define test and run-in criteria for the various stages of design and development or procurement from the earliest stages of design. Service Level Requirements will be progressively refined at each lifecycle stage, becoming a pilot SLA in the initial support phase. A draft service level agreement must be signed and formalized before the service is put into operation and used.

Experience shows that customers often cannot clearly define their expectations themselves; they simply assume that some services will be provided to them without any specific agreements. The customer may need help understanding and formulating requirements, especially with respect to capacity, security, availability, and continuity. Be prepared for the fact that the initial requirements will not be agreed and approved immediately. It may take several iterations of discussion before an acceptable balance between what is wanted and what is achievable is reached. These iterations may require a redesign of the service solution.

It should be noted that additional resources may be required to support new services. There is often an expectation that already overburdened staff will magically handle the extra workload brought on by new services.

Using the draft agreement as a basis, you can negotiate with customers or their representatives to finalize the scope of service level agreements and initial service level goals, and with vendors to ensure that these goals are achievable.

Service level management should design an appropriate structure for service level agreements to ensure that all services and all customers are covered to the extent that they need to be covered by the organization. There are a number of possible structures, including the following:

  • service level agreements based on a single service;
  • customer-based service level agreements;
  • multi-level service level agreements.

A service-based SLA is a service level agreement that covers one service for all customers of that service. For example, an SLA may be established for an email service, covering all customers of that service. However, difficulties may arise if different customers have different requirements for the same service, or if the characteristics of the infrastructure make different service levels unavoidable.

For example, head office staff may be connected via a fast LAN, while local offices have to use a slow WAN link. In such cases, separate targets may be specified within one agreement. However, as long as a common level of service is provided across all areas of the business, as may be the case for an email service, a service-based SLA can be an effective approach. A single agreement can also contain multiple service tiers, such as Gold, Silver, or Bronze.

A customer-based SLA is an agreement with an individual customer group that covers all the services they use. For example, an agreement may be reached with an organization's finance department covering the financial systems, accounting systems, settlement systems, billing systems, purchasing systems and any other IT systems they use. Customers often prefer such agreements, as all their requirements are then covered by one document. As a rule, one signature from the customer is enough, which simplifies approval.

The combination of any variants of the structure is possible provided that there are no duplications.

Some organizations use a tiered SLA structure. It may include, for example, three levels:

  • the corporate level covers all general service level management issues applicable to all customers in the organization, as a rule, these sections do not require frequent revision;
  • the customer level describes the specifics of providing services to particular customers or groups of business units that are common to all the services provided to them;
  • the service level describes the specifics of individual services provided to a particular customer or group of customers.

This structure keeps the size of service level agreements within manageable limits, prevents unnecessary duplication, and reduces the need for frequent updates. However, it involves additional effort to maintain the integrity of the links in the service catalog and in the configuration management system.

Tiered SLAs increase manageability and reduce duplication of documentation across an organization. This means that updates only happen when needed. Within an organization, the names of the levels can be changed, for example: corporate, department and service or group, business area and service.
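One way to picture the three levels is as layered settings, where each more specific level adds to or overrides the more general one. The sketch below is only an illustration of that idea: the function name, the keys and all values are invented, and real SLAs are documents rather than dictionaries.

```python
def effective_sla(corporate, customer, service):
    """Combine the three levels; more specific levels override general ones."""
    merged = {}
    for level in (corporate, customer, service):
        merged.update(level)  # later (more specific) levels win
    return merged


# Invented example values.
corporate = {"support_hours": "09:00-18:00", "review_period_months": 12}
customer = {"support_hours": "08:00-20:00"}   # customer-specific override
service = {"availability_percent": 99.5}      # applies to one service only

sla = effective_sla(corporate, customer, service)
print(sla)
# {'support_hours': '08:00-20:00', 'review_period_months': 12, 'availability_percent': 99.5}
```

Because shared terms live only at the corporate level, changing the review period there changes it for every customer and service, which is why such updates have to be communicated to the other levels.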

The administration of multi-level SLAs must be kept under control, as any change will have an impact on other levels. This applies in particular to changes made to the corporate SLA: they must be communicated to the other levels. Administering multi-level SLAs is complex, but it is easier than administering a large number of SLAs that are not grouped into such a hierarchy.

Many organizations find it necessary to use standards and/or template agreements that are used as the basis for preparing specific service level agreements. Such templates can be used to develop draft agreements.

The development of standards and models ensures that all agreements are developed consistently, which in turn facilitates their subsequent use, management and operation.

Defining roles and responsibilities is part of the service level agreement. There are three perspectives to consider: the IT provider, the IT customer and the actual users.

The wording of the agreement should be clear and concise and should not leave room for ambiguity. As a rule, agreements do not need to be written in legal terminology, and simple language aids common understanding. For the final proofreading, it is useful to involve independent people who were not involved in drafting the agreement.

It is important that the goals that are documented and agreed upon are clear, specific and unambiguous as they provide the basis for the relationship and quality assurance of the service provided.

SLAs should not include requirements whose fulfilment cannot be monitored and measured at an agreed level. The importance of this cannot be overemphasized, as the inclusion of items that cannot be effectively monitored almost always results in disputes and a possible loss of trust on the part of the customer. Many organizations have learned this from their own mistakes and have incurred huge costs as a result, both financially and in terms of reputation. It is imperative to identify the circumstances that would prevent the agreements from being met and the actions to be taken if such circumstances occur.

Existing monitoring capabilities should be assessed and, if necessary, updated. Ideally, this should be done before or at the same time as the design of the service level agreement, which will help to use monitoring in the approval of the proposed goals.

It is essential that monitoring matches the customer's perception of the service. Unfortunately, this is often very difficult to achieve. For example, monitoring individual components, such as a network or a server, does not guarantee that the service will be available to the customer as expected. The customer often cares only about the service they cannot use, although the failure may affect other services as well. A complete picture cannot be obtained without monitoring all components and the service as a whole, and this is difficult and expensive. Accordingly, users should be aware that they should report incidents immediately, especially performance-related incidents, to support the provider's monitoring.

There are a number of important parameters that cannot be measured using monitoring tools, such as the perception of services by customers (and this does not necessarily match the results of monitoring). For example, even when a number of incidents have occurred, the customer can maintain a positive perception of the service through visible and appropriate remedial action. Of course, the opposite is also possible, where the customer remains unsatisfied in the absence of violations of the service level agreement.

The first step is to try to manage customer expectations. This means setting the right expectations and goals and then proactively and systematically adjusting them, remembering that “satisfaction = perception - expectations” (when the value is greater than or equal to zero, the customer is satisfied). SLAs are just documents and do not by themselves replace the quality of the service provided (although they can influence behavior and help develop a proper service culture, with both short-term and long-term positive effects). A certain degree of patience is needed and should be part of the expectations.
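The formula can be made concrete with a small sketch. It assumes, purely for illustration, that perception and expectations are scored on the same numeric scale (for example 1 to 10 from a survey); the function names and scores are invented.

```python
def satisfaction(perception, expectation):
    """satisfaction = perception - expectations, both on the same scale."""
    return perception - expectation


def is_satisfied(perception, expectation):
    """The customer is satisfied when the value is zero or greater."""
    return satisfaction(perception, expectation) >= 0


# The same perceived service quality (7) with two different expectations.
print(satisfaction(7, 8), is_satisfied(7, 8))  # -1 False
print(satisfaction(7, 6), is_satisfied(7, 6))  # 1 True
```

The two calls show why managing expectations matters: the delivered service is identical, but only the customer with realistic expectations ends up satisfied.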

Where the customer pays for the services provided, prices can be used to manage demand. (Customers can get whatever they can justify, as long as it fits the enterprise strategy and they have an authorized, and necessarily limited, budget for it.) Where services are not charged for, top management support must be secured to limit unrealistic customer expectations.

Customer perception of the services can be monitored in several ways:

  • periodic questionnaires and customer surveys;
  • feedback at service evaluation meetings;
  • feedback when evaluating the changes made;
  • telephone surveys conducted by the Service Desk;
  • satisfaction questionnaires distributed during service and other contacts;
  • communication with user groups (on forums, etc.);
  • analysis of complaints and compliments.

Where possible, satisfaction targets should be defined and monitored as part of the SLA. Ensure that any feedback from users is responded to by demonstrating to them that their comments have been included in your action plan (Service Improvement Plan). All measurements of satisfaction should be evaluated, deviations should be analyzed, and adjustments should be planned based on the results of the analysis.

Service providers depend on their own support teams and external partners or vendors. They cannot guarantee service level agreements if internal and external dependencies do not support the same goals. Contracts with external providers are mandatory, but many organizations find it useful to also form simple agreements between internal groups, commonly referred to as operational level agreements. "Supporting Agreements" is a generic term for all supporting operational level agreements, service level agreements, and contracts.

Operational level agreements should not be overly complex, but should set clear goals for support teams to ensure that the goals of the service level agreement are met. For example, if an SLA requires incidents to be resolved within a specified timeframe, the Operational Level Agreement should include appropriate limits for each element in the support chain. Obviously, the goals in the SLA in this case should not be the same as the goals in the supporting agreements, since the SLAs define the total time, which includes the work of several groups, for each of which a supporting agreement can be agreed.
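The relationship between the total SLA time and the per-team OLA limits can be checked with simple arithmetic. A minimal sketch, assuming the support chain works sequentially and using invented team names and time limits in minutes:

```python
def ola_budget_check(sla_minutes, ola_minutes):
    """Return (fits, slack): whether the OLA limits add up to no more than
    the SLA resolution time, and how many minutes of slack remain."""
    total = sum(ola_minutes.values())
    return total <= sla_minutes, sla_minutes - total


# Invented example: a 4-hour SLA resolution target split across three teams.
ola = {"service_desk": 30, "second_line": 120, "network_team": 60}

fits, slack = ola_budget_check(240, ola)
print(fits, slack)  # True 30
```

A negative slack would mean that the support chain cannot meet the SLA even if every team stays within its own limit, so either the OLA limits or the SLA target would need to be renegotiated.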

SLAs should include call answering time, incident escalation time to technicians, and incident response time. Support hours for each support group should also be determined. If there are special contact procedures for staff (a telephone line for out-of-hours calls, etc.), this should also be documented.

The operational level agreement should be monitored for compliance with the goals set in the service level agreements and supporting contracts, and appropriate reporting should be generated and communicated to support team managers. This can help identify potential problem areas that require adjustments to work or agreements. Serious consideration should be given to the development of formal operational level agreements for all internal teams involved in supporting and delivering operational services.

Accordingly, before signing a new or revised SLA, it is important to review existing contractual agreements and, where necessary, update them. This may require additional costs, on the part of IT or the customer. In the latter case, these costs must be agreed with the customer, or softer targets should be included in the contracts. This review should be carried out in conjunction with supplier management to ensure that not only the requirements of the service level management process are met, but also compliance with other constraints, such as contract policies and standards.

Once a service level agreement has been agreed upon and accepted, monitoring and reporting of the service level achieved should be ensured. Operational reporting should be generated frequently (at least weekly) and, if possible, deviation reports should be generated in response to deviations (or threat of deviations) from the SLA. It is often difficult to meet SLAs early in the operation of a new service due to the large number of change requests that come in. We recommend that you limit the number of change requests allowed at this stage.
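A deviation report of the kind described above can distinguish between an actual deviation and a threat of one. The sketch below assumes a "higher is better" metric such as availability in percent; the warning margin, service names and figures are all invented for illustration.

```python
def classify(measured, target, warning_margin):
    """Classify a measurement of a higher-is-better metric against its target."""
    if measured < target:
        return "breach"   # the SLA target was missed
    if measured < target + warning_margin:
        return "threat"   # target met, but with little headroom
    return "ok"


# Invented weekly availability figures and SLA targets, in percent.
weekly = {"email": 99.1, "erp": 99.6, "intranet": 99.95}
targets = {"email": 99.5, "erp": 99.5, "intranet": 99.5}

report = {name: classify(weekly[name], targets[name], 0.2) for name in weekly}
print(report)  # {'email': 'breach', 'erp': 'threat', 'intranet': 'ok'}
```

Only the "breach" and "threat" entries would need to go into a deviation report; the full table belongs in the regular operational reporting.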

Reporting mechanisms, reporting intervals and format should be agreed with customers. The same applies to the frequency and format of service evaluation meetings. Regular intervals are recommended, synchronized with regular reporting.

Periodic reporting should be generated and sent to customers or their representatives and relevant IT managers a few days before service evaluation meetings so that possible difficulties and disagreements are resolved before the meeting and do not interfere with service evaluation.

Periodic reporting should include details of performance against SLA targets, as well as a description of trends and actions to improve service quality. It can be convenient to place a summary table on the front page of an SLA report to give a quick overview of how the service measures up against its targets. IT managers may request interim reporting to evaluate the performance of operational level agreements and contracts. Reporting is an evolving process; the first version is unlikely to be final.

The service level management process should identify reporting needs and automate reporting as much as possible. The flexibility, accuracy and ease of distribution of reports are important criteria when choosing an automation tool. Service reporting should not only include details of service performance, but also provide historical information on past values and trends, so that the effectiveness of service improvement measures can be assessed and planned.

Periodic meetings with customers should be organized to jointly evaluate services based on the results of the past period and the deviations and difficulties that have occurred. Usually these meetings are monthly or at least quarterly.

At these meetings, measures should be planned to correct weaknesses in the provision and consumption of services. Decisions should be recorded and their implementation monitored and verified at the next meetings.

Particular attention should be paid to service interruptions; the causes and possible measures to prevent the recurrence of such incidents should be clarified. If it is determined that previously set goals are unattainable, a decision may be made to evaluate, re-negotiate and agree on service goals. If the service interruption was due to a dependency on third parties, it may be necessary to revise the supporting agreements. Service interruption loss analysis provides important information for planning rational improvements. The constant pursuit of improvement must take into account the interests of the business, concentrating efforts in the most important and profitable areas.

The progress and results of the service improvement plan should be reported to assess compliance with the plan and the effectiveness of the measures taken.

All types of agreements must be kept up to date. They should be under change and configuration management control and reviewed periodically, at least once a year, to ensure they are current, complete, and consistent with business needs and strategy.

These checks should ensure that the agreements are up to date in terms of scope and established objectives, confirming that the agreements have not lost their validity (usability) due to any changes in infrastructure, business, suppliers, etc. When agreements are updated, changes made should be subject to change management control. If agreements are reflected in the configuration management system as CIs, this control is easier to carry out, and its results are more reliable.

Reviews should also cover general strategic documents to ensure that agreements are aligned with IT and business strategy and policies.

It is very important that the service level management process builds a relationship of trust and respect with the business, especially with key business representatives. For this to be possible, the service level management process must perform the following activities:

  • confirm lists of stakeholders, customers, business leaders and users;
  • help maintain accurate data in the service portfolio and service catalogue;
  • provide flexibility and readiness to respond to the needs of business, customers and users, understanding current and planned business processes and their requirements for new and changing services, documenting and discussing these requirements with business, customers and users, forming long-term relationships;
  • ensure a complete understanding of the strategy, plans, needs and objectives of business, customers and users, developing partnerships between them and IT;
  • regularly conduct performance reviews and customer experience studies - internal and external - and communicate relevant information to IT;
  • ensure the existence and effectiveness of procedures for interaction and their continuous improvement;
  • organize and conduct customer satisfaction surveys, ensuring their analysis and action on the results;
  • represent the service provider at user group meetings;
  • proactively research the market by analyzing the use of services and influencing the portfolio and catalog of services;
  • work with business, customers and users to ensure that IT delivers a level of service that meets current and future business needs;
  • promote awareness and understanding of services;
  • raise awareness of the business benefits of using new technologies;
  • facilitate the definition and discussion of correct, achievable and realistic service level requirements and service level agreements between IT and the business;
  • ensure business, customers and users understand their relationships with IT and dependencies;
  • promote the recording of improvements and enhancements.

The service level management process should also include activities and procedures for registering and managing complaints and compliments. Logging is often performed by the Service Desk and is similar to logging incidents and service requests. The definitions of a complaint and a compliment should be agreed with customers, along with points of contact and procedures. All complaints and compliments must be recorded and passed on to the appropriate parties. All complaints must also be dealt with and resolved to the satisfaction of the originator. In case this does not happen, escalation contacts and procedures should be defined. All serious complaints should be analyzed and brought to the attention of management. Reports should be produced on statistics, trends, actions and results in handling complaints and compliments.

Service Level Management Process Metrics

  • CSF Managing the overall quality of the services, including coverage and level of delivery:
    • KPI Percentage reduction in SLA targets missed
    • KPI Percentage reduction in SLA targets threatened
    • KPI Percentage improvement in customer perception of and satisfaction with SLA achievements, based on service evaluation meetings and satisfaction surveys
    • KPI Percentage reduction in SLA breaches caused by dependency on third parties (UC)
    • KPI Percentage reduction in SLA breaches caused by reliance on internal contractors (OLA)
  • CSF Delivering services as agreed at affordable cost:
    • KPI Total number and percentage increase in fully documented SLAs
    • KPI Percentage of SLA improvements aimed at services already provided
    • KPI Percentage reduction in the cost of providing services
    • KPI Percentage reduction in the cost of monitoring and reporting on SLAs
    • KPI Percentage increase in the speed of developing and agreeing SLAs
    • KPI Frequency of service evaluation meetings
  • CSF Managing the interface with the business and users:
    • KPI Increase in the number of services covered by SLAs
    • KPI Documented and agreed SLM process and procedures
    • KPI Reduced time to respond to and implement SLA requests
    • KPI Increase in the share of SLAs reviewed on time
    • KPI Reduction in the share of outstanding SLAs awaiting review
    • KPI Reduction in the share of SLAs requiring adjustment
    • KPI Increased OLA and UC coverage, with fewer agreements overall through consolidation and centralization
    • KPI Documentary evidence that identified deviations from SLAs are being followed up and resolved
    • KPI Reduction in the number and severity of SLA breaches
    • KPI Effective review and follow-up of all SLA, OLA and UC breaches
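Most of the KPIs above are percentage changes between two reporting periods. A minimal sketch of that calculation, with an invented function name and invented figures:

```python
def percent_reduction(previous, current):
    """Percentage reduction from the previous period to the current one.
    A negative result means the figure has grown."""
    if previous == 0:
        raise ValueError("previous period value must be non-zero")
    return 100.0 * (previous - current) / previous


# Invented example: SLA targets missed last quarter vs. this quarter.
print(percent_reduction(20, 15))  # 25.0
print(percent_reduction(10, 12))  # -20.0
```

The same function applies to costs or to the number of threatened targets, as long as both periods are measured in the same way.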

ITIL identifies subjective and objective measures of service level management performance. Objective:

  • Number or proportion of service goals achieved
  • Number and degree (severity) of deviations and violations
  • Number of current SLAs (up-to-date)
  • Number of services that are reported and evaluated in a timely manner

Subjective:

  • Customer Satisfaction Improvements

Risks and difficulties

When implementing service level management, the following potential risks and challenges need to be considered:

  • Lack of accurate input data, involvement and interest from business and customers
  • Need for resources and tools to agree, document, monitor, report and evaluate agreements and service levels
  • The process can become overly bureaucratic, focused on administrative procedures rather than actual proactive service improvement
  • Access to, and maintenance of, an accurate and up-to-date CMS and SKMS
  • Failure to follow SLM procedures
  • Business-driven metrics are too hard to measure and improve, so they are not recorded
  • Inappropriate level of contact and coordination
  • High expectations and low customer satisfaction
  • Ineffective communication with business

Problem Management Process

When IT services are provided, incidents (failures) inevitably happen. With properly organized incident management and event management processes, the negative impact of these incidents is minimized. However, every incident has an underlying cause, which is often unknown. The incident management process starts when an incident occurs and stops when the situation is corrected. This means that the root cause of an incident is not always identified, and the incident may recur. In ITIL, this underlying cause is called a problem.

A problem is the cause of one or more incidents. Usually, when a problem record is created, the cause is unknown, and it is the responsibility of the problem management process to investigate it further.

To determine the root causes of both existing and potential service failures, the problem management process examines the infrastructure and available information, including the incident database.

Problem management is the process responsible for managing the lifecycle of all problems. Problem management proactively prevents incidents from occurring and minimizes the impact of incidents that cannot be prevented.

Problem management includes proactive and reactive activities. The task of reactive problem management is to find the root cause of past incidents and prepare a proposal for eliminating it. Proactive problem management helps prevent incidents by identifying weaknesses in the infrastructure and preparing proposals for improvement.

Thus, the tasks of the problem management process are:

  • Prevention of problems and related incidents
  • Stopping the recurrence of incidents
  • Reducing the impact of incidents that cannot be prevented

Problem Management Activities

In principle, any incident that occurs for an unknown reason can be associated with a problem. In practice, it makes sense to raise a problem only when the incident recurs, is likely to recur, or is a single but serious incident.

The "problem identification" activity is often performed by problem coordinators. However, it may happen that personnel not initially involved in this work, for example, capacity management specialists, can also identify problems. Such "finds" should also be recorded as problems.

The registration details of problems are similar to the details of incidents, but in the event of a problem, you do not need to include information about the user, etc. in the description. However, incidents related to a specific problem should be identified and logged accordingly. The following are examples of cases where problems can be identified:

  • Incident management cannot match an incident to existing problems or known errors
  • Incident trend analysis shows that there may be a problem
  • Analysis of the cause of a major incident is required
  • Other IT functions have determined that there may be a problem
  • Service Desk personnel were unable to determine the cause of the incident and there is a suspicion that this incident may repeat
  • Analysis of the incident by the support team showed that there is (or may be) a problem
  • Notification from the supplier that there is a problem to be solved

Possible signs of problems may include:

  • Incidents recurring:
    • In the same time period
    • In the same subject area (category)
    • In the same CI or group of similar CIs
    • In the same location, order, or division
  • The volume of similar incidents exceeds a certain level
  • A workaround was applied to resolve the incident
  • The deadline for processing the incident(s) was exceeded
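
As a rough illustration, the volume-based signs listed above can be checked automatically against an incident log. The sketch below is a minimal Python example under stated assumptions: the incident records, the (category, CI) grouping key and the threshold of three are all hypothetical and are not prescribed by ITIL.

```python
from collections import Counter

# Hypothetical incident records: (category, configuration item) pairs.
incidents = [
    ("email", "mail-srv-01"),
    ("email", "mail-srv-01"),
    ("network", "switch-07"),
    ("email", "mail-srv-01"),
    ("printing", "printer-12"),
    ("network", "switch-07"),
]

# Assumed threshold: this many similar incidents suggest an underlying problem.
THRESHOLD = 3


def problem_candidates(records, threshold=THRESHOLD):
    """Return the (category, CI) groups whose incident count reaches the threshold."""
    counts = Counter(records)
    return {key: n for key, n in counts.items() if n >= threshold}


candidates = problem_candidates(incidents)
print(candidates)
```

Groups returned by `problem_candidates` are only candidates: whether a problem record is actually raised remains a decision for the problem coordinator.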

Trend analysis allows you to identify areas that require special attention. Regardless of the problem detection method, all relevant data about the problem should be recorded in a problem record:

  • Information about the user(s)
  • Information about the service(s)
  • Hardware information
  • Registration time
  • Priority, category
  • Description of related incidents
  • Action taken to diagnose and resolve

Problem record - a record containing a detailed description of the problem. Each problem record documents the lifecycle of a single problem.

Just like incidents, problems must be classified. Problems can be classified into areas (categories). The classification of the problem is carried out at the same time as the analysis of the degree of its impact, i.e. the level of severity of the problem and its impact on services (urgency and degree of impact). Following this, the problem is assigned a priority, just like in the incident management process. Then, based on the results of the classification, resources and personnel are assigned to the problem and the time required to solve it is determined.

The classification of the problem includes the following:

A known error is a problem that has a documented root cause and workaround. Known errors are created and managed throughout their life cycle as part of the problem management process. Known errors can also be identified by developers or contractors.

The classification is not static, it can change throughout the life cycle of the problem. For example, having a workaround or a quick fix can help reduce the urgency of the problem, while new incidents can increase the impact of the problem.
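
The relationship between impact, urgency and priority described above can be sketched as a simple lookup. The 3x3 matrix and the 1-5 priority scale below are assumptions chosen for illustration; each organization defines its own values.

```python
LEVELS = ("high", "medium", "low")

# Assumed matrix: (impact, urgency) -> priority, where 1 is the highest priority.
PRIORITY_MATRIX = {
    ("high", "high"): 1,
    ("high", "medium"): 2,
    ("high", "low"): 3,
    ("medium", "high"): 2,
    ("medium", "medium"): 3,
    ("medium", "low"): 4,
    ("low", "high"): 3,
    ("low", "medium"): 4,
    ("low", "low"): 5,
}


def problem_priority(impact, urgency):
    """Look up the priority for the given impact and urgency levels."""
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError(f"levels must be one of {LEVELS}")
    return PRIORITY_MATRIX[(impact, urgency)]


# A problem with high impact and high urgency starts at the highest priority.
before = problem_priority("high", "high")
# Once a workaround is available, urgency may drop and the priority is recalculated.
after = problem_priority("high", "low")
print(before, after)
```

Recalculating the priority whenever impact or urgency changes keeps the classification in step with the life cycle of the problem.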

Investigation and diagnosis are iterative phases of the process, they are repeated many times, each time getting closer to the intended result. Often attempts are made to reproduce an incident in a testing environment. Additional knowledge may be required to solve the problem, for example, you can involve specialists from the support team to analyze and diagnose the problem.

Once the cause of the problem and a workaround have been determined, the problem is assigned the status of "Known Error". In many cases, a workaround for a problem is already in place, even if the error was found by the developers themselves. But in some cases, a workaround needs to be found and then passed to the incident management process.

    A workaround is the reduction or elimination of the impact of an incident or problem for which a full resolution is not currently available. An example is restarting a failed configuration item. Workarounds for problems are documented in known error records.

Problem management personnel determine what needs to be done to resolve the problem. Experts compare various solutions, taking into account service level agreements (SLAs), possible costs and benefits. All work to develop a solution should be recorded in the system, staff should have the means to monitor problems and determine their status.

In the previous stages, the optimal solution is chosen. However, it may be decided not to correct a known error, for example, because it is not economically feasible.

After the selection phase is completed, there is enough information to submit a change request. Further correction of the problem (known error) will be made under the control of the change management process.

A change intended to solve a problem should be assessed in an evaluation of the implementation results before the problem is closed. If the change produced the expected result, the problem can be closed and its status changed to "resolved" in the problem database. Incident management is informed of this, and incidents related to the problem may also be closed.

Evaluation of the results of implementation - a review performed after the implementation of a change or project. Implementation evaluation determines the success of a change or project and identifies opportunities for improvement.

Throughout the process, workarounds and quick fixes are communicated to Incident Management. Users can also be informed about this.

Problem Management Policies and Metrics

Problem management process policies must be enforced to ensure the effectiveness and efficiency of the process, and may include the following:

  • Problems should be tracked separately from incidents
  • All problems must be stored and managed in a single management system
  • All problems should follow a standard categorization scheme that matches the business processes of the enterprise

To manage and evaluate the effectiveness of the problem management process, as well as to provide feedback to other management processes, ITIL suggests using the following key indicators (CSF and KPI):

  • CSF Minimize the business impact of incidents that cannot be prevented
    • KPI Number of known errors added to the KEDB
    • KPI Percentage accuracy of the KEDB (from database audits)
    • KPI Percentage of incidents closed by the Service Desk ("first point of contact")
    • KPI Average time to resolve incidents for which an issue is open
  • CSF Maintain IT Service Quality by Eliminating Recurring Incidents
    • KPI Total number of issues (as a benchmark)
    • KPI Problem Queue Size per IT Service
    • KPI Number of repeat incidents for each IT service
  • CSF Ensuring quality and professionalism in problem solving to maintain business confidence in IT capabilities
    • KPI Number of significant issues (open, closed, and queued)
    • KPI Percentage of Successfully Completed Significant Issue Reviews
    • KPI Percentage of significant issue reviews completed successfully and on time
    • KPI Number and percentage of issues assigned incorrectly
    • KPI Number and percentage of issues with incorrect categorization
    • KPI The queue of accumulated unresolved problems and its trend
    • KPI Number and percentage of problems that exceeded the deadlines for resolution
    • KPI Percentage of issues resolved within SLA targets
    • KPI Average cost of solving one problem
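
Several of the KPIs above can be derived directly from problem records. The following minimal Python sketch computes the share of problems resolved within target, the number that exceeded the deadline, and the unresolved backlog. The `ProblemRecord` fields and the sample values are hypothetical; a real service management tool will use its own data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProblemRecord:
    problem_id: str
    target_days: int                # agreed resolution target (assumed to be in days)
    days_to_resolve: Optional[int]  # None while the problem is still open


def problem_kpis(records):
    """Compute a few problem management KPIs from a list of problem records."""
    resolved = [r for r in records if r.days_to_resolve is not None]
    within = [r for r in resolved if r.days_to_resolve <= r.target_days]
    pct_within = 100.0 * len(within) / len(resolved) if resolved else 0.0
    return {
        "resolved": len(resolved),
        "exceeded_deadline": len(resolved) - len(within),
        "backlog": len(records) - len(resolved),
        "pct_within_target": pct_within,
    }


records = [
    ProblemRecord("P1", target_days=10, days_to_resolve=7),
    ProblemRecord("P2", target_days=10, days_to_resolve=14),
    ProblemRecord("P3", target_days=5, days_to_resolve=5),
    ProblemRecord("P4", target_days=10, days_to_resolve=None),
]

kpis = problem_kpis(records)
print(kpis)
```

Tracking the same figures period over period gives the backlog trend mentioned in the list.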

Business Value

By implementing a problem management process in accordance with ITIL recommendations and overcoming the difficulties that may arise during implementation, the business as a whole can obtain the following value:

  • Improving the quality of IT services by monitoring, documenting and/or eliminating errors in the infrastructure.
  • Reducing the number of incidents.
  • Increasing staff productivity
  • The use of permanent solutions instead of continuous “patching holes”.
  • Systematic activities for the accumulation of knowledge.
  • Ability to resolve more incidents on the first line of support.
  • Reducing the cost of firefighting and of resolving recurring incidents

Service asset and configuration management process

Every organization has information about its IT infrastructure. Often, to structure and summarize this information, diagrams are drawn up and hung on the wall. In certain cases this method does make it possible to quickly obtain information about the configuration of infrastructure components and their relationships, but it has a number of disadvantages:

  • difficulty of updating: with each change, the diagram must be redrawn and reprinted, otherwise it cannot be relied upon when needed
  • limited coverage: infrastructure components can be very closely intertwined, and not all elements can always be shown in the diagram
  • limited information: as a rule, only the most important information is indicated for each element, such as a domain name or IP address
  • difficulty of analysis: when a diagram covers a large area and the components have complex relationships, analyzing it is difficult

Built in accordance with ITIL recommendations, the process of managing service assets and configurations allows you to use the available data about the IT infrastructure in the most effective way, while avoiding these disadvantages and obtaining additional benefits.

Service Asset and Configuration Management (SACM) is the process responsible for ensuring that all assets required to provide services are controlled and accurate, reliable information about them is available when needed. This information includes the configuration of the assets and the relationships between them.

Service asset and configuration management includes two sub-processes:

  • Asset Management is the activity or process responsible for tracking and reporting on the value and ownership of assets throughout their lifecycle.
  • Configuration Management is the activity or process responsible for managing the information about configuration items needed to provide IT services, including their relationships.

Tasks of the service asset and configuration management process:

  • Identify, control, document, report, and validate service assets and configuration items, including versions, baselines, components, their attributes, and relationships
  • Manage and protect the integrity of service assets and configuration items (and, where appropriate, those owned by the customer) throughout the lifecycle of the service, ensuring that only authorized components are used and only authorized changes are made
  • Ensure asset and configuration integrity required to manage services and IT infrastructure by creating and maintaining an accurate and complete configuration management system

The core of the process is the configuration management system (CMS). The CMS stores all the necessary configuration information and supports its analysis and presentation in various views.

The configuration management system (CMS) is a set of tools, data and information used to support the service asset and configuration management process. The CMS is part of the overall service knowledge management system and includes tools for collecting, storing, managing, updating, analyzing and presenting information about all configuration items and their relationships. The CMS may also include information about incidents, problems, known errors, changes, and releases. The CMS is maintained by the service asset and configuration management process and is used by all IT service management processes.

A Configuration Item (CI) is any component or other service asset that needs to be managed in order to provide an IT service. Information about each configuration item is recorded in the form of a configuration record in the configuration management system and is kept up-to-date throughout the life cycle by the service asset and configuration management process. CIs are under the control of the change management process. They typically include IT services, hardware, software, buildings, people, and documents such as process documentation and service level agreements.

Configuration items can be hardware, all types of software, active and passive network elements, servers, building blocks, documentation, procedures, services and any other IT components controlled by the IT organization. The following types of objects are stored in the CMS:

  • configuration item records including their corresponding attributes
  • relationships (links) between configuration items

Attributes allow you to take into account the information required for a particular type of configuration items. For example, for servers and laptops, information such as manufacturer, domain name, warranty period, etc. may be of interest. However, for software, this information is likely to be different.

An attribute is a piece of information about a configuration item. For example, name, location, version number, and cost. CI attributes are recorded in the Configuration Management Database (CMDB) and maintained as part of the Configuration Management System (CMS).

Thus, each configuration item must belong to a specific type (class), which defines common attributes for all CIs of this type (class) and a list of possible relationships between CIs of this type and CIs of another type.

CI type - a category that is used to classify configuration items. The CI type determines which attributes and relationships are required for the configuration record. Common CI types are hardware, documentation, user, and so on.

The set of CIs and their relationships is, in effect, a configuration model. The figure shows an example of a configuration model.

The CMS makes it possible to record the necessary configuration information effectively and to analyze and present it in various forms, including graphical ones. The CMS provides information to other service management processes:

  • to assess the impact of incidents and problems
  • to assess the impact of changes
  • for planning and designing new and changing services
  • for technology and software upgrade planning
  • for planning release packages and replicating services
  • to optimize the use of assets and costs
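
To make the idea of CI records, attributes and relationships more concrete, here is a minimal Python sketch of a configuration model and an impact assessment over it. All CI names, attributes and the "depends on" relationship type are hypothetical; a real CMS holds far richer data and relationship types.

```python
from collections import deque

# Hypothetical configuration items: name -> attributes (including the CI type).
cis = {
    "email-service": {"type": "service"},
    "mail-app": {"type": "software", "version": "2.1"},
    "mail-srv-01": {"type": "hardware", "location": "DC1"},
    "crm-service": {"type": "service"},
    "crm-app": {"type": "software", "version": "5.0"},
    "db-srv-01": {"type": "hardware", "location": "DC1"},
}

# Relationships between CIs: (dependent CI, CI it depends on).
depends_on = [
    ("email-service", "mail-app"),
    ("mail-app", "mail-srv-01"),
    ("mail-app", "db-srv-01"),
    ("crm-service", "crm-app"),
    ("crm-app", "db-srv-01"),
]


def impacted_by(failed_ci):
    """Return every CI that directly or indirectly depends on failed_ci."""
    dependents = {}
    for dependent, dependency in depends_on:
        dependents.setdefault(dependency, []).append(dependent)

    seen = set()
    queue = deque([failed_ci])
    while queue:  # breadth-first walk up the dependency chain
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


affected = impacted_by("db-srv-01")
affected_services = sorted(ci for ci in affected if cis[ci]["type"] == "service")
print(affected_services)
```

The same traversal serves both incident impact assessment (which services does a failed CI affect?) and change impact assessment (which CIs are touched by a planned change?).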

Thus, if the management of service assets and configurations is implemented effectively, then this process can provide, for example, information about the following:

  • Financial Information and Company Product Policy
    • What IT components are currently in use for each model (version) and for how long?
    • What trends exist in different product groups?
    • What is the current and residual value of the IT components?
    • Which IT components need to be removed from the operating environment and which require modernization?
    • How much will it cost to replace certain components?
    • What licenses are available and are they sufficient?
    • Which maintenance contracts should be reviewed?
    • What is the degree of infrastructure standardization?
  • Troubleshooting and impact assessment
    • What IT components are needed to support the recovery process in case of emergency?
    • Will the contingency recovery plan work if the infrastructure configuration has been changed?
    • Which IT components will be affected when new services are deployed?
    • How is the equipment connected to the network?
    • What software modules are included in each of the software packages?
    • Which IT components are affected by the changes?
    • What requests for change (RFCs) for specific IT components are pending, and what incidents and issues have occurred in the past and are still relevant?
    • Which IT components cause known errors?
    • What IT components were purchased from a particular vendor during a particular period?
  • Provision of services and billing
    • What IT component configurations are essential for certain services?
    • What IT components are used in which location and by whom?
    • What standard IT components can a user order and which ones are supported (product catalog)?

Activities within the service asset and configuration management process

The figure shows a diagram of typical configuration management activities.

In ITIL materials, "planning" refers to the activity of organizing the configuration management process itself. Management and planning as a type of activity is used both at the stage of creation and at the stage of process improvement. The main output of planning is the "Configuration Management Plan".

The configuration management plan contains:

  • Description of the configuration management process
  • High-level description of the system architecture
  • Plan for significant events (identification, major releases, etc.)

The plan is a living document and is subject to regular review. The Configuration Management Process Manager is responsible for updating the plan.

Configuration identification activities include:

  • Definition and documentation of criteria for the selection of configuration items and their constituent components
  • Selection of configuration items and components based on documented criteria
  • Assignment of unique identifiers
  • Defining Attributes for Each CI
  • Determination of the moment when the CI is brought under the control of the process
  • Determining the Owner Responsible for Each CI

Depending on the scale of the IT infrastructure and the complexity of the accounting rules, identification can be time consuming and resource intensive. Therefore, identification work must be carefully planned.

CI control activities include the following aspects:

  • Keeping CMDB data up to date
  • Ensuring the integrity of CMDB data (the origin and history of changes of each CI is clear)
    • Restricting access to modify CMDB data
    • Ensuring antivirus protection of CMDB controls
    • Provide backup and restore capabilities
  • Control rules should be developed at the planning stage of the process
  • Rules for the transfer of control from projects or suppliers
  • Control procedures must correspond to the types of CI

Configuration status accounting and reporting activities include:

  • Maintaining configuration records throughout the service lifecycle and archiving them in accordance with agreements, external requirements, best practices and standards (e.g. ISO 9001)
  • Managing the documentation, acquisition and consolidation of the current configuration status and the statuses of all previous configurations to ensure the correctness, timeliness, integrity and security of information
  • Ensuring that status information is available throughout the lifecycle of a service
  • Documenting CI changes from acceptance to decommissioning
  • Ensuring that baseline configurations are properly documented

Verification and audit:

  • Verification - checking CIs for compliance with standards or functional requirements:
    • At initial registration in the CMDB
    • On receipt of hardware or software from a supplier
    • At commissioning
  • Audit - checking the correspondence between the current state of the CI (as it is) and the description of the CI in the CMDB (as it should be):
    • Standard audit
    • Simplified audit
    • Current (operational) audit

Audits should be carried out:

  • Shortly after implementing a new configuration management system or process
  • Before and after major IT infrastructure changes
  • Before deploying new software into production
  • After recovering from a major outage (emergency)
  • Upon the discovery of a large number of discrepancies (for example, as part of an operational audit)
  • Regularly (at predetermined intervals)
  • From time to time ("sudden" checks)
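
The audit comparison between the actual state ("as it is") and the CMDB description ("as it should be") can be sketched in a few lines of Python. The CI names, the attributes and the source of the discovered data are assumptions made for illustration only; in practice the actual state would come from a discovery tool or a manual inventory.

```python
def audit(cmdb, discovered):
    """Compare CMDB records (as it should be) with the discovered state (as it is)."""
    findings = []
    for name in sorted(set(cmdb) | set(discovered)):
        if name not in discovered:
            findings.append((name, "recorded in CMDB but not found"))
        elif name not in cmdb:
            findings.append((name, "found but not recorded in CMDB"))
        else:
            # Compare every attribute known to either side.
            for attr in sorted(set(cmdb[name]) | set(discovered[name])):
                expected = cmdb[name].get(attr)
                actual = discovered[name].get(attr)
                if expected != actual:
                    findings.append((name, f"{attr}: expected {expected!r}, found {actual!r}"))
    return findings


# Hypothetical CMDB contents and discovered state.
cmdb = {
    "mail-srv-01": {"os": "Linux 5.10", "ram_gb": 32},
    "switch-07": {"firmware": "1.4"},
}
discovered = {
    "mail-srv-01": {"os": "Linux 5.10", "ram_gb": 64},
    "laptop-99": {"os": "Windows 11"},
}

findings = audit(cmdb, discovered)
for ci, message in findings:
    print(ci, "-", message)
```

A CI that is found but not recorded may point to unauthorized hardware or software, while an attribute mismatch may point to an unregistered change.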

Service Asset and Configuration Management Process Metrics

To manage and evaluate the effectiveness of the service asset and configuration management process, as well as to provide feedback to other management processes, ITIL suggests using key indicators such as:

  • Percentage improvement in asset lifecycle support, following the principle "not too much, not too late"
  • The degree to which support meets business needs
  • Assets identified as the cause of service disruptions
  • Increasing the speed of incident resolution and service recovery through faster identification of failed CIs
  • Identification of links between specific types of CIs, incidents and problems
  • More effective use of service assets
  • More efficient use of purchased licenses, average license cost per user
  • More accurate budgeting and charging for the assets used
  • More efficient asset audits
  • Improving the quality and accuracy of asset information
  • Fewer errors caused by working with stale data
  • Reducing the number and scope of audits
  • Reduced use of unauthorized hardware and software, leading to reduced cost and risk in service support
  • Reduce time and cost in diagnosing and resolving incidents and problems
  • Reducing the time to identify assets that are problematic in terms of performance
  • Reducing the number of unsuccessful changes caused by incorrect impact assessments, incorrect data in the CMS, or poor version control
  • Reduce risk with early detection of unauthorized changes

Difficulties

When implementing service asset and configuration management, the following potential challenges need to be considered:

  • Persuading technical support staff to comply with accounting policies, which are often perceived as an obstacle to rapid service support
  • Attracting and justifying funding for the process, since the process is usually not visible to the customer departments that have the authority to allocate funds. It is usually funded as an "invisible" element of change management and other, more "visible" processes
  • The "gather all the data we can" approach, which overloads the process and makes it impossible to maintain
  • Lack of commitment and support from management who do not understand the key role of the process

Critical incident method

Critical incident identification is a method designed to identify the process, sub-process or problem area that should be improved. The method was developed by Lawlor in 1985. It is a fairly open and short way to obtain information about an organization's problems. As a prerequisite, it is assumed that all participants are completely free to express their views. Any censorship or withholding of information for fear of being too honest is firmly rejected.

The method includes three stages:

1) The participants in the analysis are selected. If the goal is to decide on improving the process as a whole, it is natural to include representatives of different areas of the organization. If the goal is to define more precisely the direction of action within an already defined business process, it is better to choose people involved in that process.

2) Participants are then asked to answer questions such as:

  • Which incident last week was the most difficult to deal with?
  • Which episode created the biggest problems in meeting consumer needs?
  • Which incident cost the most in terms of additional resources or direct costs?

At this stage, it is important to single out the so-called critical incidents that in one way or another create problems for individual employees, for the entire organization and for other stakeholders. The period covered by the question may vary from several days to several months. It is not recommended, however, to take too long a period, since it may then be difficult to single out the most relevant critical incident: over a long period there may have been many such incidents.

3) The collected responses are sorted, and it is determined which of the incidents was mentioned more often than the others. A graphical representation of the results is a convenient way to highlight the critical incident. The incident that occurred more often than the others is the critical one and is the candidate for prevention. However, the effort should be directed not so much at the incident itself and its symptoms as at the causes that gave rise to it.
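
The sorting and counting in the third stage is easy to automate. The following Python sketch tallies survey answers and prints a simple text bar chart; the answers themselves are hypothetical and serve only to illustrate the counting step.

```python
from collections import Counter

# Hypothetical survey answers: each participant names the incident
# that caused the most difficulty during the agreed period.
responses = [
    "could not reach the right person",
    "did not know who should answer",
    "could not reach the right person",
    "caller kept waiting too long",
    "could not reach the right person",
    "did not know who should answer",
]

# Sort the incidents by how often they were mentioned.
ranked = Counter(responses).most_common()

# A simple text bar chart as a graphical representation of the results.
for incident, mentions in ranked:
    print(f"{incident:<35} {'#' * mentions}")

# The most frequently mentioned incident is the critical one.
critical_incident, critical_mentions = ranked[0]
```

The tally only identifies the candidate for prevention; finding the causes behind it is a separate analysis.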

Example.

A large corporation with 15 telephone operators on its staff began a project to improve its telephone service for consumers, including the handling of messages for calls. It was decided to use the critical incident identification method.

All telephone operators were asked to describe the incidents during the last month that had put them in an extremely difficult position. The survey results were sorted by how often the incidents recurred and are shown as a diagram in Fig. 7.1. The figure shows that the critical incidents were: 1) the inability to get through to the person who should answer the call, and 2) not knowing who exactly should answer. Based on the results of the study, efforts were made to create a system for tracking the movements of each employee, and instructions were developed specifying which employee should respond to which request.

A checklist is a blank or special form intended for recording data (Rolstados, 1995). One of the main purposes of the checklist is to record how often various problems or incidents are encountered. This gives important information about problem areas or possible causes of errors. Using checklists provides a good basis for deciding where efforts should be concentrated when carrying out improvements.

Filling out a checklist usually goes in several stages:

1) Agree on which events to record. These must be defined precisely so that there is no doubt whether an event has actually occurred. It is also advisable to include an "Other" item in the checklist to register incidents that are difficult to attribute to any of the defined categories.

2) Determine the data recording period and a convenient division of it into intervals.

3) Develop the checklist form to be used for recording.

4) Collect data throughout the agreed period. First, make sure that everyone taking part in the data collection understands what is being recorded in the same way. Then the data collected by different people will be consistent.

5) When data collection is complete, analyze the data to identify the events that occur most frequently. This determines the priority problem areas within the business process on which improvement work should focus. A convenient aid for this analysis is the Pareto chart.

Pareto chart

The construction of this chart is based on the so-called Pareto principle, formulated by the Italian mathematician Vilfredo Pareto in the 1800s. Details of the chart can also be found in the book by Rolstados. Pareto was concerned with the distribution of wealth in society and believed that 20% of the population accounts for 80% of all wealth. Translated into the modern language of quality systems, the principle states that about 80% of all possible effects are often due to approximately 20% of all possible causes. A sensible approach is therefore to begin improvement work by attacking precisely these 20% of causes, commonly referred to as the "vital minority". This does not at all mean that the remaining 80% of causes can be ignored: in due time these causes, known as the "important majority", should also be dealt with. The Pareto principle prioritizes the issues that need to be addressed.

The Pareto chart itself is a graphical representation, in the form of a skewed distribution, of the so-called 80/20 rule. The causes are sorted by importance: by frequency of occurrence, by cost, by level of performance, etc. When the causes are ordered on a Pareto chart, the most important of them are placed at the left edge of the chart so that the "vital minority" is easy to identify. To make the Pareto chart more informative, a cumulative curve is usually drawn on it as well. An example of constructing a chart is shown in Fig. 7.4.

When working with a Pareto chart, do the following:

1) Identify the main problem and its various potential causes. Based on the assumptions made in this book, we will assume that a specific process has already been selected for improvement. Thus, the purpose of constructing a Pareto chart is to identify the main causes of the low level of performance.

2) Decide which quantitative indicator to use when comparing possible causes. Such an indicator could be the frequency of occurrence of various problems, or their consequences in terms of monetary costs or other measures.

3) Determine the period of time during which the data will be collected, and collect them. Often this work has already been done earlier, when filling in check sheets. The essence of the check sheet is described in § 7.2.

4) Arrange the causes from left to right along the horizontal axis of the Pareto chart in descending order of their relative importance. Draw the columns of the chart. Their height corresponds to the relative importance of the corresponding cause.

5) Mark the obtained absolute values of the indicator on the left vertical axis. Mark the relative values in percent on the right vertical axis. Draw the cumulative curve along the top edges of the columns.
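The ordering and the cumulative curve from steps 4 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cause names and counts are hypothetical, the functions `pareto_table` and `vital_few` are not taken from the book, and the 80% cutoff is simply the rule of thumb mentioned above.

```python
from collections import Counter


def pareto_table(cause_counts):
    """Return rows (cause, count, cumulative_percent), largest cause first."""
    total = sum(cause_counts.values())
    if total == 0:
        return []
    # Step 4: sort causes in descending order of importance.
    ordered = sorted(cause_counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for cause, count in ordered:
        running += count
        # Step 5: the cumulative curve is the running share of the total.
        rows.append((cause, count, 100.0 * running / total))
    return rows


def vital_few(rows, cutoff_percent=80.0):
    """Causes needed to reach the cutoff on the cumulative curve."""
    selected = []
    for cause, _count, cumulative in rows:
        selected.append(cause)
        if cumulative >= cutoff_percent:
            break
    return selected


# Hypothetical incident counts per cause, for illustration only.
incident_causes = Counter({
    "password reuse": 42,
    "unpatched software": 31,
    "misconfiguration": 12,
    "lost device": 9,
    "other": 6,
})
rows = pareto_table(incident_causes)
for cause, count, cumulative in rows:
    print(f"{cause:20s} {count:3d} {cumulative:6.1f}%")
print("Vital few:", vital_few(rows))
```

With these made-up numbers, the first three causes already cover 85% of all recorded incidents, so they would be the ones to address first.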

Studying the Pareto chart can answer questions such as: 1) “What are the two or three main causes of the low level of performance in this process?” or 2) “What share of the costs is attributable to the vital few causes?” This information can be used to focus improvement efforts where they will do the most to raise the performance of the process.

Building a Pareto chart is easier with standard spreadsheet software. There is also specialized software for constructing Pareto charts. Two such specialized programs are StatGraphics Plus and ASAS/QC. They also let the user build control charts. We also note the Memory Jogger Software package, which can be used with several quality improvement tools.

Advantages: Provides information about the qualities that help or hinder achieving results at work. Promotes a better understanding of the content of the work.

Disadvantages: Some of the information obtained may not be usable when building a model, since a number of the described incidents may turn out to be entirely uncharacteristic of the work.

October 11, 2012 at 10:58 am

Handling information security incidents


Good day, dear habrahabr!

I continue to publish articles from information security practice.
This time we will talk about an important component: security incidents. Once the information security regime has been established (the documents have been adopted, the technical part installed and configured, and the first training sessions held), working with incidents will take up the lion's share of your time.

Incident Reporting

First of all, you need to receive information about the incident. This has to be thought through when the security policy is being drawn up and the information security awareness presentations for employees are being prepared.
Main sources of information:

1. Helpdesk.
As a rule (and it is a good tradition), employees phone or write to the helpdesk of your IT service about any problems, faults or failures in the operation of equipment. So you need to “build yourself into” the helpdesk business process in advance and specify the types of incidents for which a ticket will be passed on to the information security department.

2. Messages directly from users.
Set up a single point of contact and announce it in the information security training for employees. At present, information security departments in organizations are usually small, often just 1-2 people. It will therefore be easy to appoint one person responsible for receiving incident reports; you need not even bother with allocating a dedicated email address for an IS helpdesk.

3. Incidents discovered by IS employees.
Everything is simple here: no extra effort is needed to set up this reporting channel.

4. Logs and system alerts.
Set up alerts in the console of antivirus, IDS, DLP and other security systems. It is more convenient to use aggregators that also collect data from the logs of programs and systems installed in your organization. Particular attention should be paid to points of contact with the external network and places where sensitive information is stored.

Although security incidents are very diverse, they can fairly easily be divided into a few categories, which makes it easier to keep statistics.

1. Disclosure of confidential or non-public information, or the threat of such disclosure.
This requires, at a minimum, an up-to-date list of confidential information and a working system for labeling electronic and paper media. A good example: document templates for almost every occasion, kept on the organization's internal portal or in internal file storage, carry the “Internal Use Only” label by default.
To clarify what a threat of disclosure means: in a previous post I described a situation where a document labeled “Internal Use Only” was posted in a common hall shared with a neighboring organization. Perhaps no disclosure actually took place (it was posted after the end of the working day and was noticed very quickly), but the threat of disclosure is plain to see!

2. Unauthorized access.
To do this, you must have a list of protected resources. That is, those where there is any sensitive information of the organization, its customers or contractors. Moreover, it is desirable to include in this category not only penetration into a computer network, but also unauthorized access to premises.

3. Excess of authority.
In principle, this category could be merged with the previous one, but it is better to keep it separate, and here is why. Unauthorized access means access by people who have no legitimate access to the organization's resources or premises at all, that is, an external intruder with no legitimate entry into your system. Excess of authority means unauthorized access to resources and premises by the organization's own legitimate employees.

4. Virus attack.
Here you need to understand the following: a single infection of one employee's computer should not trigger an investigation, since it can be put down to a mistake or the notorious human factor. If a significant percentage of the organization's computers are infected (judge this from the total number of machines, their distribution, segmentation, etc.), then a full-scale security incident investigation is needed, including the search for the sources of infection, its causes, and so on.

5. Compromise of accounts.
This category overlaps with category 3. In fact, an incident moves from category 3 to category 5 if the investigation shows that the user was physically unable to use his credentials at that moment.
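The five categories and the two decision rules described above (the move from category 3 to category 5, and the difference between a single infection and a mass infection) could be encoded roughly as follows. This is a hedged sketch: the names `Category`, `reclassify` and `needs_full_investigation` are invented for illustration, and the 5% threshold is an assumed placeholder, since the article leaves "a significant percentage" to the reader's judgment.

```python
from enum import Enum


class Category(Enum):
    DISCLOSURE = 1           # disclosure of confidential information, or its threat
    UNAUTHORIZED_ACCESS = 2  # outsider with no legitimate access
    EXCESS_OF_AUTHORITY = 3  # legitimate employee going beyond granted rights
    VIRUS_ATTACK = 4
    ACCOUNT_COMPROMISE = 5


def reclassify(category, user_could_have_acted):
    """Move an incident from category 3 to 5 when the account owner
    could not physically have used the credentials at that moment."""
    if category is Category.EXCESS_OF_AUTHORITY and not user_could_have_acted:
        return Category.ACCOUNT_COMPROMISE
    return category


def needs_full_investigation(infected, total_machines, threshold=0.05):
    """A single infected machine is treated as human error; a significant
    share of the fleet triggers a full investigation.
    The 5% default is an assumption, not a recommendation from the article."""
    if total_machines <= 0:
        raise ValueError("total_machines must be positive")
    return infected > 1 and infected / total_machines >= threshold
```

Keeping the categories in one enumeration also makes it straightforward to count incidents per category for the statistics mentioned earlier.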

Incident classification

There are two ways to approach this part of incident handling: a simple one and a complex one.
The simple way is to take your IT service level agreement (SLA) and adapt it to your needs.
The complex way is to use risk analysis to identify the groups of incidents and/or assets for which resolving the incident or eliminating its causes must be immediate.
The simple way works well in small organizations where there is not much sensitive information and not a huge number of employees. But keep in mind that the IT service builds its SLA from its own risks and its own incident statistics. It is quite possible that a jammed printer on the CEO's desk will get a very high priority there, whereas for you the compromise of the corporate database administrator's password matters far more.
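To make the contrast with the IT SLA concrete, here is a minimal sketch of the risk-based ("complex") way, assuming a simple score of asset criticality multiplied by incident impact. All asset names, scores and thresholds below are hypothetical; in practice they would come out of your own risk analysis.

```python
# Assumed criticality scores from a risk analysis (1 = low, 3 = high).
ASSET_CRITICALITY = {
    "ceo_printer": 1,
    "corporate_db_admin_account": 3,
}

# Assumed impact scores per incident type (1 = low, 3 = high).
INCIDENT_IMPACT = {
    "hardware_fault": 1,
    "account_compromise": 3,
}


def priority(asset, incident_type):
    """Return 'immediate', 'high' or 'normal' from a simple risk score.

    Unknown assets and incident types get a middle score of 2.
    """
    score = ASSET_CRITICALITY.get(asset, 2) * INCIDENT_IMPACT.get(incident_type, 2)
    if score >= 9:
        return "immediate"
    if score >= 4:
        return "high"
    return "normal"


print(priority("ceo_printer", "hardware_fault"))
print(priority("corporate_db_admin_account", "account_compromise"))
```

Under these assumed scores the jammed printer gets the lowest priority and the compromised administrator password the highest, which is the opposite of what an IT SLA built around user convenience might give.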

Incident Evidence Gathering

There is a special applied science, forensics, which deals with the investigation of computer crimes. And there is an excellent book by N.N. Fedotov, "Forensics - computer forensics". I will not go into forensics in detail now; I will just highlight the 2 main points on preserving and presenting evidence that must be adhered to.

For paper documents: the original is kept securely, with a record of who discovered the document, where and when it was found, and who witnessed the discovery. Any investigation must ensure that the originals have not been tampered with.
For information on computer media: mirror images or copies of any removable media and of information on hard drives or in memory must be taken to ensure availability. A record of all actions during the copying process must be kept, and the process must be witnessed. The original media and the protocol (or, if this is not possible, at least one mirror image or copy) must be kept secure and intact.
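One common way to show later that a copy has stayed intact is to record a cryptographic hash of the image at the moment it is taken. The article itself does not prescribe this, so the sketch below is an assumption about how the "record of all actions" could be supported technically; the function names and the fields of the record are invented for illustration.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(image_path, collected_by, witness, note=""):
    """Build one protocol entry for a copied image (field names are assumptions)."""
    return {
        "image": str(image_path),
        "sha256": sha256_of_file(image_path),
        "collected_by": collected_by,
        "witness": witness,
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
        "note": note,
    }


def verify_image(image_path, record):
    """True if the image still matches the hash recorded at collection time."""
    return sha256_of_file(image_path) == record["sha256"]
```

The record produced by `custody_record` would be stored together with the witnessed protocol; `verify_image` can then be run before the copy is presented as evidence.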

After the incident is resolved

So, the incident has been settled, the consequences have been eliminated, and an official investigation has been carried out.
But the work doesn't have to end there.
Next steps after the incident:

reassess the risks that led to the incident;
prepare a list of protective measures to minimize the identified risks in case the incident recurs;
update the necessary information security policies, regulations and rules;
train the organization's personnel, including IT employees, to raise information security awareness.

That is, it is necessary to take all possible actions to minimize or neutralize the vulnerability that led to the implementation of a security threat and, as a result, the occurrence of an incident.

1. Maintain an Incident Log, where you record the time of discovery, the details of the person who discovered the incident, the category of the incident, the assets affected, the planned and actual time to resolve the incident, as well as the work carried out to resolve the incident and its consequences.
2. Record your activities. This is necessary first of all for yourself, in order to optimize the process of resolving the incident.
3. Notify employees of the occurrence of the incident, so that, firstly, they do not interfere with you in the investigation, and secondly, exclude the use of affected assets for the duration of the investigation.
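The fields listed in the first recommendation map naturally onto a small record type. The sketch below is one possible layout, not a prescribed format: the class name `IncidentRecord`, its attribute names and the `overdue` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class IncidentRecord:
    discovered_at: datetime          # time of discovery
    reported_by: str                 # person who discovered the incident
    category: str                    # incident category
    affected_assets: List[str]       # assets affected
    planned_resolution: timedelta    # planned time to resolve
    resolved_at: Optional[datetime] = None
    actions: List[str] = field(default_factory=list)  # work carried out

    def log_action(self, description):
        """Record one step taken while resolving the incident."""
        self.actions.append(description)

    def close(self, resolved_at):
        self.resolved_at = resolved_at

    def actual_resolution(self):
        """Actual time to resolve, or None while the incident is open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.discovered_at

    def overdue(self):
        """True if the incident was closed later than planned."""
        actual = self.actual_resolution()
        return actual is not None and actual > self.planned_resolution
```

Comparing planned and actual resolution times across many such records is what makes it possible to optimize the resolution process, as the second recommendation suggests.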

Ensuring information security of business Andrianov V.V.

4.1.4. Incident examples


General information

This section describes the published details of some high-profile incidents. Taken together, these incidents reveal a whole range of circumstances that characterize the variety of IS threats from personnel, in terms of motives and conditions as well as the means used. Among the most frequent incidents, we note the following:

Leakage of service information;

Stealing the organization's customers and business;

Infrastructure sabotage;

Internal fraud;

Falsification of reporting;

Trading in the markets based on insider, proprietary information;

Abuse of authority.

Summary

In retaliation for the too small bonus, 63-year-old Roger Duronio (former system administrator for UBS Paine Webber) planted a "logic bomb" on the company's servers that destroyed all data and paralyzed the company for a long time.

Description of the incident

Duronio was dissatisfied with his salary of $125,000 a year, which may have been the reason he planted the "logic bomb". However, the final straw for the system administrator was the $32,000 bonus he received instead of the expected $50,000. When he discovered that his bonus was much smaller than expected, Duronio demanded that his superiors renegotiate his employment contract at $175,000 a year, or else he would leave the company. He was denied the raise and was asked to leave the bank building. In retaliation for this treatment, Duronio decided to use his "invention", which he had planted in advance in anticipation of such a turn of events.

Duronio carried out the introduction of the "logic bomb" from a home computer several months before he received a bonus that was too small, in his opinion. The "logic bomb" was installed on about 1,500 computers in a network of branches across the country and set to a specific time - 9:30, just in time for the beginning of the banking day.

Duronio quit UBS Paine Webber on February 22, 2002, and on March 4, 2002, a "logic bomb" sequentially deleted all files on the main central database server and 2000 servers in 400 bank branches, while disabling the backup system.

During the trial, Duronio's lawyer pointed out that the defendant was not the only possible culprit: given how insecure UBS Paine Webber's IT systems were, any other employee could have logged in under Duronio's account. The bank's IT security problems had become known as early as January 2002: an audit found that 40 people from the IT service could log in and obtain administrator rights using the same password, so it was impossible to establish who exactly had performed a given action. The lawyer also accused UBS Paine Webber and @Stake, the company the bank hired to investigate the incident, of destroying evidence of the attack. However, the conclusive evidence of Duronio's guilt was the fragments of malicious code found on his home computers and a printed copy of the code found in his closet.

Insider Opportunities

As one of the company's system administrators, Duronio was given responsibility for and access to the entire UBS PaineWebber computer network. He also had access to the network from his home computer via a secure internet connection.

Motives

As previously stated, his motives were money and revenge. Duronio received an annual salary of $125,000 and a bonus of $32,000 when he expected $50,000, and thus avenged his disappointment.

In addition, Duronio decided to profit from the attack: expecting the bank's shares to fall because of the IT disaster, he placed a futures sell order so as to pocket the difference if the price dropped. The defendant spent $20,000 on this. However, the bank's shares did not fall, and Duronio's investment did not pay off.

Consequences

The "logic bomb" planted by Duronio halted 2,000 servers in 400 of the company's offices. According to UBS Paine Webber IT manager Elvira Maria Rodriguez, it was a "10 plus on a scale of 10" disaster. The company was thrown into chaos, which 200 IBM engineers spent almost a day clearing up. In total, about 400 specialists worked on fixing the situation, including the bank's own IT service. The damage from the incident is estimated at $3.1 million. Eight thousand brokers across the country were forced to stop working. Some of them were able to return to normal work after a few days, others after a few weeks, depending on how badly their databases were affected and whether the branch had backups. Overall, banking operations resumed within a few days, but some servers were never fully restored, largely because 20% of the servers had no backup facilities. Only a year later was the bank's entire server fleet fully restored.

At trial, Duronio was charged on the following counts:

Securities fraud - a conviction under this section carries a maximum penalty of 10 years in federal prison and a $1 million fine;

Fraud in computer-related activities - a charge under this article carries a maximum penalty of 10 years in prison and a fine of $250,000.

As a result of the trial, at the end of December 2006, Duronio was sentenced to 97 months in prison without the right to parole.

VimpelCom and Sherlock

Summary

For profit, former employees of VimpelCom (the "Beeline" trademark) used a website to offer call detail records of mobile operators' subscribers.

Description of the incident

Employees of VimpelCom (former and current) set up the website www.sherlok.ru, which VimpelCom learned about in June 2004. The organizers of the site offered a service: searching for people by name, phone number and other data. In July, they offered a new service: call detail records of mobile operators' subscribers. A call detail record is a printout of the numbers of all incoming and outgoing calls with their duration and cost, which operators use, for example, to bill subscribers. From these data one can draw conclusions about a subscriber's current activity, area of interest and circle of acquaintances. A press release from Department "K" of the Ministry of Internal Affairs states that such information cost the customer $500.

Having discovered the site, VimpelCom employees independently collected evidence of its criminal activity and handed the case over to the Ministry of Internal Affairs. The Ministry opened a criminal case and, together with VimpelCom, identified the organizers of this criminal business. On October 18, 2004, the main suspect was caught red-handed.

In addition, on November 26, 2004, the remaining six suspects were detained, including three employees of VimpelCom's own subscriber service. During the investigation, it turned out that the site had been created by a former student of Moscow State University who did not work for the company.

Prosecution over this incident became possible thanks to a ruling of the Constitutional Court, issued in 2003, which recognized that call details are covered by the legally protected secrecy of telephone conversations.

Insider Opportunities

Two of the Vimpelcom employees identified among the participants in the incident worked as tellers in the company, and the third was a former employee and at the time of the crime he worked at the Mitinsky market.

Work in the company itself as tellers indicates that these employees had direct access to the information offered for sale on the website www.sherlok.ru. In addition, since the former employee of the company already worked at the Mitinsky market, it can be assumed that over time, this market could become one of the distribution channels for this information or any other information from the VimpelCom databases.

Consequences

The main consequences for VimpelCom from this incident could be a blow to the reputation of the company itself and the loss of customers. However, this incident was made public directly due to the active actions of the company itself.

In addition, the disclosure of this information could harm VimpelCom's customers, since call details make it possible to draw conclusions about a subscriber's current activities, area of interest and circle of acquaintances.

In March 2005, the Ostankinsky District Court of Moscow sentenced the suspects, including three Vimpelcom employees, to various fines. So, the organizer of the group was fined 93,000 rubles. However, the operation of the site www.sherlok.ru was terminated for an indefinite period only from January 1, 2008.

The biggest personal data breach in Japanese history

Summary

In the summer of 2006, the biggest personal data breach in Japanese history occurred when an employee of the printing and electronics giant Dai Nippon Printing stole a disc containing the private information of nearly nine million citizens.

Description of the incident

The Japanese firm Dai Nippon Printing, which specializes in printed products, suffered the largest leak in its country's history. Hirofumi Yokoyama, a former employee of one of the company's contractors, copied the personal data of the company's customers to a portable hard drive and stole it. In total, 8.64 million people were put at risk, as the stolen information contained names, addresses, phone numbers and credit card numbers. It included customer details for 43 different companies, such as 1,504,857 American Home Assurance customers, 581,293 Aeon Co customers, and 439,222 NTT Finance customers.

After stealing this information, Hirofumi opened a trade in private information in portions of 100,000 records. Thanks to a stable income, the insider even left his permanent job. By the time of his arrest, Hirofumi had managed to sell the data of 150,000 clients of the largest credit firms to a group of fraudsters specializing in online shopping. In addition, some of the data has already been used for credit card fraud.

More than half of the organizations whose customer data was stolen were not even warned about the information leak.

Consequences

As a result of this incident, the losses of citizens who suffered due to credit card fraud, which became possible only as a result of this leak, amounted to several million dollars. In total, customers of 43 different companies were affected, including Toyota Motor Corp., American Home Assurance, Aeon Co and NTT Finance. However, more than half of the organizations were not even warned about the leak.

Japan passed the Personal Information Protection Act (PIPA) in 2003, but prosecutors were unable to apply it when the case came to trial in early 2007. The prosecution could not charge the insider with violating PIPA. He was charged only with stealing a $200 hard drive.

Not appreciated. Zaporozhye hacker against Ukrainian bank

Summary

The former system administrator of one of the largest banks in Ukraine used the bank where he had previously worked to transfer about 5 million hryvnias from the account of the regional customs office to the account of a defunct, bankrupt Dnepropetrovsk company.

Description of the incident

The career of a system administrator began after he graduated from a technical school and was hired by one of the largest banks in Ukraine in the department of software and technical support. After some time, management noticed his talent and decided that he would be more useful to the bank as a department head. However, the arrival of a new management in the bank led to personnel changes. He was asked to temporarily vacate his position. Soon the new leadership began to form their team, and his talent was unclaimed, and he was offered a non-existent position of deputy chief, but in another department. As a result of such personnel reshuffles, he began to do something completely different from what he knew best.

The system administrator could not put up with such an attitude of management towards himself and resigned of his own free will. However, he was haunted by his own pride and resentment towards the management, in addition, he wanted to prove that he was the best in his field, and return to the department from which his career began.

Having resigned, the former system administrator decided to win back the attention of his former management by exploiting weaknesses in the “Bank-Client” system used in almost all Ukrainian banks. His plan was to develop his own protection program and offer it to the bank, thereby returning to his previous place of work. To carry it out, he would break into the "Bank-Client" system and make minimal changes to it. The whole plan rested on the assumption that the bank would detect the break-in.

To penetrate into this system, the former system administrator used passwords and codes that he learned while working with this system. All other information necessary for hacking was obtained from various hacker sites, where various cases of hacking computer networks were described in detail, hacking techniques, and all the software needed for hacking was posted.

Having created a loophole in the system, the former system administrator periodically penetrated the bank's computer system and left various signs in it, trying to draw attention to the hacking facts. The bank's specialists were supposed to detect the hack and sound the alarm, but, to his surprise, no one even noticed the penetration into the system.

Then the system administrator decided to change his plan, adding something that could not go unnoticed. He decided to forge a payment order and use it to transfer a large sum through the bank's computer system. Using a laptop and a mobile phone with a built-in modem, he penetrated the bank's computer system about 30 times: he looked through documents, customer accounts and cash flows, searching for suitable clients. He chose the regional customs office and the bankrupt Dnepropetrovsk company.

Having once again gained access to the bank's system, he created a payment order under which 5 million hryvnias were withdrawn from the regional customs office's account and transferred through the bank to the account of the bankrupt company. He also deliberately made several mistakes in the payment order, which should have drawn the attention of the bank's specialists. However, even these went unnoticed by the specialists servicing the "Bank-Client" system, and they calmly transferred 5 million hryvnias to the account of a company that no longer existed.

In fact, the system administrator expected that the funds would not be transferred, that the fact of hacking would be detected before the transfer of funds, but in practice everything turned out differently and he became a criminal and his fake transfer turned into a theft.

The fact of hacking and theft of funds on an especially large scale was discovered only a few hours after the transfer, when bank employees called customs to confirm the transfer. But they said that no one transferred such an amount. The money was urgently returned back to the bank, and a criminal case was opened in the prosecutor's office of the Zaporozhye region.

Consequences

The bank did not suffer any losses, since the money was returned to the owner, and the computer system received minimal damage, as a result of which the bank's management abandoned any claims against the former system administrator.

In 2004, by decree of the President of Ukraine, criminal liability for computer crimes was increased: fines from 600 to 1000 tax-free minimums, imprisonment - from 3 to 6 years. However, the former system administrator committed the crime before the presidential decree took effect.

At the beginning of 2005, a trial took place over the system administrator. He was accused of committing a crime under Part 2 of Article 361 of the Criminal Code of Ukraine - illegal interference in the operation of computer systems with causing harm and under Part 5 of Article 185 - theft committed on an especially large scale. But since the bank's management refused any claims against him, the article for theft was removed from him, and part 2 of article 361 was changed to part 1 - illegal interference in the operation of computer systems.

Uncontrolled trading at Societe Generale

Summary

On January 24, 2008, Societe Generale announced a loss of €4.9 billion due to the machinations of its trader Jérôme Kerviel. As shown by an internal investigation, for several years the trader opened over-limit positions in futures for European stock indices. The total amount of open positions amounted to 50 billion euros.

Description of the incident

From July 2006 to September 2007, the computerized internal control system issued warnings about possible violations 75 times (that is how many times Jérôme Kerviel carried out unauthorized transactions or his positions exceeded the permitted limit). The bank's risk monitoring staff did not carry out detailed checks on these warnings.

For the first time, Kerviel began experimenting with unauthorized trading in 2005. Then he took a short position on Allianz shares, expecting the market to fall. Soon the market really fell (after the terrorist attacks in London), so the first 500,000 euros were earned. About his feelings that he experienced from his first success, Kerviel later told the investigation: “I already knew how to close my position, and I was proud of the result, but at the same time I was surprised. Success forced me to continue, it was like a snowball... In July 2007, I offered to take a short position in the expectation of a market fall, but did not meet with support from my manager. My prediction came true, and we made a profit, this time it was quite legal. Subsequently, I continued to conduct such transactions in the market, either with the consent of the authorities, or in the absence of their explicit objection ... By December 31, 2007, my profit reached 1.4 billion euros. At that moment, I did not know how to announce this to my bank, since it was a very large amount, not declared anywhere. I was happy and proud, but I did not know how to explain to my management the receipt of this money and not incur suspicion in conducting unauthorized transactions. Therefore, I decided to hide my profit and conduct the opposite fictitious operation ... ".

In fact, in early January 2008 Jérôme Kerviel re-entered the game with futures contracts on three indices, the Euro Stoxx 50, DAX and FTSE, which had helped him beat the market in late 2007 (although then he had preferred to take a short position). According to estimates, on the eve of January 11 his portfolio held 707.9 thousand futures on the Euro Stoxx 50 (each worth 42.4 thousand euros), 93.3 thousand futures on the DAX (192.8 thousand euros each) and 24.2 thousand futures on the FTSE index (82.7 thousand euros per contract). In total, Kerviel's speculative position amounted to 50 billion euros, more than the value of the bank he worked for.
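The 50 billion euro figure can be checked directly from the contract counts and per-contract values quoted above. The short calculation below simply multiplies and sums those numbers; it assumes nothing beyond the figures in the text.

```python
# (number of contracts, value per contract in euros), as quoted in the text
positions = {
    "Euro Stoxx 50": (707_900, 42_400),
    "DAX": (93_300, 192_800),
    "FTSE": (24_200, 82_700),
}

notional = {name: count * value for name, (count, value) in positions.items()}
total = sum(notional.values())

for name, amount in notional.items():
    print(f"{name:14s} {amount / 1e9:6.2f} billion euros")
print(f"{'Total':14s} {total / 1e9:6.2f} billion euros")
```

The three positions come to roughly 30.0, 18.0 and 2.0 billion euros, which together match the stated total of about 50 billion euros.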

Knowing when checks took place, he opened a fictitious hedging position at the right moment and later closed it. As a result, the reviewers never saw a single position that could be called risky. Large transaction amounts could not alert them either, since these are quite common in the index futures market. What gave him away were fictitious transactions carried out from the accounts of bank customers. As long as he used the accounts of different clients, the controllers saw no problems. After a while, however, Kerviel began to reuse the accounts of the same clients, which produced "abnormal" activity on those accounts and, in turn, attracted the controllers' attention. This was the end of the scam. It turned out that Kerviel's counterparty in the multibillion-dollar deal was a major German bank, which had allegedly confirmed the astronomical transaction by email. The electronic confirmation, however, aroused the inspectors' suspicion, and Societe Generale set up a commission to verify it. On January 19, in response to an inquiry, the German bank denied the transaction, after which the trader agreed to confess.

When the astronomical size of the speculative position was discovered, the CEO and chairman of the board of directors of Societe Generale, Daniel Bouton, announced his intention to close the risky position opened by Kerviel. It took two days and resulted in a loss of 4.9 billion euros.

Insider Opportunities

Jerome Kerviel worked for five years in the so-called back office of the bank, that is, in a division that does not directly conclude any transactions. It deals only with accounting, execution and registration of transactions and controls traders. This activity made it possible to understand the features of the operation of control systems in the bank.

In 2005, Kerviel was promoted. He became a real trader. The young man's direct responsibilities included elementary operations to minimize risks. Working on the market of futures contracts on European stock indices, Jérôme Kerviel had to monitor how the bank's investment portfolio was changing. His main task, as one of the representatives of Societe Generale explained, was to reduce risks by playing in the opposite direction: “Roughly speaking, seeing that the bank bets on red, he should have bet on black.” Like all junior traders, Kerviel had a limit that he could not exceed, and his former colleagues in the back office monitored this. Societe Generale had several levels of protection; for example, traders could only open positions from their work computers. All data on opened positions was automatically transmitted to the back office in real time. But, as they say, the best poacher is a former forester. And the bank made an unforgivable mistake by putting the former forester in the position of a hunter. Jérôme Kerviel, who had almost five years of experience in monitoring traders, had no difficulty in getting around this system. He knew other people's passwords, knew when checks were carried out at the bank, and was well versed in information technology.

The reasons

If Kerviel was engaged in fraud, it was not for the purpose of personal enrichment. This is what his lawyers say, and representatives of the bank admit this, calling Kerviel's actions irrational. Kerviel himself says that he acted solely in the interests of the bank and only wanted to prove his talents as a trader.

Consequences

His activities in 2007 brought the bank about 2 billion euros in profit. At least, that is what Kerviel himself says, arguing that the bank's management probably knew what he was doing but preferred to turn a blind eye as long as he was making a profit.

Closing the risky position opened by Kerviel led to a loss of 4.9 billion euros.

In May 2008, Daniel Bouton stepped down as CEO of Societe Generale and was replaced in this position by Frédéric Oudéa. A year later, he was forced to resign from the post of chairman of the bank's board of directors as well. The reason for his departure was sharp criticism from the press: Bouton was accused of heading a bank whose top managers encouraged risky financial transactions carried out by its employees.

Despite the support of the board of directors, the pressure on Mr. Bouton increased. His resignation was demanded by the bank's shareholders and by many French politicians. French President Nicolas Sarkozy also called on Daniel Bouton to step down after it became known that in the year and a half before the scandal, Societe Generale's computerized internal control system had issued warnings about possible violations 75 times, that is, every time Jerome Kerviel carried out unauthorized operations.

Immediately after the losses were discovered, Societe Generale created a special commission to investigate the trader's actions, which included independent members of the bank's board of directors and auditors from PricewaterhouseCoopers. The commission concluded that the bank's system of internal control was not effective enough, which is why the bank was unable to prevent such a major fraud. The report states that "the bank staff did not systematically check" the activities of the trader, and the bank itself does not have "a control system that could prevent fraud."

The report on the investigation of the trader states that, as a result, a decision was made to "significantly strengthen the internal oversight procedure for the activities of Societe Generale employees." This will be done through stricter organization of the work of the bank's various departments and better coordination between them. Measures will also be taken to track the trading operations of bank employees and tie them to specific individuals through “strengthening the IT security system and developing high-tech solutions for personal identification (biometrics).”



Hello to all readers,
Very often, in process-related work, I hear one very popular question from employees of large and small IT departments: what is the difference between a service request and an incident?

Discussions on this topic are as old as all IT management methodologies put together. Nevertheless, let's turn to the original sources.

What ITIL tells us (official translation of the third version of the glossary):

Service Request: a request from a user for information, advice, a standard change, or access to an IT service.

Incident: an unplanned interruption of an IT service or a reduction in the quality of an IT service.

As usual, the methodology does not go into the depths of things and does not readily answer the practical questions of Service Desk employees who have to classify user requests. Meanwhile, there are a lot of such questions. Here are a few examples:

1) A classic call from a user asking for a password reset. How should it be classified: as a service request or as an incident? Or maybe as an information security incident?

2) A call from a user whose corporate mail does not work. A cursory analysis of the call suggests that the user needs to carry out the initial configuration of the mail client. Nevertheless, from his point of view this is an incident, because the service is not available and no one notified him that "the mail itself will not fly".

Needless to say, the primary classification is very important, since it determines the entire subsequent life cycle of the call, including its deadlines.

My understanding of this issue boils down to assessing whether the service was interrupted for the end user. Thus:

An incident is, in most cases, a full or partial interruption of an IT service that was previously provided to the user in an approved mode (for example, the service is available 24/7, or 8 hours a day, 5 days a week).

Example: the company's chief accountant suddenly lost access to the financial reporting system. On the one hand, granting access is a classic service request, but in this case there is a clear interruption of the service and, as a result, partial degradation of the business process.

A service request is a request from a user who is interested in connecting an additional service or extending the functionality of existing services.

Example: a particularly curious user tried to open one of the modules of the same financial reporting system but received an error message. From his point of view this is an incident, since he did not achieve the desired goal and did not receive the information he was looking for. From the point of view described above, however, it is a classic service request for access, which requires approval and is fulfilled according to the standard procedure within the agreed time.
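The rule of thumb above can be written down as a small sketch. This is only an illustration of the reasoning, not part of ITIL or of any Service Desk product: the `Ticket` fields, the `classify` function and the sample tickets are all hypothetical names introduced here, and real classification would have to handle the special cases this sketch ignores.

```python
from dataclasses import dataclass
from enum import Enum


class TicketType(Enum):
    INCIDENT = "incident"
    SERVICE_REQUEST = "service request"


@dataclass
class Ticket:
    """A user call as seen by the Service Desk (hypothetical fields)."""
    summary: str
    # True if the user already had this service in an approved mode.
    previously_provided: bool
    # True if the service is now fully or partially unavailable to the user.
    service_interrupted: bool


def classify(ticket: Ticket) -> TicketType:
    """Apply the rule of thumb: an interruption of a service the user
    already had is an incident; everything else is a service request."""
    if ticket.previously_provided and ticket.service_interrupted:
        return TicketType.INCIDENT
    return TicketType.SERVICE_REQUEST


# The two examples from the text, encoded under the assumptions above.
accountant = Ticket(
    summary="Chief accountant lost access to the financial reporting system",
    previously_provided=True,
    service_interrupted=True,
)
curious_user = Ticket(
    summary="User got an error opening a module he never had access to",
    previously_provided=False,
    service_interrupted=True,  # unavailable to him, but never provided
)

for ticket in (accountant, curious_user):
    print(f"{ticket.summary}: {classify(ticket).value}")
```

The deciding question is whether the service was previously provided, not whether the user perceives a failure: the curious user sees an error too, yet his call is still treated as an access request.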

At the same time, one should not forget the variety of special cases that are generally difficult to classify. The point of view described above does not claim to be dogma; it only aims to help minimize the number of incorrectly classified calls and to improve the overall IT response time to business needs.



