What is BI? The difference between Business Intelligence and Data Science. Connecting to a data source

There are a great many terms in circulation: analytics, data mining, data analysis, business intelligence. The difference between them is not always obvious, even to people who work in the field. Today we will explain what Business Intelligence (BI) is in accessible, understandable language. The topic is certainly huge and cannot be covered in one short article, but our aim is to help the reader take the first step and become interested in the subject. The interested reader will also find an extensive list of further steps.

Article structure

Why all this is needed: from the life of an analyst

Let's imagine that we (an analyst named Petrovich at the supplier Flower) are asked to evaluate sales across a number of stores that we supply, and that each store keeps its own records of goods sold. In reality, the record forms are filled in haphazardly and by nobody knows whom, so they differ in structure and in storage format (tables of one kind or another). This task is shown schematically in the diagram above.

The task seems simple, so let's consider the brute-force solution first: if we have N tables that must be merged into one, we write N scripts that convert those tables and one collector that combines the results.

Cons of this approach:

  • N scripts have to be maintained at the same time (where N is on the order of thousands);
  • when the structure of a store's reports changes over time (for example, a store hires a new employee), the affected scripts have to be found and rewritten;
  • when a new store appears, a new script has to be written;
  • when our own reporting (at the supplier Flower) changes, every script has to be changed;
  • debugging and support are difficult, since stores do not announce structure changes and do not follow any specification.
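
The brute-force approach can be sketched roughly as follows. This is only an illustration in Python: the two store layouts, the field names and the functions `convert_store_a`, `convert_store_b` and `collect` are all hypothetical, invented to show why every new or changed store costs another hand-written script.

```python
# Hypothetical raw reports: every store uses its own layout and naming.
store_a_rows = [{"item": "rose", "qty": "10"}, {"item": "tulip", "qty": "4"}]
store_b_rows = [("Rose", 3), ("Lily", 7)]  # positional tuples, capitalized names


def convert_store_a(rows):
    """Converter written specifically for store A's dict-based layout."""
    return [
        {"store": "A", "product": r["item"].lower(), "sold": int(r["qty"])}
        for r in rows
    ]


def convert_store_b(rows):
    """Converter written specifically for store B's tuple-based layout."""
    return [
        {"store": "B", "product": name.lower(), "sold": int(count)}
        for name, count in rows
    ]


def collect(converted_reports):
    """Collector: merge all converted reports and total the sales per product."""
    totals = {}
    for report in converted_reports:
        for row in report:
            totals[row["product"]] = totals.get(row["product"], 0) + row["sold"]
    return totals


totals = collect([convert_store_a(store_a_rows), convert_store_b(store_b_rows)])
print(totals)
```

With two stores this is manageable; with thousands, each of the drawbacks listed above turns into a separate function that someone has to find, understand and rewrite.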

If we rise to the level of the whole organization, we will see that there are even more problems.

What is the problem: a problem at the company level

The manufacturer Flower does not actually work with stores directly, but through intermediaries (distributors). Intermediaries visit stores and try to stimulate sales through their own actions. They therefore have a financial interest, and the information they provide has to be double-checked.

In essence, the problem looks similar: if we have N stores and K distributors, can we aggregate the store data and compare it with the distributors' results? (All the data differ in structure and format.)

Here, in addition to tables, we may encounter a whole zoo of formats, to which the distributors' reports are added. As a rule, the data quality is very low: duplication, inconsistency and errors are common. Based on the results and the comparison of the data, the purchasing department decides what to ship, to whom and in what quantity. In other words, solving this problem directly affects the company's financial performance, which makes it very important.

Consider several solutions at the company level:

  • an in-house solution: the manufacturer has to hire a specialist from outside the company's core business, and critical software then depends on that one person. If the specialist leaves, the company must urgently find a replacement who can support the software, and quality depends directly on whoever is hired;
  • software bought from a third party: there are three key factors here, namely price, quality and integration time. As a rule, the price and integration time are too high for the average manufacturer, and integration also takes up a significant amount of employees' time. Choosing a vendor is not trivial either;
  • SaaS solutions: the approach is still new to the market, and many companies are skeptical about such services.

In general, for a small or medium-sized manufacturer, a service looks like the best option in terms of integration time, price and quality, since pricing is flexible and integration over the web is minimal. As a rule, the advantage of corporate software is that it can be customized (every business considers itself unique), but the task described here is quite typical for a fairly wide range of companies. Of course, there is no single solution for everyone, but one can be found for each individual case.

The process itself looks similar at the company level: data is consolidated, transformed (aggregated) in a certain way and loaded into a system for analysis.
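
A minimal sketch of such a consolidate, transform and load pipeline, using only the Python standard library. The CSV sample, the `sales` table and the cleaning rules (trim and lowercase product names, drop non-numeric quantities, drop exact duplicates) are assumptions made for illustration; a real pipeline would need rules agreed with the business, since an exact duplicate may occasionally be a legitimate repeated sale.

```python
import csv
import io
import sqlite3

# Hypothetical consolidated input with typical quality problems:
# an exact duplicate, inconsistent spelling, and a non-numeric quantity.
RAW_CSV = """store,product,sold
A,rose,10
A,rose,10
B,Rose ,3
B,lily,abc
"""


def extract(text):
    """Read the raw report into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize product names, skip unparsable rows, drop exact duplicates."""
    seen = set()
    clean = []
    for row in rows:
        product = row["product"].strip().lower()
        try:
            sold = int(row["sold"])
        except ValueError:
            continue  # quantity is not a number: skip the row
        key = (row["store"], product, sold)
        if key in seen:
            continue  # exact duplicate: skip the row
        seen.add(key)
        clean.append(key)
    return clean


def load(rows, conn):
    """Load the cleaned rows into the analysis database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, product TEXT, sold INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute(
    "SELECT product, SUM(sold) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(result)
```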

Generalizing the problem: these are all links in one chain

What is the difference between analytics, data mining and business intelligence (BI)? The first two are sets of methods for analyzing data that is already clean, whereas in practice cleaning the data and converting it into a format convenient for analysis is an important and integral part of the process. BI covers this transformation and consolidation of data as well, and its main task is to support business decision making.

24.04.2003 Valery Artemiev

The term “business intelligence” has existed for a relatively long time, although it is little used in our country for lack of an adequate translation and a clear understanding of it, which, admittedly, is also typical of the West. Let's try to understand its essence.

The ambiguity of the term under discussion stems from the ambiguity of the English word intelligence:

  • the ability to recognize and understand; willingness to understand;
  • knowledge transferred or acquired through training, research or experience;
  • action or state in the process of cognition;
  • intelligence gathering; intelligence data.

In Russian, the word "intelligence" is unambiguously understood as the mental ability of a person. At first glance, "intelligent data analysis" looks like a good translation of the term business intelligence, but the question immediately arises whether there is such a thing as "non-intelligent data analysis". The ways of language are inscrutable, so we will use both the original English term and its calque, “business intelligence”.

Various definitions

The term "business intelligence" was first coined by Gartner analysts in the late 1980s as "a user-centric process that includes information access and exploration, analysis, intuition and understanding that lead to improved and informed decision making." Later, in 1996, a refinement appeared: "tools for analyzing data, building reports and queries can help business users navigate the sea of data in order to synthesize meaningful information from it; today these tools collectively fall into a category called business intelligence."

BI as methods, technologies, means of extracting and representing knowledge

According to the original definitions, BI is the process of analyzing information and developing intuition and understanding for improved and informed decision making by business users, as well as the tools for extracting business-relevant information from data. Note that most definitions treat "business intelligence" as a process, or as the technologies, methods and tools for extracting and representing knowledge.

BI, EIS, DSS, eBusiness and Commerce

Over the past 10 years, the names and content of information and analytical systems have changed from executive information systems (EIS) to decision support systems (DSS) and now to business intelligence systems.

In the days of mainframes and minicomputers, when most users did not have direct access to computers, organizations depended on their IT departments to provide standard and parameterized reports. To get anything other than the standard reports, users had to commission their development and wait several days or weeks.

EIS applications were tailored to the needs of executives and managers and gave them basic aggregated information about the state of their business in the form of tables or charts. They usually included scheduled queries with a set of parameters. Such packages were usually developed by in-house IT departments. To obtain additional information or carry out further analysis, other applications were used, or SQL queries and reports were created on demand.

First-generation DSS applications were packages that dynamically generated SQL scripts based on the type of information the user requested. They let analysts get information from relational databases without knowing SQL. Unlike EIS, DSS applications could answer a wide range of business questions and offered multiple reporting options and some formatting options. However, the flexibility of such packages was still limited by their focus on a specific set of tasks.
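
The idea of generating SQL from a user's choices can be illustrated with a small sketch. It is not how any particular DSS product worked; the `sales` table, the column whitelist and the `build_query` function are hypothetical names chosen for the example.

```python
# Hypothetical table and column whitelist; the user only picks from these.
TABLE = "sales"
ALLOWED_COLUMNS = {"region", "product", "month", "revenue", "units"}


def build_query(dimensions, measure, filters=None):
    """Turn a user's menu choices into a parameterized SQL query."""
    filters = filters or {}
    for name in [*dimensions, measure, *filters]:
        if name not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {name}")
    select_list = ", ".join([*dimensions, f"SUM({measure}) AS total_{measure}"])
    sql = f"SELECT {select_list} FROM {TABLE}"
    if filters:
        # Filter values are passed separately as parameters, never inlined.
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql, list(filters.values())


sql, params = build_query(["region"], "revenue", {"month": "2003-04"})
print(sql)
print(params)
```

The analyst chooses "revenue by region for April 2003" from menus and never sees the SQL; the limitation mentioned above is visible too, since only the questions the whitelist and the template anticipate can be asked.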

With the advent of PCs and local area networks, the next generation of DSS applications was built on BI, allowing non-programmers to extract information from various sources easily and quickly, generate their own customized reports or graphical views, and conduct multidimensional data analysis. Business intelligence systems have evolved from "fat" clients to Web applications in which the user explores data in a browser and can work remotely. Users can also create what-if scenarios and view and update information collaboratively.

Although users of corporate BI information have traditionally been located within the enterprise, with the spread of the Web into e-business, B2B, CRM and SCM, BI users can be external to the enterprise, and in B2C, C2B and online marketplaces the BI users are Internet users.

BI and data warehouses

The concept, methods and tools of data warehousing define approaches to, and provide for, the integration, cleansing and historical storage of information intended for analysis; they answer the question "How should information be prepared for analysis?". Business intelligence technology defines the methods and tools for accessing information and analyzing it interactively in terms of the subject area. BI tools do not have to run on top of a data warehouse infrastructure, but without one the task of cleaning and reconciling data falls to them, and these operations have to be performed either on the fly or in advance, for each separate information resource. In addition, the performance and reliability of the online transaction processing system are affected. That is why it is good corporate practice to separate the transactional and analytical components and to use various data warehouse solutions for the latter. The two are joined not only at the level of information but also at the level of metadata. With a data warehouse, metadata can be managed centrally.

It should be noted that the term "data warehouse" is often used to mean a decision support system (DSS) or an information and analytical system built on data warehouse and business intelligence technologies.

Classification of business intelligence products

Today's BI products fall into two categories: BI tools and BI applications. BI tools are in turn divided into query and report generators; advanced BI tools, primarily online analytical processing (OLAP) tools; enterprise BI suites (EBIS); and BI platforms. The bulk of BI tools are enterprise BI suites and BI platforms. Query and reporting tools are largely being absorbed and replaced by enterprise BI suites. Multidimensional OLAP engines or servers, as well as relational OLAP engines, are both BI tools and infrastructure for BI platforms. Most BI tools are used by end users to access, analyze and report on data that usually resides in data warehouses, data marts or operational data stores. Application developers use BI platforms to create and deploy BI applications, which are not themselves considered BI tools. An example of a BI application is an executive information system (EIS).

Query and Report Generation Tools

Query and report generators are typically desktop tools that give users access to databases, perform some analysis and generate reports. Queries can be either ad hoc or routine. There are also reporting systems (usually server-based) that support routine queries and reports. Desktop query and report generators are additionally enhanced with some lightweight OLAP features. Advanced tools in this category combine batch generation of routine reports with desktop query generation, report distribution and on-demand refresh, forming what is known as enterprise reporting. Its arsenal includes a report server, distribution tools, Web publishing of reports, and a mechanism for alerting on events or deviations. Typical representatives are Crystal Reports, Cognos Impromptu and Actuate e.Reporting Suite.

OLAP or advanced analytical tools

OLAP tools are analytical tools that were originally based on multidimensional databases (MDBs).

MDBs are databases designed specifically to support the analysis of quantitative data with multiple dimensions; they hold data in a "purely" multidimensional form. Most applications include a time dimension; other dimensions may be geography, organizational units, customers, products and so on. OLAP allows dimensions to be organized into hierarchies. The data is presented as hypercubes (cubes), logical and physical models of measures that share dimensions, together with the hierarchies in those dimensions. Some data is pre-aggregated in the database; the rest is calculated on the fly.

OLAP tools let you explore data across multiple dimensions. Users can choose which measures to analyze and which dimensions to display in a crosstab and how, swap rows and columns ("pivoting"), and then slice and dice to focus on a specific combination of dimensions. You can change the level of detail by moving through the hierarchy levels with drill down and roll up, and drill across to other dimensions.
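
To make these operations concrete, here is a toy illustration in plain Python. It is not an OLAP engine: the fact rows, the dimension names and the helper functions `aggregate`, `slice_facts` and `pivot` are invented for the example, and everything is computed on the fly rather than pre-aggregated.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, revenue).
DIMS = ("year", "quarter", "region", "product")
FACTS = [
    (2002, "Q1", "North", "rose", 100),
    (2002, "Q2", "North", "rose", 120),
    (2002, "Q1", "South", "tulip", 80),
    (2003, "Q1", "North", "tulip", 90),
    (2003, "Q1", "South", "rose", 60),
]


def aggregate(facts, by):
    """Sum the revenue measure over the chosen dimensions."""
    positions = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for fact in facts:
        totals[tuple(fact[i] for i in positions)] += fact[-1]
    return dict(totals)


def slice_facts(facts, dim, value):
    """Slice: keep only the facts where one dimension has a fixed value."""
    i = DIMS.index(dim)
    return [f for f in facts if f[i] == value]


def pivot(totals):
    """Pivot: swap the two dimensions of a two-dimensional aggregate."""
    return {(b, a): v for (a, b), v in totals.items()}


by_year = aggregate(FACTS, ["year"])                         # roll up to years
by_quarter = aggregate(FACTS, ["year", "quarter"])           # drill down to quarters
north = aggregate(slice_facts(FACTS, "region", "North"), ["product"])  # slice
crosstab = aggregate(FACTS, ["region", "product"])           # crosstab cells
print(by_year, by_quarter, north, pivot(crosstab), sep="\n")
```

The year-to-quarter pair shows the hierarchy in the time dimension: rolling up discards the quarter level, drilling down restores it.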

MDBs are supported by OLAP servers that are optimized for multidimensional analysis and come with analytical capabilities. They provide good performance but usually take a long time to load and update the MDB. They come with a reach-through capability that lets you move from aggregates to the details held in relational databases. The classic OLAP server is Hyperion Essbase Server.

Today, relational DBMSs are used to emulate MDBs and support multidimensional analysis. OLAP for relational databases (ROLAP) has the advantage of scalability and flexibility but loses to multidimensional OLAP (MOLAP) in performance, although there are ways to improve performance, such as the star schema. Although MDBs are still the best fit for online analytical processing, this capability is now built into relational DBMSs or offered as an extension to them (for example, MS Analysis Services or Oracle OLAP Services, which is not the same thing as ROLAP). There is also hybrid online analytical processing (HOLAP), for hybrid products that can store multidimensional data both natively and in relational form. MDBs are accessed through APIs for generating multidimensional queries, while relational databases are accessed through SQL queries. An example of a ROLAP server is MicroStrategy 7i Server.
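
As a rough illustration of the star schema mentioned above, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database and answers a multidimensional question with ordinary SQL. The table names, columns and sample rows are assumptions made up for the example, not taken from any of the products named here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "by what" of the analysis.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store (
        store_id INTEGER PRIMARY KEY, city TEXT);
    -- The fact table holds the measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units      INTEGER);

    INSERT INTO dim_product VALUES
        (1, 'rose', 'cut flowers'),
        (2, 'tulip', 'cut flowers'),
        (3, 'ficus', 'potted plants');
    INSERT INTO dim_store VALUES (1, 'Moscow'), (2, 'Kazan');
    INSERT INTO fact_sales VALUES (1, 1, 10), (2, 1, 5), (3, 2, 2), (1, 2, 4);
    """
)

# Units sold by product category and city: one join per dimension.
rows = conn.execute(
    """
    SELECT p.category, s.city, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
    """
).fetchall()
print(rows)
```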

Desktop OLAP tools (e.g. BusinessObjects Explorer, Cognos PowerPlay, MS Data Analyzer), now built into EBIS, make it easy for end users to view and manipulate multidimensional data that can come from ROLAP or MOLAP back ends. Some of these products can download cubes so that users can work offline. As part of EBIS, these desktop tools gain server-side processing capabilities that go beyond what they traditionally offered, but they do not compete with MOLAP tools: compared with MOLAP tools, desktop tools have modest performance and analytical power. An interface through Excel is often provided, such as MS Excel 2000/OLAP PTS or BusinessQuery for Excel. Almost all OLAP tools have Web extensions (Business Objects WebIntelligence, for example), and for some the Web version is the primary one.

Enterprise BI suites

EBIS are a natural way to deliver BI tools that were previously supplied as separate products. These suites integrate query, reporting and OLAP tools. Enterprise BI suites should be scalable and reach beyond internal users to key customers, suppliers and others. BI suite products should help administrators implement and manage BI without adding new resources. Because of the close relationship between the Web and enterprise BI suites, some vendors describe their BI suites as BI portals. These portal offerings provide a subset of EBIS capabilities through a Web browser, but vendors keep extending their functionality to match that of thick-client tools. Typical EBIS are offered by Business Objects and Cognos.

BI platforms

BI platforms offer a set of tools for creating, deploying, supporting and maintaining BI applications. These are data-rich applications with custom end-user interfaces, organized around specific business problems, with targeted analysis and models. BI platforms, although not growing as fast or used as widely as EBIS, are an important segment because of the expected and ongoing growth of BI applications. Because RDBMS vendors have been adding OLAP extensions to their products, many platform vendors that supplied multidimensional DBMSs for OLAP were forced to move into BI applications in order to survive. The DBMS product families that provide BI capabilities are really driving the growth of the BI platform market, partly thanks to the increased activity of a number of DBMS vendors. Comparing the various tools, we see that EBIS are highly functional tools, but they are not as significant as BI platforms or custom BI applications. On the other hand, BI platforms are usually not as functionally complete as enterprise BI suites. When choosing a BI platform, the following characteristics should be considered: modularity, distributed architecture, support for the XML, OLE DB for OLAP, LDAP, CORBA and COM/DCOM standards, and Web deployment. Platforms should also provide functionality specific to business intelligence, such as database access (SQL), multidimensional data manipulation, modeling functions, statistical analysis and business graphics. This category of products is represented by Microsoft, SAS Institute, Oracle, SAP and others.

BI applications

Business intelligence applications often have BI tools built in (OLAP, query and report generators, modeling tools, statistical analysis, visualization and data mining). Many BI applications extract data from ERP applications. BI applications are usually focused on a specific function or task within an organization, such as sales analysis and forecasting, financial budgeting, risk analysis, trend analysis, or churn analysis in telecommunications. They can also be applied more broadly, as in the case of enterprise performance management applications or balanced scorecard systems.

Data mining

Data mining is the process of discovering correlations, trends, patterns, relationships and categories. It is carried out by thoroughly examining data using pattern recognition technologies as well as statistical and mathematical methods. Data mining repeatedly applies various operations and transformations to raw data (feature selection, stratification, clustering, visualization and regression) that are designed to: 1) find representations that are intuitive to people, who in turn gain a better understanding of the business processes underlying their activities; 2) find models that can predict the outcome or value of certain situations from historical or subjective data.
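
As a very small illustration of the second goal (a model that predicts a value from historical data), the sketch below fits a straight line by ordinary least squares. Regression is only one of the operations listed above, and the monthly sales figures and the `fit_line` helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: month number -> units sold.
months = [1, 2, 3, 4]
units = [3, 5, 7, 9]

slope, intercept = fit_line(months, units)
forecast_month_5 = slope * 5 + intercept
print(slope, intercept, forecast_month_5)
```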

Unlike OLAP, data mining is much less user-driven; instead it relies on specialized algorithms that correlate information and help recognize important (and previously unknown) trends, free from user bias and assumptions.

Other BI methods and tools

In addition to the tools listed, BI may include the following analysis tools: statistical analysis packages, including time series analysis and risk assessment; modeling tools; neural network packages; fuzzy logic tools and expert systems.

Tools for presenting results graphically should also be mentioned: business, scientific and engineering graphics; dashboards; analytical cartography and topological maps; and tools for visualizing multidimensional data.

Business intelligence architecture

An enterprise BI architecture should be developed after the user's BI needs have been identified, but before the choice of BI tools. The Business Intelligence architecture defines the components of BI information delivery and BI technology components (Fig. 1). Once the usage profiles of BI information have been defined, an information delivery architecture can be designed based on these profiles and the type of implementation required. This can be any mixture of networked desktop clients, desktop clients and servers, web-based thin clients, and other mobile computing devices. The information delivery architecture will define user interfaces, which are often personalized portals.

Fig.1. Business intelligence architecture

The BI technology architecture defines the infrastructure and components needed to support the implementation, operation and administration of BI tools and applications, and the interconnection of these components. A solid BI technology architecture consists of two important layers: infrastructure and application services (or functionality). The infrastructure layer includes information resources, administration and networks. At this layer, data is collected, integrated and made available. The data warehouse is one possible component of the infrastructure layer. The use of BI in operational systems may require an operational data store (ODS), possibly linked to corporate workflow structures. Application services include all BI services, such as query, analysis, reporting and visualization engines, as well as security and metadata.

Storage environment and access to BI information

In addition to traditional data warehousing solutions based on Oracle9i and MS SQL Server 2000, ERP warehousing applications are on the rise, such as SAP BW for R/3 or PeopleSoft Enterprise Warehouse with the Enterprise Performance Management BI applications. However, in both cases the functionality is tied to a specific ERP system and is therefore limited.

The use of ROLAP for storing BI information is growing rapidly, thanks to the convenience of relational DBMSs for applications with very large detailed databases and to the inclusion of OLAP capabilities in DBMSs. The use of MDBs and OLAP remains stable and predominant, since they provide better performance and functionality where aggregated data and complex analytical calculations matter.

It is not surprising that, given the high cost of two-tier client-server architectures, BI is increasingly accessed via the Web. The focus is shifting to the server, reflecting the fact that access to corporate BI information is what matters, while standalone PCs are clearly not functional enough. Delivery of BI reports by e-mail is popular and growing, while mobile and wireless delivery methods are still slow to spread.

Metadata

Most BI tools on the market use a metadata layer or repository. Business metadata includes definitions of data that are stored in data sources, in terms of the subject area. They may also contain rules and calculations that must be defined for that business. In addition, there are technical metadata for accessing physical data. CASE-tools, relational DBMS, tools for extracting, transforming and loading data use metadata. When creating data warehouses and data marts, it is often possible to automatically retrieve metadata from data sources, but sometimes users must retrieve the metadata themselves. Thus, a complex situation with several repositories existing in the same organization is possible. The lack of common metadata for tools - due to the lack of standards for metadata - is a major problem for IT departments.

Pros and cons of technology

Users' ability to carry out multi-faceted, interactive analysis of information in terms of the subject area in support of business decisions is expanding rapidly. The parallel movement from information anarchy or dictatorship to information democracy is widening the circle of business intelligence users. The need for flexible access to corporate data is coming to the fore, rather than just the need to solve a specific functional task. There is less direct dependence on IT departments to produce custom reports or queries. It becomes possible to move from static routine reports to a “live report”, and the most advanced analysts can conduct cross-subject analysis and build summary reports from scratch, relying on a semantic layer that describes all the measures and breakdowns of corporate information. The same tools can be used by programmers to quickly create routine, parameterized reports. Web access to BI (both static and dynamic content) will provide a genuine corporate information space and support teamwork among employees.

The main risk is that BI technology changes too fast, leading to the use of untested solutions and tools. It is necessary to track vendors, assess their stability and development direction, regularly try out new tools, and standardize and unify BI. Another risk concerns data quality: if data is not properly transformed, cleaned and consolidated, no "fancy" features of BI tools or applications can make it more reliable. A number of problems can arise from inconsistent metadata. Within a large corporation, these issues are resolved at the infrastructure level by creating a corporate data warehouse and centralized metadata management. Creating a repository helps to bring order to the nomenclature of collected indicators, to data collection and dissemination, and to access authorization. BI technology by itself cannot solve these problems comprehensively, and neglecting them leads back to information anarchy and data silos.

Major players in the BI field

According to Gartner's well-known Magic Quadrants, the technology leaders in EBIS today are Business Objects and Cognos; Information Builders sits on the border between leaders and challengers; and Microsoft and Oracle are among the challengers. The former has no standalone OLAP client, relying on the Excel200x pivot table functionality, and no report generator; the latter does not yet have a replacement for Oracle Express Analyzer. Among the "visionaries", Crystal Decisions stands out, on the border with the leaders. Actuate and MicroStrategy are also worth noting.

There are practically no leaders among BI platforms, which indicates the immaturity of the technologies and the market. So far, only Microsoft is on the border of the leaders' area, thanks to its work on embedding OLAP services in MS SQL Server and developing them into an analytical server. Among the other contenders is SAS Institute, followed by a dense group made up of Oracle, PeopleSoft and SAP. Hyperion is literally at a crossroads; SAS and Hyperion lost their leading positions in 2000. Among the visionaries, MicroStrategy should be noted. Unfortunately, Crystal Decisions is still a niche player.

Trends

Among BI tools, EBIS are experiencing the most growth, reflecting increased competition in today's economy. The use of standalone tools for generating queries and reports and for data analysis is declining; organizations are upgrading them and replacing them with enterprise BI suites. The core tools (ad hoc queries, reporting and basic OLAP analysis) are still the most common and cover most needs. The use of OLAP and other advanced BI tools, such as data mining technology, is also growing. However, standalone data mining tools are disappearing; this technology is being absorbed into other BI tools, such as database extensions.

Within five years, capabilities such as XML for Analysis (XML/A), BI Web services, collaboration, and wireless and mobile communications are expected to converge into business intelligence networks (BI networks), which will be complemented by business activity monitoring (BAM) tools.

XML for Analysis. XML/A originally appeared as a communication protocol between the different BI layers (client, analytical server, database server). XML/A has serious performance problems: it creates a lot of overhead and is currently applicable only to a "lightweight" OLAP client. However, if these issues are resolved, XML/A could become a common language (lingua franca) between different BI environments, crossing multiple domains, vendors and technologies, and thus supporting BI networks.

BI Web services. Vendors often present EBIS products as BI portals because the Web versions of these products provide an entry point to corporate information. In fact, these BI portals often also support links to unstructured information, although this usually requires some sort of integration system. More and more EBIS products focus on parties outside the corporation (extranet e-business intelligence). The new service-oriented architecture (SOA) of components is an evolution of application servers and corporate portals. This innovation is also related to J2EE and .NET technologies. BI Web services turn BI tools into open components with known interfaces that are available on all kinds of networks. A growing number of BI vendors are implementing their products as Web services, but more often under the guise of portals.

Collaboration. Adding annotations to reports and sharing analysis results among multiple users has been possible since the days of EIS, but this functionality is now popular and workflow capabilities have been added to many BI applications. Users are expected to be able to work on the same model at the same time or link different BI applications in real time.

Wireless and mobile business intelligence. Another strong trend in delivering BI information is seen with vendors enabling BI products to deliver reports via mobile technology, including PDAs, Internet phones, and pagers.

Business activity monitoring. The new BAM technology is essentially operational BI: it combines real-time application integration with business intelligence capabilities. Using transactional data extracted from transaction processing systems in real time, BI tools analyze this data and issue alerts about critical events, along with supporting information, to those who make operational decisions.
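
The alerting idea can be sketched in a few lines. This is only a toy model, not a description of any BAM product: the `monitor` function, the sliding window of three transactions and the threshold of 1000 are all assumptions chosen for the example.

```python
from collections import deque


def monitor(transactions, window=3, threshold=1000):
    """Raise an alert when the total of the last `window` amounts exceeds `threshold`."""
    recent = deque(maxlen=window)  # keeps only the most recent amounts
    alerts = []
    for tx_id, amount in transactions:
        recent.append(amount)
        if len(recent) == window and sum(recent) > threshold:
            alerts.append((tx_id, sum(recent)))
    return alerts


# Hypothetical stream of (transaction id, amount) pairs arriving in order.
stream = [(1, 200), (2, 300), (3, 400), (4, 500), (5, 100)]
alerts = monitor(stream)
print(alerts)
```

The point of the sketch is the timing: the check runs as each transaction arrives, rather than after a nightly load into a warehouse.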

Valery Artemiev (avi@cbr.ru) - Advisor to the Director of the Main Informatization Center of the Bank of Russia (Moscow).



Buzzwords, popular terminology, not entirely clear definitions and completely unfamiliar lexical units. All of the above can be applied both to the concept of “business intelligence” and to the phrase “data science”. Let's try not only to overcome the difficulties of translation, but also to understand how "data science" and "business intelligence" differ.

Business Intelligence: reconnaissance, intellect, comprehension, analytics

Many people are sure that the term "business intelligence" first appeared in the 1980s, but this is not entirely true. Hans Peter Luhn, a researcher at IBM, was the first to use the term, back in 1958. In 1989, Howard Dresner, who later became an analyst at Gartner, defined "business intelligence" as "concepts and methods for improving business decision making using business data-driven systems."

Let's listen to other experts. Jonathan Wu, a manager at Netgear, defines BI as the process of collecting multidimensional information about the subject being studied. The Data Warehousing Institute offers this interpretation: business intelligence is the process of turning data into knowledge, and knowledge into business actions for profit.

BI can be viewed not only as a process, but also as the result of the process of obtaining knowledge. If we combine all the definitions circulating in the market, we can say that business intelligence, in the broadest sense of the term, is the process of turning collected data into business knowledge that is used to make better decisions. It is also the information technology used to collect and consolidate data. And finally, BI is the business knowledge obtained through in-depth data analysis. In short, business intelligence is technology, analysis and knowledge.

Data Science: the science of chaos put in order

Recently, data science has come to be seen not only as an academic discipline, but also as a practical, cross-industry field of activity. The term is often credited to William Cleveland, a professor at Purdue University who is considered one of the leading authorities in statistics, machine learning, and data visualization.

According to the definition of CODATA (the Committee on Data for Science and Technology of the International Council for Science), data science is a discipline that combines various areas of statistics, data mining and machine learning. However, the most popular definition is given in the article "What is Data Science?" by Mike Loukides, an editor at O'Reilly Media and the author of books on operating systems, computer architecture and programming. This interpretation is considered fundamental today: data science is a general name for technologies that are designed to turn data into a product. Compared with traditional statistics, data science may at first glance seem no different. However, data science is characterized by an integrated approach, and data scientists do not just study data, but use it.

Thus, we come to the conclusion that data science deals with the problems of analyzing, processing and using data. The range of its components is enough to make your head spin: statistics, data mining, artificial intelligence that processes large amounts of data, database design methods, and much more.

Nothing new under the... data firmament

Cloud computing and other technological advances have forced companies to focus more on the future rather than analyzing reports based on past data. To obtain competitive advantages, companies have begun to combine and transform data, which is part of real data science.

At the same time, they practice Business Intelligence by creating graphs, reports and tables based on the data they have received. While there are big differences between Data Science and Business Intelligence, they are equally important and complement each other.


To practice both BI and data science, many companies hire specialists who are expected to fill two positions at once: BI analyst and data scientist. This is where confusion sets in, because it is not always understood that these roles require different expertise.

It is unfair to expect a BI analyst to make accurate business forecasts. This can be disastrous for any company. However, by learning the main differences between BI and data science, you can learn how to select the right candidates for the specific tasks that your business intends to solve.

Area of interest

On the one hand, the traditional BI approach involves creating dashboards that display historical data according to a fixed set of KPIs. BI therefore relies mostly on reports, current trends and key performance indicators (KPIs).


On the other hand, data science focuses more on predicting what might happen in the future. Data scientists are thus more focused on studying patterns and trends, as well as on finding correlations that support business forecasts.


For example, a business may need to anticipate growing demand for new types of training based on existing patterns and the requirements of corporate clients.

Analysis and data quality

BI requires analysts to look not only at the present and the future, but also at the past, that is, to make active use of historical data. The work of BI analysts is therefore largely retrospective. Business Intelligence focuses on accurate data that reflects what actually happened in the past.


For example, a company's quarterly results are generated from real business data for the past three months. There is little room for forecasting error here, because the reporting is descriptive rather than subjective.
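A descriptive report of this kind is nothing more than an aggregation of recorded facts. The sketch below, with made-up sales records and a hypothetical `quarterly_revenue` helper, sums revenue per quarter; nothing is estimated.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales records: (date of sale, revenue).
sales = [
    (date(2023, 1, 15), 1200.0),
    (date(2023, 2, 3), 800.0),
    (date(2023, 4, 20), 1500.0),
    (date(2023, 6, 30), 700.0),
]


def quarterly_revenue(records):
    """Sum revenue per (year, quarter) from dated sales records."""
    totals = defaultdict(float)
    for day, revenue in records:
        quarter = (day.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        totals[(day.year, quarter)] += revenue
    return dict(totals)


report = quarterly_revenue(sales)
print(report)
```

The output depends only on what was actually recorded, which is why such reporting is retrospective by nature.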

Data scientists, by contrast, use predictive and prescriptive analytics. They are expected to predict fairly accurately what is going to happen in the future, using probabilities and confidence levels.

The actions a company takes on the basis of predictive analysis and forecasts cannot rest on simple guesses. Of course, data science cannot be 100% accurate, but it must be "good enough" for the business to make timely decisions, act on them, and get the results it needs.

A perfect example of data science in action is a company's earnings estimate for the next quarter.
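As a rough illustration of such an estimate, the sketch below fits a straight-line trend to four made-up quarterly earnings figures and extrapolates one quarter ahead. The linear model, the data and the "plus or minus two residual standard deviations" band are all simplifying assumptions; the band is only a crude stand-in for a proper prediction interval.

```python
import math


def linear_forecast(history):
    """Fit y = a + b*t by least squares and forecast the next period.

    Returns (forecast, residual_std), where residual_std is the standard
    deviation of the fit's residuals, used here as a rough error band.
    """
    n = len(history)
    ts = list(range(n))
    t_mean = sum(ts) / n
    y_mean = sum(history) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, history)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in zip(ts, history)]
    # n - 2 degrees of freedom: two parameters were estimated from the data.
    residual_std = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return intercept + slope * n, residual_std


earnings = [100.0, 110.0, 125.0, 130.0]  # hypothetical past quarters
forecast, spread = linear_forecast(earnings)
low, high = forecast - 2 * spread, forecast + 2 * spread
print(f"next quarter: {forecast:.1f} (rough range {low:.1f} to {high:.1f})")
```

Unlike the quarterly report, the result here is an estimate with an explicit margin of uncertainty.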

Data Sources and Transformation

Business Intelligence is all about planning ahead: the right combination of data sources is chosen and prepared in advance, before the data is transformed. Data science, by contrast, can transform data on the fly, using whatever information sources are available on demand, to get relevant insights about customers, business operations and products.


Need for mitigation

BI analysts do not have to mitigate uncertainty in historical data, because it reflects real situations. Such data is accurate and does not involve probabilities.


In 2007, the market for BI platforms experienced major changes due to its significant consolidation. Major vendors have made strategic acquisitions: Oracle has completed its acquisition of Hyperion, SAP has announced its acquisition of Business Objects, Cognos has completed its acquisition of Applix and agreed to a merger with IBM.

How did these events affect the BI platform market? The clearest answer to this question can be obtained by looking at Gartner's Magic Quadrant (Figure 1), which shows the distribution of top BI platform vendors in early 2007 and 2008.

Fig. 1. Position of leading vendors in the market of BI platforms (Source: Gartner)

Before commenting on these changes, let's take a quick look at Gartner's methodology for selecting BI platform vendors and placing them on the Magic Quadrant. First of all, let's clarify what Gartner understands by the term "BI platform".

What is a BI platform according to Gartner

In the most general terms, Gartner defines a BI platform as a tool that lets organizations build applications that help them learn about and understand their business. In more detail, Gartner defines a BI platform as a software platform that provides 12 functions, divided into three groups: integration, information delivery tools, and information analysis tools.

Integration

Common BI infrastructure - all platform tools should use the same security tools, common metadata, common administration tools and common query generation tools, and should have a consistent interface.

Metadata management - all application tools should not only rely on the same metadata, but should also provide fast search, storage, use and publication of metadata objects such as dimensions, hierarchies, performance measures and report parameters.

Development tools - along with the means of creating separate BI applications, the BI platform should provide software development tools for integrating applications into a common business process or embedding them in another application. The BI platform should allow developers to create BI applications without coding, using wizard-like components for visual editing.

Collaboration and workflow management - this capability allows BI users to share and discuss information using shared folders and discussion threads. In addition, BI applications can assign and track events or tasks for individual users based on predefined business rules. Typically, this functionality is provided through integration with a separate workflow tool.

Information delivery tools

Reporting - makes it possible to create formatted interactive reports. In addition, BI platform vendors should support a wide range of report styles (financial, operational, dashboards, and so on).

Dashboards - one of the components of reporting: information is presented as an intuitive graphic image, including charts, dials, traffic lights, etc. These indicators show the state of the analyzed parameter relative to its target value (Fig. 2).

Fig. 2. An example of an information panel (Dashboard)

A manager or analyst, like an airplane pilot, sees a "dashboard" in front of him and controls the system by watching the values of the indicators. For this to work, the key factors needed to manage the enterprise must be measured in some way and presented as indicators. The motto of the concept is "If you can't measure it, you can't manage it".
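The "traffic light" indicator described above can be sketched in a few lines. The KPI names, the values and the 90% warning cutoff (`warn_ratio`) are assumptions chosen for illustration, and the sketch assumes that higher values are better for every KPI.

```python
def traffic_light(actual, target, warn_ratio=0.9):
    """Classify a KPI as green, yellow, or red relative to its target.

    warn_ratio is an assumed cutoff: a value below target but at or
    above warn_ratio * target counts as yellow.
    """
    if actual >= target:
        return "green"
    if actual >= warn_ratio * target:
        return "yellow"
    return "red"


# Hypothetical KPIs as (actual value, target value).
kpis = {
    "monthly revenue": (105.0, 100.0),
    "orders shipped on time, %": (92.0, 95.0),
    "new customers": (40.0, 60.0),
}
status = {name: traffic_light(actual, target) for name, (actual, target) in kpis.items()}
for name, colour in status.items():
    print(f"{name}: {colour}")
```

Each indicator is just a measured value compared against its target, which is exactly what the motto demands.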

Ad hoc query - also known as self-service reporting, this feature lets users ask their own questions of the data. The system provides a means of navigating through the available data resources.

Integration with Microsoft Office - in some cases, BI platforms are used as an intermediate link in the information analysis chain, and Microsoft Office (in particular Excel) acts as a BI client. In these cases, it is critical that the BI vendor provide integration with Microsoft Office, including support for document formats, formulas, and pivot tables.

Information analysis tools

OLAP (Online Analytical Processing) - an information processing technology that includes compiling and dynamically publishing reports and documents. It is used to process complex database queries quickly. OLAP takes a snapshot of a relational database and restructures it into a multidimensional model for queries. Relational databases store entities in separate tables, so complex multi-table queries run relatively slowly in them, while a multidimensional model is better suited to such queries. The claimed query processing time in OLAP is about 0.1% of that of similar queries in a relational database.
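The core idea, aggregating a measure along chosen dimensions, can be shown with a toy example. The fact rows, the dimension names and the `roll_up` helper below are hypothetical; a real OLAP engine precomputes and indexes such aggregates rather than scanning rows on every query.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
facts = [
    ("North", "Roses", "Q1", 100),
    ("North", "Tulips", "Q1", 50),
    ("South", "Roses", "Q1", 70),
    ("North", "Roses", "Q2", 120),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, keep):
    """Aggregate the sales measure over every dimension not listed in keep."""
    indexes = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)


by_region = roll_up(facts, keep=("region",))
by_product_quarter = roll_up(facts, keep=("product", "quarter"))
print(by_region)
print(by_product_quarter)
```

The same facts answer different questions depending on which dimensions are kept, without writing a new multi-table query each time.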

Advanced visualization - advanced visualization tools present data for more effective perception by using interactive pictures and charts instead of tables (Fig. 3). Typically, users can dynamically change the graphical representation, zoom, combine data, and change colors.

Fig. 3. An example of using visualization to present data on the Cognos dashboard

Predictive modeling and data mining. Predictive modeling is the process of creating (or selecting) a model to predict the likelihood of an event occurring. Data mining is the process of discovering, in raw data, previously unknown, non-trivial, practically useful and interpretable knowledge that is needed for decision making. The information found with data mining methods should describe new relationships between properties, predict the values of some features based on others, and so on. The knowledge found should be applicable to new data with some degree of certainty. When the extracted knowledge is not transparent to the user, post-processing methods should bring it to an interpretable form. Tasks solved by data mining methods include:

  • classification - assignment of objects (observations, events) to one of the previously known classes;
  • regression, including forecasting tasks; establishing the dependence of continuous output on input variables;
  • clustering - grouping of objects (observations, events) based on data (properties) that describe the essence of these objects. Objects within a cluster must be similar to each other and different from objects in other clusters. The more similar objects within a cluster and the more differences between clusters, the more accurate the clustering;
  • association - identifying patterns between related events. An example of such a pattern is a rule stating that event Y follows from event X. Such rules are called association rules. This problem was first posed to find typical patterns of purchases made in supermarkets, so it is sometimes also called market basket analysis;
  • sequential patterns - establishing patterns between events related in time, that is, detecting a relationship according to which, if event X occurs, event Y will occur after a given time;
  • deviation analysis - identifying the most uncharacteristic patterns.
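Of the tasks listed above, association is the easiest to show in a few lines. The sketch below computes the two standard measures of an association rule "X implies Y", support and confidence, on a handful of made-up shopping baskets; the data and helper names are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in items."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


# Hypothetical supermarket baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here bread and milk appear together in half of the baskets, and two out of three baskets with bread also contain milk.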

Scorecards - take the metrics displayed on the dashboard a step further by overlaying them on a strategy map that links key performance indicators to strategic objectives. This concept is illustrated in Fig. 4. The technology involves further analysis based on a performance management methodology, such as Six Sigma.

Fig. 4. Mapping key performance parameters to strategic objectives

Now that we have explained the term BI platform, let's return to the analysis of the Magic Quadrant in Fig. 1.

Criteria for selection and evaluation of companies

The Gartner study (see Figure 1) included companies selected according to the following criteria:

  • providing at least 8 of the 12 features inherent in the BI platform;
  • occupying a significant share of the BI platform market, as evidenced by sales of at least $20 million;
  • solutions on platforms that work at the enterprise level, and not just at the departmental level.

Fig. 1 uses a number of terms that determine where vendors are placed on the quadrant. Let's explain their meaning:

  • ability to execute is determined by the following factors:
    • how competitive and successful the products are,
    • how likely the vendor is to continue investing in the product or service,
    • how successful the vendor's pricing policy is,
    • how resilient the vendor is to changes in the market,
    • how well informed customers are about the vendor's offerings,
    • how well the vendor delivers on its marketing promises,
    • how satisfied customers are with the vendor's service and support;
  • completeness of vision is the ability of a vendor to exploit market trends to create additional services for customers and corresponding benefits for themselves. The completeness of vision can be assessed based on the quality of:
    • forecasts of customer needs,
    • marketing strategy,
    • sales strategies,
    • development strategies in vertical market segments,
    • strategies for entering remote markets;
  • leaders are vendors that ensure the wide functionality of their products, their successful implementation and provide high-quality support at the global level;
  • challengers - have limitations that may be associated not only with the breadth of their range of technological solutions, but also with market indicators, such as the quality of the distribution network, etc.;
  • visionaries are vendors with a strong strategy for promoting BI platforms, which is manifested in the openness of standards, the flexibility of solution architecture, and the depth of functionality of the applications they create. They are leaders in innovation;
  • niche players - occupy a leading position in some limited product or technology area.

Trends in the BI platform market

As Fig. 1 shows, megavendors are starting to dominate the BI market. Indeed, in less than a year, Microsoft, Oracle, SAP and IBM have gone from owning a quarter of the market to owning two-thirds.

Comparing the Magic Quadrants for 2007 and 2008 shows that Microsoft has risen to first place in ability to execute. SAP is not yet in the lead, apparently because the merger with Business Objects has not yet been completed. Oracle has moved into second place behind SAS in completeness of vision.

Thus, the Magic Quadrant for BI platforms for 2008 reflects the fact that leadership is moving from independent BI vendors such as Business Objects and Cognos to mega-vendors.

In July 2007, Oracle completed the acquisition of Hyperion. This has resulted in two competing platforms - Hyperion System 9 and Oracle Business Intelligence Enterprise Edition - merging under Oracle's leadership, which expands Oracle's BI resources in both technology and personnel.

In October 2007, SAP announced the acquisition of Business Objects in order to expand its market presence. This merger (which was completed in January 2008) closes a significant gap in SAP's query and report generators.

Cognos has completed the takeover of Applix, which has powerful OLAP technology, and in turn has agreed to be taken over by IBM.

Over the same period, the maturation of the Microsoft BI portfolio and the development of Web 2.0 technologies, open source BI products and Software as a Service (SaaS) offerings have made BI functionality more accessible than before.

Open source BI solutions have made significant progress, but the revenue from their implementation is still insignificant. JasperSoft, one of the largest vendors in the field, claims to have over 7,000 commercial customers and over 70,000 active deployments.

There is also a growing interest in providing BI solutions in the form of SaaS. In particular, Business Objects is a leader in the business of providing BI applications on demand (OnDemand), but there are smaller firms such as Seatab, Oco and LucidEra that provide BI solutions as a service. The use of BI solutions in the form of an OnDemand service is not suitable for all organizations; it is of little use for organizations that work with classified data. Nevertheless, every year more and more companies choose the SaaS model as more economical and reliable enough.

Analysis of the position of leading vendors

Business Objects

Among companies that specialize exclusively in BI solutions, Business Objects offers the most complete platform with well-developed report and query generation technology.

About 90% of organizations that have implemented this solution report that it is their organizational standard.

Business Objects expanded its BI offering in 2007 with the addition of Inxight.

The rapid growth of Business Objects' on-demand (OnDemand) BI offerings to over 70,000 users makes it the de facto leader in SaaS-BI.

Now that it is owned by SAP, Business Objects will have to adjust its strategy; that is, it will have to spend some time changing its sales channels, support system, and so on.

According to customer reviews, OLAP is the weak side of Business Objects solutions.

Cognos

Cognos has an exceptionally high adoption rate of its BI platform as a standard solution for enterprises. More than 90% of those surveyed consider Cognos the standard for their organization.

Cognos is actively investing in improving the architecture of the platform. With version 8.2 and the upcoming version 8.3, Cognos 8 BI has almost rid itself of its problems with insufficient stability and poor technical support. Most customers are currently running the latest version of Cognos BI.

Once the merger of Cognos with IBM is completed, the Cognos BI platform will benefit from its ability to integrate with IBM technologies.

Another benefit for Cognos will come as it embraces Applix TM1 OLAP technology.

Cognos' data mining technology is still weaker than its competitors' offerings.

Microsoft

A successful pricing policy and integration with MS Office make Microsoft solutions especially attractive for organizations built on the company's infrastructure products.

When promoting its BI solutions, Microsoft can rely on a large audience of developers. Microsoft estimates that it has 2 thousand OEM/ISV partners implementing its BI solutions.

According to customer reviews, BI solutions from Microsoft draw the fewest complaints.

Microsoft's BI solutions were developed by Microsoft itself rather than obtained by acquiring another company.

Microsoft belatedly joined the race to promote BI platforms, and therefore now its strategy is to "catch up and overtake." Customers estimate that Microsoft still lags behind traditional BI platform companies, especially in terms of metadata management, reporting, and dashboarding.

MicroStrategy

Instead of growing through acquisitions, MicroStrategy built its technology entirely in-house. This provides a high degree of integration within the platform.

MicroStrategy has positive customer reviews across all 12 criteria assessed by Gartner.

The development of new technologies may weaken the position MicroStrategy currently holds in processing very large amounts of data.

MicroStrategy has a reputation for offering expensive solutions that are hard to get a discount on.

MicroStrategy focuses exclusively on BI platforms and pays little attention to related technologies such as corporate performance management (CPM) and data integration.

MicroStrategy has a small sales volume in the Asia-Pacific region.

Oracle

Even before Hyperion joined in mid-2007, Oracle's position in the BI market was quite strong: its combination of BI platform and analytical applications (Oracle BI Enterprise Edition (OBIEE) and Oracle Analytic Applications) was a very successful offering.

Customers give positive feedback on OBIEE. They note the wide range of solutions for organizing teamwork, as well as advanced visualization tools, which, according to them, are among the best on the market.

The strengths of the Essbase OLAP engine and Hyperion's integration with Microsoft Office enhance Oracle's potential in the BI market.

Oracle is in a good position to promote its BI technologies to a variety of clients, not just Oracle platform enthusiasts.

Integrating the BI solutions brought together by the merger will take up much of 2008.

There is evidence that only a small share of the Hyperion BI installed base runs the latest version, which suggests that customers are in no hurry to upgrade.

Oracle should improve technical support.

SAP

With over 13 thousand implementations, SAP has made great strides in advancing the NetWeaver BI solution. More than 75% of SAP customers surveyed by Gartner reported that BI solutions from SAP are standard in their organizations.

With the integration of SAP and Business Objects complete, SAP will become the largest BI platform vendor, twice the size of any other competitor.

The strengths of Business Objects, primarily formatted report generation and self-service report creation, successfully fill in the gaps in SAP BI solutions.

In a Gartner study, SAP customers using the latest version of SAP BI noted implementation difficulties.

The inclusion of Business Objects somewhat reduces SAP's score on what Gartner calls ability to execute. This is due to the inevitable uncertainty for customers who have relied on SAP's own pre-existing BI products.

Although NetWeaver BI can import data from non-SAP applications, SAP can name no more than 25 large enterprises that have implemented NetWeaver BI in an environment where SAP systems are not dominant. To achieve market leadership, SAP needs to demonstrate that it can implement its platform in enterprises where SAP applications are not dominant.

SAS

SAS is a leader in advanced analytics (Advanced Analytic Solutions).

SAS offers analytics solutions that not only provide basic functionality at the level of KPI analysis, but also offer advanced analytics for business problems such as fraud detection.

SAS is a well-known brand, and SAS solutions have worldwide service and support.

SAS applications are considered difficult to learn. Many advanced analytics applications require the use of a special SAS programming language - this is an advantage for programmers and a significant limitation for people who do not have such skills.

In conclusion, we list the main trends in the BI platform market:

  • the relevance of the task of optimizing the performance of companies at all levels stimulates the demand for BI solutions;
  • the capabilities of BI platforms are expanding: in addition to traditional report and query generators and OLAP functionality, dashboards, scorecards and advanced visualization are being actively developed;
  • mega-vendors begin to dominate the BI market;
  • BI solutions in the form of SaaS are being actively promoted by many vendors;
  • the process of mergers and standardization is the engine of the market.

Analytical review: BI in Russia 2009

Analysts at the TAdviser center have completed an open review of the business intelligence (BI) platforms available on the Russian market. On this page you can read the most interesting sections of the review.

Benefits of using a BI system

Systems for business analysis solve a very wide range of tasks. Thus, the “near horizon” is monitoring, analysis and adjustment of operational goals:

    support for the development of business processes and structural changes of the enterprise;

    the possibility of modeling various business situations in a single information environment;

    conducting operational analysis on non-standard requests;

    reducing the routine workload on staff and freeing up time for deeper analytical work;

    stable operation with an increase in the volume of processed information, the possibility of scaling.

In terms of supporting the strategic development of an enterprise, BI systems provide:

    evaluation of the effectiveness of various business areas;

    assessment of the feasibility of the goals set;

    assessment of the efficiency of resource use, including by subsidiaries;

    assessment of the effectiveness of operating, investment and financial activities;

    business modeling and evaluation of investment projects;

    cost management, tax planning, capital investment planning.

To date, according to experts from Gartner, only 15-20% of business users are actively working with BI applications, while the rest consider systems for business analysis too complicated to use. However, the active development of tools for interactive data visualization and the further spread of Internet technologies should soon improve the situation.

According to analysts at MiPro Consulting, introducing an independent BI system in an organization provides a number of advantages over using analytical tools built into other corporate information systems. Such a BI system:

    gives business users, including top management, greater visibility and convenience in working with information;

    makes it possible to use several analytical solutions for different areas of activity across the whole enterprise, and not just within individual departments;

    allows you to extract, analyze and consolidate data from virtually any source;

    is based on an industrial, supported and developed BI platform;

    has the status of an independent, strategic, business-critical application;

    provides the necessary scalability, efficiency, performance;

    allows you to build and maintain end-to-end procedures and processing processes throughout the organization, unified centralized analytical models and projects;

    contains built-in tools for solving various and diverse analytical tasks, both from a business and IT point of view;

    provides access to data and analytical tools to more users.

The use of analytical tools built into other corporate information systems, for example, the ERP or CRM class, as a rule, has the following limitations:

    a limited set of implemented analytical tools that are the same for all users, regardless of their roles and tasks;

    the ability to use only your own, internal data for analysis, while information from other systems remains inaccessible, and data from various sources cannot be consolidated;

    the lack of developed built-in tools for analysis leads to the fact that the system is used only to extract the data stored in it, which are then exported and analyzed in Excel;

    ERP and CRM systems, as a rule, have a limited number of users, which “cuts off” a large number of company employees from analytics who would find this information useful and interesting (a significant increase in the number of users reduces the performance of transactional systems);

    transactional systems usually do not contain all the indicators necessary for analysis, do not include tools such as dashboards, which have already become the standard for presenting analytical information;

    the results of analysis in such systems are usually presented as tabular reports or diagrams, which do not give a detailed and comprehensive picture of the real state of affairs and leave many questions unanswered;

    the ability to create flexible ad hoc queries is limited;

    the use of large volumes of accumulated historical information is limited.

When choosing or upgrading a system for business analysis, you should consider ways to store and integrate data, visualization and analytics tools.

Data storage

If a company is faced with the task of identifying long-term or periodic trends, that is, users need to analyze historical data coming from various departments over the past 3-5 years, then most likely they should think carefully about organizing ETL operations to load data into data warehouses.

If a company or any of its departments needs to analyze information on a monthly or weekly basis, then the best solution would be to allocate and organize for these purposes (for each of the departments or for solving specific problems) separate data marts, also using ETL tools.

If the company plans to analyze operational data in near real time (that is, data updated several times a day), then it may be necessary to forgo a data warehouse and instead develop integration tools based on an intermediate virtual metadata layer, with appropriately designed interfaces and algorithms (following the EII principle).

Data integration

As noted above, if the purpose of implementing a BI system is to solve individual, specific tasks, then it is advisable to limit the project to organizing data marts. In this case, no separate integration algorithms are required.

If, on the contrary, BI is implemented to obtain a single, holistic view of the overall state of the business, then it is probably impossible to do without a centralized data warehouse and, accordingly, the necessary ETL tools. In addition, to obtain a truly adequate picture of the business, special attention must be paid to the quality of the analyzed data. This requires an expanded set of data "cleaning" tools: identifying incomplete or erroneous data, finding duplicate information, and bringing data from various sources to a single format.
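The three cleaning tasks named above (incomplete or erroneous data, duplicates, a single format) can be sketched as follows. This is a simplified illustration, not a production cleaning pipeline: the records, the field names, the list of date formats and the rule that negative unit counts are erroneous are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical records merged from several sources; field names are assumptions.
RECORDS = [
    {"store": "Store A", "date": "2024-01-05", "units": "10"},
    {"store": "Store A", "date": "05.01.2024", "units": "10"},  # same sale, other date format
    {"store": "Store B", "date": "2024/01/06", "units": ""},    # incomplete: units missing
    {"store": "Store C", "date": "2024-01-07", "units": "-3"},  # erroneous: negative units
    {"store": "Store D", "date": "07.01.2024", "units": "5"},
]

# Date formats assumed to occur in the sources.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%Y/%m/%d")

def normalize_date(value):
    """Bring a date string to ISO format; return None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean(records):
    """Split records into cleaned rows and rejected rows with a reason."""
    seen, cleaned, rejected = set(), [], []
    for rec in records:
        date = normalize_date(rec["date"])
        units = rec["units"].strip()
        if date is None or not units:
            rejected.append((rec, "incomplete"))
            continue
        try:
            units_int = int(units)
        except ValueError:
            rejected.append((rec, "erroneous"))
            continue
        if units_int < 0:  # assumed business rule for this example
            rejected.append((rec, "erroneous"))
            continue
        key = (rec["store"], date, units_int)
        if key in seen:  # same record already accepted after normalization
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        cleaned.append({"store": rec["store"], "date": date, "units": units_int})
    return cleaned, rejected

cleaned, rejected = clean(RECORDS)
for rec, reason in rejected:
    print(reason, rec)
```

Note that the duplicate is only detectable after the dates have been brought to a single format, which is why normalization comes first.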

If the company focuses on analyzing operational data, then it should consider data replication and access tools.

Visualization and analytics

Data visualization tools (dashboards, scorecards, reports, OLAP cubes) are likewise selected depending on the tasks at hand and on the qualifications of the users.

For experienced, qualified users, OLAP cubes will be the best tool, allowing them to conduct deep business analysis at the required level of detail.
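The idea behind analysis "at the required level of detail" can be shown with a toy example. The sketch below imitates two basic cube operations, slicing and rolling up, on a small in-memory fact table; the data, the dimension names and the helper functions are hypothetical, and a real OLAP engine precomputes and indexes such aggregates rather than scanning rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, revenue).
FACTS = [
    ("North", "roses", "2024-01", 120),
    ("North", "tulips", "2024-01", 80),
    ("North", "roses", "2024-02", 150),
    ("South", "roses", "2024-01", 90),
    ("South", "tulips", "2024-02", 60),
]
DIMENSIONS = ("region", "product", "month")

def roll_up(facts, by):
    """Aggregate the revenue measure over the dimensions listed in `by`."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def slice_cube(facts, dimension, value):
    """Keep only the facts where `dimension` equals `value`."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]

# Coarse view: total revenue per region.
by_region = roll_up(FACTS, by=("region",))

# Drill into one region and look at it per product.
north_by_product = roll_up(slice_cube(FACTS, "region", "North"), by=("product",))

print(by_region)
print(north_by_product)
```

The same facts answer both the coarse question and the detailed one; only the set of dimensions passed to roll_up changes.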

Users who make management decisions and analyze business performance in their daily work benefit from a workplace organized as a dashboard, which shows the state of the business as a whole through visual scales and indicators and lets them switch between individual areas of activity.

Line managers need tools to solve their current tasks, to monitor the progress of certain types of operations, and to monitor the work of their employees (each individual employee and the team as a whole). In addition, to organize clear interaction with related departments (or regions), they need to be able to see the progress of interrelated tasks.

Vertical or horizontal solution

The market offers both horizontal BI solutions, which implement a set of generally applicable tools, and specialized vertical solutions "tailored" to specific industries or tasks. Each type has its advantages and disadvantages.

The advantage of horizontal solutions is their ability to grow with the organization. Such solutions are usually scalable, can cover all lines of business and all departments of a large company, and are also easier to change. The downside of this breadth is the need for longer and more thorough customization and adaptation to specific requirements. Implementation projects become more costly, and the demands on IT professionals become higher.

Vertical solutions, for their part, do not require separate lengthy and time-consuming configuration to solve specific problems and to comply with the requirements of industry regulatory organizations (financial, medical, etc.). However, it may turn out that different departments within the same structure will not be able to use a single solution, and it may be necessary to master and integrate several different systems for business analysis.

Organizations that plan, now and in the future, to stay in a specific line of business that requires compliance with strict regulations are likely to benefit from vertical solutions. If there is no certainty about such a commitment to one type of activity, and the company's specialization may well expand significantly, then choosing a vertical BI solution carries a certain risk.



