Applied Microsoft Business Intelligence. Sarka Dejan
Чтение книги онлайн.

Читать онлайн книгу Applied Microsoft Business Intelligence - Sarka Dejan страница 7

СКАЧАТЬ Are the Data Sources,” reiterated the importance of a group of end users over and over again. You have a similar case when deciding how to implement and enforce data governance. From a technical perspective, this step is typically performed prior to loading the data in the data warehouse. So, why is it located after the data warehouse section? When planning a business intelligence solution, time is valuable. Why invest time and resources on something that you may not need? If data is accessed directly from the source, the want or need for data governance is reduced. However, because you may have multiple sources of data for a single data warehouse, you may require a data governance strategy to ensure that you deliver accurate, consistent, and trustworthy data to end users. Figure 2.3 illustrates where data governance would fit in a Microsoft business intelligence solution.

      DATA GOVERNANCE

      So, what is data governance? In short, data governance is a process or set of processes that include:

      ● Data quality

      ● Data management

      ● Data monitoring

      ● Data maintenance

      ● Data security

image

Figure 2.3 Data governance in a business intelligence project

      These tasks are accomplished by putting people familiar with the data together with a technology solution or set of solutions. In the case of Microsoft, it would be a set of technologies: Data Quality Services (DQS) and Master Data Services (MDS).

      NOTE

      A discussion about these technologies is well beyond the scope of this book; however, the exclusion of a data quality process could be detrimental to the entire business intelligence solution. The Tutorial: Enterprise Information Management using SSIS, MDS and DQS Together, teaches developers how to implement a sample Enterprise Information Management (EIM) solution. It can be downloaded from this link: http://www.microsoft.com/en-us/download/details.aspx?id=35462.

      For example, assume that during the data discovery process you realize that two very similar datasets are obtained from two completely different sources; for the sake of brevity and simplicity, assume that both datasets reference gender data. The catch is that in one system gender is stored as Male and Female, and in the other the data is stored as M and F. Prior to importing the data into the data warehouse, a decision must be made regarding how to store gender; this is part of the data governance process. Using the knowledge of those individual stakeholders and the technology of choice (MDS and DQS in this case) you can implement a process that ensures that the data is accurate and consistent.

      Planning an Analytical Model

      The next logical step in architecting a business intelligence solution is deciding whether an analytical model is necessary. Prior to the release of SQL Server 2012, making that decision was a much simpler process. However, considering the added and enhanced features in that release as well as the latest release, as of this writing, making that determination is now a little more complicated. Why, and what has complicated this process?

      In most cases, the driving factors behind developing analytical models is improving query performance time. Developers have come up with some workarounds that circumvent the need for an analytical model, such as building aggregate tables, using indexed views, or adding more hardware. Although these solutions work, they usually only temporarily fix the problem. As data needs grow, the return on either of the aforementioned solutions becomes too expensive or simply just does not work. Recognizing this, Microsoft included a new column-based index within the database engine. By implementing these index types, you can increase query performance several magnitudes over, possibly eliminating the need to develop an analytical model entirely.

      Although the new column-based index may improve query performance, when it comes to addressing analytical needs such as complex time-based analysis, key performance indicators, hierarchies, unary operators, and other capabilities that are beyond the scope of a relational model, the only choice is to develop an analytical model. Prior to SQL Server 2008R2, you only had a single type to choose from when developing a business intelligence solution based on the Microsoft stack. However, with later releases of SQL Server, you have three choices: Power Pivot, tabular, and multidimensional. Chapter 5 provides a detailed explanation that will assist you in choosing the right one. However, regardless of which model you select, the planning process is very similar across all three.

      Prior to building the model, you should consider the following;

      ● Make a concerted effort to ensure that the data warehouse schema is almost 100 percent complete, including the data governance and load process. The model development could actually begin without the data; however, little can be done in regards to validating the code without data.

      ● Ensure that the model requirements have been scoped. This may seem like an obvious step, but often developers start building the model without consulting any stakeholders or end users. Typically, the result is something irrelevant or wrong.

      ● Decide on data latency. In other words, can the data in the model be 15 minutes behind, 1 hour behind, 1 day behind, or 1 week behind. This decision is ultimately based on end-user needs.

      If a decision was made not to develop a data warehouse and instead obtain the data directly from the source, it is possible to have almost real-time access to data by implementing either a Direct Query tabular model or Real-time Online Analytical Process (ROLAP) multidimensional model. Note that this feature is not available when developing a Power Pivot model. These methods can also be implemented if a data warehouse is implemented, but the data will be as fresh as the data in the warehouse. For example, if the data is loaded into the warehouse nightly, then you'll have a 24-hour difference between the data in the source and what you'll see in the model.

      On the other hand, a period of latency may be acceptable. If that is the case, you should add a step after the data warehouse is loaded and prior to anyone accessing the data. This step will load or process the data into the model. For tabular data it would be an In-Memory model, and for multidimensional data it would be a Multidimensional Online Analytical Processing (MOLAP) model. In addition, adding this step also opens up the possibility of developing a Power Pivot model, which was not available before.

      One more thing to consider is a hybrid scenario where some data can afford latency and some may not. In that case, one solution would be to use a Hybrid Online Analytical Processing (HOLAP) model, in which some objects access data real-time and some require the processing of data.

      Planning the Business Intelligence Delivery Solution

      With everything in progress or close to completion, the time has come to decide how to deliver everything to end users. More importantly, you must ensure that the selected topology or solution is performant, secure, and available. Implementing a solution that encompasses all three of these characteristics can be time-consuming and costly. Which leads to the question: Are they all required? As a best practice yes, but typically no. The goal of most deployments is to improve performance and to present data in ways that resonate with consumers. Security and availability are usually an afterthought, often not occurring until it's too late. Fortunately, when building a Microsoft business intelligence delivery solution, you have several options available to economically implement a solution that addresses all three.

      When СКАЧАТЬ