Applied Microsoft Business Intelligence. Sarka Dejan
Чтение книги онлайн.

Читать онлайн книгу Applied Microsoft Business Intelligence - Sarka Dejan страница 6

СКАЧАТЬ important, is deadlines. End users often measure the success of a project based on hands-on interactivity. If end users have nothing to see, touch, and use on a given date, the project can easily lose the trust and support that existed at the inception of the project. You could categorize this as a positive aspect of the partnership in regards to success. Deadlines should ensure the on-time delivery of the solution. However, the developers working on the project may see this as negative because it could inhibit or minimize what can and will be delivered due to time constraints.

      Another item that is often a result of this partnership is the amount of partner involvement (or the lack thereof). Because a large part of a business intelligence project is discovery, partners must spend time discovering data sources, data needs, goals, how to visualize, which tools to use to visualize, and so on. This requires the involvement of certain people, who are often already inundated with their existing jobs. Because their time is already at a premium, finding additional time to devote to the new project is difficult. Although developers are often well versed in and have intimate understanding of the data to use, they often lack the knowledge of how to massage the data to meet the project's requirements. Without that knowledge, the business intelligence project is destined to fail before it begins.

      What Is the Goal(s)?

      Now that you've identified your project's end users, it's time to take their knowledge and convert it into goals and requirements. These goals and requirements typically equate to the scope of the project, which further assists in defining timelines, selecting tools, identifying data needs, and selecting hardware. Within the scope, you outline certain goals, such as what should be developed, how it should look, who or what should have access to it, and what to use as the delivery mechanism. Although not an exhaustive list of outlined goal types, they should assist you in quantifying and identifying a project's goals.

      What Are the Data Sources?

      With the users and goals identified, the time has now come to perform one of the most difficult steps in the project – identifying the data sources. Believe it or not, a person or group of people from the expected end users are the best source for this task. IT data sources often reference data that resides only in systems that IT manages with no regard to data hosted by a department, branch office, or third party. For a business intelligence project, this is not typical. You must perform an exhaustive search-and-discovery process with as much involvement from any stakeholders that are willing to assist. You may host these sources within or outside an organization.

      Regardless of whether the data is internal or external, you should carefully perform discovery to ensure that you have included all relevant data in the project. This process often requires several iterations. During report development, someone may recognize that data is missing due to an oversight. As a result, you'll need to modify the ETL process to include the new source.

      Using Internal Data Sources

      While you source the majority of the data from traditional IT-managed relational databases, you'll always have some data managed and maintained outside the IT department. This data is likely stored in spreadsheets, Access databases, comma-delimited files, text files, or other file types, and they may reside on someone's desktop or laptop. Often vital, these datasets contain small nuggets of information that can cripple the project if they are not included.

      End users may also manage other internal data sources, such as SharePoint lists or third-party applications that came with database backed during installation. In the case of the latter, the hosting department does not even realize what they have installed. In some cases these are common back ends, and others may require custom drivers to access the data.

      Using External Data Sources

      The data may also come from an external source via a web service, an OData feed, or even a hosted RDBMS (SQL Server or Oracle). If the source is a web service or OData feed, the data is typically accessed via a web URL. The consumption, on the other hand, may require some custom interface that parses and displays the data in a fashion meaningful to end users. Developers may overlook this data because no one on the team knows it exists; the same may apply to the hosted databases. As a result, this further heightens the need to involve end users because they may be the only people who know it's there and needed.

      Using a Data Warehouse (or Not)

      For the experienced business intelligence developer, developing a business intelligence solution without a data warehouse may seem absurd. However, with today's savvy end users, readily accessible data is no longer an option; it is a requirement. Therefore, nightly refreshes of data is becoming a thing of the past. And as a result, including a data warehouse as part of a business intelligence project is now optional.

      Traditionally, a data warehouse is loaded at some time interval – daily, weekly, and some even monthly. Depending on the organization the time period may be longer. For example, some colleges or universities load data into a data warehouse only at the end of the semester. As data needs become more stringent, the periods of latency between live data and analytical data have become smaller and smaller, presenting challenges that are often difficult to overcome. The primary challenge is moving data from the source systems to the data warehouse, which leads to the question: Is a data warehouse required?

Think back to Chapter 1, specifically to Figure 1-1, which we're showing again in this chapter (see Figure 2.1). This figure depicts an illustration of reporting from multiple data sources. How can you create a single report to reference multiple desperate sources? Which tool would you use? A few may accomplish this task, and because this is a Microsoft-focused book, the tool that comes to mind is Power Pivot. Power Pivot is an Excel add-in that creates an in-memory semantic model based on a plethora of data sources. Figure 2.2 displays an abbreviated list of data sources that possibly source the data warehouse.

image

Figure 2.1 Reporting against disparate data sources

image

Figure 2.2 Data sources that source a data warehouse

      With that in mind, and because this feature is available, you might ask yourself: Why should I develop a data warehouse? Couldn't data be accessed directly from the source, modeled, and then reported against? The answer is: absolutely. But in most cases the data has not been validated so that it could meet the needs of every aspect of the business, or possibly two systems exist that store similar data. Which set should you use for reporting? Another possibility is that a value is calculated one way by one set of users and another way by another set. Which means of calculation is the correct one? This is where the ETL process becomes a significant part of your business intelligence solution.

      NOTE

      Although this book does not focus on loading the data warehouse using SQL Server Integration Services (SSIS), please do not discount the importance of an ETL solution. The Microsoft Business Intelligence 24-Hour Trainer provides an SSIS section that provides a good starting point and overview.

      During the ETL process, the data is extracted from the original data sources, transformed into a format or formats that meet the business requirements, and finally loaded into a central repository (data warehouse). You can use the repository as a direct source for reporting or as a semantic model. Regardless of the approach, leveraging a single source for either makes the process of obtaining and visualizing data simpler for any individual or group that needs to access the data.

      Implementing and Enforcing Data Governance

The СКАЧАТЬ