Marlabs Blog
Introduction to Data Warehouse Testing – Part 1 Monday, August 27, 2012

In my several years of DW/BI experience, one of the toughest aspects of the entire lifecycle has been testing the quality of data in the warehouse. What are the different types of testing we can do on the data in the warehouse, and what strategies do we need to consider for each? How we can do this testing (including some proprietary frameworks we have built at Marlabs to perform line-by-line testing for every row) is a subject for another day.

 

Data Completeness (can be tested with both designer data and full load)

One of the most basic tests of data completeness is to verify that all expected data loads into the data warehouse. This includes validating that all records, all fields, and the full contents of each field are loaded. Strategies to consider include:

 

  • Comparing record counts between the source data, the data loaded into the ODS, and the data loaded into the warehouse. You might also want to account for rejected/suspense records.
  • Comparing the count of distinct values of key fields between the source data and the data loaded into the warehouse. You could also compare the distinct values themselves. (A minimal sketch of these first two checks follows this list.)
  • Utilizing a data profiling tool (Talend, http://www.talend.com, provides a free tool) that shows the type, range and distributions of data in a data set. This can be used during testing and in production to compare source and target data sets. A profiler can also identify defects from source systems that may be missed even when the data movement is correct.
  • Testing each field to its maximum length – for example, if you have a varchar(10) field, test it with 10 characters, 11 characters, blank and null.
  • Testing the boundaries of each field to find any database limitations. For example, for a decimal field with a precision of 3, include values of -99 and 999, and for date fields include the entire range of dates expected (including dates in multiple formats).
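The first two checks above lend themselves to automation. Here is a minimal Python sketch, assuming two open DB-API connections; the table and key-column names are illustrative, not from the article, and a real harness would also reconcile rejected/suspense records:

```python
# Minimal sketch of automated completeness checks: compare record counts and
# distinct key counts between a source table and its warehouse target.
# Connection objects, table names and key column are illustrative assumptions.
import sqlite3  # stand-in for your actual database driver

def scalar(conn, sql):
    """Run a query that returns a single value and return it."""
    return conn.execute(sql).fetchone()[0]

def completeness_checks(src_conn, dwh_conn, src_table, dwh_table, key_col):
    failures = []

    # 1. Total record counts should match; a real harness would add back
    #    rejected/suspense records before comparing.
    src_count = scalar(src_conn, f"SELECT COUNT(*) FROM {src_table}")
    dwh_count = scalar(dwh_conn, f"SELECT COUNT(*) FROM {dwh_table}")
    if src_count != dwh_count:
        failures.append(f"row count: source={src_count}, warehouse={dwh_count}")

    # 2. Distinct values of the key field should match between source and target.
    src_keys = scalar(src_conn, f"SELECT COUNT(DISTINCT {key_col}) FROM {src_table}")
    dwh_keys = scalar(dwh_conn, f"SELECT COUNT(DISTINCT {key_col}) FROM {dwh_table}")
    if src_keys != dwh_keys:
        failures.append(f"distinct {key_col}: source={src_keys}, warehouse={dwh_keys}")

    return failures
```

The same pattern extends naturally to comparing the distinct values themselves, not just their counts.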

 

Data Transformation (typically tested with designer data)

Often the real value of data in a warehouse lies in the transformation rules that are applied. Validating that data is transformed correctly according to business rules and logic can be the most complex part of testing an ETL application. One typical method is to pick some sample records and eyeball them to validate the transformations manually. This can be useful, but it requires manual testing steps and testers who understand the ETL logic. A combination of automated data profiling and automated data movement validation is a better long-term strategy. Here are some simple automated data movement validation techniques:

 

  • Create a set of scenarios of input data and expected outcomes, and validate these with the customer. This is a good exercise to perform during requirements gathering, and it can also be used during testing.
  • Create a test bed with test data that covers all scenarios. Automate the process of populating data sets from the scenarios sheet prepared in the previous step; as more scenarios are added, the testing automation remains valid. (A minimal sketch of scenario-driven testing follows this list.)
  • Use the results of the data profiling tool to compare the range and distribution of values in each field between source and target data.
  • Validate correct processing of technical fields generated during ETL design, such as flags and surrogate keys.
  • Validate that data types in the warehouse are as specified in the design and/or the data model.
  • Set up data scenarios that test referential integrity between tables. This is especially important when testing the behavior of child records that have no parent and when checking many-to-many relationships.
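To make the scenario approach concrete, here is a minimal sketch of scenario-driven transformation testing. transform_record and its sample rules are hypothetical stand-ins for the ETL logic under test; the scenario sheet would normally come from the requirements exercise above:

```python
# Minimal sketch of scenario-driven transformation testing: each scenario pairs
# an input record with the output the business rule should produce.

def transform_record(rec):
    """Toy transformation standing in for the real ETL logic under test."""
    return {
        "full_name": f"{rec['first_name']} {rec['last_name']}".strip(),
        "active_flag": "Y" if rec["status"] == "ACTIVE" else "N",
    }

# Scenario sheet agreed with the customer: (input record, expected output).
SCENARIOS = [
    ({"first_name": "Ada", "last_name": "Lovelace", "status": "ACTIVE"},
     {"full_name": "Ada Lovelace", "active_flag": "Y"}),
    ({"first_name": "Alan", "last_name": "Turing", "status": "INACTIVE"},
     {"full_name": "Alan Turing", "active_flag": "N"}),
]

def run_scenarios():
    # Apply the transformation to every scenario and fail loudly on mismatch;
    # adding rows to SCENARIOS extends coverage without touching this code.
    for i, (given, expected) in enumerate(SCENARIOS, start=1):
        actual = transform_record(given)
        assert actual == expected, f"scenario {i}: expected {expected}, got {actual}"
    print(f"{len(SCENARIOS)} scenarios passed")

if __name__ == "__main__":
    run_scenarios()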

 

Data Quality (typically tested with designer data)

For comprehensive data quality testing, make sure to include as many scenarios as possible. Typically, data quality rules are defined during design, for example:

 

  • If a state is misspelled, look it up in the master list and correct it.
  • Substitute null if a decimal field contains non-numeric data.
  • Validate the city and state based on the ZIP code.

 
Depending on the data quality rules of the application being tested, scenarios to test might include null key values, duplicate records in source data, and invalid data types in fields (e.g. special characters in a date field). Data quality rules applied to the data will usually be invisible to users once the application is in production; users will only see what's loaded to the database. For this reason, it is important to ensure users are aware of how bad data is being handled. Data quality reports often reveal patterns and trends that hint at underlying structural issues. Quite often, the DQ reports are published before the reporting tables are fully processed, so it might make sense to stop processing and publishing reports if the number of data quality issues passes a certain threshold (as sketched below).
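As a concrete illustration, here is a minimal sketch that applies rules like the three above and halts publishing when an issue count crosses a threshold. The master lists, field names, and threshold value are all illustrative assumptions, not from the article:

```python
# Minimal sketch of data quality rules plus a publishing threshold.
# STATE_MASTER, ZIP_TO_STATE and the field names are illustrative stand-ins.

STATE_MASTER = {"NJ": "New Jersey", "NY": "New York"}   # lookup for corrections
ZIP_TO_STATE = {"07054": "NJ", "10001": "NY"}           # ZIP validation table

def apply_dq_rules(row, issues):
    # Rule: substitute null (None) if a decimal field holds non-numeric data.
    try:
        row["amount"] = float(row["amount"])
    except (TypeError, ValueError):
        issues.append(("non_numeric_amount", row["id"]))
        row["amount"] = None

    # Rule: validate the state against the ZIP code and correct it
    # from the master list when they disagree.
    expected_state = ZIP_TO_STATE.get(row["zip"])
    if expected_state and row["state"] != expected_state:
        issues.append(("state_zip_mismatch", row["id"]))
        row["state"] = expected_state
    return row

def load_with_threshold(rows, max_issues=100):
    """Clean all rows, but refuse to publish if DQ issues exceed the threshold."""
    issues = []
    cleaned = [apply_dq_rules(dict(r), issues) for r in rows]
    if len(issues) > max_issues:
        raise RuntimeError(f"{len(issues)} DQ issues exceed threshold; halting publish")
    return cleaned, issues
```

Logging each issue alongside the correction is what makes the handling of bad data visible to users, per the point above.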

 

Performance and Scalability (typically tested with mock-up data)

As the volume of data in a data warehouse grows, ETL load times can be expected to increase and query performance can be expected to degrade. The aim of performance testing is to expose any potential weaknesses in the ETL architecture and design, such as reading a file multiple times or creating unnecessary intermediate files. The following strategies will help discover performance issues:

 

  • Load the database with peak expected production volumes (often much higher than current production volumes, so some method of statistical extrapolation may be needed) to ensure that this volume of data can be loaded by the ETL process within the agreed-upon window.
  • Compare these ETL loading times to loads performed with smaller amounts of data to anticipate scalability issues (a minimal timing sketch follows this list).
  • Test the ETL processing time module by module to locate potential bottlenecks.
  • If parallelism is present in the ETL design, test it fully to ensure that there are no memory bottlenecks.
  • Monitor the timing of the reject/suspense process and consider how large volumes of rejected data will be handled – often this leads to source tables being fully scanned twice, which degrades performance.
  • Perform simple and complex outer-join queries to validate query performance on large database volumes. Identify acceptable performance criteria for each query and benchmark against them. This is especially important for ad-hoc reporting scenarios.
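A minimal timing harness for the first two strategies might look like the sketch below. run_etl_load is a placeholder for invoking the real load process, and the volumes and load window are illustrative:

```python
# Minimal sketch of scalability timing: run the ETL load at increasing volumes
# and compare elapsed times against an agreed load window.
import time

def run_etl_load(row_count):
    """Placeholder: invoke the ETL process on a data set of `row_count` rows."""
    time.sleep(row_count / 1_000_000)  # simulated work, for the sketch only

def profile_load(volumes, window_seconds):
    timings = {}
    for rows in volumes:
        start = time.perf_counter()
        run_etl_load(rows)
        timings[rows] = time.perf_counter() - start
    # Report each run against the agreed-upon window; a super-linear growth in
    # elapsed time across volumes is an early warning of scalability problems.
    for rows, elapsed in timings.items():
        status = "OK" if elapsed <= window_seconds else "EXCEEDS WINDOW"
        print(f"{rows:>10} rows: {elapsed:8.2f}s  {status}")
    return timings

if __name__ == "__main__":
    # Smaller runs compared against peak expected production volume (extrapolated).
    profile_load([100_000, 1_000_000, 10_000_000], window_seconds=3600)
```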

 

In part 2 of this article, we will look at Integration Testing; Audit, Logging and Error Handling Testing; Deployment Testing; User-Acceptance Testing; and Regression Testing.

 

This article by Rajesh Ramaswamy, Head of the BI/DW/Analytics practice at Marlabs, was first published at siliconindia.com.

Posted by Rajesh Ramaswamy | No Comments
Key Considerations for Developing a Mobile Business Intelligence (BI) Strategy Monday, August 13, 2012

Mobile Business Intelligence (BI) has come a long way. From automated text messages signaling the failure of a batch process or the breach of a critical threshold, we have arrived in the age of interactive BI content delivered via mobile devices. Most organizations today are either planning a mobile BI strategy or have already framed one. The mobile BI strategy can either be a subset of the organization's overall mobile strategy or an independent piece.

 

Forrester claims that mobile BI is “no longer a nice-to-have” and that BI will soon fully catch up with mobility. Gartner predicts that by 2013, 33% of BI functionality will be consumed via handheld devices.

 

What are the elements that need to be considered when drawing up your mobile BI strategy? How important are they?

 

Use Case

What are the use cases for your organization to adopt a mobile BI strategy? Most current usage of traditional BI is for strategic purposes by people in the upper echelons of the organizational pyramid. Mobile BI for strategic analysis does not really make sense, unless there is a reason your C-level executive cannot open his or her laptop. The real use cases for mobile BI lie in operational decision making. Gartner believes that “The biggest value is in operational BI information in the context of applications, not in pushing lots of data to somebody’s phone”. For example, a salesperson on the road may want to know the Next Best Product to sell to a customer. A valid use case could also be integrating device-specific features into BI. For example, a door-to-door service agent may be advised on the order in which to make house visits based on multiple parameters including proximity (GPS!), criticality, aging, etc. Whatever the use case, it is important to be clear about it and set expectations appropriately.

 

The other important factor that helps finalize the use cases is tracking the RoI. Mobile BI investments span the cost of devices as well as software, security, development, and maintenance costs, so it is important to ensure that your use cases provide tangible returns. The RoI can be quantified as revenue enhancement, margin enhancement, cost reduction, cost avoidance, or capital cost avoidance. For example, if an iPad carrying drug performance comparisons across patient profiles can increase a medical representative's quality face time with a physician by x%, your sales head would surely be comfortable promising an x/5% increase in sales. That, by itself, could fund your entire project.

Target Audience

Typically, the 70% of users who don't access traditional BI are the representative audience for mobile BI. Hence it is important to tailor your strategy to the new cross-section of people who will be your consumers. They may be less techno-savvy, more impatient to see results, and have narrower but very specific needs. The real value of mobile BI comes when users can fully interact with BI content delivered to mobile devices; there needs to be a distinction from merely informative email or text messages.

 

Mobile BI should complement your existing BI solution; it need not cover all the bases. A desktop dashboard is meant for deep analysis, while a mobile BI solution is designed for quick and easy consumption. A mobile BI solution is meant for the mobile folks in your organization: executives, sales personnel, line managers on the shop floor, etc. Tailor your mobile BI solutions to suit the needs of these people (not those of your research analyst or financial accountant).

 

Standards

Don’t ignore existing devices when framing the mobile BI strategy and attempting to arrive at organization standards. BYOD (Bring Your Own Device)/heterogeneous environments have their drawbacks, especially in terms of consistency, but the task of converting iOS fanatics to Android (and vice versa) is not a battle worth getting into. There may also be a challenge in utilizing device-specific capabilities, which is another negative that needs to be considered. HTML5 is helping bridge the divide between browser-specific and device-specific strategies.

 

 

Prioritize

There is a tendency in BI projects to say, “we are just doing some reports on the mobile; let us roll it out in a couple of weeks”. Beware! Mobile BI projects are governed by the same fundamentals as traditional BI projects – only more stringently. Garbage in is still garbage out. Visualizations that don't make sense will kill adoption. Security is even more critical. User types are a lot more critical. The mobile BI strategy should therefore be planned and executed in a well-thought-out manner, not rushed into. Ensure it doesn't get fast-tracked.

 

Expectations

A critical part of the mobile BI strategy is managing expectations. No one should expect feature parity with a traditional BI solution (drag and drop, etc.), but there are some cool new things that can be integrated (GPS, gyroscope, touch screen, etc.). Telling people what they can get and what they cannot, through training and other channels, is an integral part of the strategy.

 

Security

Security architecture needs to be revisited for mobile BI implementations and should be a critical part (if not the most critical part) of the strategy. Key business information needs to be delivered via mobile devices, else the utility of the BI project is compromised; however, this carries security considerations that need to be handled carefully. All mobile BI projects transmit data from behind an internal firewall, through a DMZ, and past an external firewall before it hits the end device. It is critical that the mobile BI strategy lays out the approach for handling the whole gamut of security considerations, including:

 

  • Transmission Security
  • Authentication
  • Authorization
  • Access
  • Device Security

 

Miscellaneous: Bandwidth, UX and Design Considerations

Some of the other factors to keep in mind when framing your mobile BI strategy include bandwidth (system and resource), mobile UX and design considerations (templates, interactivity, device features), the ability to reuse existing investments, and a strategy to reuse design for both traditional and mobile BI.

 

Mobile BI can be ignored only at your own peril; but don't just jump into the water without thorough preparation.

 

This article by Rajesh Ramaswamy, Head of the BI/DW/Analytics practice at Marlabs, was first published at siliconindia.com.

Posted by Rajesh Ramaswamy | No Comments
Creating Apps for SharePoint and Office using “Napa” Office 365 Development Tools Monday, August 6, 2012

Microsoft recently introduced “Napa”, an Office 365 development tool set for building applications for Microsoft’s Cloud App Model for Office and SharePoint. A free web-based application, Napa allows users to build apps for Office and SharePoint within a browser and without the help of Visual Studio. The tools can be downloaded by signing up for the Microsoft Office 365 Developer Preview. Once installed, users can begin creating Office and SharePoint apps within their web browsers. For more advanced tools, users can seamlessly switch to the fully featured Visual Studio IDE and continue development.

 

Here’s an article by Shiju Varghese from the Marlabs Cloud and Mobile Center of Excellence (CoE) that explains how to install Napa Office 365 Development Tools onto your Office 365 developer site and build apps for SharePoint and Office using the Napa developer tools.

 

Click here to read the article on Building Apps with Napa Office 365 Development Tools

Posted by Marlabs | No Comments
 