What is data warehousing?
Data warehousing is the electronic storage of a large amount of information by a business or organization. Data warehousing is a vital component of business intelligence that uses analytical techniques on business data.
The concept of data warehousing was introduced in 1988 by IBM researchers Barry Devlin and Paul Murphy. The need to store data has evolved as computer systems have become more complex and handle increasing amounts of data. A key work on data warehousing is “Building the Data Warehouse” by W. H. Inmon, which was first published in 1990 and has been reprinted several times since.
How data warehousing works
Data warehousing provides a better understanding of a company’s performance by comparing consolidated data from several heterogeneous sources. A data warehouse is designed to run queries and analyses on historical data derived from transactional sources.
Once data has been loaded into the warehouse, it is not changed, because a data warehouse analyzes events that have already occurred, with a focus on how data changes over time. Warehoused data must be stored securely and reliably, and it must be easy to retrieve and manage.
Certain steps are necessary to create a data warehouse. The first step is data extraction, which involves collecting large amounts of data from multiple source points. Once the data is compiled, it goes through data cleaning, the process of finding errors in the data and correcting or excluding the errors found.
The cleaned data is then converted from a database format to a warehouse format. Once stored in the warehouse, the data is sorted, consolidated, and summarized so that it is more coordinated and easier to use. Over time, more data is added to the warehouse as the various data sources are updated.
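The steps described above (extraction, cleaning, conversion to a warehouse format, and summarization) can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the source records, the `sales_fact` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records from two source systems (illustrative data only).
crm_rows = [
    {"customer": " Alice ", "amount": "120.50", "date": "2023-01-15"},
    {"customer": "Bob", "amount": "n/a", "date": "2023-01-16"},  # invalid amount
]
shop_rows = [
    {"customer": "alice", "amount": "30", "date": "2023-02-01"},
    {"customer": "", "amount": "15", "date": "2023-02-02"},  # missing name
]


def extract():
    """Extraction: collect rows from every source."""
    return crm_rows + shop_rows


def clean(rows):
    """Cleaning: normalize names and exclude rows that contain errors."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # exclude rows whose amount is not a number
        if not name:
            continue  # exclude rows with no customer name
        cleaned.append((name, amount, row["date"]))
    return cleaned


def load(conn, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


def summarize(conn):
    """Summarization: total sales per customer."""
    return conn.execute(
        "SELECT customer, SUM(amount) FROM sales_fact "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(conn, clean(extract()))
summary = summarize(conn)
print(summary)  # [('Alice', 150.5)]
```

Here the two rows with errors are excluded during cleaning, and the two spellings of the same customer are consolidated into a single total.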
Key points to remember
- Data warehousing is the electronic storage of a large amount of information by a business or organization.
- A data warehouse is designed to run queries and analyses on historical data derived from transactional sources for business intelligence and data mining purposes.
- Data warehousing provides a better understanding of a company’s performance by comparing consolidated data from several heterogeneous sources.
Special considerations: data mining
Businesses can store data for exploration and data mining, looking for patterns of information that will help them improve their business processes. A good data warehousing system can also make it easier for different departments within a company to access each other’s data.
For example, a data warehouse can make it easy for a business to assess sales team data and help make decisions about how to improve sales or streamline service. The company could choose to focus on the spending habits of its customers to better position its products and increase sales.
With data warehousing, the business can collect historical data on customer spending for the past 20 years, for example, and run analyses on that data. The information obtained could give an overview of its customers’ preferences; the time of day, month, or year with the highest sales; or which customers spent the most in a given year.
Efficient data storage and management are also what make processes such as booking travel and using ATMs possible.
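As a rough illustration of this kind of analysis, the sketch below runs three queries over a small table of past purchases. The `purchases` table, its columns, and the sample rows are assumptions made up for the example, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (customer TEXT, amount REAL, purchased_at TEXT)"
)
# Hypothetical historical purchases (illustrative data only).
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("Alice", 40.0, "2022-12-03 18:30:00"),
        ("Bob", 25.0, "2022-12-10 09:15:00"),
        ("Alice", 60.0, "2023-12-05 19:00:00"),
        ("Carol", 10.0, "2023-06-21 12:00:00"),
    ],
)

# Month of the year with the highest total sales, across all years.
best_month = conn.execute(
    """
    SELECT strftime('%m', purchased_at) AS month, SUM(amount) AS total
    FROM purchases GROUP BY month ORDER BY total DESC LIMIT 1
    """
).fetchone()

# Hour of the day with the highest total sales.
best_hour = conn.execute(
    """
    SELECT strftime('%H', purchased_at) AS hour, SUM(amount) AS total
    FROM purchases GROUP BY hour ORDER BY total DESC LIMIT 1
    """
).fetchone()

# Customer who spent the most in 2023.
top_customer_2023 = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM purchases WHERE strftime('%Y', purchased_at) = '2023'
    GROUP BY customer ORDER BY total DESC LIMIT 1
    """
).fetchone()

print(best_month)         # ('12', 125.0)
print(best_hour)          # ('19', 60.0)
print(top_customer_2023)  # ('Alice', 60.0)
```

Queries like these only work because the warehouse keeps every past purchase rather than just the latest state of each account.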
The data mining process is broken down into five stages:
- Organizations collect data and upload it to their data warehouses.
- They then store and manage the data, either on internal servers or in the cloud.
- Business analysts, management teams, and IT professionals access data and determine how they want to organize it.
- Application software then sorts the data based on user results.
- Finally, the end user presents the data in an easy-to-share format, such as a graph or table.
Data warehousing and databases
A data warehouse is not necessarily the same concept as a standard database. A database is a transactional system that is configured to monitor and update data in real time so that only the most recent data is available. A data warehouse is designed to aggregate structured data over a period of time. For example, a database may have only the most recent address of a customer, while a data warehouse may have all the addresses at which the customer has lived in the past 10 years.
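The address example can be made concrete with a small sketch. It assumes a simplified model in which the operational database is a dictionary that is overwritten on every change, while the warehouse is an append-only list. The names `operational_db`, `warehouse_history`, and `record_move` are hypothetical and exist only for this illustration.

```python
from datetime import date

# Operational database: one entry per customer, overwritten on each change.
operational_db = {}

# Warehouse: append-only history; existing records are never modified.
warehouse_history = []


def record_move(customer_id, address, moved_on):
    """Apply an address change to both systems."""
    operational_db[customer_id] = address  # keeps only the latest address
    warehouse_history.append(
        {"customer_id": customer_id, "address": address, "valid_from": moved_on}
    )


record_move(1, "12 Oak St", date(2015, 3, 1))
record_move(1, "98 Pine Ave", date(2019, 7, 15))
record_move(1, "5 Elm Rd", date(2023, 1, 10))

current = operational_db[1]
history = [r["address"] for r in warehouse_history if r["customer_id"] == 1]

print(current)  # 5 Elm Rd
print(history)  # ['12 Oak St', '98 Pine Ave', '5 Elm Rd']
```

After three moves, the operational store can answer only "where does the customer live now?", while the warehouse can still answer "where has the customer lived, and since when?".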