An overview of data warehouse design, architecture and concepts
Picture a warehouse. Not an industrial warehouse with forklifts driving through endless rows of shelves filled with dusty pallets of unknown goods. Picture a modern warehouse, the kind Amazon uses to fulfill its over 1 billion orders a year. It is just as big as the old industrial warehouses, but a few things stand out: it's clean, automated and incredibly efficient.
A Data Warehouse is analogous to this idea of a modern warehouse. The goods, in our case "data", are brought into the warehouse from various sources. Instead of boxes of toothbrushes, headphones and garden tools, a Data Warehouse contains data from user profiles, transactions, browser history, marketing campaigns, staffing history, call volume, etc. The various forms of data are organized within the warehouse such that any of it can be gathered quickly for analysis.
The end result is a large, varied collection of data that organizations can analyze to support management decision-making.
As noted by William Inmon, data warehouses are integrated, nonvolatile, time-variant and subject-oriented systems.
Since data is gathered from a variety of sources, inconsistencies often need to be fixed. Once the data is stored consistently, it is much easier to extract meaningful reports about an organization's operations.
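To make the idea of fixing inconsistencies concrete, here is a minimal sketch in Python. The two source systems, their field names and the sample rows are all hypothetical, invented only for illustration; a real warehouse would typically do this work in a dedicated loading pipeline rather than in hand-written functions.

```python
from datetime import date, datetime

# Hypothetical rows from two source systems that describe the same customer
# but disagree on field names, value encodings and date formats.
crm_rows = [{"customer_id": "17", "gender": "F", "signup": "2021-03-04"}]
billing_rows = [{"cust": 17, "sex": "female", "created": "04/03/2021"}]

# One agreed-upon encoding for the warehouse (an assumption for this sketch).
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def normalize_crm(row):
    """Map a CRM row onto the warehouse's common layout."""
    return {
        "customer_id": int(row["customer_id"]),
        "gender": GENDER_CODES[row["gender"].lower()],
        "signup_date": date.fromisoformat(row["signup"]),
    }


def normalize_billing(row):
    """Map a billing row onto the same layout (dates arrive as DD/MM/YYYY)."""
    return {
        "customer_id": int(row["cust"]),
        "gender": GENDER_CODES[row["sex"].lower()],
        "signup_date": datetime.strptime(row["created"], "%d/%m/%Y").date(),
    }


integrated = [normalize_crm(r) for r in crm_rows] + [
    normalize_billing(r) for r in billing_rows
]
```

After normalization, both rows use the same keys, types and encodings, so a report can treat them as one consistent data set regardless of where they came from.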
One of the reasons data warehouses are often so large is that once data is entered, it's never removed. Nonvolatile data doesn't change after it is stored; new data is simply added alongside it on a regular basis.
Since the data is nonvolatile, and new data can be added at any time, all data must have some sort of time associated with it to prevent conflicts and allow for reporting on historical business trends.
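The combination of nonvolatile and time-stamped data can be sketched as an append-only table. This is only an illustration under stated assumptions: the class name AppendOnlyTable, its methods and the sample prices are hypothetical, and real warehouses implement the same idea with database tables rather than in-memory lists.

```python
from datetime import datetime, timezone


class AppendOnlyTable:
    """Toy table: rows are only ever added, and every row carries a timestamp."""

    def __init__(self):
        self._rows = []

    def append(self, key, value, loaded_at=None):
        # Default to "now" when the caller does not supply a load time.
        if loaded_at is None:
            loaded_at = datetime.now(timezone.utc)
        self._rows.append({"key": key, "value": value, "loaded_at": loaded_at})

    def as_of(self, key, moment):
        """Return the value that was current for `key` at `moment`, or None."""
        candidates = [
            r for r in self._rows
            if r["key"] == key and r["loaded_at"] <= moment
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r["loaded_at"])["value"]

    def history(self, key):
        """Return every (timestamp, value) pair recorded for `key`, oldest first."""
        rows = sorted(
            (r for r in self._rows if r["key"] == key),
            key=lambda r: r["loaded_at"],
        )
        return [(r["loaded_at"], r["value"]) for r in rows]


# Hypothetical example: a product price changes, but the old price is kept.
prices = AppendOnlyTable()
prices.append("widget", 9.99, datetime(2023, 1, 1, tzinfo=timezone.utc))
prices.append("widget", 12.49, datetime(2024, 1, 1, tzinfo=timezone.utc))

mid_2023 = prices.as_of("widget", datetime(2023, 6, 1, tzinfo=timezone.utc))
mid_2024 = prices.as_of("widget", datetime(2024, 6, 1, tzinfo=timezone.utc))
```

Because nothing is overwritten, the timestamp is what distinguishes the two price rows, and a question such as "what did this cost in mid-2023?" can still be answered after the price has changed.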
Since the goal of a data warehouse is analytics (not processing transactions or other time sensitive tasks), the data can be structured in databases according to the subject matter.