Objective
This comprehensive data warehousing project uses dimensional modeling to examine the relationship between various economic variables and Gross Domestic Product (GDP) to gain insights into national economic growth factors. The study draws upon a comprehensive dataset compiled from reputable sources, including the U.S. Bureau of Economic Analysis (BEA), Bureau of Labor Statistics (BLS), Federal Reserve Economic Data (FRED), and Congressional Budget Office (CBO). The analysis aims to find patterns, trends, and associations by applying correlation, descriptive statistics, and visualization techniques on the organized data.
Methodology
Data preparation included aggregating the data from separate sources and taking the resulting thirty-four tables to be reduced to nine by merging them across the categories of consumer price index (CPI), federal reserve debt (FED_DEBT), federal reserve rate (FED_RATE), gross domestic product (GDP), mortgage delinquency rates (MORT), savings deposits (M2), personal consumer expenditures (PCE), unemployment rates (UNEMP), and median sales price of houses sold in the United States (MSPUS). The merging operation was based on the common column of RECORD_DATE, which provided a consistent timeline across all tables.
The dimensional model (featured on bottom) features dimension tables surrounding the central fact table with each dimension table linked to the fact table by date. This star schema design simplifies querying and provides fast data retrieval for complex queries. The data cubes represent the multidimensional views of this data, facilitated by the use of ROLLUP, CUBE, and other aggregate functions
Conclusions
Correlation between Unemployment Rate and Federal Interest Rate
One result from our exploratory data analysis was the discovery of a moderately strong negative correlation (-0.611) between the unemployment rate and the federal interest rate. This suggests that as the unemployment rate increases, the federal interest rate tends to decrease, and vice versa. The relationship between unemployment and interest rates in reality is influenced by a multitude of factors, including the policies of the Federal Reserve, which typically lowers interest rates during periods of high unemployment to stimulate economic activity.
This correlation was efficiently computed through our dimensional model by joining the FACT_DATA table with the UNEMP and FED_RATE dimension tables. This demonstrates the power of dimensional modeling and its capacity to generate meaningful insights.
Benefits of Pre-Aggregation
We utilized the power of pre-aggregated data with our quarterly rolled-up tables, such as "CPI_Q", "GDP_Q", etc. An example to highlight is the simplicity of computing averages over the year when data is pre-aggregated by quarter. Without pre-aggregation, we'd need to implement complex calculations to average over the correct date intervals ourselves. The usage of pre-aggregated tables substantially enhances the performance of queries and makes it easier to perform temporal analysis.
We utilized the power of pre-aggregated data with our quarterly rolled-up tables, such as "CPI_Q", "GDP_Q", etc. An example to highlight is the simplicity of computing averages over the year when data is pre-aggregated by quarter. Without pre-aggregation, we'd need to implement complex calculations to average over the correct date intervals ourselves. The usage of pre-aggregated tables substantially enhances the performance of queries and makes it easier to perform temporal analysis.
Skills
- Dimensional modeling
- Data collection
- Extract-transform-load (ETL) on large datasets
- Query optimization
- Visualization
- Data collection
- Extract-transform-load (ETL) on large datasets
- Query optimization
- Visualization
