project // econometrics // multiple linear regression
Beijing Housing Price Prediction (MLR)
An econometrics project that models Beijing housing prices with multiple linear regression (MLR) on 300k+ records from 2011 to 2017. The workflow emphasizes interpretable modeling, diagnostic testing, and a reproducible data-cleaning pipeline rather than black-box prediction, with each model refinement backed by a statistical test.
R
Multiple Linear Regression
VIF + ANOVA
Residual Diagnostics
Role: Researcher & Model Builder, End-to-End Analysis
Scope: Econometric Modeling / Feature Analysis
Data: Lianjia Beijing (2011-2017)
What I Built
lifecycle // planning → implementation → deployment
- Built a full preprocessing pipeline (encoding cleanup, outlier filtering, missing-value handling) over 300k+ Beijing housing observations.
- Fit MLR models with log and square-root transformations, checked multicollinearity with VIF, and refined the variable set using ANOVA partial F-tests and subset selection.
- Validated model assumptions with residuals-vs-fitted and Q-Q plots, then documented interpretable effects of key factors on price (rooms, subway access, build year, community context).
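The modeling steps above can be sketched in a few lines of base R. This is a minimal illustration on synthetic data, not the project's actual code: the column names (`square`, `rooms`, `subway`, `build_year`, `noise_var`), the coefficients, and the helper `vif_manual` are all assumptions made for the example. The VIF is computed by hand from its definition so the sketch has no package dependencies; `car::vif` would be the usual packaged equivalent.

```r
# Synthetic stand-in data. Column names and coefficients are hypothetical
# and are NOT taken from the Lianjia dataset.
set.seed(42)
n <- 500
square     <- runif(n, 30, 200)                                # floor area
rooms      <- pmax(1, round(square / 35 + rnorm(n, sd = 0.5))) # correlated with area
subway     <- rbinom(n, 1, 0.6)                                # 1 = near a subway station
build_year <- sample(1980:2016, n, replace = TRUE)
noise_var  <- rnorm(n)                                         # deliberately irrelevant predictor
log_price  <- 10 + 0.5 * log(square) + 0.05 * rooms + 0.10 * subway +
  0.004 * (build_year - 2000) + rnorm(n, sd = 0.15)
homes <- data.frame(price = exp(log_price), square, rooms,
                    subway, build_year, noise_var)

# Step 1: fit an MLR with a log-transformed response and a log-transformed area.
full <- lm(log(price) ~ log(square) + rooms + subway + build_year + noise_var,
           data = homes)

# Step 2: variance inflation factors, VIF_j = 1 / (1 - R_j^2), where R_j^2
# comes from regressing predictor j on all the other predictors.
vif_manual <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(X), function(j) {
    others <- X[, colnames(X) != j, drop = FALSE]
    r2 <- summary(lm(X[, j] ~ others))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_manual(full)
print(round(vifs, 2))

# Step 3: partial F-test. Does dropping noise_var significantly worsen the fit?
reduced <- update(full, . ~ . - noise_var)
f_test  <- anova(reduced, full)
print(f_test)

# Step 4: residual diagnostics (drawn only in an interactive session).
if (interactive()) {
  plot(fitted(full), resid(full),
       xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  qqnorm(resid(full))
  qqline(resid(full))
}
```

A large VIF for a predictor (common rules of thumb flag values above 5 or 10) suggests it is nearly a linear combination of the others. A large p-value in the partial F-test suggests the dropped term can be removed without a significant loss of fit.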
Tech Stack
core tools
R / Tidyverse: cleaning / transforms
car + leaps + broom: VIF / subset selection / diagnostics
ggplot2 + GGally: residual / Q-Q / model plots
Project Links
source + article