Portfolio
Real-world project experience in data engineering, including ETL pipeline design and data visualization.

Real Estate Price Registration Visualization
Interactive dashboard: explore the data by clicking, dragging, and using the filters below.
1. Data Collection and Web Scraping
Developed a Python web scraper to collect real estate transaction data from the price registration system, and implemented coordinate conversion to turn addresses into latitude and longitude coordinates.

import requests
import xml.etree.ElementTree as ET

def convert_address_to_coordinates(address):
    """
    Convert a Taiwanese address into coordinates using the
    Ministry of the Interior's address-lookup API.
    Returns a [longitude, latitude] list, or None on failure.
    """
    # API endpoint
    api_url = f"https://api.nlsc.gov.tw/idc/TextQueryMap/{address}"
    # Request headers
    headers = {
        "Content-Type": "application/xml",
        "Referer": "victor.proagent.com.tw"
    }
    try:
        # Send the API request
        response = requests.get(api_url, headers=headers)
        response.raise_for_status()
        # Parse the XML response
        root = ET.fromstring(response.text)
        coordinates = root.find('.//LOCATION').text
        # Return as a [longitude, latitude] list
        return coordinates.split(",")
    except requests.exceptions.RequestException as e:
        print(f"Address conversion failed: {e}")
        return None
This function converts Taiwanese addresses into latitude and longitude coordinates using the Ministry of the Interior's address conversion API. It is a crucial part of the data processing pipeline for geographic visualization.
2. Data Cleaning and Preprocessing
Cleaned and standardized the raw scraped data, handled missing values and outliers, and performed the necessary data transformations and encoding. Main tasks included:
- Remove duplicate and invalid records
- Standardize date and amount formats
- Handle special characters and encoding issues
- Supplement missing geographic information
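The cleaning steps above can be sketched with pandas. This is a minimal illustration, not the production code: the column names, the thousands-separated price strings, and the ROC-calendar date format are all assumptions about what the scraped data looks like.

```python
import pandas as pd

# Hypothetical raw rows from the scraper (column names are assumed)
df = pd.DataFrame({
    "trade_date": ["1120315", "1120315", "1120801", None],
    "total_price": ["12,500,000", "12,500,000", "9,800,000", "5,000,000"],
})

# Remove exact duplicates, then drop rows missing the transaction date
df = df.drop_duplicates().dropna(subset=["trade_date"])

# Standardize amounts: strip thousands separators, cast to integer
df["total_price"] = df["total_price"].str.replace(",", "").astype(int)

# Convert ROC-calendar dates (e.g. 1120315 = 2023-03-15) to ISO format
def roc_to_iso(s):
    return f"{int(s[:3]) + 1911}-{s[3:5]}-{s[5:7]}"

df["trade_date"] = df["trade_date"].map(roc_to_iso)
```

The same pattern extends to outlier filtering (e.g. dropping rows whose unit price falls outside a plausible range) before the data moves on to the ETL stage.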
3. ETL Process Implementation
Designed and implemented ETL processes to store the processed data in a SQL database, and established an automated workflow to keep the data current and consistent.
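The load step of the ETL process can be sketched as follows. An in-memory SQLite database stands in here for the production SQL database, and the table name and columns are illustrative assumptions:

```python
import sqlite3

# Cleaned rows produced by the preprocessing step (illustrative values)
rows = [
    ("Daan Dist., Taipei City", 12500000),
    ("Banqiao Dist., New Taipei City", 9800000),
]

# SQLite stands in for the production SQL database in this sketch
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (address TEXT, total_price INTEGER)"
)
# Load all cleaned rows in one batch
conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
```

In production the same insert logic runs on a schedule, so each scraping cycle lands in the database without manual steps.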
4. Tableau Visualization Design
Created interactive dashboards in Tableau, including:
- Housing price trend analysis
- Geographic distribution map
- Housing type and price relationship analysis
- Transaction volume trends
5. System Maintenance and Optimization
Responsible for maintaining and optimizing the entire system, including data update mechanisms, performance tuning, and continuous improvement of the visualizations based on user feedback.

Presale Registration ETL System
1. Automated Data Collection
Developed an automated web scraping system using Selenium and undetected-chromedriver to collect presale price registration data from the government website. The system handles anti-bot measures and ensures reliable data extraction.
import undetected_chromedriver as uc
from selenium import webdriver

def initialize_driver():
    # Configure Chrome to reduce automation fingerprints
    options = webdriver.ChromeOptions()
    options.add_argument("--disable-notifications")
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument("--headless")
    driver = uc.Chrome(options=options)
    # Open the price registration site
    driver.get('https://lvr.land.moi.gov.tw/')
    return driver
2. Data Processing Pipeline
Implemented a comprehensive data cleaning and transformation pipeline that includes:
- Standardization of building materials and property types
- Calculation of property areas and parking spaces
- Address normalization and coordinate conversion
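A minimal sketch of the standardization step above. The field names, the material-category mapping, and the 坪-to-square-meter conversion applied here are illustrative assumptions about the presale records:

```python
# Hypothetical mapping; the real categories come from the registration data
MATERIAL_MAP = {
    "鋼筋混凝土造": "RC",        # reinforced concrete
    "鋼骨造": "SC",              # steel construction
    "加強磚造": "reinforced brick",
}

PING_TO_SQM = 3.3058  # 1 坪 (ping) ≈ 3.3058 m²

def normalize_record(record):
    """Standardize one scraped record (field names are assumed)."""
    return {
        # Map raw material strings to a small set of categories
        "material": MATERIAL_MAP.get(record["building_material"], "other"),
        # Convert area from ping to square meters
        "area_sqm": round(record["area_ping"] * PING_TO_SQM, 2),
        # Parking count may be missing or stored as a string
        "parking_spaces": int(record.get("parking_count") or 0),
    }
```

Applying this per record keeps the downstream schema uniform regardless of how each source page formats its fields.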
3. Airflow DAG Implementation
Designed and implemented an Airflow DAG to orchestrate the ETL process with:
- Scheduled data collection (1st, 11th, and 21st of each month)
- Automated data cleaning and transformation
- Coordinate data enrichment
from airflow import DAG
from datetime import datetime

# Minimal task defaults; the real default_args may include retries, alerts, etc.
default_args = {
    'owner': 'airflow',
}

with DAG(
    'regist_presale',
    default_args=default_args,
    description='ETL process for presale data',
    schedule_interval='0 0 1,11,21 * *',  # 00:00 on the 1st, 11th, 21st
    start_date=datetime(2024, 11, 1),
    catchup=False
) as dag:
    # task_etl, task_clean and task_coords are operators defined in the DAG file
    task_etl >> task_clean >> task_coords
4. Database Integration
Implemented MySQL database integration for storing the processed data, with:
- Efficient data storage and retrieval
- Duplicate detection and handling
- Data versioning and history tracking
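One common way to implement the duplicate handling described above is an upsert keyed on the registration case number. This sketch uses SQLite's `ON CONFLICT ... DO UPDATE` as a stand-in; MySQL achieves the same effect with `INSERT ... ON DUPLICATE KEY UPDATE`. The table, the `case_id` key, and the `updated` counter are all illustrative assumptions:

```python
import sqlite3

# SQLite stands in for MySQL here; the schema is illustrative
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE presale (
        case_id TEXT PRIMARY KEY,   -- unique registration case number
        total_price INTEGER,
        updated INTEGER DEFAULT 0   -- how many times the row was revised
    )
""")

def upsert(case_id, price):
    # Insert a new row, or update the price when the case already exists
    conn.execute(
        """INSERT INTO presale (case_id, total_price) VALUES (?, ?)
           ON CONFLICT(case_id) DO UPDATE SET
               total_price = excluded.total_price,
               updated = updated + 1""",
        (case_id, price),
    )

upsert("A123", 9_800_000)
upsert("A123", 9_900_000)  # duplicate case: updates instead of inserting
row = conn.execute("SELECT total_price, updated FROM presale").fetchone()
```

Keying on the case number means re-scraping a period never produces duplicate rows, while the `updated` counter gives a cheap form of history tracking.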
5. System Monitoring and Maintenance
Established monitoring and maintenance procedures, including error handling, logging, and performance optimization, to ensure system reliability.
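The error handling and logging described above can be sketched as a small retry wrapper around each pipeline step; the helper name and retry policy here are assumptions, not the system's actual code:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("presale_etl")

def with_retries(step, attempts=3, delay=1.0):
    """Run one pipeline step, logging failures and retrying transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            # Log every failure so recurring problems show up in the logs
            logger.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(delay)
```

Wrapping the scraping and load steps this way keeps one flaky request from failing an entire scheduled run, while the log lines make recurring failures visible.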