Projects

Sentiment Analysis

Electronic Direct Mail Precision Marketing

Period: 2022/02 to 2022/06 (5 months)

Unit: Software Development Club, National Cheng Kung University & Kdan Mobile

Summary:

Email marketing teams face a persistent challenge in reducing subscriber churn, yet most approaches rely on intuition rather than data-driven insight into which specific content patterns actually trigger unsubscribes. The team analyzed EDM (Electronic Direct Mail) campaign data by combining three methods: frequency-based context analysis to identify high-risk subject-line patterns, Logistic Regression and XGBoost models using subscriber attributes (app, email domain, spam complaint history) to predict unsubscribe likelihood, and a BERT text classification model to score any new subject line's churn risk before sending. The context analysis revealed that recurring templates such as "Tips of [Month]," "Tips & Tricks," discount-heavy exclamation-point subject lines, and "New + !" announcements were disproportionately associated with unsubscribes, while the XGBoost model achieved 85% accuracy and the BERT model reached a 93% F1-score even when trained on only 20,000 records. Together, these tools give content creators actionable guardrails: a set of subject-line templates to avoid and a real-time scoring system that could meaningfully reduce unsubscribe rates if integrated into the campaign workflow before emails are sent.
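The frequency-based context analysis can be sketched as tallying unsubscribe rates per subject-line pattern. A minimal stdlib-only illustration, where the regexes and campaign records are hypothetical stand-ins for the patterns named above, not the team's actual rules or data:

```python
import re
from collections import defaultdict

# Illustrative high-risk subject-line patterns (assumed regexes, not the
# team's production rules) matching the templates flagged in the analysis.
PATTERNS = {
    "tips_of_month": re.compile(r"\btips of \w+", re.IGNORECASE),
    "tips_and_tricks": re.compile(r"tips\s*&\s*tricks", re.IGNORECASE),
    "discount_exclaim": re.compile(r"\d+%.*!", re.IGNORECASE),
    "new_exclaim": re.compile(r"\bnew\b.*!", re.IGNORECASE),
}

def unsubscribe_rate_by_pattern(campaigns):
    """campaigns: list of (subject, sent_count, unsubscribe_count) tuples."""
    stats = defaultdict(lambda: [0, 0])  # pattern -> [unsubs, sent]
    for subject, sent, unsubs in campaigns:
        for name, rx in PATTERNS.items():
            if rx.search(subject):
                stats[name][0] += unsubs
                stats[name][1] += sent
    return {name: u / s for name, (u, s) in stats.items() if s}

# Synthetic campaign records for illustration only.
campaigns = [
    ("Tips of March: get organized", 1000, 30),
    ("Tips & Tricks for faster scans", 800, 20),
    ("50% off today only!", 1200, 60),
    ("New feature is here!", 900, 27),
    ("Your monthly digest", 1500, 9),
]
rates = unsubscribe_rate_by_pattern(campaigns)
```

Ranking `rates` then surfaces which templates carry disproportionate churn, the same signal the team's analysis produced at campaign scale.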

Micron Summer Internship, Business Planning Intern

Period: 2025/07/01 to 2025/08/31 (2 months)

Unit: Project Management Office, Micron Inc, Taiwan.

Summary:

Business planning teams at semiconductor companies like Micron face a fragmented workflow: production forecasting data, internal documentation, and personal task tracking are typically siloed across separate tools, forcing employees to context-switch constantly and making it hard to surface relevant internal resources quickly. BPMia, built during a summer internship under Micron Taiwan's Project Management Office, addresses this by combining three capabilities into a single Streamlit web app: interactive wafer production forecasting visualization, a Retrieval-Augmented Generation (RAG) agent powered by Google Gemini that answers questions grounded in internal documents, and a personal action-items tracker. The result is a unified internal productivity tool that lets business planners query production trends, locate internal portals and resources through natural language, and manage follow-up tasks, all within one interface. While the project is scoped as an intern prototype rather than a production system, its architecture (modular pages and document ingestion pipelines) provides a reusable foundation that future teams could extend with additional data sources or tighter integration into Micron's existing planning infrastructure.
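The retrieval step behind a RAG agent like BPMia's can be sketched without any model API: rank pre-chunked document passages by similarity to the query, then hand the top passages to the generator (Gemini, in BPMia's case) as grounding context. A stdlib-only sketch using bag-of-words cosine similarity; the chunks and query are hypothetical examples, and BPMia's actual embedding and ingestion pipeline is assumed to be more sophisticated:

```python
import math
from collections import Counter

def tokenize(text):
    return [t for t in text.lower().split() if t.isalnum()]

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the top-k chunks most similar to the query."""
    q = Counter(tokenize(query))
    scored = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))),
                    reverse=True)
    return scored[:k]

# Hypothetical internal-document chunks for illustration.
chunks = [
    "Wafer production forecasts are updated weekly on the planning portal",
    "Submit travel expenses through the finance system",
    "The planning portal hosts forecast dashboards for each fab",
]
top = retrieve("where can I find the wafer production forecast portal", chunks)
```

The retrieved passages would then be inserted into the generation prompt so the agent's answer stays grounded in internal documents rather than the model's general knowledge.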

Computer Vision

Satellite Cloud Image Prediction Using ConvLSTM

Period: 2022/02 to 2022/06 (5 months)

Unit: Department of Geomatics, National Cheng Kung University

Summary:

Accurate short-term prediction of satellite cloud imagery is critical for weather forecasting, yet generating reliable future frames from spatiotemporal sequences remains a challenging deep learning problem. The team framed cloud movement as a spatiotemporal sequence prediction task and trained a ConvLSTM model, along with a Self-Attention-ConvLSTM variant, on two distinct datasets from the Himawari-8 satellite: Japan visible-channel images and Taiwan infrared-channel images. They then systematically compared three loss functions (MSE, MAE, and SSIM) across four evaluation metrics (MSE, MAE, SSIM, and PSNR). ConvLSTM consistently outperformed Self-Attention-ConvLSTM in both visual quality and quantitative metrics; while all three loss functions captured the general direction of cloud movement in the Japan visible-channel dataset, none dominated across all four metrics, and the models struggled more noticeably with the Taiwan infrared-channel data, where cloud movement trends were harder to learn. These findings suggest that loss function selection should be matched to the specific evaluation criterion and image type of interest, and that the added complexity of self-attention does not automatically improve spatiotemporal cloud prediction, pointing to the need for architectural choices targeted to the satellite channel and geographic region.
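Three of the four evaluation metrics compared above are straightforward to state precisely. A stdlib-only sketch of MSE, MAE, and PSNR on two small grayscale frames, assuming pixel values normalized to [0, 1]; SSIM is omitted here because it requires windowed local statistics:

```python
import math

def mse(a, b):
    """Mean squared error between two equal-sized 2D frames."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def mae(a, b):
    """Mean absolute error between two equal-sized 2D frames."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10 * math.log10(max_val ** 2 / m)

# Tiny 2x2 predicted/ground-truth frames for illustration.
pred  = [[0.1, 0.2], [0.3, 0.4]]
truth = [[0.1, 0.2], [0.3, 0.5]]
scores = {"mse": mse(pred, truth), "mae": mae(pred, truth),
          "psnr": psnr(pred, truth)}
```

Note the asymmetry the study exploited: MSE penalizes large per-pixel errors quadratically while MAE penalizes them linearly, so a model trained on one loss need not win on the other metric, which is consistent with no single loss dominating all four evaluations.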

MNISTMindBigData

Period: 2025/02 to 2025/06 (5 months)

Unit: Institute of Information Systems and Applications, National Tsing Hua University

Summary:

Classifying EEG signals into digit categories is an open problem in Brain-Computer Interface research; prior work on the MindBigData dataset has reported accuracies as low as 10–17%, largely due to high noise, low signal-to-noise ratio, and insufficient artifact removal. The project applied a multi-stage preprocessing pipeline of bandpass filtering (1–40 Hz), Artifact Subspace Reconstruction (ASR), and Independent Component Analysis (ICA) to the 64,344-trial MindBigData-EP dataset, then trained both a CNN directly on the cleaned EEG signals and an MLP on FFT-derived bandpower features across the delta, theta, alpha, and beta bands. The CNN achieved 18.9% accuracy (F1: 0.18) while the MLP reached 12.4% (F1: 0.10), confirming that learning spatiotemporal patterns from cleaned raw signals outperforms handcrafted frequency features, though both only modestly exceed the 10% chance baseline on this notoriously difficult dataset. These results demonstrate that rigorous artifact removal meaningfully improves EEG decoding feasibility, while pointing to model architecture improvements, data augmentation, and larger multi-subject datasets as the most promising avenues for pushing classification accuracy further.
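The bandpower features fed to the MLP can be sketched as summing spectral power within each frequency band. A stdlib-only illustration using a naive DFT (real work would use numpy/scipy FFT routines); the band edges follow the usual EEG conventions and are an assumption rather than the project's exact settings:

```python
import math

# Conventional EEG band edges in Hz (assumed, not the project's exact values).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def bandpower(signal, fs):
    """Sum DFT power per band for a single-channel signal sampled at fs Hz."""
    n = len(signal)
    powers = dict.fromkeys(BANDS, 0.0)
    for k in range(1, n // 2):  # skip DC; one-sided spectrum
        freq = k * fs / n
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        p = (re * re + im * im) / n
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += p
    return powers

# Sanity check: a pure 10 Hz sine sampled at 128 Hz should put nearly all
# of its power in the alpha band (8-13 Hz).
fs, n = 128, 128
sig = [math.sin(2 * math.pi * 10 * t / fs) for t in range(n)]
feats = bandpower(sig, fs)
```

One such four-value feature vector per channel is what the MLP consumed; the result that the CNN on cleaned raw signals still won suggests this compression discards temporal structure the task needs.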