Hello, I am Viet-Thai Nguyen

Welcome to my data playground

Music Source Separation & Karaoke

Deep Learning · Audio Segmentation · MLOps · Web Dev

MMaVVie is an application with two main features: audio separation and karaoke. Its core is a music source separation model built on the U-Net architecture, which the Streamlit web app calls through FastAPI endpoints. External APIs such as Whisper and GeniusLyrics are also used for the karaoke feature.
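A common way U-Net separation models work is by predicting a soft mask over the mixture's magnitude spectrogram; multiplying the two gives the estimate of one source. The sketch below assumes that formulation (it is not confirmed as MMaVVie's exact design) and uses plain nested lists in place of real spectrograms; `apply_soft_mask` and `residual` are hypothetical helper names.

```python
# Sketch only: assumes a U-Net that outputs a soft vocal mask in [0, 1]
# for every time-frequency bin of the mixture's magnitude spectrogram.
# Spectrograms are nested lists (frequency rows x time columns) so this
# runs without NumPy.

Spectrogram = list[list[float]]


def apply_soft_mask(mixture: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Estimate one source by scaling each mixture bin by its mask value."""
    if len(mixture) != len(mask) or any(
        len(m_row) != len(k_row) for m_row, k_row in zip(mixture, mask)
    ):
        raise ValueError("mixture and mask must have the same shape")
    return [
        # Clamp to [0, 1] so the estimate never exceeds the mixture magnitude
        [value * min(max(weight, 0.0), 1.0) for value, weight in zip(m_row, k_row)]
        for m_row, k_row in zip(mixture, mask)
    ]


def residual(mixture: Spectrogram, source: Spectrogram) -> Spectrogram:
    """Treat whatever the vocal estimate did not claim as the accompaniment."""
    return [
        [max(value - part, 0.0) for value, part in zip(m_row, s_row)]
        for m_row, s_row in zip(mixture, source)
    ]


mixture = [[1.0, 2.0], [4.0, 0.5]]
vocal_mask = [[0.5, 0.0], [0.25, 1.0]]
vocals = apply_soft_mask(mixture, vocal_mask)  # [[0.5, 0.0], [1.0, 0.5]]
accompaniment = residual(mixture, vocals)      # [[0.5, 2.0], [3.0, 0.0]]
```

In a full pipeline under the same assumption, the mixture would first go through an STFT, the model would produce the mask, and the masked magnitude would be turned back into audio with the mixture's phase; serving that as a FastAPI endpoint keeps the Streamlit front end free of model code.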

Kindle Book Reviews ML Application

NLP · Sentiment Analysis · MLOps · Data Validation · Web Dev

Built an NLP model for sentiment analysis of Amazon Kindle book reviews. Deployed it behind a Streamlit app that lets users interact with the model through FastAPI and PostgreSQL. Created Airflow jobs that ingest raw data and run batch predictions on the validated data. The pipeline is monitored with a Grafana dashboard.
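The validation step that sits between ingestion and batch prediction can be sketched as a function that splits a raw batch into rows safe to score and rows to reject. The field names (`review_text`, `rating`) and rules below are hypothetical; the project's actual schema and checks are not described here. In Airflow, a callable like `split_batch` could run as its own task (for example via a `PythonOperator`) ahead of the prediction task.

```python
from typing import Any

# Hypothetical schema: the real pipeline's column names and rules may differ.
REQUIRED_FIELDS = ("review_text", "rating")


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return the problems found in one raw review row (empty list if valid)."""
    errors = [f"missing {field}" for field in REQUIRED_FIELDS if row.get(field) is None]
    text = row.get("review_text")
    if text is not None and (not isinstance(text, str) or not text.strip()):
        errors.append("review_text is empty")
    rating = row.get("rating")
    # bool is a subclass of int in Python, so reject it explicitly
    if rating is not None and (
        isinstance(rating, bool) or not isinstance(rating, int) or not 1 <= rating <= 5
    ):
        errors.append("rating must be an integer from 1 to 5")
    return errors


def split_batch(rows: list[dict[str, Any]]):
    """Separate a raw batch into rows to predict on and rows to quarantine."""
    valid, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected
```

Keeping the rejected rows together with their reasons, rather than silently dropping them, gives a monitoring dashboard something concrete to count.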

Starbucks Promotional Offers Analysis

Machine Learning · EDA · Blogging

Posed business questions and analyzed Starbucks' marketing strategy to find the best promotional offer, using several prediction models with hyperparameter tuning in Python and Scikit-learn. The EDA, preprocessing, and modelling steps are explained in a Medium post.

Disaster Response Pipeline

Data Engineering · ETL · Machine Learning · Web Dev

To help organizations respond to disaster events faster and more accurately, I used Python to build an ETL pipeline that processes the raw data, an ML pipeline that classifies messages into categories, and a Flask application for entering a new message and visualizing the data.
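The transform step of such an ETL pipeline typically turns raw label strings into one binary column per category and joins them to the messages. The sketch below assumes the labels arrive as a single semicolon-separated string of `name-value` pairs (a hypothetical format here), and uses plain dicts instead of DataFrames so it has no dependencies; `parse_categories` and `merge_messages` are illustrative names.

```python
def parse_categories(raw: str) -> dict[str, int]:
    """Turn 'related-1;request-0' into {'related': 1, 'request': 0}."""
    labels: dict[str, int] = {}
    for entry in raw.split(";"):
        # Split on the last '-' so the numeric value is always the tail
        name, sep, value = entry.rpartition("-")
        if not sep or not name:
            raise ValueError(f"malformed category entry: {entry!r}")
        # Keep every label binary for multi-label classification
        labels[name] = 1 if int(value) > 0 else 0
    return labels


def merge_messages(messages: list[dict], categories: list[dict]) -> list[dict]:
    """Join messages to their parsed labels on 'id', dropping duplicate ids."""
    labels_by_id = {row["id"]: parse_categories(row["categories"]) for row in categories}
    merged, seen = [], set()
    for msg in messages:
        msg_id = msg["id"]
        if msg_id in seen or msg_id not in labels_by_id:
            continue
        seen.add(msg_id)
        merged.append({"id": msg_id, "message": msg["message"], **labels_by_id[msg_id]})
    return merged
```

The resulting flat rows (message text plus one 0/1 column per category) are the shape a multi-output classifier in the ML pipeline would expect.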

English-Vietnamese Machine Translation

Deep Learning · NLP · Machine Translation

Built a Seq2Seq model, trained on cleaned data, that translates English sentences into Vietnamese. Improved its predictions with an attention mechanism and LSTM/GRU encoder-decoder architectures.
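At each decoding step, attention scores every encoder hidden state against the current decoder state, normalizes the scores with a softmax, and feeds the weighted sum (the context vector) into the prediction. Which scoring function the project used is not stated; the sketch below shows Luong-style dot-product attention for a single step, with plain lists standing in for the LSTM/GRU hidden states.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(
    decoder_state: list[float], encoder_states: list[list[float]]
) -> tuple[list[float], list[float]]:
    """Dot-product attention for one decoder step.

    Returns the context vector (weighted sum of encoder states) and the
    attention weights over source positions.
    """
    # Score each source position by its dot product with the decoder state
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the encoder hidden states
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(len(decoder_state))
    ]
    return context, weights


context, weights = dot_attention([1.0, 0.0], [[0.0, 1.0], [0.0, 2.0]])
print(weights)  # [0.5, 0.5]
print(context)  # [0.0, 1.5]
```

Here both source positions score equally, so the context is a plain average; in a trained model the weights concentrate on the source words most relevant to the Vietnamese word being generated.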

Adult Census Income Prediction

Machine Learning · EDA · Blogging

Analyzed the impact of education and other features on American adults' incomes using Pandas and Seaborn, then identified who is more likely to earn a higher income by comparing several Scikit-learn prediction models.

AWS Data Warehouse Pipeline

Data Engineering · ETL · AWS

Built an ETL pipeline for the Sparkify database hosted on Redshift: data is loaded from S3 into staging tables on Redshift, and SQL statements then build the analytics tables from those staging tables.
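The two-stage load can be sketched as a Redshift `COPY` per staging table followed by an `INSERT ... SELECT` that fills an analytics table. Every table, bucket, role ARN, and column name below is hypothetical, and the statements are only built and passed to a DB-API cursor (such as one from psycopg2), so the snippet runs without a cluster.

```python
# Hypothetical bucket paths and table names for illustration only.
STAGING_TABLES = {
    "staging_events": "s3://example-bucket/log_data",
    "staging_songs": "s3://example-bucket/song_data",
}


def copy_statement(table: str, s3_path: str, iam_role_arn: str, region: str) -> str:
    """Build a Redshift COPY that bulk-loads JSON files from S3 into a staging table."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS JSON 'auto' "
        f"REGION '{region}';"
    )


# The analytics table is then built with plain SQL inside Redshift
# (hypothetical column names; epoch milliseconds converted to a timestamp).
SONGPLAYS_INSERT = """
INSERT INTO songplays (start_time, user_id, song_id, artist_id)
SELECT TIMESTAMP 'epoch' + e.ts / 1000 * INTERVAL '1 second',
       e.user_id, s.song_id, s.artist_id
FROM staging_events AS e
JOIN staging_songs AS s
  ON e.song = s.title AND e.artist = s.artist_name
WHERE e.page = 'NextSong';
"""


def run_etl(cursor, iam_role_arn: str, region: str) -> None:
    """Load every staging table first, then populate the analytics table.

    `cursor` is any DB-API 2.0 cursor connected to Redshift; the caller
    commits the connection afterwards.
    """
    for table, s3_path in STAGING_TABLES.items():
        cursor.execute(copy_statement(table, s3_path, iam_role_arn, region))
    cursor.execute(SONGPLAYS_INSERT)
```

Loading raw JSON into staging tables first lets Redshift do the joins and type conversions in bulk, instead of transforming row by row in Python.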

Tax Identification Number Crawling

Web Scraping · Data Wrangling

Used Selenium to crawl the Tax Identification Numbers, company names, and addresses of more than 2,000 companies from the web, and wrote the results to an Excel file with the Openpyxl package.

Alternator Market Analytics

Business Intelligence · Data Cleaning

Cleaned over 120k raw data records and extracted valuable categories of the alternator market in Vietnam, then visualized the data and drew multidimensional insights using Python and Power BI.

University Entrance Exam Scores Analytics

Web Scraping · Data Wrangling

Using Python, crawled the data of 75k Vietnamese students who took the National Exam from an HTML website, cleaned the raw data with string manipulation, wrote it to a CSV file, and analyzed the collected data.
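The cleaning step can be sketched as turning each student's scraped score text into one structured record and then writing all records to CSV. The raw text format (`Math: 8.60 Literature: 7.50 ...`) and the helper names are hypothetical, since the source site's markup is not described here, and a regular expression stands in for whatever string manipulation the project used; fetching the HTML is omitted.

```python
import csv
import io
import re

# Hypothetical raw format: "Subject: score" pairs separated by spaces.
SCORE_PATTERN = re.compile(r"([A-Za-z][A-Za-z ]*?)\s*:\s*(\d+(?:\.\d+)?)")


def parse_scores(student_id: str, raw: str) -> dict:
    """Turn one student's scraped score text into a flat record."""
    record: dict = {"student_id": student_id.strip()}
    for subject, score in SCORE_PATTERN.findall(raw):
        record[subject.strip()] = float(score)
    return record


def write_scores_csv(records: list[dict], fileobj) -> None:
    """Write records to CSV; subjects a student did not take stay blank."""
    subjects = sorted({key for rec in records for key in rec if key != "student_id"})
    writer = csv.DictWriter(fileobj, fieldnames=["student_id", *subjects], restval="")
    writer.writeheader()
    writer.writerows(records)


records = [
    parse_scores("01000001", "Math: 8.60 Literature: 7.50 English: 9.20"),
    parse_scores("01000002", "Math: 6.4"),
]
buffer = io.StringIO()
write_scores_csv(records, buffer)
print(buffer.getvalue())
```

Collecting the union of subjects before writing matters because students sit different exam combinations, so rows do not all share the same fields.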

ERP Item Code Automation

ERP · Automation

Standardized the naming rules for the company's 12k products in Microsoft Dynamics 365 and created a Python automation tool that lets IT admins import, edit, and manage product information.

Project Quotation Generating Tool

ERP · Automation

Built a Python tool that automatically converts all item and product names into accented Vietnamese and arranges them in an Excel template, helping salespeople create quotations and send them to customers more efficiently.