Deep Learning
Audio Segmentation
MLOps
Web Dev
MMaVVie is an application with two main features: Audio Separation and Karaoke. Its core technology is a music source segmentation model built on the U-Net architecture, which the Streamlit web app calls through FastAPI endpoints. External APIs such as Whisper and GeniusLyrics are also used for the Karaoke feature.
NLP
Sentiment Analysis
MLOps
Data Validation
Web Dev
Built an NLP model for sentiment analysis of Amazon Kindle book reviews. Deployed a Streamlit interface that lets users interact with the model through FastAPI and PostgreSQL. Created Airflow jobs to ingest raw data and run batch predictions on validated data. A Grafana dashboard monitors the pipeline.
Machine Learning
EDA
Blogging
Posed questions about Starbucks' marketing strategy and analyzed it to find the best promotional offer, training several prediction models and tuning their hyperparameters with Python and Scikit-learn. The EDA, preprocessing, and modelling steps are explained in a Medium post.
Data Engineering
ETL
Machine Learning
Web Dev
To help organizations respond to disaster events faster and more accurately, I used Python to build an ETL pipeline that processes the raw data, an ML pipeline that classifies messages into categories, and a Flask application for entering new messages and visualizing the data.
Deep Learning
NLP
Machine Translation
Translated English sentences into Vietnamese with a Seq2Seq model trained on cleaned data. Improved predictions by using an attention mechanism and an LSTM/GRU encoder-decoder.
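As a rough illustration of the attention step mentioned above, here is a minimal pure-Python sketch of dot-product attention for a single decoder step. The dot-product scoring, the function names, and the toy vectors are assumptions for illustration; the project may have used a different attention variant and a deep learning framework rather than plain lists.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_product_attention(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step.

    decoder_state: list of floats, the current decoder hidden state.
    encoder_states: list of equally sized lists, one per source token.
    """
    # Score each source position by its dot product with the decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    size = len(decoder_state)
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(size)]
    return weights, context

# Toy example: made-up 2-dimensional states for a 3-token source sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = dot_product_attention(decoder_state, encoder_states)
```

In an encoder-decoder translator, the context vector is typically combined with the decoder state before predicting the next target word, so the model can focus on the most relevant source tokens at each step.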
Machine Learning
EDA
Blogging
Analyzed the impact of education and other features on American adults' salaries using Pandas and Seaborn. Identified who is more likely to earn a higher income by comparing different prediction models built with Scikit-learn.
Data Engineering
ETL
AWS
Built an ETL pipeline for the Sparkify database hosted on Redshift that loads data from S3 into staging tables on Redshift and executes SQL statements to create the analytics tables.
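The two stages described above (S3 to staging, then staging to analytics tables) can be sketched as SQL built in Python. This is only an illustrative sketch: the table names, column names, bucket path, IAM role ARN, region, and the JSON source format are all hypothetical placeholders, not details taken from the project.

```python
def build_copy_statement(table, s3_path, iam_role_arn, region):
    """Build a Redshift COPY statement that loads JSON files from S3 into a staging table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{region}'\n"
        f"FORMAT AS JSON 'auto';"
    )

def build_insert_statement(target, columns, staging, where=None):
    """Build an INSERT ... SELECT that fills an analytics table from a staging table."""
    cols = ", ".join(columns)
    sql = f"INSERT INTO {target} ({cols})\nSELECT DISTINCT {cols}\nFROM {staging}"
    if where:
        sql += f"\nWHERE {where}"
    return sql + ";"

# Hypothetical names throughout. String formatting is acceptable here only
# because every value is a trusted constant, never user input.
copy_sql = build_copy_statement(
    table="staging_events",
    s3_path="s3://example-bucket/log_data",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
    region="us-west-2",
)
insert_sql = build_insert_statement(
    target="users",
    columns=["user_id", "first_name", "last_name"],
    staging="staging_events",
    where="user_id IS NOT NULL",
)
```

The resulting strings would then be executed against the cluster with a PostgreSQL-compatible client; loading into staging tables first keeps the bulk copy fast and leaves the transformation logic in plain SQL.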
Web Scraping
Data Wrangling
Used the Selenium module to crawl web data for more than 2,000 companies (Tax Identification Number, Company Name, and Address) and wrote it to an Excel file with the Openpyxl package.
Business Intelligence
Data Cleaning
Cleaned over 120k raw data records and extracted valuable categories of the alternator market in Vietnam, then visualized the data and gained multidimensional insights using Python and Power BI.
Web Scraping
Data Wrangling
Using Python, crawled data on 75k Vietnamese students who took the National Exam from an HTML website, cleaned the raw data with string manipulation, and analyzed the collected data, which was written to a CSV file.
ERP
Automation
Standardized the naming rules for 12k company products used in Microsoft Dynamics 365, and created a Python automation tool for IT admins to import, edit, and manage product information.
ERP
Automation
Built a Python tool that automatically converts all item and product names to accented Vietnamese and arranges them in an Excel template, helping salespeople create quotations and send them to customers efficiently.