I'm

Le VAN LONG

BI Developer/Data Engineer


My technical skillset

Programming

Competent in Python programming, with a focus on data processing and the development of data pipelines (CLI syntax).

Machine Learning

Knowledgeable in data preprocessing, building machine learning models to predict outcomes, and evaluating those models.

Data Analysis

Able to process data with the programming languages above, visualize it with Redash, Looker Studio, Power BI, Excel, and similar tools, and answer business questions.

Big data

Have basic knowledge of working with big data. Used Hadoop and Apache Spark in school projects.

Data pipeline

Skilled in developing ETL pipelines using Python and modern tools such as Airbyte and n8n, ensuring data quality and integrity.

Data querying

Proficient with query languages such as SQL (including the MySQL dialect), MDX, and AQL. Can process data in multiple formats (CSV, JSON, TXT, ...) and across multiple database types (SQL, NoSQL).

What I've learned

University & Personal projects

(Follow each link to go to my GitHub repository)

September 2021

Distributed NoSQL project GitHub

ArangoDB

Built a distributed ArangoDB NoSQL database on Google Cloud Platform.

March 2022

House Price Prediction GitHub

Python Tkinter

Built a Python Tkinter app that uses a machine learning model, fitted on a training dataset, to predict house prices.

April 2022

Decision support system GitHub

Python-Jupyter Notebook

Built a job recommendation system using a content-based filtering algorithm.
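
For readers unfamiliar with the technique, content-based filtering can be sketched as scoring each job by how similar its description is to a candidate profile. The snippet below is an illustration only, not the code from the repository: the function names, the bag-of-words representation, and the toy job data are all assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(profile, jobs, top_n=2):
    """Rank job titles by similarity between the profile and each description."""
    p = vectorize(profile)
    scored = [(cosine(p, vectorize(desc)), title) for title, desc in jobs.items()]
    # Highest similarity first; keep only the top_n titles.
    return [title for _, title in sorted(scored, reverse=True)[:top_n]]


# Hypothetical toy data, invented for this example.
jobs = {
    "Data Engineer": "python sql etl pipelines airflow",
    "Frontend Developer": "javascript css react html",
    "Data Analyst": "sql excel dashboards python",
}
ranked = recommend("python etl sql pipelines", jobs)
```

A real system would likely use TF-IDF weights or embeddings instead of raw counts; plain counts keep the sketch dependency-free.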

May 2022

Data Warehouse and OLAP GitHub

2nd Semester of 2021-2022

Processed multidimensional data in a cube with MDX.

October 2022

Social Network GitHub

Python Jupyter Notebook

Performed community detection and clustering on an airline network between countries.

November 2022

Data Mining Project GitHub

Python-Jupyter Notebook

Predicted football game outcomes using machine learning algorithms in Jupyter Notebook.

April 2023

Cloud Computing Data Analysis GitHub

Azure Synapse

Built an end-to-end credit card fraud detection project: ingested data into Azure Synapse, applied machine learning, and visualized the results with Power BI.

April 2023

Big Data project GitHub

Apache Spark on Google Cloud Platform

Used Spark RDDs for expected goal prediction in football. Manually programmed the k-nearest neighbors algorithm without using the spark.mllib libraries.
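
To give a sense of what a hand-written k-nearest neighbors classifier involves, here is a minimal sketch in plain Python rather than Spark. It is not the project's code: the function name, the two shot features, and the toy data are assumptions for illustration. The two steps loosely mirror what an RDD version might do with a map to compute distances followed by taking the k smallest.

```python
import heapq
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    # Step 1: pair each training row with its distance to the query.
    distances = ((math.dist(features, query), label) for features, label in train)
    # Step 2: keep only the k rows with the smallest distances.
    nearest = heapq.nsmallest(k, distances, key=lambda pair: pair[0])
    # Step 3: majority vote over the neighbours' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy shots: (distance_to_goal_m, shot_angle_rad) -> 1 if goal, else 0.
shots = [
    ((5.0, 1.2), 1),
    ((6.0, 1.0), 1),
    ((25.0, 0.3), 0),
    ((30.0, 0.2), 0),
    ((8.0, 0.9), 1),
]
prediction = knn_predict(shots, (7.0, 1.1), k=3)
```

For an expected-goal value rather than a class label, one option would be to return the fraction of goals among the k neighbours instead of the majority vote.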

March 2024

Data Pipeline from Multiple Sources to BigQuery GitHub

Python, GitLab CI, BigQuery

Built a data pipeline from multiple data sources to GCP BigQuery.