Lucas Quemelli

About me

I am a Chemical Engineer with a Master of Science in Chemical Engineering who moved into the Data Science world. I have worked with predictive modeling, data analysis, descriptive statistics and Machine Learning in both academic research and industry. In one of my experiences, I collaborated with Afterverse, the game studio that created PK XD. In my last experience, I collaborated with MeuTudo, a Brazilian fintech whose purpose is to make Brazilians' financial lives easier.

Skills

Programming Languages and Database

Python applied to data analysis.
Webscraping with Python.
SQL for data extraction, cleaning and analysis.
R for statistical modeling.
Google BigQuery.
MySQL Database.
SQLite Database.
AWS Redshift Database.
Oracle Database.
PySpark for data transformation.

Statistics and Machine Learning

Descriptive statistics.
Regression, Classification and Clustering.
Algorithm performance metrics and validation: RMSE, MAE, MAPE, R² and K-Fold Cross-Validation.
Machine Learning packages: scikit-learn, SciPy, PyCaret, Keras, TensorFlow and H2O.
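As an illustration of the regression metrics listed above (RMSE, MAE, MAPE and R²), here is a minimal sketch in plain Python. The function name `regression_metrics` and the sample values are made up for the example and are not taken from any of my projects. In practice a library such as scikit-learn would normally be used instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in %) and R² for two equally sized lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when a true value is zero; this sketch assumes none are.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot

    return {"mae": mae, "rmse": rmse, "mape": mape, "r2": r2}


# Illustrative values only.
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
print(metrics)
```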

Data Visualization

Google Data Studio/Looker Studio.
Tableau.
Matplotlib.
Seaborn.
Plotly.
Dashboards.
Word Cloud.

Software Engineering

Apache Airflow.
Git.
GitHub.
Virtual Environment.
Streamlit.
Heroku.
Docker Container.

Professional Experiences

4 months as a Data Scientist at Vert. (∇)

I have been working with Artificial Intelligence (AI) and Natural Language Processing (NLP).

11 months as a Data Scientist at meutudo. (t.)

I worked with the CRM, Product and CS teams to optimize marketing campaigns, to find people more prone to sign a contract and to optimize tax values. I mainly worked with data analysis, Machine Learning and prompt engineering to find better business opportunities. Daily, I used languages such as SQL and Python, as well as tools such as AWS Redshift, Oracle Database, Anaconda, Tableau, Docker Container and GitHub. The main activities and projects were:

Analysis of user acquisition performance such as conversion, costs, profit and ROI.
Analysis of user characteristics to find users more prone to sign a contract.
Creation of reports and generation of insights.
Propensity analysis.
Results presentation.
Ad hoc analysis.
Predictive modelling.
Machine Learning models: classification, clustering and regression.
Prompt Engineering.
Model deployment.

9 months as a Data Analyst at Afterverse (AV)

The main concepts used in this position were data analysis, descriptive statistics, hypothesis testing, predictive modeling, Machine Learning, optimization and data engineering. I mainly worked on creating and automating the pipelines that fed dashboards with KPIs such as revenue, installs, ARPU, LTV, ARPDAU, user retention, IPM, CPI, DAU, WAU and MAU. Daily, I used languages such as SQL and Python, as well as tools such as BigQuery, Data Studio, Airflow and GitHub. The main activities were:

Creation of pipelines, layers and tables.
Generation and maintenance of KPIs.
Creation of reports.
Creation and maintenance of dashboards.
Validation tests.
Support.
Documentation.
1 Machine Learning project to predict LTV.
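To make two of the KPIs mentioned above concrete, the sketch below computes DAU (daily active users) and ARPDAU (average revenue per daily active user) from raw events. The event layout `(date, user_id, revenue)`, the function name `daily_kpis` and the sample rows are assumptions for illustration only. The real pipelines ran as SQL on BigQuery and are not reproduced here.

```python
from collections import defaultdict


def daily_kpis(events):
    """Aggregate (date, user_id, revenue) tuples into per-day KPIs."""
    users = defaultdict(set)      # distinct users seen on each day
    revenue = defaultdict(float)  # total revenue on each day

    for day, user_id, amount in events:
        users[day].add(user_id)
        revenue[day] += amount

    return {
        day: {
            "dau": len(users[day]),
            "revenue": revenue[day],
            "arpdau": revenue[day] / len(users[day]),
        }
        for day in sorted(users)
    }


# Illustrative events only.
sample_events = [
    ("2023-01-01", "u1", 0.0),
    ("2023-01-01", "u2", 4.0),
    ("2023-01-01", "u1", 2.0),
    ("2023-01-02", "u1", 0.0),
    ("2023-01-02", "u3", 3.0),
]
kpis = daily_kpis(sample_events)
print(kpis)
```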

18 Data Science projects

Data solutions to business problems similar to those faced in real companies. Public datasets were used for the data analysis and modelling, and webscraping was used to collect additional data. The projects cover every step from the business problem to deploying the algorithm in a cloud environment. These projects and the concepts used may be seen in the next section.

Data Science projects

A/B Testing

Eletronic House is an e-commerce enterprise. We want to run an A/B test to find out which website layout results in the better conversion rate. A/B testing lets us compare the layouts scientifically instead of guessing which one converts better.

Languages & Tools

Statistical Analysis.
CSV files.
Python, Pandas, NumPy, SciPy and Statsmodels.
Anaconda and Jupyter Notebook.
Git and GitHub.
A/B Testing.
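One common way to compare the conversion rates of two layouts is a two-proportion z-test. The sketch below uses only the Python standard library and is not the project's actual code, which relied on SciPy and Statsmodels. The function name `two_proportion_z_test` and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a, conv_b: number of conversions in each group.
    n_a, n_b: number of visitors in each group.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both layouts convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: layout A vs. layout B.
z_stat, p_val = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z_stat:.3f}, p-value = {p_val:.5f}")
```

A small p-value (for example below a significance level of 0.05 chosen in advance) suggests the difference in conversion is unlikely to be due to chance alone.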

Topic Modeling

Topic modeling is a technique used to discover the underlying structure in a set of text documents. It is commonly used in text analysis and text mining to identify key topics or themes in a corpus of documents. The purpose was to scrape, analyze, compare and model texts from websites.

Languages & Tools

Machine Learning.
CSV files.
Webscraping.
Python, Pandas, NumPy, Matplotlib, spaCy, NLTK, BERTopic and WordCloud.
Anaconda and Jupyter Notebook.
Google Colab.
Git and GitHub.
Natural Language Processing (NLP).
Topic modeling.

Customer Satisfaction Prediction

Customer satisfaction measures how products and services offered by companies and organizations meet customer expectations. This project aims to identify dissatisfied customers and take proactive steps to improve their happiness before it's too late.

Languages & Tools

Machine Learning.
CSV files.
Python, Pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn and WordCloud.
Anaconda and Jupyter Notebook.
Git and GitHub.
Classification.
Sentiment Analysis.
NLP.

Credit Default Risk Prediction

Credit in the finance industry involves two parties: (1) a lender and (2) a borrower, who repays the lender at a later date. The purpose was to predict loan defaulters and minimize the risk of loss on the basis of credit history, employment, and demographic data.

Languages & Tools

Machine Learning.
CSV file.
Python, Pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn and PyCaret.
Anaconda and Jupyter Notebook.
Git and GitHub.
Classification.

Netflix Stock Price Prediction

Stock price prediction using machine learning is the process of predicting the future value of a stock traded on a stock exchange in order to make a profit. The purpose was to predict the Netflix (NFLX) stock price based on daily data over 3 years.

Languages & Tools

Machine Learning.
CSV file.
Python, Pandas, NumPy, Matplotlib, TensorFlow and Keras.
Anaconda and Jupyter Notebook.
Git and GitHub.
Regression.
Time series.

Credit Risk Assessment

Credit default risk is the chance that companies/individuals cannot make the required payments on their debt obligations, which can lead to a possibility of loss for a lender. The purpose was to predict loan defaulters and minimize the risk of loss on the basis of credit history, employment, and demographic data.

Languages & Tools

Machine Learning.
CSV file.
Python, Pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn and PyCaret.
Anaconda and Jupyter Notebook.
Git and GitHub.
Classification.

Tesla Stock Price Prediction

Tesla, Inc. is an American electric vehicle and clean energy company based in Palo Alto, California. The purpose was to predict the Tesla (TSLA) stock price based on daily data over 5 years.

Languages & Tools

Machine Learning.
CSV file.
Python, Pandas, NumPy, Matplotlib, ARIMA, TensorFlow and Keras.
Anaconda and Jupyter Notebook.
Git and GitHub.
Regression.
Time series.

Cardio Catch Disease

Cardio Catch Disease is a company whose business model is detecting heart disease in the early stages. The company offers an early diagnosis of cardiovascular disease for a certain price. Currently, the diagnosis is made manually. The purpose of this project was to build a model to predict heart disease and improve diagnosis precision.

Languages & Tools

Machine Learning.
Excel.
Python, Pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn and Boruta.
Anaconda and Jupyter Notebook.
Git and GitHub.
Classification.

Customer Propensity to Purchase

Our client is an early-stage e-commerce company selling various products, from daily essentials (such as dairy and vegetables) to high-end electronics and home appliances. The purpose of this project was to build a propensity model that predicts each user's probability of purchasing a product.

Languages & Tools

Machine Learning.
Excel.
Python, Pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn and Boruta.
Anaconda and Jupyter Notebook.
Git and GitHub.
Classification.

Airbnb User Destination

Airbnb is an American company that operates an online marketplace focused on short-term homestays and experiences. Its business model allows anyone to offer or book accommodations around the world. This is a Machine Learning project and the purpose was to predict the destination country of Airbnb users.

Languages & Tools

Machine Learning.
Excel.
Python, Pandas, NumPy, Matplotlib, Seaborn, SciPy, scikit-learn and Boruta.
Anaconda and Jupyter Notebook.
Git and GitHub.
Classification.

Rossmann Store Sales

Rossmann is a real German drugstore chain that operates in many European countries. This is a Machine Learning project and the purpose was to forecast Rossmann's sales revenue for the next 6 weeks.

Languages & Tools

Machine Learning.
Excel.
Python, Pandas, NumPy, Matplotlib, Seaborn, SciPy and scikit-learn.
Anaconda and Jupyter Notebook.
VSCode.
Git and GitHub.
Regression.
Time Series.

House Rocket insight

House Rocket is a fictitious company whose business model is the purchase and sale of real estate. This is an insights project and the purpose was to find the best business opportunities in the real estate market and maximize the company's revenue.

Languages & Tools

Libre Office.
Python, Pandas, NumPy, Matplotlib and Seaborn.
Anaconda and Jupyter Notebook.
PyCharm.
Git and GitHub.
Streamlit and Heroku Cloud.

House Rocket Machine Learning

House Rocket's CEO asked for a Machine Learning model to make better decisions in their real estate business model. The purpose was to determine the best business opportunities in the real estate market and maximize the company's revenue using Machine Learning modeling.

Languages & Tools

Libre Office.
Python, Pandas, NumPy, Seaborn, Plotly, Matplotlib, SciPy, scikit-learn, Tqdm and Dateutil.
GitHub.
Google Colab.

Star Jeans

Star Jeans is a fictitious American enterprise whose business model is the sale of jeans through B2C e-commerce. Eduardo and Marcelo, two Brazilian businessmen, decided to build a jeans company in the United States of America. Their initial idea is to sell jeans for men.

Languages & Tools

Webscraping.
Python, Pandas, NumPy, Seaborn, requests and BeautifulSoup.
Google Colab.
VSCode.
Git and GitHub.

Telecom Customer Churn

Customer churn is one of the essential metrics that every business must evaluate to grow. The purpose was to build a machine learning model to help predict customers likely to churn and facilitate taking business actions to reduce the churn.

Languages & Tools

Classification.
Microsoft Excel.
R.
Neural Network and Keras.
RStudio and Quarto.
GitHub.

Artificial Neural Network (ANN) modeling

I used an ANN to predict acetic anhydride selectivity, aiming to maximize its yield and minimize steam residue. Predictive modeling is quite profitable in this case because it reduces feedstock waste and allows better management of the final product yield.

Languages & Tools

Microsoft Excel.
R, readxl and neuralnet.
RStudio.
GitHub.

Data Pipeline with Apache Airflow and Apache Spark

A data pipeline was created to automate the extraction, transformation and loading of data from Twitter's API. The pipeline was built with Airflow and Spark. The extracted data was then analyzed to generate insights.

Languages & Tools

Libre Office.
Python, Pandas, NumPy and Matplotlib.
Google Colab.
PyCharm.
Apache Airflow and Apache Spark.
Git and GitHub.

Covid-19 Data Analysis

This is a data analysis project that explores worldwide COVID-19 data using SQL with SQLite and DBeaver. We cleaned and explored the datasets, built a dashboard with Tableau and then deployed it online.

Languages & Tools

DBeaver.
SQL.
SQLite.
Excel.
Tableau.
Git and GitHub.
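As a rough idea of the kind of SQL exploration described above, the sketch below runs an aggregate query against an in-memory SQLite database through Python's built-in `sqlite3` module. The table name `covid_cases`, its columns and the placeholder rows are assumptions for illustration. They are not the project's real schema or data.

```python
import sqlite3

# Placeholder rows (country, report_date, new_cases); not real figures.
rows = [
    ("Country A", "2020-03-01", 10),
    ("Country A", "2020-03-02", 15),
    ("Country B", "2020-03-01", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE covid_cases (country TEXT, report_date TEXT, new_cases INTEGER)"
)
conn.executemany("INSERT INTO covid_cases VALUES (?, ?, ?)", rows)

# Total cases per country, largest first.
totals = conn.execute(
    """
    SELECT country, SUM(new_cases) AS total_cases
    FROM covid_cases
    GROUP BY country
    ORDER BY total_cases DESC
    """
).fetchall()
conn.close()

print(totals)
```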

Get in touch

lucasquemelli@gmail.com