Kaci Kus | Portfolio

Automated Mineralogy (Spring 2021)

GitHub Repository

In progress with Dr. Zane Jobe, Colorado School of Mines

Python

tkinter

Pandas

NumPy

Matplotlib

TensorFlow

Pillow

OpenCV

Semantic Segmentation

The goal of this project is to save TIME and MONEY by developing a machine learning model to identify mineralogy using only petrographic photographs.

Geologists cut rocks into 0.03 mm-thick slices, called thin sections, to determine their composition (mineralogy). Knowing the mineralogy is crucial for understanding how a rock originated and formed, and it can reveal subtle differences between rocks that look identical to the naked eye.

Example of thin sections (http://www.microscopy-uk.org.uk/mag/artfeb04/iwouslides2.html)

Typically, a petrologist identifies minerals by their distinguishing features and by how their colors change under different light polarizations. An alternative is to ship samples to a facility with automated-mineralogy instrumentation, which scans the thin section with an electron beam and identifies the mineral in each grain from its characteristic X-ray spectrum. The first method is time-consuming and requires years of expertise. The second method is automated and therefore potentially faster; however, it can be costly and inefficient, because most facilities do not have the instrumentation to run this type of analysis in-house.

Thin section under plain-polarized light

Thin section under cross-polarized light

False-color mineralogy map produced by TIMA
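The TIMA map is what makes supervised training possible: each false-color pixel can become an integer class label for semantic segmentation. The page does not list the palette or the 9 mineral names, so the colors below are hypothetical placeholders. This is a minimal sketch, assuming the map is saved losslessly (e.g. PNG) so that every labeled pixel matches a palette color exactly.

```python
import numpy as np

# Hypothetical palette: the real TIMA colors and the 9 mineral classes used in
# the project are not listed on this page, so these entries are placeholders.
PALETTE = {
    (255, 0, 0): 0,  # placeholder mineral class 0
    (0, 255, 0): 1,  # placeholder mineral class 1
    (0, 0, 255): 2,  # placeholder mineral class 2
}
UNLABELED = 255  # sentinel for colors that are not in the palette


def color_map_to_mask(rgb_map, palette=PALETTE, unlabeled=UNLABELED):
    """Convert an (H, W, 3) false-color RGB map into an (H, W) class-index mask."""
    rgb_map = np.asarray(rgb_map, dtype=np.uint8)
    mask = np.full(rgb_map.shape[:2], unlabeled, dtype=np.uint8)
    for color, class_id in palette.items():
        # Pixels whose three channels all match this palette color.
        hits = np.all(rgb_map == np.array(color, dtype=np.uint8), axis=-1)
        mask[hits] = class_id
    return mask
```

With JPEG-compressed maps, assigning each pixel to the nearest palette color would be more robust than exact matching.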

For this project, I developed a novel, semi-automated approach to aligning fine-grained images taken under different lighting conditions. The primary challenge was that the color relationships between objects in the photos are not preserved across light polarizations, which makes fully automated alignment nearly impossible. Even so, both plain-polarized and cross-polarized images are needed to differentiate mineral grains, so both are important for building the model. Once the images were aligned and pre-processed for training, I built a CNN that classifies each pixel as one of 9 mineral classes.
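The alignment method itself isn't described here. One common semi-automated pattern, shown as a minimal NumPy sketch, is to have a person click the same landmarks (e.g. grain corners) in both the plain-polarized and cross-polarized images, fit an affine transform to those point pairs by least squares, and resample one image onto the other. The function names are hypothetical, and this is not necessarily the approach used in the project.

```python
import numpy as np


def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src (x, y) points onto dst points."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 2 or src.shape[0] < 3:
        raise ValueError("need at least 3 matching (x, y) point pairs")
    design = np.hstack([src, np.ones((len(src), 1))])  # rows of [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape (3, 2)
    return coeffs.T  # shape (2, 3), same layout as OpenCV affine matrices


def warp_nearest(img, matrix, out_shape):
    """Resample img into out_shape using the inverse of the affine matrix.

    Output pixel (x, y) takes the value of the nearest source pixel that the
    affine transform maps onto it; pixels that map from outside img stay 0.
    """
    img = np.asarray(img)
    h, w = out_shape
    inverse = np.linalg.inv(np.vstack([matrix, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # (3, h*w)
    src_x, src_y, _ = inverse @ coords
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    valid = (src_x >= 0) & (src_x < img.shape[1]) & (src_y >= 0) & (src_y < img.shape[0])
    out = np.zeros((h * w,) + img.shape[2:], dtype=img.dtype)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out.reshape((h, w) + img.shape[2:])
```

In practice, OpenCV's `cv2.estimateAffine2D` (which adds RANSAC outlier rejection) and `cv2.warpAffine` (with proper interpolation) cover the same two steps.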

Color Quantization Python Library (Initiated February 2021)

GitHub Repository // PyPI Documentation

Ongoing project to create an easy-to-use open-source library for color quantization and exploratory data analysis.

Python

Poetry

NumPy

Matplotlib

Pillow

OpenCV

Color Quantization

Kcquant (short for Kaci Color Quantization) is a passion project of mine to create an open-source library with useful functions to perform color quantization and exploratory data analysis on images.

Currently, kcquant has only one function, get_colors(), which takes 5 parameters. Future work will include (1) color quantization methods to reduce the number of pixel values, (2) methods to easily plot color frequencies and relationships within images, (3) methods to compare and overlay images, and (4) alignment methods.
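As an illustration of what "reducing the number of pixel values" means, the simplest form of color quantization bins each channel uniformly. `uniform_quantize` is a hypothetical helper, not part of kcquant; clustering-based methods such as k-means usually preserve an image's palette better.

```python
import numpy as np


def uniform_quantize(img, levels=4):
    """Snap each 8-bit channel value to the center of one of `levels` equal-width bins.

    An RGB image quantized this way has at most levels**3 distinct colors.
    """
    if not 1 <= levels <= 256:
        raise ValueError("levels must be between 1 and 256")
    values = np.asarray(img).astype(np.int32)  # widen so the arithmetic cannot overflow uint8
    step = 256 // levels
    bins = np.minimum(values // step, levels - 1)  # last bin absorbs any remainder
    return (bins * step + step // 2).astype(np.uint8)
```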

Parameters:

img : Image converted to a NumPy array (np.array).

figsize : Figure size of one row (a row contains `columns` colors). Defaults to (15, 5).

columns : Number of pixel colors to display in a row. Defaults to 5.

color : Pixel format of the input image. Defaults to 'BGR' (the channel order OpenCV uses when reading images). If your image is already in RGB, set color='RGB'.

top : Number of most frequent pixel values to show. Defaults to 10.

Example of kcquant.get_colors() being used.
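The counting step behind a function like `get_colors()` can be sketched with NumPy, assuming it tallies exact pixel values. The `top` and `color` arguments mirror the documented parameters, but `top_colors` is a hypothetical helper, not kcquant's source, and it leaves out the Matplotlib plotting that `figsize` and `columns` control.

```python
import numpy as np


def top_colors(img, top=10, color="BGR"):
    """Return the `top` most frequent pixel colors (as RGB tuples) and their counts."""
    pixels = np.asarray(img).reshape(-1, 3)
    if color == "BGR":
        pixels = pixels[:, ::-1]  # OpenCV loads images as BGR; flip to RGB
    elif color != "RGB":
        raise ValueError("color must be 'BGR' or 'RGB'")
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    order = np.argsort(-counts, kind="stable")[:top]  # most frequent first
    return [tuple(int(c) for c in colors[i]) for i in order], counts[order].tolist()
```

Images with an alpha channel would need it dropped first (e.g. `img[..., :3]`), since the sketch assumes three channels.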

Hashtag Recommender (Winter 2020)

GitHub Repository // Presentation

Completed project to improve social media engagement

Python

HTML/CSS

Flask

AWS

EC2

MobileNetV2 (CNN)

Pandas

NumPy

Selenium

Web-scraping

K-NN

TensorFlow

Word2Vec

Continuous Bag-of-Words

Skip-gram

Recommender-system web app deployed on an AWS EC2 instance. To keep costs down, the recommender is only accessible during specific periods. Please refer to the demo video to see it in action!

Instagram has over 1 billion users and over 500 million daily active users. As one of the most widely used social media apps, it serves many people as a business tool for relatively "free" advertising. Moreover, users who gain enough followers can even earn a living through brand sponsorships.

FOLLOWERS = MONEY

The goal of this project was to build a tool that takes any photo as input and returns a list of related hashtags optimized for maximum engagement.

Recommender system workflow

I built a modular web scraper to collect more than 5,324 posts from Instagram's discover page using hashtags related to cats, technology, food, and travel. The recommender system uses the pre-trained convolutional neural network (CNN) MobileNetV2 to extract deep features from each image and stores those features in a database along with each post's hashtags and number of likes.

When a user uploads a new image, the recommender extracts its deep features. K-nearest neighbors then finds the 6 most similar photos in the database. The hashtags used on those 6 photos are then used to find the 50 most semantically similar hashtags from the entire database of more than 31,500 unique hashtags. Each hashtag is also given a rating that normalizes (1) how often it is used and (2) how many likes are typically associated with it.
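The retrieval steps above could look roughly like this NumPy sketch. The page does not say which distance the K-NN search uses, so cosine similarity is an assumption, and `tag_vectors` stands in for hashtag embeddings from a trained Word2Vec model; all function names are hypothetical.

```python
import numpy as np


def cosine_similarities(matrix, vector):
    """Cosine similarity between each row of `matrix` and `vector`."""
    matrix = np.asarray(matrix, dtype=float)
    vector = np.asarray(vector, dtype=float)
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(vector)
    return matrix @ vector / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero


def nearest_posts(db_features, query_features, k=6):
    """Indices of the k database images whose deep features are closest to the query."""
    sims = cosine_similarities(db_features, query_features)
    return np.argsort(-sims, kind="stable")[:k]


def similar_hashtags(seed_tags, tag_vectors, n=50):
    """Rank every hashtag by cosine similarity to the mean vector of the seed tags."""
    names = list(tag_vectors)
    matrix = np.array([tag_vectors[t] for t in names], dtype=float)
    seeds = [tag_vectors[t] for t in seed_tags if t in tag_vectors]
    if not seeds:
        return []
    centroid = np.mean(seeds, axis=0)
    sims = cosine_similarities(matrix, centroid)
    return [names[i] for i in np.argsort(-sims, kind="stable")[:n]]
```

With a gensim Word2Vec model, `model.wv.most_similar(positive=seed_tags, topn=50)` performs a similar centroid lookup, except that it excludes the seed tags themselves.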

The final recommendation is the top 10, 20, or 30 hashtags that are most similar to the image AND have the highest rating score.
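The exact rating formula isn't given on this page. One hypothetical version, sketched below, min-max scales each hashtag's usage count and average likes to [0, 1], averages the two, and then keeps the highest-rated hashtags among the semantically similar candidates.

```python
import numpy as np


def hashtag_ratings(usage_counts, mean_likes):
    """Hypothetical rating: average of min-max scaled usage count and mean likes."""
    def scale(values):
        values = np.asarray(values, dtype=float)
        span = values.max() - values.min()
        return np.zeros_like(values) if span == 0 else (values - values.min()) / span

    return (scale(usage_counts) + scale(mean_likes)) / 2


def recommend(candidates, ratings, n=10):
    """Return the n candidates (already the most semantically similar) with the highest rating."""
    order = np.argsort(-np.asarray(ratings, dtype=float), kind="stable")
    return [candidates[i] for i in order[:n]]
```

Calling `recommend(candidates, ratings, n=20)` or `n=30` would give the longer recommendation lists.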

Next-day Stock Price Predictor (Winter 2020)

GitHub Repository // Presentation

Completed project to predict whether a company's stock price will rise or fall the next day.

Python

Supervised Learning

Gradient Boosting

Random Forest

Support Vector Machines

Logistic Regression

K-NN

Pandas

BeautifulSoup

NumPy

NLTK

Sentiment Analysis

Predicting changes in the stock market is not a novel idea, and there is no perfect formula yet; otherwise we'd all be rich! With that in mind, I wanted to take a slightly different approach to predicting stock-price changes, one that would be not only accurate but also practical for swing trading (here, essentially buying after hours and selling the next day).

I wrote a web scraper to gather the 100 most recent news headlines from FINVIZ for each company in the S&P 500. I then used the yfinance Python library to collect historical stock-price data for each company on each day for which I had headlines.

By combining headline sentiment scores with stock-price metrics such as 7-day and 30-day running averages, percent price change, and standard deviation, I built multiple classification models to predict whether a stock's next-day price would RISE or FALL. Gradient boosting produced the best results, with 83% accuracy and a 15% Type II error rate.
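The price metrics above are straightforward to compute with pandas. This is a minimal sketch, assuming a per-ticker DataFrame of daily prices with a `Close` column (as yfinance provides) and a per-day mean headline sentiment score; the page doesn't name the sentiment scorer, though NLTK's VADER (`SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`) is a common choice. The column names, the 7-day window for standard deviation, treating days without headlines as neutral, and labeling unchanged prices as FALL are all assumptions.

```python
import numpy as np
import pandas as pd


def build_features(prices, daily_sentiment):
    """Add rolling price features, headline sentiment, and a next-day RISE/FALL label.

    prices: DataFrame indexed by date with a "Close" column.
    daily_sentiment: Series indexed by date with a mean headline sentiment score.
    """
    df = prices[["Close"]].copy()
    df["ma_7"] = df["Close"].rolling(7).mean()
    df["ma_30"] = df["Close"].rolling(30).mean()
    df["pct_change"] = df["Close"].diff() / df["Close"].shift(1) * 100
    df["std_7"] = df["Close"].rolling(7).std()
    # Assumption: days without headlines get a neutral score of 0.
    df["sentiment"] = daily_sentiment.reindex(df.index).fillna(0.0)
    next_close = df["Close"].shift(-1)
    df["target"] = np.where(next_close > df["Close"], "RISE", "FALL")  # ties count as FALL
    # The last row has no next-day price, and early rows lack full rolling windows.
    df = df[next_close.notna()]
    return df.dropna()
```

Because the rows are a time series, a time-ordered train/test split (rather than a shuffled one) keeps future prices out of training before the features go to classifiers such as scikit-learn's `GradientBoostingClassifier`.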

Comparative results from five classification models. Gradient boosting appears to have the best accuracy while also keeping false positives (Type I errors) low. This matters because, to mitigate financial risk, missing a rise (a Type II error) is preferable to buying ahead of a fall (a Type I error).
