My portfolio features the following projects:
- 📖 Text reading complexity prediction with transformers
- 🧬 Image-to-text translation of chemical structures with deep learning
- 📈 Fair machine learning in credit scoring applications
Click "read more" to see project summaries. Follow GitHub links for code and documentation. Scroll down to see more Machine Learning and Deep Learning projects grouped by application areas.
Text Readability Prediction with Transformers
Highlights
- This project is not mine; it is listed here only as an example of my recommended portfolio design style.
- Developed a comprehensive PyTorch / HuggingFace text classification pipeline
- Built multiple transformers, including BERT and RoBERTa, with custom pooling layers
- Implemented an interactive web app for estimating the reading complexity of custom text
Tags: natural language processing, deep learning, web app
Summary
Estimating text reading complexity is a crucial task for school teachers. Offering students text passages at the right level of challenge is important for developing reading skills quickly. Existing tools for estimating text complexity rely on weak proxies and heuristics, which results in suboptimal accuracy. In this project, I use deep learning to predict the readability scores of text passages.
My solution implements eight transformer models, including BERT, RoBERTa, and others, in PyTorch. The models feature a custom regression head that uses the concatenated output of multiple hidden layers. The modeling pipeline includes text augmentations such as sentence order shuffling, backtranslation, and target noise injection. The solution places in the top 9% of the Kaggle competition leaderboard.
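The custom regression head described above can be sketched in PyTorch as follows. This is only an illustration, not the original code: the class name, the choice of pooling the `[CLS]` token, the hidden size of 768, and the use of the last four hidden layers are all assumptions.

```python
import torch
import torch.nn as nn


class ConcatPoolingRegressionHead(nn.Module):
    """Illustrative regression head: concatenate the [CLS] vector from the
    last `n_layers` hidden states, then project to a single readability score.
    Names and sizes are assumptions, not taken from the original project."""

    def __init__(self, hidden_size: int = 768, n_layers: int = 4):
        super().__init__()
        self.n_layers = n_layers
        self.regressor = nn.Linear(hidden_size * n_layers, 1)

    def forward(self, hidden_states) -> torch.Tensor:
        # hidden_states: tuple of (batch, seq_len, hidden) tensors, as returned
        # by a HuggingFace model called with output_hidden_states=True.
        cls_vectors = [h[:, 0, :] for h in hidden_states[-self.n_layers:]]
        pooled = torch.cat(cls_vectors, dim=-1)    # (batch, hidden * n_layers)
        return self.regressor(pooled).squeeze(-1)  # (batch,) readability scores


# Minimal usage with dummy hidden states (13 layers, batch of 2, 16 tokens):
head = ConcatPoolingRegressionHead()
dummy = tuple(torch.randn(2, 16, 768) for _ in range(13))
scores = head(dummy)
print(scores.shape)  # torch.Size([2])
```

Concatenating several hidden layers, rather than using only the final one, is a common trick in Kaggle NLP solutions: lower layers tend to retain more surface-level lexical information, which can help for readability regression.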
The project also includes an interactive web app built in Python. The app allows users to estimate the reading complexity of a custom text with two of the trained transformer models. The code and documentation are available on GitHub.
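Of the augmentations mentioned in the pipeline, sentence order shuffling is the simplest to illustrate. The sketch below is a naive version assuming sentences are separated by periods; a real pipeline would use a proper sentence tokenizer (e.g. from nltk), and the function name is hypothetical.

```python
import random


def shuffle_sentences(text: str, seed: int = 0) -> str:
    """Naive sentence-order-shuffle augmentation: split a passage on
    periods, permute the sentences, and rejoin them. Only a sketch;
    a production pipeline would use a real sentence tokenizer."""
    rng = random.Random(seed)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."


passage = "The cat sat. The dog barked. The bird sang."
print(shuffle_sentences(passage, seed=1))
```

Shuffling sentence order changes the surface form of a passage while (roughly) preserving its vocabulary and per-sentence difficulty, which makes it a cheap way to expand a readability-regression training set.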