blog – yHat

How LDA Works, Using Shiny for Python

A small Shiny for Python app exploring how LDA works

2023 #30DayChallenge

A few charts related to time-series packages for the 30DayChallenge

Performance Benchmarking Data Read Write

Which of the popular data read/write methods is fastest? Let’s find out.

Making the Anomaly Database

This is part two of a two-part post on Docker, Postgres databases and anomaly datasets. Read Part 1, which teaches you how to set up a new Postgres database using…

Docker-based RStudio & PostgreSQL

How to set up a Docker-based workflow for development in RStudio with a local Postgres server, also hosted in Docker

Enhance ETL pipeline monitoring with text plots

Quick visualizations on the command line using {txtplot}

Visualizing Correlations

Correlation plot for Kepler’s Planets, for day 13 of the 2021 30-day-chart-challenge

TidyTuesday - The Tate Collection

Last week's #TidyTuesday. I had something very specific in mind, and it forced me to learn a new package and some base R to finish this plot.

I wanted to showcase the change in the…

TidyTuesday - Transit Costs

Comparing Indian rail projects with those of our neighbour China, I find that, on average, Indian lines have more stations and are longer than their Chinese counterparts.

…

TidyTuesday - Big Mac Index

For my first #TidyTuesday post, I've attempted a comparison of how the Big Mac index moved from 2015 to 2020: https://t.co/AOGOvt3ve5 #RStats #dataviz #r4ds #ggplot2 pic.twi…

Perf Benchmarking Dummy Variables - Part II

Is {fastDummies} any better than {stats} to create dummy variables? Let’s find out.

M5 Competition Virtual Awards Ceremony

Notes from the M5 Forecasting Competition keynote speakers.

Reproducible Work in R

A few ways I ensure my work is reproducible in R

Using tryCatch for robust R scripts

A quick introduction to tryCatch, followed by three use cases I rely on regularly.

Performance Benchmarking for Date-Time conversions

I pit six methods against each other to find the fastest way to convert character strings to date-time values for large datasets.

Books I Reference

A list of Data Science books I reference

Visualising Linear Discriminant Analyses

Linear Discriminant Analysis visualized using Shiny

Performance Benchmarking for Dummy Variable Creation

How do the four popular methods of creating dummy variables perform on large datasets? Let’s find out!