Some friends and I participated in a Mercado Libre (MELI) data science challenge focused on predicting the probability of inventory being sold after N days. This was my first time working on this type of project with e-commerce data. While we didn’t win, it was an incredibly valuable learning experience. After thorough data analysis, we discovered that the cumulative sales followed Gaussian and Poisson distributions, making Poisson regression a natural choice for predicting sell-through probabilities. The results were impressive!

Data

An example of the data is shown in Figure 1.

Figure 1

Analysis

Our main assumption was that all the information required to predict when inventory will sell out is contained in the data, and that the sale of one product is independent of the sale of any other. Under this assumption, we built histograms of the probability of selling M units of a product on a given day. Because the distribution of a sum of independent random variables is the convolution of their distributions, we then used convolution to compute the distribution of total sales over successive days.
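The convolution step can be sketched as follows. This is a minimal illustration rather than our competition code: the histogram values and the function names `sales_pmf_after_days` and `prob_sold_out` are made up for the example, and it additionally assumes that sales on different days are independent and share the same daily distribution.

```python
import numpy as np

def sales_pmf_after_days(daily_pmf, n_days):
    """PMF of total units sold over n_days, assuming i.i.d. daily sales.

    daily_pmf[m] is the probability of selling exactly m units in one day.
    """
    daily_pmf = np.asarray(daily_pmf, dtype=float)
    total = np.array([1.0])  # after zero days, zero units sold with probability 1
    for _ in range(n_days):
        # Adding one more independent day convolves the two distributions.
        total = np.convolve(total, daily_pmf)
    return total

def prob_sold_out(total_pmf, stock):
    """Probability that total sales reach at least `stock` units."""
    return float(np.sum(total_pmf[stock:]))

# Hypothetical daily histogram: P(0 units)=0.5, P(1)=0.3, P(2)=0.2
daily_pmf = [0.5, 0.3, 0.2]
pmf_2_days = sales_pmf_after_days(daily_pmf, 2)
pmf_3_days = sales_pmf_after_days(daily_pmf, 3)
```

Here `pmf_3_days[k]` is the probability of having sold exactly `k` units after three days, and `prob_sold_out(pmf_3_days, stock)` gives the chance that a given stock level is exhausted by then.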

Figure 2

What this example (and others) shows is that after several days the distribution is approximately exponential. We took this as evidence that the cumulative sales of a product can be modeled with a Poisson distribution, which led us to use Poisson regression.
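To make the link to sell-through probabilities concrete, here is a small sketch of how a Poisson model turns a sales rate into a "sold out after N days" probability. It assumes a constant daily rate and defines "sold out" as selling at least `stock` units. Both are simplifications chosen for illustration and are not necessarily the definition we used in the challenge. The function names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def prob_sold_out_poisson(stock, daily_rate, n_days):
    """P(at least `stock` units sold within n_days) under a Poisson model.

    Assumes sales arrive at a constant `daily_rate`, so the total sold
    over n_days is Poisson with mean daily_rate * n_days.
    """
    mean = daily_rate * n_days
    prob_fewer_than_stock = sum(poisson_pmf(k, mean) for k in range(stock))
    return 1.0 - prob_fewer_than_stock

# Hypothetical item: 5 units in stock, selling about 0.8 units per day.
p_after_3_days = prob_sold_out_poisson(stock=5, daily_rate=0.8, n_days=3)
p_after_10_days = prob_sold_out_poisson(stock=5, daily_rate=0.8, n_days=10)
```

As expected, the probability of selling out grows with the number of days for a fixed stock and rate.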

Poisson Regression

We trained a separate Poisson regression model for each product category. One example is shown below:
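For readers unfamiliar with the technique, the following is a minimal sketch of Poisson regression with a log link, fitted by Newton's method using only NumPy. It is not the code from our repository: the synthetic data, the single feature, and the names `fit_poisson_regression` and `predict_rate` are assumptions made for the example.

```python
import numpy as np

def fit_poisson_regression(X, y, n_iter=25):
    """Fit log(E[y]) = b0 + X @ b by Newton's method on the Poisson likelihood."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(np.mean(y) + 1e-8)  # start from the overall mean rate
    for _ in range(n_iter):
        mu = np.exp(X @ beta)             # predicted mean count per row
        grad = X.T @ (y - mu)             # gradient of the log-likelihood
        hess = X.T @ (X * mu[:, None])    # negative Hessian (Fisher information)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_rate(beta, X):
    """Expected count for each row of X under the fitted model."""
    X = np.column_stack([np.ones(len(X)), X])
    return np.exp(X @ beta)

# Synthetic data standing in for one product category (hypothetical).
rng = np.random.default_rng(0)
feature = rng.uniform(0.0, 1.0, size=(2000, 1))
true_beta = np.array([0.5, 1.2])
counts = rng.poisson(np.exp(true_beta[0] + true_beta[1] * feature[:, 0]))

beta_hat = fit_poisson_regression(feature, counts)
rates = predict_rate(beta_hat, feature)
```

In practice one such model would be fitted per category, and a library implementation (for example a Poisson GLM in statsmodels or `PoissonRegressor` in scikit-learn) is usually preferable to a hand-written solver.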

Figure 3

The results were astonishing! The model captured the sell-through dynamics remarkably well.

Final Considerations

This project involved many additional components, including careful definition of “sold out” probability, handling test data we didn’t have access to, and feature engineering. I’ve focused on the core methodology here, but if you’re interested in more details, check out the GitHub repository linked above.