Chapter 1 Introduction

Online shopping is sweeping the entire world. In the US, Amazon and eBay allow users to have goods delivered within a few days at the click of a button. In China, the online shopping industry has gone from non-existence to near fanaticism in roughly two decades since Alibaba, which owns the country's best-known online shopping platform, Taobao, was founded in Hangzhou in 1999. One surprising fact about Alibaba is that, while Taobao was still growing, established US companies such as Amazon and eBay, which already had years of successful experience at home, were also competing in the Chinese online shopping market. Nevertheless, Alibaba outcompeted its US rivals and became the largest online shopping platform in China.

In this project, we collected 2017 user behavior data from Tianchi, Alibaba’s cloud data platform. The dataset is large, with over 100 million records, yet it covers only a small sample of user behavior over 9 days. This illustrates the enormous volume of shopping activity on the Taobao platform. By analyzing the dataset, we want to answer the following questions:

  1. What are the basic characteristics of online consumers? (e.g., When do they prefer to browse the website? When do they prefer to make purchases? What are their favorite items? How is the time spent on the website distributed across consumers?)

  2. How are item clicks, purchases, add-to-cart actions, and add-to-favorites actions correlated? What is the difference between “add to cart” and “add to favorites”? (Many Chinese consumers add lots of items to their cart but do not buy them for a very long time.)

  3. Which items or consumers are outliers? What correlations exist between items? (e.g., the classic beer-and-diapers question)

There are about 1 million users and 100 million interactions, so the interaction records are detailed enough to give a fine-grained view of how consumers behave and, indirectly, of what they may be thinking.
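To make questions 1 and 2 more concrete, the following is a minimal sketch of how they could be approached, not the analysis actually used in this project. The record layout (user, item, behavior, Unix timestamp), the behavior labels "click", "cart", "fav", and "buy", the tiny in-memory sample, and the function names are all assumptions made for illustration; the real dataset's column names and labels may differ.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# China Standard Time (UTC+8), so that hours reflect local shopping time.
CST = timezone(timedelta(hours=8))


def ts(day, hour):
    """Hypothetical helper: Unix timestamp for a given day/hour in Nov 2017 (CST)."""
    return int(datetime(2017, 11, day, hour, tzinfo=CST).timestamp())


# Made-up sample records: (user_id, item_id, behavior, unix_timestamp).
# This layout is an assumption, not the documented schema of the dataset.
SAMPLE = [
    (1, 101, "click", ts(25, 20)),
    (1, 101, "cart", ts(25, 20)),
    (1, 101, "buy", ts(26, 21)),
    (2, 101, "click", ts(25, 20)),
    (2, 102, "cart", ts(25, 22)),
    (2, 103, "fav", ts(26, 9)),
]


def hourly_counts(records, behavior):
    """Count how often `behavior` occurs in each local hour of the day."""
    counts = Counter()
    for _user, _item, kind, stamp in records:
        if kind == behavior:
            counts[datetime.fromtimestamp(stamp, tz=CST).hour] += 1
    return counts


def cart_to_buy_rate(records):
    """Share of (user, item) pairs added to cart that were also bought.

    Simplification: the order of the two events in time is not checked.
    """
    carted, bought = set(), set()
    for user, item, kind, _stamp in records:
        if kind == "cart":
            carted.add((user, item))
        elif kind == "buy":
            bought.add((user, item))
    return len(carted & bought) / len(carted) if carted else 0.0


if __name__ == "__main__":
    print(dict(hourly_counts(SAMPLE, "click")))
    print(cart_to_buy_rate(SAMPLE))
```

The same per-hour counting could be repeated for each behavior type to compare browsing and purchasing times, and the cart-to-buy rate could be computed analogously for favorites to contrast the two actions. At the scale of 100 million records, a streaming or chunked read of the file would likely be needed instead of an in-memory list.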