Projet_Analyse_Exploratoire_des_Ventes_de_Supermarché¶

Vous avez dans le fichier zip ci-joint un dossier contenant des données de ventes d'une entreprise sur l'année 2019. Vous avez un fichier csv par mois. Chaque fichier contient les colonnes suivantes:

  • OrderID: l'id de la commande
  • Product: La désignation du produit
  • Quantity Ordered: la quantité vendue
  • Price each: le prix unitaire
  • order date
  • Purchase adress: l'adresse de la boutique de vente

Vous êtes appelé à analyser ces données pour répondre aux questions suivantes:

  • Quel est le meilleur mois de vente ? et Quel est le chiffre d'affaires de ce mois ?
  • Dans quelle ville a-t-on le plus vendu de produits ?
  • A quelle heure devons-nous passer de la publicité pour augmenter nos ventes ?
  • Quels sont les produits qui sont souvent achetés ensemble ?
  • Quel produit a-t-on le plus vendu et pourquoi ?

Accompagnez vos solutions de visualisations parlantes afin d'aider le business à améliorer ses chiffres de l'année suivante.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os
In [36]:
import warnings
warnings.filterwarnings('ignore')
In [2]:
df = pd.read_csv("./Sales_Data/Sales_April_2019.csv")
In [3]:
df
Out[3]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
0 176558 USB-C Charging Cable 2 11.95 04/19/19 08:46 917 1st St, Dallas, TX 75001
1 NaN NaN NaN NaN NaN NaN
2 176559 Bose SoundSport Headphones 1 99.99 04/07/19 22:30 682 Chestnut St, Boston, MA 02215
3 176560 Google Phone 1 600 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001
4 176560 Wired Headphones 1 11.99 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001
... ... ... ... ... ... ...
18378 194090 Google Phone 1 600 04/08/19 17:11 177 Jackson St, Los Angeles, CA 90001
18379 194091 AA Batteries (4-pack) 1 3.84 04/15/19 16:02 311 Forest St, Austin, TX 73301
18380 194092 AAA Batteries (4-pack) 2 2.99 04/28/19 14:36 347 Sunset St, San Francisco, CA 94016
18381 194093 AA Batteries (4-pack) 1 3.84 04/14/19 15:09 835 Lake St, Portland, OR 97035
18382 194094 Lightning Charging Cable 1 14.95 04/18/19 11:08 354 North St, Boston, MA 02215

18383 rows × 6 columns

Merging 12 months of sales data into a single csv file¶

In [4]:
data = ("./Sales_Data/")
In [5]:
os.listdir(data)
Out[5]:
['Sales_April_2019.csv',
 'Sales_August_2019.csv',
 'Sales_December_2019.csv',
 'Sales_February_2019.csv',
 'Sales_January_2019.csv',
 'Sales_July_2019.csv',
 'Sales_June_2019.csv',
 'Sales_March_2019.csv',
 'Sales_May_2019.csv',
 'Sales_November_2019.csv',
 'Sales_October_2019.csv',
 'Sales_September_2019.csv']

Jetons un coup d'oeil à la data d'un mois, celui de Janvier¶

In [6]:
janvier_sales = pd.read_csv(data + 'Sales_January_2019.csv')
In [7]:
janvier_sales.head()
Out[7]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
0 141234 iPhone 1 700 01/22/19 21:25 944 Walnut St, Boston, MA 02215
1 141235 Lightning Charging Cable 1 14.95 01/28/19 14:15 185 Maple St, Portland, OR 97035
2 141236 Wired Headphones 2 11.99 01/17/19 13:33 538 Adams St, San Francisco, CA 94016
3 141237 27in FHD Monitor 1 149.99 01/05/19 20:33 738 10th St, Los Angeles, CA 90001
4 141238 Wired Headphones 1 11.99 01/25/19 11:59 387 10th St, Austin, TX 73301
In [8]:
janvier_sales.describe()
Out[8]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
count 9697 9697 9697 9697 9697 9697
unique 9269 20 8 19 8077 9161
top Order ID USB-C Charging Cable 1 11.95 Order Date Purchase Address
freq 16 1171 8795 1171 16 16
In [9]:
janvier_sales.columns
Out[9]:
Index(['Order ID', 'Product', 'Quantity Ordered', 'Price Each', 'Order Date',
       'Purchase Address'],
      dtype='object')

Merger la data¶

In [10]:
files = [fichier for fichier in os.listdir(data) if fichier.endswith('.csv')]
files
Out[10]:
['Sales_April_2019.csv',
 'Sales_August_2019.csv',
 'Sales_December_2019.csv',
 'Sales_February_2019.csv',
 'Sales_January_2019.csv',
 'Sales_July_2019.csv',
 'Sales_June_2019.csv',
 'Sales_March_2019.csv',
 'Sales_May_2019.csv',
 'Sales_November_2019.csv',
 'Sales_October_2019.csv',
 'Sales_September_2019.csv']
In [11]:
df = pd.DataFrame()

for file in files:
  data_interm = pd.read_csv(data + file)
  df = pd.concat([df, data_interm])
In [12]:
df.shape
Out[12]:
(186850, 6)
In [13]:
df.describe()
Out[13]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
count 186305 186305 186305 186305 186305 186305
unique 178438 20 10 24 142396 140788
top Order ID USB-C Charging Cable 1 11.95 Order Date Purchase Address
freq 355 21903 168552 21903 355 355
In [14]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 186850 entries, 0 to 11685
Data columns (total 6 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   Order ID          186305 non-null  object
 1   Product           186305 non-null  object
 2   Quantity Ordered  186305 non-null  object
 3   Price Each        186305 non-null  object
 4   Order Date        186305 non-null  object
 5   Purchase Address  186305 non-null  object
dtypes: object(6)
memory usage: 10.0+ MB

Nettoyage de la data¶

In [15]:
df.isnull().sum(axis=0)
Out[15]:
Order ID            545
Product             545
Quantity Ordered    545
Price Each          545
Order Date          545
Purchase Address    545
dtype: int64

Affichons un df avec toutes les lignes qui ont au moins une valeur manquante¶

In [16]:
val_manquantes = df[df.isnull().any(axis=1)]
val_manquantes
Out[16]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
1 NaN NaN NaN NaN NaN NaN
356 NaN NaN NaN NaN NaN NaN
735 NaN NaN NaN NaN NaN NaN
1433 NaN NaN NaN NaN NaN NaN
1553 NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ...
10012 NaN NaN NaN NaN NaN NaN
10274 NaN NaN NaN NaN NaN NaN
10878 NaN NaN NaN NaN NaN NaN
11384 NaN NaN NaN NaN NaN NaN
11662 NaN NaN NaN NaN NaN NaN

545 rows × 6 columns

Est-ce que toutes les lignes n'ont que des valeurs manquantes¶

In [17]:
val_manquantes.isnull().all()
Out[17]:
Order ID            True
Product             True
Quantity Ordered    True
Price Each          True
Order Date          True
Purchase Address    True
dtype: bool

Supprimer toutes les lignes avec des valeurs manquantes:¶

In [18]:
df.shape
Out[18]:
(186850, 6)
In [19]:
df.dropna(inplace=True)
In [20]:
df.shape
Out[20]:
(186305, 6)

Vérifier si il y a des valeurs manquantes¶

In [21]:
df.isnull().any()
Out[21]:
Order ID            False
Product             False
Quantity Ordered    False
Price Each          False
Order Date          False
Purchase Address    False
dtype: bool

Débarassons-nous des lignes avec des valeurs abérantes¶

In [22]:
df[df['Order Date'] == "Order Date"]
Out[22]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
519 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
1149 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
1155 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
2878 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
2893 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
... ... ... ... ... ... ...
10000 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
10387 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
11399 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
11468 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
11574 Order ID Product Quantity Ordered Price Each Order Date Purchase Address

355 rows × 6 columns

Vérifiez si tous les caractères du texte sont des chiffres :¶

In [23]:
"1624".isdigit()
Out[23]:
True
In [24]:
df.loc[~df['Order ID'].str.isdigit(), : ]
Out[24]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
519 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
1149 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
1155 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
2878 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
2893 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
... ... ... ... ... ... ...
10000 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
10387 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
11399 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
11468 Order ID Product Quantity Ordered Price Each Order Date Purchase Address
11574 Order ID Product Quantity Ordered Price Each Order Date Purchase Address

355 rows × 6 columns

In [26]:
df_clean = df.drop(df.loc[df['Order Date'] == "Order Date", :].index)
In [27]:
df_clean.loc[~df_clean['Order ID'].str.isdigit(), : ]
Out[27]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
In [28]:
df_clean.head()
Out[28]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
0 176558 USB-C Charging Cable 2 11.95 04/19/19 08:46 917 1st St, Dallas, TX 75001
2 176559 Bose SoundSport Headphones 1 99.99 04/07/19 22:30 682 Chestnut St, Boston, MA 02215
3 176560 Google Phone 1 600 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001
4 176560 Wired Headphones 1 11.99 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001
5 176561 Wired Headphones 1 11.99 04/30/19 09:27 333 8th St, Los Angeles, CA 90001

Convert columns to the correct type¶

In [34]:
import warnings
warnings.filterwarnings('ignore')
In [35]:
df_clean['Quantity Ordered'] = pd.to_numeric(df_clean['Quantity Ordered'])
# df_clean['Quantity Ordered'] = df_clean['Quantity Ordered'].astype('int')
In [37]:
df_clean['Price Each'] = pd.to_numeric(df_clean['Price Each'])
In [38]:
df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 182735 entries, 0 to 11685
Data columns (total 6 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   Order ID          182735 non-null  object 
 1   Product           182735 non-null  object 
 2   Quantity Ordered  182735 non-null  int64  
 3   Price Each        182735 non-null  float64
 4   Order Date        182735 non-null  object 
 5   Purchase Address  182735 non-null  object 
dtypes: float64(1), int64(1), object(4)
memory usage: 9.8+ MB
In [39]:
df_clean['Order Date'] = pd.to_datetime(df_clean['Order Date'])
In [40]:
df_clean.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 182735 entries, 0 to 11685
Data columns (total 6 columns):
 #   Column            Non-Null Count   Dtype         
---  ------            --------------   -----         
 0   Order ID          182735 non-null  object        
 1   Product           182735 non-null  object        
 2   Quantity Ordered  182735 non-null  int64         
 3   Price Each        182735 non-null  float64       
 4   Order Date        182735 non-null  datetime64[ns]
 5   Purchase Address  182735 non-null  object        
dtypes: datetime64[ns](1), float64(1), int64(1), object(3)
memory usage: 9.8+ MB
In [41]:
df['Order Date']
Out[41]:
0        04/19/19 08:46
2        04/07/19 22:30
3        04/12/19 14:38
4        04/12/19 14:38
5        04/30/19 09:27
              ...      
11681    09/17/19 20:56
11682    09/01/19 16:00
11683    09/23/19 07:39
11684    09/19/19 17:30
11685    09/30/19 00:18
Name: Order Date, Length: 186305, dtype: object
In [42]:
df_clean['Order Date']
Out[42]:
0       2019-04-19 08:46:00
2       2019-04-07 22:30:00
3       2019-04-12 14:38:00
4       2019-04-12 14:38:00
5       2019-04-30 09:27:00
                ...        
11681   2019-09-17 20:56:00
11682   2019-09-01 16:00:00
11683   2019-09-23 07:39:00
11684   2019-09-19 17:30:00
11685   2019-09-30 00:18:00
Name: Order Date, Length: 182735, dtype: datetime64[ns]

Analyse Exploratoire proprement dite¶

I. Quel est le meilleur mois de vente ? et Quel est le chiffre d'affaires de ce mois ?¶

In [43]:
df_clean.head()
Out[43]:
Order ID Product Quantity Ordered Price Each Order Date Purchase Address
0 176558 USB-C Charging Cable 2 11.95 2019-04-19 08:46:00 917 1st St, Dallas, TX 75001
2 176559 Bose SoundSport Headphones 1 99.99 2019-04-07 22:30:00 682 Chestnut St, Boston, MA 02215
3 176560 Google Phone 1 600.00 2019-04-12 14:38:00 669 Spruce St, Los Angeles, CA 90001
4 176560 Wired Headphones 1 11.99 2019-04-12 14:38:00 669 Spruce St, Los Angeles, CA 90001
5 176561 Wired Headphones 1 11.99 2019-04-30 09:27:00 333 8th St, Los Angeles, CA 90001
In [47]:
df_clean.index
Out[47]:
Int64Index([    0,     2,     3,     4,     5,     6,     7,     8,     9,
               10,
            ...
            11676, 11677, 11678, 11679, 11680, 11681, 11682, 11683, 11684,
            11685],
           dtype='int64', length=182735)
In [48]:
df_clean = df_clean.set_index('Order Date')
In [49]:
df_clean.head()
Out[49]:
Order ID Product Quantity Ordered Price Each Purchase Address
Order Date
2019-04-19 08:46:00 176558 USB-C Charging Cable 2 11.95 917 1st St, Dallas, TX 75001
2019-04-07 22:30:00 176559 Bose SoundSport Headphones 1 99.99 682 Chestnut St, Boston, MA 02215
2019-04-12 14:38:00 176560 Google Phone 1 600.00 669 Spruce St, Los Angeles, CA 90001
2019-04-12 14:38:00 176560 Wired Headphones 1 11.99 669 Spruce St, Los Angeles, CA 90001
2019-04-30 09:27:00 176561 Wired Headphones 1 11.99 333 8th St, Los Angeles, CA 90001
In [50]:
df_clean.sort_index(inplace=True)
In [51]:
df_clean['Month'] = df_clean.index.month_name()
In [52]:
df_clean.head()
Out[52]:
Order ID Product Quantity Ordered Price Each Purchase Address Month
Order Date
2019-01-01 03:07:00 147268 Wired Headphones 1 11.99 9 Lake St, New York City, NY 10001 January
2019-01-01 03:40:00 148041 USB-C Charging Cable 1 11.95 760 Church St, San Francisco, CA 94016 January
2019-01-01 04:56:00 149343 Apple Airpods Headphones 1 150.00 735 5th St, New York City, NY 10001 January
2019-01-01 05:53:00 149964 AAA Batteries (4-pack) 1 2.99 75 Jackson St, Dallas, TX 75001 January
2019-01-01 06:03:00 149350 USB-C Charging Cable 2 11.95 943 2nd St, Atlanta, GA 30301 January
In [53]:
df_clean['chiffre_daffaire'] = df_clean['Quantity Ordered'] * df_clean['Price Each']
In [54]:
df_clean.sample(5)
Out[54]:
Order ID Product Quantity Ordered Price Each Purchase Address Month chiffre_daffaire
Order Date
2019-05-03 22:01:00 198775 AA Batteries (4-pack) 1 3.84 516 Chestnut St, Boston, MA 02215 May 3.84
2019-02-20 13:17:00 152419 27in FHD Monitor 1 149.99 66 Washington St, San Francisco, CA 94016 February 149.99
2019-01-13 13:09:00 142639 27in 4K Gaming Monitor 1 389.99 345 South St, Seattle, WA 98101 January 389.99
2019-04-16 11:49:00 187268 Wired Headphones 1 11.99 811 Forest St, San Francisco, CA 94016 April 11.99
2019-01-23 18:55:00 144038 iPhone 1 700.00 751 Wilson St, Austin, TX 73301 January 700.00
In [55]:
df_clean.groupby('Month')["chiffre_daffaire"].sum().sort_values(ascending=False)
Out[55]:
Month
December     4.557905e+06
October      3.679254e+06
April        3.336376e+06
November     3.149785e+06
May          3.101881e+06
March        2.755969e+06
July         2.587445e+06
June         2.524465e+06
August       2.191698e+06
February     2.158127e+06
September    2.050361e+06
January      1.786511e+06
Name: chiffre_daffaire, dtype: float64
In [56]:
df_clean.groupby('Month').sum()["chiffre_daffaire"].sort_values(ascending=False).December
Out[56]:
4557905.420001525

Décembre est le mois avec le plus de ventes et le CA est de 4557905.420001525

In [57]:
df_clean['Purchase Address']
Out[57]:
Order Date
2019-01-01 03:07:00         9 Lake St, New York City, NY 10001
2019-01-01 03:40:00     760 Church St, San Francisco, CA 94016
2019-01-01 04:56:00        735 5th St, New York City, NY 10001
2019-01-01 05:53:00            75 Jackson St, Dallas, TX 75001
2019-01-01 06:03:00              943 2nd St, Atlanta, GA 30301
                                        ...                   
2020-01-01 04:13:00           825 Adams St, Portland, OR 97035
2020-01-01 04:21:00    754 Hickory St, New York City, NY 10001
2020-01-01 04:54:00      784 River St, San Francisco, CA 94016
2020-01-01 05:13:00     657 Spruce St, New York City, NY 10001
2020-01-01 05:13:00     657 Spruce St, New York City, NY 10001
Name: Purchase Address, Length: 182735, dtype: object
In [58]:
def get_ville(addresse):
  return addresse.split(',')[1].strip()
In [59]:
get_ville("760 Church St, San Francisco, CA 94016")
Out[59]:
'San Francisco'
In [60]:
df_clean['ville'] = df_clean['Purchase Address'].apply(get_ville)
In [61]:
df_clean.sample(5)
Out[61]:
Order ID Product Quantity Ordered Price Each Purchase Address Month chiffre_daffaire ville
Order Date
2019-11-14 11:41:00 284107 AAA Batteries (4-pack) 2 2.99 772 11th St, San Francisco, CA 94016 November 5.98 San Francisco
2019-12-22 22:20:00 305268 iPhone 1 700.00 516 Lincoln St, Los Angeles, CA 90001 December 700.00 Los Angeles
2019-02-17 03:50:00 154226 Wired Headphones 1 11.99 499 Cedar St, Los Angeles, CA 90001 February 11.99 Los Angeles
2019-09-30 15:30:00 252165 27in FHD Monitor 1 149.99 605 6th St, Los Angeles, CA 90001 September 149.99 Los Angeles
2019-03-11 21:53:00 165902 Lightning Charging Cable 1 14.95 337 10th St, Portland, OR 97035 March 14.95 Portland
In [62]:
df_clean['ville'].unique()
Out[62]:
array(['New York City', 'San Francisco', 'Dallas', 'Atlanta', 'Boston',
       'Seattle', 'Los Angeles', 'Portland', 'Austin'], dtype=object)
In [63]:
df_clean.groupby('ville').sum()["chiffre_daffaire"].sort_values(ascending=False)
Out[63]:
ville
San Francisco    8.124121e+06
Los Angeles      5.354040e+06
New York City    4.581659e+06
Boston           3.604081e+06
Atlanta          2.741642e+06
Dallas           2.717794e+06
Seattle          2.693049e+06
Portland         2.276649e+06
Austin           1.786746e+06
Name: chiffre_daffaire, dtype: float64

On a fait un meilleur chiffre d'affaire à San Francisco

In [64]:
ca_by_ville = df_clean.groupby('ville').sum()["chiffre_daffaire"].sort_values(ascending=False)
In [67]:
ca_by_ville.plot(kind='bar', figsize=(8, 6))
plt.ylabel('Chiffre daffaire en Millions de dollars')
plt.title('Chiffre daffaire par Ville')
plt.show()

A quelle heure devons-nous passer de la publicité pour augmenter nos ventes ?¶

A quelle heure on vend le plus ?

In [68]:
df_clean['heure'] = df_clean.index.hour
df_clean['time'] = df_clean.index.time
In [69]:
df_clean.head()
Out[69]:
Order ID Product Quantity Ordered Price Each Purchase Address Month chiffre_daffaire ville heure time
Order Date
2019-01-01 03:07:00 147268 Wired Headphones 1 11.99 9 Lake St, New York City, NY 10001 January 11.99 New York City 3 03:07:00
2019-01-01 03:40:00 148041 USB-C Charging Cable 1 11.95 760 Church St, San Francisco, CA 94016 January 11.95 San Francisco 3 03:40:00
2019-01-01 04:56:00 149343 Apple Airpods Headphones 1 150.00 735 5th St, New York City, NY 10001 January 150.00 New York City 4 04:56:00
2019-01-01 05:53:00 149964 AAA Batteries (4-pack) 1 2.99 75 Jackson St, Dallas, TX 75001 January 2.99 Dallas 5 05:53:00
2019-01-01 06:03:00 149350 USB-C Charging Cable 2 11.95 943 2nd St, Atlanta, GA 30301 January 23.90 Atlanta 6 06:03:00
In [70]:
df_clean.time.sort_values(ascending=False)
Out[70]:
Order Date
2019-11-02 23:59:00    23:59:00
2019-09-26 23:59:00    23:59:00
2019-11-04 23:59:00    23:59:00
2019-07-17 23:59:00    23:59:00
2019-07-01 23:59:00    23:59:00
                         ...   
2019-12-31 00:00:00    00:00:00
2019-12-31 00:00:00    00:00:00
2019-11-15 00:00:00    00:00:00
2019-11-30 00:00:00    00:00:00
2019-05-03 00:00:00    00:00:00
Name: time, Length: 182735, dtype: object
In [71]:
df_clean.groupby('heure')['chiffre_daffaire'].sum().sort_values(ascending=False)
Out[71]:
heure
19    2370005.59
12    2269676.91
11    2267169.73
20    2242856.37
18    2176209.65
13    2118394.83
17    2086082.23
14    2045085.42
21    2013288.02
10    1907179.29
15    1900151.56
16    1864530.92
9     1612498.90
22    1578173.21
8     1181363.76
23    1161803.96
7      734595.75
0      698694.76
1      447663.86
6      441523.14
2      234191.08
5      224313.72
4      161350.57
3      142976.54
Name: chiffre_daffaire, dtype: float64
In [72]:
ca_par_heure = pd.DataFrame(df_clean.groupby('heure')['chiffre_daffaire'].sum())
In [73]:
ca_par_heure
Out[73]:
chiffre_daffaire
heure
0 698694.76
1 447663.86
2 234191.08
3 142976.54
4 161350.57
5 224313.72
6 441523.14
7 734595.75
8 1181363.76
9 1612498.90
10 1907179.29
11 2267169.73
12 2269676.91
13 2118394.83
14 2045085.42
15 1900151.56
16 1864530.92
17 2086082.23
18 2176209.65
19 2370005.59
20 2242856.37
21 2013288.02
22 1578173.21
23 1161803.96
In [74]:
sns.lineplot(data=ca_par_heure['chiffre_daffaire'])
plt.xticks(ticks=range(0, 24))
plt.show()

On devrait afficher la publicité entre 10 et 11 et entre 18 et 19.

Quels sont les produits qui sont souvent achetés ensemble ?¶

In [75]:
df_clean.head()
Out[75]:
Order ID Product Quantity Ordered Price Each Purchase Address Month chiffre_daffaire ville heure time
Order Date
2019-01-01 03:07:00 147268 Wired Headphones 1 11.99 9 Lake St, New York City, NY 10001 January 11.99 New York City 3 03:07:00
2019-01-01 03:40:00 148041 USB-C Charging Cable 1 11.95 760 Church St, San Francisco, CA 94016 January 11.95 San Francisco 3 03:40:00
2019-01-01 04:56:00 149343 Apple Airpods Headphones 1 150.00 735 5th St, New York City, NY 10001 January 150.00 New York City 4 04:56:00
2019-01-01 05:53:00 149964 AAA Batteries (4-pack) 1 2.99 75 Jackson St, Dallas, TX 75001 January 2.99 Dallas 5 05:53:00
2019-01-01 06:03:00 149350 USB-C Charging Cable 2 11.95 943 2nd St, Atlanta, GA 30301 January 23.90 Atlanta 6 06:03:00
In [76]:
def concat_produit_par_order_id(p):
  return ";".join(p)
In [77]:
produit_ensemble = df_clean.drop_duplicates(['Order ID'])
len(produit_ensemble)
Out[77]:
175449
In [78]:
df_clean['sales_together'] = df_clean.groupby('Order ID')['Product'].apply(concat_produit_par_order_id)
In [79]:
df_clean.head()
Out[79]:
Order ID Product Quantity Ordered Price Each Purchase Address Month chiffre_daffaire ville heure time sales_together
Order Date
2019-01-01 03:07:00 147268 Wired Headphones 1 11.99 9 Lake St, New York City, NY 10001 January 11.99 New York City 3 03:07:00 NaN
2019-01-01 03:40:00 148041 USB-C Charging Cable 1 11.95 760 Church St, San Francisco, CA 94016 January 11.95 San Francisco 3 03:40:00 NaN
2019-01-01 04:56:00 149343 Apple Airpods Headphones 1 150.00 735 5th St, New York City, NY 10001 January 150.00 New York City 4 04:56:00 NaN
2019-01-01 05:53:00 149964 AAA Batteries (4-pack) 1 2.99 75 Jackson St, Dallas, TX 75001 January 2.99 Dallas 5 05:53:00 NaN
2019-01-01 06:03:00 149350 USB-C Charging Cable 2 11.95 943 2nd St, Atlanta, GA 30301 January 23.90 Atlanta 6 06:03:00 NaN
In [80]:
len(df_clean['sales_together'])
Out[80]:
182735
In [81]:
produit_ensemble['groupby'] = df_clean.groupby('Order ID')['Product'].apply(concat_produit_par_order_id)
In [82]:
produit_ensemble.head()
Out[82]:
Order ID Product Quantity Ordered Price Each Purchase Address Month chiffre_daffaire ville heure time groupby
Order Date
2019-01-01 03:07:00 147268 Wired Headphones 1 11.99 9 Lake St, New York City, NY 10001 January 11.99 New York City 3 03:07:00 NaN
2019-01-01 03:40:00 148041 USB-C Charging Cable 1 11.95 760 Church St, San Francisco, CA 94016 January 11.95 San Francisco 3 03:40:00 NaN
2019-01-01 04:56:00 149343 Apple Airpods Headphones 1 150.00 735 5th St, New York City, NY 10001 January 150.00 New York City 4 04:56:00 NaN
2019-01-01 05:53:00 149964 AAA Batteries (4-pack) 1 2.99 75 Jackson St, Dallas, TX 75001 January 2.99 Dallas 5 05:53:00 NaN
2019-01-01 06:03:00 149350 USB-C Charging Cable 2 11.95 943 2nd St, Atlanta, GA 30301 January 23.90 Atlanta 6 06:03:00 NaN
In [83]:
df_clean.groupby('Order ID')['Product'].apply(concat_produit_par_order_id)
Out[83]:
Order ID
141234                        iPhone
141235      Lightning Charging Cable
141236              Wired Headphones
141237              27in FHD Monitor
141238              Wired Headphones
                     ...            
319666      Lightning Charging Cable
319667         AA Batteries (4-pack)
319668               Vareebadd Phone
319669              Wired Headphones
319670    Bose SoundSport Headphones
Name: Product, Length: 175449, dtype: object
In [84]:
df_clean.groupby('Order ID')['Product'].groups
Out[84]:
{'141234': [2019-01-22 21:25:00], '141235': [2019-01-28 14:15:00], '141236': [2019-01-17 13:33:00], '141237': [2019-01-05 20:33:00], '141238': [2019-01-25 11:59:00], '141239': [2019-01-29 20:22:00], '141240': [2019-01-26 12:16:00], '141241': [2019-01-05 12:04:00], '141242': [2019-01-01 10:30:00], '141243': [2019-01-22 21:20:00], '141244': [2019-01-07 11:29:00], '141245': [2019-01-31 10:12:00], '141246': [2019-01-09 18:57:00], '141247': [2019-01-25 19:19:00], '141248': [2019-01-03 21:54:00], '141249': [2019-01-05 17:20:00], '141250': [2019-01-10 11:20:00], '141251': [2019-01-24 08:13:00], '141252': [2019-01-30 09:28:00], '141253': [2019-01-17 00:09:00], '141254': [2019-01-08 11:51:00], '141255': [2019-01-09 20:55:00], '141256': [2019-01-29 10:40:00], '141257': [2019-01-12 18:51:00], '141258': [2019-01-19 21:47:00], '141259': [2019-01-20 17:26:00], '141260': [2019-01-01 22:00:00], '141261': [2019-01-09 18:14:00], '141262': [2019-01-16 12:35:00], '141263': [2019-01-11 23:33:00], '141264': [2019-01-03 09:46:00], '141265': [2019-01-01 16:52:00], '141266': [2019-01-02 22:21:00], '141267': [2019-01-09 08:28:00], '141268': [2019-01-14 10:13:00], '141269': [2019-01-03 20:05:00], '141270': [2019-01-27 23:10:00], '141271': [2019-01-30 10:51:00], '141272': [2019-01-12 13:09:00], '141273': [2019-01-29 12:04:00], '141274': [2019-01-17 11:30:00], '141275': [2019-01-07 16:06:00, 2019-01-07 16:06:00], '141276': [2019-01-21 22:23:00], '141277': [2019-01-13 19:07:00], '141278': [2019-01-26 12:14:00], '141279': [2019-01-03 19:10:00], '141280': [2019-01-20 16:10:00], '141281': [2019-01-05 16:51:00], '141282': [2019-01-11 18:10:00], '141283': [2019-01-02 16:16:00], '141284': [2019-01-29 18:30:00], '141285': [2019-01-14 14:13:00], '141286': [2019-01-02 20:33:00], '141287': [2019-01-31 08:38:00], '141288': [2019-01-19 08:17:00], '141289': [2019-01-28 11:17:00], '141290': [2019-01-02 08:25:00, 2019-01-02 08:25:00], '141291': [2019-01-26 18:11:00], '141292': [2019-01-21 08:46:00], '141293': [2019-01-18 12:21:00], '141294': [2019-01-25 08:12:00], '141295': [2019-01-06 20:06:00], '141296': [2019-01-20 00:21:00], '141297': [2019-01-04 11:09:00], '141298': [2019-01-21 13:24:00], '141299': [2019-01-31 23:23:00], '141300': [2019-01-23 10:21:00], '141301': [2019-01-27 11:58:00], '141302': [2019-01-18 20:12:00], '141303': [2019-01-19 09:23:00], '141304': [2019-01-28 13:10:00], '141305': [2019-01-27 16:51:00], '141306': [2019-01-19 10:25:00], '141307': [2019-01-11 11:18:00], '141308': [2019-01-12 12:00:00], '141309': [2019-01-20 07:02:00], '141310': [2019-01-07 13:32:00], '141311': [2019-01-28 12:45:00], '141312': [2019-01-08 23:20:00], '141313': [2019-01-09 15:03:00], '141314': [2019-01-13 23:51:00], '141315': [2019-01-10 01:32:00], '141316': [2019-01-01 07:26:00], '141317': [2019-01-21 11:18:00], '141318': [2019-01-09 15:21:00], '141319': [2019-01-01 10:43:00], '141320': [2019-01-24 13:37:00], '141321': [2019-01-10 09:07:00], '141322': [2019-01-12 21:56:00], '141323': [2019-01-17 21:54:00], '141324': [2019-01-04 23:36:00], '141325': [2019-01-23 15:17:00], '141326': [2019-01-05 13:49:00], '141327': [2019-01-31 01:38:00], '141328': [2019-01-06 23:18:00], '141329': [2019-01-01 16:01:00], '141330': [2019-01-21 13:04:00], '141331': [2019-01-09 18:32:00], '141332': [2019-01-19 18:52:00], '141333': [2019-01-01 13:58:00], ...}
In [85]:
df_clean['sales_product_together'] = df_clean.groupby('Order ID')['Product'].transform(lambda x: '; '.join(set(x)))
In [86]:
df_clean.head()
Out[86]:
Order ID Product Quantity Ordered Price Each Purchase Address Month chiffre_daffaire ville heure time sales_together sales_product_together
Order Date
2019-01-01 03:07:00 147268 Wired Headphones 1 11.99 9 Lake St, New York City, NY 10001 January 11.99 New York City 3 03:07:00 NaN Wired Headphones
2019-01-01 03:40:00 148041 USB-C Charging Cable 1 11.95 760 Church St, San Francisco, CA 94016 January 11.95 San Francisco 3 03:40:00 NaN USB-C Charging Cable
2019-01-01 04:56:00 149343 Apple Airpods Headphones 1 150.00 735 5th St, New York City, NY 10001 January 150.00 New York City 4 04:56:00 NaN Apple Airpods Headphones
2019-01-01 05:53:00 149964 AAA Batteries (4-pack) 1 2.99 75 Jackson St, Dallas, TX 75001 January 2.99 Dallas 5 05:53:00 NaN AAA Batteries (4-pack)
2019-01-01 06:03:00 149350 USB-C Charging Cable 2 11.95 943 2nd St, Atlanta, GA 30301 January 23.90 Atlanta 6 06:03:00 NaN USB-C Charging Cable
In [87]:
pduit = pd.DataFrame(df_clean['sales_product_together'].value_counts())
pduit
Out[87]:
sales_product_together
Lightning Charging Cable 19632
AAA Batteries (4-pack) 19597
USB-C Charging Cable 19563
AA Batteries (4-pack) 19537
Wired Headphones 16979
... ...
Google Phone; LG Washing Machine 2
27in 4K Gaming Monitor; LG Washing Machine 2
AA Batteries (4-pack); LG Washing Machine 2
27in FHD Monitor; LG Washing Machine 2
Vareebadd Phone; LG Dryer 2

241 rows × 1 columns

In [88]:
pduit.index.to_list()
Out[88]:
['Lightning Charging Cable',
 'AAA Batteries (4-pack)',
 'USB-C Charging Cable',
 'AA Batteries (4-pack)',
 'Wired Headphones',
 'Apple Airpods Headphones',
 'Bose SoundSport Headphones',
 '27in FHD Monitor',
 '27in 4K Gaming Monitor',
 '34in Ultrawide Monitor',
 'iPhone',
 'Flatscreen TV',
 'Macbook Pro Laptop',
 'ThinkPad Laptop',
 '20in Monitor',
 'Google Phone',
 'USB-C Charging Cable; Google Phone',
 'Vareebadd Phone',
 'iPhone; Lightning Charging Cable',
 'Lightning Charging Cable; iPhone',
 'iPhone; Wired Headphones',
 'LG Washing Machine',
 'LG Dryer',
 'USB-C Charging Cable; Vareebadd Phone',
 'Google Phone; Wired Headphones',
 'iPhone; Apple Airpods Headphones',
 'Bose SoundSport Headphones; Google Phone',
 'USB-C Charging Cable; Google Phone; Wired Headphones',
 'Vareebadd Phone; Wired Headphones',
 'USB-C Charging Cable; Lightning Charging Cable',
 'USB-C Charging Cable; AAA Batteries (4-pack)',
 'AAA Batteries (4-pack); AA Batteries (4-pack)',
 'AAA Batteries (4-pack); Wired Headphones',
 'USB-C Charging Cable; Wired Headphones',
 'AAA Batteries (4-pack); Lightning Charging Cable',
 'AA Batteries (4-pack); Wired Headphones',
 'AAA Batteries (4-pack); Apple Airpods Headphones',
 'USB-C Charging Cable; AA Batteries (4-pack)',
 'AA Batteries (4-pack); Apple Airpods Headphones',
 'Lightning Charging Cable; Apple Airpods Headphones',
 'Lightning Charging Cable; Wired Headphones',
 'Bose SoundSport Headphones; Lightning Charging Cable',
 'USB-C Charging Cable; Apple Airpods Headphones',
 'Bose SoundSport Headphones; Vareebadd Phone',
 'Bose SoundSport Headphones; AA Batteries (4-pack)',
 'USB-C Charging Cable; Bose SoundSport Headphones',
 'Lightning Charging Cable; iPhone; Wired Headphones',
 'Lightning Charging Cable; AA Batteries (4-pack)',
 'USB-C Charging Cable; Bose SoundSport Headphones; Google Phone',
 'AA Batteries (4-pack); Lightning Charging Cable',
 'Bose SoundSport Headphones; Apple Airpods Headphones',
 'Apple Airpods Headphones; Wired Headphones',
 'USB-C Charging Cable; Vareebadd Phone; Wired Headphones',
 'Bose SoundSport Headphones; Wired Headphones',
 'USB-C Charging Cable; 27in FHD Monitor',
 'AAA Batteries (4-pack); 27in FHD Monitor',
 'Lightning Charging Cable; 27in FHD Monitor',
 'iPhone; Lightning Charging Cable; Apple Airpods Headphones',
 'AA Batteries (4-pack); 34in Ultrawide Monitor',
 'Bose SoundSport Headphones; Google Phone; Wired Headphones',
 'Lightning Charging Cable; 34in Ultrawide Monitor',
 '27in 4K Gaming Monitor; Lightning Charging Cable',
 'iPhone; Lightning Charging Cable; Wired Headphones',
 '27in 4K Gaming Monitor; AAA Batteries (4-pack)',
 '34in Ultrawide Monitor; Wired Headphones',
 'AAA Batteries (4-pack); Bose SoundSport Headphones',
 'Bose SoundSport Headphones; AAA Batteries (4-pack)',
 'AA Batteries (4-pack); 27in FHD Monitor',
 '27in 4K Gaming Monitor; Wired Headphones',
 'Lightning Charging Cable; iPhone; Apple Airpods Headphones',
 'AAA Batteries (4-pack); iPhone',
 '27in 4K Gaming Monitor; AA Batteries (4-pack)',
 'USB-C Charging Cable; 34in Ultrawide Monitor',
 'Bose SoundSport Headphones; 27in FHD Monitor',
 'USB-C Charging Cable; 20in Monitor',
 '20in Monitor; Lightning Charging Cable',
 '27in FHD Monitor; Apple Airpods Headphones',
 'Lightning Charging Cable; Google Phone',
 'AAA Batteries (4-pack); 34in Ultrawide Monitor',
 'USB-C Charging Cable; iPhone',
 'Wired Headphones; Apple Airpods Headphones',
 'Macbook Pro Laptop; Lightning Charging Cable',
 '27in FHD Monitor; Wired Headphones',
 '27in 4K Gaming Monitor; Apple Airpods Headphones',
 'AAA Batteries (4-pack); Flatscreen TV',
 'AA Batteries (4-pack); Google Phone',
 'USB-C Charging Cable; ThinkPad Laptop',
 'AA Batteries (4-pack); Flatscreen TV',
 'USB-C Charging Cable; Bose SoundSport Headphones; Vareebadd Phone',
 'Lightning Charging Cable; Flatscreen TV',
 'AAA Batteries (4-pack); Google Phone',
 'USB-C Charging Cable; Macbook Pro Laptop',
 'Macbook Pro Laptop; Wired Headphones',
 'iPhone; Wired Headphones; Apple Airpods Headphones',
 'ThinkPad Laptop; Lightning Charging Cable',
 '27in 4K Gaming Monitor; Bose SoundSport Headphones',
 'AA Batteries (4-pack); iPhone',
 '20in Monitor; Wired Headphones',
 'Macbook Pro Laptop; AA Batteries (4-pack)',
 'ThinkPad Laptop; AA Batteries (4-pack)',
 'AAA Batteries (4-pack); Macbook Pro Laptop',
 'Google Phone; Apple Airpods Headphones',
 'USB-C Charging Cable; Flatscreen TV',
 'ThinkPad Laptop; Apple Airpods Headphones',
 'Macbook Pro Laptop; Apple Airpods Headphones',
 'Bose SoundSport Headphones; Macbook Pro Laptop',
 'Bose SoundSport Headphones; 34in Ultrawide Monitor',
 'iPhone; Apple Airpods Headphones; Wired Headphones',
 '20in Monitor; AA Batteries (4-pack)',
 '27in 4K Gaming Monitor; USB-C Charging Cable',
 'AAA Batteries (4-pack); ThinkPad Laptop',
 '20in Monitor; Bose SoundSport Headphones',
 'ThinkPad Laptop; Wired Headphones',
 '34in Ultrawide Monitor; Apple Airpods Headphones',
 'Flatscreen TV; Apple Airpods Headphones',
 '20in Monitor; Apple Airpods Headphones',
 'ThinkPad Laptop; AAA Batteries (4-pack)',
 'iPhone; 34in Ultrawide Monitor',
 '27in 4K Gaming Monitor; Macbook Pro Laptop',
 'Bose SoundSport Headphones; Flatscreen TV',
 'Bose SoundSport Headphones; iPhone',
 'AA Batteries (4-pack); Vareebadd Phone',
 'iPhone; AA Batteries (4-pack)',
 'Flatscreen TV; Wired Headphones',
 '34in Ultrawide Monitor; 27in FHD Monitor',
 '20in Monitor; AAA Batteries (4-pack)',
 'Vareebadd Phone; Apple Airpods Headphones',
 'iPhone; Flatscreen TV',
 '27in 4K Gaming Monitor; 27in FHD Monitor',
 '27in 4K Gaming Monitor; iPhone',
 'Macbook Pro Laptop; 34in Ultrawide Monitor',
 '34in Ultrawide Monitor; Flatscreen TV',
 'Bose SoundSport Headphones; ThinkPad Laptop',
 'Macbook Pro Laptop; 27in FHD Monitor',
 'USB-C Charging Cable; 27in 4K Gaming Monitor',
 'Macbook Pro Laptop; iPhone',
 '20in Monitor; Macbook Pro Laptop',
 '27in 4K Gaming Monitor; 34in Ultrawide Monitor',
 'ThinkPad Laptop; iPhone',
 'ThinkPad Laptop; Flatscreen TV',
 'Lightning Charging Cable; LG Washing Machine',
 'iPhone; Google Phone',
 '27in 4K Gaming Monitor; Google Phone',
 'ThinkPad Laptop; 27in FHD Monitor',
 'AAA Batteries (4-pack); Vareebadd Phone',
 'Macbook Pro Laptop; Google Phone',
 'AA Batteries (4-pack); LG Dryer',
 '27in 4K Gaming Monitor; ThinkPad Laptop',
 'ThinkPad Laptop; Bose SoundSport Headphones',
 '27in 4K Gaming Monitor; Flatscreen TV',
 'ThinkPad Laptop; Google Phone',
 'Google Phone; 27in FHD Monitor',
 '27in FHD Monitor; Flatscreen TV',
 'USB-C Charging Cable; AAA Batteries (4-pack); Google Phone',
 'AAA Batteries (4-pack); iPhone; Apple Airpods Headphones',
 '20in Monitor; 27in FHD Monitor',
 'Vareebadd Phone; Google Phone',
 'ThinkPad Laptop; 34in Ultrawide Monitor',
 'Google Phone; Flatscreen TV',
 'USB-C Charging Cable; Bose SoundSport Headphones; Google Phone; Wired Headphones',
 'USB-C Charging Cable; Bose SoundSport Headphones; Vareebadd Phone; Wired Headphones',
 'iPhone; Vareebadd Phone',
 'Macbook Pro Laptop; Flatscreen TV',
 'Vareebadd Phone; 34in Ultrawide Monitor',
 'AAA Batteries (4-pack); LG Washing Machine',
 '20in Monitor; Google Phone',
 'ThinkPad Laptop; Macbook Pro Laptop',
 'Bose SoundSport Headphones; Vareebadd Phone; Wired Headphones',
 'Lightning Charging Cable; Vareebadd Phone',
 'USB-C Charging Cable; AA Batteries (4-pack); Google Phone',
 'LG Dryer; Apple Airpods Headphones',
 'Lightning Charging Cable; LG Dryer',
 'LG Washing Machine; Wired Headphones',
 '34in Ultrawide Monitor; Google Phone',
 'Wired Headphones; iPhone; Lightning Charging Cable; Google Phone; Apple Airpods Headphones',
 '20in Monitor; Flatscreen TV',
 'AAA Batteries (4-pack); LG Dryer',
 'AA Batteries (4-pack); Lightning Charging Cable; iPhone; Wired Headphones',
 'USB-C Charging Cable; Apple Airpods Headphones; Google Phone; Wired Headphones',
 '20in Monitor; ThinkPad Laptop',
 'iPhone; Apple Airpods Headphones; Lightning Charging Cable; Wired Headphones',
 'Vareebadd Phone; 27in FHD Monitor',
 'USB-C Charging Cable; Lightning Charging Cable; iPhone; Wired Headphones',
 '27in 4K Gaming Monitor; 20in Monitor',
 'Bose SoundSport Headphones; LG Washing Machine',
 'USB-C Charging Cable; iPhone; Google Phone; Wired Headphones',
 '20in Monitor; LG Washing Machine',
 'Macbook Pro Laptop; LG Washing Machine',
 'LG Dryer; 27in FHD Monitor',
 '27in 4K Gaming Monitor; Vareebadd Phone',
 '20in Monitor; iPhone',
 '27in 4K Gaming Monitor; LG Dryer',
 'USB-C Charging Cable; Bose SoundSport Headphones; 34in Ultrawide Monitor; Google Phone',
 'iPhone; LG Washing Machine',
 'USB-C Charging Cable; AA Batteries (4-pack); Google Phone; Wired Headphones',
 'USB-C Charging Cable; Google Phone; 27in FHD Monitor; Wired Headphones',
 'Vareebadd Phone; Flatscreen TV',
 'Lightning Charging Cable; Apple Airpods Headphones; iPhone; Wired Headphones',
 'ThinkPad Laptop; Vareebadd Phone',
 'iPhone; Wired Headphones; Lightning Charging Cable; Apple Airpods Headphones',
 '20in Monitor; 34in Ultrawide Monitor',
 'USB-C Charging Cable; Vareebadd Phone; Apple Airpods Headphones',
 'AA Batteries (4-pack); Lightning Charging Cable; iPhone',
 'Bose SoundSport Headphones; iPhone; Apple Airpods Headphones',
 '27in 4K Gaming Monitor; iPhone; Lightning Charging Cable',
 'USB-C Charging Cable; Vareebadd Phone; Google Phone',
 'USB-C Charging Cable; Lightning Charging Cable; iPhone',
 'iPhone; Lightning Charging Cable; Flatscreen TV',
 'Bose SoundSport Headphones; Google Phone; Apple Airpods Headphones',
 'iPhone; Vareebadd Phone; Wired Headphones',
 'Bose SoundSport Headphones; Google Phone; 27in FHD Monitor',
 'USB-C Charging Cable; iPhone; Vareebadd Phone',
 'AA Batteries (4-pack); Google Phone; Wired Headphones',
 'Bose SoundSport Headphones; Lightning Charging Cable; Google Phone',
 'Lightning Charging Cable; Google Phone; iPhone',
 'USB-C Charging Cable; Lightning Charging Cable; Google Phone',
 'USB-C Charging Cable; Google Phone; 27in FHD Monitor',
 'Macbook Pro Laptop; Google Phone; Wired Headphones',
 'Apple Airpods Headphones; Vareebadd Phone; Wired Headphones',
 'Lightning Charging Cable; 34in Ultrawide Monitor; iPhone',
 '27in 4K Gaming Monitor; Vareebadd Phone; Wired Headphones',
 'Lightning Charging Cable; Vareebadd Phone; iPhone',
 'USB-C Charging Cable; Google Phone; Apple Airpods Headphones',
 'Bose SoundSport Headphones; Vareebadd Phone; Flatscreen TV',
 'iPhone; Lightning Charging Cable; AA Batteries (4-pack)',
 'USB-C Charging Cable; iPhone; Lightning Charging Cable',
 'LG Dryer; Wired Headphones',
 'LG Dryer; Flatscreen TV',
 'Google Phone; LG Dryer',
 'Apple Airpods Headphones; LG Washing Machine',
 'ThinkPad Laptop; LG Dryer',
 'Wired Headphones; LG Washing Machine',
 'USB-C Charging Cable; LG Dryer',
 'iPhone; 27in FHD Monitor',
 '34in Ultrawide Monitor; LG Washing Machine',
 'Google Phone; 34in Ultrawide Monitor',
 'Google Phone; LG Washing Machine',
 '27in 4K Gaming Monitor; LG Washing Machine',
 'AA Batteries (4-pack); LG Washing Machine',
 '27in FHD Monitor; LG Washing Machine',
 'Vareebadd Phone; LG Dryer']
In [89]:
df_clean.groupby('Product').sum()['Quantity Ordered'].sort_values(ascending=False)
Out[89]:
Product
AAA Batteries (4-pack)        30487
AA Batteries (4-pack)         27148
USB-C Charging Cable          23555
Lightning Charging Cable      22841
Wired Headphones              20201
Apple Airpods Headphones      15383
Bose SoundSport Headphones    13236
27in FHD Monitor               7429
iPhone                         6732
27in 4K Gaming Monitor         6134
34in Ultrawide Monitor         6076
Google Phone                   5440
Flatscreen TV                  4724
Macbook Pro Laptop             4645
20in Monitor                   4058
ThinkPad Laptop                4054
Vareebadd Phone                2023
LG Washing Machine              649
LG Dryer                        640
Name: Quantity Ordered, dtype: int64
In [90]:
def filter_number(pdu):
  return len(pdu.split(';')) > 1
In [91]:
pduit['name'] =  pduit.index
In [92]:
pduit['n'] = pduit.name.apply(filter_number)
In [93]:
pduit
Out[93]:
sales_product_together name n
Lightning Charging Cable 19632 Lightning Charging Cable False
AAA Batteries (4-pack) 19597 AAA Batteries (4-pack) False
USB-C Charging Cable 19563 USB-C Charging Cable False
AA Batteries (4-pack) 19537 AA Batteries (4-pack) False
Wired Headphones 16979 Wired Headphones False
... ... ... ...
Google Phone; LG Washing Machine 2 Google Phone; LG Washing Machine True
27in 4K Gaming Monitor; LG Washing Machine 2 27in 4K Gaming Monitor; LG Washing Machine True
AA Batteries (4-pack); LG Washing Machine 2 AA Batteries (4-pack); LG Washing Machine True
27in FHD Monitor; LG Washing Machine 2 27in FHD Monitor; LG Washing Machine True
Vareebadd Phone; LG Dryer 2 Vareebadd Phone; LG Dryer True

241 rows × 3 columns

Lightning Charging Cable; iPhone

In [99]:
df_clean.groupby('Product').sum()['Price Each'].sort_values()
Out[99]:
Product
AAA Batteries (4-pack)          60676.07
AA Batteries (4-pack)           77621.76
Wired Headphones               222438.48
USB-C Charging Cable           257199.85
Lightning Charging Cable       318554.60
LG Dryer                       384000.00
LG Washing Machine             389400.00
20in Monitor                   443479.68
Vareebadd Phone                808000.00
27in FHD Monitor              1107826.14
Bose SoundSport Headphones    1310768.91
Flatscreen TV                 1411500.00
Apple Airpods Headphones      2290950.00
34in Ultrawide Monitor        2301979.42
27in 4K Gaming Monitor        2387518.78
Google Phone                  3259800.00
ThinkPad Laptop               4051959.48
iPhone                        4707500.00
Macbook Pro Laptop            7889700.00
Name: Price Each, dtype: float64
In [ ]: