Pandas
Data Exploration
Paint cell by value:
It will look like :
DataFrame info (e.g. memory) :
Such as memory used
all_data.info()
"""e.g.
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12970 entries, 0 to 4276
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 12970 non-null object
1 HomePlanet 12682 non-null object
.......
dtypes: float64(6), object(8)
memory usage: 1.5+ MB
"""
Display null value count :
Display some statistic summary of data :
Display unique values in a column :
Display counting of unique values in a column :
all_data['HomePlanet'].value_counts()
"""e.g.
Earth 6865
Europa 3133
Mars 2684
Name: HomePlanet, dtype: int64
"""
The number of unique values in each column :
all_data.nunique(
# dropna=False
)
"""e.g.
PassengerId 12970
HomePlanet 3
CryoSleep 2
Cabin 9825
Destination 3
Age 80
VIP 2
RoomService 1578
FoodCourt 1953
ShoppingMall 1367
Spa 1679
VRDeck 1642
Name 12629
Transported 2
dtype: int64
"""
Display some unique values in each column :
Note: only show the columns with < 20 unique values
all_uniq = { i: all_data[i].unique() for i in all_data.columns if len(all_data[i].unique()) < 20 }
all_uniq
Helper func - display multiple DataFrame together
from IPython.core.display import HTML
def multi_table_vertical(lt) :
return HTML(" <hr> ".join(table._repr_html_() for table in lt))
# Usage:
multi_table_vertical([all_st, train_st, test_st])
Data Transform
To list of dict :
df.to_dict('records')
"""
[{'customer': 1L, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'},
{'customer': 2L, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'},
{'customer': 3L, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]
"""