Pandas Intro (short)
B.Sc. course, University of Debrecen, Department of Data Science and Visualization, 2024
Basics
Pandas
# !pip install pandas matplotlib  # matplotlib is only needed for the .plot() calls
import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/titanic_train.csv")
df.head()
df.head(2)
df.tail()
df.info()
df.describe()
df.to_numpy()
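The inspection calls above can be tried offline on a small frame. This is a minimal sketch that assumes a hypothetical toy DataFrame with a few Titanic-like columns (the values are made up, not taken from the CSV). It shows that `describe()` skips missing values in its `count` row and that `to_numpy()` upcasts mixed int/float columns to one common dtype.

```python
import pandas as pd

# Hypothetical toy data with a few Titanic-like columns; the values are made up.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 0, 0, 1],
    "Pclass": [3, 1, 3, 2, 3, 1],
    "Age": [22.0, 38.0, 26.0, None, 35.0, 54.0],
})

print(df.head(2))   # first 2 rows
print(df.tail(2))   # last 2 rows
print(df.shape)     # (6, 4): (number of rows, number of columns)

stats = df.describe()              # count, mean, std, min, quartiles, max per numeric column
print(stats.loc["count", "Age"])   # 5.0: the missing Age value is not counted

arr = df.to_numpy()  # 2-D NumPy array; int and float columns are upcast to float64
print(arr.dtype)     # float64
```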
Extracting information
df["Name"].head()
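df[0:5]  # positional row slice, end-exclusive: rows 0 to 4
df.loc[:5, ["Name"]]  # .loc is label-based and end-inclusive: rows 0 to 5
type(df.loc[:5, ["Name"]])  # DataFrame, because the column is given as a list
df.loc[:5, "Name"]
type(df.loc[:5, "Name"])  # Series, because a single column label is given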
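The difference between `[]` slicing and `.loc` slicing is easy to miss, so here is a small sketch on a hypothetical seven-row toy DataFrame (not the Titanic data). It counts the rows each selection returns and checks whether the result is a DataFrame or a Series.

```python
import pandas as pd

# Hypothetical toy DataFrame with the default integer index 0..6.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Pclass": [3, 1, 3, 2, 3, 1, 2],
})

names = df["Name"]                # one column -> Series
by_position = df[0:5]             # [] with a slice selects rows by position, end-exclusive
as_frame = df.loc[:5, ["Name"]]   # list of column labels -> DataFrame
as_series = df.loc[:5, "Name"]    # single column label -> Series

print(len(by_position))                          # 5 (positions 0..4)
print(type(as_frame).__name__, len(as_frame))    # DataFrame 6 (labels 0..5, end included)
print(type(as_series).__name__, len(as_series))  # Series 6
```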
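df.iloc[:5]  # .iloc is position-based: the first 5 rows
df.iloc[4]  # a single row (position 4)
type(df.iloc[4])  # Series
df.iloc[:3, -2:]  # first 3 rows, last 2 columns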
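With `.iloc`, both rows and columns are addressed purely by position, so column order matters. The following sketch uses a hypothetical toy DataFrame (made-up values) to show that a single integer returns one row as a Series indexed by the column names, while negative column positions count from the end.

```python
import pandas as pd

# Hypothetical toy DataFrame; the values are made up for illustration.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Pclass": [3, 1, 3, 2, 3, 1],
    "Fare": [7.0, 70.0, 8.0, 13.0, 8.0, 52.0],
    "Embarked": ["S", "C", "S", "S", "S", "S"],
})

first_five = df.iloc[:5]     # rows at positions 0-4 (end-exclusive, like Python slicing)
row = df.iloc[4]             # one integer -> the fifth row as a Series
block = df.iloc[:3, -2:]     # first 3 rows, last 2 columns

print(type(row).__name__)       # Series
print(row.index.tolist())       # ['PassengerId', 'Pclass', 'Fare', 'Embarked']
print(block.shape)              # (3, 2)
print(block.columns.tolist())   # ['Fare', 'Embarked']
```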
df[df["Pclass"] == 3]  # keep only third-class passengers
df["Pclass"] == 3  # the boolean mask used above: one True/False per row
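Boolean filtering works by passing a True/False Series of the same length to `[]`. This sketch assumes a hypothetical five-row toy DataFrame. It also shows how several conditions are usually combined: with `&` (and) or `|` (or), with each comparison wrapped in parentheses, because Python's `and`/`or` do not work element-wise on a Series.

```python
import pandas as pd

# Hypothetical toy data; the values are made up.
df = pd.DataFrame({"Pclass": [3, 1, 3, 2, 3], "Survived": [0, 1, 1, 0, 0]})

mask = df["Pclass"] == 3      # boolean Series, one True/False per row
third_class = df[mask]        # keeps only the rows where mask is True

# Combine conditions with & and wrap each comparison in parentheses.
third_survivors = df[(df["Pclass"] == 3) & (df["Survived"] == 1)]

print(mask.tolist())                   # [True, False, True, False, True]
print(len(third_class))                # 3
print(third_survivors.index.tolist())  # [2]
```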
df.head(1)
Setting values
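df.at[0, "Age"]  # .at reads a single value by row label and column label
df.at[0, "Age"] = 10
df.head(1)
df.iat[0, 2] = 1  # .iat uses positions: row 0, column 2 ("Pclass")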
df.head(1)
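`.at` and `.iat` are the scalar counterparts of `.loc` and `.iloc`. A minimal sketch on a hypothetical two-row toy DataFrame, whose column order (`Pclass` at position 2) is chosen to mirror the Titanic file:

```python
import pandas as pd

# Hypothetical toy data; column position 2 is "Pclass", as in the Titanic CSV.
df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Survived": [0, 1],
    "Pclass": [3, 1],
    "Age": [22.0, 38.0],
})

print(df.at[0, "Age"])   # label-based scalar read: row label 0, column "Age"
df.at[0, "Age"] = 10     # label-based scalar write (stored as 10.0 in the float column)
df.iat[0, 2] = 1         # position-based write: row position 0, column position 2 ("Pclass")
print(df.head(1))
```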
Beware: assigning a DataFrame to a new variable does not copy it!
df2 = df  # df2 is another name for the same object
df2.iat[0, 2] = 0
df2.head(1)
df.head(1)
df.iat[0, 2] = 1
df2.head(1)
df.head(1)
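A plain assignment only binds a second name to the same object, so the effect above can be confirmed directly with Python's `is` operator. This is a small sketch on a hypothetical one-column toy DataFrame:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Pclass": [3, 1, 3]})

df2 = df                 # no copy: df2 is just a second name for the same object
print(df2 is df)         # True

df2.iat[0, 0] = 0        # change the value through df2 ...
print(df.iat[0, 0])      # 0: ... and the change is visible through df as well
```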
Use the .copy() method to create an independent DataFrame!
df2 = df.copy()  # deep copy by default: changes to df2 do not affect df
df2.iat[0, 2] = 0
df2.head(1)
df.head(1)
df.iat[0, 2] = 2
df.head(1)
df2.head(1)
df.iat[0, 2] = 1
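To contrast with plain assignment, the same check with `.copy()` (whose `deep` parameter defaults to `True`) shows two separate objects. A sketch on the same kind of hypothetical toy DataFrame:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Pclass": [3, 1, 3]})

df2 = df.copy()          # new object with its own data (deep=True is the default)
df2.iat[0, 0] = 0        # modify only the copy

print(df2 is df)                     # False
print(df.iat[0, 0], df2.iat[0, 0])   # 3 0: the original is unchanged
```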
Summary statistics and plots
df.describe()
df["Survived"].value_counts()
df["Pclass"].value_counts()
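`value_counts()` returns one count per distinct value, sorted from most to least frequent. Passing `normalize=True` gives proportions instead of counts. A minimal sketch on a hypothetical `Pclass` column (made-up values):

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Pclass": [3, 1, 3, 2, 3, 1]})

counts = df["Pclass"].value_counts()                # counts per distinct value, most frequent first
shares = df["Pclass"].value_counts(normalize=True)  # the same, as proportions that sum to 1

print(counts.to_dict())   # {3: 3, 1: 2, 2: 1}
print(shares[3])          # 0.5
```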
df["PassengerId"].plot()
df["Survived"].value_counts().head(5).plot()
df["Survived"].value_counts().head(5).plot(kind="bar")
df["Pclass"].value_counts().plot(kind="bar")
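Because `value_counts()` orders bars by frequency, a bar chart of `Pclass` shows the classes out of numeric order. One option is to call `sort_index()` before plotting. The sketch below uses a hypothetical toy column and only checks the ordering. The `.plot(kind="bar")` call is left as a comment because it needs matplotlib.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"Pclass": [3, 1, 3, 2, 3, 1]})

counts = df["Pclass"].value_counts()   # ordered by frequency: 3, 1, 2
by_class = counts.sort_index()         # reordered by class label: 1, 2, 3

print(by_class.index.tolist())         # [1, 2, 3]
# by_class.plot(kind="bar")  # requires matplotlib; bars then appear in class order
```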