Most Useful Pandas Functions

In this article, we will learn about some of the most useful pandas functions. If you are new to pandas, you can follow this blog for a better understanding: Pandas Introduction

Pandas is one of the most widely used Python libraries for data analysis, data science, data manipulation, and machine learning. It provides many functions and methods that speed up the data analysis process.

Let’s start learning the most important pandas functions with examples.

1) read_csv & read_excel

These functions read CSV and Excel files into a DataFrame: use read_csv for CSV files and read_excel for Excel files. Here I’m using a Jupyter Notebook for coding; if you are not familiar with Jupyter Notebook, you can follow this guide: Jupyter Notebook

import pandas as pd

# Read a CSV file using read_csv
pd.read_csv('data.csv')

# Read an Excel file using read_excel
# (.xlsx files need an engine such as openpyxl to be installed)
pd.read_excel('CarData.xlsx')
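Because read_csv also accepts any file-like object, you can try it without a file on disk. The sketch below parses a small, made-up CSV string through io.StringIO; the column names and values are illustrative only, loosely modeled on the car dataset used in this article.

```python
import io

import pandas as pd

# A small, hypothetical CSV held in memory. io.StringIO behaves like an
# open text file, so read_csv can parse it without a file on disk.
csv_text = """make,fuel-type,price
audi,gas,13950
bmw,gas,16430
toyota,diesel,7898
"""

df = pd.read_csv(io.StringIO(csv_text))

# The header line becomes the column names; each remaining line is a row
print(df.shape)  # (3, 3)
print(df)
```

The same call works with a real path such as 'data.csv'; only the source of the text changes.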

2) df.head & df.tail

df.head() returns the first n rows of a DataFrame and df.tail() returns the last n rows (n defaults to 5). They are especially useful when working with a big dataset, where printing every row is impractical.

import pandas as pd
df = pd.read_csv('data.csv')

# Display the first five rows
df.head(5)
# Display the last five rows
df.tail(5)

3) df.columns

When you have a big dataset like this one, it can be hard to see all the columns. The df.columns attribute (note: no parentheses, since it is not a function) returns the names of all columns in the dataset as an Index object.

import pandas as pd 
df = pd.read_csv('data.csv')
df.columns

#OUTPUT
Index(['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration',
       'num-of-doors', 'body-style', 'drive-wheels', 'engine-location',
       'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type',
       'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke',
       'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg',
       'highway-mpg', 'price'],
      dtype='object')

4) df.drop

You can drop unnecessary columns using df.drop(). This dataset has many columns, and we will not use all of them in this tutorial, so we can drop some:

df = df.drop(columns=['symboling', 'normalized-losses', 'make'])
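By default, df.drop() does not modify the DataFrame in place; it returns a new DataFrame, which is why the line above reassigns the result to df. A minimal sketch with a small, made-up DataFrame (reusing a few of the dataset's column names) shows this behavior:

```python
import pandas as pd

# Hypothetical toy data using a few of the car dataset's column names
df = pd.DataFrame({
    "symboling": [3, 1, 2],
    "make": ["audi", "bmw", "toyota"],
    "price": [13950, 16430, 7898],
})

# drop() returns a new DataFrame without the listed columns
trimmed = df.drop(columns=["symboling", "make"])

print(list(trimmed.columns))  # ['price']
# The original DataFrame still has all of its columns
print(list(df.columns))       # ['symboling', 'make', 'price']
```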

5) len()

Python’s built-in len() function returns the number of rows in a DataFrame.

len(df)  # Output: 205

6) df.describe()

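This method generates descriptive statistics for a pandas DataFrame or Series. For numeric data, it summarizes the central tendency, dispersion, and shape of each column’s distribution (count, mean, standard deviation, minimum, quartiles, and maximum), excluding missing values.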
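A minimal sketch on a small, made-up numeric DataFrame (the values are illustrative, not taken from data.csv). By default describe() summarizes only numeric columns; passing include='all' also summarizes non-numeric ones.

```python
import pandas as pd

# Hypothetical numeric columns named after two of the dataset's columns
df = pd.DataFrame({
    "wheel-base": [88.6, 94.5, 99.8, 99.4],
    "price": [13495, 16500, 13950, 17450],
})

stats = df.describe()
print(stats)
# Each numeric column gets these rows:
# count, mean, std, min, 25%, 50%, 75%, max
```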

7) loc[] and iloc[]

df.loc[] is label-based: you specify rows and columns by their row and column labels.

df.iloc[] is integer position-based: you specify rows and columns by their integer positions (0-based). Both are indexers, so they are used with square brackets rather than called like functions.
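A minimal sketch of the difference, using a small, made-up DataFrame with string row labels so that labels and positions are not the same thing. Note that a label slice with loc includes its end label, while a position slice with iloc excludes its end position, just like ordinary Python slicing.

```python
import pandas as pd

# Hypothetical data with string row labels 'a', 'b', 'c'
df = pd.DataFrame(
    {"make": ["audi", "bmw", "toyota"], "price": [13950, 16430, 7898]},
    index=["a", "b", "c"],
)

# loc: select by label; the slice "a":"b" INCLUDES row 'b'
by_label = df.loc["a":"b", "price"]

# iloc: select by position; the slice 0:2 EXCLUDES position 2
by_position = df.iloc[0:2, 1]

print(by_label.tolist())     # [13950, 16430]
print(by_position.tolist())  # [13950, 16430]

# The same single cell, by label and by position
print(df.loc["c", "make"])   # toyota
print(df.iloc[2, 0])         # toyota
```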

8) drop_duplicates()

An important part of data analysis is finding and removing duplicate values. The pandas drop_duplicates() method removes duplicate rows from a DataFrame. Its main parameters are:

  • subset: column label or sequence of labels to consider for identifying duplicate rows. By default, all the columns are used to find the duplicate rows.
  • keep: allowed values are (‘first’, ‘last’, False), default ‘first’. With ‘first’, every duplicate row except the first occurrence is dropped. With ‘last’, every duplicate row except the last occurrence is dropped. With False, all duplicated rows are dropped.
  • inplace: if True, the source DataFrame is changed and None is returned. By default, the source DataFrame remains unchanged and a new DataFrame instance is returned.
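The parameters above can be sketched on a small, made-up DataFrame that contains one exact duplicate row (rows 0 and 1) and a repeated value in the make column:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical; 'audi' appears three times
df = pd.DataFrame({
    "make": ["audi", "audi", "bmw", "audi"],
    "fuel-type": ["gas", "gas", "gas", "diesel"],
})

# Default: compare all columns and keep the first occurrence (drops row 1)
deduped = df.drop_duplicates()
print(len(deduped))  # 3

# Compare only 'make' and keep the last occurrence of each value
last_per_make = df.drop_duplicates(subset=["make"], keep="last")
print(last_per_make)

# keep=False drops every row that has a duplicate (rows 0 and 1)
no_dupes = df.drop_duplicates(keep=False)
print(len(no_dupes))  # 2
```

None of these calls change df itself, because inplace defaults to False.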

9) nlargest and nsmallest

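df.nlargest(n, columns) returns the n rows with the largest values in the specified column, sorted in descending order; df.nsmallest(n, columns) returns the n rows with the smallest values, sorted in ascending order.

import pandas as pd

df = pd.read_csv('data.csv')
# Five rows with the smallest and the largest wheel-base values
x = df.nsmallest(5, "wheel-base")
y = df.nlargest(5, "wheel-base")
# display() is built into Jupyter/IPython; use print() in a plain script
display(x)
display(y)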

10) df.rename()

One way to rename columns in a pandas DataFrame is the rename() method. It is especially useful when you need to rename only a few selected columns, because you pass a mapping for just the columns you want to change.
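A minimal sketch, assuming a small, made-up DataFrame: pass a dictionary that maps old column names to new ones. Columns that are not in the dictionary keep their names, and like drop(), rename() returns a new DataFrame unless you reassign or use inplace=True.

```python
import pandas as pd

# Hypothetical data using two of the dataset's column names
df = pd.DataFrame({"wheel-base": [88.6, 94.5], "price": [13495, 16500]})

# Only 'wheel-base' is renamed; 'price' is left as it is
df = df.rename(columns={"wheel-base": "wheel_base"})

print(list(df.columns))  # ['wheel_base', 'price']
```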

I hope you found this article helpful.

Please share your feedback, and if you have any questions or issues with this article, let me know.
