跳到主要內容

Python data visualizations

This notebook demos Python data visualizations on the Iris dataset

This Python 3 environment comes with many helpful analytics libraries installed. It is defined by the kaggle/python docker image
We'll use three libraries for this tutorial: pandasmatplotlib, and seaborn.
Press "Fork" at the top-right of this screen to run this notebook yourself and build each of the examples.
In [1]:
# First, we'll import pandas, a data processing and CSV file I/O library
import pandas as pd

# We'll also import seaborn, a Python graphing library
import warnings # current version of seaborn generates a bunch of warnings that we'll ignore
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", color_codes=True)

# Next, we'll load the Iris flower dataset, which is in the "../input/" directory
iris = pd.read_csv("../input/Iris.csv") # the iris dataset is now a Pandas DataFrame

# Let's see what's in the iris data - Jupyter notebooks print the result of the last thing you do
iris.head()

# Press shift+enter to execute this cell
Out[1]:
IdSepalLengthCmSepalWidthCmPetalLengthCmPetalWidthCmSpecies
015.13.51.40.2Iris-setosa
124.93.01.40.2Iris-setosa
234.73.21.30.2Iris-setosa
344.63.11.50.2Iris-setosa
455.03.61.40.2Iris-setosa
In [2]:
# Let's see how many examples we have of each species
iris["Species"].value_counts()
Out[2]:
Iris-virginica     50
Iris-versicolor    50
Iris-setosa        50
Name: Species, dtype: int64
In [3]:
# The first way we can plot things is using the .plot extension from Pandas dataframes
# We'll use this to make a scatterplot of the Iris features.
iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")
Out[3]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f87d034b0f0>
In [4]:
# We can also use the seaborn library to make a similar plot
# A seaborn jointplot shows bivariate scatterplots and univariate histograms in the same figure
sns.jointplot(x="SepalLengthCm", y="SepalWidthCm", data=iris, size=5)
Out[4]:
<seaborn.axisgrid.JointGrid at 0x7f87d02f0630>
In [5]:
# One piece of information missing in the plots above is what species each plant is
# We'll use seaborn's FacetGrid to color the scatterplot by species
sns.FacetGrid(iris, hue="Species", size=5) \
   .map(plt.scatter, "SepalLengthCm", "SepalWidthCm") \
   .add_legend()
Out[5]:
<seaborn.axisgrid.FacetGrid at 0x7f87c6a86eb8>
In [6]:
# We can look at an individual feature in Seaborn through a boxplot
sns.boxplot(x="Species", y="PetalLengthCm", data=iris)
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f87c68072e8>
In [7]:
# One way we can extend this plot is adding a layer of individual points on top of
# it through Seaborn's striplot
# 
# We'll use jitter=True so that all the points don't fall in single vertical lines
# above the species
#
# Saving the resulting axes as ax each time causes the resulting plot to be shown
# on top of the previous axes
ax = sns.boxplot(x="Species", y="PetalLengthCm", data=iris)
ax = sns.stripplot(x="Species", y="PetalLengthCm", data=iris, jitter=True, edgecolor="gray")
In [8]:
# A violin plot combines the benefits of the previous two plots and simplifies them
# Denser regions of the data are fatter, and sparser thiner in a violin plot
sns.violinplot(x="Species", y="PetalLengthCm", data=iris, size=6)
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f87c6626668>
In [9]:
# A final seaborn plot useful for looking at univariate relations is the kdeplot,
# which creates and visualizes a kernel density estimate of the underlying feature
sns.FacetGrid(iris, hue="Species", size=6) \
   .map(sns.kdeplot, "PetalLengthCm") \
   .add_legend()
Out[9]:
<seaborn.axisgrid.FacetGrid at 0x7f87c3482898>
In [10]:
# Another useful seaborn plot is the pairplot, which shows the bivariate relation
# between each pair of features
# 
# From the pairplot, we'll see that the Iris-setosa species is separataed from the other
# two across all feature combinations
sns.pairplot(iris.drop("Id", axis=1), hue="Species", size=3)
Out[10]:
<seaborn.axisgrid.PairGrid at 0x7f87c3325588>
In [11]:
# The diagonal elements in a pairplot show the histogram by default
# We can update these elements to show other things, such as a kde
sns.pairplot(iris.drop("Id", axis=1), hue="Species", size=3, diag_kind="kde")
Out[11]:
<seaborn.axisgrid.PairGrid at 0x7f87c24c0e10>
In [12]:
# Now that we've covered seaborn, let's go back to some of the ones we can make with Pandas
# We can quickly make a boxplot with Pandas on each feature split out by species
iris.drop("Id", axis=1).boxplot(by="Species", figsize=(12, 6))
Out[12]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f87c1fbd780>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f87bb3b9da0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x7f87bb039048>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f87bafb4278>]], dtype=object)
In [13]:
# One cool more sophisticated technique pandas has available is called Andrews Curves
# Andrews Curves involve using attributes of samples as coefficients for Fourier series
# and then plotting these
from pandas.tools.plotting import andrews_curves
andrews_curves(iris.drop("Id", axis=1), "Species")
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f87baf6f198>
In [14]:
# Another multivariate visualization technique pandas has is parallel_coordinates
# Parallel coordinates plots each feature on a separate column & then draws lines
# connecting the features for each data sample
from pandas.tools.plotting import parallel_coordinates
parallel_coordinates(iris.drop("Id", axis=1), "Species")
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f87ba94ec88>
In [15]:
# A final multivariate visualization technique pandas has is radviz
# Which puts each feature as a point on a 2D plane, and then simulates
# having each sample attached to those points through a spring weighted
# by the relative value for that feature
from pandas.tools.plotting import radviz
radviz(iris.drop("Id", axis=1), "Species")
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f87ba6b84a8>

Wrapping Up

I hope you enjoyed this quick introduction to some of the quick, simple data visualizations you can create with pandas, seaborn, and matplotlib in Python!
I encourage you to run through these examples yourself, tweaking them and seeing what happens. From there, you can try applying these methods to a new dataset and incorprating them into your own workflow!
See Kaggle Datasets for other datasets to try visualizing. The World Food Facts data is an especially rich one for visualization.

留言

這個網誌中的熱門文章

2017通訊大賽「聯發科技物聯網開發競賽」決賽團隊29強出爐!作品都在11月24日頒獎典禮進行展示

2017通訊大賽「聯發科技物聯網開發競賽」決賽團隊29強出爐!作品都在11月24日頒獎典禮進行展示 LIS   發表於 2017年11月16日 10:31   收藏此文 2017通訊大賽「聯發科技物聯網開發競賽」決賽於11月4日在台北文創大樓舉行,共有29個隊伍進入決賽,角逐最後的大獎,並於11月24日進行頒獎,現場會有全部進入決賽團隊的展示攤位,總計約為100個,各種創意作品琳琅滿目,非常值得一看,這次錯過就要等一年。 「聯發科技物聯網開發競賽」決賽持續一整天,每個團隊都有15分鐘面對評審團做簡報與展示,並接受評審們的詢問。在所有團隊完成簡報與展示後,主辦單位便統計所有評審的分數,並由評審們進行審慎的討論,決定冠亞季軍及其他各獎項得主,結果將於11月24日的「2017通訊大賽頒獎典禮暨成果展」現場公佈並頒獎。 在「2017通訊大賽頒獎典禮暨成果展」現場,所有入圍決賽的團隊會設置攤位,總計約為100個,展示他們辛苦研發並實作的作品,無論是想觀摩別人的成品、了解物聯網應用有那些新的創意、尋找投資標的、尋找人才、尋求合作機會或是單純有興趣,都很適合花點時間到現場看看。 頒獎典禮暨成果展資訊如下: 日期:2017年11月24日(星期五) 地點:中油大樓國光廳(台北市信義區松仁路3號) 我要報名參加「2017通訊大賽頒獎典禮暨成果展」>>> 在參加「2017通訊大賽頒獎典禮暨成果展」之前,可以先在本文觀看各團隊的作品介紹。 決賽29強團隊如下: 長者安全救星 可隨意描繪或書寫之電子筆記系統 微觀天下 體適能訓練管理裝置 肌少症之行走速率檢測系統 Sugar Robot 賽亞人的飛機維修輔助器 iTemp你的溫度個人化管家 語音行動冰箱 MR模擬飛行 智慧防盜自行車 跨平台X-Y視覺馬達控制 Ironmet 菸消雲散 無人小艇 (Mini-USV) 救OK-緊急救援小幫手 穿戴式長照輔助系統 應用於教育之模組機器人教具 這味兒很台味 Aquarium Hub 發展遲緩兒童之擴增實境學習系統 蚊房四寶 車輛相控陣列聲納環境偵測系統 戶外團隊運動管理裝置 懷舊治療數位桌曆 SeeM智能眼罩 觸...
opencv4nodejs Asynchronous OpenCV 3.x Binding for node.js   122     2715     414   0   0 Author Contributors Repository https://github.com/justadudewhohacks/opencv4nodejs Wiki Page https://github.com/justadudewhohacks/opencv4nodejs/wiki Last Commit Mar. 8, 2019 Created Aug. 20, 2017 opencv4nodejs           By its nature, JavaScript lacks the performance to implement Computer Vision tasks efficiently. Therefore this package brings the performance of the native OpenCV library to your Node.js application. This project targets OpenCV 3 and provides an asynchronous as well as an synchronous API. The ultimate goal of this project is to provide a comprehensive collection of Node.js bindings to the API of OpenCV and the OpenCV-contrib modules. An overview of available bindings can be found in the  API Documentation . Furthermore, contribution is highly appreciated....
2019全台精選3+個燈會,週邊順遊景點懶人包 2019燈會要去哪裡看?全台精選3+個燈會介紹、週邊順遊景點整理給你。 東港小鎮燈區-鮪鮪到來。 2019-02-15 微笑台灣編輯室 全台灣 各縣市政府 1435 延伸閱讀 ►  元宵節不只看燈會!全台元宵祭典精選、順遊景點整理 [屏東]2019台灣燈會在屏東 2/9-3/3:屏東市 · 東港鎮 · 大鵬灣國家風景區 台灣燈會自1990年起開始辦理,至2019年邁入第30週年,也是首次在屏東舉辦,屏東縣政府與交通部觀光局導入創新、科技元素,融入在地特色文化設計,在東港大鵬灣國家風景區打造廣闊的海洋灣域燈區,東港鎮結合漁港及宗教文化的小鎮燈區,及屏東市綿延近5公里長的綵燈節河岸燈區,讓屏東成為璀璨的光之南國,迎向國際。 詳細介紹 ►  2019台灣燈會在屏東 第一次移師國境之南 大鵬灣燈區 主題樂園式燈會也是主燈所在區,區內分為農業海洋燈區、客家燈區、原住民燈區、綠能環保燈區、藝術燈區、宗教燈區、競賽花燈及317個社區關懷據點手作的萬歲光廊等。 客家燈籠隧道。 平日:周一~周四14:00-22:30(熄燈) 假日:周五~周六10:00-22:30(熄燈)  屏東燈區: 萬年溪畔 屏東綵燈節藍區-生態。 綵燈節--每日17:30 - 22:00(熄燈) 勝利星村--平日:14:00 - 22:30(熄燈) 假日:10:00 - 22:30(熄燈) 燈區以「彩虹」為主題,沿著蜿蜒市區的萬年溪打造近5公里長的光之流域,50組水上、音樂及互動科技等不同類型燈飾,呈現紅色熱情、橙色活力、黃色甜美、綠色雄偉、藍色壯闊、靛色神祕、紫色華麗等屏東風情。勝利星村另有懷舊風的燈飾,及屏東公園聖誕節燈飾。 東港小鎮燈區 東港小鎮燈區-鮪鮪到來。 小鎮燈區以海的屏東為主題,用漁港風情及宗教文化內涵規劃4個主題區,分別為張燈結綵趣、東津好風情、神遊幸福海、延平老街區。每日17:00~22:30(熄燈) 以上台灣燈會資料來源: 2019台灣燈會官網 、 i屏東~愛屏東 。 >> 順遊行程 小吃旅行-東港小鎮 東港小吃和東港人一樣,熱情澎湃...