Data Visualization¶
When we have thousands of sampled numerical values, they make little
sense until we classify and analyze them. Many statistical tools are
available in Python to classify data; pandas is one such library. After
classifying the data, it is useful to visualize it. Visualization can
lead to a greater understanding of the data, for example by revealing
correlations. matplotlib is one of the best-known, easy-to-use libraries
for data visualization.
Standard Import statement¶
With matplotlib, we won’t use the entire library. We use only pyplot,
the part of the library dedicated to plotting data. In further
discussions of matplotlib, we assume that the reader has imported the
library in the following manner:
In [1]:
import matplotlib.pyplot as plt
Our First Graph - A Parabola¶
$y = x^2$ is the equation of the standard parabola. We sample some $x$
values and calculate their squares. Then we plot a graph of $y$ versus
$x$ to obtain the parabola.
In [2]:
import numpy as np
In [3]:
x = np.arange(50) # 0..49
y = x**2
In [4]:
plt.plot(x,y) # First argument is x data, second argument is y data
# plt.show() # Needed if running as a Python script
Out[4]:
[<matplotlib.lines.Line2D at 0x7fda54f14ba8>]
Note
If you are not using an interactive notebook or IPython shell, call
plt.show() to see the plot.
Also notice how matplotlib joined the sampled points with straight line
segments (linear interpolation), so that the plot resembles a parabola.
Customizing the Graph - Changing its type and color¶
When representing several data sets in one graph, different styles must
be used to distinguish between them. In this section, we will see how to
manipulate the line style and color. The following keyword arguments can
be passed to the plot() function.
linestyle = value¶
can be used to change the line style. The built-in lineStyles dict lists
the possible styles for value:
{'-': '_draw_solid', '--': '_draw_dashed', '-.': '_draw_dash_dot', ':': '_draw_dotted', 'None': '_draw_nothing', ' ': '_draw_nothing', '': '_draw_nothing'}
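The dictionary above can be inspected directly. The sketch below assumes the dict in question is the `lineStyles` class attribute of `matplotlib.lines.Line2D`; the loop variable names are illustrative only.

```python
from matplotlib.lines import Line2D

# Assumption: the "inbuilt lineStyles dict" is Line2D.lineStyles.
# Each key is a string accepted by linestyle=...; each value is the
# name of the internal method matplotlib uses to draw that style.
for style, draw_method in Line2D.lineStyles.items():
    print(repr(style), '->', draw_method)
```

The keys are the strings to pass as value, for example linestyle='--' for a dashed line.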
color = value¶
can be used to change the color of the line. value can be one of
- b: blue
- g: green
- r: red
- c: cyan
- m: magenta
- y: yellow
- k: black
- w: white
alpha = value¶
value determines the visibility of the plot. It is a floating point
number between 0 and 1. A value of 0 implies that the plot is not
visible. A value of 1 implies that the plot is completely visible.
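As a rough illustration, the three keyword arguments can be combined in a single plot() call. The sine curve and the variable names below are arbitrary choices made for this sketch, not part of the original example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)

# Dashed ('--'), red ('r'), half-transparent (alpha=0.5) sine curve.
# plot() returns a list of Line2D objects, one per curve drawn.
lines = plt.plot(x, np.sin(x), linestyle='--', color='r', alpha=0.5)
line = lines[0]

# The settings can be read back from the Line2D object.
print(line.get_linestyle(), line.get_color(), line.get_alpha())
# plt.show()  # if using in a Python script
```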
Plotting multiple graphs on same axis¶
Often it is necessary to plot several data sets on the same axes so that
we can compare them. matplotlib makes this simple: issue the plotting
commands one after another and finally call show().
An All-in-One example¶
Let’s examine all these things by plotting $y = \frac{1}{x}$,
$y = \sin(x)$, $y = \cos(2x)$ and $y = 2\sin(2x)$ in a single plot.
Instead of using np.arange() for the data, we shall use the
np.linspace() function.
np.linspace(start, stop, num=50)¶
Return evenly spaced numbers over a specified interval.
Returns num evenly spaced samples, calculated over the interval
[start, stop]. The endpoint of the interval can optionally be excluded.
Parameters:¶
- start : scalar. The starting value of the sequence.
- stop : scalar. The end value of the sequence, unless endpoint is set to False. In that case, the sequence consists of all but the last of num + 1 evenly spaced samples, so that stop is excluded. Note that the step size changes when endpoint is False.
- num : int, optional. Number of samples to generate. Default is 50. Must be non-negative.
- endpoint : bool, optional. If True, stop is the last sample; otherwise it is not included. Default is True.
Returns:¶
- samples : ndarray. There are num equally spaced samples in the closed interval [start, stop] or the half-open interval [start, stop) (depending on whether endpoint is True or False).
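A small sketch of how endpoint changes the result; the interval [0, 1] and num=5 are arbitrary values picked for illustration.

```python
import numpy as np

# endpoint=True (the default): stop is the last sample, step is 1/4
closed = np.linspace(0, 1, num=5)

# endpoint=False: stop is excluded, so the step shrinks to 1/5
half_open = np.linspace(0, 1, num=5, endpoint=False)

print(closed)
print(half_open)
print(closed[1] - closed[0], half_open[1] - half_open[0])
```

Both arrays contain five samples; only the step size and the last value differ.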
In [5]:
f1 = lambda x: 1/x
f2 = lambda x: np.sin(x)
f3 = lambda x: np.cos(2 * x)
f4 = lambda x: 2 * np.sin(2 * x)
x = np.linspace(-4 * np.pi, 4 * np.pi , 200)
p1 = plt.plot(x,f1(x), color = 'r', alpha = 0.5)
plt.plot(x,f2(x), color = 'g', alpha = 0.8)
plt.plot(x,f3(x), color = 'b', alpha = 0.6)
plt.plot(x,f4(x), color = 'y', alpha = 0.6)
# plt.show() # If using in Python Script
Out[5]:
[<matplotlib.lines.Line2D at 0x7fda54bc49b0>]
Subplots¶
In many cases, we want the opposite of what we have just discussed: plotting the data sets in separate subplots. matplotlib offers several ways to create subplots of a figure. Here we discuss just one of them.
plt.subplot(nrows, ncols, active) creates a grid of subplots with shape
(nrows, ncols) and selects the subplot specified by active for plotting.
active is a 1-based index that selects subplots in row-wise order.
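To make the row-wise, 1-based indexing concrete, the sketch below fills a 2 x 3 grid and labels each subplot with its index. The grid size and title text are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt

fig = plt.figure()

# In a 2 x 3 grid the index runs row by row:
#   1 2 3
#   4 5 6
for index in range(1, 7):
    ax = plt.subplot(2, 3, index)  # select (and create) subplot number index
    ax.set_title('active = {}'.format(index))

plt.tight_layout()
# plt.show()  # if using in a Python script
```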
Adding Title¶
A title can be added to the current subplot with
plt.title('label')
A title can be added to the whole figure with
plt.suptitle('label')
An example¶
In the example below, let’s see everything discussed so far in action
In [6]:
functions = [lambda x: 1/x, lambda x: np.sin(x), lambda x: np.cos(2 * x), lambda x: 2 * np.sin(2 * x)]
labels = [r'$y = \frac{1}{x}$', '$y = sin(x)$', '$y = cos(2x)$', '$y = 2 sin(2x)$']
x = np.linspace(-4 * np.pi, 4 * np.pi, 200)
plt.suptitle('Some curves in $xy$ plane')
for i, (function, label) in enumerate(zip(functions, labels), start=1):
    # zip() pairs the two lists into (function, label) tuples
    # enumerate() yields (count, value) tuples, counting from 1 here
    # Each value is itself a (function, label) tuple,
    # so we unpack into (count, (function, label))
    plt.subplot(2, 2, i)
    plt.plot(x, function(x))
    plt.title(label)
plt.tight_layout(h_pad=3) # Exclude this and see what happens
# plt.show() # if using in script
Plotting irregular data - Scatter and Bar Plots¶
Some data show an irregular pattern, so joining consecutive points with
lines is not meaningful. When such data are passed to plot(), the result
is a tangle of zigzag lines. In this situation, we have to use a
plotting method other than plot(). Before exploring other methods, let’s
see a situation where ordinary plotting doesn’t work.
In [7]:
arr = np.linspace(-10,10)
x = np.copy(arr) # x = arr would copy only the reference, so shuffling arr would shuffle x too
np.random.shuffle(arr)
plt.plot(x,arr)
# plt.show() # if using in Python Script
Out[7]:
[<matplotlib.lines.Line2D at 0x7fda540f7eb8>]
The image above does not look like the plot of a polynomial or any other function. In fact, we will not treat such data as the plot of a function; they are just data.
To visualize this kind of data, scatter and bar plots can be used.
Scatter Plot¶
A scatter plot draws only the sample points, instead of interpolating
and drawing lines between them. It accepts many of the same arguments as
plot(), such as color and alpha. Let’s see one.
In [8]:
plt.scatter(x, arr, color='b',alpha = 0.6)
Out[8]:
<matplotlib.collections.PathCollection at 0x7fda540dec50>
Note how we changed the color and alpha of plot.
Bar Plot¶
A bar plot visualizes the data as bars whose heights are proportional to the magnitude of the data. Let’s plot the same data as a bar chart and understand its customization.
In [9]:
plt.bar(x, arr, alpha = 0.6, edgecolor='k')
Out[9]:
<Container object of 50 artists>
Note how the rectangle edges are drawn in black. Because the bars are
semi-transparent (alpha = 0.6), overlapping rectangles are also visible.
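The overlap arises because bar() draws bars 0.8 data units wide by default, while the samples here are only about 0.41 apart. A hedged sketch of one way to avoid it is to pass a width derived from the sample spacing; the names spacing, heights and bars are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10)            # 50 samples, about 0.41 apart
heights = np.random.permutation(x)  # shuffled copy, as in the example above

# Make each bar 80% of the gap between neighbouring samples,
# so adjacent bars no longer overlap.
spacing = x[1] - x[0]
bars = plt.bar(x, heights, width=0.8 * spacing, alpha=0.6, edgecolor='k')
# plt.show()  # if using in a Python script
```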
Visualizing 2D Data - Matrix¶
A matrix can be interpreted as the values of a function $f(i, j)$, where
$i$ and $j$ are the indices of the matrix. Now $f(i, j)$ can be
visualized as a surface over the $ij$ plane. This requires switching to
3D coordinates. Instead of doing that, one can visualize the same data
in a 2D plane by mapping each value to a color from a colormap. In
matplotlib, we can do this with imshow() and matshow().
In [10]:
data = np.arange(100)
np.random.shuffle(data)
data.shape = (10,10)
plt.matshow(data)
Out[10]:
<matplotlib.image.AxesImage at 0x7fda54bfa400>
To know which color corresponds to which value, one can enable the colorbar
In [11]:
plt.matshow(data)
plt.colorbar()
Out[11]:
<matplotlib.colorbar.Colorbar at 0x7fda54b9ddd8>
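The text mentions imshow() alongside matshow() but demonstrates only the latter. The following is a minimal sketch of the imshow() route, assuming a freshly shuffled 10 x 10 matrix; the variable names and the explicit 'viridis' colormap are choices made for this sketch.

```python
import numpy as np
import matplotlib.pyplot as plt

# A 10 x 10 matrix holding the numbers 0..99 in random order
data = np.random.permutation(100).reshape(10, 10)

# imshow() draws the matrix on the current axes and returns an AxesImage
image = plt.imshow(data, cmap='viridis')
plt.colorbar(image)  # legend that maps colors back to values
# plt.show()  # if using in a Python script
```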
Let’s experiment with a larger matrix
In [12]:
n = 1000
data = np.arange(n**2)
np.random.shuffle(data)
data.shape = (n,n)
plt.matshow(data)
plt.colorbar()
Out[12]:
<matplotlib.colorbar.Colorbar at 0x7fda4efbf828>
The plot above looks like a random image. In fact, images are also
matrices. Different file formats like jpeg, png and tiff store the
matrix and associated data in different ways.
Consider an image with 1900 rows and 1600 columns of pixels
- Its data is a matrix with shape (1900, 1600)
- Each element of the matrix is a single value, a 3-tuple or a 4-tuple, depending on the color scheme of the image
- If the image is monochromatic, each element of the matrix is a single value, conventionally with 0 representing black and 255 representing white
- If the color scheme of the image is RGB, each element of the matrix is a (Red, Green, Blue) tuple with each element ranging from 0 to 255 (with 8 bits per channel)
- If the color scheme of the image is CMYK, each element of the matrix is a (Cyan, Magenta, Yellow, blacK) tuple with each element ranging from 0 to 255 (with 8 bits per channel)
Since an image is a matrix, any operation on the matrix is an operation on the image. This is the basis of how photo editing software works. It is also the foundation of the field of computer science known as image processing.
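A small sketch of the idea, using a hypothetical 4 x 6 grayscale image built by hand rather than loaded from a file. The operations shown (inverting and mirroring) are just two illustrative examples of matrix operations acting as image edits.

```python
import numpy as np

# Hypothetical grayscale image: every row is the gradient 0, 51, ..., 255
row = np.linspace(0, 255, 6).astype(np.uint8)
gray = np.tile(row, (4, 1))          # shape (4, 6)

negative = 255 - gray                # invert every pixel value
mirrored = gray[:, ::-1]             # reverse the columns: flip left to right

# An RGB image adds a third axis holding the three channel values
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[..., 0] = gray                   # put the gradient in the red channel only

print(gray.shape, rgb.shape)
```

Passing any of these arrays to plt.imshow() would display the result as a picture.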
Going Further¶
In this tutorial, we have seen just the fundamentals of data
visualization using matplotlib. There are many more kinds of plots, and
one can even animate them. Interested readers can refer to the official
tutorial.