| Author: | Dave Kuhlman |
|---|---|
| Address: | dkuhlman@rexx.com http://www.rexx.com/~dkuhlman |
| Revision: | 1.0a |
| Date: | June 23, 2006 |
| Copyright: | Copyright (c) 2005 Dave Kuhlman. All Rights Reserved. This software is subject to the provisions of the MIT License http://www.opensource.org/licenses/mit-license.php. |
Abstract
This document provides an outline for a course on NumPy/SciPy. PyTables and Matplotlib are also discussed.
SciPy is both (1) a way to handle large arrays of numerical data in Python and (2) a way to apply scientific, statistical, and mathematical operations to those arrays of data. When combined with a package such as PyTables, it is also capable of storing and retrieving large arrays of data in an efficient way. Since many of its calculations are done in C extension modules, SciPy can be quite fast.
See the instructions at the SciPy Web site:
Below, we look at two ways to install SciPy:
If you have installed previous versions, it is recommended that you remove the old versions from your Python site-packages directory first.
Method 1 -- Use this version for the newest distribution. Note that the version numbers of the latest versions may have changed by the time you read this. The new SciPy is composed of two projects: (1) numpy and (2) scipy.
Download and install NumPy.
A link to NumPy is available at: http://www.scipy.org/
After extracting it, use the following to build and install it:
$ cd numpy-?.?.?
$ python setup.py build
$ python setup.py install    # as root
Download and install SciPy.
SciPy is available at: http://new.scipy.org/Wiki/Download
After extracting it, use the following to build and install it:
$ cd scipy-?.?.?
$ python setup.py build
$ python setup.py install    # as root
Method 2 -- You can also install from SVN:
Install NumPy:
$ svn co http://svn.scipy.org/svn/numpy/trunk numpy
$ cd numpy
$ rm -rf build
$ python setup.py build
$ sudo python setup.py install    # or without sudo as root
Install SciPy:
$ svn co http://svn.scipy.org/svn/scipy/trunk scipy
$ cd scipy
$ rm -rf build
$ python setup.py build
$ sudo python setup.py install    # or without sudo as root
There are binary installers for MS Windows. You can find them at: http://www.scipy.org/Download.
Suggestion: Use IPython for your interactive Python shell. IPython comes with a SciPy profile. You can run it with:
$ ipython -p scipy
Doing so automatically loads SciPy.
In IPython, get help on SciPy modules, classes, functions, and methods with the help built-in function. Or, with IPython, use the ? operator and the pdoc magic command. Examples:
In [6]:help(scipy.io.read_array)
...
In [7]:scipy.io.read_array?
...
In [8]: %pdoc stats.norm.pdf
Probability density function at x of the given RV.
...
To see the contents of modules, use dir(). Example:
In [24]:dir(scipy.io)
Or, if you have loaded SciPy by doing ipython -p scipy, you can do:
In [9]: dir(io)
A good introduction to arrays is at Numerical Python. In particular, in that document, see 5. Array Basics and 8. Array Functions.
Arrays are simple. An example:
$ ipython
In [1]:import scipy
In [2]:a1 = scipy.array([1, 2, 3, 4,])
In [3]:a2 = scipy.array([4, 3, 2, 1,])
In [4]:print a1
[1 2 3 4]
In [5]:a3 = a1 * a2
In [6]:print a3
[4 6 6 4]
o
o
o
In [41]: a1 = scipy.zeros((4,5))
In [42]: print a1
[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
In [43]: a2 = scipy.empty((4,5))
In [44]: print a2
[[-1209828888 -1209828888 14 3 24]
 [ 24 6 6 6 6]
 [ 6 6 6 139519736 64]
 [ 9 139519712 11 12 139519680]]
In [45]: a3 = scipy.zeros((4,5), dtype='f')
In [46]: print a3
[[ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]]
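The session above uses the scipy namespace of this era. The following is a minimal sketch of the same operations. It assumes a current NumPy installation, where these array functions live in the numpy module. Note two differences from the output above. First, in current NumPy zeros() defaults to a floating-point dtype rather than the integers shown. Second, empty() only allocates memory, which is why the a2 printout above is full of arbitrary values.

```python
import numpy as np

# Element-wise arithmetic: * multiplies matching elements (not a matrix product).
a1 = np.array([1, 2, 3, 4])
a2 = np.array([4, 3, 2, 1])
a3 = a1 * a2
print(a3)            # [4 6 6 4]

# zeros() fills with 0; in current NumPy the default dtype is float64.
z = np.zeros((4, 5))
print(z.dtype)       # float64

# dtype='f' requests single-precision (float32) elements.
zf = np.zeros((4, 5), dtype='f')
print(zf.dtype)      # float32

# empty() only allocates; its contents are arbitrary until you assign them.
e = np.empty((4, 5))
print(e.shape)       # (4, 5)
```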
To index into multi-dimensional arrays, use either of the following:
In [37]:a2 = zeros((4,3),dtype='f')
In [38]:a2
Out[38]:NumPy array, format: long
[[ 0. 0. 0.]
 [ 0. 0. 0.]
 [ 0. 0. 0.]
 [ 0. 0. 0.]]
In [39]:a2[3,0] = 5.
In [40]:a2[2][1] = 6.
In [41]:a2
Out[41]:NumPy array, format: long
[[ 0. 0. 0.]
 [ 0. 0. 0.]
 [ 0. 6. 0.]
 [ 5. 0. 0.]]
However, setting the real and imaginary parts of an element in a complex array is a little counterintuitive:
In [31]: aa = zeros((5,4), dtype=complex64)
In [32]: aa
Out[32]:
array([[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]], dtype=complex64)
In [33]: aa.real[0,0] = 1.0
In [34]: aa.imag[0,0] = 2.0
In [35]: aa
Out[35]:
array([[ 1.+2.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j],
[ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j]], dtype=complex64)
Note that we use this:
aa.real[0,0] = 1.0
aa.imag[0,0] = 2.0
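and not this. Here aa[0,0] returns a separate scalar object rather than a view into aa, so assigning to its real or imag attribute cannot change the array:
aa[0,0].real = 1.0    # wrong
aa[0,0].imag = 2.0    # wrong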
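Here is a minimal sketch of the right and wrong approaches, assuming current NumPy and Python 3. aa.real and aa.imag are views that share memory with aa, so writing through them changes the array. Assigning a complex value to the element directly works too. An element fetched with aa[2, 2] is a standalone scalar, and current NumPy refuses to set its real attribute.

```python
import numpy as np

aa = np.zeros((5, 4), dtype=np.complex64)

# aa.real and aa.imag are views into aa, so writes through them change aa.
aa.real[0, 0] = 1.0
aa.imag[0, 0] = 2.0
print(aa[0, 0])                  # (1+2j)

# Assigning a complex value to the element directly is equivalent.
aa[1, 1] = 3.0 + 4.0j

# aa[2, 2] is a standalone scalar, not a view; its .real cannot be assigned.
try:
    aa[2, 2].real = 5.0
    scalar_assignment_failed = False
except AttributeError:
    scalar_assignment_failed = True
print(scalar_assignment_failed)  # True
```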
Package base has array helper functions. Examples:
import scipy

def test():
    a1 = scipy.arange(5, 10)
    print a1
    a2 = scipy.zeros((4,5), dtype='f')
    print a2

test()
Prints the following:
[5 6 7 8 9]
[[ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]]
For help, use something like the following:
help(scipy)
help(scipy.zeros)
Or in IPython:
scipy?
scipy.zeros?
You can also "reshape" and transpose arrays:
In [47]: a1 = arange(12)
In [48]: a1
Out[48]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [49]: a2 = a1.reshape(3,4)
In [50]: a2
Out[50]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [51]: a3 = a2.transpose()
In [52]: a3
Out[52]:
array([[ 0, 4, 8],
[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11]])
And, you can get the "shape" of an array:
In [53]: a1.shape
Out[53]: (12,)
In [54]: a2.shape
Out[54]: (3, 4)
In [55]: a3.shape
Out[55]: (4, 3)
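reshape() and transpose() normally return views rather than copies, so the three arrays above share one block of memory. Here is a sketch with current NumPy (assumed) that makes this visible. Writing through the transposed array changes the original one-dimensional array.

```python
import numpy as np

a1 = np.arange(12)
a2 = a1.reshape(3, 4)      # 3 rows, 4 columns
a3 = a2.transpose()        # same as a2.T; shape (4, 3)

print(a1.shape, a2.shape, a3.shape)   # (12,) (3, 4) (4, 3)

# Transposing swaps the indices: a3[i, j] is a2[j, i].
print(a3[1, 2], a2[2, 1])             # 9 9

# All three arrays share memory, so a write through a3 shows up in a1.
a3[0, 0] = 100
print(a1[0])                          # 100
```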
You can "vectorize" a function. Doing so turns a function that takes a scalar as an argument into one that can process a vector (or, more generally, an array). For example:
In [9]:def t(x):
   ....:    return x + 3
In [10]:a1 = scipy.zeros((5,4))
In [11]:a1
Out[11]:NumPy array, format: long
[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
In [12]:s = scipy.vectorize(t)
In [13]:a2 = s(a1)
In [14]:a2
Out[14]:NumPy array, format: long
[[3 3 3 3]
 [3 3 3 3]
 [3 3 3 3]
 [3 3 3 3]
 [3 3 3 3]]
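Here is a sketch of the same idea with current NumPy (assumed). numpy.vectorize calls the scalar function once per element, so it is a convenience rather than a speed-up. For t(), which only uses arithmetic, the array expression a1 + 3 gives the same result. vectorize earns its keep with functions that contain scalar-only logic such as an if statement. The function name step below is a hypothetical example.

```python
import numpy as np

def t(x):
    """Scalar function: add 3 to a single number."""
    return x + 3

def step(x):
    """Hypothetical scalar function; 'if' on a whole array would raise an error."""
    return 1 if x > 0 else 0

a1 = np.zeros((5, 4))

s = np.vectorize(t)        # calls t() element by element
a2 = s(a1)
a3 = a1 + 3                # plain array arithmetic; same result for t()
print(np.array_equal(a2, a3))   # True

step_values = np.vectorize(step)(np.array([-1.0, 0.0, 2.0]))
print(step_values)              # [0 0 1]
```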
The array interface is a specification for developers who wish to write an alternative implementation of arrays (for example, of the arrays used in SciPy).
The array protocol is the way in which, for example, a scipy user uses arrays. It includes such things as:
You should be aware of the difference between (1) a1[3,4] and (2) a1[3][4]. Both work. However, the second results in two calls to the __getitem__ method.
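To see the two __getitem__ calls, here is a sketch assuming current NumPy and Python 3. It uses CountingArray, a hypothetical ndarray subclass written only for this demonstration, which counts how often __getitem__ is called. In the chained form, a1[3] builds an intermediate one-dimensional view of row 3, and [4] is then applied to that view.

```python
import numpy as np

class CountingArray(np.ndarray):
    """Hypothetical ndarray subclass that counts __getitem__ calls."""
    calls = 0

    def __getitem__(self, key):
        CountingArray.calls += 1
        return super().__getitem__(key)

a1 = np.arange(20).reshape(4, 5).view(CountingArray)

CountingArray.calls = 0
x = a1[3, 4]                  # one call, with the tuple key (3, 4)
single_index_calls = CountingArray.calls

CountingArray.calls = 0
y = a1[3][4]                  # a1[3] returns a row view, then [4] indexes it
chained_index_calls = CountingArray.calls

print(x, y)                                      # 19 19
print(single_index_calls, chained_index_calls)   # 1 2
```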
At times you may need to convert an array from one type to another, for example from a numpy array to a scipy array or the reverse. The array protocol will help. In particular, the asarray() function can convert an array without copying. Examples:
In [8]: import numpy
In [9]: import scipy
In [10]: a1 = zeros((4,6))
In [11]: type(a1)
Out[11]: <type 'scipy.ndarray'>
In [12]: a2 = numpy.asarray(a1)
In [13]: type(a2)
Out[13]: <type 'numpy.ndarray'>
In [14]: a3 = numpy.zeros((3,5))
In [15]: type(a3)
Out[15]: <type 'numpy.ndarray'>
In [16]: a4 = scipy.asarray(a3)
In [17]: type(a4)
Out[17]: <type 'scipy.ndarray'>
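Here is a quick way to check the no-copy behavior, sketched with current NumPy (assumed). numpy.asarray() returns its argument unchanged when it is already an ndarray, while numpy.array() copies by default. numpy.shares_memory() reports whether two arrays overlap in memory.

```python
import numpy as np

a1 = np.zeros((4, 6))

a2 = np.asarray(a1)   # already an ndarray: no copy is made
a3 = np.array(a1)     # array() copies by default

print(a2 is a1)                   # True
print(np.shares_memory(a1, a2))   # True
print(np.shares_memory(a1, a3))   # False

# A Python list has no array buffer to share, so asarray() builds a new array.
a4 = np.asarray([[1.0, 2.0], [3.0, 4.0]])
print(type(a4).__name__)          # ndarray
```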
SciPy has its own input/output capabilities. They are in module scipy.io. Here is a simple example:
import scipy
import scipy.io

def test_io():
    scipyArray1 = scipy.array([[1.0,2.0],[3.0,4.0],[5.0,6.0]])
    outFile = file('tmpdata1.txt', 'w')
    scipy.io.write_array(outFile, scipyArray1)
    outFile.close()
    inFile = file('tmpdata1.txt', 'r')
    scipyArray2 = scipy.io.read_array(inFile)
    inFile.close()
    print 'type(scipyArray2):', type(scipyArray2)
    print 'scipyArray2:\n', scipyArray2

test_io()
prints the following:
type(scipyArray2): <type 'scipy.ndarray'>
scipyArray2:
[[ 1. 2.]
 [ 3. 4.]
 [ 5. 6.]]
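scipy.io.write_array() and scipy.io.read_array() belong to the SciPy of this period and are not available in current SciPy releases. If you are using a current installation (an assumption here), numpy.savetxt() and numpy.loadtxt() provide a comparable plain-text round trip. This sketch uses a temporary file, and round_trip is a hypothetical helper name.

```python
import os
import tempfile

import numpy as np

def round_trip(array):
    """Hypothetical helper: write array to a temp text file and read it back."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)                  # savetxt reopens the file by name
    try:
        np.savetxt(path, array)   # one row of text per array row
        return np.loadtxt(path)
    finally:
        os.remove(path)

original = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
restored = round_trip(original)
print(type(restored))
print(restored)
```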
io.write_array() and io.read_array() become very slow when applied to large arrays. PyTables scales much better.
PyTables writes and reads HDF5 files. It can save SciPy arrays to HDF5 files and retrieve them again. Multiple arrays and separate data sets can be organized in nested groups (analogous to folders or directories).
You can learn more about PyTables at PyTables -- Hierarchical Datasets in Python.
Obtain PyTables from PyTables -- Hierarchical Datasets in Python.
For MS Windows, there are binary executable installers.
For Linux, install PyTables with something like the following (depending on the version):
$ tar xvzf orig/pytables-1.3.2.tar.gz
$ cd pytables-1.3.2/
$ python setup.py build_ext --inplace
$ sudo python setup.py install
When installing from source, you may run into problems with Pyrex (possibly in combination with Python 2.4). If you get errors while building and installing PyTables before these problems are fixed, take a look at the fixes suggested in the following messages:
There is extensive documentation in the PyTables source distribution. See: pytables-?.?.?/doc/html/usersguide.html.
The source distribution also contains a number of examples. See: pytables-?.?.?/examples.
You can also find user documentation at the PyTables Web site. See PyTables User's Guide: http://www.pytables.org/docs/manual/. Of particular interest are:
From PyTables 1.3 on, PyTables supports NumPy (and hence SciPy) arrays right out of the box in Array objects. So, if you write a NumPy array, you will get a NumPy array back, and the same goes for Numeric and numarray arrays. In other objects (EArray, VLArray or Table) you can make use of the 'flavor' parameter in constructors to tell PyTables: "Hey, every time that I read from this object, please, return me an (rec)array with the appropriate flavor". Of course, PyTables will try hard to avoid doing data copies in conversions (i.e. the array protocol is used whenever possible).
For versions of PyTables prior to 1.3, PyTables can save and read only numarray arrays. You can still use PyTables with SciPy, but for versions of PyTables prior to 1.3, an array conversion is needed.
If you are using a recent version of SciPy and numarray, then you will be able to do this conversion without copying, using the array protocol. Converting a SciPy array to a numarray array:
numarray_array = numarray.asarray(scipy_array)
And, converting a numarray array to a SciPy array:
scipy_array = scipy.asarray(numarray_array)
If you insist on using older versions, a simple method is to convert a SciPy array to a Python list. For example:
In [17]:data1 = s.array([[1.0,2.0],[3.0,4.0],[5.0,6.0]])
In [18]:list1 = data1.tolist()
In [19]:print list1
[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Conversion in the other direction, from numarray arrays to SciPy arrays, is simple: pass the numarray array to scipy.array(). This example does both:
import scipy
import numarray

def test():
    scipyArray = scipy.array([[1.0,2.0],[3.0,4.0],[5.0,6.0]])
    list1 = scipyArray.tolist()
    print 'list1:', list1
    numarrayArray = numarray.array([[1.0,2.0],[3.0,4.0],[5.0,6.0]])
    print 'numarrayArray:\n', numarrayArray
    scipyArray2 = scipy.array(numarrayArray)
    print 'type(scipyArray2):', type(scipyArray2)
    print 'scipyArray2:\n', scipyArray2

test()
prints the following:
list1: [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
numarrayArray:
[[ 1. 2.]
 [ 3. 4.]
 [ 5. 6.]]
type(scipyArray2): <type 'scipy.ndarray'>
scipyArray2:
[[ 1. 2.]
 [ 3. 4.]
 [ 5. 6.]]
Here is an example that uses sufficiently recent versions of PyTables and SciPy to write and read arrays:
#!/usr/bin/env python

import sys
import getopt
import scipy
import tables

Filename = 'testpytables2.h5'
Dataset1 = [[1, 2, 3, 4],[5, 6, 7, 8],[9, 10, 11, 12]]
Dataset2 = [[1.,2., 2.1],[3.,4.,4.1],[5.,6.,6.1]]

def test1():
    """Write out several sample data sets.
    """
    filename = Filename
    print "Creating file:", filename
    #filter = tables.Filters()
    h5file = tables.openFile(filename,
        mode = "w", title = "PyTables test file",
        # filters=filter
        )
    print '=' * 30
    print h5file
    print '=' * 30
    root = h5file.createGroup(h5file.root, "Datasets", "Test data sets")
    datasets = h5file.createGroup(root, "Phase1", "Test data sets")
    scipy_array = scipy.array(Dataset1)
    h5file.createArray(datasets, 'dataset1', scipy_array, "Test data set #1")
    scipy_array = scipy.array(Dataset2)
    h5file.createArray(datasets, 'dataset2', scipy_array, "Test data set #2")
    scipy_array = scipy.zeros((100,100))
    h5file.createArray(datasets, 'dataset3', scipy_array, "Test data set #3")
    h5file.close()

#
# Read in and display the data sets.
#
def test2():
    filename = Filename
    h5file = tables.openFile(filename, 'r')
    dataset1Obj = h5file.getNode('/Datasets/Phase1', 'dataset1')
    dataset2Obj = h5file.getNode('/Datasets/Phase1', 'dataset2')
    print repr(dataset1Obj)
    print repr(dataset2Obj)
    dataset1Array = dataset1Obj.read()
    dataset2Array = dataset2Obj.read()
    print 'type(dataset1Array):', type(dataset1Array)
    print 'type(dataset2Array):', type(dataset2Array)
    print 'array1:\n', dataset1Array
    print 'array2:\n', dataset2Array
    # print several slices of our array.
    print 'slice [0]:', dataset1Array[0]
    print 'slice [0:2]:', dataset1Array[0:2]
    print 'slice [1, 0:4:2]:', dataset1Array[1, 0:4:2]
    h5file.close()

USAGE_TEXT = """
Usage:
    python test_pytables2.py [options]
Options:
    -h, --help      Display this help message.
    -t n, --test=n  Test number:
                    1: Write file
                    2: Read file
Example:
    python test_pytables2.py -t 1
    python test_pytables2.py -t 2
"""

def usage():
    print USAGE_TEXT
    sys.exit(-1)

def main():
    args = sys.argv[1:]
    try:
        opts, args = getopt.getopt(args, 'ht:', ['help', 'test=', ])
    except getopt.GetoptError:
        usage()
    testno = 0
    for opt, val in opts:
        if opt in ('-h', '--help'):
            usage()
        elif opt in ('-t', '--test'):
            testno = int(val)
    if len(args) != 0:
        usage()
    if testno == 1:
        test1()
    elif testno == 2:
        test2()
    else:
        usage()

if __name__ == '__main__':
    main()
Run the above by typing the following at the command line:
$ python test_pytables2.py -t 1
$ python test_pytables2.py -t 2
Notes:
We use h5file.createGroup() to create a group in the HDF5 file and then to create another group nested inside that one. A group is the equivalent of a folder or directory. PyTables supports nested groups in HDF5 files.
To write an array, we use h5file.createArray().
To retrieve an array, we use getNode() followed by node.read().
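Notice, also, that function test2 reads each array into memory with read() and then slices the in-memory result using the usual array subscription and slicing notation. PyTables Array nodes also accept that notation directly (for example, dataset1Obj[0:2]), which reads only the requested slice from disk.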
You may find both h5dump and h5ls (from hdf5-tools) helpful for displaying the nested data structures:
$ h5dump -n testpytables2.h5
HDF5 "testpytables2.h5" {
FILE_CONTENTS {
group /Datasets
group /Datasets/Phase1
dataset /Datasets/Phase1/dataset1
dataset /Datasets/Phase1/dataset2
dataset /Datasets/Phase1/dataset3
}
}
See NCSA HDF5 Tools.
Other examples are provided in the PyTables distribution and in the PyTables tutorial.
We'll be learning how to use matplotlib.
You can find matplotlib and information about it here: http://matplotlib.sourceforge.net/.
There are Examples (zip). Most of the code that follows is based on those examples.
Extensive documentation can be found at the matplotlib Web site, including:
See the user guide and also see the comments in the (default) matplotlibrc.
Some notes:
Configuration can be set in ~/.matplotlib/matplotlibrc.
During runtime, configuration options are accessible in the dictionary pylab.rcParams.
If you have installed the new scipy/numpy combination, then you will want to change your configuration (in ~/.matplotlib/matplotlibrc) to use it:
numerix : numpy # Numeric or numarray
If you wish to use matplotlib interactively (for example, in IPython), the following settings may be helpful:
backend : TkAgg        # the default backend
interactive : True     # see http://matplotlib.sourceforge.net/interactive.html
With IPython, you may want to start up with the following:
ipython -pylab -p scipy
This has the effect of doing:
from scipy import *
from pylab import *
and also sets some options that make showing, drawing, and updating graphs more automatic and convenient.
This code is based on simple_plot.py in Examples (zip).
#!/usr/bin/env python
"""
dave_simple_plot.py
"""

import sys
import pylab as pl

# epydoc -- specify the input format.
# __docformat__ = "restructuredtext en"

def simple(funcName):
    """Create a simple plot and save the plot to a .png file

    @param funcName: The name of a function, e.g. sin, cos, tan, ...
    """
    t = pl.arange(0.0, 1.0+0.01, 0.01)
    funcStr = 'pl.%s(2*2*pl.pi*t)' % (funcName,)
    s = eval(funcStr)
    pl.plot(t, s)
    pl.xlabel('time (s)')
    pl.ylabel('voltage (mV)')
    pl.title('About as simple as it gets, folks')
    pl.grid(True)
    pl.savefig('simple_plot')
    pl.show()

def usage():
    print 'Usage: python dave_simple_plot.py <func_name>'
    print 'Examples:'
    print '    python dave_simple_plot.py sin'
    print '    python dave_simple_plot.py cos'
    print '    python dave_simple_plot.py tan'
    sys.exit(-1)

def main():
    args = sys.argv[1:]
    if len(args) != 1:
        usage()
    simple(args[0])

if __name__ == '__main__':
    main()
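The script above builds a source string and passes it to eval(). The following sketch shows an alternative, assuming current NumPy and matplotlib. It looks the function up by name with getattr() on the numpy module, which avoids evaluating arbitrary text. It also uses matplotlib's non-interactive Agg backend, so it runs without a display. The helper name plot_function and the temporary output path are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')              # render to files only; no window is opened
import matplotlib.pyplot as plt
import numpy as np

def plot_function(func_name, filename):
    """Hypothetical helper: plot numpy.<func_name>(2*2*pi*t) and save it."""
    func = getattr(np, func_name)  # e.g. np.sin, np.cos, np.tan
    t = np.arange(0.0, 1.0 + 0.01, 0.01)
    s = func(2 * 2 * np.pi * t)
    fig, ax = plt.subplots()
    ax.plot(t, s)
    ax.set_xlabel('time (s)')
    ax.set_ylabel('voltage (mV)')
    ax.set_title('About as simple as it gets, folks')
    ax.grid(True)
    fig.savefig(filename)
    plt.close(fig)                 # free the figure once it is saved
    return t, s

out_path = os.path.join(tempfile.mkdtemp(), 'simple_plot.png')
t, s = plot_function('sin', out_path)
print(os.path.exists(out_path))    # True
```

getattr() still accepts any attribute name from numpy. A script that takes the name from users might also check it against a short list of allowed names.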
Notes:
Configuration -- For interactive use, you can set the following in your matplotlibrc:
Or, for interactive use, ipython is a recommended shell. It has a matplotlib mode. Start it with:
$ ipython -pylab
Several notes on interactive mode and IPython:
A sample session (after starting ipython -pylab):
1: t = arange(0.0, 1.0+0.01, 0.01)
2: s = tan(2 * 2 * pi * t)
3: plot(t,s)
A sample session with ipython and without -pylab:
1: import pylab as pl
2: t = pl.arange(0.0, 1.0+0.01, 0.01)
3: s = pl.tan(2 * 2 * pl.pi * t)
4: pl.plot(t,s)
Using IPython -- Start IPython with one of the following:
$ ipython -pylab
or:
$ ipython -pylab -p scipy
Some convenient and useful commands while using matplotlib interactively:
Learn the following in order to create and control your plots:
Learn the following in order to annotate your plots:
Note that these annotation functions return a matplotlib Text object. You can use this object to get and set properties of the text. Example:
91: cla()
92: plot([1,2,3])
93: t = xlabel('increasing temp')
94: t.set_weight('bold')
95: t.set_color('b')
96: draw()
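The same property calls can be checked outside an interactive session. This sketch assumes current matplotlib with the Agg backend. It uses the object-oriented Axes.set_xlabel(), which, like xlabel(), returns the Text instance. The getters then confirm the changes.

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3])

t = ax.set_xlabel('increasing temp')   # returns a matplotlib.text.Text
t.set_weight('bold')
t.set_color('b')
fig.canvas.draw()                      # counterpart of draw() in the session

print(type(t).__name__)   # Text
print(t.get_text())       # increasing temp
print(t.get_weight())     # bold
print(t.get_color())      # b
plt.close(fig)
```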
Notes:
You can display your matplotlib plots inside a GUI application written in Tk, WxPython, ...
There are examples in the examples directory of the matplotlib source distribution. These examples are also available at the matplotlib Web site. Go to matplotlib, then click on "Examples (zip)".
The example files for embedding are named:
embedding_in_gtk.py
embedding_in_gtk2.py
embedding_in_gtk3.py
embedding_in_qt.py
embedding_in_tk.py
embedding_in_tk2.py
embedding_in_wx.py
embedding_in_wx2.py
embedding_in_wx3.py
embedding_in_wx4.py
In general, you can create your plot, possibly testing it interactively in IPython, then use one of the examples for embedding the plot in the GUI tool of your choice.
This section lists and gives brief descriptions of the contents of scipy.
Much of the following documentation was generated from within IPython. I used either (1) the help(obj) built-in or (2) IPython's ? operator to view documentation on a module, class, etc., and then, where necessary, used the "s" command from within less (my pager) to save it to a file. I've also done some light editing to reformat this for reST (reStructuredText), which is the format for the source of this document. For more on reST, see: Docutils: Documentation Utilities.
SciPy: A scientific computing package for Python
Available subpackages:
scipy provides functions for defining a multi-dimensional array and useful procedures for numerical computation. Use the following to get a list of the members of scipy:
>>> import scipy
>>> dir(scipy)
Functions:
More Functions:
(Universal) Math Functions:
Basic functions used by several sub-packages and useful to have in the main name-space
Type handling:
Index tricks:
Useful functions:
Shape manipulation:
Matrix (2d array) manipulations:
Polynomials:
Import tricks:
Machine arithmetics:
Threading tricks:
Discrete Fourier Transform algorithms.
Fast Fourier Transforms:
Differential and pseudo-differential operators:
Helper functions:
Extension modules:
Integration routines.
Methods for Integrating Functions given function object:
Methods for Integrating Functions given fixed samples.
See the special module's orthogonal polynomials (special) for Gaussian quadrature roots and weights for other weighting factors and regions.
Interface to numerical integrators of ODE systems:
Interpolation Tools.
Wrappers around FITPACK functions:
Interpolation class:
Data input and output.
Classes:
Functions:
Linear algebra routines.
Linear Algebra Basics:
Eigenvalues and Decompositions:
Matrix Functions:
Iterative linear systems solutions
Package contents:
Optimization Tools -- A collection of general-purpose optimization routines.
Constrained Optimizers (multivariate):
Global Optimizers
Scalar function minimizers
Also a collection of general-purpose root-finding routines.
Scalar function solvers
Utility Functions
Package contents:
Sparse matrix support.
Package contents:
Functions:
Special Functions.
Airy Functions:
Elliptic Functions and Integrals:
Bessel Functions:
Zeros of Bessel Functions:
Faster versions of common Bessel Functions:
Integrals of Bessel Functions:
Derivatives of Bessel Functions:
Spherical Bessel Functions:
Riccati-Bessel Functions:
Struve Functions:
Raw Statistical Functions (Friendly versions in scipy.stats):
Gamma and Related Functions:
Error Function and Fresnel Integrals:
Legendre Functions:
Orthogonal polynomials --- 15 types
** These functions all return a polynomial class which can then be evaluated: vals = chebyt(n)(x). This class also has an attribute 'weights', which returns the roots, weights, and total weights for the appropriate form of Gaussian quadrature. These are returned in an n x 3 array with roots in the first column, weights in the second column, and total weights in the final column.
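The following sketch illustrates the note above, assuming a current SciPy in which chebyt() is still part of scipy.special. It evaluates the degree-3 Chebyshev polynomial of the first kind, T3(x) = 4x^3 - 3x, and inspects its weights array. The first column of that array holds the quadrature roots.

```python
import numpy as np
from scipy.special import chebyt

T3 = chebyt(3)           # polynomial object; call it to evaluate
print(T3(0.5))           # 4*0.125 - 3*0.5 = -1.0 (up to rounding)

w = T3.weights           # n x 3 array: roots, weights, total weights
print(w.shape)           # (3, 3)

# The roots of T3 are 0 and +/- sqrt(3)/2.
roots = np.sort(w[:, 0].real)
print(np.allclose(roots, [-np.sqrt(3) / 2, 0.0, np.sqrt(3) / 2]))   # True
```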
HyperGeometric Functions:
Parabolic Cylinder Functions:
Mathieu and Related Functions (and derivatives):
** All the following return both function and first derivative **
Spheroidal Wave Functions:
** The following functions require pre-computed characteristic values **
Kelvin Functions:
Other Special Functions:
Convenience Functions:
** in the description indicates a function which is not a universal function and does not follow broadcasting and automatic array-looping rules.
Error handling:
Errors are handled by returning NaNs or other appropriate values. Some of the special function routines will print an error message when an error occurs. By default this printing is disabled. To enable such messages, use errprint(1); to disable them, use errprint(0). Example:
>>> print scipy.special.bdtr(-1,10,0.3)
>>> scipy.special.errprint(1)
>>> print scipy.special.bdtr(-1,10,0.3)
Statistical functions.
This module contains a large number of probability distributions as well as a growing library of statistical functions.
Each included distribution is an instance of the class rv_continuous. For each given name the following methods are available. See the docstring for rv_continuous for more information.
Calling the instance as a function returns a frozen pdf whose shape, location, and scale parameters are fixed.
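Here is a short sketch of a frozen distribution, assuming a current SciPy. Calling stats.norm with loc and scale returns an object whose methods no longer need those parameters.

```python
import numpy as np
from scipy import stats

# Calling the distribution object fixes (freezes) loc and scale.
rv = stats.norm(loc=5.0, scale=2.0)

print(rv.mean(), rv.std())   # 5.0 2.0
print(rv.pdf(5.0))           # peak density, 1/(2*sqrt(2*pi))
print(rv.cdf(5.0))           # 0.5: half of the probability lies below the mean

# The frozen methods agree with passing loc and scale on every call.
print(np.isclose(rv.pdf(6.0), stats.norm.pdf(6.0, loc=5.0, scale=2.0)))   # True
```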
For example, to generate normally distributed random variates (ten at a time in this session), use something like the following:
$ ipython
o
o
o
In [1]: from scipy import stats
In [2]: stats.norm.rvs(size=10, loc=5.0)
Out[2]:
array([ 4.45700017, 4.39348877, 5.82171326, 3.05493492, 4.77358828,
4.86479922, 5.42006364, 2.59309408, 4.01344497, 6.1543075 ])
o
o
o
In [4]: stats.norm.rvs(size=10, loc=5.0, scale=1)
Out[4]:
array([ 4.04022461, 3.76628997, 3.49915895, 4.38231034, 4.53075502,
3.37048989, 4.39382196, 3.65657395, 5.79550509, 4.57862224])
In [5]: stats.norm.rvs(size=10, loc=5.0, scale=2)
Out[5]:
array([ 4.60439161, 3.21791066, 4.1434995 , 2.70335034, 8.23381385,
7.85801707, 5.07002064, 4.66661538, 2.97583978, 5.77055363])
In [6]: stats.norm.rvs(size=10, loc=5.0, scale=5)
Out[6]:
array([ 4.50706583, 5.62037197, 5.04515902, 2.6058127 ,
1.84169023, 15.28502793, 0.87783722, 6.73873743,
12.52279616, 7.53976885])
o
o
o
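Each call above draws fresh values, so your numbers will differ. In current SciPy (assumed here), rvs() also accepts a random_state argument. Passing a seeded NumPy generator makes the draws repeatable, which is handy in tests. The helper name draw and the seed values are arbitrary choices for this sketch.

```python
import numpy as np
from scipy import stats

def draw(seed):
    """Hypothetical helper: 10 normal variates with mean 5 and std dev 2."""
    rng = np.random.default_rng(seed)
    return stats.norm.rvs(size=10, loc=5.0, scale=2.0, random_state=rng)

first = draw(12345)
second = draw(12345)
print(first.shape)                     # (10,)
print(np.array_equal(first, second))   # True: same seed, same draws
```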
The distributions available with the above methods are:
Continuous (Total == 81 distributions):
Discrete (Total == 10 distributions):
Statistical Functions (adapted from Gary Strangman):
Doc strings are available for many of the above functions. For example, from within IPython, use the pdoc "magic command":
$ ipython -p scipy
In [1]: %pdoc scipy.stats.gmean
Calculates the geometric mean of the values in the passed array.
That is: n-th root of (x1 * x2 * ... * xn).
If a is 1D, a single value is returned. If a is multi-dimensional,
the geometric mean along the dimension specified is calculated. The
returned array has one less dimension than a. dimension defaults
to the last dimension of the array. This means that, for a two
dimensional array, the default is to calculate the geometric mean
of each row.
In [2]: pdoc scipy.stats.anova
o
o
o
If you omit the -p scipy flag to ipython, then you will need to do import scipy.stats.
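Here is a small numeric check of gmean(), assuming a current SciPy. The docstring above describes an older default of working along the last dimension. In current SciPy the default is axis=0, so this sketch passes axis explicitly.

```python
import numpy as np
from scipy import stats

# The geometric mean of 1, 2, 4 is the cube root of 8, i.e. 2.
g = stats.gmean([1.0, 2.0, 4.0])
print(g)

a = np.array([[1.0, 2.0, 4.0],
              [3.0, 3.0, 3.0]])
row_means = stats.gmean(a, axis=1)   # one value per row: about 2 and 3
col_means = stats.gmean(a, axis=0)   # one value per column
print(row_means, col_means)
```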
As an additional example, we consider the gamma distribution. Let us first look at its documentation (its doc-string):
In [2]: ?stats.gamma
o
o
o
gamma.rvs(a,loc=0,scale=1)
- random variates
o
o
o
Gamma distribution
For a = integer, this is the Erlang distribution, and for a=1 it is the exponential distribution:
gamma.pdf(x,a) = x**(a-1)*exp(-x)/gamma(a) for x >= 0, a > 0.
These last lines explain the meaning of the parameter a (mentioned in the line gamma.rvs). To generate multiple gamma distributed random variates, use:
In [6]: stats.gamma.rvs(2,size=10)
Out[6]:
array([ 2.12111063, 1.91618176, 0.86085755, 0.27087561, 0.21773439,
3.14291742, 1.58128949, 0.82045958, 4.64099272, 6.66068163])
Note that the first argument is the parameter a, and the second is the size. Reversing these two function arguments results in an error.
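Here are two quick checks on the gamma distribution, assuming a current SciPy. First, the variates come back as a positive array of the requested size. Second, stats.gamma.pdf(x, a) agrees with the formula quoted above, computed here with scipy.special.gamma.

```python
import numpy as np
from scipy import special, stats

a = 2.0
samples = stats.gamma.rvs(a, size=10)       # shape parameter first, size by keyword
print(samples.shape, (samples > 0).all())   # (10,) True

# pdf(x, a) = x**(a-1) * exp(-x) / gamma(a) for x >= 0, a > 0
x = np.linspace(0.1, 5.0, 50)
by_formula = x**(a - 1) * np.exp(-x) / special.gamma(a)
print(np.allclose(stats.gamma.pdf(x, a), by_formula))   # True

# With the default scale=1, the mean of this distribution is a.
print(stats.gamma.mean(a))   # 2.0
```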
Plot-tests:
Once again, in IPython, you can obtain information about each of the above. For example, use:
In [1]: %pdoc stats.probplot
Return (osm, osr){,(scale,loc,r)} where (osm, osr) are order statistic
medians and ordered response data respectively so that plot(osm, osr)
is a probability plot. If fit==1, then do a regression fit and compute the
slope (scale), intercept (loc), and correlation coefficient (r), of the
best straight line through the points. If fit==0, only (osm, osr) is
returned.
sparams is a tuple of shape parameter arguments for the distribution.
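Here is a sketch of calling probplot(), assuming a current SciPy. In current SciPy, probplot() takes dist and fit keyword arguments and, when fit is true, returns ((osm, osr), (slope, intercept, r)). For normally distributed data the points lie close to a straight line. As a result, r is near 1, and the fitted slope and intercept roughly estimate the scale and location. The sample parameters below are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=3.0, size=200)

# With fit=True (the default), probplot also returns the regression line.
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist='norm')

print(len(osm) == len(data))       # True: one point per observation
print(np.all(np.diff(osr) >= 0))   # True: osr is the sorted data
print(round(r, 3))                 # close to 1 for normal data
print(slope, intercept)            # roughly the scale (3) and loc (10)

# To draw the plot as well, pass plot=matplotlib.pyplot (or an Axes object).
```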
Package contents: