Friday, 24 April 2009

Python vs IDL. The intricacies of the "where" statement.

Selecting specific elements from arrays by their index is a very useful technique when you're manipulating huge data files. In the past I mainly used IDL (great, but very expensive licensed software) and standard C-shell scripting for most of my processing, but I have recently started experimenting with Python v2.5 (open-source ftw!), specifically the Enthought and Python(x,y) distributions, which among other things include the Matplotlib and SciPy libraries.

Here's how I used to do it in IDL.
First, set up a test array called data:
IDL> data = findgen(10)
This creates a floating-point array with 10 elements, from 0.0 to 9.0 (indgen(10) would give the integer version).
To select a subset of the elements, use:
IDL> data_sub = data(where(data lt 8 and data gt 3))
data_sub now contains the elements 4,5,6,7

In Python we can do something similar using NumPy.
At the Python prompt:
>>> import numpy as np
Set up the same test array as before:
>>> data = np.arange(0,10,1) # from 0 to 9 incrementing by 1
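A small aside, not part of the original comparison: np.arange with integer arguments returns an integer array, whereas IDL's findgen returns floats. If you want the same values and type as findgen(10), passing dtype explicitly is one way to do it. This sketch assumes a reasonably recent NumPy (the printed array formatting differs slightly in old versions):

```python
import numpy as np

# Integer array 0 ... 9 (same values as IDL's indgen(10))
ints = np.arange(0, 10, 1)

# Floating-point array 0.0 ... 9.0 (same values as IDL's findgen(10))
floats = np.arange(10, dtype=float)

# dtype.kind is 'i' for signed integers and 'f' for floats
print(ints.dtype.kind, ints)      # i [0 1 2 3 4 5 6 7 8 9]
print(floats.dtype.kind, floats)  # f [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
```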

Now define the limits as two boolean masks and combine them with the element-wise & operator:
>>> lim1 = data > 3
>>> lim2 = data < 8
>>> data_sub = data[lim1 & lim2]
data_sub is now an array with the values 4,5,6,7
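The two masks can also be built inline. Here is a minimal sketch (my own addition, using a current Python/NumPy) showing that each comparison then needs its own parentheses: in Python & binds more tightly than > and <, so writing data > 3 & data < 8 is parsed as the chained comparison data > (3 & data) < 8, which raises a ValueError. np.logical_and is an equivalent function form.

```python
import numpy as np

data = np.arange(0, 10, 1)  # integers 0 ... 9

# Parenthesise each comparison: & has higher precedence than > and <
data_sub = data[(data > 3) & (data < 8)]

# Equivalent, using the function form of element-wise AND
data_sub2 = data[np.logical_and(data > 3, data < 8)]

# Without parentheses Python builds a chained comparison, which tries to
# turn a whole boolean array into a single True/False and fails
try:
    data[data > 3 & data < 8]
except ValueError as err:
    print("ValueError: %s" % err)

print(data_sub)  # [4 5 6 7]
```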

It is also easy to replace specific elements with zeros:
>>> data_zeros = np.where(data > 5, 0, data)
This replaces every element greater than 5 with 0 and keeps the others unchanged, so
data_zeros is the array 0,1,2,3,4,5,0,0,0,0
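np.where actually has two forms. With three arguments, as above, it builds a new array. With a single condition it returns a tuple of index arrays (one per dimension), which is the closest analogue of IDL's where. The sketch below (assuming a current NumPy) also shows an in-place alternative through a boolean mask, similar in spirit to the IDL idiom data[where(data gt 5)] = 0. One difference to keep in mind: when nothing matches, IDL's where returns -1, whereas np.where simply returns an empty index array.

```python
import numpy as np

data = np.arange(0, 10, 1)  # integers 0 ... 9

# One-argument form: a tuple of index arrays; [0] picks the 1-D indices
idx = np.where(data > 5)[0]
print(idx)  # [6 7 8 9]

# Three-argument form: returns a new array, data itself is not modified
data_zeros = np.where(data > 5, 0, data)

# In-place alternative: copy first so the original survives,
# then assign 0 to every element selected by the boolean mask
data_masked = data.copy()
data_masked[data_masked > 5] = 0

print(data_zeros)   # [0 1 2 3 4 5 0 0 0 0]
print(data_masked)  # [0 1 2 3 4 5 0 0 0 0]
print(data)         # [0 1 2 3 4 5 6 7 8 9]
```

The mask-assignment version avoids allocating a second array when you don't need to keep the original, while the three-argument np.where is handy when you do.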