2017 Edition

JulesKouatchou, Jul 23, 2021

Comparing programming languages such as Python, Julia, R, etc. is not an easy task. Many researchers and practitioners have attempted to determine how fast a particular language performs against others when solving a specific problem (or a set of problems). Raschka compares Matlab, Numpy, R and Julia on matrix calculations (Raschka, 2014). Hirsch does a benchmarking analysis of Matlab, Numpy, Numba CUDA, Julia and IDL (Hirsch, 2016); from his experiments, he states which language has the best speed in doing matrix multiplication and iteration. Rogozhnikov uses the calculation of the log-likelihood of a normal distribution to compare Numpy, Cython, Parakeet, Fortran, C++, etc., and draws conclusions on which of them solve the problem faster (Rogozhnikov, 2015). Puget determines how several languages score in carrying out LU factorization (Puget, 2016).

All these analyses are important to assess how fast a language performs. However, focusing only on speed may not give us a good picture of the capabilities of each language. It turns out that if we compare how fast languages execute a given computation over the years, we might reach different conclusions, as some of them evolve over time (becoming more efficient at solving a given set of problems). To determine the usefulness of a language, we also want to take into consideration its accessibility (open source or commercial), its readability, its support base, how it interfaces with other languages, its strengths and weaknesses, and the availability of a vast collection of libraries.

As we deal with legacy scientific applications (written in Fortran or C, for instance), our primary intent is not to find a new language that can be used to rewrite existing codes. We rather want to identify and leverage "new" languages to facilitate and speed up pre/post-processing, initialization and visualization procedures. As far as possible, we may want to interface our legacy codes with "new" languages. We also intend to use new languages to prototype applications before they are written in languages like Fortran and C.

In this work, we are interested in how each package handles loops and vectorization, reads a large collection of netCDF files and does multiprocessing. The goal is not to highlight which software is faster than the other but to provide basic information on the strengths and weaknesses of individual packages when dealing with specific applications.

All the experiments presented here were done on an Intel Xeon Haswell processor node. Each node has 28 cores (2.6 GHz each) and 128 GB of available memory. We consider the following versions of the languages:

| Language | Version | Open Source? |
| --- | --- | --- |
| Python | 2.7.1 | Yes |
| Julia | 0.6.0 | Yes |
| R | 3.2.2 | Yes |
| IDL | 8.5 | No |
| Matlab | R2016a | No |
| GNU Compilers | 6.1.0 | Yes |
| Intel Compilers | 17.0.0.098 | No |
| Scala | 2.12.1 | Yes |

Remark: We assume that Python refers to Numpy too.

We consider here six problems.

Problem 1

Consider an arbitrary n×n×3 matrix A. We want to perform the following operations on A:

         A(i,j,1) = A(i,j,2)

         A(i,j,3) = A(i,j,1)

         A(i,j,2) = A(i,j,3)

For instance, in Python the code looks like:

   for i in range(n):
       for j in range(n):
           A[i,j,0] = A[i,j,1]
           A[i,j,2] = A[i,j,0]
           A[i,j,1] = A[i,j,2]

The above code segment uses loops. We are also interested in how the same operations are done using vectorization:

          A[:,:,0] = A[:,:,1]
          A[:,:,2] = A[:,:,0]
          A[:,:,1] = A[:,:,2]

The problem allows us to see how each language handles loops and vectorization. We record the elapsed time needed to do the array assignments. The results are summarized in the tables below.
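A minimal sketch of how the two approaches might be timed in Python/Numpy follows; the matrix size `n` and the use of `time.time()` as the timer are our own choices, not values taken from the experiments above.

```python
import time
import numpy as np

def copy_with_loops(A):
    """Perform the three assignments element by element (explicit loops)."""
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            A[i, j, 0] = A[i, j, 1]
            A[i, j, 2] = A[i, j, 0]
            A[i, j, 1] = A[i, j, 2]
    return A

def copy_vectorized(A):
    """Perform the same assignments on whole array slices."""
    A[:, :, 0] = A[:, :, 1]
    A[:, :, 2] = A[:, :, 0]
    A[:, :, 1] = A[:, :, 2]
    return A

n = 500  # arbitrary size for illustration
A = np.random.rand(n, n, 3)
B = A.copy()

t0 = time.time()
copy_with_loops(A)
t_loop = time.time() - t0

t0 = time.time()
copy_vectorized(B)
t_vec = time.time() - t0
print("loops: %.4f s, vectorized: %.4f s" % (t_loop, t_vec))
```

Both functions leave the array in the same state (all three slices equal to the original second slice), so the comparison isolates the cost of the access pattern.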

Apart from Julia, vectorization is the fastest method for accessing arrays/matrices.

Problem 2

We multiply two randomly generated n×n matrices A and B:

   C = A × B

This problem shows the importance of taking advantage of built-in libraries available in each language.

The elapsed times presented here only measure the times spent on the multiplication (as the size of the matrix varies).

The above table suggests that built-in functions are more appropriate for matrix multiplication. DGEMM is far more efficient. It is important to note that DGEMM is best suited to large matrices; if for instance n=100, the function matmul outperforms DGEMM.
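The contrast between hand-written loops and a built-in product can be sketched in Python; the size n=100 is an illustrative choice, and the `@` operator dispatches to Numpy's BLAS-backed routine (typically DGEMM for double precision).

```python
import numpy as np

def matmul_loops(A, B):
    """Naive triple-loop matrix multiplication, for comparison only."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i, k] * B[k, j]
            C[i, j] = s
    return C

n = 100
A = np.random.rand(n, n)
B = np.random.rand(n, n)

C_loops = matmul_loops(A, B)   # pure-Python loops
C_builtin = A @ B              # built-in (BLAS-backed) product
print(np.allclose(C_loops, C_builtin))
```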

Problem 3

We find the numerical solution of the 2D Laplace equation:

$$U_{xx} + U_{yy} = 0$$

We use the Jacobi iterative solver. We are interested in fourth-order compact finite difference scheme (Gupta, 1984):

$$U_{i,j} = (4(U_{i-1,j} + U_{i,j-1} + U_{i+1,j} + U_{i,j+1}) + U_{i-1,j-1} + U_{i+1,j-1} + U_{i+1,j+1} + U_{i-1,j+1} )/20$$

The Jacobi iterative solver stops when the difference between two consecutive approximations falls below $10^{-6}$.
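A vectorized Python sketch of this Jacobi iteration with the nine-point compact stencil is given below; the grid size, boundary condition (U = 1 on one edge, 0 elsewhere) and iteration cap are illustrative assumptions, not the settings used in the benchmark.

```python
import numpy as np

def jacobi_compact(U, tol=1.0e-6, max_iter=20000):
    """Jacobi iteration for the 2D Laplace equation using the
    fourth-order nine-point compact stencil; boundary values are fixed."""
    diff = tol + 1.0
    it = 0
    while diff > tol and it < max_iter:
        V = U.copy()
        # U_{i,j} = (4*(edge neighbors) + corner neighbors) / 20
        V[1:-1, 1:-1] = (4.0 * (U[:-2, 1:-1] + U[1:-1, :-2]
                                + U[2:, 1:-1] + U[1:-1, 2:])
                         + U[:-2, :-2] + U[2:, :-2]
                         + U[2:, 2:] + U[:-2, 2:]) / 20.0
        diff = np.max(np.abs(V - U))
        U = V
        it += 1
    return U, it

# Illustrative test grid: zero interior, U = 1 along the first row.
n = 50
U = np.zeros((n, n))
U[0, :] = 1.0
U, iters = jacobi_compact(U)
print(iters)
```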

Problem 4

We have a set of daily NetCDF files (7305) covering a period of 20 years (1990-2009). The files for a given month are in a sub-directory labeled YYYYMM (for instance 199001, 199008, 199011). We want to write a script that opens each file, reads a three-dimensional variable (longitude/latitude/level), manipulates it, and produces a contour plot after all the files are read. Pseudocode for the script reads:

Loop over the years
    Loop over the months
        Obtain the list of NetCDF files
        Loop over the files
            Read the variable (longitude/latitude/level)
            Compute the zonal mean average (new array of latitude/level)
            Extract the column array at latitude 86 degrees South
            Append the column array to a "master" array (or matrix)
Create a contour plot using the "master" array
    (the x-axis should be the days (1 to 7305), to be converted into years)
    (the y-axis should be the vertical pressure levels, in log scale)

This is the kind of problem that a typical user we support faces: a collection of thousands of files that need to be manipulated to extract the desired information. Having tools that allow us to quickly read data from files (in formats such as NetCDF, HDF4, HDF5, grib) is critical for the work we do.
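The per-file reduction in the pseudocode above can be sketched in Python. To keep the sketch self-contained, a random array stands in for the variable that a real run would read from each NetCDF file (e.g. with `netCDF4.Dataset`); the grid dimensions and the five-day loop are illustrative assumptions.

```python
import numpy as np

# Assumed grid: 72 levels, 181 latitudes, 360 longitudes (illustrative).
nlev, nlat, nlon = 72, 181, 360
lat = np.linspace(-90.0, 90.0, nlat)
j86s = np.argmin(np.abs(lat - (-86.0)))   # latitude index closest to 86S

columns = []
for day in range(5):                      # a handful of stand-in "files"
    var = np.random.rand(nlev, nlat, nlon)  # would come from the file
    zonal_mean = var.mean(axis=2)         # average over longitude -> (lev, lat)
    columns.append(zonal_mean[:, j86s])   # column at ~86 degrees South

master = np.column_stack(columns)         # (lev, ndays) array for contouring
print(master.shape)
```

The resulting "master" array has one column per day, ready to be passed to a contouring routine with the pressure levels on a log-scale y-axis.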

We report in Table 4.1 the elapsed times it took to solve Problem 4 with the various languages.

| Language | Elapsed time (s) |
| --- | --- |
| Python | 1399 |
| Julia | 1317 (no plot) |
| IDL | 1689 |
| R | 2220 |
| Matlab | 1678 |

Table 4.1: Elapsed time (in seconds) obtained by manipulating 7305 NetCDF files on a single processor. We were not able to produce the plot with Julia because we could not build the plotting tool.

All the above runs were conducted on a node that has 28 cores; basically, only one core was used. We want to take advantage of all the available cores by spreading the reading of the files across them while making sure that the data of interest are gathered in the proper order. We use the multi-processing capabilities of the various languages to slightly modify the scripts: for each month, the daily files are read in by different threads (cores). The results are shown in Table 4.2. We were able to fully complete the task with Python, R and Julia only. The Julia script is fragile and we could only run it with 8 threads. We obtained unexpected error messages with Matlab and could not resolve the issues (we will continue to look into them). We did not try to do the task in IDL because we could not find simple IDL multi-processing documentation to guide us.

| Language | Elapsed time (s) |
| --- | --- |
| Python | 273 |
| Julia | 520 (no plot) |
| IDL | not completed |
| R | 420 |
| Matlab | not completed |

Table 4.2: Elapsed time (in seconds) obtained by manipulating 7305 NetCDF files using multiple threading.

We observe that the use of multiple threads significantly reduces the processing time without requiring more resources (all the calculations were done within a node). The multi-thread processing scripts were written by making minor modifications of the serial ones. In fact, the multi-thread scripts ended up being more modular (use of functions) and more readable.
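The parallel pattern described above can be sketched in Python with a thread pool. As in the serial sketch, a random array stands in for reading a real daily file, and the worker count of 8 and the grid dimensions are illustrative assumptions; `map` preserves submission order, which is how the columns stay aligned with the days.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def process_one_file(day):
    """Stand-in for reading one daily file and reducing it to a column;
    a real worker would open the NetCDF file for the given day instead."""
    var = np.random.rand(72, 181, 360)      # pretend file contents
    zonal_mean = var.mean(axis=2)           # (level, latitude)
    return day, zonal_mean[:, 4]            # column near 86S (index assumed)

days = range(31)                            # one month of daily files
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_one_file, days))

# map returns results in submission order, so the columns line up with days
master = np.column_stack([col for _, col in results])
print(master.shape)
```

Wrapping the per-file work in a function is exactly the kind of minor restructuring noted above: the parallel version ends up more modular than the serial loop.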
