Object-oriented scientific data processing: how to cleanly fit data, analysis, and visualization into objects?

Question

As a biologist, I often write Python software for data analysis. The general structure is always the same:

Load the data, run the analysis (statistics, clustering, ...), and then visualize the results.

Sometimes, for the same experiment, the data comes in different formats, there are several ways to analyze it, and there are several possible visualizations, which may or may not depend on the analysis performed.

I'm struggling to find a generic, “pythonic”, object-oriented way to make this clear and easily extensible. It should be easy to add a new type of action or small variations of existing ones, so I'm fairly sure I should do this with OOP.

I have already written a Data object with methods for loading the experimental data. If I have multiple data sources, I plan to create subclasses that override the load method.

After that, I'm not sure. Should I create an abstract Analysis class with a child class for each type of analysis (using their attributes to store the results), do the same for visualization, and tie everything together with a shared Experiment object that holds a Data instance and several Analysis and Visualization instances? Or should visualizations be functions that take an Analysis object and/or the data as parameters and produce the plots? Is there a more efficient way? Am I missing something?

+7

python oop scientific-computing

Geeklhem Jul 08 '13 at 8:53

1 answer

dkar · Accepted Answer · 2013-07-11T16:41:48+0000

Your general idea will work. Here are some details that I hope will help you proceed:

1. Create an abstract Data class with generic methods such as load, save, print, etc.
2. Create concrete subclasses for each specific form of data you are interested in. These can be task-specific (for example, data for natural language processing) or form-specific (data given as a matrix, where each row corresponds to a different observation).
3. As you said, create an abstract Analysis class.
4. Create concrete subclasses for each form of analysis. Each concrete subclass should override a process method that accepts a specific form of data and returns a new Data instance with the results (if you think the form of the results will differ from that of the input, use a separate Result class).
5. Create a Visualization class hierarchy. Each concrete subclass should override a visualize method that accepts a specific Data instance (or a Result, if you use a separate class) and returns a plot of some form.
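
The hierarchy described above could be sketched roughly as follows. This is only a minimal illustration using the standard library: the concrete class names (CsvData, ColumnMeans, TextBarChart), the Result fields, and the text-based "plot" are all hypothetical stand-ins, not part of the question or of any library. A real Visualization subclass would more likely return a matplotlib figure.

```python
from abc import ABC, abstractmethod
import csv
import io
import statistics


class Data(ABC):
    """Abstract container for experimental observations."""

    def __init__(self):
        self.rows = []  # one dict per observation (hypothetical layout)

    @abstractmethod
    def load(self, source):
        """Fill self.rows from a file-like source and return self."""


class CsvData(Data):
    """Form-specific subclass: numeric observations stored as CSV."""

    def load(self, source):
        self.rows = [
            {key: float(value) for key, value in row.items()}
            for row in csv.DictReader(source)
        ]
        return self


class Result:
    """Separate class for analysis output whose form differs from the input."""

    def __init__(self, name, values):
        self.name = name
        self.values = values  # mapping of label -> number


class Analysis(ABC):
    @abstractmethod
    def process(self, data):
        """Take a Data instance and return a Result."""


class ColumnMeans(Analysis):
    def process(self, data):
        columns = list(data.rows[0]) if data.rows else []
        means = {c: statistics.mean(r[c] for r in data.rows) for c in columns}
        return Result("column means", means)


class Visualization(ABC):
    @abstractmethod
    def visualize(self, result):
        """Take a Result and return some rendering of it."""


class TextBarChart(Visualization):
    """Stand-in for a real plot: renders each value as a row of '#'."""

    def visualize(self, result):
        lines = [result.name]
        for label, value in result.values.items():
            lines.append(f"{label:>8} | {'#' * round(value)}")
        return "\n".join(lines)


if __name__ == "__main__":
    raw = io.StringIO("height,weight\n2,4\n4,8\n")
    data = CsvData().load(raw)
    result = ColumnMeans().process(data)
    print(TextBarChart().visualize(result))
```

Adding a new data format, analysis, or plot then means adding one subclass, while the calling code (load, process, visualize) stays the same.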

One caveat: Python is expressive, powerful, and high-level enough that you often don't need to create your own OO design. You can usually do what you want with minimal code using numpy, scipy, and matplotlib, so before you start writing extra code, make sure you actually need it :)
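
To make the caveat concrete, the same load, analyze, visualize pipeline can be written as three plain functions with no class hierarchy at all. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the standard library stands in for numpy and matplotlib so that it runs without third-party packages. In practice the arithmetic and plotting would be delegated to those libraries.

```python
import csv
import io
import statistics


def load_csv(source):
    """Return observations as a list of {column: float} dicts."""
    return [
        {key: float(value) for key, value in row.items()}
        for row in csv.DictReader(source)
    ]


def column_means(rows):
    """Compute the mean of every column."""
    columns = list(rows[0]) if rows else []
    return {c: statistics.mean(r[c] for r in rows) for c in columns}


def render(means):
    """Format the means as text (a placeholder for a real plot)."""
    return "\n".join(f"{name}: {value:.2f}" for name, value in means.items())


rows = load_csv(io.StringIO("height,weight\n2,4\n4,8\n"))
print(render(column_means(rows)))
```

If a script never needs more than a handful of such functions, the class hierarchy may be more structure than the problem requires.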
