How to write a flexible modular program with good interoperability between modules?

I went through answers to similar topics here on SO, but could not find a satisfactory answer. Since I know this is a pretty big topic, I will try to be more specific.

I want to write a program that processes files. The processing is non-trivial, so the best approach is to separate the various phases into autonomous modules, which are then used as needed (sometimes I will only be interested in the output of module A, sometimes I need the output of five other modules, etc.). The point is that the modules need to cooperate, because the output of one may be the input of another. And it needs to be FAST. Moreover, I want to avoid doing any processing more than once (if module A creates data that should then be processed by modules B and C, I don't want to run module A twice just to produce input for B and C).

The information that needs to be shared between modules will mainly be blocks of binary data and/or offsets into the processed files. The task of the main program would be quite simple: just parse the arguments and run the required modules (and perhaps produce some output, or would that be the modules' job?).

I do not need the modules to be loaded at runtime. It is perfectly fine to have them as libs with .h files and to recompile the program every time a new module is added or an existing one is updated. The point of modules here is mainly code readability, maintainability, and the ability to have more people working on different modules without needing any predefined interface or anything like that (on the other hand, some "recommendations" on how modules should be written will probably be needed, I know that). We can assume that the file processing is a read-only operation; the original file does not change.

Can someone point me in the right direction on how to do this in C++? Any advice is welcome (links, tutorials, PDF books...).

+6
c++ architecture
3 answers

This sounds like a plugin architecture. I recommend starting with an (informal) data-flow diagram to determine:

  • how these blocks process data
  • what data needs to be passed between them
  • what results are returned from one block to another (data / error / exception codes)

With this information you can start designing the common interfaces that allow modules to be connected to one another at run time. I would then add a factory function to each module, so that an actual processing object can be requested from it. I do not recommend handing out processing objects directly from the module interface; instead, return a factory object from which the processing objects are obtained. These processing objects are then used to build the whole processing chain.

A simplified sketch could look like this:

struct Processor {
    void doSomething(Data);
};

struct Module {
    string name();
    Processor* getProcessor(WhichDoIWant);
    void deleteProcessor(Processor*);
};

Off the top of my head, these patterns are most likely to come up:

  • factory function: to obtain objects from the modules
  • composite & decorator: to form the processing chain
+2

I wonder whether C++ is the right level of abstraction for this purpose. In my experience it has always proved useful to have separate programs connected together, in the UNIX philosophy.

If your data is not too large, there are many advantages to splitting things up this way. First, you can test each phase of your processing independently: you run one program, redirect its output to a file, and easily check the result. Second, you make use of multi-core systems even if each of your programs is single-threaded, which makes them much easier to build and debug. And you also get operating-system-level synchronization for free through the pipes between your programs. Perhaps some of your phases could even be handled by existing utilities?

Your final program would then be the glue that assembles all your utilities into one whole, passing data from one program to the next (no more intermediate files at this point) and replicating stages as necessary for your computations.

+2

This really seems fairly straightforward, so I guess we are missing some requirements.

Use memoization to avoid computing any result more than once. This should be handled by the framework.

You could use some kind of flow graph to determine how to pass information from one module to another... but the simplest way is to have each module directly invoke the modules it depends on. With memoization this is not that expensive: if the result has already been computed, you're fine.

Since you need to be able to run any module, you need to give them identifiers and register them somewhere so they can be looked up at runtime. There are two ways to do this:

  • Exemplar: you fetch the unique instance of the module and execute it.
  • Factory: you create the requested module, execute it, and throw it away.

The disadvantage of the Exemplar approach is that if you execute a module twice, you do not start from a clean state but from whatever state its last (possibly failed) execution left behind. For memoization this could be seen as an advantage, but if the previous run failed, the result was never computed (urgh), so I would recommend against it.

So, what would this look like...?

Let's start with the Factory approach.

class Module;
class Result;

class Organizer {
public:
    void AddModule(std::string id, const Module& module);
    void RemoveModule(const std::string& id);

    const Result* GetResult(const std::string& id) const;

private:
    typedef std::map<std::string, std::shared_ptr<const Module> > ModulesType;
    typedef std::map<std::string, std::shared_ptr<const Result> > ResultsType;

    ModulesType mModules;
    mutable ResultsType mResults; // memoization cache
};

This is a very simple interface. However, since we need a fresh instance of a module each time the Organizer invokes it (to avoid the stale-state problem described above), we need to work on our Module interface accordingly.

class Module {
public:
    typedef std::unique_ptr<const Result> ResultPointer;

    virtual ~Module() {} // it's a base class

    virtual Module* Clone() const = 0; // traditional cloning idiom

    virtual ResultPointer Execute(const Organizer& organizer) = 0;
}; // class Module

And now it’s easy:

// Organizer implementation
const Result* Organizer::GetResult(const std::string& id) const {
    // Memoized?
    ResultsType::const_iterator res = mResults.find(id);
    if (res != mResults.end()) return res->second.get();

    // Need to compute it: look the module up
    ModulesType::const_iterator mod = mModules.find(id);
    if (mod == mModules.end()) return 0; // unknown module

    // Create a throw-away clone so execution starts from a clean state
    std::unique_ptr<Module> module(mod->second->Clone());

    // Compute
    std::shared_ptr<const Result> result(module->Execute(*this).release());
    if (!result.get()) return 0;

    // Store the result as part of the memoization
    mResults[id] = result;
    return result.get();
}

And a simple Module / Result example:

struct FooResult : Result {
    FooResult(int r) : mResult(r) {}
    int mResult;
};

struct FooModule : Module {
    virtual FooModule* Clone() const { return new FooModule(*this); }

    virtual ResultPointer Execute(const Organizer& organizer) {
        // check that the file has the correct format first
        if (!organizer.GetResult("CheckModule")) return ResultPointer();

        return ResultPointer(new FooResult(42));
    }
};

And from the main:

#include "project/organizer.h"
#include "project/foo.h"
#include "project/bar.h"

int main(int argc, char* argv[]) {
    Organizer org;
    org.AddModule("FooModule", FooModule());
    org.AddModule("BarModule", BarModule());

    for (int i = 1; i < argc; ++i) {
        const Result* result = org.GetResult(argv[i]);
        if (result) result->print();
        else std::cout << "Error while running: " << argv[i] << "\n";
    }
    return 0;
}
+1
