Guidelines for managing and visualizing the complexity of components in your software?

We build tools for obtaining information from the Internet. The system has several parts, such as:

  • Scanning data from the Internet
  • Extracting information based on templates and business rules
  • Analyzing the results in the database
  • Applying normalization and filtering rules
  • Etc.

The problem is troubleshooting and having a good “high-level picture” of what happens at each stage.

What methods have helped you understand and manage complex processes?

  • Use workflow tools such as the Windows Workflow Foundation.
  • Encapsulate individual functions in command-line tools and use scripting tools to combine them.
  • Introduce a domain-specific language (DSL) to describe what should happen at a higher level (a rough sketch follows this list).
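
For example, here is a very rough sketch of what an internal DSL in Python might look like for us; every name here is illustrative, nothing below is a real tool or our actual code:

    # Illustrative sketch only: the pipeline is declared at a high level and a
    # tiny framework wires the stages together.
    PIPELINE = [
        ("crawl",     "Scan data from the Internet"),
        ("extract",   "Extract information based on templates and business rules"),
        ("analyze",   "Analyze the results in the database"),
        ("normalize", "Apply normalization and filtering rules"),
    ]

    def run(pipeline, stage_registry, payload):
        # stage_registry maps a stage name to a callable; payload flows through the stages.
        for name, description in pipeline:
            print(f"[stage:{name}] {description}")
            payload = stage_registry[name](payload)
        return payload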

Just curious how you get a handle on a system with many interacting components. We would like to document / understand how the system works at a higher level than tracing through the source code.

+7
complexity-theory
8 answers

The code says what happens at each stage. A DSL would be a boon, but perhaps not if it means writing your own scripting language and/or compiler.

Higher-level documentation should not contain details of what happens at each stage; it should provide an overview of the steps and how they relate to each other.

Good advice:

  • Visualize database schema relationships.
  • Use Visio or similar tools (for example, the one you mentioned; I have not used it) for process overviews (imho this belongs in your project specification).
  • Make sure your code is properly structured / separated, etc.
  • Make sure you have some kind of project specification (or some other “general” documentation that explains what the system does at an abstract level).

I would not recommend creating command-line tools unless you actually use them that way. There is no point in building tools that you do not use. (That is not to say it could not be useful, but most of what you describe looks like it belongs in a library rather than in external processes.)

+2

I use AT&T's renowned Graphviz; it is simple and works great. Doxygen also uses it.

Also, with a little effort, you can get very nice-looking graphs.

I forgot to mention how I use it: since Graphviz simply parses DOT scripts, I have my logging write events in Graphviz format, so I just process the log file and get a nice graph.
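
As a rough illustration of the idea (the log line format below is made up, not my actual one), a few lines of Python are enough to turn such an event log into a DOT file that Graphviz can render:

    # Sketch: turn an event log into a Graphviz DOT graph.
    # Assumed (made-up) log line format: "source_component -> target_component : event_name"
    import sys

    def log_to_dot(lines):
        edges = set()
        for line in lines:
            if "->" not in line:
                continue
            src, _, rest = line.partition("->")
            dst, _, label = rest.partition(":")
            edges.add((src.strip(), dst.strip(), label.strip()))
        dot = ["digraph pipeline {"]
        for src, dst, label in sorted(edges):
            dot.append('    "%s" -> "%s" [label="%s"];' % (src, dst, label))
        dot.append("}")
        return "\n".join(dot)

    if __name__ == "__main__":
        # e.g. python log_to_dot.py < events.log > pipeline.dot
        print(log_to_dot(sys.stdin))

Then dot -Tpng pipeline.dot -o pipeline.png renders the picture.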

+3

My company writes a functional specification for each core component. Each specification follows a common format and uses diagrams and images where necessary. Our specifications have a functional part and a technical part. The functional part describes what the component does at a high level (why it exists, what problems it solves, what it does not do, what it interacts with, related external documents, etc.). The technical part describes the most important classes in the component and any high-level design patterns.

We prefer text because it is the most versatile and the easiest to update. This is a big deal: not everyone is an expert (or even competent) in Visio or Dia, and that can be an obstacle to keeping documents up to date. We write the specifications on a wiki so that we can easily link between them (and track changes), and so that they allow non-linear walkthroughs of the system.

For an argument from authority, Joel recommends functional specifications.

+1

I find a dependency structure matrix (DSM) a useful way to analyze the structure of an application. A tool like Lattix might help.
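
To show what a DSM looks like, here is a toy Python sketch; the module names and dependencies are hypothetical, and a tool like Lattix would extract the real ones from your codebase:

    # Toy dependency structure matrix: an 'X' in row R, column C means "R depends on C".
    deps = {
        "crawler":    {"queues", "http"},
        "extractor":  {"queues", "rules"},
        "normalizer": {"queues", "rules"},
        "loader":     {"queues", "db"},
    }
    modules = sorted(set(deps) | {d for ds in deps.values() for d in ds})

    # Print the matrix with modules as both rows and columns.
    print(" " * 12 + "".join(f"{m[:10]:>11}" for m in modules))
    for row in modules:
        cells = "".join(f"{'X' if col in deps.get(row, set()) else '.':>11}" for col in modules)
        print(f"{row:<12}{cells}")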

Depending on your platform and toolchain, there are many really useful static analysis packages that can help you document the relationships between the subsystems or components of your application. A good example for the .NET platform is NDepend; there are many others for other platforms.

Having a good design or model before building the system is the best way to understand how it should be structured, but tools like the ones I mentioned can help you enforce architectural rules, and they often give you insight into the design that simply cannot be gleaned from reading the code.

+1

I would not use any of the tools you mentioned.

You need to draw a high level diagram (I like pencil and paper).

I would design the system so that different modules do different things, and build it so that you can have many instances of each module working in parallel.

I would consider using multiple queues for:

  • URLs to crawl
  • Pages crawled from the Internet
  • Information extracted based on templates and business rules
  • Parsed results
  • Normalized and filtered results

You would have simple programs (possibly command-line, with no UI) that read data from the queues and insert data into one or more queues (the crawler would feed both the "URLs to crawl" and the "Pages crawled from the Internet" queues). You could have:

  • Web crawler
  • Data extractor
  • Parser
  • Normalizer and filter

These would correspond to the queues, and you could run many copies of these programs on separate PCs, allowing you to scale.

The last queue would feed another program that actually writes everything to the database for actual use.
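
Here is a minimal sketch of the first two stages of such a layout, using Python's multiprocessing queues on a single machine; in a real deployment the queues would more likely live in an external broker so the workers can run on separate PCs, and all names and data below are illustrative:

    # Sketch of two pipeline stages connected by queues, each runnable in parallel.
    from multiprocessing import Process, Queue

    def crawler(urls_q, pages_q):
        # Reads URLs, writes "crawled pages"; a real crawler could also feed
        # newly discovered URLs back into urls_q.
        while True:
            url = urls_q.get()
            if url is None:                       # sentinel: shut down and pass it on
                pages_q.put(None)
                break
            pages_q.put({"url": url, "html": f"<html>fetched {url}</html>"})  # stand-in fetch

    def extractor(pages_q, extracted_q):
        # Applies (stand-in) templates / business rules to each crawled page.
        while True:
            page = pages_q.get()
            if page is None:
                extracted_q.put(None)
                break
            extracted_q.put({"url": page["url"], "fields": {"title": "..."}})

    if __name__ == "__main__":
        urls_q, pages_q, extracted_q = Queue(), Queue(), Queue()
        workers = [Process(target=crawler, args=(urls_q, pages_q)),
                   Process(target=extractor, args=(pages_q, extracted_q))]
        for w in workers:
            w.start()
        for url in ("http://example.com/a", "http://example.com/b"):
            urls_q.put(url)
        urls_q.put(None)                          # shut the chain down
        while (item := extracted_q.get()) is not None:
            print("extracted:", item)             # the database loader would sit here
        for w in workers:
            w.join()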

+1

Top-down design helps a lot. One mistake I often see is treating it as a one-off exercise: your top-level design needs to be reviewed and updated just like any other piece of code.

0

It is important to think about these components across the entire software development life cycle: design time, development time, testing, release, and runtime. Just drawing a diagram is not enough.

I have found that adopting a microkernel architecture can really help “divide and conquer” this complexity. The essence of a microkernel architecture:

  • Processes (each component runs in an isolated memory space)
  • Threads (each component runs in a separate thread)
  • Communication (components interact through one simple message channel; see the sketch below)
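
As a toy illustration of the “one simple message channel” idea, here is a threads-based Python sketch; this is not the Autosys/TIBCO setup described below, and all names are made up:

    # Each component runs in its own thread; all messages flow through one channel
    # and a tiny dispatcher routes them to the addressed component's inbox.
    import queue
    import threading

    channel = queue.Queue()                               # the single message channel
    inboxes = {"extractor": queue.Queue(), "normalizer": queue.Queue()}

    def dispatcher():
        while True:
            msg = channel.get()
            if msg is None:                               # sentinel: shut everything down
                for box in inboxes.values():
                    box.put(None)
                break
            inboxes[msg["to"]].put(msg)

    def component(name):
        while True:
            msg = inboxes[name].get()
            if msg is None:
                break
            print(f"{name} received: {msg['body']}")

    threads = [threading.Thread(target=dispatcher)] + [
        threading.Thread(target=component, args=(n,)) for n in inboxes
    ]
    for t in threads:
        t.start()
    channel.put({"to": "extractor", "body": "page crawled"})
    channel.put({"to": "normalizer", "body": "record extracted"})
    channel.put(None)
    for t in threads:
        t.join()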

I have written quite complex batch systems, similar to your system, where:

Each component is a .NET executable. Executable lifetimes are controlled through Autosys (all on the same machine). Communication is through TIBCO Rendezvous.

If you can use a toolset that provides some runtime introspection, even better. For example, Autosys lets me see which processes are running and what errors have occurred, and TIBCO lets me inspect message queues at runtime.

0

I like to use NDepend to reverse engineer a complex .NET codebase. The tool comes with several great visualization features, such as:

  • Dependency graph
  • Dependency matrix
  • Code metrics visualization using treemapping

0
