How to reverse engineer a program that has no documentation

Question


I have the source of a Python program that has no documentation or comments. I have tried to figure it out twice, but I keep losing track because there are so many files. What steps should I take to understand this program completely and quickly?

+6

python open-source

Xinus Sep 09 '09 at 4:47


8 answers

In the past, I have used 'Python Call Graph' to understand the structure of the source.
Use a debugger, e.g. pdb, to walk through the code.
Try reading the code again after a day's break; that also helps.
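
To illustrate the call-graph idea without any external tool, here is a minimal sketch that uses the standard library's sys.setprofile to count caller/callee pairs while a function runs. The parse and clean functions are hypothetical stand-ins for the undocumented code; a dedicated call-graph tool will give far richer output.

```python
import sys
from collections import Counter


def trace_calls(func, *args, **kwargs):
    """Run func and count (caller, callee) pairs for Python-level calls."""
    edges = Counter()

    def profiler(frame, event, arg):
        # "call" fires for Python functions only; C functions emit "c_call".
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[(caller, callee)] += 1

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return result, edges


# Hypothetical stand-ins for the code being explored.
def clean(word):
    return word.strip(".,").lower()


def parse(text):
    words = []
    for word in text.split():
        words.append(clean(word))
    return words


_, call_edges = trace_calls(parse, "Hello, world.")
for (caller, callee), count in sorted(call_edges.items()):
    print(f"{caller} -> {callee}: {count}")
```

Wrapping the program's entry point in trace_calls instead of parse gives a rough picture of which functions actually talk to each other on a given run.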

+5

Anurag uniyal Sep 09 '09 at 5:03


I would recommend generating some documentation with epydoc (http://epydoc.sourceforge.net/). Of course, since no docstrings exist, the result will be poor, but it will at least give you an overview of your application, and you will be able to navigate through the classes easily.

Then you can write documentation yourself whenever you understand something new, and regenerate the docs. It's never too late to start.
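
If installing a documentation generator is not an option, a rough overview can be had from the standard library alone. The sketch below is not epydoc; it only uses the ast module to list the classes, methods and functions in a piece of source, and whether each one has a docstring. SAMPLE is made-up source used for illustration; in practice you would read each file's text and pass it in.

```python
import ast

FUNCTION_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef)


def outline(source):
    """Return (kind, name, line, has_docstring) for top-level definitions and methods."""
    entries = []
    for node in ast.parse(source).body:
        if isinstance(node, FUNCTION_TYPES):
            documented = ast.get_docstring(node) is not None
            entries.append(("function", node.name, node.lineno, documented))
        elif isinstance(node, ast.ClassDef):
            documented = ast.get_docstring(node) is not None
            entries.append(("class", node.name, node.lineno, documented))
            for item in node.body:
                if isinstance(item, FUNCTION_TYPES):
                    documented = ast.get_docstring(item) is not None
                    name = f"{node.name}.{item.name}"
                    entries.append(("method", name, item.lineno, documented))
    return entries


# Made-up source standing in for one file of the undocumented program.
SAMPLE = '''\
class Account:
    def deposit(self, amount):
        """Add amount to the balance."""
        self.balance += amount

def load(path):
    return open(path).read()
'''

for kind, name, line, documented in outline(SAMPLE):
    print(f"{line:4d}  {kind:8s}  {name}  {'(documented)' if documented else ''}")
```

Because the source is only parsed and never imported, this is safe to run on code whose side effects you do not yet understand.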

I hope this helps

+5

luc Sep 09 '09 at 5:08


You are lucky that it's in Python, which is easy to read. But, of course, it is still possible to write complex and incomprehensible Python code.

Steps:

Launch the software and learn how to use it, at least a little.
Read through the tests, if any.
Read the code.
When you come across code that you don't understand, put a debug breakpoint there and step through the code, watching what it does.
If there are no tests, or the test coverage is low, write tests to increase the coverage. This is a good way to learn the system.
Repeat until you feel that you have a vague grasp of the code. A vague grasp is all you need if you are going to maintain the code; you will get a firm grasp once you actually start working with it. For a large system that can take years, so don't try to understand all of it first.
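
As an illustration of the "write tests" step, here is a small sketch of a test that simply records what the code does today, rather than what it ought to do. The normalize function is a hypothetical stand-in for an undocumented function; the expected values are the ones you observe by running it.

```python
import unittest


# Hypothetical stand-in for an undocumented function found in the code base.
def normalize(name):
    return "_".join(name.strip().lower().split())


class NormalizeCharacterization(unittest.TestCase):
    """Pin down the current behaviour so later changes are noticed."""

    def test_collapses_whitespace_and_lowercases(self):
        self.assertEqual(normalize("  Hello   World "), "hello_world")

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalize(""), "")
```

Saved in a file, this can be run with python -m unittest followed by the file name. Each test you add is also a small, executable note about something you have learned.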

There are tools that can help you. As Stephen C says, an IDE is a good idea. I will explain why:

Many editors analyze the code. This usually gives you code completion but, more importantly in this case, it lets you simply click on a variable to see where it came from. That really speeds things up when you want to understand other people's code.
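
Without an IDE, the standard inspect module can answer the same "where does this come from?" question from an interactive session. This is only a sketch: where_defined is a helper name made up for this example, and json.dumps is used purely as a convenient target.

```python
import inspect
import json


def where_defined(obj):
    """Return (source_file, first_line) for a function or class written in Python."""
    obj = inspect.unwrap(obj)  # look through functools.wraps-style decorators
    source_file = inspect.getsourcefile(obj)
    _, first_line = inspect.getsourcelines(obj)
    return source_file, first_line


path, line = where_defined(json.dumps)
print(f"json.dumps is defined in {path}, line {line}")
```

This only works for objects implemented in Python; built-ins written in C have no source file, and inspect raises TypeError for them.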

You should also learn to use a debugger. In complex parts of the code, you will need to step through them in the debugger to see what the code actually does. Python's pdb works, but many IDEs have integrated debuggers that make debugging easier.

That's it. Good luck.
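
A minimal sketch of dropping into pdb at a confusing spot, assuming Python 3.7 or later for the built-in breakpoint(). The apply_discount function is a hypothetical example of a "complex part".

```python
def apply_discount(price, percent):
    # Hypothetical piece of code we want to watch while it runs.
    discount = price * percent / 100
    breakpoint()  # pauses here in pdb: try `p discount`, `n`, `s`, `w`, `c`
    return round(price - discount, 2)
```

Calling apply_discount(80, 25) from a terminal stops at the breakpoint, where p prints a value, n and s step over and into calls, w shows the call stack, and c continues. Setting the environment variable PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op, so stray breakpoints do not hang unattended runs.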

+3

Lennart Regebro Sep 09 '09 at 5:10


I have had to do a lot of this in my job. What works for me may differ from what works for you, but I will share my experience.

First, I try to identify the data structures being used and draw diagrams showing the relationships between them. Not necessarily anything formal like UML, just a sketch on paper that you understand and that lets you see the overall structure of the data the program manages. Only once I have some idea of the data structures do I start trying to understand how the data is manipulated.

Second, for a large body of software, sometimes you just have to attack it in bite-sized pieces. You will not get a complete understanding right away, but if you understand small parts in detail and keep chipping away, eventually all the pieces fall into place.

I combine these two approaches, switching between them when I get too frustrated or bored. Regular walks around the block are recommended :) I find this gives me good results in the end.
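
As a starting point for such a sketch, something like the following can list each class, its base classes, and the attributes it assigns on self. It is a rough heuristic, not a substitute for reading the code, and it assumes Python 3.9 or later for ast.unparse. SAMPLE is made-up source used for illustration.

```python
import ast


def data_shapes(source):
    """Map each class name to its base classes and the attributes set on self."""
    shapes = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        bases = [ast.unparse(base) for base in node.bases]
        attrs = set()
        for sub in ast.walk(node):
            # Collect assignment targets of the form `self.<name>`.
            if (isinstance(sub, ast.Attribute)
                    and isinstance(sub.ctx, ast.Store)
                    and isinstance(sub.value, ast.Name)
                    and sub.value.id == "self"):
                attrs.add(sub.attr)
        shapes[node.name] = {"bases": bases, "attrs": sorted(attrs)}
    return shapes


# Made-up source standing in for one file of the undocumented program.
SAMPLE = '''\
class Order:
    def __init__(self, customer):
        self.customer = customer
        self.items = []

class RushOrder(Order):
    def __init__(self, customer, deadline):
        super().__init__(customer)
        self.deadline = deadline
'''

for name, shape in data_shapes(SAMPLE).items():
    print(name, shape["bases"], shape["attrs"])
```

The output is a plain dictionary, so it is easy to turn into boxes and arrows on paper: one box per class, an arrow to each base, and the attribute names inside the box.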

Good luck

+2

James Sep 09 '09 at 5:09


pyreverse from Logilab and PyNSource from Andy Bulka are also useful for generating UML diagrams.

+1

Silopolis Jul 08 '11 at 10:35


I would start with a good Python IDE. See the answers to this question.

0

Stephen c Sep 09 '09 at 4:53


Sparx Systems' Enterprise Architect is very good at processing a source directory and generating class diagrams. It is not free, but it is very reasonably priced for what you get. (I am not affiliated with this company in any way; I have just been a satisfied user of their product for several years.)

0

Paulmcg Sep 09 '09 at 12:43


Alex martelli · Accepted Answer · 2009-09-09T05:09:52+0000

Michael Feathers' "Working Effectively with Legacy Code" is an excellent starting point for such endeavors. It is not particularly language-dependent (its examples are in several languages other than Python, but the techniques and mindset DO carry over pretty well to Python and just about any other language).

The key point is that you want to understand the code for a reason: to modify and/or port it. So instrumenting the legacy code, with batteries and scaffolding of tests and tracing/logging, is the crucial path on the long, hard road to understanding and modifying it safely and responsibly.

Feathers suggests heuristics and techniques for focusing your efforts and for getting started when the code is a complete mess (hence "legacy"): no docs, or misleading docs (describing something quite different, perhaps in subtle ways, from what the code actually does), no tests, and a tangle of spaghetti dependencies that cannot be tested without refactoring. This may seem like an extreme case, but anybody who has been programming for a long time knows it is actually more common than anyone would like :-).
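
One lightweight form of such tracing/logging instrumentation is a decorator that records each call and its result without changing behaviour. This is only a sketch and is not taken from the book; the logger name legacy_trace and the compute_total function are made up for the example.

```python
import functools
import logging

logger = logging.getLogger("legacy_trace")  # hypothetical logger name


def traced(func):
    """Log arguments, results and exceptions of func without altering them."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("call %s args=%r kwargs=%r", func.__qualname__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            logger.exception("%s raised", func.__qualname__)
            raise  # re-raise so callers see exactly what they saw before
        logger.debug("return %s -> %r", func.__qualname__, result)
        return result
    return wrapper


@traced
def compute_total(prices, tax_rate):  # hypothetical legacy function
    return sum(prices) * (1 + tax_rate)
```

Nothing is printed until logging is configured, for example with logging.basicConfig(level=logging.DEBUG), so the decorator can stay in place while you explore and be silenced afterwards.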
