How can I split a large table of data into smaller related tables? (Not a database question)

I really hope I can describe this issue in an understandable way. It is a puzzle that I (mostly) understand but could not even begin to solve. I just don't know where to start, and I hope someone out there can point me in the right direction.

I have a BIG data table. It describes the relationships between objects. Let's say the Y axis has elements 1-1000, and the X axis also has elements 1-1000. If element No. 234 on the Y axis is associated with element No. 791 on the X axis, there will be a mark in the table where that row and column intersect. In some industries this is called a truth table. At a glance, you can see how many elements in the system are connected to each other, and the marks in the table can help identify trends and patterns.

Here is some other useful information about the nature of the table:

  • The number of relationships (r) for any element on either axis can range over 1 <= r <= axisTotal.
  • The X and Y axes will share common elements, but each axis will also have elements that the other axis does not.
  • Each element appears only once per axis. It can be on both X and Y, but only once on each.
  • The total number of elements on each axis is most likely NOT equal. Each axis can have from 50 to 1000 elements.
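A minimal Python sketch of one way to hold such a table, assuming each mark is stored as a `(y, x)` pair of element IDs rather than as a full grid, so memory grows with the number of marks instead of with rows x columns. The names `relationship_counts` and `check_table` are hypothetical helpers for illustration, not part of any library.

```python
from collections import Counter

# Assumption: each mark in the truth table is stored as a (y, x) pair of element IDs.
Pair = tuple[int, int]


def relationship_counts(pairs: set[Pair]) -> tuple[Counter, Counter]:
    """Return r for every element: marks per Y element and marks per X element."""
    r_y = Counter(y for y, _ in pairs)
    r_x = Counter(x for _, x in pairs)
    return r_y, r_x


def check_table(pairs: set[Pair], y_axis: list[int], x_axis: list[int]) -> bool:
    """True if r >= 1 for every element on both axes (no empty rows or columns)."""
    r_y, r_x = relationship_counts(pairs)
    # Counter returns 0 for missing keys, so an unmarked element fails the check.
    return all(r_y[y] >= 1 for y in y_axis) and all(r_x[x] >= 1 for x in x_axis)


if __name__ == "__main__":
    y_axis = [1, 2, 3]
    x_axis = [10, 20]
    pairs = {(1, 10), (2, 10), (2, 20), (3, 20)}
    r_y, r_x = relationship_counts(pairs)
    print(sorted(r_y.items()))  # [(1, 1), (2, 2), (3, 1)]
    print(check_table(pairs, y_axis, x_axis))  # True
```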

The end result will be a report that needs to be printed. We have successfully printed tables with about 100-150 objects on each axis on 11 in x 17 in paper. Any more than that, and the text becomes so small that it is unreadable.

What I'm trying to do is split very large tables into smaller tables while keeping related points together. If I take elements 1-100 on X, then I will need every element on Y that they relate to.
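The requirement above (take a block of X elements, then pull in every Y element related to it) can be sketched in Python, assuming the table is stored as a set of `(y, x)` pairs. The function names are hypothetical and the block size is just a parameter; this is a sketch of the naive split, not a solution to the page-count problem.

```python
from collections.abc import Iterator

# Assumption: the table is a set of (y, x) pairs, one per mark.
Pair = tuple[int, int]
Page = tuple[list[int], list[int], set[Pair]]


def subtable_for_x(pairs: set[Pair], x_block: set[int]) -> Page:
    """Rows, columns and marks for one printed table: the chosen X elements
    plus every Y element related to at least one of them."""
    marks = {(y, x) for y, x in pairs if x in x_block}
    rows = sorted({y for y, _ in marks})
    return rows, sorted(x_block), marks


def split_by_x_blocks(pairs: set[Pair], x_axis: list[int], block_size: int) -> Iterator[Page]:
    """Cut the X axis into consecutive blocks (e.g. 1-100, 101-200, ...)
    and build one subtable per block."""
    for start in range(0, len(x_axis), block_size):
        yield subtable_for_x(pairs, set(x_axis[start:start + block_size]))


if __name__ == "__main__":
    pairs = {(1, 1), (2, 2), (2, 3), (3, 4)}
    for rows, cols, _ in split_by_x_blocks(pairs, [1, 2, 3, 4], block_size=2):
        print("X", cols, "-> Y", rows)
    # X [1, 2] -> Y [1, 2]
    # X [3, 4] -> Y [2, 3]
```

Y element 2 ends up on both pages because it relates to X elements in both blocks. The more a consecutive-ID split scatters related elements, the more rows get repeated, which is why the choice of which X elements go together matters.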

I have created several of these tables, and although the number of links CAN be arbitrary, I have never seen an element relate to all other elements. In practice, the range is more like 1 <= r <= (10% * axisTotal). If an element's relationships exceed this range, they could be split across several tables, but this is not ideal.

At the end of the day, I think we and our customers would be happy if the 1000x1000-element table were divided into 8-10 printed pages of smaller related tables.

Any guidance would be a big help! Thanks.

--- EDIT --- Another thing worth noting is that there will be no empty rows or columns in the table. Each element on the X and Y axes relates to at least one element on the opposite axis.

--- EDIT --- Here is an example of the kind of small truth table I am describing: Example Truth Table. Each row and column has at least one relationship.

--- EDIT --- May 18, 2011 For what it's worth, I was making good progress on this project, but I have been pulled off it for a couple of weeks, so it will be a little while before I get back to this problem. But it is one I will have to solve soon.

--- EDIT --- July 11, 2011 Bummer. It looks like I can't solve this problem right now, although I really hoped I could figure it out. After some discussion, we decided to provide the truth table as an Excel spreadsheet that supplements the main report. Excel 2007 and later can handle 1000 columns, which is more than enough. We also added VBA code that lets the viewer double-click a column heading: this filters the rows down to only those that have a mark in that column, then removes the columns that are left empty. That way, viewers can see a small subtable for the item they want to look at and print it if they want.
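The double-click filter described above can be sketched in Python (this is not the actual VBA macro), assuming the table is a set of `(y, x)` pairs. `drill_down` and `render` are hypothetical names; `render` just prints a plain-text grid of the resulting subtable.

```python
# A Python sketch of the double-click filter, not the actual VBA macro.
# Assumption: the table is a set of (y, x) pairs, one per mark.
Pair = tuple[int, int]


def drill_down(pairs: set[Pair], clicked_x: int) -> tuple[list[int], list[int]]:
    """Mimic double-clicking the heading of column `clicked_x`:
    1. keep only the rows that have a mark in that column;
    2. keep only the columns that still have a mark in one of those rows."""
    rows = {y for y, x in pairs if x == clicked_x}
    cols = {x for y, x in pairs if y in rows}
    return sorted(rows), sorted(cols)


def render(pairs: set[Pair], rows: list[int], cols: list[int]) -> str:
    """Plain-text grid of the subtable: 'x' for a mark, '.' for no mark."""
    lines = ["    " + " ".join(f"{c:>3}" for c in cols)]
    for y in rows:
        cells = ["x" if (y, c) in pairs else "." for c in cols]
        lines.append(f"{y:>3} " + " ".join(f"{m:>3}" for m in cells))
    return "\n".join(lines)


if __name__ == "__main__":
    pairs = {(1, 10), (2, 10), (2, 20), (3, 30)}
    rows, cols = drill_down(pairs, 10)
    print(render(pairs, rows, cols))  # row 3 and column 30 are filtered out
```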

2 answers

This is not an answer; I just want to visualize your data a little better. Does it look like this?

          Alice  Bob  Charlie  ...  Zelda
  Shoes     X            X
  Hats            X                   X
  Gloves                 X
  ...
  Pants           X

EDIT

Does the data have to be displayed as a table? Or could you just list the relationships? Something like:

  • Alice
    • Shoes
  • Bob
    • Hats
    • Pants
  • Charlie
    • Shoes
    • Gloves
  • Zelda
    • Hats

Or the other way around:

  • Shoes
    • Alice
    • Charlie
  • Hats
    • Bob
    • Zelda
  • Gloves
    • Charlie
  • Pants
    • Bob
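Both listings can be generated from the same set of marks. A minimal Python sketch, assuming the relationships are available as `(person, item)` pairs; `MARKS`, `group_by` and `as_outline` are hypothetical names, and the sample data mirrors the small example above.

```python
from collections import defaultdict

# Hypothetical sample (person, item) marks mirroring the small example above.
MARKS = [
    ("Alice", "Shoes"),
    ("Bob", "Hats"),
    ("Bob", "Pants"),
    ("Charlie", "Shoes"),
    ("Charlie", "Gloves"),
    ("Zelda", "Hats"),
]


def group_by(marks: list[tuple[str, str]], key_index: int) -> dict[str, list[str]]:
    """Group each pair's other member under the member at key_index (0 = person, 1 = item)."""
    groups = defaultdict(list)
    for pair in marks:
        groups[pair[key_index]].append(pair[1 - key_index])
    return dict(groups)


def as_outline(groups: dict[str, list[str]]) -> str:
    """Render the groups as a two-level bullet list."""
    lines = []
    for key, members in groups.items():
        lines.append(f"  • {key}")
        lines.extend(f"    • {m}" for m in members)
    return "\n".join(lines)


if __name__ == "__main__":
    print(as_outline(group_by(MARKS, 0)))  # listed by person
    print(as_outline(group_by(MARKS, 1)))  # listed by item
```

The length of either listing grows with the number of marks rather than with rows x columns, and a list can break across pages anywhere, which a grid cannot.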

EDIT 2

OK, I made another, bigger truth table in the hope of getting a better idea of how you want to split things up:

  ABCDEFGHIJKLMNOPQRSTU VWXYZ 1 xxxx 2 xxxxxx 3 xxxx 4 xxx 5 xxx 6 xxx 7 xxx 8 xxx 

For the sake of argument, let's say you can only fit 4 rows per page (because I don't want to draw a giant table this early in the morning), so we will split it across two pages. First, every row needs to be shown, right? Second, do you need to show columns that never have a mark? For example, Y and Z have no marks in rows 1 through 8 of this table; can they be left out of the report, or do they still need to appear? Third, does the row order matter?

If completely empty columns don't need to be shown, we could remove 10 columns from the table above and compress it to:

  ABCEFHILMOPQRUVW 1 xxxx 2 xxxxxx 3 xxxx 4 xxx 5 xxx 6 xxx 7 xxx 8 xxx 

Then, if the row order does not matter, you can compress it further by finding a better arrangement of the rows (the one shown here is not necessarily optimal). The two tables below compress it further, to 11 and 10 columns:

  ABCFHIMPQRU 1 xxxx 2 xxxxxx 5 xxx 7 xxx AEHILMOPUW 3 xxxx 4 xxx 6 xxx 8 xxx 
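Finding the truly optimal arrangement is a hard combinatorial problem, but a greedy heuristic gets part of the way. A Python sketch, assuming marks are `(row, column)` pairs and that row order on the printout does not matter; `greedy_pages` is a hypothetical name, the sample data is invented for illustration (not the table above), and the result is not guaranteed to be optimal.

```python
# Assumption: marks are (row, column) pairs; row order on the printout does not matter.
Pair = tuple[int, int]


def columns_by_row(pairs: set[Pair]) -> dict[int, set[int]]:
    """Map each row to the set of columns where it has a mark."""
    cols: dict[int, set[int]] = {}
    for r, c in pairs:
        cols.setdefault(r, set()).add(c)
    return cols


def greedy_pages(pairs: set[Pair], rows_per_page: int) -> list[tuple[list[int], list[int]]]:
    """Greedy heuristic, NOT guaranteed optimal: seed each page with the busiest
    remaining row, then keep adding the row that introduces the fewest new
    columns. Each page prints only its non-empty columns."""
    row_cols = columns_by_row(pairs)
    remaining = set(row_cols)
    pages = []
    while remaining:
        # Busiest row first; ties broken by lowest row ID.
        seed = max(remaining, key=lambda r: (len(row_cols[r]), -r))
        remaining.remove(seed)
        page_rows, page_cols = [seed], set(row_cols[seed])
        while remaining and len(page_rows) < rows_per_page:
            # Row that widens the page least; ties broken by lowest row ID.
            nxt = min(remaining, key=lambda r: (len(row_cols[r] - page_cols), r))
            remaining.remove(nxt)
            page_rows.append(nxt)
            page_cols |= row_cols[nxt]
        pages.append((sorted(page_rows), sorted(page_cols)))
    return pages


if __name__ == "__main__":
    # Rows 1 and 3 share columns 1-2; rows 2 and 4 share columns 3-4.
    pairs = {(1, 1), (1, 2), (3, 1), (3, 2), (2, 3), (2, 4), (4, 3), (4, 4)}
    for rows, cols in greedy_pages(pairs, rows_per_page=2):
        print("rows", rows, "columns", cols)
```

Printing these rows in their original order (1-2, then 3-4) would need all four columns on each page; the greedy grouping needs only two per page.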

Am I heading in completely the wrong direction here? These questions should help me better understand your data and your output requirements.

Also, in all seriousness, is it possible to get a larger printer or plotter? Or could you simply create a PDF file and use Acrobat's print options?


Last year I read an article in the journal PLoS Computational Biology (www.ploscompbiol.org) about a problem that seems similar to yours.

In short, it describes a new approach for the case where we already have a set of proteins and a table of their pairwise interactions, and we want to group them so that the interaction within each group and the interaction between any two groups is either maximized or (and this is the new idea) minimized.

If you draw the initial data table with black for high and white for low interaction, it looks like random gray noise. After the calculation and reordering (so that grouped elements sit next to each other), the resulting table looks much more like rectangular regions of black and white.

Article: Protein Interaction Networks - More Than Mere Modules, which also links to other, older methods for grouping this type of data.
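The article's method is not reproduced here. A much simpler first pass toward the same goal (and not the article's approach) is to treat the table as a bipartite graph and split it into connected components: elements in different components never interact, so each component can be printed on its own without breaking any relationship. A Python union-find sketch, assuming marks are `(y, x)` pairs; `independent_blocks` is a hypothetical name.

```python
# Not the article's method: a simpler first pass that finds blocks of the table
# that never interact, using union-find over the bipartite row/column graph.
# Assumption: marks are (y, x) pairs of element IDs.
Pair = tuple[int, int]
Node = tuple[str, int]


def independent_blocks(pairs: set[Pair]) -> list[tuple[list[int], list[int]]]:
    """Return (rows, cols) for each connected component of the table."""
    parent: dict[Node, Node] = {}

    def find(node: Node) -> Node:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Tag IDs with their axis: element 5 as a row is a different node from element 5 as a column.
    for y, x in pairs:
        root_y, root_x = find(("Y", y)), find(("X", x))
        if root_y != root_x:
            parent[root_y] = root_x

    blocks: dict[Node, tuple[set[int], set[int]]] = {}
    for node in list(parent):
        rows, cols = blocks.setdefault(find(node), (set(), set()))
        (rows if node[0] == "Y" else cols).add(node[1])
    return sorted((sorted(r), sorted(c)) for r, c in blocks.values())


if __name__ == "__main__":
    pairs = {(1, 10), (2, 10), (2, 20), (3, 30), (4, 30), (4, 40)}
    for rows, cols in independent_blocks(pairs):
        print("Y", rows, "X", cols)
```

A component that is still too large for one page would need further splitting, which is where clustering methods like the ones the article discusses come in.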

