I came across some code that was doing something like this:
Map<String,String> fullNameById = buildMap1(dataSource1);
Map<String,String> nameById = buildMap2(dataSource2);
Map<String,String> nameByFullName = new HashMap<String,String>();
Map<String,String> idByName = new HashMap<String,String>();
Set<String> ids = fullNameById.keySet();
for (String nextId : ids) {
    String name = nameById.get(nextId);
    String fullName = fullNameById.get(nextId);
    nameByFullName.put(fullName, name);
    idByName.put(name, nextId);
}
I had to look at it for several minutes to figure out what was going on. All of this amounts to a join on id plus an inversion of one of the source maps. Since Id, FullName and Name are always 1:1:1, it seemed to me that there should be some way to simplify this. I also discovered that the first two maps are never used again, and I find the code above a bit hard to read. So I'm considering replacing it with something like this, which (to me) reads much cleaner:
Table<String, String, String> relations = HashBasedTable.create();
addRelationships1(dataSource1, relations);
addRelationships2(dataSource2, relations);
Map<String,String> idByName = relations.column("hasId");
Map<String,String> nameByFullName = relations.column("hasName");
relations = null;
In addRelationships1 I do
relations.put(id, "hasFullName", fullname);
And in addRelationships2, where my query yields values for id and name, I do
relations.put(relations.remove(id, "hasFullName"), "hasName", name);
relations.put(name, "hasId", id);
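To make the intended result concrete, here is a minimal plain-JDK sketch of the join and inversion, using made-up sample data. The class name JoinSketch, the two helper methods, and the ids and names are all hypothetical, and the sketch assumes both source maps contain exactly the same ids (the 1:1:1 relationship described above). It is not the real code, only an illustration of what either version should end up producing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: names and data are invented for this sketch.
class JoinSketch {

    // Join on id: for every id, map its full name to its short name.
    static Map<String, String> nameByFullName(Map<String, String> fullNameById,
                                              Map<String, String> nameById) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : fullNameById.entrySet()) {
            String id = entry.getKey();
            result.put(entry.getValue(), nameById.get(id));
        }
        return result;
    }

    // Inversion: turn id -> name into name -> id (assumes names are unique).
    static Map<String, String> idByName(Map<String, String> nameById) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> entry : nameById.entrySet()) {
            result.put(entry.getValue(), entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the two data sources.
        Map<String, String> fullNameById = new HashMap<>();
        fullNameById.put("1", "Ada Lovelace");
        fullNameById.put("2", "Alan Turing");

        Map<String, String> nameById = new HashMap<>();
        nameById.put("1", "ada");
        nameById.put("2", "alan");

        Map<String, String> byFullName = nameByFullName(fullNameById, nameById);
        Map<String, String> byName = idByName(nameById);

        System.out.println(byFullName.get("Ada Lovelace")); // prints "ada"
        System.out.println(byName.get("alan"));             // prints "2"
    }
}
```

Unlike the original loop, this sketch builds idByName from nameById alone, which gives the same result only under the stated assumption that both maps share the same ids.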
So my questions are:
- Is there any hidden inefficiency in what I've done here, in CPU, memory, or GC load? I don't think so, but I'm not familiar with the efficiency of Table. I know that the Table object won't be GC'd after relations = null; I just want to communicate that it is not used again in the rather lengthy section of code that follows.
- Have I gained any efficiency? I keep convincing and unconvincing myself that I have and have not.
- Do you find this more readable? Or is it only easy for me to read because I wrote it? I'm a little worried on that front because Table is not very well known. On the other hand, the top level now says quite clearly: "Gather data from two sources and make these two maps from it." I also like that it doesn't leave you wondering if/where the other two maps are used (or not).
- Do you have an even better, cleaner, faster, simpler way to do this than either of the above?
Please, let's not have the optimize early/late discussion here. I'm well aware of that pitfall. If it improves readability without hurting performance, I am satisfied. A performance gain would be a nice bonus.
Note: my variable and method names have been sanitized here to keep the business domain from distracting from the discussion; I definitely won't name them addRelationships1 or datasource1! Similarly, the final code will, of course, use constants, not raw strings.