Your requirements are still a bit vague to me, even after reading all the comments. Given your example and explanation, I assume they are as follows:
- The input is a string consisting of (X)HTML tags. Your example does not show this, but I assume that the input may also contain text between tags.
- In the context of your problem, we do not care about nesting. Thus, input really is just text mixed with tags, where opening, closing, and self-closing tags are considered equivalent.
- Tags may contain quoted values.
- You want to truncate the string without cutting it off in the middle of a tag. Thus, in the truncated string, each '<' character must have a corresponding '>' character.
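The last requirement can be written down as a small check, which is handy for testing any truncation routine. This is only a sketch: `endsOutsideTag` is a hypothetical helper name, and it assumes that '<' and '>' never appear inside quoted attribute values.

```java
class TagBoundaryCheck {
    // Hypothetical helper: true if the string does not end inside an unclosed tag.
    // Assumption: '<' and '>' never occur inside quoted attribute values.
    static boolean endsOutsideTag(String s) {
        // The last '<' must be followed by a '>' somewhere after it.
        // If neither character occurs, both indexes are -1 and the check passes.
        return s.lastIndexOf('<') <= s.lastIndexOf('>');
    }

    public static void main(String[] args) {
        System.out.println(endsOutsideTag("<p>Hello <b>world</b>")); // true
        System.out.println(endsOutsideTag("<p>Hello <b"));           // false
        System.out.println(endsOutsideTag("plain text"));            // true
    }
}
```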
I will give you two solutions: a simple one, which may give incorrect results depending on what exactly the input contains, and a more complicated one, which is correct under the assumptions above.
First solution
For the first solution, we first find the last '>' before the truncation size (this corresponds to the last tag that was completely closed). After this character there may be text that does not belong to any tag, so we then look for the first '<' after the last closed tag. If that '<' lies before the truncation point, we cut there; otherwise we cut at the truncation size. In code:
    public static String truncate1(String input, int size) {
        if (input.length() < size) return input;
        // Last '>' within the kept part, i.e. at an index below size.
        int pos = input.lastIndexOf('>', size - 1);
        // First '<' after that closed tag (searches from 0 if there is none).
        int pos2 = input.indexOf('<', pos);
        if (pos2 < 0 || pos2 >= size) {
            return input.substring(0, size);
        } else {
            return input.substring(0, pos2);
        }
    }
Of course, this solution does not take quoted strings into account: '<' and '>' characters may appear inside a quoted attribute value, in which case they should be ignored. I mention this solution anyway because you say that your input is sanitized, so perhaps you can guarantee that quoted strings never contain '<' or '>'.
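To make the limitation concrete, here is a small runnable sketch. The class name `Truncate1Demo` and the sample inputs are made up for illustration. The method is repeated so the example runs on its own; this copy starts the backward search at `size - 1`, so that a '>' sitting exactly at the cut position is not counted as part of the kept text.

```java
class Truncate1Demo {
    // Copy of the simple approach, repeated so the example is self-contained.
    static String truncate1(String input, int size) {
        if (input.length() < size) return input;
        int pos = input.lastIndexOf('>', size - 1);
        int pos2 = input.indexOf('<', pos);
        return (pos2 < 0 || pos2 >= size) ? input.substring(0, size)
                                          : input.substring(0, pos2);
    }

    public static void main(String[] args) {
        String plain = "Hello <b>world</b>!";
        System.out.println(truncate1(plain, 8));   // "Hello " (the cut fell inside <b>)
        System.out.println(truncate1(plain, 12));  // "Hello <b>wor"

        // Assumed problem input: a '>' inside a quoted attribute value.
        String quoted = "<a title=\"x>y\">link</a>";
        System.out.println(truncate1(quoted, 13)); // <a title="x>y (cut inside the tag)
    }
}
```

In the last call the '>' inside the quotes is mistaken for the end of the tag, so the result still ends in the middle of the `<a ...>` tag.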
Second solution
To handle quoted strings, we can no longer rely on the search methods of the standard String class; instead we must scan the input ourselves and keep track of whether we are inside a tag and whether we are inside a string. When we encounter a '<' character outside a tag and outside a string, we remember its position, so that when we reach the truncation point we know where the last opened tag starts. If that tag has not been closed, we truncate just before the start of that tag. In code:
    public static String truncate2(String input, int size) {
        if (input.length() < size) return input;
        int lastTagStart = 0;
        boolean inString = false;
        boolean inTag = false;
        for (int pos = 0; pos < size; pos++) {
            switch (input.charAt(pos)) {
                case '<':
                    // A '<' only opens a tag outside of tags and strings.
                    if (!inString && !inTag) {
                        lastTagStart = pos;
                        inTag = true;
                    }
                    break;
                case '>':
                    if (!inString) inTag = false;
                    break;
                case '"':
                    // Quotes only matter inside a tag.
                    if (inTag) inString = !inString;
                    break;
            }
        }
        // The cut landed outside any tag: keep everything up to size.
        if (!inTag) lastTagStart = size;
        return input.substring(0, lastTagStart);
    }
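Note that the scan above only treats double quotes as string delimiters. If your attribute values may also be single-quoted, the same idea can be extended by remembering which quote character opened the current value. The following is a sketch under that assumption, not a tested drop-in replacement; `truncateQuoteAware` and the sample inputs are hypothetical.

```java
class QuoteAwareTruncateDemo {
    // Hypothetical variant of the scanning approach: remembers which quote
    // character opened the current attribute value, so an apostrophe inside
    // a double-quoted value is not misread as a delimiter.
    static String truncateQuoteAware(String input, int size) {
        if (input.length() < size) return input;
        int lastTagStart = 0;
        boolean inTag = false;
        char quote = 0; // 0 means "not inside a quoted value"
        for (int pos = 0; pos < size; pos++) {
            char c = input.charAt(pos);
            if (quote != 0) {
                // Inside a quoted value: only the matching quote ends it.
                if (c == quote) quote = 0;
            } else if (inTag) {
                if (c == '"' || c == '\'') quote = c;
                else if (c == '>') inTag = false;
            } else if (c == '<') {
                lastTagStart = pos;
                inTag = true;
            }
        }
        // If the cut lands inside a tag, drop that tag entirely.
        return input.substring(0, inTag ? lastTagStart : size);
    }

    public static void main(String[] args) {
        String input = "<a title='x>y'>link</a>";
        System.out.println(truncateQuoteAware(input, 13)); // prints an empty line
        System.out.println(truncateQuoteAware(input, 17)); // <a title='x>y'>li
    }
}
```

With a cut at 13 the scan is still inside the opening tag, so everything from the tag start onward is dropped; with a cut at 17 the tag is complete and the text after it is kept.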