Splitting a string before each occurrence of ATG is simple; just use
result = subject.split(/(?=ATG)/i);
(?=ATG) is a positive lookahead assertion meaning "Assert that ATG can be matched starting at the current position in the string." It matches a position, not characters, so nothing is consumed by the split.
This will divide GGGATGTTTATGGGGATGCCC into GGG, ATGTTT, ATGGGG and ATGCCC.
So now you have an array of strings (in this case four). I would take them, discard the first one (unless the input itself begins with ATG, this one neither contains nor starts with ATG), and then join strings 2 + ... + n, then 3 + ... + n, etc., until you have exhausted the list. Each joined string is a suffix of the input that starts at an ATG.
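The joining step could be sketched roughly as follows. The function name atgSuffixes is made up for illustration, and keeping the first chunk when the input already begins with ATG is my assumption about the intended behaviour (in JavaScript, split does not produce an empty leading element for a zero-width match at position 0).

```javascript
// Hypothetical helper: returns every suffix of `subject` that begins with ATG.
function atgSuffixes(subject) {
  // Split at each position followed by ATG (zero-width, nothing is consumed).
  const parts = subject.split(/(?=ATG)/i);
  // Assumption: skip the leading chunk only when it does not start with ATG.
  const start = /^ATG/i.test(parts[0]) ? 0 : 1;
  const suffixes = [];
  for (let i = start; i < parts.length; i++) {
    // Join chunk i with everything after it: i + ... + n.
    suffixes.push(parts.slice(i).join(""));
  }
  return suffixes;
}

console.log(atgSuffixes("GGGATGTTTATGGGGATGCCC"));
// [ 'ATGTTTATGGGGATGCCC', 'ATGGGGATGCCC', 'ATGCCC' ]
```

Since every chunk after the first starts with ATG, each joined string is simply the input from that ATG to the end.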
Of course, this regular expression does not check whether the string contains only ACGT characters, since it only matches positions between characters. That check should be done beforehand, i.e. verify that the input string matches /^[ACGT]*$/i.
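A minimal sketch of that pre-check, assuming you want to reject invalid input before splitting (the name isDnaSequence is hypothetical):

```javascript
// Hypothetical helper: true when `subject` consists solely of A, C, G, T (any case).
// Note that the empty string also passes, because of the * quantifier.
function isDnaSequence(subject) {
  return /^[ACGT]*$/i.test(subject);
}

const subject = "GGGATGTTTATGGGGATGCCC";
if (isDnaSequence(subject)) {
  const result = subject.split(/(?=ATG)/i);
  console.log(result); // [ 'GGG', 'ATGTTT', 'ATGGGG', 'ATGCCC' ]
} else {
  console.log("Input contains characters other than A, C, G, T");
}
```

If empty input should be rejected as well, use + instead of * in the pattern.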