Splitting a string into parts with a specific pattern and conditions

I have an array of approximately 5k+ lines like the ones below, output by a specific application (for security reasons I can't provide the actual data, but the format of the example is essentially the same as the real data):

kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad
asdfasdf66sdafsdfsdf4560sdfasdfasdf
sdfaasdfs96sadfasdf65459asdfasdf
sadfasdf8asdfasdas06666654asdfasdfsd
fasdjfsdjfhgasdf6456sadfasdfasdf9sdfasdfsadf

Each line is just one unbroken alphanumeric string consisting of 5 parts:

 [latin letters][1 or more digits][latin letters][1 or more digits][latin letters] 

The lengths of the letter parts, as well as the number of digits, are random, and the total length of the string can vary from a few characters to 200-300 characters, but the pattern always stays the same as above.

I'm only interested in the leading and trailing parts of the string, i.e. the middle [1 or more digits][latin letters][1 or more digits] can simply be thrown away, but the remaining 2 parts need to be extracted into separate cells.
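
For reference, the 5-part format described above maps directly onto a regular expression with two capture groups, which is also easy to adapt to similar templates later. A minimal Python sketch, assuming one record per line and ASCII Latin letters only; the names FIVE_PART and split_ends are illustrative, not part of any existing tool:

```python
import re
from typing import Optional, Tuple

# Illustrative pattern for [letters][digits][letters][digits][letters];
# only the first and last letter groups are captured.
FIVE_PART = re.compile(r"^([A-Za-z]+)[0-9]+[A-Za-z]+[0-9]+([A-Za-z]+)$")


def split_ends(line: str) -> Optional[Tuple[str, str]]:
    """Return (leading letters, trailing letters), or None if the line
    does not follow the 5-part pattern."""
    m = FIVE_PART.match(line.strip())
    if m is None:
        return None
    return m.group(1), m.group(2)


print(split_ends("kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad"))
# ('kasdfhkasdhfaskdfj', 'askfhsad')
```

Returning None for non-matching lines makes malformed records easy to spot instead of silently producing wrong cells.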

I tried the SUBSTITUTE and SEARCH functions, but I still can't handle the random number of digits. VBA is my least preferred approach, but it is acceptable if pure formulas can't do the job. Moreover, the solution should be flexible enough for possible future use with similar templates, so any general approach is welcome.

+4
3 answers

If you do not mind using MS Word instead of Excel, there is a very simple approach for such tasks: Word's built-in Find and Replace with wildcards. Assuming the data can be opened in Word, follow these steps:

  • Press CTRL + H to open the Replace dialog box.
  • Tick Use wildcards (click More >> if the option is not visible).
  • The part of your data that you want to discard matches the following pattern: [0-9]{1,}*[0-9]{1,}, which means one or more digits, then any characters, then one or more digits. Depending on your regional settings, you may need ; instead of , inside the braces, i.e. [0-9]{1;}*[0-9]{1;}.
  • In Replace with, enter any character you like, for example ^t (Tab) or ;, to separate the parts later.
  • Click Replace All.
  • If you wish, you can convert the result to a table via Insert > Table > Convert Text to Table...
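
The same replacement can be reproduced outside Word with a regular expression. A minimal Python sketch, assuming one record per line; the regex [0-9]+.*[0-9]+ is my translation of the Word wildcard pattern (not Word's own engine), and drop_middle is a hypothetical helper:

```python
import re

# Translation of the Word wildcard [0-9]{1,}*[0-9]{1,}:
# one or more digits, any characters, then one or more digits.
MIDDLE = re.compile(r"[0-9]+.*[0-9]+")


def drop_middle(line: str, sep: str = "\t") -> str:
    """Replace the digits...digits middle section with a separator."""
    return MIDDLE.sub(sep, line, count=1)


# Splitting on the tab gives the two cells of a table/CSV row.
print(drop_middle("kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad").split("\t"))
# ['kasdfhkasdhfaskdfj', 'askfhsad']
print(drop_middle("sadfasdf8asdfasdas06666654asdfasdfsd").split("\t"))
# ['sadfasdf', 'asdfasdfsd']
```

Because .* is greedy and . does not match a newline, each match runs from the first digit to the last digit of the same line, so only the two letter parts remain.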

Then save the result, or copy and paste it back into Excel.

In fact, this approach is quite powerful: many routine text-processing tasks like yours can be done quickly without special skills or programming. And you do not need any third-party tool for this, since Word is already installed on most office PCs.

More about patterns and applicable cases:

+3

Based on this lesson from the great Chandoo (whom you should follow if you want to excel at Excel):

Use this formula to find the position of the first digit (note that it is an array formula, so you need to enter it with CTRL + SHIFT + ENTER; Excel adds the braces itself, do not type them):

 {=MIN(IFERROR(FIND(lstNumbers,E1),""))} 

where lstNumbers is a named range of cells containing the digits 0-9 (one digit per cell), and E1 is the cell containing the data.

This returns the position of the first digit, and you can then extract the first section with:

 =LEFT(E1,G1-1) 

where E1 contains the data and G1 contains the previous formula.
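
To check the MIN/FIND logic on sample data before building the sheet, it can be mirrored in Python. A sketch under the assumption that DIGITS plays the role of the lstNumbers range and that positions are 1-based like FIND; the function names are hypothetical:

```python
DIGITS = "0123456789"  # stands in for the lstNumbers named range


def first_digit_pos(text: str) -> int:
    """Mirror of {=MIN(IFERROR(FIND(lstNumbers,E1),""))}.

    Digits that do not occur are skipped, just as IFERROR(...,"") turns
    FIND's #VALUE! into text that MIN ignores. Raises ValueError if the
    text contains no digit at all.
    """
    positions = [text.find(d) + 1 for d in DIGITS if d in text]  # 1-based
    return min(positions)


def left_part(text: str) -> str:
    """Mirror of =LEFT(E1,G1-1): everything before the first digit."""
    return text[: first_digit_pos(text) - 1]


print(first_digit_pos("asdfasdf66sdafsdfsdf4560sdfasdfasdf"))  # 9
print(left_part("asdfasdf66sdafsdfsdf4560sdfasdfasdf"))  # asdfasdf
```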

To get the position where the numeric section ends (also an array formula):

 {=MAX(IFERROR(FIND(lstNumbers,E1),""))} 

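Then you can use MID to extract the numeric section, and RIGHT with LEN(datacell) minus the result of the MAX formula to extract the rest of the string on the right. The same calls can be reused on the pieces: the first digit with MIN, the last with MAX, and so on. Note that FIND returns only the first occurrence of each digit, so this MAX is the largest of those first positions rather than the position of the last digit: for kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad it returns 23 (the end of 42345), because every digit of 5345534 already occurs in 42345.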
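
The MAX step can be checked the same way. This hypothetical Python sketch mirrors MAX over the FIND results and compares it with a right-to-left scan for the last digit; on the first sample line the two disagree because FIND only reports the first occurrence of each digit:

```python
DIGITS = "0123456789"  # stands in for the lstNumbers named range


def max_first_find(text: str) -> int:
    """Mirror of {=MAX(IFERROR(FIND(lstNumbers,E1),""))}: the largest of
    the 1-based FIRST positions of each digit that occurs in text."""
    return max(text.find(d) + 1 for d in DIGITS if d in text)


def last_digit_pos(text: str) -> int:
    """1-based position of the last digit, scanning from the right."""
    for i in range(len(text) - 1, -1, -1):
        if text[i].isdigit():
            return i + 1
    raise ValueError("no digit in text")


s = "kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad"
print(max_first_find(s), last_digit_pos(s))  # 23 45
# Trailing letters, i.e. RIGHT(E1, LEN(E1) - position of the last digit)
print(s[last_digit_pos(s):])  # askfhsad
```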

Good luck, this is a real hard one; doing it with a real programming language would perhaps be easier.

+2

UPDATED:

This array formula will give you the first part (the leading letters):

  =LEFT(A1,MATCH(0,1*ISERROR(1*MID(A1,ROW(INDIRECT("$A1:$A"&LEN(A1))),1)),0)-1) 

This array formula will give you the last part (the trailing letters):

  =RIGHT(A1,MATCH(0,1*ISERROR(1*MID(A1,LEN(A1)+1-ROW(INDIRECT("$A1:$A"&LEN(A1))),1)),0)-1) 
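
These two array formulas scan the string one character at a time: ROW(INDIRECT(...)) generates the positions, 1*MID(...) fails on letters, and MATCH(0, ..., 0) finds the first digit. A minimal Python sketch of the same scan, assuming the letters-and-digits format from the question; the function names are hypothetical:

```python
from typing import Iterable


def _first_digit_index(chars: Iterable[str]) -> int:
    """0-based index of the first digit, mirroring MATCH(0, ..., 0)."""
    for i, ch in enumerate(chars):
        # For this letters-and-digits data, 1*MID(...) succeeds only on digits
        if ch.isdigit():
            return i
    raise ValueError("no digit found")  # MATCH would return #N/A


def first_part(text: str) -> str:
    # LEFT(A1, MATCH(...) - 1): everything before the first digit
    return text[: _first_digit_index(text)]


def last_part(text: str) -> str:
    # Same scan over the reversed string (what LEN(A1)+1-ROW(...) does),
    # then RIGHT(A1, MATCH(...) - 1): everything after the last digit
    n = _first_digit_index(reversed(text))
    return text[len(text) - n:]


print(first_part("sdfaasdfs96sadfasdf65459asdfasdf"))  # sdfaasdfs
print(last_part("sdfaasdfs96sadfasdf65459asdfasdf"))  # asdfasdf
```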
+2
