How to extract a substring that follows a pattern in a URL string in Google BigQuery

I have two possible URL string forms:

http://www.abcexample.com/landpage/?pps=[Y/lyPw==;id_1][Y/lyP2ZZYxi==;id_2];[5403;ord];
http://www.abcexample.com/landpage/?pps=Y/lyPw==;id_1;unknown;ord; 

I want to extract Y/lyPw== in both examples

that is, the value before ;id_1 (inside the brackets in the first form).

It will always appear after the ?pps= part.

What is the best way to approach this? I want to do it in BigQuery SQL, as my data is stored there.

3 answers

Here is one way to do it with a regular expression:

-- Standard SQL: ;? and \[? make a leading ; and [ after pps= optional
SELECT REGEXP_EXTRACT(url, r'\?pps=;?\[?([^;]*);') FROM
(SELECT "http://www.abcexample.com/landpage/?pps=[Y/lyPw==;id_1][Y/lyP2ZZYxi==;id_2];[5403;ord];" AS url
 UNION ALL
 SELECT "http://www.abcexample.com/landpage/?pps=Y/lyPw==;id_1;unknown;ord;" AS url)

You can use this regex:

pps=\[?([^;]+)

The idea behind this regular expression is as follows:

pps=    -> Look for the pps= pattern
\[?     -> might have a [ or not
([^;]+) -> capture the content up to the first semicolon

Applied to your URLs, the capturing group (the part in parentheses) holds the value you want.

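In BigQuery you can use the REGEXP_EXTRACT function:

REGEXP_EXTRACT('str', 'reg_exp')

REGEXP_EXTRACT returns the portion of str that matches the capturing group within the regular expression.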
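A minimal sketch of that behaviour in BigQuery standard SQL, using the question's URLs as string literals so it runs without any table (the column aliases bracketed, plain and no_group are made up for illustration). The first two calls return only what the group captured; the third pattern has no group, so REGEXP_EXTRACT returns the whole match instead.

```sql
-- Standard SQL: with a capturing group, only the (...) part is returned.
SELECT
  REGEXP_EXTRACT("http://www.abcexample.com/landpage/?pps=[Y/lyPw==;id_1][Y/lyP2ZZYxi==;id_2];[5403;ord];",
                 r'pps=\[?([^;]+)') AS bracketed,  -- 'Y/lyPw=='
  REGEXP_EXTRACT("http://www.abcexample.com/landpage/?pps=Y/lyPw==;id_1;unknown;ord;",
                 r'pps=\[?([^;]+)') AS plain,      -- 'Y/lyPw=='
  -- Without a group, the entire matched substring comes back.
  REGEXP_EXTRACT("http://www.abcexample.com/landpage/?pps=Y/lyPw==;id_1;unknown;ord;",
                 r'pps=[^;]+') AS no_group         -- 'pps=Y/lyPw=='
```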

So your query would look like this:

SELECT
   REGEXP_EXTRACT(word,r'pps=\[?([^;]+)') AS fragment
FROM
   ...

For example, with some test data:

-- Standard SQL; returns Y/lyPw== for both rows
SELECT
   REGEXP_EXTRACT(url,r'pps=\[?([^;]+)') AS fragment
FROM
(SELECT "http://www.abcexample.com/landpage/?pps=[Y/lyPw==;id_1][Y/lyP2ZZYxi==;id_2];[5403;ord];" AS url
 UNION ALL
 SELECT "http://www.abcexample.com/landpage/?pps=Y/lyPw==;id_1;unknown;ord;" AS url)

This regex should work for you

(\w+);id_1

It will extract XYZXYZ

It uses a capturing group: the part in parentheses is what gets extracted.
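Because \w only matches letters, digits and underscores, (\w+);id_1 returns NULL for the question's URLs, where the value Y/lyPw== contains / and =. One possible variant, sketched below in BigQuery standard SQL, keeps the same capturing-group idea but uses [^;]+ and places the group between pps= (with an optional [) and ;id_1. The CTE name urls and the alias id_1_value are only for illustration.

```sql
-- Standard SQL sketch: [^;]+ (instead of \w+) also accepts / and =,
-- and anchoring on both "pps=" and ";id_1" pins down which value is captured.
WITH urls AS (
  SELECT "http://www.abcexample.com/landpage/?pps=[Y/lyPw==;id_1][Y/lyP2ZZYxi==;id_2];[5403;ord];" AS url
  UNION ALL
  SELECT "http://www.abcexample.com/landpage/?pps=Y/lyPw==;id_1;unknown;ord;"
)
SELECT
  url,
  REGEXP_EXTRACT(url, r'pps=\[?([^;]+);id_1') AS id_1_value  -- 'Y/lyPw==' for both rows
FROM urls;
```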
