How can I query paginated BigQuery query results using pageTokens with the Google Client library for Java?

I want to run BigQuery queries with thousands of rows of total results, but I only want to get a page of 100 results at a time (using the maxResults and pageToken ).

The BigQuery API supports the use of pageToken parameters in collection.list methods. However, I run asynchronous queries and retrieve the results using the getQueryResult method and does not seem to support the pageToken parameter. Is it possible to use pageToken with getQueryResults ?

+6
source share
1 answer

Update: There appeared new documentation on how to page through the list results .

I myself answer this question because the developer asked me privately, and I want to share the answer with Stack Overflow with him.

The pageToken parameter is available for querying paginated results from the Tabledata.list method. Result sets are automatically paginated if, for example, the result data exceeds 100,000 rows or 10 MB of results. You can also request pagination by setting the maxResults parameter explicitly. Each results page will return the pageToken parameter, which can then be used to get the next page of results.

Each query results in a new BigQuery table. If you did not specify the table explicitly, it lasts 24 hours. However, even unnamed "anonymous" tables have an identifier. In any case, after inserting the query job, get the name of the newly created table. Then use the tabledata.list method (and the combination of maxResults / pageToken parameters) to request paginated results. Loop and continue to call tabledata.list using the previously received Token page until the Tokens page is no longer returned (this means that you have reached the last page.

Using the Google API client library for Java, code to insert a query job, polling to complete the query, and then searching the page after the query results page may look something like this:

 // Create a new BigQuery client authorized via OAuth 2.0 protocol // See: https://developers.google.com/bigquery/docs/authorization#installed-applications Bigquery bigquery = createAuthorizedClient(); // Start a Query Job String querySql = "SELECT TOP(word, 500), COUNT(*) FROM publicdata:samples.shakespeare"; JobReference jobId = startQuery(bigquery, PROJECT_ID, querySql); // Poll for Query Results, return result output TableReference completedJob = checkQueryResults(bigquery, PROJECT_ID, jobId); // Return and display the results of the Query Job displayQueryResults(bigquery, completedJob); /** * Inserts a Query Job for a particular query */ public static JobReference startQuery(Bigquery bigquery, String projectId, String querySql) throws IOException { System.out.format("\nInserting Query Job: %s\n", querySql); Job job = new Job(); JobConfiguration config = new JobConfiguration(); JobConfigurationQuery queryConfig = new JobConfigurationQuery(); config.setQuery(queryConfig); job.setConfiguration(config); queryConfig.setQuery(querySql); Insert insert = bigquery.jobs().insert(projectId, job); insert.setProjectId(projectId); JobReference jobId = insert.execute().getJobReference(); System.out.format("\nJob ID of Query Job is: %s\n", jobId.getJobId()); return jobId; } /** * Polls the status of a BigQuery job, returns TableReference to results if "DONE" */ private static TableReference checkQueryResults(Bigquery bigquery, String projectId, JobReference jobId) throws IOException, InterruptedException { // Variables to keep track of total query time long startTime = System.currentTimeMillis(); long elapsedTime; while (true) { Job pollJob = bigquery.jobs().get(projectId, jobId.getJobId()).execute(); elapsedTime = System.currentTimeMillis() - startTime; System.out.format("Job status (%dms) %s: %s\n", elapsedTime, jobId.getJobId(), pollJob.getStatus().getState()); if (pollJob.getStatus().getState().equals("DONE")) { return pollJob.getConfiguration().getQuery().getDestinationTable(); } // Pause execution for one second before polling job status again, to // reduce unnecessary calls to the BigQUery API and lower overall // application bandwidth. Thread.sleep(1000); } } /** * Page through the result set */ private static void displayQueryResults(Bigquery bigquery, TableReference completedJob) throws IOException { long maxResults = 20; String pageToken = null; int page = 1; // Default to not looping boolean moreResults = false; do { TableDataList queryResult = bigquery.tabledata().list( completedJob.getProjectId(), completedJob.getDatasetId(), completedJob.getTableId()) .setMaxResults(maxResults) .setPageToken(pageToken) .execute(); List<TableRow> rows = queryResult.getRows(); System.out.print("\nQuery Results, Page #" + page + ":\n------------\n"); for (TableRow row : rows) { for (TableCell field : row.getF()) { System.out.printf("%-50s", field.getV()); } System.out.println(); } if (queryResult.getPageToken() != null) { pageToken = queryResult.getPageToken(); moreResults = true; page++; } else { moreResults = false; } } while (moreResults); } 
+11
source

All Articles