I have several Azure storage accounts and am trying to use HDInsight to query their storage analytics logs. I want a single Hive query that covers all of the accounts, so I created a partitioned external Hive table and added a partition for each storage account:
ADD JAR wasb:///HdiSamples/StorageAnalytics/hive-serde-microsoft-wa-0.13.0.jar;
CREATE EXTERNAL TABLE IF NOT EXISTS AggregateStorageLogs3 (
VersionNumber string,
RequestStartTime string,
OperationType string,
RequestStatus string,
HttpStatusCode string,
EndToEndLatencyInMs bigint,
ServerLatencyInMs bigint,
AuthenticationType string,
RequesterAccountName string,
OwnerAccountName string,
ServiceType string,
RequestUrl string,
RequestedObjectKey string,
RequestIdHeader string,
OperationCount bigint,
RequesterIpAddress string,
RequestVersionHeader string,
RequestHeaderSize bigint,
RequestPacketSize bigint,
ResponseHeaderSize bigint,
ResponsePacketSize bigint,
RequestContentLength bigint,
RequestMD5 string,
ServerMD5 string,
ETagIdentifier string,
LastModifiedTime string,
ConditionsUsed string,
UserAgentHeader string,
ReferrerHeader string,
ClientRequestId string)
COMMENT 'aggregated storage analytics log data'
PARTITIONED BY (StorageAccount string)
ROW FORMAT SERDE 'com.microsoft.hadoop.hive.serde2.windowsazure.StorageAnalyticsLogSerDe';
ALTER TABLE AggregateStorageLogs3 ADD IF NOT EXISTS PARTITION(StorageAccount = 'mystorageacc1')
LOCATION 'wasb://$logs@mystorageacc1.blob.core.windows.net/blob/';
ALTER TABLE AggregateStorageLogs3 ADD IF NOT EXISTS PARTITION(StorageAccount = 'mystorageacc2')
LOCATION 'wasb://$logs@mystorageacc2.blob.core.windows.net/blob/';
ALTER TABLE AggregateStorageLogs3 ADD IF NOT EXISTS PARTITION(StorageAccount = 'mystorageacc3')
LOCATION 'wasb://$logs@mystorageacc3.blob.core.windows.net/blob/';
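To confirm the partitions registered as expected before querying, the metadata can be checked with standard Hive commands (shown here for completeness):

SHOW PARTITIONS AggregateStorageLogs3;
-- should list storageaccount=mystorageacc1, mystorageacc2, mystorageacc3
DESCRIBE FORMATTED AggregateStorageLogs3 PARTITION (StorageAccount='mystorageacc1');
-- the Location field should show the wasb://$logs@... path bound to the partition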
Then I tried to count the rows in the external table, to get the total number of log entries across all storage accounts:
SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
ADD JAR wasb:///HdiSamples/StorageAnalytics/hive-serde-microsoft-wa-0.13.0.jar;
SELECT COUNT(*)
FROM AggregateStorageLogs3;
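The StorageAccount partition column also allows a per-account breakdown with the same table (a minimal example):

SELECT StorageAccount, COUNT(*)
FROM AggregateStorageLogs3
GROUP BY StorageAccount;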
However, the count query fails as soon as the job is submitted: Hadoop throws an AzureException, and Log4J also reports an error renaming its log file.
Has anyone seen this before, or is there a workaround?
Here is the full output:
Logging initialized using configuration in file:/C:/apps/dist/hive-0.13.0.2.1.9.0-2196/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.4.0.2.1.9.0-2196/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.0.2.1.9.0-2196-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
converting to local wasb:///HdiSamples/StorageAnalytics/hive-serde-microsoft-wa-0.13.0.jar
Added D:\Users\hdp\AppData\Local\Temp\00f60c60-6a8e-4de8-87c1-92ba2a402fa6_resources\hive-serde-microsoft-wa-0.13.0.jar to class path
Added resource: D:\Users\hdp\AppData\Local\Temp\00f60c60-6a8e-4de8-87c1-92ba2a402fa6_resources\hive-serde-microsoft-wa-0.13.0.jar
Query ID = hdp_20150113225858_c1a01e81-7e6b-4153-a7b5-5c2f6266aca7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
log4j:ERROR Failed to rename [C:\apps\dist\hive-0.13.0.2.1.9.0-2196\logs/hive.log] to [C:\apps\dist\hive-0.13.0.2.1.9.0-2196\logs/hive.log.2015-01-13].
org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1899)
at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1568)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1642)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:291)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:263)
at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:75)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:344)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:310)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:435)
at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:456)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:466)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at com.microsoft.windowsazure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113)
at org.apache.hadoop.fs.azurenative.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:86)
at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1874)
... 50 more
Caused by: com.microsoft.windowsazure.storage.StorageException: The server encountered an unknown failure: OK
at com.microsoft.windowsazure.storage.StorageException.translateException(StorageException.java:179)
at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:277)
at com.microsoft.windowsazure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
... 52 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1146584]
Message: Connection reset
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)
at com.microsoft.windowsazure.storage.core.DeserializationHelper.readElementFromXMLReader(DeserializationHelper.java:152)
at com.microsoft.windowsazure.storage.core.DeserializationHelper.readElementFromXMLReader(DeserializationHelper.java:129)
at com.microsoft.windowsazure.storage.blob.BlobDeserializer.readBlobProperties(BlobDeserializer.java:375)
at com.microsoft.windowsazure.storage.blob.BlobDeserializer.readBlob(BlobDeserializer.java:200)
at com.microsoft.windowsazure.storage.blob.BlobDeserializer.readBlobItems(BlobDeserializer.java:140)
at com.microsoft.windowsazure.storage.blob.BlobDeserializer.getBlobList(BlobDeserializer.java:87)
at com.microsoft.windowsazure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1236)
at com.microsoft.windowsazure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1200)
at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:200)
... 53 more
Job Submission failed with exception 'org.apache.hadoop.fs.azure.AzureException(java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
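One way to narrow this down (a sketch, using the same settings and JAR as above) would be to query each partition on its own, so that each job only has to enumerate a single account's $logs container:

SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true;
ADD JAR wasb:///HdiSamples/StorageAnalytics/hive-serde-microsoft-wa-0.13.0.jar;
-- partition pruning restricts the scan to one container per query
SELECT COUNT(*)
FROM AggregateStorageLogs3
WHERE StorageAccount = 'mystorageacc1';

If only one of the three accounts fails this way, that would at least point to the container whose blob listing triggers the exception.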