Suppose you have an external table, namely. .employee organization like: (not including TBLPROPERTIES)
hive> show create table organization.employee; OK CREATE EXTERNAL TABLE `organization.employee`( `employee_id` bigint, `employee_name` string, `updated_by` string, `updated_date` timestamp) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'
You want to remove the updated_by, updated_date columns from the table. Follow these steps:
create a replica of the temp table to organize .employee as:
hive> create table organization.employee_temp as select * from organization.employee;
Drop the main organization.employee table.
hive> drop table organization.employee;
remove basic data from HDFS (you must exit the hive shell)
[ nameet@ip-80-108-1-111 myfile]$ hadoop fs -rm hdfs://getnamenode/apps/hive/warehouse/organization.db/employee/*
create a table with deleted columns as needed:
hive> CREATE EXTERNAL TABLE `organization.employee`( `employee_id` bigint, `employee_name` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'
insert source records into the source table.
hive> insert into organization.employee select employee_id, employee_name from organization.employee_temp;
finally discard the created temp table
hive> drop table organization.employee_temp;