Create JSON data in Apache Hive

PRASANTH
2 min readJan 13, 2021

Hello Friends,

We are going to discuss about few real time problems in Hive

Creating JSON data without using JSON SerDe or UDFs.

Table Structure:

CREATE TABLE customer(
customerid string,
firstname string,
lastname string,
address string,
city string,
state string,
zipcode string,
email string,
mobile string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘,’ LINES TERMINATED by ‘\n’ STORED AS TEXTFILE;

Raw Data:

1,Niru,Kands,KVM St.,Austin,Texas,23452,nir.kandi@abc.com,765–432–9807
2,Havi,anipeu,ABS St.,Austin,Texas,23452,,123–897–2568
3,Suma,angr,KLM St.,Austin,Texas,23452,suma.angry@xyz.com,465–903–0789
4,Pras,naga,LOTUS St.,HYD,TN,50023,pras.naga@pqr.com,6307595897

select * from customer;

Problem 1: Columns names should be in mixed case for JSON data. (by default, lowercase in Hive)

Problem 1: sample format

Solution:

select
a.CustomerID,
map(“Customer”, map(“FirstName”,a.firstname, “LastName”, a.lastname),
“Address”, map(“Street”,a.street,
“City”,a.City,
“State”, a.state,
“ZipCode”, cast(a.zipcode as string)
),
“Email”, map(“Email”, a.email),
“Mobile”, map(“Mobile”, cast(a.mobile as string))
) as data,
current_timestamp() as report_date
from
customer a;

Note: map() converts the data into JSON format and wont convert column name into lowercase

Problem 2: Each column data should be in JSON format

Problem 2: sample format

Solution:

select
a.CustomerID,
named_struct(“FirstName”,a.firstname, “LastName”, a.lastname) as customer,
named_struct(“Street”,a.street,
“City”,a.City,
“State”, a.state,
“ZipCode”, cast(a.zipcode as string)
) as address,
named_struct(“Email”, a.email) as Email,
named_struct(“Mobile”, cast(a.mobile as string)) as Mobile,
current_timestamp() as report_date
from
customer a;

Note: named_struct() converts the data into JSON format

Problem 3: JSON data inArray format

Problem 3: sample format

Solution:

select
CustomerID,
collect_set(named_struct(“FirstName”,a.firstname, “LastName”, a.lastname)) as customer,
collect_set(named_struct(“Street”,a.street,
“City”,a.City,
“State”, a.state,
“ZipCode”, cast(a.zipcode as string)
)) as address,
collect_set(named_struct(“Mobile”, cast(a.mobile as string))) as Mobile,
current_timestamp() as report_date
from
customer a
group by CustomerID, current_timestamp();

Note: collect_set() required a group by for non collect_set columns. collect_set() is used to convert data into array format. you can use array() insetad of collect_set()

Tip: always execute Hive commands using .hql file instead of execute in hive shell in order avoid error or improper data format [ hive -f abc.hql]

Happy Learning!!

--

--

PRASANTH
0 Followers

Techie | Investor | Freelancer | Blogger | Vegetarian