Introduction to AWS Athena:
Hello Readers, this is will be a quick and crisp guide to AWS Athena. Here we shall learn the steps to setup Athena. Therefore, we shall go through the some of the core concept of it.
AWS Athena is a server-less service that is used to query data from the data source mainly from S3 bucket. Therefore, we can execute SQL like query in athena query editor.
Hence, the core story is, the data is stored inside S3 bucket that can be in different formats like csv,tsv and some apache formats despite that we can query the data.
There is a another aspect of AWS Athena is AWS Glue. Athena can also be used to query data extracted by Glue. As this a beginner’s guide, hence I shall pause the Glue topic here. I shall publish a post on AWS Glue, there we shall discuss in detail.
I have released a youtube video on this article, you can visit here.
Steps at high level for AWS Athena:
- Assign IAM role.
- Upload a CSVfile in a S3 bucket.
- Create database and table.
- Query data.
IAMĀ permission required for AWS Athena :
Lets first diccuss the permission part, without this we could be able to do any operation inside AWS Athena console.
Here we are assuming that, we have user(UserX) apart from root of the aws account. Therefore, we need to allow UserX to do Athena operation.
Create an user group for Athena and Glue services and attached two permission, as shown in the image below:
Once done, UserX has full access to Athena and glue. Therefore, now we shall move to S3 bucket. Upload toy CSV file to a specific S3 bucket. Here I have created a folder Student and kept the student.csv file.
I have knowingly added AWS glue permission of this. Because Athena is a services that is tightly coupled with Glue.
The student CSV file has around 100 rows. Some of them I have shown below. Similarly, you can directly copy paste the below lines of code else you can paste another CSV file according to your need!
FirstName,LastName,Marks,Address Aarav,Sharma,85,"123 Green St, Mumbai" Aisha,Khan,90,"456 Blue Rd, Delhi" Rohan,Verma,78,"789 Red Ave, Bangalore" Sneha,Patel,92,"101 Yellow Ln, Hyderabad" Vikram,Gupta,88,"202 Orange Blvd, Chennai" Isha,Nair,95,"303 Pink St, Kolkata" Kabir,Mishra,80,"404 Purple Rd, Pune" Tanya,Aggarwal,86,"505 Brown Ave, Jaipur" Manish,Jain,83,"606 Silver Ln, Lucknow" Priya,Saxena,91,"707 Gold Blvd, Chandigarh"
Note: Athena treats the first row, that is column name, as the data. Therefore, please delete the first row before uploading to S3 bucket.
Creating Database and Table in AWS Athena:
Similar to SQL, to store data we need to have database and table inside Athena as well. Therefore, first we small create database. Use the following query to create database
CREATE DATABASE IF NOT EXISTS toy;
Query to create Aws Athena console:
CREATE EXTERNAL TABLE IF NOT EXISTS Student( first_name string, last_name string, marks float, address string ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 's3://yourbucketname/student/'
Note: Location is the folder address where csv file is located in the s3 bucket.
If our table query is successfully executed, then we shall see the student in the left hand side of the screen as shown in the picture below.
Now when our student table is created, lets execute a basic select query on it. The picture below show the result:
Conclusion:
AWS Athena is server-less service using this with minimal setup we have executed complex and powerful SQL like queries . You might have seen I have included Glue permission in this article. Now that’s because Athena and Glue are tightly coupled. That is, the tables are created in Glue can also be queried inside Athena. Therefore, this article is kind of a foundation for the next article, that is Glue.