R doesn't need to throttle AWS Athena anymore
I am happy to announce that RAthena-1.9.0 and noctua-1.7.0 have been released on CRAN. They both bring two key features:
- More stability when working with AWS Athena, focusing on AWS Rate Exceeded throttling errors
- New helper function to convert AWS S3 backend files to save cost
NOTE: RAthena and noctua features correspond to each other, so I will refer to the two packages interchangeably.
Stability
Throttling AWS
One of the main problems when working with the AWS API is running into Rate Exceeded throttling errors. With the latest update to the packages, the connection between AWS Athena and R has been made more robust through retry functionality. This allows R to automatically retry its request using an exponential backoff with jitter (Best practices for API packages).

- R sends a call to AWS Athena, let's say dbListTables(con).
- However, AWS is overrun with requests and returns an error to R saying it is overwhelmed (this is a rate exceeded throttling error).
- As RAthena and noctua retry noisily, the error is printed to the console, letting you know AWS is busy ({exception message} + "Request failed. Retrying in " + {wait time} + " seconds...").
- R then waits for the given time (see the message format above) and retries the request.
- AWS replies that it is still busy and can't complete the request.
- This time R backs off for a longer period, which gives AWS some breathing room.
- Now when R sends the request to AWS, AWS is able to complete the call and return our desired results.
This feature is a great step in the right direction for making R and AWS Athena work together seamlessly. For anyone who wishes to create their own retry method, both packages allow this through their ..._options() function. For more information please refer to link.
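To make the idea of exponential backoff with jitter more concrete, here is a minimal sketch in base R. It is not the implementation used inside RAthena or noctua; retry_with_backoff, its arguments and flaky_request are hypothetical names invented for illustration, and the fake request simply fails twice with a throttling-style error before succeeding.

```r
# Hypothetical helper (not part of RAthena/noctua): call `f`, and on error
# wait a random time whose upper bound doubles on every attempt.
retry_with_backoff <- function(f, max_tries = 5, base_wait = 1, max_wait = 20,
                               sleep = Sys.sleep) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # Out of attempts: re-signal the last error
    if (attempt == max_tries) stop(result)
    # Exponential backoff: the cap doubles each attempt, limited by max_wait
    cap <- min(max_wait, base_wait * 2^(attempt - 1))
    # Jitter: pick a random wait between 0 and the cap
    wait <- runif(1, min = 0, max = cap)
    message(conditionMessage(result), "\nRequest failed. Retrying in ",
            round(wait, 1), " seconds...")
    sleep(wait)
  }
}

# Stand-in for an AWS call: throttled twice, then succeeds
calls <- 0
flaky_request <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("ThrottlingException: Rate exceeded")
  "tables listed"
}

result <- retry_with_backoff(flaky_request, base_wait = 0.01)
```

The random wait means that many clients hitting the same limit do not all retry at the same moment, which is what gives AWS the breathing room described above.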
Save the pennies
Converting AWS S3 files
AWS Athena charges by the amount of data it scans. This makes it very important to store your AWS S3 backend files in a suitable format to reduce the cost of using AWS Athena. This is where the next key feature comes in: a simple wrapper that lets you convert AWS S3 files into a more suitable format.
library(DBI)
library(RAthena)
con <- dbConnect(athena())
# Upload iris data.frame to AWS Athena as a delimited file
dbWriteTable(con, "iris_delim", iris)
# Convert to parquet using AWS Athena
dbConvertTable(con,
               obj = "iris_delim",
               name = "iris_parquet",
               file.type = "parquet")
In this example, the iris data.frame is simply uploaded to AWS Athena in the default delimited file format (please see link for more information on how to upload data to AWS Athena). It is then converted into the parquet file format using AWS Athena. This wrapper isn't limited to converting AWS Athena tables; it can also convert the results of SQL DML queries. Please refer to dbConvertTable for more documentation or to the dbConvertTable vignette link.
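As a rough illustration of the SQL case, the sketch below converts the result of a query rather than a whole table. It assumes the iris_delim table from the previous example already exists and that a working AWS connection is available. The query, the output table name iris_orc_partitioned and the time_stamp column are made up for illustration, and the use of SQL() and the partition argument reflects my reading of the dbConvertTable documentation, so please check the package docs for the exact arguments.

```r
library(DBI)
library(RAthena)

con <- dbConnect(athena())

# Assumes the "iris_delim" table from the example above already exists.
# The query is wrapped in SQL() so it is treated as SQL, not as a table name.
dbConvertTable(con,
               obj = SQL("SELECT iris_delim.*, date_format(current_date, '%Y%m%d') AS time_stamp FROM iris_delim"),
               name = "iris_orc_partitioned",
               file.type = "orc",
               partition = "time_stamp")

dbDisconnect(con)
```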
Finally, for more information on best practices with AWS Athena, please look at Top 10 Performance Tuning Tips for Amazon Athena.
Sum Up
These two new features bring R and AWS Athena that little bit closer together. As always, if you would like any new features or identify any bugs, please feel free to raise a pull request or ticket on the corresponding package GitHub pages (RAthena and noctua).