RBloggers|RBloggers-feedburner
Intro: This is a quick update on the latest features for RAthena and noctua 2.6.0.
Latest Features: Endpoint Override: The function dbConnect has a new parameter, endpoint_override. It allows RAthena/noctua to override the endpoint of each AWS service they connect to. RAthena/noctua connect to the following AWS services:
AWS Athena ( https://aws.amazon.com/athena/ ): the main service, used to interact with AWS Athena.
AWS Glue ( https://aws.amazon.com/glue/ ): the service used to get the AWS Glue Catalogue for AWS Athena.
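As a rough illustration of the parameter described above, the sketch below passes one endpoint per service to dbConnect. The named-list form with the names athena and glue, the placeholder URLs, and the helper name connect_with_endpoints are assumptions made for this example; check the package documentation for the exact accepted values.

```r
# Hypothetical endpoints: the region and URLs are placeholders, not recommendations.
endpoints <- list(
  athena = "https://athena.eu-west-1.amazonaws.com",
  glue   = "https://glue.eu-west-1.amazonaws.com"
)

# Hypothetical helper wrapping the connection call. It assumes that
# endpoint_override accepts a named list keyed by service name.
# Calling it requires the noctua package and valid AWS credentials.
connect_with_endpoints <- function(endpoints) {
  DBI::dbConnect(noctua::athena(), endpoint_override = endpoints)
}

# Usage (not run here, as it needs AWS access):
# con <- connect_with_endpoints(endpoints)
```

The same call shape should apply to RAthena by swapping noctua::athena() for RAthena::athena(), since the two packages mirror each other.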
Intro: This is a quick update on the latest features for RAthena and noctua 2.4.0.
Latest features: dbplyr: RAthena and noctua now fully support the dbplyr backend API 2+. The dplyr database generics will be deprecated in later versions of dbplyr, so this change future-proofs RAthena and noctua while keeping the same functionality developed for dbplyr backend API version 1.
dplyr and unload: RAthena and noctua can now set AWS Athena Unload on a session level.
Intro: As it has been a while since RAthena and noctua updates were last announced, I thought I would try to get them all out of the way now. This blog covers the key new features added from version 1.9.0 to 2.3.0.
New Features: Big integers: Big integers from AWS Athena can be returned to R in the following supported data types: [integer64, integer, numeric, character].
Extra AWS Athena data types: Added support for the AWS Athena data types [array, row, map, json, binary, ipaddress].

    library(DBI)
    library(RAthena)

    # default conversion methods
    con <- dbConnect(RAthena::athena())

    # change json conversion method
    RAthena_options(json = "character")
    RAthena:::athena_option_env$json
    # [1] "character"

    # change json conversion to custom method
    RAthena_options(json = jsonify::from_json)
    RAthena:::athena_option_env$json
    # function (json, simplify = TRUE, fill_na = FALSE, buffer_size = 1024)
    # {
    #   json_to_r(json, simplify, fill_na, buffer_size)
    # }
    # <bytecode: 0x7f823b9f6830>
    # <environment: namespace:jsonify>

    # change bigint conversion without affecting custom json conversion methods
    RAthena_options(bigint = "numeric")
    RAthena:::athena_option_env$json
    # function (json, simplify = TRUE, fill_na = FALSE, buffer_size = 1024)
    # {
    #   json_to_r(json, simplify, fill_na, buffer_size)
    # }
    # <bytecode: 0x7f823b9f6830>
    # <environment: namespace:jsonify>
    RAthena:::athena_option_env$bigint
    # [1] "numeric"

    # change binary conversion without affecting bigint or json methods
    RAthena_options(binary = "character")
    RAthena:::athena_option_env$json
    # function (json, simplify = TRUE, fill_na = FALSE, buffer_size = 1024)
    # {
    #   json_to_r(json, simplify, fill_na, buffer_size)
    # }
    # <bytecode: 0x7f823b9f6830>
    # <environment: namespace:jsonify>
    RAthena:::athena_option_env$bigint
    # [1] "numeric"
    RAthena:::athena_option_env$binary
    # [1] "character"

    # no conversion for json objects
    con2 <- dbConnect(RAthena::athena(), json = "character")

    # use custom json parser
    con <- dbConnect(RAthena::athena(), json = jsonify::from_json)

RStudio connection tab: The RStudio connection tab is now optional, which speeds up connecting for users working with large data lakes.
I am happy to announce that RAthena-1.9.0 and noctua-1.7.0 have been released to CRAN. They both bring two key features:
More stability when working with AWS Athena, focusing on AWS Rate Exceeded throttling errors
New helper function to convert AWS S3 backend files to save cost
NOTE: RAthena and noctua features correspond to each other, so I will refer to the two packages interchangeably.
Stability: Throttling AWS: One of the main problems when working with the AWS API is running into Rate Exceeded throttling errors.
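A common general-purpose way to cope with throttling errors like these is to retry the call with exponential backoff. The sketch below is a generic illustration of that pattern in base R, not the code used inside RAthena or noctua; the function name retry_with_backoff, its parameters, and the error-message patterns it matches are all assumptions made for this example.

```r
# Hypothetical helper: retry `fn` when it fails with a throttling-style error.
retry_with_backoff <- function(fn, max_tries = 5, base_wait = 0.1, max_wait = 2) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)

    # Assumed message patterns for throttling errors.
    throttled <- grepl(
      "Rate exceeded|ThrottlingException|TooManyRequestsException",
      conditionMessage(result$error),
      ignore.case = TRUE
    )
    # Re-raise anything that is not throttling, or when out of attempts.
    if (!throttled || attempt == max_tries) stop(result$error)

    # The wait cap doubles each attempt up to max_wait; random jitter
    # spreads out retries from concurrent callers.
    wait_cap <- min(max_wait, base_wait * 2^(attempt - 1))
    Sys.sleep(runif(1, 0, wait_cap))
  }
}

# Usage with a stand-in function that is throttled twice before succeeding.
calls <- 0
flaky_query <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("ThrottlingException: Rate exceeded")
  "query finished"
}
out <- retry_with_backoff(flaky_query)
```

Errors that do not look like throttling are re-raised immediately, so genuine failures such as a malformed query are not retried.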
RAthena 1.7.1 and noctua 1.5.1 have now been released to CRAN. They both bring several improvements to the connection to AWS Athena, notably better performance and several creature comforts.
These packages have been designed to mirror one another, even down to how they connect to AWS Athena. This means that all features going forward will exist in both packages. I will refer to these packages as one, as they basically work in the same way.
Intro: After developing the package RAthena, I stumbled quite accidentally upon paws, the R SDK for AWS. As RAthena utilises Python’s SDK boto3, I thought the development of another AWS Athena package couldn’t hurt. As mentioned in my previous blog, the paws syntax is very similar to boto3, so a lot of my RAthena code was very portable, and this gave me my final excuse to develop my next R package.
Intro: For a long time I have found it difficult to appreciate the benefits of “cloud compute” in my R model builds. This was due to my initial lack of understanding and the setting up of R on cloud compute environments. When I noticed that AWS was bringing out a new product, AWS SageMaker, the possibilities of what it could provide seemed like a dream come true.
Amazon SageMaker provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly.
Recap: RAthena is an R package that interfaces with Amazon Athena. However, it doesn’t use the standard ODBC and JDBC drivers like AWR.Athena and metis. Instead, RAthena utilises Boto3, Python’s SDK (software development kit) for Amazon. It does this through the reticulate package, which provides an interface into Python. This means that RAthena doesn’t require any driver installation or setup. Driver setup can be particularly difficult when you are setting up the ODBC drivers and are not familiar with how ODBC works on your current operating system.
Intro: Currently there are two key ways of connecting to Amazon Athena from R: using the ODBC driver or the JDBC driver. To access the ODBC driver, R users can use the excellent odbc package supported by RStudio. To access the JDBC driver, R users can use either the RJDBC R package or the helpful wrapper package AWR.Athena, which wraps RJDBC to make connecting to Amazon Athena through the JDBC driver simpler.