Connecting to Oracle database from AWS Glue Python Shell — Not possible

TLDR

The short version of the story is, you cannot. It is not easy and even if you can, it’s not straight forward. Use AWS Batch instead

I have tried various ways to make this happen and finally gave up. I hope this article serves as a single place of answers for all your questions.

Oracle provides cx_Oracle python module to enable access to Oracle database. However, if you take a look at the installation instructions, you’ll clearly notice that the module is dependent on Oracle instant client to be available on the machine which is executing the python script. An environment variable with name LD_LIBRARY_PATH must be set pointing to the directory which contains the instant client libraries.

image.png Screenshot from cx_Oracle installation instructions

Unavailability of the instant client libraries in the path will result in the following exception (or a similar one) when connecting to Oracle database from the Python program.

DatabaseError: DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory"

The catch is, this environment variable must be set even before the python script starts execution. So if you are planning to host the oracle client on an S3 bucket and download it to a location within the python script which is added to the AWS Glue Python Shell, it won’t work. Check this stack overflow question. It is not sufficient to make the instant client libraries available in the Glue Python Shell runtime environment. However, this approach may work in an AWS Lambda layer as AWS Lambda lets us configure environment variables where as Glue Python Shell does not.

The documentation around locating Oracle client libraries also mentions that the path where instant client is located can be passed in as a parameter to cx_Oracle using the [cx_Oracle.init_oracle_client()](https://cx-oracle.readthedocs.io/en/latest/api_manual/module.html#cx_Oracle.init_oracle_client) function. However, this works only on Windows and MacOS. This is not applicable to Linux.

The above stack overflow question also has an answer which shows a work around to update the rpath of a library using patchelf . I have tried and failed to take that path.

As an alternative to running a python script in AWS Glue Python Shell, the same application can be containerised and executed using the AWS Batch. The scheduling will then have to carried out with the help of AWS Cloudwatch event targets

Did you find this article valuable?

Support Pavan Andhukuri by becoming a sponsor. Any amount is appreciated!