Connect to different RDBMS from Spark

In this post, we will see how to connect to 3 very popular RDBMS using Spark. We will create connection and will fetch some records via spark. The dataframe will hold data and we can use it as per requirement. We will talk about JAR files required for connection and JDBC connection string to fetch data and load dataframe.

Connect to Netezza from Spark

RDBMS: Netezza
Jar Required: nzjdbc.jar

Step 1: Open Spark shell and add jar 
spark-shell --jars /tmp/nz/nzjdbc.jar

Step 2: Pass required parameters and create a dataframe with data from Netezza
val df_nz = sqlContext.read.format(“jdbc”).options(Map(“url” -> “jdbc:netezza://192.168.200.201:5480/DBNAME”, “user” -> “admin”, “password” -> “password”, “dbtable” -> “nz_usa_prez”, “driver” -> “org.netezza.Driver”)).load()

Step 3: Check data from Dataframe
df_nz.show(50)

scala> df_nz.show(50)
+——————–+———-+——————-+——————–+———-+———-+
| PRES_NAME| PRES_DOB| PRES_BP| PRES_BS| PRES_IN| PRES_OUT|
+——————–+———-+——————-+——————–+———-+———-+
| Andrew Jackson|1767-03-15| Waxhaws Region|South/North Carolina|1829-03-04|1837-03-04|
| Zachary Taylor|1784-11-24| Barboursville| Virginia|1849-03-04|1850-07-09|
| Abraham Lincoln|1809-02-12| Sinking spring| Kentucky|1861-03-04|1865-04-15|
| Ulysses S. Grant|1822-04-27| Point Pleasant| Ohio|1869-03-04|1877-03-04|
| Rutherford B. Hayes|1822-10-04| Delaware| Ohio|1877-03-04|1881-03-04|
| Theodore Roosevelt|1858-10-27| Manhattan| New York|1901-09-14|1909-03-04|
| John F. Kennedy|1917-05-29| Brookline| Massachusetts|1961-01-20|1963-11-22|
| Richard M. Nixon|1913-01-09| Yorba Linda| California|1969-01-20|1974-08-09|
| Ronald Reagan|1911-02-06| Tampico| Illinois|1981-01-20|1989-01-20|
| Thomas Jefferson|1743-04-13| Shadwell| Virginia|1801-03-04|1809-03-04|
| James Madison|1751-03-16| Port Conway| Virginia|1809-03-04|1817-03-04|
| James Monroe|1758-04-28| Monroe Hall| Virginia|1817-03-04|1825-03-04|
| John Quincy Adams|1767-07-11| Braintree| Massachusetts|1825-03-04|1829-03-04|
| Martin Van Buren|1782-12-05| Kinderhook| New York|1837-03-04|1841-03-04|
| Millard Fillmore|1800-01-07| Summerhill| New York|1850-07-09|1853-03-04|
| James Buchanan|1791-04-23| Cove Gap| Pennsylvania|1857-03-04|1861-03-04|
| Grover Cleveland|1837-03-18| Caldwell| New Jersey|1885-03-04|1889-03-04|
| Benjamin Harrison|1833-08-20| North Bend| Ohio|1889-03-04|1893-03-04|
| Grover Cleveland|1837-03-18| Caldwell| New Jersey|1893-03-04|1897-03-04|
| William Howard Taft|1857-09-15| Cincinnati| Ohio|1909-03-04|1913-03-04|
| Calvin Coolidge|1872-07-04| Plymouth| Vermont|1923-08-02|1929-03-04|
| Harry S. Truman|1884-05-08| Lamar| Missouri|1945-04-12|1953-01-20|
| George H. W. Bush|1924-06-12| Milton| Massachusetts|1989-01-20|1993-01-20|
| George W. Bush|1946-07-06| New Haven| Connecticut|2001-01-20|2009-01-20|
| James A. Garfield|1831-11-19| Moreland Hills| Ohio|1881-03-04|1881-09-19|
| William McKinley|1843-01-29| Niles| Ohio|1897-03-04|1901-09-14|
| Woodrow Wilson|1856-12-28| Staunton| Virginia|1913-03-04|1921-03-04|
| Warren G. Harding|1865-11-02| Blooming Grove| Ohio|1921-03-04|1923-08-02|
| Herbert Hoover|1874-08-10| West Branch| Iowa|1929-03-04|1933-03-04|
|Dwight D. Eisenhower|1890-10-14| Denison| Texas|1953-01-20|1961-01-20|
| Gerald R. Ford|1913-07-14| Omaha| Nebraska|1974-08-09|1977-01-20|
| Jimmy Carter|1924-10-01| Plains| Georgia|1977-01-20|1981-01-20|
| Donald Trump|1946-06-14| Queens| New York|2017-01-20| null|
| George Washington|1732-02-22|Westmoreland County| Virginia|1789-04-30|1797-03-04|
| John Adams|1735-10-30| Braintree| Massachusetts|1797-03-04|1801-03-04|
|William Henry Har…|1773-02-09|Charles City County| Virginia|1841-03-04|1841-04-04|
| John Tyler|1790-03-29|Charles City County| Virginia|1841-04-04|1845-03-04|
| James K. Polk|1795-11-02| Pineville| North Carolina|1845-03-04|1849-03-04|
| Franklin Pierce|1804-11-23| Hillsborough| New Hampshire|1853-03-04|1857-03-04|
| Andrew Johnson|1808-12-29| Raleigh| North Carolina|1865-04-15|1869-03-04|
| Chester A. Arthur|1829-10-05| Fairfield| Vermont|1881-09-19|1885-03-04|
|Franklin D. Roose…|1882-01-30| Hyde Park| New York|1933-03-04|1945-04-12|
| Lyndon B. Johnson|1908-08-27| Stonewall| Texas|1963-11-22|1969-01-20|
| Bill Clinton|1946-08-19| Hope| Arkansas|1993-01-20|2001-01-20|
| Barack Obama|1961-08-04| Honolulu| Hawaii|2009-01-20|2017-01-20|
+——————–+———-+——————-+——————–+———-+———-+

Connect to Teradata from Spark

RDBMS: Teradata
Jar Required: terajdbc4.jar,tdgssconfig.jar,teradata-connector-1.4.4.jar

Step 1: Open Spark shell and add jar 
spark-shell --jars /tmp/td/terajdbc4.jar,/tmp/td/tdgssconfig.jar,/tmp/td/teradata-connector-1.4.4.jar

Step 2: Pass required parameters and create a dataframe with data from Teradata
val df_td = sqlContext.load(“jdbc”, Map(“url” -> “jdbc:teradata://192.168.200.202/DBS_PORT=1025, TMODE=TERA, user=admin, password=password”,”dbtable” -> “dbname.td_usa_prez”,”driver” -> “com.teradata.jdbc.TeraDriver”));

Step 3: Check data from Dataframe
df_td.show(50)
+——————–+———-+——————-+——————–+———-+———-+
| pres_name| pres_dob| pres_bp| pres_bs| pres_in| pres_out|
+——————–+———-+——————-+——————–+———-+———-+
| Harry S. Truman|1884-05-08| Lamar| Missouri|1945-04-12|1953-01-20|
| Grover Cleveland|1837-03-18| Caldwell| New Jersey|1885-03-04|1889-03-04|
| Abraham Lincoln|1809-02-12| Sinking spring| Kentucky|1861-03-04|1865-04-15|
| Grover Cleveland|1837-03-18| Caldwell| New Jersey|1893-03-04|1897-03-04|
| Calvin Coolidge|1872-07-04| Plymouth| Vermont|1923-08-02|1929-03-04|
| Andrew Jackson|1767-03-15| Waxhaws Region|South/North Carolina|1829-03-04|1837-03-04|
| George W. Bush|1946-07-06| New Haven| Connecticut|2001-01-20|2009-01-20|
|Franklin D. Roose…|1882-01-30| Hyde Park| New York|1933-03-04|1945-04-12|
| Martin Van Buren|1782-12-05| Kinderhook| New York|1837-03-04|1841-03-04|
| Zachary Taylor|1784-11-24| Barboursville| Virginia|1849-03-04|1850-07-09|
| James A. Garfield|1831-11-19| Moreland Hills| Ohio|1881-03-04|1881-09-19|
| Jimmy Carter|1924-10-01| Plains| Georgia|1977-01-20|1981-01-20|
| Theodore Roosevelt|1858-10-27| Manhattan| New York|1901-09-14|1909-03-04|
| James Buchanan|1791-04-23| Cove Gap| Pennsylvania|1857-03-04|1861-03-04|
| John Tyler|1790-03-29|Charles City County| Virginia|1841-04-04|1845-03-04|
| William McKinley|1843-01-29| Niles| Ohio|1897-03-04|1901-09-14|
| Richard M. Nixon|1913-01-09| Yorba Linda| California|1969-01-20|1974-08-09|
| Thomas Jefferson|1743-04-13| Shadwell| Virginia|1801-03-04|1809-03-04|
| Lyndon B. Johnson|1908-08-27| Stonewall| Texas|1963-11-22|1969-01-20|
| George Washington|1732-02-22|Westmoreland County| Virginia|1789-04-30|1797-03-04|
| Herbert Hoover|1874-08-10| West Branch| Iowa|1929-03-04|1933-03-04|
| James Madison|1751-03-16| Port Conway| Virginia|1809-03-04|1817-03-04|
|William Henry Har…|1773-02-09|Charles City County| Virginia|1841-03-04|1841-04-04|
| Warren G. Harding|1865-11-02| Blooming Grove| Ohio|1921-03-04|1923-08-02|
| George H. W. Bush|1924-06-12| Milton| Massachusetts|1989-01-20|1993-01-20|
| Chester A. Arthur|1829-10-05| Fairfield| Vermont|1881-09-19|1885-03-04|
| Benjamin Harrison|1833-08-20| North Bend| Ohio|1889-03-04|1893-03-04|
| John Adams|1735-10-30| Braintree| Massachusetts|1797-03-04|1801-03-04|
| Woodrow Wilson|1856-12-28| Staunton| Virginia|1913-03-04|1921-03-04|
| Donald Trump|1946-06-14| Queens| New York|2017-01-20| null|
| Bill Clinton|1946-08-19| Hope| Arkansas|1993-01-20|2001-01-20|
| James Monroe|1758-04-28| Monroe Hall| Virginia|1817-03-04|1825-03-04|
| William Howard Taft|1857-09-15| Cincinnati| Ohio|1909-03-04|1913-03-04|
|Dwight D. Eisenhower|1890-10-14| Denison| Texas|1953-01-20|1961-01-20|
| John F. Kennedy|1917-05-29| Brookline| Massachusetts|1961-01-20|1963-11-22|
| James K. Polk|1795-11-02| Pineville| North Carolina|1845-03-04|1849-03-04|
| Rutherford B. Hayes|1822-10-04| Delaware| Ohio|1877-03-04|1881-03-04|
| Ronald Reagan|1911-02-06| Tampico| Illinois|1981-01-20|1989-01-20|
| Gerald R. Ford|1913-07-14| Omaha| Nebraska|1974-08-09|1977-01-20|
| John Quincy Adams|1767-07-11| Braintree| Massachusetts|1825-03-04|1829-03-04|
| Franklin Pierce|1804-11-23| Hillsborough| New Hampshire|1853-03-04|1857-03-04|
| Ulysses S. Grant|1822-04-27| Point Pleasant| Ohio|1869-03-04|1877-03-04|
| Millard Fillmore|1800-01-07| Summerhill| New York|1850-07-09|1853-03-04|
| Andrew Johnson|1808-12-29| Raleigh| North Carolina|1865-04-15|1869-03-04|
| Barack Obama|1961-08-04| Honolulu| Hawaii|2009-01-20|2017-01-20|
+——————–+———-+——————-+——————–+———-+———-+

Connect to Oracle from Spark

RDBMS: Oracle
Jar Required: ojdbc7.jar

Step 1: Open Spark shell and add jar 
spark-shell --jars /tmp/ora/ojdbc7.jar

Step 2: Pass required parameters and create a dataframe with data from Oracle
scala> val df_ora = sqlContext.load(“jdbc”, Map(“url” -> “jdbc:oracle:thin:username/password@192.168.200.203:1521/SID”,”dbtable” -> “ora_usa_prez”,”driver” -> “oracle.jdbc.driver.OracleDriver”));

Step 3: Check data from Dataframe
df_ora.show(50)
+——————–+——————–+——————-+——————–+——————–+——————–+
| PRES_NAME| PRES_DOB| PRES_BP| PRES_BS| PRES_IN| PRES_OUT|
+——————–+——————–+——————-+——————–+——————–+——————–+
| George Washington|1732-02-22 00:00:…|Westmoreland County| Virginia|1789-04-30 00:00:…|1797-03-04 00:00:…|
| John Adams|1735-10-30 00:00:…| Braintree| Massachusetts|1797-03-04 00:00:…|1801-03-04 00:00:…|
| Thomas Jefferson|1743-04-13 00:00:…| Shadwell| Virginia|1801-03-04 00:00:…|1809-03-04 00:00:…|
| James Madison|1751-03-16 00:00:…| Port Conway| Virginia|1809-03-04 00:00:…|1817-03-04 00:00:…|
| James Monroe|1758-04-28 00:00:…| Monroe Hall| Virginia|1817-03-04 00:00:…|1825-03-04 00:00:…|
| John Quincy Adams|1767-07-11 00:00:…| Braintree| Massachusetts|1825-03-04 00:00:…|1829-03-04 00:00:…|
| Andrew Jackson|1767-03-15 00:00:…| Waxhaws Region|South/North Carolina|1829-03-04 00:00:…|1837-03-04 00:00:…|
| Martin Van Buren|1782-12-05 00:00:…| Kinderhook| New York|1837-03-04 00:00:…|1841-03-04 00:00:…|
|William Henry Har…|1773-02-09 00:00:…|Charles City County| Virginia|1841-03-04 00:00:…|1841-04-04 00:00:…|
| John Tyler|1790-03-29 00:00:…|Charles City County| Virginia|1841-04-04 00:00:…|1845-03-04 00:00:…|
| James K. Polk|1795-11-02 00:00:…| Pineville| North Carolina|1845-03-04 00:00:…|1849-03-04 00:00:…|
| Zachary Taylor|1784-11-24 00:00:…| Barboursville| Virginia|1849-03-04 00:00:…|1850-07-09 00:00:…|
| Millard Fillmore|1800-01-07 00:00:…| Summerhill| New York|1850-07-09 00:00:…|1853-03-04 00:00:…|
| Franklin Pierce|1804-11-23 00:00:…| Hillsborough| New Hampshire|1853-03-04 00:00:…|1857-03-04 00:00:…|
| James Buchanan|1791-04-23 00:00:…| Cove Gap| Pennsylvania|1857-03-04 00:00:…|1861-03-04 00:00:…|
| Abraham Lincoln|1809-02-12 00:00:…| Sinking spring| Kentucky|1861-03-04 00:00:…|1865-04-15 00:00:…|
| Andrew Johnson|1808-12-29 00:00:…| Raleigh| North Carolina|1865-04-15 00:00:…|1869-03-04 00:00:…|
| Ulysses S. Grant|1822-04-27 00:00:…| Point Pleasant| Ohio|1869-03-04 00:00:…|1877-03-04 00:00:…|
| Rutherford B. Hayes|1822-10-04 00:00:…| Delaware| Ohio|1877-03-04 00:00:…|1881-03-04 00:00:…|
| James A. Garfield|1831-11-19 00:00:…| Moreland Hills| Ohio|1881-03-04 00:00:…|1881-09-19 00:00:…|
| Chester A. Arthur|1829-10-05 00:00:…| Fairfield| Vermont|1881-09-19 00:00:…|1885-03-04 00:00:…|
| Grover Cleveland|1837-03-18 00:00:…| Caldwell| New Jersey|1885-03-04 00:00:…|1889-03-04 00:00:…|
| Benjamin Harrison|1833-08-20 00:00:…| North Bend| Ohio|1889-03-04 00:00:…|1893-03-04 00:00:…|
| Grover Cleveland|1837-03-18 00:00:…| Caldwell| New Jersey|1893-03-04 00:00:…|1897-03-04 00:00:…|
| William McKinley|1843-01-29 00:00:…| Niles| Ohio|1897-03-04 00:00:…|1901-09-14 00:00:…|
| Theodore Roosevelt|1858-10-27 00:00:…| Manhattan| New York|1901-09-14 00:00:…|1909-03-04 00:00:…|
| William Howard Taft|1857-09-15 00:00:…| Cincinnati| Ohio|1909-03-04 00:00:…|1913-03-04 00:00:…|
| Woodrow Wilson|1856-12-28 00:00:…| Staunton| Virginia|1913-03-04 00:00:…|1921-03-04 00:00:…|
| Warren G. Harding|1865-11-02 00:00:…| Blooming Grove| Ohio|1921-03-04 00:00:…|1923-08-02 00:00:…|
| Calvin Coolidge|1872-07-04 00:00:…| Plymouth| Vermont|1923-08-02 00:00:…|1929-03-04 00:00:…|
| Herbert Hoover|1874-08-10 00:00:…| West Branch| Iowa|1929-03-04 00:00:…|1933-03-04 00:00:…|
|Franklin D. Roose…|1882-01-30 00:00:…| Hyde Park| New York|1933-03-04 00:00:…|1945-04-12 00:00:…|
| Harry S. Truman|1884-05-08 00:00:…| Lamar| Missouri|1945-04-12 00:00:…|1953-01-20 00:00:…|
|Dwight D. Eisenhower|1890-10-14 00:00:…| Denison| Texas|1953-01-20 00:00:…|1961-01-20 00:00:…|
| John F. Kennedy|1917-05-29 00:00:…| Brookline| Massachusetts|1961-01-20 00:00:…|1963-11-22 00:00:…|
| Lyndon B. Johnson|1908-08-27 00:00:…| Stonewall| Texas|1963-11-22 00:00:…|1969-01-20 00:00:…|
| Richard M. Nixon|1913-01-09 00:00:…| Yorba Linda| California|1969-01-20 00:00:…|1974-08-09 00:00:…|
| Gerald R. Ford|1913-07-14 00:00:…| Omaha| Nebraska|1974-08-09 00:00:…|1977-01-20 00:00:…|
| Jimmy Carter|1924-10-01 00:00:…| Plains| Georgia|1977-01-20 00:00:…|1981-01-20 00:00:…|
| Ronald Reagan|1911-02-06 00:00:…| Tampico| Illinois|1981-01-20 00:00:…|1989-01-20 00:00:…|
| George H. W. Bush|1924-06-12 00:00:…| Milton| Massachusetts|1989-01-20 00:00:…|1993-01-20 00:00:…|
| Bill Clinton|1946-08-19 00:00:…| Hope| Arkansas|1993-01-20 00:00:…|2001-01-20 00:00:…|
| George W. Bush|1946-07-06 00:00:…| New Haven| Connecticut|2001-01-20 00:00:…|2009-01-20 00:00:…|
| Barack Obama|1961-08-04 00:00:…| Honolulu| Hawaii|2009-01-20 00:00:…|2017-01-20 00:00:…|
| Donald Trump|1946-06-14 00:00:…| Queens| New York|2017-01-20 00:00:…| null|
+——————–+——————–+——————-+——————–+——————–+——————–+

One thing to observe here is that Dataframe from Oracle table has converted DATE datatype to Timestamp.

Now you know how to read data from different RDBMS using spark. This is very useful when you quickly need some data from EDW to compare it with HIVE table.

Leave a Reply

Your email address will not be published. Required fields are marked *