Apache NiFi is open source software for automating and managing the flow of data between different systems. It provides a web-based UI for...
Monday, September 21, 2020
Wednesday, September 2, 2020
What is an RDD and Why Spark needs it?
The Resilient Distributed Dataset (RDD) is the core of Apache Spark. It is the fundamental data structure on top of which all the Spark comp...
Tuesday, September 1, 2020
Deployment modes and Job submission in Apache Spark
Spark is a scheduling, monitoring, and distribution engine; it can also act as a resource manager for its own jobs. When Spark runs a job by its...
Saturday, August 29, 2020
Capture bad records while loading csv in spark Dataframe
Loading a CSV file and capturing all the bad records is a very common requirement in ETL projects. Most of the relational database loaders ...