Testing a Hive Patch on a Local System
[...] I needed to get a Hive cluster running my code and a Confluent cluster that could output Avro messages in the proper format to test.

David McGinnis
Mar 17, 2020 · 6 min read


A Crash Course in Proper Oozie Usage
[...] focus on best practices such as when and why you should use Oozie, and when to use bundles.

David McGinnis
Mar 10, 2020 · 4 min read


Debugging From The Field: The Case of the Empty Files
A team at a client was using Spark to read and write to a Kafka topic. [...] files that would be written that were completely empty.

David McGinnis
Feb 25, 2020 · 7 min read


Debugging From The Field: The Case of the Ignored Configuration Change
We made the change on a Sunday, but four days later, the number of files had not appreciably changed in the YARN logs directory.

David McGinnis
Dec 3, 2019 · 5 min read


A Crash Course in YARN Log Aggregation
The system which maintains the application logs in HDFS is called the Log Aggregation system and is flexible [...]

David McGinnis
Nov 26, 2019 · 4 min read


Debugging from the Field: Smartsense Activity Explorer Stops Working
This client didn't use the activity explorer much [...] I tried to run one of the paragraphs and immediately [failed].

David McGinnis
Nov 12, 2019 · 4 min read


Running Garbage Collection on Your Cluster
At a high level, [CGC] is merely going through the cluster, taking inventory of the data and processes that run on the cluster...

David McGinnis
Oct 22, 2019 · 4 min read


YARN Capacity Scheduler and Node Labels Part 3
We will discuss how partitions play with YARN queues. Finally, we will return to the example given in the first part of this series.

David McGinnis
Oct 6, 2019 · 5 min read


YARN Capacity Scheduler and Node Labels Part 2
How do we ensure that GPU jobs run on worker nodes with GPUs without buying expensive GPUs for all of our worker nodes?

David McGinnis
Sep 29, 2019 · 4 min read


YARN Capacity Scheduler and Node Labels Part 1
I'm going to explore exactly how YARN works with queues, and the various mechanisms available to control how YARN does this.

David McGinnis
Sep 22, 2019 · 5 min read


Debugging From the Field: Sudden Kerberos Failure in HiveServer2 Instance
A client has a medium-sized kerberized HDP 2.6.0 installation. Hive Interactive is disabled and Hive is set up to use HTTP transport mode.

David McGinnis
Sep 8, 2019 · 4 min read