

Testing a Hive Patch on a Local System
[...] I needed to get a Hive cluster running my code and a Confluent cluster that could output Avro messages in the proper format to test.

David McGinnis
Mar 17, 2020
6 min read
315 views
0 comments


A Crash Course in Proper Oozie Usage
[...] focus on best practices such as when and why you should use Oozie, and when to use bundles.

David McGinnis
Mar 10, 2020
4 min read
378 views
0 comments


Debugging From The Field: The Case of the Empty Files
A team at a client was using Spark to read and write to a Kafka topic. [...] files that would be written that were completely empty.

David McGinnis
Feb 25, 2020
7 min read
343 views
1 comment


Debugging From The Field: The Case of the Ignored Configuration Change
We made the change on a Sunday, but four days later, the number of files had not appreciably changed in the YARN logs directory.

David McGinnis
Dec 3, 2019
5 min read
163 views
0 comments


A Crash Course in YARN Log Aggregation
The system which maintains the application logs in HDFS is called the Log Aggregation system and is flexible [...]

David McGinnis
Nov 26, 2019
4 min read
7,855 views
1 comment


Debugging From the Field: SmartSense Activity Explorer Stops Working
This client didn't use the activity explorer much [...] I tried to run one of the paragraphs and immediately [failed].

David McGinnis
Nov 12, 2019
4 min read
302 views
0 comments


Running Garbage Collection on Your Cluster
At a high level, [CGC] is merely going through the cluster, taking inventory of the data and processes that run on the cluster...

David McGinnis
Oct 22, 2019
4 min read
110 views
0 comments


YARN Capacity Scheduler and Node Labels Part 3
We will discuss how partitions play with YARN queues. Finally, we will return to the example given in the first part of this series.

David McGinnis
Oct 6, 2019
5 min read
898 views
0 comments


YARN Capacity Scheduler and Node Labels Part 2
How do we ensure that GPU jobs run on worker nodes with GPUs without buying expensive GPUs for all of our worker nodes?

David McGinnis
Sep 29, 2019
4 min read
901 views
0 comments


YARN Capacity Scheduler and Node Labels Part 1
I'm going to explore exactly how YARN works with queues, and the various mechanisms available to control how YARN does this.

David McGinnis
Sep 22, 2019
5 min read
2,500 views
0 comments


Debugging From the Field: Sudden Kerberos Failure in HiveServer2 Instance
A client has a medium-sized kerberized HDP 2.6.0 installation. Hive Interactive is disabled and Hive is set up to use HTTP transport mode.

David McGinnis
Sep 8, 2019
4 min read
592 views
0 comments