

Testing a Hive Patch on a Local System
[...] I needed to get a Hive cluster running my code and a Confluent cluster that could output Avro messages in the proper format to test.

David McGinnis
Mar 17, 2020 · 6 min read


Running Garbage Collection on Your Cluster
At a high level, [CGC] is merely going through the cluster, taking inventory of the data and processes that run on the cluster...

David McGinnis
Oct 22, 2019 · 4 min read


Writing Environment Agnostic Code
[...] we'll discuss some of the ways we can write environment agnostic code, which can be run on any environment within your enterprise.

David McGinnis
Oct 16, 2019 · 6 min read


YARN Capacity Scheduler and Node Labels Part 2
How do we ensure that GPU jobs run on worker nodes with GPUs without buying expensive GPUs for all of our worker nodes?

David McGinnis
Sep 29, 2019 · 4 min read


YARN Capacity Scheduler and Node Labels Part 1
I'm going to explore exactly how YARN works with queues, and the various mechanisms available to control how YARN does this.

David McGinnis
Sep 22, 2019 · 5 min read


Machine Learning Solutions: Recommender System Design
With the help of tools like Spark’s MLlib ... [making a recommendation engine] is something that many companies have done, and you can too.

David McGinnis
Sep 1, 2019 · 10 min read