David McGinnis, Mar 17, 2020, 6 min
Testing a Hive Patch on a Local System
[...] I needed to get a Hive cluster running my code and a Confluent cluster that could output Avro messages in the proper format to test.

David McGinnis, Oct 22, 2019, 4 min
Running Garbage Collection on Your Cluster
At a high level, [CGC] is merely going through the cluster, taking inventory of the data and processes that run on the cluster...

David McGinnis, Oct 16, 2019, 6 min
Writing Environment Agnostic Code
[...] we'll discuss some of the ways we can write environment agnostic code, which can be run on any environment within your enterprise.

David McGinnis, Sep 29, 2019, 4 min
YARN Capacity Scheduler and Node Labels Part 2
How do we ensure that GPU jobs run on worker nodes with GPUs without buying expensive GPUs for all of our worker nodes?

David McGinnis, Sep 22, 2019, 5 min
YARN Capacity Scheduler and Node Labels Part 1
I'm going to explore exactly how YARN works with queues, and the various mechanisms available to control how YARN does this.

David McGinnis, Sep 1, 2019, 10 min
Machine Learning Solutions: Recommender System Design
With the help of tools like Spark’s MLlib ... [making a recommendation engine] is something that many companies have done and you can too.