Russell Spitzer's Blog: Russ Blog Home

Utilizing Multiple C* Clusters using the Spark Cassandra Connector

14 January 2016

Most folks don’t know that the Spark Cassandra Connector can connect to multiple Cassandra clusters at the same time. This lets us move data between Cassandra clusters, or even manage multiple clusters from the same application (or from the Spark shell).

Read Post

Writing to the Driver FileSystem using Spark

15 December 2015

Spark loves distributed filesystems, but sometimes you just want to write to wherever the driver is running. You may try to use a file:// path or something of that nature and run into a lot of strange errors, or find files located in random places. Never fear, there is a simple solution with toLocalIterator.

Read Post

Working with Cassandra UDTs and Spark Dataframes

09 December 2015

We just fixed a bug that was stopping DataFrames from being able to write into Cassandra UDTs. But I noticed there aren’t a lot of great documents on how this works. Here is a quick example of how you can make a DataFrame that can insert into a C* UDT.

Read Post

Folding with Spark

25 November 2015

I felt the need to write this post after I read a blog post that did a great job of explaining how fold and foldByKey work. The only thing I thought was missing from that rundown was a bit of detail on how these operations work differently from their Scala counterparts.
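One difference that is easy to see without a cluster: Spark's RDD fold applies the zero value once per partition and once more when merging the partition results, while a fold over a local Scala collection uses it exactly once. The sketch below is plain Python that imitates this behaviour, not Spark code. The names local_fold and partitioned_fold are made up for illustration, and the two-partition split is an assumption.

```python
from functools import reduce


def local_fold(values, zero, op):
    """Fold a plain sequence: the zero value is used exactly once."""
    return reduce(op, values, zero)


def partitioned_fold(partitions, zero, op):
    """Imitate a distributed fold over pre-split data.

    The zero value seeds every partition, and is used once more
    as the seed when the per-partition results are merged.
    """
    per_partition = [reduce(op, part, zero) for part in partitions]
    return reduce(op, per_partition, zero)


def add(a, b):
    return a + b


data = [1, 2, 3, 4]
partitions = [[1, 2], [3, 4]]  # hypothetical two-partition layout

single = local_fold(data, 1, add)               # zero counted once
split = partitioned_fold(partitions, 1, add)    # zero counted three times
neutral = partitioned_fold(partitions, 0, add)  # a true identity is safe

print(single, split, neutral)
```

With a zero value that is not a true identity for the operation, the result depends on how many partitions the data has, which is why the zero value should be neutral (0 for addition, an empty list for concatenation, and so on).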

Read Post

Exploring Tombstone Behavior with CQL on Cassandra 2.0 and 2.1

23 January 2015

Cassandra 2.1: output from running ./TombstoneExperiment.sh in ~/repos/RussellSpitzer.github.io/ExampleScripts.

Read Post

Loading a CassandraRDD into a HiveContext in Spark

10 January 2015

Spark is awesome and I love it. SparkSQL is also awesome, but unfortunately it is not fully mature. Although the folks at Databricks have talked about how it will eventually become a full ANSI SQL language, that time is honestly far off. This means that most folks will want to fall back on HiveQL for their more complicated queries on Spark.

Read Post

It isn't fast enough

28 October 2014

I’m often confronted with people asking me why a certain technology or program isn’t fast enough. This is a good question, since we should always be interested in making things fast. But usually I hear these questions in response to a perceived slowness that is hard to define or can only be explained in terms of other technologies.

Read Post

Setting up github.io.pages

20 August 2014

Setting up github.io.pages with Jekyll

I was a little tired of how difficult it was to manage my WordPress-style blog, and I love git. So after I saw a couple of my friends with their awesome github.io.pages blogs, I knew I needed one as well.

Read Post
