A few months ago, Google released a paper detailing the infrastructure behind its new database technology, Spanner. Spanner allows a single database to span multiple datacenters.
Like most of the new tech we discussed over the semester, Spanner will be adopted quickly by others in the industry. Obviously, only certain applications will make use of such large-scale infrastructure, but for the likes of Facebook, Amazon, eBay, and others with globe-spanning services, this will provide a huge improvement. Wired spells out some of the potential benefits. Google currently uses Spanner in the backend of its advertising services, where a timespan of a few microseconds makes a difference.
Spanner uses a new Google API called TrueTime to prevent data conflicts across its servers. TrueTime draws its timestamps from two sources: a GPS antenna located atop the facility and an atomic clock. Some of the paper is a little too dense to parse, but the part about TrueTime isn’t too bad. Instead of logging time as a single value, TrueTime logs it as an interval of possible times, which accounts for the uncertainty in any clock reading. And instead of simply asking for data, you must associate a time with your request; the system then looks for a “snapshot” of the data that satisfies that timestamp.
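To make those two ideas concrete, here is a minimal Python sketch of a time reading stored as an interval and of a read that asks for data "as of" a timestamp. It is not Google's code or the real TrueTime API: the names `TimeInterval`, `read_clock`, `VersionedStore`, and `read_at`, and the 4 ms uncertainty value, are all made up for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimeInterval:
    """A clock reading expressed as [earliest, latest] instead of one value."""
    earliest: float
    latest: float

    def definitely_before(self, other: "TimeInterval") -> bool:
        # Ordering is only certain when the two intervals do not overlap.
        return self.latest < other.earliest


def read_clock(local_time: float, uncertainty: float) -> TimeInterval:
    """Hypothetical stand-in for a TrueTime-style clock read (assumption)."""
    return TimeInterval(local_time - uncertainty, local_time + uncertainty)


@dataclass
class VersionedStore:
    """Keeps every timestamped version of a key so reads can target a past time."""
    _versions: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        versions = self._versions.setdefault(key, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])  # keep versions in time order

    def read_at(self, key, timestamp):
        """Return the newest value written at or before `timestamp`, else None."""
        versions = self._versions.get(key, [])
        timestamps = [ts for ts, _ in versions]
        idx = bisect_right(timestamps, timestamp)
        return versions[idx - 1][1] if idx else None


# Two readings 5 ms apart with 4 ms of uncertainty overlap, so their order is unknown.
a = read_clock(100.000, 0.004)
b = read_clock(100.005, 0.004)
print(a.definitely_before(b))  # False

store = VersionedStore()
store.write("bid", 10, timestamp=100.0)
store.write("bid", 12, timestamp=105.0)
print(store.read_at("bid", 103.0))  # 10
```

The read at time 103 returns the older value because the newer one had not been written yet at that moment, which is the "snapshot" behavior described above. A real system would also need to decide what to do when two intervals overlap; this sketch only detects that case.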
It seems clear that big data innovations will be coming from the big firms focused on that area. As knowledge production in this critical area becomes restricted to a small share of those working in computational science, we should hope that companies will continue to be as forthcoming as Google has been.