From the course: Amazon Web Services: Data Services

Scenario: Use cloud file storage - Amazon Web Services (AWS) Tutorial

Our first architectural scenario is file storage, and I put this one first on purpose. I'm most often called in as a Big Data and Cloud architect to talk about moving relational workloads or data warehouses to the Cloud, or about behavioral, greenfield Big Data projects that the enterprise thinks should be based on Hadoop. I generally advise against moving any of these mission-critical scenarios as a first step toward Cloud-based data storage. The reason is complexity. When your teams are getting used to working with Cloud-based solutions, it's really important to start with something simple, so that you can learn the vernacular, the tools, and the processes of the Cloud, and so that you can have an early success. The simplest possible scenario is file storage. So I always start customers with file storage, and that has been key to my success in helping them move subsequent, increasingly complex data projects to the Cloud.

So let's look at this architecture. On the left, the gray represents source data. It's on premises, or it could be located somewhere else, but it's not in the Amazon Cloud. So you've got Mobile Client, Server, Users, Clients, and, in this case, Tape Storage. Within the Amazon Cloud you have a number of services, and the core services for file storage are S3, which is warm storage, and Glacier. What I commonly see is that my customers are using S3 but may not be aware of the S3 properties we discussed in the movie about S3, in particular the bucket properties. My customers are also not using Glacier to the extent I think they should. You may remember from when we looked at S3 versus Glacier pricing that Glacier is much cheaper, because it's designed for archival storage. As customers move more and more data into the Cloud, storage, although cheap, does become a cost factor. Glacier uses the concept of a vault.
Again, I'm sharing the vernacular so that you can use it as well. In addition, I commonly use the Storage Gateway, which is a service that connects your on-premises file sources directly to the Amazon Cloud for ongoing transfer of information, and/or other tools. Some of these are provided by Amazon, like the Import/Export tool, and some are commercial tools, such as those from CloudBerry Lab, which I showed in the partner highlight movie for file storage. I've had really good success with partner tools that are GUI-based and look like Explorer, or like whatever file management system from the OS the end users are comfortable with. It really is important to consider tooling and processes for accessing the Cloud services. There's no problem with using the Console and clicking through it when you're first starting out. Eventually you'll probably want to automate with tools or scripts, but I have many customers who have been using the Cloud for a year or more and still work with the Console in some situations. Given the wealth of features exposed through the S3 Console, that's very common.

In addition to regular file storage, I'll also remind you that there is a cheaper version of S3, which is Reduced Redundancy. When I worked with clients who had a huge amount of data, social gaming for example, storage costs were actually a concern. We partitioned the file data by usage: warm storage in S3 with Standard Redundancy, less frequently used data in Reduced Redundancy, and archival data moved over to Glacier using policies. So we followed the processes I talked about in the movie about file storage, but that's just one case. Other customers simply use S3 and they're done with it. Either way, S3 is also the basis for bringing data into other data services in the Amazon Cloud.
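The usage-based partitioning described above can be sketched in two parts: choosing a storage class when an object is written, and a lifecycle rule that transitions older objects to Glacier. This is a minimal sketch assuming the AWS SDK for Python (boto3); the tier names, the `archive/` prefix, and the 90-day threshold are hypothetical choices, not the values used in the social gaming project. The storage class strings (`STANDARD`, `REDUCED_REDUNDANCY`, `GLACIER`) and the lifecycle dictionary shape follow the S3 API. Note that AWS no longer recommends Reduced Redundancy for new workloads.

```python
from typing import Dict

# Hypothetical mapping from how often data is accessed to an S3 storage class.
STORAGE_CLASS_BY_TIER: Dict[str, str] = {
    "hot": "STANDARD",             # frequently read: warm storage in S3
    "warm": "REDUCED_REDUNDANCY",  # read less often, easily reproduced data
}


def upload_args_for_tier(tier: str) -> Dict[str, str]:
    """Return the ExtraArgs dict to pass to boto3's upload_file for a usage tier."""
    if tier not in STORAGE_CLASS_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    return {"StorageClass": STORAGE_CLASS_BY_TIER[tier]}


def archive_lifecycle(prefix: str, days_until_glacier: int) -> dict:
    """Build a LifecycleConfiguration that moves objects under prefix to Glacier."""
    if days_until_glacier < 0:
        raise ValueError("days_until_glacier must be non-negative")
    return {
        "Rules": [
            {
                # Rule IDs are free-form labels; this naming scheme is hypothetical.
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_until_glacier, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }
```

With a boto3 S3 client, the first helper would be used as `s3.upload_file(path, bucket, key, ExtraArgs=upload_args_for_tier("warm"))`, and the policy applied once per bucket with `s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=archive_lifecycle("archive/", 90))`. Keeping the policy on the bucket, rather than in a scheduled script, means S3 performs the archival moves itself.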
So file storage is a great way to get started, and it's an architecture I use with nearly every customer who moves data to the Amazon Cloud.
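The simplest version of this scenario, copying an on-premises file share into S3, can be sketched in a few lines of Python. This is a minimal sketch assuming boto3, whose S3 client provides `upload_file(Filename, Bucket, Key)`; the function name `upload_directory`, the bucket, and the key prefix are hypothetical. The client is passed in as a parameter so the logic can be tried without AWS credentials.

```python
import os
from typing import List


def upload_directory(s3_client, local_dir: str, bucket: str, prefix: str = "") -> List[str]:
    """Upload every file under local_dir to bucket, mirroring the folder layout in the keys.

    s3_client is expected to behave like boto3.client("s3"); only its
    upload_file(Filename, Bucket, Key) method is used.
    Returns the object keys that were uploaded, in upload order.
    """
    uploaded: List[str] = []
    for root, dirs, files in os.walk(local_dir):
        dirs.sort()  # visit subfolders in a predictable order
        for name in sorted(files):
            path = os.path.join(root, name)
            # S3 keys use forward slashes regardless of the local OS.
            rel_key = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = f"{prefix.rstrip('/')}/{rel_key}" if prefix else rel_key
            s3_client.upload_file(path, bucket, key)
            uploaded.append(key)
    return uploaded
```

With boto3 installed and credentials configured, a call might look like `upload_directory(boto3.client("s3"), "./file-share", "example-file-storage-bucket", prefix="on-prem")`. For a first project, a script like this (or a GUI tool) is often enough before moving to the Storage Gateway for ongoing transfers.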