4. Calculating requests on each bucket

Following is a snippet of the code I used to calculate the number of requests on each bucket:

import subprocess

import boto
import boto.s3

LOG_FILE = '/home/mayank/Desktop/s3_costing/s3_logs/access.log'

def count_requests(operation, bucket_name):
    # count log lines that mention both the operation token and the bucket
    proc = subprocess.Popen(
        "grep -r " + operation + " " + LOG_FILE +
        " | grep " + bucket_name + " | wc -l",
        stdout=subprocess.PIPE, shell=True)
    out, err = proc.communicate()
    return int(out)

region_bucks = boto.s3.connect_to_region('ap-southeast-1')
for bucket in region_bucks.get_all_buckets():
    name = bucket.name  # cleaner than slicing str(bucket)
    print 'For Bucket : %s' % name

    for op, label in [('REST.POST', 'POST'), ('REST.PUT', 'PUT'),
                      ('REST.COPY', 'COPY'), ('REST.HEAD', 'HEAD')]:
        print 'Number of %s requests on %s : %s' % (label, name,
                                                    count_requests(op, name))

    # REST.GET also matches REST.GET.BUCKET (LIST), so subtract it out
    list_count = count_requests('REST.GET.BUCKET', name)
    get_count = count_requests('REST.GET', name) - list_count
    print 'Number of LIST requests on %s : %s' % (name, list_count)
    print 'Number of GET requests on %s : %s' % (name, get_count)
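As an alternative to shelling out to grep for every operation, the same per-bucket counts can be computed in pure Python. This is a minimal sketch, assuming log lines in the S3 server-access format, where each line contains the bucket name and an operation token such as REST.PUT.OBJECT; the count_operations name is my own:

```python
from collections import Counter

# Operation tokens matched by the grep commands above. Note that
# 'REST.GET' is a substring of 'REST.GET.BUCKET', so LIST lines are
# counted under both -- subtract them, just as the grep version does.
OPS = ['REST.POST', 'REST.PUT', 'REST.GET.BUCKET',
       'REST.COPY', 'REST.GET', 'REST.HEAD']

def count_operations(log_path, bucket_name):
    """Count each S3 operation token on lines mentioning the bucket."""
    counts = Counter()
    with open(log_path) as log:
        for line in log:
            if bucket_name not in line:
                continue
            for op in OPS:
                if op in line:
                    counts[op] += 1
    return counts
```

GET-only requests are then counts['REST.GET'] - counts['REST.GET.BUCKET'], mirroring the subtraction above.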

5. Calculating storage on each bucket

When we calculate the size of a bucket, it is not always equal to the size shown in AWS billing. That is because billing reflects the data AWS has charged you for so far in the month. The following formula estimates the chargeable size as of the current date:
(Total size in GB / total number of days in the month) * (day of the month)
In my case the storage measured 5 GB on the 24th of the month, while AWS billing showed a total of 3.950 GB. According to the formula, (5 / 30) * 24 = 4 GB, which gives an error of about 1.25%.
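The proration is simple enough to check directly in Python; the figures below (5 GB measured on day 24 of a 30-day month, 3.950 GB billed) are the ones from my example:

```python
def prorated_size(total_gb, day_of_month, days_in_month):
    """Estimate the chargeable size accrued up to the given day."""
    return (total_gb / float(days_in_month)) * day_of_month

estimated = prorated_size(5.0, 24, 30)        # 4.0 GB
billed = 3.950
error = abs(estimated - billed) / estimated   # 0.0125, i.e. about 1.25%
```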

Following is a snippet of the code I used to calculate the current storage size:

import calendar
import datetime

import boto
import boto.s3

def sizeof_fmt(num):
    # render a byte count in human-readable units
    for unit in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, unit)
        num /= 1024.0

region = boto.s3.connect_to_region('ap-southeast-1')
t_size = 0
for bucket in region.get_all_buckets():
    total_bytes = 0
    for key in bucket:
        total_bytes += key.size
    t_size += total_bytes
    print bucket.name
    print sizeof_fmt(total_bytes)

# work in GB directly instead of slicing the formatted string
total_gb = t_size / (1024.0 ** 3)

today = datetime.date.today()
days_in_month = calendar.monthrange(today.year, today.month)[1]

size = (total_gb / days_in_month) * today.day

print '\n'

print 'Chargeable size as on current date : %s GB' % size

print '\n'

print 'Storage cost as on current date is : %s' % (size * 0.0295)

6. Calculating data transferred from S3

AWS charges for data transferred out of the region. This can be calculated from the logs, because the size of the data transferred by each request is recorded in every log entry. For example, a normal HEAD request that initiated a transfer of 293 bytes records that size in the 15th column of its log line. The total data size can be calculated with the following code:

datatransfer=subprocess.Popen("awk '{print $15}' /home/mayank/Desktop/s3_costing/s3_logs/access.log | awk '{ sum += $1} END {print sum/1024/1024/1024}'",stdout=subprocess.PIPE,shell=True)

datatransfer_f,e=datatransfer.communicate()

print 'Total data transfer is : %s GB' %float(datatransfer_f)
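If you would rather stay in Python than pipe through awk, the same sum can be computed directly. This sketch mirrors the awk command's whitespace splitting (and so shares its caveat that quoted fields containing spaces shift the column positions), skipping entries where the 15th field is '-' rather than a byte count; the total_transfer_gb name is my own:

```python
def total_transfer_gb(log_path):
    """Sum the 15th whitespace-separated field (bytes sent), in GB."""
    total = 0
    with open(log_path) as log:
        for line in log:
            fields = line.split()
            if len(fields) < 15:
                continue
            value = fields[14]      # 15th column, zero-indexed
            if value.isdigit():     # skip '-' placeholders
                total += int(value)
    return total / (1024.0 ** 3)
```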

Hope this blog has given you a proper understanding of how to analyze S3 bucket access logs to get a detailed picture of the buckets you host in AWS S3.
