Data Visualization in Python/Django

Upload: kenluck2001

Post on 05-Dec-2014

Category:

Technology


DESCRIPTION

Slides from a talk given at Aalto University.

TRANSCRIPT

  • 1. Data Visualization in Python/Django, by Kenneth Emeka Odoh
  • 2. Table of Contents: Introduction, Motivation, Method, Appendices, Conclusion, References
  • 3. Introduction: My background. Requirements (Python, Django, Matplotlib, Ajax) and other third-party libraries. What this talk is not about (we are not trying to re-implement Google Analytics). Source code is available at https://github.com/kenluck2001/PyCon2012_Talk. "Everything should be made as simple as
  • 4. Motivation: There is a need to represent business analytics data in graphical form, because a picture is worth a thousand words. Source: en.wikipedia.org
  • 5. Where do we find data? Source: en.wikipedia.org
  • 6. Sources of Data: CSV files, databases
  • 7. Data Processing: Identify the data source. Preprocess the data (removing nulls and wide characters), e.g. with Google Refine. Do the actual data processing. Present the clean data in a descriptive format, i.e. data visualization. See Appendix 1.
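The preprocessing step above can be sketched with the standard library alone. This is a minimal illustration, not the talk's code: the sample data, the column names, and the helper `clean_rows` are assumptions, and missing values are replaced with 0 in the same spirit as Appendix 1.

```python
import csv
import io

# Hypothetical sample data; the column names are assumptions for illustration.
RAW_CSV = "month,solar_radiation\n1,12.5\n2,NA\n,14.0\n3,15.25\n"


def clean_rows(text, missing_markers=("", "NA")):
    """Parse CSV text and return rows of floats, replacing missing values with 0.0."""
    reader = csv.DictReader(io.StringIO(text))
    cleaned = []
    for row in reader:
        cleaned.append({
            key: 0.0 if value.strip() in missing_markers else float(value)
            for key, value in row.items()
        })
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Replacing a missing value with 0 keeps the row count intact but can distort a plot; dropping the row is the usual alternative when that matters.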
  • 8. Visual Representation of Data: Charts / diagrams. Text. Tables. Log files. Source: devk2.wordpress.com Source: elementsdatabase.com
  • 9. Categorization of Data: Real-time (see Appendix 2). Batch-based (see Appendix 2).
  • 10. Rules of Data Collection: Keep data in the most easily processable form, e.g. a database or CSV. Store collected data with a timestamp. Gather only data that is relevant to the business needs. Remove old data.
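As a rough sketch of the "keep data with a timestamp" rule, each record can be written to CSV with an ISO 8601 timestamp and parsed back later. The field layout (timestamp, URL, HTTP status) and the helper `append_record` are assumptions made for this example, not part of the talk's projects.

```python
import csv
import io
from datetime import datetime


def append_record(buffer, url, status, collected_at):
    """Write one measurement, prefixed with the time it was collected."""
    csv.writer(buffer).writerow([collected_at.isoformat(), url, status])


# An in-memory buffer stands in for a real CSV file.
buffer = io.StringIO()
append_record(buffer, "http://example.com/", 200, datetime(2012, 10, 1, 12, 0, 0))
append_record(buffer, "http://example.com/", 503, datetime(2012, 10, 1, 12, 5, 0))

# Read the records back and restore the timestamps as datetime objects.
buffer.seek(0)
records = [
    (datetime.fromisoformat(ts), url, int(status))
    for ts, url, status in csv.reader(buffer)
]
```

With the timestamp stored on every row, grouping by week or discarding rows older than some cutoff becomes a simple comparison.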
  • 11. Where is the data visualization done? Server: see Appendices 2 to 6. Client: examples of JavaScript libraries are D3.js ( http://d3js.org/ ) and gRaphael.js ( http://g.raphaeljs.com/ ).
  • 12. Factors to Consider for Choice of Visualization: Where do we perform the visualization processing, on the server or on the client? It depends on security and scalability.
  • 13. Tools Needed for Data Analysis: csvkit ( http://csvkit.readthedocs.org/en/latest/ ), networkx ( http://networkx.lanl.gov/ ), PySAL ( http://code.google.com/p/pysal/ )
  • 14. Appendices: Let the code begin. Source: caseinsights.com
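If the drawing is left to the client, the server's job shrinks to delivering aggregated data. The sketch below shows one possible shape for that hand-off: counting values and serializing them as JSON that a client-side library could draw. The status codes, the payload layout, and the helper `to_chart_payload` are assumptions for illustration only.

```python
import json
from collections import Counter

# Hypothetical HTTP status codes collected by a monitor.
statuses = [200, 200, 404, 200, 500, 404]


def to_chart_payload(values):
    """Aggregate raw values into label/count pairs and serialize them as JSON."""
    counts = Counter(values)
    return json.dumps(
        [{"label": str(key), "count": counts[key]} for key in sorted(counts)]
    )


payload = to_chart_payload(statuses)
print(payload)
```

Sending only the aggregate keeps the raw records on the server, which touches both factors named above: less data is exposed, and rendering cost moves to the browser.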
  • 15. Appendix 1
    # This describes a scatter plot of solar radiation against the month.
    # It aims to illustrate the steps of data gathering.
    # The CSV file comes from the data science hackathon website.
    # The source code is available in a folder named plotCode.
    import csv
    from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
    from matplotlib.figure import Figure

    def prepareList(month_most_common_list):
        """Prepare the input for processing by removing all unnecessary values.
        Replace "NA" with 0."""
        output_list = []
        for x in month_most_common_list:
            if x != "NA":
                output_list.append(x)
            else:
                output_list.append(0)
        return output_list
  • 16. Appendix 1 (contd.)
    def plotSolarRadiationAgainstMonth(filename):
        # newline="" is what the csv module expects for files opened in text mode (Python 3).
        with open(filename, newline="") as csv_file:
            trainRowReader = csv.reader(csv_file, delimiter=",")
            month_most_common_list = []
            Solar_radiation_64_list = []
            for row in trainRowReader:
                month_most_common_list.append(row[3])
                Solar_radiation_64_list.append(row[6])
        # Convert all elements to float, skipping the first element,
        # which is the description of the field.
        month_most_common_list = [float(i) for i in prepareList(month_most_common_list)[1:]]
        Solar_radiation_64_list = [float(i) for i in prepareList(Solar_radiation_64_list)[1:]]
        fig = Figure()
        ax = fig.add_subplot(111)
        title = "Scatter Diagram of solar radiation against month of the year"
        ax.set_xlabel("Most common month")
        ax.set_ylabel("Solar Radiation")
        fig.suptitle(title, fontsize=14)
        try:
            ax.scatter(month_most_common_list, Solar_radiation_64_list)
            # Other kinds of plots are possible, e.g. bar charts, pie charts, histograms.
        except ValueError:
            pass
        canvas = FigureCanvas(fig)
        canvas.print_figure("solarRadMonth.png", dpi=500)

    if __name__ == "__main__":
        plotSolarRadiationAgainstMonth("TrainingData.csv")
  • 17. Appendix 2
    # From the project in the folder named WebMonitor.
    class LoadEvent:
        def fillMonitorModel(self):
            # Save one Monitor row per collected measurement.
            for monObj in self.monitorObjList:
                mObj = Monitor(url=monObj[2], httpStatus=monObj[0],
                               responseTime=monObj[1], contentStatus=monObj[5])
                mObj.save()
    # Also see the examples in tasks.py in the project named YAAS.
    # This shows how the analytic tables are loaded with real-time data.
  • 18. Appendix 3
    from django.contrib.admin.views.decorators import staff_member_required
    from django.http import HttpResponse
    from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
    from matplotlib.figure import Figure
    from YAAS.stats.models import RegisteredUser, OnlineUser, StatBid

    # Scatter diagram of the number of bids made against the number of online users.
    # Weekly report.
    @staff_member_required
    def weeklyScatterOnlinUsrBid(request, week_no):
        page_title = "Weekly Scatter Diagram based on Online user versus Bid"
        weekno = week_no
        fig = Figure()
        ax = fig.add_subplot(111)
        # `stat` is not defined on this slide; presumably a helper in the YAAS project.
        year = stat.getYear()
        onlUserObj = OnlineUser.objects.filter(week=weekno).filter(year=year)
        bidObj = StatBid.objects.filter(week=weekno).filter(year=year)
        onlUserlist = list(onlUserObj.values_list("no_of_online_user", flat=True))
        bidlist = list(bidObj.values_list("no_of_bids", flat=True))
        title = "Scatter Diagram of number of online User against number of bids (week {0}){1}".format(weekno, year)
        ax.set_xlabel("Number of online Users")
        ax.set_ylabel("Number of Bids")
        fig.suptitle(title, fontsize=14)
        try:
            ax.scatter(onlUserlist, bidlist)
        except ValueError:
            pass
        # Render the figure straight into the HTTP response as a PNG image.
        canvas = FigureCanvas(fig)
        response = HttpResponse(content_type="image/png")
        canvas.print_png(response)
        return response
    More information can be found in YAAS/graph/ (the folder named "graph").
  • 19. Appendix 4
    # Example of how old database records may be deleted to recover some space.
    # From the folder named YAAS; check task.py.
    from datetime import datetime, timedelta

    @periodic_task(run_every=crontab(hour=1, minute=30, day_of_week=0))
    def deleteOldItemsandBids():
        # Cutoff: anything that ended 120 or more days ago.
        hunderedandtwentydays = datetime.today() - timedelta(days=120)
        Item.objects.filter(end_date__lte=hunderedandtwentydays).delete()
        Bid.objects.filter(end_date__lte=hunderedandtwentydays).delete()
    # Populate the registereduser and onlineuser models at regular intervals.
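The cutoff logic of Appendix 4 can be tried outside Django and Celery. The following is a minimal sketch using the standard library's `sqlite3` in place of the Django ORM; the table name `item`, its columns, and the helper `delete_old_items` are assumptions for illustration, and dates are stored as ISO 8601 strings so that string comparison matches chronological order.

```python
import sqlite3
from datetime import datetime, timedelta


def delete_old_items(conn, now, max_age_days=120):
    """Delete rows whose end_date is at or before the cutoff; return how many were removed."""
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    cursor = conn.execute("DELETE FROM item WHERE end_date <= ?", (cutoff,))
    conn.commit()
    return cursor.rowcount


# In-memory database with a hypothetical table for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (name TEXT, end_date TEXT)")
now = datetime(2012, 10, 1)
conn.executemany(
    "INSERT INTO item VALUES (?, ?)",
    [
        ("old", (now - timedelta(days=200)).isoformat()),
        ("recent", (now - timedelta(days=10)).isoformat()),
    ],
)

deleted = delete_old_items(conn, now)
remaining = [row[0] for row in conn.execute("SELECT name FROM item")]
```

Passing `now` in as a parameter, rather than calling `datetime.today()` inside the function, makes the pruning rule easy to check with fixed dates.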
  • 20. Appendix 5: Check the project in YAAS/stats/ for more information on statistical processing.
  • 21. Appendix 6
    # How to refresh the views in Django to keep the charts updated. See the WebMonitor project.
    {% extends "base.html" %}
    {% block site_wrapper %}
    Updating tables ...
    {% endblock %}
  • 22. References: Python documentation ( http://www.python.org/ ), Django documentation ( https://www.djangoproject.com/ ), Stack Overflow ( http://stackoverflow.com/ ), Celery documentation ( http://ask.github.com/celery/ ). Pictures: email logo ( http://ambrosedesigns.co.uk ), blog logo ( http://sociolatte.com ).
  • 23. Thanks for listening. Follow me: @kenluck2001, http://kenluck2001.tumblr.com/, https://github.com/kenluck2001