Overview of the Data Processing and Error Analysis System (DPEAS)

DOD Center for Geosciences / Atmospheric Research, Colorado State University. Overview of the Data Processing and Error Analysis System (DPEAS). Andrew S. Jones, Colorado State University (CSU), Cooperative Institute for Research in the Atmosphere (CIRA), DOD Center for Geosciences / Atmospheric Research (CG/AR), Fort Collins, CO

Upload: the-hdf-eos-tools-and-information-center

Post on 14-Dec-2014

DESCRIPTION

Slides 23 and 24 mention experience with HDF-EOS. Source: http://hdfeos.org/workshops/ws04/presentations/Jones/000901%20DPEAS%20Overview%20-%20HDFEOS%20Workshop.ppt

TRANSCRIPT

  • 1. Overview of the Data Processing and Error Analysis System (DPEAS)
    Andrew S. Jones
    Colorado State University (CSU)
    Cooperative Institute for Research in the Atmosphere (CIRA)
    DOD Center for Geosciences / Atmospheric Research (CG/AR)
    Fort Collins, CO
    DOD Center for Geosciences / Atmospheric Research, Colorado State University

  • 2. What is it?
    Data processing system for large data analysis tasks using common PCs
    Features:
    - 2nd generation system (replaces an earlier system called PORTAL (Jones et al., 1995))
    - Parallel implementation
    - Web-based documentation and monitoring
    - Incorporates a Fortran interpreter for input tasks
    - Virtualized I/O subsystem (only memory-resident data structures are needed; data algorithms now function like a model)
    - Able to fail over to redundant hardware
    - Extensible User Module
    - Error Analysis code is still under development
    - Implemented on Windows NT/2000 OS

  • 3. What Does it Do?
    - Global merge capabilities for numerous data sets
    - Current system in operational use for 2+ years at CIRA
    - Current average operational throughput rate using 15 processors on 8 PCs is 17 TB/yr (47 GB/day)
    - Measured max. throughput rate is 2.5 PB/yr (7.1 TB/day)
    Simplifies
    - Powerful abstraction layers allow anyone to write parallel code
    - Virtual I/O subsystem reduces end-user code complexities
    - Users interact using a language most already know
    Easily Scales
    - Limited process cross-talk improves scaling behavior
    - Tests have shown that a 2000-machine cluster is physically feasible
    - Basically just add hardware

  • 4. 10 Data Types Are Currently Supported
    Reads and writes HDF-EOS natively
    - GOES IMAGER (McIDAS)
    - NOAA AVHRR GAC and LAC (McIDAS)
    - NOAA AMSU-A and B (HDF-EOS)
    - DMSP SSM/I (Byte Stream)
    - DMSP SSM/T-2 (NGDC OIS)
    - DMSP OLS (NGDC OIS)
    - TRMM TMI and VIRS (HDF)
    - User extensible (your format here)

  • 5. The Hardware
    [Diagram: storage view and processor view of the clusters. Legend: Primary, Backup, Wn = Worker, Mirrored Set]
    Storage view: Primary, Backup, W1, W2; capacities shown: 66 GB, 240 GB, 240 GB, 240 GB
    Processor view:
    - Operational cluster (24/7): Primary, Backup, W1, W2, W3; 9 processors, 3.0 GFlops, 2.25 GB RAM. Cluster summary: all ingest processes; most higher-level remapped products
    - Experimental cluster (nights only/7): W4, W5, W6; 6 processors, 2.5 GFlops, 2.5 GB RAM. Cluster summary: large global sectors

  • 6. Failover Mode
    [Diagram: the storage and processor views from slide 5, with the Primary marked with an X]
    Failover steps (automated):
    1. Synchronize states
    2. Promote the Backup
    Restore steps (manually initiated):
    1. Demote the Backup
    2. Restore Mirror Set
    3. Synchronize states
    4. Reactivate Primary

  • 7. Module Context
    [Diagram. Labels shown: GUIs; Batch Job Client; Explorer; Command Line; Web Browser; Command Line Script; Command Shell Interpreter; DPEAS Input Script; Other Applications; DPEAS Data Processing Engine; Spawn Subtask; DPEAS Subtask; DPEAS Fortran Interpreter; Batch Job Service; Analysis Modules; DPEAS System State; User Modules; DPEAS HDF-EOS Virtual I/O Subsystem; Translation Modules; Output Modules; "This is DPEAS"; Internet Information Services; Operating System (Windows 2000)]

  • 8. An example of a DPEAS input script file

  • 9. How DPEAS Starts
    - Program Start
    - DPEAS Initialization
    - Interpreting DPEAS script declarations
    - Interpreting DPEAS script executable statements

  • 10. How DPEAS Ends
    - Interpreting DPEAS script executable statements
    - DPEAS Summary
    - Program End

  • 11. How Are Spawned Input Scripts and Jobs Created?
    - All spawned DPEAS jobs run machine-generated DPEAS input scripts, which the data processing engine generates from the Master DPEAS input script (the examples shown previously were examples of DPEAS machine-generated code)
    - This is automated within DPEAS, and the user code goes along for the free ride since it is part of the DPEAS executable (it's like meeting a friendly virus which helps to spread your code along with it)

  • 12. What Does DPEAS Parallelism Look Like?
    - Do loop contents are sent to other resources in parallel
    - The new jobs run the same DPEAS.exe, but execute only the subtask operations
    - Completed jobs allow additional jobs to start

  • 13. The 3 Programming Steps to Add a User Routine to DPEAS
    1. Insert a program hook
       The program hook makes the main DPEAS program aware of the existence of your wrapper routine.
    2. Create a wrapper routine
       The wrapper routine tells the DPEAS Fortran interpreter how to parse and interact with your application subroutine arguments.
    3. Create an application routine
       The application routine performs the real work. You can do anything you want within the application routine.

  • 14. How does the User_Module.f90 relate to my DPEAS Input Scripts?
    [Flow diagram. Labels shown: User_Module.f90 (Program Hook, Wrapper Routine, Application Routine); Compile; Ordinary Fortran Compiler; "DPEAS.exe"; Interpret; DPEAS Input Script; Automated Parallelization Using Self-Replication; DPEAS Input Script Subtask; "DPEAS.exe" Interprets DPEAS Input Script; Return to Master; End]

  • 15. User Example: The user's application routine
    Using the virtual I/O data via pointers
    1. Find each MW channel
    2. Allocate a new output array data structure
    Your science code looks like this

  • 16. User Example: The results: Complete integration
    The new user routine is now fully integrated into DPEAS

  • 17. User Example: The output HDF-EOS file

  • 18. User Example: The output image representation
    150 GHz Effective Emissivity
    Calculated from:
    - GOES-08 IMAGER
    - NOAA-15 AMSU-B

  • 19. User Example: Summary
    Creates 2 new routines:
    - Wrapper routine
    - Application routine
    Requires 25 lines of executable code:
    - 2 Program hook
    - 4 Wrapper routine
    - 19 Application routine
      - 2 Variable assignments
      - 3 Science algorithm
      - 14 Virtual I/O library calls (using only 2 Virtual I/O library routines)
    Small overhead for gaining massive parallelism capabilities!

  • 20. User Example: How complex would the user routine be, if written without the Virtual I/O library?
    Creates 2 new routines:
    - Wrapper routine
    - Application routine
    Requires 59 lines of executable code:
    - 2 Program hook
    - 4 Wrapper routine
    - 53 Application routine
      - 2 Variable assignments
      - 3 Science algorithm
      - 48 HDF-EOS library calls (using 26 HDF-EOS library routines)
    Answer: Without the DPEAS Virtual I/O library there would be:
    - 24 additional I/O routines called by the user (+1200%)
    - 34 additional lines of user code (+136%)

  • 21. User Example: Conclusions
    Implementation Insights
    - Minimal amount of end-user code is required
    - The effort and resources involved are small (the DPEAS program recompiled in < 30 s on the user's desktop)
    Virtual I/O Insights
    - The DPEAS virtual I/O access method is less complex than traditional HDF-EOS file access methods
    End user's perspective
    - End users are protected from technical data format issues
    - End users can develop higher quality code by leveraging shared robust common modules
    - Scalability is greatly enhanced with little end-user effort

  • 22. Summary
    - DPEAS can process large data sets in an efficient manner while maintaining centralized management controls and error handling behaviors
    - Parallelism of the code is automatic and runs on cheap hardware
    - Failover capabilities make the system more robust
    - User code is shielded from complexities of the system using software abstraction layers
    - Little training is needed since user interfaces are in a known scientific language
    - User modules directly access data from memory, which obsolesces traditional file access methods but maintains needed file compatibility

  • 23. What did I learn about HDF-EOS in the process?
    - HDF-EOS is an excellent universal data format
      - It works for all satellite sensor types I have encountered to date (10+)
    - HDF-EOS requires serious software design before the implementation stage
    - It is my experience that Time information as a geo/time field for sectorizing is overrated and is likely to cause future software design headaches with the more complex sensors if encouraged to be the norm

  • 24. My 2 cents: How HDF-EOS could be made even better
    (Hopefully someone has already thought of these things, and this short list will be a reaffirmation)
    Given that GOES data, for example, and other multi-detector sensors can have multiple times for each channel for the same geolocation position, and that, in addition, they can and do interrupt their sensor scans at any time:
    - Treat Time as a data attribute
    - Currently I associate Time and other associated arrays with their principal data array by nomenclature
    - It would be better to use data array attribute groups. Then Time, Calibration, and other associated arrays could be grouped with the data array through the data format.

  • 25. Why Data Attributes?
    - Many data channels have associated information
      - For example, it might be very meaningful to associate the min. and max. of a grid location with its mean value
    - It would be better if there was a standard way of showing that group association, so we don't have to understand each other's unique nomenclatures or intent, or resort to the use of unusual mixed HDF/HDF-EOS data files
    - Data attributes should not be arbitrarily limited in scope, but have full data type ranges
    - Units could also be incorporated through data attributes

  • 26. The End
    [email protected]

  • 27. Appendix
    The following series of slides show how a user can easily modify DPEAS
    1. The user's program hook
    2. The user's wrapper routine
    3. The user's application routine (using the virtual I/O data via pointers)
    4. Usage of the new user routine in a DPEAS input script file
    5. The results: Complete integration

  • 28. User Example: The user's program hook
    2 lines of code

  • 29. User Example: The user's wrapper routine
    4 lines of executable code

  • 30. User Example: The user's application routine
    Using the virtual I/O data via pointers
    1. Find each MW channel
    2. Allocate a new output array data structure
    Your science code looks like this

  • 31. User Example: Usage of the new user routine in a DPEAS input script file

  • 32. User Example: The results: Complete integration
    The new user routine is now fully integrated into DPEAS

  • 33. Where Do I Find DPEAS?
    DPEAS Home Page: http://luna.cira.colostate.edu/DPEAS/DPEAS_frame.htm
    Please direct questions to [email protected]
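
Slide 3 quotes each throughput figure twice, once per year and once per day. A minimal Python sketch of that conversion follows. It assumes a 365-day year and a default of 1000 smaller units per larger unit; the slides state neither, and the function name is illustrative only.

```python
DAYS_PER_YEAR = 365  # assumption: the slides do not say which year length was used


def per_year_to_per_day(amount_per_year, unit_ratio=1000.0):
    """Convert a yearly rate in a large unit to a daily rate in the next smaller unit.

    unit_ratio is the number of smaller units in one larger unit
    (assumed 1000 by default, e.g. 1 TB = 1000 GB; pass 1024 for binary units).
    """
    return amount_per_year * unit_ratio / DAYS_PER_YEAR


# Average operational rate quoted on slide 3: 17 TB/yr, expressed in GB/day
average_gb_per_day = per_year_to_per_day(17)

# Measured maximum quoted on slide 3: 2.5 PB/yr, expressed in TB/day
maximum_tb_per_day = per_year_to_per_day(2.5)
maximum_tb_per_day_binary = per_year_to_per_day(2.5, unit_ratio=1024)

print(f"average: {average_gb_per_day:.1f} GB/day")
print(f"maximum: {maximum_tb_per_day:.1f} TB/day (decimal units)")
print(f"maximum: {maximum_tb_per_day_binary:.1f} TB/day (binary units)")
```

Under these assumptions the average lands close to the 47 GB/day on the slide. The maximum lands a little under the quoted 7.1 TB/day in either unit convention, which is consistent with the yearly figure having been rounded.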
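
Slide 6 lists two step sequences: an automated failover and a manually initiated restore. The sketch below only models the order of those steps in Python. The class, its attributes, and the idea that the mirror set is considered broken after a failover are assumptions made for illustration; none of this is DPEAS code.

```python
class ClusterPair:
    """Hypothetical model of the Primary/Backup pair from slide 6."""

    def __init__(self):
        self.active = "Primary"     # machine currently acting as primary
        self.mirror_intact = True   # state of the mirrored storage set
        self.log = []               # ordered record of the steps taken

    def failover(self):
        """Automated path, taken when the Primary fails."""
        self.log.append("synchronize states")
        self.log.append("promote the Backup")
        self.active = "Backup"
        self.mirror_intact = False  # assumption: the mirror is degraded after failover

    def restore(self):
        """Manually initiated path back to normal operation."""
        if self.active != "Backup":
            raise RuntimeError("restore only applies after a failover")
        self.log.append("demote the Backup")
        self.log.append("restore mirror set")
        self.mirror_intact = True
        self.log.append("synchronize states")
        self.log.append("reactivate Primary")
        self.active = "Primary"


pair = ClusterPair()
pair.failover()
pair.restore()
print(pair.log)
```

Keeping the restore path behind an explicit call mirrors the slide's distinction between the automated and the manual sequence.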
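
Slide 12 describes the parallelism as do-loop contents being sent to other resources, with completed jobs allowing additional jobs to start. A rough Python analogue of that pattern is a bounded worker pool. This is a sketch only: DPEAS spawns copies of DPEAS.exe on other machines, whereas this example uses local threads, and `run_subtask` is a made-up stand-in for the loop body.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtask(index):
    """Stand-in for one do-loop iteration executed as a spawned job."""
    return index * index


def run_loop_in_parallel(n_iterations, max_jobs):
    """Run every loop iteration as its own job, at most max_jobs at a time.

    When a job completes, its worker slot is reused for the next pending
    iteration, so finished jobs let additional jobs start.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # map() returns the results in iteration order, not completion order
        return list(pool.map(run_subtask, range(n_iterations)))


results = run_loop_in_parallel(n_iterations=5, max_jobs=2)
print(results)
```

The loop body never needs to know how many workers exist, which is the property the slides emphasize when they say the user code "goes along for the free ride".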
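
The three programming steps on slide 13 (program hook, wrapper routine, application routine) can be sketched in Python as a small dispatch table. The actual DPEAS user module is Fortran (User_Module.f90) and its code is not reproduced in this transcript, so every name below, including the statement syntax accepted by `interpret`, is hypothetical.

```python
# Step 3: the application routine does the real work on in-memory data.
def scale_channel(values, factor):
    return [v * factor for v in values]


# Step 2: the wrapper routine tells the interpreter how to parse the
# script arguments and pass them to the application routine.
def scale_channel_wrapper(args, data):
    source, target, factor = args
    data[target] = scale_channel(data[source], float(factor))


# Step 1: the program hook makes the main program aware of the wrapper.
USER_ROUTINES = {
    "SCALE_CHANNEL": scale_channel_wrapper,
}


def interpret(statement, data):
    """Toy interpreter: look up the routine named in a script statement and call it."""
    name, *args = statement.split()
    USER_ROUTINES[name.upper()](args, data)


# In-memory data set standing in for the virtual I/O structures
data = {"tb": [1.0, 2.0, 3.0]}
interpret("scale_channel tb tb_scaled 2", data)
print(data["tb_scaled"])
```

Only the hook touches the main program; the wrapper and the application routine stay self-contained, which matches the small line counts reported on slide 19.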
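
The line and routine counts on slides 19 and 20 can be checked with a few lines of Python. The counts are taken from the slides; the dictionary layout and the helper name are illustrative. By this arithmetic, the 34 extra lines are a 136% increase over the 25-line version, which is the same as saying the longer version is 236% of its size.

```python
# Lines of executable code per part, from slides 19 and 20
with_virtual_io = {"program hook": 2, "wrapper routine": 4, "application routine": 19}
without_virtual_io = {"program hook": 2, "wrapper routine": 4, "application routine": 53}

# Distinct I/O library routines the user has to call, from the same slides
routines_with = 2
routines_without = 26


def percent_increase(before, after):
    """Relative increase from before to after, in percent."""
    return 100.0 * (after - before) / before


lines_with = sum(with_virtual_io.values())
lines_without = sum(without_virtual_io.values())

print(f"lines: {lines_with} -> {lines_without} "
      f"(+{lines_without - lines_with}, +{percent_increase(lines_with, lines_without):.0f}%)")
print(f"I/O routines: {routines_with} -> {routines_without} "
      f"(+{routines_without - routines_with}, +{percent_increase(routines_with, routines_without):.0f}%)")
```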
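
Slides 24 and 25 propose grouping Time, Calibration, and similar arrays with their principal data array through attribute groups rather than through naming conventions. A minimal Python sketch of that idea follows. It is not an HDF-EOS API; the class, field names, and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataArray:
    """A data array that carries its associated arrays as named attributes."""
    name: str
    values: List[float]
    units: str = ""
    attributes: Dict[str, "DataArray"] = field(default_factory=dict)

    def attach(self, role: str, array: "DataArray") -> None:
        """Group another full array (Time, Calibration, ...) with this one."""
        self.attributes[role] = array


# Illustrative values only
brightness = DataArray("channel_1", [250.1, 251.3, 249.8], units="K")
brightness.attach("Time", DataArray("channel_1_time", [0.0, 2.7, 5.3], units="s"))
brightness.attach("Calibration", DataArray("channel_1_gain", [1.0, 1.0, 1.0]))

# The association is explicit in the structure, not implied by the array names
for role, array in brightness.attributes.items():
    print(role, array.name, array.units)
```

Because each attribute is itself a full array with its own units, the sketch also reflects the slides' points that attributes should not be limited in data type and that units could be carried the same way.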