Understanding EIS Selections and Data Management
Exploring the world of the National Emissions Inventory (NEI) and the Emissions Inventory System (EIS), this review covers how selections are built, how the default hierarchy can be overridden, how data exclusion rules work, and what the EIS standard reports offer. It also notes the presenter's role as Lead Information Technology Specialist and his own stated limitations, and closes with ways to evaluate the data in Excel and Access.
Presentation Transcript
Reviewing the NEI. Jonathan Miller, Lead Information Technology Specialist, US EPA / OAR / OAQPS / OID / NADG
Topics: (1) How EIS Builds a Selection: What is a Selection?; Default Hierarchy Overrides; Data Exclusion by Rule. (2) Features of EIS Standard Reports: About Filters; About Filtering a Data Set Built by a Selection; About Report Columns; Types of Reports. (3) Using External Applications to Evaluate Data: Excel; Access.
DISCLAIMER. What I'm Good At: Knowing What is In EIS; Able to Bend What is In EIS Into Something Else that May Be Useful. What I'm Not Good At: I'm Not a Data Analyst; I'm Not an Inventory Developer.
Section I: How EIS Builds a Selection. Everything You Didn't Want to Know About the Magic of Selections
Selections: What are They? A Piece of EIS Software That Merges Information From Multiple Data Sets That May Have Overlapping Data on a Per Emissions Process / Pollutant Basis and Saves the Results to a Data Set. The Best Choice is Based on the Data Set Hierarchy Defined for the Given Selection for the Data Category. Data Sets Used in the Hierarchy May Come From Different Inventory Years. The National Emissions Inventory (NEI) is an Example of a Selection. The Responsible Agency Data Set is (Usually) the SLT Data Set for the Given Year. A Selection Only Operates on Annual (and Routine for Point Data) Reporting Period Types. A Given Selection May Exclude All Tagged Data: Data That is Suspect or Otherwise Not Wanted in the Selection Can be Tagged by Inventory Developers.
Example Using the Default Hierarchy:
Facility  Unit  Process  Pollutant  SLT       2017EPA_EGU  2017EPA_Airports  Selected
111       113   114      VOC        100 Tons  98 Tons      300 Tons          100 Tons
111       113   114      CO         (none)    40 Tons      300 Tons          40 Tons
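The default-hierarchy example can be sketched in a few lines of Python. This is only an illustration of the idea, not how EIS implements it: the record layout, the function name `build_selection`, and the placement of the slide's values into rows are assumptions.

```python
# Hypothetical hierarchy, highest-ranking data set first (names from the slide).
HIERARCHY = ["SLT", "2017EPA_EGU", "2017EPA_Airports"]


def build_selection(records, hierarchy=HIERARCHY):
    """Keep one record per (facility, unit, process, pollutant):
    the one reported by the highest-ranking data set."""
    rank = {name: position for position, name in enumerate(hierarchy)}
    selected = {}
    for rec in records:
        if rec["data_set"] not in rank:
            continue  # data sets outside the hierarchy are ignored
        key = (rec["facility"], rec["unit"], rec["process"], rec["pollutant"])
        best = selected.get(key)
        if best is None or rank[rec["data_set"]] < rank[best["data_set"]]:
            selected[key] = rec
    return selected


# Values assumed from the slide's example (row placement is a reconstruction).
records = [
    {"facility": 111, "unit": 113, "process": 114, "pollutant": "VOC",
     "data_set": "SLT", "tons": 100},
    {"facility": 111, "unit": 113, "process": 114, "pollutant": "VOC",
     "data_set": "2017EPA_EGU", "tons": 98},
    {"facility": 111, "unit": 113, "process": 114, "pollutant": "VOC",
     "data_set": "2017EPA_Airports", "tons": 300},
    {"facility": 111, "unit": 113, "process": 114, "pollutant": "CO",
     "data_set": "2017EPA_EGU", "tons": 40},
    {"facility": 111, "unit": 113, "process": 114, "pollutant": "CO",
     "data_set": "2017EPA_Airports", "tons": 300},
]

selection = build_selection(records)
for key, rec in sorted(selection.items()):
    print(key, rec["data_set"], rec["tons"])
```

With these inputs the SLT value is kept for VOC, and the EGU value is kept for CO because no SLT value exists for that process and pollutant.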
Selection Overrides: The Default Hierarchy Can Be Overridden Based on Values of the Attributes of the Process or Pollutant and Use a Different Hierarchy Completely (or Select No Data At All). Overrides Can be Based on (Shown in Reverse Priority Order): State; County; Tribe; Sector; Facility Source Type; SCC; Pollutant Code (Usually Used to Exclude Pollutants from the Selection).
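One way to picture overrides is as an ordered list of rules, where a later (higher-priority) match replaces the default hierarchy. The sketch below is an assumption-laden illustration: the `OVERRIDES` structure, the field names, and the sample values are hypothetical and not taken from EIS.

```python
DEFAULT_HIERARCHY = ["SLT", "2017EPA_EGU", "2017EPA_Airports"]

# Hypothetical overrides, listed lowest priority first, so a later match wins.
# An empty hierarchy means "select no data at all" for matching records.
OVERRIDES = [
    ({"sector": "Airports"}, ["2017EPA_Airports", "SLT"]),
    ({"pollutant": "CO2"}, []),
]


def hierarchy_for(attrs, overrides=OVERRIDES, default=DEFAULT_HIERARCHY):
    """Return the hierarchy to use for a process/pollutant with these attributes."""
    chosen = default
    for criteria, hierarchy in overrides:
        if all(attrs.get(field) == value for field, value in criteria.items()):
            chosen = hierarchy  # higher-priority override replaces earlier choice
    return chosen


print(hierarchy_for({"sector": "EGU", "pollutant": "VOC"}))
print(hierarchy_for({"sector": "Airports", "pollutant": "VOC"}))
print(hierarchy_for({"sector": "Airports", "pollutant": "CO2"}))
```

Placing the pollutant rule last mirrors the slide's ordering, in which Pollutant Code appears at the end of a list shown in reverse priority order.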
Exclusion By Rules: Pre-Defined Rules that Will Remove Selected Data Based on Criteria. You Cannot See What Was Removed Due to an Exclusion by Rule. Selections May Be Run Without Applying the Rules. NEI Selections Will Usually Have the Rules Applied (Never Say "Never"). Rules Are Applied to Selections by Default. Rules Can Apply to a Specific Data Category. Rules: By Pollutant Group; NP Survey Rule; By Option Group / Option Set; Unit Data Set Rule; Facility Data Set Rule.
Data Exclusion by Rule: Pollutant Group. Some Data May be Removed from the Selection Based on a Set of Rules Set by EPA EIAG. Pollutant Group Rule: Some Pollutants Have a Parent / Child Relationship to Each Other. These Are Called "Pollutant Groups". Counting Both a Parent and Child From a Pollutant Group Would Create Double-Counting. If Parent and Child Pollutants from the Same Pollutant Group are Reported to the Same Process, the Selection Will See How the Highest Ranking Data Set Reported Pollutants, and Remove Any Other Pollutants From Different Data Sets That Are Not at the Same Level as the Highest Ranking Data Set.
[Example table, shown before and after the rule is applied. Rows: Facility 111, Unit 113, Process 114 with pollutants 1330207 Xylenes (Mixed), 95476 o-Xylene, and 108383 m-Xylene. Columns: SLT, 2017EPA_EGU, 2017EPA_Airports, Selected. Cell values shown: 50 LBS, 20 LBS, 15 LBS, and Delete.]
Data Exclusion by Rule: Pollutant Group Notes. This Rule is Used by All Data Categories. Defined Pollutant Groups for Selections (Pollutant Group Names): Xylenes; Cresols; PCBs; Glycol Ethers; Chromium; Nickel; Benzofluoranthenes; PAH; PAC; Pentachlorobiphenyl; Tetrachlorobiphenyl; Hexachlorobiphenyl.
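A literal reading of the pollutant group rule can be sketched as follows. The group definition, record layout, and emission values are hypothetical (only the pollutant codes and data set names come from the slides), and the real EIS rule may handle edge cases differently.

```python
HIERARCHY = ["SLT", "2017EPA_EGU", "2017EPA_Airports"]

# Hypothetical group definition: one parent code and its child codes.
XYLENES = {"parent": "1330207", "children": {"95476", "108383"}}


def level(code, group):
    """Return 'parent', 'child', or None if the code is outside the group."""
    if code == group["parent"]:
        return "parent"
    if code in group["children"]:
        return "child"
    return None


def apply_pollutant_group_rule(process_records, group, hierarchy=HIERARCHY):
    """For one process: find how the highest-ranking data set reported the
    group, then drop other data sets' group records at a different level."""
    rank = {name: position for position, name in enumerate(hierarchy)}
    in_group = [r for r in process_records if level(r["pollutant"], group)]
    if not in_group:
        return list(process_records)
    top = min(in_group, key=lambda r: rank[r["data_set"]])
    top_level = level(top["pollutant"], group)
    kept = []
    for r in process_records:
        lvl = level(r["pollutant"], group)
        # Keep non-group pollutants, the top data set's own records,
        # and records reported at the same level as the top data set.
        if lvl is None or r["data_set"] == top["data_set"] or lvl == top_level:
            kept.append(r)
    return kept


# Assumed inputs: SLT reports the parent, EPA data sets report children.
process_records = [
    {"pollutant": "1330207", "data_set": "SLT", "lbs": 50},
    {"pollutant": "95476", "data_set": "2017EPA_EGU", "lbs": 20},
    {"pollutant": "108383", "data_set": "2017EPA_Airports", "lbs": 15},
    {"pollutant": "CO", "data_set": "2017EPA_EGU", "lbs": 400},
]
kept = apply_pollutant_group_rule(process_records, XYLENES)
print([(r["pollutant"], r["data_set"]) for r in kept])
```

Here the child records are dropped because the highest-ranking data set reported at the parent level, which avoids the double-counting the slide describes. The CO record is untouched because it is not in the group.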
Exclusion by Rule: NP Survey Rule. Used Only by the Non-Point Data Category. If an Agency Says "We Don't Have Data for This Source", Delete Any Data for Those SCCs. If an Agency Says "Supplement Only for Reported Locations", Delete Any Data Outside of the Locations Reported by the Responsible Agency.
[Example table. Rows, in slide order: FIPS State County 01-001, 01-001, 01-003 with pollutants SO2 Sulfur Dioxide, 1330207 Xylenes (Mixed), and 108383 m-Xylene. Columns: SCC, Pollutant, Description, SLT, 2017EPA_EGU, 2017EPA_Airports, Selected. Cell values shown: 50 Tons, 30 Tons, 20 LBS, 15 LBS, and Delete.]
Exclusion by Rule: Option Group / Option Set. Used Only by the Non-Point Data Category. Looks to See How the Highest Ranking Data Set Reported the Data and Then Backfills with Other Data Sets That Reported at the Same Level.
Exclusion by Rule: Option Group / Option Set Example.
[Example table. Rows: FIPS State County 01-001 with SCCs 2104008400, 2104008410, and 2104008420, each for pollutant CO (Carbon Monoxide). Columns: SLT, 2017EPA_NONPOINT, Selected. Cell values shown: 30 Tons, 20 Tons, 15 Tons, 50 Tons, and Delete.]
Exclusion by Rule: Unit Group Rule. Applies Only to the Point Data Category. Some Data Sources EPA Uses Aggregate the Data to a Unit-Level Summary. Therefore, There May Be Some Overlaps. This Rule Deletes Data From Lower Hierarchy Data Sets If: They Belong to the Same Unit; They Are Measuring the Same Pollutant; and At Least One of the Data Sets is Aggregated to the Unit Level. This is a Property of the Data Set and Controlled by EIS Content Managers.
[Example table. Rows: Facility 111, Unit 113, Processes 114, 214, and 314, each for pollutant 95476 o-Xylene. Columns: SLT, 2017EPA_EGU (Unit Based), 2017EPA_Airports, Selected. Cell values shown: 50 LBS, 20 LBS, 8 LBS, 15 LBS, and Delete.]
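The three conditions of the unit rule translate fairly directly into code. The following is a sketch of one literal interpretation, with a hypothetical record layout and made-up values; which data sets count as unit-based is assumed here and is, per the slide, a data set property managed in EIS.

```python
from collections import defaultdict

HIERARCHY = ["SLT", "2017EPA_EGU", "2017EPA_Airports"]
UNIT_BASED = {"2017EPA_EGU"}  # assumed: data sets aggregated to the unit level


def apply_unit_rule(records, hierarchy=HIERARCHY, unit_based=UNIT_BASED):
    """Within each (facility, unit, pollutant), if several data sets report and
    at least one is unit-based, keep only the highest-ranking data set."""
    rank = {name: position for position, name in enumerate(hierarchy)}
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["facility"], rec["unit"], rec["pollutant"])].append(rec)
    kept = []
    for recs in groups.values():
        data_sets = {r["data_set"] for r in recs}
        if len(data_sets) > 1 and data_sets & unit_based:
            best = min(data_sets, key=rank.__getitem__)
            recs = [r for r in recs if r["data_set"] == best]
        kept.extend(recs)
    return kept


# Hypothetical inputs: unit 113 has a unit-based data set, unit 213 does not.
records = [
    {"facility": 111, "unit": 113, "process": 114, "pollutant": "95476",
     "data_set": "SLT", "lbs": 50},
    {"facility": 111, "unit": 113, "process": 214, "pollutant": "95476",
     "data_set": "2017EPA_EGU", "lbs": 20},
    {"facility": 111, "unit": 213, "process": 414, "pollutant": "95476",
     "data_set": "SLT", "lbs": 8},
    {"facility": 111, "unit": 213, "process": 514, "pollutant": "95476",
     "data_set": "2017EPA_Airports", "lbs": 15},
]
kept = apply_unit_rule(records)
print([(r["unit"], r["process"], r["data_set"]) for r in kept])
```

The same shape would apply to the facility-level rule by grouping on facility and pollutant only, with a set of facility-based data sets in place of `UNIT_BASED`.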
Exclusion by Rule: Facility Group Rule. Applies Only to the Point Data Category. Some Data Sources EPA Uses Aggregate the Data to a Facility-Level Summary. Therefore, There May Be Some Overlaps. This Rule Deletes Data From Lower Hierarchy Data Sets If: They Belong to the Same Facility; They Are Measuring the Same Pollutant; and At Least One of the Data Sets is Aggregated to the Facility Level. This is a Property of the Data Set and Controlled by EIS Content Managers.
[Example table. Rows: Facility 111 with Units 113, 213, and 313 and Processes 114, 414, and 514, each for pollutant 95476 o-Xylene. Columns: SLT, 2017EPA_TRI (Facility Based), 2017EPA_Airports, Selected. Cell values shown: 50 LBS, 20 LBS, 15 LBS, and Delete.]
Events and Selections: Although Event Data is Submitted Using Daily Values, Creating Selections for Events Works the Same As for Non-Point, On-Road, and Non-Road: Based on Annual Aggregations of the Data on a Per Process / Pollutant Basis. The Same Rules Also Apply to Events Data Selections. The Selection is Based on the Annual Values, but the Daily Values that Composed Those Annual Values Are Created and Associated with the Selection Data Set.
Section II: Features of EIS Standard Reports. Hidden Gems You May Have Missed
Standard Report Filter Screen: Summary Section; Filter Accordions; Action Buttons.
Standard Features For All EIS Reports - Filters. Process Filters: Geography (By Region, By Tribe, By State, By County); Facility; Facility Source Types; NAICS; Facility Operating Type; Sectors; Source Classification Codes. Data Filters: Data Set (Most Reports Allow More Than 1 to be Chosen; Usually Must Choose At Least 1); Pollutants (Individually or by Groups; Must Choose At Least 1); Reporting Period Types ("Annual" is Default); Operating Type Code ("Routine" is Default; Only Applicable to Point); Tagged (Include Both Tagged & Untagged is Default).
Things to Consider With Filters: Filters Operate with a Logical OR Within the Same Filter and With a Logical AND Between Filters. Since Selections Only Apply to Annual & Routine Types of Data, If You are Selecting From a Selection, There is No Point In Changing the Default for the Reporting Period Type, Data Tags, or the Operating Type. If You Want All of the Items in a Filter (All Sectors for Example), Don't Specify any Filtered Items in that Category. Required Filter Items Are Highlighted at the Top Summary Area. All Specified Items Appear in this List Too. If the Report Allows Only 1 Item to be Selected, it Will Tell You When You Try to Submit the Report.
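The OR-within, AND-between behavior of filters can be illustrated with a small predicate. This is a sketch with hypothetical field names and records, not the EIS query logic itself.

```python
def matches(record, filters):
    """filters maps a field name to a set of accepted values.
    Values within one filter are ORed; separate filters are ANDed.
    An empty set means the filter is unspecified, so everything passes."""
    return all(
        not accepted or record.get(field) in accepted
        for field, accepted in filters.items()
    )


# Hypothetical filter choices: two states, one pollutant, all sectors.
filters = {"state": {"NY", "NJ"}, "pollutant": {"CO"}, "sector": set()}

records = [
    {"state": "NY", "pollutant": "CO", "sector": "Mobile"},
    {"state": "NJ", "pollutant": "CO", "sector": "Fuel Comb"},
    {"state": "NY", "pollutant": "VOC", "sector": "Mobile"},
    {"state": "PA", "pollutant": "CO", "sector": "Mobile"},
]
selected = [r for r in records if matches(r, filters)]
print(selected)
```

Leaving `sector` empty returns every sector, which matches the slide's advice to specify nothing when all items in a filter are wanted.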
Report Columns: Report Columns Can Be Required, Defaulted, or Not Defaulted Depending on the Report Being Run. All Columns That Will Be on the Report Appear in the Summary Section of the Report (Columns Currently Selected). Yellow Highlighted Columns Cannot be Unselected (These are the Required Columns). Blue Columns are Currently Selected, but Can be Unselected (Optionally Selected Columns). White Columns are Currently Unselected, but Can be Selected. Moral of the Story: There May be More Columns of Interest Not Currently Selected to be Included. Probably Worth Looking At the Full List at Least Once.
General Rules About Reports: SLT Data Sets Process Much, Much, Much Faster than the NEI Data Sets. Remember That a Selection Can Have Over 100 Million Records in It. The Same Holds True for Most EPA Data Sets (EGU, TRI, HAPAug, PMAug, Etc.). Even If You Are Only Looking for 1 County in an NEI Data Set, It Still Has a Lot of Data to Process. Reports Run Against the Live Database. As Soon As Data is Changed in the Database, it is Accessible by the Reports. The Reports Job Runs Every 10 Minutes. Every EIS User Shares the Same Report Queue.
Standard Reports in EIS: Types of Reports. Emissions Summary Reports; Facility Emissions Summary Reports; Facility Configuration Reports; Event Reports (Daily Values); SMOKE Flat File Reports; XML Snapshots; Comparison Reports.
Emissions Summary Reports: Aggregates Selected Data to Different Geographic Levels: By Data Category; By Sector; By Sector / Data Category; By SCC. Works for All Data Categories. THE EXCEPTION: the Area Emissions Process Report Does Not Work for POINT. Differences Between the Area Emissions Process Report (AEP) and the County by SCC (CSCC) Report: AEP Cannot be Run For Point; CSCC Can be Run for All Data Categories. AEP Includes Shapes; CSCC Aggregates All Shape Data to the County As Well as Point Source Data to a SCC / County Aggregate.
Emissions Summary Report Example: Let's Get the Non-Point Data Submitted by the State of New York for 2014 and 2017. I Want the Data Summarized on a State / Data Category Basis, but I Only Want the Non-Point Data. Use the Default Columns Defined for the Report. Just Want the Annual Data. Only Need CAP Pollutants.
Final Results: Pick Data Sets; Pick Pollutants; Pick Data Category. Question: Why Don't I Have to Specify Any Geographic Locations for this Example?
Facility Emissions Summary Reports: Aggregates Data to Different Levels of Facility Definitions: Emissions by Facility; Emissions by Unit; Emissions by Release Point; Emissions by Process; SPPD RTR Modeling File Data Retrieval. Only Applies to Point Data Sources. Special Types of Filters: By NAICS; By Facility Source Type.
Facility Configuration Reports: Provides Additional Details About the Facility Inventory That May Not be Available in the Emissions Reports. Different Reports for Different Types of Facility Inventory Components: Facility; Process and Unit; Regulation; Release Point; Facility Alternate Identifiers; Unit Alternate Identifiers; Facility Controls.
Event Reports: Provides Details For the Daily Events Values. Only Applies to the Events Data Category. If the Data is from a SLT Data Set, then These Are the Values that Were Supplied to EIS. If the Data is from a Selection (Like the NEI), then These Are the Values that the Selection Built After Selecting a Given Annual Record for the Process and Pollutant.
Inventory Development Reports: Reports Used Primarily by EIAG to Track QA and Submission Activities by SLTs: Point Data Tagging Report; Nonpoint, Onroad, Nonroad, Event Data Tagging Report; Nonpoint Survey Summary; Nonpoint Survey Detail. May be of Interest to Users to See What (and Why) Values Have Been Tagged to be Excluded from Selections by Inventory Developers. Good Mechanism to Get a Spreadsheet Version of Answers to the NP Survey.
SMOKE Flat Files: FF10 File Formats Used in the EMF Models: Point; Nonpoint; Nonroad; Onroad; Events Daily. Columns for These Reports are Fixed and Cannot be Modified. Some of These Reports Have Multiple Files Within Them. Historically, the Entire US File is Posted. Many SLTs Find the Size of this File Difficult. Use the Filters to Include Only the Geographies of Interest!
Facility and Emission XML Snapshots: Produces Data in the CERS XML Format: Facility Snapshot; Point Emissions Snapshot; NonPoint, OnRoad, NonRoad Emissions Snapshot; Event Emissions Snapshot. Very Useful if You Need to Make Changes to a Previous Submission. You Will Need to Adjust the Header Record Before Resubmitting.
Comparison Reports: Compare the Results of One Data Set Versus Others: State/Tribe Sector Comparison; State/Tribe SCC Comparison; Area Emissions Process SCC Comparison; Point Comparison Facility Level; Point Comparison Process Level.
Section III: Using External Applications to Evaluate Data. Making Friends with Access and Excel
When is a Good Time to Use Excel? You Have the Data You Want Pretty Much in the Format You Want. Generally, More of a One-Time Use of the Data. Pluses: EIS Report Outputs Migrate to Excel Readily; Excellent for Simple Charts & Graphs; Pivot Tables Are Your Friend! Minuses: If You Apply a Formula, It Can be Annoying to Copy / Paste; If You Like Codes, Excel Often Will Drop Leading Zeros on Codes.
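When leading zeros have already been dropped from codes during a spreadsheet round trip, they can be restored by padding to a known width. The sketch below assumes a CSV export with hypothetical column names and a 5-digit state-county FIPS code; the sample rows are made up.

```python
import csv
import io


def restore_code(value, width):
    """Left-pad an all-digit code with zeros to the expected width.
    Non-numeric values are returned unchanged."""
    text = str(value).strip()
    return text.zfill(width) if text.isdigit() else text


# Hypothetical export in which FIPS 01001 has been saved as 1001.
raw = "fips,scc,tons\n1001,2104008400,30\n36061,2104008410,20\n"

rows = list(csv.DictReader(io.StringIO(raw)))
for row in rows:
    row["fips"] = restore_code(row["fips"], 5)

print([row["fips"] for row in rows])
```

The `csv` module reads every field as text, so codes that still have their zeros are left alone. The padding only repairs values that were already shortened.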
When is a Good Time to Use Access? When You May Add to the Data Later. When You Have Reference Tables that Can Be Updated. When You Need More Flexibility in How to Create Data Tables. Pluses: Flexibility & Scalability. Minuses: Higher Learning Curve; Not as Nice Interfaces.
EXAMPLE #1: What Percentage of the 2014 NEI For My State Uses Data Outside of SLT Data by Pollutant Type? Always Good to Make Sure You Understand the Question. Common Ways of Looking at the Data Include: By Geography; By Pollutants (or Pollutant Type); By Sector; By Data Category. Having the Detail and Rolling Up is Always a Good Idea. If You Need to Know What Data Set a Data Point Comes From, You Will Need the Most Granular Data Possible.
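Once the granular rows carry their originating data set, the roll-up for Example #1 is a grouped ratio. The sketch below measures the share by emissions mass, which is one possible reading of the question (a record count would be another). Column names and values are hypothetical.

```python
from collections import defaultdict


def percent_non_slt(rows, slt_data_set="SLT"):
    """Percent of emissions per pollutant type that came from data sets
    other than the SLT data set. Assumes consistent units within a type."""
    total = defaultdict(float)
    non_slt = defaultdict(float)
    for row in rows:
        total[row["pollutant_type"]] += row["emissions"]
        if row["data_set"] != slt_data_set:
            non_slt[row["pollutant_type"]] += row["emissions"]
    return {
        ptype: 100.0 * non_slt[ptype] / total[ptype]
        for ptype in total
        if total[ptype]
    }


# Made-up rows for illustration only.
rows = [
    {"pollutant_type": "CAP", "data_set": "SLT", "emissions": 75.0},
    {"pollutant_type": "CAP", "data_set": "2014EPA_EGU", "emissions": 25.0},
    {"pollutant_type": "HAP", "data_set": "SLT", "emissions": 10.0},
    {"pollutant_type": "HAP", "data_set": "2014EPA_TRI", "emissions": 30.0},
]
shares = percent_non_slt(rows)
print(shares)
```

This is the same result a pivot table would give in Excel with pollutant type as the row field and data set as the column field.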
Some Helpful Excel Tricks. VLOOKUP: Standard Excel Function that Can Pull Back 1 Column From a Different Worksheet Based on a Unique Key. The Unique Key Can Be From EIS or It Could be One You Make Up. Sorting and Grouping Functions in Excel: Very Helpful in Finding Duplicates; Good for Looking at Summaries Then Drilling Into Details Where Suspect Data Exists.
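For readers who script rather than use a spreadsheet, the two tricks have close analogues: a dictionary lookup plays the role of VLOOKUP, and counting a made-up composite key finds duplicates. The names and sample values below are hypothetical.

```python
from collections import Counter


def vlookup(key, table, default=None):
    """Dictionary analogue of an exact-match VLOOKUP: one value per unique key."""
    return table.get(key, default)


def make_key(row):
    """A made-up unique key built from several columns."""
    return "|".join([row["fips"], row["scc"], row["pollutant"]])


def find_duplicates(rows):
    """Return the keys that appear on more than one row."""
    counts = Counter(make_key(row) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical reference table and report rows.
sector_by_scc = {"2104008400": "Sector A", "2104008410": "Sector B"}
rows = [
    {"fips": "01001", "scc": "2104008400", "pollutant": "CO"},
    {"fips": "01001", "scc": "2104008410", "pollutant": "CO"},
    {"fips": "01001", "scc": "2104008400", "pollutant": "CO"},
]

print(vlookup("2104008400", sector_by_scc))
print(find_duplicates(rows))
```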
Example #2: I Want to Be Able to Look at Data Trends for Several NEIs. Again, Look at the Question and Determine What Information You are Going to Need to Accomplish the Task. Since There Isn't a Need to Differentiate Where the Data Comes From, You Can Pull the Data to the Lowest Perceived Needed Aggregation. I'll Assume County-SCC in This Example. Given That We are Looking to Add to This Over Time, I'll Go with Access.
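The same append-over-time pattern can be sketched with any relational database. The example below uses Python's built-in `sqlite3` module as a stand-in for Access (a substitution made here, not something the presentation uses), with a hypothetical table layout and made-up county-SCC values.

```python
import sqlite3

# In-memory database standing in for an Access file that grows over time.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emissions ("
    "inventory_year INTEGER, fips TEXT, scc TEXT, pollutant TEXT, tons REAL)"
)

# Made-up county-SCC rows from two inventory years.
rows = [
    (2014, "36001", "2104008400", "CO", 120.0),
    (2017, "36001", "2104008400", "CO", 95.0),
    (2014, "36001", "2104008410", "CO", 30.0),
    (2017, "36001", "2104008410", "CO", 35.0),
]
conn.executemany("INSERT INTO emissions VALUES (?, ?, ?, ?, ?)", rows)

# Roll the county-SCC detail up to a per-year total for one county.
trend = conn.execute(
    "SELECT inventory_year, pollutant, SUM(tons) FROM emissions "
    "WHERE fips = ? GROUP BY inventory_year, pollutant "
    "ORDER BY inventory_year",
    ("36001",),
).fetchall()
print(trend)
```

Storing FIPS and SCC as text keeps leading zeros intact, and a later inventory year can be added with another insert without changing the query.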
Help Me Help You: What Is (and Isn't) Working With the EIS Reports for Your Analyses? Are Different Report Aggregations Needed? Additional Filtering Functions? Audience Participation Time!! What Sort of Data Requests Do You Frequently Get that We Can Start Solving?