Wednesday, December 6, 2017

GIS 5935: Spatial Data Aggregation

This week we went over spatial data aggregation and the MAUP (Modifiable Areal Unit Problem). The MAUP has two main components: the scale effect and the zoning effect.

The MAUP arises when artificial, modifiable units are imposed on a continuous geographic phenomenon in order to record or report it. The choice of units can create artificial spatial patterns, skewing the data and leading to erroneous analysis. In GIS, the MAUP stems from issues of scale and zoning. A well-known example is gerrymandering: manipulating district boundaries to favor a certain result, political party, or class. Gerrymandering is a major political issue, with districts often stretched across multiple counties.

To deal with the scale effect, the best approach is to pick the scale that matches what you want to measure and analyze. If possible, and depending on your research question, it is better to start with your map/data at a finer scale, since finer units can be aggregated upward and give a better result. Coarser maps (with lower resolution) do not allow the data to be aggregated as flexibly and can lead to erroneous assumptions. This was observed in Part A of our lab, where we compared the percent of the population that is non-white with the percent below poverty. The comparison was repeated at four different scales: original block groups, zip codes, House voting districts, and counties.

Figure 1: Table showing how the regression results change with scale.
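
The scale effect described above can be reproduced on a toy example. This is a minimal Python sketch, assuming hypothetical block-group records that carry a zip code, a county label, and counts for total, non-white, and below-poverty population. It sums the counts into each coarser unit, recomputes the percentages, and fits a simple OLS line (percent below poverty on percent non-white) at each scale. All field names and values are invented for illustration, not data from the lab.

```python
from collections import defaultdict

# Hypothetical block groups: (zip_code, county, total_pop, nonwhite_pop, poverty_pop)
BLOCK_GROUPS = [
    ("32501", "A", 1000, 600, 300),
    ("32501", "A", 800, 100, 60),
    ("32502", "A", 1200, 900, 420),
    ("32502", "A", 900, 200, 90),
    ("32503", "B", 1100, 150, 120),
    ("32503", "B", 700, 500, 150),
    ("32504", "B", 1300, 400, 260),
    ("32504", "B", 600, 50, 30),
]

def ols(xs, ys):
    """Return (slope, intercept, r_squared) for a least-squares line y = m*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, my - m * mx, sxy * sxy / (sxx * syy)

def aggregate(records, key_index):
    """Sum the three count columns of all records sharing the same zone label."""
    totals = defaultdict(lambda: [0, 0, 0])
    for rec in records:
        for i in range(3):
            totals[rec[key_index]][i] += rec[2 + i]
    return list(totals.values())

def percents(counts):
    """Turn [total, nonwhite, poverty] counts into (% non-white, % below poverty) lists."""
    xs = [100 * nw / tot for tot, nw, _ in counts]
    ys = [100 * pov / tot for tot, _, pov in counts]
    return xs, ys

def regress_at_scale(records, key_index=None):
    """key_index=None keeps block groups; 0 aggregates by zip code, 1 by county."""
    if key_index is None:
        counts = [list(r[2:]) for r in records]
    else:
        counts = aggregate(records, key_index)
    return ols(*percents(counts))

if __name__ == "__main__":
    for label, key in [("block group", None), ("zip code", 0), ("county", 1)]:
        m, b, r2 = regress_at_scale(BLOCK_GROUPS, key)
        print(f"{label:12s} slope={m:.3f} intercept={b:.2f} R^2={r2:.3f}")
```

With only two counties in this toy data, the county-level line passes through both points exactly, so R² is 1 regardless of the underlying relationship. That is one way aggregating to very few units makes OLS output hard to interpret.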

To deal with the zoning effect, define the zones in your map using a simple but careful method. You want to avoid problems such as gerrymandering by making the zones easy to understand and reproducible. One way to simplify the zones is to keep a similar shape across all zones. Make sure to incorporate the factors you are studying so that the zone shapes fit your data as well as possible. This practice was performed in Part B of the lab, where we had to determine the 10 worst districts with regard to 'Compactness' and 'Community'. To measure compactness, a ratio was calculated from the area and perimeter of each district, which let me identify the 10 worst districts for 'Compactness'.

Figure 2: Screenshot showing how spread out one voting district can be. In this case, GEOID 26ZZ stretches across 4 different counties.

Figure 3: Screenshot of GEOID 3712, which ranked 1st among the 10 worst districts for 'Compactness'.

Figure 4: Table showing the 10 worst districts for 'Compactness'.
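
The post does not name the exact area/perimeter ratio used for compactness. One common choice is the Polsby-Popper score, 4πA/P², which equals 1 for a circle and approaches 0 for long, contorted shapes. The sketch below assumes that score, computes it for polygons given as lists of (x, y) vertices in a projected coordinate system, and ranks the least compact districts. The district IDs and coordinates are hypothetical.

```python
import math

def area(poly):
    """Planar polygon area via the shoelace formula (vertices in order, ring not closed)."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def perimeter(poly):
    """Sum of edge lengths, including the closing edge back to the first vertex."""
    return sum(math.dist(a, b) for a, b in zip(poly, poly[1:] + poly[:1]))

def polsby_popper(poly):
    """Compactness score 4*pi*A / P**2: 1.0 for a circle, near 0 for stretched shapes."""
    return 4 * math.pi * area(poly) / perimeter(poly) ** 2

def worst_compactness(districts, n=10):
    """Return the n (id, score) pairs with the lowest scores, least compact first."""
    scored = sorted(districts.items(), key=lambda kv: polsby_popper(kv[1]))
    return [(gid, round(polsby_popper(poly), 3)) for gid, poly in scored[:n]]

if __name__ == "__main__":
    # Hypothetical district outlines in projected units.
    districts = {
        "SQUARE": [(0, 0), (10, 0), (10, 10), (0, 10)],
        "LONG_STRIP": [(0, 0), (100, 0), (100, 1), (0, 1)],
        "L_SHAPE": [(0, 0), (10, 0), (10, 2), (2, 2), (2, 10), (0, 10)],
    }
    for gid, score in worst_compactness(districts, n=3):
        print(gid, score)
```

All three shapes have similar areas, but the strip and the L-shape have much longer perimeters relative to that area, so they rank as less compact than the square.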

To find 'Community', I intersected the districts with the counties, determining which counties were divided among different districts and how many districts each was spread across. By joining these two analyses together, I was able to list the top 10. Overall, the 'Compactness' calculations in Part B were not difficult, but I had a lot of difficulty determining the 'Community' top 10. I also ran into issues running the OLS analysis on counties in Part A and had to finagle the data just to get it to run. I learned a lot about the MAUP, and it was nice to apply it to a real situation: gerrymandering with regard to voter populations.
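
The county-splitting count described above can be sketched in plain Python, assuming the Intersect output has been exported as (county, district) pairs, one per overlapping piece. The sketch counts how many districts each county is split across and which counties each district touches. The ranking rule used here (number of split counties a district crosses) is a hypothetical stand-in, since the post does not spell out how the two results were combined; county and district names are invented.

```python
from collections import defaultdict

def split_counts(pairs):
    """From (county, district) intersection pieces, return two dicts of sets:
    districts per county and counties per district."""
    by_county = defaultdict(set)
    by_district = defaultdict(set)
    for county, district in pairs:
        by_county[county].add(district)
        by_district[district].add(county)
    return by_county, by_district

def worst_community(pairs, n=10):
    """Rank districts by how many split counties they cross (hypothetical scoring rule)."""
    by_county, by_district = split_counts(pairs)
    split = {c for c, ds in by_county.items() if len(ds) > 1}
    scores = {d: len(cs & split) for d, cs in by_district.items()}
    # Highest score first; ties broken alphabetically by district ID.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

if __name__ == "__main__":
    # Hypothetical intersection output; repeated pairs (multiple slivers) are harmless.
    pieces = [
        ("Alpha", "D1"), ("Alpha", "D2"), ("Bravo", "D2"),
        ("Charlie", "D2"), ("Charlie", "D3"), ("Delta", "D3"),
        ("Echo", "D1"), ("Echo", "D1"),
    ]
    by_county, _ = split_counts(pieces)
    print({c: len(ds) for c, ds in sorted(by_county.items())})
    print(worst_community(pieces, n=3))
```

In real boundary files, tiny slivers from slightly misaligned county and district lines can make a county look split when it is not, so filtering intersection pieces below some area threshold before counting is worth considering.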

Wednesday, November 29, 2017

GIS 5935: Scale and Resolution

This week we reviewed the differences between LIDAR and SRTM DEMs. To compare the two DEMs, I ran the Minus geoprocessing tool, which subtracts one raster from the other cell by cell and shows the differences, both positive and negative, between them. I did this for both elevation and slope.
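
The Minus operation can be sketched on two small grids held as nested Python lists, assuming both DEMs have already been resampled to the same 90 m grid and extent. A cell that is NoData in either input stays NoData in the output. The NoData marker and elevation values are hypothetical.

```python
NODATA = None  # hypothetical NoData marker

def minus(raster_a, raster_b):
    """Cell-by-cell difference a - b; NoData in either input gives NoData."""
    return [
        [NODATA if a is NODATA or b is NODATA else a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(raster_a, raster_b)
    ]

def value_range(raster):
    """Minimum and maximum over all valid (non-NoData) cells."""
    cells = [v for row in raster for v in row if v is not NODATA]
    return min(cells), max(cells)

if __name__ == "__main__":
    lidar = [[12.0, 15.5, 20.1], [11.2, NODATA, 18.4]]  # hypothetical elevations
    srtm = [[10.0, 17.0, 19.1], [13.2, 14.0, 18.4]]
    diff = minus(lidar, srtm)
    print(diff)
    print(value_range(diff))
```

With LIDAR as the first input, positive differences mark cells where LIDAR is higher than SRTM and negative differences mark the reverse. The same subtraction applies to the two slope rasters.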

Comparing LIDAR and SRTM, the differences are much larger for elevation than for slope. The difference in cell value (elevation) ranges from -83.7 to 81.2, whereas the difference in slope only ranges from -28.2 to 23.6. One factor that could contribute to this is that I set the resolution to 90 m. Slope still showed a large range of change, though, considering that it only goes up to 90 degrees. SRTM elevation values come from radar data collected by the Space Shuttle (the Shuttle Radar Topography Mission), while LIDAR derives its values from pulses of laser light and the range at which it can detect their returns. In areas with major elevation changes, I think SRTM is more suitable: steep terrain casts shadows, which can introduce errors into the LIDAR data.

Tuesday, November 21, 2017

GIS 5935: Geographically Weighted Regression (OLS vs. GWR)

OLS Regression Analysis observing the Rate of Residential Burglaries
GWR Analysis observing the Rate of Residential Burglaries

This week in GIS 5935 we learned about different spatial analyses that can be used in GIS to analyze various factors. These analyses relate explanatory factors to events and can be used to predict future results, such as the crime rate in an area. We each had to choose a crime and run two types of spatial analysis using the factors we believed would most influence its rate. For my project I chose residential burglary. I ran an OLS regression analysis with five explanatory variables that I believed would have a major influence on the burglary rate in the area (for example, one explanatory variable was household income). I then ran a GWR analysis using the explanatory variable most strongly correlated with the crime rate (determined by calculating a correlation coefficient matrix).
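
Picking the GWR variable from a correlation matrix can be sketched as follows: compute Pearson's r between each candidate explanatory variable and the burglary rate, then keep the one with the largest absolute value (a strong negative correlation counts as much as a strong positive one). The variable names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strongest_predictor(candidates, target):
    """Return (name, r) for the candidate with the largest |r| against the target."""
    scores = {name: pearson(values, target) for name, values in candidates.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores[best]

if __name__ == "__main__":
    burglary_rate = [4.1, 6.3, 2.2, 8.0, 5.5]  # hypothetical rates per tract
    candidates = {
        "median_income": [62, 41, 75, 30, 48],
        "pct_renters": [20, 35, 15, 50, 30],
        "pct_vacant": [5, 9, 6, 7, 4],
    }
    print(strongest_predictor(candidates, burglary_rate))
```

A full correlation matrix would also include the pairwise correlations among the explanatory variables themselves, which helps flag redundant (collinear) predictors before running OLS.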

In an OLS analysis, the dependent variable is regressed on the independent variables and the results are mapped in ArcMap, with the residuals displayed on the map (a residual is the difference between the observed value in the original data and the value estimated by the regression formula). OLS creates a single best-fit model for the entire dataset, ignoring where the observations are located and how close they are to each other. GWR instead fits a linear regression at every location (indicated by the symbol u in the formula), so the best-fit model for the chosen explanatory variable varies from location to location. It reflects the neighborhood of focus, whereas OLS is generalized across all of the data.
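
To make the OLS/GWR contrast concrete, here is a minimal single-variable sketch. OLS fits one line to all observations. The GWR part follows the usual notation y_i = β0(u_i, v_i) + β1(u_i, v_i)·x_i + ε_i, where (u_i, v_i) are the coordinates of location i, and refits a weighted least-squares line at every location so that nearby observations count more. The Gaussian kernel and fixed bandwidth are simplifying assumptions (real GWR tools also offer other kernels and choose the bandwidth for you); coordinates and values are hypothetical.

```python
import math

def weighted_fit(xs, ys, ws):
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def ols(xs, ys):
    """Global OLS: every observation gets weight 1, one model for the whole dataset."""
    return weighted_fit(xs, ys, [1.0] * len(xs))

def residuals(xs, ys, intercept, slope):
    """Observed minus predicted values (what an OLS residual map symbolizes)."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def gwr(coords, xs, ys, bandwidth):
    """Local (intercept, slope) at each location, Gaussian weights exp(-0.5*(d/bandwidth)**2)."""
    local = []
    for ui, vi in coords:
        ws = [math.exp(-0.5 * (math.hypot(u - ui, v - vi) / bandwidth) ** 2)
              for u, v in coords]
        local.append(weighted_fit(xs, ys, ws))
    return local

if __name__ == "__main__":
    # Hypothetical tracts: a weak relationship in the west cluster, a strong one in the east.
    coords = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0), (11, 0), (10, 1), (11, 1)]
    income = [1, 2, 3, 4, 1, 2, 3, 4]
    burglary = [5, 5.5, 6, 6.5, 2, 4, 6, 8]
    b0, b1 = ols(income, burglary)
    print("OLS:", round(b0, 2), round(b1, 2))
    for (u, v), (a, s) in zip(coords, gwr(coords, income, burglary, bandwidth=2.0)):
        print(f"GWR at ({u},{v}): intercept={a:.2f} slope={s:.2f}")
```

Here OLS blends the two clusters into a single slope, while the local GWR slopes come out near 0.5 in the west and near 2 in the east, which is the kind of local variation the GWR map shows.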

In many analyses, GWR is more detailed and specific to local areas, giving a more accurate account of how the explanatory variables affect the crime rate and a more accurate prediction of future crime rates in that area.

Wednesday, November 8, 2017

GIS 5935: Introductory Statistics, Correlation, and Bivariate Regression

Figure 1: The Excel table with the estimates of the unknown rainfall at Station A from 1931-1949. Values were calculated with the slope-intercept equation (y = mx + b), using the coefficients from a bivariate regression.

This week we went over introductory statistics that will be incorporated in future GIS labs. This lab was a great refresher, considering the last statistics course I took was during my sophomore year of undergrad. Throughout the lab we reviewed different ways to calculate statistics, correlations, and regressions on different kinds of data in Excel. One end goal was to be able to estimate missing data.

The data we were given was rainfall at two stations: Station A and Station B. For the years 1931-1949, no data was collected at Station A. To estimate those values, a bivariate regression was run on the two datasets from 1950 onward, which gave a coefficient (slope) and an intercept. These define a line, so I used the slope-intercept formula y = mx + b, where m is the slope coefficient (0.84556) and b is the intercept (161.8340). I entered the formula in Excel and filled it down to get all the values. Although there is no way to know the true rainfall during that period (1931-1949), a regression based on the patterns of the later years gives a reasonable estimate.
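
The same estimation step can be sketched in Python. The lab itself used Excel. This sketch assumes Station A is the dependent variable (y) and Station B the independent variable (x), fits y = mx + b by least squares on the overlapping years, and then fills in the missing Station A years. The demo reuses the coefficients reported above; the Station B rainfall values are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

def fill_missing(station_a, station_b, m, b):
    """Estimate Station A wherever it is None, using Station B for the same year."""
    return {
        year: (m * station_b[year] + b if value is None else value)
        for year, value in station_a.items()
    }

if __name__ == "__main__":
    # Fitting step on hypothetical overlapping years (1950 onward).
    b_obs = [900.0, 1100.0, 1000.0, 1200.0]
    a_obs = [920.0, 1090.0, 1010.0, 1180.0]
    print(fit_line(b_obs, a_obs))

    # Filling step with the slope and intercept reported in this post.
    m, b = 0.84556, 161.8340
    station_b = {1931: 1000.0, 1950: 950.0}  # hypothetical Station B rainfall
    station_a = {1931: None, 1950: 900.0}    # 1931 missing, 1950 observed
    print(fill_missing(station_a, station_b, m, b))
```

Observed Station A values are left unchanged; only the missing years are replaced with estimates from the line.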

Wednesday, November 1, 2017

GIS 5935: Accuracy of DEMs

Figure 1: Points (land cover types shown in different colors) overlaid on the DEM file.

This week in lab we learned how to determine the accuracy of DEMs. The goal of the project was to find the DEM-calculated elevation for the 287 data points collected within a DEM file called "Lidar". The DEM is located within a small county in North Carolina. The DEM elevation values were then compared to the true elevation values (supplied in an attribute table and an Excel sheet) to see how much they differed. By comparing these values, we were able to determine the accuracy of the DEM at those points.

To get the 287 points onto the DEM, I converted the Excel sheet into an attribute table and then clipped it to the DEM. Under Spatial Analyst Tools, I selected "Extract Values to Points", which created a shapefile and attribute table containing the DEM elevation value at every point. I exported these attribute tables to Excel, where I created a separate file for each land cover type and one with all 287 data points. In Excel I converted the DEM elevations from feet to meters and then calculated the 68th and 95th accuracy percentiles as well as the RMSE. I also calculated the bias for each land cover type, along with the standard deviation of the errors.
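
The Excel calculations above can be sketched in Python. The post does not spell out its definitions, so this sketch assumes: error = DEM elevation (converted from feet to meters) minus reference elevation; bias = mean error; and the 68th/95th accuracy values = nearest-rank percentiles of the absolute errors. These conventions are common but are assumptions here, and the sample points are hypothetical.

```python
import math
import statistics

FT_TO_M = 0.3048  # international foot

def accuracy_stats(dem_ft, reference_m):
    """Bias, standard deviation, RMSE and 68th/95th percentile absolute errors, in meters."""
    errors = [d * FT_TO_M - r for d, r in zip(dem_ft, reference_m)]
    abs_sorted = sorted(abs(e) for e in errors)

    def nearest_rank(p):
        # Smallest error with at least p percent of errors at or below it (integer ceiling).
        k = max(1, -(-p * len(abs_sorted) // 100))
        return abs_sorted[k - 1]

    return {
        "bias": statistics.mean(errors),
        "std_dev": statistics.stdev(errors) if len(errors) > 1 else 0.0,
        "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "p68": nearest_rank(68),
        "p95": nearest_rank(95),
    }

def by_land_cover(points):
    """Stats per land cover class plus all sites, from (land_cover, dem_ft, reference_m)."""
    groups = {}
    for cover, dem, ref in points:
        dems, refs = groups.setdefault(cover, ([], []))
        dems.append(dem)
        refs.append(ref)
    results = {cover: accuracy_stats(d, r) for cover, (d, r) in groups.items()}
    results["ALL SITES"] = accuracy_stats([p[1] for p in points], [p[2] for p in points])
    return results

if __name__ == "__main__":
    sample = [  # hypothetical points: (land cover, DEM elevation in ft, reference in m)
        ("forest", 330.0, 100.9), ("forest", 335.0, 101.6),
        ("urban", 300.0, 91.3), ("urban", 310.0, 94.6),
    ]
    for cover, stats in by_land_cover(sample).items():
        print(cover, {k: round(v, 3) for k, v in stats.items()})
```

Nearest-rank is only one percentile convention. Excel's PERCENTILE.INC interpolates between values, so results from the lab's spreadsheet may differ slightly.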

Figure 2: Data table displaying the accuracy percentiles and RMSE.

Figure 3: Data table showing the bias for the different land cover types and for all sites combined.

Wednesday, October 25, 2017

GIS 5935: Surface Interpolation

Figure: The IDW interpolation map. I personally feel that this interpolation method was best suited to analyzing the data we were interested in.

This week we were introduced to surface interpolation and the various interpolation methods that can be applied to analyze data. In this project we analyzed water quality at various sampling locations in Tampa Bay, Florida. First we ran a statistical analysis on the non-spatial data (collecting the min, max, average, and standard deviation before any interpolation was applied). Once this information was acquired, several different spatial interpolation methods were applied to the same data set, and the min, max, average, and standard deviation were recorded for each. We then compared the results across the interpolation methods to look for discrepancies and decide which method best suited the data we were interested in.
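
The comparison above comes down to the same four summary statistics for each surface. A small sketch, assuming the raw samples (the non-spatial baseline) and each method's interpolated cell values have been exported as plain lists. It uses the sample standard deviation (`statistics.pstdev` would give the population version). The method names and numbers are hypothetical.

```python
import statistics

def summarize(values):
    """Min, max, mean and sample standard deviation of a list of values."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
    }

def compare(surfaces, baseline="non-spatial"):
    """Print each surface's stats and how far its mean drifts from the baseline's mean."""
    base = summarize(surfaces[baseline])
    for name, values in surfaces.items():
        s = summarize(values)
        print(f"{name:12s} min={s['min']:.2f} max={s['max']:.2f} "
              f"mean={s['mean']:.2f} std={s['std']:.2f} "
              f"mean_shift={s['mean'] - base['mean']:+.2f}")

if __name__ == "__main__":
    compare({  # hypothetical water quality values
        "non-spatial": [1.2, 2.8, 3.1, 4.0, 2.2],
        "IDW": [1.3, 2.5, 2.9, 3.6, 2.4, 2.7],
        "spline": [-0.8, 2.9, 3.4, 5.9, 2.1, 6.4],
    })
```

A surface whose min or max falls well outside the range of the original samples, like the hypothetical spline values here, is a quick signal that the method is extrapolating beyond the data.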

Thiessen and IDW were better suited to this analysis because their estimates for unsampled areas depend on location rather than resting heavily on every sample point: with IDW, the farther a sample point is from the unsampled area, the less its value contributes to the estimate. Spline interpolation, which passes through every sample point, rested heavily on the data points and produced much more drastic differences in the surface water quality results, introducing a lot of error.
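
The distance weighting described above can be sketched for a single unsampled location. The sketch assumes samples given as (x, y, value) tuples and uses an IDW power of 2, a common default. Thiessen is shown as the nearest-neighbor estimate it amounts to at any one point. The sample values are hypothetical.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (sx, sy, value) samples."""
    num = den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value  # exactly on a sample: use its value
        w = 1.0 / d ** power  # farther samples get smaller weights
        num += w * value
        den += w
    return num / den

def thiessen(samples, x, y):
    """Thiessen (nearest neighbor) estimate: the value of the closest sample."""
    return min(samples, key=lambda s: math.hypot(x - s[0], y - s[1]))[2]

if __name__ == "__main__":
    samples = [(0, 0, 2.0), (4, 0, 6.0), (0, 3, 3.5)]  # hypothetical readings
    for px, py in [(1, 1), (3, 0), (2, 2)]:
        print((px, py), round(idw(samples, px, py), 3), thiessen(samples, px, py))
```

Because IDW is a weighted average with positive weights, its estimates always stay within the range of the sample values, whereas a spline can overshoot that range between points. This is consistent with the wider spread of spline results described above.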

Tuesday, October 17, 2017

GIS 5935: TINs and DEMs

This week we began learning about DEM and TIN models in both ArcMap and ArcScene. I really enjoyed this lab: it was easy to work with, and I felt I was gaining hands-on experience with GIS applications that I would want to use in my future career. In this lab we focused on representing elevation with TINs in ArcMap and ArcScene and compared the differences between them. TINs can be great for supplying detailed elevation characteristics in small study areas. However, for covering a larger area, DEMs are a better choice. DEM surfaces are not as angular as TINs (due to the triangles) and can bring more detail to larger areas of analysis.

The screenshot above shows how detailed TINs can be in smaller areas. In this TIN model I symbolized elevation, slope, and edges differently to show the different textures in the terrain. Applications like this could be used to locate ideal slopes for ski resorts, ideal locations for observation towers, and a plethora of other uses.
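
Each TIN facet is a flat triangle, so its slope follows directly from the plane through its three vertices. A minimal sketch, assuming vertices as (x, y, z) in the same linear units: it takes the cross product of two edge vectors to get the facet's normal and converts the normal's tilt to degrees. The facets and the 15-30 degree "ski slope" window in the demo are hypothetical.

```python
import math

def facet_slope_degrees(p1, p2, p3):
    """Slope of the plane through three (x, y, z) vertices, in degrees from horizontal."""
    ax, ay, az = (p2[i] - p1[i] for i in range(3))
    bx, by, bz = (p3[i] - p1[i] for i in range(3))
    # Normal vector = cross product of the two edge vectors.
    nx = ay * bz - az * by
    ny = az * bx - ax * bz
    nz = ax * by - ay * bx
    if nz == 0:
        raise ValueError("vertical or degenerate triangle")
    return math.degrees(math.atan(math.hypot(nx, ny) / abs(nz)))

if __name__ == "__main__":
    # Hypothetical facets; the 15-30 degree window is illustrative only.
    facets = {
        "t1": ((0, 0, 100), (10, 0, 104), (0, 10, 100)),
        "t2": ((0, 0, 100), (10, 0, 100), (0, 10, 100)),
        "t3": ((0, 0, 100), (10, 0, 110), (0, 10, 100)),
    }
    for name, pts in facets.items():
        s = facet_slope_degrees(*pts)
        print(name, round(s, 1), "candidate" if 15 <= s <= 30 else "")
```

Filtering facets by a slope range like this is one simple way to screen a TIN for terrain that suits a particular use.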