Jason Greenberg
Research Question: Starting as early as their undergraduate career, do women face barriers to entry into or have strong preferences against entering STEM related fields, specifically the sciences and engineering, due to their gender? Also, does region within the United States impact this gender disparity?
Changing gender expectations and increasingly equal societal treatment of women have led researchers from different disciplines to analyze what may contribute to the lingering gender gap in earned wages. Many factors influence a worker's salary, including job industry, experience, inherent ability, lifestyle preferences, and performance. Many of these predictors are subjective and difficult to measure. This presentation will not focus on explaining the causes of higher median incomes for men, but will instead examine the gender disparity in college major choice, which is an indicator of eventual career track and earnings. Systematic gender preferences for and against science and engineering majors are visible in undergraduate major data from the 2015 American Community Survey. Moreover, while there is a strikingly constant gap between the number of male and female science and engineering majors across the country at the state level, the percentage of male degree holders with majors in science and engineering, compared against the parallel figure for women, suggests that different parts of the country face varying levels of gender disparity in the sciences at the undergraduate level.
library(ggplot2)
library(maps)
library(RColorBrewer)
The R packages above are needed to produce the graphics that support the argument developed here.
bachelors <- read.csv("bachelors.csv", header = TRUE, stringsAsFactors = FALSE)
dim(bachelors)
head(bachelors)
The "bachelors.csv" file includes information on bachelor's degree holders from 2015 for men and women across the United States and Puerto Rico in various geographical regions. As this was a dataset used for Problem Set 2 of the class, no major data cleaning was necessary. For the purposes of this presentation, only the combined state figures and not the urban, rural, or city specific data will be used.
desired_columns <- c(3, 4, 16, 28, 40, 52, 64)
desired_rows <- seq(2,53) #all states, Washington DC, and Puerto Rico
subsetTotal <- bachelors[desired_rows, desired_columns]
colnames(subsetTotal) <- c("State","Total", "SciEng", "SciEngRelated", "Business", "Education", "HumArts")
dim(subsetTotal)
subsetTotal
This dataframe presents the total number of bachelor's degrees for those aged 25 or older from the year 2015 for all 50 states, Washington DC, and Puerto Rico. Five categories of majors are included. The US Census Bureau American Community Survey defines science and engineering related majors to include nursing, architecture, and mathematics teacher education degrees, while the science and engineering category includes biology, chemistry, physics, mathematics, computer science, and social science degrees.
bandNames <- colnames(subsetTotal[,3:7])
par(mfrow = c(3,2))
par(mar = c(0,0,0,0))
for(j in 1:5){
hist(as.numeric(subsetTotal[,j+2])/as.numeric(subsetTotal[,2]),breaks = seq(0,0.7, by=0.05),ylim = c(0,50),
axes = FALSE, main = "", xlab = "", ylab = "", col = "grey")
box()
text(x = .33, y=40, label = bandNames[j])
}
Before beginning an analysis of gender and region bias in college major selection, it is helpful to see the distributions of majors for the combined figures that include both men and women for all observation points. The visual above includes five histograms, one for each category of college major. The x-axis measures the percentage of degree holders with that major and ranges from 0 to 70%, as defined by the "breaks = seq(0,0.7, by=0.05)" code, while the y-axis indicates frequency in terms of number of states, which ranges from 0 to 50. Each bar represents a bin range of 5%. The use of "par" and the for loop generates the grouped set of histograms, and the individual histogram titles are taken from the column names of the dataframe "subsetTotal" through the "colnames" function. In this state-level data for all degree holders over the age of 25, not yet split by gender or region, science and engineering majors represent the highest percentage of total degrees. This shows in the location of the distribution: over 20 states fall in the bin around 30% of all degrees, and the distribution is relatively even and bell shaped. Meanwhile, the distributions for science and engineering related fields, where 30 states have between 5 and 10 percent of degree holders with this type of degree, and for education majors, where about 20 states have 10 to 15 percent, sit much lower, which signifies lower popularity.
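To make the binning concrete, the following is a minimal sketch of what each histogram bar counts. The four-row data frame `toy` and its values are invented for illustration and are not taken from "bachelors.csv"; only the `breaks` sequence matches the code above.

```r
# Hypothetical toy data: counts stored as text, as in the original CSV
toy <- data.frame(
  Total  = c("1000", "2000", "500", "800"),
  SciEng = c("310", "580", "60", "264"),
  stringsAsFactors = FALSE
)

# Share of degree holders with a SciEng major in each (toy) state
shares <- as.numeric(toy$SciEng) / as.numeric(toy$Total)

# Same 5%-wide bins as the histograms above; plot = FALSE returns the counts
bins <- hist(shares, breaks = seq(0, 0.7, by = 0.05), plot = FALSE)
counts <- bins$counts
counts
```

Each entry of `counts` is the number of toy states whose share falls in the corresponding 5% bin, which is what the bar heights in the plotted histograms represent.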
Desired_columnsMale <- c(8, 20, 32, 44, 56, 68) #men totals
Desired_rowsMale <- seq(2,53) #all states
SubsetMale <- bachelors[Desired_rowsMale, Desired_columnsMale]
colnames(SubsetMale) <- c("TotalMale", "SciEngMale", "SciEngRelatedMale", "BusinessMale",
"EducationMale", "HumArtsMale")
bandNames <- colnames(SubsetMale[,-1])
par(mfrow = c(3,2))
par(mar = c(0,0,0,0))
for(j in 1:5){
hist(as.numeric(SubsetMale[,j+1])/as.numeric(SubsetMale[,1]),breaks = seq(0,0.7, by=0.05),ylim = c(0,50),
axes = FALSE, main = "", xlab = "", ylab = "", col = "grey")
box()
text(x = .33, y=40, label = bandNames[j])
}
Desired_columnsFemale <- c(12, 24, 36, 48, 60, 72) #women totals
Desired_rowsFemale <- seq(2,53) #all states
SubsetFemale <- bachelors[Desired_rowsFemale, Desired_columnsFemale]
colnames(SubsetFemale) <- c("TotalFemale", "SciEngFemale", "SciEngRelatedFemale", "BusinessFemale",
"EducationFemale", "HumArtsFemale")
bandNames <- colnames(SubsetFemale[,-1])
par(mfrow = c(3,2))
par(mar = c(0,0,0,0))
for(j in 1:5){
hist(as.numeric(SubsetFemale[,j+1])/as.numeric(SubsetFemale[,1]),breaks = seq(0,0.7, by=0.05),ylim = c(0,50),
axes = FALSE, main = "", xlab = "", ylab = "", col = "grey")
box()
text(x = .33, y=40, label = bandNames[j])
}
These two sets of five histograms, divided by gender, help display the differences in the frequencies of major choice for men and women. The same coding technique and structure were used as in the original non-gender-specific set of histograms above. This time, the new subsets "SubsetMale" and "SubsetFemale" were used instead of "subsetTotal," with the data coming from the gender-specific columns of the original CSV file. One of the most drastic differences between the distributions is in the science and engineering major histograms. The center of the male SciEng distribution is about 20 percentage points higher than the center of the female SciEng distribution. No other college major type shows this degree of gender disparity. Further statistical analysis will help clarify some aspects of the relationship between the number of men with science and engineering degrees and the number of women with science and engineering degrees.
desired_columns <- c(3,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72)
desired_rows <- seq(2,53) #all states
subset <- bachelors[desired_rows, desired_columns]
colnames(subset) <- c("State","Total","TotalMale","TotalFemale",
"SciEngTotal", "SciEngMale", "SciEngFemale",
"SciEngRelatedTotal", "SciEngRelatedMale", "SciEngRelatedFemale",
"BusinessTotal","BusinessMale","BusinessFemale",
"EducationTotal","EducationMale","EducationFemale",
"HumanitiesTotal","HumanitiesMale","HumanitiesFemale")
subset$percentMaleTotal <- as.numeric(subset$TotalMale)/as.numeric(subset$Total)*100
subset$percentFemaleTotal <- as.numeric(subset$TotalFemale)/as.numeric(subset$Total)*100
subset$percentOfMaleInSciEng <- as.numeric(subset$SciEngMale)/as.numeric(subset$SciEngTotal)*100 #this pair of percentages will sum to 100%
subset$percentOfFemaleInSciEng <- as.numeric(subset$SciEngFemale)/as.numeric(subset$SciEngTotal)*100
subset$percentSciEngMale <- as.numeric(subset$SciEngMale)/as.numeric(subset$TotalMale)*100 #while this pair has no immediate, direct relationship
subset$percentSciEngFemale <- as.numeric(subset$SciEngFemale)/as.numeric(subset$TotalFemale)*100
subset$percentSciEngRelatedMale <- as.numeric(subset$SciEngRelatedMale)/as.numeric(subset$TotalMale)*100
subset$percentSciEngRelatedFemale <- as.numeric(subset$SciEngRelatedFemale)/as.numeric(subset$TotalFemale)*100
subset$percentBusinessMale <- as.numeric(subset$BusinessMale)/as.numeric(subset$TotalMale)*100
subset$percentBusinessFemale <- as.numeric(subset$BusinessFemale)/as.numeric(subset$TotalFemale)*100
subset$percentEducationMale <- as.numeric(subset$EducationMale)/as.numeric(subset$TotalMale)*100
subset$percentEducationFemale <- as.numeric(subset$EducationFemale)/as.numeric(subset$TotalFemale)*100
subset$percentHumanitiesMale <- as.numeric(subset$HumanitiesMale)/as.numeric(subset$TotalMale)*100
subset$percentHumanitiesFemale <- as.numeric(subset$HumanitiesFemale)/as.numeric(subset$TotalFemale)*100
subset$SciEngRatio <- subset$percentSciEngFemale/subset$percentSciEngMale
head(subset)
By taking the number of degree holders in each major for men and women and dividing by the total number of degree holders of that gender, a percentage of degree holders for each state, major, and gender can be derived. Having access to both raw counts and relative figures is important for a more complete analysis. To clarify the annotations in the code for the second and third pairs of calculations above, "percentOfMaleInSciEng" and "percentOfFemaleInSciEng" give the percentage of science and engineering degree holders who are men and who are women, respectively. This is why the two values sum to 100%. Meanwhile, "percentSciEngMale" and "percentSciEngFemale" give the percentage of men and of women whose degree is in science and engineering rather than in another field, so each is measured within its own gender, which is why these two percentages will most likely not sum to 100%. Both computations are important for understanding the relationship between men, women, and degree choice.
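The difference between the two pairs of percentages can be illustrated with a single made-up state. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; the variable names are illustrative, not from the original code.

```r
# Hypothetical counts for one state (not from the survey data)
total_male    <- 1000
total_female  <- 1200
scieng_male   <- 420
scieng_female <- 290
scieng_total  <- scieng_male + scieng_female

# Pair 1: share of SciEng degree holders who are men / women (sums to 100)
share_of_scieng_male   <- scieng_male   / scieng_total * 100
share_of_scieng_female <- scieng_female / scieng_total * 100

# Pair 2: share of each gender's degrees that are SciEng (need not sum to 100)
rate_male   <- scieng_male   / total_male   * 100
rate_female <- scieng_female / total_female * 100

share_of_scieng_male + share_of_scieng_female  # 100
rate_male + rate_female                        # not 100 in general
```

The first pair divides by a shared denominator, so it is a split of one group; the second pair uses a different denominator for each gender, so the two values are independent rates.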
median(as.numeric(subset$TotalMale))
median(as.numeric(subset$TotalFemale))
median(as.numeric(subset$SciEngMale))
median(as.numeric(subset$SciEngFemale))
Calculating the median number of total degree holders per state for men and women shows that the median state has more female than male degree holders. The ratio of the female median to the male median is about 1.2 to 1.0, while the ratio of the median number of female science and engineering degree holders to the male median is about 1.0 to 1.59. Even though women hold more degrees overall in the typical state, men hold many more degrees in science and engineering.
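As a quick check of the second ratio, the two science and engineering medians reported later in this presentation (134,349 for men and 84,527 for women) can be divided directly. The variable names are illustrative; the total-degree medians are not reproduced in the text, so only the science and engineering ratio is shown.

```r
# State medians for SciEng degree holders, as reported in the text
median_scieng_male   <- 134349
median_scieng_female <- 84527

# Male-to-female ratio of the medians
scieng_ratio <- median_scieng_male / median_scieng_female
round(scieng_ratio, 2)  # 1.59
```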
sum(as.numeric(subset$TotalMale))
sum(as.numeric(subset$TotalFemale))
sum(as.numeric(subset$SciEngMale))
sum(as.numeric(subset$SciEngFemale))
Computing the same ratios for all state counts summed together, the ratio of women to men for total degrees is about 1.1 to 1.0, while the parallel ratio for science and engineering degrees is about 1.0 to 1.5. Therefore, for both the state median ratios and the total sum ratios, women have more degrees in general, but the relative difference between the number of male and female science and engineering degrees is even greater and runs in the opposite direction.
median(subset$percentMaleTotal)
median(subset$percentFemaleTotal)
median(subset$percentOfMaleInSciEng )
median(subset$percentOfFemaleInSciEng)
median(subset$percentSciEngMale)
median(subset$percentSciEngFemale)
median(subset$percentSciEngMale)/median(subset$percentSciEngFemale)
median(subset$percentSciEngRelatedMale)/median(subset$percentSciEngRelatedFemale)
median(subset$percentBusinessMale)/median(subset$percentBusinessFemale)
median(subset$percentEducationMale)/median(subset$percentEducationFemale)
median(subset$percentHumanitiesMale)/median(subset$percentHumanitiesFemale)
Looking at the median state percentages, women held about 52.5% of all bachelor's degrees in 2015 among those over the age of 25. Yet they held only about 39.4% of science and engineering degrees. In line with the ratios examined just previously, men had a stronger inclination to go into the sciences and engineering than women. In the median state, 42.3% of all degrees held by men were in science and engineering, while only 24.1% of degrees held by women were. The ratio of those percentages was 1.75, the largest of all the college major categories, with business second at 1.46. The discrepancies in total number of degrees, state medians, and state median percentages all point towards some societal trend for women not to enter the sciences or engineering. One paper from 2009, which analyzed a survey of 161 students from Northwestern University, determined through econometric modeling of the survey results that the most significant reason women decided not to enter particular majors was linked to expectations of enjoyment of the coursework. The author suggested that the difference in expectations between men and women for particular department courses might be linked to gender discrimination in society (Zafar 29). In an article from 1984 written using data from the National Longitudinal Studies of the High School Class of 1972, the authors argued that "substantial differences appear in their preferences as of [the students'] senior year in high school, for various types of work and in their subsequent preparation for the labor market during college" (Daymont 414). Again, an economic regression model based on survey answers was used to determine which factors contributed most to the gender gap.
If the preferences and impressions of students on their career choice impact college major selection and therefore career path and earnings, then the data seen in the American Community Survey from 2015 add support to the idea that the gender gap begins to take form even before students finish their education.
options(scipen=2000000) #a large scipen penalty makes R print numbers in fixed notation rather than scientific notation
summary(lm(as.numeric(subset$SciEngFemale) ~ as.numeric(subset$SciEngMale)))
The code above outputs a linear regression summary in which the raw count of female-held science and engineering degrees is regressed on the raw count of male-held science and engineering degrees for each state, DC, and Puerto Rico. The same regression is then run for the four other types of majors. The summary statistics that bear on whether women systematically avoid majoring in science or engineering are analyzed further on.
summary(lm(as.numeric(subset$SciEngRelatedFemale) ~ as.numeric(subset$SciEngRelatedMale)))
summary(lm(as.numeric(subset$BusinessFemale) ~ as.numeric(subset$BusinessMale)))
summary(lm(as.numeric(subset$EducationFemale) ~ as.numeric(subset$EducationMale)))
summary(lm(as.numeric(subset$HumanitiesFemale) ~ as.numeric(subset$HumanitiesMale)))
The estimated slope of the number of female degree holders on the number of male degree holders is positive for each of the five major categories. Looking at the raw counts of bachelor's degree holders, this result is not surprising: states with more degree holders of one gender tend to have more of the other as well. However, the smallest slope was the one for science and engineering, at 0.67, while the other slopes were 0.80, 1.40, 2.33, and 3.36. Because the p-value of each regression is negligibly close to zero, the regression coefficients are all statistically significant.
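Rather than reading the slope, r-squared, and p-value off the printed summaries, they can be pulled out of the fitted model object. The sketch below assumes a small invented data frame `toy` (its values are not from the survey) and uses only base R accessors.

```r
# Hypothetical counts for four states (illustrative values only)
toy <- data.frame(
  male   = c(100, 200, 300, 400),
  female = c(70, 130, 205, 265)
)

# Regress female counts on male counts, as in the regressions above
fit <- lm(female ~ male, data = toy)
fit_summary <- summary(fit)

slope   <- unname(coef(fit)["male"])                   # estimated slope
r2      <- fit_summary$r.squared                       # share of variance explained
p_value <- fit_summary$coefficients["male", "Pr(>|t|)"] # p-value for the slope

c(slope = slope, r2 = r2, p_value = p_value)
```

Applied to the five regressions above, the same three lines would collect the slopes into one table, which makes it easier to compare the major categories side by side.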
ggplot(subset, aes(as.numeric(SciEngMale), as.numeric(SciEngFemale))) +
geom_point()+
#scale_x_continuous(name="Total Male SciEng Degree Holders", limits=c(0, 150000)) +
#scale_y_continuous(name="Total Female SciEng Degree Holders", limits=c(0, 150000))+
labs(x= "Total Male SciEng Degree Holders by State") +
labs(y = "Total Female SciEng Degree Holders by State")+ ylim(0,2000000)+
labs(title= "Relationship Between Male and Female SciEng Degree Holders") +
stat_smooth(method = lm, se = FALSE, color = "black") +
geom_vline(xintercept = 134349, linetype="dotted", colour="red")+
geom_hline(yintercept = 84527, linetype="dotted", colour="red")+
geom_vline(xintercept = 0)+
geom_hline(yintercept = 0)+
annotate("text", label = "r^2 == 0.9895", parse = TRUE,x= 1400000, y = 1500000) +
annotate("text", label = "slope = 0.676634", x= 1475000, y = 1250000)
#qplot(as.numeric(SciEngMale), as.numeric(SciEngFemale), data = subset, color = I("darkblue"),
# xlab = "Total Male SciEng Degree Holders", ylab = "Total Female SciEng Degree Holders",
# main = "Relationship Between Total Male and Female SciEng Degree Holders") + geom_smooth(method = "lm", se = FALSE)
#qplot version of the above ggplot
The ggplot above is a visual representation of the first of the five linear regressions run earlier. The x-axis represents the total number of male science or engineering bachelor’s degree holders aged twenty-five and older in each of the 50 states, DC, and Puerto Rico. The y-axis represents the same figure for women. The median number of male degree holders with a major in science or engineering was 134,349, while the median for women was 84,527. The intersection of the dotted red lines marks this point. The r-squared value indicates that almost ninety-nine percent of the variability in the number of female science and engineering degree holders is accounted for by variability in the number of male science or engineering degree holders. This value is very high for a regression with just one independent variable, but the r-squared values for the regressions on the other college major categories are similarly high. Part of this fit likely reflects state size, since both counts grow with population. Setting aside such possible distortions, it indicates that in the 2015 data the number of male degree holders in a state for a particular major was a very precise predictor of the number of female degree holders for that major.
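One possible reason the raw-count regressions fit so tightly, not tested in this presentation, is that both counts scale with state population. The simulation below is a sketch under that assumption: the populations, the 10% and 6.5% degree rates, and the noise ranges are all invented, and the male and female rates are generated independently of each other.

```r
set.seed(1)

# Hypothetical state populations, from small to very large
population <- seq(5e5, 3e7, length.out = 52)

# Counts = population * a base rate * independent random noise (assumed values)
male_count   <- population * 0.100 * runif(52, 0.9, 1.1)
female_count <- population * 0.065 * runif(52, 0.9, 1.1)

# Regression on raw counts: dominated by population size
r2_counts <- summary(lm(female_count ~ male_count))$r.squared

# Regression on per-capita rates: population is divided out
male_rate   <- male_count / population
female_rate <- female_count / population
r2_rates <- summary(lm(female_rate ~ male_rate))$r.squared

c(r2_counts = r2_counts, r2_rates = r2_rates)
```

In this toy setup the raw counts produce a very high r-squared even though the underlying rates are unrelated, which is one reason to look at percentages as well as counts.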
summary(lm(subset$percentSciEngFemale ~ subset$percentSciEngMale))
ggplot(subset, aes(percentSciEngMale, percentSciEngFemale)) +
geom_point()+
labs(x= "Percentage of Male SciEng Degree Holders by State") + xlim(32.5,52.5)+
labs(y = "Percentage of Female SciEng Degree Holders by State")+ ylim(15.25,35.25)+
labs(title= "Relationship Between Male and Female SciEng Degree Holders by Percentage") +
stat_smooth(method = lm, se = FALSE, color = "black")
Looking at the percentage of degree holders who majored in science and engineering in each state, rather than the raw counts, the regression above gives a slope of about 0.99, which is nearly one-to-one. This may at first appear to conflict with the interpretation of the raw counts, but because the two regressions measure different quantities, the slope actually supports the notion that women are much less likely than men to hold science and engineering degrees. For every one percentage point increase in the share of male bachelor's degree holders who majored in science and engineering, the share of female degree holders with such a major is expected to increase by about one percentage point as well. However, the values for women, as seen on the y-axis, are all lower by roughly the 18 percentage point difference between the median state percentages, which were 42.3% for men and 24.1% for women. Importantly, a roughly constant gap in percentage points means that states with higher percentages for both men and women have proportionally more similar percentages than states with lower figures: the same absolute gap is a smaller fraction of a larger base, so the female-to-male ratio moves closer to one. This will be more visually apparent later on, when the ratio of these two variables is mapped by state with a choropleth.
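The effect of a constant gap on the ratio can be worked through with a few numbers. The sketch below takes the gap between the reported medians (42.3% and 24.1%) and applies it to three male percentages; the values 35 and 50 are hypothetical, chosen to sit near the ends of the plotted x-axis range, and the exact one-to-one slope is a simplifying assumption.

```r
# Gap between the reported median percentages for men and women
gap <- 42.3 - 24.1

# Three male percentages: a low state, the median, and a high state (35 and 50 are illustrative)
male_pct <- c(35, 42.3, 50)

# With a slope of about one, the female percentage sits a constant gap below
female_pct <- male_pct - gap

# Female-to-male ratio rises toward one as both percentages rise
ratio <- female_pct / male_pct
round(ratio, 2)
```

The gap is the same in each case, but the ratio climbs as the male percentage increases, which is the pattern the ratio map later displays by state.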
Initializing state longitude and latitude data and creating choropleth graphs will help to get a better sense of the regional implications of these findings.
states <- map_data("state")
head(states)
dim(states)
head(subset)
names(subset) <- tolower(names(subset))
subset$region <- tolower(subset$state)
head(subset)
The above code modifies the original "subset" data by converting the column names to lowercase and adding a final column named "region" that holds each state's name in lowercase, so that it matches the "region" column of the "states" map data.
choro_df <- merge(states, subset, by = "region") #merge(df1,df2,by="column vector")
head(choro_df)
After creating a column common to both dataframes, they can be merged with the "merge" command. The merged result is then sorted by the "order" column so that the map polygons are drawn in the correct sequence.
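Because "merge" keeps only rows whose "region" appears in both dataframes, any survey row without a matching map region is silently dropped. As far as I know, `map_data("state")` covers the contiguous states and the District of Columbia, so Alaska, Hawaii, and Puerto Rico would be expected to fall out, though that should be checked against the actual data. The sketch below shows the check on invented stand-ins; `survey_regions` and `map_regions` are hypothetical names.

```r
# Hypothetical stand-ins for subset$region and unique(states$region)
survey_regions <- tolower(c("Alaska", "Ohio", "Puerto Rico", "Texas"))
map_regions    <- c("ohio", "texas")

# Regions present in the survey data but missing from the map data
dropped <- setdiff(survey_regions, map_regions)
dropped

# The default (inner) merge keeps only the regions found in both
toy_map    <- data.frame(region = map_regions, order = 1:2)
toy_survey <- data.frame(region = survey_regions, value = c(10, 20, 30, 40))
merged <- merge(toy_map, toy_survey, by = "region")
nrow(merged)
```

Running `setdiff(subset$region, unique(states$region))` on the real objects would list exactly which observations do not appear on the choropleths.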
The next two maps entitled "Women Degree Holders with a Major in Science/Engineering" and "Men Degree Holders with a Major in Science/Engineering" display the findings of the raw count statistical regression analysis conducted earlier. States with more men with science and engineering degrees also have more women with science and engineering degrees. The legend next to each will indicate the relative gap between the two genders in number of degrees. In a regional context, because of the strong direct connection between the two counts of science and engineering degrees, the maps look very similar.
choro <- choro_df[order(choro_df$order),] #order by "order" column
head(choro)
choro$breaks <- cut(as.numeric(choro$sciengfemale),breaks = seq(0,1400000, by = 100000), include.lowest = TRUE,
labels = c("0-100,000","100,001-200,000","200,001-300,000","300,001-400,000","400,001-500,000",
"500,001-600,000","600,001-700,000","700,001-800,000","800,001-900,000","900,001-1,000,000",
"1,000,001-1,100,000","1,100,001-1,200,000","1,200,001-1,300,000","1,300,001-1,400,000"))
#choro$breaks <- cut(as.numeric(choro$sciengfemale),breaks = seq(0,1500000, by = 250000), include.lowest = TRUE,
# labels = c("0-250,000","250,000-500,000","250,000-500,000",
# "500,000-750,000","750,000-1,000,000","1,000,000-1,250,000"))
qplot(long, lat, data = choro, group = group, fill = breaks, geom = "polygon",
main = "Women Degree Holders with a Major in Science/Engineering") +
scale_fill_brewer(name = "Number of Degrees", palette = "Reds")
For the raw degree count map for women above, bins of 100,000 were used, while bins of 200,000 were used for the map for men that follows. Even with the difference in intervals, the trend of men holding about 1.5 times as many science degrees as women in every state is apparent in the similar coloration of the two maps.
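The hand-typed interval labels passed to "cut" can also be generated from the break points, which avoids typing mistakes when the bin width changes. This is a sketch of one way to do it in base R; the helper names `lower`, `upper`, and `bin_labels` are illustrative and not part of the original code.

```r
# Same break points as the women's raw-count map
breaks <- seq(0, 1400000, by = 100000)

# Lower and upper edge of each bin; every bin after the first starts at edge + 1
lower <- head(breaks, -1)
upper <- tail(breaks, -1)
lower[-1] <- lower[-1] + 1

# Build labels such as "100,001-200,000" with thousands separators
bin_labels <- paste0(
  formatC(lower, format = "d", big.mark = ","), "-",
  formatC(upper, format = "d", big.mark = ",")
)

# Usage on two illustrative counts
binned <- cut(c(50000, 150000), breaks = breaks, labels = bin_labels,
              include.lowest = TRUE)
as.character(binned)
```

Changing `by = 100000` to `by = 200000` and the upper limit to 2,200,000 would give a label set of the same form for the men's map.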
choro$breaks <- cut(as.numeric(choro$sciengmale),breaks = seq(0,2200000, by = 200000), include.lowest = TRUE,
labels = c("0-200,000","200,001-400,000","400,001-600,000","600,001-800,000","800,001-1,000,000",
"1,000,001-1,200,000","1,200,001-1,400,000","1,400,001-1,600,000","1,600,001-1,800,000","1,800,001-2,000,000",
"2,000,001-2,200,000"))
qplot(long, lat, data = choro, group = group, fill = breaks, geom = "polygon",
main = "Men Degree Holders with a Major in Science/Engineering") +
scale_fill_brewer(name = "Number of Degrees", palette = "Blues")
While the raw counts suggest little regional difference between the degree counts for men and women in the sciences, the percentage of degrees in each state that are in science or engineering paints a different picture. Comparing the two percentage maps, some parts of the country rank high for men but not for women, and vice versa. Some states have high percentages for both, like California and New York, but others, like Wyoming and Florida, have high percentages for men and relatively lower percentages for women. Aside from the regional differences, the maps further the argument that more men are in the sciences than women. The degree rates are markedly lower for women, as shown earlier by the median percentages of men and women with science and engineering degrees.
choro$breaks <- cut(choro$percentsciengfemale,breaks = seq(15,45, by = 5), include.lowest = TRUE,
labels = c("15%-20%","20%-25%","25%-30%",
"30%-35%","35%-40%","40%-45%"))
qplot(long, lat, data = choro, group = group, fill = breaks, geom = "polygon",
main = "Percentage of Women Degree Holders with a Major in Science/Engineering") +
scale_fill_brewer(name = "Degree Rates", palette = "Reds")
choro$breaks <- cut(choro$percentsciengmale,breaks = seq(30,55, by = 5), include.lowest = TRUE,
labels = c("30%-35%","35%-40%","40%-45%",
"45%-50%","50%-55%"))
qplot(long, lat, data = choro, group = group, fill = breaks, geom = "polygon",
main = "Percentage of Men Degree Holders with a Major in Science/Engineering") +
scale_fill_brewer(name = "Degree Rates", palette = "Blues")
Taking the ratio of the percentage of women whose degree is in science and engineering to the corresponding percentage of men, we can see from one map, rather than two, how region impacts the relative rates of science and engineering degrees for men and women. Darker states indicate higher ratios of the women's percentage to the men's, but the ratio never reaches one. A general trend is that the west and east coasts have higher ratios than the rest of the country. Unlike the first two maps, which displayed very similar regional patterns, this ratio map distinctly displays how different parts of the country have different magnitudes of college major gender bias.
choro$breaks <- cut(choro$percentsciengfemale/choro$percentsciengmale,breaks = seq(0.35,0.85,by = 0.05), include.lowest = TRUE,
labels = c("0.35-0.40","0.40-0.45","0.45-0.50","0.50-0.55",
"0.55-0.60","0.60-0.65","0.65-0.70","0.70-0.75","0.75-0.80","0.80-0.85"))
#choro$breaks = cut(choro$percentsciengfemale/choro$percentsciengmale, 6)
qplot(long, lat, data = choro, group = group, fill = breaks, geom = "polygon",
main = "Ratio of Percentages for Women to Men with Major in Science/Engineering") +
scale_fill_brewer(name = "Ratio of Percents",
palette = "Purples")
desired_columns <- c(3,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72) #rebuilding the original "subset" dataframe, which was altered for the choropleth merging
desired_rows <- seq(2,53) #all states
subset <- bachelors[desired_rows, desired_columns]
colnames(subset) <- c("State","Total","TotalMale","TotalFemale",
"SciEngTotal", "SciEngMale", "SciEngFemale",
"SciEngRelatedTotal", "SciEngRelatedMale", "SciEngRelatedFemale",
"BusinessTotal","BusinessMale","BusinessFemale",
"EducationTotal","EducationMale","EducationFemale",
"HumanitiesTotal","HumanitiesMale","HumanitiesFemale")
subset$percentSciEngMale <- as.numeric(subset$SciEngMale)/as.numeric(subset$TotalMale)*100 #while this pair has no immediate, direct relationship
subset$percentSciEngFemale <- as.numeric(subset$SciEngFemale)/as.numeric(subset$TotalFemale)*100
subset$SciEngRatio <- subset$percentSciEngFemale/subset$percentSciEngMale
head(subset)
sort(as.numeric(subset$SciEngMale), decreasing = TRUE)[1:10]
subset[which(subset$SciEngMale == "2040615"), "State"]
subset[which(subset$SciEngMale == "1074765"), "State"]
subset[which(subset$SciEngMale == "912146"), "State"]
sort(as.numeric(subset$SciEngFemale), decreasing = TRUE)[1:10]
subset[which(subset$SciEngFemale == "1386950"), "State"]
subset[which(subset$SciEngFemale == "711283"), "State"]
subset[which(subset$SciEngFemale == "645017"), "State"]
By sorting the degree counts for the sciences in decreasing order, it is possible to identify which states the highest counts belong to using the "which" function. California has the most male and female science and engineering degree holders. New York and Texas have the next highest counts of both.
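Looking up states by pasting in the sorted values works, but it relies on exact matches against typed numbers. A sketch of an alternative is to use "order" to get the row positions of the largest values and index the state names directly. The data frame `toy` and its values are invented for illustration.

```r
# Hypothetical data with counts stored as text, as in the original CSV
toy <- data.frame(
  State      = c("A", "B", "C", "D"),
  SciEngMale = c("300", "1200", "50", "800"),
  stringsAsFactors = FALSE
)

# Row positions of the three largest counts, largest first
top_rows <- order(as.numeric(toy$SciEngMale), decreasing = TRUE)[1:3]

# State names and counts for those rows, without retyping any values
top_states <- toy$State[top_rows]
toy[top_rows, c("State", "SciEngMale")]
```

The same pattern applies to the percentage columns, where matching on long decimal strings is more fragile than matching on row positions.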
sort(as.numeric(subset$percentSciEngMale), decreasing = TRUE)[1:10]
subset[which(subset$percentSciEngMale == "51.9724320009824"), "State"]
subset[which(subset$percentSciEngMale == "50.63696225976"), "State"]
subset[which(subset$percentSciEngMale == "49.9569814248343"), "State"]
sort(as.numeric(subset$percentSciEngFemale), decreasing = TRUE)[1:10]
subset[which(subset$percentSciEngFemale == "41.6149338278437"), "State"]
subset[which(subset$percentSciEngFemale == "33.1419517414361"), "State"]
subset[which(subset$percentSciEngFemale == "32.9437484466618"), "State"]
Washington DC has the highest percentage of science and engineering degree holders among both men and women. Washington (state) and Maryland have high concentrations for men, while Massachusetts and Virginia have high levels for women.
subset[which(subset$percentSciEngMale > 46), "State"]
subset[which(as.numeric(subset$SciEngMale) > 410000), "State"]
order(subset$percentSciEngMale)[42:52]
order(as.numeric(subset$SciEngMale))[42:52]
subset[which(subset$percentSciEngFemale > 29), "State"]
subset[which(as.numeric(subset$SciEngFemale) > 290000), "State"]
order(subset$percentSciEngFemale)[42:52]
order(as.numeric(subset$SciEngFemale))[42:52]
order(subset$SciEngRatio)[42:52]
Looking at the state names and their row numbers through the "which" and "order" functions, the states with the highest concentrations of science and engineering majors for both men and women do not necessarily match the states with the highest degree counts. Likewise, the states with the higher female to male percentage ratios do not necessarily match the states with the highest female percentage values. What these figures indicate is that although there appears to be a relatively constant 3:2 ratio of male to female science and engineering degree totals across the country, the percentages of men and women in the sciences at the state level are not nearly as consistent. Therefore, the level of gender disparity in the sciences depends on the region of the country. Note that the "which" commands list the matching states in row order, which is alphabetical in this dataset, while the "order" command returns row numbers arranged so that the values ascend, which is why positions 42 to 52 are used to index the largest values.
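The difference between what "which" and "order" return is easy to see on a small example. The four-row data frame below is hypothetical; the column name `pct` stands in for any of the percentage columns.

```r
# Hypothetical percentages for four states, stored in row order
toy <- data.frame(
  State = c("Alpha", "Beta", "Gamma", "Delta"),
  pct   = c(30, 45, 25, 50),
  stringsAsFactors = FALSE
)

# which(): row numbers that pass the test, reported in row order
rows_above_40 <- which(toy$pct > 40)

# order(): row numbers arranged so that the values ascend
rows_ascending <- order(toy$pct)

# The last positions of order() therefore point at the largest values
largest_two <- toy$State[tail(rows_ascending, 2)]

rows_above_40
rows_ascending
largest_two
```

Here both approaches pick out the same two rows, but only "order" conveys their ranking, which is why the two outputs in the analysis above have to be read differently.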
summary(lm(log(as.numeric(subset$SciEngRatio)) ~ log(as.numeric(subset$SciEngTotal))))
The regression above serves to establish that the ratio map is not strongly correlated with the total number of science and engineering degrees per state. This was an area of concern because the science and engineering degree count maps and the ratio map look similar. If the ratio map were simply showing that states with more science and engineering degree holders had higher percentages of women in the sciences relative to men, this might have indicated nothing more than that more degree holders equates to more equitable percentages. The log was taken of both the dependent variable, the science and engineering gender ratio, and the independent variable, the total number of science and engineering degrees per state, so that the slope describes the percentage change in the ratio associated with a percentage change in the number of degrees. Otherwise, the difference in units would have made the coefficient difficult to interpret. A slope of just 0.03825 indicates that there is almost no relationship between the number of degrees in a state and how much of a gender bias there is in the percentages of men and women majoring in the sciences and engineering. A one percent increase in the number of degrees in a state is associated with only about a 0.038% change in the ratio.
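The reading of a log-log slope as a percentage-to-percentage effect can be checked numerically. The sketch below first recovers a known exponent from invented data (the constant 0.4 and exponent 0.04 are assumptions for illustration), then applies the same arithmetic to the reported slope of 0.03825.

```r
# Hypothetical data that follows ratio = 0.4 * total^0.04 exactly
total <- c(1e4, 1e5, 1e6, 1e7)
ratio <- 0.4 * total^0.04

# In a log-log regression the slope estimates the exponent (the elasticity)
fit <- lm(log(ratio) ~ log(total))
toy_slope <- unname(coef(fit)[2])

# Percentage change in the ratio for a 1% increase in total degrees,
# using the slope reported in the text
reported_slope <- 0.03825
pct_change <- (1.01^reported_slope - 1) * 100

c(toy_slope = toy_slope, pct_change = pct_change)
```

For small slopes the percentage change is approximately equal to the slope itself, so a coefficient of 0.03825 corresponds to a change of roughly 0.038% in the ratio.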
There is not a lot of literature that analyzes region, college major choice, and gender together in the same context. However, given that preferences are a significant factor in why women choose not to major in fields like engineering, the next step in building on the findings of this presentation is to bring regional influence into the narrative by asking whether certain parts of the country create environments in which women form more positive impressions of the sciences. In his Federal Reserve Bank of New York Staff Report, Basit Zafar found through the econometric model mentioned earlier that "60% of the gender gap in engineering is due to differences in preferences, while 30% is due to differences in how much females and males believe they will enjoy studying engineering" (Zafar 4). Other explanations have been offered for why students choose one major over another, including a strong emphasis on the connection between major choice and political ideology. One 2006 paper found that "liberal students [are] more likely to choose a non-science major" (Porter 2006). This explanation seems to run counter to the finding that the typically more liberal coastal regions of the United States have high percentages of science degrees relative to total degrees for both men and women. However, the survey used for Porter and Umbach's paper covered only one highly selective liberal arts college, and the authors acknowledged that the results cannot be extrapolated to a larger sample of students from different types of schools. Similarly, the 2015 data examined in this paper is a single snapshot in time of the relationship between gender, major, and region. Further temporal analysis should be considered to determine the changing landscape of gender imbalance in science and engineering major selection.
The complexities of gender gap analysis go beyond data limitations. The chosen scope of the inspection inherently changes the range of possible interpretations. In a study that examined not only gender but also socioeconomic status (SES), Ma found "that women from lower SES backgrounds are as likely as their male counterparts to choose a lucrative college major" and "the role of lucrative college major choice in potentially uplifting students’ and their families’ SES outweighs the traditional gender role socialization that contributes to the divergent career paths toward which men and women are oriented" (Ma 228). In a paper on citizenship status, Nores found "a higher propensity to enroll in SEM [Science, Engineering and Math] fields for foreign-born populations and a lower propensity to enroll in social sciences compared to citizens" (Nores 138). In order to completely disaggregate all of the possible effects on the gender gap in college major choice, all conceivable variables would have to be included in the analysis.
Although the spatial maps and ratios calculated suggest that different parts of the country experience different magnitudes of gender bias in college major choice, proving a geographic cause is not within the scope of this presentation. However, if government policies, educational backgrounds, or cultural differences are attached to region, then the analysis conducted may be a starting point in identifying why women in different parts of the country are, to varying degrees, systematically choosing not to go into the sciences or engineering during their undergraduate careers. Additionally, college students do not necessarily come from the same state they study in. State bias might then indicate quality discrepancies among academic institutions in particular states rather than any differences in gender equality. Better schools might have more resources for scientific and engineering research. What can be said with confidence, based on the US Census Bureau American Community Survey data, is that women are not entering the sciences and engineering at the same rate as men and that there is at least an indirect relationship between region and the level of gender disparity.
Bibliography
Daymont, Thomas N., and Paul J. Andrisani. "Job Preferences, College Major, and the Gender Gap in Earnings." The Journal of Human Resources 19, no. 3 (1984): 408-28. doi:10.2307/145880.
Ma, Yingyi. "Family Socioeconomic Status, Parental Involvement, and College Major Choices—Gender, Race/Ethnic, and Nativity Patterns." Sociological Perspectives 52, no. 2 (2009): 211-34. doi:10.1525/sop.2009.52.2.211.
Nores, Milagros. "Differences in College Major Choice by Citizenship Status." The Annals of the American Academy of Political and Social Science 627 (2010): 125-41. http://www.jstor.org/stable/40607409.
Porter, Stephen R., and Paul D. Umbach. "College Major Choice: An Analysis of Person–Environment Fit." Research in Higher Education 47, no. 4 (2006): 429-49. doi:10.1007/s11162-005-9002-3.
United States Census Bureau. "American Community Survey." 2015. Data file bachelors.csv. http://factfinder.census.gov/faces/nav/jsf/pages/index.xhtml.
Zafar, Basit. "College Major Choice and the Gender Gap." SSRN Electronic Journal (2013): 1-50. doi:10.2139/ssrn.1348219.