PROFESSIONAL FILES | SPRING 2015 VOLUME
Supporting quality data and decisions for higher education.
© Copyright 2015, Association for Institutional Research
Letter from the Editor
This volume of Professional Files brings us two articles on very different topics, but with a
common theme—relationships.
Wang introduces Social Network Analysis, a data visualization technique that focuses on the
relationships among cases, instead of just their attributes. Her three examples applying this
technique to common IR study questions will make you want to start building your own SNA
models right away!
Carpenter-Hubin, Sullivan, and Herbers share their experience building relationships with faculty through a collaborative
study of faculty workload and resources. Their insights into how to work as peers respectful of each other’s expertise can
serve as a model for our own research partnerships.
Consider this a reminder to nurture your own relationships with IR colleagues by sharing your work in AIR Professional
Files!
Sincerely,
Sharron Ronco
IN THIS ISSUE...
Article 136 Page 1
Author: Ning Wang
Applications of Social Network Analysis in Institutional Research
Article 137 Page 11
Authors: Julie Carpenter-Hubin, Jason Sullivan and Joan M. Herbers
Two Heads Are Better Than One: A Collaboration between Institutional Research and Faculty for a More Meaningful Analysis of the STEM Faculty Experience
EDITORS
Sharron Ronco, Coordinating Editor, Marquette University
Leah Ewing Ross, Managing Editor, Association for Institutional Research
Lisa Gwaltney, Editorial Assistant, Association for Institutional Research
ISSN 2155-7535
PROFESSIONAL FILE
ARTICLE 136
APPLICATIONS OF SOCIAL NETWORK
ANALYSIS IN INSTITUTIONAL RESEARCH
Ning Wang

About the Author
Ning Wang is the Director of Institutional Research at the University of California, San Francisco.

Abstract
Social network analysis (SNA), with its distinct perspective on studying relations and its exceptional capability to visualize data, should be embraced by institutional researchers as a promising new research methodology complementary to inferential and exploratory statistics. This article introduces SNA through discussion of three analytical studies on topics highly relevant to institutional research (IR): (1) double-majors, (2) gatekeeping courses, and (3) STEM pipeline leaking. The unique approach of SNA in exploring, analyzing, and presenting data has great potential for advancing IR's analytical capacity.

INTRODUCTION
Institutional research (IR) professionals frequently adopt new analytical tools and research methodologies. This has allowed more sophisticated studies to be carried out that better inform institutions' policy making, which leads in the long term to students being better served. Traditional descriptive and inferential statistics, from simple frequencies and cross-tabulations, to the whole family of regressions, to more-advanced techniques such as survival analysis and structural equation modeling, have sufficiently fulfilled a large part of IR's analytical functionality. At the same time, the large amount of data found in IR and the nature of IR research that emphasizes identification of patterns, predictions, and possible interventions, coupled with high-capacity software such as SAS, have made exploratory statistics a new frontier in IR. The recent interest in data mining and predictive modeling exemplifies this shift.

However, a missing piece of IR analytics is the study of relations. Traditional statistical methods assume observation independence; that is, they assume that observations of a study are not related to one another, but rather can be independently examined by various internal and external attributes (Chen & Zhu, 2001). The observations in higher education settings, however, often are not independent. The activities of higher education and the people involved are relational and interactive in nature. Examples of these activities include co-authorship of scholarly publications, faculty collaboration on research projects, peer influence among students with specific ethnic or social backgrounds, mentorship between faculty members and students, formation of learning communities among students with shared academic interests, and so forth. Relations also extend beyond people: for example, majors within a discipline are interrelated by overlapping course offerings, colleges and universities are interrelated by students transferring in and out, and states form a network through out-of-state student enrollment.

Such networks of relations are extensive in higher education, but few studies have addressed their dynamics and implications, partly because of the methodological limitations of inferential and exploratory statistics. As Wasserman and Faust (1994) stated in their classic book on social network analysis (SNA), “The focus on relations, and the patterns of relations, requires a set of methods and analytic concepts that are distinct from the methods of traditional statistics and data analysis” (p. 3). The inadequate understanding of relations and interactions among the various entities in higher education calls for the addition of network analysis into IR's analytical paradigm.
At the intersection of inferential statistics, exploratory statistics, and network analysis is data visualization, or the representation of data through graphical means. However, “data visualization . . . involves more than just representing data in a graphical form (instead of using a table). The information behind the data should also be revealed in a good display; the graphic should aid the readers or viewers in seeing the structure in the data” (Chen, Härdle, & Unwin, 2008, p. 6). As the well-known statistician and pioneer in data visualization Edward Tufte stated, “At their best, graphics are instruments for reasoning about quantitative information. . . . Of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful” (Tufte, 2001, p. 13).

Founded on graph theory, network analysis is exceptionally well developed in generating meaningful and intriguing visual representations of data. While charts and graphs are integral components of inferential and exploratory statistics, graphics are at the heart of network analysis. They are the way that an underlying network structure can be uncovered, while at the same time providing the vocabulary through which network properties can be described. At a time when effective communication of findings to institutions' administrators and other constituents is more important than ever to further data-driven and research-based policy making, network analysis, with its expertise in data visualization, can be especially beneficial to IR.

This article introduces SNA to the IR community. As a well-established method that has been widely used in social sciences, SNA can contribute a great deal to IR with its unique perspective on relations and its power in visual presentation. The following will (1) introduce basic concepts in SNA, (2) present three studies that used SNA, and (3) discuss issues key to successfully applying SNA in IR.

SOCIAL NETWORK ANALYSIS AND ITS BASIC CONCEPTS
SNA is inherently an interdisciplinary endeavor that uses social psychology, sociology, statistics, and graph theory. Beginning in the 1970s, the empirical study of various networks has played an increasingly important role in the social sciences. Among many of its applications, SNA has been used to understand the diffusion of innovations, the communication of news, the spread of diseases, the culture and structure of social organizations and business corporations, the formation of political views and affiliations, and so forth (Carrington, Scott, & Wasserman, 2005). More recently SNA has gained significant use in studying online communities and social media such as Facebook and Twitter.

The complicated mathematical background of SNA is beyond the scope of this article. However, it would be helpful to explain in simple terms several basic yet essential concepts used in the examples of analytical works described in this paper: vertex or node, edge, degree, directed and undirected graph, weight, modularity, and centrality.

Borrowed from graph theory, the interconnected objects in SNA are represented by mathematical abstractions called vertices (more commonly called nodes), while the links that connect some pairs of nodes are called edges. The number of edges incident upon a node is defined as its degree. Typically, a graph is depicted in diagrammatic form as a set of dots for the nodes, joined by lines or curves for the edges. When applied to a study, nodes represent the observations of the study, and edges represent the relations between those observations. If the relations are initiated from certain observations to others, the edges would be represented with arrows from the initiators to the receivers, and the graph would be directed. Conversely, if the relations between two observations are mutual, the edge would be represented with a line segment connecting the two, and the graph would be undirected. A graph is weighted if a value or a weight is assigned to each edge. Depending on the problem at hand, such weights might represent a diverse set of attributes of the relationship (Hanneman & Riddle, 2005).

For demonstration, Figure 1 is a weighted undirected graph representing a hypothetical network of faculty collaboration. Nodes 1–10 are faculty members. Edges exist between those who collaborated on grant proposals, and weights on the edges denote the number of grant proposals that the two faculty members submitted together. As seen
in the graph, Faculty 2 worked with Faculty 1 once, with Faculty 5 once, and with Faculty 3 three times on grant proposals; the node representing Faculty 2, therefore, has a degree of three and a weighted degree of five.

Figure 1. Demonstration of Basic Concepts in Social Network Analysis

Modularity is one important measure of the network structure. It divides a network into modules, also called groups, clusters, or communities. Networks possessing community structures function differently from average networks, so identification of such community structures can have substantial importance in understanding the dynamics and properties of the network. The mathematical idea of the modularity measure is to compute the difference between the number of edges falling within groups and the expected number of edges in an equivalent network where edges are placed at random (Newman & Girvan, 2004). Large differences would indicate nodes being densely interconnected while being only sparsely connected with the rest of the network, in other words, forming modules. Network analysis software can generate this measure and partition the network by its underlying community structures.

Centrality is another important measure, examining the relative importance of a node within a graph. There are three main types of centrality: degree, closeness, and betweenness. Degree centrality is defined as the number of edges that a node has. The nodes having higher degrees are related to more of the other nodes, and therefore are at positions in the network that are more central. Closeness centrality emphasizes the distance of a node to all other nodes in the network. Betweenness centrality focuses on the position of a node between pairs of nodes. A higher betweenness for a node means that more nodes depend on it to make connections with other nodes. Centrality can be evaluated with a set of statistics, such as Freeman Degree Centrality, Geodesic Path Distances, Eigenvector Centrality, Hierarchical Reduction, and so forth (Hanneman & Riddle, 2005). This article does not attempt to elaborate on details of these statistics; readers are encouraged to obtain more information (e.g., Carrington et al., 2005; Chen et al., 2008; Tufte, 1990, 2001; Wasserman & Faust, 1994). The output of the above-mentioned statistics for the hypothetical network in Figure 1 is provided in Table 1.

For SNA, however, the statistics are often not the end product. Unlike inferential and exploratory statistics, the graphs in SNA are at the core of explaining and understanding findings, as the relational statistics are incorporated into graphs through the visualization process. Figure 1 shows two modules: Module A, consisting of faculty members 1 through 5 and faculty member 10, and Module B, consisting of faculty members 6 through 9. Members of each module worked more frequently within rather than across the modules.
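The modularity idea described above (edges observed within groups minus edges expected under random placement) can be sketched in a few lines of Python. This is a minimal illustration, not the computation Gephi performs. The edge list is an assumption: it is inferred from the article's description of Figure 1 and the degrees reported in Table 1, since the figure itself is not reproduced here, and the function and variable names are hypothetical. Edge weights are ignored for simplicity.

```python
# Unweighted edge list for the hypothetical faculty network.
# ASSUMPTION: inferred from the text and Table 1, not taken from Figure 1.
EDGES = [
    (1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5), (4, 10),
    (5, 6), (5, 7), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9),
]

# Module membership as stated in the article (Module A and Module B).
MODULE_OF = {1: "A", 2: "A", 3: "A", 4: "A", 5: "A", 10: "A",
             6: "B", 7: "B", 8: "B", 9: "B"}


def modularity(edges, module_of):
    """Newman-Girvan modularity Q for an unweighted, undirected graph.

    For each module: (share of edges inside the module) minus
    (share expected if edges were placed at random, given node degrees).
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    edges_within = {}   # module -> number of edges with both ends inside
    for u, v in edges:
        if module_of[u] == module_of[v]:
            c = module_of[u]
            edges_within[c] = edges_within.get(c, 0) + 1

    degree_sum = {}     # module -> total degree of its nodes
    for node, k in degree.items():
        c = module_of[node]
        degree_sum[c] = degree_sum.get(c, 0) + k

    return sum(
        edges_within.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


q_two_modules = modularity(EDGES, MODULE_OF)
q_one_module = modularity(EDGES, {node: "A" for node in MODULE_OF})
print(round(q_two_modules, 3), q_one_module)
```

Under these assumptions the two-module partition scores well above the trivial single-module partition, which always scores zero. That gap is what the article means by a "large difference" indicating community structure.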
Table 1. Demonstration of Basic Relational Statistics Output in Social Network Analysis
Id  Label       Modularity Class  Degree  Weighted Degree  Closeness Centrality  Betweenness Centrality  Eigenvector Centrality
1   Faculty 1   0                 2       3                0.47                  0.00                    0.44
2   Faculty 2   0                 3       5                0.50                  0.01                    0.57
3   Faculty 3   0                 3       6                0.53                  0.03                    0.59
4   Faculty 4   0                 3       4                0.53                  0.22                    0.49
5   Faculty 5   0                 6       7                0.75                  0.65                    1.00
6   Faculty 6   1                 4       6                0.60                  0.18                    0.71
7   Faculty 7   1                 4       7                0.60                  0.18                    0.71
8   Faculty 8   1                 2       3                0.41                  0.00                    0.40
9   Faculty 9   1                 2       4                0.41                  0.00                    0.40
10  Faculty 10  0                 1       1                0.36                  0.00                    0.14
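To make the degree, closeness, and betweenness columns of Table 1 more concrete, the following Python sketch recomputes them for a reconstruction of the hypothetical network. Several things here are assumptions rather than the author's data: the edge list is inferred from the text and from the Degree and Weighted Degree columns, the weights inside Module B (edges 6-8, 6-9, 7-8, 7-9) are only one of several assignments consistent with the table, and the function names are illustrative. Closeness and betweenness are computed on unweighted hop counts with the usual normalizations, which is what appears to match the table; Gephi's own settings are not documented in the article.

```python
from collections import deque

# ASSUMPTION: reconstructed edges and weights (number of joint grant proposals).
WEIGHTED_EDGES = {
    (1, 2): 1, (1, 5): 2, (2, 3): 3, (2, 5): 1, (3, 4): 2, (3, 5): 1,
    (4, 5): 1, (4, 10): 1, (5, 6): 1, (5, 7): 1, (6, 7): 2,
    (6, 8): 1, (6, 9): 2, (7, 8): 2, (7, 9): 2,  # Module B weights are a guess
}


def build_adjacency(weighted_edges):
    """Return {node: {neighbor: weight}} for an undirected graph."""
    adj = {}
    for (u, v), w in weighted_edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def closeness(adj, node):
    """(n - 1) divided by the sum of hop distances; assumes a connected graph."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return (len(adj) - 1) / sum(dist.values())


def betweenness(adj):
    """Normalized betweenness via Brandes' algorithm (unweighted graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        paths = {v: 0 for v in adj}   # number of shortest paths from s
        paths[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != s:
                score[w] += dependency[w]
    n = len(adj)
    # Each unordered pair was counted from both ends, so divide by (n-1)(n-2).
    return {v: score[v] / ((n - 1) * (n - 2)) for v in adj}


adj = build_adjacency(WEIGHTED_EDGES)
between = betweenness(adj)
for node in sorted(adj):
    print(node, len(adj[node]), sum(adj[node].values()),
          round(closeness(adj, node), 2), round(between[node], 2))
```

Eigenvector centrality is left out to keep the sketch short. With the reconstructed edges, Faculty 5 comes out highest on all three measures, in line with the article's reading of the table.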
Faculty 5 worked mainly with faculty 1 through 4, but also worked once with faculty 6 and once with faculty 7, thus bridging the two modules. A closer look at the departmental affiliation shows that faculty in Module A are from the biology department, and faculty in Module B are from the psychology department. Faculty 5, a professor in biology, has research interests in neuroscience and has actively collaborated with professors in psychology. Faculty 10 is a statistician from the mathematics department who built a collegial relationship with Faculty 4 and who was once asked to work with him on a grant.

It can also be observed that Faculty 5 is at the center of the network in all three centrality measurements. Faculty 5 is identified as an active researcher in the two fields of biology and psychology by the high degree centrality (shown in Table 1 as Degree of 6 and Weighted Degree of 7), as a good collaborator with all other researchers by the high closeness centrality (shown in Table 1 as 0.75), and as the key person for promoting interdisciplinarity between the two fields by the high betweenness centrality (shown in Table 1 as 0.65).

APPLICATION OF SOCIAL NETWORK ANALYSIS IN THREE STUDIES
This section will describe the application of SNA through three examples of small-scale analytical work: (1) a study of double-majors that used the modularity measure of SNA to reveal the connectivity among majors that can inform student advising; (2) a study of gatekeeping courses that used the measure of centrality to identify major-specific and general-education courses that students failed before dropping out of the institution; and (3) a study of STEM (science, technology, engineering, and mathematics) pipeline leaking that examined students who started in STEM majors but subsequently graduated in non-STEM majors.

The three studies were conducted using the open source software Gephi (http://gephi.org). As a tool specifically developed for network analysis, Gephi has at its core a set of algorithms, called layouts, that detect and generate graphical representations of network structures. The layout ForceAtlas, for example, probably the most used force-directed layout, simulates a physical system in which nodes repulse each other like magnets, while edges attract their nodes like springs. These forces create a movement that eventually converges to a balanced state of spatialization of the nodes and edges, revealing the structure and
features of the network. Layouts have
their specialties that suit networks of
different sizes and emphasize different
features. Layouts such as ForceAtlas2
and OpenOrd work with big networks,
Circular and Radial Axis emphasize
ranking, and GeoLayout uses latitude/
longitude coordinates to visualize
geographical networks.
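The physical analogy behind force-directed layouts (nodes repel, edges pull) can be illustrated with a small simulation. The sketch below is a generic toy version written for this purpose; it is not Gephi's ForceAtlas implementation, and the function name, parameters, force formulas, and the reconstructed faculty edge list are all assumptions chosen for readability.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, repulsion=1.0,
                 attraction=0.1, max_step=0.05, seed=0):
    """Return {node: (x, y)} after a simple force-directed simulation."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels, more strongly when close together.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) + 1e-9
                push = repulsion / (dist * dist)
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Each edge pulls its two endpoints together like a spring.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= attraction * dx
            force[a][1] -= attraction * dy
            force[b][0] += attraction * dx
            force[b][1] += attraction * dy
        # Move each node along its net force, capping the step for stability.
        for n in nodes:
            fx, fy = force[n]
            magnitude = math.hypot(fx, fy)
            if magnitude > 0.0:
                scale = min(max_step, magnitude) / magnitude
                pos[n][0] += fx * scale
                pos[n][1] += fy * scale
    return {n: (p[0], p[1]) for n, p in pos.items()}


# ASSUMPTION: faculty network edges inferred from the text and Table 1.
NODES = list(range(1, 11))
EDGES = [
    (1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (3, 5), (4, 5), (4, 10),
    (5, 6), (5, 7), (6, 7), (6, 8), (6, 9), (7, 8), (7, 9),
]
layout = force_layout(NODES, EDGES)
for node, (x, y) in sorted(layout.items()):
    print(node, round(x, 2), round(y, 2))
```

The fixed seed makes the result repeatable. Production layouts such as ForceAtlas2 add refinements (for example, degree-dependent repulsion and approximation schemes for large graphs) that this toy version omits.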
The software also provides calculations
of relational statistics unique to
network analysis. Measures for
modularity and centrality, among
other statistics, can be generated with
relative ease. The statistics can then
be used in visualization; for example,
the computed modularity allows
partitioning of nodes into groups and
reveals the community structure of
the network. The statistics can also be
saved into the data set and used in other statistical analysis; for example, the eigenvector centrality value of each observation can be a new predictive variable in a regression model.

Graphs generated through Gephi are the main tool used to present findings of the three studies. Main features are shown, while detailed institution-specific figures that could have been shown as labels accompanying the nodes and edges are removed from the graphs.

Study 1: Double-Majors
Many college students concurrently pursue studies in two or more majors. Faculty and student advisors may anecdotally know some of the popular combinations of majors in their discipline; IR analysts, however, would want to approach the phenomenon of double-major with empirical evidence.

Five years (2009–13) of undergraduate degree data were compiled to ensure adequate sample size and to minimize fluctuations over the years. The data file contained majors, combinations of double-majors, and the number of students awarded degrees in each double-major. After applying the layout algorithm of ForceAtlas, the partitioning based on the statistics of modularity, and the filtering that eliminated majors with fewer than five students graduating with double-majors every year over the study period, a network structure emerged with more than 1,500 baccalaureate graduates in two of the approximately 40 or so majors (Figure 2).

Figure 2. Double-Major Combinations of Bachelor’s Degree Recipients

Majors clustered into groups based on their connections with one another after the modularity measure was employed. Three areas of study appeared prominently in the graph where double-majors concentrated: economics/business, arts/humanities, and biological sciences/psychology. Four free-standing yet strongly tied pairs of majors were also identified: international affairs/political science, housing/consumer economics, exercise and sport science/athletic training, and consumer foods/dietetics. Clustering of majors into groups provides an empirical verification that double-majors occur most often within disciplines where connectivity between course offerings, degree requirements, and administrative procedures facilitates the pursuit of double-majors. The font size of the major titles is proportionate to the weighted degree of the major, that is, the number of students in this major who also
graduated with a degree in another
major. It can be seen that finance,
psychology, biology, international
business, and economics had the most
students graduating with double-
majors. The thickness of the edges
is proportionate to the number of
students taking on the corresponding
pair of majors. It is then observed that
over the five-year period psychology/
biology, housing/consumer economics,
international affairs/political science,
finance/international business, and
finance/economics were the top
five most popular double-major
combinations.
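The two visual encodings just described (label size from the weighted degree of a major, edge thickness from the number of students in a pair of majors) both derive from a simple weighted edge list. The Python sketch below shows one way such a list might be assembled. The records, the counts, and the function name are entirely hypothetical and are not the study's data, and the filter is a simplification: it drops pairs below a total threshold rather than reproducing the article's per-year rule for majors.

```python
from collections import defaultdict

# HYPOTHETICAL records: (major A, major B, students graduating with both).
DOUBLE_MAJORS = [
    ("Finance", "Economics", 40),
    ("Finance", "International Business", 45),
    ("Psychology", "Biology", 60),
    ("Housing", "Consumer Economics", 50),
    ("Mathematics", "Music", 3),
]


def build_network(records, min_students=5):
    """Return (edges, weighted_degree) for an undirected major network.

    edges maps a sorted (major, major) pair to its student count, which
    would drive edge thickness; weighted_degree maps each major to its
    total double-major students, which would drive label size.
    """
    edges = {}
    for major_a, major_b, count in records:
        if count < min_students:
            continue  # simplified stand-in for the article's filtering step
        pair = tuple(sorted((major_a, major_b)))
        edges[pair] = edges.get(pair, 0) + count

    weighted_degree = defaultdict(int)
    for (major_a, major_b), count in edges.items():
        weighted_degree[major_a] += count
        weighted_degree[major_b] += count
    return edges, dict(weighted_degree)


edges, weighted_degree = build_network(DOUBLE_MAJORS)
top_pair = max(edges, key=edges.get)
print(top_pair, edges[top_pair])
print(sorted(weighted_degree.items(), key=lambda item: -item[1]))
```

An edge list in this shape (source, target, weight) is also the kind of table that network tools typically accept for import, though the exact column names a given tool expects should be checked against its documentation.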
As Edward Tufte (2001) stated, “Modern
data graphics can do much more than
simply substitute for small statistical
tables” (p. 9). The visual presentation
in Figure 2 of the double-major data
not only conveys information in a
more coherent and succinct fashion
than a tabular presentation, but also
reveals the data at multiple levels not
conveniently available in table form. It
provides a broad overview of the areas
of study within which double-majors
tend to form, as well as the details of
specific majors and major combinations.
As groupings of majors surface through the modularity measure of SNA, more insights emerge. These patterns of double-majors that graduates have successfully followed can serve as evidence for student advisors in their discussions with students contemplating taking on another major of study. University administrators might want to strengthen existing partnerships or explore new linkages between majors to enrich students’ educational experiences and promote their future employability.

Study 2: Gatekeeping Courses
Entry-level gatekeeping courses have been known to pose challenges to students and to potentially lead to attrition, particularly in STEM majors. It is very important for institutions focused on retaining and engaging students to help those students succeed in courses that most frequently serve as gatekeepers. Identification of these courses is inevitably the first step.

Figure 3. Failing Courses and Last Major of Undergraduate Dropouts

This study tracked students from four first-time, full-time freshmen cohorts (Fall 2004–Fall 2007) to identify dropouts, defined as those who had neither graduated nor remained enrolled six years after their initial matriculation. For those dropouts who had failing grades on record, the failed courses and the majors that they last enrolled in before leaving the institution were compiled. Over 1,500 students from 17
majors with 42 potential gatekeeping courses were included in the study.

Figure 3 is the visual representation of relations between failed courses, indicated by green nodes, and last majors, indicated by red nodes. Plotting was based on the degree centrality of the majors in this course-major network. The star network at the center of the graph made it clear that most of the dropouts left the institution with an unspecified major (in other words, they left early in their college life before declaring a major), and the many courses surrounding the unspecified major were the failed courses that could be potential hurdles to student retention. Among them, five introductory courses have prominent edges in the graph: Precalculus (MATH1113), American Government (POLS1101), Elementary Psychology (PSYC1101), Freshman Chemistry I (CHEM1211), and Basic Concepts in Biology (BIOL1103). The thickness of the edges between these courses and the unspecified major is proportionate to the number of students with an unspecified major who failed these courses. Furthermore, these five courses were actually the most challenging for students from all majors, as indicated by the size of their title in the figure. The size is proportionate to the total number of students who failed these courses regardless of their major.

The university also lost students in the other red-node majors, such as computer science, prebusiness, psychology, and biology. These majors are located on the periphery of the graph because of their relatively low centrality in this course-major network. Failing certain major-specific courses was potentially related to dropping out of these majors. For example, two foundation courses in computer science, Systems Programming (CSCI1730) and Discrete Mathematics for Computer Science (CSCI2610), were probably weeding out students. An introductory accounting course, Principles of Accounting I (ACCT2101), and an introductory economics course, Principles of Macroeconomics (ECON2105), were stumbling blocks for some students in prebusiness. The introductory statistics course (STAT2000) might have been a source of struggle for some students with sociology, speech communication, international affairs, and psychology majors.

One of the principles that Tufte (1990) suggested for the good practice of statistical graphics is “enhancing the dimensionality and density of portrayals of information” (p. 9). Figure 3 combined three dimensions of information (the gatekeeping courses, the majors that lost students, and the relationship between majors and courses) in one graph, while the same information in tabular form would have been cumbersome and lacked clarity. Instead of providing an isolated view of students and courses confined to a specific major, Figure 3 allows examination of more comprehensive course-taking patterns across majors. More importantly, the graph vividly points to possible directions for further investigation and action. University administrators might want to evaluate teaching and learning in the five introductory courses revealed as gatekeepers in the graph. Perhaps factors like a large-lecture form of pedagogy, one-way passive learning, or an emphasis on memorization over critical thinking might have contributed to the students’ failings. Strategies could then be developed to engage both the faculty and the students to change these gatekeepers into gateways of student success. The department head of biology might learn from the graph that for students intending to major in biology, Freshman Chemistry I (CHEM1211) and II (CHEM1212) together with Principles of Biology I (BIOL1107) were the most challenging courses, and that for students who succeeded in these courses and officially enrolled in biology as a major, the next set of courses in the sequence, namely Modern Organic Chemistry I (CHEM2211) and II (CHEM2212) and Principles of Biology II (BIOL1108), were roadblocks. A long-term plan focusing on building a solid foundation for further study in this major may be needed. Curriculum and pedagogy designed with intentional sequencing may help ensure adequate preparation and smooth transition of students for each section of the course sequence.

Study 3: STEM Pipeline Leaking
Government, educators, and industry leaders have long been concerned about STEM pipeline leaking, where students depart from academic and career paths in science, technology, engineering, and mathematics. According to the Business-Higher Education Forum (2010), only 4 percent of the 4 million ninth-graders in the United States in 2001 would be STEM college graduates by 2011. This study
attempted to reveal an aspect of the
leakage along the STEM pipeline by
identifying undergraduate students
in STEM majors who changed their
academic pursuit to non-STEM majors.
Students from five first-time full-time
freshmen cohorts (Fall 2003–Fall 2007)
were tracked through fiscal year 2013
for bachelor’s degree attainment.
Those who first declared a major in
STEM (based on the National Science
Foundation definition) and later
graduated in non-STEM majors, and
whose major GPA was 3.0 or above
when leaving STEM, constituted the
group for this study.
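The cohort definition above translates directly into a filter over student records, and each qualifying student contributes one unit of weight to a directed edge from the starting STEM major to the graduating non-STEM major. The Python sketch below illustrates this under stated assumptions: the record layout, the sample majors, and the function name are hypothetical, the STEM set is a placeholder rather than the National Science Foundation definition, and the `min_leavers` parameter mirrors the retention threshold of ten students that the article applies when building Figure 4.

```python
from collections import Counter

# PLACEHOLDER set; the study used the National Science Foundation definition.
STEM_MAJORS = {"Biology", "Chemistry", "Mathematics"}


def migration_edges(students, stem_majors, min_gpa=3.0, min_leavers=10):
    """Return {(first_major, final_major): count} for STEM leavers.

    students is an iterable of (first_major, final_major, major_gpa)
    tuples. Only students who started in STEM, graduated outside STEM,
    and held at least min_gpa when leaving are counted. Starting majors
    with fewer than min_leavers such students are dropped.
    """
    flows = Counter()
    for first_major, final_major, major_gpa in students:
        started_in_stem = first_major in stem_majors
        finished_outside = final_major not in stem_majors
        if started_in_stem and finished_outside and major_gpa >= min_gpa:
            flows[(first_major, final_major)] += 1

    leavers = Counter()  # total qualifying leavers per starting major
    for (first_major, _), count in flows.items():
        leavers[first_major] += count

    return {pair: count for pair, count in flows.items()
            if leavers[pair[0]] >= min_leavers}


# HYPOTHETICAL records for illustration only.
students = (
    [("Biology", "Journalism", 3.5)] * 12    # qualifying leavers
    + [("Biology", "Journalism", 2.5)]       # excluded: GPA below 3.0
    + [("Biology", "Chemistry", 3.8)]        # excluded: stayed in STEM
    + [("Chemistry", "History", 3.2)] * 3    # excluded: fewer than ten leavers
)
edges = migration_edges(students, STEM_MAJORS)
print(edges)
```

The resulting counts are the edge weights of a directed graph, so the thickness of each arrow in a migration diagram could be read straight from this dictionary.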
A directed graph using the Circular Layout was built to show the migration between majors. For focus and clarity, only STEM majors with ten or more students in the five freshmen cohorts who later graduated in non-STEM majors were retained. The results in Figure 4 represent about 800 students in eight starting STEM majors who graduated in ten non-STEM fields. The blue nodes on the left side represent starting STEM majors, sorted and sized by the number of students leaving for any non-STEM major. The yellow nodes on the right side represent ending non-STEM majors classified into disciplines by the first two digits of the major CIP code, sorted and sized by the number of students transferring in from all STEM majors. The thickness of the edge between two nodes is proportionate to the number of students changing majors.

Figure 4. STEM Major Students Graduating in Non-STEM Majors

Figure 4 is mainly descriptive. By mapping the migration of students, the status of retention and persistence in STEM majors at the institution is illuminated. The graph does not intend to address the many facets of the issue, but rather to show the non-STEM destinations for STEM majors who were in solid academic standing in their STEM major. These students might intend to pursue postgraduate professional programs, or plan for careers other than basic research, or simply want to explore studies beyond STEM. Instead of a divisive view of STEM versus non-STEM, the linkages in the graph present an opportunity for cooperation between the two fields.

A major-minor partnership can be one way to bridge the two fields. Possibilities exist for interdisciplinary collaboration between computer science and management information systems in business; mathematics and econometrics or finance in business; biology and dietetics study or nutrition science in family and consumer sciences; and so on. Certificate programs can be another option. For example, a certificate program in science journalism could be an option for students in biology or chemistry who are also interested in journalism; a program in math education could be an option for mathematics students who have an interest in education; or a program in health promotion could be a good fit for biology students aspiring to a career in health professions. If the demanding workload of a STEM major prohibits formal pursuit of another area of study, an area of emphasis that blends in courses from a relevant non-STEM major may meet students’ needs. Other possibilities may include joint projects or the incorporation of governmental, societal, or cultural implications of science and technology into the teaching of STEM.