2025.01.17 edition

Developer Insights on GitHub Pull Requests: Survey Analysis, Key Findings, and Methodologies.

NB: This article is an edited excerpt from a technical report originally published during my graduate studies in April 2024.

Postscript of a thesis. As the concluding segment of my master's research journey, I conducted an exploratory survey to gather software developers' and engineers' perspectives on pull request (PR) review processes and the quality of these reviews on GitHub. This effort aimed to understand developers' methodologies during PR reviews, the criteria they prioritize when evaluating a submitted PR, and the factors influencing the outcome and merge time of PRs. The survey responses were subsequently curated and published as a survey response dataset [3], which serves as the foundation for this analysis and its visualizations.

This article not only explores the key findings but also demonstrates a systematic approach to analyzing survey data to extract meaningful themes and insights. It examines the methodologies used to process the survey dataset, highlighting the structured and methodical approach necessary for identifying complex patterns in developer behavior. Through various analytical techniques, we aim to uncover significant trends and draw conclusions that can inform best practices and improve the GitHub pull request review process.

In the following sections, we will briefly describe the survey and the nature of the responses before presenting the analysis framework. We will then discuss the key themes, providing visualizations, tables, insights, and summaries to enhance the PR review experience based on the findings.

The survey was carefully structured into three distinct sections. The initial section focused on the participants' demographic and professional background, featuring six primary questions and an optional seventh question. To prioritize participant confidentiality, the survey was designed to safeguard anonymity! The second section transitioned to questions on PR factors and review practices. It included two multiple-choice questions and two Likert-scale questions, providing structured insights into participants' approaches and preferences. The third and final section encouraged detailed responses through two open-ended questions, offering participants an opportunity to elaborate on their PR review experiences and techniques.

The survey was live from May 3, 2023, to May 13, 2023, spanning a total of 10 days. Because some participants could not complete it within that timeframe, the deadline was extended for a select few—four participants, to be precise—and this extension yielded three additional responses beyond the original window. In total, 22 completed responses were collected. Although the response volume might appear modest, it reflects a commendable rate of 34% (22 out of 65 genuinely interested participants!), far exceeding the established minimum response rate of 10% in similar studies [1].

A tabulated breakdown highlighting the nature and type of questions incorporated within the survey is outlined in Table 1.

| Question Description | Type | Additional Details |
| --- | --- | --- |
| Demographic information: role; tenure in role; years in geographically distributed projects; region of residence; average monthly PRs reviewed (workload) | — | — |
| Select the type of merge that you use predominantly | Multiple Choice Question | Options: (i) GitHub Merge, (ii) Squashing, (iii) Cherry-picking |
| Rate the factors that affect the outcome of the PR review process | Likert-scale | Scale: Strongly Agree, Agree, Undecided, Disagree, Strongly Disagree |
| During the review process, where do you most commonly provide the comments? | Multiple Choice Question | Options: (i) On PR itself, (ii) On PR Code/In-line Code, (iii) In Individual Commits |
| What steps do you follow when asked to review a PR? | Open-ended Question | — |
| What factors do you use to examine the quality of the contributions? | Open-ended Question | — |

Table 1: Survey Questions and Response Types for GitHub PR Review Process Study


Analysis Framework

To provide a systematic understanding of the survey results, this section outlines the methodologies employed for analyzing the various types of questions included in the survey. The approaches were tailored to each question type, ensuring accurate and insightful interpretations of the data. Structured methods and visualization techniques were applied to examine the findings thoroughly and identify key patterns and potential themes.

Demographic Questions. The responses in the first section, focusing on participants' backgrounds and work habits, were analyzed through various visualizations. These representations provided insights into their professional experience, roles in the software industry, and typical workloads, offering a comprehensive view of the diversity among respondents.

Likert-scale Questions. In the second section of the survey, two Likert-scale questions were used to assess developers’ perceptions of the influence of various PR factors on PR review outcomes and merge times. A 5-point Likert scale, ranging from Strongly Agree to Strongly Disagree, measured the perceived significance of each PR factor. Selecting Strongly Agree for a factor indicated that respondents perceived it as having a significant impact, while Strongly Disagree reflected the opposite. To quantify these perceptions, a weighted average for each PR factor was calculated by assigning numerical values to each response:

$$\text{Strongly Agree} \mapsto w_5, \quad \text{Agree} \mapsto w_4, \quad \text{Undecided} \mapsto w_3, \quad \text{Disagree} \mapsto w_2, \quad \text{Strongly Disagree} \mapsto w_1$$

Typically, $w_5 = 5$, $w_4 = 4$, $w_3 = 3$, $w_2 = 2$, $w_1 = 1$. The number of responses corresponding to each option is represented as:

$$r_5, \; r_4, \; r_3, \; r_2, \; r_1$$

(for Strongly Agree through Strongly Disagree, respectively)

The weighted average for any PR factor, X, is calculated as:

$$\mathrm{weighted\_avg}_X = \frac{w_5 r_5 + w_4 r_4 + w_3 r_3 + w_2 r_2 + w_1 r_1}{R}$$

Here, $R$ represents the total number of responses, calculated as $R = r_5 + r_4 + r_3 + r_2 + r_1$.

The representative response, or effective response, for PR factor $X$ corresponds to the Likert-scale option whose weight is closest to $\mathrm{weighted\_avg}_X$. These weighted averages were visualized to provide a comprehensive overview, with detailed interpretations presented in the subsequent section.
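To make the procedure concrete, here is a minimal Python sketch of the weighted-average and effective-response computation, using hypothetical response counts rather than the survey's actual data:

```python
# A minimal sketch of the weighted-average computation described above.
# The response counts used here are hypothetical, not the survey's actual data.

LIKERT_WEIGHTS = {
    "Strongly Agree": 5,
    "Agree": 4,
    "Undecided": 3,
    "Disagree": 2,
    "Strongly Disagree": 1,
}

def weighted_average(counts):
    """Return sum(w_i * r_i) / R for one PR factor."""
    total = sum(counts.values())  # R = r5 + r4 + r3 + r2 + r1
    return sum(LIKERT_WEIGHTS[opt] * n for opt, n in counts.items()) / total

def effective_response(avg):
    """Return the Likert option whose weight is closest to the weighted average.

    A tie (e.g. an average of 4.5) resolves to the higher option,
    because min() keeps the first minimum in dict insertion order.
    """
    return min(LIKERT_WEIGHTS, key=lambda opt: abs(LIKERT_WEIGHTS[opt] - avg))

# Hypothetical counts for a single factor:
counts = {"Strongly Agree": 10, "Agree": 9, "Undecided": 2,
          "Disagree": 1, "Strongly Disagree": 0}
avg = weighted_average(counts)
print(f"weighted_avg = {avg:.2f} -> effective response: {effective_response(avg)}")
# weighted_avg = 4.27 -> effective response: Agree
```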

Open-ended Questions. In the third section of the survey, which comprised two open-ended questions, the grounded theory methodology [2] was employed to analyze and interpret the qualitative data. This systematic approach, commonly used in the social sciences, builds theory inductively from data, ensuring findings reflect participants' authentic experiences and perspectives. The methodology proceeded in three stages (open coding, axial coding, and theme finalization), which are described in detail in the thematic analysis sections below.
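As a rough illustration (not the actual analysis code), the sketch below represents these three stages as explicit mappings in Python. The code labels are drawn from Table 2, but the response fragments are hypothetical examples, not verbatim survey responses:

```python
# Hypothetical illustration of the three grounded-theory stages as mappings.
# The response fragments below are invented; the labels come from Table 2.

# Open coding: each response fragment is labeled with a specific code.
open_codes = {
    "I read the PR title and description first": "Read PR title/description",
    "I leave in-line comments on the changed lines": "In-line comments",
}

# Axial coding: related open codes are grouped into broader categories.
axial_groups = {
    "Read PR title/description": "Understanding Changes",
    "In-line comments": "Comments and Communication",
}

# Theme finalization: axial categories are consolidated into final themes.
final_themes = {
    "Understanding Changes": "Understanding and Analyzing Changes",
    "Comments and Communication": "Collaboration and Communication",
}

# Trace each fragment through all three stages.
for fragment, code in open_codes.items():
    print(f"{fragment!r} -> {code} -> {final_themes[axial_groups[code]]}")
```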


Geographical and Professional Distribution of Participants

The survey drew participants from various parts of the world, reflecting a diverse range of backgrounds and experiences. A breakdown reveals:

  1. Asia: The dominant contributor, with 13 participants from the region.
  2. North America: A total of 9 participants represented this region.

This distribution aligns with my recent relocation from an Asian country to Canada and the predominant use of LinkedIn for participant recruitment, making it logical to observe a significant number of responses from Asia and North America.

Examining the professional roles of participants highlights two key observations:

  1. Variety in Job Titles: Among the 22 participants, there was a diverse range of roles, encompassing 13 distinct job titles. This underscores the varied specializations within the software development field.
  2. Classification of Roles: Given the primary objective of targeting software developers and engineers, a manual review of job titles was conducted. This revealed:
    • Software Developer: 4 participants aligned closely with this designation.
    • Software Engineer: The roles of 18 participants fell under the broad category of ‘Software Engineer,’ reflecting its overarching nature in the software development domain.

Analyzing Developers' Workload and Experience Patterns

Developers’ workloads were initially evaluated using a scatter chart that compares participants against their PR workload (PRs reviewed), as illustrated in Fig. 1. The dataset—though limited to 22 responses—revealed an outlier with one participant managing an average of 100 PRs monthly!

Figure 1: Scatter Plot of Participants vs. Mean Monthly PR Workload with Average and Outlier Reference Lines

To better interpret the distribution, three reference lines were added to the scatter plot: the mean workload and two lines representing one standard deviation above and below the mean. In a typical normal distribution, most data points fall within this range, with points outside considered potential outliers.
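A minimal sketch of this reference-line construction, assuming made-up workload values and the usual numpy/matplotlib stack, might look like this:

```python
# Sketch of Fig. 1's reference lines; the workload values are made up.
import numpy as np
import matplotlib.pyplot as plt

workloads = np.array([5, 8, 10, 12, 15, 18, 20, 22, 25, 100])  # hypothetical
mean, std = workloads.mean(), workloads.std()

plt.scatter(range(len(workloads)), workloads)
plt.axhline(mean, linestyle="--", label=f"mean = {mean:.1f}")
plt.axhline(mean + std, linestyle=":", label="mean + 1 std")
plt.axhline(mean - std, linestyle=":", label="mean - 1 std")
plt.xlabel("Participant")
plt.ylabel("Mean monthly PRs reviewed")
plt.legend()
plt.show()

# Points outside the one-standard-deviation band are potential outliers.
print("Potential outliers:", workloads[np.abs(workloads - mean) > std])
```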

To explore the relationship between a developer’s experience and PR workload, additional analysis was performed, as depicted in Fig. 2. The analysis involved (i) defining parameters for comparison (tenure in current roles and experience with geographically distributed projects), (ii) visualizing these metrics in a scatter plot, and (iii) color-coding points based on participants’ broader role designations. The resulting plot revealed no clear correlation, suggesting that PR workloads vary widely, unaffected by participants’ professional experience or their involvement in distributed projects.

Figure 2: Comparative Scatter Plot of Experience Level in Current Role and Experience in Distributed Projects against Mean Monthly PRs (Workload), Categorized by Role - Excluding the Outlier

Participants were further grouped by years of experience to analyze workload trends. This process, sketched in code after the list, involved:

  1. Segmentation Based on Experience: Participants were classified into two-year experience intervals.
  2. Average Workload Computation: The average PRs handled per month were calculated for each segment.
  3. Visualization with a Bar Chart: A bar chart was created to display these averages (Fig. 3). The outlier participant was excluded to ensure data integrity!
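Assuming the responses live in a simple table (the column names below are hypothetical), the binning and aggregation can be sketched with pandas:

```python
# Sketch of the experience-bin aggregation behind Fig. 3 (hypothetical data).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "years_in_role": [1.0, 1.5, 3.0, 3.5, 5.0, 11.0],  # made-up tenures
    "monthly_prs":   [25,  19,  21,  19,  15,  5],      # made-up workloads
})

# 1. Segmentation: classify participants into two-year experience intervals.
bins = [0, 2, 4, 6, 8, 10, 12]
labels = ["0-2", "2-4", "4-6", "6-8", "8-10", "10-12"]
df["experience_bin"] = pd.cut(df["years_in_role"], bins=bins,
                              labels=labels, right=False)

# 2. Average workload computation per segment.
avg_by_bin = df.groupby("experience_bin", observed=True)["monthly_prs"].mean()

# 3. Visualization with a bar chart.
ax = avg_by_bin.plot(kind="bar")
ax.set_xlabel("Years in current role")
ax.set_ylabel("Average monthly PRs reviewed")
plt.show()
```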

The analysis revealed that developers with 0–2 years of experience had the highest average workload at 22 PRs per month, followed by those with 2–4 years of experience averaging 20 PRs. Participants with 4–6 years of experience reported a reduced average of 15 PRs, while the most experienced group (10–12 years) reported the lowest average of 5 PRs! It is important to note that only one participant fell into the 10–12 year bracket, warranting caution in drawing broad conclusions from this datapoint.

Figure 3: Bar Chart of Average Mean Monthly PR Workload by Experience Bins (2-Year Intervals) in Current Role, Excluding Outlier

Summary. Developers with less experience (0–4 years) handle higher PR review workloads, whereas those with greater experience tend to report lower workloads.


Developer Preferences for PR Merges and Review Interaction

In the pull-based development workflow, developers have significant flexibility in handling PRs. They may integrate only a specific portion of a PR that benefits the project or merge the entire PR. When merging an entire PR, they face another decision: retain the original commits from the PR, which preserves a detailed historical record, or squash these commits into a single commit. Squashing ensures a more streamlined and tidy commit history on the main branch.

Understanding developers’ preferences for these merging techniques provides insights into their workflow priorities—whether they emphasize historical traceability or prefer a cleaner commit history. To investigate this, the survey included a multiple-choice question on merging strategies. As shown in Fig. 4, 13 out of 22 respondents primarily relied on the built-in GitHub Merge feature for their PRs. The Squashing method was favored by 8 participants, while Cherry-picking was the least utilized, with just 1 participant selecting this option.

Figure 4: Predominant PR Merge Types Selected by Participants

Comments are a crucial component of the PR review process, functioning as a key channel for feedback, suggestions, and fostering discussions among developers. Identifying where developers prefer to place comments can provide valuable insights into their interaction habits and focus areas during PR reviews.

Figure 5: Participants’ Preferred Locations for Providing Comments During the PR Review Process (No Responses for ‘In Individual Commits’ Option)

To explore this aspect, the survey included a multiple-choice question on preferred commenting locations during PR evaluations. The results, illustrated in Fig. 5, highlight a clear preference: 77.3% of respondents predominantly used in-line code comments for providing feedback. Another 22.7% preferred commenting on the PR itself. Notably, none of the participants opted for commenting within individual commits, suggesting that this approach is either less favored or less familiar among the surveyed group.

Summary. Developers mainly use GitHub Merge and Squash for PR integration, with minimal reliance on Cherry-picking. In-line code comments dominate PR reviews, followed by general PR comments, while commenting on individual commits is rare.


Perceptions of Influential Factors for PR Outcome

To identify key factors influencing the outcome of a PR review, a Likert-scale question was employed. Participants rated their agreement with various factors, and values were assigned to the five response options (Strongly Agree, Agree, Undecided, Disagree, Strongly Disagree) to compute weighted averages, capturing the overall sentiment for each factor.

Figure 6: PR Outcome Factors - Effective Response Analysis

Figure 6 highlights factors considered significant for PR outcomes, including Test Inclusion (weighted average: 4.36), PR Reviewer Experience (4.27), Technical Debt (4.18), PR Size in Lines of Code (LOC) (4.14), Quality of PR Description (4.09), and CI Build Status (4.09). These factors consistently elicited an Agree response as the predominant sentiment.

Notably, no factor received a predominant response of Disagree, Strongly Disagree, or Strongly Agree. Instead, responses largely clustered around the Agree and Undecided scales.

A closer examination of response distributions across Strongly Agree, Agree, Disagree, and Strongly Disagree, as shown in Fig. 7, reveals nuanced perceptions. For factors like Test Inclusion and Quality of PR Description, most participants selected Strongly Agree or Agree. Conversely, factors such as PR Size, PR Reviewer Experience, Source Churn, and Technical Debt elicited minimal dissent, with only 1 or 2 responses falling under Disagree. When considering the Undecided scale, factors like Source Churn and Quality of PR Description showed relatively higher ambiguity, with 4 responses each marked as Undecided. This divergence suggests a more variable perception of Source Churn's influence compared to other factors.
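The kind of contrast shown in Fig. 7 can be reproduced with a grouped bar chart; the counts in this sketch are hypothetical placeholders, not the survey's actual per-factor tallies:

```python
# Grouped bar chart contrasting response distributions (hypothetical counts).
import numpy as np
import matplotlib.pyplot as plt

factors = ["Test Inclusion", "Source Churn"]
scales = ["Strongly Agree", "Agree", "Disagree", "Strongly Disagree"]
counts = np.array([  # counts[i][j]: responses for factors[i] on scales[j]
    [10, 9, 1, 0],
    [4, 11, 2, 0],
])

x = np.arange(len(factors))
width = 0.2
for j, scale in enumerate(scales):
    plt.bar(x + j * width, counts[:, j], width, label=scale)
plt.xticks(x + 1.5 * width, factors)  # center tick labels under each group
plt.ylabel("Number of responses")
plt.legend()
plt.show()
```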

These findings underscore the perceived importance of Test Inclusion, Quality of PR Description, PR Size, PR Reviewer Experience, and Technical Debt in shaping PR review outcomes. They highlight the multifaceted nature of PR assessments and the necessity for evaluation criteria that integrate both quantitative and qualitative metrics.

Figure 7: Distribution of Participant Responses on Influential Factors to PR Outcome: A Contrast Across Strongly Agree, Agree, Disagree, and Strongly Disagree Scales

Summary. Developers’ perception of the most influential factors for PR outcome: Test Inclusion, PR Reviewer Experience, Technical Debt, PR Size in LOC, Quality of PR Description, and CI Build Status.


Perceptions of Influential Factors for PR Merge Time

A 5-point Likert scale was employed again to identify the factors perceived to influence PR merge time. Values were assigned to each scale point, and a weighted average was calculated for every surveyed factor. This method enabled the determination of the effective response for all factors.

Figure 8: PR Merge Time Factors: Effective Response Analysis

As shown in Fig. 8, PR Reviewer Workload emerged as the most significant factor, achieving a weighted average of 4.5, indicative of a Strongly Agree response. Other notable factors included PR Size (weighted average: 4.27), Test Inclusion (4.05), and PR Reviewer Experience (4.05), all of which aligned with an effective response of Agree.

No surveyed factor elicited a predominant response of Disagree or Strongly Disagree. Most responses were distributed between Undecided and Agree, with PR Reviewer Workload standing out as the sole factor achieving a Strongly Agree consensus.

Further analysis of response distributions across Strongly Agree, Agree, Disagree, and Strongly Disagree scales, as illustrated in Fig. 9, revealed that PR Reviewer Workload and PR Size received unanimous agreement, with no responses in the Disagree or Strongly Disagree categories. Test Inclusion and CI Build Status followed closely, with each receiving 1 Disagree response. Notably, PR Reviewer Experience garnered 1 response in the Strongly Disagree category.

Incorporating the Undecided responses provided additional nuance: PR Size and Test Inclusion each recorded 2 Undecided responses, while CI Build Status had 4. Although overall agreement was observed, the presence of Undecided responses reflects varying levels of uncertainty. These findings suggest that PR Reviewer Workload, PR Size, PR Reviewer Experience, and Test Inclusion are widely regarded as influential to PR merge time, with the distribution of responses highlighting the intricate interplay of these factors and their varying degrees of perceived importance.

Figure 9: Distribution of Participant Responses on Influential Factors to PR Merge Time: A Contrast Across Strongly Agree, Agree, Disagree, and Strongly Disagree Scales

Summary. Developers’ perception of the most influential factors for PR merge time: PR Reviewer Workload, PR Size in LOC, Test Inclusion, and PR Reviewer Experience.


Analysis and Thematic Interpretation of the PR Review Process

For the open-ended question regarding participants’ procedures in reviewing PRs, responses underwent manual examination to correct minor spelling errors. Specifically, “coment” was corrected to “comment” in two instances, and “reviewr” was amended to “reviewer” in one instance. Following these corrections, a manual coding methodology was applied to analyze and distill the responses. Using the grounded theory method, the analysis was conducted in three stages—ultimately identifying the overarching themes detailed in Table 2:

Open Coding. Individual ideas, insights, or perspectives were identified and labeled with specific codes.

Axial Coding. Relationships between individual codes were examined, grouping them into broader themes or categories.

Theme Finalization. Each theme was explored in greater depth to fully understand its nuances. The finalized themes are as follows:

  1. Understanding and Analyzing Changes: Participants emphasized thoroughly understanding the proposed changes within a PR. This process involves carefully reviewing the PR title, description, and file changes to assess technical details and verify alignment with project requirements, scope, and objectives. Such an approach fosters a detailed understanding, forming the basis for a comprehensive and informed review.
  2. PR Documentation Standards and Conventions: Adherence to commit conventions and documentation standards was highlighted as essential for maintaining consistency across the codebase. Participants stressed the importance of clear, standardized supporting documentation to ensure accessibility for all stakeholders.
  3. Collaboration and Communication: The role of teamwork in the PR review process was underscored, with participants noting the importance of clear communication. In-line comments and attachments, such as files and test results, were used to provide evidence-based feedback and facilitate constructive discussions.
  4. Coding Conventions and Organization: Attention to coding structure and standardization was deemed crucial for avoiding technical debt and ensuring maintainable, coherent code. Adherence to predefined conventions, such as consistent variable naming and style guidelines, was emphasized.
  5. Code Quality Assurance, Optimization, and Reusability: Participants highlighted examining code design, complexity, and efficiency. Principles like “Don’t Repeat Yourself” (DRY), optimization, and reusability were frequently mentioned as vital for maintaining high-quality, maintainable code.
  6. External Resources Evaluation: Careful analysis of external tools, libraries, and tests in a PR was identified as a key step. Participants focused on ensuring these resources met project standards without introducing unforeseen complexities.
  7. Testing and Verification: Comprehensive testing and verification were integral to ensuring the validity and security of changes. Participants stressed the importance of reviewing components, test results, and security checks while verifying successful builds and tests.
  8. Review Process Formalization: Several responses referenced an existing “in-built review process.” These structured, built-in review workflows keep evaluations consistent and aligned with predefined criteria, promoting efficiency and alignment with project goals.

| Initial Codes | Axial Coding | Final Themes |
| --- | --- | --- |
| Read PR title/description; Check feature/ticket description; Review file changes only; Commit conventions; PR Documentation | Understanding Changes; Understanding Requirements; Understanding Scope and Objective | Understanding and Analyzing Changes; PR Documentation Standards and Conventions |
| Approve/reject changes; Comment on code; In-line comments; Attach files, images in PR, test results | Feedback and Decision Making; Comments and Communication; Supporting Materials | Collaboration and Communication |
| Coding Conventions; Technical Debt; Variable naming conventions; Ensure coding style and guidelines | Coding Standards; Coding Structuring | Coding Conventions and Organization |
| Check code design; Check complexity; Check for optimization; Check if unnecessary code; Check code quality; DRY principle; Code reusability; Code maintainability | Complexity Analysis; Naming and Design Consistency; Code Quality Checks; Code Efficiency and Optimization; Code Maintainability and Reusability | Code Quality Assurance, Optimization and Reusability |
| Checking tests, libraries used | Libraries and Tools Evaluation | External Resources Evaluation |
| Review components, test results; Security checks; Test changes; Verify all changes; Check build and tests passed | Testing; Verification; Build and Test Validation; Security and Compliance | Testing and Verification |
| In-built review process | Specific Review Strategies | Review Process Formalization |

Table 2: Analysis of Steps Taken in PR Reviews: Initial Codes, Axial Codes, and Final Themes

Summary. The analysis of PR review practices reveals a methodical and detailed approach. Key aspects include a comprehensive understanding of changes, adherence to documentation and coding standards, collaboration, quality assurance, resource evaluation, rigorous testing, and structured review protocols. These elements collectively support high code quality, project alignment, and a collaborative development environment.


Analysis and Thematic Interpretation of Factors to Ascertain Quality of PRs

For the open-ended question regarding the factors used by participants to assess the quality of PRs, individual responses were first reviewed for spelling errors. One correction was made, changing “readablity” to “readability.” Following this, a manual coding methodology was applied to analyze and summarize the responses. Using the grounded theory method, the analysis proceeded through three stages, detailed below and summarized in Table 3:

Open Coding. Individual ideas, insights, or perspectives were identified and labeled with specific codes.

Axial Coding. The relationships between the individual codes were examined and grouped into broader themes or categories.

Theme Finalization. The themes were refined to capture their nuances fully as follows:

  1. Coding Conventions and Readability: Adherence to established coding conventions and ensuring code readability were emphasized. Participants highlighted the importance of consistent naming, styling, and organized formatting. Readable code fosters a shared understanding among team members and simplifies future maintenance. These aspects promote quality and collaboration within development projects.
  2. Code Quality and Optimization: Code quality and efficiency were critical considerations. Participants assessed correctness through testing, ensuring alignment with project goals and absence of errors. Performance optimization was another priority, involving the use of efficient library functions, adherence to best practices, and avoidance of unnecessary complexity. This underscores a balance between writing precise, functional code and enhancing performance through optimization.
  3. Code Reusability and Maintainability: In the responses, this theme emerged as a significant factor in evaluating PR quality. Participants valued code that could serve multiple purposes within the project or across future projects. Maintainability, defined as the ease of understanding and modifying code, was equally prioritized. Together, these principles facilitate long-term project scalability and adaptability.
  4. Contributor Role and Experience: Participants also considered the contributor’s role, experience level, and historical contributions. The perceived quality of a PR often reflects trust in the contributor, built through consistent, high-quality submissions. This theme highlights the interplay between technical evaluation and the contributor's credibility and experience!
  5. PR Documentation, Size, and Comment Conventions: The clarity of PR documentation, inline comments, and appropriate PR size were recurring factors. Participants valued comprehensive explanations and comments that clarify complex code. Additionally, the size of the PR often provided insight into its focus and comprehensiveness. These elements collectively support understanding, standardization, and maintainability.
  6. Testing and Verification: Systematic testing and verification processes emerged as critical factors in quality assessment. Participants emphasized the need for unit tests, corner case evaluations, and robust error handling. Security measures, well-defined inputs and outputs, and structured problem-solving approaches were also prioritized. These practices ensure the reliability, robustness, and integrity of the code.

| Initial Codes | Axial Coding | Final Themes |
| --- | --- | --- |
| Code Clarity and Style; Coding Conventions; Code Readability | Code Readability and Clarity; Code Standards and Formatting | Coding Conventions and Readability |
| Code Testing; Code Correctness; Performance Optimization; Utilization of Library Functions; Code Complexity; Code Efficiency | Code Quality and Correctness; Code Efficiency and Optimization; Code Efficiency through Libraries | Code Quality and Optimization |
| Code Maintainability; Reusability | Code Reusability and Maintainability | Code Reusability and Maintainability |
| Trust in Contributors; PR Author Contribution Frequency; PR Author Experience; SDE Level Consideration | Trust and Contribution Frequency; SDE Level Consideration; Experience and Contributor History | Contributor Role and Experience |
| Documentation; Comments and Unit Test; Code Comments; Lines of Code | PR Documentation and Comments; PR Size and Composition | PR Documentation, Size and Comment Conventions |
| Security Measures; Test Results; Corner Test Cases; Error handling; Approach to Problem Statement; Presence of Sample Input/Output | Security Compliance; Unit Testing; Problem-solving and Error Handling; Code Testing and Examples | Testing and Verification |

Table 3: Analysis of Factors Used to Examine PR Quality: Initial Codes, Axial Codes, and Final Themes

Summary. The study of PR quality factors revealed six key themes that emphasize a balance between technical proficiency, the individual contributor’s role, and collaboration. Participants emphasized the importance of coding standards, efficiency, reusability, trust in contributors, clear documentation, and rigorous testing. These findings illustrate the comprehensive methodology adopted by software professionals, underscoring their focus on maintaining high standards of quality, teamwork, and individual accountability in the PR review process.


A Graduate's Guide to Technical Reporting. The visualizations, datasets, methodologies, and other materials discussed in this article are provided for reference and use. For any errors, clarifications, or questions, feel free to reach out via email. However, please ensure proper citation of the original works listed in the references, and especially cite the full thesis if utilizing this content. The complete thesis delves deeper into the survey dataset, methodologies, survey design, execution, and the underlying reasoning. It can be accessed here: Master's Thesis: Reinforcement Learning for GitHub Pull Request Predictions: Analyzing Development Dynamics [web].


Citation:

@MASTERSTHESIS{Joshi2023_rl,
title = "Reinforcement Learning for {GitHub} Pull Request Predictions: Analyzing Development Dynamics",
author = "Joshi, Rinkesh",
school = "Carleton University",
month = nov,
year = 2023,
language = "en" }

References:

  1. R. M. Groves, F. J. Fowler Jr, M. P. Couper, J. M. Lepkowski, E. Singer, and R. Tourangeau, Survey methodology. John Wiley & Sons, 2011.
  2. J. M. Corbin and A. Strauss, “Grounded theory research: Procedures, canons, and evaluative criteria,” Qualitative Sociology, vol. 13, no. 1, pp. 3–21, Mar 1990. [Online]. Available: https://doi.org/10.1007/BF00988593
  3. R. Joshi, “GitHub Pull Request Analysis: Sentiment Data and Developer Survey Responses”. Zenodo, Aug. 2023. doi: 10.5281/zenodo.10049493.

Tools/Libraries/Datasets: