Exploring Bibliometric Methods: Citation Analysis in Research

    This article will explore citation analysis and its various methods, tools and applications. Our goal is to provide you with a comprehensive understanding of what citation analysis is, why it is important, and how it can be used to evaluate and enhance your research.

    Citation Analysis and Its Importance

    What is citation analysis?

    At its core, citation analysis is a set of methods used to measure the impact and influence of scholarly works by examining the patterns and frequency of citations. It is a key component of bibliometrics and scientometrics, which are broader fields that use quantitative methods to study various aspects of scientific research and communication.

    What are bibliometrics and scientometrics?

    Bibliometrics is the application of statistical methods to study bibliographic data, especially in scientific, library and information science contexts. It is closely associated with scientometrics, which involves analyzing scientific metrics and indicators.

    Bibliometric indicators can be broadly categorized into three main types:

    • Quantity indicators assess a researcher’s productivity by quantifying their research output.
    • Quality indicators evaluate a researcher’s performance by measuring the impact and significance of a researcher’s work.
    • Structural indicators analyze the relationships and connections between publications, authors and research fields.

    Citation analysis is a tool for understanding the flow of ideas and the evolution of knowledge within and across different fields. By tracking citations, we can identify influential works, map out research fronts, and gain insights into the structure and dynamics of scholarly networks.

    The origins of citation analysis

    The origins of citation analysis can be traced back to the pioneering work of Eugene Garfield in the 1950s and 1960s. Garfield developed the concept of the Science Citation Index, which laid the foundation for modern citation databases and metrics. Since then, citation analysis has evolved into a sophisticated and multifaceted field with a wide range of methods and applications.

    Over the years, various citation analysis methods have been developed and refined. Here is a brief chronological overview of some key milestones:

    1. 1960s: Eugene Garfield introduces the Science Citation Index and the concept of the journal impact factor. M. M. Kessler introduces bibliographic coupling (1963) as a method for linking documents that share common references.
    2. 1970s: Co-citation analysis is developed (Henry Small, 1973) to identify clusters of related documents based on how often they appear together in reference lists.
    3. 2000s: Jorge E. Hirsch proposes the h-index (2005) as a way to quantify a researcher’s scientific output based on their publication and citation records. Network-based approaches such as PageRank and Eigenfactor gain popularity for analyzing citation networks.
    4. 2010s: Altmetrics emerge as a way to capture the online impact of research. Text mining and natural language processing techniques are increasingly used to extract insights from citation data, and machine learning is applied to citation analysis tasks.

    Purpose and significance of citation analysis in research

    So why is citation analysis so important in research? There are several key reasons:

    1. Evaluating research impact: Citation analysis provides a quantitative measure of the impact and influence of individual researchers, publications, journals, and institutions. It can help identify highly cited works that significantly contribute to a field.
    2. Mapping research trends: By analyzing citation patterns over time, we can track the emergence and evolution of research topics, identify hot spots of activity, and forecast future directions.
    3. Facilitating literature search: Citation-based methods such as co-citation and bibliographic coupling can help researchers navigate the vast landscape of scientific literature and discover relevant papers in their field.
    4. Informing research policy: Citation data is often used by funding agencies, universities and governments to evaluate research performance, allocate resources and make policy decisions.
    5. Enhancing scholarly communication: Citation analysis can shed light on the flow of ideas between disciplines, the formation of research communities and the role of collaboration in scientific progress.

    Citation analysis is invaluable for making sense of the ever-growing body of scientific knowledge. It allows us to cut through the noise and identify the most significant and influential works in a field. At the same time, it is important to use citation data responsibly and in conjunction with other indicators of research quality and impact.

    Introduction to citation metrics

    Citation metrics are quantitative measures that use citation data to assess the impact and influence of research outputs. Some of the most commonly used metrics include:

    • Citation count: The total number of times a publication has been cited by other works. While simple, citation count can be a useful indicator of a paper’s popularity and influence.
    • Journal impact factor: The number of citations received in a given year by items a journal published in the previous two years, divided by the number of citable items the journal published in those two years. The impact factor is often used to assess the relative importance of journals within a field.
    • h-index: Proposed by physicist Jorge E. Hirsch in 2005, the h-index quantifies a researcher’s scientific productivity based on their publication and citation record. A researcher has an h-index of h if h is the largest number such that h of their papers have each been cited at least h times.
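
    The definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculation used by any particular database: the function names and the example numbers are hypothetical, and real providers apply their own rules about which items count as citable.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def two_year_impact_factor(citations_this_year, citable_items_prior_two_years):
    """Citations received this year to items from the previous two years,
    divided by the number of citable items published in those two years."""
    if citable_items_prior_two_years <= 0:
        raise ValueError("the journal needs at least one citable item")
    return citations_this_year / citable_items_prior_two_years


# Hypothetical citation counts for one researcher's five papers.
example_counts = [10, 8, 5, 4, 3]
print(h_index(example_counts))           # 4
print(two_year_impact_factor(200, 100))  # 2.0
```

    Note that the h-index ignores how far above h the top papers are cited: the researcher above would keep an h-index of 4 even if the first paper had 1,000 citations.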

    These metrics form the foundation of modern citation analysis. However, it is important to recognize their limitations and to use them only in appropriate contexts.

    The role of citation metrics in evaluating research impact

    Citation metrics play a central role in evaluating the impact and significance of research outputs at various levels – from individual researchers to institutions and countries. They are widely used by funding agencies, hiring committees and tenure review boards to assess the productivity and influence of scholars.

    Citation metrics can be a valuable complement to expert judgment in evaluating research quality and impact. They provide an objective and data-driven perspective that can help counteract biases and subjectivity in peer review. At the same time, metrics should never be used as the sole criterion for assessing research merit. They should be interpreted in light of disciplinary norms, publication practices, and other contextual factors.

    Limitations and criticisms of citation metrics

    Despite their widespread use, citation metrics have come under increasing scrutiny and criticism in recent years. Some of the main limitations and concerns include:

    • Disciplinary differences: Citation practices vary widely across fields, making it difficult to compare metrics between disciplines. For example, papers in mathematics tend to have fewer citations than those in biomedical sciences.
    • Publication type biases: Different types of publications (e.g., review articles and methodological papers) tend to have different citation patterns, which can skew metrics.
    • Self-citation and citation cartels: Authors may artificially inflate their citation counts by excessively citing their own work or engaging in citation cartels with colleagues.
    • Lack of context: Citation metrics do not capture the reasons why a work is cited or the nature of the citing relationship (e.g., positive vs. negative citation).
    • Language biases: Papers published in English tend to receive more citations than those in other languages, even if they are of equal quality.

    It is important for researchers and evaluators to be aware of the limitations of citation metrics and use them judiciously in combination with other indicators of research impact.

    Why do you need citation analysis?

    You may be wondering why you might need citation analysis in your own research. Here are a few important reasons:

    • Enhancing your literature search: Citation analysis can help you navigate the vast and complex landscape of scientific literature in your field. Tracing citation networks and identifying influential papers lets you quickly focus on the most relevant and significant works for your research question.
    • Identifying research gaps and opportunities: Analyzing citation patterns can reveal areas that are understudied or ripe for further investigation. You can spot emerging trends, novel connections between fields, and potential collaborators who share your research interests.
    • Benchmarking your research impact: Citation metrics can help you gauge the visibility and influence of your own work relative to others in your field. You can track your citation counts, h-index, and other indicators over time to monitor your progress and identify areas for improvement.
    • Funding and promotion: In today’s competitive academic environment, demonstrating the impact and significance of your research is crucial for securing grants, fellowships, and tenure. Citation data can provide tangible evidence of your contributions to your field and help you stand out from other applicants.

    As we discussed earlier, citation analysis has limitations and drawbacks. However, when used judiciously and combined with other research tools and strategies, it can be a powerful asset in your scholarly toolkit.

    Other methods for evaluating research impact include expert peer review, altmetrics (e.g., social media mentions, news coverage), and qualitative assessments of research quality and significance. Each has its strengths and weaknesses, and the choice of method depends on the specific research context and goals.

    Applications of Citation Analysis

    Citation analysis has a wide range of applications across different fields and contexts, from research evaluation and policy to science mapping and knowledge discovery. Some of the most common applications of citation analysis include:

    • Research evaluation: Citation analysis is often used in research evaluation processes, such as tenure and promotion decisions, grant funding allocations and institutional rankings. By providing quantitative measures of research impact and influence, citation analysis can help decision-makers assess the quality and significance of individual researchers, departments, or institutions.
    • Science mapping: Citation analysis can also map the intellectual structure and evolution of scientific fields by identifying key publications, authors, and topics and visualizing the relationships among them. Science mapping can help researchers identify emerging trends, research fronts, and knowledge gaps and inform planning and priority-setting.
    • Literature searching: Citation analysis can be a powerful tool for searching for and discovering literature. It helps researchers identify relevant and high-impact publications in their field of interest. By following citation chains and co-citation networks, researchers can quickly identify key papers and authors and uncover important insights and connections that might otherwise be missed.
    • Research collaboration: Citation analysis can also be used to study research collaboration patterns and networks by identifying co-authorship and co-citation relationships between researchers and institutions. This can help researchers identify potential collaborators and partners and inform research management and policy decisions around research collaboration and internationalization.
    • Research policy: Citation analysis can inform research policy and strategy by providing evidence and insights into the strengths, weaknesses, and impacts of different research programs, funding models, and evaluation systems. By analyzing citation data in conjunction with other research indicators and expert input, policymakers can make more informed and evidence-based decisions about research priorities, investments, and governance.

    These are just a few examples of the many ways in which citation analysis is being used across different fields and contexts. As the methods and data sources for citation analysis continue to evolve and expand, we can expect to see even more diverse and innovative applications of this powerful analytical tool in the years to come.

    Tools and Resources for Citation Analysis

    Citation analysis would not be possible without the various tools and databases that have been developed to collect, organize, and analyze citation data. This section introduces some of the key resources available for citation analysis across different disciplines.

    Citation analysis in the natural and life sciences

    Web of Science and Scopus are the most widely used citation databases in the natural and life sciences. These databases cover a broad range of scientific journals and conference proceedings and provide various citation metrics and analytical tools.

    Web of Science offers the Journal Citation Reports, which rank journals based on their impact factor and other citation indicators. It also allows users to create citation reports for individual researchers or institutions, and its exported records can be visualized as citation networks with external tools like CitNetExplorer.

    Scopus provides the SCImago Journal Rank (SJR) and the Source Normalized Impact per Paper (SNIP) as alternative journal-level metrics. It also offers the Scopus Author Identifier, which disambiguates author names and tracks individual citation records.

    In addition to these subscription-based databases, there are freely available resources such as PubMed Central (for biomedical literature) and arXiv (for physics, mathematics, and computer science). These repositories allow researchers to access full-text articles and preprints and often provide citation data as well.

    Citation analysis in the social sciences

    Citation analysis has traditionally been more fragmented and less standardized in the social sciences than in the natural sciences. However, several key resources are still available for researchers in this area.

    The Social Sciences Citation Index (SSCI), part of the Web of Science suite, covers many social science journals and provides citation metrics and tools similar to the Science Citation Index. It is particularly useful for fields like psychology, economics, and political science.

    The Social Science Research Network (SSRN) is a platform dedicated to the global distribution of research in the social sciences. It comprises various specialized research networks, each focusing on a specific discipline within the social sciences.

    Subject-specific databases exist for more specialized disciplines, such as the Education Resources Information Center (ERIC) for education research and the EconLit database for economics literature. These databases may not always provide citation data, but they can be valuable for identifying relevant papers and tracking research trends.

    In recent years, Google Scholar has emerged as a popular tool for citation analysis in the social sciences. While its data are not as well curated as those of subscription databases, it offers a broad and inclusive perspective on scholarly communication, including books, conference papers and grey literature.

    Citation analysis in the humanities

    Citation analysis in the humanities has traditionally been more challenging than in the sciences and social sciences due to the different nature of scholarly communication in these fields. Humanities researchers often publish in books, chapters, and specialized journals that standard citation databases may not cover well.

    However, some resources are still available for citation analysis in the humanities. The Arts & Humanities Citation Index (A&HCI), part of the Web of Science suite, covers a selection of leading humanities journals and provides citation data for fields like literature, history, and philosophy.

    More recently, Google Scholar has become a popular tool for citation analysis in the humanities due to its broad coverage of books, chapters, and non-traditional publication formats. However, its data quality and accuracy can be variable, and it may not always distinguish between scholarly and non-scholarly sources.

    Other specialized databases for the humanities include the Modern Language Association (MLA) International Bibliography for literature and language studies, and the Philosopher’s Index for philosophy research. While these databases may not always provide citation data, they can be valuable for identifying key works and mapping out research trends.

    Citation Analysis Methods

    Let’s take a closer look at some of the specific methods and techniques that researchers can use to analyze and interpret citation data. Below is a brief overview of the most common citation analysis methods, along with their strengths, limitations, and applications.

    • Citation counting: This is the most basic and widely used method of citation analysis, which simply involves counting the number of times a given publication, author, or journal has been cited by other sources. While citation counts can provide a rough indication of the popularity or impact of a research output, they do not consider the quality, context, or significance of the citations.
    • Co-citation analysis: This method involves identifying pairs of publications that are frequently cited together by other sources and using this information to map a given field’s intellectual structure or research fronts. Co-citation analysis can help identify influential publications or authors and the relationships and networks between research topics or communities.
    • Bibliographic coupling: This method is similar to co-citation analysis, but instead of looking at pairs of publications that are cited together, it looks at pairs of publications that cite the same sources. Bibliographic coupling can help identify publications or authors working on similar research questions or using similar methods, even if they are not directly citing each other.
    • Citation network analysis: This method involves constructing and analyzing networks of publications, authors, or journals based on their citation relationships. Citation network analysis can help identify important nodes or hubs in the network, as well as the overall structure and dynamics of the research community. Some common metrics used in citation network analysis include degree centrality, betweenness centrality, and PageRank.
    • Text mining and natural language processing: As mentioned earlier, these methods involve using computational techniques to extract insights from the full text of scholarly documents beyond just the citation links. Some common applications of text mining in citation analysis include identifying the semantic content and context of citations, mapping the co-occurrence of keywords or phrases, and analyzing the sentiment or tone of scholarly discourse.
    • Altmetrics: As discussed earlier, altmetrics involve using a wide range of online and social media data sources to track the broader impact and engagement of research outputs beyond just traditional citation counts. Some common altmetrics indicators include social media mentions, news media coverage, blog posts, and downloads or views of full-text articles.
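
    The difference between co-citation and bibliographic coupling is easiest to see on a small example. The sketch below assumes the citation data is already available as a mapping from each citing paper to the works in its reference list; the paper identifiers and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references):
    """Count how often each pair of works appears together in a reference list."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


def coupling_strength(references):
    """Count the references shared by each pair of citing papers."""
    strength = {}
    for a, b in combinations(sorted(references), 2):
        shared = set(references[a]) & set(references[b])
        if shared:
            strength[(a, b)] = len(shared)
    return strength


# Hypothetical data: citing paper -> works in its reference list.
references = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
    "P4": ["D"],
}

print(cocitation_counts(references)[("A", "B")])   # 2 (cited together by P1 and P2)
print(coupling_strength(references)[("P1", "P3")])  # 2 (both cite B and C)
```

    Co-citation links the cited works (A and B), and its counts can keep growing as new papers cite them together. Bibliographic coupling links the citing papers (P1 and P3), and its strength is fixed once both reference lists are published.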

    This is not an exhaustive list of all the possible citation analysis methods. Researchers are constantly developing new and innovative techniques to analyze and interpret citation data. The choice of method will depend on the specific research question, data availability, and disciplinary context of the study.

    It’s also important to note that citation analysis methods are not mutually exclusive, and researchers often use a combination of different methods to get a more comprehensive and nuanced view of research impact. For example, a researcher might use citation counting to identify highly cited publications in a given field, then use co-citation analysis to map the intellectual structure of the field, and finally, use text mining to analyze the content and context of the citations.
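
    As a further illustration of the citation network view, the sketch below computes two of the metrics mentioned above, in-degree (a simple degree centrality: how many papers cite a work) and PageRank, on a toy network. It is a plain power-iteration version written for readability, with the commonly used damping factor of 0.85 as an assumption; the network and function names are hypothetical, and a real analysis would more likely rely on a dedicated graph library.

```python
def in_degree(citations):
    """Number of incoming citations for every paper in the network."""
    counts = {paper: 0 for paper in citations}
    for cited_list in citations.values():
        for cited in set(cited_list):
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def pagerank(citations, damping=0.85, iterations=50):
    """PageRank by power iteration; rank flows from citing to cited papers."""
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Papers with no references spread their rank evenly over all papers.
        dangling = sum(rank[node] for node in nodes if not citations.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for citing, cited_list in citations.items():
            targets = set(cited_list)
            for cited in targets:
                new_rank[cited] += damping * rank[citing] / len(targets)
        rank = new_rank
    return rank


# Hypothetical network: citing paper -> papers it cites.
network = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}

ranks = pagerank(network)
print(in_degree(network))
print(max(ranks, key=ranks.get))
```

    In-degree treats every citation equally, while PageRank gives more weight to citations that come from papers that are themselves well cited.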

    How do you choose your citation analysis method?

    With so many tools and resources available, choosing the right citation analysis method for your research needs can be overwhelming. Here are some factors to consider:

    1. Disciplinary norms and practices: Different fields have different citation cultures and expectations. Choose a method that is well suited to your discipline and aligns with its established practices.
    2. Research goals and questions: Consider what you hope to achieve with your citation analysis. Are you looking to evaluate the impact of a specific paper or researcher? Map out research trends over time? Identify key collaborators or funding sources? Different methods may be more or less appropriate depending on your specific objectives.
    3. Data availability and quality: Not all citation databases cover the same range of sources or provide the same level of data accuracy and completeness. Choose a resource that covers your field well and provides reliable citation data.
    4. Time and resource constraints: Some citation analysis methods can be time-consuming and require specialized expertise or access to expensive databases. Consider your available time, budget, and technical skills when choosing a method.

    It is often best to use a combination of methods and data sources to get a more comprehensive and nuanced view of citation patterns and impacts. For example, you might use Web of Science or Scopus to identify highly cited papers in your field, then use Google Scholar to track their broader impact across different publication types and audiences. You might also supplement citation data with other indicators of research impact, such as altmetrics or expert peer reviews.

    The key is to be transparent about your methods and data sources, and to interpret your results in light of their limitations and potential biases. By doing so, you can harness the power of citation analysis to gain new insights into your field and advance your research goals.

    Differences in citation practices across disciplines and regions

    One of the biggest challenges in citation analysis is accounting for the wide variation in citation practices across different disciplines and regions. For example, papers in biomedical fields tend to have much higher citation rates than those in mathematics or humanities, due to differences in research output, collaboration patterns, and publication timelines.

    Similarly, there can be significant regional differences in citation practices, with some countries or language groups having different norms around self-citation, co-authorship, and reference list length. For example, studies have found that papers from China tend to have higher rates of self-citation than those from other countries, which can inflate their citation metrics.

    To address these challenges, it is important to use field-normalized indicators that take into account the average citation rates and practices within a given discipline or region. This can help ensure that comparisons across fields or countries are more fair and meaningful.
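
    One simple form of field normalization divides each paper’s citation count by the average citation count of papers in the same field, so that a score of 1.0 means "cited as often as the field average". The sketch below assumes this basic ratio; the records and function name are hypothetical, and established indicators typically also normalize by publication year and document type.

```python
from collections import defaultdict


def field_normalized_scores(papers):
    """Divide each paper's citations by the mean citations of its field."""
    totals = defaultdict(lambda: [0, 0])  # field -> [citation sum, paper count]
    for paper in papers:
        totals[paper["field"]][0] += paper["citations"]
        totals[paper["field"]][1] += 1

    scores = {}
    for paper in papers:
        citation_sum, count = totals[paper["field"]]
        field_mean = citation_sum / count
        # A field with no citations at all has no meaningful baseline.
        scores[paper["id"]] = paper["citations"] / field_mean if field_mean > 0 else None
    return scores


# Hypothetical records from two fields with very different citation rates.
papers = [
    {"id": "m1", "field": "mathematics", "citations": 10},
    {"id": "m2", "field": "mathematics", "citations": 0},
    {"id": "m3", "field": "mathematics", "citations": 5},
    {"id": "b1", "field": "biomedicine", "citations": 40},
    {"id": "b2", "field": "biomedicine", "citations": 60},
    {"id": "b3", "field": "biomedicine", "citations": 20},
]

scores = field_normalized_scores(papers)
print(scores["m1"], scores["b1"])  # 2.0 1.0
```

    In this example the mathematics paper with 10 citations scores higher than the biomedical paper with 40, because it is cited twice as often as its field average.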

    It is also important to be aware of potential language biases in citation databases, as papers published in English tend to be more visible and highly cited than those in other languages. Using databases with good coverage of non-English literature, such as Scopus or regional citation indexes, can help mitigate this issue.

    Addressing issues of self-citation and citation manipulation

    Another key challenge in citation analysis is self-citation and other forms of citation manipulation. Self-citation refers to the practice of authors citing their own previous work, which can artificially inflate their citation counts and metrics. While some degree of self-citation is normal and expected, excessive or abusive self-citation can distort the true impact of a researcher’s work.

    Citation manipulation can also take other forms, such as citation cartels (groups of authors who agree to cite each other’s work) or coercive citation practices (pressuring authors to cite certain papers or journals). These practices can undermine the integrity and reliability of citation data and create perverse incentives for researchers.

    To address these issues, some best practices include:

    1. Using author-level metrics recalculated without self-citations, such as an h-index or g-index computed from external citations only.
    2. Analyzing the distribution and context of self-citations to identify potential red flags or anomalies.
    3. Using citation databases with algorithms to detect and flag suspicious citation patterns, such as excessive self-citation or citation cartels.
    4. Encouraging researchers to follow ethical guidelines around citation practices, such as the Committee on Publication Ethics (COPE) guidelines.
    5. Educating peer reviewers and editors about identifying and addressing potential citation manipulation cases.
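
    The first two practices can be sketched as follows. The example assumes a common working definition, under which a citation counts as a self-citation when the citing and cited papers share at least one author, and it assumes author names have already been disambiguated. All names and data are hypothetical.

```python
def is_self_citation(citing_authors, cited_authors):
    """True when the citing and cited papers share at least one author."""
    return not set(citing_authors).isdisjoint(cited_authors)


def self_citation_rate(paper_authors, citing_author_lists):
    """Share of a paper's citations that come from its own authors."""
    if not citing_author_lists:
        return 0.0
    self_count = sum(
        is_self_citation(citing, paper_authors) for citing in citing_author_lists
    )
    return self_count / len(citing_author_lists)


def external_citation_count(paper_authors, citing_author_lists):
    """Number of citations that remain after self-citations are removed."""
    return sum(
        not is_self_citation(citing, paper_authors) for citing in citing_author_lists
    )


# Hypothetical paper by Kim and Cho, cited by four other papers.
authors = ["Kim", "Cho"]
cited_by = [["Lee", "Kim"], ["Park"], ["Cho"], ["Ito", "Sato"]]

print(self_citation_rate(authors, cited_by))       # 0.5
print(external_citation_count(authors, cited_by))  # 2
```

    Feeding the external counts, rather than the raw counts, into an author-level metric gives a version of that metric without self-citations.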

    Addressing self-citation and citation manipulation issues requires a multi-pronged approach that involves both technical solutions and cultural change within the research community.

    Ensuring the accuracy and integrity of citation data

    A final key challenge in citation analysis is ensuring the accuracy and integrity of the underlying citation data. Citation databases are not perfect and can contain errors, omissions, or inconsistencies that can affect the reliability of citation metrics and analyses.

    Some common data quality issues include:

    1. Incomplete or inconsistent coverage of sources, especially for non-English or non-traditional publication types.
    2. Errors in bibliographic data, such as misspelled author names, incorrect publication years, or missing volume/issue numbers.
    3. Duplicate or fragmented records for the same publication or author, which can lead to undercounting citations.
    4. Incorrect or missing links between citing and cited references, which can distort citation networks and metrics.

    To ensure the accuracy and integrity of citation data, it is important to use reputable and well-curated databases, such as Web of Science, Scopus, or discipline-specific indexes. These databases have quality control processes in place to minimize errors and inconsistencies and provide detailed documentation of their coverage and methodology.

    It is also important to carefully clean and validate your citation data before conducting analyses or drawing conclusions. This may involve steps such as:

    1. Deduplicating and disambiguating author or publication records.
    2. Standardizing and normalizing bibliographic data fields.
    3. Verifying the accuracy and completeness of citation links and counts.
    4. Identifying and correcting any outliers or anomalies in the data.
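
    As a rough illustration of the first two steps, the sketch below normalizes titles and merges records that look like the same publication. The record fields, the matching key (normalized title plus year) and the decision to add up the citation counts of fragmented records are all assumptions made for this example; production workflows usually match on persistent identifiers such as DOIs where they exist.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase a title, strip accents and collapse punctuation and spacing."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def merge_duplicates(records):
    """Merge records that share a normalized title and publication year."""
    merged = {}
    for record in records:
        key = (normalize_title(record["title"]), record["year"])
        if key in merged:
            # Assumption: fragmented records hold disjoint sets of citations.
            merged[key]["citations"] += record["citations"]
        else:
            merged[key] = dict(record)
    return list(merged.values())


# Hypothetical export containing one fragmented record.
records = [
    {"title": "Citation Analysis: A Review", "year": 2015, "citations": 12},
    {"title": "citation analysis - a review", "year": 2015, "citations": 3},
    {"title": "Mapping Research Fronts", "year": 2018, "citations": 7},
]

for record in merge_duplicates(records):
    print(record["title"], record["citations"])
```

    Matching on title and year alone can merge distinct works that happen to share a title, so merged records are worth spot-checking by hand.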

    Various software tools and scripts, such as OpenRefine, R packages for bibliometric analysis, and custom Python scripts, are available to help with data cleaning and validation. However, it is also important to have a good understanding of the underlying data structure and semantics, and to document any data processing steps taken.

    Advancements in Citation Analysis

    The field of citation analysis is constantly evolving, with new methods, tools, and data sources emerging to address the limitations and challenges of traditional approaches. In this section, we highlight some of the key advancements and future directions in citation analysis and how they are shaping the way we evaluate and understand research impact.

    Altmetrics: Exploring new ways to measure research impact beyond citations

    One of the most significant developments in recent years has been the rise of altmetrics, which seeks to capture the broader social and online impact of research beyond traditional citation counts. Altmetrics include a wide range of indicators, such as:

    • Social media mentions and shares (e.g., on Twitter, Facebook, or LinkedIn)
    • News media coverage and press releases
    • Blog posts and online discussions
    • Downloads and views of full-text articles or datasets
    • Citations in policy documents or grey literature

    By providing a more diverse and real-time view of research impact, altmetrics can complement traditional citation metrics and offer insights into how different audiences share, discuss, and use research. Altmetrics can also help identify emerging topics or influential works that may not yet be well-cited in the scholarly literature.

    However, altmetrics also come with their own challenges and limitations, such as data quality and consistency issues, potential gaming or manipulation, and lack of standardization across different platforms and providers. As such, they should be used cautiously and in conjunction with other forms of evidence and expert judgment.

    Text mining and natural language processing techniques

    Another key advancement in citation analysis is the use of text mining and natural language processing (NLP) techniques to extract insights from the full text of scholarly documents beyond just the citation links. These techniques can help identify:

    • The semantic content and context of citations, such as whether they are positive or negative, central or peripheral to the citing paper’s argument.
    • The key concepts, methods, and findings most frequently mentioned or discussed in a given field or topic.
    • The relationships and networks between different research topics based on the co-occurrence of keywords or phrases.
    • The sentiment and tone of scholarly discourse, such as whether a field is more collaborative or competitive or whether certain topics are more controversial or polarizing.
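
    Keyword co-occurrence, the third item above, is the simplest of these to illustrate. The sketch below assumes each document already comes with a list of keywords (for instance author keywords) and counts how often each pair appears in the same document. The keyword lists and the function name are hypothetical; extracting keywords from full text is a separate and harder step.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count how many documents mention each pair of keywords together."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and spacing so that variants count as one keyword.
        unique = sorted({kw.strip().lower() for kw in keywords})
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical keyword lists for three documents.
documents = [
    ["h-index", "Bibliometrics"],
    ["bibliometrics", "altmetrics", "h-index"],
    ["altmetrics", "peer review"],
]

pairs = keyword_cooccurrence(documents)
print(pairs[("bibliometrics", "h-index")])  # 2
```

    Pairs with high counts can then be treated as edges in a topic network and explored with the same network methods used for citations.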

    By providing a more nuanced and contextual view of scholarly communication, text mining and NLP techniques can help researchers identify new research questions, map the intellectual structure of a field, or track the diffusion and evolution of ideas over time.

    However, these techniques require significant computational resources, expertise, and access to large-scale full-text databases. They also raise important questions about copyright, privacy, and ethics involving the automated analysis of potentially sensitive or proprietary content.

    Incorporating citation analysis in machine learning algorithms and artificial intelligence

    A final frontier in citation analysis is the use of machine learning and artificial intelligence (AI) techniques to automate and scale up the process of analyzing and interpreting citation data. Some potential applications include:

    • Predicting future citation counts or impact based on early citation patterns or other features of a paper or author.
    • Recommending relevant papers or authors to researchers based on their citation network or research interests.
    • Identifying emerging research fronts or trends based on clustering or pattern recognition algorithms.
    • Detecting potential cases of citation manipulation or misconduct based on anomaly detection or outlier analysis.
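
    The last application does not necessarily require a trained model. As a minimal statistical baseline, the sketch below flags authors whose self-citation rate is an outlier relative to their peers, using the modified z-score based on the median absolute deviation with the conventional cut-off of 3.5. The rates, identifiers and function name are hypothetical, and a flag is only a prompt for closer human inspection, not evidence of misconduct.

```python
from statistics import median


def flag_outliers(values_by_id, threshold=3.5):
    """Return ids whose value has a modified z-score above the threshold."""
    values = list(values_by_id.values())
    if not values:
        return []
    center = median(values)
    mad = median([abs(value - center) for value in values])
    if mad == 0:
        # No spread around the median, so the score is undefined.
        return []
    return sorted(
        key
        for key, value in values_by_id.items()
        if 0.6745 * abs(value - center) / mad > threshold
    )


# Hypothetical self-citation rates for five authors in one field.
rates = {"a": 0.10, "b": 0.12, "c": 0.08, "d": 0.11, "e": 0.65}
print(flag_outliers(rates))  # ['e']
```

    The median-based score is used here instead of the ordinary mean and standard deviation because a single extreme value would otherwise inflate the spread and hide itself.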

    By leveraging the power of big data and advanced analytics, these techniques can help researchers and institutions make more informed and data-driven decisions around research funding, hiring, and evaluation. They can also help make sense of the vast and complex landscape of scholarly communication and identify new opportunities for collaboration and innovation.

    Ethical Considerations in Citation Analysis

    As the use of citation analysis becomes more widespread and influential in research evaluation and decision-making, it is important to consider the ethical implications and potential risks of these methods.

    Maintaining privacy and confidentiality in citation analysis studies

    One of the most important ethical considerations in citation analysis is the need to protect the privacy and confidentiality of individual researchers and their work. Citation data can reveal sensitive information about a researcher’s productivity, collaborations, and intellectual influences, which could be used to make judgments about their career prospects or reputation.

    To address these concerns, it is important for citation analysis studies to follow best practices around data protection and confidentiality, such as:

    • Anonymizing or pseudonymizing individual-level data where possible and only reporting aggregate or summary statistics.
    • Obtaining informed consent from researchers before using their citation data in studies or evaluations, and allowing them to review and comment on the results.
    • Establishing clear data governance policies and procedures around collecting, storing, and using citation data in compliance with relevant laws and regulations (e.g., GDPR).
    • Training researchers and administrators on the ethical and responsible use of citation data and on the potential risks and limitations of these methods.
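
    As an illustration of the first practice, the sketch below replaces author identifiers with keyed hashes before analysis. It uses HMAC-SHA256 from the Python standard library; the identifiers, the key and the function name are hypothetical. Pseudonymized data of this kind can still be re-identified by anyone who holds the key, or through distinctive citation profiles, so this is one safeguard among several rather than full anonymization.

```python
import hashlib
import hmac


def pseudonymize(author_id, secret_key):
    """Return a stable pseudonym for an author id using a keyed hash."""
    digest = hmac.new(secret_key, author_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical key; in practice it would be stored separately from the data.
key = b"replace-with-a-randomly-generated-secret"

# Hypothetical individual-level records.
records = [
    {"author_id": "author-0001", "citations": 120},
    {"author_id": "author-0002", "citations": 45},
]

pseudonymized = [
    {"author": pseudonymize(r["author_id"], key), "citations": r["citations"]}
    for r in records
]

# Report only an aggregate statistic rather than per-author values.
mean_citations = sum(r["citations"] for r in pseudonymized) / len(pseudonymized)
print(mean_citations)  # 82.5
```

    A keyed hash is used rather than a plain hash so that someone with a list of known author identifiers cannot simply hash them and match the results.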

    By taking these steps, researchers and institutions can help ensure that citation analysis is used in a way that respects the privacy and autonomy of individual researchers and does not create unintended consequences or harms.


    In this article, we have taken an in-depth look at the world of citation analysis, exploring its key concepts, methods, tools, and applications. We have seen how citation analysis has evolved over time, from the early days of citation indexing to the latest advances in text mining, network analysis, and altmetrics.

    Some of the key takeaways from this article include:

    • Citation analysis is a powerful tool for evaluating the impact and significance of research. It examines the frequency and pattern of citations to a given publication, author, or journal.
    • There are many different methods of citation analysis, ranging from simple citation counting to more complex approaches like co-citation analysis, bibliographic coupling, and network analysis.
    • Citation analysis has a wide range of applications across different fields and contexts, from research evaluation and science mapping to literature searching and research policy.
    • Citation analysis also has some important limitations and ethical considerations, such as the need to account for disciplinary differences in citation practices, avoid misuse or misinterpretation of citation data, and protect individual researchers’ privacy and confidentiality.
    • As the methods and data sources for citation analysis continue to evolve and expand, it will be important for researchers and institutions to stay up-to-date with the latest developments and best practices in this field.


    For those interested in learning more about citation analysis, there are many excellent resources available, including:

    • The Handbook of Bibliometric Indicators by Roberto Todeschini and Alberto Baccini provides a comprehensive overview of the different methods and techniques for citation analysis.
    • The Metric Tide by James Wilsdon and colleagues explores the role of metrics in research assessment and management, and provides recommendations for responsible metrics.
    • The Metrics Toolkit website provides a user-friendly guide to the different citation metrics and tools available, along with their strengths and limitations.