PeerJ — Latest Matching Preprints

1

Discovery of new deregulated miRNAs in gingivo buccal carcinoma using Group Benjamini Hochberg method: a commentary on "A quest for miRNA bio-marker: a track back approach from gingivo buccal cancer to two different types of precancers"

Koner, S.; De Sarkar, N.; Laha, N.

2023-02-21 cancer biology 10.1101/2023.02.17.529013 medRxiv

Top 0.1%

33.9%

This formal comment is in response to "A quest for miRNA bio-marker: a track back approach from gingivo buccal cancer to two different types of precancers" written by De Sarkar and colleagues in 2014. The above-mentioned paper found seven miRNAs to be significantly deregulated in 18 gingivo-buccal cancer samples. However, they suspected more miRNAs to be deregulated based on their exploratory statistical analysis. To control the false discovery rate (FDR), the authors used the Benjamini Hochberg (BH) method, which does not leverage any available biological information on the miRNAs. In this work, we show that some specialized versions of the BH method, which can exploit positional information on the miRNAs, can lead to seven more discoveries with this data. Specifically, we group the closely located miRNAs, and use the group Benjamini Hochberg (GBH) methods (Hu et al., 2010), which reportedly have more statistical power than the BH method (Liu et al., 2019). The whole transcriptome analysis of Sing et al. (2017) and previous literature on the miRNAs suggest that most of the newly discovered miRNAs play a role in oncogenesis. In particular, the newly discovered miRNAs include hsa-miR-1 and hsa-miR-21-5p, whose cancer-related activities are well-established. Our findings indicate that incorporating the GBH method into suitable microarray studies may potentially enhance scientific discoveries via the exploitation of additional biological information.
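
The plain Benjamini-Hochberg step-up procedure that the commentary takes as its baseline can be sketched as follows. This is a minimal illustration only: the p-values are hypothetical and not taken from the study, the function name is our own, and the group-weighted GBH variant of Hu et al. (2010) is not implemented here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight miRNAs (illustrative only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(example_p, alpha=0.05))
```

Because the rule is step-up, a small p-value late in the ranking can rescue earlier ones that individually miss their thresholds. Group-aware variants such as GBH reweight the p-values before this step.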

2

Twitter and Mastodon presence of highly-cited scientists

Siebert, M.; Siena, L. M.; Ioannidis, J. P. A.

2023-04-24 scientific communication and education 10.1101/2023.04.23.537950 medRxiv

Top 0.1%

28.8%

Social media platforms have an increasing influence in biomedical and other disciplines of science and public health. While Twitter has been a popular platform for scientific communication, changes in ownership have led some users to consider migrating to other platforms such as Mastodon. We aimed to investigate how many top-cited scientists are active on these social media platforms, the magnitude of the migration to Mastodon, and correlates of Twitter presence. A random sample of 900 authors was examined among those who are at the top-2% of impact based on a previously validated composite citation indicator using Scopus data. Searches for their personal Twitter accounts were performed in early December 2022, and re-evaluations were performed at 2 weeks, 4 weeks, and 2 months (February 6, 2023). 262/900 (29.1%) of highly-cited scholars had Twitter accounts, and only 9/900 (1%) had Mastodon accounts. Female gender, North American and Australian locations, younger publication age, and clinical medicine or social science expertise correlated with higher percentages of Twitter use. The vast majority of highly-cited Twitter users had few followers and tweets. Only 6 had more than 10,000 followers and none had more than 100,000. One limitation of our study is that some accounts, especially on Mastodon, may not have been detected. However, the study suggests that Twitter remains the preferred social media platform for highly-cited authors, and Mastodon has not yet challenged Twitter's dominance. Moreover, most highly-cited scientists with Twitter accounts have limited presence in this medium.

3

Seasonal Variation in the Internet Searches for Cancer Recurrence: An Infodemiological Study

Wang, F.; Lou, Q. X.; Hu, T. D.; Zhang, M.; Xie, Q.; Zou, Y.

2019-12-12 cancer biology 10.1101/2019.12.12.873984 medRxiv

Top 0.1%

23.9%

Background: While few clinical and epidemiological studies have assessed how seasonality affects cancer recurrence, the question has not been studied using internet data. In this study, we aim to test whether cancer recurrence presents seasonality on a population level, utilizing internet search query data. Methods: This infodemiological study used Google Trends to find query data for the term "cancer recurrence" from January 01, 2004, to December 31, 2018 in the USA, the UK, Canada, and Australia. Time series seasonal decomposition and cosinor analysis were used to analyze and describe the seasonal trends for cancer recurrence. Results: A general upward trend was observed in the UK and the northern hemisphere. Cosinor analysis revealed statistically significant seasonal trends for "cancer recurrence" in the USA (p=1.33×10⁻⁵), the UK (p=0.012), and the northern hemisphere (p=5.67×10⁻⁷), with a peak in early summer and a nadir in early winter. A seasonal variation was also found in Australia (p=2.3×10⁻⁴), with a peak in late summer and a nadir in late winter. Conclusions: The evidence from internet search query data showed a seasonal variation in cancer recurrence, with a peak in early summer (northern hemisphere) or late summer (southern hemisphere). In addition, the relative search volume of "cancer recurrence" showed a general upward trend in the UK and the northern hemisphere in recent years.
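
The core of a single-component cosinor fit can be sketched as below. This is an illustrative sketch, not the authors' analysis: it assumes equally spaced monthly observations covering a whole number of 12-month cycles (so the cosine and sine regressors are orthogonal and least squares reduces to simple projections), it uses a synthetic series rather than Google Trends data, and it omits detrending and the significance test that produces the reported p-values.

```python
import math


def cosinor_fit(values, period=12):
    """Fit y(t) = mesor + amplitude * cos(2*pi*(t - peak_time) / period).

    Assumes t = 0, 1, ..., n-1 and that n is a multiple of `period`.
    """
    n = len(values)
    if n == 0 or n % period != 0:
        raise ValueError("series length must be a positive multiple of the period")
    omega = 2 * math.pi / period
    # Under the orthogonality assumption, the least-squares estimates are projections.
    mesor = sum(values) / n
    beta = 2 / n * sum(y * math.cos(omega * t) for t, y in enumerate(values))
    gamma = 2 / n * sum(y * math.sin(omega * t) for t, y in enumerate(values))
    amplitude = math.hypot(beta, gamma)
    # Time within the cycle at which the fitted curve peaks.
    peak_time = (math.atan2(gamma, beta) / omega) % period
    return mesor, amplitude, peak_time


# Synthetic 15-year monthly series (180 points) peaking at month index 5.
series = [50 + 10 * math.cos(2 * math.pi * (t - 5) / 12) for t in range(180)]
mesor, amplitude, peak_time = cosinor_fit(series)
print(mesor, amplitude, peak_time)
```

For unevenly sampled or trending data, the same model would instead be fitted by ordinary least squares on the cosine and sine terms, after removing or modeling the trend.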

4

Research landscape of lymphovascular invasion in Oral Squamous Cell Carcinoma: A bibliometric analysis from 1994 to present

Tandon, A.; Sandhya, K.; Singh, N. N.; Gulati, N.

2023-02-27 oncology 10.1101/2023.02.27.23286490 medRxiv

Top 0.1%

23.4%

Background: Neo-lymphangiogenesis is the primary factor affecting tumor biology in solid epithelial malignancies such as oral squamous cell carcinoma (OSCC). Determining the impact of lymphovascular invasion is critical for understanding OSCC's loco-regional and global dissemination. Because ongoing research in OSCC is multifaceted, bibliometric landscapes are vital for learning about the most recent advancements in this topic. This analysis can reveal the progressions that might modernize OSCC diagnosis and treatment. Objectives: To study the relevance and effects of lymphovascular invasion in oral squamous cell carcinoma using keyword co-occurrence analysis and co-authorship analysis of the PubMed database. Methodology: A cross-sectional bibliometric analysis of full-text PubMed articles from 1994 to the present was performed using VOSviewer (Version 1.6.19). The search keywords were "Lymphovascular invasion in oral squamous cell carcinoma", combined with the Boolean operator (AND). The data obtained were analyzed for co-occurrence and co-authorship using the VOSviewer standard protocol. Results: The query returned 296 results in the PubMed database. Seven clusters were found with default colors in the representation of the entire term co-occurrence network, which also displayed a total link strength of 22262. The items were categorized into clusters based on their commonalities. The labels' weights, as determined by Links and Occurrences, did not depend on one another, and the co-occurrence of keywords does not imply a causal association. In the item density visualization, item labels represented individual items. The number of items from a cluster that were close to a point was represented by the weight given to its color, which was formed by combining the colors of other clusters. The co-authorship analysis identified a network of 57 authors who matched the search parameters. The network visualization map displayed three clusters with a total link strength of 184. The quantity of co-authorship relationships and the number of publications did not appear to be significantly correlated. Conclusion: This investigation uncovered a sizable body of bibliometric data that emphasizes key trends and advancements in the aforementioned theme. The observed variances may be a result of the various objectives of the researchers and journals, who collaborate to provide the best possible literature dissemination. Graphical abstract: Figure 1.

5

Are there systematic biases in RNA-seq data analysis? A case study for Amphimedon queenslandica sponge as a model object.

Feranchuk, S.

2020-03-02 developmental biology 10.1101/2020.02.28.969642 medRxiv

Top 0.1%

22.9%

Background: The performance of a functional annotation approach for RNA-seq bioinformatics pipelines was compared with a method in which groups of genes are generated with no relation to ontologies. Three publicly available RNA-seq experiments for the sponge Amphimedon queenslandica were used for the comparison. One of these experiments was referred to in a publication that compared stages of embryo development across a wide range of animal species. Methods: Expression levels were recalculated here for three independent series of experiments. Functional annotation of differential expression levels was then conducted. This allowed the applicability of the two approaches to be compared and the interpretation provided in the mentioned publication to be re-evaluated. Results: The conventional approach confirmed that the Wnt and Notch pathways operate in the development of the sponge embryo. The annotation method that uses unbounded grouping of genes was effective at separating the developmental stages of the sponge embryo. In addition, the published results appear to have been distorted by an artifact caused by positive feedback at the data-processing stage.

6

The effect of floral symmetry and orientation on the consistency of pollinator's entry angle

Jirgal, N.; Ohashi, K.

2022-06-09 plant biology 10.1101/2022.06.09.495424 medRxiv

Top 0.1%

22.8%

Since the publication of Sprengel's (1793) observations, it has been considered that flowers with zygomorphic (or bilaterally symmetrical) corollas evolved to restrict the movement of pollinators into the flower by limiting the pollinator's direction of approach. However, little empirical support has been accumulated so far, except for Culbert and Forrest (2016), who found that zygomorphy reduced variance in pollinators' flower entry angle. Our aim was to build on this work and observe whether floral symmetry or orientation had an effect on pollinator entry angle in a laboratory experiment using bumble bees, Bombus ignitus. Using nine different combinations of artificial flowers created from three symmetry types (radial, bilateral and disymmetrical) and three orientation types (upward, horizontal and downward), we tested the effects of these two floral aspects on the consistency of bees' entry angle. Our results show that horizontal orientation significantly reduced the variance in entry angle, while symmetry had little effect. We also found no significant interactions between orientation and symmetry in their effect on entry angle. Thus, our results suggest that horizontal orientation forces the bees to orient themselves relative to gravity rather than the corolla and stabilizes their flower entry. This stabilizing effect may have been mistaken for the effect of a zygomorphic corolla, as it is presented horizontally in most species. Consequently, we suggest that the evolution of horizontal orientation preceded that of zygomorphy, as indicated by some authors, and that the reason behind the evolution of zygomorphy should be revisited.

7

RNA-seq analyses: Benchmarking differential expression analyses tools reveals the effect of higher number of replicates on performance.

Salifu, S. P. P.; Nyarko, H. N.; Doughan, A.; Msatsi, H. K.; Mensah, I.; Bukari, A.-R. A.

2020-06-10 bioinformatics 10.1101/2020.06.10.144063 medRxiv

Top 0.1%

22.6%

The introduction of several differential gene expression analysis tools has made it difficult for researchers to settle on a particular tool for RNA-seq analysis. This difficulty is compounded by the need to determine an appropriate number of biological replicates that gives an optimal representation of the study population and makes biological sense. To address these challenges, we performed a survey of 8 tools used for differential expression in RNA-seq analysis. We simulated 39 different datasets (from 10 to 200 replicates, at an interval of 5) using compcodeR with a maximum of 100 replicates. Our goal was to determine the effect of varying the number of replicates on the performance (F1-score, recall and precision) of the tools. EBSeq and edgeR-glmRT recorded the highest (0.9385) and lowest (0.6505) average F1-score across all replicates, respectively. We also performed a pairwise comparison of all the tools to determine their concordance with each other in identifying differentially expressed genes. We found the greatest concordance to be between limma voom treat and limma voom ebayes. Finally, we recommend employing edgeR-glmRT for RNA-seq experiments involving 10-50 replicates and edgeR-glmQLF for studies with 55 to 200 replicates. Author summary: Downstream analysis of RNA-seq data in R often poses several challenges to researchers, as it is a daunting task to choose a specific differential expression analysis tool over another. Researchers also find it challenging to determine the number of samples (replicates) to use in order to give comparable and accurate results. In this paper, we surveyed eight differential expression analysis tools using different numbers of replicates of simulated RNA-seq count data. We measured the performance of each tool and, based on the recorded F1-scores, recall and precision, we made the following recommendations: consider edgeR-glmRT and edgeR-glmQLF for replicates of 10-50 and 55-200, respectively.
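
The three performance measures named in the abstract can be computed from a tool's list of called genes and the simulated ground truth, roughly as sketched here. The gene identifiers and the function name are hypothetical, and this is not the benchmarking code used by the authors.

```python
def precision_recall_f1(called, truth):
    """Compare genes called differentially expressed against the true set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Precision: fraction of called genes that are truly differentially expressed.
    precision = true_positives / len(called) if called else 0.0
    # Recall: fraction of truly differentially expressed genes that were called.
    recall = true_positives / len(truth) if truth else 0.0
    # F1-score: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four truly differential genes, three genes called by a tool.
truth_genes = ["g1", "g2", "g3", "g4"]
called_genes = ["g1", "g2", "g5"]
precision, recall, f1 = precision_recall_f1(called_genes, truth_genes)
print(precision, recall, f1)
```

Averaging the F1-score over the simulated replicate settings would give a per-tool summary of the kind reported above.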

8

If not a fake, what's in the lake?

Foxon, F.

2023-03-02 zoology 10.1101/2023.03.01.530639 medRxiv

Top 0.1%

20.1%

An animal dubbed Champ has been sighted by hundreds of eyewitnesses in a large, near-oligotrophic lake in North America. A widely-publicised photograph taken by Mansi purportedly depicting the animal was published to much fanfare. In the present study, sightings were coded and analysed using interrupted time-series models, Pearson correlation coefficients, and descriptive statistics. The number of sightings per year was statistically significantly higher after publication of the Mansi photograph compared to before, which may be evidence of expectant attention, or publicity leading to more lake-goers and therefore more animal sightings. Sightings were consistent in condition (mostly Summer, from Noon to Evening, > 1 witness, and a calm lake surface) which may be interpreted as consistency of when lake-goers visit Champlain, or as evidence of consistent behavioural characteristics of Champ animals. Sightings were highly inconsistent in reported Champ characteristics with widely varying morphology, and most sightings were missing morphological data entirely. More than a quarter of sightings were likened to logs, land mammals, birds, fish, and boats, which are all found in the lake. There were no associations between distance to sighting, estimated length, and estimated height of objects witnessed, which may suggest that eyewitnesses provide inaccurate estimates of these measurements in lake settings. If not a fake, what's in the lake may be ordinary phenomena mistaken for Champ. Alternatively, Lake Champlain is inhabited by as-yet undiscovered multi-humped, dark-coloured serpents approximately seven meters in length, which locomote in a fast and sinuous fashion, and which enjoy pleasant Summer evenings and crowds. Deciding which explanation best accounts for the data is left as an exercise for the reader.

9

Polymorphism Q223R of the leptin receptor (LEPR) - a possible relationship with adaptation to non-tropical climate in Yakuts

Pavlova, N. I.; Bochurov, A. A.; Alekssev, V. A.; Krylov, A. V.; Sydykova, L. A.; Kurtanov, K. A.

2023-10-10 endocrinology 10.1101/2023.10.09.23296771 medRxiv

Top 0.1%

19.5%

Obesity reflects an energy imbalance that arises from a mismatch between energy intake and expenditure. We studied the variability of the Q223R polymorphism of the LEPR gene in the Yakut population and its relationship with body mass index (BMI) and abdominal obesity in a sample of Yakuts (n=336), consisting of individuals with obesity (n=185) and normal weight (n=151). For genotyping, we used classical PCR-RFLP analysis. A comparative analysis of the obtained allele and genotype frequencies with data on other world populations was also carried out. The G variant allele frequency was 79.5% in normal-weight patients and 82.7% in obese patients. Genotype analysis showed a high frequency of genotypes GG (64.2%) and GA (30.5%) in the group with normal BMI and GG (69.7%) and GA (25.9%) in the group with high BMI. There was no significant difference in the frequency of alleles and genotypes of the Q223R polymorphism between the groups. The frequency of the G allele in the Yakuts (79.5%) was comparable to that in the populations of East Asia (86.9%). When the average anthropometric values were analyzed by genotype, a statistically significant difference in waist circumference was found in persons with abdominal obesity (p = 0.03): it was greater in carriers of the heterozygous AG genotype than in carriers of the GG genotype. In conclusion, our study suggests that SNP Q223R (LEPR) may have some effect on anthropometric parameters in the Yakut population, but the results differ from those of studies conducted on samples of European ethnicity. It can be assumed that the accumulation of the G allele of the Q223R polymorphism (LEPR) in the Yakut population, as well as in the populations of East Asia, is probably the result of metabolic adaptation to living conditions in a non-tropical climate.
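
The reported allele frequencies can be checked against the reported genotype frequencies by simple allele counting: each GG homozygote carries two G alleles and each GA heterozygote carries one. The short sketch below uses the percentages quoted in the abstract; the function name is our own.

```python
def allele_frequency(homozygous_pct, heterozygous_pct):
    """Allele frequency (%) from genotype percentages.

    Homozygotes contribute both of their alleles, heterozygotes half of theirs.
    """
    return homozygous_pct + heterozygous_pct / 2


# Genotype percentages quoted in the abstract (GG and GA).
normal_weight_g = allele_frequency(64.2, 30.5)
obese_g = allele_frequency(69.7, 25.9)
print(f"normal weight: {normal_weight_g:.2f}%  obese: {obese_g:.2f}%")
```

The results, about 79.45% and 82.65%, agree with the reported G allele frequencies of 79.5% and 82.7% after rounding.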

10

Relationship Between Helminthiasis And Gastric Cancer: A Systematic Review

Ramirez, D. F.

2021-08-10 oncology 10.1101/2021.08.07.21261231 medRxiv

Top 0.1%

19.5%

Introduction: Helminths are parasitic worms able to produce diverse clinical manifestations in humans, mainly in the gut. Gastric cancer is a high-incidence disease in Colombia, with the highest incidence in the highland regions and a lower incidence in the coastal region. Given this pattern, we aimed to determine the relationship between helminthiasis and the development of gastric cancer. Methodology: A systematic review was performed in four databases for studies evaluating the relationship between helminthiasis and the development of gastric cancer. Results: We included 16 articles from 929 records, with 11 articles reporting a positive relationship and 5 articles reporting a negative relationship. Conclusions: Parasitic infections of the gastrointestinal tract by helminths promote TH-2 type immune responses and decrease the TH-1 type responses that are involved in the progression of precancerous lesions associated with Helicobacter pylori infection.

11

A retrospective cluster analysis of COVID-19 cases by county

Megahed, F. M.; Jones-Farmer, L. A.; Rigdon, S. E.

2020-11-12 bioinformatics 10.1101/2020.11.12.379537 medRxiv

Top 0.1%

19.4%

The COVID-19 pandemic in the U.S. has exhibited distinct waves, the first beginning in March 2020, the second beginning in early June, and additional waves currently emerging. Paradoxically, almost no county has exhibited this multi-wave pattern. We aim to answer three research questions: (1) How many distinct clusters of counties exhibit similar COVID-19 patterns in the time-series of daily confirmed cases? (2) What is the geographic distribution of the counties within each cluster? and (3) Are county-level demographic, socioeconomic and political variables associated with the COVID-19 case patterns? We analyzed data from counties in the U.S. from March 1 to October 24, 2020. Time series clustering identified clusters in the daily confirmed cases of COVID-19. An explanatory model was used to identify demographic, socioeconomic and political variables associated with the cluster patterns. Four patterns were identified from the timing of the outbreaks, including counties experiencing a spring, an early summer, a late summer, and a fall outbreak. Several county-level demographic, socioeconomic, and political variables showed significant associations with the identified clusters. The timing of the outbreak is related both to the geographic location within the U.S. and to several variables including age, poverty distribution, and political association. These results show that the reported pattern of cases in the U.S. is observed through aggregation of the COVID-19 cases, suggesting that local trends may be more informative. The timing of the outbreak varies by county, and is associated with important demographic, socioeconomic and geographic factors.

12

Impact of experimental bias on compositional analysis of microbiome data

Hu, Y.; Satten, G. A.; Hu, Y.

2023-02-13 bioinformatics 10.1101/2023.02.08.527766 medRxiv

Top 0.1%

19.3%

Microbiome data are subject to experimental bias that is caused by DNA extraction, PCR amplification among other sources, but this important feature is often ignored when developing statistical methods for analyzing microbiome data. McLaren, Willis and Callahan (2019) proposed a model for how such bias affects the observed taxonomic profiles, which assumes main effects of bias without taxon-taxon interactions. Our newly developed method, LOCOM (logistic regression for compositional analysis) for testing differential abundance of taxa, is the first method that accounted for experimental bias and is robust to the main effect biases. However, there is also evidence for taxon-taxon interactions. In this report, we formulated a model for interaction biases and used simulations based on this model to evaluate the impact of interaction biases on the performance of LOCOM as well as other available compositional analysis methods. Our simulation results indicated that LOCOM remained robust to a reasonable range of interaction biases. The other methods tended to have inflated FDR even when there were only main effect biases. LOCOM maintained the highest sensitivity even when the other methods cannot control the FDR. We thus conclude that LOCOM outperforms the other methods for compositional analysis of microbiome data considered here.

13

Cannibalism as a feeding strategy for mantis shrimp Oratosquilla oratoria (De Haan, 1844) in the Tianjin coastal zone of Bohai Bay

Bo, Q.-K.; Lu, Y.-Z.; Mi, H.-J.; Yu, G. Y.; Gu, D.-X.; You, H.-Z.; Hao, S.

2019-08-19 molecular biology 10.1101/740100 medRxiv

Top 0.1%

19.3%

A representative semi-enclosed bay of China, Bohai Bay has experienced severe interference in recent decades and is under threat from rapid human development. Although the mantis shrimp Oratosquilla oratoria plays an important role in the ecosystem and fishery, its feeding ecology and the impact of habitat changes on its feeding habits are poorly known. In this study, we sought to identify the prey consumed by O. oratoria through the separation of stomach contents and to describe its trophic ecology during maturation, from March to July, in the Tianjin coastal zone of Bohai Bay. A total of 594 specimens were collected and 347 (58.59%) stomachs were found to have food remains. More than half of the O. oratoria individuals had poor feeding activity, and the degree of feeding activity of females was higher than that of males, but there was no significant difference in the visual fullness index and the fullness weight index (FWI) between sexes for each month. The feeding activity of O. oratoria was consistent over the study months. A total of 207 prey items yielded 231 readable sequences, and 24 different taxa were identified. Prey detected in O. oratoria consisted mainly of crustaceans, which accounted for 71.86% of the clones detected; 16.02% corresponded to fishes, 8.23% to mollusks, and the remaining 3.90% to other marine organisms. Cannibalism occurred frequently (69.08%), at a rate noticeably higher than that seen in previous studies, which confirms that cannibalism may be a significant feeding strategy of O. oratoria in the Tianjin coastal zone of Bohai Bay. The ecological environment in Bohai Bay has been affected by anthropogenic activities, and macrofaunal biodiversity and abundance have noticeably declined, which might make food scarce for O. oratoria. The resulting starvation evidently increased cannibalistic tendencies.

14

The impact of distributional assumptions in gene-set and pathway analysis: how far can it go wrong?

Ho, C.-H.; Huang, Y.-J.; Lai, Y.-J.; Mukherjee, R.; Hsiao, C. K.

2021-02-02 bioinformatics 10.1101/2021.02.01.429279 medRxiv

Top 0.1%

19.2%

Gene-set analysis (GSA) has been one of the standard procedures for exploring potential biological functions when a group of differentially expressed genes has been derived. The development of its methodology has been an active research topic in recent decades. Many GSA methods, when newly proposed, rely on simulation studies to evaluate their performance with a common implicit assumption that the multivariate expression values are normally distributed. The validity of this assumption has been disputed in several studies, but no systematic analysis has been carried out to assess the influence of this distributional assumption. Our goal in this study is not to propose a new GSA method but to first examine whether the multi-dimensional gene expression data in gene sets follow a multivariate normal distribution (MVN). Six statistical methods in three categories of MVN tests were considered and applied to a total of twenty-two datasets of expression data from studies involving tumor and normal tissues, with ten signaling pathways chosen as the gene sets. Second, we evaluated the influence of non-normality on the performance of current GSA tools, including parametric and non-parametric methods. Specifically, the scenario of mixture distributions representing the case of different tumor subtypes was considered. Our first finding suggests that the MVN assumption should be carefully dealt with. It does not hold true in many applications tested here. The second investigation of the GSA tools demonstrates that non-normality does affect the performance of these GSA methods, especially when subtypes exist. We conclude that the use of the inherent multivariate normality assumption should be assessed with care in evaluating new GSA tools, since this MVN assumption cannot be guaranteed and it strongly affects the performance of GSA methods. If a newly proposed GSA method is to be evaluated, we recommend the incorporation of multivariate non-normal distributions or sampling from large databases if available.

15

A real data-driven simulation strategy to select an imputation method for mixed-type trait data

May, J. A.; Feng, Z.; Adamowicz, S. J.

2022-10-27 bioinformatics 10.1101/2022.05.03.490388 medRxiv

Top 0.1%

19.2%

Missing observations in trait datasets pose an obstacle for analyses in myriad biological disciplines. Considering the mixed results of imputation, the wide variety of available methods, and the varied structure of real trait datasets, a framework for selecting a suitable imputation method is advantageous. We invoked a real data-driven simulation strategy to select an imputation method for a given mixed-type (categorical, count, continuous) target dataset. Candidate methods included mean/mode imputation, k-nearest neighbour, random forests, and multivariate imputation by chained equations (MICE). Using a trait dataset of squamates (lizards and amphisbaenians; order: Squamata) as a target dataset, a complete-case dataset consisting of species with nearly complete information was formed for the imputation method selection. Missing data were induced by removing values from this dataset under different missingness mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). For each method, combinations with and without phylogenetic information from single gene (nuclear and mitochondrial) or multigene trees were used to impute the missing values for five numerical and two categorical traits. The performances of the methods were evaluated under each missing mechanism by determining the mean squared error and proportion falsely classified rates for numerical and categorical traits, respectively. A random forest method supplemented with a nuclear-derived phylogeny resulted in the lowest error rates for the majority of traits, and this method was used to impute missing values in the original dataset. Data with imputed values better reflected the characteristics and distributions of the original data compared to complete-case data. However, caution should be taken when imputing trait data as phylogeny did not always improve performance for every trait and in every scenario. 
Ultimately, these results support the use of a real data-driven simulation strategy for selecting a suitable imputation method for a given mixed-type trait dataset. Author summary: The issue of missing data is problematic in trait datasets as the missingness pattern may not be entirely random. Whether data are missing may depend on other known observations in the dataset, or on the value of the missing data points themselves. When only complete cases are used in an analysis, derived results may be biased. Imputation is an alternative to complete-case analysis and entails filling in the missing values using information provided by other trait values present in the dataset. Including phylogenetic information in the imputation process can improve the accuracy of imputed values, though results are dependent on the amount and pattern of missingness. Most previous evaluations of imputation methods for trait datasets are limited to numerical simulated data, with categorical traits not considered. Given a particular dataset, we propose the use of a real data-driven simulation strategy to select an imputation method. We evaluated the accuracies of four different imputation methods, with and without phylogeny information, and under different simulated missingness patterns using an example reptile trait dataset. Results indicated that data imputed using the best-performing method better reflected the original dataset characteristics compared to complete-case data. As imputation performance varies depending on the properties of a given dataset, a real data-driven simulation strategy can be used to provide guidance on best imputation practices.
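
The simulation loop described in this abstract (start from complete cases, remove values, impute, score against the known truth) can be sketched for the simplest setting: one numerical trait, missingness completely at random (MCAR), and mean imputation. The trait values, function names, and seed are hypothetical; the study itself also covers MAR and MNAR mechanisms, categorical traits, and phylogeny-informed methods, none of which are shown here.

```python
import random


def induce_mcar(values, fraction, seed=0):
    """Hide a fixed fraction of values, chosen uniformly at random (MCAR)."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(values))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in missing_idx else v for i, v in enumerate(values)]
    return masked, sorted(missing_idx)


def mean_impute(masked):
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def mse_on_missing(truth, imputed, missing_idx):
    """Mean squared error, evaluated only where values were removed."""
    return sum((truth[i] - imputed[i]) ** 2 for i in missing_idx) / len(missing_idx)


# Hypothetical complete-case trait values (e.g. body lengths), not real data.
complete = [float(x) for x in range(1, 21)]
masked, missing_idx = induce_mcar(complete, fraction=0.25, seed=1)
imputed = mean_impute(masked)
print(mse_on_missing(complete, imputed, missing_idx))
```

Repeating this over several candidate methods and missingness mechanisms, and keeping the method with the lowest error, mirrors the selection strategy the authors describe.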

16

National Consumption of Antimalarial Drugs and COVID-19 Deaths Dynamics: an Ecological Study

Izoulet, M.

2020-04-24 pharmacology and therapeutics 10.1101/2020.04.18.20063875 medRxiv

Top 0.1%

19.1%

COVID-19 (Coronavirus Disease-2019) is an international public health problem with a high rate of severe clinical cases. Several treatments are currently being tested worldwide. This paper focuses on anti-malarial drugs such as chloroquine or hydroxychloroquine. We compare the dynamics of COVID-19 daily deaths in countries using anti-malarial drugs as a treatment from the start of the epidemic versus countries that do not, from the day of the 3rd death through the following 10 days. We then use ARIMA modeling to produce a short-term forecast of death dynamics for each group. We show that the first group has much slower dynamics in daily deaths than the second group. This ecological study is of course only one additional piece of evidence in the debate regarding the efficacy of anti-malarial drugs, and it is also limited, as the two groups certainly have other systemic differences in the way they responded to the pandemic, in the way they report deaths, or in their populations that could better explain the differences in dynamics. Nevertheless, the difference in dynamics of daily deaths is so striking that we believe it is useful to present these results as a clue in research on the efficacy of hydroxychloroquine. In the end, these data might ultimately be either a piece of evidence in favor of anti-malarial drugs or a stepping stone toward understanding what other ecological aspects play a role in the dynamics of COVID-19 deaths.

17

Does charging for corrections in the bioscience literature disincentivize pre-publication handling of problematic image data? An ImageTwin-AI study.

Brookes, P. S.

2026-01-28 cancer biology 10.64898/2026.01.16.700000 medRxiv

Top 0.1%

19.0%

The Committee on Publication Ethics (COPE) recommends that publishers do not charge for corrections to published papers. Until late 2024 the Journal of Cancer levied a charge on authors (50% of the original article processing charge, APC) for publication of a correction. Herein, it was hypothesized this could disincentivize the discovery and removal of problematic data prior to publication, since post-publication discovery and correction would generate additional revenue. The correction charge policy at J. Cancer was rescinded in 2025, permitting a test of the hypothesis by comparing the prevalence of problematic image data in the journal before and after the policy change. Recently developed artificial intelligence (AI) tools afford the ability to screen scientific publications for problematic image data. As such, the 2024-2025 output of J. Cancer was analyzed using ImageTwin-AI, followed by human verification and annotation of identified problems. Of 754 papers analyzed, 510 contained image data. Of these, 95 (18.6%) showed evidence of inappropriate image manipulation, with 19 papers (3.7%) having images that overlapped with unrelated papers. The prevalence of papers with problem images was 20.3% in 2024, and 15.9% in 2025, suggesting only a modest impact of the policy change on pre-publication handling of such problems.
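As a quick arithmetic check, the overall percentages quoted in this abstract can be recomputed from the stated counts, taking the 510 papers with image data as the denominator. The short Python sketch below does only that. The per-year figures (20.3% and 15.9%) are not recomputed, because the abstract does not give the yearly denominators; the helper name `percent` is an illustrative choice.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as stated in the abstract.
papers_with_images = 510
papers_with_manipulation = 95
papers_with_overlap = 19

manipulation_rate = percent(papers_with_manipulation, papers_with_images)
overlap_rate = percent(papers_with_overlap, papers_with_images)
print(manipulation_rate, overlap_rate)
```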

18

Wild-type and mutated beta-catenin differently repress RND3/RHOE expression in hepatocellular carcinoma

Basbous, S.; Sena, S.; Dantzer, C.; Neaud, V.; Piquet, L.; Grise, F.; Martins, F.; Varon, C.; Gerbal-Chaloin, S.; Favereaux, A.; Colnot, S.; Lagree, V.; Billottet, C.; Moreau, V.

2025-10-16 cancer biology 10.1101/2025.10.16.682743 medRxiv

Top 0.1%

18.9%

Background & Aims: Tumor development and progression are mainly driven by oncogenic mutations but are also regulated by physical factors, such as applied forces or microenvironment stiffness. Through its structural and transcriptional functions, β-catenin is a key factor that acts on both aspects to promote liver tumorigenesis, leading to hepatocellular carcinoma (HCC) development. However, the mechanisms by which these two functions regulate downstream targets remain poorly understood. Herein, we describe Rnd3, also called RhoE, an atypical member of the Rho GTPase family, as a common target of both functions of β-catenin. We previously demonstrated that RND3 expression is downregulated in HCC, which correlates with intrahepatic metastasis. Yet, a molecular understanding of how Rnd3 expression is dysregulated in cancer is largely missing. Approach & Results: Using human HCC samples and cultured cell lines, we demonstrate that Rnd3 expression is regulated by β-catenin pathways, regardless of their mutational status. Both the transcriptional and the structural activity of β-catenin repress the expression of RND3. Indeed, we found that wild-type β-catenin suppresses RND3 transcription through the Hippo pathway, whereas oncogenic β-catenin downregulates RND3 expression through miRNA targeting its 3'UTR. Conclusion: Rnd3 may constitute a key protein involved in the transcriptional program driven by oncogenic β-catenin in HCC and may act as a mediator of the mechanosensitive response associated with cell-cell adhesion.

19

Computational identification of cross-kingdom microRNA compatibility between Moringa oleifera miR156 and the human CDK4 transcript

Govindaraj, P. R.; Akaye, M. P.

2026-03-09 cancer biology 10.64898/2026.03.05.709853 medRxiv

Top 0.1%

18.8%

Triple-negative breast cancer (TNBC) remains one of the most aggressive breast cancer subtypes and lacks durable targeted therapies. Dysregulation of cell-cycle control, particularly through CDK4/6 signaling, is a defining feature of TNBC biology (Garrido-Castro et al., 2019). Extracts of Moringa oleifera have repeatedly been shown to induce G1-phase arrest in breast cancer models, yet the molecular basis of this phenotype remains unclear (Al-Asmari et al., 2015; Gaffar et al., 2019). Emerging work on cross-kingdom regulation has raised the possibility that plant-derived microRNAs may, under specific conditions, interact with mammalian transcripts (Zhang et al., 2012; Chin et al., 2016). Here, we performed a high-stringency computational screen of conserved Moringa microRNAs against 30 genes implicated in TNBC pathogenesis using local sequence alignment. Sequence shuffling for the negative control was performed with set.seed(42) to ensure reproducibility. Additional visualisations (nucleotide alignment and thermodynamic analyses) were generated using Python 3 (matplotlib v3.7). We identify a predicted high-affinity interaction between mol-miR156 and the human CDK4 3' untranslated region (3'UTR), characterized by an uninterrupted 12-nucleotide complementary motif that exceeds canonical mammalian microRNA seed requirements. These findings support the hypothesis that conserved plant microRNAs may exhibit latent structural compatibility with oncogenic human transcripts. While physiological delivery and functional repression are not demonstrated here, this work establishes a molecular framework for future experimental investigation into cross-kingdom RNA interactions relevant to cancer cell-cycle regulation. Impact Statement: A high-stringency computational screen identifies latent molecular compatibility between a conserved plant microRNA and the human CDK4 oncogene, establishing a testable framework for cross-kingdom RNA interference in triple-negative breast cancer.
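To make the notion of an "uninterrupted complementary motif" concrete, the Python sketch below finds the longest stretch of a target that is perfectly Watson-Crick complementary to a microRNA. It is a simplification, not the authors' pipeline: the abstract reports local sequence alignment, whereas this sketch only searches for exact, gap-free complementarity and ignores G:U wobble pairs. Both sequences are short invented toys, not mol-miR156 or the CDK4 3'UTR.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def longest_complementary_run(mirna, target):
    """Return (length, target_start) of the longest uninterrupted
    Watson-Crick complementary stretch between `mirna` and `target`.

    A microRNA pairs with its target antiparallel, so the search is for the
    longest common substring of the microRNA's reverse complement and the
    target, using a standard dynamic-programming table kept one row at a time.
    """
    site = reverse_complement(mirna)
    best_len, best_end = 0, 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(site) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if site[i - 1] == target[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], j
        prev = curr
    return best_len, best_end - best_len


# Hypothetical toy sequences, invented for illustration only.
toy_mirna = "AUGGCUAGCUAA"
toy_target = "GGGUAGCUAGCCCC"

length, start = longest_complementary_run(toy_mirna, toy_target)
print(length, start, toy_target[start:start + length])
```

A shuffled-sequence negative control of the kind the abstract mentions could be approximated by permuting `toy_mirna` with a seeded `random.Random` and re-running the same function, then comparing run lengths.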

20

Comparative Analysis of De Novo Assemblers and Quantification Software for RNA-sequencing Data in Non-Model Arthropods

Brasseur, M. V.; Leese, F.; Mayer, C.

2025-08-03 bioinformatics 10.1101/2025.08.01.668104 medRxiv

Top 0.1%

18.7%

Background: RNA-sequencing has greatly improved our understanding of the transcriptomic regulation of fundamental biological processes. Although the method has matured significantly within the last decade, bioinformatic processing of the resulting high-dimensional data sets is still challenging and the performance of algorithms can vary between data sets. As a consequence, for most non-model organisms, in particular arthropods, there is little or no evidence in the literature as to which software is best suited to handle taxon-specific data characteristics. Therefore, we evaluated the performance of different de novo transcriptome assemblers (Trinity, rnaSPAdes, IDBA-tran) and transcript quantification software (RSEM, Salmon) on transcriptomic data of a non-model insect and a freshwater crustacean species, as well as the impact of different quality trimming strategies on the downstream bioinformatic processing results. Results: While the trimming strategy had no considerable effect on the quality of transcriptome assemblies, the choice of the assembler had a substantial impact. IDBA-tran was less sensitive than the two other assemblers and produced the most fragmented transcriptome assemblies. The low remapping rates of reads against IDBA-tran assemblies further suggest that the input read data was not effectively leveraged by this algorithm. In contrast, Trinity and rnaSPAdes both generated comprehensive and contiguous de novo transcriptome assemblies, although Trinity appeared to be slightly more sensitive. This increased sensitivity, however, was associated with a higher redundancy in Trinity-generated assemblies compared to assemblies produced with rnaSPAdes. When the quality of the transcriptome assembly was high, RSEM and Salmon were able to identify the origin of at least 90% of the read data in the reference. Despite their different underlying quantification approaches, the estimated transcript counts of both tools were highly correlated and their expression signal was consistent. Notably, the alignment-free quantification algorithm Salmon was substantially faster than the alignment-based approach of RSEM. Furthermore, it was also slightly more sensitive, increasing the average re-mapping rate to ~98%. Conclusion: Since the performance of bioinformatic algorithms, especially of de novo assemblers, varies for different RNA-sequencing data sets, establishing an appropriate analysis workflow remains an important task. Our results show that the better performing combinations of algorithms produce congruent count data sets with consistent expression signal, highlighting the robustness of RNA-sequencing data analysis software.
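One simple way to examine whether two quantification tools agree, as this abstract reports for RSEM and Salmon, is to correlate their per-transcript counts after a log transform. The Python sketch below shows that comparison under stated assumptions: the count vectors are invented, and the use of Pearson correlation on log2(count + 1) is an illustrative choice, since the abstract does not say which correlation measure the authors used.

```python
import math


def log_counts(counts):
    """log2(count + 1) transform, which tolerates zero counts."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical estimated counts for the same six transcripts (invented values).
rsem_counts = [0, 12, 150, 980, 43, 7]
salmon_counts = [1, 10, 162, 1010, 40, 9]

r = pearson(log_counts(rsem_counts), log_counts(salmon_counts))
print(f"Pearson r on log2(count + 1): {r:.3f}")
```

On real data the two vectors would come from each tool's output tables, matched by transcript identifier before the comparison.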