SUPPLEMENTARY INFORMATION
Letters
https://doi.org/10.1038/s41564-017-0053-y
In the format provided by the authors and unedited.
Discovery of an expansive bacteriophage family
that includes the most abundant viruses from
the human gut
Natalya Yutin1, Kira S. Makarova1, Ayal B. Gussow1, Mart Krupovic2, Anca Segall1,3,
Robert A. Edwards3 and Eugene V. Koonin1*
1National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA. 2Institut Pasteur, Unité Biologie Moléculaire du Gène
chez les Extrêmophiles, Paris, France. 3Viral Information Institute, Department of Biology, San Diego State University, San Diego, CA, USA.
*e-mail: [email protected]
Nature Microbiology | www.nature.com/naturemicrobiology
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
Supplementary Figure 1
[Tree image, overview panel. Collapsed groups: crAssphage group, OBV-13 group, FUFK010039141 group, Azobacteroides phage group, Flavobacter phage group, and Cellulophaga phage/IAS virus group. Uncollapsed leaves are translated WGS contigs and representative genes (gene 17 LAZR01000126, gene 3 CERP01038047, gene 31x BCSF01000238, gene 62 MDTC01014143, gene 30 CEUT01009082, gene 7 Woesebacteria LCET01000015) from marine, marine sediment, glacier, activated sludge, bioreactor, anaerobic digester, groundwater, and microbial mat metagenomes, plus the nr protein Woesebacteria@KKS76822. Scale bar: 0.2.]
Phylogenetic tree of the major capsid protein (MCP) for all identified members of the crAss-like family.
Translated WGS sequences are denoted by three values: contig ID, ORF start and ORF end coordinates.
Representative sequences are shown in red, nr proteins (denoted by their source and protein ID) are shown in green. Branches
corresponding to large groups are collapsed into triangles. The next 6 panels show the collapsed branches expanded. The tree
was constructed using FastTree as described under Methods. Support values were obtained using 100 bootstrap replications;
values greater than 50% are shown.
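The leaf-label convention described in the caption can be read mechanically. The sketch below is illustrative only and is not part of the authors' pipeline: the names `WgsLabel`, `parse_wgs_label` and `parse_nr_label` are hypothetical, the accession pattern (four to six capital letters followed by digits) is an assumption based on the labels shown in the figure, and inferring the strand from start > end is likewise an assumption rather than something the caption states.

```python
import re
from typing import NamedTuple, Optional, Tuple


class WgsLabel(NamedTuple):
    """A translated WGS leaf label: contig ID, ORF start, ORF end."""
    contig_id: str
    orf_start: int
    orf_end: int

    @property
    def strand(self) -> str:
        # Assumption: start > end means the ORF is on the reverse strand.
        return "-" if self.orf_start > self.orf_end else "+"

    @property
    def length_nt(self) -> int:
        # Inclusive span of the ORF coordinates in nucleotides.
        return abs(self.orf_end - self.orf_start) + 1


# Assumed accession shape: 4-6 capital letters followed by digits.
_WGS_PATTERN = re.compile(r"^([A-Z]{4,6}\d+)\s+(\d+)\s+(\d+)$")


def parse_wgs_label(label: str) -> Optional[WgsLabel]:
    """Return the parsed label, or None if it is not a WGS-style label."""
    match = _WGS_PATTERN.match(label.strip())
    if match is None:
        return None
    return WgsLabel(match.group(1), int(match.group(2)), int(match.group(3)))


def parse_nr_label(label: str) -> Optional[Tuple[str, str]]:
    """Split an nr-protein label of the form 'source@protein ID'."""
    source, sep, protein_id = label.strip().partition("@")
    if not sep:
        return None
    return source, protein_id


if __name__ == "__main__":
    print(parse_wgs_label("CERP01008271 3565 2135"))
    print(parse_nr_label("crAssphage@YP 009052551"))
```

Labels that follow neither convention, such as the representative sequences written as "gene 17 LAZR01000126", are returned as `None` by `parse_wgs_label` and would need separate handling.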
CrAssphage group
[Tree image, expanded crAssphage group. Leaves are translated WGS contigs and representative genes (gene 76 crAssphage, gene 82 CEAZ01003514, gene 68 CDZH01002743, gene 65 CDZN01024782, gene 4 CEAR01029167, gene 23 CDZK01015469, gene 15 LSPZ01000006) from sheep rumen, human gut, wallaby gut, termite gut, marine, marine sediment, groundwater, anaerobic digester, hot springs, wastewater, and microbial mat metagenomes, plus the nr proteins crAssphage@YP 009052551 and Shapirobacteria@KKQ29905.]
OBV-13 group
[Tree image, expanded OBV-13 group. Leaves are translated WGS contigs and representative genes (gene 53 LFIK01000274, gene 3 LULO01000006, gene 14 BCSF01000013, gene 65 CESO01030555, gene 33 FUWD013401386, gene 59 CENS01015162) from soda lake, hypersaline lake, marine sediment, and marine metagenomes.]
FUFK010039141 group
[Tree image, expanded FUFK010039141 group. Leaves are translated WGS contigs and representative genes (gene 13 contig0001, gene 34 contig0002, gene 1 FUFK010039141, gene 93 Woesebacteria MGFQ01000035) from marine, marine sediment, sediment, anaerobic digester, and microbial mat metagenomes, plus the nr protein Woesebacteria@OGM08893.]
Azobacteroides phage group
[Tree image, expanded Azobacteroides phage group. Leaves are translated WGS contigs and representative genes (gene 122x Chlamydia CVNZ01000019ext, gene 10 LSPY01000004, gene 49 LSPY01000006, gene 38 AzobactphAP017903) from human gut, human feces, wombat feces, rat feces, sheep rumen, termite gut, activated sludge, and microbial mat samples, plus the nr proteins Veillonella@CCX56870, Chlamydia trachomatis@CRH69151, and Azobacteroides phage ProJPtBp1@BAX03432.]
Flavobacter phage group
[Tree image, expanded Flavobacter phage group. Leaves are translated WGS contigs and representative genes (gene 76 Chitinophaga FOJF01000001, gene 33 contig0005, gene 10 FUFK010040477, gene 14 Flavobact ph NC 031904) from marine, marine sediment, soil, anaerobic digester, human gut, wastewater, and activated sludge metagenomes, plus the nr proteins Chitinophaga@SEW21463 (Populus root microbiome) and Flavobacterium phage Fpv3@YP 009321139 (host Flavobacterium psychrophilum, a fish pathogen).]
Cellulophaga phage/IAS virus group
[Tree image, expanded Cellulophaga phage/IAS virus group. Leaves are translated WGS contigs and representative genes (gene 9 TM6 bacterium LBPE01000022, gene 12 LDZU01000035, gene 9 metagFUWD013321667, gene 13 FUWD013255170, gene 91 Cellulophaga ph NC 021806) from marine, marine sediment, groundwater, terrestrial, bioreactor, activated sludge, anaerobic digester, microbial mat, rat feces, and sheep rumen samples, plus the nr proteins TM6 bacterium@KKP51371 and Cellulophaga phage phi14 2@YP 008242315 (seawater). The IAS virus group is shown collapsed. Scale bar: 0.2.]
IAS virus group
[Tree image, expanded IAS virus group. Leaves are predominantly translated WGS contigs from the sheep rumen, with further contigs from rat feces, human feces, chicken gut, and activated sludge metagenomes, the representative gene 33 IAS virus KJ003983, and the nr proteins Chlamydia trachomatis@CRH75220 and Chlamydia trachomatis@CRH70440 (human feces). Scale bar: 0.2.]
Supplementary Figure 2
DNAp (family B)
[Tree image. crAss-like family representatives: gene 17 crAssphage, gene 32 CEAZ01003514, gene 38 CDZH01002743, gene 12 CDZN01024782, gene 64 CDZK01015469, gene 35 CEAR01029167, gene 48 LSPZ01000006, gene 74 Flavobact ph NC 031904, gene 18x Cellulophaga ph NC 021806, and gene 10 Chitinophaga FOJF01000001. Other leaves: Salmonella phage 9NA@YP 009101235, Salmonella phage FSL SP 062@AGF89344, Edwardsiella phage MSW 3@YP 007348961, Vibrio phage CP T1@YP 007003043, Shewanella sp phage 1 44@YP 009103744, Pseudoalteromonas phage B8b@AII27474, Agrobacterium phage 7 7 1@YP 007006473, Pseudomonas phage phiPMW@ANA49321, Klebsiella phage JD001@YP 007392876, Arcobacter@WP 076086334, Bradyrhizobium pachyrhizi@KRQ11647, and Ensifer aridi@WP 085044344. Collapsed groups: "Bacteria, phages" (two), "Bacteria", and "Eukaryotes, archaea, eukaryotic viruses". Scale bar: 0.5.]
Phylogenetic trees for crAss-like family PolA, PolB, primase, and ligase.
Translated WGS sequences are denoted by three values: contig ID, ORF start and ORF end coordinates.
Representative sequences are shown in red, nr proteins (denoted by their source and protein ID) are shown in green. Branches
corresponding to large groups are collapsed into triangles. The next 6 panels show the collapsed branches expanded. The tree
was constructed using FastTree as described under Methods. Support values were obtained using 100 bootstrap replications;
values greater than 50% are shown.
DNAp (family A)
[Tree image. crAss-like family representatives: gene 47 CESO01030555, gene 26 LULO01000006, gene 35 BCSF01000013, gene 11 CENS01015162, gene 23 CERP01038047, gene 9 BCSF01000238, gene 51 CEUT01009082, gene 22 MDTC01014143, gene 13 LFIK01000274, gene 27 LDZU01000035, gene 66 LAZR01000126, gene 40 Chlamydia CVNZ01000007ext, and gene 65 IAS virus KJ003983. Other leaves: Arc I group archaeon@KYC57730, Thermoplasmatales archaeon@KYK22721, Thermus phage TMA@YP 004782240, Brochothrix phage A9@YP 004301474, Shigella phage SHSML 45@YP 009280314, Bacillus phage SP 10@YP 007003448, Bacillus phage CP 51@YP 009099134, Bacillus phage Mgbh1@AMQ66716, Listeria phage LP 110@YP 008240460, and uncultured virus@AGX13766. Collapsed groups: "Bacteria, archaea, phages" and "Bacteria, phages"; one clade is labelled COG0749. Scale bar: 0.2.]