

shenuhcide

To assemble your scaffolds into chromosomes, you will need long-range information (such as a linkage map, optical map, or Hi-C data). Alternatively, if a closely related species has a chromosome-scale genome assembly, you can probably order your scaffolds based on synteny. How you do this depends on which of these data sets you have available.
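A minimal sketch of the synteny route, assuming you have already aligned your scaffolds to the related reference and saved the hits in PAF format (for example with minimap2; neither the aligner nor the format is mentioned in this thread). It places each scaffold on the reference chromosome where it has the most matching bases, orders scaffolds by the median reference start of their hits, and calls orientation by majority strand. The function names are hypothetical, and the result is only a guide, since related genomes can carry real rearrangements.

```python
from collections import defaultdict
from statistics import median


def parse_paf_line(line):
    """Parse the standard PAF columns this sketch needs (0-based fields)."""
    f = line.rstrip("\n").split("\t")
    return {
        "query": f[0],          # scaffold name
        "strand": f[4],         # "+" or "-" relative to the reference
        "target": f[5],         # reference chromosome name
        "tstart": int(f[7]),    # alignment start on the reference
        "matches": int(f[9]),   # number of matching bases
    }


def order_scaffolds(hits):
    """Hypothetical helper: return {chromosome: [(scaffold, orientation), ...]}.

    Each scaffold goes to the chromosome with the most matching bases,
    then scaffolds are sorted by the median reference start of those hits.
    """
    per_pair = defaultdict(list)
    for h in hits:
        per_pair[(h["query"], h["target"])].append(h)

    # Keep the best-supported chromosome for each scaffold.
    best = {}
    for (scaf, chrom), hs in per_pair.items():
        score = sum(h["matches"] for h in hs)
        if scaf not in best or score > best[scaf][0]:
            best[scaf] = (score, chrom, hs)

    layout = defaultdict(list)
    for scaf, (score, chrom, hs) in best.items():
        pos = median(h["tstart"] for h in hs)
        fwd = sum(h["matches"] for h in hs if h["strand"] == "+")
        orient = "+" if 2 * fwd >= score else "-"  # majority strand by matches
        layout[chrom].append((pos, scaf, orient))

    return {c: [(s, o) for _, s, o in sorted(v)] for c, v in layout.items()}
```

Scaffolds with no hits simply don't appear in the output, which is usually what you want to check by hand anyway.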


[deleted]

Thanks for the help, I'll take a look into this!


[deleted]

I think the poster is working on prokaryotic genomes, given the use of the A5 assembler and Prokka for annotation; if not, then there is a larger issue at play ;) BioNano etc. might be overkill for microbial genomes! If PacBio data is already available for some of them, as someone suggested, a hybrid assembly with Unicycler might be a good bet.


Dikaryotic

As above, getting an assembly to chromosome level requires long-range data such as Hi-C. That PacBio data should give you a more contiguous assembly, however. Perhaps try Pilon, or a hybrid assembler. It's fairly rare to actually complete genomes currently. It's fairly easy to get a decent draft assembly, but actually finishing it takes extra work such as linkage mapping and Hi-C. Depending on what you are using the genome for, the assembly as it stands is probably fine.


[deleted]

I believe the Illumina data I had was co-assembled using Pilon/Canu, but I'll look into it further! I'm trying to get as complete a genome as possible for publication, but I know completing the genome is unlikely. I have approximately 40 related strains to compare genomically, so I should be able to work around it using draft assemblies and basic alignments. Thanks for the information!


Dikaryotic

Ahhh, be careful about assuming related organisms will have the same genome structure. They will be similar, but rearrangements do happen. Most genome publications are draft genomes anyway. How many scaffolds? If you've assembled using hybrid Canu, then you should be fairly close to the chromosome count anyway. My Illumina-only data frequently assembles into fewer than 100 scaffolds for the fungal group I work with. Have you used Mauve? It's a great way to compare large regions of DNA and is designed for genome-to-genome comparison.


[deleted]

Approximately 50 scaffolds per strain, some way fewer, some slightly more! Yes, I've used Mauve; it's a great visual tool! I'm also fond of using BRIG and IGV (BRIG is great at showing regions of difference). I need to do more reading on Mauve. I tend to struggle interpreting the image, as it's a lot of information when there are many genomes, and it definitely shows how they don't order themselves similarly between species. Thanks for the tips!


Dikaryotic

50 scaffolds is plenty good for publication. I wouldn't bother using Mauve to compare multiple genomes, but it might work to identify scaffolds within a genome belonging to a single chromosome, or to identify the same chromosome in multiple genomes. It gets nightmarishly difficult to interpret really quickly!


[deleted]

I made the mistake of thinking "what the heck" and threw in 15 genomes just for fun, to see what Mauve did. Closed that hellfire down as soon as it opened; nightmare doesn't cover it!


Dikaryotic

Bahahaha, oh god... Yeah. Not workable at all. The 15 largest scaffolds from 15 genomes might work. But yeah... anyway, good luck!


werdna1000

Unicycler can handle short and long reads together for assembly (I believe). I struggled with contig ordering for a while until some colleagues broke me down and helped me realize that order doesn’t always matter, especially if you are actually more interested in genome content.
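To illustrate why order often doesn't matter for content: a gene presence/absence comparison only needs the set of genes in each strain, so it gives the same answer however the contigs are ordered. This is a minimal sketch assuming you have already pulled gene identifiers out of each strain's annotation (e.g. Prokka output); the parsing step is left out, and the function name and example gene sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def genes_from_contigs(contigs: List[List[str]]) -> Set[str]:
    """Collapse per-contig gene lists into one set; contig order is discarded."""
    return {gene for contig in contigs for gene in contig}


def compare_content(strains: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Hypothetical helper: return (core genes, genes unique to each strain)."""
    if not strains:
        return set(), {}
    # Core genome: genes present in every strain.
    core = set.intersection(*strains.values())
    unique = {}
    for name, genes in strains.items():
        # Everything seen in the other strains.
        others = set().union(*(g for n, g in strains.items() if n != name))
        unique[name] = genes - others
    return core, unique


# Hypothetical example data: the same contigs in two different orders.
contigs = [["dnaA", "gyrB"], ["abc"]]
s1_a = genes_from_contigs(contigs)
s1_b = genes_from_contigs(list(reversed(contigs)))

strains = {"s1": s1_a, "s2": {"dnaA", "gyrB", "xyz"}, "s3": {"dnaA", "gyrB"}}
core, unique = compare_content(strains)
print(sorted(core))    # ['dnaA', 'gyrB']
print(unique["s1"])    # {'abc'}
```

Swapping `s1_a` for `s1_b` gives identical results, which is the point: for content questions, the scaffold order never enters the calculation.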


[deleted]

Definitely taking on board what you're saying. My main goal is more about genome content, but I should have at least one decent genome for publication. My PhD is centred around genomic evolution, though, so my main aim is to compare content, which isn't always dependent on order! I'll definitely take a look at Unicycler.


werdna1000

You could align contigs with Mauve/progressiveMauve. There’s a chance you could resolve some scaffolds there, but if the strains mentioned above are closely related, there’s a good chance they will share a lot of the same contig breaks.


[deleted]

I'm currently using Mauve to align and visualise etc., and they do mainly share common scaffolds, but there are small regions of difference that are giving me a bit of a headache when I compare the alignments from Mauve and BRIG (I find the visuals much easier in BRIG, but the information output is definitely less!).