
JCircleGraph


Atlases of oligonucleotide parameters

Images are created by the Java program JCircleGraph from data produced by two other programs, OligoWords and OligoCounter.


Interpreting an image

The dark grey inner ring is the scale, with ticks every 0.25MB or 0.5MB depending on genome size.

The four innermost rings are mononucleotide or tetranucleotide (4mer) parameters derived from a 10kb sliding window using the Python program OligoWords (see the downloads page on this server, or Reva and Tümmler 2004 for the program and a full description of parameters).

1: GC content (proportion of G and C in one window)
2: Distance (of a local 10kb pattern relative to the global genomic pattern)
3: Pattern skew (distance between patterns on the leading and lagging strands in one window)
4: Oligonucleotide variance (variation of word deviations)
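
As a rough illustration of the sliding-window idea behind ring 1, the sketch below computes GC content per window. It is not the OligoWords code: the function name `gc_content_windows` is hypothetical, the defaults of 10000 and 5000 are borrowed from the `frame` and `step` values used on this page, and treating every non-G/C character (including ambiguous bases) as non-GC is an assumption.

```python
def gc_content_windows(sequence, frame=10000, step=5000):
    """Yield (window_start, gc_fraction) for each full window of the sequence.

    Assumption: a linear sequence; partial windows at the end are skipped.
    """
    seq = sequence.upper()
    for start in range(0, len(seq) - frame + 1, step):
        window = seq[start:start + frame]
        gc = window.count("G") + window.count("C")
        yield start, gc / frame


# Toy example with a tiny window so the result can be checked by hand.
toy = "GGCCAATTGGCC"
print(list(gc_content_windows(toy, frame=4, step=4)))
# [(0, 1.0), (4, 0.0), (8, 1.0)]
```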



The next two rings plot, where the information is available, the presence of overrepresented 8-14mers. The value is the number of bases in a 5kbp region occupied by overrepresented 8-14mers, divided by 5000 and multiplied by 100. The result is therefore the percentage of bases in each 5kbp region that are occupied by overrepresented 8-14bp oligos.

Example: a single 14mer in a 5000bp region. (14 / 5000) * 100 = 0.28% occupancy. 

Example: 2 half-overlapping 8mers in a 5000bp region. (12 / 5000) * 100 = 0.24% occupancy. Because bases shared by overlapping oligos are counted only once, percentage occupancy reduces the redundancy in a dataset with overlapping repeats. Overlapping repeats are characteristic of OligoCounter output, since it uses a window size of 1bp.
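
The two examples above can be reproduced with a short sketch. This is only an illustration of the arithmetic, not OligoCounter or JCircleGraph code; the function name `percentage_occupancy` and the `(start, length)` input format are assumptions.

```python
def percentage_occupancy(oligos, region_start=0, region_length=5000):
    """Percentage of bases in a region covered by at least one oligo.

    oligos: iterable of (start, length) pairs with 0-based start positions
    (a hypothetical input format). Overlapping bases are counted once.
    """
    region_end = region_start + region_length
    covered = set()
    for start, length in oligos:
        lo = max(start, region_start)
        hi = min(start + length, region_end)
        covered.update(range(lo, hi))
    return len(covered) / region_length * 100


# A single 14mer: 14 covered bases.
print(round(percentage_occupancy([(100, 14)]), 2))           # 0.28
# Two half-overlapping 8mers: 12 covered bases, not 16.
print(round(percentage_occupancy([(100, 8), (104, 8)]), 2))  # 0.24
```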

5: Percentage of bases in each 5kbp region occupied by overrepresented 8-14bp oligos at default chi-sq. level 500
6: Percentage of bases in each 5kbp region occupied by overrepresented 8-14bp oligos at default chi-sq. level 1200

Filtering: If an inappropriately high chi-squared value has been selected for ring 5 or 6, few or no data will be available, resulting in a dark blue ring that dominates the rest of the graph. To avoid this, whenever the ring's average minus one standard deviation is less than zero (which is the case when few data are available), the whole ring is filtered and left grey. The solution is to use a lower chi-squared value (i.e. rerun OligoCounter).
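
The filtering rule can be sketched as follows. This is an illustration rather than the JCircleGraph implementation: the function name `ring_is_filtered` is hypothetical, and both the use of the population standard deviation and the treatment of an empty ring as filtered are assumptions.

```python
from statistics import mean, pstdev


def ring_is_filtered(values):
    """Return True if the ring should be left grey (average - 1 SD < 0).

    Assumptions: population standard deviation; an empty ring is filtered.
    """
    if not values:
        return True
    return mean(values) - pstdev(values) < 0


# Sparse data: almost every 5kbp region has zero occupancy.
sparse = [0.0] * 99 + [5.0]
# Well-populated data: occupancy varies around a clearly positive average.
dense = [1.0, 1.2, 0.8, 1.1]

print(ring_is_filtered(sparse))  # True
print(ring_is_filtered(dense))   # False
```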

The correlation class circle, ring 7, indicates the differences between tetranucleotides - in this case oligonucleotide variance in ring 4 - and the innermost 8-14mer percentage occupancy in ring 5.

7: 4mer-8mer correlation class derived from rings 4 (OUV, 4mer) and 5 (% occupancy, 8-14mers)

Colours

Colours range from dark blue (below average) through light grey (average) to dark red (above average). These colours cover 3 standard deviations above and 3 below the average, and thus over 99% of normally distributed data. Extremes that do not lie within 3 standard deviations are coloured more emphatically. With this colour dimension, regions of the genome that diverge from the average in various parameters can be seen clearly. These may be genome islands, integrated phages, horizontally transferred genomic elements, rDNA or repeat regions.
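
A minimal sketch of such a colour scale is given below. It is not taken from JCircleGraph: the RGB values, the linear blending and the function name `value_to_colour` are all assumptions, and values beyond 3 standard deviations are simply clamped to the end colours rather than given the more emphatic colouring described above.

```python
# Hypothetical RGB end points; the actual JCircleGraph palette may differ.
DARK_BLUE = (0, 0, 139)
LIGHT_GREY = (211, 211, 211)
DARK_RED = (139, 0, 0)


def _blend(c1, c2, t):
    """Linear interpolation between two RGB colours, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def value_to_colour(value, average, sd):
    """Map a value to a colour by its distance from the average in SD units."""
    if sd == 0:
        return LIGHT_GREY
    z = (value - average) / sd
    z = max(-3.0, min(3.0, z))  # assumption: clamp extremes to +/- 3 SD
    if z < 0:
        return _blend(LIGHT_GREY, DARK_BLUE, -z / 3)
    return _blend(LIGHT_GREY, DARK_RED, z / 3)


print(value_to_colour(10, average=10, sd=2))  # (211, 211, 211)
print(value_to_colour(16, average=10, sd=2))  # (139, 0, 0)
print(value_to_colour(4, average=10, sd=2))   # (0, 0, 139)
```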



How to create JCircleGraph images

The following files are needed to create a circle graph:

Four output files from the Python program OligoWords

NC_006156gc.out
NC_006156d.out
NC_006156ps.out
NC_006156ouv.out

Two output files from the Java program OligoCounter.

resultsPositionsNC_006156Borrel_garin_500.txt
resultsPositionsNC_006156Borrel_garin_1200.txt


Create a new working directory and place JCircleGraph.jar in it.

Obtain fasta files (.fna) from the NCBI RefSeq collection.


Run JCircleGraph

java -Xmx500m -jar JCircleGraph.jar

On some Windows PCs you can run the jar file by double-clicking on it; however, the memory assigned by default is not sufficient to run the program properly. It is therefore best to run it from the command line with the -Xmx500m switch, which sets the maximum Java heap size to 500 megabytes.

JCircleGraph requires all 4 tetranucleotide parameter files from OligoWords; otherwise it will refuse to work. The dropdown list of available genome RefSeqs is currently derived from the resultsPositions files from OligoCounter, so these files must be in the same directory.
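
Before starting JCircleGraph it can be useful to check that all six input files are present. The helper below is a sketch under assumptions: the name `missing_inputs` is hypothetical, and the filename patterns are inferred from the NC_006156 examples on this page rather than from the JCircleGraph source.

```python
from pathlib import Path

# Suffixes of the four OligoWords output files listed above.
OLIGOWORDS_SUFFIXES = ("gc", "d", "ps", "ouv")


def missing_inputs(workdir, accession, chi_levels=(500, 1200)):
    """Return the names (or glob patterns) of input files not found in workdir."""
    workdir = Path(workdir)
    missing = []
    for suffix in OLIGOWORDS_SUFFIXES:
        name = f"{accession}{suffix}.out"
        if not (workdir / name).is_file():
            missing.append(name)
    for chi in chi_levels:
        # OligoCounter filenames also contain an organism label, hence the "*".
        pattern = f"resultsPositions{accession}*_{chi}.txt"
        if not any(workdir.glob(pattern)):
            missing.append(pattern)
    return missing
```

For example, `missing_inputs(".", "NC_006156")` returns an empty list when the current directory contains all the files named in the list above.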

How to create the OligoWords output files

Obtain the Python command line program OligoWords (see Reva and Tümmler 2004).

Install the Python programming language if necessary

Type

python

at the command line (i.e. a DOS prompt or shell). Feedback such as this:

"Python 2.4.3 (#1, Oct 23 2006, 14:19:47)"

indicates that Python is installed.

Add all files to the working directory

Run OligoWords from the command line four times, once for each of the four required parameters, as below.

python OligoWords1.2.exe.py task=n0_4mer:XXX, frame=10000, step=5000
where XXX is GC, D, PS or V for the respective run.

After each run, manually add the parameter to the end of the filename; otherwise the file will be overwritten by the next run.

NC_006156.out -> NC_006156gc.out, 
NC_006156.out -> NC_006156d.out,
...
NC_006156ps.out,
NC_006156ouv.out.
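
The renaming step can also be scripted. The sketch below only renames the file left behind by one OligoWords run; it does not run OligoWords itself. The function name `rename_output` is hypothetical, and the pairing of the task code V with the suffix "ouv" is inferred from the file list above rather than documented.

```python
from pathlib import Path

# OligoWords task code -> filename suffix used in the file list above.
# Assumption: task V corresponds to the "ouv" file.
TASK_SUFFIX = {"GC": "gc", "D": "d", "PS": "ps", "V": "ouv"}


def rename_output(workdir, accession, task):
    """Rename <accession>.out to <accession><suffix>.out after one run."""
    src = Path(workdir) / f"{accession}.out"
    dst = Path(workdir) / f"{accession}{TASK_SUFFIX[task]}.out"
    src.rename(dst)
    return dst
```

Calling `rename_output(".", "NC_006156", "GC")` directly after the GC run would turn NC_006156.out into NC_006156gc.out, so that the next run cannot overwrite it.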

Move all these created files into the working directory or .......

Alternatively, if you are analysing many genomes and do not want to rename each file individually, create the following directories in the directory where you plan to use JCircleGraph, and place the raw output files from OligoWords in the relevant directory (take care not to mix them up!).

gc, d, ps, ouv         

How to create the OligoCounter output files

Run OligoCounter (see the tutorial elsewhere on this site) with chi-squared thresholds of 500 and 1200.

java -Xmx1500m -jar OligoCounter.jar

Move the resultsPositionsNC_xxxxxx_chisq.txt data files to the JCircleGraph directory.

resultsPositionsNC_006156Borrel_garin_500.txt
resultsPositionsNC_006156Borrel_garin_1200.txt
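
The RefSeq accession and chi-squared threshold can be read back from such a filename, which is handy for checking that both thresholds exist for every genome. The sketch below assumes that all filenames follow the pattern of the two examples above (accession, free-text label, underscore, threshold); the function name `parse_results_filename` is hypothetical and this is not how JCircleGraph itself is known to parse the names.

```python
import re

# Assumed layout: resultsPositions<NC_digits><label>_<chi-squared>.txt
_RESULTS_NAME = re.compile(r"resultsPositions(NC_\d+)(.*)_(\d+)\.txt")


def parse_results_filename(filename):
    """Return (accession, label, chi_squared) or None if the name does not fit."""
    match = _RESULTS_NAME.fullmatch(filename)
    if match is None:
        return None
    accession, label, chi = match.groups()
    return accession, label, int(chi)


print(parse_results_filename("resultsPositionsNC_006156Borrel_garin_500.txt"))
# ('NC_006156', 'Borrel_garin', 500)
```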